NAME
    Bio::Kmer - Helper module for Kmer Analysis.

SYNOPSIS
    A module for helping with kmer analysis.

      use strict;
      use warnings;
      use Bio::Kmer;
  
      my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});
      my $kmerHash=$kmer->kmers();
      my $countOfCounts=$kmer->histogram();

      my $minimizers = $kmer->minimizers();
      my $minimizerCluster = $kmer->minimizerCluster();

    The BioPerl way

      use strict;
      use warnings;
      use Bio::SeqIO;
      use Bio::Kmer;

      # Load up any Bio::SeqIO object. Quality values will be
      # faked internally to help with compatibility even if
      # a fastq file is given.
      my $seqin = Bio::SeqIO->new(-file=>"input.fasta");
      my $kmer=Bio::Kmer->new($seqin);
      my $kmerHash=$kmer->kmers();
      my $countOfCounts=$kmer->histogram();

DESCRIPTION
    A module for helping with kmer analysis. The basic methods count kmers
    and can produce a count of counts. Currently this module only supports
    fastq format when given a file path; other formats such as fasta can be
    supplied as a Bio::SeqIO object. Although this module can count kmers
    with pure perl, it is recommended to select a dedicated kmer counter
    such as Jellyfish with the kmercounter option.
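
    As a rough illustration of what counting kmers means, here is a minimal
    pure perl sketch. The helper count_kmers() is hypothetical and is not
    part of Bio::Kmer; it assumes plain sequence strings and counts the
    forward strand only, which may differ from what the module does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Kmer): count every overlapping
# kmer of length $k in a list of sequence strings, forward strand only.
sub count_kmers {
    my ($k, @sequences) = @_;
    my %count;
    for my $seq (@sequences) {
        my $s    = uc $seq;
        my $last = length($s) - $k;    # last valid start position
        for my $i (0 .. $last) {       # empty range if $s is shorter than $k
            $count{ substr($s, $i, $k) }++;
        }
    }
    return \%count;
}

my $kmers = count_kmers(3, "ACGTACG");
print "$_\t$kmers->{$_}\n" for sort keys %$kmers;
```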

DEPENDENCIES
      * BioPerl
      * Jellyfish >=2
      * Perl threads
      * Perl >=5.10

VARIABLES
    $Bio::Kmer::iThreads
        Boolean describing whether the module instance is using threads

METHODS
    Bio::Kmer->new($filename, \%options)
        Create a new instance of the kmer counter. One object per file.

          Filename can be either a file path or a Bio::SeqIO object.

          Applicable arguments for \%options:
          Argument     Default    Description
          kmercounter  perl       What kmer counter software to use.
                                  Choices: Perl, Jellyfish.
          kmerlength|k 21         Kmer length
          numcpus      1          Number of threads. Used for perl
                                  multithreading with pure perl, or
                                  supplied to other software like
                                  jellyfish.
          gt           1          If the count of a kmer is less
                                  than this, ignore the kmer. This
                                  might help speed analysis if you
                                  do not care about low-count kmers.
          sample       1          Retain only a fraction of kmers.
                                  1 is 100%; 0 is 0%.
                                  Only works with the perl kmer counter.
          verbose      0          Print more messages.

          Examples:
          my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});
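
        The effect of the gt option can be pictured with a small sketch.
        The helper filter_by_min_count() is hypothetical and only mirrors
        the description above (kmers whose count is less than gt are
        ignored); it is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Kmer): keep only the kmers
# whose count is at least $gt, as the gt option is described.
sub filter_by_min_count {
    my ($counts, $gt) = @_;
    my %kept;
    for my $kmer (keys %$counts) {
        $kept{$kmer} = $counts->{$kmer} if $counts->{$kmer} >= $gt;
    }
    return \%kept;
}

my $kept = filter_by_min_count({ ACG => 5, CGT => 1, GTA => 2 }, 2);
print "$_\t$kept->{$_}\n" for sort keys %$kept;
```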

    $kmer->ntcount()
        Returns the number of base pairs counted. In some cases such as when
        counting with Jellyfish, that number is not calculated; instead the
        length is calculated by the total length of kmers. Internally, this
        number is stored as $kmer->{_ntcount}.

        Note: internally runs $kmer->histogram() if $kmer->{_ntcount} is not
        initially found.

          Arguments: None
          Returns:   integer

    $kmer->count()
        Count kmers. This method is called as soon as new() is called, so
        you should never have to run it yourself. Internally caches the
        kmer counts in RAM.

          Arguments: None
          Returns:   None

    $kmer->clearCache()
        Clears kmer counts and histogram counts. You should probably never
        use this method.

          Arguments: None
          Returns:   None

    $kmer->query($queryString)
        Look up the count of a single kmer in the set of counted kmers

          Arguments: query (string)
          Returns:   Count of the queried kmer.
                      0 indicates that the kmer was not found.
                     -1 indicates an invalid kmer (e.g., invalid length)
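
        The three kinds of return value can be sketched as follows. The
        function query_kmer() is a hypothetical stand-in that works on a
        plain hash of counts; it only checks the length and does not
        attempt any other validation the module might perform.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $kmer->query(): look a kmer up in a hash
# of counts that was built for kmer length $k.
sub query_kmer {
    my ($counts, $k, $query) = @_;
    return -1 if length($query) != $k;    # invalid kmer (wrong length)
    return $counts->{$query} // 0;        # 0 when the kmer was not found
}

my %counts = (ACG => 2, CGT => 1);
print query_kmer(\%counts, 3, "ACG"),  "\n";   # 2
print query_kmer(\%counts, 3, "TTT"),  "\n";   # 0
print query_kmer(\%counts, 3, "ACGT"), "\n";   # -1
```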

    $kmer->histogram()
        Count the frequency of kmers. Internally caches the histogram in
        RAM.

          Arguments: None
          Returns:   Reference to an array of counts. The index of
                     the array is the frequency; the value is the
                     number of distinct kmers with that frequency.
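
        A count of counts can be derived from a hash of kmer counts as
        sketched here. The helper count_of_counts() is hypothetical;
        filling unused indices with 0 is a choice made for this sketch
        and is not confirmed behavior of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Kmer): build a count of counts.
# $hist[$n] is the number of distinct kmers that were seen $n times.
sub count_of_counts {
    my ($counts) = @_;
    my @hist;
    $hist[$_]++ for values %$counts;
    # Assumption of this sketch: unused frequencies are reported as 0
    for my $i (0 .. $#hist) {
        $hist[$i] //= 0;
    }
    return \@hist;
}

my $hist = count_of_counts({ ACG => 2, CGT => 1, GTA => 1, TAC => 1 });
for my $freq (1 .. $#$hist) {
    print "$freq\t$hist->[$freq]\n";
}
```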

    $kmer->kmers()
        Return the counted kmers

          Arguments: None
          Returns:   Reference to a hash of kmers and their counts

    $kmer->minimizers(5)
        Finds the minimizer of each kmer

          Arguments: length of minimizer (default: 5)
          Returns:   hash ref of kmer => minimizer,
                     e.g., $hash = {AAAAA=>AAA, TAGGGT=>AGG,...}
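
        For readers unfamiliar with minimizers, here is a minimal sketch
        using a common definition: the lexicographically smallest
        substring of a given length. The helper minimizer_of() is
        hypothetical, looks at the forward strand only, and uses a
        minimizer length of 3 to match the example hash above; the
        module's own rules (for instance around reverse complements)
        may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Kmer): return the
# lexicographically smallest substring of length $l in $kmer.
# Assumption: forward strand only.
sub minimizer_of {
    my ($kmer, $l) = @_;
    my @lmers = map { substr($kmer, $_, $l) } 0 .. length($kmer) - $l;
    my ($min) = sort @lmers;
    return $min;
}

my %minimizer;
$minimizer{$_} = minimizer_of($_, 3) for qw(AAAAA TAGGGT);
print "$_ => $minimizer{$_}\n" for sort keys %minimizer;
```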

    $kmer->minimizerCluster(5)
        Groups kmers by their minimizer

          Arguments: length of minimizer (default: 5).
            Internally, calls $kmer->minimizers($l).
            If $kmer->minimizers has already been called, this parameter will be ignored.
          Returns:   hash ref, e.g., $hash = {AAA=>[TAAAT, AAAGG,...], ATT=>[GATTC,...]}
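
        The relationship between the two return values can be sketched
        as an inversion of the kmer => minimizer hash. The helper
        cluster_by_minimizer() is hypothetical, and sorting the kmers
        within each cluster is only done here to make the output
        predictable.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Kmer): invert a hash of
# kmer => minimizer into a hash of minimizer => [kmers].
sub cluster_by_minimizer {
    my ($minimizers) = @_;
    my %cluster;
    for my $kmer (sort keys %$minimizers) {
        push @{ $cluster{ $minimizers->{$kmer} } }, $kmer;
    }
    return \%cluster;
}

my $cluster = cluster_by_minimizer(
    { TAAAT => "AAA", AAAGG => "AAA", GATTC => "ATT" }
);
for my $m (sort keys %$cluster) {
    print "$m => [@{ $cluster->{$m} }]\n";
}
```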

    $kmer->union($kmer2)
        Finds the union between two sets of kmers

          Arguments: Another Bio::Kmer object
          Returns:   List of kmers

    $kmer->intersection($kmer2)
        Finds the intersection between two sets of kmers

          Arguments: Another Bio::Kmer object
          Returns:   List of kmers

    $kmer->subtract($kmer2)
        Finds the set of kmers unique to this Bio::Kmer object, i.e.,
        kmers that are not present in $kmer2.

          Arguments: Another Bio::Kmer object
          Returns:   List of kmers
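
        The three set operations above can be pictured on plain hashes
        of kmer counts. The helpers kmer_union(), kmer_intersection()
        and kmer_subtract() are hypothetical and compare kmers by
        presence only, ignoring the counts; the sorted output order is
        a choice of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Bio::Kmer) that treat the keys of
# two count hashes as sets of kmers.
sub kmer_union {
    my ($x, $y) = @_;
    my %seen;
    $seen{$_} = 1 for keys(%$x), keys(%$y);
    my @all = sort keys %seen;
    return @all;
}

sub kmer_intersection {
    my ($x, $y) = @_;
    my @shared = grep { exists $y->{$_} } keys %$x;
    return sort @shared;
}

sub kmer_subtract {
    my ($x, $y) = @_;
    my @unique = grep { !exists $y->{$_} } keys %$x;
    return sort @unique;
}

my %first  = (AAA => 2, ACG => 1, CGT => 1);
my %second = (ACG => 4, TTT => 1);

print "union:        ", join(",", kmer_union(\%first, \%second)), "\n";
print "intersection: ", join(",", kmer_intersection(\%first, \%second)), "\n";
print "subtract:     ", join(",", kmer_subtract(\%first, \%second)), "\n";
```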

    $kmer->close()
        Cleans the temporary directory and removes this object from RAM.
        Good for when you might be counting kmers for many things but want
        to keep your overhead low.

          Arguments: None
          Returns:   1

COPYRIGHT AND LICENSE
    MIT license. Go nuts.

AUTHOR
    Author: Lee Katz <lkatz@cdc.gov>

    For additional help, go to https://github.com/lskatz/Bio--Kmer

    CPAN module at http://search.cpan.org/~lskatz/Bio-Kmer/