NAME Bio::Kmer - Helper module for Kmer Analysis. SYNOPSIS A module for helping with kmer analysis. use strict; use warnings; use Bio::Kmer; my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4}); my $kmerHash=$kmer->kmers(); my $countOfCounts=$kmer->histogram(); my $minimizers = $kmer->minimizers(); my $minimizerCluster = $kmer->minimizerCluster(); The BioPerl way use strict; use warnings; use Bio::SeqIO; use Bio::Kmer; # Load up any Bio::SeqIO object. Quality values will be # faked internally to help with compatibility even if # a fastq file is given. my $seqin = Bio::SeqIO->new(-file=>"input.fasta"); my $kmer=Bio::Kmer->new($seqin); my $kmerHash=$kmer->kmers(); my $countOfCounts=$kmer->histogram(); DESCRIPTION A module for helping with kmer analysis. The basic methods help count kmers and can produce a count of counts. Currently this module only supports fastq format. Although this module can count kmers with pure perl, it is recommended to give the option for a different kmer counter such as Jellyfish. DEPENDENCIES * BioPerl * Jellyfish >=2 * Perl threads * Perl >=5.10 VARIABLES $Bio::Kmer::iThreads Boolean describing whether the module instance is using threads METHODS Bio::Kmer->new($filename, \%options) Create a new instance of the kmer counter. One object per file. Filename can be either a file path or a Bio::SeqIO object. Applicable arguments for \%options: Argument Default Description kmercounter perl What kmer counter software to use. Choices: Perl, Jellyfish. kmerlength|k 21 Kmer length numcpus 1 This module uses perl multithreading with pure perl or can supply this option to other software like jellyfish. gt 1 If the count of kmers is fewer than this, ignore the kmer. This might help speed analysis if you do not care about low-count kmers. sample 1 Retain only a percentage of kmers. 1 is 100%; 0 is 0% Only works with the perl kmer counter. verbose 0 Print more messages. Examples: my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4}); $kmer->ntcount() Returns the number of base pairs counted. In some cases such as when counting with Jellyfish, that number is not calculated; instead the length is calculated by the total length of kmers. Internally, this number is stored as $kmer->{_ntcount}. Note: internally runs $kmer->histogram() if $kmer->{_ntcount} is not initially found. Arguments: None Returns: integer $kmer->count() Count kmers. This method is called as soon as new() is called and so you should never have to run this method. Internally caches the kmer counts to ram. Arguments: None Returns: None $kmer->clearCache Clears kmer counts and histogram counts. You should probably never use this method. Arguments: None Returns: None $kmer->query($queryString) Query the set of kmers with your own query Arguments: query (string) Returns: Count of kmers. 0 indicates that the kmer was not found. -1 indicates an invalid kmer (e.g., invalid length) $kmer->histogram() Count the frequency of kmers. Internally caches the histogram to ram. Arguments: none Returns: Reference to an array of counts. The index of the array is the frequency. $kmer->kmers Return actual kmers Arguments: None Returns: Reference to a hash of kmers and their counts $kmer->minimizers(5) Finds minimizer of each kmer Arguments: length of minimizer (default: 5) returns: hash ref, e.g., $hash = {AAAAA=>AAA, TAGGGT=>AGG,...} $kmer->minimizerCluster(5) Finds minimizer of each kmer Arguments: length of minimizer (default: 5). Internally, calls $kmer->minimizer($l) If $kmer->minimizer has already been called, this parameter will be ignored. returns: hash ref, e.g., $hash = {AAA=>[TAAAT, AAAGG,...], ATT=>[GATTC,...]}} $kmer->union($kmer2) Finds the union between two sets of kmers Arguments: Another Bio::Kmer object Returns: List of kmers $kmer->intersection($kmer2) Finds the intersection between two sets of kmers Arguments: Another Bio::Kmer object Returns: List of kmers $kmer->subtract($kmer2) Finds the set of kmers unique to this Bio::Kmer object. Arguments: Another Bio::Kmer object Returns: List of kmers $kmer->close() Cleans the temporary directory and removes this object from RAM. Good for when you might be counting kmers for many things but want to keep your overhead low. Arguments: None Returns: 1 COPYRIGHT AND LICENSE MIT license. Go nuts. AUTHOR Author: Lee Katz <lkatz@cdc.gov> For additional help, go to https://github.com/lskatz/Bio--Kmer CPAN module at http://search.cpan.org/~lskatz/Bio-Kmer/