Jellyfish mer counter
What is it?
Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. Jellyfish can count k-mers quickly by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
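To make the definition concrete, here is a minimal shell sketch that counts the k-mers of one short sequence with awk, sort and uniq. It only illustrates what is being counted; it is not how Jellyfish works internally, and the function name count_kmers is made up for this example.

```shell
# count_kmers K SEQUENCE: print each k-mer of SEQUENCE with its number of occurrences.
# (Hypothetical helper for illustration; not part of Jellyfish.)
count_kmers() {
  awk -v k="$1" -v s="$2" 'BEGIN {
    # Slide a window of width k over the sequence, one base at a time.
    for (i = 1; i + k - 1 <= length(s); i++)
      print substr(s, i, k)
  }' | sort | uniq -c | awk '{ print $2, $1 }'
}

count_kmers 3 ACGTACGT
```

For ACGTACGT and k = 3 this prints ACG 2, CGT 2, GTA 1 and TAC 1, one pair per line. A sequence of length n contains n - k + 1 overlapping k-mers, which is why a dedicated counter is needed for real data sets.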
Jellyfish is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation below for more details.
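A typical round trip might look like the sketch below. The file names example.fa, example.jf and example_counts.txt are placeholders chosen for this example. The options shown (-m for the k-mer length, -s for the initial hash size, -t for the number of threads, -C for canonical k-mers, -o for the output file, and dump -c for column output) are those of the version 2 command line; check "jellyfish count --help" on your installation before relying on them.

```shell
# Write a tiny multi-FASTA file to count (placeholder data and file name).
cat > example.fa <<'EOF'
>seq1
ACGTACGTAC
>seq2
TTGCATGCAA
EOF

if command -v jellyfish >/dev/null 2>&1; then
  # Count canonical 5-mers into the binary file example.jf.
  jellyfish count -m 5 -s 1M -t 2 -C -o example.jf example.fa
  # Translate the binary counts into text, one "k-mer count" pair per line.
  jellyfish dump -c example.jf > example_counts.txt
else
  echo "jellyfish not found on PATH; skipping count and dump" >&2
fi
```

The guard around the two jellyfish commands only keeps the sketch harmless on a machine where the program is not installed yet.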
Jellyfish is distributed as source code under the GPL license. Jellyfish is developed on Linux 64-bit (x86_64). It requires gcc version 4.4 or newer to compile. It is reported to compile on Linux with the clang compiler version 3.0, on MacOS X (Intel 64 bit) with gcc version 4.7, and on Microsoft Windows 7 with cygwin and gcc.
The current version is version 2.0. The older version 1.1 is still available from the CBCB group at the University of Maryland. The current version does not have any limitation on the size of k-mers, unlike version 1.1, which was limited to k <= 31. Support for Quake has been dropped in the new version; use version 1.1 with Quake. The User guide gives some information on how to use Jellyfish and on the differences between the new and old versions.
The following sequence should be enough to compile and install:
./configure
make
sudo make install
On RedHat 5 and 6 (or CentOS), the default compiler is too
old. One needs to install version 4.4 from the gcc44-c++ package
and then point configure at that compiler, for example:
./configure CC=gcc44 CXX=g++44
To install in a different directory than the default /usr/local,
pass the --prefix switch to configure. For example, to install into
one's home directory, do:
./configure --prefix=$HOME
On MacOS, a recent version of gcc can be installed
with MacPorts. Install with:
sudo port install gcc49
sudo port select --set gcc mp-gcc49
The first command can
take a while to run. The last command is optional: CXX=g++49 can be
passed to ./configure instead, like above.
Bindings to Ruby, Python and Perl
By using one of the switches '--enable-ruby-binding', '--enable-python-binding' or '--enable-perl-binding', one triggers the compilation of bindings to scripting languages. This allows querying the output of Jellyfish directly from these languages.
More documentation and examples are available on the GitHub page. Note that, if one uses the distribution on this page, SWIG and the '--enable-swig' switch are NOT necessary to build the bindings (both are necessary when building from the GitHub tree).
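As a rough sketch, building the Python binding from the distribution tarball could look like this. It assumes the switch is given to ./configure like the other options on this page and that the commands are run from the top of the unpacked source tree; the variable name binding_switch is only for illustration.

```shell
# Choose one binding switch; the Ruby and Perl switches work the same way.
binding_switch=--enable-python-binding   # or --enable-ruby-binding / --enable-perl-binding

if [ -x ./configure ]; then
  # Configure with the binding enabled, then build.
  ./configure "$binding_switch" && make
else
  echo "no ./configure here; change into the Jellyfish source directory first" >&2
fi
```

When building from the GitHub tree instead, '--enable-swig' would have to be added to the configure line, as noted above.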
- Bug fix: ignore the quality filter for FASTA files
- SWIG binding for Ruby, Python and Perl available.
- Added a SWIG directory with bindings to Ruby, Python and Perl. Still experimental
- Added an example directory on how to use the library
- Removed many unused files from previous version (1.x)
- Various bug fixes
Version 2.1.3 (Bug fix)
- Fixed compilation problem on CentOS.
- Added an interactive mode to the query subcommand. It enables spawning a 'jellyfish query -i' from a script and querying the database
- Speed improvement to the query subcommand. The binary search is guided by the hash ordering
Version 2.1.1 (Bug fix)
- Fixed compilation issues with gcc 4.4.
- Fixed testing issues with gcc 4.8
- Added stats subcommand, similar to existing subcommand in version 1.x
- Added filtering of input bases by their quality value. A similar feature exists in version 1.x, but the command-line switches are not compatible