SortMeRNA

Fast filtering, mapping and OTU picking

SortMeRNA is a software tool for filtering, mapping and OTU-picking NGS reads in metatranscriptomic and metagenomic data. The core algorithm is based on approximate seeds and allows for fast and sensitive analyses of nucleotide sequences. The main application of SortMeRNA is filtering ribosomal RNA from metatranscriptomic data. Additional applications include OTU-picking and taxonomy assignment, available through QIIME v1.9+ (http://qiime.org - v1.9.0-rc1).
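
To illustrate the idea of approximate seeds, the sketch below counts how many length-k windows (seeds) of a read occur in a reference sequence with at most one mismatch. It is a simplified, brute-force illustration under stated assumptions: it allows substitutions only (no insertions or deletions), compares against a plain set of reference k-mers, and the function names are hypothetical. It does not reproduce SortMeRNA's own index or scoring.

```python
def kmers(seq, k):
    """Return every length-k window (seed) of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def within_one_mismatch(a, b):
    """True if equal-length strings a and b differ at no more than one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def count_seed_hits(read, reference, k=18):
    """Count the read's seeds that match some reference k-mer with <= 1 mismatch.

    Hypothetical helper: substitutions only, brute-force comparison. A real
    tool would query an index instead of scanning every reference k-mer.
    """
    ref_kmers = set(kmers(reference, k))
    hits = 0
    for seed in kmers(read, k):
        # Exact lookup first, then fall back to a one-mismatch scan.
        if seed in ref_kmers or any(within_one_mismatch(seed, r) for r in ref_kmers):
            hits += 1
    return hits


reference = "ACGTACGTTGCAACGTAGCTAGGC"
read = "T" + reference[5:20]  # 16-nt read with one substitution at its first base
print(count_seed_hits(read, reference, k=8))  # prints 9
```

With k=8 the 16-nt read has nine seeds; the seed spanning the substituted base still matches because one mismatch is tolerated, so all nine count as hits.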

SortMeRNA takes as input a file of reads (fasta or fastq format) and one or multiple rRNA database file(s), and sorts the reads into two files specified by the user: one for rRNA reads and one for rejected (non-rRNA) reads. Optionally, it can provide high-quality local alignments of rRNA reads against the rRNA database. SortMeRNA works with Illumina, 454, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.
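
The basic input/output contract described above (one read file in, two output sets out) can be sketched as follows. This is an illustrative sketch only: it assumes well-formed four-line FASTQ records, keeps everything in memory, and takes the rRNA decision as a caller-supplied predicate (`is_rrna`, hypothetical) rather than performing SortMeRNA's alignment.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# (read id, sequence, quality string)
Record = Tuple[str, str, str]


def read_fastq(text: str) -> Iterator[Record]:
    """Parse four-line FASTQ records (header, sequence, '+', quality)."""
    lines = [line for line in text.splitlines() if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@'


def split_reads(records: Iterable[Record],
                is_rrna: Callable[[str], bool]) -> Tuple[List[Record], List[Record]]:
    """Sort records into (accepted rRNA reads, rejected reads)."""
    accepted: List[Record] = []
    rejected: List[Record] = []
    for rec in records:
        (accepted if is_rrna(rec[1]) else rejected).append(rec)
    return accepted, rejected


fastq = "@r1\nACGTGGTTAC\n+\nIIIIIIIIII\n@r2\nTTTTAAAACC\n+\nIIIIIIIIII\n"
# Toy predicate for illustration only: "sequence contains GGTT".
accepted, rejected = split_reads(read_fastq(fastq), lambda seq: "GGTT" in seq)
print([r[0] for r in accepted], [r[0] for r in rejected])  # ['r1'] ['r2']
```

Passing the classifier in as a function keeps the parsing and the two-way split independent of how a read is actually judged to be rRNA.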

If you use SortMeRNA, please cite:
Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

Availability

SortMeRNA is implemented in C++ and freely distributed under the GNU Lesser General Public License (LGPL).

The latest version is v2.1 (released 2 February 2016).

Older versions are also available.

Ongoing development for v2.1 can be followed on GitHub.

To download the development version:

  1. clone the repository to a working directory:

    git clone https://github.com/biocore/sortmerna.git

  2. or download the repository as a zipped folder by clicking the "Download ZIP" button on the main repository page

Installation

For v2.1, see the README.md page or the User Manual for v2.1 (also included with the distribution package).

Good to Know

SortMeRNA is known to have linking problems with Apple's llvm-gcc compiler (e.g. llvm-gcc-4.2); as of Xcode 5, llvm-gcc has been deprecated (see "Deprecation and Removal Notice"). Please use either Clang or the original GCC compiler (installable from MacPorts; see README.md for details).

Additionally, Clang does not easily support OpenMP for multithreading and thus it is strongly recommended to use the original GCC compiler (versions 4.0 and up, available through MacPorts) if multithreading is desired.

Bug Reports

Users are encouraged to submit bug reports or issues to the SortMeRNA issue tracker on GitHub, or to send an e-mail to the authors.

News!

2 Feb 2016 : SortMeRNA version 2.1 released
major updates:

  • Issues #48, #70 and #72 resolved
  • New format for parameter '--blast'
  • Thanks to Björn Grüning (University of Freiburg) and Nicola Soranzo (The Genome Analysis Center, UK) for updating SortMeRNA's Galaxy wrapper to 2.0
  • Thanks to Andreas Tille and the Debian Med team for adding SortMeRNA as a Debian package
  • Thanks to Ben Woodcroft (Australian Center for Ecogenomics) for adding SortMeRNA as a GNU Guix package
  • Thanks to Daniel White (Landcare Research, NZ) for pointing out that some 18S rRNA sequences in the 18S representative database can give higher-scoring matches to DNA than to rRNA (e.g. SILVA ID DaiRer10 matching Zebrafish DNA). As a result, data assumed to be free of rRNA may still contain matches to the 18S representative database.

31 Oct 2014 : SortMeRNA version 2.0 released
major updates:

  • Representative Silva databases updated to version 119 (see User Manual 2.0 for details)
  • OTU-picking extensions added for closed-reference clustering compatible with QIIME’s v1.9 pick_otus.py, pick_closed_reference_otus.py and pick_open_reference_otus.py scripts
  • Thanks to Manikandan Vinu (KAUST), Ludovic Giloteaux (Cornell University) and Tanay Ghosh (Cambridge University) for bug reports (FASTQ parsing and paired reads analysis)
  • Thanks to Björn Grüning (University of Freiburg) for updating the SortMeRNA Galaxy wrapper (for version 1.9); modifications for version 2.0 are under development in the Bonsai lab
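
Closed-reference OTU picking assigns each read to the reference sequence it aligns to best, provided the alignment passes similarity thresholds; reads that fail are reported as failures (in open-reference workflows they are then clustered de novo). The sketch below shows only that final bookkeeping step. The input format, the function name and the 97% identity and coverage thresholds are assumptions for illustration, not SortMeRNA's exact output format or defaults.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Best hit per read: (reference id, identity, query coverage), or None if unaligned.
Hit = Optional[Tuple[str, float, float]]


def closed_reference_otus(hits: Dict[str, Hit], min_id: float = 0.97,
                          min_cov: float = 0.97) -> Tuple[Dict[str, List[str]], List[str]]:
    """Build an OTU map {reference id: [read ids]} and a list of failed reads."""
    otu_map: Dict[str, List[str]] = defaultdict(list)
    failures: List[str] = []
    for read_id, hit in hits.items():
        if hit is not None:
            ref_id, identity, coverage = hit
            if identity >= min_id and coverage >= min_cov:
                otu_map[ref_id].append(read_id)
                continue
        # Unaligned reads and reads below either threshold are failures.
        failures.append(read_id)
    return dict(otu_map), failures


hits = {
    "r1": ("ref1", 0.99, 1.00),
    "r2": ("ref1", 0.98, 0.98),
    "r3": ("ref2", 0.90, 1.00),  # identity too low
    "r4": None,                  # no alignment
}
otu_map, failures = closed_reference_otus(hits)
print(otu_map)   # {'ref1': ['r1', 'r2']}
print(failures)  # ['r3', 'r4']
```

Each OTU is named after its reference sequence, which is what makes closed-reference results comparable across studies that use the same database.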

11 Mar 2014 : SortMeRNA version 1.99 beta released
major updates:

30 Aug 2013 : SortMeRNA version 1.9 released
minor update: fixed a bug for naming output log file (thanks to Shaman Narayanasamy) and updated merge-paired-reads.sh to work on a cluster (thanks to Nicolas Delhomme).

31 May 2013 : updated Makefile for SortMeRNA version 1.8
The paths for binaries sortmerna and buildtrie have been corrected to work with `make install` for installation directories other than the default /usr/local. Please copy all of the files here into your /sortmerna-1.8/ directory before running `configure`.

13 May 2013 : SortMeRNA version 1.8 released
minor update: fixed a bug in detecting the last (rRNA) read in fastq files, modified merge/unmerge-paired-reads.sh

05 May 2013 : SortMeRNA is now available through Galaxy (thanks to Jean-Frédéric Berthelot)! Download it here.

17 Apr 2013 : Pre-built SortMeRNA databases
Add the contents of this folder to your `/sortmerna/automata` folder and sortmerna can be launched directly without running buildtrie beforehand. This is a good option for users with less than 4GB of RAM available for indexing the databases.

05 Apr 2013 : SortMeRNA version 1.7 released
major update: changed to the usual `configure`, `make`, `make install` (see the user manual v-1.7),
added `merge_paired_reads.sh` for forward-reverse paired-end reads (see the user manual v-1.7, section 4.2.4)
minor update: fixed an integer overflow for mmap calculation for 32-bit systems
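
A paired-reads merge script combines forward and reverse reads into a single file. A common way to do this, and the one assumed here, is interleaving: each forward read is followed immediately by its mate, so the two mates stay adjacent in one file. The sketch below treats reads as opaque items and uses hypothetical function names; it is not the script's actual implementation.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def interleave(forward: Sequence[T], reverse: Sequence[T]) -> List[T]:
    """Return [f1, r1, f2, r2, ...]; both inputs must hold the same number of reads."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse inputs must contain the same number of reads")
    merged: List[T] = []
    for fwd, rev in zip(forward, reverse):
        merged.extend((fwd, rev))
    return merged


def deinterleave(merged: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Undo interleave(): even positions are forward mates, odd positions reverse."""
    if len(merged) % 2:
        raise ValueError("interleaved input must contain an even number of reads")
    return list(merged[0::2]), list(merged[1::2])


merged = interleave(["f1", "f2"], ["r1", "r2"])
print(merged)                # ['f1', 'r1', 'f2', 'r2']
print(deinterleave(merged))  # (['f1', 'f2'], ['r1', 'r2'])
```

Checking the counts up front matters: if one file has an extra read, every later pair would be silently misaligned.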

26 Feb 2013 : SortMeRNA version 1.6 released
For taxonomical analysis, the sequence tags in the rRNA databases now follow the format:
     ">[accession] [taxonomy] [length]"
minor update: changed sysconf library to sysctl for Mac OS
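
Because the taxonomy field may itself contain spaces (e.g. a species name), one robust way to read headers in the `>[accession] [taxonomy] [length]` format is to take the first token as the accession, the last token as the length, and everything in between as the taxonomy. The function and the example header below are hypothetical and only illustrate that format.

```python
from typing import Tuple


def parse_tag(header: str) -> Tuple[str, str, int]:
    """Split '>[accession] [taxonomy] [length]' into its three fields."""
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    fields = header[1:].split()
    if len(fields) < 3:
        raise ValueError("expected '>[accession] [taxonomy] [length]'")
    # First token = accession, last token = length, middle tokens = taxonomy.
    return fields[0], " ".join(fields[1:-1]), int(fields[-1])


print(parse_tag(">AB000001 Bacteria;Proteobacteria;Escherichia coli 1542"))
# ('AB000001', 'Bacteria;Proteobacteria;Escherichia coli', 1542)
```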

15 Feb 2013 : SortMeRNA version 1.5 released
major update: error in the output of paired reads >1GB resolved; option -m added for specifying the amount of memory for loading reads; reads of length <L (default L=18) are automatically considered non-rRNA
minor update: local timestamp added to --log statistics file, SortMeRNA User Manual updated
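
The rule that reads shorter than L are treated as non-rRNA follows from seed-based matching: a read of length n contains max(0, n - L + 1) windows of length L, so a read with n < L has no seed to look up. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def seed_windows(read_len: int, seed_len: int = 18) -> int:
    """Number of length-seed_len windows in a read of length read_len."""
    return max(0, read_len - seed_len + 1)


def is_searchable(read: str, seed_len: int = 18) -> bool:
    """A read can only be matched via seeds if it contains at least one full window."""
    return seed_windows(len(read), seed_len) > 0


print(seed_windows(17), seed_windows(18), seed_windows(100))  # 0 1 83
print(is_searchable("A" * 17))  # False
```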

06 Feb 2013 : SortMeRNA version 1.4 released
minor update: support for Illumina or 454 reads up to 5000 nucleotides, AUTHORS file added to SortMeRNA directory

24 Jan 2013 : SortMeRNA version 1.3 released
major update: paired-end reads support
minor update: L=20 error messages resolved, I/O file checks modified, Makefile updated

07 Jan 2013 : SortMeRNA version 1.2 released
minor update: input files can be given without a directory path (the current directory is assumed)

20 Dec 2012 : SortMeRNA version 1.1 released
minor update: input files without extensions allowed

15 Oct 2012 : SortMeRNA version 1.0 released


Thanks to everyone for bug reports and feedback!