SortMeRNA

Frequently Asked Questions

(1) How does SortMeRNA distinguish between chimeric reads (i.e. reads which map to multiple databases)?

If multiple databases are provided as input to SortMeRNA, each read will be classified to the highest scoring database (based on E-value), independent of its order in the command. If the read scores equally to two or more databases, it will be classified to the first matching database in the command. Although there is no direct option to output chimeric reads into both files, the user may run SortMeRNA using one database at a time and then compare the resulting files to detect chimeric reads.

(2) SortMeRNA does not classify all of the rRNA reads that I expect.

The raw public rRNA databases such as SILVA or Greengenes can contain partial non-rRNA sequences, such as mRNA, miscRNA or other non-coding rRNAs. The accuracy of any program which filters against a database depends upon the quality of the database used. The representative databases distributed with SortMeRNA have been filtered to remove many of these foreign RNAs (please see Section 2.1 of the Supplementary data). An additional filter in SortMeRNA (version 1.9 and below) uses k-mer frequencies to mask k-mers which occur rarely in the database, therefore even if a foreign RNA exists in the database, it will be "masked" and not considered in any further analyses. If a user is certain their set of reads contains 100% rRNA, then SortMeRNA will perform with high accuracy to assign a domain and subunit to each of the reads based on the databases provided. However, if the set of reads is contaminated with other RNAs, SortMeRNA will remain sensitive only to the true rRNAs in the set.

(3) Can SortMeRNA assign taxonomic classification to the detected rRNA reads?

The data structures used by SortMeRNA allow for this option in the future, however it is not available at the moment. If the user wishes to study the taxonomy of their classified rRNA reads, they may run BLASTN on the output files of SortMeRNA and input the resulting alignment files to MEGAN 4 for performing taxonomic analyses. Depending on the abundancy of rRNA in the raw metatranscriptomic reads set, either the entire output file of SortMeRNA can be input to BLASTN, or simply a sample of the reads. In both cases, the user will have an accurate idea of the species within the metatranscriptomic set. For 16S rRNA only, the user may input their SortMeRNA results into the RDP Classifier.

(4) How does SortMeRNA deal with two input read files for forward-reverse paired-end sequencing?

Please refer to the example given in section 4.2.4 of the user manual.

SortMeRNA forward-reverse paired-end format