Usage tests
To run the SortMeRNA usage tests (kept up to date with the latest SortMeRNA master branch on GitHub), download the testing data, edit the self.root path in sortmerna/tests/test_sortmerna.py so that it points to the testing data directory, and then run python /path/to/test_sortmerna.py.
Supplementary data
The supplementary data below accompanies: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.
Below are links to all simulated databases and reads for the benchmarking of SortMeRNA. The ordering follows that of the Supplementary data.
Section 2.1, Construction of rRNA databases: Set 1, Set 2, Set 3 and Set 4
- Set 1: 80% identity 16S rRNA bacteria & archaea (2262 sequences)
- Set 2: 80% identity 16S rRNA bacteria & archaea + truncated phylogeny tree (2187 sequences)
- Set 3: 95% identity 23S rRNA bacteria & archaea (1969 sequences)
- Set 4: 95% identity 23S rRNA bacteria & archaea + truncated phylogeny tree (1906 sequences)
Click here for the bash script used to create each representative database from the raw SILVA SSU NR Ref 108 and SILVA LSU Ref 108 databases.
Section 2.2, Simulated reads
- Set 1: 1,000,000 Illumina rRNA reads (100nt)
- Set 1: 300,000 Roche 454 rRNA reads (≥200nt)
- Set 2: 1,000,000 Illumina rRNA reads (100nt)
- Set 2: 300,000 Roche 454 rRNA reads (≥200nt)
- Set 3: 1,000,000 Illumina rRNA reads (100nt)
- Set 3: 300,000 Roche 454 rRNA reads (≥200nt)
- Set 4: 1,000,000 Illumina rRNA reads (100nt)
- Set 4: 300,000 Roche 454 rRNA reads (≥200nt)
- Set 5: 1,000,000 Illumina rRNA reads (100nt)
- Set 5: 300,000 Roche 454 rRNA reads (≥200nt)
- Set 6: 1,000,000 Illumina rRNA reads (100nt)
- Set 6: 300,000 Roche 454 rRNA reads (≥200nt)
Click here for the bash/perl scripts to mask all rRNA sequences in the NCBI bacterial genomes.
Click here for the bash script used to filter Roche 454 MetaSim generated reads (similar steps for Illumina reads).
SRR106861 and SRR013513 metatranscriptomic reads
- SRR106861 original (113128 reads)
- SRR106861 PRINSEQ filtered (105873 reads)
- SRR013513 original (238250 reads)
- SRR013513 PRINSEQ filtered (207368 reads)
Section 3.1, Commands
Click here for the bash script used to run BLASTN, and here for the BioPerl parser used to select sequences with e-value <1e-5, coverage ≥50% and identity ≥75%.
Click here for the bash script used to run SSU-ALIGN.
Click here for the bash script used to run riboPicker.
Click here for the bash script used to run Meta-RNA.
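The BLASTN hit-selection criteria given above (e-value <1e-5, coverage ≥50%, identity ≥75%) can be illustrated with a minimal Python sketch. This is not the BioPerl parser used for the benchmark. It assumes tab-separated BLAST+ output with a hypothetical column layout (qseqid, sseqid, pident, length, qlen, evalue), and it assumes that coverage means alignment length as a percentage of query length, which may differ from the definition used in the original parser. The function and constant names are illustrative only.

```python
import csv
import io

# Thresholds taken from the text; how coverage is computed is an assumption.
MAX_EVALUE = 1e-5     # keep hits with e-value strictly below this
MIN_COVERAGE = 50.0   # percent of the query covered by the alignment
MIN_IDENTITY = 75.0   # percent identity of the alignment

# Hypothetical column layout, e.g. BLAST+ run with
# -outfmt "6 qseqid sseqid pident length qlen evalue"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qlen", "evalue"]


def passes(row):
    """Return True if one tabular hit satisfies all three thresholds."""
    coverage = 100.0 * int(row["length"]) / int(row["qlen"])
    return (
        float(row["evalue"]) < MAX_EVALUE
        and coverage >= MIN_COVERAGE
        and float(row["pident"]) >= MIN_IDENTITY
    )


def filter_hits(handle):
    """Return the sorted IDs of reads that have at least one passing hit."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    return sorted({row["qseqid"] for row in reader if passes(row)})


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    example = (
        "read1\tref1\t90.0\t80\t100\t1e-20\n"
        "read2\tref1\t70.0\t90\t100\t1e-20\n"
    )
    print(filter_hits(io.StringIO(example)))
```

In this sketch a read is retained as soon as any one of its hits meets all three thresholds; applying the thresholds to the best hit only would be an equally plausible reading of the criteria.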
Section 3.2, rRNA databases