What is Magnolia?
Magnolia is a program suite that classifies RNA sequences as protein-coding or noncoding genes by comparative analysis. It also produces a multiple sequence alignment of the data that is tailored to the predicted class.
Magnolia extracts information from the similarities and differences between the input sequences and searches for a specific evolutionary pattern. It combines two evolutionary models:
- Protea for protein-coding sequences. In this first model, selection pressure tends to preserve the encoded amino acid sequence. Mutations are therefore constrained by the redundancy of the genetic code: between codons in the correct reading frame, silent mutations and mutations leading to similar amino acids are favoured. Coding sequences can thus be identified by looking for global conservation of a common reading frame.
- caRNAc for noncoding RNA genes. In this second model, selection pressure tends to preserve the spatial structure of the molecule. The secondary structure, formed by isosteric base pairs, is more highly conserved than the sequence itself. This means that mutations should retain the ability to form base pairs within energetically favourable stems. In this context, caRNAc looks for a significantly conserved common secondary structure.
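The two signals described above can be illustrated with a toy sketch. This is not the Protea or caRNAc algorithm: the function names `synonymous_fraction` and `pairs_preserved` are hypothetical, the sequences are assumed to be gap-free and of equal length, and treating G-U wobble pairs as valid is an assumption made for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Base pairs accepted in a stem (RNA alphabet); G-U wobble is an assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def synonymous_fraction(seq_a, seq_b, frame):
    """Among the codons that differ between two sequences read in `frame`
    (0, 1 or 2), return the fraction that encode the same amino acid."""
    changed = silent = 0
    for i in range(frame, len(seq_a) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a != codon_b:
            changed += 1
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                silent += 1
    return silent / changed if changed else 0.0


def pairs_preserved(seq, pairs):
    """Return True if every (i, j) position pair of a candidate structure
    can still form a base pair in `seq`."""
    return all((seq[i], seq[j]) in PAIRS for i, j in pairs)


# Coding signal: all differences fall on third codon positions of frame 0.
coding_a = "GCTGAAAAA"  # Ala Glu Lys
coding_b = "GCCGAGAAG"  # Ala Glu Lys
frame_scores = [synonymous_fraction(coding_a, coding_b, f) for f in range(3)]

# Structural signal: a hairpin whose stem pairs positions 0-9, 1-8 and 2-7.
stem = [(0, 9), (1, 8), (2, 7)]
compensated = pairs_preserved("GAGAAAACUC", stem)  # G-C replaced by A-U
disrupted = pairs_preserved("GAGAAAACCC", stem)    # A facing C cannot pair
```

In this toy data only frame 0 explains every substitution as silent, which is the kind of pattern expected from a protein-coding pair, while the hairpin example shows a double mutation that changes the sequence but keeps the stem intact.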
If the sequences are classified as protein-coding, the nucleic acid multiple alignment is built from the hypothetical amino acid sequences using ClustalW. This process is able to handle frameshifts. If the sequences are classified as noncoding RNA genes, the multiple alignment takes into account both the primary structure and the predicted common secondary structure of the RNA sequences. This is done with the gardenia tool.
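The idea of deriving a nucleic acid alignment from an amino acid alignment can be sketched as follows. This is a simplified illustration only: it assumes the protein rows have already been aligned (for example by ClustalW), it ignores the frameshift handling mentioned above, and the helper name `thread_codons` is hypothetical.

```python
def thread_codons(aligned_protein, nucleotides):
    """Map an ungapped nucleotide sequence onto its gapped protein row:
    each residue consumes one codon, each gap becomes three gap characters."""
    out, pos = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
        else:
            out.append(nucleotides[pos:pos + 3])
            pos += 3
    return "".join(out)


# Two toy sequences; the second lacks the middle residue.
protein_rows = ["MAK", "M-K"]
nucleotide_seqs = ["ATGGCTAAA", "ATGAAA"]
codon_alignment = [
    thread_codons(p, n) for p, n in zip(protein_rows, nucleotide_seqs)
]
```

Because gaps are inserted in multiples of three, the resulting nucleotide rows keep codons aligned column by column.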
Availability
You can use Magnolia via the web interface.
Reference
Arnaud Fontaine, Antoine de Monte, Helene Touzet.
MAGNOLIA: multiple alignment of protein-coding and structural RNA sequences.
Nucleic Acids Research 36 (Web Server issue): W14-W18, 2008