Bonsai :: Bioinformatics Software Server

Input sequences

Your data set should include at least two distinct RNA sequences, and sequences should be in FASTA format.

Lower-case and upper-case letters are both accepted. The full standard IUPAC nucleic acid code is not supported: only the symbols A, C, G, T and U are recognized. Digits (0 to 9), hyphens (-) and dots (.) are accepted but simply ignored.
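The input rules above can be sketched in a few lines of Python. This is only an illustration, not the server's own code: the function name clean_sequence is hypothetical, skipping whitespace is an assumption, and raising an error on an unsupported symbol is also an assumption, since this page does not say how the server reacts to one.

```python
def clean_sequence(raw: str) -> str:
    """Drop ignored symbols and check the remaining ones (hypothetical helper)."""
    ignored = set("0123456789-.")   # accepted but ignored, as stated above
    allowed = set("ACGTU")          # the only recognized nucleotide symbols
    cleaned = []
    for ch in raw:
        if ch in ignored or ch.isspace():   # whitespace handling is an assumption
            continue
        if ch.upper() not in allowed:
            # Assumption: the real server may report this differently.
            raise ValueError(f"unsupported symbol: {ch!r}")
        cleaned.append(ch)          # original letter case is preserved
    return "".join(cleaned)
```

For example, clean_sequence("1 acgu-AC.GT 10") keeps only the eight nucleotide letters, while a sequence containing an IUPAC ambiguity code such as N is rejected.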

Options

Amino acid sequence multiple alignment - If the sequences are recognized as protein coding sequences, then a multiple alignment is built based on the putative amino acid sequences. This option allows the user to select the alignment tool: ClustalW (credits), Dialign2 (credits) or T-coffee (credits).

Output files

Protein coding model

If the sequences are classified as protein coding sequences, then two multiple alignments are displayed. The first alignment is built on the putative amino acid sequences obtained by virtual translation using the predicted reading frame. This alignment is computed with ClustalW (credits). The second alignment is the corresponding alignment on nucleic sequences obtained by reverse translation. Both alignments are available in Clustal format.
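As a rough illustration of the reverse translation step, the sketch below threads the original nucleotides back onto an aligned amino acid sequence. It is not the server's implementation: the function name is hypothetical, and it assumes that the coding sequence starts in the predicted reading frame, that it provides exactly one codon per aligned residue, and that gaps are written as hyphens.

```python
def back_translate_alignment(aligned_protein: str, coding_sequence: str) -> str:
    """Map one aligned protein row back to its nucleotides (hypothetical helper)."""
    codons = []
    pos = 0  # position of the next unused codon in the coding sequence
    for symbol in aligned_protein:
        if symbol == "-":
            codons.append("---")    # a protein gap becomes a three-column gap
        else:
            codons.append(coding_sequence[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

With the aligned row M-K and the coding sequence augaaa, the function returns aug---aaa, so each gap in the amino acid alignment spans three columns in the nucleic alignment.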

Non-coding RNA model

The multiple alignment is displayed in a Clustal-like format. The first part contains the primary structure. The second part contains the secondary structure. Base pairings are indicated in bracket-dot format. For each sequence, the individual putative secondary structure is also provided. The result is stored in five formats: CT, JPEG, PS, bracket-dot format, and as a list of constrained base pairings. JPEG and PS files are graphical files that are both automatically produced from the CT file using NAview.

CT format - it provides a textual description of the base pairings. The syntax is as follows: columns 1, 3, 4, and 6 redundantly give sequence indices, column 2 gives the sequence, and column 5 of line i gives j if i and j form a base pair, and zero otherwise. The heading of the file contains the size of the sequence and its name (found in the header of the FASTA sequence).
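A minimal reader for such a file could look like the sketch below. It relies only on the columns described above (1, 2 and 5). The function name parse_ct is hypothetical, and the code assumes that columns are separated by whitespace and that the heading line gives the size first and the name after it, which this page does not state explicitly.

```python
def parse_ct(text: str):
    """Return (name, sequence, pairs) from CT text (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split(maxsplit=1)     # assumption: "<size> <name>"
    size = int(header[0])
    name = header[1].strip() if len(header) > 1 else ""
    sequence = []
    pairs = []
    for ln in lines[1:size + 1]:
        cols = ln.split()
        i, base, j = int(cols[0]), cols[1], int(cols[4])
        sequence.append(base)
        if j > i:                   # record each pair once; j == 0 means unpaired
            pairs.append((i, j))
    return name, "".join(sequence), sorted(pairs)
```

Each base pair appears twice in the file, once on line i and once on line j, so the reader keeps only the occurrence where j is greater than i.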

bracket-dot format - it consists of three lines. The first line contains a FASTA-like header. The second line contains the nucleic sequence. The last line contains the set of associated pairings encoded by brackets and dots. A base pair between bases i and j is represented by a ( at position i and a ) at position j. Unpaired bases are represented by dots. The lack of pseudoknots in the secondary structure ensures that this notation defines a unique folding.

    >trna E. coli
    ggggcuauagcucagcugggagagcgccugcuuugcacgcaggaggucugcgguucgaucccgcauagcuccacca 
    (((((((..((((........)))).(((((.......))))).....(((((........))))))))))))...
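The pairing line can be decoded with a stack, as in the following sketch. The function name parse_bracket_dot is hypothetical. Positions are numbered from 1, and the code assumes a structure without pseudoknots, as stated above.

```python
def parse_bracket_dot(structure: str):
    """Return the sorted list of base pairs (i, j), 1-based (hypothetical helper)."""
    stack = []   # positions of "(" that are still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))   # pair with the most recent "("
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the short structure ((..)) the function returns [(1, 6), (2, 5)]. Because each closing bracket pairs with the most recent unmatched opening bracket, a pseudoknot-free structure has exactly one reading.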

List of constraints - this file describes the structure as a set of initial constraints for external programs, such as Kinefold or Mfold. Each line corresponds to one helix: F i j k forces the formation of base pairs i.j, i+1.j-1, ... , i+k-1.j-k+1.
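The helix notation can be illustrated with the sketch below, which expands one F line into its base pairs and, conversely, groups a list of base pairs into F lines. Both function names are hypothetical, and the code assumes that the four fields of a line are separated by whitespace. It only mirrors the definition given above and is not taken from Kinefold or Mfold.

```python
def expand_constraint(line: str):
    """Expand 'F i j k' into the pairs (i, j), (i+1, j-1), ..., (i+k-1, j-k+1)."""
    tag, i, j, k = line.split()
    if tag != "F":
        raise ValueError(f"unexpected constraint type: {tag!r}")
    i, j, k = int(i), int(j), int(k)
    return [(i + n, j - n) for n in range(k)]


def pairs_to_constraints(pairs):
    """Group base pairs into maximal stacked helices, one F line per helix."""
    remaining = set(pairs)
    lines = []
    for i, j in sorted(pairs):          # outermost pair of each helix comes first
        if (i, j) not in remaining:
            continue                    # already absorbed into an earlier helix
        k = 0
        while (i + k, j - k) in remaining:
            remaining.remove((i + k, j - k))
            k += 1
        lines.append(f"F {i} {j} {k}")
    return lines
```

For example, expand_constraint("F 1 10 3") yields the pairs (1, 10), (2, 9) and (3, 8), and passing those three pairs to pairs_to_constraints gives back the single line F 1 10 3.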

Retrieve result with an ID