Protea

This page is the user manual for the Protea web application.

Submission form

Your data set should include at least two distinct nucleotide sequences, all in FASTA format. A sequence in FASTA format consists of a single-line description followed by one or more lines of sequence data. The description line must begin with a greater-than symbol (">") in the first column. All lines should be shorter than 80 characters.

Both lower-case and upper-case letters are accepted. The full standard IUPAC nucleic acid code is not supported: only the symbols A, C, G, T and U are recognized. Digits (0 to 9), hyphens (-) and dots (.) are also accepted, but Protea simply ignores them.
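The input rules above can be checked locally before submitting a job. The following Python sketch is only an illustration of those rules, not Protea's own parser: the function names (parse_fasta, clean_sequence, check_input) are hypothetical, and rejecting any other symbol with an error is an assumption, since this manual does not say how Protea reacts to unsupported letters.

```python
import re

ALLOWED = set("ACGTU")             # the only letters the manual lists as recognized
IGNORED = re.compile(r"[0-9.\-]")  # digits, dots and hyphens are ignored


def parse_fasta(text):
    """Return a list of (description, raw_sequence) pairs from FASTA text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):  # '>' must be in the first column
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.strip())
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def clean_sequence(raw):
    """Drop ignored symbols, upper-case the rest, reject unsupported letters."""
    seq = IGNORED.sub("", raw).upper()
    bad = sorted(set(seq) - ALLOWED)
    if bad:
        # Assumption: unsupported symbols are treated as an input error.
        raise ValueError("unsupported symbols: " + "".join(bad))
    return seq


def check_input(text):
    """Validate a whole submission and return cleaned (description, sequence) pairs."""
    for line in text.splitlines():
        if len(line) >= 80:
            raise ValueError("lines should be shorter than 80 characters")
    records = [(d, clean_sequence(s)) for d, s in parse_fasta(text)]
    if len({s for _, s in records}) < 2:
        raise ValueError("at least two distinct sequences are required")
    return records
```

For example, check_input(">s1\nACGU-acgt\n>s2\nAAAA\n") returns the two records with the hyphen removed and all letters upper-cased, while a sequence containing N raises a ValueError.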

Use both strands: By default, Protea tries to translate the sequences on both the direct strand and the reverse strand.
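To make the "reverse strand" concrete, the sketch below computes the reverse complement of a sequence in Python. It is a generic illustration and not taken from Protea: the function names and the both parameter are hypothetical, and converting U to T before complementing is an assumption made to keep the example short.

```python
# Complement table for DNA letters; U is normalized to T beforehand (assumption).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    dna = seq.upper().replace("U", "T")
    return dna.translate(_COMPLEMENT)[::-1]


def strands(seq, both=True):
    """Return the strands to examine: direct only, or direct plus reverse."""
    direct = seq.upper().replace("U", "T")
    return [direct, reverse_complement(direct)] if both else [direct]


print(strands("ATGC"))  # ['ATGC', 'GCAT']
```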

Produce a multiple alignment: When the sequences are classified as protein-coding, Protea builds a multiple sequence alignment of the predicted amino acid sequences, and then back-translates this alignment to nucleotide sequences. Three programs are available for constructing the multiple sequence alignment: ClustalW, Dialign2 and T-Coffee.
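Back-translation can be pictured as follows: each residue of an aligned amino acid sequence is replaced by the codon it was translated from, and each gap becomes a gap of three nucleotides. The Python sketch below shows this general idea for one sequence. It is not Protea's implementation; the function name back_translate and the "-" gap symbol are assumptions, and the coding sequence is assumed to start in frame and to contain exactly one codon per residue.

```python
def back_translate(aligned_protein, coding_seq, gap="-"):
    """Map one aligned amino acid sequence back onto its coding nucleotides.

    aligned_protein: a row of the protein alignment, gaps included.
    coding_seq: the ungapped, in-frame nucleotide sequence it was translated from.
    """
    residues = sum(1 for c in aligned_protein if c != gap)
    if len(coding_seq) != 3 * residues:
        raise ValueError("coding sequence must contain one codon per residue")
    # Split the nucleotides into codons and consume them in order.
    codons = iter(coding_seq[i:i + 3] for i in range(0, len(coding_seq), 3))
    return "".join(gap * 3 if c == gap else next(codons) for c in aligned_protein)


print(back_translate("M-K", "ATGAAA"))  # ATG---AAA
```

Applying the same function to every row of the protein alignment yields nucleotide rows of equal length, in which alignment columns correspond to whole codons.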

Results page

Prediction: The value is either CODING or OTHER. The P-value provides additional evidence for the prediction.

When the sequences are classified as CODING, the web interface displays a multiple sequence alignment of the predicted amino acid sequences. It also shows the RNA alignment obtained by back-translation. Each color corresponds to a codon.

Retrieve result with an ID

Each job is assigned an identifier that can be used to retrieve its results. Files are stored for one week after job submission.

2013 - Bonsai bioinformatics