Bonsai :: Bioinformatics Software Server

This page is the user manual for the regliss web application.

Submission form

Enter the RNA sequence. The sequence should be in FASTA format. A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (">") symbol in the first column.

Example of FASTA format:

> Name of the sequence 
ctgcgagcgcgcgatgatagcgcggcgagcatgtagcatgctagctgtcgcgagcact
cggccgagatcaggcgatgcatgcgcagggagcagcgagcgacgagcacagcatgcta
gctagatgcatgctgtaggcagcgccgagagacgatggagctgc

Lower-case and upper-case letters are both accepted. The full IUPAC nucleic acid code is not supported: only the symbols A, C, G, T and U are recognized. Digits (0 to 9), hyphens (-) and dots (.) are not accepted.
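
The input rules above can be checked before submitting a sequence. The following Python sketch is only an illustration of those rules, not the code used by regliss; the function name `read_fasta` and the constant `ALLOWED` are hypothetical.

```python
ALLOWED = set("ACGTU")  # only these symbols are recognized, in either case


def read_fasta(text):
    """Return (description, sequence) for a single FASTA record.

    Hypothetical helper: raises ValueError when the record breaks
    one of the rules stated in this section.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("the description line must start with '>'")
    description = lines[0][1:].strip()
    # Sequence data may span several lines; case does not matter.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence data after the description line")
    bad = sorted(set(sequence) - ALLOWED)
    if bad:
        raise ValueError("unsupported symbols: " + " ".join(bad))
    return description, sequence
```

For instance, `read_fasta("> demo\nacgu\nACGT\n")` returns `("demo", "ACGUACGT")`, while a sequence containing `N`, a digit, `-` or `.` raises `ValueError`.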

Choose your set of helices. regliss takes as input a set of putative helices. A helix is a set of nested base pairs. It can contain bulges or internal loops. The set of helices can be specified explicitly by the user, or computed automatically from the input sequence.

Paste your own set of helices. In this case, helices are given by the user. The description format is as follows. Each line contains a single helix, described in bracket-dot format. Positions in the 5' part of a base pair are marked with an opening bracket, and positions in the 3' part are marked with a closing bracket, so the helix contains as many base pairs as there are pairs of matching brackets. Unpaired bases, such as those in internal loops or bulges, are indicated with a dot. The location of the helix on the sequence is given by two numbers at the beginning of the line: the first is the position of the first base of the helix on the sequence, counted from 5' to 3' starting at position 1, and the second is the position of the base paired with it.

For example, the following line specifies a helix that starts at position 2, ends at position 25, and contains 5 base pairs and one bulge at the third position.
```
        2 25	((.(((		)))))
```
This corresponds to the following helix, shown below the sequence it applies to.
```
  ctgcgagcgcgcgatgatagctccaggcgagcatgtagcatgctagctgtcgcgagcact
   ((.(((             ))))) 
```
The set of helices can contain any number of helices. Helices can be embedded or overlapping.
Example of embedded helices: the last three helices are embedded in the first one.
```
     3 67  ((((((( )))..))))
     4 66  (((((( )))..)))
     3 67  (((( ))))
     7 61  ((( )))
```
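
To make the format concrete, the Python sketch below turns one helix line into its list of base pairs. It is an illustration only. It assumes that the line holds exactly four whitespace-separated fields (two positions, the 5' part, the 3' part), that the 5' part begins at the first position, and that the 3' part ends at the second position. The function name `parse_helix` is hypothetical.

```python
def parse_helix(line):
    """Parse 'start end 5'-part 3'-part' into a list of (i, j) base pairs."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected: start end 5'-part 3'-part")
    start, end = int(fields[0]), int(fields[1])
    five, three = fields[2], fields[3]
    if set(five) - set("(.") or set(three) - set(")."):
        raise ValueError("unexpected symbol in helix description")
    # Assumption: the 5' part starts at 'start', the 3' part ends at 'end'.
    opening = [start + k for k, ch in enumerate(five) if ch == "("]
    first_3p = end - len(three) + 1
    closing = [first_3p + k for k, ch in enumerate(three) if ch == ")"]
    if len(opening) != len(closing):
        raise ValueError("unbalanced brackets")
    # The outermost opening bracket pairs with the outermost closing bracket.
    return list(zip(opening, reversed(closing)))
```

Under these assumptions, the line `2 25 ((.((( )))))` yields the pairs (2,25), (3,24), (5,23), (6,22), (7,21), and the line `7 61 ((( )))` yields (7,61), (8,60), (9,59), which are also the three innermost pairs of `3 67 ((((((( )))..))))`.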

Upload a file. You can upload a text file containing a set of helices. The description format is the same as above.

Compute helices with Unafold (Mfold). With this option, helices are built automatically for the input sequence using the Unafold software (ref). Unafold computes all suboptimal secondary structures of the input sequence, with percent suboptimality set to 100%. The helices are then obtained in two steps. First, all non-redundant suboptimal structures are selected from Unafold's output: a structure is considered redundant if it contains another suboptimal structure output by Unafold. Second, all putative helices are extracted from this set of structures. Within a given structure, a helix is a maximal set of nested base pairs such that every other base pair of the structure is either nested in all base pairs of the helix, or juxtaposed with all base pairs of the helix, or encloses all base pairs of the helix.
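
The helix definition above can be read operationally: two directly nested base pairs belong to the same helix exactly when the outer pair encloses no other base pair besides the inner one and what it contains, so that only unpaired bases (a bulge or an internal loop) separate them. The Python sketch below follows this reading for a structure given in bracket-dot notation. It is an illustration, not the regliss implementation, and the name `extract_helices` is hypothetical.

```python
def extract_helices(structure):
    """Split a bracket-dot structure into helices (lists of base pairs)."""
    stack = []      # opening positions still waiting for their partner
    partner = {}    # opening position -> closing position
    parent = {}     # opening position -> opening position of enclosing pair
    children = {}   # opening position -> number of directly nested pairs
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            parent[pos] = stack[-1] if stack else None
            if stack:
                children[stack[-1]] += 1
            children[pos] = 0
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            partner[stack.pop()] = pos
        elif ch != ".":
            raise ValueError("unexpected symbol: " + ch)
    if stack:
        raise ValueError("unbalanced opening bracket")

    helices = []
    helix_of = {}   # opening position -> index of its helix
    for i in sorted(partner):           # parents are visited before children
        p = parent[i]
        if p is not None and children[p] == 1:
            helix_of[i] = helix_of[p]   # no branching: same helix as parent
        else:
            helix_of[i] = len(helices)  # top level or branching: new helix
            helices.append([])
        helices[helix_of[i]].append((i, partner[i]))
    return helices
```

For example, `extract_helices("((.(...)))")` returns a single helix with a bulge, `[[(1, 10), (2, 9), (4, 8)]]`, whereas `extract_helices("((...)(...))")` returns three helices, because the outer pair encloses two juxtaposed pairs.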

Compute helices with a nearest neighbor model. Helices are computed with a custom program that finds all helices without bulges or internal loops under the nearest neighbor energy model.

Several of these options can be selected at once to build the set of helices. In this case, the different sets of helices are merged, keeping only one copy of each helix.
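
As a small illustration of the merge, the Python sketch below combines several lists of helix lines and drops duplicates. It assumes that two helices are the same when their positions and bracket strings are identical, regardless of spacing; how regliss actually compares helices is not specified here. The name `merge_helix_sets` is hypothetical.

```python
def merge_helix_sets(*helix_sets):
    """Merge lists of helix lines, keeping one copy of each helix."""
    merged = {}
    for helix_set in helix_sets:
        for line in helix_set:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # Assumption: identical fields mean identical helices.
            merged.setdefault(tuple(fields), " ".join(fields))
    return list(merged.values())
```

Helices keep the order in which they are first seen, so pasted helices would stay ahead of computed ones if they are passed first.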

Paste the helices that must be in all structures. It is possible to force some base pairs to occur in all structures computed by regliss. This field is optional. By default, regliss outputs all locally optimal secondary structures.

Select the output size. Regliss computes all locally optimal secondary structures of the input sequence. This usually includes a very large number of structures, which are sorted according to the free energy value given by the RNAeval program from the Vienna RNA package. It is possible to limit the amount of output data either by giving a percent suboptimality (from 1 to 100) or by giving the maximal number of structures to return. By default, regliss outputs all structures within a 20% suboptimality range.
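
The two ways of limiting the output can be sketched as follows in Python. The threshold formula is an assumption: a structure is kept when its free energy is at most E_min + |E_min| × p / 100, where E_min is the lowest free energy found and p is the percent suboptimality. The exact rule applied by regliss may differ. The name `limit_output` is hypothetical.

```python
def limit_output(structures, percent=None, max_count=None):
    """Filter (structure, energy) tuples by suboptimality or by count."""
    ranked = sorted(structures, key=lambda item: item[1])  # lowest energy first
    if percent is not None and ranked:
        best = ranked[0][1]
        # Assumed definition of the suboptimality window.
        threshold = best + abs(best) * percent / 100.0
        ranked = [item for item in ranked if item[1] <= threshold]
    if max_count is not None:
        ranked = ranked[:max_count]
    return ranked
```

With a minimum free energy of -30.0 and the default 20% range, this assumed threshold is -24.0, so a structure at -24.6 would be kept and one at -23.23 would be dropped.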

Results

All structures found by regliss are displayed in a table, with one color for each helix. Each line contains a structure in bracket-dot format, followed by its free energy as computed by RNAeval. Structures are sorted by increasing free energy.

               GGUCCCGUAGCUCAGUUGGUUAGAGCGUUGGUCUUAUGAGCCGAAGGUCGCGGGUUCGAGCCCCGCCGGGACCA
 structure1    (((((((..((((.........)))).(((((.......))))).....(((((.......)))))))))))).  (-30.0)
 structure2    (((((((..((((.........)))).(((((.......))))).......((((....))))...))))))).  (-27.7)
 structure3    (((((((..(((((......................)))))........(((((.......)))))))))))).  (-27.15)
 structure4    (((((((..((((.........)))).............(((...))).(((((.......)))))))))))).  (-26.8)
 structure5    (((((((..((((.........))))..........((((((..........))))))........))))))).  (-25.7)
 structure6    (((((((........((((((.................)))))).....(((((.......)))))))))))).  (-25.53)
 structure7    (((((((..(((((......................)))))..........((((....))))...))))))).  (-24.85)
 structure8    (((((((..((((.........)))).............(((...)))...((((....))))...))))))).  (-24.6)
 structure9    (((((((........((((((.................)))))).......((((....))))...))))))).  (-23.23)
 structure10   (((((((..........(((((((((.(((((.......))))).........)))))))))....))))))).  (-22.34)

Output files. For each structure, the result is stored in four formats: CT, JPEG, PS, and bracket notation. The JPEG and PS files contain a 2D drawing of the structure; both are produced automatically from the CT file using NAview.

CT - Connectivity Table format: This is a text file that contains the nucleic acid sequence and the base pairing information, as produced by Mfold.

bracket-dot notation: This is a FASTA-like format that contains both the primary and the secondary structure. The first line contains the heading. The second line contains the sequence. The third line contains the set of base pairs encoded by brackets and dots. A base pair between i and j is represented by a ( at position i and a ) at position j. Unpaired bases are represented by dots.
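
A downloaded bracket-dot file can be read back into a list of base pairs with a few lines of Python. The sketch below assumes the three-line layout described above and that the heading may start with ">" as in FASTA; the function name `parse_bracket_record` is hypothetical.

```python
def parse_bracket_record(text):
    """Return (heading, sequence, pairs) from a three-line bracket-dot record."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError("expected heading, sequence and structure lines")
    heading, sequence, structure = lines
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    stack, pairs = [], []
    for j, ch in enumerate(structure, start=1):   # positions start at 1
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            pairs.append((stack.pop(), j))        # ( at i matches ) at j
        elif ch != ".":
            raise ValueError("unexpected symbol: " + ch)
    if stack:
        raise ValueError("unbalanced opening bracket")
    return heading.lstrip(">").strip(), sequence, sorted(pairs)
```

For the record `> demo`, `GGGAAAUCCC`, `(((....)))`, the function returns the pairs (1,10), (2,9) and (3,8).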

Set of input helices. This section summarizes the set of input helices that have been used to construct all locally optimal secondary structures.

It is also possible to download a zip archive storing all result files (ct, ps, jpeg, bracket notation and helix file).

Energy landscape graph

The energy landscape graph gives an insight into the full energy landscape of the RNA sequence. In this graph, each vertex represents a locally optimal structure. Two structures are in the same neighborhood if they differ by at most two stems. In this case, an edge connects the two structures.
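
The neighborhood rule can be sketched in Python as follows. The sketch assumes that each structure is described by the set of helices (stems) it contains, and it interprets "differ by at most two stems" as a symmetric difference of at most two helices between the two sets. This interpretation and the name `landscape_edges` are assumptions made for illustration.

```python
from itertools import combinations


def landscape_edges(structures):
    """structures: dict mapping a structure name to its set of helix ids."""
    edges = []
    for (a, helices_a), (b, helices_b) in combinations(structures.items(), 2):
        # Assumed distance: helices present in exactly one of the two structures.
        if len(helices_a ^ helices_b) <= 2:
            edges.append((a, b))
    return edges
```

For example, two structures with helix sets {H1, H2, H3} and {H1, H2, H4} differ by two helices and would be connected, whereas {H1, H2, H3} and {H1, H5, H6, H7} would not.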

You can explore the graph and visualize structures by clicking on the vertices. You can also zoom in and out with the + and - buttons. When you click on a vertex, the 2D representation of the structure appears in a new window, and the bracket notation of the structure is displayed above the graph. The structure window can be dragged, moved, and resized.

The graph viewer is based on the SVG format. Therefore, to visualize the graph, you need to use a web browser which is compatible with SVG.

Retrieve result with an ID

Each job is assigned an identifier that can be used to retrieve the folding results. Files are stored for 24 hours after job submission.