Best hits of 11110110111 : model-free selection and parameter-free sensitivity calculation of spaced seeds

This page contains experimental datasets, results and associated scripts related to the paper Best hits of 11110110111 : model-free selection and parameter-free sensitivity calculation of spaced seeds and its accompanying set of plots (pdf [dec 12, 2016] or html [dec 12, 2016]).

This content is related to the section Experiments of the paper.

The C++ code used to compute polynomials is integrated into the Iedera 1.06 α9 tool available at http://bioinfo.lifl.fr/yass/iedera.php

6. Experiments

Dominance Pipeline

I.
 dominance_iedera.sh
   +
 iedera v1.06 α9
 →  polynomial files :
   _n[1-4]_w[3-16]_s[6-32]_l[6-64].ied
set of dominant seeds, data in txt format (tar.gz [dec 12, 2016])
II.
 pareto-only_{polynomial|discrete}.py
    +
 ({maxima|maple})
 →  seed selection files :
   _n[1-4]_w[3-16]_s[6-32]_l[6-64]
   _{bernoulli|(reverse)integral|dirac|(reverse)heaviside}
   _python(_{maxima|maple}).txt

set of best seeds, data in txt format (tar.gz [dec 12, 2016])
III.
 plot.py
   +
 (spaced_seed_list.py)
 →  plot files :
   _n[1-4]_w[3-16]_s[6-32]
   _{bernoulli|(reverse)integral|dirac|(reverse)heaviside}.pdf

best seeds pareto plots (pdf [dec 12, 2016] or html [dec 12, 2016])
  1. Spaced seeds of weight w in [3..16] and span s ≤ 2 × w, with a number of seed patterns n in [1..4], have been either fully enumerated (when n = 1 and w ≤ 14) or locally optimized by the hill-climbing process of iedera. Alignment lengths l in [2 × w .. 64] have also been fully enumerated, and (possibly locally) dominant seeds have been selected for each experiment into several *.ied files (e.g. _n3_w8_s16_l27.ied, for 3 seeds of weight 8 and span at most 16, on alignments of length 27), together with the coefficients of their polynomial representation.
  2. Dominant seeds have then been processed according to one of the 6 criteria described in the paper (Bernoulli, ∫₀ˣ Hit Integration and ∫ₓ¹ Hit Integration, Dirac, Σ₀ˣ Heaviside and Σₓ¹ Heaviside). The pareto envelopes have been selected with a python script, with the help of a CAS (maple or maxima) for continuous models, into several *.txt files (e.g. _n3_w8_s16_l27_bernoulli_maxima.txt, for a Bernoulli model evaluated with maxima).
  3. For a given number of seeds n of weight w and a given criterion, the set of optimal seeds (pareto set) has been plotted for all the alignment lengths l considered.
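
The first two steps can be illustrated on a toy scale with the Python sketch below. It is not the code used by iedera or by the pareto-only scripts, and it rests on assumptions that should be checked against the paper: a single seed is summarized by counts[k], the number of 0/1 alignments of length l with k matches that it hits; a seed dominates another when its counts are coefficient-wise at least as large; and the Bernoulli sensitivity is Σ counts[k] · p^k · (1 − p)^(l − k). All function names are hypothetical. The brute-force enumeration replaces iedera's polynomial computation, and the grid over p is a numerical stand-in for the exact CAS step.

```python
from itertools import product


def hit_counts(seed: str, length: int) -> list:
    """counts[k] = number of 0/1 alignments of `length` with k matches hit by `seed`.

    Brute force over all 2**length alignments, so only usable for small lengths.
    """
    must_match = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    counts = [0] * (length + 1)
    for aln in product((0, 1), repeat=length):
        hit = any(
            all(aln[pos + i] for i in must_match)
            for pos in range(length - span + 1)
        )
        if hit:
            counts[sum(aln)] += 1
    return counts


def dominates(a, b) -> bool:
    """Assumed dominance: a >= b on every coefficient, and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominant_seeds(seeds, length):
    """Keep the seeds that no other seed in the list strictly dominates."""
    counts = {s: hit_counts(s, length) for s in seeds}
    return [
        s for s in seeds
        if not any(dominates(counts[t], counts[s]) for t in seeds if t != s)
    ]


def bernoulli_sensitivity(counts, p: float) -> float:
    """Hit probability when each position matches independently with probability p."""
    length = len(counts) - 1
    return sum(c * p**k * (1 - p) ** (length - k) for k, c in enumerate(counts))


def best_on_grid(seeds, length, steps=100):
    """Seeds that are best for at least one sampled p (approximation, not exact)."""
    counts = {s: hit_counts(s, length) for s in seeds}
    winners = set()
    for i in range(1, steps):
        p = i / steps
        winners.add(max(seeds, key=lambda s: bernoulli_sensitivity(counts[s], p)))
    return winners
```

For example, on alignments of length 4, `dominant_seeds(["111", "1101"], 4)` keeps only `"111"`: the spaced seed has a single possible position there, while the contiguous one has two. Two seeds with identical counts (such as a seed and its mirror image) are both kept, since neither strictly dominates the other.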

The final set of pareto plots is available (pdf [dec 12, 2016] or html [dec 12, 2016]), together with the underlying data in txt format (tar.gz [dec 12, 2016]).
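
When browsing the extracted archives, the file naming scheme from the pipeline can be decoded with a small helper such as the one below. This is a sketch under assumptions: the function name `parse_name` is hypothetical, the optional "reverse" prefix is assumed to be written directly before "integral" or "heaviside", and the `_python` part is treated as optional because the pipeline pattern includes it while the example name in step 2 does not.

```python
import re

# Pattern assumed from the naming scheme on this page; adjust it if the
# actual file names in the archives differ.
NAME_RE = re.compile(
    r"_n(?P<n>\d+)_w(?P<w>\d+)_s(?P<s>\d+)_l(?P<l>\d+)"
    r"(?:_(?P<criterion>bernoulli|(?:reverse)?integral|dirac|(?:reverse)?heaviside))?"
    r"(?:_python)?"
    r"(?:_(?P<cas>maxima|maple))?"
    r"\.(?P<ext>ied|txt)$"
)


def parse_name(filename: str):
    """Return the experiment parameters encoded in a file name, or None."""
    match = NAME_RE.search(filename)
    if match is None:
        return None
    info = match.groupdict()
    # n = number of seeds, w = weight, s = maximal span, l = alignment length
    for key in ("n", "w", "s", "l"):
        info[key] = int(info[key])
    return info
```

With these assumptions, `parse_name("_n3_w8_s16_l27.ied")` yields n = 3, w = 8, s = 16 and l = 27, with no criterion and no CAS recorded.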

Tools :

Data & Results :