procars.step_modules subpackage

The step_modules subpackage contains modules for the different steps of ProCARs, which are:

Modules:

compute_initialpqtree module

Copyright © Bonsai - LIFL (Université Lille 1, CNRS UMR 8022) and Inria-Lille Nord Europe

contact: aida.ouangraoua@inria.fr, amandine.perrin@inria.fr

This software is a computer program whose purpose is to progressively reconstruct ancestral gene orders.

This software is governed by the CeCILL-B license under French law and abiding by the rules of distribution of free software. You can use, modify and/or redistribute the software under the terms of the CeCILL-B license as circulated by CEA, CNRS and Inria at the following URL http://www.cecill.info, or in the LICENCE file at the root directory of this program.

The fact that you are presently reading this means that you have had knowledge of the CeCILL-B license and that you accept its terms.


compute_initialpqtree module description:

This module reads the source files (tree file and blocks file), stores information, and fills a CAR file with an initial set of CARs, each composed of a single block.

It needs, as input :

  • a species tree where a node is marked as the ancestor: @
  • a set of universal synteny blocks
  • a CAR file name (in which initial PQtree will be saved)

Module author: Aïda Ouangraoua, Amandine PERRIN

February 2014

procars.step_modules.compute_initialpqtree.commons_read_blocks(block_file_name, nb_species, spe_ids, binary_file_name)[source]

Function reading blocks from a block file and computing block neighbors. Saves all information about blocks into a binary file, used in the next steps

Parameters:

block_file_name : String

Name of file containing the ordered blocks for all species

nb_species : int

Total number of species

spe_ids : dict

Species and their corresponding ID {spe1: id1, spe2: id2, ...}

binary_file_name : String

Name of the file in which saving all information about the blocks

Returns:

int

number of blocks

procars.step_modules.compute_initialpqtree.commons_read_tree(tree_file, binary_file_name)[source]

Reading, and computing parameters and partitions from a tree file, and saving values computed in a binary file.

Parameters:

tree_file_name : String

Name of file containing species tree

binary_file_name : String

Name of file where results must be stored

Returns:

list

nb_species: int, total number of species

spe_ids: dict, {species1: id1, species2: id2, ...}

procars.step_modules.compute_initialpqtree.compute_blocks_neighbors(nb_blocks, nb_species, blocks)[source]

Function computing block neighbors

Parameters:

nb_blocks : int

Total number of blocks

nb_species : int

Total number of species

blocks : list

List of lists

blocks = [[list of blocks for spe_id=0], [list of blocks for spe_id=1], ...]

with list of blocks for spe_id=1 = [[list of blocks for spe_id=1, chromosome 1], [list of blocks for spe_id=1, chromosome 2], ...]

with list of blocks for spe_id=1, chromosome 1 = [[start, end, num_bloc, orient], [start, end, num_bloc, orient], ... for the n blocks]

Returns:

tuple

left : array such that left[spe_id][block_id] is an array containing the left neighbor of block block_id in spe_id

right : array such that right[spe_id][block_id] is an array containing the right neighbor of block_id in spe_id

procars.step_modules.compute_initialpqtree.compute_parameters_from_tree(tree_file_name)[source]

Function computing parameters from a tree file

Parameters:

tree_file_name : String

Name of file containing the species tree to read to extract parameters

Returns:

tuple

tree: dict, tree structure

ancestor_id: int, id of ancestor node in tree

path: list, array of 0 and 1 indicating nodes that are ancestor of ancestor node

leaves: list, array of integers containing a list of leave ids

internals: list, integer array containing an ordered list of internal nodes ids in a postfix order with ancestor node as root

procars.step_modules.compute_initialpqtree.compute_partition_and_species_id(tree, ancestor_id)[source]

Function calling the method partitionning the tree into 3 groups defined by the ancestor, and determining the 3 groups of species IDs, as well as the list of species with their corresponding ID.

Parameters:

tree : dict

Tree structure :

{node_id: [node_name, parent_id, left_child_id, right_child_id], ...}

Where id = -1 if no parent/left_child/right_child And node_name = “N” + node_id for internal nodes, or leaf_name for leaf nodes

ancestor_id : int

ID of ancestor node in tree

Returns:

tuple

species1_id: integer array, ids of group1 species

species2_id: integer array, ids of group2 species

species3_id: integer array, ids of group3 species

spe_ids: map such that spe_ids[spe_name] = spe_id

procars.step_modules.compute_initialpqtree.list_leaves_internals(tree, root_id, path)[source]

Function computing a list of leaves and a postfix ordered list of internal nodes for a tree

Recursive algorithm visiting nodes from ancestor node to leaves, in a prefix order.

Parameters:

tree : dict

Tree structure :

{node_id: [node_name, parent_id, left_child_id, right_child_id], ...}

where id = -1 if no parent/left_child/right_child and node_name = “N” + node_id for internal nodes, or leaf_name for leaf nodes

root_id : int

The node considered as the new root of the tree

path : list

0-1 array such that path[node_id] = 1 if node is on the path between ancestor node and root, and otherwise path[node_id]=0

Returns:

list

leaves_internals: list containing two lists: a list of leaves ids and an ordered list of internal node ids

procars.step_modules.compute_initialpqtree.main(tree_file_name, block_file_name, car_file_name, tree_bin, blocks_bin, adj_file)[source]

Main method.

Reads tree file and block file, stores information into binary files, and creates a CAR file with an initial set of CARs, each composed of a single block. Also creates an empty adjacency file (no adjacency added at step 0).

Parameters:

tree_file_name : string

Name of the species tree to read

block_file_name : string

Name of the blocks file to read

car_file_name : string

Name of PQtree file in which saving the initial set of CARs

tree_bin : string

Name of binary file in which saving all information found in tree file

blocks_bin : string

Name of binary file in which saving all information found in blocks file

adj_file : string

Name of adjacency file in which saving adjacencies added at step0 (= no adjacency -> empty file)

Returns:

int

total number of blocks

procars.step_modules.compute_initialpqtree.mark_ancestors(tree, ancestor_id)[source]

Method marking tree nodes which are on the path between the ancestor node and the root node

Parameters:

tree : dict

Tree structure :

{node_id: [node_name, parent_id, left_child_id, right_child_id], ...}

where id = -1 if no parent/left_child/right_child and node_name = “N” + node_id for internal nodes, or leaf_name for leaf nodes

ancestor_id : int

ID of ancestor node in tree

Returns:

list

0-1 array such that path[node_id] = 1 if node is on the path between ancestor node and root, and otherwise path[node_id] = 0

procars.step_modules.compute_initialpqtree.partition_species(species1, species2, species3, tree, ancestor_id, cur_id, partition_id)[source]

Function computing the partition of species defined by the ancestor node

Recursive algorithm

Parameters:

species1 : list

Species names in group 1 array to be filled

species2 : list

Species names in group 2 array to be filled

species3 : list

Species names in group 3 array to be filled

tree : dict

Tree structure :

{node_id: [node_name, parent_id, left_child_id, right_child_id], ...}

Where id = -1 if no parent/left_child/right_child And node_name = “N” + node_id for internal nodes, or leaf_name for leaf nodes

ancestor_id : int

ID of ancestor node in tree

cur_id : int

ID of current node during the tree transversal

partition_id : int

ID of current group in partition

Returns:

list

species1: String array, names of group1 species

species2: String array, names of group2 species

species3: String array, names of group3 species

procars.step_modules.compute_initialpqtree.write_init_cars_file(car_file_name, nb_blocks)[source]

Creates a CAR file, with an initial set of CARs, each composed of a single block

Parameters:

car_file_name : String

File in which saving the CARs

nb_blocks : int

Total number of blocks = number of CARs in the output file

compute_conserved_adjacencies module

Copyright © Bonsai - LIFL (Université Lille 1, CNRS UMR 8022) and Inria-Lille Nord Europe

contact: aida.ouangraoua@inria.fr, amandine.perrin@inria.fr

This software is a computer program whose purpose is to progressively reconstruct ancestral gene orders.

This software is governed by the CeCILL-B license under French law and abiding by the rules of distribution of free software. You can use, modify and/or redistribute the software under the terms of the CeCILL-B license as circulated by CEA, CNRS and Inria at the following URL http://www.cecill.info, or in the LICENCE file at the root directory of this program.

The fact that you are presently reading this means that you have had knowledge of the CeCILL-B license and that you accept its terms.


compute_conserved_adjacencies module description:

From a CAR file, this module computes a set of conserved adjacencies between cars for the ancestor node, and it splits this set into :

  • a set of fully conserved adjacencies and a set of partly conserved adjacencies first
  • among the two sets, a set of non-conflicting adjacencies and a set of conflicting adjacencies

Module author: Aïda Ouangraoua, Amandine PERRIN

February 2014

procars.step_modules.compute_conserved_adjacencies.check_potential_cars_neighbors(spe_id, neighbors, segments_neighbors, cars, block_position_in_car, left, right)[source]

For each pair of potential neighboring CARs, check if the pair of segments really supports the adjacency

Parameters:

spe_id : int

ID of species to check

neighbors : list

List of potential neighbors (with CAR ids). Ex: [[], [1, 2], [2, 3], ...]

segments_neighbors : list

List of potential segment neighbors. Ex. [[], [[1], [2]], [[2], [3]], ...]

cars : list

List of ordered cars

block_position_in_car : list

Integer array such that block_position_in_car[block_id] = position of block in car to which it belongs

left : list

Left neighbors of species for all blocks

right : list

Left neighbors of species for all blocks

Returns:

tuple

left and right valid neighbors

procars.step_modules.compute_conserved_adjacencies.commons_read_cars(nb_blocks, blocks, nb_species, car_file_name, step_nb, binary_file_name)[source]

Function reading blocks from a block file, reading car file, and computing block neighbors

Parameters:

nb_blocks : int

Number of blocks

blocks : list

array of arrays (blocks for each species)

blocks = [[list of blocks for spe_id=0], [list of blocks for spe_id=1], ...]

with list of blocks for spe_id=1 = [[list of blocks for spe_id=1, chromosome 1], [list of blocks for spe_id=1, chromosome 2], ...]

with list of blocks for spe_id=1, chromosome 1 = [[start, end, num_bloc, orient], [start, end, num_bloc, orient], ... for the n blocks]

nb_species : int

Total number of species

car_file_name : String

Name of the PQtree file where current CARs are stored

step_nb : int

Current step of the ProCars method

binary_file_name : String

Name of the binary file where information about current cars must be saved

Returns:

tuple

cars: array of arrays (cars)

nb_cars: integer, total number of cars

block_position_in_car: integer array

blocks_car: array of arrays (blocks’ cars): see return of compute_blocks_cars

procars.step_modules.compute_conserved_adjacencies.compute_adjacencymatrix_from_neighbor_arrays(left, right, nb_blocks)[source]

Function computing an adjacency matrix from left and right neighbors

Parameters:

left : list

Array such that left[block_id] is an array containing all left neighbors of block block_id

right : list

Array such that right[block_id] is an array containing all right neighbors of block_id

nb_blocks : int

Total number of blocks

Returns:

SparseMatrix

adjacencymatrix: 0-1 Matrix which rows and columns correspond to block extremities with:
  • right extremity of bloc_id : extremity_id = block_id
  • left extremity of bloc_id : extremity_id = block_id + nb_blocks
procars.step_modules.compute_conserved_adjacencies.compute_blocks_cars(nb_species, blocks, block_to_car)[source]

Computes the list of blocks’ cars for each species

Parameters:

nb_species : int

Total number of species

blocks : list

array of arrays (blocks for each species)

blocks = [[list of blocks for spe_id=0], [list of blocks for spe_id=1], ...]

with list of blocks for spe_id=1 = [[list of blocks for spe_id=1, chromosome 1], [list of blocks for spe_id=1, chromosome 2], ...]

with list of blocks for spe_id=1, chromosome 1 = [[start, end, num_bloc, orient], [start, end, num_bloc, orient], ... for the n blocks]

block_to_car : list

Integer array such that block_2_car[block_id] = car_id to which block_id belongs

Returns:

list

array of arrays (blocks’ cars)

blocks_car = [[list of block_cars for spe_id=0], [list of block_cars for spe_id=1], ...]

with list of block_cars for spe_id=1 = [[list of blocks’ cars for spe_id=1, chromosome 1], [list of blocks’ cars for spe_id=1, chromosome 2], ...]

procars.step_modules.compute_conserved_adjacencies.compute_cars_neighbors(nb_cars, nb_species, blocks, cars, blocks_car, block_position_in_car, binary_file_name)[source]

Finds all possible neighbors for each CAR

Parameters:

nb_cars : int

Total number of CARs

nb_species : int

Total number of species

blocks : list

All blocks, for each species: array of arrays (blocks for each species)

blocks = [[list of blocks for spe_id=0], [list of blocks for spe_id=1], ...]

with list of blocks for spe_id=1 = [[list of blocks for spe_id=1, chromosome 1], [list of blocks for spe_id=1, chromosome 2], ...]

with list of blocks for spe_id=1, chromosome 1 = [[start, end, num_bloc, orient], [start, end, num_bloc, orient], ... for the n blocks]

cars : list

List of cars with signed blocks in each one

blocks_cars : list

blocks_car = [[list of block_cars for spe_id=0], [list of block_cars for spe_id=1], ...]

with list of block_cars for spe_id=1 = [[list of blocks’ cars for spe_id=1, chromosome 1], [list of blocks’ cars for spe_id=1, chromosome 2], ...]

block_position_in_car : list

Integer array such that block_position_in_car[block_id] = position of block in car to which it belongs

Returns:

list

left car neighbors and right car neighbors such that left[spe_id][car_id] = [list of left neighbors of car_id in species spe_id]

procars.step_modules.compute_conserved_adjacencies.compute_conserved_adjacencies(left, right, nb_blocks, species1_id, species2_id, species3_id)[source]

Procedure computing the conservation status of all potential adjacencies

Fills arrays for fully conserved adjacencies (conserved_left/right_D) and partly conserved adjacencies (conserved_left/right)

Parameters:

left : list

Array such that left[block_id] is an array containing all left neighbors of block block_id

right : list

Array such that right[block_id] is an array containing all right neighbors of block_id

nb_blocks : int

Total number of blocks

species1_id: list

IDs of group1 species

species2_id: list

IDs of group2 species

species3_id: list

IDs of group3 species

Returns:

list

conserved_left_D: array of left fully conserved adjacencies

conserved_right_D: array of right fully conserved adjacencies

conserved_left: array of left conserved adjacencies

conserved_right: array of right conserved adjacencies

procars.step_modules.compute_conserved_adjacencies.compute_extremity(segment, car_id, cars, block_position_in_car, orientation)[source]

Function computing the orientation of a CAR in a CAR adjacency.

Parameters:

segment : list

List of block IDs

car_id : int

Given CAR id

cars : list

List of all current cars

block_position_in_car : list

Integer array such that block_position_in_car[block_id] = position of block in car to which it belongs

orientation : int

1 if positive orientation, -1 otherwise

Returns:

int

extremity: oriented car, or 0 if segment is not an extremity of car

procars.step_modules.compute_conserved_adjacencies.compute_potential_cars_neighbors(spe_id, blocks, blocks_car)[source]

Compute potential pairs of neighboring cars

Parameters:

spe_id : int

ID of current species

blocks : list

Positions of blocks for each species (see commons_read_cars)

blocks_car : list

Blocks’ CARs for each species. See return of compute_blocks_cars

Returns:

tuple

neighbors: pair of cars having consecutive segments in species

segments_neighbors: composition of corresponding segments (in terms of signed blocks)

procars.step_modules.compute_conserved_adjacencies.diff_nc_matrix_m(matrix1, matrix2, nb_blocks)[source]

Function splitting an adjacency matrix matrix2 into two matrices : the non-conflicting and the conflicting adjacencies with an other matrix matrix1

Parameters:

matrix1 : SparseMatrix

0-1 Matrix, first adjacency matrix

matrix2 : SparseMatrix

0-1 Matrix, second adjacency matrix

nb_blocks : int

Total number of blocks

Returns:

tuple

nc_adjacencymatrix: SparseMatrix containing non-conflicting adj.

c_adjacencymatrix: SparseMatrix containing conflicting adj.

procars.step_modules.compute_conserved_adjacencies.fill_conserved_arrays(conserved_left_D, conserved_right_D, conserved_left, conserved_right, left, right, species1_id, species2_id, species3_id, block1, block2)[source]

Method filling the conservation status arrays for a given adjacency block1, block2

Parameters:

conserved_left_D : list

Array of left fully conserved adjacencies to be filled

conserved_right_D : list

Array of right fully conserved adjacencies to be filled

conserved_left : list

Array of left conserved adjacencies to be filled

conserved_right : list

Array of right conserved adjacencies to be filled

left : list

Array such that left[block_id] is an array containing all left neighbors of block block_id

right : list

Array such that right[block_id] is an array containing all right neighbors of block_id

species1_id: list

IDs of group1 species

species2_id: list

IDs of group2 species

species3_id: list

IDs of group3 species

block1: int

First block id of the given adjacency

block2: int

Second block id of the given adjacency

procars.step_modules.compute_conserved_adjacencies.find_adjacency_sets(nb_cars, nb_species, blocks, cars, blocks_car, block_position_in_car, car_neigh_bin, species1_id, species2_id, species3_id)[source]

Finds all sets of adjacencies, according to their status : (non)conflicting, fully/partly conserved, etc.

Parameters:

nb_cars : int

Total number of cars

nb_species : int

Total number of species

blocks : list

All blocks, for each species: array of arrays (blocks for each species)

blocks = [[list of blocks for spe_id=0], [list of blocks for spe_id=1], ...]

with list of blocks for spe_id=1 = [[list of blocks for spe_id=1, chromosome 1], [list of blocks for spe_id=1, chromosome 2], ...]

with list of blocks for spe_id=1, chromosome 1 = [[start, end, num_bloc, orient], [start, end, num_bloc, orient], ... for the n blocks]

cars : list

For each car id, ordered list of signed blocks in the car

blocks_car : list

[[list of block_cars for spe_id = 0], [list of block_cars for spe_id = 1], ...]

with list of block_cars for spe_id = 1 = [[list of blocks’ cars for spe_id=1, chromosome 1], [list of blocks’ cars for spe_id=1, chromosome 2], ...]

block_position_in_car : list

Integer array such that block_position_in_car[block_id] = position of block in car to which it belongs

car_neigh_bin : string

Name of the binary file where car neighbors information is stored

species1_id : list

IDs of group1 species

species2_id : list

IDs of group2 species

species3_id : list

IDs of group3 species

Returns:

tuple

car_left: left neighbors of each car, such that left[spe_id][car_id] = [list of left neighbors of car_id in species spe_id]

car_right: same as car_left, for right neighbors

fs_nc_adj: fully conserved non-conflicting adjacencies

ps_nc2_adj: partly conserved non-conflicting adjacencies

fs_c_adj: fully conserved conflicting adjacencies

ps_nc_c_adj: partly conserved adjacencies in conflict with fully conserved ones

ps_c_adj: cnflicting partly conserved adjacencies

procars.step_modules.compute_conserved_adjacencies.find_info_and_write_adjs(group1_adjs, group2_adjs, car_left, car_right, cars, step_nb, species1_id, species2_id, species3_id, file_handler)[source]

Find the conservation status of all retained adjacencies, and write them into the output file

Parameters:

group1_adjs : list

List of car_adjacencies fully conserved

group2_adjs : list

List of car_adjacencies partly conserved

car_left : list

For each car, left neighbor

car_right : list

For each car, right neighbor

cars : list

For each car id, ordered list of signed blocks in the car

step_nb : int

Current step of the ProCars method

species1_id : list

IDs of group1 species

species2_id : list

IDs of group2 species

species3_id : list

IDs of group3 species

file_handler : fileObject

Open file in which writing the given adjacencies and their information

procars.step_modules.compute_conserved_adjacencies.main(car_file_name, adjacency_file_name_prefix, step_nb, tree_bin, blocks_bin, car_bin, car_neigh_bin)[source]

Main method

From a CAR file, it computes a set of conserved adjacencies between CARs for the ancestor node, and it splits this set into a set of non-conflicting adjacencies and a set of conflicting adjacencies.

Parameters:

car_file_name : string

Prefix of the PQtree files

adjacency_file_name_prefix : string

Prefix of the Adjacencies files

step_nb : int

Current step of the ProCARs method

tree_bin : string

Name of the binary file where information about the species tree is stored

blocks_bin : string

Name of the binary file where information about blocks is stored

car_bin : string

Name of the binary file where information about current cars is stored

car_neigh_bin : string

Name of the binary file where information about car neighbors is stored

Returns:

tuple

nb_final_adj: total number of adjacencies added

nb_discarded: total number of discarded adjacencies

fs_nc_adj: fully conserved non-conflicting adjacencies

ps_nc2_adj: partly conserved non-conflicting adjacencies

fs_c_adj: conflicting fully conserved adjacencies

ps_nc_c_adj: partly conserved adjacencies in conflict with fully conserved ones

ps_c_adj: conflicting partly conserved adjacencies

procars.step_modules.compute_conserved_adjacencies.reverse(segment)[source]

Function producing the reverse of a segment

Parameters:

segment : list

Segment of CAR IDs

Returns:

list

segment_rev: reverse of given segment

Examples

For example:

reverse([1 -2 3]) = [-3 2 -1]
procars.step_modules.compute_conserved_adjacencies.segment_car_common_extremity(segment, car, block_position_in_car, side_segment, side_car)[source]

Function checking if a segment and a car share a same extremity region in terms of synteny.

Parameters:

segment : list

Segment of CAR IDs

car : list

Ordered list of signed blocks in the given CAR

block_position_in_car : list

Integer array such that block_position_in_car[block_id] = position of block in car to which it belongs

side_segment : int

1 if we look at right end of the segment, 0 otherwise

side_car : int

1 if we look at right end of the CAR, 0 otherwise

Returns:

boolean

True if it shares an extremity, False otherwise

Examples

  • segment_car_common_extremity([1 -2 4 -3], [1 2 3 4], 1, 1) = True, because they share a end region composed of blocks 3 and 4.
  • segment_car_common_extremity([1 -2 4 -3], [1 2 3 4], 0, 0) = True, because they share a begin region composed of blocks 1 and 2.
  • segment_car_common_extremity([1 -2 4 -3], [-3 4 2 -1], 0, 1) = True, because they do share a same region that is the beginning of segment and the end of car
procars.step_modules.compute_conserved_adjacencies.write_discarded_adjs(filename, group1_adjs, group2_adjs, car_left, car_right, species1_id, species2_id, species3_id, cars, step_nb)[source]

Write adjacencies found for ancestor into the output file

Parameters:

filename : String

Name of the output file in which writting all adjacencies

group1_adjs : list

List of car_adjacencies fully conserved but conflicting

group2_adjs : list

List of car_adjacencies partly conserved and in conflict with fully or partly conserved adjs

car_left : list

For each car, left neighbor

car_right : list

For each car, right neighbor

species1_id : list

List of IDs of species in group1 of partition defined by the ancestor node

species2_id : list

List of IDs of species in group2 of partition defined by the ancestor node

species3_id : list

List of IDs of species in group3 of partition defined by the ancestor node

cars : list

List of all cars

step_nb : int

Current step of the ProCARs method

procars.step_modules.compute_conserved_adjacencies.write_retained_adjs(last_adjs, new_adjs, group1_adjs, group2_adjs, cars, step_nb, car_left, car_right, species1_id, species2_id, species3_id)[source]

Write adjacencies found for ancestor into the output file

Parameters:

last_adjs : String

Name of the file where previously added adjacencies are stored

new_adjs : String

Name of the file where new adjacencies must be written

group1_adjs : list

List of fully conserved non-conflicting adjacencies

group2_adjs : list

List of partly conserved non-conflicting adjacencies

cars : list

List of all cars

step_nb : int

Step of the ProCARs method

car_left : list

For each car, left neighbor

car_right : list

For each car, right neighbor

species1_id : list

List of IDs of species in group1 of partition defined by the ancestor node

species2_id : list

List of IDs of species in group2 of partition defined by the ancestor node

species3_id : list

List of IDs of species in group3 of partition defined by the ancestor node

compute_resolved_conflicts module

Copyright © Bonsai - LIFL (Université Lille 1, CNRS UMR 8022) and Inria-Lille Nord Europe

contact: aida.ouangraoua@inria.fr, amandine.perrin@inria.fr

This software is a computer program whose purpose is to progressively reconstruct ancestral gene orders.

This software is governed by the CeCILL-B license under French law and abiding by the rules of distribution of free software. You can use, modify and/or redistribute the software under the terms of the CeCILL-B license as circulated by CEA, CNRS and Inria at the following URL http://www.cecill.info, or in the LICENCE file at the root directory of this program.

The fact that you are presently reading this means that you have had knowledge of the CeCILL-B license and that you accept its terms.


compute_resolved_conflicts module description:

This module is used to resolve conflicts in a given set of adjacencies, when, at the previous step, the set of non-conflicting adjacencies was empty.

Method:

From a set of conflicting conserved adjacencies with the set of species supporting each adjacencies, it computes a maximum set of non-conflicting adjacencies. We use the Sankoffs algorithm’s for the minimal mutation problem.

Module author: Aïda Ouangraoua, Amandine PERRIN

February 2014

procars.step_modules.compute_resolved_conflicts.change_adj_discarded_files(adj_prefix_file, step)[source]

Save discarded adjacencies into a new file for information, and remove them from last file

Parameters:

adj_prefix_file : string

Prefix of adjacency file name

step : int

Current step of ProCars method

procars.step_modules.compute_resolved_conflicts.compute_cost(tree, path, leaves, internals, labels, nb_labels)[source]

Function computing optimal labels for the ancestor node in a tree

Dynamic programming version of the Fitch Algorithm for the Small Parsimony Problem (Fitch 71)

Parameters:

tree: dict

Tree structure:

{node_id: [node_name, parent_id, left_child_id, right_child_id], ...}

Where id = -1 if no parent/left_child/right_child

And node_name = “N” + node_id for internal nodes, or leaf_name for leaf nodes

path : list

Array, associating value 1 to a node if it is on the path between ancestor and root node, otherwise value 0

leaves : list

List of leaf IDs in the tree

internals : list

Ordered list of internal node IDs in the tree

labels : dict

Containing labels of leaves nodes: 0 if adjacency is absent, 1 if present

nb_labels : int

Number of label types

Returns:

list

tab: costs for each node of the tree

procars.step_modules.compute_resolved_conflicts.find_conflicts_couples(adj_ids)[source]

Find couples of adjacencies in conflict

Among all possible couples of adjacencies, finds the ones which involve a common end.

Parameters:

adj_ids : list

List of IDs such that adj_ids[id] = (bloc1, bloc2) with bloc1 and bloc2 the two block numbers of the adjacency with id ‘id’

Returns:

tuple

conflicts: a list containing all couples of adjacencies IDs which are in conflict (because they share one end).

procars.step_modules.compute_resolved_conflicts.find_independant_sets(nb_adjacencies, conflicts)[source]

Find sets of independent adjacencies

From the list of conflicting adjacencies, it tries to create sets by adding new adjacencies in each set, which are compatible with all adjacencies already in the set. The set of independent (= not in conflict) adjacencies is then growing at each iteration, until we cannot add any more adjacency.

Parameters:

nb_adjacencies : int

Total number of adjacencies

conflicts : int

List of pairs of adjacencies IDs which are in conflict

Returns:

list

independent_sets: list of sets of compatible adjacencies. All sets have the same size, which is the maximum possible size of an independent adjacencies set

procars.step_modules.compute_resolved_conflicts.find_mincost_set(independent_sets, adjacencies_score)[source]

Among all sets of compatible adjacencies of maximum size, find the one with minimum cost

Parameters:

independent_sets : list

List of sets of independent adjacencies IDs

adjacencies_score : list

List of scores of the adjacencies, such that adjacencies_score[id] = score of adjacency with ID = id

Returns:

list

maximum_set: a list of compatible adjacencies, with minimum global cost

procars.step_modules.compute_resolved_conflicts.get_adjacencies_costs(adj_file, nb_species, leaves, tree, spe_ids, path, internals, ancestor_id)[source]

Read the file where discarded adjacencies were stored, and calculate the cost of each of these adjacencies

Parameters:

adj_file : string

File in which discarded adjacencies are written

nb_species : int

Total number of species

leaves : list

List of IDs of tree leaves (= genomes)

tree : dict

Tree structure

{node_id: [node_name, parent_id, left_child_id, right_child_id], ...}

Where id = -1 if no parent/left_child/right_child

And node_name = “N” + node_id for internal nodes, or leaf_name for leaf nodes

spe_ids : dict

Species as keys, and their corresponding ID as value

path : list

Array, associating value 1 to a node if it is on the path between ancestor and root node, otherwise value 0

internals : list

IDs of internal nodes of the tree

ancestor_id : int

ID of the ancestor node

Returns:

tuple

  • adjacencies_score: dict such that adjacencies_score[id] = score of the adjacency with ID id

  • adj_ids: dict for corresponding adjacencies and adjacency IDs: adj_ids[id] = (bloc1, bloc2)

  • step_car_adjs: dict such that step_car_adjs[id] = (car1, car2, type, step, labels)

    where car1, car2 corresponds to the car adjacency corresponding to the adjacency with ID id, type is the type of adjacency, and step is the step at which the adjacency was found, and labels is some information about the presence (1) or absence (0) of the adjacency in each species.

procars.step_modules.compute_resolved_conflicts.main(adjacency_file_name, step, tree_bin)[source]

Main method. From a set of conflicting conserved adjacencies with the set of species supporting each adjacencies, it computes a maximum set of non-conflicting adjacencies.

Parameters:

adjacency_file_name : string

Prefix of the adjacency file

step : int

Current step of the ProCARs method

tree_bin : string

Name of the binary file containing all information about the species tree.

Returns:

tuple

maximum_set: list List of retained adjacency ids (one of the independent_sets)

independent_sets: list List of compatible adjacency sets. All sets have the same size, which is the maximum possible size of an independent adjacencies set

procars.step_modules.compute_resolved_conflicts.write_retained_adjs(adj_filename_prefix, adj_infos, maximum_set, adj_ids, step)[source]

Write all adjacencies in the retained set of non-conflicting adjacencies (maximum size and minimum cost) into a new adjacency file.

Parameters:

adj_filename_prefix : string

Prefix of adjacency file name

adj_infos : dict

Dictionary with adjacency IDs as keys, and values are a list with:
  • the car adjacency (car1, car2)
  • the type of adjacency
  • the step at which it was found
  • the presence of this adjacency in each species

maximum_set : list

List of retained adjacency ids

adj_ids : dict

Keys are adjacency IDs, and values are the adjacency corresponding to the ID

step : int

Current step of ProCars method

compute_dcj_adjacencies module

Copyright © Bonsai - LIFL (Université Lille 1, CNRS UMR 8022) and Inria-Lille Nord Europe

contact: aida.ouangraoua@inria.fr, amandine.perrin@inria.fr

This software is a computer program whose purpose is to progressively reconstruct ancestral gene orders.

This software is governed by the CeCILL-B license under French law and abiding by the rules of distribution of free software. You can use, modify and/or redistribute the software under the terms of the CeCILL-B license as circulated by CEA, CNRS and Inria at the following URL http://www.cecill.info, or in the LICENCE file at the root directory of this program.

The fact that you are presently reading this means that you have had knowledge of the CeCILL-B license and that you accept its terms.


compute_dcj_adjacencies module description:

This module is used to find DCJ-reliable adjacencies. It is called when, at the previous step, both the sets of compatible adjacencies and conflicting adjacencies are empty. Hence, we try to add new adjacencies by adding DCJ-reliable ones, if possible.

Module author: Aïda Ouangraoua, Amandine PERRIN

February 2014

procars.step_modules.compute_dcj_adjacencies.compute_adjacencymatrix_from_adjacency_array(adjacencies, nb_blocks)[source]

Function computing an adjacency matrix from an adjacency array

Parameters:

adjacencies : list

Array containing pairs of signed blocks, with abs(block1) < abs(block2)

nb_blocks : int

Total number of blocks

Returns:

SparseMatrix

adjacencymatrix: 0-1 Matrix which rows and columns correspond to block extremities:
  • right extremity of bloc_id : extremity_id = block_id
  • left extremity of bloc_id : extremity_id = block_id + nb_blocks
procars.step_modules.compute_dcj_adjacencies.compute_neighbor(spe, block, side, left, right)[source]

Finds the side (left (0) or right (1)) neighbor of the given block (block)

Parameters:

spe : int

Species id for which we want to find a neighbor

block : int

Block number of the species for which we want to find a neighbor

side : int

0 for left extremity of the given block of the given species, 1 for right extremity

left : list

Array such that left[spe_id][block_id] is an array containing the left neighbor of block block_id in spe_id

right : list

Array such that right[spe_id][block_id] is an array containing the right neighbor of block_id in spe_id

Returns:

int

neighbor: signed integer, neighboring block of the given block

procars.step_modules.compute_dcj_adjacencies.find_possible_dcj_adjs(nb_cars, possible_partitions, cars, left, right, car_left, car_right)[source]

From a set of CARs in the current ancestral PQtree, and the list of CAR and block neighbors in all species, finds which CAR adjacencies are DCJ-reliable. A next step will check if adjacencies in this step are not in conflict.

Parameters:

nb_cars : int

Total number of cars in the current PQtree

possible_partitions : list

List of partitions, which are ordered lists of lists species ids corresponding to different combinations of the 3 groups partition defined by the ancestor

cars : list

List of lists of ordered signed block numbers in each car

left : list

Array such that left[spe_id][block_id] is an array containing the left neighbor of block block_id in spe_id

right : list

Array such that right[spe_id][block_id] is an array containing the right neighbor of block_id in spe_id

car_left : list

Array such that car_left[species_id][car_id] = list of potential left car neighbors of car_id in species ‘species_id’

car_right : list

Array such that car_right[species_id][car_id] = list of potential right car neighbors of car_id in species ‘species_id’

Returns:

list

car_adjacencies: list of lists (possible DCJ-reliable CAR adjacencies)

procars.step_modules.compute_dcj_adjacencies.is_adjacency_in_car(cars, block_adjacency)[source]

Checks if a block adjacency belongs to the CARs

Parameters:

cars : list

List of all CARs of the current PQtree

block_adjacency : tuple

One given block adjacency to search among the CARs

Returns:

boolean

True if the adjacency is found in the CARs, False otherwise

procars.step_modules.compute_dcj_adjacencies.is_car_adjacency(spe, car_adjacency, car_left, car_right)[source]

Checks if a given CAR adjacency is a possible car adjacency in the given species

Parameters:

spe : int

Species ID in which searching the car adjacency

car_adjacency : tuple

CAR adjacency to find

car_left : list

Array such that car_left[species_id][car_id] = list of potential left car neighbors of car_id in species ‘species_id’

car_right : list

Array such that car_right[species_id][car_id] = list of potential right car neighbors of car_id in species ‘species_id’

Returns:

boolean

True if car_adjacency belongs to spe, False otherwise

procars.step_modules.compute_dcj_adjacencies.is_dcj_reliable(possible_partitions, car_adjacency, cars, left, right, car_left, car_right)[source]

Finds signal from a given car_adjacency in all possible partitions of the species tree.

Parameters:

possible_partitions : list

List of partitions, which are ordered lists of species IDs lists corresponding to different combinations of the 3 groups partition defined by the ancestor

car_adjacency : tuple

(car1, car2) car adjacency to test

cars : list

List of lists of ordered signed block numbers in each car

left : list

Array such that left[spe_id][block_id] is an array containing the left neighbor of block block_id in spe_id

right : list

Array such that right[spe_id][block_id] is an array containing the right neighbor of block_id in spe_id

car_left : list

Array such that car_left[species_id][car_id] = list of potential left car neighbors of car_id in species ‘species_id’

car_right : list

Array such that car_right[species_id][car_id] = list of potential right car neighbors of car_id in species ‘species_id’

Returns:

boolean

True if the adjacency is dcj-reliable and conserved, False otherwise

procars.step_modules.compute_dcj_adjacencies.is_dcj_reliable_conserved(partition, car_adjacency, block_adjacency, cars, left, right, car_left, car_right)[source]

Finds signal from a given car_adjacency in a given partition of the tree species.

A car adjacency is DCJ-reliable if the signal of this DCJ adjacency is conserved on a path of the species tree that goes through the ancestor (from a group of the partition defined by the ancestor to another group of this same partition)

Parameters:

partition : list

Ordered list of lists of species IDs corresponding to a given combination of the 3 groups partition defined by the ancestor

car_adjacency : tuple

(car1, car2) car adjacency

block_adjacency : tuple

(bloc1, bloc2) block adjacency corresponding to the car adjacency

cars : list

List of lists of ordered signed block numbers in each car

left : list

Array such that left[spe_id][block_id] is an array containing the left neighbor of block block_id in spe_id

right : list

Array such that right[spe_id][block_id] is an array containing the right neighbor of block_id in spe_id

car_left : list

Array such that car_left[species_id][car_id] = list of potential left car neighbors of car_id in species ‘species_id’

car_right : list

Array such that car_right[species_id][car_id] = list of potential right car neighbors of car_id in species ‘species_id’

Returns:

boolean

True if the DCJ-reliable adjacency is conserved, False otherwise

procars.step_modules.compute_dcj_adjacencies.main(adjacency_file_name_prefix, step_nb, car_bin, neigh_bin, tree_bin, blocks_bin)[source]

Main method, to find DCJ-reliable adjacencies

Parameters:

adjacency_file_name_prefix : string

Prefix of the adjacncy file name

step_nb : int

Current step of the ProCars method

car_bin : string

Name of the binary file where all information about CARs is saved

neigh_bin : string

Name of the binary file where all information about the neighbors of each CAR is saved

tree_bin : string

Name of the binary file where all information about the species tree is saved

blocks_bin : string

Name of the binary file where all information about the blocks is saved

Returns:

tuple

step_nb: current step of the ProCars method

dcj_NC_Adj: list of all non-conflicting DCJ-reliable adjacencies

dcj_NC_Pair: list of matrix indexes of non-conflicting DCJ-adjacencies.

procars.step_modules.compute_dcj_adjacencies.write_retained_adjs(adj_filename_prefix, cars, dcj_NC_Adj, step, car_left, car_right, species1_id, species2_id, species3_id)[source]

Write all retained DCJ-reliable adjacencies into a txt file

Parameters:

adj_filename_prefix : string

Prefix of the file where retained adjacencies are stored

cars : list

List of ordered lists of signed blocks corresponding to the different cars, such that cars[car_id] = ordered list of blocks in car_id

dcj_NC_Adj : list

List of retained compatible set of dcj-reliable adjacencies

step : int

Current step in the whole method

car_left : list

Array such that car_left[species_id][car_id] = list of potential left car neighbors of car_id in species ‘species_id’

car_right : list

Array such that car_right[species_id][car_id] = list of potential right car neighbors of car_id in species ‘species_id’

species1_id : list

List of IDs of species in group1 of partition defined by the ancestor node

species2_id : list

List of IDs of species in group2 of partition defined by the ancestor node

species3_id : list

List of IDs of species in group3 of partition defined by the ancestor node

compute_pqtree module

Copyright © Bonsai - LIFL (Université Lille 1, CNRS UMR 8022) and Inria-Lille Nord Europe

contact: aida.ouangraoua@inria.fr, amandine.perrin@inria.fr

This software is a computer program whose purpose is to progressively reconstruct ancestral gene orders.

This software is governed by the CeCILL-B license under French law and abiding by the rules of distribution of free software. You can use, modify and/or redistribute the software under the terms of the CeCILL-B license as circulated by CEA, CNRS and Inria at the following URL http://www.cecill.info, or in the LICENCE file at the root directory of this program.

The fact that you are presently reading this means that you have had knowledge of the CeCILL-B license and that you accept its terms.


compute_pqtree module description:

This module creates a new CAR file, from a number of blocks and a set of adjacencies between those blocks. The CAR file has the following format:

#CAR1
_Q bloc1 bloc2 bloc3 ... blocn Q_
#CAR2
_Q bloc1 -bloc2 ... Q_
...

Module author: Aïda Ouangraoua, Amandine PERRIN

February 2014

procars.step_modules.compute_pqtree.find_cars(nb_blocks, left, right, adjacency_prefix, step_nb)[source]

From the list of added adjacencies, finds all CARs to put into the PQtree file

Parameters:

nb_blocks : int

Total number of blocks

left : dict

Dictionnary such that left[abs(block_num)] = signed block_number of block on the left end of block_num (if block_num > 0: previous block, else : next block). Left blocks calculated according to the current step adjacency file. If no left block: 0

right : dict

Dictionnary such that right[abs(block_num)] = signed block_number of block on the right end of block_num (if block_num > 0: next block, else : previous block). Right blocks calculated according to the current step adjacency file. If no right block: 0

adjacency_prefix : string

Prefix of all adjacencies files

step_nb : int

Current step of ProCARs method

Returns:

list

genome: List of lists, such that:

all_cars[car_num] = [bloc1, bloc2, ...] = all ordered signed blocks of car number car_num

procars.step_modules.compute_pqtree.init_read_from_block_in_cycle(signed_block, nex_signed_block, next_prev_signed_block, next_prev_discarded_signed_block, end)[source]

Initialization of the resolution of cycle CARs

Checks if the block on the end extremity of the given signed_block was also found at last step or not.

  • if it was found and added at last step, keep it
  • if it was found but discarded in last step, keep it, but also store it in to_be_discarded_unresolved in order to remove it if the cycle is still unresolved
  • if it was not found at last step, remove it: store it to to_be_discarded_resolved
Parameters:

signed_block : int

Signed block for which we are looking for the adjacency with its next block.

nex_signed_block : int

Next block of the given signed_block.
  • If we look on the left end and signed_block > 0, nex_signed_block = previous[signed_block]
  • If we look on the left end and signed_block < 0, nex_signed_block = -next[-signed_block]
  • If we look on the right end and signed_block > 0, nex_signed_block = next[signed_block]
  • If we look on the right end and signed_block < 0, nex_signed_block = -previous[-signed_block]

next_prev_signed_block : int

Same as nex_signed_block but for adjacencies added at last step

next_prev_discarded_signed_block : int

Same as nex_signed_block but for adjacencies discarded at last step

end : int

0 for left end, 1 for right end

Returns:

tuple

to_be_discarded_unresolved: List containing adjacencies that we add to the current CAR because they were discarded at the previous step, but that we will remove if we find a cycle CAR again (unresolved cycle)

to_be_discarded_resolved: List of adjacencies found at this step but not at last step, and hence to remove from the adjacency file

current_signed_block: New current signed block (next block of signed_block) if the adjacency was found in last step, or 0 if it was not.

procars.step_modules.compute_pqtree.main(nb_blocks, adjacency_prefix, car_prefix, step_nb)[source]

Main method, to create the new PQtree file.

Parameters:

nb_blocks : int

Total number of blocks

adjacency_prefix : string

Prefix of the adjacency file name

car_prefix : string

Prefix of the PQtree file name

step_nb : int

Current step of the ProCARs method

procars.step_modules.compute_pqtree.read_from_block(signed_block, left, right, end)[source]

From a signed block, and a list of all blocks neighbors, finds a chain of blocks forming a CAR from the given block towards the given end (left or right)

Parameters:

signed_block : int

ID of the given block, with its orientation (sign)

left : dict

Dictionnary such that left[abs(block_num)] = signed block_number of block on the left end of block_num (if block_num > 0: previous block, else : next block). Left blocks calculated according to the current step adjacency file. If no left block: 0

right : dict

Dictionnary such that right[abs(block_num)] = signed block_number of block on the right end of block_num (if block_num > 0: next block, else : previous block). Right blocks calculated according to the current step adjacency file. If no right block: 0

end : int

0 for left extremity of the block, 1 for right extremity

Returns:

list

car: ordered list of signed blocks in the required CAR or empty list if cycle CAR

procars.step_modules.compute_pqtree.read_from_block_in_cycle(signed_block, left, right, left_prev, right_prev, left_prev_discarded, right_prev_discarded, end)[source]

Finds a non-cyclic CAR

From a signed_block, which was found to be in a cycle CAR if we added all adjacencies found at the current step, it tries to find a non cycle CAR by requiring a new condition: add neighbor in the CAR if it was also found as a conserved adjacency at last step (in added adjacencies or discarded ones).

Parameters:

signed_block : int

Number of the given block, with its orientation (sign)

left : dict

Dictionnary such that left[abs(block_num)] = signed block_number of block on the left end of block_num (if block_num > 0: previous block, else : next block). Left blocks calculated according to the current step adjacency file. If no left block: 0

right : dict

Dictionnary such that right[abs(block_num)] = signed block_number of block on the right end of block_num (if block_num > 0: next block, else : previous block). Right blocks calculated according to the current step adjacency file. If no right block: 0

left_prev : dict

Same as left, but for adjacencies found in last step

right_prev : dict

Same as right, but for adjacencies found in last step

left_prev_discarded : dict

For each block number, an array containing its potential left neighbors: left[bloc1] = [bloc2, -bloc3,..]. According to adjacencies discarded at the last step

right_prev_discarded : dict

For each block number, an array containing its potential right neighbors: right[bloc1] = [-bloc2, bloc3,..]. According to adjacencies discarded at the last step

end : int

0 for left extremity of the block, 1 for right extremity

Returns:

list

car: ordered list of signed blocks in the required CAR. If the cycle is unresolved, it returns the cycle CAR. Else, it returns the part of the CAR from the block signed_block until the end extremity of the CAR.

to_be_discarded: adjacencies to remove from the current set of adjacencies, because of the resolution of the cycle CAR.

procars.step_modules.compute_pqtree.remove_adjacency_from_file(adjacency_file_name, adjacency_list)[source]

Removes an adjacency in an adjacency file

When all conserved non-conflicting adjacencies are found, this module computes the sets of blocks into CARs. If adjacencies added infer a cycle CAR, we eliminate one of these adjacencies (the less reliable) in order to form a linear chromosome. The removed adjacency must then be removed from the adjacency file.

Parameters:

adjacency_file_name : string

Name of the current step adjacency file

adjacency_list : list

List of lists: adjacencies to remove from the file.