procars.utils subpackage

The utils subpackage contains modules providing functions used by several steps of ProCARs. These are either input/output functions or utility functions for working with adjacencies (transforming CAR adjacencies into block adjacencies, finding non-conflicting adjacencies in a given set, etc.).

Modules:

initial_files_reader module

Copyright © Bonsai - LIFL (Université Lille 1, CNRS UMR 8022) and Inria-Lille Nord Europe

contact: aida.ouangraoua@inria.fr, amandine.perrin@inria.fr

This software is a computer program whose purpose is to progressively reconstruct ancestral gene orders.

This software is governed by the CeCILL-B license under French law and abiding by the rules of distribution of free software. You can use, modify and/or redistribute the software under the terms of the CeCILL-B license as circulated by CEA, CNRS and Inria at the following URL http://www.cecill.info, or in the LICENCE file at the root directory of this program.

The fact that you are presently reading this means that you have had knowledge of the CeCILL-B license and that you accept its terms.


initial_files_reader module description:

This module contains some methods used to read and parse the initial files:

  • Phylogenetic tree file (in which the ancestor node is marked with @)
  • Blocks file (with block order of each genome of the tree)

Module author: Aïda Ouangraoua, Amandine PERRIN

May 2014

procars.utils.initial_files_reader.read_block_file(block_file_name, nb_species, spe_ids)

Reads blocks from a block file

Parameters:

block_file_name : String

Name of file containing the ordered blocks for all species

nb_species : int

Total number of species

spe_ids : dict

Species and their corresponding ID {spe1: id1, spe2: id2, ...}

Returns:

list

blocks: nested arrays of blocks, indexed by species, then by chromosome:
blocks = [[blocks of spe_id=0], [blocks of spe_id=1], ...]
where blocks of spe_id=1 = [[blocks of spe_id=1, chromosome 1], [blocks of spe_id=1, chromosome 2], ...]
and blocks of spe_id=1, chromosome 1 = [[start, end, num_bloc, orient], [start, end, num_bloc, orient], ...] (one entry per block)

nb_blocks: int, total number of blocks
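To make the nesting of the returned blocks structure concrete, here is a small sketch over hand-written data. The coordinates, block numbers and orient values below are invented placeholders (the orientation encoding is an assumption), and the helper functions are hypothetical, not part of ProCARs.

```python
from typing import List

# One block entry, as described above: [start, end, num_bloc, orient]
Block = List[int]

# Hypothetical data laid out like the blocks structure:
# blocks[spe_id][chromosome_index] = [[start, end, num_bloc, orient], ...]
blocks: List[List[List[Block]]] = [
    # species 0: two chromosomes
    [
        [[0, 120, 1, 1], [130, 400, 2, 1]],
        [[0, 90, 3, 1]],
    ],
    # species 1: a single chromosome
    [
        [[10, 200, 2, 1], [210, 300, 1, 1], [310, 500, 3, 1]],
    ],
]


def block_order(blocks, spe_id):
    """Return, for one species, the block numbers of each chromosome in order."""
    return [[entry[2] for entry in chromosome] for chromosome in blocks[spe_id]]


def count_distinct_blocks(blocks):
    """Count distinct block numbers over all species and chromosomes."""
    return len({entry[2] for species in blocks for chromosome in species for entry in chromosome})
```

For example, block_order(blocks, 1) gives the order of blocks 2, 1, 3 on the only chromosome of species 1.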

procars.utils.initial_files_reader.read_block_line(line, chm, spe_ids, blocks, bloc_id)

Reads one line of a block file, in order to return either a new block number or the positions of a block on a species chromosome.

Parameters:

line : String

A line of the block file

chm : dict

Chromosome IDs: {chromo1: id1, chromo2: id2, ...}

spe_ids : dict

Species and their corresponding ID {spe1: id1, spe2: id2, ...}

blocks : list

Array of blocks read so far, to be completed

bloc_id : int

Current block ID

Returns:

list

chm: dict, chromosome IDs: {chromo1: id1, chromo2: id2, ...}

bloc_id: int, current block ID

procars.utils.initial_files_reader.read_treefile(tree_file_name)

Reads the tree structure from a file

Parameters:

tree_file_name : String

Name of file containing species tree

Returns:

tuple

tree: dict, tree structure:
{node_id: [node_name, parent_id, left_child_id, right_child_id], ...}
where id = -1 if no parent/left_child/right_child
and node_name = “N” + node_id for internal nodes, or leaf_name for leaf nodes

ancestor_id: int, id of ancestor node in tree
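The tree dictionary above can be navigated with plain lookups on parent_id and the child IDs. The following sketch uses a hypothetical five-node tree (the species names are invented for illustration) and two hypothetical helpers: one lists the leaves, the other follows parent links up to the root.

```python
# Hypothetical tree in the documented format:
# {node_id: [node_name, parent_id, left_child_id, right_child_id]}, -1 means "none"
tree = {
    0: ["N0", -1, 1, 2],
    1: ["N1", 0, 3, 4],
    2: ["mouse", 0, -1, -1],
    3: ["human", 1, -1, -1],
    4: ["chimp", 1, -1, -1],
}


def leaves(tree):
    """IDs of nodes without children, i.e. the extant genomes."""
    return sorted(
        node_id
        for node_id, (_, _, left, right) in tree.items()
        if left == -1 and right == -1
    )


def path_to_root(tree, node_id):
    """Node IDs from node_id up to the root, following parent_id links."""
    path = [node_id]
    while tree[node_id][1] != -1:
        node_id = tree[node_id][1]
        path.append(node_id)
    return path
```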

util_adjacency_functions module


util_adjacency_functions module description:

Module containing functions used by other modules, such as compute_resolved_conflicts and compute_conserved_adjacencies, to find the required adjacencies.

For example, it contains functions to:

  • Split a set of adjacencies into a set of conflicting adjacencies and a set of non-conflicting ones.
  • Find the block numbers involved in a given adjacency according to the adjacency ID in the matrix.

Module author: Aïda Ouangraoua, Amandine PERRIN

February 2014

procars.utils.util_adjacency_functions.car_to_block(signed_car, side, cars)

Given a signed CAR, returns the signed block at the extremity indicated by side

Parameters:

signed_car : int

A given CAR ID

side : int

0 for the left end of the CAR, 1 for the right end

cars : list

List of all CARs

Returns:

int

signed block, which is at the given end of the given car
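The behaviour described above can be sketched as follows, under two assumptions that the source does not confirm: CAR k is stored at cars[k - 1] (so that no CAR has ID 0, which could not carry a sign), and a negative CAR ID means the CAR is read in reverse, which reverses the block order and flips every block sign. The function name is hypothetical.

```python
def car_to_block_sketch(signed_car, side, cars):
    """Signed block at the left (side=0) or right (side=1) end of a signed CAR.

    Assumptions (hypothetical): CAR k is cars[k - 1], and a negative CAR ID
    denotes the reversed CAR, whose blocks appear in reverse order with flipped signs.
    """
    car = cars[abs(signed_car) - 1]
    if signed_car > 0:
        return car[0] if side == 0 else car[-1]
    # Reversed CAR: its left end is the negated last block, its right end the negated first block.
    return -car[-1] if side == 0 else -car[0]
```

With cars = [[1, -2, 3], [4, 5]], the left end of CAR -1 is block -3, since reading [1, -2, 3] backwards gives [-3, 2, -1].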

procars.utils.util_adjacency_functions.compute_adjacencies(extremity_pairs, nb_blocks)

Computes the CAR adjacencies (car1, car2) stored in an array of extremity pairs

Parameters:

extremity_pairs : list

Array containing pairs of extremities (corresponding to indices of the SparseMatrix)

nb_blocks : int

Total number of blocks

Returns:

list

adjacencies: array containing pairs of signed integers (adjacencies)

procars.utils.util_adjacency_functions.compute_extremity_pair(adjacencymatrix)

Computes the extremity pairs stored in an adjacency matrix: it finds all pairs of indices (x, y) for which an adjacency exists, i.e. for which adjacencymatrix[x, y] != 0

Parameters:

adjacencymatrix : SparseMatrix

Sparse matrix, with 1 at indices of possible adjacencies, 0 otherwise

Returns:

list

extremity_pairs: array containing pairs of integers (extremities)

procars.utils.util_adjacency_functions.conservation_status(adjacency, left, right, species1_id, species2_id, species3_id)

Computes the conservation status of an adjacency

nbgroup is the number of groups in which the adjacency is present. If nbgroup = 3, the adjacency is fully conserved; if nbgroup = 2, it is partly conserved. Otherwise, the adjacency is not conserved.

Parameters:

adjacency : list

Pair of signed integers

left : list

Array such that left[block_id] is an array containing all left neighbors of block block_id

right : list

Array such that right[block_id] is an array containing all right neighbors of block_id

species1_id : list

IDs of group 1 species

species2_id : list

IDs of group 2 species

species3_id : list

IDs of group 3 species

Returns:

tuple

nbgroup: integer, number of groups in which the adjacency is present

species_supporting: 0-1 array indicating, for each species, whether the adjacency is present (1) or absent (0)
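The classification by nbgroup can be sketched from a species_supporting array and the three species groups. This assumes that an adjacency counts as present in a group as soon as at least one species of that group supports it; the real function also derives species_supporting from left and right, which is not reproduced here. Both helper names are hypothetical.

```python
def count_supporting_groups(species_supporting, species1_id, species2_id, species3_id):
    """Number of groups containing at least one species in which the adjacency is present."""
    groups = (species1_id, species2_id, species3_id)
    return sum(1 for group in groups if any(species_supporting[s] for s in group))


def conservation_label(nbgroup):
    """Map nbgroup to the conservation status described above."""
    if nbgroup == 3:
        return "fully"
    if nbgroup == 2:
        return "partly"
    return "not conserved"
```

For instance, with groups [0, 1], [2] and [3, 4], an adjacency present only in species 0 and 2 is found in two groups and is therefore partly conserved.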

procars.utils.util_adjacency_functions.find_adj_info(car_adjacency, car_left, car_right, species1_id, species2_id, species3_id)

Finds information on a CAR adjacency, namely the genomes in which it is present

Parameters:

car_adjacency : tuple

The given CAR adjacency (two CAR numbers)

car_left : list

For each CAR, its left neighbor

car_right : list

For each CAR, its right neighbor

species1_id : list

IDs of the species in group 1 of the tree partition from the ancestor

species2_id : list

IDs of the species in group 2 of the tree partition from the ancestor

species3_id : list

IDs of the species in group 3 of the tree partition from the ancestor

Returns:

list

labels: a list of integers indicating if the adjacency is present (1) or absent (0) in each species

procars.utils.util_adjacency_functions.split_adjacencymatrix(adjacencymatrix, nb_blocks)

Splits an adjacency matrix into two matrices: the non-conflicting adjacency matrix and the conflicting adjacency matrix

Parameters:

adjacencymatrix : SparseMatrix

0-1 Matrix, the adjacency matrix (1 for each possible adjacency, 0 otherwise)

nb_blocks : int

Total number of blocks

Returns:

tuple of SparseMatrix

non_conflicting: 0-1 matrix containing the non-conflicting adjacencies

conflicting: 0-1 matrix containing the conflicting adjacencies
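The split can be illustrated on a plain list of extremity pairs instead of a SparseMatrix. The sketch assumes that each adjacency appears once as a pair (x, y) and that two adjacencies conflict when they share an extremity (an extremity can have at most one neighbor); the conflict criterion actually used by split_adjacencymatrix is not stated here, so treat this as an illustration of that assumed rule. The function name is hypothetical.

```python
from collections import Counter


def split_adjacencies(pairs):
    """Split extremity pairs into (non_conflicting, conflicting) lists.

    Assumption (hypothetical): adjacencies conflict when they share an extremity.
    """
    # Count how many adjacencies touch each extremity.
    degree = Counter()
    for x, y in pairs:
        degree[x] += 1
        degree[y] += 1
    non_conflicting, conflicting = [], []
    for x, y in pairs:
        if degree[x] == 1 and degree[y] == 1:
            non_conflicting.append((x, y))
        else:
            conflicting.append((x, y))
    return non_conflicting, conflicting
```

With pairs [(1, 4), (2, 5), (2, 7)], extremity 2 is used twice, so (2, 5) and (2, 7) are both conflicting while (1, 4) is kept.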

util_classes module


util_classes module description:

This module contains classes used by ProCARs.

  • SparseMatrix: provides a sparse matrix data structure

Module author: Aïda Ouangraoua, Amandine PERRIN

July 2014

class procars.utils.util_classes.SparseMatrix(line_nb, col_nb)

Sparse matrix data structure, used when a matrix containing mostly zeros is needed.

Attributes

sparse_matrix (dict) Dictionary whose keys are the IDs of lines (rows) containing non-null elements; each value is a dictionary mapping the column IDs of the non-null elements in that line to the value at position (line_id, column_id). Ex: {x1: {y1: 1, y3: 7}, x8: {y4: 4, y3: 5}}. All other positions in the matrix are zeros.
line_nb (int) sparse matrix x size (number of lines)
col_nb (int) sparse matrix y size (number of columns)

Methods

pairs()

Gets all pairs (x, y) of sparse_matrix, i.e. the indices of all non-null elements of the matrix

Returns:

list

pairs: List of tuples (x, y) for which sparse_matrix[x, y] != 0
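A dict-of-dicts sparse matrix following the sparse_matrix attribute described above can be sketched as below. This is a hypothetical stand-in, not the ProCARs class: the tuple-index access via __getitem__/__setitem__ is an assumption suggested by the adjacencymatrix[x, y] notation used elsewhere on this page, and no bounds checking against line_nb/col_nb is done.

```python
class SparseMatrixSketch:
    """Dict-of-dicts sparse matrix: {line_id: {column_id: value}}; missing entries are 0."""

    def __init__(self, line_nb, col_nb):
        self.line_nb = line_nb  # number of lines (x size)
        self.col_nb = col_nb    # number of columns (y size)
        self.sparse_matrix = {}

    def __getitem__(self, index):
        x, y = index
        return self.sparse_matrix.get(x, {}).get(y, 0)

    def __setitem__(self, index, value):
        x, y = index
        if value == 0:
            # Only non-null elements are stored: remove the entry and any emptied line.
            row = self.sparse_matrix.get(x)
            if row is not None:
                row.pop(y, None)
                if not row:
                    del self.sparse_matrix[x]
        else:
            self.sparse_matrix.setdefault(x, {})[y] = value

    def pairs(self):
        """All (x, y) whose value is non-null."""
        return [(x, y) for x, row in self.sparse_matrix.items() for y in row]
```

Storing only non-null entries keeps memory proportional to the number of adjacencies rather than to line_nb * col_nb.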

utils_IO module


utils_IO module description:

This module contains functions used by the other modules to read from and write to files.

Module author: Aïda Ouangraoua, Amandine PERRIN

May 2014

procars.utils.utils_IO.read_adjacency_file(adjacency_file_name, nb_blocks, discarded=False)

Reads an adjacency file (or a file of discarded adjacencies) and finds all blocks with their left and right neighbors.

Used in compute_pqtree and resolve_conflict.

Parameters:

adjacency_file_name : string

Name of the file containing the adjacencies to parse

nb_blocks : int

Total number of blocks

discarded : boolean

True if we are reading a file of discarded adjacencies (and hence there can be multiple left/right neighbors), False for a file of added adjacencies (one left/right neighbor per block_end)

Returns:

tuple

left: dict with:
  • if not discarded: for each block number, its left neighbor: left[bloc2] = bloc1
  • if discarded: for each block number, an array containing its potential left neighbors: left[bloc1] = [bloc2, -bloc3, ...]

right: dict with:
  • if not discarded: for each block number, its right neighbor: right[bloc1] = bloc2
  • if discarded: for each block number, an array containing its potential right neighbors: right[bloc1] = [-bloc2, bloc3, ...]

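The difference between the two shapes of left and right can be sketched from adjacencies that are already parsed. The file format is not described here, so the sketch starts from hypothetical (bloc1, bloc2) pairs meaning "bloc2 is the right neighbor of bloc1", uses the signed blocks directly as keys, and does not reproduce any sign normalisation the real function may perform. The helper name is hypothetical.

```python
def build_neighbors(adjacencies, discarded=False):
    """Build (left, right) neighbor dicts from (bloc1, bloc2) pairs, bloc2 being right of bloc1."""
    left, right = {}, {}
    for bloc1, bloc2 in adjacencies:
        if discarded:
            # Discarded adjacencies may conflict: keep every candidate neighbor.
            right.setdefault(bloc1, []).append(bloc2)
            left.setdefault(bloc2, []).append(bloc1)
        else:
            # Retained adjacencies: at most one neighbor per block end.
            right[bloc1] = bloc2
            left[bloc2] = bloc1
    return left, right
```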
procars.utils.utils_IO.read_binary_file(bin_filename)

Reads information stored in a binary file

Parameters:

bin_filename : string

Name of the file from which to read the information

Returns:

tuple

information stored in the file

procars.utils.utils_IO.read_car_file(car_file_name, nb_blocks)

Reads a CAR file

Parameters:

car_file_name : string

Name of the file containing the current PQtree

nb_blocks : int

Total number of blocks

Returns:

tuple

cars: array of arrays (cars)

block_to_car: integer array such that block_to_car[block_id] = car_id to which block_id belongs

block_position_in_car: integer array such that block_position_in_car[block_id] = position of block in car to which it belongs
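Once the CARs are known, block_to_car and block_position_in_car are simple inverse indexes. The sketch below derives them from an in-memory list of CARs; it assumes blocks are numbered 1..nb_blocks (index 0 unused and left at -1), CAR IDs are list indices, positions are 0-based, and signs are ignored. These conventions are assumptions, and the helper name is hypothetical.

```python
def index_blocks(cars, nb_blocks):
    """Compute (block_to_car, block_position_in_car) from a list of CARs of signed blocks."""
    block_to_car = [-1] * (nb_blocks + 1)
    block_position_in_car = [-1] * (nb_blocks + 1)
    for car_id, car in enumerate(cars):
        for position, signed_block in enumerate(car):
            block = abs(signed_block)  # orientation does not change membership
            block_to_car[block] = car_id
            block_position_in_car[block] = position
    return block_to_car, block_position_in_car
```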

procars.utils.utils_IO.read_conflict_adj_file(adj_file, nb_species, leaves, tree, spe_ids)

Reads a file containing previously discarded adjacencies, and yields them with their information

Parameters:

adj_file : string

File in which discarded adjacencies are written

nb_species : int

Total number of species

leaves : list

List of IDs of tree leaves (= genomes)

tree : dict

A tree structure

spe_ids : dict

Species as keys, and their corresponding ID as value

Returns:

tuple

One tuple is yielded for each adjacency, i.e. for each line of the file:

labels: dict with species as keys, and an int specifying if the adjacency is present (2) or absent (1) in the given species.

adj_id: int, ID of current adjacency

adjacency: tuple, current adjacency (num_bloc1, num_bloc2)

step_car_adj: tuple, current car adjacency, type of adjacency, and step at which it was found (num_car1, num_car2, type, step)

Warning

These Python objects are yielded, not returned: the function is a generator.

procars.utils.utils_IO.save_binary_information(bin_filename, information)

Saves information into a binary file

Parameters:

bin_filename : String

Name of the binary file in which to save the CAR information

information : list

Information to save into the binary file
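This page does not say which binary format save_binary_information and read_binary_file use. Assuming Python's standard pickle module (an assumption, not confirmed by the source), a matching pair of save/read helpers could look like this; both function names are hypothetical.

```python
import pickle


def save_binary_sketch(bin_filename, information):
    """Serialize information into bin_filename with pickle."""
    with open(bin_filename, "wb") as handle:
        pickle.dump(information, handle)


def read_binary_sketch(bin_filename):
    """Load and return the object saved by save_binary_sketch."""
    with open(bin_filename, "rb") as handle:
        return pickle.load(handle)
```

If pickle is indeed used, such files should only be loaded from trusted sources, since unpickling can execute arbitrary code.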

procars.utils.utils_IO.write_adjacency(output_file, cars, car_adjacency, adj_type, step_nb, labels)

Writes all adjacency information to a file handle

Parameters:

output_file : FileHandler

Open file or StringIO object to which the adjacency status is written

cars : list

List of current CARs

car_adjacency : list

List of two signed blocks (a given adjacency)

adj_type : int

Type of adjacency: 0 if fully conserved, 1 if partly conserved

step_nb : int

Current step of the ProCARs method

labels : string

More information (presence/absence of the adjacency in each species)

procars.utils.utils_IO.write_car_file(car_filename, all_cars)

Writes all CARs into the given file

Parameters:

car_filename : String

Name of the file in which all CARs are stored (= PQtree file)

all_cars : list

List of lists, such that all_cars[car_num] = [bloc1, bloc2, ...] = all ordered signed blocks of car number car_num

procars.utils.utils_IO.write_retained_conflict_adjs(filename, adj_infos, maximum_set, adj_ids)

Writes the adjacencies retained after conflict resolution into a text file, which becomes the next adjacency file

Parameters:

filename : string

Name of the file in which retained adjacencies are stored

adj_infos : dict

Dictionary with adjacency IDs as keys, and values are a list with the car adjacency (car1, car2), the type of adjacency, the step at which it was found and the presence of this adjacency in each species

maximum_set : list

List of retained adjacency ids

adj_ids : dict

Keys are adjacency IDs, and values are the adjacency corresponding to the ID
