perturbationx.topo_npa

Package Contents

Functions

permute_adjacency(adj[, permutations, iterations, ...])

Permute an adjacency matrix.

permute_edge_list(edge_list[, node_list, iterations, ...])

Permute an edge list.

format_dataset(dataset[, computing_statistics])

Format a dataset for use with toponpa.

prune_network_dataset(graph, adj_b, dataset, dataset_id)

Prune a network and dataset to match each other.

infer_graph_attributes(graph[, relation_translator, ...])

Infer attributes of a network and add them to the graph instance.

compute_variances(lap_b, lap_c, lap_q, stderr, ...)

Compute the variance of the perturbation score and core node coefficients.

confidence_interval(values, variances[, alpha])

Compute the confidence intervals for the given confidence level.

test_permutations(lap_b, lap_c, lap_q, adj_perms, ...)

Test the null hypothesis that the ordering of boundary coefficients and the distribution of core edges does not affect the perturbation score.

p_value(value, distribution)

Compute the p-value for the given value in the given distribution.

coefficient_inference(lap_b, lap_c, boundary_coefficients)

Infer core coefficients from boundary coefficients and Laplacian matrices.

perturbation_amplitude(lap_q, core_coefficients, ...)

Compute the perturbation amplitude from the core Laplacian and core coefficients.

perturbation_amplitude_contributions(lap_q, ...)

Compute the perturbation amplitude and relative contributions from the core Laplacian and core coefficients.

generate_adjacencies(graph[, directed, sparse])

Generate the boundary and core adjacency matrices from a graph.

generate_boundary_laplacian(adj_b[, boundary_edge_minimum])

Generate the boundary Lb Laplacian from a boundary adjacency matrix.

generate_core_laplacians(lap_b, adj_c[, ...])

Generate the core Laplacians from a boundary Laplacian and core adjacency matrix.

toponpa(graph, relation_translator, datasets[, ...])

Compute the Network Perturbation Amplitude (NPA) for a given network and datasets.

evaluate_modifications(graph, relation_translator, ...)

Evaluate the generated network modifications.

perturbationx.topo_npa.permute_adjacency(adj: numpy.ndarray | scipy.sparse.sparray, permutations=('k2',), iterations=500, permutation_rate=1.0, seed=None)

Permute an adjacency matrix.

Parameters:
  • adj (np.ndarray | sp.sparray) – The adjacency matrix to permute.

  • permutations (list, optional) – The permutations to apply. May contain ‘k1’ and ‘k2’ in any order. Defaults to (‘k2’,).

  • iterations (int, optional) – The number of permutations to generate. Defaults to 500.

  • permutation_rate (float, optional) – The fraction of edges to permute. Defaults to 1.

  • seed (int, optional) – The seed for the random number generator. Defaults to None.

Raises:

ValueError – If the adjacency matrix is not square.

Returns:

A dictionary of lists with permuted adjacency matrices, keyed by the permutation name.

Return type:

dict

perturbationx.topo_npa.permute_edge_list(edge_list: numpy.ndarray, node_list=None, iterations=500, method='k1', permutation_rate=1.0, seed=None)

Permute an edge list.

Parameters:
  • edge_list (np.ndarray) – The edge list to permute. Must be a 2D array with shape (n_edges, 4). The first two columns contain the source and target nodes, the third column contains the edge type, and the fourth column contains the confidence weight. Confidence weights are optional.

  • node_list (list, optional) – The list of nodes to use in the permutation. Only edges that connect nodes in this list are permuted. If None, the list is inferred from the edge list.

  • iterations (int, optional) – The number of permutations to generate. Defaults to 500.

  • method (str, optional) – The permutation method to use. Defaults to ‘k1’. May be ‘k1’ or ‘k2’.

  • permutation_rate (float | str, optional) – The fraction of edges to permute. Defaults to 1. If ‘confidence’, the confidence weights are used to determine the number of edges to permute. For each edge, a random number is drawn from a uniform distribution between 0 and 1. If the confidence weight is larger than this number, the edge is permuted.

  • seed (int, optional) – The seed for the random number generator. Defaults to None.

Raises:

ValueError – If the permutation method is unknown.

Returns:

A list of permutations. Each permutation is a list of tuples with the source node, target node, and edge type. If the edge type is None, the edge is removed.
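
The 'confidence' selection rule described for permutation_rate can be sketched in a few lines. This is a minimal illustration only: the helper name select_edges_by_confidence is hypothetical, plain tuples stand in for the rows of the edge array, and the library's own sampling code may differ.

```python
import random


def select_edges_by_confidence(edge_list, seed=None):
    """Return indices of edges chosen for permutation (hypothetical helper)."""
    rng = random.Random(seed)
    selected = []
    for index, (source, target, relation, confidence) in enumerate(edge_list):
        # Draw u from Uniform[0, 1); the edge is permuted when confidence > u.
        if float(confidence) > rng.random():
            selected.append(index)
    return selected


edges = [
    ("a", "b", "1", 1.0),   # confidence 1.0: always selected
    ("b", "c", "-1", 0.0),  # confidence 0.0: never selected
    ("a", "c", "1", 0.5),   # selected in roughly half of the draws
]
chosen = select_edges_by_confidence(edges, seed=0)
```

Under this rule, high-confidence edges are permuted more often than low-confidence ones, and fixing the seed makes the selection reproducible.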

perturbationx.topo_npa.format_dataset(dataset: pandas.DataFrame, computing_statistics=True)

Format a dataset for use with toponpa.

Parameters:
  • dataset (pd.DataFrame) – The dataset to format. Must contain columns ‘nodeID’ and ‘logFC’. If computing_statistics is True, the dataset must also contain a column ‘stderr’ or ‘t’.

  • computing_statistics (bool, optional) – Whether statistics will be computed from the dataset. Defaults to True.

Raises:

ValueError – If the dataset is not a pandas.DataFrame, or if it does not contain columns ‘nodeID’ and ‘logFC’, or if computing_statistics is True and the dataset does not contain a column ‘stderr’ or ‘t’.

Returns:

The formatted dataset.

Return type:

pd.DataFrame
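
The column requirements above can be checked before calling format_dataset. The sketch below works on a plain list of column names so that it stays dependency-free; check_dataset_columns is a hypothetical helper, not part of the package, and it mirrors only the documented rules.

```python
def check_dataset_columns(columns, computing_statistics=True):
    """Validate column names against the documented requirements (hypothetical helper)."""
    columns = set(columns)
    # 'nodeID' and 'logFC' are always required.
    missing = {"nodeID", "logFC"} - columns
    if missing:
        raise ValueError(f"Dataset is missing required columns: {sorted(missing)}")
    # Statistics additionally need either 'stderr' or 't'.
    if computing_statistics and not ({"stderr", "t"} & columns):
        raise ValueError("Dataset must contain a 'stderr' or 't' column.")
    return True


ok_with_t = check_dataset_columns(["nodeID", "logFC", "t"])
ok_without_stats = check_dataset_columns(["nodeID", "logFC"], computing_statistics=False)

try:
    check_dataset_columns(["nodeID", "logFC"])
    statistics_check_failed = False
except ValueError:
    statistics_check_failed = True
```

With a pandas DataFrame, the same check could be run on list(dataset.columns).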

perturbationx.topo_npa.prune_network_dataset(graph: networkx.DiGraph, adj_b: numpy.ndarray | scipy.sparse.sparray, dataset: pandas.DataFrame, dataset_id: str, missing_value_pruning_mode='nullify', opposing_value_pruning_mode=None, opposing_value_minimum_amplitude=1.0, boundary_edge_minimum=6, verbose=True)

Prune a network and dataset to match each other.

Parameters:
  • graph (nx.DiGraph) – The network to prune.

  • adj_b (np.ndarray | sp.sparray) – The boundary adjacency matrix to prune.

  • dataset (pd.DataFrame) – The dataset to use for pruning.

  • dataset_id (str) – The name of the dataset.

  • missing_value_pruning_mode (str, optional) – The mode to use for pruning nodes with missing values. Must be one of ‘remove’ or ‘nullify’. Defaults to ‘nullify’.

  • opposing_value_pruning_mode (str, optional) – The mode to use for pruning edges with opposing values. Must be one of ‘remove’, ‘nullify’, or ‘none’. Defaults to None.

  • opposing_value_minimum_amplitude (float, optional) – The minimum amplitude of the dataset values to consider. Values with an absolute value smaller than this threshold are ignored. Defaults to 1.

  • boundary_edge_minimum (int, optional) – The minimum number of boundary edges a core node must have to be included in the pruned network. If a core node has fewer boundary edges after ‘remove’ pruning, all of its edges are removed. This parameter is ignored if ‘nullify’ pruning is used. Defaults to 6.

  • verbose (bool, optional) – Whether to log network statistics. Defaults to True.

Raises:

ValueError – If the missing value pruning mode is invalid, or if the opposing value pruning mode is invalid, or if the boundary edge minimum is negative, or if the adjacency matrix is not two-dimensional, or if the dataset does not contain any boundary nodes.

Returns:

The pruned boundary adjacency matrix and the pruned dataset.

Return type:

(np.ndarray | sp.sparray, pd.DataFrame)

perturbationx.topo_npa.infer_graph_attributes(graph: networkx.DiGraph, relation_translator: perturbationx.io.RelationTranslator | None = None, verbose=True)

Infer attributes of a network and add them to the graph instance.

Parameters:
  • graph (nx.DiGraph) – The network to process.

  • relation_translator (perturbationx.RelationTranslator, optional) – The relation translator to use. If None, a new instance will be created.

  • verbose (bool, optional) – Whether to log network statistics. Defaults to True.

Raises:

ValueError – If the same node appears in both the core and boundary network.

Returns:

The processed network.

Return type:

nx.DiGraph

perturbationx.topo_npa.compute_variances(lap_b: numpy.ndarray | scipy.sparse.sparray, lap_c: numpy.ndarray | scipy.sparse.sparray, lap_q: numpy.ndarray | scipy.sparse.sparray, stderr: numpy.ndarray, core_coefficients: numpy.ndarray, core_edge_count: int)

Compute the variance of the perturbation score and core node coefficients.

Parameters:
  • lap_b (np.ndarray | sp.sparray) – The Lb boundary Laplacian.

  • lap_c (np.ndarray | sp.sparray) – The Lc core Laplacian.

  • lap_q (np.ndarray | sp.sparray) – The Q core Laplacian.

  • stderr (np.ndarray) – The standard error of the boundary coefficients.

  • core_coefficients (np.ndarray) – The core node coefficients.

  • core_edge_count (int) – The number of edges in the core network.

Returns:

The variance of the perturbation score and core node coefficients.

Return type:

(np.ndarray, np.ndarray)

perturbationx.topo_npa.confidence_interval(values: numpy.ndarray, variances: numpy.ndarray, alpha=0.95)

Compute the confidence intervals for the given confidence level.

Parameters:
  • values (np.ndarray) – The mean values for which to compute the confidence intervals.

  • variances (np.ndarray) – The variances of the values.

  • alpha (float, optional) – The confidence level. Defaults to 0.95.

Returns:

The lower and upper confidence bounds and the p-values.

Return type:

(np.ndarray, np.ndarray, np.ndarray)
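
To illustrate how a confidence level of 0.95 maps to interval bounds, the sketch below assumes a normal approximation, which this page does not confirm. The helper normal_confidence_bounds is hypothetical, uses plain lists instead of arrays, and omits the p-values that confidence_interval also returns.

```python
from math import sqrt
from statistics import NormalDist


def normal_confidence_bounds(values, variances, alpha=0.95):
    """Normal-approximation confidence bounds (hypothetical helper)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Two-sided critical value, about 1.96 for alpha = 0.95.
    z = NormalDist().inv_cdf(0.5 + alpha / 2.0)
    lower = [v - z * sqrt(var) for v, var in zip(values, variances)]
    upper = [v + z * sqrt(var) for v, var in zip(values, variances)]
    return lower, upper


lower, upper = normal_confidence_bounds([1.0, 2.5], [0.04, 0.25])
```

Note that alpha here is the coverage of the interval (0.95), not the significance level (0.05).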

perturbationx.topo_npa.test_permutations(lap_b: numpy.ndarray | scipy.sparse.sparray, lap_c: numpy.ndarray | scipy.sparse.sparray, lap_q: numpy.ndarray | scipy.sparse.sparray, adj_perms: dict, core_edge_count: int, boundary_coefficients: numpy.ndarray, permutations=('o', 'k2'), full_core_permutation=True, exact_boundary_outdegree=True, permutation_rate=1.0, iterations=500, seed=None)

Test the null hypothesis that the ordering of boundary coefficients and the distribution of core edges does not affect the perturbation score. This is a convenience function that calls test_boundary_permutations and test_core_permutations.

Parameters:
  • lap_b (np.ndarray | sp.sparray) – The Lb boundary Laplacian.

  • lap_c (np.ndarray | sp.sparray) – The Lc core Laplacian.

  • lap_q (np.ndarray | sp.sparray) – The Q core Laplacian.

  • adj_perms (dict) – The adjacency matrices of the core network permutations. The keys are the permutation names, the values are the adjacency matrices. The adjacency matrices may be sparse or dense.

  • core_edge_count (int) – The number of edges in the core network.

  • boundary_coefficients (np.ndarray) – The boundary coefficients.

  • permutations (list, optional) – The permutations to test. May contain ‘o’, ‘k1’, and ‘k2’ in any order. Defaults to (‘o’, ‘k2’). For ‘k1’ and ‘k2’, the adjacency matrices to test must be provided in adj_perms.

  • full_core_permutation (bool, optional) – Whether to use the full permutation matrix for each core permutation. Partial permutations sample core coefficients, while full permutations sample perturbation scores. Defaults to True.

  • exact_boundary_outdegree (bool, optional) – Whether to use the exact boundary outdegree. If False, the boundary outdegree is set to 1 for all core nodes with boundary edges. Defaults to True.

  • permutation_rate (float, optional) – The fraction of boundary coefficients to permute. Defaults to 1.

  • iterations (int, optional) – The number of boundary permutations to perform. Defaults to 500.

  • seed (int, optional) – The seed for the random number generator. Defaults to None.

Returns:

The distributions of perturbation scores under the null hypothesis.

Return type:

dict

perturbationx.topo_npa.p_value(value: float, distribution: list)

Compute the p-value for the given value in the given distribution.

Parameters:
  • value (float) – The value for which to compute the p-value.

  • distribution (list) – The distribution.

Returns:

The p-value.

Return type:

float
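
As a rough illustration of a permutation p-value, the sketch below uses one common definition: the fraction of null samples that are at least as large as the observed value. Both the helper name empirical_p_value and this one-sided definition are assumptions; p_value itself may follow a different convention.

```python
def empirical_p_value(value, distribution):
    """One-sided empirical p-value (hypothetical helper, not the library's code)."""
    if len(distribution) == 0:
        raise ValueError("distribution must not be empty")
    # Count null samples that are at least as large as the observed value.
    at_least_as_large = sum(1 for sample in distribution if sample >= value)
    return at_least_as_large / len(distribution)


null_scores = [0.1, 0.4, 0.2, 0.9, 0.3]
p = empirical_p_value(0.35, null_scores)
```

In practice the distribution would be one of the lists returned by test_permutations, and the value the observed perturbation score.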

perturbationx.topo_npa.coefficient_inference(lap_b: numpy.ndarray | scipy.sparse.sparray, lap_c: numpy.ndarray | scipy.sparse.sparray, boundary_coefficients: numpy.ndarray)

Infer core coefficients from boundary coefficients and Laplacian matrices.

Parameters:
  • lap_b (np.ndarray | sp.sparray) – The Lb boundary Laplacian.

  • lap_c (np.ndarray | sp.sparray) – The Lc core Laplacian.

  • boundary_coefficients (np.ndarray) – The boundary coefficients.

Raises:

ValueError – If the Laplacian matrices are misshapen or if the matrix dimensions do not match.

Returns:

The inferred core coefficients.

Return type:

np.ndarray

perturbationx.topo_npa.perturbation_amplitude(lap_q: numpy.ndarray | scipy.sparse.sparray, core_coefficients: numpy.ndarray, core_edge_count: int)

Compute the perturbation amplitude from the core Laplacian and core coefficients.

Parameters:
  • lap_q (np.ndarray | sp.sparray) – The Q core Laplacian.

  • core_coefficients (np.ndarray) – The core coefficients.

  • core_edge_count (int) – The number of edges in the core network.

Raises:

ValueError – If the Laplacian matrix is misshapen or if the matrix dimensions do not match.

Returns:

The perturbation amplitude.

Return type:

np.ndarray
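
This page does not give the formula behind perturbation_amplitude. The sketch below assumes the usual NPA formulation, in which the amplitude is the quadratic form of the core coefficients with the Q Laplacian, divided by the number of core edges. The helper quadratic_form_amplitude is hypothetical and uses nested lists rather than arrays.

```python
def quadratic_form_amplitude(lap_q, core_coefficients, core_edge_count):
    """Compute f^T Q f / |E| with plain lists (hypothetical helper)."""
    n = len(core_coefficients)
    if len(lap_q) != n or any(len(row) != n for row in lap_q):
        raise ValueError("lap_q must be square and match the coefficient vector")
    # Q f, one dot product per row of Q.
    q_f = [sum(q * f for q, f in zip(row, core_coefficients)) for row in lap_q]
    # f^T (Q f), then normalise by the number of core edges.
    return sum(f * x for f, x in zip(core_coefficients, q_f)) / core_edge_count


# Toy 2-node example with a single core edge (values chosen for illustration).
lap_q = [[1.0, 1.0], [1.0, 1.0]]
amplitude = quadratic_form_amplitude(lap_q, [0.5, 1.5], core_edge_count=1)
```

For this toy matrix the quadratic form reduces to (0.5 + 1.5) squared, so the amplitude is 4.0.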

perturbationx.topo_npa.perturbation_amplitude_contributions(lap_q: numpy.ndarray | scipy.sparse.sparray, core_coefficients: numpy.ndarray, core_edge_count: int)

Compute the perturbation amplitude and relative contributions from the core Laplacian and core coefficients.

Parameters:
  • lap_q (np.ndarray | sp.sparray) – The Q core Laplacian.

  • core_coefficients (np.ndarray) – The core coefficients.

  • core_edge_count (int) – The number of edges in the core network.

Raises:

ValueError – If the Laplacian matrix is misshapen or if the matrix dimensions do not match.

Returns:

The perturbation amplitude and relative contributions.

Return type:

(np.ndarray, np.ndarray)

perturbationx.topo_npa.generate_adjacencies(graph: networkx.DiGraph, directed=False, sparse=True)

Generate the boundary and core adjacency matrices from a graph.

Parameters:
  • graph (nx.DiGraph) – The graph.

  • directed (bool, optional) – Whether to generate directed adjacency matrices. Defaults to False.

  • sparse (bool, optional) – Whether to generate sparse adjacency matrices. Defaults to True.

Returns:

The boundary and core adjacency matrices.

Return type:

(np.ndarray, np.ndarray) | (sp.sparray, sp.sparray)

perturbationx.topo_npa.generate_boundary_laplacian(adj_b: numpy.ndarray | scipy.sparse.sparray, boundary_edge_minimum=6)

Generate the boundary Lb Laplacian from a boundary adjacency matrix.

Parameters:
  • adj_b (np.ndarray | sp.sparray) – The boundary adjacency matrix.

  • boundary_edge_minimum (int, optional) – The minimum number of boundary edges a core node must have to be included in the Lb Laplacian. Nodes with fewer boundary edges are removed from the Lb Laplacian. Defaults to 6.

Raises:

ValueError – If the adjacency matrix is misshapen or if the boundary edge minimum is negative.

Returns:

The boundary Lb Laplacian.

Return type:

np.ndarray | sp.sparray

perturbationx.topo_npa.generate_core_laplacians(lap_b: numpy.ndarray | scipy.sparse.sparray, adj_c: numpy.ndarray | scipy.sparse.sparray, exact_boundary_outdegree=True)

Generate the core Laplacians from a boundary Laplacian and core adjacency matrix.

Parameters:
  • lap_b (np.ndarray | sp.sparray) – The boundary Laplacian.

  • adj_c (np.ndarray | sp.sparray) – The core adjacency matrix.

  • exact_boundary_outdegree (bool, optional) – Whether to use the exact boundary outdegree. If False, the boundary outdegree is set to 1 for all core nodes with boundary edges. Defaults to True.

Returns:

The core Laplacians.

Return type:

(np.ndarray, np.ndarray) | (sp.sparray, sp.sparray)

perturbationx.topo_npa.toponpa(graph: networkx.DiGraph, relation_translator: perturbationx.io.RelationTranslator, datasets: dict, missing_value_pruning_mode='nullify', opposing_value_pruning_mode=None, opposing_value_minimum_amplitude=1.0, boundary_edge_minimum=6, exact_boundary_outdegree=True, compute_statistics=True, alpha=0.95, permutations=('o', 'k2'), full_core_permutation=True, p_iters=500, p_rate=1.0, sparse=True, seed=None, verbose=True)

Compute the Network Perturbation Amplitude (NPA) for a given network and datasets.

Parameters:
  • graph (nx.DiGraph) – The network graph.

  • relation_translator (perturbationx.RelationTranslator) – The relation translator.

  • datasets (dict) – The datasets to use. The keys are the dataset IDs and the values are the datasets, which are pandas DataFrames.

  • missing_value_pruning_mode (str, optional) – The mode to use for pruning nodes with missing values. Must be one of ‘remove’ or ‘nullify’. Defaults to ‘nullify’.

  • opposing_value_pruning_mode (str, optional) – The mode to use for pruning edges with opposing values. Must be one of ‘remove’, ‘nullify’, or ‘none’. Defaults to None.

  • opposing_value_minimum_amplitude (float, optional) – The minimum amplitude of the dataset values to consider. Values with an absolute value smaller than this threshold are ignored. Defaults to 1.

  • boundary_edge_minimum (int, optional) – The minimum number of boundary edges a core node must have to be included in the pruned network. If a core node has fewer boundary edges after ‘remove’ pruning, all of its edges are removed. This parameter is ignored if ‘nullify’ pruning is used. Defaults to 6.

  • exact_boundary_outdegree (bool, optional) – Whether to use the exact boundary outdegree. If False, the boundary outdegree is set to 1 for all core nodes with boundary edges. Defaults to True.

  • compute_statistics (bool, optional) – Whether to compute variances and confidence intervals. Defaults to True.

  • alpha (float, optional) – The confidence level for the confidence intervals. Defaults to 0.95.

  • permutations (list, optional) – The permutations to test. May contain ‘o’, ‘k1’, and ‘k2’ in any order. Defaults to (‘o’, ‘k2’).

  • full_core_permutation (bool, optional) – Whether to use the full permutation matrix for each core permutation. Partial permutations sample core coefficients, while full permutations sample perturbation scores. Defaults to True.

  • p_iters (int, optional) – The number of permutations to perform. Defaults to 500.

  • p_rate (float, optional) – The fraction of boundary coefficients to permute. Defaults to 1.

  • sparse (bool, optional) – Whether to use sparse computation. Defaults to True.

  • seed (int, optional) – The seed for the random number generator. Defaults to None.

  • verbose (bool, optional) – Whether to log progress and network statistics. Defaults to True.

Raises:

ValueError – If the same node appears in both the core and boundary network.

Returns:

The NPA result.

Return type:

perturbationx.NPAResult

perturbationx.topo_npa.evaluate_modifications(graph: networkx.DiGraph, relation_translator: perturbationx.io.RelationTranslator, modifications: list, nodes: list, datasets: dict, missing_value_pruning_mode='nullify', opposing_value_pruning_mode=None, opposing_value_minimum_amplitude=1.0, boundary_edge_minimum=6, exact_boundary_outdegree=True, sparse=True, seed=None, verbose=True)

Evaluate the generated network modifications.

Parameters:
  • graph (nx.DiGraph) – The network graph.

  • relation_translator (perturbationx.RelationTranslator) – The relation translator.

  • modifications (list) – The list of modifications. Each modification is a list of tuples of the form (source, target, relation, confidence).

  • nodes (list) – The nodes that were modified.

  • datasets (dict) – The datasets to use. The keys are the dataset IDs and the values are the datasets, which are pandas DataFrames.

  • missing_value_pruning_mode (str, optional) – The mode to use for pruning nodes with missing values. Must be one of ‘remove’ or ‘nullify’. Defaults to ‘nullify’.

  • opposing_value_pruning_mode (str, optional) – The mode to use for pruning edges with opposing values. Must be one of ‘remove’, ‘nullify’, or ‘none’. Defaults to None.

  • opposing_value_minimum_amplitude (float, optional) – The minimum amplitude of the dataset values to consider. Values with an absolute value smaller than this threshold are ignored. Defaults to 1.

  • boundary_edge_minimum (int, optional) – The minimum number of boundary edges a core node must have to be included in the pruned network. If a core node has fewer boundary edges after ‘remove’ pruning, all of its edges are removed. This parameter is ignored if ‘nullify’ pruning is used. Defaults to 6.

  • exact_boundary_outdegree (bool, optional) – Whether to use the exact boundary outdegree. If False, the boundary outdegree is set to 1 for all core nodes with boundary edges. Defaults to True.

  • sparse (bool, optional) – Whether to use sparse computation. Defaults to True.

  • seed (int, optional) – The seed for the random number generator. Defaults to None.

  • verbose (bool, optional) – Whether to log progress and network statistics. Defaults to True.

Raises:

ValueError – If the same node appears in both the core and boundary network.

Returns:

A list of tuples of the form (modification, npa), where modification is the evaluated modification and npa is a dictionary mapping each dataset ID to the NPA computed for that dataset.

Return type:

list