obnb.graph

obnb.graph.base.BaseGraph

Base Graph object that contains basic graph operations.

DenseGraph

DenseGraph object storing data using numpy array.

SparseGraph

SparseGraph object storing data as adjacency list.

DirectedSparseGraph

Directed sparse graph that also keeps track of reversed edge data.

OntologyGraph

Ontology graph.

class obnb.graph.base.BaseGraph(log_level='WARNING', verbose=False, logger=None)[source]

Base Graph object that contains basic graph operations.

Initialize BaseGraph object.

Parameters:
  • log_level (Literal['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET']) –

  • verbose (bool) –

  • logger (Logger | None) –

add_edge(node_id1, node_id2, weight=1.0, reduction=None)[source]

Add or update an edge in the graph.

Parameters:
  • node_id1 (str) – ID of node 1.

  • node_id2 (str) – ID of node 2.

  • weight (float) – Edge weight to use.

  • reduction (Optional[str]) – Type of edge reduction to use if the target edge already exists. If not set, warn if an old edge exists with an edge weight different from the input edge weight, and then overwrite it with the new value.

add_node(node, exist_ok=False)[source]

Add a new node to the graph.

Parameters:
  • node (str) – Name (or ID) of the node.

  • exist_ok (bool) – Do not raise IDExistsError even if the node to be added already exists.

add_nodes(nodes, exist_ok=False)[source]

Add new nodes to the graph.

Parameters:
  • nodes (Iterable[str]) – Names (or IDs) of the nodes.

  • exist_ok (bool) – Do not raise IDExistsError even if a node to be added already exists.

connected_components()[source]

Find connected components.

Return type:

List[List[str]]

get_neighbors(node, direction='both')[source]

Get neighboring nodes of the input node.

Parameters:
  • node (Union[str, int]) – Node index (int) or node ID (str).

  • direction (Literal['out', 'in', 'both']) – Direction of the edges to be considered [“in”, “out”, “both”], default is “both”.

Return type:

List[str]

Returns:

List of neighboring node IDs.

get_node_id(node)[source]

Return the node ID given the node index or node ID.

Parameters:

node (Union[str, int]) – Node index (int) or node ID (str). If input is already node ID, return directly. If input is node index, then return the node ID of the corresponding node index.

Return type:

str

Returns:

Node ID.

get_node_idx(node)[source]

Return the node index given the node ID or node index.

Parameters:

node (Union[str, int]) – Node index (int) or node ID (str). If input is already node index, return directly. If input is node ID, then return the node index of the corresponding node ID.

Return type:

int

Returns:

Node index.
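The ID/index resolution described for get_node_id and get_node_idx can be illustrated with a small standalone sketch. TinyIDMap below is a hypothetical stand-in written for this example, not the obnb ID map class; it only assumes that string inputs are treated as node IDs and integer inputs as node indices, as stated above.

```python
from typing import Dict, List, Union


class TinyIDMap:
    """Hypothetical stand-in for a node ID map (not the obnb class)."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._index: Dict[str, int] = {}

    def add(self, node_id: str) -> None:
        if node_id in self._index:
            raise KeyError(f"{node_id!r} already exists")
        self._index[node_id] = len(self._ids)
        self._ids.append(node_id)

    def get_node_id(self, node: Union[str, int]) -> str:
        # A str is already an ID and is returned as-is; an int is looked up.
        return node if isinstance(node, str) else self._ids[node]

    def get_node_idx(self, node: Union[str, int]) -> int:
        # An int is already an index and is returned as-is; a str is looked up.
        return node if isinstance(node, int) else self._index[node]


idmap = TinyIDMap()
for name in ("a", "b", "c"):
    idmap.add(name)
```

Accepting both forms lets other methods take either an ID or an index and normalize it once at the start of the call.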

property idmap

Map node ID to the corresponding index.

induced_subgraph(node_ids)[source]

Return a subgraph induced by a subset of nodes.

Parameters:

node_ids (List[str]) –

is_connected()[source]

Return True if the graph is connected.

Return type:

bool

isempty()[source]

Bool: true if graph is empty, indicated by empty idmap.

largest_connected_subgraph()[source]

Return the largest connected subgraph of the graph.

property node_ids: Tuple[str, ...]

Return node IDs as a tuple.

property num_edges: int

Int: Number of edges.

property num_nodes: int

Return the number of nodes in the graph indicated by the ID map.

remove_edge(node_id1, node_id2)[source]

Remove an edge in the graph.

Parameters:
  • node_id1 (str) – ID of node 1.

  • node_id2 (str) – ID of node 2.

property size

Int: number of nodes in graph.

Graph and feature vector objects.

class obnb.graph.DenseGraph(log_level='WARNING', verbose=False, logger=None)[source]

DenseGraph object storing data using numpy array.

Initialize DenseGraph object.

Parameters:
  • log_level (Literal['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET']) –

  • verbose (bool) –

  • logger (Logger | None) –

connected_components()[source]

Find connected components via Breadth First Search.

Returns a list of connected components sorted by the number of nodes, each of which is a list of node ids within a connected component.

Note

This BFS approach assumes the graph is undirected.

Return type:

List[List[str]]

degree(weighted=False, direction='out')[source]

Return node degrees.

Parameters:
  • weighted (bool) – Whether or not to consider edge weights.

  • direction (str) – ‘in’ or ‘out’ degrees. This option is only relevant for directed graphs.

Return type:

ndarray

classmethod from_cx_stream_file(*args, **kwargs)[source]

Construct DenseGraph from CX stream files.

classmethod from_edglst(path_to_edglst, weighted, directed, **kwargs)[source]

Read from edgelist and construct BaseGraph.

classmethod from_mat(mat, ids=None, **kwargs)[source]

Construct DenseGraph using ids and adjacency matrix.

Parameters:
  • mat (numpy.ndarray) – 2D numpy array of adjacency matrix

  • ids (list or IDmap) – list of IDs or idmap of the adjacency matrix, if None, use input ordering of nodes as IDs (default: None).

classmethod from_npy(path_to_npy, **kwargs)[source]

Read numpy array from .npy file and construct BaseGraph.

get_edge(node_id1, node_id2)[source]

Return edge weight between node_id1 and node_id2.

Parameters:
  • node_id1 (str) – ID of first node

  • node_id2 (str) – ID of second node

induced_subgraph(node_ids)[source]

Return a subgraph induced by a subset of nodes.

Parameters:

node_ids (List[str]) – List of nodes of interest.

property mat

Node information stored as numpy matrix.

property nonzeros

Return a matrix indicating nonzero entries of the adjacency.

Note

Technically, it considers ‘positive’ rather than ‘nonzero’. That is, the entries in the adjacency that have positive edge values.

property norm_mat

Column normalized adjacency matrix.
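As an illustration of what column normalization means, here is a standalone sketch using nested Python lists instead of a numpy array. The function name column_normalize and the choice to leave all-zero columns as zeros are assumptions made for this example; the source does not say how norm_mat treats such columns.

```python
from typing import List


def column_normalize(mat: List[List[float]]) -> List[List[float]]:
    """Divide every entry by the sum of its column (toy version)."""
    n_rows = len(mat)
    n_cols = len(mat[0]) if mat else 0
    col_sums = [sum(mat[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [
            # Assumption: a column that sums to zero is left as zeros.
            mat[i][j] / col_sums[j] if col_sums[j] else 0.0
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


adj = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
norm = column_normalize(adj)
```

After normalization, every column with at least one nonzero entry sums to 1.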

property num_edges: int

Int: Number of edges.

propagate(seed)[source]

Propagate label information.

Parameters:
  • seed (ndarray) – 1-dimensional numpy array where each entry is the seed information for a specific node.

Raises:

ValueError – If seed is not a 1-dimensional array with the size of number of the nodes in the graph.

Return type:

ndarray

to_coo()[source]

Convert DenseGraph to edge_index and edge_weight.

to_feature()[source]

Convert DenseGraph to a FeatureVec.

to_sparse_graph()[source]

Convert DenseGraph to a SparseGraph.

class obnb.graph.DirectedSparseGraph(weighted=True, directed=True, log_level='WARNING', verbose=False, logger=None)[source]

Directed sparse graph that also keeps track of reversed edge data.

The reversed edge data is captured for more efficient “propagation upwards” in addition to the more natural “propagation downwards” operation.

Initialize the directed sparse graph.

Parameters:
  • weighted (bool) –

  • directed (bool) –

  • log_level (Literal['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET']) –

  • verbose (bool) –

  • logger (Logger | None) –

add_edge(node_id1, node_id2, weight=1.0, reduction=None)[source]

Add a directed edge and record in the reversed adjacency list.

Parameters:
  • node_id1 (str) –

  • node_id2 (str) –

  • weight (float) –

  • reduction (str | None) –

connected_components()[source]

Find connected components.

degree(weighted=False, direction='out')[source]

Return node degrees.

Parameters:
  • weighted (bool) – Whether or not to consider edge weights.

  • direction (str) – ‘in’ or ‘out’ degrees. This option is only relevant for directed graphs.

Return type:

ndarray

property directed: bool

Bool: Indicate whether edges are directed or not.

remove_edge(node_id1, node_id2)[source]

Remove an edge in the graph.

Parameters:
  • node_id1 (str) – ID of node 1.

  • node_id2 (str) – ID of node 2.

property rev_edge_data: List[Dict[int, float]]

Adjacency list of reversed edge direction.
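The idea of keeping a reversed adjacency list next to the forward one can be sketched as follows. This is a standalone illustration with plain lists of dicts, matching the List[Dict[int, float]] type shown above; add_directed_edge is a hypothetical helper, not the obnb method.

```python
from typing import Dict, List


def add_directed_edge(
    edge_data: List[Dict[int, float]],
    rev_edge_data: List[Dict[int, float]],
    src: int,
    dst: int,
    weight: float = 1.0,
) -> None:
    """Record src -> dst in the forward list and dst <- src in the reversed one."""
    edge_data[src][dst] = weight
    rev_edge_data[dst][src] = weight


num_nodes = 3
edge_data: List[Dict[int, float]] = [{} for _ in range(num_nodes)]
rev_edge_data: List[Dict[int, float]] = [{} for _ in range(num_nodes)]

add_directed_edge(edge_data, rev_edge_data, 0, 2, 0.5)
add_directed_edge(edge_data, rev_edge_data, 1, 2, 2.0)

out_neighbors_of_0 = sorted(edge_data[0])      # targets reachable from node 0
in_neighbors_of_2 = sorted(rev_edge_data[2])   # sources pointing at node 2
```

With the reversed list available, finding incoming neighbors is a single dictionary lookup rather than a scan over every node's outgoing edges.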

to_undirected_sparse_graph(reduction='none', **kwargs)[source]

Turn the directed sparse graph into an undirected sparse graph.

Parameters:

reduction – Type of edge weight reduction to use when directed edges from both directions (source->target and target->source) are present. By default, no reduction will be used, which raises a ValueError exception in the presence of bidirectional edges. Other available reduction strategies are: “mean” and “max”.
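The reduction behavior described above can be sketched on a toy adjacency list. This is a standalone illustration, not the obnb implementation; to_undirected is a hypothetical function, and treating self-loops as never conflicting is an assumption of the sketch.

```python
from typing import Dict, List


def to_undirected(
    edge_data: List[Dict[int, float]], reduction: str = "none"
) -> List[Dict[int, float]]:
    """Symmetrize a directed adjacency list, reducing bidirectional edges."""
    undirected: List[Dict[int, float]] = [{} for _ in edge_data]
    for src, nbrs in enumerate(edge_data):
        for dst, weight in nbrs.items():
            reverse = edge_data[dst].get(src)
            if reverse is not None and src != dst:
                # Both src -> dst and dst -> src exist: a reduction is needed.
                if reduction == "none":
                    raise ValueError(f"bidirectional edge between {src} and {dst}")
                if reduction == "mean":
                    weight = (weight + reverse) / 2
                elif reduction == "max":
                    weight = max(weight, reverse)
                else:
                    raise ValueError(f"unknown reduction {reduction!r}")
            undirected[src][dst] = weight
            undirected[dst][src] = weight
    return undirected


directed_edges: List[Dict[int, float]] = [{1: 1.0}, {0: 3.0, 2: 2.0}, {}]
mean_graph = to_undirected(directed_edges, reduction="mean")
max_graph = to_undirected(directed_edges, reduction="max")
```

Calling to_undirected(directed_edges) with the default "none" raises ValueError here, because nodes 0 and 1 are connected in both directions.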

class obnb.graph.OntologyGraph(log_level='WARNING', verbose=False, logger=None, **kwargs)[source]

Ontology graph.

An ontology graph is a directed acyclic graph (DAG). Here, we represent this data type using DirectedSparseGraph, which keeps track of both the forward direction of edges (_edge_data) and the reversed direction of edges (_rev_edge_data). This bidirectional awareness is useful in the context of propagating information “upwards” or “downwards”.

The idmap attribute is swapped with a more functional IDProp object that allows the storing of node information such as the name and the node attributes.

Initialize the ontology graph.

Parameters:
  • log_level (Literal['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET']) –

  • verbose (bool) –

  • logger (Logger | None) –

add_edge(node_id1, node_id2, weight=1.0, reduction=None)[source]

Add a directed edge and record in the reversed adjacency list.

Parameters:
  • node_id1 (str) –

  • node_id2 (str) –

  • weight (float) –

  • reduction (str | None) –

ancestors(node)[source]

Return the ancestor nodes of a given node.

Note

To enable cache utilization to optimize dynamic programming, execute this within the cache_on_static context. Note that this should only be done when no more structural changes (node and edge modifications) will be introduced throughout the span of this context.

Parameters:

node (str | int) –

Return type:

Set[str]
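A rough, standalone sketch of computing ancestors with memoization is shown below. It does not use obnb: PARENTS is a hypothetical child-to-parents mapping, and functools.lru_cache stands in for whatever caching the cache_on_static context enables. As the note above says, such caching is only safe while the structure does not change.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Tuple

# Hypothetical toy ontology: each node maps to its direct parents.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "A": (),
    "B": ("A",),
    "C": ("B",),
    "D": ("A",),
    "E": ("D",),
    "F": ("D",),
}


@lru_cache(maxsize=None)
def ancestors(node: str) -> FrozenSet[str]:
    """Return all nodes reachable by repeatedly following parent links."""
    found = set()
    for parent in PARENTS[node]:
        found.add(parent)
        # Cached results for shared ancestors are reused across calls.
        found |= ancestors(parent)
    return frozenset(found)


ancestors_of_c = ancestors("C")
ancestors_of_f = ancestors("F")
```

If PARENTS were modified, ancestors.cache_clear() would have to be called before the next query, which mirrors the role release_cache plays in this class.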

cache_on_static()[source]

Use cached values to speed up computation on static ontology.

Note

This should only be used when the ontology graph is stable, meaning that no further changes including edge and node addition/removal will be introduced. However, node attribute manipulation is ok.

classmethod from_obo(path)[source]

Construct the ontology graph from an obo file.

Parameters:

path (str) –

get_node_attr(node)[source]

Get node attribute of a given node.

Parameters:

node (Union[str, int]) – Node index (int) or node ID (str).

Return type:

Optional[List[str]]

get_node_name(node)[source]

Get the name of a given node.

Parameters:

node (Union[str, int]) – Node index (int) or node ID (str).

Return type:

str

static iter_terms(fp)[source]

Iterate over terms from a file pointer and yield OBO terms.

Parameters:

fp (TextIO) – File pointer, can be iterated over the lines.

Return type:

Iterator[Tuple[str, str, Optional[List[str]], Optional[List[str]]]]

static parse_stanza_simplified(stanza_lines)[source]

Parse OBO term from the stanza.

Parse unique id and name per ontology. Parse list of xref, is_a, and part_of relationships (other relationships, e.g., regulates, are ignored).

Note

term_xrefs and term_parents can be None if such information is not available. Meanwhile, term_id and term_name will always be available; otherwise an exception will be raised.

Parameters:

stanza_lines (Iterable[str]) – Iterable of strings (lines), and each line contains certain type of information inferred by the line prefix. Here, we are only interested in four such items, namely “id: ” (identifier of the term), “name: ” (name of the term), “xref: ” (cross reference of the term) and “is_a: ” (parent(s) of the term).

Raises:

OboTermIncompleteError – If either term_id or term_name is not available.

Return type:

Tuple[str, str, Optional[List[str]], Optional[List[str]]]
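The prefix-based parsing described above can be sketched as a standalone function. This is not the obnb implementation: parse_stanza and TermIncompleteError are hypothetical names, the stanza content is made up, the return order (id, name, xrefs, parents) is assumed from the documented return type, and part_of relationships are omitted for brevity.

```python
from typing import Iterable, List, Optional, Tuple


class TermIncompleteError(ValueError):
    """Hypothetical stand-in for OboTermIncompleteError."""


def parse_stanza(
    stanza_lines: Iterable[str],
) -> Tuple[str, str, Optional[List[str]], Optional[List[str]]]:
    term_id: Optional[str] = None
    term_name: Optional[str] = None
    xrefs: List[str] = []
    parents: List[str] = []
    for raw in stanza_lines:
        line = raw.strip()
        if line.startswith("id: "):
            term_id = line[len("id: "):]
        elif line.startswith("name: "):
            term_name = line[len("name: "):]
        elif line.startswith("xref: "):
            xrefs.append(line[len("xref: "):])
        elif line.startswith("is_a: "):
            # Keep only the parent ID, dropping a trailing "! comment" if any.
            parents.append(line[len("is_a: "):].split(" ! ")[0].strip())
    if term_id is None or term_name is None:
        raise TermIncompleteError("stanza lacks an id or a name")
    # Empty lists are reported as None, matching the Optional return types.
    return term_id, term_name, xrefs or None, parents or None


stanza = [
    "id: X:0002",
    "name: child term",
    "xref: DB:42",
    "is_a: X:0001 ! root term",
]
parsed = parse_stanza(stanza)
```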

propagate_node_attrs(pbar=False)[source]

Propagate node attributes up the ontology.

Starting from the leaf nodes, propagate the node attributes to their parent nodes so that each parent node contains all the node attributes from its children, plus its original node attributes. This is done recursively via _aggregate_node_attrs.

Note

To enable effective dynamic programming of propagating attributes, lru_cache is used to decorate _aggregate_node_attrs. By the end of this function run, the cache is cleared to prevent overhead of calling __eq__ in the next execution.

Parameters:

pbar (bool) – If set to True, display a progress bar showing the progress of annotation propagation (default: False).
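The upward propagation with a cached recursive helper can be sketched in a few lines. This is a standalone illustration under stated assumptions: CHILDREN and ATTRS are hypothetical toy data, and the inner aggregate function only mimics the role of _aggregate_node_attrs described above.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy ontology: each node maps to its direct children.
CHILDREN: Dict[str, Tuple[str, ...]] = {
    "A": ("B", "D"),
    "B": ("C",),
    "C": (),
    "D": ("E", "F"),
    "E": (),
    "F": (),
}
ATTRS: Dict[str, List[str]] = {
    "A": [],
    "B": ["g1"],
    "C": ["g2"],
    "D": [],
    "E": ["g3"],
    "F": ["g3", "g4"],
}


def propagate_attrs(
    children: Dict[str, Tuple[str, ...]], attrs: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    @lru_cache(maxsize=None)
    def aggregate(node: str) -> FrozenSet[str]:
        # A node's attributes are its own plus everything below it.
        merged = set(attrs[node])
        for child in children[node]:
            merged |= aggregate(child)
        return frozenset(merged)

    propagated = {node: sorted(aggregate(node)) for node in children}
    aggregate.cache_clear()  # drop cached results once the pass is done
    return propagated


propagated = propagate_attrs(CHILDREN, ATTRS)
```

Because each subtree is aggregated once and then served from the cache, a node shared by several parents is not recomputed.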

read_obo(path, xref_prefix=None)[source]

Read OBO-formatted ontology.

Parameters:
  • path (str) – Path to the OBO file.

  • xref_prefix (str, optional) – Prefix of xref to be captured and return a dictionary of xref to term_id. If not set, then do not capture any xref (default: None).

Return type:

Dict[str, Set[str]]

Returns:

A dictionary where the key is a cross reference term (or the ontology term) id, and the corresponding value is a set of term ids that are related to the key.

release_cache()[source]

Release cache.

restrict_to_branch(node, inclusive=True)[source]

Restrict the ontology to a branch under the specified node.

For example, the ontology

      A
      |
    B   D
    |   |
    C  E F

restricted to the node D (inclusive) is

      D
      |
     E F

Parameters:
  • node (Union[str, int]) – The node under which the branch will be restricted.

  • inclusive (bool) – If set to True, then include the specified node in the branch. Otherwise, do not include.

Returns:

A new ontology graph restricted to the branch under the specified node.

Return type:

OntologyGraph
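Restricting an ontology to a branch amounts to collecting the descendants of the chosen node and dropping everything else. The sketch below is standalone and uses a hypothetical CHILDREN mapping that encodes a toy ontology in which A has children B and D, B has child C, and D has children E and F; it is not the obnb implementation.

```python
from typing import Dict, List

# Hypothetical toy ontology: each node maps to its direct children.
CHILDREN: Dict[str, List[str]] = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": [],
    "D": ["E", "F"],
    "E": [],
    "F": [],
}


def restrict_to_branch(
    children: Dict[str, List[str]], node: str, inclusive: bool = True
) -> Dict[str, List[str]]:
    """Return a new child mapping limited to the branch under ``node``."""
    keep = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in keep:
            continue
        keep.add(current)
        stack.extend(children[current])
    if not inclusive:
        keep.discard(node)
    return {
        n: [c for c in children[n] if c in keep]
        for n in children
        if n in keep
    }


branch_inclusive = restrict_to_branch(CHILDREN, "D")
branch_exclusive = restrict_to_branch(CHILDREN, "D", inclusive=False)
```

The input mapping is left untouched, in line with the documented behavior of returning a new graph.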

set_node_attr(node, node_attr)[source]

Set node attribute of a given node.

Parameters:
  • node (Union[str, int]) – Node index (int) or node ID (str).

  • node_attr (list of str) – Node attributes to set.

set_node_name(node, node_name)[source]

Set the name of a given node.

Parameters:
  • node (Union[str, int]) – Node index (int) or node ID (str).

  • node_name (str) – Node name to set.

update_node_attr(node, new_node_attr)[source]

Update node attributes of a given node.

Can update using a single instance or a list of instances.

Parameters:
  • node (Union[str, int]) – Node index (int) or node ID (str).

  • new_node_attr (Union[List[str], str]) – Node attribute(s) to update.

class obnb.graph.SparseGraph(weighted=True, directed=False, self_loops=False, log_level='WARNING', verbose=False, logger=None)[source]

SparseGraph object storing data as adjacency list.

Initialize SparseGraph object.

Parameters:
  • weighted (bool) –

  • directed (bool) –

  • self_loops (bool) –

  • log_level (Literal['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET']) –

  • verbose (bool) –

  • logger (Logger | None) –

add_edge(node_id1, node_id2, weight=1.0, reduction=None)[source]

Add or update an edge in the sparse graph.

Parameters:
  • node_id1 (str) – ID of node 1.

  • node_id2 (str) – ID of node 2.

  • weight (float) – Edge weight to use.

  • reduction (str, optional) – Type of edge reduction to use if the edge already exists. If it is not set, warn if an old edge exists with a different value and overwrite it with the new value. Possible options are ‘min’, ‘max’, and None (default: None).
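The three reduction options can be illustrated on a toy undirected adjacency list. This is a standalone sketch of the documented semantics, not the obnb code; the add_edge function below is hypothetical and works on integer indices rather than node IDs.

```python
import warnings
from typing import Dict, List, Optional


def add_edge(
    edge_data: List[Dict[int, float]],
    idx1: int,
    idx2: int,
    weight: float = 1.0,
    reduction: Optional[str] = None,
) -> None:
    """Add or update an undirected edge in a toy adjacency list."""
    old = edge_data[idx1].get(idx2)
    if old is not None and old != weight:
        if reduction == "max":
            weight = max(old, weight)
        elif reduction == "min":
            weight = min(old, weight)
        elif reduction is None:
            # No reduction: warn about the conflict, then overwrite.
            warnings.warn(f"overwriting edge weight {old} with {weight}")
        else:
            raise ValueError(f"unknown reduction {reduction!r}")
    edge_data[idx1][idx2] = weight
    edge_data[idx2][idx1] = weight


edge_data: List[Dict[int, float]] = [{}, {}]
add_edge(edge_data, 0, 1, 0.5)
add_edge(edge_data, 0, 1, 0.9, reduction="min")
weight_after_min = edge_data[0][1]
add_edge(edge_data, 0, 1, 0.9, reduction="max")
weight_after_max = edge_data[0][1]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    add_edge(edge_data, 0, 1, 0.2)
weight_after_overwrite = edge_data[0][1]
```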

connected_components()[source]

Find connected components via Breadth First Search.

Returns a list of connected components sorted by the number of nodes, each of which is a list of node ids within a connected component.

Return type:

List[List[str]]
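A breadth-first search over an adjacency list of the form List[Dict[int, float]] can be sketched as follows. This is a standalone illustration rather than the obnb code, and it assumes that "sorted by the number of nodes" means largest component first, which the text above does not spell out.

```python
from collections import deque
from typing import Dict, List, Sequence


def connected_components(
    node_ids: Sequence[str], edge_data: List[Dict[int, float]]
) -> List[List[str]]:
    """Group node IDs into connected components using BFS."""
    seen = set()
    components: List[List[str]] = []
    for start in range(len(node_ids)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component: List[str] = []
        while queue:
            current = queue.popleft()
            component.append(node_ids[current])
            for neighbor in edge_data[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    # Assumption: largest component first.
    components.sort(key=len, reverse=True)
    return components


node_ids = ["a", "b", "c", "d", "e"]
edge_data: List[Dict[int, float]] = [
    {3: 1.0},           # a - d
    {2: 1.0, 4: 1.0},   # b - c, b - e
    {1: 1.0},           # c - b
    {0: 1.0},           # d - a
    {1: 1.0},           # e - b
]
components = connected_components(node_ids, edge_data)
```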

construct_adj_vec(src_idx)[source]

Construct and return a specific row vector of the adjacency matrix.

Parameters:

src_idx (int) – index of row

degree(weighted=False, direction='out')[source]

Return node degrees.

Parameters:
  • weighted (bool) – Whether or not to consider edge weights.

  • direction (str) – ‘in’ or ‘out’ degrees. This option is only relevant for directed graphs.

Return type:

ndarray

property directed

Bool: Indicate whether edges are directed or not.

property edge_data: List[Dict[int, float]]

list of dict: adjacency list data.

edge_gen(compact=True)[source]

Iterate over all edges in the graph.

Parameters:

compact (bool) – If set to True, then only show one of the edges for the undirected graph. Otherwise, print the edges from both directions.

Yields:

A three-tuple containing (1) the source node id, (2) the target node id, and (3) the edge weight.

Return type:

Iterator[Tuple[str, str, float]]
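The effect of the compact option can be sketched with a small generator over a toy undirected adjacency list. This is a standalone illustration, not the obnb method; the rule used here to pick one direction (emit an edge only when the target index is not smaller than the source index) is an assumption of the sketch.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def edge_gen(
    node_ids: Sequence[str],
    edge_data: List[Dict[int, float]],
    compact: bool = True,
) -> Iterator[Tuple[str, str, float]]:
    """Yield (source id, target id, weight) for an undirected toy graph."""
    for src, nbrs in enumerate(edge_data):
        for dst, weight in nbrs.items():
            if compact and dst < src:
                # The same edge was already yielded as (dst, src).
                continue
            yield node_ids[src], node_ids[dst], weight


node_ids = ["a", "b", "c"]
edge_data: List[Dict[int, float]] = [
    {1: 0.5},
    {0: 0.5, 2: 2.0},
    {1: 2.0},
]
compact_edges = list(edge_gen(node_ids, edge_data, compact=True))
all_edges = list(edge_gen(node_ids, edge_data, compact=False))
```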

static edglst_reader(edg_path, weighted, directed, cut_threshold, show_pbar=True)[source]

Edge list file reader.

Read line by line from an edge list file and yield (node_id1, node_id2, weight).
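A line-by-line edge list reader might look roughly like the sketch below. It is standalone and makes several assumptions not confirmed by the text: columns are tab-separated, the weight is the third column, edges with weight strictly below cut_threshold are skipped, and the directed option is left out. read_edge_list is a hypothetical name.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_edge_list(
    fp: TextIO, weighted: bool, cut_threshold: float = 0.0
) -> Iterator[Tuple[str, str, float]]:
    """Yield (node_id1, node_id2, weight) from tab-separated lines."""
    for raw in fp:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        node_id1, node_id2 = fields[0], fields[1]
        # Unweighted edge lists get a weight of 1 for every edge.
        weight = float(fields[2]) if weighted else 1.0
        if weight < cut_threshold:
            continue
        yield node_id1, node_id2, weight


weighted_text = "a\tb\t0.9\nb\tc\t0.1\n"
weighted_edges = list(
    read_edge_list(io.StringIO(weighted_text), weighted=True, cut_threshold=0.5)
)
unweighted_edges = list(read_edge_list(io.StringIO("a\tb\nb\tc\n"), weighted=False))
```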

static edglst_writer(outpth, edge_gen, weighted, directed, cut_threshold)[source]

Edge list file writer.

Write line by line to edge list.

classmethod from_cx_stream_file(path, directed=False, self_loops=False, **kwargs)[source]

Construct SparseGraph from a CX stream file.

Parameters:
  • path (str) –

  • directed (bool) –

  • self_loops (bool) –

classmethod from_mat(mat, ids=None, **kwargs)[source]

Construct SparseGraph using ids and adjacency matrix.

Parameters:
  • mat (numpy.ndarray) – 2D numpy array of adjacency matrix

  • ids (list or IDmap) – list of IDs or idmap of the adjacency matrix, if None, use input ordering of nodes as IDs. (default: None).

classmethod from_npz(path, weighted, directed=False, **kwargs)[source]

Construct SparseGraph from a npz file.

induced_subgraph(node_ids)[source]

Return a subgraph induced by a subset of nodes.

Parameters:

node_ids (List[str]) – List of nodes of interest.

static npy_reader(mat, weighted, directed, cut_threshold, show_pbar=False)[source]

Numpy reader.

Load a numpy matrix (either from file path or numpy matrix directly) and yield node_id1, node_id2, weight.

Note

The matrix should be in shape (N, N+1), where N is number of nodes. The first column of the matrix encodes the node IDs.

Progress bar is not supported for the numpy reader, so the show_pbar option does nothing even if it is set to True.

property num_edges: int

Int: Number of edges.

read(file, reader='edglst', cut_threshold=0, show_pbar=True)[source]

Read data and construct sparse graph.

Parameters:
  • file (str) – path to input file

  • weighted (bool) – if not weighted, all weights are set to 1

  • directed (bool) – if not directed, automatically add 2 edges

  • reader – generator function that yield edges from file

  • cut_threshold (float) – threshold below which edges are not considered

  • show_pbar – Show progress bar for loading the graph (if the reader supports this option).

TODO: reader part looks sus, check unit test

read_cx_stream_file(path, interaction_types=None, node_id_prefix='ncbigene', node_id_entry='r', default_edge_weight=1.0, edge_weight_attr_name=None, reduction='max', use_node_alias=False, node_id_converter=None)[source]

Read from a CX stream file.

Parameters:
  • path (str) – Path to the cx file.

  • interaction_types (list of str, optional) – Types of interactions to be considered. If not set, consider all (default: None).

  • node_id_prefix (str, optional) – Prefix of the ID to be considered, if not set, consider all IDs (default: “ncbigene”).

  • node_id_entry (str) – use “n” for name of the entity, or “r” for other representations (default: “r”).

  • default_edge_weight (float) – edge weight to use if no edge weights specified by edge attributes (default: 1.0).

  • edge_weight_attr_name (str, optional) – name of the edge attribute to use for edge weights, must be numeric type, i.e. “d” must be “double” or “integer” or “long”. If not set, do not use any edge attributes (default: None).

  • reduction (str, optional) – How to deal with duplicated edge weights. Current options are “max” and “min”, which use the maximum or minimum value among the duplicated edge weights; alternatively, if set to None, then it will use the last edge weight seen (default: “max”).

  • use_node_alias (bool) – If set, use node alias as node ID, otherwise use the default node aspect for reading node ID. Similar to the default node ID option, this requires that the prefix matches node_id_prefix if set. Note that when use_node_alias is set, the node_id_prefix becomes mandatory. If multiple node ID aliases with matching prefix are available, use the first one (default: False).

  • node_id_converter (Mapping[str, str], optional) – A mapping object that maps a given node ID to a new node ID of interest.

read_npz(path, cut_threshold=None)[source]

Read from npz file.

The file contains two fields: “edge_index” and “node_ids”, where the first is a 2 x m numpy array encoding the m edges, and the second is a one dimensional numpy array of str type encoding the node IDs. Additionally, “edge_weight” is available if the graph is weighted, which is a one dimensional numpy array (of size m) of edge weights.

Note

The weighted and directed options are for compatibility to the SparseGraph object.

Parameters:
  • path (str) – path to the .npz file

  • cut_threshold (float, optional) – threshold of edge weights below which the edges are ignored, if not set, consider all edges (default: None).

remove_edge(node_id1, node_id2)[source]

Remove an edge in the graph.

Parameters:
  • node_id1 (str) – ID of node 1.

  • node_id2 (str) – ID of node 2.

save(outpth, writer='edglst', cut_threshold=0)[source]

Save graph to file.

Parameters:
  • outpth (str) – path to output file

  • writer (str) – writer function (or name of default writer) to generate file (‘edglst’, ‘npy’).

  • cut_threshold (float) – threshold of edge weights below which the edges are not considered.

save_npz(out_path, weighted=True)[source]

Save the graph as npz.

The npz file contains three fields, including “edge_index”, “edge_weight”, and “node_ids”. If the weighted option is set to False, then “edge_weight” will not be saved.

Parameters:
  • out_path (str) – path to the output file.

  • weighted (bool) – whether should save the edge weights (default: True).

to_adjmat(default_val=0)[source]

Construct adjacency matrix from edgelist data.

Parameters:

default_val (float) – default value for missing edges
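The role of default_val can be seen in a small standalone sketch that expands a toy adjacency list into a dense matrix. It uses nested Python lists in place of a numpy array, and to_adjmat here is a hypothetical function, not the obnb method.

```python
from typing import Dict, List


def to_adjmat(
    edge_data: List[Dict[int, float]], default_val: float = 0.0
) -> List[List[float]]:
    """Expand an adjacency list into a dense N x N matrix."""
    n = len(edge_data)
    # Every entry starts at default_val; existing edges overwrite it.
    mat = [[default_val] * n for _ in range(n)]
    for src, nbrs in enumerate(edge_data):
        for dst, weight in nbrs.items():
            mat[src][dst] = weight
    return mat


edge_data: List[Dict[int, float]] = [{1: 0.5}, {0: 0.5, 2: 2.0}, {1: 2.0}]
adjmat = to_adjmat(edge_data)
adjmat_filled = to_adjmat(edge_data, default_val=-1.0)
```

A nonzero default_val is useful when a missing edge needs to be distinguished from an edge with weight 0.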

to_coo()[source]

Convert to edge_index and edge_weight.
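The edge_index and edge_weight pair follows the coordinate (COO) layout described under read_npz: a 2 x m array of source and target indices plus a length-m array of weights. The standalone sketch below builds that layout from a toy adjacency list using plain lists; to_coo here is a hypothetical function, and the edge ordering it produces is an assumption.

```python
from typing import Dict, List, Tuple


def to_coo(
    edge_data: List[Dict[int, float]],
) -> Tuple[List[List[int]], List[float]]:
    """Flatten an adjacency list into (edge_index, edge_weight)."""
    sources: List[int] = []
    targets: List[int] = []
    weights: List[float] = []
    for src, nbrs in enumerate(edge_data):
        for dst, weight in nbrs.items():
            sources.append(src)
            targets.append(dst)
            weights.append(weight)
    # edge_index has two rows: source indices and target indices.
    return [sources, targets], weights


edge_data: List[Dict[int, float]] = [{1: 0.5}, {0: 0.5, 2: 2.0}, {1: 2.0}]
edge_index, edge_weight = to_coo(edge_data)
```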

to_dense_graph()[source]

Convert SparseGraph to a DenseGraph.

property weighted

Bool: Whether weights (3rd column in edgelist) are available.