obnb.data

`annotated_ontology`	Annotated ontology data.
`annotation`	Annotation data.
`network`	Network data.
`ontology`	Ontology data.
`BioGRID`	The BioGRID Protein Protein Interaction network.
`BioPlex`	The BioPlex3-shared Protein Protein Interaction network.
`ComPPIHumanInt`	The ComPPI human integrated interaction network.
`ConsensusPathDB`	The ConsensusPathDB interaction network.
`FunCoup`	The FunCoup functional association network.
`HIPPIE`	The HIPPIE Human scored Protein Protein Interaction network.
`HuRI`	The Human Reference Interactome.
`HuMAP`	The hu.MAP 2.0 protein interaction network.
`HumanBaseTopGlobal`	The HumanBase-global network (top edges).
`HumanNet`	The HumanNetv3 gene interaction networks.
`HumanNet_CC`	Overloaded class `HumanNet_CC` inherited from `HumanNet`.
`HumanNet_FN`	Overloaded class `HumanNet_FN` inherited from `HumanNet`.
`OmniPath`	The OmniPath intra- and inter-cellular signaling knowledge base.
`PCNet`	The PCNet (v1.3) Parsimonious Composite human gene interaction network.
`ProteomeHD`	The ProteomeHD Protein Protein Interaction network.
`SIGNOR`	The SIGnaling Network Open Resource human gene interaction network.
`STRING`	The STRING Human Protein Protein Interaction network.
`DISEASES`	The DISEASES disease gene set collection.
`DISEASES_ExperimentsFiltered`	Overloaded class `DISEASES_ExperimentsFiltered` inherited from `DISEASES`.
`DISEASES_ExperimentsFull`	Overloaded class `DISEASES_ExperimentsFull` inherited from `DISEASES`.
`DISEASES_KnowledgeFiltered`	Overloaded class `DISEASES_KnowledgeFiltered` inherited from `DISEASES`.
`DISEASES_KnowledgeFull`	Overloaded class `DISEASES_KnowledgeFull` inherited from `DISEASES`.
`DISEASES_TextminingFiltered`	Overloaded class `DISEASES_TextminingFiltered` inherited from `DISEASES`.
`DISEASES_TextminingFull`	Overloaded class `DISEASES_TextminingFull` inherited from `DISEASES`.
`DisGeNET`	The DisGeNET disease gene set collection.
`DisGeNET_Animal`	Overloaded class `DisGeNET_Animal` inherited from `DisGeNET`.
`DisGeNET_BEFREE`	Overloaded class `DisGeNET_BEFREE` inherited from `DisGeNET`.
`DisGeNET_Curated`	Overloaded class `DisGeNET_Curated` inherited from `DisGeNET`.
`DisGeNET_GWAS`	Overloaded class `DisGeNET_GWAS` inherited from `DisGeNET`.
`GOBP`	The Gene Ontology Biological Process gene set collection.
`GOCC`	The Gene Ontology Cellular Component gene set collection.
`GOMF`	The Gene Ontology Molecular Function gene set collection.
`HPO`	The HPO disease and trait gene set collection.
`DISEASESAnnotation`	DISEASES disease gene annotations from the Jensen Lab.
`DisGeNETAnnotation`	DisGeNET disease gene annotations.
`GeneOntologyAnnotation`	Gene Ontology annotations.
`HumanPhenotypeOntologyAnnotation`	The Human Phenotype Ontology gene annotations.
`GeneOntology`	The Gene Ontology.
`MondoDiseaseOntology`	The Mondo Disease Ontology.

Base data objects

class obnb.data.base.BaseData(root, *, version='latest', redownload=False, reprocess=False, retransform=False, log_level='INFO', pre_transform='default', transform=None, cache_transform=True, download_cache=True, gene_id_converter='HumanEntrez', **kwargs)[source]

BaseData object.

This is an abstract (mixin) class for constructing data objects. The main methods are _download and _process, which wrap downloading the raw files and processing them into the final processed file, and run only if those files are not yet available. Otherwise, the previously processed file is loaded directly.
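The download-then-process caching described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not obnb's implementation: the class name CachedDataSketch, the fetch callable, the raw/ and processed/ subdirectory names, and the toy upper-casing "processing" step are all hypothetical. The redownload and reprocess flags mirror the parameters of the same names documented for BaseData.

```python
import os
from typing import Callable, List


class CachedDataSketch:
    """Hypothetical download-then-process cache; not obnb's implementation."""

    def __init__(
        self,
        root: str,
        raw_files: List[str],
        processed_files: List[str],
        fetch: Callable[[str], str],
        redownload: bool = False,
        reprocess: bool = False,
    ) -> None:
        self.raw_dir = os.path.join(root, "raw")  # assumed layout
        self.processed_dir = os.path.join(root, "processed")  # assumed layout
        self.raw_files = raw_files
        self.processed_files = processed_files
        self.fetch = fetch  # returns the text content of a raw file by name
        self.n_downloads = 0  # counters make the caching behavior observable
        self.n_processes = 0
        self._download(redownload)
        self._process(reprocess)
        self.data = self.load_processed_data()

    def download_completed(self) -> bool:
        return all(os.path.isfile(os.path.join(self.raw_dir, f)) for f in self.raw_files)

    def process_completed(self) -> bool:
        return all(
            os.path.isfile(os.path.join(self.processed_dir, f)) for f in self.processed_files
        )

    def _download(self, redownload: bool) -> None:
        # Fetch raw files only when forced or when any of them is missing.
        if redownload or not self.download_completed():
            os.makedirs(self.raw_dir, exist_ok=True)
            for name in self.raw_files:
                with open(os.path.join(self.raw_dir, name), "w") as f:
                    f.write(self.fetch(name))
            self.n_downloads += 1

    def _process(self, reprocess: bool) -> None:
        # Rebuild processed files only when forced or when any of them is missing.
        if reprocess or not self.process_completed():
            os.makedirs(self.processed_dir, exist_ok=True)
            text = ""
            for name in self.raw_files:  # toy processing step: upper-case the raw text
                with open(os.path.join(self.raw_dir, name)) as f:
                    text += f.read().upper()
            for name in self.processed_files:
                with open(os.path.join(self.processed_dir, name), "w") as f:
                    f.write(text)
            self.n_processes += 1

    def load_processed_data(self) -> str:
        with open(os.path.join(self.processed_dir, self.processed_files[0])) as f:
            return f.read()
```

Instantiating the sketch twice on the same root fetches and processes only once; the second object simply loads the cached processed file unless redownload or reprocess is set.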

Initialize BaseData object.

Parameters:

root (str) – Root directory of the data files.
version (str) – Name of the version of the data to use. The default setting, ‘latest’, downloads and processes the latest data from the source.
redownload (bool) – If set to True, redownload the data even if all raw files are downloaded.
reprocess (bool) – If set to True, reprocess the data even if the processed data is available.
retransform (bool) – If set to True, retransform the data even if the cached transformation is available.
pre_transform (Any) – Optional pre-transformation applied to the data object before it is saved as the final processed data object. If set to ‘default’, the default pre-transformation is used.
transform (Optional[Any]) – Optional transformation to be applied to the data object.
cache_transform (bool) – Whether or not to cache the transformed data. The cached transformed data will be saved under <data_root_directory>/processed/.cache/.
download_cache (bool) – If set to True, then check to see if <root>/.cache exists, and if not, pull the cache from the versioned archive.
gene_id_converter (Optional[Union[Mapping[str, str], str]]) – A mapping object that maps a given node ID to a new node ID of interest, or the name of a predefined MygeneInfoConverter object as a string.
log_level (Literal['CRITICAL', 'ERROR', 'WARNING', 'INFO', 'DEBUG', 'NOTSET']) –

Note

The pre_transform option is only valid when version is set to ‘latest’.
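When gene_id_converter is given as a mapping, its effect on a network can be pictured with the sketch below. This is an illustration under assumptions, not obnb's conversion code: the function name convert_edge_ids is hypothetical, and the policies of dropping edges with an unmapped endpoint, dropping self-loops created by the conversion, and keeping the maximum weight when two edges collapse onto the same gene pair are all assumed. Passing a string such as 'HumanEntrez' instead selects a predefined MygeneInfoConverter, which this sketch does not cover.

```python
from typing import Dict, List, Mapping, Tuple

Edge = Tuple[str, str, float]


def convert_edge_ids(edges: List[Edge], converter: Mapping[str, str]) -> List[Edge]:
    """Relabel both endpoints of each undirected edge using ``converter``.

    Hypothetical policies: unmapped endpoints drop the edge, self-loops created
    by the conversion are dropped, and duplicate pairs keep the maximum weight.
    """
    merged: Dict[Tuple[str, str], float] = {}
    for u, v, w in edges:
        if u not in converter or v not in converter:
            continue  # no target ID for at least one endpoint
        a, b = sorted((converter[u], converter[v]))  # canonical undirected pair
        if a == b:
            continue  # two source IDs mapped onto the same target gene
        merged[(a, b)] = max(w, merged.get((a, b), w))
    return [(a, b, w) for (a, b), w in sorted(merged.items())]
```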

apply_transform(transform)[source]

Apply a (pre-)transformation to the loaded data.

Parameters:: transform (Any) –

property cache_dir: str: Return transformed data cache directory.

property classname: str: Return data object name.

download()[source]: Download raw files.

download_archive(version)[source]

Load data from an archived version to ensure reproducibility.

Note

The downloaded data is assumed to be a zip file, which will be unzipped and saved to the root directory.

Parameters:: version (str) – Archival version.

download_completed()[source]

Check if all raw files are downloaded.

Return type:: bool

property info_dir: str: Return info file directory.

property info_log_path: str: Return path to the data processing information log file.

load_processed_data(path=None)[source]

Load processed data into the data object.

Note

Any existing data must be purged upon calling this function. That is, the data object (self) will contain exactly the data loaded and nothing else.

Parameters:: path (str | None) –

process()[source]: Process raw files and save processed data.

process_completed()[source]

Check if all processed files are available.

Return type:: bool

property processed_dir: str: Return raw file directory.

processed_file_path(idx)[source]

Return path to a processed file given its index.

Return type:: str

property processed_files: List[str]: Return a list of processed file names.

property raw_dir: str: Return raw file directory.

raw_file_path(idx)[source]

Return path to a raw file given its index.

Return type:: str
Parameters:: idx (int) –

property raw_files: List[str]: Return a list of raw file names.

save(path)[source]

Save the data object to file.

Parameters:: path – Path to the data file to save.

to_config()[source]

Generate a configuration dictionary from the data object.

Note

If a parameter of the data object is a dictionary, its values cannot themselves be dictionaries. The only current exception is pre_transform.
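The restriction in the note above can be expressed as a small check. The helper check_config is hypothetical and not part of obnb; it only encodes the stated rule that a dictionary-valued parameter may not contain nested dictionaries, with pre_transform exempt.

```python
from typing import Any, Dict


def check_config(config: Dict[str, Any], exempt: str = "pre_transform") -> None:
    """Raise ValueError if a dict-valued parameter holds a nested dict.

    Hypothetical helper that mirrors the documented to_config rule.
    """
    for name, value in config.items():
        if name == exempt or not isinstance(value, dict):
            continue  # scalars, lists, and the exempt key are not checked
        for key, inner in value.items():
            if isinstance(inner, dict):
                raise ValueError(f"{name}[{key!r}] is a nested dictionary")
```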

Return type:: Dict[str, Any]

class obnb.data.network.base.BaseNDExData(root, weighted, directed, largest_comp=False, gene_id_converter='HumanEntrez', cx_kwargs=None, **kwargs)[source]

The BaseNDExData object for retrieving networks from NDEx.

www.ndexbio.org

Initialize the BaseNDExData object.

Parameters:

root (str) – The root directory of the data.
weighted (bool) – Whether the network is weighted or not.
directed (bool) – Whether the network is directed or not.
largest_comp (bool) – If set to True, then only take the largest connected component of the graph.
cx_kwargs (Optional[Dict[str, Any]]) – Keyword arguments used for reading the cx file.
gene_id_converter (Mapping[str, str] | str | None) –
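The largest_comp option keeps only the largest connected component of the graph. A minimal breadth-first-search sketch of that idea is shown below; it is not obnb's graph code, the function names are hypothetical, edges are treated as undirected (for a directed network this corresponds to the largest weakly connected component), and ties between equally large components go to the first one found.

```python
from collections import defaultdict, deque
from typing import DefaultDict, List, Set, Tuple

Edge = Tuple[str, str]


def largest_component(edges: List[Edge]) -> Set[str]:
    """Node set of the largest connected component, treating edges as undirected."""
    adj: DefaultDict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen: Set[str] = set()
    best: Set[str] = set()
    for start in list(adj):
        if start in seen:
            continue
        comp = {start}  # BFS from an unvisited node collects one component
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        if len(comp) > len(best):  # strict: the first of equal-size components wins
            best = comp
    return best


def keep_largest_component(edges: List[Edge]) -> List[Edge]:
    """Drop every edge outside the largest component (both endpoints share a component)."""
    nodes = largest_component(edges)
    return [(u, v) for u, v in edges if u in nodes]
```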

download()[source]: Download data from NDEX via ndex2 client.

load_processed_data(path=None)[source]

Load processed network.

Parameters:: path (str | None) –

process()[source]: Process data and save for later usage.

property processed_files: List[str]: Return a list of processed file names.

property raw_files: List[str]: Return a list of raw file names.

class obnb.data.annotated_ontology.base.BaseAnnotatedOntologyData(root, *, annotation_factory, ontology_factory, annotation_kwargs={}, ontology_kwargs={}, **kwargs)[source]

General object for constructing a labelset collection from an annotated ontology.

Initialize the BaseAnnotatedOntologyData object.

Parameters:

root (str) –
annotation_factory (Type[BaseAnnotationData]) –
ontology_factory (Type[BaseOntologyData]) –
annotation_kwargs (Dict[str, Any]) –
ontology_kwargs (Dict[str, Any]) –

apply_transform(transform)[source]

Apply a (pre-)transformation to the loaded data.

Parameters:: transform (Any) –

download()[source]: Download raw files.

download_completed()[source]

Check if all raw files are downloaded.

Return type:: bool

load_processed_data(path=None)[source]

Load processed labels from GMT.

Parameters:: path (str | None) –

process()[source]: Process raw data and save as gmt for future usage.

property processed_files: List[str]: Return a list of processed file names.

save(path)[source]: Save the labelset collection as gmt.

Data objects

Interface with various databases to retrieve data.

class obnb.data.BioGRID(root, *, weighted=False, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The BioGRID Protein Protein Interaction network.

Initialize the BioGRID network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.BioPlex(root, *, weighted=False, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The BioPlex3-shared Protein Protein Interaction network.

Initialize the BioPlex network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.ComPPIHumanInt(root, weighted=True, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The ComPPI human integrated interaction network.

Initialize the ComPPI object.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.ConsensusPathDB(root, weighted=True, directed=False, largest_comp=True, gene_id_converter=None, fill_value='max', **kwargs)[source]

The ConsensusPathDB interaction network.

The ConsensusPathDB integrates gene interaction evidence from many databases:
- BIND
- BioCarta
- Biogrid
- CORUM
- DIP
- HPRD
- HumanCyc
- INOH
- InnateDB
- IntAct
- MINT
- MIPS-MPPI
- Manual upload
- MatrixDB
- NetPath
- PDB
- PDZBase
- PID
- PINdb
- PhosphoPOINT
- Reactome
- Spike

These sources cover a wide range of interaction types:
- Protein interactions
- Signaling reactions
- Metabolic reactions
- Gene regulations
- Genetic interactions
- Drug-target interactions
- Biochemical pathways

See the ConsensusPathDB webpage for more information about the specific types of interactions provided by each source database.

[Last updated: 2023-02-13]

Initialize the ConsensusPathDB object.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –
fill_value (Literal['mean', 'max']) –
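The fill_value parameter is not described above. One plausible reading, assumed here and not confirmed by the source, is that it sets how missing edge confidence scores are filled in: with the maximum or the mean of the observed scores. The hypothetical fill_missing_weights below sketches that reading only.

```python
from statistics import mean
from typing import List, Optional, Tuple

Edge = Tuple[str, str, Optional[float]]


def fill_missing_weights(edges: List[Edge], fill_value: str = "max") -> List[Tuple[str, str, float]]:
    """Replace missing (None) weights with the max or mean of observed weights.

    Hypothetical interpretation of ConsensusPathDB's fill_value option.
    """
    if fill_value not in ("mean", "max"):
        raise ValueError(f"fill_value must be 'mean' or 'max', got {fill_value!r}")
    observed = [w for _, _, w in edges if w is not None]
    if not observed:
        raise ValueError("no observed weights to derive a fill value from")
    fill = max(observed) if fill_value == "max" else mean(observed)
    return [(u, v, fill if w is None else w) for u, v, w in edges]
```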

property raw_files: List[str]: Return a list of raw file names.

class obnb.data.DISEASES(root, score_min=3, score_max=None, channel='integrated_full', min_size=10, max_size=600, overlap=0.7, jaccard=0.5, gene_id_converter='HumanEntrez', **kwargs)[source]

The DISEASES disease gene set collection.

Initialize the DISEASES data object.

Parameters:

root (str) –
score_min (float | None) –
score_max (float | None) –
channel (str) –
min_size (int) –
max_size (int) –
overlap (float) –
jaccard (float) –
gene_id_converter (Mapping[str, str] | str | None) –
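The min_size, max_size, overlap, and jaccard parameters suggest a size filter followed by a redundancy filter on the resulting gene sets. The sketch below is one hypothetical way to combine them and is not obnb's filtering code: the visiting order (larger sets first) and the rule that a set is redundant only when both its overlap coefficient and its Jaccard index with an already-kept set exceed the thresholds are assumptions.

```python
from typing import Dict, Set


def filter_labelsets(
    labelsets: Dict[str, Set[str]],
    min_size: int = 10,
    max_size: int = 600,
    overlap: float = 0.7,
    jaccard: float = 0.5,
) -> Dict[str, Set[str]]:
    """Hypothetical size + redundancy filter; the combination rule is an assumption."""
    # Step 1: keep label sets whose size lies within [min_size, max_size].
    sized = {n: g for n, g in labelsets.items() if min_size <= len(g) <= max_size}
    kept: Dict[str, Set[str]] = {}
    # Step 2: visit larger sets first (ties broken by name) and drop any set
    # that is too similar to one that was already kept.
    for name in sorted(sized, key=lambda n: (-len(sized[n]), n)):
        genes = sized[name]
        redundant = False
        for other in kept.values():
            inter = len(genes & other)
            if inter == 0:
                continue  # disjoint sets are never redundant (also avoids 0/0)
            overlap_coef = inter / min(len(genes), len(other))
            jaccard_idx = inter / len(genes | other)
            if overlap_coef > overlap and jaccard_idx > jaccard:
                redundant = True
                break
        if not redundant:
            kept[name] = genes
    return kept
```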

class obnb.data.DISEASESAnnotation(root, *, score_min=3, score_max=None, channel='integrated_full', **kwargs)[source]

DISEASES disease gene annotations from the Jensen Lab.

Disease gene associations are retrieved from diseases.jensenlab.org.

By default, this uses the integrated disease annotation channel from the Jensen Lab DISEASES annotation database, which combines evidence from the text-mining, knowledge, and experiments channels. See the DISEASES webpage for more information.

Initialize the DISEASES annotation data object.

Parameters:

root (str) –
score_min (float | None) –
score_max (float | None) –
channel (str) –

get_column_names()[source]

Get annotation table column names for the selected channel.

All channels start with four columns: gene_id, gene_name, term_id, and term_name. The extended columns are specific to each channel type:

integrated contains an additional confidence score column.
textmining contains the z-score, the confidence score, and the URL to the abstracts view.
knowledge contains the source database, evidence type, and the confidence score.
experiments contains the source database, source score, and the confidence score.

See the download page for more information.

Note

The confidence scores (score) are normalized across the DISEASES database. In brief, they follow a “5-star” system, where the disease-gene associations with the highest confidence receive the full score of five. The confidence scores from the textmining channel are computed as half of the z-score and are capped at four. In contrast, the confidence scores from the experiments channel are calibrated using a gold-standard benchmarking scheme, where the gold standards are derived from curated annotations, i.e., the knowledge channel. All annotations from the knowledge channel are scored 4-5 stars.

Return type:: List[str]
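The textmining scoring rule in the note above, together with the score_min and score_max bounds on DISEASESAnnotation, can be sketched as follows. Both functions are hypothetical illustrations, not obnb code; treating the bounds as inclusive and representing rows as (gene, disease, score) tuples are assumptions.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, float]  # (gene, disease, confidence score)


def textmining_confidence(z_score: float) -> float:
    """Half the z-score, capped at four stars, as described in the note."""
    return min(z_score / 2, 4.0)


def filter_by_score(
    rows: List[Row], score_min: Optional[float] = 3, score_max: Optional[float] = None
) -> List[Row]:
    """Keep rows with score_min <= score <= score_max; None disables a bound."""
    return [
        r
        for r in rows
        if (score_min is None or r[2] >= score_min) and (score_max is None or r[2] <= score_max)
    ]
```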

load_processed_data()[source]: Load processed data into the data object.

Note

Any existing data must be purged upon calling this function. That is, the data object (self) will contain exactly the data loaded and nothing else.

class obnb.data.DISEASES_ExperimentsFiltered(*args, **kwargs)

Overloaded class DISEASES_ExperimentsFiltered inherited from DISEASES.

kwargs: {‘channel’: ‘experiments_filtered’, ‘score_min’: None}

Initialize the DISEASES data object.

class obnb.data.DISEASES_ExperimentsFull(*args, **kwargs)

Overloaded class DISEASES_ExperimentsFull inherited from DISEASES.

kwargs: {‘channel’: ‘experiments_full’, ‘score_min’: None}

Initialize the DISEASES data object.

class obnb.data.DISEASES_KnowledgeFiltered(*args, **kwargs)

Overloaded class DISEASES_KnowledgeFiltered inherited from DISEASES.

kwargs: {‘channel’: ‘knowledge_filtered’, ‘score_min’: None}

Initialize the DISEASES data object.

class obnb.data.DISEASES_KnowledgeFull(*args, **kwargs)

Overloaded class DISEASES_KnowledgeFull inherited from DISEASES.

kwargs: {‘channel’: ‘knowledge_full’, ‘score_min’: None}

Initialize the DISEASES data object.

class obnb.data.DISEASES_TextminingFiltered(*args, **kwargs)

Overloaded class DISEASES_TextminingFiltered inherited from DISEASES.

kwargs: {‘channel’: ‘textmining_filtered’, ‘score_min’: None}

Initialize the DISEASES data object.

class obnb.data.DISEASES_TextminingFull(*args, **kwargs)

Overloaded class DISEASES_TextminingFull inherited from DISEASES.

kwargs: {‘channel’: ‘textmining_full’, ‘score_min’: None}

Initialize the DISEASES data object.

class obnb.data.DisGeNET(root, dsi_min=None, dsi_max=None, dpi_min=None, dpi_max=None, min_size=10, max_size=600, overlap=0.7, jaccard=0.5, data_sources=None, gene_id_converter=None, **kwargs)[source]

The DisGeNET disease gene set collection.

Initialize the DisGeNET data object.

Parameters:

root (str) –
dsi_min (float | None) –
dsi_max (float | None) –
dpi_min (float | None) –
dpi_max (float | None) –
min_size (int) –
max_size (int) –
overlap (float) –
jaccard (float) –
data_sources (List[str] | None) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.DisGeNETAnnotation(root, *, data_sources=None, dsi_min=None, dsi_max=None, dpi_min=None, dpi_max=None, **kwargs)[source]

DisGeNET disease gene annotations.

Disease gene associations are retrieved from disgenet.org.

There are four categories of annotation sources in DisGeNET (see below). By default, we only use the Curated and Inferred data sources. Users can change the sources by passing a list of sources to the data_sources argument. (Note: ~70% of the disease-gene annotations in DisGeNET are only available in the literature data source.) See the DisGeNET data sources documentation page for more information.

Curated (CURATED):
- CGI Cancer Genome Interpreter
- CLINGEN Clinical Genome Resource
- CTD_human Comparative Toxicogenomics Database (Human)
- GENOMICS_ENGLAND Genomics England PanelApp
- ORPHANET Orphan drugs and rare diseases
- PSYGENET Psychiatric disorders gene association network
- CLINVAR ClinVar disease-gene information with supporting evidence
Inferred (INFERRED):
- HPO Human Phenotype Ontology
- UNIPROT UniProt/SwissProt database
- GWASCAT GWAS Catalog curated SNPs (p-val < 1e-6)
- GWASDB GWASdb (p-val < 1e-6)
Animal models (ANIMAL):
- CTD_mouse Comparative Toxicogenomics Database (Mouse)
- CTD_rat Comparative Toxicogenomics Database (Rat)
- MGD Mouse Genome Database
- RGD Rat Genome Database
Literature (LITERATURE):
- BEFREE Disease-gene association extracted from MEDLINE using BeFree
- LHGDN Literature derived human disease network

[Last updated: 2023-01-14]

Parameters:

root (str) – Root directory of the data.
data_sources (Optional[List[str]]) – List of evidence types to be considered. If not set, then use the default channels (curated and inferred evidences).
dsi_min (Optional[float]) – Minimum value of DSI below which the annotations are removed.
dsi_max (Optional[float]) – Maximum value of DSI above which the annotations are removed.
dpi_min (Optional[float]) – Minimum value of DPI below which the annotations are removed.
dpi_max (Optional[float]) – Maximum value of DPI above which the annotations are removed.

Notes

DSI and DPI stand for Disease Specificity Index and Disease Pleiotropy Index, respectively. The two metrics measure how specifically a gene is associated with a particular disease (vs. being associated with many diseases) and how pleiotropic a gene is (i.e., whether the gene contributes to a wide variety of disease types, according to MeSH disease classes). The exact definitions of DSI and DPI can be found in the DisGeNET documentation webpage.
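The dsi_min, dsi_max, dpi_min, and dpi_max parameters remove annotations whose index falls outside the given range. The sketch below illustrates that filtering on plain dictionaries; filter_gda and the record layout with 'dsi' and 'dpi' keys are hypothetical, and dropping records with a missing index whenever a bound on that index is set is an assumed policy.

```python
from typing import Any, Dict, List, Optional


def _within(value: Optional[float], lo: Optional[float], hi: Optional[float]) -> bool:
    if lo is None and hi is None:
        return True  # no bound on this index: keep regardless of value
    if value is None:
        return False  # assumed policy: missing index fails any active bound
    return (lo is None or value >= lo) and (hi is None or value <= hi)


def filter_gda(
    records: List[Dict[str, Any]],
    dsi_min: Optional[float] = None,
    dsi_max: Optional[float] = None,
    dpi_min: Optional[float] = None,
    dpi_max: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Keep gene-disease records whose DSI and DPI lie within the given bounds."""
    return [
        r
        for r in records
        if _within(r.get("dsi"), dsi_min, dsi_max) and _within(r.get("dpi"), dpi_min, dpi_max)
    ]
```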

Initialize DisGeNET annotation data object.

load_processed_data()[source]: Load processed data into the data object.

Note

Any existing data must be purged upon calling this function. That is, the data object (self) will contain exactly the data loaded and nothing else.

class obnb.data.DisGeNET_Animal(*args, **kwargs)

Overloaded class DisGeNET_Animal inherited from DisGeNET.

kwargs: {‘data_sources’: [‘CTD_mouse’, ‘CTD_rat’, ‘MGD’, ‘RGD’]}

Initialize the DisGeNET data object.

class obnb.data.DisGeNET_BEFREE(*args, **kwargs)

Overloaded class DisGeNET_BEFREE inherited from DisGeNET.

kwargs: {‘data_sources’: [‘BEFREE’]}

Initialize the DisGeNET data object.

class obnb.data.DisGeNET_Curated(*args, **kwargs)

Overloaded class DisGeNET_Curated inherited from DisGeNET.

kwargs: {‘data_sources’: [‘CGI’, ‘CLINGEN’, ‘CTD_human’, ‘GENOMICS_ENGLAND’, ‘ORPHANET’, ‘PSYGENET’, ‘CLINVAR’]}

Initialize the DisGeNET data object.

class obnb.data.DisGeNET_GWAS(*args, **kwargs)

Overloaded class DisGeNET_GWAS inherited from DisGeNET.

kwargs: {‘data_sources’: [‘GWASCAT’, ‘GWASDB’]}

Initialize the DisGeNET data object.

class obnb.data.FunCoup(root, *, weighted=True, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The FunCoup functional association network.

The edge weights are PFC values, which are probabilistic estimates of whether a pair of genes is functionally coupled.

https://funcoup5.scilifelab.se/help/#Citation

Initialize the FunCoup network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.GOBP(*args, **kwargs)

The Gene Ontology Biological Process gene set collection.

Initialize the GO data object.

class obnb.data.GOCC(*args, **kwargs)

The Gene Ontology Cellular Component gene set collection.

Initialize the GO data object.

class obnb.data.GOMF(*args, **kwargs)

The Gene Ontology Molecular Function gene set collection.

Initialize the GO data object.

class obnb.data.GeneOntology(root, **kwargs)[source]

The Gene Ontology.

http://geneontology.org/

Initialize GeneOntology data object.

class obnb.data.GeneOntologyAnnotation(root, *, data_sources=None, **kwargs)[source]

Gene Ontology annotations.

Gene ontology annotations are retrieved from geneontology.org.

There are seven categories of gene annotation evidence in the Gene Ontology. By default, we only use the Experimental evidence, Author statement, and Curator statement types to ensure the quality of the annotations. See the gene ontology evidence codes documentation page for more information.

Experimental evidences (EXPERIMENTAL):
- EXP Experiment
- IDA Direct assay
- IPI Physical interaction
- IMP Mutant phenotype
- IGI Genetic interaction
- IEP Expression pattern
Phylogenetically-inferred (PHYLOGENIC):
- IBA Biological aspect of ancestor
- IBD Biological aspect of descendant
- IKR Key residues
- IRD Rapid divergence
Computational analysis (COMPUTATIONAL):
- ISS Sequence or structural similarity
- ISO Sequence orthology
- ISA Sequence alignment
- ISM Sequence model
- IGC Genomic context
- RCA Reviewed computational analysis
Author statements (AUTHOR):
- TAS Traceable author statement
- NAS Non-traceable author statement
Curator statements (CURATOR):
- IC Inferred by curator
- ND No biological data available
Electronic annotation evidences (ELECTRONIC):
- IEA Electronic annotation

[Last updated: 2023-03-10]
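The category-based selection described above can be sketched by transcribing the evidence code list into a dictionary and filtering annotation rows by the codes of the chosen categories. EVIDENCE_CATEGORIES, allowed_codes, and filter_annotations are hypothetical names, and the assumption that data_sources takes the uppercase category keys shown in parentheses is not confirmed by the source.

```python
from typing import Dict, List, Optional, Set, Tuple

# Evidence codes grouped by category, transcribed from the list above.
EVIDENCE_CATEGORIES: Dict[str, List[str]] = {
    "EXPERIMENTAL": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"],
    "PHYLOGENIC": ["IBA", "IBD", "IKR", "IRD"],
    "COMPUTATIONAL": ["ISS", "ISO", "ISA", "ISM", "IGC", "RCA"],
    "AUTHOR": ["TAS", "NAS"],
    "CURATOR": ["IC", "ND"],
    "ELECTRONIC": ["IEA"],
}
DEFAULT_SOURCES = ["EXPERIMENTAL", "AUTHOR", "CURATOR"]

Row = Tuple[str, str, str]  # (gene, term, evidence code)


def allowed_codes(data_sources: Optional[List[str]] = None) -> Set[str]:
    """Union of evidence codes for the selected categories (defaults if None)."""
    sources = DEFAULT_SOURCES if data_sources is None else data_sources
    return {code for src in sources for code in EVIDENCE_CATEGORIES[src]}


def filter_annotations(rows: List[Row], data_sources: Optional[List[str]] = None) -> List[Row]:
    """Keep annotation rows whose evidence code belongs to the selected categories."""
    codes = allowed_codes(data_sources)
    return [r for r in rows if r[2] in codes]
```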

Parameters:

root (str) – Root directory of the data.
data_sources (Optional[List[str]]) – List of evidence types to be considered. If not set, then use the default channels (experimental evidence, author and curator statements).

Initialize GeneOntology annotation data object.

load_processed_data()[source]: Load processed data into the data object.

Note

Any existing data must be purged upon calling this function. That is, the data object (self) will contain exactly the data loaded and nothing else.

class obnb.data.HIPPIE(root, *, weighted=True, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The HIPPIE Human scored Protein Protein Interaction network.

Note: the inferred PPI directionality is disregarded, i.e., the resulting network is undirected.

Initialize the HIPPIE network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.HPO(root, min_size=10, max_size=600, overlap=0.7, jaccard=0.5, data_sources=None, gene_id_converter=None, **kwargs)[source]

The HPO disease and trait gene set collection.

Initialize the HPO data object.

Parameters:

root (str) –
min_size (int) –
max_size (int) –
overlap (float) –
jaccard (float) –
data_sources (List[str] | None) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.HuMAP(root, *, weighted=True, directed=False, largest_comp=True, gene_id_converter=None, **kwargs)[source]

The hu.MAP 2.0 protein interaction network.

Initialize the hu.MAP network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.HuRI(root, *, weighted=False, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The Human Reference Interactome.

Initialize the HuRI network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.HumanBaseTopGlobal(root, *, weighted=True, directed=False, largest_comp=True, gene_id_converter=None, **kwargs)[source]

The HumanBase-global network (top edges).

Initialize the HumanBase-global network with top edges.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.HumanNet(root, *, channel='XC', weighted=True, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The HumanNetv3 gene interaction networks.

https://staging2.inetbio.org/humannetv3/

The HumanNetv3 gene interaction networks are constructed using various types of gene association evidence:

Integrated networks:

FN: Functional gene interaction network (CX + DB + DP + GI + GN + PG + PI).

XC: Full gene interaction network extended by co-citation (FN + CC).

Individual networks:

CC: Gene association inferred from co-citation.

CX: Gene association inferred from co-expression.

DB: Pathway database.

DP: Protein domain profile associations.

GI: Genetic interactions.

GN: Gene neighborhood.

PG: Phylogenetic profile associations.

PI: Protein-protein interactions.
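The channel composition listed above can be written down as a small lookup that expands an integrated channel into the individual evidence types it covers. This only transcribes the list; evidence_channels is a hypothetical helper, not part of obnb, and nothing here implies that obnb assembles integrated networks from the individual files rather than downloading them directly.

```python
from typing import Dict, List, Set

INDIVIDUAL: List[str] = ["CC", "CX", "DB", "DP", "GI", "GN", "PG", "PI"]
INTEGRATED: Dict[str, List[str]] = {
    "FN": ["CX", "DB", "DP", "GI", "GN", "PG", "PI"],
    "XC": ["FN", "CC"],  # XC extends FN with co-citation
}


def evidence_channels(channel: str) -> Set[str]:
    """Expand a HumanNet channel code into the individual evidence types it covers."""
    if channel in INDIVIDUAL:
        return {channel}
    if channel in INTEGRATED:
        # Recursively expand nested integrated channels (e.g. XC -> FN -> ...).
        return set().union(*(evidence_channels(c) for c in INTEGRATED[channel]))
    raise ValueError(f"unknown HumanNet channel: {channel!r}")
```

For example, HumanNet_FN fixes channel='FN', which expands to the seven non-co-citation evidence types, while the default 'XC' covers all eight.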

Initialize the HumanNet network data.

Parameters:

root (str) –
channel (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

property processed_files: List[str]: Return a list of processed file names.

property raw_files: List[str]: Return a list of raw file names.

class obnb.data.HumanNet_CC(*args, **kwargs)

Overloaded class HumanNet_CC inherited from HumanNet.

kwargs: {‘channel’: ‘CC’}

Initialize the HumanNet network data.

class obnb.data.HumanNet_FN(*args, **kwargs)

Overloaded class HumanNet_FN inherited from HumanNet.

kwargs: {‘channel’: ‘FN’}

Initialize the HumanNet network data.

class obnb.data.HumanPhenotypeOntologyAnnotation(root, *, data_sources=None, **kwargs)[source]

The Human Phenotype Ontology gene annotations.

Annotations are retrieved from https://hpo.jax.org/app/

The disease/trait annotations mainly come from two sources, namely, OMIM and Orphanet. By default, both sources are used. For more information, please refer to the HPO annotation download page.

[Last updated: 2023-06-08]

Parameters:

root (str) – Root directory of the data.
data_sources (Optional[List[str]]) – List of evidence types to be considered. If not set, then use the default channels (“OMIM” and “ORPHA”).

Initialize the HumanPhenotypeOntologyAnnotation data object.

load_processed_data()[source]: Load processed data into the data object.

Note

Any existing data must be purged upon calling this function. That is, the data object (self) will contain exactly the data loaded and nothing else.

class obnb.data.MondoDiseaseOntology(root, xref_prefix=None, **kwargs)[source]

The Mondo Disease Ontology.

https://mondo.monarchinitiative.org/

Initialize MondoDiseaseOntology data object.

class obnb.data.OmniPath(root, weighted=False, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The OmniPath intra- and inter-cellular signaling knowledge base.

https://omnipathdb.org/

OmniPath organizes its interactions into the following datasets:

dorothea Interactions obtained from the DoRothEA database, a comprehensive resource of TF-promoter interactions curated from over 18 sources. Only interactions with confidence levels A-D are included in the OmniPath database.
kinaseextra Additional kinase-substrate interactions from prior knowledge.
ligrecextra Ligand-receptor interactions from prior knowledge.
lncrna_mrna Interactions between long non-coding RNAs and mRNAs, curated from three literature sources.
mirnatarget MicroRNA target interactions.
omnipath Interaction information from literature curation, high-throughput experiments, and prior knowledge.
pathwayextra Pathway information from prior knowledge.
small_molecule Small molecule-protein interactions.
tf_mirna Transcription factor-microRNA interactions curated from two literature sources.
tf_target Transcription factor targets curated from six literature sources.
tfregulons Transcription factor regulon interactions.

Note

Prior knowledge means annotations made by the authors without any literature references.

Initialize the OmniPath object.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

property raw_files: List[str]: Return a list of raw file names.

class obnb.data.PCNet(root, *, weighted=False, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The PCNet (v1.3) Parsimonious Composite human gene interaction network.

Initialize the PCNet network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.ProteomeHD(root, *, weighted=False, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The ProteomeHD Protein Protein Interaction network.

Initialize the ProteomeHD network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.SIGNOR(root, *, weighted=False, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The SIGnaling Network Open Resource human gene interaction network.

Initialize the SIGNOR network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

class obnb.data.STRING(root, *, weighted=True, directed=False, largest_comp=True, gene_id_converter='HumanEntrez', **kwargs)[source]

The STRING Human Protein Protein Interaction network.

Initialize the STRING network data.

Parameters:

root (str) –
weighted (bool) –
directed (bool) –
largest_comp (bool) –
gene_id_converter (Mapping[str, str] | str | None) –

Experimental features

Experimental data objects.

These are some of the interesting datasets that might be considered for inclusion in the main data objects.

Warning

Since these are not stable objects, they are subject to change in future versions of the package.

class obnb.data.experimental.AlevinFry(root, dataset_id, quiet=False, delete_tar=False, **kwargs)[source]

The AlevinFry scRNA-seq datasets.

https://github.com/COMBINE-lab/alevin-fry

Initialize the AlevinFry data object.

Parameters:

root (str) – The root directory of the data.
dataset_id (int) – The ID of the Alevin-Fry dataset (see more at https://github.com/COMBINE-lab/pyroe).
quiet (bool) – If set to True, do not print any information to the screen about data downloading and processing.
delete_tar (bool) – If set to True, delete the tarball after the data has been extracted.

download()[source]: Download raw files.

download_completed()[source]

Check if all raw files are downloaded.

Return type:: bool

load_processed_data(path=None)[source]

Load processed data into the data object.

Note

Any existing data must be purged upon calling this function. That is, the data object (self) will contain exactly the data loaded and nothing else.

Parameters:: path (str | None) –

process_completed()[source]

Check if all processed files are available.

Return type:: bool