Fetch Module API Reference

Complete reference for the biodbs.fetch module.

Summary

Fetcher Classes

Class	Description
`UniProt_Fetcher`	Fetch protein data from UniProt REST API
`PubChem_Fetcher`	Fetch chemical data from PubChem PUG REST/View APIs
`Ensembl_Fetcher`	Fetch genomic data from Ensembl REST API
`BioMart_Fetcher`	Query Ensembl BioMart for gene annotations
`KEGG_Fetcher`	Fetch pathway and gene data from KEGG API
`ChEMBL_Fetcher`	Fetch bioactivity data from ChEMBL API
`QuickGO_Fetcher`	Fetch GO annotations from QuickGO API
`HPA_Fetcher`	Fetch protein expression from Human Protein Atlas
`NCBI_Fetcher`	Fetch gene data from NCBI Entrez
`FDA_Fetcher`	Fetch drug/device data from openFDA
`Reactome_Fetcher`	Fetch pathway data from Reactome
`DO_Fetcher`	Fetch disease terms from Disease Ontology
`EnrichR_Fetcher`	Perform gene set enrichment via EnrichR
`HGNC_Fetcher`	Fetch gene nomenclature from HGNC
`ClinVar_Fetcher`	Fetch clinical variant data from ClinVar

UniProt Functions

Function	Description
`uniprot_get_entry`	Get a single UniProt entry by accession
`uniprot_search`	Search UniProtKB with query
`uniprot_search_by_gene`	Search by gene name
`gene_to_uniprot`	Map gene symbols to UniProt accessions
`uniprot_map_ids`	Map IDs between databases

PubChem Functions

Function	Description
`pubchem_get_compound`	Get compound record by CID
`pubchem_search_by_name`	Search compounds by name
`pubchem_get_properties`	Get compound properties

Ensembl Functions

Function	Description
`ensembl_lookup`	Lookup entity by Ensembl ID
`ensembl_lookup_symbol`	Lookup by gene symbol
`ensembl_get_sequence`	Get nucleotide/protein sequence
`ensembl_get_xrefs`	Get cross-references

BioMart Functions

Function	Description
`biomart_get_genes`	Get gene annotations by Ensembl IDs
`biomart_convert_ids`	Convert between gene ID types
`biomart_query`	Custom BioMart query

KEGG Functions

Function	Description
`kegg_list`	List entries in a KEGG database
`kegg_get`	Get KEGG entry by ID
`kegg_link`	Get cross-references between databases
`kegg_conv`	Convert between KEGG and external IDs

ChEMBL Functions

Function	Description
`chembl_get_molecule`	Get molecule by ChEMBL ID
`chembl_search_molecules`	Search molecules by name
`chembl_get_approved_drugs`	Get approved drugs list

QuickGO Functions

Function	Description
`quickgo_search_annotations`	Search GO annotations
`quickgo_get_terms`	Get GO term details

HPA Functions

Function	Description
`hpa_get_gene`	Get gene expression data
`hpa_get_tissue_expression`	Get tissue-level expression

NCBI Functions

Function	Description
`ncbi_get_gene`	Get gene info by Entrez ID
`ncbi_symbol_to_id`	Convert gene symbol to Entrez ID

FDA Functions

Function	Description
`fda_search`	Search openFDA endpoints
`fda_drug_events`	Search drug adverse events

Reactome Functions

Function	Description
`reactome_analyze`	Analyze gene list against Reactome

Disease Ontology Functions

Function	Description
`do_get_term`	Get disease term by DOID
`do_get_children`	Get child terms

EnrichR Functions

Function	Description
`enrichr_enrich`	Perform enrichment analysis
`enrichr_get_libraries`	List available gene set libraries

HGNC Functions

Function	Description
`hgnc_fetch`	Exact-match lookup by any HGNC field
`hgnc_search`	Wildcard / boolean search across HGNC
`hgnc_fetch_by_symbol`	Fetch gene by approved symbol
`hgnc_fetch_by_hgnc_id`	Fetch gene by HGNC ID
`hgnc_fetch_by_entrez_id`	Fetch gene by Entrez Gene ID
`hgnc_fetch_by_ensembl_id`	Fetch gene by Ensembl gene ID
`hgnc_fetch_by_uniprot_id`	Fetch gene by UniProt accession
`hgnc_fetch_by_refseq`	Fetch gene by RefSeq accession
`hgnc_search_symbol`	Wildcard search on gene symbols
`hgnc_info`	Return HGNC service metadata

ClinVar Functions

Function	Description
`clinvar_search`	Search ClinVar with an Entrez query string
`clinvar_count`	Count ClinVar records matching a query
`clinvar_fetch_by_id`	Fetch variant summaries by variation UID
`clinvar_search_gene`	Search and fetch variants for a gene
`clinvar_search_condition`	Search and fetch variants for a condition
`clinvar_fetch_vcv`	Fetch full VCV XML record
`clinvar_fetch_rcv`	Fetch full RCV XML record
`clinvar_link_pubmed`	Get PubMed IDs linked to a variation

Fetcher Classes

UniProt_Fetcher

UniProt_Fetcher()

Fetcher for UniProt REST API.

Provides access to UniProtKB protein data including:

Entry retrieval by accession
Search by query
ID mapping between databases
Batch retrieval

Example

from biodbs.fetch import UniProt_Fetcher

fetcher = UniProt_Fetcher()

# Get protein by accession
entry = fetcher.get_entry("P05067")  # APP protein
print(entry.entries[0].protein_name)

# Search for proteins
results = fetcher.search("gene:TP53 AND organism_id:9606")
print(results.as_dataframe())

# Get multiple entries
entries = fetcher.get_entries(["P05067", "P04637", "P00533"])

# Map IDs
mapping = fetcher.map_ids(
    ["P05067", "P04637"],
    from_db="UniProtKB_AC-ID",
    to_db="GeneID"
)

Initialize UniProt fetcher.

get_entry

get_entry(
    accession: str, fields: Optional[str] = None
) -> UniProtFetchedData

Get a UniProt entry by accession.

Parameters:

Name	Type	Description	Default
`accession`	`str`	UniProt accession (e.g., "P05067").	required
`fields`	`Optional[str]`	Comma-separated list of fields to return.	`None`

Returns:

Type	Description
`UniProtFetchedData`	UniProtFetchedData with the entry.

Example

fetcher = UniProt_Fetcher()
entry = fetcher.get_entry("P05067")
print(entry.entries[0].protein_name)

get_entries

get_entries(
    accessions: List[str], fields: Optional[str] = None
) -> UniProtFetchedData

Get multiple UniProt entries by accessions.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of UniProt accessions.	required
`fields`	`Optional[str]`	Comma-separated list of fields to return.	`None`

Returns:

Type	Description
`UniProtFetchedData`	UniProtFetchedData with all entries.

Example

fetcher = UniProt_Fetcher()
entries = fetcher.get_entries(["P05067", "P04637", "P00533"])
print(entries.get_gene_names())

search

search(
    query: str,
    fields: Optional[str] = None,
    sort: Optional[str] = None,
    size: int = 25,
    include_isoform: bool = False,
    cursor: Optional[str] = None,
) -> UniProtSearchResult

Search UniProtKB.

Parameters:

Name	Type	Description	Default
`query`	`str`	Search query (e.g., "gene:TP53 AND organism_id:9606").	required
`fields`	`Optional[str]`	Comma-separated list of fields to return.	`None`
`sort`	`Optional[str]`	Sort field and direction (e.g., "accession desc").	`None`
`size`	`int`	Number of results per page (max 500).	`25`
`include_isoform`	`bool`	Include isoforms in results.	`False`
`cursor`	`Optional[str]`	Cursor for pagination.	`None`

Returns:

Type	Description
`UniProtSearchResult`	UniProtSearchResult with matching entries.

Example

fetcher = UniProt_Fetcher()
results = fetcher.search("gene:BRCA1 AND reviewed:true")
print(results.as_dataframe())

search_all

search_all(
    query: str,
    fields: Optional[str] = None,
    sort: Optional[str] = None,
    max_results: int = 10000,
    include_isoform: bool = False,
) -> UniProtFetchedData

Search and retrieve all results with pagination.

Parameters:

Name	Type	Description	Default
`query`	`str`	Search query.	required
`fields`	`Optional[str]`	Fields to return.	`None`
`sort`	`Optional[str]`	Sort field and direction.	`None`
`max_results`	`int`	Maximum results to retrieve.	`10000`
`include_isoform`	`bool`	Include isoforms.	`False`

Returns:

Type	Description
`UniProtFetchedData`	UniProtFetchedData with all matching entries.
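
The reference does not spell out how search_all pages through results. As a rough sketch of cursor-following with a max_results cap, the code below uses a stand-in page function instead of a network call. The names collect_all, fetch_page, fake_fetch_page and the (entries, next_cursor) page shape are assumptions for illustration, not part of biodbs.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page shape: (entries, next_cursor). next_cursor is None on the last page.
Page = Tuple[List[dict], Optional[str]]


def collect_all(
    fetch_page: Callable[[Optional[str]], Page],
    max_results: int = 10000,
) -> List[dict]:
    """Follow cursors until the pages run out or max_results is reached."""
    collected: List[dict] = []
    cursor: Optional[str] = None
    while len(collected) < max_results:
        entries, cursor = fetch_page(cursor)
        collected.extend(entries)
        # Stop when the server reports no further page or returns nothing.
        if cursor is None or not entries:
            break
    # The last page may overshoot the cap, so trim to max_results.
    return collected[:max_results]


# Stand-in for a paged search: three pages of two made-up entries each.
_FAKE_PAGES = {
    None: ([{"accession": "A1"}, {"accession": "A2"}], "c1"),
    "c1": ([{"accession": "A3"}, {"accession": "A4"}], "c2"),
    "c2": ([{"accession": "A5"}, {"accession": "A6"}], None),
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]
```

Calling collect_all(fake_fetch_page, max_results=5) walks all three fake pages and trims the combined list to five entries.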

search_by_gene

search_by_gene(
    gene_name: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult

Search by gene name.

Parameters:

Name	Type	Description	Default
`gene_name`	`str`	Gene name to search.	required
`organism`	`Optional[Union[int, str]]`	Organism tax ID or name.	`None`
`reviewed_only`	`bool`	Only return reviewed entries.	`False`
`size`	`int`	Results per page.	`25`

Returns:

Type	Description
`UniProtSearchResult`	UniProtSearchResult with matching entries.

Example

fetcher = UniProt_Fetcher()
results = fetcher.search_by_gene("TP53", organism=9606, reviewed_only=True)
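
The convenience searches presumably reduce to a UniProtKB query string of the kind accepted by search. One way such a string might be composed is sketched below. build_gene_query is a hypothetical helper, and the use of organism_name for non-numeric organisms is an assumption about how a name would be handled.

```python
from typing import Optional, Union


def build_gene_query(
    gene_name: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
) -> str:
    """Compose a UniProtKB query from the search_by_gene-style arguments."""
    clauses = [f"gene:{gene_name}"]
    if organism is not None:
        if str(organism).isdigit():
            # Numeric values are treated as taxonomy IDs.
            clauses.append(f"organism_id:{organism}")
        else:
            # Assumption: free-text organisms go through organism_name.
            clauses.append(f'organism_name:"{organism}"')
    if reviewed_only:
        clauses.append("reviewed:true")
    return " AND ".join(clauses)


query = build_gene_query("TP53", organism=9606, reviewed_only=True)
```

The resulting string has the same form as the query examples given for search, so it could be passed there directly.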

search_by_organism

search_by_organism(
    organism: Union[int, str],
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult

Search by organism.

Parameters:

Name	Type	Description	Default
`organism`	`Union[int, str]`	Organism tax ID or name.	required
`reviewed_only`	`bool`	Only return reviewed entries.	`False`
`size`	`int`	Results per page.	`25`

Returns:

Type	Description
`UniProtSearchResult`	UniProtSearchResult with matching entries.

search_by_keyword

search_by_keyword(
    keyword: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult

Search by keyword.

Parameters:

Name	Type	Description	Default
`keyword`	`str`	Keyword to search (e.g., "kinase", "receptor").	required
`organism`	`Optional[Union[int, str]]`	Optional organism filter.	`None`
`reviewed_only`	`bool`	Only return reviewed entries.	`False`
`size`	`int`	Results per page.	`25`

Returns:

Type	Description
`UniProtSearchResult`	UniProtSearchResult with matching entries.

map_ids

map_ids(
    ids: List[str],
    from_db: str = "UniProtKB_AC-ID",
    to_db: str = "UniProtKB",
    poll_interval: float = 1.0,
    max_wait: float = 60.0,
) -> Dict[str, List[str]]

Map IDs between databases.

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of IDs to map.	required
`from_db`	`str`	Source database (e.g., "UniProtKB_AC-ID", "Gene_Name", "GeneID").	`'UniProtKB_AC-ID'`
`to_db`	`str`	Target database (e.g., "UniProtKB", "GeneID", "PDB").	`'UniProtKB'`
`poll_interval`	`float`	Seconds between status checks.	`1.0`
`max_wait`	`float`	Maximum seconds to wait for job completion.	`60.0`

Returns:

Type	Description
`Dict[str, List[str]]`	Dictionary mapping input IDs to lists of output IDs.

Example

fetcher = UniProt_Fetcher()
mapping = fetcher.map_ids(
    ["P05067", "P04637"],
    from_db="UniProtKB_AC-ID",
    to_db="GeneID"
)

gene_to_uniprot

gene_to_uniprot(
    gene_names: List[str],
    organism: int = 9606,
    reviewed_only: bool = True,
) -> Dict[str, str]

Map gene names to UniProt accessions.

Uses concurrent requests for efficient batch processing.

Parameters:

Name	Type	Description	Default
`gene_names`	`List[str]`	List of gene names.	required
`organism`	`int`	NCBI taxonomy ID of the organism (default 9606, human).	`9606`
`reviewed_only`	`bool`	Only return reviewed entries.	`True`

Returns:

Type	Description
`Dict[str, str]`	Dictionary mapping gene names to accessions.

Example

fetcher = UniProt_Fetcher()
mapping = fetcher.gene_to_uniprot(["TP53", "BRCA1", "EGFR"])
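
The description mentions concurrent requests without saying how they are issued. As an illustration only, the sketch below fans out one lookup per gene with a thread pool. map_concurrently and fake_lookup are hypothetical, the lookup table stands in for real searches, and dropping genes without a hit is an assumption rather than documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def map_concurrently(
    gene_names: List[str],
    lookup: Callable[[str], Optional[str]],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run one lookup per gene in a thread pool; skip genes with no hit."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each gene with its result.
        hits = list(pool.map(lookup, gene_names))
    return {gene: acc for gene, acc in zip(gene_names, hits) if acc is not None}


# Stand-in lookup table used in place of real UniProt searches.
_KNOWN = {"TP53": "P04637", "EGFR": "P00533"}


def fake_lookup(gene: str) -> Optional[str]:
    return _KNOWN.get(gene)


mapping = map_concurrently(["TP53", "EGFR", "NOT_A_GENE"], fake_lookup)
```

Threads suit this workload because each lookup spends most of its time waiting on the network rather than computing.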

uniprot_to_gene

uniprot_to_gene(accessions: List[str]) -> Dict[str, str]

Map UniProt accessions to gene names.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of UniProt accessions.	required

Returns:

Type	Description
`Dict[str, str]`	Dictionary mapping accessions to gene names.

get_sequences

get_sequences(accessions: List[str]) -> Dict[str, str]

Get protein sequences for accessions.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of UniProt accessions.	required

Returns:

Type	Description
`Dict[str, str]`	Dictionary mapping accessions to sequences.

PubChem_Fetcher

PubChem_Fetcher(**data_manager_kws)

Fetcher for PubChem PUG REST and PUG View APIs.

PubChem provides two REST APIs:

PUG REST - Structured data access:

Compound records (structures, properties, synonyms)
Substance records (deposited data)
Bioassay data
Gene and protein information
Structure searches (similarity, substructure)

PUG View - Annotation/web page content:

Detailed compound annotations
Safety and hazards information
Pharmacology and biochemistry
Literature and patents
Drug and medication information

Example

from biodbs.fetch import PubChem_Fetcher

fetcher = PubChem_Fetcher()

# Get compound by CID
aspirin = fetcher.get_compound(2244)
print(aspirin.results[0])

# Get compound properties
props = fetcher.get_properties(
    [2244, 3672],
    properties=["MolecularFormula", "MolecularWeight"]
)
df = props.as_dataframe()

# Search by name
results = fetcher.search_by_name("aspirin")

# Similarity search
similar = fetcher.similarity_search(
    smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
    threshold=90
)

# Get safety data
safety = fetcher.get_safety_data(2244)

# Get pharmacology info
pharma = fetcher.get_pharmacology(2244)

get

get(
    domain: str,
    namespace: str,
    identifiers: Optional[
        Union[str, int, List[Union[str, int]]]
    ] = None,
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    output: str = "JSON",
    search_type: Optional[str] = None,
    threshold: Optional[int] = None,
    max_records: Optional[int] = None,
) -> PUGRestFetchedData

Fetch data from PubChem PUG REST API.

Parameters:

Name	Type	Description	Default
`domain`	`str`	PubChem domain (compound, substance, assay, etc.).	required
`namespace`	`str`	Identifier namespace (cid, name, smiles, etc.).	required
`identifiers`	`Optional[Union[str, int, List[Union[str, int]]]]`	ID(s) to look up.	`None`
`operation`	`Optional[str]`	Operation to perform (property, synonyms, etc.).	`None`
`properties`	`Optional[List[str]]`	List of properties for property operation.	`None`
`output`	`str`	Output format (JSON, XML, CSV, SDF, PNG).	`'JSON'`
`search_type`	`Optional[str]`	For structure searches (smiles, smarts, inchi).	`None`
`threshold`	`Optional[int]`	Similarity threshold (0-100) for similarity searches.	`None`
`max_records`	`Optional[int]`	Maximum records to return.	`None`

Returns:

Type	Description
`PUGRestFetchedData`	PUGRestFetchedData with parsed results.
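
The domain, namespace, identifiers, operation and output arguments mirror the path segments of a PUG REST URL. The sketch below shows how those segments fit together. build_pug_rest_url is a hypothetical helper written for illustration; it is not claimed to be how the fetcher builds its requests, and it ignores structure-search parameters.

```python
from typing import List, Optional, Sequence, Union
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(
    domain: str,
    namespace: str,
    identifiers: Union[str, int, Sequence[Union[str, int]]],
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    output: str = "JSON",
) -> str:
    """Join the PUG REST path segments: domain/namespace/ids[/operation]/output."""
    if isinstance(identifiers, (str, int)):
        identifiers = [identifiers]
    # Multiple identifiers are sent comma-separated in a single path segment.
    id_part = ",".join(quote(str(i), safe="") for i in identifiers)
    parts = [PUG_REST_BASE, domain, namespace, id_part]
    if operation:
        parts.append(operation)
        if operation == "property" and properties:
            parts.append(",".join(properties))
    parts.append(output)
    return "/".join(parts)


url = build_pug_rest_url(
    "compound", "cid", [2244, 3672],
    operation="property",
    properties=["MolecularFormula", "MolecularWeight"],
)
```

Here url corresponds to a property request for CIDs 2244 and 3672, the same pair used in the class-level example.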

get_all

get_all(
    domain: str,
    namespace: str,
    identifiers: List[Union[str, int]],
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    batch_size: int = 100,
    rate_limit_per_second: int = 5,
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    **kwargs: Any,
) -> Union[PUGRestFetchedData, Path]

Fetch data for many identifiers by batching.

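PubChem accepts multiple comma-separated CIDs/SIDs in a single request, but the number of identifiers per request is limited. This method splits the identifiers into batches of batch_size and issues at most rate_limit_per_second requests per second.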

Parameters:

Name	Type	Description	Default
`domain`	`str`	PubChem domain.	required
`namespace`	`str`	Identifier namespace.	required
`identifiers`	`List[Union[str, int]]`	List of IDs to fetch.	required
`method`	`Literal['concat', 'stream_to_storage']`	"concat" combines all batches into one result; "stream_to_storage" writes results to storage and returns the output file path.	`'concat'`
`batch_size`	`int`	IDs per request (default 100).	`100`
`rate_limit_per_second`	`int`	Max requests per second.	`5`
`operation`	`Optional[str]`	Operation to perform.	`None`
`properties`	`Optional[List[str]]`	Properties for property operation.	`None`
`**kwargs`	`Any`	Additional parameters.	`{}`

Returns:

Type	Description
`Union[PUGRestFetchedData, Path]`	Combined PUGRestFetchedData or Path to output file.
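
The batching described for get_all can be pictured as slicing the identifier list and pausing between requests. The sketch below is a simplified illustration. batches and fetch_in_batches are hypothetical names, the fixed delay is only one possible way to honor a per-second limit, and the fetch function is a stand-in for a real request.

```python
import time
from typing import Callable, Iterator, List, Sequence, Union

Identifier = Union[str, int]


def batches(
    identifiers: Sequence[Identifier], batch_size: int = 100
) -> Iterator[List[Identifier]]:
    """Yield consecutive slices of at most batch_size identifiers."""
    for start in range(0, len(identifiers), batch_size):
        yield list(identifiers[start:start + batch_size])


def fetch_in_batches(
    identifiers: Sequence[Identifier],
    fetch_batch: Callable[[List[Identifier]], list],
    batch_size: int = 100,
    rate_limit_per_second: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Fetch every batch in turn, pausing so requests stay under the rate limit."""
    min_gap = 1.0 / rate_limit_per_second
    results: list = []
    for index, batch in enumerate(batches(identifiers, batch_size)):
        if index > 0:
            sleep(min_gap)  # fixed delay between consecutive requests
        results.extend(fetch_batch(batch))
    return results


# Stand-in fetch that doubles each ID, to show how batch results are concatenated.
doubled = fetch_in_batches(
    list(range(5)), lambda b: [x * 2 for x in b], batch_size=2, sleep=lambda s: None
)
```

With 250 identifiers and the default batch_size of 100, this yields three requests of 100, 100 and 50 identifiers.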

get_compound

get_compound(cid: int) -> PUGRestFetchedData

Get a compound record by CID.

get_compounds

get_compounds(cids: List[int]) -> PUGRestFetchedData

Get multiple compound records by CID.

get_substance

get_substance(sid: int) -> PUGRestFetchedData

Get a substance record by SID.

get_assay

get_assay(aid: int) -> PUGRestFetchedData

Get an assay record by AID.

search_by_name

search_by_name(name: str) -> PUGRestFetchedData

Search compounds by name.

search_by_smiles

search_by_smiles(smiles: str) -> PUGRestFetchedData

Search compounds by SMILES.

search_by_inchikey

search_by_inchikey(inchikey: str) -> PUGRestFetchedData

Search compounds by InChIKey.

search_by_formula

search_by_formula(formula: str) -> PUGRestFetchedData

Search compounds by molecular formula.

get_properties

get_properties(
    cids: Union[int, List[int]],
    properties: Optional[List[str]] = None,
) -> PUGRestFetchedData

Get compound properties.

Parameters:

Name	Type	Description	Default
`cids`	`Union[int, List[int]]`	Compound ID(s).	required
`properties`	`Optional[List[str]]`	Properties to retrieve. Defaults to common properties.	`None`

get_synonyms

get_synonyms(cid: int) -> PUGRestFetchedData

Get synonyms for a compound.

get_cids_by_name

get_cids_by_name(name: str) -> PUGRestFetchedData

Get CIDs matching a name.

get_sids_for_compound

get_sids_for_compound(cid: int) -> PUGRestFetchedData

Get SIDs associated with a compound.

get_aids_for_compound

get_aids_for_compound(cid: int) -> PUGRestFetchedData

Get assay AIDs associated with a compound.

similarity_search

similarity_search(
    smiles: str, threshold: int = 90, max_records: int = 100
) -> PUGRestFetchedData

Find similar compounds by SMILES.

Parameters:

Name	Type	Description	Default
`smiles`	`str`	Query SMILES string.	required
`threshold`	`int`	Similarity threshold (0-100).	`90`
`max_records`	`int`	Maximum records to return.	`100`

substructure_search

substructure_search(
    smiles: str, max_records: int = 100
) -> PUGRestFetchedData

Find compounds containing a substructure.

Parameters:

Name	Type	Description	Default
`smiles`	`str`	Query SMILES string.	required
`max_records`	`int`	Maximum records to return.	`100`

get_compound_image

get_compound_image(
    cid: int, image_size: str = "large"
) -> PUGRestFetchedData

Get compound structure image (PNG).

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required
`image_size`	`str`	Image size (small, large, or pixel size like "300x300").	`'large'`

get_compound_sdf

get_compound_sdf(cid: int) -> PUGRestFetchedData

Get compound structure in SDF format.

get_description

get_description(cid: int) -> PUGRestFetchedData

Get compound description.

get_view

get_view(
    record_id: Union[int, str],
    record_type: str = "compound",
    heading: Optional[str] = None,
    output: str = "JSON",
) -> PUGViewFetchedData

Fetch annotation data from PubChem PUG View API.

PUG View provides detailed annotation/web page content including safety data, pharmacology, literature, patents, etc.

Parameters:

Name	Type	Description	Default
`record_id`	`Union[int, str]`	Record ID (CID for compounds, SID for substances, etc.).	required
`record_type`	`str`	Type of record (compound, substance, assay, gene, protein, etc.).	`'compound'`
`heading`	`Optional[str]`	Optional heading to filter to a specific section.	`None`
`output`	`str`	Output format (JSON or XML).	`'JSON'`

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData with hierarchical annotation data.
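
A PUG View request is addressed by record type and record ID, with the optional heading sent as a query parameter. The sketch below shows how such a URL could be assembled. build_pug_view_url is a hypothetical helper for illustration and is not claimed to be the fetcher's internal code.

```python
from typing import Optional, Union
from urllib.parse import urlencode

PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data"


def build_pug_view_url(
    record_id: Union[int, str],
    record_type: str = "compound",
    heading: Optional[str] = None,
    output: str = "JSON",
) -> str:
    """Build a PUG View data URL, optionally restricted to one section heading."""
    url = f"{PUG_VIEW_BASE}/{record_type}/{record_id}/{output}"
    if heading:
        # urlencode turns spaces into '+', e.g. "Safety and Hazards".
        url += "?" + urlencode({"heading": heading})
    return url


safety_url = build_pug_view_url(2244, heading="Safety and Hazards")
```

The section-specific methods are each described as filtered to one named section, so they plausibly differ only in the heading value they pass; that mapping is an inference from the return descriptions.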

get_compound_annotations

get_compound_annotations(cid: int) -> PUGViewFetchedData

Get full annotation data for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData with all annotation sections.

get_substance_annotations

get_substance_annotations(sid: int) -> PUGViewFetchedData

Get full annotation data for a substance.

Parameters:

Name	Type	Description	Default
`sid`	`int`	Substance ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData with all annotation sections.

get_safety_data

get_safety_data(cid: int) -> PUGViewFetchedData

Get safety and hazards information for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData filtered to Safety and Hazards section.

get_toxicity_data

get_toxicity_data(cid: int) -> PUGViewFetchedData

Get toxicity information for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData filtered to Toxicity section.

get_pharmacology

get_pharmacology(cid: int) -> PUGViewFetchedData

Get pharmacology and biochemistry information for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData filtered to Pharmacology and Biochemistry section.

get_drug_info

get_drug_info(cid: int) -> PUGViewFetchedData

Get drug and medication information for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData filtered to Drug and Medication Information section.

get_literature

get_literature(cid: int) -> PUGViewFetchedData

Get literature references for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData filtered to Literature section.

get_patents

get_patents(cid: int) -> PUGViewFetchedData

Get patent information for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData filtered to Patents section.

get_names_and_identifiers

get_names_and_identifiers(cid: int) -> PUGViewFetchedData

Get names and identifiers for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData filtered to Names and Identifiers section.

get_physical_properties

get_physical_properties(cid: int) -> PUGViewFetchedData

Get chemical and physical properties for a compound.

Parameters:

Name	Type	Description	Default
`cid`	`int`	Compound ID.	required

Returns:

Type	Description
`PUGViewFetchedData`	PUGViewFetchedData filtered to Chemical and Physical Properties section.

Ensembl_Fetcher

Ensembl_Fetcher(**data_manager_kws)

Fetcher for Ensembl REST API.

Ensembl REST API provides access to genomic data including:

Gene/transcript/protein lookup and information
Genomic and protein sequences
Feature overlap queries
Cross-references to external databases
Homology and comparative genomics
Variant data and VEP (Variant Effect Predictor)
Coordinate mapping between assemblies
Phenotype and ontology data

Example

from biodbs.fetch import Ensembl_Fetcher

fetcher = Ensembl_Fetcher()

# Lookup a gene by Ensembl ID
gene = fetcher.lookup("ENSG00000141510")
print(gene.results[0]["display_name"])  # TP53

# Get sequence for a transcript
seq = fetcher.get_sequence("ENST00000269305", sequence_type="cds")

# Find features overlapping a region
features = fetcher.get_overlap_region(
    "human", "7:140424943-140624564",
    feature=["gene", "transcript"]
)

# Get homologs for a gene
homologs = fetcher.get_homology("human", "ENSG00000141510")

# Get variant consequences
vep = fetcher.get_vep_hgvs("human", "ENST00000366667:c.803C>T")

get

get(
    endpoint: str,
    id: Optional[str] = None,
    ids: Optional[List[str]] = None,
    species: Optional[str] = None,
    symbol: Optional[str] = None,
    region: Optional[str] = None,
    gene: Optional[str] = None,
    name: Optional[str] = None,
    content_type: str = "json",
    **kwargs: Any,
) -> EnsemblFetchedData

Fetch data from Ensembl REST API.

Parameters:

Name	Type	Description	Default
`endpoint`	`str`	Ensembl endpoint (e.g., "lookup/id", "sequence/id").	required
`id`	`Optional[str]`	Ensembl stable ID for single lookups.	`None`
`ids`	`Optional[List[str]]`	List of IDs for batch requests.	`None`
`species`	`Optional[str]`	Species name (e.g., "human", "homo_sapiens").	`None`
`symbol`	`Optional[str]`	Gene symbol for symbol-based lookups.	`None`
`region`	`Optional[str]`	Genomic region (e.g., "X:1000000..1000100:1").	`None`
`gene`	`Optional[str]`	Gene name or ID for phenotype endpoints.	`None`
`name`	`Optional[str]`	Name for name-based lookups.	`None`
`content_type`	`str`	Response format ("json", "fasta", "text").	`'json'`
`**kwargs`	`Any`	Additional endpoint-specific parameters.	`{}`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with parsed results.
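
An Ensembl REST request combines the endpoint path, an optional stable ID, query parameters and a content type. The sketch below illustrates that composition for ID-based endpoints only. build_ensembl_request and the CONTENT_TYPES table are hypothetical and do not describe the fetcher's internals.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlencode

ENSEMBL_BASE = "https://rest.ensembl.org"

# Short names from the content_type parameter mapped to media types.
CONTENT_TYPES = {
    "json": "application/json",
    "fasta": "text/x-fasta",
    "text": "text/plain",
}


def build_ensembl_request(
    endpoint: str,
    stable_id: Optional[str] = None,
    content_type: str = "json",
    **params: Any,
) -> Tuple[str, Dict[str, str]]:
    """Return the request URL and headers for an ID-based Ensembl endpoint."""
    path = f"{endpoint}/{stable_id}" if stable_id else endpoint
    url = f"{ENSEMBL_BASE}/{path}"
    if params:
        url += "?" + urlencode(params)
    return url, {"Content-Type": CONTENT_TYPES[content_type]}


url, headers = build_ensembl_request("lookup/id", "ENSG00000141510", expand=1)
```

The returned pair could be handed to any HTTP client; endpoints keyed by species, symbol or region would need additional path handling that this sketch leaves out.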

lookup

lookup(
    id: str,
    species: Optional[str] = None,
    expand: bool = False,
    format: str = "full",
    db_type: str = "core",
    phenotypes: bool = False,
    utr: bool = False,
    mane: bool = False,
) -> EnsemblFetchedData

Look up an Ensembl stable ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ensembl stable ID (e.g., ENSG00000141510).	required
`species`	`Optional[str]`	Species name/alias (optional, auto-detected from ID).	`None`
`expand`	`bool`	Include connected features (transcripts, exons).	`False`
`format`	`str`	Response format ("full" or "condensed").	`'full'`
`db_type`	`str`	Database type ("core" or "otherfeatures").	`'core'`
`phenotypes`	`bool`	Include phenotypes (genes only).	`False`
`utr`	`bool`	Include UTR features (requires expand=True).	`False`
`mane`	`bool`	Include MANE features (requires expand=True).	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with gene/transcript/protein information.

lookup_batch

lookup_batch(
    ids: List[str],
    species: Optional[str] = None,
    expand: bool = False,
    format: str = "full",
    db_type: str = "core",
) -> EnsemblFetchedData

Look up multiple Ensembl stable IDs in batch.

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of Ensembl stable IDs (max 1000).	required
`species`	`Optional[str]`	Species name/alias.	`None`
`expand`	`bool`	Include connected features.	`False`
`format`	`str`	Response format.	`'full'`
`db_type`	`str`	Database type.	`'core'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with results for each ID.
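
Because a batch lookup accepts at most 1000 IDs, longer lists have to be split before they are sent. A minimal sketch of that split, assuming each request carries a JSON body with an "ids" array, is shown below. batch_payloads is a hypothetical helper, the IDs are made up, and the reference does not say whether lookup_batch performs such splitting itself.

```python
import json
from typing import List


def batch_payloads(ids: List[str], max_per_request: int = 1000) -> List[str]:
    """Split ids into JSON request bodies of at most max_per_request IDs each."""
    return [
        json.dumps({"ids": ids[start:start + max_per_request]})
        for start in range(0, len(ids), max_per_request)
    ]


# 2500 made-up IDs become three bodies of 1000, 1000 and 500 IDs.
fake_ids = [f"ENSG{n:011d}" for n in range(2500)]
payloads = batch_payloads(fake_ids)
```

The same helper with max_per_request=50 would match the limit stated for get_sequence_batch.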

lookup_symbol

lookup_symbol(
    species: str,
    symbol: str,
    expand: bool = False,
    format: str = "full",
) -> EnsemblFetchedData

Look up a gene by symbol.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name (e.g., "human", "mouse").	required
`symbol`	`str`	Gene symbol (e.g., "BRCA2", "TP53").	required
`expand`	`bool`	Include connected features.	`False`
`format`	`str`	Response format.	`'full'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with gene information.

get_sequence

get_sequence(
    id: str,
    sequence_type: str = "genomic",
    species: Optional[str] = None,
    expand_5prime: Optional[int] = None,
    expand_3prime: Optional[int] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    mask: Optional[str] = None,
    mask_feature: bool = False,
    multiple_sequences: bool = False,
    format: str = "fasta",
) -> EnsemblFetchedData

Get sequence for an Ensembl stable ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ensembl stable ID (gene, transcript, exon, protein).	required
`sequence_type`	`str`	Type of sequence ("genomic", "cds", "cdna", "protein").	`'genomic'`
`species`	`Optional[str]`	Species name (optional).	`None`
`expand_5prime`	`Optional[int]`	Extend upstream (genomic only).	`None`
`expand_3prime`	`Optional[int]`	Extend downstream (genomic only).	`None`
`start`	`Optional[int]`	Trim sequence start.	`None`
`end`	`Optional[int]`	Trim sequence end.	`None`
`mask`	`Optional[str]`	Mask repeats ("hard" or "soft", genomic only).	`None`
`mask_feature`	`bool`	Mask introns/UTRs.	`False`
`multiple_sequences`	`bool`	Return multiple sequences per ID.	`False`
`format`	`str`	Output format ("fasta" or "json").	`'fasta'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with sequence data.

get_sequence_batch

get_sequence_batch(
    ids: List[str],
    sequence_type: str = "genomic",
    species: Optional[str] = None,
    format: str = "fasta",
) -> EnsemblFetchedData

Get sequences for multiple Ensembl IDs in batch.

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of Ensembl stable IDs (max 50).	required
`sequence_type`	`str`	Type of sequence.	`'genomic'`
`species`	`Optional[str]`	Species name.	`None`
`format`	`str`	Output format.	`'fasta'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with sequences.

get_sequence_region

get_sequence_region(
    species: str,
    region: str,
    expand_5prime: Optional[int] = None,
    expand_3prime: Optional[int] = None,
    mask: Optional[str] = None,
    coord_system: Optional[str] = None,
    format: str = "fasta",
) -> EnsemblFetchedData

Get genomic sequence for a region.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name (e.g., "human").	required
`region`	`str`	Genomic region (e.g., "X:1000000..1000100:1").	required
`expand_5prime`	`Optional[int]`	Extend upstream.	`None`
`expand_3prime`	`Optional[int]`	Extend downstream.	`None`
`mask`	`Optional[str]`	Mask repeats ("hard" or "soft").	`None`
`coord_system`	`Optional[str]`	Coordinate system filter.	`None`
`format`	`str`	Output format ("fasta" or "json").	`'fasta'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with sequence.

get_overlap_id

get_overlap_id(
    id: str,
    feature: Union[str, List[str]],
    species: Optional[str] = None,
    biotype: Optional[str] = None,
    logic_name: Optional[str] = None,
    db_type: str = "core",
) -> EnsemblFetchedData

Get features overlapping an Ensembl ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ensembl stable ID.	required
`feature`	`Union[str, List[str]]`	Feature type(s) to retrieve (gene, transcript, exon, etc.).	required
`species`	`Optional[str]`	Species name.	`None`
`biotype`	`Optional[str]`	Filter by biotype (e.g., "protein_coding").	`None`
`logic_name`	`Optional[str]`	Filter by analysis logic name.	`None`
`db_type`	`str`	Database type.	`'core'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with overlapping features.

get_overlap_region

get_overlap_region(
    species: str,
    region: str,
    feature: Union[str, List[str]],
    biotype: Optional[str] = None,
    logic_name: Optional[str] = None,
    so_term: Optional[str] = None,
    variant_set: Optional[str] = None,
    db_type: str = "core",
) -> EnsemblFetchedData

Get features overlapping a genomic region.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name (e.g., "human").	required
`region`	`str`	Genomic region (e.g., "7:140424943-140624564", max 5Mb).	required
`feature`	`Union[str, List[str]]`	Feature type(s) to retrieve.	required
`biotype`	`Optional[str]`	Filter by biotype.	`None`
`logic_name`	`Optional[str]`	Filter by analysis logic name.	`None`
`so_term`	`Optional[str]`	Sequence Ontology term filter.	`None`
`variant_set`	`Optional[str]`	Variant set restriction (e.g., "ClinVar").	`None`
`db_type`	`str`	Database type.	`'core'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with overlapping features.
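
Since the region is limited to 5 Mb, it can help to check a region string before sending a request. The sketch below parses both region notations that appear in this reference and enforces the limit. region_length, check_region and MAX_REGION_BP are hypothetical names, and treating 5 Mb as exactly 5,000,000 bp is an assumption.

```python
import re

MAX_REGION_BP = 5_000_000  # assumption: "5Mb" read as 5,000,000 base pairs

# Accepts "chrom:start-end" and "chrom:start..end", with an optional ":strand".
_REGION = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)(?:-|\.\.)(?P<end>\d+)(?::(?P<strand>-?1))?$"
)


def region_length(region: str) -> int:
    """Return the inclusive length in base pairs of a region string."""
    match = _REGION.match(region)
    if match is None:
        raise ValueError(f"Unrecognized region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError(f"Region end precedes start: {region!r}")
    return end - start + 1


def check_region(region: str, max_bp: int = MAX_REGION_BP) -> int:
    """Return the region length, raising ValueError if it exceeds max_bp."""
    length = region_length(region)
    if length > max_bp:
        raise ValueError(f"Region spans {length} bp, above the {max_bp} bp limit")
    return length


length = check_region("7:140424943-140624564")
```

A region that fails the check could be split into consecutive windows and queried window by window.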

get_xrefs

get_xrefs(
    id: str,
    species: Optional[str] = None,
    external_db: Optional[str] = None,
    all_levels: bool = False,
    db_type: str = "core",
    object_type: Optional[str] = None,
) -> EnsemblFetchedData

Get external cross-references for an Ensembl ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ensembl stable ID.	required
`species`	`Optional[str]`	Species name.	`None`
`external_db`	`Optional[str]`	Filter by external database (e.g., "HGNC", "UniProt").	`None`
`all_levels`	`bool`	Find all linked features.	`False`
`db_type`	`str`	Database type.	`'core'`
`object_type`	`Optional[str]`	Filter by feature type.	`None`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with cross-references.

get_xrefs_symbol

get_xrefs_symbol(
    species: str,
    symbol: str,
    external_db: Optional[str] = None,
    db_type: str = "core",
    object_type: Optional[str] = None,
) -> EnsemblFetchedData

Look up Ensembl objects by external symbol.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`symbol`	`str`	External symbol (e.g., gene name "BRCA2").	required
`external_db`	`Optional[str]`	Filter by external database.	`None`
`db_type`	`str`	Database type.	`'core'`
`object_type`	`Optional[str]`	Filter by feature type.	`None`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with matching Ensembl objects.

get_homology ¶

get_homology(
    species: str,
    id: str,
    homology_type: str = "all",
    target_species: Optional[str] = None,
    target_taxon: Optional[int] = None,
    aligned: bool = True,
    cigar_line: bool = True,
    sequence: str = "protein",
    compara: str = "vertebrates",
    format: str = "full",
) -> EnsemblFetchedData

Get homology information for a gene.

Parameters:

Name	Type	Description	Default
`species`	`str`	Source species name.	required
`id`	`str`	Ensembl gene ID.	required
`homology_type`	`str`	Type of homology ("orthologues", "paralogues", "all").	`'all'`
`target_species`	`Optional[str]`	Filter by target species.	`None`
`target_taxon`	`Optional[int]`	Filter by target taxon ID.	`None`
`aligned`	`bool`	Include aligned sequences.	`True`
`cigar_line`	`bool`	Return sequence in CIGAR format.	`True`
`sequence`	`str`	Sequence type ("none", "cdna", "protein").	`'protein'`
`compara`	`str`	Compara database name.	`'vertebrates'`
`format`	`str`	Response format ("full" or "condensed").	`'full'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with homology data.

get_homology_symbol ¶

get_homology_symbol(
    species: str,
    symbol: str,
    homology_type: str = "all",
    target_species: Optional[str] = None,
    sequence: str = "protein",
) -> EnsemblFetchedData

Get homology information for a gene by symbol.

Parameters:

Name	Type	Description	Default
`species`	`str`	Source species name.	required
`symbol`	`str`	Gene symbol.	required
`homology_type`	`str`	Type of homology.	`'all'`
`target_species`	`Optional[str]`	Filter by target species.	`None`
`sequence`	`str`	Sequence type.	`'protein'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with homology data.

get_variation ¶

get_variation(
    species: str,
    id: str,
    genotypes: bool = False,
    pops: bool = False,
    population_genotypes: bool = False,
    phenotypes: bool = False,
    genotyping_chips: bool = False,
) -> EnsemblFetchedData

Get variant information by rsID.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`id`	`str`	Variant ID (e.g., "rs56116432").	required
`genotypes`	`bool`	Include individual genotypes.	`False`
`pops`	`bool`	Include population allele frequencies.	`False`
`population_genotypes`	`bool`	Include population genotype frequencies.	`False`
`phenotypes`	`bool`	Include phenotypes.	`False`
`genotyping_chips`	`bool`	Include genotyping chip info.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with variant data.

get_vep_hgvs ¶

get_vep_hgvs(
    species: str,
    hgvs_notation: str,
    canonical: bool = False,
    domains: bool = False,
    hgvs: bool = False,
    numbers: bool = False,
    protein: bool = False,
    refseq: bool = False,
    variant_class: bool = False,
) -> EnsemblFetchedData

Get variant consequences using HGVS notation.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`hgvs_notation`	`str`	HGVS notation (e.g., "ENST00000366667:c.803C>T").	required
`canonical`	`bool`	Only return canonical transcript.	`False`
`domains`	`bool`	Include protein domains.	`False`
`hgvs`	`bool`	Add HGVS nomenclature.	`False`
`numbers`	`bool`	Include exon/intron numbers.	`False`
`protein`	`bool`	Include protein position and amino acid changes.	`False`
`refseq`	`bool`	Include RefSeq transcripts.	`False`
`variant_class`	`bool`	Include variant class.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with VEP results.
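
HGVS strings contain characters such as `>` that need percent-encoding in a URL path, and the boolean options are sent as `1` on the public Ensembl REST VEP endpoint. The sketch below illustrates both points; `build_vep_hgvs_url` is a hypothetical helper and does not describe how biodbs issues the request.

```python
from urllib.parse import quote, urlencode

ENSEMBL_REST = "https://rest.ensembl.org"  # public server; assumed base URL


def build_vep_hgvs_url(species: str, hgvs_notation: str, **flags: bool) -> str:
    """Hypothetical helper: compose a GET URL for vep/:species/hgvs/:notation."""
    # Keep ':' readable; characters such as '>' are percent-encoded.
    notation = quote(hgvs_notation, safe=":")
    # Only flags set to True are sent, each encoded as 1.
    params = {name: 1 for name, enabled in flags.items() if enabled}
    url = f"{ENSEMBL_REST}/vep/{quote(species)}/hgvs/{notation}"
    return f"{url}?{urlencode(params)}" if params else url


print(
    build_vep_hgvs_url(
        "human", "ENST00000366667:c.803C>T", canonical=True, protein=True
    )
)
```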

get_vep_id ¶

get_vep_id(
    species: str,
    id: str,
    canonical: bool = False,
    domains: bool = False,
    hgvs: bool = False,
    numbers: bool = False,
    protein: bool = False,
) -> EnsemblFetchedData

Get variant consequences using variant ID.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`id`	`str`	Variant ID (e.g., rsID).	required
`canonical`	`bool`	Only return canonical transcript.	`False`
`domains`	`bool`	Include protein domains.	`False`
`hgvs`	`bool`	Add HGVS nomenclature.	`False`
`numbers`	`bool`	Include exon/intron numbers.	`False`
`protein`	`bool`	Include protein position.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with VEP results.

get_vep_region ¶

get_vep_region(
    species: str,
    region: str,
    allele: str,
    canonical: bool = False,
    domains: bool = False,
    hgvs: bool = False,
    numbers: bool = False,
    protein: bool = False,
) -> EnsemblFetchedData

Get variant consequences using genomic coordinates.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`region`	`str`	Genomic region (e.g., "9:22125503-22125502:1").	required
`allele`	`str`	Variant allele (e.g., "C", "DUP").	required
`canonical`	`bool`	Only return canonical transcript.	`False`
`domains`	`bool`	Include protein domains.	`False`
`hgvs`	`bool`	Add HGVS nomenclature.	`False`
`numbers`	`bool`	Include exon/intron numbers.	`False`
`protein`	`bool`	Include protein position.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with VEP results.

map_assembly ¶

map_assembly(
    species: str,
    asm_one: str,
    region: str,
    asm_two: str,
    coord_system: str = "chromosome",
    target_coord_system: str = "chromosome",
) -> EnsemblFetchedData

Map coordinates between assemblies.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`asm_one`	`str`	Source assembly version (e.g., "GRCh37").	required
`region`	`str`	Genomic region to map (e.g., "X:1000000..1000100:1").	required
`asm_two`	`str`	Target assembly version (e.g., "GRCh38").	required
`coord_system`	`str`	Input coordinate system.	`'chromosome'`
`target_coord_system`	`str`	Output coordinate system.	`'chromosome'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with mapped coordinates.

get_phenotype_gene ¶

get_phenotype_gene(
    species: str,
    gene: str,
    include_associated: bool = False,
    include_overlap: bool = False,
    include_pubmed_id: bool = False,
    include_review_status: bool = False,
    include_submitter: bool = False,
) -> EnsemblFetchedData

Get phenotypes associated with a gene.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`gene`	`str`	Gene name or Ensembl ID.	required
`include_associated`	`bool`	Include phenotypes from associated variants.	`False`
`include_overlap`	`bool`	Include phenotypes from overlapping features.	`False`
`include_pubmed_id`	`bool`	Include PubMed IDs.	`False`
`include_review_status`	`bool`	Include review status.	`False`
`include_submitter`	`bool`	Include submitter names.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with phenotype data.

get_phenotype_region ¶

get_phenotype_region(
    species: str,
    region: str,
    include_pubmed_id: bool = False,
    include_review_status: bool = False,
) -> EnsemblFetchedData

Get phenotypes in a genomic region.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`region`	`str`	Genomic region.	required
`include_pubmed_id`	`bool`	Include PubMed IDs.	`False`
`include_review_status`	`bool`	Include review status.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with phenotype data.

get_ontology_term ¶

get_ontology_term(
    id: str,
    relation: Optional[str] = None,
    simple: bool = False,
) -> EnsemblFetchedData

Get ontology term information.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ontology term ID (e.g., "GO:0005667").	required
`relation`	`Optional[str]`	Relationship types to include.	`None`
`simple`	`bool`	Don't fetch parent/child terms.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with ontology term data.

get_ontology_ancestors ¶

get_ontology_ancestors(
    id: str,
    ontology: Optional[str] = None,
    zero_distance: bool = False,
) -> EnsemblFetchedData

Get ancestor terms for an ontology term.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ontology term ID.	required
`ontology`	`Optional[str]`	Filter by ontology.	`None`
`zero_distance`	`bool`	Include the term itself.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with ancestor terms.
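
The effect of `zero_distance` is easiest to see on a small graph. The sketch below walks a toy parent graph with made-up term IDs; it is not real ontology data and does not call the Ensembl API.

```python
from typing import Dict, List, Set

# Toy child -> parents graph with made-up term IDs; not real GO data.
PARENTS: Dict[str, List[str]] = {
    "T:4": ["T:2", "T:3"],
    "T:3": ["T:1"],
    "T:2": ["T:1"],
    "T:1": [],
}


def ancestors(term: str, zero_distance: bool = False) -> Set[str]:
    """Collect all terms reachable by following parent links upward."""
    found: Set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, []))
    if zero_distance:
        # Distance zero means the query term counts as its own ancestor.
        found.add(term)
    return found


print(sorted(ancestors("T:4")))
print(sorted(ancestors("T:4", zero_distance=True)))
```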

get_ontology_descendants ¶

get_ontology_descendants(
    id: str,
    ontology: Optional[str] = None,
    zero_distance: bool = False,
    subset: Optional[str] = None,
) -> EnsemblFetchedData

Get descendant terms for an ontology term.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ontology term ID.	required
`ontology`	`Optional[str]`	Filter by ontology.	`None`
`zero_distance`	`bool`	Include the term itself.	`False`
`subset`	`Optional[str]`	Filter by subset.	`None`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with descendant terms.

get_genetree ¶

get_genetree(
    id: str,
    aligned: bool = False,
    cigar_line: bool = False,
    sequence: str = "protein",
    nh_format: str = "simple",
    prune_species: Optional[str] = None,
    prune_taxon: Optional[int] = None,
    clusterset_id: Optional[str] = None,
    compara: str = "vertebrates",
) -> EnsemblFetchedData

Get gene tree by tree ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	Gene tree ID (e.g., "ENSGT00390000003602").	required
`aligned`	`bool`	Include aligned sequences.	`False`
`cigar_line`	`bool`	Return sequence in CIGAR format.	`False`
`sequence`	`str`	Sequence type ("none", "cdna", "protein").	`'protein'`
`nh_format`	`str`	Newick format type.	`'simple'`
`prune_species`	`Optional[str]`	Filter by species.	`None`
`prune_taxon`	`Optional[int]`	Filter by taxon ID.	`None`
`clusterset_id`	`Optional[str]`	Gene-tree resource name.	`None`
`compara`	`str`	Compara database name.	`'vertebrates'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with gene tree data.

get_genetree_member ¶

get_genetree_member(
    species: str,
    id: str,
    aligned: bool = False,
    sequence: str = "protein",
    compara: str = "vertebrates",
) -> EnsemblFetchedData

Get gene tree containing a gene ID.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`id`	`str`	Ensembl gene ID.	required
`aligned`	`bool`	Include aligned sequences.	`False`
`sequence`	`str`	Sequence type.	`'protein'`
`compara`	`str`	Compara database name.	`'vertebrates'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with gene tree data.

get_assembly_info ¶

get_assembly_info(
    species: str,
    bands: bool = False,
    synonyms: bool = False,
) -> EnsemblFetchedData

Get assembly information for a species.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name.	required
`bands`	`bool`	Include karyotype band information.	`False`
`synonyms`	`bool`	Include known synonyms.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with assembly information.

get_species_info ¶

get_species_info(
    division: Optional[str] = None,
    strain_collection: Optional[str] = None,
    hide_strain_info: bool = False,
) -> EnsemblFetchedData

Get information about available species.

Parameters:

Name	Type	Description	Default
`division`	`Optional[str]`	Filter by Ensembl division.	`None`
`strain_collection`	`Optional[str]`	Filter by strain collection.	`None`
`hide_strain_info`	`bool`	Hide strain information.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData with species information.

BioMart_Fetcher¶

BioMart_Fetcher ¶

BioMart_Fetcher(
    host: Union[str, BioMartHost] = main,
    **data_manager_kws: Any,
)

Fetcher for BioMart (Ensembl) genomic data.

BioMart provides access to:

Gene information (IDs, names, descriptions, coordinates)
Transcript and protein data
Sequence data (cDNA, coding, peptide)
Homology information
Variation data
GO annotations

The API has a hierarchical structure:

Server: Contains multiple marts (e.g., ENSEMBL_MART_ENSEMBL)
Mart: Contains multiple datasets (e.g., hsapiens_gene_ensembl)
Dataset: Contains filters and attributes for queries

Example

fetcher = BioMart_Fetcher()

# List available marts
marts = fetcher.list_marts()
print(marts.marts)

# List datasets in a mart
datasets = fetcher.list_datasets()
print(datasets.search(contain="human"))

# Get gene info by Ensembl IDs
data = fetcher.get_genes(
    ids=["ENSG00000141510", "ENSG00000012048"],
    attributes=["ensembl_gene_id", "external_gene_name", "description"]
)
df = data.as_dataframe()

# Get genes by gene names
data = fetcher.get_genes_by_name(
    names=["TP53", "BRCA1", "BRCA2"],
    attributes=["ensembl_gene_id", "chromosome_name", "start_position"]
)

Note

The BioMart API is rate-limited and can be slow for large queries. Use batching for queries with many filter values.

Initialize BioMart fetcher.

Parameters:

Name	Type	Description	Default
`host`	`Union[str, BioMartHost]`	BioMart host (default: www.ensembl.org).	`main`
`**data_manager_kws`	`Any`	Keyword arguments for BioMartDataManager.	`{}`

host `property` ¶

host: str

Get current host.

list_marts ¶

list_marts() -> BioMartRegistryData

List available marts on the server.

Returns:

Type	Description
`BioMartRegistryData`	BioMartRegistryData with mart information.

list_datasets ¶

list_datasets(
    mart: Union[str, BioMartMart] = ensembl,
) -> BioMartDatasetsData

List datasets available in a mart.

Parameters:

Name	Type	Description	Default
`mart`	`Union[str, BioMartMart]`	Mart name (default: ENSEMBL_MART_ENSEMBL).	`ensembl`

Returns:

Type	Description
`BioMartDatasetsData`	BioMartDatasetsData with dataset information.

get_config ¶

get_config(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    use_cache: bool = True,
) -> BioMartConfigData

Get dataset configuration (filters and attributes).

Parameters:

Name	Type	Description	Default
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`use_cache`	`bool`	Whether to use cached configuration.	`True`

Returns:

Type	Description
`BioMartConfigData`	BioMartConfigData with filters and attributes.

list_attributes ¶

list_attributes(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    contain: Optional[str] = None,
    pattern: Optional[str] = None,
) -> Any

List available attributes for a dataset.

Parameters:

Name	Type	Description	Default
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`contain`	`Optional[str]`	Filter attributes containing this string.	`None`
`pattern`	`Optional[str]`	Filter attributes matching this regex pattern.	`None`

Returns:

Type	Description
`Any`	DataFrame with attribute information.

list_filters ¶

list_filters(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    contain: Optional[str] = None,
    pattern: Optional[str] = None,
) -> Any

List available filters for a dataset.

Parameters:

Name	Type	Description	Default
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`contain`	`Optional[str]`	Keep only filters containing this string.	`None`
`pattern`	`Optional[str]`	Keep only filters matching this regex pattern.	`None`

Returns:

Type	Description
`Any`	DataFrame with filter information.

query ¶

query(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    attributes: Optional[List[str]] = None,
    filters: Optional[
        Dict[str, Union[str, List[str]]]
    ] = None,
    unique_rows: bool = True,
) -> BioMartQueryData

Execute a BioMart query.

Parameters:

Name	Type	Description	Default
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`attributes`	`Optional[List[str]]`	List of attributes to retrieve.	`None`
`filters`	`Optional[Dict[str, Union[str, List[str]]]]`	Dict of filter name to value(s).	`None`
`unique_rows`	`bool`	Whether to return unique rows only.	`True`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData with query results.
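
The public BioMart service accepts queries as an XML document, and the arguments above correspond to its `Dataset`, `Filter`, and `Attribute` elements. The sketch below shows one plausible mapping using the standard library; `build_biomart_xml` is a hypothetical helper, and the fixed values (`formatter`, `header`, `datasetConfigVersion`) are assumptions rather than what biodbs necessarily sends.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def build_biomart_xml(
    dataset: str,
    attributes: List[str],
    filters: Dict[str, Union[str, List[str]]],
    unique_rows: bool = True,
) -> str:
    """Hypothetical helper: serialize a query in BioMart's XML format."""
    query = ET.Element(
        "Query",
        virtualSchemaName="default",
        formatter="TSV",
        header="1",
        uniqueRows="1" if unique_rows else "0",
        datasetConfigVersion="0.6",
    )
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        # Multiple filter values are sent as one comma-separated string.
        joined = ",".join(value) if isinstance(value, list) else value
        ET.SubElement(ds, "Filter", name=name, value=joined)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ET.tostring(query, encoding="unicode")


xml_query = build_biomart_xml(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"ensembl_gene_id": ["ENSG00000141510", "ENSG00000012048"]},
)
print(xml_query)
```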

batch_query ¶

batch_query(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    attributes: Optional[List[str]] = None,
    filter_name: str = "ensembl_gene_id",
    filter_values: List[str] = None,
    batch_size: int = 500,
    max_workers: int = 4,
    show_progress: bool = True,
) -> BioMartQueryData

Execute a batched BioMart query for many filter values.

BioMart has limits on query size, so large filter lists are split into batches and queried in parallel using threads.

Parameters:

Name	Type	Description	Default
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`attributes`	`Optional[List[str]]`	List of attributes to retrieve.	`None`
`filter_name`	`str`	Name of the filter to batch.	`'ensembl_gene_id'`
`filter_values`	`List[str]`	List of filter values.	`None`
`batch_size`	`int`	Number of values per batch.	`500`
`max_workers`	`int`	Number of parallel workers.	`4`
`show_progress`	`bool`	Whether to show progress bar.	`True`

Returns:

Type	Description
`BioMartQueryData`	Combined BioMartQueryData with all results.
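
The split-and-parallelize idea described above can be sketched with the standard library alone. This is an illustration of the technique, not the biodbs implementation: `chunked`, `run_batched`, and `fake_query` are hypothetical names, and `fake_query` stands in for the network call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunked(values: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split values into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [list(values[i:i + batch_size]) for i in range(0, len(values), batch_size)]


def run_batched(
    values: Sequence[str],
    query_one_batch: Callable[[List[str]], List[dict]],
    batch_size: int = 500,
    max_workers: int = 4,
) -> List[dict]:
    """Query each batch in a worker thread and concatenate rows in batch order."""
    batches = chunked(values, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, so row order is stable.
        per_batch = list(pool.map(query_one_batch, batches))
    return [row for rows in per_batch for row in rows]


def fake_query(batch: List[str]) -> List[dict]:
    # Stand-in for a network call; echoes each ID back as a row.
    return [{"ensembl_gene_id": gene_id} for gene_id in batch]


ids = [f"ENSG{n:011d}" for n in range(1, 1202)]  # 1201 synthetic IDs
rows = run_batched(ids, fake_query)
print(len(chunked(ids, 500)), len(rows))
```

Threads suit this workload because each batch spends most of its time waiting on the server rather than computing.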

get_genes ¶

get_genes(
    ids: List[str],
    attributes: Optional[List[str]] = None,
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get gene information by Ensembl gene IDs.

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of Ensembl gene IDs.	required
`attributes`	`Optional[List[str]]`	Attributes to retrieve. Defaults to common gene attributes.	`None`
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`batch_size`	`int`	Batch size for large queries.	`500`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData with gene information.

get_genes_by_name ¶

get_genes_by_name(
    names: List[str],
    attributes: Optional[List[str]] = None,
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get gene information by gene names (symbols).

Parameters:

Name	Type	Description	Default
`names`	`List[str]`	List of gene names/symbols.	required
`attributes`	`Optional[List[str]]`	Attributes to retrieve.	`None`
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`batch_size`	`int`	Batch size for large queries.	`500`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData with gene information.

get_genes_by_chromosome ¶

get_genes_by_chromosome(
    chromosome: str,
    start: Optional[int] = None,
    end: Optional[int] = None,
    attributes: Optional[List[str]] = None,
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
) -> BioMartQueryData

Get genes on a chromosome, optionally within a region.

Parameters:

Name	Type	Description	Default
`chromosome`	`str`	Chromosome name (e.g., "1", "X", "MT").	required
`start`	`Optional[int]`	Start position (optional).	`None`
`end`	`Optional[int]`	End position (optional).	`None`
`attributes`	`Optional[List[str]]`	Attributes to retrieve.	`None`
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData with genes in the region.

get_transcripts ¶

get_transcripts(
    gene_ids: List[str],
    attributes: Optional[List[str]] = None,
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get transcript information for genes.

Parameters:

Name	Type	Description	Default
`gene_ids`	`List[str]`	List of Ensembl gene IDs.	required
`attributes`	`Optional[List[str]]`	Attributes to retrieve.	`None`
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`batch_size`	`int`	Batch size for large queries.	`500`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData with transcript information.

get_go_annotations ¶

get_go_annotations(
    gene_ids: List[str],
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get Gene Ontology annotations for genes.

Parameters:

Name	Type	Description	Default
`gene_ids`	`List[str]`	List of Ensembl gene IDs.	required
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`batch_size`	`int`	Batch size for large queries.	`500`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData with GO annotations.

get_homologs ¶

get_homologs(
    gene_ids: List[str],
    target_species: str = "mmusculus",
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get homolog information for genes.

Parameters:

Name	Type	Description	Default
`gene_ids`	`List[str]`	List of Ensembl gene IDs.	required
`target_species`	`str`	Target species for homologs (e.g., "mmusculus").	`'mmusculus'`
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`batch_size`	`int`	Batch size for large queries.	`500`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData with homolog information.

convert_ids ¶

convert_ids(
    ids: List[str],
    from_type: str = "ensembl_gene_id",
    to_type: str = "external_gene_name",
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Convert between different ID types.

Common ID types:

ensembl_gene_id
ensembl_transcript_id
ensembl_peptide_id
external_gene_name
entrezgene_id
uniprot_gn_id
hgnc_symbol
hgnc_id
refseq_mrna
refseq_peptide

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of IDs to convert.	required
`from_type`	`str`	Source ID type (also used as filter).	`'ensembl_gene_id'`
`to_type`	`str`	Target ID type.	`'external_gene_name'`
`dataset`	`Union[str, BioMartDataset]`	Dataset name.	`hsapiens_gene`
`batch_size`	`int`	Batch size for large queries.	`500`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData with ID mappings.
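
ID conversion is not always one-to-one, so it can help to group the mapping rows by source ID before using them. The sketch below works on plain `(source, target)` tuples because the row-level interface of `BioMartQueryData` is not shown here; `group_mappings` is a hypothetical helper and the sample rows are hand-written.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_mappings(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source_id, target_id) rows into source -> list of targets."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source_id, target_id in pairs:
        targets = grouped[source_id]  # registers the source even if unmapped
        # Ignore empty targets and repeated rows.
        if target_id and target_id not in targets:
            targets.append(target_id)
    return dict(grouped)


# Hand-written sample rows, not real query output.
sample = [
    ("ENSG00000141510", "TP53"),
    ("ENSG00000012048", "BRCA1"),
    ("ENSG00000012048", "BRCA1"),  # duplicate row
    ("UNMAPPED_EXAMPLE_ID", ""),   # placeholder ID with no target
]
mapping = group_mappings(sample)
print(mapping)
```

Keeping unmapped sources in the result with an empty list makes missing conversions visible instead of silently dropping them.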

KEGG_Fetcher¶

KEGG_Fetcher ¶

KEGG_Fetcher(**data_manager_kws: Any)

Fetcher for KEGG REST API.

KEGG (Kyoto Encyclopedia of Genes and Genomes) provides access to:

Pathway information and diagrams
Gene and protein entries
Compound and drug data
Disease information
Organism-specific pathway lists
ID conversion between databases

Operations:

info: Get database statistics
list: List database entries
find: Search entries by keyword
get: Retrieve specific entries
conv: Convert IDs between databases
link: Find linked entries across databases
ddi: Drug-drug interactions

Example

fetcher = KEGG_Fetcher()

# Get database info
info = fetcher.get("info", database="pathway")
print(info.text)

# List human pathways
pathways = fetcher.get("list", database="pathway", organism="hsa")
print(pathways.to_dataframe())

# Search for genes
results = fetcher.get("find", database="genes", query="tp53")

# Get specific entries
entries = fetcher.get("get", dbentries=["hsa:7157", "hsa:672"])
for record in entries.records:
    print(record.get("ENTRY"), record.get("NAME"))

# Convert KEGG IDs to NCBI Gene IDs
mapping = fetcher.get("conv", target_db="ncbi-geneid", dbentries=["hsa:7157"])

Initialize KEGG fetcher.

Parameters:

Name	Type	Description	Default
`**data_manager_kws`	`Any`	Keyword arguments for KEGGDataManager (e.g., storage_path for stream_to_storage method).	`{}`

get ¶

get(operation: str, **kwargs: Any) -> KEGGFetchedData

Fetch data from KEGG REST API.

Parameters:

Name	Type	Description	Default
`operation`	`str`	KEGG operation (info, list, find, get, conv, link, ddi).	required
`**kwargs`	`Any`	Operation-specific parameters (database, query, dbentries, etc.).	`{}`

Returns:

Type	Description
`KEGGFetchedData`	KEGGFetchedData with parsed results.

get_all ¶

get_all(
    operation: str,
    dbentries: List[str],
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    batch_size: int = DEFAULT_BATCH_SIZE,
    rate_limit_per_second: int = 3,
    get_option: Optional[str] = None,
    **kwargs: Any,
) -> Union[KEGGFetchedData, Path]

Fetch data for many entries by batching and concurrent requests.

KEGG limits certain operations (get, conv, link, ddi) to a small number of entries per request. This method splits a large entry list into batches and fetches them concurrently.

Parameters:

Name	Type	Description	Default
`operation`	`str`	KEGG operation (`get`, `conv`, `link`, `ddi`).	required
`dbentries`	`List[str]`	List of database entry IDs to fetch.	required
`method`	`Literal['concat', 'stream_to_storage']`	`"concat"` returns a single `KEGGFetchedData`. `"stream_to_storage"` writes batches to storage and returns the output file `Path` (requires `storage_path` in the constructor).	`'concat'`
`batch_size`	`int`	Entries per request (default 10, KEGG's limit).	`DEFAULT_BATCH_SIZE`
`rate_limit_per_second`	`int`	Max requests per second (default 3 to be conservative with KEGG).	`3`
`get_option`	`Optional[str]`	For `get` operation, the output format (aaseq, ntseq, image, json, etc.).	`None`
`**kwargs`	`Any`	Additional parameters (target_db for conv/link, etc.).	`{}`

Returns:

Type	Description
`Union[KEGGFetchedData, Path]`	Combined KEGGFetchedData or Path to output file.

Example

fetcher = KEGG_Fetcher(storage_path="./data")
genes = ["hsa:10458", "hsa:7157", "hsa:672", ...]  # 100+ genes
data = fetcher.get_all("get", genes)
print(len(data.records))
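
To make the batching concrete: on the public KEGG REST service, several entries are requested at once by joining them with `+` in the URL path. The sketch below only builds the per-batch URLs; `kegg_get_urls` is a hypothetical helper, the entry IDs are synthetic, and the way biodbs schedules the requests is not shown.

```python
from typing import List

KEGG_REST = "https://rest.kegg.jp"  # public KEGG REST server
KEGG_MAX_ENTRIES = 10  # per-request limit quoted in the parameter table


def kegg_get_urls(dbentries: List[str], batch_size: int = KEGG_MAX_ENTRIES) -> List[str]:
    """Hypothetical helper: one 'get' URL per batch, entries joined with '+'."""
    if not 1 <= batch_size <= KEGG_MAX_ENTRIES:
        raise ValueError("batch_size must be between 1 and 10")
    urls = []
    for i in range(0, len(dbentries), batch_size):
        batch = dbentries[i:i + batch_size]
        urls.append(f"{KEGG_REST}/get/{'+'.join(batch)}")
    return urls


entries = [f"hsa:{n}" for n in range(1, 26)]  # 25 synthetic entry IDs
urls = kegg_get_urls(entries)
print(len(urls))
print(urls[-1])
```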

ChEMBL_Fetcher¶

ChEMBL_Fetcher ¶

ChEMBL_Fetcher(**data_manager_kws)

Fetcher for ChEMBL REST API.

ChEMBL provides bioactivity data for drug-like molecules including:

Molecules and their properties
Bioactivity measurements
Targets (proteins, cell lines, organisms)
Assays and documents
Drug information and indications

Example

fetcher = ChEMBL_Fetcher()

# Get a specific molecule by ChEMBL ID
aspirin = fetcher.get(resource="molecule", chembl_id="CHEMBL25")
print(aspirin.results[0]["pref_name"])

# Search for molecules
results = fetcher.get(
    resource="molecule",
    search_query="aspirin",
    limit=10
)

# Filter activities by target
activities = fetcher.get(
    resource="activity",
    filters={"target_chembl_id": "CHEMBL240"},
    limit=100
)

# Similarity search
similar = fetcher.get(
    resource="similarity",
    smiles="CC(=O)Oc1ccccc1C(=O)O",  # Aspirin SMILES
    similarity_threshold=70,
    limit=50
)

get ¶

get(
    resource: str,
    chembl_id: Optional[str] = None,
    search_query: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    smiles: Optional[str] = None,
    similarity_threshold: Optional[int] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    format: str = "json",
) -> ChEMBLFetchedData

Fetch data from ChEMBL REST API.

Parameters:

Name	Type	Description	Default
`resource`	`str`	ChEMBL resource (molecule, activity, target, etc.).	required
`chembl_id`	`Optional[str]`	Optional ChEMBL ID for single-entry lookup.	`None`
`search_query`	`Optional[str]`	Optional full-text search query.	`None`
`filters`	`Optional[Dict[str, Any]]`	Optional field filters (e.g., {"max_phase": 4}).	`None`
`smiles`	`Optional[str]`	SMILES string for similarity/substructure search.	`None`
`similarity_threshold`	`Optional[int]`	Threshold for similarity search (40-100).	`None`
`limit`	`Optional[int]`	Max records to return (1-1000).	`None`
`offset`	`Optional[int]`	Pagination offset.	`None`
`format`	`str`	Output format (json or xml).	`'json'`

Returns:

Type	Description
`ChEMBLFetchedData`	ChEMBLFetchedData with parsed results.

get_all ¶

get_all(
    resource: str,
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    limit_per_page: int = 1000,
    max_records: Optional[int] = None,
    rate_limit_per_second: int = 5,
    search_query: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> Union[ChEMBLFetchedData, Path]

Fetch multiple pages of results concurrently.

Parameters:

Name	Type	Description	Default
`resource`	`str`	ChEMBL resource (molecule, activity, target, etc.).	required
`method`	`Literal['concat', 'stream_to_storage']`	`"concat"` returns a single ChEMBLFetchedData. `"stream_to_storage"` streams each batch to storage and returns the output file Path.	`'concat'`
`limit_per_page`	`int`	Records per request (default 1000, max 1000).	`1000`
`max_records`	`Optional[int]`	Total records to fetch. None means fetch all.	`None`
`rate_limit_per_second`	`int`	Max concurrent requests per second.	`5`
`search_query`	`Optional[str]`	Optional full-text search query.	`None`
`filters`	`Optional[Dict[str, Any]]`	Optional field filters.	`None`
`**kwargs`	`Any`	Additional parameters.	`{}`

Returns:

Type	Description
`Union[ChEMBLFetchedData, Path]`	Combined ChEMBLFetchedData or Path to output file.
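
How `limit_per_page` and `max_records` interact can be worked out as a list of offset/limit pairs. The sketch below assumes the total record count is already known (for example from the first response) and uses a hypothetical helper, `page_plan`; it illustrates the arithmetic only and is not the biodbs implementation.

```python
from typing import List, Optional, Tuple


def page_plan(
    total_count: int,
    limit_per_page: int = 1000,
    max_records: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (offset, limit) pairs covering the records to fetch."""
    if limit_per_page < 1:
        raise ValueError("limit_per_page must be positive")
    wanted = total_count if max_records is None else min(total_count, max_records)
    plan = []
    for offset in range(0, wanted, limit_per_page):
        # The final page may be shorter than limit_per_page.
        plan.append((offset, min(limit_per_page, wanted - offset)))
    return plan


print(page_plan(total_count=2500))
print(page_plan(total_count=2500, max_records=1200))
```

Because every page is described up front, the pages can be requested in any order and reassembled by offset.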

get_molecule ¶

get_molecule(chembl_id: str) -> ChEMBLFetchedData

Get a single molecule by ChEMBL ID.

get_target ¶

get_target(chembl_id: str) -> ChEMBLFetchedData

Get a single target by ChEMBL ID.

search_molecules ¶

search_molecules(
    query: str, limit: int = 20
) -> ChEMBLFetchedData

Search molecules by name or description.

get_activities_for_target ¶

get_activities_for_target(
    target_chembl_id: str, limit: int = 1000
) -> ChEMBLFetchedData

Get bioactivity data for a specific target.

get_activities_for_molecule ¶

get_activities_for_molecule(
    molecule_chembl_id: str, limit: int = 1000
) -> ChEMBLFetchedData

Get bioactivity data for a specific molecule.

similarity_search ¶

similarity_search(
    smiles: str, threshold: int = 70, limit: int = 100
) -> ChEMBLFetchedData

Find molecules similar to a given SMILES structure.

substructure_search ¶

substructure_search(
    smiles: str, limit: int = 100
) -> ChEMBLFetchedData

Find molecules containing a given substructure.

get_approved_drugs ¶

get_approved_drugs(limit: int = 1000) -> ChEMBLFetchedData

Get approved drugs (max_phase = 4).

get_drug_indications ¶

get_drug_indications(
    molecule_chembl_id: str, limit: int = 100
) -> ChEMBLFetchedData

Get indications for a specific drug/molecule.

get_mechanisms ¶

get_mechanisms(
    molecule_chembl_id: str, limit: int = 100
) -> ChEMBLFetchedData

Get mechanisms of action for a specific molecule.

QuickGO_Fetcher¶

QuickGO_Fetcher ¶

QuickGO_Fetcher(**data_manager_kws: Any)

Fetcher for QuickGO API (GO annotations, ontology, gene products).

QuickGO provides access to:

Gene Ontology term information
GO annotations for genes/proteins
Gene product information
Annotation downloads in various formats (GAF, GPAD, TSV)

Categories:

ontology: GO term search and retrieval
annotation: GO annotation search and download
geneproduct: Gene product information

Example

fetcher = QuickGO_Fetcher()

# Search GO terms
data = fetcher.get(
    category="ontology",
    endpoint="search",
    query="apoptosis"
)

# Get GO term by ID
data = fetcher.get(
    category="ontology",
    endpoint="terms/{ids}",
    ids=["GO:0008150", "GO:0003674"]
)

# Search annotations for human
data = fetcher.get(
    category="annotation",
    endpoint="search",
    goId="GO:0006915",  # apoptotic process
    taxonId=9606
)
df = data.as_dataframe()

Initialize QuickGO fetcher.

Parameters:

Name	Type	Description	Default
`**data_manager_kws`	`Any`	Keyword arguments for QuickGODataManager (e.g., storage_path for stream_to_storage method).	`{}`

get ¶

get(
    category: str, endpoint: str, **kwargs: Any
) -> QuickGOFetchedData

Fetch data from QuickGO API.

Parameters:

Name	Type	Description	Default
`category`	`str`	QuickGO category (ontology, annotation, geneproduct).	required
`endpoint`	`str`	API endpoint (search, terms/{ids}, downloadSearch, etc.).	required
`**kwargs`	`Any`	Endpoint-specific parameters.	`{}`

Returns:

Type	Description
`QuickGOFetchedData`	QuickGOFetchedData with parsed results.

get_all ¶

get_all(
    category: str,
    endpoint: str,
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    limit_per_page: int = DEFAULT_LIMIT,
    max_records: Optional[int] = None,
    rate_limit_per_second: int = 5,
    **kwargs: Any,
) -> Union[QuickGOFetchedData, Path]

Fetch multiple pages of results concurrently.

Parameters:

Name	Type	Description	Default
`category`	`str`	QuickGO category (ontology, annotation, geneproduct).	required
`endpoint`	`str`	API endpoint (search, etc.). Note: downloadSearch doesn't support pagination, use get() directly.	required
`method`	`Literal['concat', 'stream_to_storage']`	`"concat"` returns a single QuickGOFetchedData. `"stream_to_storage"` streams each batch to storage and returns the output file Path.	`'concat'`
`limit_per_page`	`int`	Records per request (default 100, max 10000).	`DEFAULT_LIMIT`
`max_records`	`Optional[int]`	Total records to fetch. None means fetch all.	`None`
`rate_limit_per_second`	`int`	Max concurrent requests per second.	`5`
`**kwargs`	`Any`	Forwarded to the API (goId, taxonId, etc.).	`{}`

Returns:

Type	Description
`Union[QuickGOFetchedData, Path]`	Combined QuickGOFetchedData or Path to output file.
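To make the interaction between `limit_per_page` and `max_records` concrete, here is a small sketch of how a page plan could be derived once the total number of hits is known. `plan_pages` is a hypothetical helper, and both the 1-based page numbering and the availability of a total hit count are assumptions rather than documented behaviour of `get_all`.

```python
from math import ceil
from typing import List, Optional, Tuple


def plan_pages(total_hits: int, limit_per_page: int = 100,
               max_records: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (page_number, records_expected) pairs, pages numbered from 1."""
    if limit_per_page < 1:
        raise ValueError("limit_per_page must be positive")
    # max_records=None means "fetch everything that matched".
    wanted = total_hits if max_records is None else min(total_hits, max_records)
    pages = []
    for page in range(1, ceil(wanted / limit_per_page) + 1):
        already = (page - 1) * limit_per_page
        pages.append((page, min(limit_per_page, wanted - already)))
    return pages


print(plan_pages(total_hits=250, limit_per_page=100))
# [(1, 100), (2, 100), (3, 50)]
print(plan_pages(total_hits=250, limit_per_page=100, max_records=120))
# [(1, 100), (2, 20)]
```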

HPA_Fetcher¶

HPA_Fetcher ¶

HPA_Fetcher(**data_manager_kws)

Fetcher for Human Protein Atlas data.

The Human Protein Atlas provides proteomics data including:

Tissue expression (protein and RNA)
Subcellular location
Cell type expression
Blood cell expression
Brain region expression
Cancer/pathology data

Example

fetcher = HPA_Fetcher()

# Get gene data by Ensembl ID
tp53 = fetcher.get_gene("ENSG00000141510")
print(tp53.results[0])

# Search for genes
results = fetcher.search("TP53")
print(results.get_gene_names())

# Get specific columns for genes
data = fetcher.search_download(
    search="TP53",
    columns=["g", "gs", "eg", "gd", "rnats_s"]
)
df = data.as_dataframe()

# Get expression data with default columns
expr = fetcher.get_expression("BRCA1")

# Get subcellular location data
loc = fetcher.get_subcellular_location("ENSG00000141510")

get_gene ¶

get_gene(
    ensembl_id: str, format: str = "json"
) -> HPAFetchedData

Get gene data by Ensembl ID.

Parameters:

Name	Type	Description	Default
`ensembl_id`	`str`	Ensembl gene ID (e.g., "ENSG00000141510").	required
`format`	`str`	Output format (json, tsv, xml).	`'json'`

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with gene information.

get_genes ¶

get_genes(
    ensembl_ids: List[str],
    format: str = "json",
    rate_limit_per_second: int = 5,
) -> HPAFetchedData

Get data for multiple genes by Ensembl IDs.

Parameters:

Name	Type	Description	Default
`ensembl_ids`	`List[str]`	List of Ensembl gene IDs.	required
`format`	`str`	Output format.	`'json'`
`rate_limit_per_second`	`int`	Rate limit for API calls.	`5`

Returns:

Type	Description
`HPAFetchedData`	Combined HPAFetchedData.
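The `rate_limit_per_second` parameter caps how quickly per-gene requests are issued. The sketch below shows one simple way such a cap can be enforced by spacing calls evenly. `RateLimiter` is a hypothetical class written for illustration and is not claimed to be how `get_genes` is implemented.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Space consecutive calls at least 1/per_second seconds apart."""

    def __init__(self, per_second: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed: Optional[float] = None  # earliest start of next call

    def wait(self) -> None:
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            # Too early: pause until the next slot opens.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


limiter = RateLimiter(per_second=5)
for ensembl_id in ["ENSG00000141510", "ENSG00000012048"]:
    limiter.wait()
    print("would request", ensembl_id)  # placeholder for the real API call
```

The injectable `clock` and `sleep` arguments exist only so the spacing logic can be exercised without real delays.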

search ¶

search(
    query: str, format: str = "json", compress: str = "no"
) -> HPAFetchedData

Search for genes in HPA.

Parameters:

Name	Type	Description	Default
`query`	`str`	Search query (gene name, etc.).	required
`format`	`str`	Output format (json, tsv, xml).	`'json'`
`compress`	`str`	Whether to compress response (yes/no).	`'no'`

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with search results.

search_download ¶

search_download(
    search: str,
    columns: Optional[List[str]] = None,
    format: str = "json",
    compress: str = "no",
) -> HPAFetchedData

Fetch customized data using the search_download API.

This is the most flexible way to retrieve HPA data, allowing selection of specific columns.

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required
`columns`	`Optional[List[str]]`	List of column specifiers (see HPA_COLUMNS). If None, uses DEFAULT_GENE_COLUMNS.	`None`
`format`	`str`	Output format (json or tsv).	`'json'`
`compress`	`str`	Whether to compress response (yes/no).	`'no'`

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with requested columns.
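For orientation, the parameters above correspond closely to the query string of the Human Protein Atlas `search_download` endpoint. The following sketch builds such a URL by hand. `build_search_download_url` is a hypothetical helper, and the base URL is an assumption taken from the public HPA documentation rather than from this library.

```python
from typing import List
from urllib.parse import urlencode

# Assumption: endpoint location as published by the Human Protein Atlas.
HPA_SEARCH_DOWNLOAD_URL = "https://www.proteinatlas.org/api/search_download.php"


def build_search_download_url(search: str, columns: List[str],
                              format: str = "json", compress: str = "no") -> str:
    """Assemble a search_download URL from a query and column specifiers."""
    params = {
        "search": search,
        "format": format,
        "columns": ",".join(columns),  # column specifiers are comma-separated
        "compress": compress,
    }
    # safe="," keeps the commas between column codes unescaped.
    return f"{HPA_SEARCH_DOWNLOAD_URL}?{urlencode(params, safe=',')}"


print(build_search_download_url("TP53", ["g", "gs", "eg", "gd", "rnats_s"]))
```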

get_all ¶

get_all(
    search: str,
    columns: Optional[List[str]] = None,
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    format: str = "json",
    **kwargs: Any,
) -> Union[HPAFetchedData, Path]

Fetch data with batching support.

Note: HPA's search_download API doesn't natively support pagination, so this method is mainly useful for storing results.

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required
`columns`	`Optional[List[str]]`	List of column specifiers.	`None`
`method`	`Literal['concat', 'stream_to_storage']`	"concat" or "stream_to_storage".	`'concat'`
`format`	`str`	Output format.	`'json'`
`**kwargs`	`Any`	Additional parameters.	`{}`

Returns:

Type	Description
`Union[HPAFetchedData, Path]`	HPAFetchedData or Path to stored file.

get_expression ¶

get_expression(
    search: str, columns: Optional[List[str]] = None
) -> HPAFetchedData

Get expression data for gene(s).

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required
`columns`	`Optional[List[str]]`	Expression columns to retrieve. If None, uses DEFAULT_EXPRESSION_COLUMNS.	`None`

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with expression data.

get_subcellular_location ¶

get_subcellular_location(
    search: str, columns: Optional[List[str]] = None
) -> HPAFetchedData

Get subcellular location data for gene(s).

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required
`columns`	`Optional[List[str]]`	Subcellular location columns to retrieve. If None, uses DEFAULT_SUBCELLULAR_COLUMNS.	`None`

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with subcellular location data.

get_pathology ¶

get_pathology(
    search: str, columns: Optional[List[str]] = None
) -> HPAFetchedData

Get pathology/cancer prognostics data for gene(s).

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required
`columns`	`Optional[List[str]]`	Pathology columns to retrieve. If None, uses DEFAULT_PATHOLOGY_COLUMNS.	`None`

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with pathology data.

get_protein_class ¶

get_protein_class(search: str) -> HPAFetchedData

Get protein class information for gene(s).

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with protein class information.

get_tissue_expression ¶

get_tissue_expression(
    search: str, tissues: Optional[List[str]] = None
) -> HPAFetchedData

Get tissue-specific RNA expression data.

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required
`tissues`	`Optional[List[str]]`	List of tissue column names to include. If None, gets general tissue expression info.	`None`

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with tissue expression data.

get_blood_expression ¶

get_blood_expression(search: str) -> HPAFetchedData

Get blood cell expression data for gene(s).

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with blood cell expression data.

get_brain_expression ¶

get_brain_expression(search: str) -> HPAFetchedData

Get brain region expression data for gene(s).

Parameters:

Name	Type	Description	Default
`search`	`str`	Gene search query.	required

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData with brain region expression data.

download_bulk_data ¶

download_bulk_data(
    file_type: str = "json",
    version: Optional[str] = None,
    output_path: Optional[str] = None,
) -> Path

Download bulk HPA data file.

Parameters:

Name	Type	Description	Default
`file_type`	`str`	File type to download (tsv, json, xml).	`'json'`
`version`	`Optional[str]`	HPA version number (e.g., "24"). None for latest.	`None`
`output_path`	`Optional[str]`	Path to save file. If None, saves to data manager path.	`None`

Returns:

Type	Description
`Path`	Path to downloaded file.

list_columns `staticmethod` ¶

list_columns() -> Dict[str, str]

List available column specifiers for search_download API.

Returns:

Type	Description
`Dict[str, str]`	Dictionary mapping column codes to descriptions.

NCBI_Fetcher¶

NCBI_Fetcher ¶

NCBI_Fetcher(api_key: Optional[str] = None)

Fetcher for NCBI Datasets API.

Provides access to NCBI gene, taxonomy, and genome data via the Datasets REST API v2.

Example

fetcher = NCBI_Fetcher()

# Get gene information by NCBI Gene ID
genes = fetcher.get_genes_by_id([7157, 672])  # TP53, BRCA1
print(genes.as_dataframe())

# Get gene by symbol and taxon
genes = fetcher.get_genes_by_symbol(["TP53", "BRCA1"], taxon="human")

# Get taxonomy information
tax = fetcher.get_taxonomy([9606, 10090])  # Human, mouse
print(tax.as_dataframe())

# Translate gene symbols to IDs
mapping = fetcher.symbol_to_id(["TP53", "BRCA1"], taxon="human")

Initialize NCBI fetcher.

Parameters:

Name	Type	Description	Default
`api_key`	`Optional[str]`	NCBI API key for higher rate limits. Can also be set via NCBI_API_KEY environment variable.	`None`

get_genes_by_id ¶

get_genes_by_id(
    gene_ids: List[int],
    returned_content: Optional[str] = None,
    page_size: int = 100,
    query: Optional[str] = None,
    types: Optional[List[str]] = None,
) -> NCBIGeneFetchedData

Get gene data reports by NCBI Gene IDs.

Parameters:

Name	Type	Description	Default
`gene_ids`	`List[int]`	List of NCBI Gene IDs (e.g., [7157, 672]).	required
`returned_content`	`Optional[str]`	Content type (COMPLETE, IDS_ONLY, COUNTS_ONLY).	`None`
`page_size`	`int`	Results per page (max 1000).	`100`
`query`	`Optional[str]`	Additional search query.	`None`
`types`	`Optional[List[str]]`	Gene type filter (e.g., ["PROTEIN_CODING"]).	`None`

Returns:

Type	Description
`NCBIGeneFetchedData`	NCBIGeneFetchedData with gene reports.

Example

fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_id([7157, 672])
print(genes.get_gene_symbols())
# ['TP53', 'BRCA1']

get_genes_by_symbol ¶

get_genes_by_symbol(
    symbols: List[str],
    taxon: Union[int, str] = "human",
    returned_content: Optional[str] = None,
    page_size: int = 100,
) -> NCBIGeneFetchedData

Get gene data reports by gene symbols and taxon.

Parameters:

Name	Type	Description	Default
`symbols`	`List[str]`	List of gene symbols (e.g., ["TP53", "BRCA1"]).	required
`taxon`	`Union[int, str]`	Taxon ID, common name, or scientific name.	`'human'`
`returned_content`	`Optional[str]`	Content type.	`None`
`page_size`	`int`	Results per page.	`100`

Returns:

Type	Description
`NCBIGeneFetchedData`	NCBIGeneFetchedData with gene reports.

Example

fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_symbol(["TP53", "BRCA1"], taxon="human")
print(genes.to_id_mapping())

get_genes_by_accession ¶

get_genes_by_accession(
    accessions: List[str],
    returned_content: Optional[str] = None,
    page_size: int = 100,
) -> NCBIGeneFetchedData

Get gene data reports by RefSeq accessions.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of RefSeq accessions (e.g., ["NM_000546.6"]).	required
`returned_content`	`Optional[str]`	Content type.	`None`
`page_size`	`int`	Results per page.	`100`

Returns:

Type	Description
`NCBIGeneFetchedData`	NCBIGeneFetchedData with gene reports.

get_genes_by_taxon ¶

get_genes_by_taxon(
    taxon: Union[int, str],
    query: Optional[str] = None,
    types: Optional[List[str]] = None,
    page_size: int = 100,
    page_token: Optional[str] = None,
) -> NCBIGeneFetchedData

Get gene data reports by taxon.

Parameters:

Name	Type	Description	Default
`taxon`	`Union[int, str]`	Taxon ID, common name, or scientific name.	required
`query`	`Optional[str]`	Search query for gene name/symbol/description.	`None`
`types`	`Optional[List[str]]`	Gene type filter.	`None`
`page_size`	`int`	Results per page.	`100`
`page_token`	`Optional[str]`	Token for pagination.	`None`

Returns:

Type	Description
`NCBIGeneFetchedData`	NCBIGeneFetchedData with gene reports.

Example

fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_taxon("human", query="kinase")
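The `page_token` parameter implies a follow-the-token loop for large result sets. How the next token is exposed on `NCBIGeneFetchedData` is not stated here, so the sketch below uses a stub in place of the real call. The `records` and `next_page_token` keys, `iterate_pages`, and `fake_fetch` are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def iterate_pages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[List[dict]]:
    """Yield the records of each page until no next-page token is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield page["records"]
        token = page.get("next_page_token")
        if not token:
            break


# Stand-in for the real request: three canned pages keyed by token.
_FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"records": [{"symbol": "AATK"}], "next_page_token": "t1"},
    "t1": {"records": [{"symbol": "ABL1"}], "next_page_token": "t2"},
    "t2": {"records": [{"symbol": "AKT1"}]},
}


def fake_fetch(token: Optional[str]) -> Page:
    return _FAKE_PAGES[token]


symbols = [r["symbol"] for records in iterate_pages(fake_fetch) for r in records]
print(symbols)  # ['AATK', 'ABL1', 'AKT1']
```

In real use, `fake_fetch` would be replaced by a function that calls `get_genes_by_taxon` with the current token and returns the records together with the next token.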

get_taxonomy ¶

get_taxonomy(
    taxons: List[Union[int, str]], page_size: int = 100
) -> NCBITaxonomyFetchedData

Get taxonomy data reports.

Parameters:

Name	Type	Description	Default
`taxons`	`List[Union[int, str]]`	List of taxonomy IDs or names.	required
`page_size`	`int`	Results per page.	`100`

Returns:

Type	Description
`NCBITaxonomyFetchedData`	NCBITaxonomyFetchedData with taxonomy reports.

Example

fetcher = NCBI_Fetcher()
tax = fetcher.get_taxonomy([9606, 10090])
print(tax.as_dataframe())

get_genome_by_accession ¶

get_genome_by_accession(
    accessions: List[str], page_size: int = 100
) -> NCBIGenomeFetchedData

Get genome assembly data reports by accession.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of assembly accessions (e.g., ["GCF_000001405.40"]).	required
`page_size`	`int`	Results per page.	`100`

Returns:

Type	Description
`NCBIGenomeFetchedData`	NCBIGenomeFetchedData with genome reports.

Example

fetcher = NCBI_Fetcher()
genomes = fetcher.get_genome_by_accession(["GCF_000001405.40"])

get_genome_by_taxon ¶

get_genome_by_taxon(
    taxon: Union[int, str],
    page_size: int = 100,
    page_token: Optional[str] = None,
    reference_only: bool = False,
    assembly_source: Optional[str] = None,
) -> NCBIGenomeFetchedData

Get genome assembly data reports by taxon.

Parameters:

Name	Type	Description	Default
`taxon`	`Union[int, str]`	Taxon ID, common name, or scientific name.	required
`page_size`	`int`	Results per page.	`100`
`page_token`	`Optional[str]`	Token for pagination.	`None`
`reference_only`	`bool`	If True, only return reference genomes.	`False`
`assembly_source`	`Optional[str]`	Filter by source ("refseq", "genbank", "all").	`None`

Returns:

Type	Description
`NCBIGenomeFetchedData`	NCBIGenomeFetchedData with genome reports.

get_version ¶

get_version() -> str

Get NCBI Datasets API version.

Returns:

Type	Description
`str`	Version string.

symbol_to_id ¶

symbol_to_id(
    symbols: List[str], taxon: Union[int, str] = "human"
) -> Dict[str, int]

Convert gene symbols to NCBI Gene IDs.

Parameters:

Name	Type	Description	Default
`symbols`	`List[str]`	List of gene symbols.	required
`taxon`	`Union[int, str]`	Taxon for the genes.	`'human'`

Returns:

Type	Description
`Dict[str, int]`	Dictionary mapping symbols to gene IDs.

Example

fetcher = NCBI_Fetcher()
mapping = fetcher.symbol_to_id(["TP53", "BRCA1"])
print(mapping)

id_to_symbol ¶

id_to_symbol(gene_ids: List[int]) -> Dict[int, str]

Convert NCBI Gene IDs to gene symbols.

Parameters:

Name	Type	Description	Default
`gene_ids`	`List[int]`	List of NCBI Gene IDs.	required

Returns:

Type	Description
`Dict[int, str]`	Dictionary mapping gene IDs to symbols.

Example

fetcher = NCBI_Fetcher()
mapping = fetcher.id_to_symbol([7157, 672])
print(mapping)

get_gene_info ¶

get_gene_info(
    identifiers: List[Union[int, str]],
    taxon: Union[int, str] = "human",
) -> NCBIGeneFetchedData

Get gene information by mixed identifiers (IDs or symbols).

Automatically detects whether input is gene IDs or symbols and routes to the appropriate endpoint.

Parameters:

Name	Type	Description	Default
`identifiers`	`List[Union[int, str]]`	List of gene IDs (int) or symbols (str).	required
`taxon`	`Union[int, str]`	Taxon for symbol lookups.	`'human'`

Returns:

Type	Description
`NCBIGeneFetchedData`	NCBIGeneFetchedData with gene reports.
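The automatic routing described above amounts to partitioning the input by type before calling the ID-based or symbol-based endpoint. A minimal sketch of that partitioning follows. `split_identifiers` is a hypothetical helper, and treating all-digit strings as Gene IDs is an assumption that the source does not confirm.

```python
from typing import List, Tuple, Union


def split_identifiers(
    identifiers: List[Union[int, str]],
) -> Tuple[List[int], List[str]]:
    """Partition mixed identifiers into (gene_ids, symbols)."""
    gene_ids: List[int] = []
    symbols: List[str] = []
    for item in identifiers:
        if isinstance(item, bool):
            # bool is a subclass of int in Python; reject it explicitly.
            raise TypeError("booleans are not valid gene identifiers")
        if isinstance(item, int):
            gene_ids.append(item)
        elif isinstance(item, str) and item.strip().isdigit():
            # Assumption: an all-digit string such as "672" is meant as a Gene ID.
            gene_ids.append(int(item))
        elif isinstance(item, str):
            symbols.append(item.strip())
        else:
            raise TypeError(f"unsupported identifier: {item!r}")
    return gene_ids, symbols


print(split_identifiers([7157, "BRCA1", "672", "EGFR"]))
# ([7157, 672], ['BRCA1', 'EGFR'])
```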

FDA_Fetcher¶

FDA_Fetcher ¶

FDA_Fetcher(
    api_key: Optional[str] = None,
    limit: Optional[int] = None,
    **data_manager_kws: Any,
)

Fetcher for openFDA API.

The openFDA API provides access to FDA data including:

Drug adverse events (drug/event)
Drug product labeling (drug/label)
Drug recalls and enforcement (drug/enforcement)
Device adverse events and recalls
Food recalls and enforcement

Rate limits:

Without API key: 240 requests/min, 1,000 requests/day per IP
With API key: 240 requests/min, 120,000 requests/day per key

Example

fetcher = FDA_Fetcher()

# Search drug adverse events
events = fetcher.get(
    category="drug",
    endpoint="event",
    search={"patient.drug.medicinalproduct": "aspirin"},
    limit=10
)
df = events.as_dataframe(columns=["receivedate", "patient.patientsex"])

# Get drug labels
labels = fetcher.get(
    category="drug",
    endpoint="label",
    search={"openfda.brand_name": "TYLENOL"},
    limit=5
)

Initialize FDA fetcher.

Parameters:

Name	Type	Description	Default
`api_key`	`Optional[str]`	openFDA API key for higher rate limits (optional).	`None`
`limit`	`Optional[int]`	Default limit for queries. If None, uses API default.	`None`
`**data_manager_kws`	`Any`	Keyword arguments for FDADataManager (e.g., storage_path for stream_to_storage method).	`{}`

get ¶

get(
    category: str,
    endpoint: str,
    stream: Optional[bool] = None,
    **kwargs: Any,
) -> FDAFetchedData

Fetch data from openFDA API.

Parameters:

Name	Type	Description	Default
`category`	`str`	FDA category (e.g., "drug", "device", "food").	required
`endpoint`	`str`	Category endpoint (e.g., "event", "label", "enforcement").	required
`stream`	`Optional[bool]`	If True, stream the response (for large downloads).	`None`
`**kwargs`	`Any`	Query parameters, including: `search` (search query dict, e.g., {"field": "value"}); `limit` (maximum records to return, 1-1000); `skip` (number of records to skip for pagination); `sort` (sort field and direction); `count` (field to count occurrences of); `api_key` (overrides the default API key).	`{}`

Returns:

Type	Description
`FDAFetchedData`	FDAFetchedData with query results.

Example

fetcher = FDA_Fetcher()
data = fetcher.get(
    category="drug",
    endpoint="event",
    search={"patient.drug.medicinalproduct": "aspirin"},
    limit=10
)
print(data)
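The `search` argument is given as a dict, while the openFDA API itself expects a `field:term` expression. How the fetcher serializes the dict is not shown in this reference. The sketch below illustrates one plausible translation using openFDA's documented `field:"value"` and `+AND+` syntax. `build_search` is a hypothetical helper, and combining multiple keys with AND is an assumption.

```python
from typing import Dict


def build_search(search: Dict[str, str]) -> str:
    """Join field/value pairs into an openFDA-style search expression."""
    # Quote each value so multi-word terms are matched as a phrase.
    terms = [f'{field}:"{value}"' for field, value in search.items()]
    # openFDA treats terms joined by "+AND+" as a conjunction.
    return "+AND+".join(terms)


print(build_search({"patient.drug.medicinalproduct": "aspirin"}))
# patient.drug.medicinalproduct:"aspirin"
print(build_search({"openfda.brand_name": "TYLENOL", "openfda.route": "ORAL"}))
# openfda.brand_name:"TYLENOL"+AND+openfda.route:"ORAL"
```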

get_all ¶

get_all(
    category: str,
    endpoint: str,
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    batch_size: int = 1000,
    max_records: Optional[int] = None,
    rate_limit_per_second: int = 4,
    **kwargs: Any,
) -> Union[FDAFetchedData, Path]

Fetch multiple pages of results concurrently.

Uses `schedule_process` to dispatch page requests across threads while staying within the FDA rate limit.

Parameters:

Name	Type	Description	Default
`category`	`str`	FDA category (e.g. `"drug"`).	required
`endpoint`	`str`	FDA endpoint (e.g. `"event"`).	required
`method`	`Literal['concat', 'stream_to_storage']`	`"concat"` accumulates all results in memory and returns a single `FDAFetchedData`. `"stream_to_storage"` streams each batch to the data manager as JSON Lines and returns the output file `Path`.	`'concat'`
`batch_size`	`int`	Records per request (max 1000).	`1000`
`max_records`	`Optional[int]`	Total records to fetch. `None` means fetch all available records.	`None`
`rate_limit_per_second`	`int`	Max concurrent requests per second (FDA default: 240/min ≈ 4/sec).	`4`
`**kwargs`	`Any`	Forwarded to the API (`search`, `sort`, etc.).	`{}`

Note (openFDA rate limits): Without an API key: 240 requests/min, 1,000 requests/day per IP. With an API key: 240 requests/min, 120,000 requests/day per key.
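The two ideas behind `get_all`, splitting a result set into `skip`/`limit` batches and writing batches as JSON Lines, can be sketched independently of the API. `plan_batches` and `stream_to_jsonl` are hypothetical helpers, the canned records stand in for real responses, and none of this is claimed to match the internals of `FDA_Fetcher`.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def plan_batches(total: int, batch_size: int = 1000,
                 max_records: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (skip, limit) pairs covering the requested records."""
    wanted = total if max_records is None else min(total, max_records)
    return [(skip, min(batch_size, wanted - skip))
            for skip in range(0, wanted, batch_size)]


def stream_to_jsonl(batches: Iterable[List[dict]], path: Path) -> Path:
    """Write each batch to `path`, one JSON object per line."""
    with path.open("w", encoding="utf-8") as handle:
        for batch in batches:
            for record in batch:
                handle.write(json.dumps(record) + "\n")
    return path


plan = plan_batches(total=2500, batch_size=1000)
print(plan)  # [(0, 1000), (1000, 1000), (2000, 500)]

# Canned records stand in for API responses (two per batch).
fake_batches = [[{"id": skip + i} for i in range(2)] for skip, _ in plan]
out = stream_to_jsonl(fake_batches, Path(tempfile.mkdtemp()) / "events.jsonl")
print(len(out.read_text(encoding="utf-8").splitlines()))  # 6
```

Writing line by line keeps memory use bounded by the batch size, which is the practical reason to prefer `"stream_to_storage"` over `"concat"` for large queries.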

Reactome_Fetcher¶

Reactome_Fetcher ¶

Reactome_Fetcher(species: str = 'Homo sapiens')

Fetcher for Reactome pathway analysis and content APIs.

Reactome provides comprehensive pathway analysis including:

Over-representation analysis (ORA)
Expression analysis
Species comparison
Pathway hierarchy and content

Example

fetcher = Reactome_Fetcher()

# Perform pathway analysis
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.analyze(genes)
print(result.significant_pathways().as_dataframe())

# Analysis with projection to human
result = fetcher.analyze_projection(genes, species="Mus musculus")

# Get top-level pathways
pathways = fetcher.get_pathways_top("Homo sapiens")
print(pathways.get_pathway_names())

# Get species list
species = fetcher.get_species()
print(species.get_species_names())

Initialize Reactome fetcher.

Parameters:

Name	Type	Description	Default
`species`	`str`	Default species for analysis (e.g., "Homo sapiens").	`'Homo sapiens'`

set_species ¶

set_species(species: str)

Change the default species.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name (e.g., "Homo sapiens", "Mus musculus").	required

analyze ¶

analyze(
    identifiers: List[str],
    species: Optional[str] = None,
    interactors: bool = False,
    page_size: int = 100,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
    include_disease: bool = True,
    min_entities: Optional[int] = None,
    max_entities: Optional[int] = None,
) -> ReactomeFetchedData

Perform pathway over-representation analysis.

Submits identifiers to Reactome Analysis Service and returns enriched pathways with statistics.

Parameters:

Name	Type	Description	Default
`identifiers`	`List[str]`	List of identifiers (gene symbols, UniProt IDs, etc.).	required
`species`	`Optional[str]`	Species name. None uses default.	`None`
`interactors`	`bool`	Include interactors in analysis.	`False`
`page_size`	`int`	Number of results per page.	`100`
`sort_by`	`str`	Sort field (ENTITIES_FDR, ENTITIES_PVALUE, etc.).	`'ENTITIES_FDR'`
`order`	`str`	Sort order (ASC, DESC).	`'ASC'`
`resource`	`str`	Resource filter (TOTAL, UNIPROT, ENSEMBL, etc.).	`'TOTAL'`
`p_value`	`float`	P-value cutoff for filtering results.	`1.0`
`include_disease`	`bool`	Include disease pathways.	`True`
`min_entities`	`Optional[int]`	Minimum pathway size.	`None`
`max_entities`	`Optional[int]`	Maximum pathway size.	`None`

Returns:

Type	Description
`ReactomeFetchedData`	ReactomeFetchedData with pathway enrichment results.

Example

fetcher = Reactome_Fetcher()
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.analyze(genes)
print(result.significant_pathways(fdr_threshold=0.01).as_dataframe())
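The `significant_pathways(fdr_threshold=...)` call in the example filters enriched pathways by false discovery rate. As an illustration of that kind of filter on plain dictionaries, the sketch below assumes each pathway record carries its FDR under `entities["fdr"]`. The record layout, the `<=` comparison, and the default threshold of 0.05 are assumptions, and the function here is a stand-alone stand-in rather than the method on `ReactomeFetchedData`.

```python
from typing import Any, Dict, List


def significant_pathways(pathways: List[Dict[str, Any]],
                         fdr_threshold: float = 0.05) -> List[Dict[str, Any]]:
    """Keep pathways whose entity FDR is at or below the threshold."""
    kept = [p for p in pathways if p["entities"]["fdr"] <= fdr_threshold]
    # Most significant first.
    return sorted(kept, key=lambda p: p["entities"]["fdr"])


# Invented records for illustration only; not real Reactome output.
toy = [
    {"stId": "R-HSA-0000001", "name": "Pathway A", "entities": {"fdr": 0.20}},
    {"stId": "R-HSA-0000002", "name": "Pathway B", "entities": {"fdr": 0.004}},
    {"stId": "R-HSA-0000003", "name": "Pathway C", "entities": {"fdr": 0.03}},
]
print([p["name"] for p in significant_pathways(toy, fdr_threshold=0.01)])
# ['Pathway B']
```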

analyze_projection ¶

analyze_projection(
    identifiers: List[str],
    species: Optional[str] = None,
    interactors: bool = False,
    page_size: int = 100,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
    include_disease: bool = True,
) -> ReactomeFetchedData

Analyze identifiers and project results to Homo sapiens.

This is useful for analyzing data from other species while viewing results in the context of human pathways.

Parameters:

Name	Type	Description	Default
`identifiers`	`List[str]`	List of identifiers.	required
`species`	`Optional[str]`	Source species name (for mapping).	`None`
`interactors`	`bool`	Include interactors.	`False`
`page_size`	`int`	Results per page.	`100`
`sort_by`	`str`	Sort field.	`'ENTITIES_FDR'`
`order`	`str`	Sort order.	`'ASC'`
`resource`	`str`	Resource filter.	`'TOTAL'`
`p_value`	`float`	P-value cutoff.	`1.0`
`include_disease`	`bool`	Include disease pathways.	`True`

Returns:

Type	Description
`ReactomeFetchedData`	ReactomeFetchedData with human-projected pathway results.

analyze_single ¶

analyze_single(
    identifier: str,
    species: Optional[str] = None,
    interactors: bool = False,
) -> ReactomeFetchedData

Analyze a single identifier across species.

Parameters:

Name	Type	Description	Default
`identifier`	`str`	Single identifier to analyze.	required
`species`	`Optional[str]`	Species filter.	`None`
`interactors`	`bool`	Include interactors.	`False`

Returns:

Type	Description
`ReactomeFetchedData`	ReactomeFetchedData with pathways containing the identifier.

get_result_by_token ¶

get_result_by_token(
    token: str,
    species: Optional[str] = None,
    page_size: int = 100,
    page: int = 1,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
) -> ReactomeFetchedData

Retrieve analysis results by token.

Parameters:

Name	Type	Description	Default
`token`	`str`	Analysis token from previous analysis.	required
`species`	`Optional[str]`	Species filter.	`None`
`page_size`	`int`	Results per page.	`100`
`page`	`int`	Page number.	`1`
`sort_by`	`str`	Sort field.	`'ENTITIES_FDR'`
`order`	`str`	Sort order.	`'ASC'`
`resource`	`str`	Resource filter.	`'TOTAL'`
`p_value`	`float`	P-value cutoff.	`1.0`

Returns:

Type	Description
`ReactomeFetchedData`	ReactomeFetchedData with analysis results.

get_found_entities ¶

get_found_entities(
    token: str, pathway_id: str
) -> List[Dict[str, Any]]

Get entities found in a specific pathway.

Parameters:

Name	Type	Description	Default
`token`	`str`	Analysis token.	required
`pathway_id`	`str`	Pathway stable ID (e.g., "R-HSA-123456").	required

Returns:

Type	Description
`List[Dict[str, Any]]`	List of found entity dictionaries.

get_not_found_identifiers ¶

get_not_found_identifiers(token: str) -> List[str]

Get identifiers that were not found in Reactome.

Parameters:

Name	Type	Description	Default
`token`	`str`	Analysis token.	required

Returns:

Type	Description
`List[str]`	List of unmapped identifier strings.

download_results_json ¶

download_results_json(token: str) -> Dict[str, Any]

Download complete analysis results as JSON.

Parameters:

Name	Type	Description	Default
`token`	`str`	Analysis token.	required

Returns:

Type	Description
`Dict[str, Any]`	Complete analysis results dictionary.

map_identifiers ¶

map_identifiers(
    identifiers: List[str], interactors: bool = False
) -> List[Dict[str, Any]]

Map identifiers to Reactome entities without analysis.

Parameters:

Name	Type	Description	Default
`identifiers`	`List[str]`	List of identifiers to map.	required
`interactors`	`bool`	Include interactor mapping.	`False`

Returns:

Type	Description
`List[Dict[str, Any]]`	List of mapped entity dictionaries.

get_pathways_top ¶

get_pathways_top(
    species: Optional[str] = None,
) -> ReactomePathwaysData

Get top-level pathways for a species.

Parameters:

Name	Type	Description	Default
`species`	`Optional[str]`	Species name (e.g., "Homo sapiens").	`None`

Returns:

Type	Description
`ReactomePathwaysData`	ReactomePathwaysData with top-level pathway information.

Example

fetcher = Reactome_Fetcher()
pathways = fetcher.get_pathways_top("Homo sapiens")
print(pathways.get_pathway_names())

get_events_hierarchy ¶

get_events_hierarchy(
    species: Optional[str] = None,
) -> List[Dict[str, Any]]

Get full event hierarchy for a species.

Parameters:

Name	Type	Description	Default
`species`	`Optional[str]`	Species name.	`None`

Returns:

Type	Description
`List[Dict[str, Any]]`	List of event hierarchy dictionaries.

get_pathways_for_entity ¶

get_pathways_for_entity(
    entity_id: str,
) -> ReactomePathwaysData

Get pathways containing a specific entity.

Parameters:

Name	Type	Description	Default
`entity_id`	`str`	Entity identifier (UniProt, gene symbol, etc.).	required

Returns:

Type	Description
`ReactomePathwaysData`	ReactomePathwaysData with pathways containing the entity.

get_species ¶

get_species() -> ReactomeSpeciesData

Get all species in Reactome.

Returns:

Type	Description
`ReactomeSpeciesData`	ReactomeSpeciesData with species information.

Example

fetcher = Reactome_Fetcher()
species = fetcher.get_species()
print(species.get_species_names()[:10])

get_species_main ¶

get_species_main() -> ReactomeSpeciesData

Get main species with curated or computationally inferred pathways.

Returns:

Type	Description
`ReactomeSpeciesData`	ReactomeSpeciesData with main species information.

get_database_version ¶

get_database_version() -> str

Get current Reactome database version.

Returns:

Type	Description
`str`	Database version string.

query_entry ¶

query_entry(entry_id: str) -> Dict[str, Any]

Query a Reactome entry by ID.

Parameters:

Name	Type	Description	Default
`entry_id`	`str`	Reactome stable ID (e.g., "R-HSA-123456").	required

Returns:

Type	Description
`Dict[str, Any]`	Entry details dictionary.

get_participants ¶

get_participants(event_id: str) -> List[Dict[str, Any]]

Get all participants in an event (pathway/reaction).

Parameters:

Name	Type	Description	Default
`event_id`	`str`	Reactome stable ID (e.g., "R-HSA-69278").	required

Returns:

Type	Description
`List[Dict[str, Any]]`	List of participant dictionaries with physical entity info.

Example

fetcher = Reactome_Fetcher()
participants = fetcher.get_participants("R-HSA-69278")
for p in participants[:3]:
    print(p.get("displayName"))

get_participants_physical_entities ¶

get_participants_physical_entities(
    event_id: str,
) -> List[Dict[str, Any]]

Get participating physical entities in an event.

Parameters:

Name	Type	Description	Default
`event_id`	`str`	Reactome stable ID.	required

Returns:

Type	Description
`List[Dict[str, Any]]`	List of physical entity dictionaries.

get_participants_reference_entities ¶

get_participants_reference_entities(
    event_id: str,
) -> List[Dict[str, Any]]

Get reference entities (genes/proteins) for an event.

This returns the external database references (UniProt, NCBI Gene, etc.) for all participants in a pathway or reaction.

Parameters:

Name	Type	Description	Default
`event_id`	`str`	Reactome stable ID (e.g., "R-HSA-69278").	required

Returns:

Type	Description
`List[Dict[str, Any]]`	List of reference entity dictionaries containing: `identifier` (external ID, e.g., UniProt accession); `databaseName` (source database, e.g., "UniProt"); `displayName` (human-readable name); `geneName` (gene symbol, if available).

Example

fetcher = Reactome_Fetcher()
refs = fetcher.get_participants_reference_entities("R-HSA-69278")
for ref in refs[:5]:
    print(f"{ref.get('geneName')}: {ref.get('identifier')}")

get_pathway_genes ¶

get_pathway_genes(
    pathway_id: str, id_type: str = "gene_symbol"
) -> List[str]

Get gene identifiers for a pathway.

Convenience method that extracts gene IDs from reference entities.

Parameters:

Name	Type	Description	Default
`pathway_id`	`str`	Reactome pathway stable ID.	required
`id_type`	`str`	Type of ID to return: "gene_symbol" (gene symbols, the default); "uniprot" (UniProt accessions); "all" (return a dict with all available IDs).	`'gene_symbol'`

Returns:

Type	Description
`List[str]`	List of gene identifiers.

Example

fetcher = Reactome_Fetcher()
genes = fetcher.get_pathway_genes("R-HSA-69278")
print(genes[:10])
# ['TP53', 'MDM2', 'CDKN1A', ...]
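Since this method is described as extracting gene IDs from reference entities, a rough sketch of such an extraction over plain dictionaries may help. It uses the field names documented for `get_participants_reference_entities`. `extract_gene_ids` is a hypothetical helper, and the handling of `geneName` as either a string or a list (taking the first entry) is an assumption.

```python
from typing import Any, Dict, List


def extract_gene_ids(refs: List[Dict[str, Any]],
                     id_type: str = "gene_symbol") -> List[str]:
    """Collect unique, sorted gene identifiers from reference-entity records."""
    found = set()
    for ref in refs:
        if id_type == "uniprot":
            if ref.get("databaseName") == "UniProt" and ref.get("identifier"):
                found.add(ref["identifier"])
        elif id_type == "gene_symbol":
            names = ref.get("geneName") or []
            # Assumption: geneName may be a single string or a list of names;
            # when it is a list, the first entry is taken as the symbol.
            if isinstance(names, str):
                names = [names]
            if names:
                found.add(names[0])
        else:
            raise ValueError(f"unsupported id_type: {id_type!r}")
    return sorted(found)


# Hand-written records; the last one is a small molecule with no gene name.
refs = [
    {"identifier": "P04637", "databaseName": "UniProt", "geneName": ["TP53", "P53"]},
    {"identifier": "Q00987", "databaseName": "UniProt", "geneName": "MDM2"},
    {"identifier": "CHEBI:15422", "databaseName": "ChEBI"},
]
print(extract_gene_ids(refs))             # ['MDM2', 'TP53']
print(extract_gene_ids(refs, "uniprot"))  # ['P04637', 'Q00987']
```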

get_all_pathways_with_genes ¶

get_all_pathways_with_genes(
    species: Optional[str] = None,
    id_type: str = "gene_symbol",
    include_hierarchy: bool = True,
) -> Dict[str, tuple]

Get all pathways with their gene members for a species.

This method builds a complete pathway-gene mapping suitable for local over-representation analysis.

Parameters:

Name	Type	Description	Default
`species`	`Optional[str]`	Species name (e.g., "Homo sapiens").	`None`
`id_type`	`str`	Gene ID type ("gene_symbol" or "uniprot").	`'gene_symbol'`
`include_hierarchy`	`bool`	If True, include all pathways in hierarchy. If False, only top-level pathways.	`True`

Returns:

Type	Description
`Dict[str, tuple]`	Dict mapping pathway_id -> (pathway_name, set of gene IDs).

Example

fetcher = Reactome_Fetcher()
pathways = fetcher.get_all_pathways_with_genes("Homo sapiens")
for pid, (name, genes) in list(pathways.items())[:3]:
    print(f"{pid}: {name} ({len(genes)} genes)")

Note

This method makes many API calls and may take several minutes for species with many pathways. Results should be cached.
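As a hedged illustration of the local over-representation analysis this mapping is meant to support, the sketch below runs a one-sided hypergeometric test over a mapping in the documented shape, `pathway_id -> (pathway_name, set of gene IDs)`. `hypergeom_tail` and `local_ora` are hypothetical helpers, the background is assumed to be every gene annotated to any pathway, and no multiple-testing correction is applied.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K are in the pathway."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def local_ora(query: Iterable[str],
              pathways: Dict[str, Tuple[str, Set[str]]]) -> List[Tuple[str, str, int, float]]:
    """Return (pathway_id, name, overlap, p_value) rows sorted by p-value."""
    # Assumption: the background is every gene annotated to any pathway.
    background = set().union(*(genes for _, genes in pathways.values()))
    hits = set(query) & background
    rows = []
    for pid, (name, genes) in pathways.items():
        overlap = len(hits & genes)
        if overlap:
            p = hypergeom_tail(overlap, len(genes), len(hits), len(background))
            rows.append((pid, name, overlap, p))
    return sorted(rows, key=lambda row: row[3])


# Toy mapping in the documented shape: pathway_id -> (name, set of gene IDs).
toy = {
    "P1": ("Pathway one", {"A", "B", "C", "D"}),
    "P2": ("Pathway two", {"E", "F", "G", "H", "I", "J"}),
}
for row in local_ora(["A", "B", "E"], toy):
    print(row)
```

With this toy data, the background has 10 genes and the query has 3, so "P1" (overlap 2 of 4) gets p = 40/120 and "P2" (overlap 1 of 6) gets p = 116/120.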

get_event_ancestors ¶

get_event_ancestors(event_id: str) -> List[Dict[str, Any]]

Get ancestor pathways for an event.

Parameters:

Name	Type	Description	Default
`event_id`	`str`	Reactome stable ID.	required

Returns:

Type	Description
`List[Dict[str, Any]]`	List of ancestor pathway dictionaries.

get_complex_subunits ¶

get_complex_subunits(
    complex_id: str,
) -> List[Dict[str, Any]]

Get subunits of a complex.

Parameters:

Name	Type	Description	Default
`complex_id`	`str`	Reactome complex stable ID.	required

Returns:

Type	Description
`List[Dict[str, Any]]`	List of subunit dictionaries.

get_entity_component_of ¶

get_entity_component_of(
    entity_id: str,
) -> List[Dict[str, Any]]

Get complexes/sets that contain an entity.

Parameters:

Name	Type	Description	Default
`entity_id`	`str`	Reactome entity stable ID.	required

Returns:

Type	Description
`List[Dict[str, Any]]`	List of container entity dictionaries.

get_entity_other_forms ¶

get_entity_other_forms(
    entity_id: str,
) -> List[Dict[str, Any]]

Get other forms of a physical entity.

Parameters:

Name	Type	Description	Default
`entity_id`	`str`	Reactome entity stable ID.	required

Returns:

Type	Description
`List[Dict[str, Any]]`	List of other form dictionaries.

get_diseases ¶

get_diseases() -> List[Dict[str, Any]]

Get all disease objects in Reactome.

Returns:

Type	Description
`List[Dict[str, Any]]`	List of disease dictionaries.

get_diseases_doid ¶

get_diseases_doid() -> List[str]

Get all Disease Ontology IDs (DOIDs) in Reactome.

Returns:

Type	Description
`List[str]`	List of DOID strings.

map_to_reactions ¶

map_to_reactions(
    identifier: str, resource: str = "UniProt"
) -> List[Dict[str, Any]]

Map an identifier to Reactome reactions.

Parameters:

Name	Type	Description	Default
`identifier`	`str`	External identifier (e.g., UniProt accession).	required
`resource`	`str`	Source database ("UniProt", "NCBI", "ENSEMBL", etc.).	`'UniProt'`

Returns:

Type	Description
`List[Dict[str, Any]]`	List of reaction dictionaries.

DO_Fetcher¶

DO_Fetcher ¶

DO_Fetcher()

Fetcher for Disease Ontology API.

Provides access to Disease Ontology data via two APIs:

Direct DO API for basic metadata
EBI Ontology Lookup Service (OLS) for comprehensive queries

Example

fetcher = DO_Fetcher()

# Get disease term by DOID
term = fetcher.get_term("DOID:162")  # Cancer
print(term.as_dataframe())

# Search for diseases
results = fetcher.search("cancer")
print(results.get_doids())

# Get term hierarchy
parents = fetcher.get_parents("DOID:162")
children = fetcher.get_children("DOID:162")

# Get cross-references
term = fetcher.get_term("DOID:162")
print(term.terms[0].mesh_id)  # Get MeSH ID
print(term.terms[0].umls_cui)  # Get UMLS CUI

Initialize Disease Ontology fetcher.

get_term ¶

get_term(doid: str, use_ols: bool = True) -> DOFetchedData

Get a disease term by DOID.

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID (e.g., "DOID:162", "162", "DOID_162").	required
`use_ols`	`bool`	If True, use OLS API for more detailed data.	`True`

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with the disease term.

Example

fetcher = DO_Fetcher()
term = fetcher.get_term("DOID:162")  # Cancer
print(term.terms[0].name)
# 'cancer'
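The three accepted spellings of `doid` can be reduced to one canonical form before caching or comparing results. How biodbs normalizes IDs internally is not documented here; `normalize_doid` below is a hypothetical helper that sketches one possible approach.

```python
import re

_DOID_RE = re.compile(r"^(?:DOID[:_])?(\d+)$", re.IGNORECASE)


def normalize_doid(value: str) -> str:
    """Convert "162", "DOID_162" or "DOID:162" to the form "DOID:162"."""
    match = _DOID_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a recognisable DOID: {value!r}")
    return f"DOID:{match.group(1)}"


for raw in ["DOID:162", "162", "DOID_162"]:
    print(raw, "->", normalize_doid(raw))
```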

get_terms ¶

get_terms(
    doids: List[str], use_ols: bool = True
) -> DOFetchedData

Get multiple disease terms by DOIDs.

Parameters:

Name	Type	Description	Default
`doids`	`List[str]`	List of Disease Ontology IDs.	required
`use_ols`	`bool`	If True, use OLS API for more detailed data.	`True`

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with all disease terms.

Example

fetcher = DO_Fetcher()
terms = fetcher.get_terms(["DOID:162", "DOID:10283"])
print(terms.get_names())

get_all_terms ¶

get_all_terms(
    page: int = 0, page_size: int = 100
) -> DOFetchedData

Get all disease terms from the ontology (paginated).

Parameters:

Name	Type	Description	Default
`page`	`int`	Page number (0-indexed).	`0`
`page_size`	`int`	Number of terms per page.	`100`

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with disease terms.

search ¶

search(
    query: str,
    exact: bool = False,
    rows: int = 20,
    start: int = 0,
    obsoletes: bool = False,
) -> DOSearchFetchedData

Search for disease terms.

Parameters:

Name	Type	Description	Default
`query`	`str`	Search query string.	required
`exact`	`bool`	If True, search for exact matches only.	`False`
`rows`	`int`	Maximum number of results to return.	`20`
`start`	`int`	Starting offset for pagination.	`0`
`obsoletes`	`bool`	If True, include obsolete terms.	`False`

Returns:

Type	Description
`DOSearchFetchedData`	DOSearchFetchedData with search results.

Example

fetcher = DO_Fetcher()
results = fetcher.search("breast cancer")
print(results.get_doids())

search_by_xref ¶

search_by_xref(
    database: str, external_id: str
) -> DOSearchFetchedData

Search for disease terms by external database reference.

Parameters:

Name	Type	Description	Default
`database`	`str`	Database name (e.g., "MESH", "UMLS_CUI", "ICD10CM").	required
`external_id`	`str`	ID in the external database.	required

Returns:

Type	Description
`DOSearchFetchedData`	DOSearchFetchedData with matching terms.

Example

fetcher = DO_Fetcher()
results = fetcher.search_by_xref("MESH", "D001943")  # Breast cancer

get_parents ¶

get_parents(doid: str) -> DOFetchedData

Get parent terms of a disease.

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID.	required

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with parent terms.

Example

fetcher = DO_Fetcher()
parents = fetcher.get_parents("DOID:1612")  # Breast cancer
for term in parents.terms:
    print(f"{term.doid}: {term.name}")

get_children ¶

get_children(doid: str) -> DOFetchedData

Get child terms of a disease.

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID.	required

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with child terms.

Example

fetcher = DO_Fetcher()
children = fetcher.get_children("DOID:162")  # Cancer
print(f"Cancer has {len(children)} child terms")

get_ancestors ¶

get_ancestors(doid: str) -> DOFetchedData

Get all ancestor terms of a disease.

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID.	required

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with ancestor terms.

get_descendants ¶

get_descendants(doid: str) -> DOFetchedData

Get all descendant terms of a disease.

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID.	required

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with descendant terms.

get_hierarchical_parents ¶

get_hierarchical_parents(doid: str) -> DOFetchedData

Get hierarchical parent terms (includes part_of relationships).

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID.	required

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with hierarchical parent terms.

get_hierarchical_children ¶

get_hierarchical_children(doid: str) -> DOFetchedData

Get hierarchical child terms (includes part_of relationships).

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID.	required

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with hierarchical child terms.

get_ontology_info ¶

get_ontology_info() -> Dict[str, Any]

Get Disease Ontology metadata.

Returns:

Type	Description
`Dict[str, Any]`	Dictionary with ontology information.

Example

fetcher = DO_Fetcher()
info = fetcher.get_ontology_info()
print(info.get("config", {}).get("title"))

doid_to_mesh ¶

doid_to_mesh(doids: List[str]) -> Dict[str, Optional[str]]

Convert DOIDs to MeSH IDs.

Parameters:

Name	Type	Description	Default
`doids`	`List[str]`	List of Disease Ontology IDs.	required

Returns:

Type	Description
`Dict[str, Optional[str]]`	Dictionary mapping DOIDs to MeSH IDs.

Example

fetcher = DO_Fetcher()
mapping = fetcher.doid_to_mesh(["DOID:162", "DOID:1612"])
print(mapping)
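Because the return type is `Dict[str, Optional[str]]`, some DOIDs may map to `None` (presumably when no cross-reference exists). A minimal sketch for separating the two cases follows. `split_mapping` is a hypothetical helper and the IDs in `example` are made-up placeholders.

```python
from typing import Dict, List, Optional, Tuple


def split_mapping(mapping: Dict[str, Optional[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Separate resolved IDs from those that mapped to None."""
    resolved = {k: v for k, v in mapping.items() if v is not None}
    missing = [k for k, v in mapping.items() if v is None]
    return resolved, missing


# Made-up values, only to show the shape of the mapping.
example = {"DOID:0000001": "X000001", "DOID:0000002": None}
resolved, missing = split_mapping(example)
print(resolved, missing)
```

The same pattern applies to `doid_to_umls` and `doid_to_icd10`, which share the return type.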

doid_to_umls ¶

doid_to_umls(doids: List[str]) -> Dict[str, Optional[str]]

Convert DOIDs to UMLS CUIs.

Parameters:

Name	Type	Description	Default
`doids`	`List[str]`	List of Disease Ontology IDs.	required

Returns:

Type	Description
`Dict[str, Optional[str]]`	Dictionary mapping DOIDs to UMLS CUIs.

doid_to_icd10 ¶

doid_to_icd10(doids: List[str]) -> Dict[str, Optional[str]]

Convert DOIDs to ICD-10 codes.

Parameters:

Name	Type	Description	Default
`doids`	`List[str]`	List of Disease Ontology IDs.	required

Returns:

Type	Description
`Dict[str, Optional[str]]`	Dictionary mapping DOIDs to ICD-10 codes.

EnrichR_Fetcher¶

EnrichR_Fetcher ¶

EnrichR_Fetcher(organism: str = 'human')

Fetcher for EnrichR gene set enrichment analysis API.

EnrichR provides enrichment analysis against 200+ gene set libraries covering pathways, ontologies, transcription factors, and more.

Supported organisms:

human (default)
mouse
fly (FlyEnrichr)
yeast (YeastEnrichr)
worm (WormEnrichr)
fish (FishEnrichr)

Example

fetcher = EnrichR_Fetcher()

# Get available gene set libraries
libraries = fetcher.get_libraries()
print(libraries.get_library_names()[:10])

# Perform enrichment analysis
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.enrich(genes, library="KEGG_2021_Human")
print(result.significant_terms().get_term_names())

Initialize EnrichR fetcher.

Parameters:

Name	Type	Description	Default
`organism`	`str`	Target organism (human, mouse, fly, yeast, worm, fish).	`'human'`

set_organism ¶

set_organism(organism: str)

Change the target organism.

Parameters:

Name	Type	Description	Default
`organism`	`str`	Target organism (human, mouse, fly, yeast, worm, fish).	required

get_libraries ¶

get_libraries() -> EnrichRLibrariesData

Get available gene set libraries and their statistics.

Returns:

Type	Description
`EnrichRLibrariesData`	EnrichRLibrariesData containing library information.

Example

fetcher = EnrichR_Fetcher()
libs = fetcher.get_libraries()
kegg_libs = libs.search("KEGG")
print(kegg_libs.get_library_names())

enrich ¶

enrich(
    genes: List[str],
    library: str,
    description: str = "biodbs gene list",
) -> EnrichRFetchedData

Perform enrichment analysis against a gene set library.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols to analyze.	required
`library`	`str`	Name of the gene set library (e.g., "KEGG_2021_Human").	required
`description`	`str`	Description for the gene list.	`'biodbs gene list'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData containing enrichment results.

Example

fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.enrich(genes, "KEGG_2021_Human")
top = result.top_terms(5)
print(top.get_term_names())

enrich_multiple ¶

enrich_multiple(
    genes: List[str],
    libraries: List[str],
    description: str = "biodbs gene list",
) -> Dict[str, EnrichRFetchedData]

Perform enrichment analysis against multiple libraries.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols to analyze.	required
`libraries`	`List[str]`	List of library names to query.	required
`description`	`str`	Description for the gene list.	`'biodbs gene list'`

Returns:

Type	Description
`Dict[str, EnrichRFetchedData]`	Dictionary mapping library names to EnrichRFetchedData.

Example

fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1", "EGFR"]
results = fetcher.enrich_multiple(
    genes,
    ["KEGG_2021_Human", "GO_Biological_Process_2023"]
)
for lib, data in results.items():
    print(f"{lib}: {len(data)} terms")

enrich_with_background ¶

enrich_with_background(
    genes: List[str],
    background: List[str],
    library: str,
    description: str = "biodbs gene list",
) -> EnrichRFetchedData

Perform enrichment analysis with a custom background gene set.

Uses the speedrichr API for background enrichment.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of query gene symbols.	required
`background`	`List[str]`	List of background gene symbols.	required
`library`	`str`	Name of the gene set library.	required
`description`	`str`	Description for the gene list.	`'biodbs gene list'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData containing enrichment results.

Example

fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1"]
background = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS", ...]
result = fetcher.enrich_with_background(
    genes, background, "GO_Biological_Process_2023"
)
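A background set is only meaningful if it contains the query genes. Whether biodbs or the speedrichr API rejects or silently ignores query genes outside the background is not stated here, so a pre-check such as the sketch below may be useful. `check_background` is a hypothetical helper and `FOO1` is a made-up symbol.

```python
from typing import List, Tuple


def check_background(genes: List[str], background: List[str]) -> Tuple[List[str], List[str]]:
    """Return (genes found in the background, genes missing from it), order preserved."""
    bg = set(background)
    inside = [g for g in genes if g in bg]
    outside = [g for g in genes if g not in bg]
    return inside, outside


inside, outside = check_background(
    ["TP53", "BRCA1", "FOO1"],
    ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"],
)
print("in background:", inside)
print("not in background:", outside)
```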

view_gene_list ¶

view_gene_list(user_list_id: int) -> List[str]

Retrieve a previously submitted gene list.

Parameters:

Name	Type	Description	Default
`user_list_id`	`int`	The userListId from a previous addList call.	required

Returns:

Type	Description
`List[str]`	List of gene symbols.

get_gene_map ¶

get_gene_map(gene: str, library: str) -> Dict[str, Any]

Get gene set membership for a single gene.

Parameters:

Name	Type	Description	Default
`gene`	`str`	Gene symbol.	required
`library`	`str`	Gene set library name.	required

Returns:

Type	Description
`Dict[str, Any]`	Dictionary with gene set membership information.

export_results ¶

export_results(
    user_list_id: int,
    library: str,
    filename: str = "enrichr_results",
) -> str

Export enrichment results as text.

Parameters:

Name	Type	Description	Default
`user_list_id`	`int`	The userListId from a previous addList call.	required
`library`	`str`	Gene set library name.	required
`filename`	`str`	Output filename (without extension).	`'enrichr_results'`

Returns:

Type	Description
`str`	Tab-separated enrichment results as string.

enrich_kegg ¶

enrich_kegg(
    genes: List[str], year: str = "2021"
) -> EnrichRFetchedData

Perform KEGG pathway enrichment.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols.	required
`year`	`str`	KEGG library year version.	`'2021'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData with KEGG pathway enrichment.

enrich_go_bp ¶

enrich_go_bp(
    genes: List[str], year: str = "2023"
) -> EnrichRFetchedData

Perform GO Biological Process enrichment.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols.	required
`year`	`str`	GO library year version.	`'2023'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData with GO BP enrichment.

enrich_go_mf ¶

enrich_go_mf(
    genes: List[str], year: str = "2023"
) -> EnrichRFetchedData

Perform GO Molecular Function enrichment.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols.	required
`year`	`str`	GO library year version.	`'2023'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData with GO MF enrichment.

enrich_go_cc ¶

enrich_go_cc(
    genes: List[str], year: str = "2023"
) -> EnrichRFetchedData

Perform GO Cellular Component enrichment.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols.	required
`year`	`str`	GO library year version.	`'2023'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData with GO CC enrichment.

enrich_reactome ¶

enrich_reactome(
    genes: List[str], year: str = "2022"
) -> EnrichRFetchedData

Perform Reactome pathway enrichment.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols.	required
`year`	`str`	Reactome library year version.	`'2022'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData with Reactome enrichment.

enrich_wikipathways ¶

enrich_wikipathways(
    genes: List[str], year: str = "2023"
) -> EnrichRFetchedData

Perform WikiPathways enrichment.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols.	required
`year`	`str`	WikiPathways library year version.	`'2023'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData with WikiPathways enrichment.

HGNC_Fetcher¶

HGNC_Fetcher ¶

Fetcher for the HGNC REST API (rest.genenames.org).

The HGNC (HUGO Gene Nomenclature Committee) REST API provides authoritative human gene nomenclature data: approved symbols, names, aliases, previous symbols, and cross-references to Ensembl, NCBI Gene, UniProt, OMIM, etc.

Three endpoints are exposed:

info — service metadata (last update, document count, field lists).
fetch — exact-match lookup by any stored field; returns full records.
search — wildcard / boolean query; returns lightweight summaries (hgnc_id, symbol, score only).

Rate limit: 10 requests per second (enforced automatically).

Example

fetcher = HGNC_Fetcher()

# Exact lookup by symbol
data = fetcher.fetch("symbol", "TP53")
entry = data[0]          # HGNCEntry
print(entry.hgnc_id)     # "HGNC:11998"
print(entry.entrez_id)   # "7157"

# Wildcard search
hits = fetcher.search("symbol", "ZNF*")
print(hits.num_found)    # many zinc-finger genes

# Service metadata
info = fetcher.info()
print(info["response"]["numDoc"])
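The fetcher enforces the rate limit itself, so no extra code is needed. For illustration only, the sketch below shows one common way to enforce such a limit: spacing calls by a minimum interval. `MinIntervalLimiter` and `FakeClock` are hypothetical and say nothing about how biodbs implements it. The clock and sleep functions are injectable so the demo runs without real delays.

```python
import time
from typing import Callable, List


class MinIntervalLimiter:
    """Allow at most `rate` calls per second by spacing them evenly."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve that slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


class FakeClock:
    """Deterministic clock: sleeping simply advances the time."""

    def __init__(self) -> None:
        self.now = 0.0
        self.slept: List[float] = []

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.slept.append(seconds)
        self.now += seconds


fake = FakeClock()
limiter = MinIntervalLimiter(10, clock=fake.time, sleep=fake.sleep)
for _ in range(3):
    limiter.wait()  # a request would be sent here
print(fake.slept)
```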

info ¶

info() -> dict

Retrieve HGNC service metadata.

Returns the raw parsed JSON dict, which contains:

lastModified: timestamp of last database update
numDoc: total number of records
searchableFields: list of fields that can be queried
storedFields: list of fields returned by fetch

Returns:

Type	Description
`dict`	Raw JSON dict from `/info`.

Raises:

Type	Description
`APIError`	On HTTP errors.

fetch ¶

fetch(field: str, term: str) -> HGNCFetchedData

Exact-match lookup by any stored field.

Returns full gene records for all entries where field exactly equals term. No wildcard expansion is performed.

Parameters:

Name	Type	Description	Default
`field`	`str`	HGNC stored field name (e.g. `"symbol"`, `"hgnc_id"`, `"ensembl_gene_id"`, `"entrez_id"`, `"uniprot_ids"`).	required
`term`	`str`	Exact value to match.	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` containing full `HGNCEntry` records.

Raises:

Type	Description
`APIValidationError`	If the field name is not recognised (HTTP 400).
`APIError`	On other HTTP errors.

Example

data = fetcher.fetch("symbol", "BRCA1")
print(data[0].ensembl_gene_id)  # ENSG00000012048

search ¶

search(
    query_or_field: str, term: Optional[str] = None
) -> HGNCFetchedData

Wildcard / boolean search.

Two calling styles are supported:

Free-form query: search("symbol:ZNF* AND status:Approved")
Field + term: search("symbol", "ZNF*")

Wildcard characters

* — zero or more characters
? — exactly one character

Boolean operators: AND, OR, NOT (URL-encoded as +AND+, +OR+, +NOT+ internally).

Note

Search responses contain only hgnc_id, symbol, and score. Use `fetch` to retrieve complete records.

Parameters:

Name	Type	Description	Default
`query_or_field`	`str`	A full Solr query string, OR a field name when term is also provided.	required
`term`	`Optional[str]`	The search term for the given field. Leave `None` when passing a full query string as the first argument.	`None`

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with `is_search=True`; items are plain dicts with `hgnc_id`, `symbol`, `score`.

Raises:

Type	Description
`APIValidationError`	On an invalid query (HTTP 400).
`APIError`	On other HTTP errors.

Example

# All ZNF genes
hits = fetcher.search("symbol", "ZNF*")

# Approved genes on chromosome 17
hits = fetcher.search("status:Approved AND location:17*")
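A free-form query can be assembled from field and term pairs. The sketch below builds the string in the style of the first calling style documented above and leaves URL encoding to the fetcher. `build_query` is a hypothetical helper, not part of biodbs.

```python
from typing import Mapping


def build_query(terms: Mapping[str, str], operator: str = "AND") -> str:
    """Join field:term pairs with a boolean operator, e.g. "a:1 AND b:2"."""
    if operator not in {"AND", "OR"}:
        raise ValueError("operator must be 'AND' or 'OR'")
    if not terms:
        raise ValueError("at least one field:term pair is required")
    return f" {operator} ".join(f"{field}:{term}" for field, term in terms.items())


query = build_query({"symbol": "ZNF*", "status": "Approved"})
print(query)
```

The resulting string would be passed as the single argument to `search`.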

ClinVar_Fetcher¶

ClinVar_Fetcher ¶

ClinVar_Fetcher(api_key: Optional[str] = None)

Fetcher for the ClinVar E-utilities API.

Wraps the four E-utility endpoints that ClinVar supports (esearch, esummary, efetch, elink) with rate limiting and optional API key authentication.

Parameters:

Name	Type	Description	Default
`api_key`	`Optional[str]`	NCBI API key, which raises the rate limit to 10 requests/s (3 requests/s without a key). Falls back to the `NCBI_API_KEY` environment variable.	`None`

Example

fetcher = ClinVar_Fetcher()

# Search for all pathogenic BRCA1 variants
uids = fetcher.search("BRCA1[gene] AND pathogenic[clnsig]")

# Fetch summaries for the first 10
data = fetcher.fetch_summary(uids[:10])
print(data.as_dataframe())

# One-step helper
data = fetcher.search_gene("TP53", retmax=100)
for v in data:
    print(v.accession, v.clinical_significance)
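The API key fallback described in the parameter table can be pictured as follows. This is a sketch of the documented behaviour, not the actual constructor code: `resolve_api_key` is a hypothetical function, and the environment is passed in as a mapping so the logic can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_api_key(api_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> Tuple[Optional[str], int]:
    """Return (key, requests per second): explicit key, else NCBI_API_KEY, else None."""
    key = api_key or env.get("NCBI_API_KEY") or None
    return key, (10 if key else 3)


print(resolve_api_key("my-key", env={}))
print(resolve_api_key(None, env={"NCBI_API_KEY": "env-key"}))
print(resolve_api_key(None, env={}))
```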

search ¶

search(
    query: str, retmax: int = 500, retstart: int = 0
) -> List[str]

Find ClinVar variation UIDs matching an Entrez query.

Uses the same query language as the ClinVar website, so you can test a query interactively before automating it.

Common field tags:

BRCA1[gene] — variants in a specific gene
pathogenic[clnsig] — by clinical significance
single_gene[prop] — single-gene variants only
"Breast cancer"[dis] — by associated disease

Parameters:

Name	Type	Description	Default
`query`	`str`	Entrez query string (e.g. `"BRCA1[gene] AND pathogenic[clnsig]"`).	required
`retmax`	`int`	Maximum number of UIDs to return (at most 10,000).	`500`
`retstart`	`int`	Zero-based offset for pagination.	`0`

Returns:

Type	Description
`List[str]`	List of variation UID strings.

Example

uids = fetcher.search("TP53[gene] AND pathogenic[clnsig]",
                      retmax=200)

count ¶

count(query: str) -> int

Return the total number of ClinVar records matching query.

Performs an esearch with retmax=0 so no IDs are transferred.

Parameters:

Name	Type	Description	Default
`query`	`str`	Entrez query string.	required

Returns:

Type	Description
`int`	Integer count of matching records.

fetch_summary ¶

fetch_summary(
    ids: List[Union[str, int]], total_count: int = 0
) -> ClinVarFetchedData

Retrieve document summaries for a list of variation UIDs.

Calls esummary with retmode=json to obtain structured data including clinical significance, gene associations, conditions, and genomic coordinates.

Parameters:

Name	Type	Description	Default
`ids`	`List[Union[str, int]]`	ClinVar variation UIDs (integers or strings).	required
`total_count`	`int`	Optional total hit count from a preceding esearch, stored on the returned object for reference.	`0`

Returns:

Type	Description
`ClinVarFetchedData`	`ClinVarFetchedData` with one `ClinVarVariant` per UID.

Raises:

Type	Description
`APIError`	On HTTP errors.

Example

data = fetcher.fetch_summary(["65533", "14206"])
for v in data:
    print(v.title, v.clinical_significance)

fetch_vcv ¶

fetch_vcv(accession: str) -> str

Retrieve the full VCV XML record for a variation.

Parameters:

Name	Type	Description	Default
`accession`	`str`	VCV accession with or without version (e.g. `"VCV000014206"` or `"VCV000014206.3"`).	required

Returns:

Type	Description
`str`	Raw XML string.

Example

xml = fetcher.fetch_vcv("VCV000014206")
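Accessions may arrive with or without a version suffix. The sketch below splits one into its parts. `parse_accession` is a hypothetical helper, and the pattern (prefix plus nine digits, optional `.version`) is inferred only from the examples on this page.

```python
import re
from typing import Optional, Tuple

_ACCESSION_RE = re.compile(r"^(VCV|RCV)(\d{9})(?:\.(\d+))?$")


def parse_accession(accession: str) -> Tuple[str, str, Optional[int]]:
    """Split e.g. "VCV000014206.3" into ("VCV", "VCV000014206", 3)."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    kind, digits, version = match.groups()
    return kind, f"{kind}{digits}", int(version) if version else None


print(parse_accession("VCV000014206"))
print(parse_accession("VCV000014206.3"))
```

The first element tells `fetch_vcv` and `fetch_rcv` inputs apart before a request is made.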

fetch_rcv ¶

fetch_rcv(accession: str) -> str

Retrieve the full RCV XML record for a variation-condition pair.

Parameters:

Name	Type	Description	Default
`accession`	`str`	RCV accession with or without version (e.g. `"RCV000000606"` or `"RCV000000606.3"`).	required

Returns:

Type	Description
`str`	Raw XML string.

Example

xml = fetcher.fetch_rcv("RCV000000606")

link_to_pubmed ¶

link_to_pubmed(variation_id: Union[str, int]) -> List[str]

Return PubMed UIDs linked to a ClinVar variation.

Parameters:

Name	Type	Description	Default
`variation_id`	`Union[str, int]`	ClinVar variation UID.	required

Returns:

Type	Description
`List[str]`	List of PubMed UID strings.

search_gene ¶

search_gene(
    gene_symbol: str,
    single_gene: bool = True,
    retmax: int = 500,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData

Search for variants in a gene and return summaries in one step.

Parameters:

Name	Type	Description	Default
`gene_symbol`	`str`	HGNC gene symbol (e.g. `"BRCA1"`).	required
`single_gene`	`bool`	If `True` (default), restrict to variants assigned to a single gene (`single_gene[prop]`).	`True`
`retmax`	`int`	Maximum number of variants to return.	`500`
`clinical_significance`	`Optional[str]`	Optional filter, e.g. `"pathogenic"`, `"likely pathogenic"`, `"benign"`. Maps to the `[clnsig]` Entrez field tag.	`None`

Returns:

Type	Description
`ClinVarFetchedData`	`ClinVarFetchedData` ready to iterate or convert.

Example

data = fetcher.search_gene("TP53", retmax=200,
                           clinical_significance="pathogenic")
print(data.as_dataframe()[["accession", "title",
                           "clinical_significance"]])

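The parameters of `search_gene` correspond to the Entrez field tags listed under `search`. The sketch below shows how such a query string could be composed. `build_gene_query` is a hypothetical helper, and the exact string biodbs sends is not documented here; quoting of multi-word values follows the `"Breast cancer"[dis]` pattern shown earlier.

```python
from typing import Optional


def build_gene_query(gene_symbol: str, single_gene: bool = True,
                     clinical_significance: Optional[str] = None) -> str:
    """Compose an Entrez query from the documented ClinVar field tags."""
    parts = [f"{gene_symbol}[gene]"]
    if single_gene:
        parts.append("single_gene[prop]")
    if clinical_significance:
        value = clinical_significance
        if " " in value:
            # Quote multi-word values so they are treated as one phrase.
            value = f'"{value}"'
        parts.append(f"{value}[clnsig]")
    return " AND ".join(parts)


print(build_gene_query("TP53", clinical_significance="pathogenic"))
print(build_gene_query("BRCA1", clinical_significance="likely pathogenic"))
```

A string built this way can be tested on the ClinVar website or passed to `search` and `count`.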
search_condition ¶

search_condition(
    condition: str,
    retmax: int = 500,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData

Search for variants associated with a disease/condition.

Parameters:

Name	Type	Description	Default
`condition`	`str`	Disease or condition name (e.g. `"Breast cancer"`).	required
`retmax`	`int`	Maximum number of variants to return.	`500`
`clinical_significance`	`Optional[str]`	Optional significance filter.	`None`

Returns:

Type	Description
`ClinVarFetchedData`	`ClinVarFetchedData`.

Example

data = fetcher.search_condition("Lynch syndrome", retmax=100)

UniProt¶

uniprot_get_entry¶

uniprot_get_entry ¶

uniprot_get_entry(accession: str) -> UniProtFetchedData

Get a UniProt entry by accession.

Parameters:

Name	Type	Description	Default
`accession`	`str`	UniProt accession (e.g., "P05067").	required

Returns:

Type	Description
`UniProtFetchedData`	UniProtFetchedData with the entry.

Example

entry = uniprot_get_entry("P05067")
print(entry.entries[0].protein_name)
# Amyloid-beta precursor protein

uniprot_search¶

uniprot_search ¶

uniprot_search(
    query: str, size: int = 25, reviewed_only: bool = False
) -> UniProtSearchResult

Search UniProtKB.

Parameters:

Name	Type	Description	Default
`query`	`str`	Search query (e.g., "gene:TP53 AND organism_id:9606").	required
`size`	`int`	Number of results per page (max 500).	`25`
`reviewed_only`	`bool`	Only return reviewed (Swiss-Prot) entries.	`False`

Returns:

Type	Description
`UniProtSearchResult`	UniProtSearchResult with matching entries.

Example

results = uniprot_search("kinase AND organism_id:9606", reviewed_only=True)
print(results.as_dataframe()[["accession", "gene_name"]].head())
#   accession gene_name
# 0    P00533      EGFR
# 1    P04629      NTRK1

uniprot_search_by_gene¶

uniprot_search_by_gene ¶

uniprot_search_by_gene(
    gene_name: str,
    organism: Optional[Union[int, str]] = 9606,
    reviewed_only: bool = True,
) -> UniProtSearchResult

Search UniProt by gene name.

Parameters:

Name	Type	Description	Default
`gene_name`	`str`	Gene name to search.	required
`organism`	`Optional[Union[int, str]]`	Organism tax ID or name (default: human).	`9606`
`reviewed_only`	`bool`	Only return reviewed entries.	`True`

Returns:

Type	Description
`UniProtSearchResult`	UniProtSearchResult with matching entries.

Example

results = uniprot_search_by_gene("TP53")
print(results.entries[0].accession)
# P04637

gene_to_uniprot¶

gene_to_uniprot ¶

gene_to_uniprot(
    gene_names: List[str],
    organism: int = 9606,
    reviewed_only: bool = True,
    return_dict: bool = True,
) -> Union[Dict[str, str], DataFrame]

Map gene names to UniProt accessions.

Parameters:

Name	Type	Description	Default
`gene_names`	`List[str]`	List of gene names.	required
`organism`	`int`	Organism tax ID (default: human).	`9606`
`reviewed_only`	`bool`	Only return reviewed entries.	`True`
`return_dict`	`bool`	If True, return dict. If False, return DataFrame.	`True`

Returns:

Type	Description
`Union[Dict[str, str], DataFrame]`	Dictionary or DataFrame mapping gene names to accessions.

Example

mapping = gene_to_uniprot(["TP53", "BRCA1", "EGFR"])
print(mapping)
# {'TP53': 'P04637', 'BRCA1': 'P38398', 'EGFR': 'P00533'}

uniprot_map_ids¶

uniprot_map_ids ¶

uniprot_map_ids(
    ids: List[str], from_db: str, to_db: str
) -> Dict[str, List[str]]

Map IDs between databases using UniProt ID mapping.

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of IDs to map.	required
`from_db`	`str`	Source database (e.g., "UniProtKB_AC-ID", "Gene_Name", "GeneID", "Ensembl").	required
`to_db`	`str`	Target database (e.g., "UniProtKB", "GeneID", "PDB", "Ensembl").	required

Returns:

Type	Description
`Dict[str, List[str]]`	Dictionary mapping input IDs to lists of output IDs.

Common database names

UniProtKB_AC-ID: UniProt accession
UniProtKB: UniProt (returns full entries)
Gene_Name: Gene name
GeneID: NCBI Gene ID
Ensembl: Ensembl ID
PDB: PDB structure ID
RefSeq_Protein: RefSeq protein ID

Example

mapping = uniprot_map_ids(["P05067", "P04637"], "UniProtKB_AC-ID", "GeneID")
print(mapping)
# {'P05067': ['351'], 'P04637': ['7157']}
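Because each input ID maps to a list, one-to-many results are often easier to handle as flat pairs. A minimal sketch follows. `mapping_to_pairs` is a hypothetical helper, and the input reuses the example output shown above.

```python
from typing import Dict, List, Tuple


def mapping_to_pairs(mapping: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Expand {source: [targets...]} into one (source, target) pair per target."""
    return [(src, dst) for src, targets in mapping.items() for dst in targets]


pairs = mapping_to_pairs({"P05067": ["351"], "P04637": ["7157"]})
print(pairs)
```

The list of pairs can be handed to `pandas.DataFrame(pairs, columns=[...])` if a table is preferred.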

PubChem¶

pubchem_get_compound¶

pubchem_get_compound ¶

pubchem_get_compound(cid: int) -> PUGRestFetchedData

Get compound data by PubChem CID.

Parameters:

Name	Type	Description	Default
`cid`	`int`	PubChem Compound ID.	required

Returns:

Type	Description
`PUGRestFetchedData`	PUGRestFetchedData containing compound information.

Example

data = pubchem_get_compound(2244)  # Aspirin
df = data.as_dataframe()

pubchem_search_by_name¶

pubchem_search_by_name ¶

pubchem_search_by_name(name: str) -> PUGRestFetchedData

Search compounds by name.

Parameters:

Name	Type	Description	Default
`name`	`str`	Compound name to search.	required

Returns:

Type	Description
`PUGRestFetchedData`	PUGRestFetchedData containing matching compounds.

Example

data = pubchem_search_by_name("aspirin")
cids = data.get_cids()

pubchem_get_properties¶

pubchem_get_properties ¶

pubchem_get_properties(
    cids: Union[int, List[int]],
    properties: Optional[List[str]] = None,
) -> PUGRestFetchedData

Get specific properties for compounds.

Parameters:

Name	Type	Description	Default
`cids`	`Union[int, List[int]]`	Single CID or list of CIDs.	required
`properties`	`Optional[List[str]]`	List of property names. If None, returns common properties.	`None`

Returns:

Type	Description
`PUGRestFetchedData`	PUGRestFetchedData containing property values.

Example

data = pubchem_get_properties(2244, ["MolecularWeight", "MolecularFormula"])
df = data.as_dataframe()

Ensembl¶

ensembl_lookup¶

ensembl_lookup ¶

ensembl_lookup(
    id: str,
    species: Optional[str] = None,
    expand: bool = False,
    db_type: str = "core",
) -> EnsemblFetchedData

Look up an Ensembl stable ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ensembl stable ID (e.g., "ENSG00000141510").	required
`species`	`Optional[str]`	Species name (optional, auto-detected from ID).	`None`
`expand`	`bool`	If True, include connected features (transcripts, exons).	`False`
`db_type`	`str`	Database type ("core" or "otherfeatures").	`'core'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData containing gene/transcript/protein information.

Example

data = ensembl_lookup("ENSG00000141510", expand=True)
print(data.results[0]["display_name"])  # TP53

ensembl_lookup_symbol¶

ensembl_lookup_symbol ¶

ensembl_lookup_symbol(
    species: str, symbol: str, expand: bool = False
) -> EnsemblFetchedData

Look up a gene by symbol.

Parameters:

Name	Type	Description	Default
`species`	`str`	Species name (e.g., "human", "mouse").	required
`symbol`	`str`	Gene symbol (e.g., "BRCA2", "TP53").	required
`expand`	`bool`	If True, include connected features.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData containing gene information.

Example

data = ensembl_lookup_symbol("human", "TP53")
print(data.results[0]["id"])  # ENSG00000141510

ensembl_get_sequence¶

ensembl_get_sequence ¶

ensembl_get_sequence(
    id: str,
    sequence_type: str = "genomic",
    species: Optional[str] = None,
    expand_5prime: Optional[int] = None,
    expand_3prime: Optional[int] = None,
    mask: Optional[str] = None,
    format: str = "fasta",
) -> EnsemblFetchedData

Get sequence for an Ensembl stable ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ensembl stable ID (gene, transcript, exon, protein).	required
`sequence_type`	`str`	Type of sequence ("genomic", "cds", "cdna", "protein").	`'genomic'`
`species`	`Optional[str]`	Species name (optional).	`None`
`expand_5prime`	`Optional[int]`	Extend upstream (genomic only).	`None`
`expand_3prime`	`Optional[int]`	Extend downstream (genomic only).	`None`
`mask`	`Optional[str]`	Mask repeats ("hard" or "soft", genomic only).	`None`
`format`	`str`	Output format ("fasta" or "json").	`'fasta'`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData containing sequence data.

Example

data = ensembl_get_sequence("ENST00000269305", sequence_type="cds")
print(data.text)  # FASTA sequence
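With `format="fasta"` the sequence arrives as FASTA text, which can be split into header and sequence with a few lines of code. The parser below is a minimal sketch: `parse_fasta` is a hypothetical helper, not a biodbs function, and the input is toy data rather than a real Ensembl sequence.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Toy FASTA text; not a real Ensembl sequence.
toy = ">toy_transcript cds\nATGGAG\nGAGCCG\n"
records = parse_fasta(toy)
print(records)
```

With the real call, `parse_fasta(data.text)` would be the equivalent step, assuming `data.text` holds the raw FASTA as in the example above.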

ensembl_get_xrefs¶

ensembl_get_xrefs ¶

ensembl_get_xrefs(
    id: str,
    species: Optional[str] = None,
    external_db: Optional[str] = None,
    all_levels: bool = False,
) -> EnsemblFetchedData

Get external cross-references for an Ensembl ID.

Parameters:

Name	Type	Description	Default
`id`	`str`	Ensembl stable ID.	required
`species`	`Optional[str]`	Species name.	`None`
`external_db`	`Optional[str]`	Filter by external database (e.g., "HGNC", "UniProt").	`None`
`all_levels`	`bool`	If True, find all linked features.	`False`

Returns:

Type	Description
`EnsemblFetchedData`	EnsemblFetchedData containing cross-references.

Example

data = ensembl_get_xrefs("ENSG00000141510", external_db="HGNC")
print(data.results[0]["display_id"])

BioMart¶

biomart_get_genes¶

biomart_get_genes ¶

biomart_get_genes(
    ids: List[str],
    attributes: Optional[List[str]] = None,
    dataset: str = "hsapiens_gene_ensembl",
) -> BioMartQueryData

Get gene information by Ensembl gene IDs.

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of Ensembl gene IDs (e.g., ["ENSG00000141510"]).	required
`attributes`	`Optional[List[str]]`	Attributes to retrieve. If None, uses common gene attributes.	`None`
`dataset`	`str`	BioMart dataset name. Defaults to human genes.	`'hsapiens_gene_ensembl'`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData containing gene information including gene ID, symbol, description, and coordinates.

Example

data = biomart_get_genes(["ENSG00000141510", "ENSG00000012048"])
df = data.as_dataframe()
print(df[["ensembl_gene_id", "external_gene_name"]])

biomart_convert_ids¶

biomart_convert_ids ¶

biomart_convert_ids(
    ids: List[str],
    from_type: str = "ensembl_gene_id",
    to_type: str = "external_gene_name",
    dataset: str = "hsapiens_gene_ensembl",
) -> BioMartQueryData

Convert between different gene ID types.

Supported ID types

ensembl_gene_id, ensembl_transcript_id, ensembl_peptide_id
external_gene_name, hgnc_symbol, hgnc_id
entrezgene_id, uniprot_gn_id
refseq_mrna, refseq_peptide

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of IDs to convert.	required
`from_type`	`str`	Source ID type (used as filter).	`'ensembl_gene_id'`
`to_type`	`str`	Target ID type.	`'external_gene_name'`
`dataset`	`str`	BioMart dataset name. Defaults to human genes.	`'hsapiens_gene_ensembl'`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData containing ID mappings with both source and target ID columns.

Example

data = biomart_convert_ids(
    ["TP53", "BRCA1"],
    from_type="external_gene_name",
    to_type="ensembl_gene_id",
)
df = data.as_dataframe()

biomart_query¶

biomart_query ¶

biomart_query(
    dataset: str = "hsapiens_gene_ensembl",
    attributes: Optional[List[str]] = None,
    filters: Optional[
        Dict[str, Union[str, List[str]]]
    ] = None,
) -> BioMartQueryData

Execute a custom BioMart query.

Parameters:

Name	Type	Description	Default
`dataset`	`str`	BioMart dataset name.	`'hsapiens_gene_ensembl'`
`attributes`	`Optional[List[str]]`	List of attributes to retrieve.	`None`
`filters`	`Optional[Dict[str, Union[str, List[str]]]]`	Dict of filter name to value(s).	`None`

Returns:

Type	Description
`BioMartQueryData`	BioMartQueryData containing query results.

Example

data = biomart_query(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name", "chromosome_name"],
    filters={"chromosome_name": "22", "biotype": "protein_coding"},
)
df = data.as_dataframe()

KEGG¶

kegg_list¶

kegg_list ¶

kegg_list(
    database: str, organism: Optional[str] = None
) -> KEGGFetchedData

List entries in a KEGG database.

Parameters:

Name	Type	Description	Default
`database`	`str`	Database name (e.g., "pathway", "module", "compound").	required
`organism`	`Optional[str]`	Organism code for pathway/module lists (e.g., "hsa" for human).	`None`

Returns:

Type	Description
`KEGGFetchedData`	KEGGFetchedData containing a list of entries with IDs and descriptions.

Example

data = kegg_list("pathway", organism="hsa")
df = data.as_dataframe()

kegg_get¶

kegg_get ¶

kegg_get(
    dbentries: Union[str, List[str]],
    option: Optional[str] = None,
) -> KEGGFetchedData

Retrieve entry data from KEGG database.

Parameters:

Name	Type	Description	Default
`dbentries`	`Union[str, List[str]]`	Entry ID or list of IDs (e.g., "hsa:7157").	required
`option`	`Optional[str]`	Output format ("aaseq", "ntseq", "mol", "kcf", "image", "json").	`None`

Returns:

Type	Description
`KEGGFetchedData`	KEGGFetchedData containing entry data.

Example

data = kegg_get("hsa:7157")  # TP53 gene
print(data.text)

data = kegg_get("cpd:C00022", option="mol")
print(data.text)

kegg_link¶

kegg_link ¶

kegg_link(
    target_db: str, source: Union[str, List[str]]
) -> KEGGFetchedData

Find related entries between KEGG databases.

Parameters:

Name	Type	Description	Default
`target_db`	`str`	Target database (e.g., "pathway", "module", "disease").	required
`source`	`Union[str, List[str]]`	Source database name OR list of entry IDs.	required

Returns:

Type	Description
`KEGGFetchedData`	KEGGFetchedData containing linked entries between databases.

Example

# Link genes to pathways
data = kegg_link("pathway", ["hsa:10458", "hsa:7157"])
df = data.as_dataframe()

# Link all compounds to reactions
data = kegg_link("reaction", "compound")

kegg_conv¶

kegg_conv ¶

kegg_conv(
    target_db: str, source: Union[str, List[str]]
) -> KEGGFetchedData

Convert entry IDs between KEGG and external databases.

Parameters:

Name	Type	Description	Default
`target_db`	`str`	Target database (e.g., "ncbi-geneid", "ncbi-proteinid", "uniprot").	required
`source`	`Union[str, List[str]]`	Source database name OR list of entry IDs to convert.	required

Returns:

Type	Description
`KEGGFetchedData`	KEGGFetchedData containing ID mappings between databases.

Example

# Convert entire database
data = kegg_conv("ncbi-geneid", "hsa")

# Convert specific entries
data = kegg_conv("ncbi-geneid", ["hsa:10458", "hsa:7157"])
df = data.as_dataframe()
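The KEGG REST `link` and `conv` operations return two-column, tab-separated text. The sketch below groups such pairs by the first column. It assumes the raw response text is reachable (for example through `data.text`, which this reference only demonstrates for `kegg_get`), and `parse_kegg_pairs` is a hypothetical helper, not part of biodbs.

```python
from collections import defaultdict
from typing import Dict, List


def parse_kegg_pairs(text: str) -> Dict[str, List[str]]:
    """Group two-column, tab-separated KEGG output by the first column.

    Hypothetical helper for illustration; not part of biodbs.
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            # Skip blank or malformed lines.
            continue
        source, target = parts[0], parts[1]
        mapping[source].append(target)
    return dict(mapping)


# Toy text in the two-column shape used by KEGG conv/link responses.
sample = "hsa:10458\tncbi-geneid:10458\nhsa:7157\tncbi-geneid:7157\n"
pairs = parse_kegg_pairs(sample)
print(pairs["hsa:7157"])
```

A list per key is used because one source entry can link to several targets, for example one gene to many pathways.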

ChEMBL¶

chembl_get_molecule¶

chembl_get_molecule ¶

chembl_get_molecule(chembl_id: str) -> ChEMBLFetchedData

Get molecule data by ChEMBL ID.

Parameters:

Name	Type	Description	Default
`chembl_id`	`str`	ChEMBL molecule ID (e.g., "CHEMBL25").	required

Returns:

Type	Description
`ChEMBLFetchedData`	ChEMBLFetchedData containing molecule information including structure, properties, and cross-references.

Example

data = chembl_get_molecule("CHEMBL25")  # Aspirin
print(data.results[0]["pref_name"])

chembl_search_molecules¶

chembl_search_molecules ¶

chembl_search_molecules(
    query: str, limit: int = 100
) -> ChEMBLFetchedData

Search molecules by name, synonym, or structure.

Parameters:

Name	Type	Description	Default
`query`	`str`	Search query (name, synonym, or InChIKey).	required
`limit`	`int`	Maximum number of results to return.	`100`

Returns:

Type	Description
`ChEMBLFetchedData`	ChEMBLFetchedData containing matching molecules.

Example

data = chembl_search_molecules("aspirin")
df = data.as_dataframe()
print(df[["molecule_chembl_id", "pref_name"]].head())

chembl_get_approved_drugs¶

chembl_get_approved_drugs ¶

chembl_get_approved_drugs(
    limit: int = 1000,
) -> ChEMBLFetchedData

Get list of approved drugs from ChEMBL.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum number of drugs to return.	`1000`

Returns:

Type	Description
`ChEMBLFetchedData`	ChEMBLFetchedData containing approved drug molecules with their names, structures, and approval information.

Example

data = chembl_get_approved_drugs(limit=100)
df = data.as_dataframe()
print(df[["molecule_chembl_id", "pref_name"]].head())

QuickGO¶

quickgo_search_annotations¶

quickgo_search_annotations ¶

quickgo_search_annotations(
    go_id: Optional[str] = None,
    taxon_id: Optional[int] = None,
    gene_product_id: Optional[str] = None,
    evidence_code: Optional[str] = None,
    limit: int = 100,
) -> QuickGOFetchedData

Search GO annotations with filters.

Parameters:

Name	Type	Description	Default
`go_id`	`Optional[str]`	GO term ID to filter by.	`None`
`taxon_id`	`Optional[int]`	NCBI taxonomy ID (e.g., 9606 for human).	`None`
`gene_product_id`	`Optional[str]`	Gene product ID (e.g., "UniProtKB:P04637").	`None`
`evidence_code`	`Optional[str]`	Evidence code (e.g., "IDA", "IEA").	`None`
`limit`	`int`	Maximum number of results to return.	`100`

Returns:

Type	Description
`QuickGOFetchedData`	QuickGOFetchedData containing matching GO annotations with gene products, GO terms, and evidence codes.

Example

data = quickgo_search_annotations(go_id="GO:0006915", taxon_id=9606)
df = data.as_dataframe()
print(df[["geneProductId", "goId", "goName"]].head())

quickgo_get_terms¶

quickgo_get_terms ¶

quickgo_get_terms(
    ids: Union[str, List[str]],
) -> QuickGOFetchedData

Get GO term details by ID.

Parameters:

Name	Type	Description	Default
`ids`	`Union[str, List[str]]`	GO term ID or list of IDs (e.g., "GO:0008150" or ["GO:0008150", "GO:0003674"]).	required

Returns:

Type	Description
`QuickGOFetchedData`	QuickGOFetchedData containing term details including name, definition, aspect, and synonyms.

Example

data = quickgo_get_terms("GO:0006915")  # apoptotic process
print(data.results[0]["name"])

HPA (Human Protein Atlas)¶

hpa_get_gene¶

hpa_get_gene ¶

hpa_get_gene(
    gene: str, fmt: str = "json"
) -> HPAFetchedData

Get protein data for a single gene.

Parameters:

Name	Type	Description	Default
`gene`	`str`	Gene name (e.g., "TP53") or Ensembl ID (e.g., "ENSG00000141510").	required
`fmt`	`str`	Response format ("json", "xml", or "tsv").	`'json'`

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData containing protein information including expression data, antibody information, and references.

Example

data = hpa_get_gene("TP53")
print(data.results[0].keys())

hpa_get_tissue_expression¶

hpa_get_tissue_expression ¶

hpa_get_tissue_expression(
    genes: Union[str, List[str]],
) -> HPAFetchedData

Get tissue expression data for genes.

Parameters:

Name	Type	Description	Default
`genes`	`Union[str, List[str]]`	Gene name(s) or Ensembl ID(s).	required

Returns:

Type	Description
`HPAFetchedData`	HPAFetchedData containing tissue expression levels across different human tissues and organs.

Example

data = hpa_get_tissue_expression("TP53")
df = data.as_dataframe()
print(df[["Gene", "Tissue", "Level"]].head())

NCBI¶

ncbi_get_gene¶

ncbi_get_gene ¶

ncbi_get_gene(
    identifiers: List[Union[int, str]],
    taxon: Union[int, str] = "human",
    api_key: Optional[str] = None,
) -> NCBIGeneFetchedData

Get gene information from NCBI by gene IDs or symbols.

This is a convenience function that wraps the NCBI_Fetcher.

Parameters:

Name	Type	Description	Default
`identifiers`	`List[Union[int, str]]`	List of NCBI Gene IDs (integers) or gene symbols (strings).	required
`taxon`	`Union[int, str]`	Taxonomy ID or name (used for symbol lookups).	`'human'`
`api_key`	`Optional[str]`	Optional NCBI API key for higher rate limits.	`None`

Returns:

Type	Description
`NCBIGeneFetchedData`	NCBIGeneFetchedData containing gene reports.

Examples:

>>> # By gene IDs
>>> genes = ncbi_get_gene([7157, 672])
>>> print(genes.as_dataframe())

>>> # By symbols
>>> genes = ncbi_get_gene(["TP53", "BRCA1"], taxon="human")
>>> print(genes.get_gene_ids())

ncbi_symbol_to_id¶

ncbi_symbol_to_id ¶

ncbi_symbol_to_id(
    symbols: List[str],
    taxon: Union[int, str] = "human",
    api_key: Optional[str] = None,
    return_dict: bool = True,
) -> Union[Dict[str, int], DataFrame]

Convert gene symbols to NCBI Gene IDs.

Parameters:

Name	Type	Description	Default
`symbols`	`List[str]`	List of gene symbols.	required
`taxon`	`Union[int, str]`	Taxonomy ID or name.	`'human'`
`api_key`	`Optional[str]`	Optional NCBI API key.	`None`
`return_dict`	`bool`	If True, return dict. If False, return DataFrame.	`True`

Returns:

Type	Description
`Union[Dict[str, int], DataFrame]`	Dictionary mapping symbols to gene IDs, or DataFrame.

Example

mapping = ncbi_symbol_to_id(["TP53", "BRCA1", "EGFR"])
print(mapping)

FDA¶

fda_search¶

fda_search ¶

fda_search(
    category: str,
    endpoint: str,
    search: Optional[Union[str, Dict]] = None,
    limit: int = 100,
    **kwargs: Any,
) -> FDAFetchedData

Search FDA openFDA database.

Parameters:

Name	Type	Description	Default
`category`	`str`	FDA category ("drug", "device", "food", etc.).	required
`endpoint`	`str`	Endpoint within category ("event", "label", "enforcement", etc.).	required
`search`	`Optional[Union[str, Dict]]`	Search query string or dict of field:value pairs.	`None`
`limit`	`int`	Maximum results per request.	`100`
`**kwargs`	`Any`	Additional parameters (sort, count, skip).	`{}`

Returns:

Type	Description
`FDAFetchedData`	FDAFetchedData containing search results.

Example

data = fda_search("drug", "event", search="aspirin", limit=10)
df = data.as_dataframe()
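The `search` parameter accepts either a query string or a dict of field:value pairs. How biodbs serializes a dict is not documented here, so the sketch below shows only one plausible mapping that follows openFDA query syntax (`field:value` terms joined with `AND`, phrases in double quotes). `build_openfda_search` is a hypothetical helper, not part of biodbs.

```python
from typing import Dict


def build_openfda_search(fields: Dict[str, str]) -> str:
    """Join field:value pairs with AND, quoting values that contain spaces.

    Hypothetical helper; biodbs may serialize dicts differently.
    """
    terms = []
    for field, value in fields.items():
        if " " in value:
            # openFDA treats a double-quoted value as an exact phrase.
            value = f'"{value}"'
        terms.append(f"{field}:{value}")
    return " AND ".join(terms)


query = build_openfda_search({
    "patient.drug.openfda.brand_name": "aspirin",
    "patient.reaction.reactionmeddrapt": "gastric ulcer",
})
print(query)
```

The resulting string could then be passed as the `search` argument, leaving URL encoding to the HTTP layer.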

fda_drug_events¶

fda_drug_events ¶

fda_drug_events(
    search: Optional[Union[str, Dict]] = None,
    limit: int = 100,
    **kwargs: Any,
) -> FDAFetchedData

Search FDA drug adverse event reports (FAERS).

Parameters:

Name	Type	Description	Default
`search`	`Optional[Union[str, Dict]]`	Search query (e.g., "patient.drug.openfda.brand_name:aspirin").	`None`
`limit`	`int`	Maximum results to return.	`100`
`**kwargs`	`Any`	Additional parameters (sort, count, skip).	`{}`

Returns:

Type	Description
`FDAFetchedData`	FDAFetchedData containing adverse event reports with patient information, drug details, and outcomes.

Example

data = fda_drug_events(search="aspirin", limit=50)
df = data.as_dataframe()

Reactome¶

reactome_analyze¶

reactome_analyze ¶

reactome_analyze(
    identifiers: List[str],
    species: str = "Homo sapiens",
    interactors: bool = False,
    page_size: int = 100,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
    include_disease: bool = True,
    min_entities: Optional[int] = None,
    max_entities: Optional[int] = None,
) -> ReactomeFetchedData

Perform Reactome pathway over-representation analysis.

Parameters:

Name	Type	Description	Default
`identifiers`	`List[str]`	List of identifiers (gene symbols, UniProt IDs, etc.).	required
`species`	`str`	Species name (e.g., "Homo sapiens", "Mus musculus").	`'Homo sapiens'`
`interactors`	`bool`	Include interactors in analysis.	`False`
`page_size`	`int`	Number of results to return.	`100`
`sort_by`	`str`	Sort field (ENTITIES_FDR, ENTITIES_PVALUE, NAME).	`'ENTITIES_FDR'`
`order`	`str`	Sort order (ASC, DESC).	`'ASC'`
`resource`	`str`	Resource filter (TOTAL, UNIPROT, ENSEMBL, etc.).	`'TOTAL'`
`p_value`	`float`	P-value cutoff for filtering.	`1.0`
`include_disease`	`bool`	Include disease pathways.	`True`
`min_entities`	`Optional[int]`	Minimum pathway size.	`None`
`max_entities`	`Optional[int]`	Maximum pathway size.	`None`

Returns:

Type	Description
`ReactomeFetchedData`	ReactomeFetchedData with pathway enrichment results.

Example

genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = reactome_analyze(genes)
print(f"Found {len(result.pathways)} pathways")
# Found 172 pathways
df = result.significant_pathways(fdr_threshold=0.05).as_dataframe()
print(df[["stId", "name", "fdr", "found", "total"]].head(3).to_string())
#             stId                                           name       fdr  found  total
# 0  R-HSA-6796648   TP53 Regulates Transcription of DNA Repai...  1.08e-06      7     86
# 1  R-HSA-3700989             Transcriptional Regulation by TP53  6.45e-04      9    487
# 2  R-HSA-6806003  Regulation of TP53 Expression and Degradation  6.45e-04      4     46

Disease Ontology¶

do_get_term¶

do_get_term ¶

do_get_term(
    doid: str, use_ols: bool = True
) -> DOFetchedData

Get a disease term by DOID.

This is a convenience function that wraps the DO_Fetcher.

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID (e.g., "DOID:162", "162").	required
`use_ols`	`bool`	If True, use OLS API for more detailed data.	`True`

Returns:

Type	Description
`DOFetchedData`	DOFetchedData containing the disease term.

Example

term = do_get_term("DOID:162")  # Cancer
print(term.terms[0].name)  # cancer
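The `doid` parameter is documented as accepting both the prefixed form (`"DOID:162"`) and the bare number (`"162"`). When IDs come from mixed sources, normalizing them before the call keeps downstream comparisons simple. The sketch below is a hypothetical helper, not part of biodbs, and does not describe how the library itself handles the two forms.

```python
def normalize_doid(doid: str) -> str:
    """Return the prefixed form of a DOID, e.g. "162" -> "DOID:162".

    Hypothetical helper for illustration; not part of biodbs.
    """
    value = doid.strip()
    if value.upper().startswith("DOID:"):
        # Drop the prefix so that only the numeric part is validated.
        value = value[5:]
    if not value.isdigit():
        raise ValueError(f"not a valid DOID: {doid!r}")
    return f"DOID:{value}"


print(normalize_doid("162"))
print(normalize_doid("DOID:162"))
```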

do_get_children¶

do_get_children ¶

do_get_children(doid: str) -> DOFetchedData

Get child terms of a disease.

Parameters:

Name	Type	Description	Default
`doid`	`str`	Disease Ontology ID.	required

Returns:

Type	Description
`DOFetchedData`	DOFetchedData with child terms.

Example

children = do_get_children("DOID:162")  # Cancer
print(f"Cancer has {len(children)} child terms")

EnrichR¶

enrichr_enrich¶

enrichr_enrich ¶

enrichr_enrich(
    genes: List[str],
    library: str,
    organism: str = "human",
    description: str = "biodbs gene list",
) -> EnrichRFetchedData

Perform gene set enrichment analysis.

Parameters:

Name	Type	Description	Default
`genes`	`List[str]`	List of gene symbols to analyze.	required
`library`	`str`	Name of the gene set library (e.g., "KEGG_2021_Human").	required
`organism`	`str`	Target organism (human, mouse, fly, yeast, worm, fish).	`'human'`
`description`	`str`	Description for the gene list.	`'biodbs gene list'`

Returns:

Type	Description
`EnrichRFetchedData`	EnrichRFetchedData containing enrichment results with term names, p-values, combined scores, and overlapping genes.

Example

genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = enrichr_enrich(genes, "KEGG_2021_Human")
top = result.top_terms(5)
print(top.get_term_names())

enrichr_get_libraries¶

enrichr_get_libraries ¶

enrichr_get_libraries(
    organism: str = "human",
) -> EnrichRLibrariesData

Get available gene set libraries.

Parameters:

Name	Type	Description	Default
`organism`	`str`	Target organism (human, mouse, fly, yeast, worm, fish).	`'human'`

Returns:

Type	Description
`EnrichRLibrariesData`	EnrichRLibrariesData containing library statistics including library names, number of terms, gene coverage, and categories.

Example

libs = enrichr_get_libraries()
kegg = libs.search("KEGG")
print(kegg.get_library_names())

HGNC¶

hgnc_fetch¶

hgnc_fetch ¶

hgnc_fetch(field: str, term: str) -> HGNCFetchedData

Exact-match lookup by any HGNC stored field.

Returns full gene records. No wildcard expansion; use `hgnc_search` for wildcard queries.

Parameters:

Name	Type	Description	Default
`field`	`str`	HGNC field name (e.g. `"symbol"`, `"hgnc_id"`, `"ensembl_gene_id"`, `"entrez_id"`, `"uniprot_ids"`).	required
`term`	`str`	Exact value to match.	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` containing `HGNCEntry` records.

Example

data = hgnc_fetch("symbol", "TP53")
entry = data[0]
print(entry.hgnc_id, entry.entrez_id, entry.ensembl_gene_id)

hgnc_search¶

hgnc_search ¶

hgnc_search(
    query_or_field: str, term: Optional[str] = None
) -> HGNCFetchedData

Wildcard / boolean search across HGNC records.

Returns lightweight summaries (hgnc_id, symbol, score). Use `hgnc_fetch` to retrieve full records.

Parameters:

Name	Type	Description	Default
`query_or_field`	`str`	Full Solr query string, or a field name when `term` is also given.	required
`term`	`Optional[str]`	Search term for the given field (supports `*` and `?`).	`None`

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with `is_search=True`.

Example

# All approved TP53 family members
hits = hgnc_search("symbol", "TP53*")
print(hits.symbols())

# Boolean query
hits = hgnc_search("status:Approved+AND+locus_group:non-coding+RNA")

hgnc_fetch_by_symbol¶

hgnc_fetch_by_symbol ¶

hgnc_fetch_by_symbol(symbol: str) -> HGNCFetchedData

Fetch a gene entry by its approved HGNC symbol.

Parameters:

Name	Type	Description	Default
`symbol`	`str`	Approved gene symbol (e.g. `"TP53"`, `"BRCA1"`).	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with the matching gene entry (usually one record; zero if the symbol is not found).

Example

data = hgnc_fetch_by_symbol("EGFR")
entry = data[0]
print(entry.name)  # "epidermal growth factor receptor"

hgnc_fetch_by_hgnc_id¶

hgnc_fetch_by_hgnc_id ¶

hgnc_fetch_by_hgnc_id(hgnc_id: str) -> HGNCFetchedData

Fetch a gene entry by its HGNC ID.

Parameters:

Name	Type	Description	Default
`hgnc_id`	`str`	HGNC identifier in the form `"HGNC:NNNN"` (e.g. `"HGNC:11998"` for TP53).	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with the matching gene entry.

Example

data = hgnc_fetch_by_hgnc_id("HGNC:11998")
print(data[0].symbol)  # "TP53"

hgnc_fetch_by_entrez_id¶

hgnc_fetch_by_entrez_id ¶

hgnc_fetch_by_entrez_id(entrez_id: str) -> HGNCFetchedData

Fetch a gene entry by NCBI Entrez Gene ID.

Parameters:

Name	Type	Description	Default
`entrez_id`	`str`	NCBI Gene ID as a string (e.g. `"7157"` for TP53).	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with the matching gene entry.

Example

data = hgnc_fetch_by_entrez_id("7157")
print(data[0].symbol)  # "TP53"

hgnc_fetch_by_ensembl_id¶

hgnc_fetch_by_ensembl_id ¶

hgnc_fetch_by_ensembl_id(
    ensembl_id: str,
) -> HGNCFetchedData

Fetch a gene entry by Ensembl stable gene ID.

Parameters:

Name	Type	Description	Default
`ensembl_id`	`str`	Ensembl gene ID (e.g. `"ENSG00000141510"`).	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with the matching gene entry.

Example

data = hgnc_fetch_by_ensembl_id("ENSG00000141510")
print(data[0].symbol)  # "TP53"

hgnc_fetch_by_uniprot_id¶

hgnc_fetch_by_uniprot_id ¶

hgnc_fetch_by_uniprot_id(
    uniprot_id: str,
) -> HGNCFetchedData

Fetch a gene entry by UniProt accession.

Parameters:

Name	Type	Description	Default
`uniprot_id`	`str`	UniProt accession (e.g. `"P04637"`).	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with the matching gene entry.

Example

data = hgnc_fetch_by_uniprot_id("P04637")
print(data[0].symbol)  # "TP53"

hgnc_fetch_by_refseq¶

hgnc_fetch_by_refseq ¶

hgnc_fetch_by_refseq(
    refseq_accession: str,
) -> HGNCFetchedData

Fetch a gene entry by RefSeq accession.

Parameters:

Name	Type	Description	Default
`refseq_accession`	`str`	RefSeq accession (e.g. `"NM_000546"`).	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with the matching gene entry.

Example

data = hgnc_fetch_by_refseq("NM_000546")
print(data[0].symbol)  # "TP53"

hgnc_search_symbol¶

hgnc_search_symbol ¶

hgnc_search_symbol(query: str) -> HGNCFetchedData

Search HGNC gene symbols using wildcard patterns.

Returns lightweight summaries; use `hgnc_fetch_by_symbol` for full records once you have exact symbols.

Parameters:

Name	Type	Description	Default
`query`	`str`	Symbol query supporting `*` (any chars) and `?` (one char). Examples: `"ZNF*"`, `"BRCA?"`	required

Returns:

Type	Description
`HGNCFetchedData`	`HGNCFetchedData` with `is_search=True`.

Example

hits = hgnc_search_symbol("TP53*")
print(hits.symbols())
# ['TP53', 'TP53AIP1', 'TP53BP1', 'TP53BP2', ...]
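The two wildcards behave differently: `*` matches any run of characters, while `?` matches exactly one. The sketch below illustrates that difference locally with Python's `fnmatch` module on a hand-picked list of symbols. It is an offline illustration of the pattern semantics only, not how the HGNC service evaluates queries.

```python
from fnmatch import fnmatchcase

# Hand-picked symbols used only to illustrate the wildcard semantics.
symbols = ["TP53", "TP53BP1", "TP63", "BRCA1", "BRCA2", "BRCA1P1"]

# "*" matches zero or more characters after the prefix.
star_hits = [s for s in symbols if fnmatchcase(s, "TP53*")]

# "?" matches exactly one character, so longer symbols are excluded.
qmark_hits = [s for s in symbols if fnmatchcase(s, "BRCA?")]

print(star_hits)   # ['TP53', 'TP53BP1']
print(qmark_hits)  # ['BRCA1', 'BRCA2']
```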

hgnc_info¶

hgnc_info ¶

hgnc_info() -> dict

Return HGNC service metadata.

Includes the database last-modified timestamp, total document count, and the lists of searchable and stored fields.

Returns:

Type	Description
`dict`	Raw JSON dict from the `/info` endpoint.

Example

info = hgnc_info()
print(info["response"]["numDoc"])

ClinVar¶

clinvar_search¶

clinvar_search ¶

clinvar_search(
    query: str, retmax: int = 500, retstart: int = 0
) -> List[str]

Find ClinVar variation UIDs matching an Entrez query.

Uses the same query language as the ClinVar website. Common field tags:

BRCA1[gene] — gene name
pathogenic[clnsig] — clinical significance
"Breast cancer"[dis] — disease
single_gene[prop] — single-gene variants

Parameters:

Name	Type	Description	Default
`query`	`str`	Entrez query string.	required
`retmax`	`int`	Maximum UIDs to return (default 500).	`500`
`retstart`	`int`	Offset for pagination.	`0`

Returns:

Type	Description
`List[str]`	List of variation UID strings.

Example

uids = clinvar_search("BRCA1[gene] AND pathogenic[clnsig]")
data = clinvar_fetch_by_id(uids[:20])
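The field tags listed above combine with `AND`, and `retmax`/`retstart` together allow paging through result sets larger than one request. The sketch below shows both ideas under stated assumptions: `build_clinvar_query` and `iter_uids` are hypothetical helpers, not part of biodbs, and `fake_search` is a stand-in with the same signature as `clinvar_search` so that the example runs offline.

```python
from typing import Callable, Iterator, List, Optional


def build_clinvar_query(gene: Optional[str] = None,
                        significance: Optional[str] = None,
                        disease: Optional[str] = None) -> str:
    """Combine Entrez field tags with AND. Hypothetical helper."""
    terms = []
    if gene:
        terms.append(f"{gene}[gene]")
    if significance:
        terms.append(f"{significance}[clnsig]")
    if disease:
        # Multi-word disease names are quoted, as in "Breast cancer"[dis].
        terms.append(f'"{disease}"[dis]')
    return " AND ".join(terms)


def iter_uids(search: Callable[..., List[str]], query: str,
              page_size: int = 500) -> Iterator[str]:
    """Yield every UID by advancing retstart until a short page comes back."""
    start = 0
    while True:
        page = search(query, retmax=page_size, retstart=start)
        yield from page
        if len(page) < page_size:
            break
        start += page_size


def fake_search(query: str, retmax: int = 500, retstart: int = 0) -> List[str]:
    """Offline stand-in mirroring the clinvar_search signature (toy UIDs)."""
    all_uids = [str(i) for i in range(1, 8)]
    return all_uids[retstart:retstart + retmax]


query = build_clinvar_query(gene="BRCA1", significance="pathogenic")
uids = list(iter_uids(fake_search, query, page_size=3))
print(query)
print(len(uids))
```

Passing the search function as an argument keeps the paging logic testable without network access; in real use, `clinvar_search` would take the place of `fake_search`.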

clinvar_count¶

clinvar_count ¶

clinvar_count(query: str) -> int

Return the total number of ClinVar records matching `query`.

Parameters:

Name	Type	Description	Default
`query`	`str`	Entrez query string.	required

Returns:

Type	Description
`int`	Integer count.

Example

n = clinvar_count("TP53[gene] AND pathogenic[clnsig]")
print(f"TP53 has {n} pathogenic variants in ClinVar")

clinvar_fetch_by_id¶

clinvar_fetch_by_id ¶

clinvar_fetch_by_id(
    ids: List[Union[str, int]],
) -> ClinVarFetchedData

Fetch ClinVar summaries for a list of variation UIDs.

Parameters:

Name	Type	Description	Default
`ids`	`List[Union[str, int]]`	ClinVar variation UIDs (integers or strings).	required

Returns:

Type	Description
`ClinVarFetchedData`	`ClinVarFetchedData` (defined in `biodbs.data.ClinVar.data`).

Example

data = clinvar_fetch_by_id([65533, 14206])
print(data.as_dataframe())
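When a search returns many UIDs, splitting them into fixed-size batches keeps each fetch request small. Whether `clinvar_fetch_by_id` already batches internally is not stated in this reference, so the sketch below is only a generic pattern. `chunked` is a hypothetical helper, the batch size is arbitrary, and the IDs are toy values.

```python
from typing import Iterator, List, Sequence, Union

Uid = Union[str, int]


def chunked(ids: Sequence[Uid], size: int) -> Iterator[List[Uid]]:
    """Yield consecutive slices of at most `size` IDs. Hypothetical helper."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Toy UIDs; in real use these would come from clinvar_search.
toy_uids = list(range(1, 6))
batches = list(chunked(toy_uids, 2))
print(batches)

# Each batch could then be passed to clinvar_fetch_by_id in turn, e.g.:
# for batch in chunked(uids, 200):
#     data = clinvar_fetch_by_id(batch)
```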

clinvar_search_gene¶

clinvar_search_gene ¶

clinvar_search_gene(
    gene_symbol: str,
    retmax: int = 500,
    single_gene: bool = True,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData

Search and fetch ClinVar variants for a gene in one step.

Parameters:

Name	Type	Description	Default
`gene_symbol`	`str`	HGNC gene symbol (e.g. `"BRCA1"`).	required
`retmax`	`int`	Maximum variants to return.	`500`
`single_gene`	`bool`	If `True` (default), restrict to single-gene variants.	`True`
`clinical_significance`	`Optional[str]`	Optional filter (e.g. `"pathogenic"`).	`None`

Returns:

Type	Description
`ClinVarFetchedData`	`ClinVarFetchedData` (defined in `biodbs.data.ClinVar.data`).

Example

data = clinvar_search_gene("TP53", retmax=200,
                           clinical_significance="pathogenic")
df = data.as_dataframe()

clinvar_search_condition¶

clinvar_search_condition ¶

clinvar_search_condition(
    condition: str,
    retmax: int = 500,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData

Search and fetch ClinVar variants for a disease/condition.

Parameters:

Name	Type	Description	Default
`condition`	`str`	Disease or condition name (e.g. `"Lynch syndrome"`).	required
`retmax`	`int`	Maximum variants to return.	`500`
`clinical_significance`	`Optional[str]`	Optional significance filter.	`None`

Returns:

Type	Description
`ClinVarFetchedData`	`ClinVarFetchedData` (defined in `biodbs.data.ClinVar.data`).

Example

data = clinvar_search_condition("Breast cancer",
                                clinical_significance="pathogenic")

clinvar_fetch_vcv¶

clinvar_fetch_vcv ¶

clinvar_fetch_vcv(accession: str) -> str

Retrieve the full VCV XML record for a variation.

Parameters:

Name	Type	Description	Default
`accession`	`str`	VCV accession (e.g. `"VCV000014206"` or `"VCV000014206.3"`).	required

Returns:

Type	Description
`str`	Raw XML string.

Example

xml = clinvar_fetch_vcv("VCV000014206")

clinvar_fetch_rcv¶

clinvar_fetch_rcv ¶

clinvar_fetch_rcv(accession: str) -> str

Retrieve the full RCV XML record for a variation-condition pair.

Parameters:

Name	Type	Description	Default
`accession`	`str`	RCV accession (e.g. `"RCV000000606"`).	required

Returns:

Type	Description
`str`	Raw XML string.

Example

xml = clinvar_fetch_rcv("RCV000000606")

clinvar_link_pubmed¶

clinvar_link_pubmed ¶

clinvar_link_pubmed(
    variation_id: Union[str, int],
) -> List[str]

Return PubMed UIDs linked to a ClinVar variation.

Parameters:

Name	Type	Description	Default
`variation_id`	`Union[str, int]`	ClinVar variation UID.	required

Returns:

Type	Description
`List[str]`	List of PubMed UID strings.

Example

pmids = clinvar_link_pubmed(65533)

Rate Limiting¶

Function/Class	Description
`RateLimiter`	Global rate limiter for API calls
`get_rate_limiter`	Get the singleton rate limiter instance
`request_with_retry`	Make HTTP request with retry logic