
Fetch Module API Reference

Complete reference for the biodbs.fetch module.

Summary

Fetcher Classes

Class Description
UniProt_Fetcher Fetch protein data from UniProt REST API
PubChem_Fetcher Fetch chemical data from PubChem PUG REST/View APIs
Ensembl_Fetcher Fetch genomic data from Ensembl REST API
BioMart_Fetcher Query Ensembl BioMart for gene annotations
KEGG_Fetcher Fetch pathway and gene data from KEGG API
ChEMBL_Fetcher Fetch bioactivity data from ChEMBL API
QuickGO_Fetcher Fetch GO annotations from QuickGO API
HPA_Fetcher Fetch protein expression from Human Protein Atlas
NCBI_Fetcher Fetch gene data from NCBI Entrez
FDA_Fetcher Fetch drug/device data from openFDA
Reactome_Fetcher Fetch pathway data from Reactome
DO_Fetcher Fetch disease terms from Disease Ontology
EnrichR_Fetcher Perform gene set enrichment via EnrichR
HGNC_Fetcher Fetch gene nomenclature from HGNC
ClinVar_Fetcher Fetch clinical variant data from ClinVar

UniProt Functions

Function Description
uniprot_get_entry Get a single UniProt entry by accession
uniprot_search Search UniProtKB with query
uniprot_search_by_gene Search by gene name
gene_to_uniprot Map gene symbols to UniProt accessions
uniprot_map_ids Map IDs between databases

PubChem Functions

Function Description
pubchem_get_compound Get compound record by CID
pubchem_search_by_name Search compounds by name
pubchem_get_properties Get compound properties

Ensembl Functions

Function Description
ensembl_lookup Lookup entity by Ensembl ID
ensembl_lookup_symbol Lookup by gene symbol
ensembl_get_sequence Get nucleotide/protein sequence
ensembl_get_xrefs Get cross-references

BioMart Functions

Function Description
biomart_get_genes Get gene annotations by Ensembl IDs
biomart_convert_ids Convert between gene ID types
biomart_query Custom BioMart query

KEGG Functions

Function Description
kegg_list List entries in a KEGG database
kegg_get Get KEGG entry by ID
kegg_link Get cross-references between databases
kegg_conv Convert between KEGG and external IDs

ChEMBL Functions

Function Description
chembl_get_molecule Get molecule by ChEMBL ID
chembl_search_molecules Search molecules by name
chembl_get_approved_drugs Get approved drugs list

QuickGO Functions

Function Description
quickgo_search_annotations Search GO annotations
quickgo_get_terms Get GO term details

HPA Functions

Function Description
hpa_get_gene Get gene expression data
hpa_get_tissue_expression Get tissue-level expression

NCBI Functions

Function Description
ncbi_get_gene Get gene info by Entrez ID
ncbi_symbol_to_id Convert gene symbol to Entrez ID

FDA Functions

Function Description
fda_search Search openFDA endpoints
fda_drug_events Search drug adverse events

Reactome Functions

Function Description
reactome_analyze Analyze gene list against Reactome

Disease Ontology Functions

Function Description
do_get_term Get disease term by DOID
do_get_children Get child terms

EnrichR Functions

Function Description
enrichr_enrich Perform enrichment analysis
enrichr_get_libraries List available gene set libraries

HGNC Functions

Function Description
hgnc_fetch Exact-match lookup by any HGNC field
hgnc_search Wildcard / boolean search across HGNC
hgnc_fetch_by_symbol Fetch gene by approved symbol
hgnc_fetch_by_hgnc_id Fetch gene by HGNC ID
hgnc_fetch_by_entrez_id Fetch gene by Entrez Gene ID
hgnc_fetch_by_ensembl_id Fetch gene by Ensembl gene ID
hgnc_fetch_by_uniprot_id Fetch gene by UniProt accession
hgnc_fetch_by_refseq Fetch gene by RefSeq accession
hgnc_search_symbol Wildcard search on gene symbols
hgnc_info Return HGNC service metadata

ClinVar Functions

Function Description
clinvar_search Search ClinVar with an Entrez query string
clinvar_count Count ClinVar records matching a query
clinvar_fetch_by_id Fetch variant summaries by variation UID
clinvar_search_gene Search and fetch variants for a gene
clinvar_search_condition Search and fetch variants for a condition
clinvar_fetch_vcv Fetch full VCV XML record
clinvar_fetch_rcv Fetch full RCV XML record
clinvar_link_pubmed Get PubMed IDs linked to a variation

Fetcher Classes

UniProt_Fetcher

UniProt_Fetcher()

Fetcher for the UniProt REST API.

Provides access to UniProtKB protein data including:

  • Entry retrieval by accession
  • Search by query
  • ID mapping between databases
  • Batch retrieval

Example
fetcher = UniProt_Fetcher()

# Get protein by accession
entry = fetcher.get_entry("P05067")  # APP protein
print(entry.entries[0].protein_name)

# Search for proteins
results = fetcher.search("gene:TP53 AND organism_id:9606")
print(results.as_dataframe())

# Get multiple entries
entries = fetcher.get_entries(["P05067", "P04637", "P00533"])

# Map IDs
mapping = fetcher.map_ids(
    ["P05067", "P04637"],
    from_db="UniProtKB_AC-ID",
    to_db="GeneID"
)

Initialize UniProt fetcher.

get_entry

get_entry(
    accession: str, fields: Optional[str] = None
) -> UniProtFetchedData

Get a UniProt entry by accession.

Parameters:

Name Type Description Default
accession str

UniProt accession (e.g., "P05067").

required
fields Optional[str]

Comma-separated list of fields to return.

None

Returns:

Type Description
UniProtFetchedData

UniProtFetchedData with the entry.

Example
fetcher = UniProt_Fetcher()
entry = fetcher.get_entry("P05067")
print(entry.entries[0].protein_name)

get_entries

get_entries(
    accessions: List[str], fields: Optional[str] = None
) -> UniProtFetchedData

Get multiple UniProt entries by accessions.

Parameters:

Name Type Description Default
accessions List[str]

List of UniProt accessions.

required
fields Optional[str]

Comma-separated list of fields to return.

None

Returns:

Type Description
UniProtFetchedData

UniProtFetchedData with all entries.

Example
fetcher = UniProt_Fetcher()
entries = fetcher.get_entries(["P05067", "P04637", "P00533"])
print(entries.get_gene_names())

search

search(
    query: str,
    fields: Optional[str] = None,
    sort: Optional[str] = None,
    size: int = 25,
    include_isoform: bool = False,
    cursor: Optional[str] = None,
) -> UniProtSearchResult

Search UniProtKB.

Parameters:

Name Type Description Default
query str

Search query (e.g., "gene:TP53 AND organism_id:9606").

required
fields Optional[str]

Comma-separated list of fields to return.

None
sort Optional[str]

Sort field and direction (e.g., "accession desc").

None
size int

Number of results per page (max 500).

25
include_isoform bool

Include isoforms in results.

False
cursor Optional[str]

Cursor for pagination.

None

Returns:

Type Description
UniProtSearchResult

UniProtSearchResult with matching entries.

Example
fetcher = UniProt_Fetcher()
results = fetcher.search("gene:BRCA1 AND reviewed:true")
print(results.as_dataframe())

search_all

search_all(
    query: str,
    fields: Optional[str] = None,
    sort: Optional[str] = None,
    max_results: int = 10000,
    include_isoform: bool = False,
) -> UniProtFetchedData

Search UniProtKB and retrieve all matching entries, paging through the results until everything (or max_results entries) has been collected.

Parameters:

Name Type Description Default
query str

Search query.

required
fields Optional[str]

Fields to return.

None
sort Optional[str]

Sort field and direction.

None
max_results int

Maximum results to retrieve.

10000
include_isoform bool

Include isoforms.

False

Returns:

Type Description
UniProtFetchedData

UniProtFetchedData with all matching entries.
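search_all builds on the cursor parameter of search(). In the UniProt REST API the cursor for the next page arrives in the Link response header, and a client keeps requesting pages until no next link is returned. The sketch below shows that paging loop with a stand-in page function instead of real HTTP calls; collect_all and fetch_page are hypothetical names, and how biodbs implements the loop internally is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A page function takes a cursor (None for the first page) and returns
# (entries on that page, cursor for the next page or None when done).
PageFn = Callable[[Optional[str]], Tuple[List[Dict], Optional[str]]]


def collect_all(fetch_page: PageFn, max_results: int = 10000) -> List[Dict]:
    """Follow pagination cursors until the results run out or max_results is hit."""
    results: List[Dict] = []
    cursor: Optional[str] = None
    while len(results) < max_results:
        page, cursor = fetch_page(cursor)
        results.extend(page)
        if cursor is None or not page:
            break  # no further pages
    # The final page can overshoot the cap, so trim to max_results.
    return results[:max_results]


# Stand-in for two pages of API responses.
pages = {None: ([{"id": 1}, {"id": 2}], "c1"), "c1": ([{"id": 3}], None)}
print([r["id"] for r in collect_all(lambda c: pages[c])])  # [1, 2, 3]
```

Trimming after the loop keeps the cap exact even when the page size does not divide max_results evenly.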

search_by_gene

search_by_gene(
    gene_name: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult

Search by gene name.

Parameters:

Name Type Description Default
gene_name str

Gene name to search.

required
organism Optional[Union[int, str]]

Organism tax ID or name.

None
reviewed_only bool

Only return reviewed entries.

False
size int

Results per page.

25

Returns:

Type Description
UniProtSearchResult

UniProtSearchResult with matching entries.

Example
fetcher = UniProt_Fetcher()
results = fetcher.search_by_gene("TP53", organism=9606, reviewed_only=True)
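The search_by_* helpers take structured arguments, while search() takes a raw query string. A minimal sketch of how gene_name, organism and reviewed_only could be composed into UniProt query syntax, assuming an integer organism maps to the organism_id field and a string to organism_name; build_gene_query is a hypothetical helper, not part of biodbs.

```python
from typing import Optional, Union


def build_gene_query(
    gene_name: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
) -> str:
    """Compose a UniProtKB query string from search_by_gene-style arguments.

    Hypothetical helper: illustrates UniProt query syntax, not biodbs internals.
    """
    clauses = [f"gene:{gene_name}"]
    if isinstance(organism, int):
        # Numeric organisms are NCBI taxonomy IDs (9606 = human).
        clauses.append(f"organism_id:{organism}")
    elif organism is not None:
        # Names go through organism_name; multi-word names are quoted.
        name = f'"{organism}"' if " " in organism else organism
        clauses.append(f"organism_name:{name}")
    if reviewed_only:
        # reviewed:true restricts results to Swiss-Prot entries.
        clauses.append("reviewed:true")
    return " AND ".join(clauses)


print(build_gene_query("TP53", organism=9606, reviewed_only=True))
# gene:TP53 AND organism_id:9606 AND reviewed:true
```

The resulting string has the same shape as the query passed to search() in the class example above, so it can be used there directly.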

search_by_organism

search_by_organism(
    organism: Union[int, str],
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult

Search by organism.

Parameters:

Name Type Description Default
organism Union[int, str]

Organism tax ID or name.

required
reviewed_only bool

Only return reviewed entries.

False
size int

Results per page.

25

Returns:

Type Description
UniProtSearchResult

UniProtSearchResult with matching entries.

search_by_keyword

search_by_keyword(
    keyword: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult

Search by keyword.

Parameters:

Name Type Description Default
keyword str

Keyword to search (e.g., "kinase", "receptor").

required
organism Optional[Union[int, str]]

Optional organism filter.

None
reviewed_only bool

Only return reviewed entries.

False
size int

Results per page.

25

Returns:

Type Description
UniProtSearchResult

UniProtSearchResult with matching entries.

map_ids

map_ids(
    ids: List[str],
    from_db: str = "UniProtKB_AC-ID",
    to_db: str = "UniProtKB",
    poll_interval: float = 1.0,
    max_wait: float = 60.0,
) -> Dict[str, List[str]]

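Map IDs between databases using UniProt's asynchronous ID mapping service. A mapping job is submitted, its status is checked every poll_interval seconds, and the call waits at most max_wait seconds for the job to complete.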

Parameters:

Name Type Description Default
ids List[str]

List of IDs to map.

required
from_db str

Source database (e.g., "UniProtKB_AC-ID", "Gene_Name", "GeneID").

'UniProtKB_AC-ID'
to_db str

Target database (e.g., "UniProtKB", "GeneID", "PDB").

'UniProtKB'
poll_interval float

Seconds between status checks.

1.0
max_wait float

Maximum seconds to wait for job completion.

60.0

Returns:

Type Description
Dict[str, List[str]]

Dictionary mapping input IDs to lists of output IDs.

Example
fetcher = UniProt_Fetcher()
mapping = fetcher.map_ids(
    ["P05067", "P04637"],
    from_db="UniProtKB_AC-ID",
    to_db="GeneID"
)
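UniProt ID mapping runs as a job: it is submitted (POST https://rest.uniprot.org/idmapping/run), its status is polled (idmapping/status/{jobId}), and the results are then fetched. The poll_interval and max_wait parameters presumably drive a loop like the one below. This is a sketch with a stand-in status check; wait_for_job is a hypothetical name, and raising TimeoutError on expiry is an assumption about how a timeout might be surfaced.

```python
import time
from typing import Callable


def wait_for_job(
    is_finished: Callable[[], bool],
    poll_interval: float = 1.0,
    max_wait: float = 60.0,
) -> None:
    """Call is_finished() every poll_interval seconds until it returns True.

    Raises TimeoutError if the job is still running after max_wait seconds.
    """
    deadline = time.monotonic() + max_wait
    while not is_finished():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"ID mapping job not finished after {max_wait} s")
        time.sleep(poll_interval)


# Stand-in status check that reports completion on the third poll.
state = {"polls": 0}


def fake_status() -> bool:
    state["polls"] += 1
    return state["polls"] >= 3


wait_for_job(fake_status, poll_interval=0.01)
print(state["polls"])  # 3
```

time.monotonic() is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.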

gene_to_uniprot

gene_to_uniprot(
    gene_names: List[str],
    organism: int = 9606,
    reviewed_only: bool = True,
) -> Dict[str, str]

Map gene names to UniProt accessions.

Uses concurrent requests for efficient batch processing.

Parameters:

Name Type Description Default
gene_names List[str]

List of gene names.

required
organism int

Organism tax ID (default human).

9606
reviewed_only bool

Only return reviewed entries.

True

Returns:

Type Description
Dict[str, str]

Dictionary mapping gene names to accessions.

Example
fetcher = UniProt_Fetcher()
mapping = fetcher.gene_to_uniprot(["TP53", "BRCA1", "EGFR"])
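gene_to_uniprot is described as issuing concurrent requests. Because each lookup is I/O-bound, a thread pool is a natural fit. The sketch below uses a dictionary in place of one UniProt search per gene; map_concurrently and its max_workers default are hypothetical, and dropping genes with no hit is an assumption about the real return value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def map_concurrently(
    gene_names: List[str],
    lookup: Callable[[str], Optional[str]],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run lookup(gene) for every gene in a thread pool; skip genes with no hit."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        hits = list(pool.map(lookup, gene_names))
    return {gene: acc for gene, acc in zip(gene_names, hits) if acc is not None}


# A dict lookup stands in for one UniProt search per gene.
known = {"TP53": "P04637", "EGFR": "P00533"}
print(map_concurrently(["TP53", "BRCA1", "EGFR"], known.get))
# {'TP53': 'P04637', 'EGFR': 'P00533'}
```

A modest max_workers keeps the number of simultaneous requests to the service bounded.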

uniprot_to_gene

uniprot_to_gene(accessions: List[str]) -> Dict[str, str]

Map UniProt accessions to gene names.

Parameters:

Name Type Description Default
accessions List[str]

List of UniProt accessions.

required

Returns:

Type Description
Dict[str, str]

Dictionary mapping accessions to gene names.

get_sequences

get_sequences(accessions: List[str]) -> Dict[str, str]

Get protein sequences for accessions.

Parameters:

Name Type Description Default
accessions List[str]

List of UniProt accessions.

required

Returns:

Type Description
Dict[str, str]

Dictionary mapping accessions to sequences.

PubChem_Fetcher

PubChem_Fetcher(**data_manager_kws)

Fetcher for the PubChem PUG REST and PUG View APIs.

PubChem provides two REST APIs:

PUG REST - Structured data access:

  • Compound records (structures, properties, synonyms)
  • Substance records (deposited data)
  • Bioassay data
  • Gene and protein information
  • Structure searches (similarity, substructure)

PUG View - Annotation/web page content:

  • Detailed compound annotations
  • Safety and hazards information
  • Pharmacology and biochemistry
  • Literature and patents
  • Drug and medication information

Example
fetcher = PubChem_Fetcher()

# Get compound by CID
aspirin = fetcher.get_compound(2244)
print(aspirin.results[0])

# Get compound properties
props = fetcher.get_properties(
    [2244, 3672],
    properties=["MolecularFormula", "MolecularWeight"]
)
df = props.as_dataframe()

# Search by name
results = fetcher.search_by_name("aspirin")

# Similarity search
similar = fetcher.similarity_search(
    smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
    threshold=90
)

# Get safety data
safety = fetcher.get_safety_data(2244)

# Get pharmacology info
pharma = fetcher.get_pharmacology(2244)

get

get(
    domain: str,
    namespace: str,
    identifiers: Optional[
        Union[str, int, List[Union[str, int]]]
    ] = None,
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    output: str = "JSON",
    search_type: Optional[str] = None,
    threshold: Optional[int] = None,
    max_records: Optional[int] = None,
) -> PUGRestFetchedData

Fetch data from PubChem PUG REST API.

Parameters:

Name Type Description Default
domain str

PubChem domain (compound, substance, assay, etc.).

required
namespace str

Identifier namespace (cid, name, smiles, etc.).

required
identifiers Optional[Union[str, int, List[Union[str, int]]]]

ID(s) to look up.

None
operation Optional[str]

Operation to perform (property, synonyms, etc.).

None
properties Optional[List[str]]

List of properties for property operation.

None
output str

Output format (JSON, XML, CSV, SDF, PNG).

'JSON'
search_type Optional[str]

For structure searches (smiles, smarts, inchi).

None
threshold Optional[int]

Similarity threshold (0-100) for similarity searches.

None
max_records Optional[int]

Maximum records to return.

None

Returns:

Type Description
PUGRestFetchedData

PUGRestFetchedData with parsed results.
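PUG REST URLs follow the pattern base/domain/namespace/identifiers/operation/output, which lines up with the parameters of get(). A minimal sketch of that assembly; pug_rest_url is a hypothetical helper, and whether biodbs builds its URLs exactly this way is an assumption.

```python
from typing import List, Optional, Sequence, Union
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(
    domain: str,
    namespace: str,
    identifiers: Union[str, int, Sequence[Union[str, int]]],
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    output: str = "JSON",
) -> str:
    """Assemble <base>/<domain>/<namespace>/<ids>[/<operation>]/<output>."""
    if isinstance(identifiers, (str, int)):
        id_list = [identifiers]
    else:
        id_list = list(identifiers)
    # Multiple identifiers are comma-separated; each is percent-encoded so that
    # names or SMILES with special characters survive in the URL path.
    ids = ",".join(quote(str(i), safe="") for i in id_list)
    parts = [PUG_REST, domain, namespace, ids]
    if operation == "property" and properties:
        parts += ["property", ",".join(properties)]
    elif operation:
        parts.append(operation)
    parts.append(output)
    return "/".join(parts)


print(pug_rest_url("compound", "cid", [2244, 3672], operation="property",
                   properties=["MolecularFormula", "MolecularWeight"]))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244,3672/property/MolecularFormula,MolecularWeight/JSON
```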

get_all

get_all(
    domain: str,
    namespace: str,
    identifiers: List[Union[str, int]],
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    batch_size: int = 100,
    rate_limit_per_second: int = 5,
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    **kwargs: Any,
) -> Union[PUGRestFetchedData, Path]

Fetch data for many identifiers by batching.

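PubChem accepts multiple CIDs/SIDs in a single request (comma-separated), but each request can carry only a limited number of them. This method splits the identifiers into batches of batch_size and sends one request per batch, throttled to rate_limit_per_second.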

Parameters:

Name Type Description Default
domain str

PubChem domain.

required
namespace str

Identifier namespace.

required
identifiers List[Union[str, int]]

List of IDs to fetch.

required
method Literal['concat', 'stream_to_storage']

"concat" or "stream_to_storage".

'concat'
batch_size int

IDs per request (default 100).

100
rate_limit_per_second int

Max requests per second.

5
operation Optional[str]

Operation to perform.

None
properties Optional[List[str]]

Properties for property operation.

None
**kwargs Any

Additional parameters.

{}

Returns:

Type Description
Union[PUGRestFetchedData, Path]

Combined PUGRestFetchedData or Path to output file.
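The batching described for get_all can be sketched as two small generators: one slices the identifiers into groups of batch_size, and the other spaces out requests so that no more than rate_limit_per_second are issued. PubChem's usage policy asks clients to stay at or below 5 requests per second, which matches the default. batches and throttled are hypothetical helpers, and the CIDs below are placeholders.

```python
import time
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def throttled(chunks: Iterable[T], rate_limit_per_second: float = 5) -> Iterator[T]:
    """Re-yield chunks, sleeping so that at most rate_limit_per_second pass per second."""
    min_gap = 1.0 / rate_limit_per_second
    last = float("-inf")
    for chunk in chunks:
        wait = last + min_gap - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        last = time.monotonic()
        yield chunk


# 250 placeholder CIDs in batches of 100 -> 3 comma-separated requests.
cids = list(range(1, 251))
urls = [
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/"
    + ",".join(map(str, batch))
    + "/property/MolecularWeight/JSON"
    for batch in throttled(batches(cids, 100))
]
print(len(urls))  # 3
```

Keeping slicing and throttling separate lets the same rate limiter wrap any iterable of requests.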

get_compound

get_compound(cid: int) -> PUGRestFetchedData

Get a compound record by CID.

get_compounds

get_compounds(cids: List[int]) -> PUGRestFetchedData

Get multiple compound records by CID.

get_substance

get_substance(sid: int) -> PUGRestFetchedData

Get a substance record by SID.

get_assay

get_assay(aid: int) -> PUGRestFetchedData

Get an assay record by AID.

search_by_name

search_by_name(name: str) -> PUGRestFetchedData

Search compounds by name.

search_by_smiles

search_by_smiles(smiles: str) -> PUGRestFetchedData

Search compounds by SMILES.

search_by_inchikey

search_by_inchikey(inchikey: str) -> PUGRestFetchedData

Search compounds by InChIKey.

search_by_formula

search_by_formula(formula: str) -> PUGRestFetchedData

Search compounds by molecular formula.

get_properties

get_properties(
    cids: Union[int, List[int]],
    properties: Optional[List[str]] = None,
) -> PUGRestFetchedData

Get compound properties.

Parameters:

Name Type Description Default
cids Union[int, List[int]]

Compound ID(s).

required
properties Optional[List[str]]

Properties to retrieve. Defaults to common properties.

None

get_synonyms

get_synonyms(cid: int) -> PUGRestFetchedData

Get synonyms for a compound.

get_cids_by_name

get_cids_by_name(name: str) -> PUGRestFetchedData

Get CIDs matching a name.

get_sids_for_compound

get_sids_for_compound(cid: int) -> PUGRestFetchedData

Get SIDs associated with a compound.

get_aids_for_compound

get_aids_for_compound(cid: int) -> PUGRestFetchedData

Get assay AIDs associated with a compound.

similarity_search

similarity_search(
    smiles: str, threshold: int = 90, max_records: int = 100
) -> PUGRestFetchedData

Find similar compounds by SMILES.

Parameters:

Name Type Description Default
smiles str

Query SMILES string.

required
threshold int

Similarity threshold (0-100).

90
max_records int

Maximum records to return.

100

substructure_search

substructure_search(
    smiles: str, max_records: int = 100
) -> PUGRestFetchedData

Find compounds containing a substructure.

Parameters:

Name Type Description Default
smiles str

Query SMILES string.

required
max_records int

Maximum records to return.

100
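PUG REST exposes structure searches as URL paths with options in the query string (Threshold for similarity, MaxRecords for both). A minimal sketch, assuming the fetcher uses the synchronous "fast" search endpoints (fastsimilarity_2d, fastsubstructure); the real implementation may use other search types. similarity_url and substructure_url are hypothetical helpers. SMILES containing characters such as "/" are safer sent in a POST body than in the URL path.

```python
from urllib.parse import quote, urlencode

COMPOUND = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"


def similarity_url(smiles: str, threshold: int = 90, max_records: int = 100) -> str:
    """URL for a 2D (Tanimoto) similarity search that returns matching CIDs."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    options = urlencode({"Threshold": threshold, "MaxRecords": max_records})
    return f"{COMPOUND}/fastsimilarity_2d/smiles/{quote(smiles, safe='')}/cids/JSON?{options}"


def substructure_url(smiles: str, max_records: int = 100) -> str:
    """URL for a substructure search that returns CIDs containing the fragment."""
    options = urlencode({"MaxRecords": max_records})
    return f"{COMPOUND}/fastsubstructure/smiles/{quote(smiles, safe='')}/cids/JSON?{options}"


print(similarity_url("CC(=O)OC1=CC=CC=C1C(=O)O"))
```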

get_compound_image

get_compound_image(
    cid: int, image_size: str = "large"
) -> PUGRestFetchedData

Get compound structure image (PNG).

Parameters:

Name Type Description Default
cid int

Compound ID.

required
image_size str

Image size (small, large, or pixel size like "300x300").

'large'

get_compound_sdf

get_compound_sdf(cid: int) -> PUGRestFetchedData

Get compound structure in SDF format.

get_description

get_description(cid: int) -> PUGRestFetchedData

Get compound description.

get_view

get_view(
    record_id: Union[int, str],
    record_type: str = "compound",
    heading: Optional[str] = None,
    output: str = "JSON",
) -> PUGViewFetchedData

Fetch annotation data from PubChem PUG View API.

PUG View provides detailed annotation/web page content including safety data, pharmacology, literature, patents, etc.

Parameters:

Name Type Description Default
record_id Union[int, str]

Record ID (CID for compounds, SID for substances, etc.).

required
record_type str

Type of record (compound, substance, assay, gene, protein, etc.).

'compound'
heading Optional[str]

Optional heading to filter to a specific section.

None
output str

Output format (JSON or XML).

'JSON'

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData with hierarchical annotation data.
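PUG View records are addressed as pug_view/data/record_type/record_id/output, and an optional heading query parameter narrows the response to one section. Judging by their Returns descriptions, the section helpers below (get_safety_data, get_toxicity_data and so on) appear to be get_view with a fixed heading, though that is an inference. pug_view_url is a hypothetical helper.

```python
from typing import Optional, Union
from urllib.parse import urlencode

VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data"


def pug_view_url(
    record_id: Union[int, str],
    record_type: str = "compound",
    heading: Optional[str] = None,
    output: str = "JSON",
) -> str:
    """Build a PUG View URL, optionally filtered to a single heading."""
    url = f"{VIEW_BASE}/{record_type}/{record_id}/{output}"
    if heading:
        # urlencode turns spaces into '+', e.g. "Safety+and+Hazards".
        url += "?" + urlencode({"heading": heading})
    return url


print(pug_view_url(2244, heading="Safety and Hazards"))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/2244/JSON?heading=Safety+and+Hazards
```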

get_compound_annotations

get_compound_annotations(cid: int) -> PUGViewFetchedData

Get full annotation data for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData with all annotation sections.

get_substance_annotations

get_substance_annotations(sid: int) -> PUGViewFetchedData

Get full annotation data for a substance.

Parameters:

Name Type Description Default
sid int

Substance ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData with all annotation sections.

get_safety_data

get_safety_data(cid: int) -> PUGViewFetchedData

Get safety and hazards information for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData filtered to Safety and Hazards section.

get_toxicity_data

get_toxicity_data(cid: int) -> PUGViewFetchedData

Get toxicity information for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData filtered to Toxicity section.

get_pharmacology

get_pharmacology(cid: int) -> PUGViewFetchedData

Get pharmacology and biochemistry information for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData filtered to Pharmacology and Biochemistry section.

get_drug_info

get_drug_info(cid: int) -> PUGViewFetchedData

Get drug and medication information for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData filtered to Drug and Medication Information section.

get_literature

get_literature(cid: int) -> PUGViewFetchedData

Get literature references for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData filtered to Literature section.

get_patents

get_patents(cid: int) -> PUGViewFetchedData

Get patent information for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData filtered to Patents section.

get_names_and_identifiers

get_names_and_identifiers(cid: int) -> PUGViewFetchedData

Get names and identifiers for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData filtered to Names and Identifiers section.

get_physical_properties

get_physical_properties(cid: int) -> PUGViewFetchedData

Get chemical and physical properties for a compound.

Parameters:

Name Type Description Default
cid int

Compound ID.

required

Returns:

Type Description
PUGViewFetchedData

PUGViewFetchedData filtered to Chemical and Physical Properties section.

Ensembl_Fetcher

Ensembl_Fetcher(**data_manager_kws)

Fetcher for the Ensembl REST API.

The Ensembl REST API provides access to genomic data including:

  • Gene/transcript/protein lookup and information
  • Genomic and protein sequences
  • Feature overlap queries
  • Cross-references to external databases
  • Homology and comparative genomics
  • Variant data and VEP (Variant Effect Predictor)
  • Coordinate mapping between assemblies
  • Phenotype and ontology data

Example
fetcher = Ensembl_Fetcher()

# Lookup a gene by Ensembl ID
gene = fetcher.lookup("ENSG00000141510")
print(gene.results[0]["display_name"])  # TP53

# Get sequence for a transcript
seq = fetcher.get_sequence("ENST00000269305", sequence_type="cds")

# Find features overlapping a region
features = fetcher.get_overlap_region(
    "human", "7:140424943-140624564",
    feature=["gene", "transcript"]
)

# Get homologs for a gene
homologs = fetcher.get_homology("human", "ENSG00000141510")

# Get variant consequences
vep = fetcher.get_vep_hgvs("human", "ENST00000366667:c.803T>C")

get

get(
    endpoint: str,
    id: Optional[str] = None,
    ids: Optional[List[str]] = None,
    species: Optional[str] = None,
    symbol: Optional[str] = None,
    region: Optional[str] = None,
    gene: Optional[str] = None,
    name: Optional[str] = None,
    content_type: str = "json",
    **kwargs: Any,
) -> EnsemblFetchedData

Fetch data from Ensembl REST API.

Parameters:

Name Type Description Default
endpoint str

Ensembl endpoint (e.g., "lookup/id", "sequence/id").

required
id Optional[str]

Ensembl stable ID for single lookups.

None
ids Optional[List[str]]

List of IDs for batch requests.

None
species Optional[str]

Species name (e.g., "human", "homo_sapiens").

None
symbol Optional[str]

Gene symbol for symbol-based lookups.

None
region Optional[str]

Genomic region (e.g., "X:1000000..1000100:1").

None
gene Optional[str]

Gene name or ID for phenotype endpoints.

None
name Optional[str]

Name for name-based lookups.

None
content_type str

Response format ("json", "fasta", "text").

'json'
**kwargs Any

Additional endpoint-specific parameters.

{}

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with parsed results.
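Ensembl endpoint strings such as "lookup/id" map onto URL paths under https://rest.ensembl.org, with the response format and options passed as query parameters. A minimal sketch of how a get() call could become a URL, assuming boolean flags are sent as 1/0 (the form used in Ensembl's own examples) and unset options are omitted; ensembl_url and CONTENT_TYPES are hypothetical, and the exact mapping biodbs performs is an assumption.

```python
from typing import Any, Dict
from urllib.parse import quote, urlencode

ENSEMBL_REST = "https://rest.ensembl.org"
CONTENT_TYPES = {"json": "application/json", "fasta": "text/x-fasta", "text": "text/plain"}


def ensembl_url(endpoint: str, *path_args: str, content_type: str = "json", **options: Any) -> str:
    """Build <base>/<endpoint>/<path args>?content-type=...&<options>."""
    segments = [ENSEMBL_REST, endpoint.strip("/")]
    # Keep ':' readable (regions, HGVS) but escape characters such as '>'.
    segments += [quote(arg, safe=":") for arg in path_args]
    query: Dict[str, Any] = {"content-type": CONTENT_TYPES[content_type]}
    for key, value in options.items():
        if value is None:
            continue  # unset options are left out
        # Flags are sent as 1/0 rather than True/False.
        query[key] = int(value) if isinstance(value, bool) else value
    return "/".join(segments) + "?" + urlencode(query)


print(ensembl_url("lookup/id", "ENSG00000141510", expand=True, species=None))
# https://rest.ensembl.org/lookup/id/ENSG00000141510?content-type=application%2Fjson&expand=1
```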

lookup

lookup(
    id: str,
    species: Optional[str] = None,
    expand: bool = False,
    format: str = "full",
    db_type: str = "core",
    phenotypes: bool = False,
    utr: bool = False,
    mane: bool = False,
) -> EnsemblFetchedData

Look up an Ensembl stable ID.

Parameters:

Name Type Description Default
id str

Ensembl stable ID (e.g., ENSG00000141510).

required
species Optional[str]

Species name/alias (optional, auto-detected from ID).

None
expand bool

Include connected features (transcripts, exons).

False
format str

Response format ("full" or "condensed").

'full'
db_type str

Database type ("core" or "otherfeatures").

'core'
phenotypes bool

Include phenotypes (genes only).

False
utr bool

Include UTR features (requires expand=True).

False
mane bool

Include MANE features (requires expand=True).

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with gene/transcript/protein information.

lookup_batch

lookup_batch(
    ids: List[str],
    species: Optional[str] = None,
    expand: bool = False,
    format: str = "full",
    db_type: str = "core",
) -> EnsemblFetchedData

Look up multiple Ensembl stable IDs in batch.

Parameters:

Name Type Description Default
ids List[str]

List of Ensembl stable IDs (max 1000).

required
species Optional[str]

Species name/alias.

None
expand bool

Include connected features.

False
format str

Response format.

'full'
db_type str

Database type.

'core'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with results for each ID.
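Batch lookups go through POST requests whose JSON body has the form {"ids": [...]}. The caps documented here (1000 IDs for lookup_batch, 50 for get_sequence_batch) mean that longer ID lists must be split into several requests. A minimal sketch of that split; post_bodies is a hypothetical helper, and whether biodbs chunks automatically or rejects oversized lists is not stated in this reference.

```python
import json
from typing import List


def post_bodies(ids: List[str], max_per_request: int = 1000) -> List[str]:
    """Split IDs into JSON request bodies of the form {"ids": [...]}."""
    return [
        json.dumps({"ids": ids[start:start + max_per_request]})
        for start in range(0, len(ids), max_per_request)
    ]


# Placeholder IDs in Ensembl's format, for illustration only.
ids = [f"ENSG{n:011d}" for n in range(2500)]
bodies = post_bodies(ids)
print([len(json.loads(b)["ids"]) for b in bodies])  # [1000, 1000, 500]
```

For sequence batches, the same helper would be called with max_per_request=50.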

lookup_symbol

lookup_symbol(
    species: str,
    symbol: str,
    expand: bool = False,
    format: str = "full",
) -> EnsemblFetchedData

Look up a gene by symbol.

Parameters:

Name Type Description Default
species str

Species name (e.g., "human", "mouse").

required
symbol str

Gene symbol (e.g., "BRCA2", "TP53").

required
expand bool

Include connected features.

False
format str

Response format.

'full'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with gene information.

get_sequence

get_sequence(
    id: str,
    sequence_type: str = "genomic",
    species: Optional[str] = None,
    expand_5prime: Optional[int] = None,
    expand_3prime: Optional[int] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    mask: Optional[str] = None,
    mask_feature: bool = False,
    multiple_sequences: bool = False,
    format: str = "fasta",
) -> EnsemblFetchedData

Get sequence for an Ensembl stable ID.

Parameters:

Name Type Description Default
id str

Ensembl stable ID (gene, transcript, exon, protein).

required
sequence_type str

Type of sequence ("genomic", "cds", "cdna", "protein").

'genomic'
species Optional[str]

Species name (optional).

None
expand_5prime Optional[int]

Extend upstream (genomic only).

None
expand_3prime Optional[int]

Extend downstream (genomic only).

None
start Optional[int]

Trim sequence start.

None
end Optional[int]

Trim sequence end.

None
mask Optional[str]

Mask repeats ("hard" or "soft", genomic only).

None
mask_feature bool

Mask introns/UTRs.

False
multiple_sequences bool

Return multiple sequences per ID.

False
format str

Output format ("fasta" or "json").

'fasta'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with sequence data.
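Because format defaults to "fasta", the raw response is FASTA text. This reference does not say whether EnsemblFetchedData already offers a conversion to a dictionary, so the following is a standalone sketch of parsing FASTA into {record ID: sequence}. It assumes the record ID is the first whitespace-separated token of each header line. parse_fasta is a hypothetical helper, and the input is toy data.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record ID: sequence}.

    The record ID is the first whitespace-separated token after '>'.
    """
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if current is not None:
        records[current] = "".join(chunks)
    return records


toy = ">seq1 toy record\nATGGAG\nGAGCCG\n>seq2\nMEEP\n"
print(parse_fasta(toy))  # {'seq1': 'ATGGAGGAGCCG', 'seq2': 'MEEP'}
```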

get_sequence_batch

get_sequence_batch(
    ids: List[str],
    sequence_type: str = "genomic",
    species: Optional[str] = None,
    format: str = "fasta",
) -> EnsemblFetchedData

Get sequences for multiple Ensembl IDs in batch.

Parameters:

Name Type Description Default
ids List[str]

List of Ensembl stable IDs (max 50).

required
sequence_type str

Type of sequence.

'genomic'
species Optional[str]

Species name.

None
format str

Output format.

'fasta'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with sequences.

get_sequence_region

get_sequence_region(
    species: str,
    region: str,
    expand_5prime: Optional[int] = None,
    expand_3prime: Optional[int] = None,
    mask: Optional[str] = None,
    coord_system: Optional[str] = None,
    format: str = "fasta",
) -> EnsemblFetchedData

Get genomic sequence for a region.

Parameters:

Name Type Description Default
species str

Species name (e.g., "human").

required
region str

Genomic region (e.g., "X:1000000..1000100:1").

required
expand_5prime Optional[int]

Extend upstream.

None
expand_3prime Optional[int]

Extend downstream.

None
mask Optional[str]

Mask repeats ("hard" or "soft").

None
coord_system Optional[str]

Coordinate system filter.

None
format str

Output format ("fasta" or "json").

'fasta'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with sequence.

get_overlap_id

get_overlap_id(
    id: str,
    feature: Union[str, List[str]],
    species: Optional[str] = None,
    biotype: Optional[str] = None,
    logic_name: Optional[str] = None,
    db_type: str = "core",
) -> EnsemblFetchedData

Get features overlapping an Ensembl ID.

Parameters:

Name Type Description Default
id str

Ensembl stable ID.

required
feature Union[str, List[str]]

Feature type(s) to retrieve (gene, transcript, exon, etc.).

required
species Optional[str]

Species name.

None
biotype Optional[str]

Filter by biotype (e.g., "protein_coding").

None
logic_name Optional[str]

Filter by analysis logic name.

None
db_type str

Database type.

'core'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with overlapping features.

get_overlap_region

get_overlap_region(
    species: str,
    region: str,
    feature: Union[str, List[str]],
    biotype: Optional[str] = None,
    logic_name: Optional[str] = None,
    so_term: Optional[str] = None,
    variant_set: Optional[str] = None,
    db_type: str = "core",
) -> EnsemblFetchedData

Get features overlapping a genomic region.

Parameters:

Name Type Description Default
species str

Species name (e.g., "human").

required
region str

Genomic region (e.g., "7:140424943-140624564", max 5Mb).

required
feature Union[str, List[str]]

Feature type(s) to retrieve.

required
biotype Optional[str]

Filter by biotype.

None
logic_name Optional[str]

Filter by analysis logic name.

None
so_term Optional[str]

Sequence Ontology term filter.

None
variant_set Optional[str]

Variant set restriction (e.g., "ClinVar").

None
db_type str

Database type.

'core'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with overlapping features.
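Region strings in this module appear in two forms: "chrom:start-end" (as in the overlap example) and "chrom:start..end:strand" (as in the sequence examples). The sketch below parses both and checks the 5 Mb overlap limit before a request is sent. It assumes the limit is 5,000,000 bp and that coordinates are 1-based and inclusive, as in Ensembl. parse_region and Region are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

REGION_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)(?:-|\.\.)(?P<end>\d+)(?::(?P<strand>-?1))?$"
)


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: Optional[int]


def parse_region(region: str, max_length: int = 5_000_000) -> Region:
    """Parse 'chrom:start-end' or 'chrom:start..end[:strand]' and enforce a size cap."""
    m = REGION_RE.match(region)
    if m is None:
        raise ValueError(f"unrecognised region: {region!r}")
    start, end = int(m["start"]), int(m["end"])
    if end < start:
        raise ValueError("end must not precede start")
    # 1-based inclusive coordinates, so both endpoints count.
    length = end - start + 1
    if length > max_length:
        raise ValueError(f"region spans {length} bp, over the {max_length} bp limit")
    strand = int(m["strand"]) if m["strand"] else None
    return Region(m["chrom"], start, end, strand)


print(parse_region("7:140424943-140624564"))
# Region(chrom='7', start=140424943, end=140624564, strand=None)
```

Validating locally avoids a round trip that the server would reject anyway.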

get_xrefs

get_xrefs(
    id: str,
    species: Optional[str] = None,
    external_db: Optional[str] = None,
    all_levels: bool = False,
    db_type: str = "core",
    object_type: Optional[str] = None,
) -> EnsemblFetchedData

Get external cross-references for an Ensembl ID.

Parameters:

Name Type Description Default
id str

Ensembl stable ID.

required
species Optional[str]

Species name.

None
external_db Optional[str]

Filter by external database (e.g., "HGNC", "UniProt").

None
all_levels bool

Find all linked features.

False
db_type str

Database type.

'core'
object_type Optional[str]

Filter by feature type.

None

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with cross-references.

get_xrefs_symbol

get_xrefs_symbol(
    species: str,
    symbol: str,
    external_db: Optional[str] = None,
    db_type: str = "core",
    object_type: Optional[str] = None,
) -> EnsemblFetchedData

Look up Ensembl objects by external symbol.

Parameters:

Name Type Description Default
species str

Species name.

required
symbol str

External symbol (e.g., gene name "BRCA2").

required
external_db Optional[str]

Filter by external database.

None
db_type str

Database type.

'core'
object_type Optional[str]

Filter by feature type.

None

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with matching Ensembl objects.

get_homology

get_homology(
    species: str,
    id: str,
    homology_type: str = "all",
    target_species: Optional[str] = None,
    target_taxon: Optional[int] = None,
    aligned: bool = True,
    cigar_line: bool = True,
    sequence: str = "protein",
    compara: str = "vertebrates",
    format: str = "full",
) -> EnsemblFetchedData

Get homology information for a gene.

Parameters:

Name Type Description Default
species str

Source species name.

required
id str

Ensembl gene ID.

required
homology_type str

Type of homology ("orthologues", "paralogues", "all").

'all'
target_species Optional[str]

Filter by target species.

None
target_taxon Optional[int]

Filter by target taxon ID.

None
aligned bool

Include aligned sequences.

True
cigar_line bool

Return sequence in CIGAR format.

True
sequence str

Sequence type ("none", "cdna", "protein").

'protein'
compara str

Compara database name.

'vertebrates'
format str

Response format ("full" or "condensed").

'full'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with homology data.
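With `cigar_line=True`, an alignment can be described by a CIGAR line instead of a gapped sequence. To rebuild the gapped form you need the ungapped sequence and the CIGAR line. The sketch below assumes the Ensembl Compara convention: `M` consumes one residue, `D` inserts one gap, and an omitted count means 1. `expand_cigar` is hypothetical and is not part of biodbs.

```python
import re

_CIGAR_TOKEN = re.compile(r"(\d*)([MD])")


def expand_cigar(cigar: str, seq: str, gap: str = "-") -> str:
    """Rebuild a gapped (aligned) sequence from an ungapped one and a CIGAR line."""
    if not re.fullmatch(r"(?:\d*[MD])*", cigar):
        raise ValueError(f"unsupported CIGAR line: {cigar!r}")
    pieces = []
    pos = 0
    for count, op in _CIGAR_TOKEN.findall(cigar):
        n = int(count) if count else 1  # a bare operation means a run of 1
        if op == "M":
            pieces.append(seq[pos:pos + n])  # copy residues from the sequence
            pos += n
        else:
            pieces.append(gap * n)  # "D": insert gap characters
    if pos != len(seq):
        raise ValueError("CIGAR line and sequence length disagree")
    return "".join(pieces)


print(expand_cigar("3MD2M", "ACDEF"))
```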

get_homology_symbol

get_homology_symbol(
    species: str,
    symbol: str,
    homology_type: str = "all",
    target_species: Optional[str] = None,
    sequence: str = "protein",
) -> EnsemblFetchedData

Get homology information for a gene by symbol.

Parameters:

Name Type Description Default
species str

Source species name.

required
symbol str

Gene symbol.

required
homology_type str

Type of homology.

'all'
target_species Optional[str]

Filter by target species.

None
sequence str

Sequence type.

'protein'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with homology data.

get_variation

get_variation(
    species: str,
    id: str,
    genotypes: bool = False,
    pops: bool = False,
    population_genotypes: bool = False,
    phenotypes: bool = False,
    genotyping_chips: bool = False,
) -> EnsemblFetchedData

Get variant information by rsID.

Parameters:

Name Type Description Default
species str

Species name.

required
id str

Variant ID (e.g., "rs56116432").

required
genotypes bool

Include individual genotypes.

False
pops bool

Include population allele frequencies.

False
population_genotypes bool

Include population genotype frequencies.

False
phenotypes bool

Include phenotypes.

False
genotyping_chips bool

Include genotyping chip info.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with variant data.

get_vep_hgvs

get_vep_hgvs(
    species: str,
    hgvs_notation: str,
    canonical: bool = False,
    domains: bool = False,
    hgvs: bool = False,
    numbers: bool = False,
    protein: bool = False,
    refseq: bool = False,
    variant_class: bool = False,
) -> EnsemblFetchedData

Get variant consequences using HGVS notation.

Parameters:

Name Type Description Default
species str

Species name.

required
hgvs_notation str

HGVS notation (e.g., "ENST00000366667:c.803C>T").

required
canonical bool

Flag whether each transcript is the canonical transcript for its gene.

False
domains bool

Include protein domains.

False
hgvs bool

Add HGVS nomenclature.

False
numbers bool

Include exon/intron numbers.

False
protein bool

Include Ensembl protein identifiers (ENSP).

False
refseq bool

Include RefSeq transcripts.

False
variant_class bool

Include variant class.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with VEP results.
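The `hgvs_notation` argument pairs a reference sequence ID with a description of the change, as in `ENST00000366667:c.803C>T`. The sketch below splits that simple coding-substitution form into its parts, which is useful for validating input before sending a request. It handles only `c.<position><ref>><alt>` and rejects intronic offsets, indels, and other HGVS forms. `parse_coding_substitution` is hypothetical.

```python
import re
from typing import NamedTuple


class CodingSubstitution(NamedTuple):
    reference: str  # transcript ID, e.g. an Ensembl ENST or RefSeq NM_ accession
    position: int   # 1-based position in the coding sequence
    ref: str
    alt: str


_C_SUB = re.compile(
    r"^(?P<reference>[A-Z]+_?\d+(?:\.\d+)?):c\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_coding_substitution(notation: str) -> CodingSubstitution:
    """Parse e.g. 'ENST00000366667:c.803C>T'; other HGVS forms raise ValueError."""
    match = _C_SUB.match(notation)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {notation!r}")
    return CodingSubstitution(
        match["reference"], int(match["position"]), match["ref"], match["alt"]
    )


print(parse_coding_substitution("ENST00000366667:c.803C>T"))
```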

get_vep_id

get_vep_id(
    species: str,
    id: str,
    canonical: bool = False,
    domains: bool = False,
    hgvs: bool = False,
    numbers: bool = False,
    protein: bool = False,
) -> EnsemblFetchedData

Get variant consequences using variant ID.

Parameters:

Name Type Description Default
species str

Species name.

required
id str

Variant ID (e.g., rsID).

required
canonical bool

Flag whether each transcript is the canonical transcript for its gene.

False
domains bool

Include protein domains.

False
hgvs bool

Add HGVS nomenclature.

False
numbers bool

Include exon/intron numbers.

False
protein bool

Include Ensembl protein identifiers (ENSP).

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with VEP results.

get_vep_region

get_vep_region(
    species: str,
    region: str,
    allele: str,
    canonical: bool = False,
    domains: bool = False,
    hgvs: bool = False,
    numbers: bool = False,
    protein: bool = False,
) -> EnsemblFetchedData

Get variant consequences using genomic coordinates.

Parameters:

Name Type Description Default
species str

Species name.

required
region str

Genomic region (e.g., "9:22125503-22125502:1").

required
allele str

Variant allele (e.g., "C", "DUP").

required
canonical bool

Flag whether each transcript is the canonical transcript for its gene.

False
domains bool

Include protein domains.

False
hgvs bool

Add HGVS nomenclature.

False
numbers bool

Include exon/intron numbers.

False
protein bool

Include Ensembl protein identifiers (ENSP).

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with VEP results.
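In the `region` example for get_vep_region, the end coordinate (22125502) is lower than the start (22125503). This is VEP's convention for insertions: start = end + 1 marks the point between two bases. The sketch below builds such strings. `vep_region` is a hypothetical helper, and `ref_len` is the number of reference bases the variant replaces (0 for an insertion).

```python
def vep_region(chrom: str, pos: int, ref_len: int, strand: int = 1) -> str:
    """Build a VEP region string 'chrom:start-end:strand'.

    pos:     1-based position of the first affected reference base; for an
             insertion, the base immediately after the insertion point.
    ref_len: number of reference bases replaced (0 for an insertion).
    """
    if ref_len < 0:
        raise ValueError("ref_len must be >= 0")
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    end = pos + ref_len - 1  # for ref_len == 0 this gives end = start - 1
    return f"{chrom}:{pos}-{end}:{strand}"


print(vep_region("9", 22125503, 1))  # single-base change
print(vep_region("9", 22125503, 0))  # insertion between 22125502 and 22125503
```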

map_assembly

map_assembly(
    species: str,
    asm_one: str,
    region: str,
    asm_two: str,
    coord_system: str = "chromosome",
    target_coord_system: str = "chromosome",
) -> EnsemblFetchedData

Map coordinates between assemblies.

Parameters:

Name Type Description Default
species str

Species name.

required
asm_one str

Source assembly version (e.g., "GRCh37").

required
region str

Genomic region to map (e.g., "X:1000000..1000100:1").

required
asm_two str

Target assembly version (e.g., "GRCh38").

required
coord_system str

Input coordinate system.

'chromosome'
target_coord_system str

Output coordinate system.

'chromosome'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with mapped coordinates.
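Region strings on this page use two separators, as in `9:22125503-22125502:1` and `X:1000000..1000100:1`. Both give a chromosome, start, end, and optional strand. The sketch below parses either form, which helps when validating input for map_assembly or reading mapped coordinates back. `parse_region` and `Region` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: Optional[int]


# chromosome:start(..|-)end[:strand]
_REGION = re.compile(r"^([^:]+):(\d+)(?:\.\.|-)(\d+)(?::(-?1))?$")


def parse_region(text: str) -> Region:
    """Parse 'X:1000000..1000100:1' or '9:22125503-22125502:1' into parts."""
    match = _REGION.match(text)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end, strand = match.groups()
    return Region(chrom, int(start), int(end), int(strand) if strand else None)


print(parse_region("X:1000000..1000100:1"))
```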

get_phenotype_gene

get_phenotype_gene(
    species: str,
    gene: str,
    include_associated: bool = False,
    include_overlap: bool = False,
    include_pubmed_id: bool = False,
    include_review_status: bool = False,
    include_submitter: bool = False,
) -> EnsemblFetchedData

Get phenotypes associated with a gene.

Parameters:

Name Type Description Default
species str

Species name.

required
gene str

Gene name or Ensembl ID.

required
include_associated bool

Include phenotypes from associated variants.

False
include_overlap bool

Include phenotypes from overlapping features.

False
include_pubmed_id bool

Include PubMed IDs.

False
include_review_status bool

Include review status.

False
include_submitter bool

Include submitter names.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with phenotype data.

get_phenotype_region

get_phenotype_region(
    species: str,
    region: str,
    include_pubmed_id: bool = False,
    include_review_status: bool = False,
) -> EnsemblFetchedData

Get phenotypes in a genomic region.

Parameters:

Name Type Description Default
species str

Species name.

required
region str

Genomic region.

required
include_pubmed_id bool

Include PubMed IDs.

False
include_review_status bool

Include review status.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with phenotype data.

get_ontology_term

get_ontology_term(
    id: str,
    relation: Optional[str] = None,
    simple: bool = False,
) -> EnsemblFetchedData

Get ontology term information.

Parameters:

Name Type Description Default
id str

Ontology term ID (e.g., "GO:0005667").

required
relation Optional[str]

Relationship types to include.

None
simple bool

Don't fetch parent/child terms.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with ontology term data.

get_ontology_ancestors

get_ontology_ancestors(
    id: str,
    ontology: Optional[str] = None,
    zero_distance: bool = False,
) -> EnsemblFetchedData

Get ancestor terms for an ontology term.

Parameters:

Name Type Description Default
id str

Ontology term ID.

required
ontology Optional[str]

Filter by ontology.

None
zero_distance bool

Include the term itself.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with ancestor terms.
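A term's ancestors are all terms reachable by following parent links upward. `zero_distance=True` also includes the query term itself, since its distance to itself is 0. The sketch below performs that traversal over a toy parent map. The term names are placeholders, and `ancestors` is a hypothetical helper, not Ensembl's implementation.

```python
from collections import deque
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]],
              zero_distance: bool = False) -> Set[str]:
    """Return all terms reachable from `term` by following parent links."""
    found: Set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # a DAG can reach the same ancestor by several paths
        found.add(current)
        queue.extend(parents.get(current, []))
    if zero_distance:
        found.add(term)  # distance 0: the term itself
    return found


# Toy DAG: "leaf" has two parents that share the root.
toy_parents = {
    "leaf": ["mid_a", "mid_b"],
    "mid_a": ["root"],
    "mid_b": ["root"],
    "root": [],
}
print(sorted(ancestors("leaf", toy_parents)))
print(sorted(ancestors("leaf", toy_parents, zero_distance=True)))
```

Descendants are found the same way by following child links instead.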

get_ontology_descendants

get_ontology_descendants(
    id: str,
    ontology: Optional[str] = None,
    zero_distance: bool = False,
    subset: Optional[str] = None,
) -> EnsemblFetchedData

Get descendant terms for an ontology term.

Parameters:

Name Type Description Default
id str

Ontology term ID.

required
ontology Optional[str]

Filter by ontology.

None
zero_distance bool

Include the term itself.

False
subset Optional[str]

Filter by subset.

None

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with descendant terms.

get_genetree

get_genetree(
    id: str,
    aligned: bool = False,
    cigar_line: bool = False,
    sequence: str = "protein",
    nh_format: str = "simple",
    prune_species: Optional[str] = None,
    prune_taxon: Optional[int] = None,
    clusterset_id: Optional[str] = None,
    compara: str = "vertebrates",
) -> EnsemblFetchedData

Get gene tree by tree ID.

Parameters:

Name Type Description Default
id str

Gene tree ID (e.g., "ENSGT00390000003602").

required
aligned bool

Include aligned sequences.

False
cigar_line bool

Return sequence in CIGAR format.

False
sequence str

Sequence type ("none", "cdna", "protein").

'protein'
nh_format str

Newick format type.

'simple'
prune_species Optional[str]

Filter by species.

None
prune_taxon Optional[int]

Filter by taxon ID.

None
clusterset_id Optional[str]

Gene-tree resource name.

None
compara str

Compara database name.

'vertebrates'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with gene tree data.

get_genetree_member

get_genetree_member(
    species: str,
    id: str,
    aligned: bool = False,
    sequence: str = "protein",
    compara: str = "vertebrates",
) -> EnsemblFetchedData

Get gene tree containing a gene ID.

Parameters:

Name Type Description Default
species str

Species name.

required
id str

Ensembl gene ID.

required
aligned bool

Include aligned sequences.

False
sequence str

Sequence type.

'protein'
compara str

Compara database name.

'vertebrates'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with gene tree data.

get_assembly_info

get_assembly_info(
    species: str,
    bands: bool = False,
    synonyms: bool = False,
) -> EnsemblFetchedData

Get assembly information for a species.

Parameters:

Name Type Description Default
species str

Species name.

required
bands bool

Include karyotype band information.

False
synonyms bool

Include known synonyms.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with assembly information.

get_species_info

get_species_info(
    division: Optional[str] = None,
    strain_collection: Optional[str] = None,
    hide_strain_info: bool = False,
) -> EnsemblFetchedData

Get information about available species.

Parameters:

Name Type Description Default
division Optional[str]

Filter by Ensembl division.

None
strain_collection Optional[str]

Filter by strain collection.

None
hide_strain_info bool

Hide strain information.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData with species information.

BioMart_Fetcher

BioMart_Fetcher(
    host: Union[str, BioMartHost] = main,
    **data_manager_kws: Any,
)

Fetcher for BioMart (Ensembl) genomic data.

BioMart provides access to:

  • Gene information (IDs, names, descriptions, coordinates)
  • Transcript and protein data
  • Sequence data (cDNA, coding, peptide)
  • Homology information
  • Variation data
  • GO annotations

The API has a hierarchical structure:

  • Server: Contains multiple marts (e.g., ENSEMBL_MART_ENSEMBL)
  • Mart: Contains multiple datasets (e.g., hsapiens_gene_ensembl)
  • Dataset: Contains filters and attributes for queries

Example
from biodbs.fetch import BioMart_Fetcher

fetcher = BioMart_Fetcher()

# List available marts
marts = fetcher.list_marts()
print(marts.marts)

# List datasets in a mart
datasets = fetcher.list_datasets()
print(datasets.search(contain="human"))

# Get gene info by Ensembl IDs
data = fetcher.get_genes(
    ids=["ENSG00000141510", "ENSG00000012048"],
    attributes=["ensembl_gene_id", "external_gene_name", "description"]
)
df = data.as_dataframe()

# Get genes by gene names
data = fetcher.get_genes_by_name(
    names=["TP53", "BRCA1", "BRCA2"],
    attributes=["ensembl_gene_id", "chromosome_name", "start_position"]
)
Note

The BioMart API is rate-limited and can be slow for large queries. For queries with many filter values, use batch_query or the batch_size argument of the convenience methods.

Initialize BioMart fetcher.

Parameters:

Name Type Description Default
host Union[str, BioMartHost]

BioMart host (default: www.ensembl.org).

main
**data_manager_kws Any

Keyword arguments for BioMartDataManager.

{}

host property

host: str

Get current host.

list_marts

list_marts() -> BioMartRegistryData

List available marts on the server.

Returns:

Type Description
BioMartRegistryData

BioMartRegistryData with mart information.

list_datasets

list_datasets(
    mart: Union[str, BioMartMart] = ensembl,
) -> BioMartDatasetsData

List datasets available in a mart.

Parameters:

Name Type Description Default
mart Union[str, BioMartMart]

Mart name (default: ENSEMBL_MART_ENSEMBL).

ensembl

Returns:

Type Description
BioMartDatasetsData

BioMartDatasetsData with dataset information.

get_config

get_config(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    use_cache: bool = True,
) -> BioMartConfigData

Get dataset configuration (filters and attributes).

Parameters:

Name Type Description Default
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
use_cache bool

Whether to use cached configuration.

True

Returns:

Type Description
BioMartConfigData

BioMartConfigData with filters and attributes.

list_attributes

list_attributes(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    contain: Optional[str] = None,
    pattern: Optional[str] = None,
) -> Any

List available attributes for a dataset.

Parameters:

Name Type Description Default
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
contain Optional[str]

Filter attributes containing this string.

None
pattern Optional[str]

Filter attributes matching this regex pattern.

None

Returns:

Type Description
Any

DataFrame with attribute information.

list_filters

list_filters(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    contain: Optional[str] = None,
    pattern: Optional[str] = None,
) -> Any

List available filters for a dataset.

Parameters:

Name Type Description Default
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
contain Optional[str]

Return only filters containing this string.

None
pattern Optional[str]

Return only filters matching this regex pattern.

None

Returns:

Type Description
Any

DataFrame with filter information.

query

query(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    attributes: Optional[List[str]] = None,
    filters: Optional[
        Dict[str, Union[str, List[str]]]
    ] = None,
    unique_rows: bool = True,
) -> BioMartQueryData

Execute a BioMart query.

Parameters:

Name Type Description Default
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
attributes Optional[List[str]]

List of attributes to retrieve.

None
filters Optional[Dict[str, Union[str, List[str]]]]

Dict of filter name to value(s).

None
unique_rows bool

Whether to return unique rows only.

True

Returns:

Type Description
BioMartQueryData

BioMartQueryData with query results.
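BioMart's martservice endpoint (https://www.ensembl.org/biomart/martservice) accepts queries as an XML document. The document contains one Filter element per filter and one Attribute element per requested column. The sketch below shows how dataset, attributes, filters, and unique_rows could map onto that XML. It assumes biodbs builds a similar document, and `build_biomart_query` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def build_biomart_query(dataset: str, attributes: List[str],
                        filters: Dict[str, Union[str, List[str]]],
                        unique_rows: bool = True) -> str:
    """Serialize a BioMart query as the XML accepted by martservice."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1" if unique_rows else "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        # Multiple filter values are sent as one comma-separated string.
        if not isinstance(value, str):
            value = ",".join(value)
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for attribute in attributes:
        ET.SubElement(ds, "Attribute", {"name": attribute})
    return ET.tostring(query, encoding="unicode")


xml_text = build_biomart_query(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"ensembl_gene_id": ["ENSG00000141510", "ENSG00000012048"]},
)
print(xml_text)
```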

batch_query

batch_query(
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    attributes: Optional[List[str]] = None,
    filter_name: str = "ensembl_gene_id",
    filter_values: List[str] = None,
    batch_size: int = 500,
    max_workers: int = 4,
    show_progress: bool = True,
) -> BioMartQueryData

Execute a batched BioMart query for many filter values.

BioMart has limits on query size, so large filter lists are split into batches and queried in parallel using threads.

Parameters:

Name Type Description Default
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
attributes Optional[List[str]]

List of attributes to retrieve.

None
filter_name str

Name of the filter to batch.

'ensembl_gene_id'
filter_values List[str]

List of filter values.

None
batch_size int

Number of values per batch.

500
max_workers int

Number of parallel workers.

4
show_progress bool

Whether to show progress bar.

True

Returns:

Type Description
BioMartQueryData

Combined BioMartQueryData with all results.
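batch_query splits filter_values into chunks of batch_size and runs one query per chunk on a thread pool. The sketch below shows that pattern, with a stand-in `fetch_batch` function in place of a real BioMart request. `run_batched` is hypothetical and is not the biodbs implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def run_batched(values: Sequence[str],
                fetch_batch: Callable[[List[str]], List[Row]],
                batch_size: int = 500,
                max_workers: int = 4) -> List[Row]:
    """Query `values` in chunks of `batch_size` on a thread pool and combine rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batches = [list(values[i:i + batch_size]) for i in range(0, len(values), batch_size)]
    combined: List[Row] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in batch order, even if batches finish out of order.
        for rows in pool.map(fetch_batch, batches):
            combined.extend(rows)
    return combined


# Stand-in for a BioMart request: return one row per ID in the batch.
ids = [f"ID{i}" for i in range(1203)]
rows = run_batched(ids, lambda batch: [(x, len(batch)) for x in batch])
print(len(rows))
```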

get_genes

get_genes(
    ids: List[str],
    attributes: Optional[List[str]] = None,
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get gene information by Ensembl gene IDs.

Parameters:

Name Type Description Default
ids List[str]

List of Ensembl gene IDs.

required
attributes Optional[List[str]]

Attributes to retrieve. Defaults to common gene attributes.

None
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
batch_size int

Batch size for large queries.

500

Returns:

Type Description
BioMartQueryData

BioMartQueryData with gene information.

get_genes_by_name

get_genes_by_name(
    names: List[str],
    attributes: Optional[List[str]] = None,
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get gene information by gene names (symbols).

Parameters:

Name Type Description Default
names List[str]

List of gene names/symbols.

required
attributes Optional[List[str]]

Attributes to retrieve.

None
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
batch_size int

Batch size for large queries.

500

Returns:

Type Description
BioMartQueryData

BioMartQueryData with gene information.

get_genes_by_chromosome

get_genes_by_chromosome(
    chromosome: str,
    start: Optional[int] = None,
    end: Optional[int] = None,
    attributes: Optional[List[str]] = None,
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
) -> BioMartQueryData

Get genes on a chromosome, optionally within a region.

Parameters:

Name Type Description Default
chromosome str

Chromosome name (e.g., "1", "X", "MT").

required
start Optional[int]

Start position (optional).

None
end Optional[int]

End position (optional).

None
attributes Optional[List[str]]

Attributes to retrieve.

None
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene

Returns:

Type Description
BioMartQueryData

BioMartQueryData with genes in the region.

get_transcripts

get_transcripts(
    gene_ids: List[str],
    attributes: Optional[List[str]] = None,
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get transcript information for genes.

Parameters:

Name Type Description Default
gene_ids List[str]

List of Ensembl gene IDs.

required
attributes Optional[List[str]]

Attributes to retrieve.

None
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
batch_size int

Batch size for large queries.

500

Returns:

Type Description
BioMartQueryData

BioMartQueryData with transcript information.

get_go_annotations

get_go_annotations(
    gene_ids: List[str],
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get Gene Ontology annotations for genes.

Parameters:

Name Type Description Default
gene_ids List[str]

List of Ensembl gene IDs.

required
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
batch_size int

Batch size for large queries.

500

Returns:

Type Description
BioMartQueryData

BioMartQueryData with GO annotations.

get_homologs

get_homologs(
    gene_ids: List[str],
    target_species: str = "mmusculus",
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Get homolog information for genes.

Parameters:

Name Type Description Default
gene_ids List[str]

List of Ensembl gene IDs.

required
target_species str

Target species for homologs (e.g., "mmusculus").

'mmusculus'
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
batch_size int

Batch size for large queries.

500

Returns:

Type Description
BioMartQueryData

BioMartQueryData with homolog information.

convert_ids

convert_ids(
    ids: List[str],
    from_type: str = "ensembl_gene_id",
    to_type: str = "external_gene_name",
    dataset: Union[str, BioMartDataset] = hsapiens_gene,
    batch_size: int = 500,
) -> BioMartQueryData

Convert between different ID types.

Common ID types:

  • ensembl_gene_id
  • ensembl_transcript_id
  • ensembl_peptide_id
  • external_gene_name
  • entrezgene_id
  • uniprot_gn_id
  • hgnc_symbol
  • hgnc_id
  • refseq_mrna
  • refseq_peptide

Parameters:

Name Type Description Default
ids List[str]

List of IDs to convert.

required
from_type str

Source ID type (also used as filter).

'ensembl_gene_id'
to_type str

Target ID type.

'external_gene_name'
dataset Union[str, BioMartDataset]

Dataset name.

hsapiens_gene
batch_size int

Batch size for large queries.

500

Returns:

Type Description
BioMartQueryData

BioMartQueryData with ID mappings.
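ID conversion is not always one-to-one. One Ensembl gene can map to several UniProt or RefSeq IDs, and some IDs map to nothing. The sketch below collapses tabular mapping rows into a dict of lists. The rows are placeholders standing in for query results, and `rows_to_mapping` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def rows_to_mapping(rows: Iterable[Mapping[str, str]],
                    from_type: str, to_type: str) -> Dict[str, List[str]]:
    """Collapse (source ID, target ID) rows into {source: [targets]}."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        source, target = row.get(from_type), row.get(to_type)
        # Skip rows with an empty target and drop duplicate pairs.
        if source and target and target not in mapping[source]:
            mapping[source].append(target)
    return dict(mapping)


# Placeholder rows for illustration only.
rows = [
    {"ensembl_gene_id": "GENE_1", "uniprot_gn_id": "PROT_A"},
    {"ensembl_gene_id": "GENE_1", "uniprot_gn_id": "PROT_B"},
    {"ensembl_gene_id": "GENE_1", "uniprot_gn_id": "PROT_A"},  # duplicate pair
    {"ensembl_gene_id": "GENE_2", "uniprot_gn_id": ""},        # no mapping
]
print(rows_to_mapping(rows, "ensembl_gene_id", "uniprot_gn_id"))
```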

KEGG_Fetcher

KEGG_Fetcher(**data_manager_kws: Any)

Fetcher for the KEGG REST API.

KEGG (Kyoto Encyclopedia of Genes and Genomes) provides access to:

  • Pathway information and diagrams
  • Gene and protein entries
  • Compound and drug data
  • Disease information
  • Organism-specific pathway lists
  • ID conversion between databases

Operations:

  • info: Get database statistics
  • list: List database entries
  • find: Search entries by keyword
  • get: Retrieve specific entries
  • conv: Convert IDs between databases
  • link: Find linked entries across databases
  • ddi: Drug-drug interactions

Example
from biodbs.fetch import KEGG_Fetcher

fetcher = KEGG_Fetcher()

# Get database info
info = fetcher.get("info", database="pathway")
print(info.text)

# List human pathways
pathways = fetcher.get("list", database="pathway", organism="hsa")
print(pathways.to_dataframe())

# Search for genes
results = fetcher.get("find", database="genes", query="tp53")

# Get specific entries
entries = fetcher.get("get", dbentries=["hsa:7157", "hsa:672"])
for record in entries.records:
    print(record.get("ENTRY"), record.get("NAME"))

# Convert KEGG IDs to NCBI Gene IDs
mapping = fetcher.get("conv", target_db="ncbi-geneid", dbentries=["hsa:7157"])

Initialize KEGG fetcher.

Parameters:

Name Type Description Default
**data_manager_kws Any

Keyword arguments for KEGGDataManager (e.g., storage_path, which the stream_to_storage method requires).

{}

get

get(operation: str, **kwargs: Any) -> KEGGFetchedData

Fetch data from the KEGG REST API.

Parameters:

Name Type Description Default
operation str

KEGG operation (info, list, find, get, conv, link, ddi).

required
**kwargs Any

Operation-specific parameters (database, query, dbentries, etc.).

{}

Returns:

Type Description
KEGGFetchedData

KEGGFetchedData with parsed results.
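KEGG's get operation returns entries in KEGG's flat-file format. A field name occupies the first 12 columns, its value starts at column 13, continuation lines are indented, and `///` ends each entry. The class example reads records with `record.get("ENTRY")`, which suggests the records are dict-like. The sketch below shows a parser of that kind. It is not the biodbs parser, `parse_kegg_flat` is hypothetical, and the sample content is a placeholder.

```python
from typing import Dict, List


def parse_kegg_flat(text: str) -> List[Dict[str, str]]:
    """Parse KEGG flat-file text into one dict per entry (field -> value)."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    field = None
    for line in text.splitlines():
        if line.startswith("///"):  # end of entry
            if current:
                records.append(current)
            current, field = {}, None
            continue
        if not line.strip():
            continue
        name, value = line[:12].strip(), line[12:].strip()
        if name:  # a new field starts in the first 12 columns
            field = name
            current[field] = (current[field] + "\n" + value) if field in current else value
        elif field is not None:  # continuation line of the previous field
            current[field] += "\n" + value
    if current:  # tolerate a missing trailing ///
        records.append(current)
    return records


sample = "\n".join([
    f"{'ENTRY':<12}0001",
    f"{'NAME':<12}example gene",
    f"{'PATHWAY':<12}map00001  first pathway",
    f"{'':<12}map00002  second pathway",
    "///",
])
print(parse_kegg_flat(sample))
```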

get_all

get_all(
    operation: str,
    dbentries: List[str],
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    batch_size: int = DEFAULT_BATCH_SIZE,
    rate_limit_per_second: int = 3,
    get_option: Optional[str] = None,
    **kwargs: Any,
) -> Union[KEGGFetchedData, Path]

Fetch data for many entries by batching and concurrent requests.

KEGG limits certain operations (get, conv, link, ddi) to a small number of entries per request. This method splits a large entry list into batches and fetches them concurrently.

Parameters:

Name Type Description Default
operation str

KEGG operation (get, conv, link, ddi).

required
dbentries List[str]

List of database entry IDs to fetch.

required
method Literal['concat', 'stream_to_storage']

"concat" returns a single KEGGFetchedData. "stream_to_storage" writes batches to storage and returns the output file Path (requires storage_path in the constructor).

'concat'
batch_size int

Entries per request (default 10, KEGG's limit).

DEFAULT_BATCH_SIZE
rate_limit_per_second int

Max requests per second (default 3 to be conservative with KEGG).

3
get_option Optional[str]

For the get operation, the output format (e.g., aaseq, ntseq, image, json).

None
**kwargs Any

Additional parameters (target_db for conv/link, etc.).

{}

Returns:

Type Description
Union[KEGGFetchedData, Path]

Combined KEGGFetchedData or Path to output file.

Example

from biodbs.fetch import KEGG_Fetcher

fetcher = KEGG_Fetcher(storage_path="./data")
genes = ["hsa:10458", "hsa:7157", "hsa:672"]  # in practice, 100+ KEGG gene IDs
data = fetcher.get_all("get", genes)
print(len(data.records))
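KEGG's REST API accepts several entries in one URL, joined with `+` (for example, https://rest.kegg.jp/get/hsa:10458+hsa:7157). It caps the number of entries per request, which is what the batch_size default of 10 reflects. The sketch below shows the URL batching get_all describes for the get operation. `kegg_get_urls` is hypothetical. The conv, link, and ddi operations use different path layouts, and the sketch does not cover them.

```python
from typing import List, Optional

KEGG_BASE = "https://rest.kegg.jp"


def kegg_get_urls(entries: List[str], batch_size: int = 10,
                  option: Optional[str] = None) -> List[str]:
    """Split entries into KEGG 'get' URLs of at most `batch_size` entries each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    urls = []
    for i in range(0, len(entries), batch_size):
        url = f"{KEGG_BASE}/get/" + "+".join(entries[i:i + batch_size])
        if option:  # e.g. "aaseq" or "ntseq"
            url += f"/{option}"
        urls.append(url)
    return urls


genes = [f"hsa:{n}" for n in range(1, 26)]  # placeholder IDs
urls = kegg_get_urls(genes)
print(len(urls))
print(urls[-1])
```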

ChEMBL_Fetcher

ChEMBL_Fetcher(**data_manager_kws)

Fetcher for the ChEMBL REST API.

ChEMBL provides bioactivity data for drug-like molecules including:

  • Molecules and their properties
  • Bioactivity measurements
  • Targets (proteins, cell lines, organisms)
  • Assays and documents
  • Drug information and indications

Example
from biodbs.fetch import ChEMBL_Fetcher

fetcher = ChEMBL_Fetcher()

# Get a specific molecule by ChEMBL ID
aspirin = fetcher.get(resource="molecule", chembl_id="CHEMBL25")
print(aspirin.results[0]["pref_name"])

# Search for molecules
results = fetcher.get(
    resource="molecule",
    search_query="aspirin",
    limit=10
)

# Filter activities by target
activities = fetcher.get(
    resource="activity",
    filters={"target_chembl_id": "CHEMBL240"},
    limit=100
)

# Similarity search
similar = fetcher.get(
    resource="similarity",
    smiles="CC(=O)Oc1ccccc1C(=O)O",  # Aspirin SMILES
    similarity_threshold=70,
    limit=50
)

get

get(
    resource: str,
    chembl_id: Optional[str] = None,
    search_query: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    smiles: Optional[str] = None,
    similarity_threshold: Optional[int] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    format: str = "json",
) -> ChEMBLFetchedData

Fetch data from the ChEMBL REST API.

Parameters:

Name Type Description Default
resource str

ChEMBL resource (molecule, activity, target, etc.).

required
chembl_id Optional[str]

Optional ChEMBL ID for single-entry lookup.

None
search_query Optional[str]

Optional full-text search query.

None
filters Optional[Dict[str, Any]]

Optional field filters (e.g., {"max_phase": 4}).

None
smiles Optional[str]

SMILES string for similarity/substructure search.

None
similarity_threshold Optional[int]

Threshold for similarity search (40-100).

None
limit Optional[int]

Max records to return (1-1000).

None
offset Optional[int]

Pagination offset.

None
format str

Output format (json or xml).

'json'

Returns:

Type Description
ChEMBLFetchedData

ChEMBLFetchedData with parsed results.
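Each argument of get corresponds to a different URL shape in ChEMBL's public REST API (base https://www.ebi.ac.uk/chembl/api/data):

  • chembl_id becomes a path segment.
  • search_query uses the `search` endpoint with a `q` parameter.
  • filters become query-string fields.
  • For similarity searches, the SMILES string and threshold go in the path.

The sketch below builds those URLs. `chembl_url` is hypothetical, and biodbs may construct its requests differently.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote, urlencode

CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data"


def chembl_url(resource: str, chembl_id: Optional[str] = None,
               search_query: Optional[str] = None,
               filters: Optional[Dict[str, Any]] = None,
               smiles: Optional[str] = None,
               similarity_threshold: Optional[int] = None,
               limit: Optional[int] = None, offset: Optional[int] = None,
               fmt: str = "json") -> str:
    """Build a ChEMBL REST URL for the parameter combinations get() accepts."""
    params: Dict[str, Any] = dict(filters or {})
    if resource == "similarity":
        if smiles is None or similarity_threshold is None:
            raise ValueError("similarity needs smiles and similarity_threshold")
        path = f"similarity/{quote(smiles, safe='')}/{similarity_threshold}"
    elif resource == "substructure":
        if smiles is None:
            raise ValueError("substructure needs smiles")
        path = f"substructure/{quote(smiles, safe='')}"
    elif chembl_id is not None:
        path = f"{resource}/{chembl_id}"
    elif search_query is not None:
        path = f"{resource}/search"
        params["q"] = search_query
    else:
        path = resource
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    url = f"{CHEMBL_BASE}/{path}.{fmt}"
    return f"{url}?{urlencode(params)}" if params else url


print(chembl_url("molecule", chembl_id="CHEMBL25"))
print(chembl_url("activity", filters={"target_chembl_id": "CHEMBL240"}, limit=100))
```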

get_all

get_all(
    resource: str,
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    limit_per_page: int = 1000,
    max_records: Optional[int] = None,
    rate_limit_per_second: int = 5,
    search_query: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> Union[ChEMBLFetchedData, Path]

Fetch multiple pages of results concurrently.

Parameters:

Name Type Description Default
resource str

ChEMBL resource (molecule, activity, target, etc.).

required
method Literal['concat', 'stream_to_storage']

"concat" returns a single ChEMBLFetchedData. "stream_to_storage" streams each batch to storage and returns the output file Path.

'concat'
limit_per_page int

Records per request (default 1000, max 1000).

1000
max_records Optional[int]

Total records to fetch. None means fetch all.

None
rate_limit_per_second int

Maximum number of requests per second.

5
search_query Optional[str]

Optional full-text search query.

None
filters Optional[Dict[str, Any]]

Optional field filters.

None
**kwargs Any

Additional parameters.

{}

Returns:

Type Description
Union[ChEMBLFetchedData, Path]

Combined ChEMBLFetchedData or Path to output file.
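ChEMBL list responses include a page_meta object whose total_count gives the number of matching records. Once the first page arrives, the remaining requests can be planned as (offset, limit) pairs and sent concurrently. The sketch below covers that planning step under this assumption. `plan_pages` is hypothetical.

```python
from typing import List, Optional, Tuple


def plan_pages(total_count: int, limit_per_page: int = 1000,
               max_records: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (offset, limit) pairs covering min(total_count, max_records) records."""
    if limit_per_page < 1:
        raise ValueError("limit_per_page must be >= 1")
    target = total_count if max_records is None else min(total_count, max_records)
    return [
        (offset, min(limit_per_page, target - offset))
        for offset in range(0, target, limit_per_page)
    ]


print(plan_pages(2500))
print(plan_pages(2500, max_records=1200))
```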

get_molecule

get_molecule(chembl_id: str) -> ChEMBLFetchedData

Get a single molecule by ChEMBL ID.

get_target

get_target(chembl_id: str) -> ChEMBLFetchedData

Get a single target by ChEMBL ID.

search_molecules

search_molecules(
    query: str, limit: int = 20
) -> ChEMBLFetchedData

Search molecules by name or description.

get_activities_for_target

get_activities_for_target(
    target_chembl_id: str, limit: int = 1000
) -> ChEMBLFetchedData

Get bioactivity data for a specific target.

get_activities_for_molecule

get_activities_for_molecule(
    molecule_chembl_id: str, limit: int = 1000
) -> ChEMBLFetchedData

Get bioactivity data for a specific molecule.

similarity_search

similarity_search(
    smiles: str, threshold: int = 70, limit: int = 100
) -> ChEMBLFetchedData

Find molecules similar to a given SMILES structure.

substructure_search

substructure_search(
    smiles: str, limit: int = 100
) -> ChEMBLFetchedData

Find molecules containing a given substructure.

get_approved_drugs

get_approved_drugs(limit: int = 1000) -> ChEMBLFetchedData

Get approved drugs (max_phase = 4).

get_drug_indications

get_drug_indications(
    molecule_chembl_id: str, limit: int = 100
) -> ChEMBLFetchedData

Get indications for a specific drug/molecule.

get_mechanisms

get_mechanisms(
    molecule_chembl_id: str, limit: int = 100
) -> ChEMBLFetchedData

Get mechanisms of action for a specific molecule.

QuickGO_Fetcher

QuickGO_Fetcher(**data_manager_kws: Any)

Fetcher for the QuickGO API (GO annotations, ontology, gene products).

QuickGO provides access to:

  • Gene Ontology term information
  • GO annotations for genes/proteins
  • Gene product information
  • Annotation downloads in various formats (GAF, GPAD, TSV)

Categories:

  • ontology: GO term search and retrieval
  • annotation: GO annotation search and download
  • geneproduct: Gene product information

Example
from biodbs.fetch import QuickGO_Fetcher

fetcher = QuickGO_Fetcher()

# Search GO terms
data = fetcher.get(
    category="ontology",
    endpoint="search",
    query="apoptosis"
)

# Get GO term by ID
data = fetcher.get(
    category="ontology",
    endpoint="terms/{ids}",
    ids=["GO:0008150", "GO:0003674"]
)

# Search annotations for human
data = fetcher.get(
    category="annotation",
    endpoint="search",
    goId="GO:0006915",  # apoptotic process
    taxonId=9606
)
df = data.as_dataframe()

Initialize QuickGO fetcher.

Parameters:

Name Type Description Default
**data_manager_kws Any

Keyword arguments for QuickGODataManager (e.g., storage_path, which the stream_to_storage method requires).

{}

get

get(
    category: str, endpoint: str, **kwargs: Any
) -> QuickGOFetchedData

Fetch data from the QuickGO API.

Parameters:

Name Type Description Default
category str

QuickGO category (ontology, annotation, geneproduct).

required
endpoint str

API endpoint (search, terms/{ids}, downloadSearch, etc.).

required
**kwargs Any

Endpoint-specific parameters.

{}

Returns:

Type Description
QuickGOFetchedData

QuickGOFetchedData with parsed results.

get_all

get_all(
    category: str,
    endpoint: str,
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    limit_per_page: int = DEFAULT_LIMIT,
    max_records: Optional[int] = None,
    rate_limit_per_second: int = 5,
    **kwargs: Any,
) -> Union[QuickGOFetchedData, Path]

Fetch multiple pages of results concurrently.

Parameters:

Name Type Description Default
category str

QuickGO category (ontology, annotation, geneproduct).

required
endpoint str

API endpoint (search, etc.). Note: downloadSearch does not support pagination; use get() for it instead.

required
method Literal['concat', 'stream_to_storage']

"concat" returns a single QuickGOFetchedData. "stream_to_storage" streams each batch to storage and returns the output file Path.

'concat'
limit_per_page int

Records per request (default 100, max 10000).

DEFAULT_LIMIT
max_records Optional[int]

Total records to fetch. None means fetch all.

None
rate_limit_per_second int

Maximum number of requests per second.

5
**kwargs Any

Forwarded to the API (goId, taxonId, etc.).

{}

Returns:

Type Description
Union[QuickGOFetchedData, Path]

Combined QuickGOFetchedData or Path to output file.
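QuickGO paginates with a 1-based page number plus a limit, and search responses report the total as numberOfHits. Assuming those parameter and field names, the sketch below plans the page requests for get_all and trims the last page to max_records. `plan_quickgo_pages` is hypothetical.

```python
import math
from typing import List, Optional, Tuple


def plan_quickgo_pages(number_of_hits: int, limit_per_page: int = 100,
                       max_records: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (page, records_to_keep) pairs; pages are numbered from 1."""
    if limit_per_page < 1:
        raise ValueError("limit_per_page must be >= 1")
    target = number_of_hits if max_records is None else min(number_of_hits, max_records)
    n_pages = math.ceil(target / limit_per_page)
    plan = []
    for page in range(1, n_pages + 1):
        already = (page - 1) * limit_per_page  # records covered by earlier pages
        plan.append((page, min(limit_per_page, target - already)))
    return plan


print(plan_quickgo_pages(250))
print(plan_quickgo_pages(250, max_records=150))
```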

HPA_Fetcher

HPA_Fetcher(**data_manager_kws)

Fetcher for Human Protein Atlas data.

The Human Protein Atlas provides proteomics data including:

  • Tissue expression (protein and RNA)
  • Subcellular location
  • Cell type expression
  • Blood cell expression
  • Brain region expression
  • Cancer/pathology data

Example
from biodbs.fetch import HPA_Fetcher

fetcher = HPA_Fetcher()

# Get gene data by Ensembl ID
tp53 = fetcher.get_gene("ENSG00000141510")
print(tp53.results[0])

# Search for genes
results = fetcher.search("TP53")
print(results.get_gene_names())

# Get specific columns for genes
data = fetcher.search_download(
    search="TP53",
    columns=["g", "gs", "eg", "gd", "rnats_s"]
)
df = data.as_dataframe()

# Get expression data with default columns
expr = fetcher.get_expression("BRCA1")

# Get subcellular location data
loc = fetcher.get_subcellular_location("ENSG00000141510")

get_gene

get_gene(
    ensembl_id: str, format: str = "json"
) -> HPAFetchedData

Get gene data by Ensembl ID.

Parameters:

Name Type Description Default
ensembl_id str

Ensembl gene ID (e.g., "ENSG00000141510").

required
format str

Output format (json, tsv, xml).

'json'

Returns:

Type Description
HPAFetchedData

HPAFetchedData with gene information.

get_genes

get_genes(
    ensembl_ids: List[str],
    format: str = "json",
    rate_limit_per_second: int = 5,
) -> HPAFetchedData

Get data for multiple genes by Ensembl IDs.

Parameters:

Name Type Description Default
ensembl_ids List[str]

List of Ensembl gene IDs.

required
format str

Output format.

'json'
rate_limit_per_second int

Maximum number of API requests per second.

5

Returns:

Type Description
HPAFetchedData

Combined HPAFetchedData.
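get_genes issues one request per Ensembl ID while respecting rate_limit_per_second. A minimal sketch of that pattern using only the standard library, assuming evenly spaced requests; `RateLimiter` and `fetch_all` are hypothetical names, and biodbs's own scheduling may differ.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Allow at most `rate_per_second` acquisitions per second, evenly spaced."""

    def __init__(self, rate_per_second: float) -> None:
        self.interval = 1.0 / rate_per_second
        self.lock = threading.Lock()
        self.next_time = time.monotonic()

    def acquire(self) -> None:
        with self.lock:
            now = time.monotonic()
            wait = self.next_time - now
            # Reserve the next slot before sleeping so other threads queue behind it.
            self.next_time = max(now, self.next_time) + self.interval
        if wait > 0:
            time.sleep(wait)


def fetch_all(
    ids: List[str], fetch_one: Callable[[str], T], rate_limit_per_second: float = 5
) -> List[T]:
    """Fetch every ID concurrently; results are returned in input order."""
    limiter = RateLimiter(rate_limit_per_second)

    def task(item: str) -> T:
        limiter.acquire()
        return fetch_one(item)

    with ThreadPoolExecutor(max_workers=min(8, max(1, len(ids)))) as pool:
        return list(pool.map(task, ids))
```

For example, `fetch_all(["ENSG00000141510", "ENSG00000012048"], fetcher.get_gene)` would call get_gene once per ID and return the results in the same order; merging them into one HPAFetchedData is left to the library.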

search

search(
    query: str, format: str = "json", compress: str = "no"
) -> HPAFetchedData

Search for genes in HPA.

Parameters:

Name Type Description Default
query str

Search query (gene name, etc.).

required
format str

Output format (json, tsv, xml).

'json'
compress str

Whether to compress the response ("yes" or "no").

'no'

Returns:

Type Description
HPAFetchedData

HPAFetchedData with search results.

search_download

search_download(
    search: str,
    columns: Optional[List[str]] = None,
    format: str = "json",
    compress: str = "no",
) -> HPAFetchedData

Fetch customized data using the search_download API.

This is the most flexible way to retrieve HPA data, allowing selection of specific columns.

Parameters:

Name Type Description Default
search str

Gene search query.

required
columns Optional[List[str]]

List of column specifiers (see HPA_COLUMNS). If None, uses DEFAULT_GENE_COLUMNS.

None
format str

Output format (json or tsv).

'json'
compress str

Whether to compress the response ("yes" or "no").

'no'

Returns:

Type Description
HPAFetchedData

HPAFetchedData with requested columns.
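A sketch of how a search_download request could be composed, assuming the public endpoint https://www.proteinatlas.org/api/search_download.php with `search`, `format`, `columns` (comma-separated specifiers) and `compress` query parameters; check HPA's data-access help page before relying on it. `build_search_download_url` is a hypothetical helper and sends nothing.

```python
from typing import List
from urllib.parse import urlencode

# Assumed public endpoint for HPA's search_download API.
HPA_SEARCH_DOWNLOAD = "https://www.proteinatlas.org/api/search_download.php"


def build_search_download_url(
    search: str, columns: List[str], fmt: str = "json", compress: str = "no"
) -> str:
    """Compose a search_download URL; column specifiers are joined with commas."""
    if fmt not in ("json", "tsv"):
        raise ValueError("search_download supports json or tsv")
    if compress not in ("yes", "no"):
        raise ValueError("compress must be 'yes' or 'no'")
    params = {
        "search": search,
        "format": fmt,
        "columns": ",".join(columns),
        "compress": compress,
    }
    # Keep commas readable instead of percent-encoding them.
    return f"{HPA_SEARCH_DOWNLOAD}?{urlencode(params, safe=',')}"


print(build_search_download_url("TP53", ["g", "gs", "eg"]))
```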

get_all

get_all(
    search: str,
    columns: Optional[List[str]] = None,
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    format: str = "json",
    **kwargs: Any,
) -> Union[HPAFetchedData, Path]

Fetch search_download results, either in memory or streamed to storage.

Note: HPA's search_download API does not support pagination, so this method is mainly useful for saving results to storage.

Parameters:

Name Type Description Default
search str

Gene search query.

required
columns Optional[List[str]]

List of column specifiers.

None
method Literal['concat', 'stream_to_storage']

"concat" returns a single HPAFetchedData; "stream_to_storage" saves the results and returns the Path of the stored file.

'concat'
format str

Output format.

'json'
**kwargs Any

Additional parameters.

{}

Returns:

Type Description
Union[HPAFetchedData, Path]

HPAFetchedData or Path to stored file.

get_expression

get_expression(
    search: str, columns: Optional[List[str]] = None
) -> HPAFetchedData

Get expression data for gene(s).

Parameters:

Name Type Description Default
search str

Gene search query.

required
columns Optional[List[str]]

Expression columns to retrieve. If None, uses DEFAULT_EXPRESSION_COLUMNS.

None

Returns:

Type Description
HPAFetchedData

HPAFetchedData with expression data.

get_subcellular_location

get_subcellular_location(
    search: str, columns: Optional[List[str]] = None
) -> HPAFetchedData

Get subcellular location data for gene(s).

Parameters:

Name Type Description Default
search str

Gene search query.

required
columns Optional[List[str]]

Subcellular location columns to retrieve. If None, uses DEFAULT_SUBCELLULAR_COLUMNS.

None

Returns:

Type Description
HPAFetchedData

HPAFetchedData with subcellular location data.

get_pathology

get_pathology(
    search: str, columns: Optional[List[str]] = None
) -> HPAFetchedData

Get pathology/cancer prognostics data for gene(s).

Parameters:

Name Type Description Default
search str

Gene search query.

required
columns Optional[List[str]]

Pathology columns to retrieve. If None, uses DEFAULT_PATHOLOGY_COLUMNS.

None

Returns:

Type Description
HPAFetchedData

HPAFetchedData with pathology data.

get_protein_class

get_protein_class(search: str) -> HPAFetchedData

Get protein class information for gene(s).

Parameters:

Name Type Description Default
search str

Gene search query.

required

Returns:

Type Description
HPAFetchedData

HPAFetchedData with protein class information.

get_tissue_expression

get_tissue_expression(
    search: str, tissues: Optional[List[str]] = None
) -> HPAFetchedData

Get tissue-specific RNA expression data.

Parameters:

Name Type Description Default
search str

Gene search query.

required
tissues Optional[List[str]]

List of tissue column names to include. If None, returns general tissue expression information.

None

Returns:

Type Description
HPAFetchedData

HPAFetchedData with tissue expression data.

get_blood_expression

get_blood_expression(search: str) -> HPAFetchedData

Get blood cell expression data for gene(s).

Parameters:

Name Type Description Default
search str

Gene search query.

required

Returns:

Type Description
HPAFetchedData

HPAFetchedData with blood cell expression data.

get_brain_expression

get_brain_expression(search: str) -> HPAFetchedData

Get brain region expression data for gene(s).

Parameters:

Name Type Description Default
search str

Gene search query.

required

Returns:

Type Description
HPAFetchedData

HPAFetchedData with brain region expression data.

download_bulk_data

download_bulk_data(
    file_type: str = "json",
    version: Optional[str] = None,
    output_path: Optional[str] = None,
) -> Path

Download bulk HPA data file.

Parameters:

Name Type Description Default
file_type str

File type to download (tsv, json, xml).

'json'
version Optional[str]

HPA version number (e.g., "24"). None selects the latest release.

None
output_path Optional[str]

Path where the file is saved. If None, the file is saved under the data manager's storage path.

None

Returns:

Type Description
Path

Path to downloaded file.

list_columns staticmethod

list_columns() -> Dict[str, str]

List available column specifiers for search_download API.

Returns:

Type Description
Dict[str, str]

Dictionary mapping column codes to descriptions.

NCBI_Fetcher

NCBI_Fetcher(api_key: Optional[str] = None)

Fetcher for the NCBI Datasets API.

Provides access to NCBI gene, taxonomy, and genome data via the Datasets REST API v2.

Example
from biodbs.fetch import NCBI_Fetcher

fetcher = NCBI_Fetcher()

# Get gene information by NCBI Gene ID
genes = fetcher.get_genes_by_id([7157, 672])  # TP53, BRCA1
print(genes.as_dataframe())

# Get gene by symbol and taxon
genes = fetcher.get_genes_by_symbol(["TP53", "BRCA1"], taxon="human")

# Get taxonomy information
tax = fetcher.get_taxonomy([9606, 10090])  # Human, mouse
print(tax.as_dataframe())

# Translate gene symbols to IDs
mapping = fetcher.symbol_to_id(["TP53", "BRCA1"], taxon="human")

Initialize NCBI fetcher.

Parameters:

Name Type Description Default
api_key Optional[str]

NCBI API key for higher rate limits. Can also be set via the NCBI_API_KEY environment variable.

None
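The methods below correspond to NCBI Datasets v2 REST paths. A sketch of how such URLs and the key lookup could be composed, assuming the base URL https://api.ncbi.nlm.nih.gov/datasets/v2, the path layout in the function bodies, and an `api-key` request header; verify all three against NCBI's Datasets API reference. The NCBI_API_KEY fallback follows the parameter description above; every function name here is hypothetical.

```python
import os
from typing import Dict, Iterable, List, Optional, Union
from urllib.parse import quote

# Assumed base URL for NCBI Datasets v2.
DATASETS_V2 = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def _join(items: Iterable[Union[int, str]]) -> str:
    # Several identifiers share one path segment, comma-separated; each is URL-quoted.
    return ",".join(quote(str(i), safe="") for i in items)


def gene_by_id_url(gene_ids: List[int]) -> str:
    return f"{DATASETS_V2}/gene/id/{_join(gene_ids)}"


def gene_by_symbol_url(symbols: List[str], taxon: Union[int, str]) -> str:
    return f"{DATASETS_V2}/gene/symbol/{_join(symbols)}/taxon/{quote(str(taxon), safe='')}"


def taxonomy_url(taxons: List[Union[int, str]]) -> str:
    return f"{DATASETS_V2}/taxonomy/taxon/{_join(taxons)}"


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    # An explicit key wins; otherwise fall back to the NCBI_API_KEY environment variable.
    return api_key or os.environ.get("NCBI_API_KEY")


def auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    # Assumed header name; without a key NCBI applies a lower rate limit.
    return {"api-key": api_key} if api_key else {}
```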

get_genes_by_id

get_genes_by_id(
    gene_ids: List[int],
    returned_content: Optional[str] = None,
    page_size: int = 100,
    query: Optional[str] = None,
    types: Optional[List[str]] = None,
) -> NCBIGeneFetchedData

Get gene data reports by NCBI Gene IDs.

Parameters:

Name Type Description Default
gene_ids List[int]

List of NCBI Gene IDs (e.g., [7157, 672]).

required
returned_content Optional[str]

Content type (COMPLETE, IDS_ONLY, COUNTS_ONLY).

None
page_size int

Results per page (max 1000).

100
query Optional[str]

Additional search query.

None
types Optional[List[str]]

Gene type filter (e.g., ["PROTEIN_CODING"]).

None

Returns:

Type Description
NCBIGeneFetchedData

NCBIGeneFetchedData with gene reports.

Example

fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_id([7157, 672])
print(genes.get_gene_symbols())
# ['TP53', 'BRCA1']

get_genes_by_symbol

get_genes_by_symbol(
    symbols: List[str],
    taxon: Union[int, str] = "human",
    returned_content: Optional[str] = None,
    page_size: int = 100,
) -> NCBIGeneFetchedData

Get gene data reports by gene symbols and taxon.

Parameters:

Name Type Description Default
symbols List[str]

List of gene symbols (e.g., ["TP53", "BRCA1"]).

required
taxon Union[int, str]

Taxon ID, common name, or scientific name.

'human'
returned_content Optional[str]

Content type.

None
page_size int

Results per page.

100

Returns:

Type Description
NCBIGeneFetchedData

NCBIGeneFetchedData with gene reports.

Example

fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_symbol(["TP53", "BRCA1"], taxon="human")
print(genes.to_id_mapping())

get_genes_by_accession

get_genes_by_accession(
    accessions: List[str],
    returned_content: Optional[str] = None,
    page_size: int = 100,
) -> NCBIGeneFetchedData

Get gene data reports by RefSeq accessions.

Parameters:

Name Type Description Default
accessions List[str]

List of RefSeq accessions (e.g., ["NM_000546.6"]).

required
returned_content Optional[str]

Content type.

None
page_size int

Results per page.

100

Returns:

Type Description
NCBIGeneFetchedData

NCBIGeneFetchedData with gene reports.

get_genes_by_taxon

get_genes_by_taxon(
    taxon: Union[int, str],
    query: Optional[str] = None,
    types: Optional[List[str]] = None,
    page_size: int = 100,
    page_token: Optional[str] = None,
) -> NCBIGeneFetchedData

Get gene data reports by taxon.

Parameters:

Name Type Description Default
taxon Union[int, str]

Taxon ID, common name, or scientific name.

required
query Optional[str]

Search query for gene name/symbol/description.

None
types Optional[List[str]]

Gene type filter.

None
page_size int

Results per page.

100
page_token Optional[str]

Token for pagination.

None

Returns:

Type Description
NCBIGeneFetchedData

NCBIGeneFetchedData with gene reports.

Example

fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_taxon("human", query="kinase")
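For large taxa, page_token continues a paged listing. A sketch of draining every page, assuming each raw response is a dict with a `reports` list and an optional `next_page_token` (our understanding of the Datasets v2 JSON shape). `fetch_page` is a hypothetical stand-in for a call such as get_genes_by_taxon(..., page_token=token) that returns the raw JSON.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_reports(
    fetch_page: Callable[[Optional[str]], Page], max_pages: int = 1000
) -> Iterator[Any]:
    """Yield every report across pages, following next_page_token until it is absent."""
    token: Optional[str] = None
    for _ in range(max_pages):  # guard against a server that never stops paging
        page = fetch_page(token)
        yield from page.get("reports", [])
        token = page.get("next_page_token")
        if not token:
            return
    raise RuntimeError("max_pages reached before pagination finished")
```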

get_taxonomy

get_taxonomy(
    taxons: List[Union[int, str]], page_size: int = 100
) -> NCBITaxonomyFetchedData

Get taxonomy data reports.

Parameters:

Name Type Description Default
taxons List[Union[int, str]]

List of taxonomy IDs or names.

required
page_size int

Results per page.

100

Returns:

Type Description
NCBITaxonomyFetchedData

NCBITaxonomyFetchedData with taxonomy reports.

Example

fetcher = NCBI_Fetcher()
tax = fetcher.get_taxonomy([9606, 10090])
print(tax.as_dataframe())

get_genome_by_accession

get_genome_by_accession(
    accessions: List[str], page_size: int = 100
) -> NCBIGenomeFetchedData

Get genome assembly data reports by accession.

Parameters:

Name Type Description Default
accessions List[str]

List of assembly accessions (e.g., ["GCF_000001405.40"]).

required
page_size int

Results per page.

100

Returns:

Type Description
NCBIGenomeFetchedData

NCBIGenomeFetchedData with genome reports.

Example

fetcher = NCBI_Fetcher()
genomes = fetcher.get_genome_by_accession(["GCF_000001405.40"])

get_genome_by_taxon

get_genome_by_taxon(
    taxon: Union[int, str],
    page_size: int = 100,
    page_token: Optional[str] = None,
    reference_only: bool = False,
    assembly_source: Optional[str] = None,
) -> NCBIGenomeFetchedData

Get genome assembly data reports by taxon.

Parameters:

Name Type Description Default
taxon Union[int, str]

Taxon ID, common name, or scientific name.

required
page_size int

Results per page.

100
page_token Optional[str]

Token for pagination.

None
reference_only bool

If True, only return reference genomes.

False
assembly_source Optional[str]

Filter by source ("refseq", "genbank", "all").

None

Returns:

Type Description
NCBIGenomeFetchedData

NCBIGenomeFetchedData with genome reports.

get_version

get_version() -> str

Get NCBI Datasets API version.

Returns:

Type Description
str

Version string.

symbol_to_id

symbol_to_id(
    symbols: List[str], taxon: Union[int, str] = "human"
) -> Dict[str, int]

Convert gene symbols to NCBI Gene IDs.

Parameters:

Name Type Description Default
symbols List[str]

List of gene symbols.

required
taxon Union[int, str]

Taxon for the genes.

'human'

Returns:

Type Description
Dict[str, int]

Dictionary mapping symbols to gene IDs.

Example

fetcher = NCBI_Fetcher()
mapping = fetcher.symbol_to_id(["TP53", "BRCA1"])
print(mapping)

id_to_symbol

id_to_symbol(gene_ids: List[int]) -> Dict[int, str]

Convert NCBI Gene IDs to gene symbols.

Parameters:

Name Type Description Default
gene_ids List[int]

List of NCBI Gene IDs.

required

Returns:

Type Description
Dict[int, str]

Dictionary mapping gene IDs to symbols.

Example

fetcher = NCBI_Fetcher()
mapping = fetcher.id_to_symbol([7157, 672])
print(mapping)

get_gene_info

get_gene_info(
    identifiers: List[Union[int, str]],
    taxon: Union[int, str] = "human",
) -> NCBIGeneFetchedData

Get gene information by mixed identifiers (IDs or symbols).

Automatically detects whether input is gene IDs or symbols and routes to the appropriate endpoint.

Parameters:

Name Type Description Default
identifiers List[Union[int, str]]

List of gene IDs (int) or symbols (str).

required
taxon Union[int, str]

Taxon for symbol lookups.

'human'

Returns:

Type Description
NCBIGeneFetchedData

NCBIGeneFetchedData with gene reports.
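get_gene_info detects whether identifiers are gene IDs or symbols. One plausible way to perform that split, as a sketch: ints are gene IDs, strings are symbols, and (our assumption, not documented behavior) strings made only of decimal digits such as "7157" are also treated as IDs. `split_identifiers` is hypothetical; the two lists would then go to get_genes_by_id and get_genes_by_symbol respectively.

```python
from typing import List, Tuple, Union


def split_identifiers(
    identifiers: List[Union[int, str]],
) -> Tuple[List[int], List[str]]:
    """Separate numeric gene IDs from gene symbols, preserving input order."""
    gene_ids: List[int] = []
    symbols: List[str] = []
    for ident in identifiers:
        if isinstance(ident, bool):  # bool is a subclass of int; reject it explicitly
            raise TypeError("booleans are not valid identifiers")
        if isinstance(ident, int):
            gene_ids.append(ident)
        elif isinstance(ident, str) and ident.strip().isdecimal():
            # Digit-only strings are treated as gene IDs (assumption).
            gene_ids.append(int(ident.strip()))
        elif isinstance(ident, str):
            symbols.append(ident.strip())
        else:
            raise TypeError(f"unsupported identifier type: {type(ident).__name__}")
    return gene_ids, symbols
```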

FDA_Fetcher

FDA_Fetcher(
    api_key: Optional[str] = None,
    limit: Optional[int] = None,
    **data_manager_kws: Any,
)

Fetcher for openFDA API.

The openFDA API provides access to FDA data, including:

  • Drug adverse events (drug/event)
  • Drug product labeling (drug/label)
  • Drug recalls and enforcement (drug/enforcement)
  • Device adverse events and recalls
  • Food recalls and enforcement

Rate limits:

  • Without API key: 240 requests/min, 1,000 requests/day per IP
  • With API key: 240 requests/min, 120,000 requests/day per key
Example
from biodbs.fetch import FDA_Fetcher

fetcher = FDA_Fetcher()

# Search drug adverse events
events = fetcher.get(
    category="drug",
    endpoint="event",
    search={"patient.drug.medicinalproduct": "aspirin"},
    limit=10
)
df = events.as_dataframe(columns=["receivedate", "patient.patientsex"])

# Get drug labels
labels = fetcher.get(
    category="drug",
    endpoint="label",
    search={"openfda.brand_name": "TYLENOL"},
    limit=5
)

Initialize FDA fetcher.

Parameters:

Name Type Description Default
api_key Optional[str]

openFDA API key for higher rate limits (optional).

None
limit Optional[int]

Default limit for queries. If None, uses API default.

None
**data_manager_kws Any

Keyword arguments for FDADataManager (e.g., storage_path, used by the "stream_to_storage" method of get_all).

{}

get

get(
    category: str,
    endpoint: str,
    stream: Optional[bool] = None,
    **kwargs: Any,
) -> FDAFetchedData

Fetch data from openFDA API.

Parameters:

Name Type Description Default
category str

FDA category (e.g., "drug", "device", "food").

required
endpoint str

Category endpoint (e.g., "event", "label", "enforcement").

required
stream Optional[bool]

If True, stream the response (for large downloads).

None
**kwargs Any

Query parameters, including:
  • search: Search query dict (e.g., {"field": "value"}).
  • limit: Maximum records to return (1-1000).
  • skip: Number of records to skip, for pagination.
  • sort: Sort field and direction.
  • count: Field whose occurrences are counted.
  • api_key: Overrides the default API key.

{}

Returns:

Type Description
FDAFetchedData

FDAFetchedData with query results.

Example

fetcher = FDA_Fetcher()
data = fetcher.get(
    category="drug",
    endpoint="event",
    search={"patient.drug.medicinalproduct": "aspirin"},
    limit=10
)
print(data)
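openFDA expresses searches as `field:term` clauses joined with AND, sent to https://api.fda.gov/{category}/{endpoint}.json. A sketch of turning the search dict into such a URL, assuming every value is sent as a quoted phrase (our choice; FDA_Fetcher may serialize the dict differently). `build_search` and `build_openfda_url` are hypothetical helpers and send nothing.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

OPENFDA_BASE = "https://api.fda.gov"


def build_search(search: Dict[str, Any]) -> str:
    """Turn {"field": "value"} into `field:"value"` clauses joined with AND."""
    clauses = []
    for field, value in search.items():
        text = str(value)
        if '"' in text:
            raise ValueError("values containing double quotes are not handled here")
        clauses.append(f'{field}:"{text}"')
    # Spaces become "+" when URL-encoded, giving openFDA's "+AND+" form.
    return " AND ".join(clauses)


def build_openfda_url(
    category: str,
    endpoint: str,
    search: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    skip: Optional[int] = None,
) -> str:
    params: Dict[str, Any] = {}
    if search:
        params["search"] = build_search(search)
    if limit is not None:
        if not 1 <= limit <= 1000:
            raise ValueError("limit must be between 1 and 1000")
        params["limit"] = limit
    if skip is not None:
        params["skip"] = skip
    query = urlencode(params, safe=":")
    url = f"{OPENFDA_BASE}/{category}/{endpoint}.json"
    return f"{url}?{query}" if query else url


print(build_openfda_url("drug", "event", {"patient.drug.medicinalproduct": "aspirin"}, limit=10))
```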

get_all

get_all(
    category: str,
    endpoint: str,
    method: Literal[
        "concat", "stream_to_storage"
    ] = "concat",
    batch_size: int = 1000,
    max_records: Optional[int] = None,
    rate_limit_per_second: int = 4,
    **kwargs: Any,
) -> Union[FDAFetchedData, Path]

Fetch multiple pages of results concurrently.

Uses schedule_process to dispatch page requests across threads while staying within the FDA rate limit.

Parameters:

Name Type Description Default
category str

FDA category (e.g. "drug").

required
endpoint str

FDA endpoint (e.g. "event").

required
method Literal['concat', 'stream_to_storage']

"concat" accumulates all results in memory and returns a single FDAFetchedData. "stream_to_storage" streams each batch to the data manager as JSON Lines and returns the output file Path.

'concat'
batch_size int

Records per request (max 1000).

1000
max_records Optional[int]

Total records to fetch. None means fetch all available records.

None
rate_limit_per_second int

Maximum requests per second (openFDA allows 240 requests/min, i.e. 4 requests/sec).

4
**kwargs Any

Forwarded to the API (search, sort, etc.).

{}

Note: openFDA rate limits are 240 requests/min and 1,000 requests/day per IP without an API key, and 240 requests/min and 120,000 requests/day per key with one.
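get_all pages through openFDA with limit/skip and, in "stream_to_storage" mode, writes each batch as JSON Lines. A sketch of both pieces under those documented rules (batch_size at most 1000, one JSON object per line); `plan_requests`, `to_jsonl_lines` and `stream_to_storage` are hypothetical names, and the actual page downloads are left out.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


def plan_requests(
    total: int, batch_size: int = 1000, max_records: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Return (skip, limit) pairs covering min(total, max_records) records."""
    if not 1 <= batch_size <= 1000:
        raise ValueError("openFDA allows 1-1000 records per request")
    target = total if max_records is None else min(total, max_records)
    return [(skip, min(batch_size, target - skip)) for skip in range(0, target, batch_size)]


def to_jsonl_lines(batches: Iterable[List[Dict[str, Any]]]) -> Iterator[str]:
    """Serialize records one JSON object per line, batch by batch."""
    for batch in batches:
        for record in batch:
            yield json.dumps(record) + "\n"


def stream_to_storage(batches: Iterable[List[Dict[str, Any]]], path: Path) -> Path:
    """Write every batch to `path` as JSON Lines without holding all records in memory."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        fh.writelines(to_jsonl_lines(batches))
    return path
```

With the default 4 requests per second, the three requests planned for 2,500 records finish within about a second; the 1,000 requests/day cap without a key is usually the tighter limit for large pulls.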

Reactome_Fetcher

Reactome_Fetcher(species: str = 'Homo sapiens')

Fetcher for Reactome pathway analysis and content APIs.

Reactome provides pathway analysis and pathway content services, including:

  • Over-representation analysis (ORA)
  • Expression analysis
  • Species comparison
  • Pathway hierarchy and content
Example
from biodbs.fetch import Reactome_Fetcher

fetcher = Reactome_Fetcher()

# Perform pathway analysis
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.analyze(genes)
print(result.significant_pathways().as_dataframe())

# Analysis with projection to human
result = fetcher.analyze_projection(genes, species="Mus musculus")

# Get top-level pathways
pathways = fetcher.get_pathways_top("Homo sapiens")
print(pathways.get_pathway_names())

# Get species list
species = fetcher.get_species()
print(species.get_species_names())

Initialize Reactome fetcher.

Parameters:

Name Type Description Default
species str

Default species for analysis (e.g., "Homo sapiens").

'Homo sapiens'

set_species

set_species(species: str)

Change the default species.

Parameters:

Name Type Description Default
species str

Species name (e.g., "Homo sapiens", "Mus musculus").

required

analyze

analyze(
    identifiers: List[str],
    species: Optional[str] = None,
    interactors: bool = False,
    page_size: int = 100,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
    include_disease: bool = True,
    min_entities: Optional[int] = None,
    max_entities: Optional[int] = None,
) -> ReactomeFetchedData

Perform pathway over-representation analysis.

Submits identifiers to the Reactome Analysis Service and returns enriched pathways with their statistics.

Parameters:

Name Type Description Default
identifiers List[str]

List of identifiers (gene symbols, UniProt IDs, etc.).

required
species Optional[str]

Species name. None uses default.

None
interactors bool

Include interactors in analysis.

False
page_size int

Number of results per page.

100
sort_by str

Sort field (ENTITIES_FDR, ENTITIES_PVALUE, etc.).

'ENTITIES_FDR'
order str

Sort order (ASC, DESC).

'ASC'
resource str

Resource filter (TOTAL, UNIPROT, ENSEMBL, etc.).

'TOTAL'
p_value float

P-value cutoff for filtering results.

1.0
include_disease bool

Include disease pathways.

True
min_entities Optional[int]

Minimum pathway size.

None
max_entities Optional[int]

Maximum pathway size.

None

Returns:

Type Description
ReactomeFetchedData

ReactomeFetchedData with pathway enrichment results.

Example

fetcher = Reactome_Fetcher()
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.analyze(genes)
print(result.significant_pathways(fdr_threshold=0.01).as_dataframe())
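analyze and analyze_projection submit identifiers to the Reactome Analysis Service. A sketch of the underlying request, assuming the base URL https://reactome.org/AnalysisService, the `/identifiers/` and `/identifiers/projection` paths, a plain-text body with one identifier per line, and camelCase query parameters mirroring the keyword arguments above; treat all of these as assumptions to check against Reactome's API docs. `build_analysis_request` is hypothetical and only prepares the request.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Reactome Analysis Service base URL.
ANALYSIS_BASE = "https://reactome.org/AnalysisService"


def build_analysis_request(
    identifiers: List[str],
    projection: bool = False,
    interactors: bool = False,
    page_size: int = 100,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
    include_disease: bool = True,
) -> Request:
    """Prepare (but do not send) an over-representation request."""
    if not identifiers:
        raise ValueError("at least one identifier is required")
    path = "/identifiers/projection" if projection else "/identifiers/"
    params = {
        "interactors": str(interactors).lower(),
        "pageSize": page_size,
        "sortBy": sort_by,
        "order": order,
        "resource": resource,
        "pValue": p_value,
        "includeDisease": str(include_disease).lower(),
    }
    body = "\n".join(identifiers).encode("utf-8")  # one identifier per line
    return Request(
        f"{ANALYSIS_BASE}{path}?{urlencode(params)}",
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen would return JSON that, as far as we know, carries the analysis token (under `summary.token`) used by get_result_by_token.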

analyze_projection

analyze_projection(
    identifiers: List[str],
    species: Optional[str] = None,
    interactors: bool = False,
    page_size: int = 100,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
    include_disease: bool = True,
) -> ReactomeFetchedData

Analyze identifiers and project results to Homo sapiens.

This is useful for analyzing data from other species while viewing results in the context of human pathways.

Parameters:

Name Type Description Default
identifiers List[str]

List of identifiers.

required
species Optional[str]

Source species name (for mapping).

None
interactors bool

Include interactors.

False
page_size int

Results per page.

100
sort_by str

Sort field.

'ENTITIES_FDR'
order str

Sort order.

'ASC'
resource str

Resource filter.

'TOTAL'
p_value float

P-value cutoff.

1.0
include_disease bool

Include disease pathways.

True

Returns:

Type Description
ReactomeFetchedData

ReactomeFetchedData with human-projected pathway results.

analyze_single

analyze_single(
    identifier: str,
    species: Optional[str] = None,
    interactors: bool = False,
) -> ReactomeFetchedData

Analyze a single identifier across species.

Parameters:

Name Type Description Default
identifier str

Single identifier to analyze.

required
species Optional[str]

Species filter.

None
interactors bool

Include interactors.

False

Returns:

Type Description
ReactomeFetchedData

ReactomeFetchedData with pathways containing the identifier.

get_result_by_token

get_result_by_token(
    token: str,
    species: Optional[str] = None,
    page_size: int = 100,
    page: int = 1,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
) -> ReactomeFetchedData

Retrieve analysis results by token.

Parameters:

Name Type Description Default
token str

Analysis token from previous analysis.

required
species Optional[str]

Species filter.

None
page_size int

Results per page.

100
page int

Page number.

1
sort_by str

Sort field.

'ENTITIES_FDR'
order str

Sort order.

'ASC'
resource str

Resource filter.

'TOTAL'
p_value float

P-value cutoff.

1.0

Returns:

Type Description
ReactomeFetchedData

ReactomeFetchedData with analysis results.

get_found_entities

get_found_entities(
    token: str, pathway_id: str
) -> List[Dict[str, Any]]

Get entities found in a specific pathway.

Parameters:

Name Type Description Default
token str

Analysis token.

required
pathway_id str

Pathway stable ID (e.g., "R-HSA-123456").

required

Returns:

Type Description
List[Dict[str, Any]]

List of found entity dictionaries.

get_not_found_identifiers

get_not_found_identifiers(token: str) -> List[str]

Get identifiers that were not found in Reactome.

Parameters:

Name Type Description Default
token str

Analysis token.

required

Returns:

Type Description
List[str]

List of unmapped identifier strings.

download_results_json

download_results_json(token: str) -> Dict[str, Any]

Download complete analysis results as JSON.

Parameters:

Name Type Description Default
token str

Analysis token.

required

Returns:

Type Description
Dict[str, Any]

Complete analysis results dictionary.

map_identifiers

map_identifiers(
    identifiers: List[str], interactors: bool = False
) -> List[Dict[str, Any]]

Map identifiers to Reactome entities without analysis.

Parameters:

Name Type Description Default
identifiers List[str]

List of identifiers to map.

required
interactors bool

Include interactor mapping.

False

Returns:

Type Description
List[Dict[str, Any]]

List of mapped entity dictionaries.

get_pathways_top

get_pathways_top(
    species: Optional[str] = None,
) -> ReactomePathwaysData

Get top-level pathways for a species.

Parameters:

Name Type Description Default
species Optional[str]

Species name (e.g., "Homo sapiens").

None

Returns:

Type Description
ReactomePathwaysData

ReactomePathwaysData with top-level pathway information.

Example

fetcher = Reactome_Fetcher()
pathways = fetcher.get_pathways_top("Homo sapiens")
print(pathways.get_pathway_names())

get_events_hierarchy

get_events_hierarchy(
    species: Optional[str] = None,
) -> List[Dict[str, Any]]

Get full event hierarchy for a species.

Parameters:

Name Type Description Default
species Optional[str]

Species name.

None

Returns:

Type Description
List[Dict[str, Any]]

List of event hierarchy dictionaries.
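The hierarchy is a nested structure of events. A sketch of walking it depth-first and collecting unique pathway IDs, assuming each node is a dict with `stId`, `name`, `type` (e.g. "TopLevelPathway", "Pathway", "Reaction") and an optional `children` list; those key names and type values are our assumption about Reactome's response, and both helpers are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple

Node = Dict[str, Any]


def walk_events(nodes: List[Node], depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs, parents before children."""
    for node in nodes:
        yield depth, node
        yield from walk_events(node.get("children", []), depth + 1)


def pathway_ids(
    nodes: List[Node], types: Sequence[str] = ("TopLevelPathway", "Pathway")
) -> List[str]:
    """Unique stIds of pathway nodes in first-seen order (events can recur in several branches)."""
    seen: Dict[str, None] = {}
    for _, node in walk_events(nodes):
        if node.get("type") in types and "stId" in node:
            seen.setdefault(node["stId"], None)
    return list(seen)
```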

get_pathways_for_entity

get_pathways_for_entity(
    entity_id: str,
) -> ReactomePathwaysData

Get pathways containing a specific entity.

Parameters:

Name Type Description Default
entity_id str

Entity identifier (UniProt, gene symbol, etc.).

required

Returns:

Type Description
ReactomePathwaysData

ReactomePathwaysData with pathways containing the entity.

get_species

get_species() -> ReactomeSpeciesData

Get all species in Reactome.

Returns:

Type Description
ReactomeSpeciesData

ReactomeSpeciesData with species information.

Example

fetcher = Reactome_Fetcher()
species = fetcher.get_species()
print(species.get_species_names()[:10])

get_species_main

get_species_main() -> ReactomeSpeciesData

Get main species with curated or computationally inferred pathways.

Returns:

Type Description
ReactomeSpeciesData

ReactomeSpeciesData with main species information.

get_database_version

get_database_version() -> str

Get current Reactome database version.

Returns:

Type Description
str

Database version string.

query_entry

query_entry(entry_id: str) -> Dict[str, Any]

Query a Reactome entry by ID.

Parameters:

Name Type Description Default
entry_id str

Reactome stable ID (e.g., "R-HSA-123456").

required

Returns:

Type Description
Dict[str, Any]

Entry details dictionary.

get_participants

get_participants(event_id: str) -> List[Dict[str, Any]]

Get all participants in an event (pathway/reaction).

Parameters:

Name Type Description Default
event_id str

Reactome stable ID (e.g., "R-HSA-69278").

required

Returns:

Type Description
List[Dict[str, Any]]

List of participant dictionaries with physical entity info.

Example

fetcher = Reactome_Fetcher()
participants = fetcher.get_participants("R-HSA-69278")
for p in participants[:3]:
    print(p.get("displayName"))

get_participants_physical_entities

get_participants_physical_entities(
    event_id: str,
) -> List[Dict[str, Any]]

Get participating physical entities in an event.

Parameters:

Name Type Description Default
event_id str

Reactome stable ID.

required

Returns:

Type Description
List[Dict[str, Any]]

List of physical entity dictionaries.

get_participants_reference_entities

get_participants_reference_entities(
    event_id: str,
) -> List[Dict[str, Any]]

Get reference entities (genes/proteins) for an event.

This returns the external database references (UniProt, NCBI Gene, etc.) for all participants in a pathway or reaction.

Parameters:

Name Type Description Default
event_id str

Reactome stable ID (e.g., "R-HSA-69278").

required

Returns:

Type Description
List[Dict[str, Any]]

List of reference entity dictionaries containing:
  • identifier: External ID (e.g., UniProt accession)
  • databaseName: Source database (e.g., "UniProt")
  • displayName: Human-readable name
  • geneName: Gene symbol (if available)

Example

fetcher = Reactome_Fetcher()
refs = fetcher.get_participants_reference_entities("R-HSA-69278")
for ref in refs[:5]:
    print(f"{ref.get('geneName')}: {ref.get('identifier')}")

get_pathway_genes

get_pathway_genes(
    pathway_id: str, id_type: str = "gene_symbol"
) -> List[str]

Get gene identifiers for a pathway.

Convenience method that extracts gene IDs from reference entities.

Parameters:

Name Type Description Default
pathway_id str

Reactome pathway stable ID.

required
id_type str

Type of ID to return:
  • "gene_symbol": Gene symbols (default)
  • "uniprot": UniProt accessions
  • "all": A dict with all available IDs

'gene_symbol'

Returns:

Type Description
List[str]

List of gene identifiers.

Example

fetcher = Reactome_Fetcher()
genes = fetcher.get_pathway_genes("R-HSA-69278")
print(genes[:10])
# ['TP53', 'MDM2', 'CDKN1A', ...]
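get_pathway_genes extracts gene IDs from the reference-entity dictionaries described under get_participants_reference_entities. A sketch of that extraction for the "gene_symbol" and "uniprot" modes, using the documented `geneName`, `identifier` and `databaseName` fields; handling a list-valued `geneName` (keeping the first entry) is our assumption, "all" is left out, and `extract_gene_ids` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def extract_gene_ids(refs: List[Dict[str, Any]], id_type: str = "gene_symbol") -> List[str]:
    """Collect unique identifiers from reference-entity dicts in first-seen order."""
    seen: Dict[str, None] = {}
    for ref in refs:
        value: Optional[str]
        if id_type == "gene_symbol":
            names = ref.get("geneName")
            # geneName may be a single string or a list of names; keep the first (assumption).
            if isinstance(names, list):
                names = names[0] if names else None
            value = names
        elif id_type == "uniprot":
            value = ref.get("identifier") if ref.get("databaseName") == "UniProt" else None
        else:
            raise ValueError(f"unsupported id_type: {id_type!r}")
        if value:
            seen.setdefault(value, None)
    return list(seen)
```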

get_all_pathways_with_genes

get_all_pathways_with_genes(
    species: Optional[str] = None,
    id_type: str = "gene_symbol",
    include_hierarchy: bool = True,
) -> Dict[str, tuple]

Get all pathways with their gene members for a species.

This method builds a complete pathway-gene mapping suitable for local over-representation analysis.

Parameters:

Name Type Description Default
species Optional[str]

Species name (e.g., "Homo sapiens").

None
id_type str

Gene ID type ("gene_symbol" or "uniprot").

'gene_symbol'
include_hierarchy bool

If True, include every pathway in the hierarchy; if False, include only top-level pathways.

True

Returns:

Type Description
Dict[str, tuple]

Dict mapping pathway_id -> (pathway_name, set of gene IDs).

Example

fetcher = Reactome_Fetcher()
pathways = fetcher.get_all_pathways_with_genes("Homo sapiens")
for pid, (name, genes) in list(pathways.items())[:3]:
    print(f"{pid}: {name} ({len(genes)} genes)")
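The returned mapping (pathway_id to (name, gene set)) is what a local over-representation analysis needs. A sketch using a one-sided hypergeometric test with Benjamini-Hochberg FDR, assuming the background universe is the union of all pathway genes; this is a standard textbook approach, not necessarily how Reactome's own Analysis Service computes its statistics, and all names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_sf(k: int, universe: int, pathway_size: int, query_size: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(universe, pathway_size, query_size)."""
    total = math.comb(universe, query_size)
    hits = sum(
        math.comb(pathway_size, i) * math.comb(universe - pathway_size, query_size - i)
        for i in range(k, min(pathway_size, query_size) + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """BH-adjusted q-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def local_ora(
    query: Iterable[str], pathways: Dict[str, Tuple[str, Set[str]]]
) -> List[Tuple[str, str, int, float, float]]:
    """Rows of (pathway_id, name, overlap, p_value, fdr), sorted by p-value."""
    universe = set().union(*(genes for _, genes in pathways.values()))
    hits = set(query) & universe
    rows = []
    for pid, (name, genes) in pathways.items():
        overlap = len(hits & genes)
        if overlap == 0:  # only pathways with at least one hit are tested
            continue
        p = hypergeom_sf(overlap, len(universe), len(genes), len(hits))
        rows.append((pid, name, overlap, p))
    fdr = benjamini_hochberg([r[3] for r in rows])
    return sorted([(*r, q) for r, q in zip(rows, fdr)], key=lambda r: r[3])
```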

Note

This method makes many API calls and may take several minutes for species with many pathways, so cache the results.
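A sketch of such a cache on disk, assuming a local JSON file is acceptable; because JSON has no set type, gene sets are stored as sorted lists and restored as sets on load. `load_or_build` is hypothetical, and `build` would typically be a lambda around get_all_pathways_with_genes.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Set, Tuple

PathwayMap = Dict[str, Tuple[str, Set[str]]]


def load_or_build(cache_file: Path, build: Callable[[], PathwayMap]) -> PathwayMap:
    """Return the cached mapping if the file exists; otherwise build it and write it."""
    if cache_file.exists():
        raw = json.loads(cache_file.read_text(encoding="utf-8"))
        return {pid: (name, set(genes)) for pid, (name, genes) in raw.items()}
    mapping = build()
    # Store gene sets as sorted lists so the file is valid, stable JSON.
    serializable = {pid: [name, sorted(genes)] for pid, (name, genes) in mapping.items()}
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(serializable), encoding="utf-8")
    return mapping
```

For example: `load_or_build(Path("reactome_hsa.json"), lambda: fetcher.get_all_pathways_with_genes("Homo sapiens"))`. Delete the file when the Reactome release (get_database_version) changes.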

get_event_ancestors

get_event_ancestors(event_id: str) -> List[Dict[str, Any]]

Get ancestor pathways for an event.

Parameters:

Name Type Description Default
event_id str

Reactome stable ID.

required

Returns:

Type Description
List[Dict[str, Any]]

List of ancestor pathway dictionaries.

get_complex_subunits

get_complex_subunits(
    complex_id: str,
) -> List[Dict[str, Any]]

Get subunits of a complex.

Parameters:

Name Type Description Default
complex_id str

Reactome complex stable ID.

required

Returns:

Type Description
List[Dict[str, Any]]

List of subunit dictionaries.

get_entity_component_of

get_entity_component_of(
    entity_id: str,
) -> List[Dict[str, Any]]

Get complexes/sets that contain an entity.

Parameters:

Name Type Description Default
entity_id str

Reactome entity stable ID.

required

Returns:

Type Description
List[Dict[str, Any]]

List of container entity dictionaries.

get_entity_other_forms

get_entity_other_forms(
    entity_id: str,
) -> List[Dict[str, Any]]

Get other forms of a physical entity.

Parameters:

Name Type Description Default
entity_id str

Reactome entity stable ID.

required

Returns:

Type Description
List[Dict[str, Any]]

List of other form dictionaries.

get_diseases

get_diseases() -> List[Dict[str, Any]]

Get all disease objects in Reactome.

Returns:

Type Description
List[Dict[str, Any]]

List of disease dictionaries.

get_diseases_doid

get_diseases_doid() -> List[str]

Get all Disease Ontology IDs (DOIDs) in Reactome.

Returns:

Type Description
List[str]

List of DOID strings.

map_to_reactions

map_to_reactions(
    identifier: str, resource: str = "UniProt"
) -> List[Dict[str, Any]]

Map an identifier to Reactome reactions.

Parameters:

Name Type Description Default
identifier str

External identifier (e.g., UniProt accession).

required
resource str

Source database ("UniProt", "NCBI", "ENSEMBL", etc.).

'UniProt'

Returns:

Type Description
List[Dict[str, Any]]

List of reaction dictionaries.

DO_Fetcher

DO_Fetcher()

Fetcher for Disease Ontology API.

Provides access to Disease Ontology data via two APIs:
  • The direct DO API for basic metadata
  • The EBI Ontology Lookup Service (OLS) for comprehensive queries
Example
from biodbs.fetch import DO_Fetcher

fetcher = DO_Fetcher()

# Get disease term by DOID
term = fetcher.get_term("DOID:162")  # Cancer
print(term.as_dataframe())

# Search for diseases
results = fetcher.search("cancer")
print(results.get_doids())

# Get term hierarchy
parents = fetcher.get_parents("DOID:162")
children = fetcher.get_children("DOID:162")

# Get cross-references
term = fetcher.get_term("DOID:162")
print(term.terms[0].mesh_id)  # Get MeSH ID
print(term.terms[0].umls_cui)  # Get UMLS CUI

Initialize Disease Ontology fetcher.

get_term

get_term(doid: str, use_ols: bool = True) -> DOFetchedData

Get a disease term by DOID.

Parameters:

Name Type Description Default
doid str

Disease Ontology ID (e.g., "DOID:162", "162", "DOID_162").

required
use_ols bool

If True, use the OLS API, which returns more detailed data.

True

Returns:

Type Description
DOFetchedData

DOFetchedData with the disease term.

Example
fetcher = DO_Fetcher()
term = fetcher.get_term("DOID:162")  # Cancer
print(term.terms[0].name)
# 'cancer'
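get_term accepts a DOID in any of three spellings ("DOID:162", "162", "DOID_162"). The helpers below are a hypothetical sketch, not biodbs code, of how such input could be normalized to the canonical "DOID:162" form, and to the underscore form used in OBO PURLs, which OLS uses as term IRIs (treat the PURL layout as an assumption to verify against OLS).

```python
import re

# Accepts "DOID:162", "DOID_162", or a bare "162" (prefix is case-insensitive).
_DOID_PATTERN = re.compile(r"^(?:DOID[:_])?(\d+)$", re.IGNORECASE)


def normalize_doid(doid: str) -> str:
    """Return the canonical 'DOID:<digits>' form (hypothetical helper)."""
    match = _DOID_PATTERN.match(doid.strip())
    if match is None:
        raise ValueError(f"Not a Disease Ontology ID: {doid!r}")
    return f"DOID:{match.group(1)}"


def doid_to_obo_iri(doid: str) -> str:
    """Build the OBO PURL for a DOID, e.g. .../obo/DOID_162 (assumed layout)."""
    return "http://purl.obolibrary.org/obo/" + normalize_doid(doid).replace(":", "_")


print(normalize_doid("162"))        # DOID:162
print(normalize_doid("DOID_162"))   # DOID:162
print(doid_to_obo_iri("DOID:162"))  # http://purl.obolibrary.org/obo/DOID_162
```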

get_terms

get_terms(
    doids: List[str], use_ols: bool = True
) -> DOFetchedData

Get multiple disease terms by DOIDs.

Parameters:

Name Type Description Default
doids List[str]

List of Disease Ontology IDs.

required
use_ols bool

If True, query the EBI OLS API for more detailed data; if False, use the direct DO API.

True

Returns:

Type Description
DOFetchedData

DOFetchedData with all disease terms.

Example
fetcher = DO_Fetcher()
terms = fetcher.get_terms(["DOID:162", "DOID:10283"])
print(terms.get_names())

get_all_terms

get_all_terms(
    page: int = 0, page_size: int = 100
) -> DOFetchedData

Get all disease terms from the ontology (paginated).

Parameters:

Name Type Description Default
page int

Page number (0-indexed).

0
page_size int

Number of terms per page.

100

Returns:

Type Description
DOFetchedData

DOFetchedData with disease terms.
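Because get_all_terms returns one page at a time, walking the whole ontology means requesting pages until a short page comes back. A minimal sketch, assuming each page's terms are available as a list (for example via the .terms attribute used in the examples above); iter_pages is a hypothetical helper, and a local list stands in for the API so the sketch runs offline.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(fetch_page: Callable[[int, int], List[T]], page_size: int = 100) -> Iterator[T]:
    """Yield every item from a 0-indexed, page-based endpoint.

    fetch_page(page, page_size) stands in for something like
    lambda p, s: fetcher.get_all_terms(page=p, page_size=s).terms
    """
    page = 0
    while True:
        items = fetch_page(page, page_size)
        yield from items
        # A short (or empty) page means there is nothing further to request.
        if len(items) < page_size:
            break
        page += 1


# Local stand-in for the API: 250 fake terms served 100 per page.
fake_terms = [f"term-{i}" for i in range(250)]
all_terms = list(iter_pages(lambda p, s: fake_terms[p * s:(p + 1) * s], page_size=100))
print(len(all_terms))  # 250
```

When the total is an exact multiple of page_size, this costs one extra request that returns an empty page.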

search

search(
    query: str,
    exact: bool = False,
    rows: int = 20,
    start: int = 0,
    obsoletes: bool = False,
) -> DOSearchFetchedData

Search for disease terms.

Parameters:

Name Type Description Default
query str

Search query string.

required
exact bool

If True, search for exact matches only.

False
rows int

Maximum number of results to return.

20
start int

Starting offset for pagination.

0
obsoletes bool

If True, include obsolete terms.

False

Returns:

Type Description
DOSearchFetchedData

DOSearchFetchedData with search results.

Example
fetcher = DO_Fetcher()
results = fetcher.search("breast cancer")
print(results.get_doids())

search_by_xref

search_by_xref(
    database: str, external_id: str
) -> DOSearchFetchedData

Search for disease terms by external database reference.

Parameters:

Name Type Description Default
database str

Database name (e.g., "MESH", "UMLS_CUI", "ICD10CM").

required
external_id str

ID in the external database.

required

Returns:

Type Description
DOSearchFetchedData

DOSearchFetchedData with matching terms.

Example
fetcher = DO_Fetcher()
results = fetcher.search_by_xref("MESH", "D001943")  # Breast cancer

get_parents

get_parents(doid: str) -> DOFetchedData

Get parent terms of a disease.

Parameters:

Name Type Description Default
doid str

Disease Ontology ID.

required

Returns:

Type Description
DOFetchedData

DOFetchedData with parent terms.

Example
fetcher = DO_Fetcher()
parents = fetcher.get_parents("DOID:1612")  # Breast cancer
for term in parents.terms:
    print(f"{term.doid}: {term.name}")

get_children

get_children(doid: str) -> DOFetchedData

Get child terms of a disease.

Parameters:

Name Type Description Default
doid str

Disease Ontology ID.

required

Returns:

Type Description
DOFetchedData

DOFetchedData with child terms.

Example
fetcher = DO_Fetcher()
children = fetcher.get_children("DOID:162")  # Cancer
print(f"Cancer has {len(children)} child terms")

get_ancestors

get_ancestors(doid: str) -> DOFetchedData

Get all ancestor terms of a disease.

Parameters:

Name Type Description Default
doid str

Disease Ontology ID.

required

Returns:

Type Description
DOFetchedData

DOFetchedData with ancestor terms.
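get_parents returns only direct parents, while get_ancestors returns every term above a disease. Conceptually the second is the transitive closure of the first. The sketch below computes that closure over a toy parent map with placeholder IDs (not real DOIDs). The breadth-first walk is a choice of this sketch; the fetcher may instead rely on an OLS endpoint that returns ancestors directly.

```python
from collections import deque
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # the ontology is a DAG, so a term can be reached twice
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


# Toy hierarchy with placeholder IDs: "D" has two parents that share a root.
toy_parents = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
print(sorted(ancestors("D", toy_parents)))  # ['A', 'B', 'C']
```

Descendants follow from the same walk over the inverted (parent to children) map.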

get_descendants

get_descendants(doid: str) -> DOFetchedData

Get all descendant terms of a disease.

Parameters:

Name Type Description Default
doid str

Disease Ontology ID.

required

Returns:

Type Description
DOFetchedData

DOFetchedData with descendant terms.

get_hierarchical_parents

get_hierarchical_parents(doid: str) -> DOFetchedData

Get hierarchical parent terms (includes part_of relationships).

Parameters:

Name Type Description Default
doid str

Disease Ontology ID.

required

Returns:

Type Description
DOFetchedData

DOFetchedData with hierarchical parent terms.

get_hierarchical_children

get_hierarchical_children(doid: str) -> DOFetchedData

Get hierarchical child terms (includes part_of relationships).

Parameters:

Name Type Description Default
doid str

Disease Ontology ID.

required

Returns:

Type Description
DOFetchedData

DOFetchedData with hierarchical child terms.

get_ontology_info

get_ontology_info() -> Dict[str, Any]

Get Disease Ontology metadata.

Returns:

Type Description
Dict[str, Any]

Dictionary with ontology information.

Example
fetcher = DO_Fetcher()
info = fetcher.get_ontology_info()
print(info.get("config", {}).get("title"))

doid_to_mesh

doid_to_mesh(doids: List[str]) -> Dict[str, Optional[str]]

Convert DOIDs to MeSH IDs.

Parameters:

Name Type Description Default
doids List[str]

List of Disease Ontology IDs.

required

Returns:

Type Description
Dict[str, Optional[str]]

Dictionary mapping DOIDs to MeSH IDs.

Example
fetcher = DO_Fetcher()
mapping = fetcher.doid_to_mesh(["DOID:162", "DOID:1612"])
print(mapping)

doid_to_umls

doid_to_umls(doids: List[str]) -> Dict[str, Optional[str]]

Convert DOIDs to UMLS CUIs.

Parameters:

Name Type Description Default
doids List[str]

List of Disease Ontology IDs.

required

Returns:

Type Description
Dict[str, Optional[str]]

Dictionary mapping DOIDs to UMLS CUIs.

doid_to_icd10

doid_to_icd10(doids: List[str]) -> Dict[str, Optional[str]]

Convert DOIDs to ICD-10 codes.

Parameters:

Name Type Description Default
doids List[str]

List of Disease Ontology IDs.

required

Returns:

Type Description
Dict[str, Optional[str]]

Dictionary mapping DOIDs to ICD-10 codes.
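doid_to_mesh, doid_to_umls, and doid_to_icd10 all return Dict[str, Optional[str]], with None where a term has no cross-reference in the target vocabulary. The sketch below shows one way to build such a mapping, assuming each term's cross-references arrive as "PREFIX:ID" strings using prefixes like those accepted by search_by_xref ("MESH", "UMLS_CUI", "ICD10CM"). first_xref is hypothetical, the input is illustrative, and which prefix the ICD-10 helper actually matches is not stated here.

```python
from typing import Dict, List, Optional


def first_xref(xrefs_by_doid: Dict[str, List[str]], prefix: str) -> Dict[str, Optional[str]]:
    """Map each DOID to the ID of its first xref with `prefix`, or None if absent."""
    wanted = prefix.upper() + ":"
    mapping: Dict[str, Optional[str]] = {}
    for doid, xrefs in xrefs_by_doid.items():
        hit = next((x for x in xrefs if x.upper().startswith(wanted)), None)
        # Drop the "PREFIX:" part so callers receive the bare external ID.
        mapping[doid] = hit[len(wanted):] if hit is not None else None
    return mapping


# Illustrative input; "DOID:TOY" is a placeholder term without a MeSH xref.
xrefs = {"DOID:1612": ["MESH:D001943"], "DOID:TOY": ["UMLS_CUI:C_TOY"]}
print(first_xref(xrefs, "MESH"))  # {'DOID:1612': 'D001943', 'DOID:TOY': None}
```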

EnrichR_Fetcher

EnrichR_Fetcher(organism: str = 'human')

Fetcher for EnrichR gene set enrichment analysis API.

EnrichR provides enrichment analysis against 200+ gene set libraries covering pathways, ontologies, transcription factors, and more.

Supported organisms:

  • human (default)
  • mouse
  • fly (FlyEnrichr)
  • yeast (YeastEnrichr)
  • worm (WormEnrichr)
  • fish (FishEnrichr)
Example
fetcher = EnrichR_Fetcher()

# Get available gene set libraries
libraries = fetcher.get_libraries()
print(libraries.get_library_names()[:10])

# Perform enrichment analysis
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.enrich(genes, library="KEGG_2021_Human")
print(result.significant_terms().get_term_names())

Initialize EnrichR fetcher.

Parameters:

Name Type Description Default
organism str

Target organism (human, mouse, fly, yeast, worm, fish).

'human'

set_organism

set_organism(organism: str)

Change the target organism.

Parameters:

Name Type Description Default
organism str

Target organism (human, mouse, fly, yeast, worm, fish).

required

get_libraries

get_libraries() -> EnrichRLibrariesData

Get available gene set libraries and their statistics.

Returns:

Type Description
EnrichRLibrariesData

EnrichRLibrariesData containing library information.

Example

fetcher = EnrichR_Fetcher()
libs = fetcher.get_libraries()
kegg_libs = libs.search("KEGG")
print(kegg_libs.get_library_names())

enrich

enrich(
    genes: List[str],
    library: str,
    description: str = "biodbs gene list",
) -> EnrichRFetchedData

Perform enrichment analysis against a gene set library.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols to analyze.

required
library str

Name of the gene set library (e.g., "KEGG_2021_Human").

required
description str

Description for the gene list.

'biodbs gene list'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData containing enrichment results.

Example

fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.enrich(genes, "KEGG_2021_Human")
top = result.top_terms(5)
print(top.get_term_names())

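EnrichR scores each term with Fisher's exact test and also reports a Benjamini-Hochberg adjusted p-value, which is the usual basis for calling terms significant. The cutoff that significant_terms() applies is not documented in this section. For reference, a minimal pure-Python sketch of the Benjamini-Hochberg adjustment (benjamini_hochberg is a hypothetical helper, not part of biodbs):

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(p, 4) for p in adjusted])  # [0.04, 0.0533, 0.0533, 0.2]
```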
enrich_multiple

enrich_multiple(
    genes: List[str],
    libraries: List[str],
    description: str = "biodbs gene list",
) -> Dict[str, EnrichRFetchedData]

Perform enrichment analysis against multiple libraries.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols to analyze.

required
libraries List[str]

List of library names to query.

required
description str

Description for the gene list.

'biodbs gene list'

Returns:

Type Description
Dict[str, EnrichRFetchedData]

Dictionary mapping library names to EnrichRFetchedData.

Example

fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1", "EGFR"]
results = fetcher.enrich_multiple(
    genes,
    ["KEGG_2021_Human", "GO_Biological_Process_2023"],
)
for lib, data in results.items():
    print(f"{lib}: {len(data)} terms")

enrich_with_background

enrich_with_background(
    genes: List[str],
    background: List[str],
    library: str,
    description: str = "biodbs gene list",
) -> EnrichRFetchedData

Perform enrichment analysis with a custom background gene set.

Uses the speedrichr API for background enrichment.

Parameters:

Name Type Description Default
genes List[str]

List of query gene symbols.

required
background List[str]

List of background gene symbols.

required
library str

Name of the gene set library.

required
description str

Description for the gene list.

'biodbs gene list'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData containing enrichment results.

Example

fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1"]
background = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS", ...]
result = fetcher.enrich_with_background(
    genes, background, "GO_Biological_Process_2023"
)

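With a custom background, the statistic is computed against that gene universe rather than EnrichR's default one. A one-sided Fisher's exact test for over-representation is equivalent to the upper tail of the hypergeometric distribution, which the sketch below computes with math.comb. Restricting all three gene sets to the background is an assumption of this sketch; speedrichr's exact counting may differ. The gene names are placeholders.

```python
from math import comb
from typing import Iterable


def overrepresentation_p(query: Iterable[str], gene_set: Iterable[str],
                         background: Iterable[str]) -> float:
    """One-sided hypergeometric P(X >= k): chance of an overlap at least this large."""
    universe = set(background)
    q = set(query) & universe      # query genes that are in the background
    s = set(gene_set) & universe   # term genes that are in the background
    N, K, n = len(universe), len(s), len(q)
    k = len(q & s)                 # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


background = [f"G{i}" for i in range(10)]
p = overrepresentation_p(["G0", "G1", "G5"], ["G0", "G1", "G2", "G3"], background)
print(round(p, 4))  # 0.3333
```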
view_gene_list

view_gene_list(user_list_id: int) -> List[str]

Retrieve a previously submitted gene list.

Parameters:

Name Type Description Default
user_list_id int

The userListId from a previous addList call.

required

Returns:

Type Description
List[str]

List of gene symbols.

get_gene_map

get_gene_map(gene: str, library: str) -> Dict[str, Any]

Get gene set membership for a single gene.

Parameters:

Name Type Description Default
gene str

Gene symbol.

required
library str

Gene set library name.

required

Returns:

Type Description
Dict[str, Any]

Dictionary with gene set membership information.

export_results

export_results(
    user_list_id: int,
    library: str,
    filename: str = "enrichr_results",
) -> str

Export enrichment results as text.

Parameters:

Name Type Description Default
user_list_id int

The userListId from a previous addList call.

required
library str

Gene set library name.

required
filename str

Output filename (without extension).

'enrichr_results'

Returns:

Type Description
str

Tab-separated enrichment results as string.

enrich_kegg

enrich_kegg(
    genes: List[str], year: str = "2021"
) -> EnrichRFetchedData

Perform KEGG pathway enrichment.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols.

required
year str

KEGG library year version.

'2021'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData with KEGG pathway enrichment.

enrich_go_bp

enrich_go_bp(
    genes: List[str], year: str = "2023"
) -> EnrichRFetchedData

Perform GO Biological Process enrichment.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols.

required
year str

GO library year version.

'2023'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData with GO BP enrichment.

enrich_go_mf

enrich_go_mf(
    genes: List[str], year: str = "2023"
) -> EnrichRFetchedData

Perform GO Molecular Function enrichment.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols.

required
year str

GO library year version.

'2023'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData with GO MF enrichment.

enrich_go_cc

enrich_go_cc(
    genes: List[str], year: str = "2023"
) -> EnrichRFetchedData

Perform GO Cellular Component enrichment.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols.

required
year str

GO library year version.

'2023'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData with GO CC enrichment.

enrich_reactome

enrich_reactome(
    genes: List[str], year: str = "2022"
) -> EnrichRFetchedData

Perform Reactome pathway enrichment.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols.

required
year str

Reactome library year version.

'2022'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData with Reactome enrichment.

enrich_wikipathways

enrich_wikipathways(
    genes: List[str], year: str = "2023"
) -> EnrichRFetchedData

Perform WikiPathways enrichment.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols.

required
year str

WikiPathways library year version.

'2023'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData with WikiPathways enrichment.

HGNC_Fetcher

Fetcher for the HGNC REST API (rest.genenames.org).

The HGNC (HUGO Gene Nomenclature Committee) REST API provides authoritative human gene nomenclature data: approved symbols, names, aliases, previous symbols, and cross-references to Ensembl, NCBI Gene, UniProt, OMIM, etc.

Three endpoints are exposed:

  • info — service metadata (last update, document count, field lists).
  • fetch — exact-match lookup by any stored field; returns full records.
  • search — wildcard / boolean query; returns lightweight summaries (hgnc_id, symbol, score only).

Rate limit: 10 requests per second (enforced automatically).

Example

fetcher = HGNC_Fetcher()

# Exact lookup by symbol
data = fetcher.fetch("symbol", "TP53")
entry = data[0]          # HGNCEntry
print(entry.hgnc_id)     # "HGNC:11998"
print(entry.entrez_id)   # "7157"

# Wildcard search
hits = fetcher.search("symbol", "ZNF*")
print(hits.num_found)    # many zinc-finger genes

# Service metadata
info = fetcher.info()
print(info["response"]["numDoc"])
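The fetcher enforces HGNC's limit of 10 requests per second automatically, but how it does so is not shown here. The class below is a minimal sketch of one common approach, spacing calls at least 1/rate seconds apart. MinIntervalLimiter is hypothetical; the injectable clock and sleep functions exist only to make it testable.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Space successive calls at least 1/rate seconds apart (rate=10 -> 0.1 s)."""

    def __init__(self, rate_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self) -> None:
        """Block until a request may be sent, then reserve the next slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


limiter = MinIntervalLimiter(10)  # HGNC's documented limit
for symbol in ["TP53", "BRCA1"]:
    limiter.wait()
    # ... send the HTTP request for `symbol` here ...
```

This sketch is not thread-safe; sharing one limiter across threads would need a lock around wait().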

info

info() -> dict

Retrieve HGNC service metadata.

Returns the raw parsed JSON dict, which contains:
  • lastModified: timestamp of last database update
  • numDoc: total number of records
  • searchableFields: list of fields that can be queried
  • storedFields: list of fields returned by fetch

Returns:

Type Description
dict

Raw JSON dict from /info.

Raises:

Type Description
APIError

On HTTP errors.

fetch

fetch(field: str, term: str) -> HGNCFetchedData

Exact-match lookup by any stored field.

Returns full gene records for all entries where field exactly equals term. No wildcard expansion is performed.

Parameters:

Name Type Description Default
field str

HGNC stored field name (e.g. "symbol", "hgnc_id", "ensembl_gene_id", "entrez_id", "uniprot_ids").

required
term str

Exact value to match.

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData containing full HGNCEntry records.

Raises:

Type Description
APIValidationError

If the field name is not recognised (HTTP 400).

APIError

On other HTTP errors.

Example

data = fetcher.fetch("symbol", "BRCA1")
print(data[0].ensembl_gene_id)  # ENSG00000012048

search

search(
    query_or_field: str, term: Optional[str] = None
) -> HGNCFetchedData

Wildcard / boolean search.

Two calling styles are supported:

  1. Free-form query: search("symbol:ZNF* AND status:Approved")
  2. Field + term: search("symbol", "ZNF*")
Wildcard characters:
  • * — zero or more characters
  • ? — exactly one character

Boolean operators: AND, OR, NOT (URL-encoded as +AND+, +OR+, +NOT+ internally).

Note

Search responses contain only hgnc_id, symbol, and score. Use fetch() to retrieve complete records.

Parameters:

Name Type Description Default
query_or_field str

A full Solr query string, OR a field name when term is also provided.

required
term Optional[str]

The search term for the given field. Leave None when passing a full query string as the first argument.

None

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with is_search=True; items are plain dicts with hgnc_id, symbol, and score.

Raises:

Type Description
APIValidationError

On an invalid query (HTTP 400).

APIError

On other HTTP errors.

Example

# All ZNF genes
hits = fetcher.search("symbol", "ZNF*")

# Approved genes on chromosome 17
hits = fetcher.search("status:Approved AND location:17*")

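Both calling styles map naturally onto HGNC's REST paths: /search/{field}/{term} for the field + term form, and /search/{query} for a free-form query with "+" between clauses. The function below sketches that mapping, assuming the base URL https://rest.genenames.org. It expects plain spaces between clauses (not pre-encoded "+" characters), does not handle quoted phrases containing spaces, and is not the fetcher's actual code.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://rest.genenames.org"  # assumed base URL


def hgnc_search_url(query_or_field: str, term: Optional[str] = None) -> str:
    """Map the two search() calling styles onto HGNC REST paths (sketch)."""
    if term is not None:
        # Field + term style: /search/{field}/{term}; keep wildcards unescaped.
        return f"{BASE_URL}/search/{quote(query_or_field)}/{quote(term, safe='*?')}"
    # Free-form style: clauses and operators joined with "+" instead of spaces.
    clauses = [quote(part, safe=":*?") for part in query_or_field.split()]
    return f"{BASE_URL}/search/" + "+".join(clauses)


print(hgnc_search_url("symbol", "ZNF*"))
print(hgnc_search_url("symbol:ZNF* AND status:Approved"))
```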
ClinVar_Fetcher

ClinVar_Fetcher(api_key: Optional[str] = None)

Fetcher for the ClinVar E-utilities API.

Wraps the four E-utility endpoints that ClinVar supports (esearch, esummary, efetch, elink) with rate limiting and optional API key authentication.

Parameters:

Name Type Description Default
api_key Optional[str]

NCBI API key, which raises the rate limit from 3 to 10 requests per second. If None, falls back to the NCBI_API_KEY environment variable.

None

Example

fetcher = ClinVar_Fetcher()

# Search for all pathogenic BRCA1 variants
uids = fetcher.search("BRCA1[gene] AND pathogenic[clnsig]")

# Fetch summaries for the first 10
data = fetcher.fetch_summary(uids[:10])
print(data.as_dataframe())

# One-step helper
data = fetcher.search_gene("TP53", retmax=100)
for v in data:
    print(v.accession, v.clinical_significance)

search

search(
    query: str, retmax: int = 500, retstart: int = 0
) -> List[str]

Find ClinVar variation UIDs matching an Entrez query.

Uses the same query language as the ClinVar website, so you can test a query interactively before automating it.

Common field tags:

  • BRCA1[gene] — variants in a specific gene
  • pathogenic[clnsig] — by clinical significance
  • single_gene[prop] — single-gene variants only
  • "Breast cancer"[dis] — by associated disease

Parameters:

Name Type Description Default
query str

Entrez query string (e.g. "BRCA1[gene] AND pathogenic[clnsig]").

required
retmax int

Maximum number of UIDs to return (at most 10,000).

500
retstart int

Zero-based offset for pagination.

0

Returns:

Type Description
List[str]

List of variation UID strings.

Example

uids = fetcher.search("TP53[gene] AND pathogenic[clnsig]",
                      retmax=200)

count

count(query: str) -> int

Return the total number of ClinVar records matching query.

Performs an esearch with retmax=0 so no IDs are transferred.

Parameters:

Name Type Description Default
query str

Entrez query string.

required

Returns:

Type Description
int

Integer count of matching records.
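count() relies on esearch reporting the total number of hits even when retmax=0. A sketch of the request and the parsing step, assuming the standard E-utilities endpoint and a JSON reply (retmode=json) whose esearchresult.count field holds the total as a string; both helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def clinvar_count_url(query: str, api_key: Optional[str] = None) -> str:
    """Build an esearch URL that asks only for the hit count (retmax=0)."""
    params = {"db": "clinvar", "term": query, "retmax": 0, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_count(response_text: str) -> int:
    """Read the total from an esearch JSON reply; NCBI sends it as a string."""
    return int(json.loads(response_text)["esearchresult"]["count"])


print(clinvar_count_url("BRCA1[gene] AND pathogenic[clnsig]"))
```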

fetch_summary

fetch_summary(
    ids: List[Union[str, int]], total_count: int = 0
) -> ClinVarFetchedData

Retrieve document summaries for a list of variation UIDs.

Calls esummary with retmode=json to obtain structured data including clinical significance, gene associations, conditions, and genomic coordinates.

Parameters:

Name Type Description Default
ids List[Union[str, int]]

ClinVar variation UIDs (integers or strings).

required
total_count int

Optional total hit count from a preceding esearch, stored on the returned object for reference.

0

Returns:

Type Description
ClinVarFetchedData

ClinVarFetchedData with one ClinVarVariant per UID.

Raises:

Type Description
APIError

On HTTP errors.

Example

data = fetcher.fetch_summary(["65533", "14206"])
for v in data:
    print(v.title, v.clinical_significance)

fetch_vcv

fetch_vcv(accession: str) -> str

Retrieve the full VCV XML record for a variation.

Parameters:

Name Type Description Default
accession str

VCV accession with or without version (e.g. "VCV000014206" or "VCV000014206.3").

required

Returns:

Type Description
str

Raw XML string.

Example

xml = fetcher.fetch_vcv("VCV000014206")

fetch_rcv

fetch_rcv(accession: str) -> str

Retrieve the full RCV XML record for a variation-condition pair.

Parameters:

Name Type Description Default
accession str

RCV accession with or without version (e.g. "RCV000000606" or "RCV000000606.3").

required

Returns:

Type Description
str

Raw XML string.

Example

xml = fetcher.fetch_rcv("RCV000000606")

link_to_pubmed

link_to_pubmed(variation_id: Union[str, int]) -> List[str]

Return PubMed UIDs linked to a ClinVar variation.

Parameters:

Name Type Description Default
variation_id Union[str, int]

ClinVar variation UID.

required

Returns:

Type Description
List[str]

List of PubMed UID strings.

search_gene

search_gene(
    gene_symbol: str,
    single_gene: bool = True,
    retmax: int = 500,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData

Search for variants in a gene and return summaries in one step.

Parameters:

Name Type Description Default
gene_symbol str

HGNC gene symbol (e.g. "BRCA1").

required
single_gene bool

If True (default), restrict to variants assigned to a single gene (single_gene[prop]).

True
retmax int

Maximum number of variants to return.

500
clinical_significance Optional[str]

Optional filter, e.g. "pathogenic", "likely pathogenic", "benign". Maps to the [clnsig] Entrez field tag.

None

Returns:

Type Description
ClinVarFetchedData

ClinVarFetchedData, ready to iterate over or convert with as_dataframe().

Example

data = fetcher.search_gene("TP53", retmax=200,
                           clinical_significance="pathogenic")
print(data.as_dataframe()[["accession", "title",
                           "clinical_significance"]])
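search_gene() combines its arguments into a single Entrez query. The builder below is a hypothetical reconstruction of that query using the field tags listed under search(); quoting multi-word significance values such as "likely pathogenic" as a phrase is an assumption of this sketch.

```python
from typing import Optional


def clinvar_gene_query(gene_symbol: str, single_gene: bool = True,
                       clinical_significance: Optional[str] = None) -> str:
    """Compose an Entrez query from search_gene()-style arguments (sketch)."""
    clauses = [f"{gene_symbol}[gene]"]
    if single_gene:
        clauses.append("single_gene[prop]")
    if clinical_significance:
        value = clinical_significance
        if " " in value:
            value = f'"{value}"'  # quote multi-word values as a phrase
        clauses.append(f"{value}[clnsig]")
    return " AND ".join(clauses)


print(clinvar_gene_query("TP53", clinical_significance="pathogenic"))
# TP53[gene] AND single_gene[prop] AND pathogenic[clnsig]
```

Because the syntax matches the ClinVar website, the printed string can be pasted into the site's search box to check what it returns.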

search_condition

search_condition(
    condition: str,
    retmax: int = 500,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData

Search for variants associated with a disease/condition.

Parameters:

Name Type Description Default
condition str

Disease or condition name (e.g. "Breast cancer").

required
retmax int

Maximum number of variants to return.

500
clinical_significance Optional[str]

Optional significance filter.

None

Returns:

Type Description
ClinVarFetchedData

ClinVarFetchedData with the matching variants.

Example

data = fetcher.search_condition("Lynch syndrome", retmax=100)

UniProt

uniprot_get_entry

uniprot_get_entry(accession: str) -> UniProtFetchedData

Get a UniProt entry by accession.

Parameters:

Name Type Description Default
accession str

UniProt accession (e.g., "P05067").

required

Returns:

Type Description
UniProtFetchedData

UniProtFetchedData with the entry.

Example
entry = uniprot_get_entry("P05067")
print(entry.entries[0].protein_name)
# Amyloid-beta precursor protein

uniprot_search

uniprot_search(
    query: str, size: int = 25, reviewed_only: bool = False
) -> UniProtSearchResult

Search UniProtKB.

Parameters:

Name Type Description Default
query str

Search query (e.g., "gene:TP53 AND organism_id:9606").

required
size int

Number of results per page (max 500).

25
reviewed_only bool

Only return reviewed (Swiss-Prot) entries.

False

Returns:

Type Description
UniProtSearchResult

UniProtSearchResult with matching entries.

Example
results = uniprot_search("kinase AND organism_id:9606", reviewed_only=True)
print(results.as_dataframe()[["accession", "gene_name"]].head())
#   accession gene_name
# 0    P00533      EGFR
# 1    P04629      NTRK1

uniprot_search_by_gene

uniprot_search_by_gene(
    gene_name: str,
    organism: Optional[Union[int, str]] = 9606,
    reviewed_only: bool = True,
) -> UniProtSearchResult

Search UniProt by gene name.

Parameters:

Name Type Description Default
gene_name str

Gene name to search.

required
organism Optional[Union[int, str]]

Organism tax ID or name (default: human).

9606
reviewed_only bool

Only return reviewed entries.

True

Returns:

Type Description
UniProtSearchResult

UniProtSearchResult with matching entries.

Example
results = uniprot_search_by_gene("TP53")
print(results.entries[0].accession)
# P04637
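The gene-based search presumably translates its arguments into a UniProt query string like the one shown for uniprot_search. A sketch of that translation, assuming UniProt's gene, organism_id, organism_name, and reviewed query fields. uniprot_gene_query is hypothetical and may differ from what biodbs actually sends (for example, it might use exact gene matching instead).

```python
from typing import Optional, Union


def uniprot_gene_query(gene_name: str,
                       organism: Optional[Union[int, str]] = 9606,
                       reviewed_only: bool = True) -> str:
    """Translate gene-search arguments into a UniProtKB query string (sketch)."""
    clauses = [f"gene:{gene_name}"]
    if organism is not None:
        if isinstance(organism, int) or str(organism).isdigit():
            clauses.append(f"organism_id:{organism}")  # NCBI taxonomy ID
        else:
            clauses.append(f'organism_name:"{organism}"')
    if reviewed_only:
        clauses.append("reviewed:true")  # Swiss-Prot entries only
    return " AND ".join(clauses)


print(uniprot_gene_query("TP53"))  # gene:TP53 AND organism_id:9606 AND reviewed:true
```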

gene_to_uniprot

gene_to_uniprot(
    gene_names: List[str],
    organism: int = 9606,
    reviewed_only: bool = True,
    return_dict: bool = True,
) -> Union[Dict[str, str], DataFrame]

Map gene names to UniProt accessions.

Parameters:

Name Type Description Default
gene_names List[str]

List of gene names.

required
organism int

Organism tax ID (default: human).

9606
reviewed_only bool

Only return reviewed entries.

True
return_dict bool

If True, return dict. If False, return DataFrame.

True

Returns:

Type Description
Union[Dict[str, str], DataFrame]

Dictionary or DataFrame mapping gene names to accessions.

Example
mapping = gene_to_uniprot(["TP53", "BRCA1", "EGFR"])
print(mapping)
# {'TP53': 'P04637', 'BRCA1': 'P38398', 'EGFR': 'P00533'}

uniprot_map_ids

uniprot_map_ids(
    ids: List[str], from_db: str, to_db: str
) -> Dict[str, List[str]]

Map IDs between databases using UniProt ID mapping.

Parameters:

Name Type Description Default
ids List[str]

List of IDs to map.

required
from_db str

Source database (e.g., "UniProtKB_AC-ID", "Gene_Name", "GeneID", "Ensembl").

required
to_db str

Target database (e.g., "UniProtKB", "GeneID", "PDB", "Ensembl").

required

Returns:

Type Description
Dict[str, List[str]]

Dictionary mapping input IDs to lists of output IDs.

Common database names:
  • UniProtKB_AC-ID: UniProt accession
  • UniProtKB: UniProt (returns full entries)
  • Gene_Name: Gene name
  • GeneID: NCBI Gene ID
  • Ensembl: Ensembl ID
  • PDB: PDB structure ID
  • RefSeq_Protein: RefSeq protein ID
Example
mapping = uniprot_map_ids(["P05067", "P04637"], "UniProtKB_AC-ID", "GeneID")
print(mapping)
# {'P05067': ['351'], 'P04637': ['7157']}
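The return type is Dict[str, List[str]] because one source ID can map to several target IDs. UniProt's ID-mapping service returns its results as a list of {"from": ..., "to": ...} pairs, and the sketch below groups them by source ID. It assumes a target such as GeneID, where "to" is a plain string (for UniProtKB targets, "to" is a full entry object instead). collapse_mapping is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def collapse_mapping(results: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group UniProt ID-mapping {'from', 'to'} pairs into source -> [targets]."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for pair in results:
        grouped[pair["from"]].append(pair["to"])
    return dict(grouped)


pairs = [{"from": "P05067", "to": "351"}, {"from": "P04637", "to": "7157"}]
print(collapse_mapping(pairs))  # {'P05067': ['351'], 'P04637': ['7157']}
```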

PubChem

pubchem_get_compound

pubchem_get_compound(cid: int) -> PUGRestFetchedData

Get compound data by PubChem CID.

Parameters:

Name Type Description Default
cid int

PubChem Compound ID.

required

Returns:

Type Description
PUGRestFetchedData

PUGRestFetchedData containing compound information.

Example

data = pubchem_get_compound(2244)  # Aspirin
df = data.as_dataframe()

pubchem_search_by_name

pubchem_search_by_name(name: str) -> PUGRestFetchedData

Search compounds by name.

Parameters:

Name Type Description Default
name str

Compound name to search.

required

Returns:

Type Description
PUGRestFetchedData

PUGRestFetchedData containing matching compounds.

Example

data = pubchem_search_by_name("aspirin")
cids = data.get_cids()

pubchem_get_properties

pubchem_get_properties(
    cids: Union[int, List[int]],
    properties: Optional[List[str]] = None,
) -> PUGRestFetchedData

Get specific properties for compounds.

Parameters:

Name Type Description Default
cids Union[int, List[int]]

Single CID or list of CIDs.

required
properties Optional[List[str]]

List of property names. If None, returns common properties.

None

Returns:

Type Description
PUGRestFetchedData

PUGRestFetchedData containing property values.

Example

data = pubchem_get_properties(2244, ["MolecularWeight", "MolecularFormula"])
df = data.as_dataframe()

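PubChem's PUG REST API addresses property lookups through the URL path compound/cid/{cids}/property/{properties}/JSON, with CIDs and property names comma-separated. A minimal sketch of that URL scheme; it is not necessarily how pubchem_get_properties builds its request, and property_url is hypothetical.

```python
from typing import List, Union

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cids: Union[int, List[int]], properties: List[str]) -> str:
    """Build a PUG REST property-table URL for one or more CIDs."""
    cid_list = [cids] if isinstance(cids, int) else list(cids)
    cid_part = ",".join(str(cid) for cid in cid_list)
    return f"{PUG_REST}/compound/cid/{cid_part}/property/{','.join(properties)}/JSON"


print(property_url(2244, ["MolecularWeight", "MolecularFormula"]))
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularWeight,MolecularFormula/JSON
```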

Ensembl

ensembl_lookup

ensembl_lookup(
    id: str,
    species: Optional[str] = None,
    expand: bool = False,
    db_type: str = "core",
) -> EnsemblFetchedData

Look up an Ensembl stable ID.

Parameters:

Name Type Description Default
id str

Ensembl stable ID (e.g., "ENSG00000141510").

required
species Optional[str]

Species name (optional, auto-detected from ID).

None
expand bool

If True, include connected features (transcripts, exons).

False
db_type str

Database type ("core" or "otherfeatures").

'core'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData containing gene/transcript/protein information.

Example
data = ensembl_lookup("ENSG00000141510", expand=True)
print(data.results[0]["display_name"])  # TP53

ensembl_lookup_symbol

ensembl_lookup_symbol(
    species: str, symbol: str, expand: bool = False
) -> EnsemblFetchedData

Look up a gene by symbol.

Parameters:

Name Type Description Default
species str

Species name (e.g., "human", "mouse").

required
symbol str

Gene symbol (e.g., "BRCA2", "TP53").

required
expand bool

If True, include connected features.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData containing gene information.

Example
data = ensembl_lookup_symbol("human", "TP53")
print(data.results[0]["id"])  # ENSG00000141510

ensembl_get_sequence

ensembl_get_sequence(
    id: str,
    sequence_type: str = "genomic",
    species: Optional[str] = None,
    expand_5prime: Optional[int] = None,
    expand_3prime: Optional[int] = None,
    mask: Optional[str] = None,
    format: str = "fasta",
) -> EnsemblFetchedData

Get sequence for an Ensembl stable ID.

Parameters:

Name Type Description Default
id str

Ensembl stable ID (gene, transcript, exon, protein).

required
sequence_type str

Type of sequence ("genomic", "cds", "cdna", "protein").

'genomic'
species Optional[str]

Species name (optional).

None
expand_5prime Optional[int]

Number of bases to extend the sequence upstream (genomic only).

None
expand_3prime Optional[int]

Number of bases to extend the sequence downstream (genomic only).

None
mask Optional[str]

Mask repeats ("hard" or "soft", genomic only).

None
format str

Output format ("fasta" or "json").

'fasta'

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData containing sequence data.

Example

data = ensembl_get_sequence("ENST00000269305", sequence_type="cds")
print(data.text)  # FASTA sequence

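With format="fasta", data.text holds the sequence as FASTA text. If you need the bare sequence string, a small standard-library parser is enough. This sketch keys each record by the first word of its header line; parse_fasta is hypothetical and not part of biodbs, and the input below is a toy record.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {record_id: sequence} for FASTA text (first header word as the ID)."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            words = line[1:].split()
            header = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line)  # sequences are usually wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


toy = ">SEQ_A first record\nATGC\nGGTA\n>SEQ_B\nTT\n"
print(parse_fasta(toy))  # {'SEQ_A': 'ATGCGGTA', 'SEQ_B': 'TT'}
```

Applied to the example above, parse_fasta(data.text) would return a dict keyed by the first word of the Ensembl FASTA header.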
ensembl_get_xrefs

ensembl_get_xrefs(
    id: str,
    species: Optional[str] = None,
    external_db: Optional[str] = None,
    all_levels: bool = False,
) -> EnsemblFetchedData

Get external cross-references for an Ensembl ID.

Parameters:

Name Type Description Default
id str

Ensembl stable ID.

required
species Optional[str]

Species name.

None
external_db Optional[str]

Filter by external database (e.g., "HGNC", "UniProt").

None
all_levels bool

If True, find all linked features.

False

Returns:

Type Description
EnsemblFetchedData

EnsemblFetchedData containing cross-references.

Example
data = ensembl_get_xrefs("ENSG00000141510", external_db="HGNC")
print(data.results[0]["display_id"])

BioMart

biomart_get_genes

biomart_get_genes(
    ids: List[str],
    attributes: Optional[List[str]] = None,
    dataset: str = "hsapiens_gene_ensembl",
) -> BioMartQueryData

Get gene information by Ensembl gene IDs.

Parameters:

Name Type Description Default
ids List[str]

List of Ensembl gene IDs (e.g., ["ENSG00000141510"]).

required
attributes Optional[List[str]]

Attributes to retrieve. If None, uses common gene attributes.

None
dataset str

BioMart dataset name. Defaults to human genes.

'hsapiens_gene_ensembl'

Returns:

Type Description
BioMartQueryData

BioMartQueryData containing gene information, including gene ID, symbol, description, and coordinates.

Example

data = biomart_get_genes(["ENSG00000141510", "ENSG00000012048"])
df = data.as_dataframe()
print(df[["ensembl_gene_id", "external_gene_name"]])

biomart_convert_ids

biomart_convert_ids(
    ids: List[str],
    from_type: str = "ensembl_gene_id",
    to_type: str = "external_gene_name",
    dataset: str = "hsapiens_gene_ensembl",
) -> BioMartQueryData

Convert between different gene ID types.

Supported ID types:
  • ensembl_gene_id, ensembl_transcript_id, ensembl_peptide_id
  • external_gene_name, hgnc_symbol, hgnc_id
  • entrezgene_id, uniprot_gn_id
  • refseq_mrna, refseq_peptide

Parameters:

Name Type Description Default
ids List[str]

List of IDs to convert.

required
from_type str

Source ID type (used as filter).

'ensembl_gene_id'
to_type str

Target ID type.

'external_gene_name'
dataset str

BioMart dataset name. Defaults to human genes.

'hsapiens_gene_ensembl'

Returns:

Type Description
BioMartQueryData

BioMartQueryData containing ID mappings with both source and target ID columns.

Example

data = biomart_convert_ids(
    ["TP53", "BRCA1"],
    from_type="external_gene_name",
    to_type="ensembl_gene_id",
)
df = data.as_dataframe()

biomart_query

biomart_query(
    dataset: str = "hsapiens_gene_ensembl",
    attributes: Optional[List[str]] = None,
    filters: Optional[
        Dict[str, Union[str, List[str]]]
    ] = None,
) -> BioMartQueryData

Execute a custom BioMart query.

Parameters:

Name Type Description Default
dataset str

BioMart dataset name.

'hsapiens_gene_ensembl'
attributes Optional[List[str]]

List of attributes to retrieve.

None
filters Optional[Dict[str, Union[str, List[str]]]]

Dict of filter name to value(s).

None

Returns:

Type Description
BioMartQueryData

BioMartQueryData containing query results.

Example

data = biomart_query(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name", "chromosome_name"],
    filters={"chromosome_name": "22", "biotype": "protein_coding"},
)
df = data.as_dataframe()
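For reference, Ensembl BioMart accepts queries as an XML document passed in the `query` parameter of its martservice endpoint (for example `https://www.ensembl.org/biomart/martservice`). The sketch below shows how a dataset, an attribute list, and a filter dict like the ones in this example could map onto that XML. It illustrates the BioMart query format and is not necessarily how biodbs builds its requests; `build_biomart_xml` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union


def build_biomart_xml(
    dataset: str,
    attributes: List[str],
    filters: Optional[Dict[str, Union[str, List[str]]]] = None,
) -> str:
    """Render a BioMart XML query (TSV output, no header row). Hypothetical helper."""
    query = ET.Element(
        "Query",
        virtualSchemaName="default",
        formatter="TSV",
        header="0",
        uniqueRows="1",
    )
    dataset_el = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in (filters or {}).items():
        # BioMart expects multiple filter values as one comma-separated string.
        if isinstance(value, list):
            value = ",".join(value)
        ET.SubElement(dataset_el, "Filter", name=name, value=value)
    for attribute in attributes:
        ET.SubElement(dataset_el, "Attribute", name=attribute)
    return ET.tostring(query, encoding="unicode")


xml = build_biomart_xml(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name", "chromosome_name"],
    {"chromosome_name": "22", "biotype": ["protein_coding", "lncRNA"]},
)
print(xml)
```

The resulting string would be URL-encoded into the `query` parameter of the martservice request; a list-valued filter becomes a single comma-separated `value`.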


KEGG

kegg_list

kegg_list(
    database: str, organism: Optional[str] = None
) -> KEGGFetchedData

List entries in a KEGG database.

Parameters:

Name Type Description Default
database str

Database name (e.g., "pathway", "module", "compound").

required
organism Optional[str]

Organism code for pathway/module lists (e.g., "hsa" for human).

None

Returns:

Type Description
KEGGFetchedData

KEGGFetchedData containing a list of entries with IDs and descriptions.

Example

data = kegg_list("pathway", organism="hsa")
df = data.as_dataframe()

kegg_get

kegg_get(
    dbentries: Union[str, List[str]],
    option: Optional[str] = None,
) -> KEGGFetchedData

Retrieve entries from a KEGG database.

Parameters:

Name Type Description Default
dbentries Union[str, List[str]]

Entry ID or list of IDs (e.g., "hsa:7157").

required
option Optional[str]

Output format ("aaseq", "ntseq", "mol", "kcf", "image", "json").

None

Returns:

Type Description
KEGGFetchedData

KEGGFetchedData containing entry data.

Example

data = kegg_get("hsa:7157")  # TP53 gene
print(data.text)

data = kegg_get("cpd:C00022", option="mol")  # pyruvate, as a MOL file
print(data.text)
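KEGG's REST `get` operation takes entries joined with `+` in the URL path and, per the KEGG API documentation, accepts at most 10 entries per request (the `image` and `kgml` options accept only one). A minimal sketch of splitting a long ID list into request URLs. Whether kegg_get batches this way internally is not stated in this reference, and `kegg_get_urls` is a hypothetical helper:

```python
from typing import List, Optional, Union

KEGG_BASE = "https://rest.kegg.jp"
MAX_GET_ENTRIES = 10  # KEGG's documented per-request limit for /get


def kegg_get_urls(
    dbentries: Union[str, List[str]], option: Optional[str] = None
) -> List[str]:
    """Build one /get URL per batch of at most MAX_GET_ENTRIES IDs (hypothetical helper)."""
    entries = [dbentries] if isinstance(dbentries, str) else list(dbentries)
    urls = []
    for start in range(0, len(entries), MAX_GET_ENTRIES):
        batch = "+".join(entries[start:start + MAX_GET_ENTRIES])
        url = f"{KEGG_BASE}/get/{batch}"
        if option:
            url += f"/{option}"
        urls.append(url)
    return urls


print(kegg_get_urls("cpd:C00022", option="mol"))
print(len(kegg_get_urls([f"hsa:{n}" for n in range(1, 26)])))
```

The sketch does not enforce the single-entry rule for `image` and `kgml`; a caller using those options would pass one ID at a time.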

kegg_link

kegg_link(
    target_db: str, source: Union[str, List[str]]
) -> KEGGFetchedData

Find related entries between KEGG databases.

Parameters:

Name Type Description Default
target_db str

Target database (e.g., "pathway", "module", "disease").

required
source Union[str, List[str]]

Source database name OR list of entry IDs.

required

Returns:

Type Description
KEGGFetchedData

KEGGFetchedData containing linked entries between databases.

Example

# Link specific entries
data = kegg_link("pathway", ["hsa:10458", "hsa:7157"])
df = data.as_dataframe()

# Link an entire database
data = kegg_link("reaction", "compound")
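KEGG's `link` (and `conv`) operations return plain text with one tab-separated pair per line, and one source entry often links to several targets. A minimal sketch of grouping such text into a dictionary, assuming the raw response is available as `data.text` as in the kegg_get example; `parse_kegg_pairs` and the sample lines are illustrative, not real query output:

```python
from typing import Dict, List


def parse_kegg_pairs(text: str) -> Dict[str, List[str]]:
    """Group right-hand IDs of a tab-separated KEGG response by left-hand ID (hypothetical helper)."""
    grouped: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        left, right = line.split("\t", 1)
        grouped.setdefault(left, []).append(right.strip())
    return grouped


# Illustrative text in the link-response layout (not real KEGG output).
sample = "hsa:7157\tpath:hsa04115\nhsa:7157\tpath:hsa05200\nhsa:10458\tpath:hsa04520\n"
print(parse_kegg_pairs(sample))
```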

kegg_conv

kegg_conv(
    target_db: str, source: Union[str, List[str]]
) -> KEGGFetchedData

Convert entry IDs between KEGG and external databases.

Parameters:

Name Type Description Default
target_db str

Target database (e.g., "ncbi-geneid", "ncbi-proteinid", "uniprot").

required
source Union[str, List[str]]

Source database name OR list of entry IDs to convert.

required

Returns:

Type Description
KEGGFetchedData

KEGGFetchedData containing ID mappings between databases.

Example

# Convert entire database
data = kegg_conv("ncbi-geneid", "hsa")

# Convert specific entries
data = kegg_conv("ncbi-geneid", ["hsa:10458", "hsa:7157"])
df = data.as_dataframe()


ChEMBL

chembl_get_molecule

chembl_get_molecule(chembl_id: str) -> ChEMBLFetchedData

Get molecule data by ChEMBL ID.

Parameters:

Name Type Description Default
chembl_id str

ChEMBL molecule ID (e.g., "CHEMBL25").

required

Returns:

Type Description
ChEMBLFetchedData

ChEMBLFetchedData containing molecule information, including structure, properties, and cross-references.

Example
data = chembl_get_molecule("CHEMBL25")  # Aspirin
print(data.results[0]["pref_name"])

chembl_search_molecules

chembl_search_molecules(
    query: str, limit: int = 100
) -> ChEMBLFetchedData

Search molecules by name, synonym, or structure.

Parameters:

Name Type Description Default
query str

Search query (name, synonym, or InChIKey).

required
limit int

Maximum number of results to return.

100

Returns:

Type Description
ChEMBLFetchedData

ChEMBLFetchedData containing matching molecules.

Example

data = chembl_search_molecules("aspirin")
df = data.as_dataframe()
print(df[["molecule_chembl_id", "pref_name"]].head())

chembl_get_approved_drugs

chembl_get_approved_drugs(
    limit: int = 1000,
) -> ChEMBLFetchedData

Get list of approved drugs from ChEMBL.

Parameters:

Name Type Description Default
limit int

Maximum number of drugs to return.

1000

Returns:

Type Description
ChEMBLFetchedData

ChEMBLFetchedData containing approved drug molecules with their names, structures, and approval information.

Example

data = chembl_get_approved_drugs(limit=100)
df = data.as_dataframe()
print(df[["molecule_chembl_id", "pref_name"]].head())
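In the public ChEMBL web services, approved drugs are the molecules with `max_phase` 4, and list endpoints are paged with `limit` and `offset` under a per-page cap (assumed here to be 1000 records). A minimal sketch of the page URLs a large `limit` would need. This illustrates the ChEMBL API; it is an assumption about, not a description of, how chembl_get_approved_drugs pages internally, and `approved_drug_page_urls` is hypothetical:

```python
from typing import List
from urllib.parse import urlencode

CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data"
MAX_PAGE = 1000  # assumed ChEMBL page-size cap


def approved_drug_page_urls(limit: int, page_size: int = MAX_PAGE) -> List[str]:
    """Return molecule-list URLs covering `limit` approved drugs (hypothetical helper)."""
    urls = []
    for offset in range(0, limit, page_size):
        params = {
            "max_phase": 4,  # phase 4 = approved
            "limit": min(page_size, limit - offset),
            "offset": offset,
        }
        urls.append(f"{CHEMBL_BASE}/molecule.json?{urlencode(params)}")
    return urls


for url in approved_drug_page_urls(2500):
    print(url)
```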


QuickGO

quickgo_search_annotations

quickgo_search_annotations(
    go_id: Optional[str] = None,
    taxon_id: Optional[int] = None,
    gene_product_id: Optional[str] = None,
    evidence_code: Optional[str] = None,
    limit: int = 100,
) -> QuickGOFetchedData

Search GO annotations with filters.

Parameters:

Name Type Description Default
go_id Optional[str]

GO term ID to filter by.

None
taxon_id Optional[int]

NCBI taxonomy ID (e.g., 9606 for human).

None
gene_product_id Optional[str]

Gene product ID (e.g., "UniProtKB:P04637").

None
evidence_code Optional[str]

Evidence code (e.g., "IDA", "IEA").

None
limit int

Maximum number of results to return.

100

Returns:

Type Description
QuickGOFetchedData

QuickGOFetchedData containing matching GO annotations with gene products, GO terms, and evidence codes.

Example

data = quickgo_search_annotations(go_id="GO:0006915", taxon_id=9606)
df = data.as_dataframe()
print(df[["geneProductId", "goId", "goName"]].head())

quickgo_get_terms

quickgo_get_terms(
    ids: Union[str, List[str]],
) -> QuickGOFetchedData

Get GO term details by ID.

Parameters:

Name Type Description Default
ids Union[str, List[str]]

GO term ID or list of IDs (e.g., "GO:0008150" or ["GO:0008150", "GO:0003674"]).

required

Returns:

Type Description
QuickGOFetchedData

QuickGOFetchedData containing term details, including name, definition, aspect, and synonyms.

Example
data = quickgo_get_terms("GO:0006915")  # apoptotic process
print(data.results[0]["name"])
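quickgo_get_terms accepts either a single ID or a list. QuickGO's ontology service takes several GO IDs as one comma-separated path segment (`.../QuickGO/services/ontology/go/terms/{ids}`), so a str-or-list argument can be normalized as in this minimal sketch. The helper name is hypothetical, and biodbs may build the request differently:

```python
from typing import List, Union
from urllib.parse import quote

QUICKGO_TERMS = "https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms"


def quickgo_terms_url(ids: Union[str, List[str]]) -> str:
    """Normalize one ID or a list of IDs into a QuickGO term-lookup URL (hypothetical helper)."""
    id_list = [ids] if isinstance(ids, str) else list(ids)
    # Keep ':' and ',' literal: they are part of GO IDs and the list separator.
    return f"{QUICKGO_TERMS}/{quote(','.join(id_list), safe=':,')}"


print(quickgo_terms_url(["GO:0008150", "GO:0003674"]))
```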

HPA (Human Protein Atlas)

hpa_get_gene

hpa_get_gene(
    gene: str, fmt: str = "json"
) -> HPAFetchedData

Get protein data for a single gene.

Parameters:

Name Type Description Default
gene str

Gene name (e.g., "TP53") or Ensembl ID (e.g., "ENSG00000141510").

required
fmt str

Response format ("json", "xml", or "tsv").

'json'

Returns:

Type Description
HPAFetchedData

HPAFetchedData containing protein information, including expression data, antibody information, and references.

Example

data = hpa_get_gene("TP53")
print(data.results[0].keys())

hpa_get_tissue_expression

hpa_get_tissue_expression(
    genes: Union[str, List[str]],
) -> HPAFetchedData

Get tissue expression data for genes.

Parameters:

Name Type Description Default
genes Union[str, List[str]]

Gene name(s) or Ensembl ID(s).

required

Returns:

Type Description
HPAFetchedData

HPAFetchedData containing tissue expression levels across human tissues and organs.

Example

data = hpa_get_tissue_expression("TP53")
df = data.as_dataframe()
print(df[["Gene", "Tissue", "Level"]].head())
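HPA reports tissue protein expression on an ordinal scale (Not detected, Low, Medium, High). A minimal sketch of finding each gene's highest-expressing tissues from rows with the Gene, Tissue, and Level columns used in the example above. It assumes the Level values use exactly that vocabulary; the helper and the sample rows are hypothetical, not real HPA data:

```python
from typing import Dict, List

# Assumed HPA level vocabulary, lowest to highest.
LEVEL_RANK = {"Not detected": 0, "Low": 1, "Medium": 2, "High": 3}


def top_tissues(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each gene to the tissue(s) sharing its highest level (hypothetical helper)."""
    best: Dict[str, int] = {}
    tissues: Dict[str, List[str]] = {}
    for row in rows:
        gene, tissue = row["Gene"], row["Tissue"]
        rank = LEVEL_RANK.get(row["Level"], -1)  # unknown levels rank lowest
        current = best.get(gene)
        if current is None or rank > current:
            best[gene] = rank
            tissues[gene] = [tissue]
        elif rank == current:
            tissues[gene].append(tissue)
    return tissues


# Hypothetical rows, not real HPA measurements.
rows = [
    {"Gene": "GENE_A", "Tissue": "liver", "Level": "Low"},
    {"Gene": "GENE_A", "Tissue": "skin", "Level": "Medium"},
    {"Gene": "GENE_A", "Tissue": "lung", "Level": "Medium"},
]
print(top_tissues(rows))
```

With a DataFrame, `df.to_dict("records")` produces rows in this shape.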


NCBI

ncbi_get_gene

ncbi_get_gene(
    identifiers: List[Union[int, str]],
    taxon: Union[int, str] = "human",
    api_key: Optional[str] = None,
) -> NCBIGeneFetchedData

Get gene information from NCBI by gene IDs or symbols.

This is a convenience function that wraps the NCBI_Fetcher.

Parameters:

Name Type Description Default
identifiers List[Union[int, str]]

List of NCBI Gene IDs (integers) or gene symbols (strings).

required
taxon Union[int, str]

Taxonomy ID or name (used for symbol lookups).

'human'
api_key Optional[str]

Optional NCBI API key for higher rate limits.

None

Returns:

Type Description
NCBIGeneFetchedData

NCBIGeneFetchedData containing gene reports.

Examples:

>>> # By gene IDs
>>> genes = ncbi_get_gene([7157, 672])
>>> print(genes.as_dataframe())
>>> # By symbols
>>> genes = ncbi_get_gene(["TP53", "BRCA1"], taxon="human")
>>> print(genes.get_gene_ids())

ncbi_symbol_to_id

ncbi_symbol_to_id(
    symbols: List[str],
    taxon: Union[int, str] = "human",
    api_key: Optional[str] = None,
    return_dict: bool = True,
) -> Union[Dict[str, int], DataFrame]

Convert gene symbols to NCBI Gene IDs.

Parameters:

Name Type Description Default
symbols List[str]

List of gene symbols.

required
taxon Union[int, str]

Taxonomy ID or name.

'human'
api_key Optional[str]

Optional NCBI API key.

None
return_dict bool

If True, return dict. If False, return DataFrame.

True

Returns:

Type Description
Union[Dict[str, int], DataFrame]

Dictionary mapping symbols to gene IDs, or DataFrame.

Example

mapping = ncbi_symbol_to_id(["TP53", "BRCA1", "EGFR"])
print(mapping)
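Symbols that NCBI cannot resolve for the requested taxon will not yield an ID. This reference does not say how such symbols are reported; assuming the returned dict simply omits them, a minimal sketch of separating mapped from unmapped inputs (`split_mapped` is hypothetical):

```python
from typing import Dict, List, Tuple


def split_mapped(
    symbols: List[str], mapping: Dict[str, int]
) -> Tuple[Dict[str, int], List[str]]:
    """Return (symbols with an ID, symbols without one) in input order (hypothetical helper)."""
    mapped = {s: mapping[s] for s in symbols if s in mapping}
    missing = [s for s in symbols if s not in mapping]
    return mapped, missing


# Hypothetical result: NCBI Gene IDs 7157 (TP53) and 672 (BRCA1); "NOTAGENE" unresolved.
mapping = {"TP53": 7157, "BRCA1": 672}
mapped, missing = split_mapped(["TP53", "BRCA1", "NOTAGENE"], mapping)
print(mapped, missing)
```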


FDA

fda_search

fda_search(
    category: str,
    endpoint: str,
    search: Optional[Union[str, Dict]] = None,
    limit: int = 100,
    **kwargs: Any,
) -> FDAFetchedData

Search an openFDA endpoint, identified by a category and endpoint pair such as drug/event.

Parameters:

Name Type Description Default
category str

FDA category ("drug", "device", "food", etc.).

required
endpoint str

Endpoint within category ("event", "label", "enforcement", etc.).

required
search Optional[Union[str, Dict]]

Search query string or dict of field:value pairs.

None
limit int

Maximum results per request.

100
**kwargs Any

Additional parameters (sort, count, skip).

{}

Returns:

Type Description
FDAFetchedData

FDAFetchedData containing search results.

Example

data = fda_search("drug", "event", search="aspirin", limit=10)
df = data.as_dataframe()
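openFDA's `search` parameter is built from `field:term` clauses combined with `AND`/`OR`; the openFDA docs show them joined as `+AND+` because a URL-encoded space is `+`. A minimal sketch of turning a field/value dict into such a string. It assumes AND semantics and quoting of multi-word values, which may differ from how fda_search converts dict searches; `build_openfda_search` is hypothetical:

```python
from typing import Dict, Union
from urllib.parse import urlencode


def build_openfda_search(search: Union[str, Dict[str, str]]) -> str:
    """AND together field:value pairs; pass plain strings through (hypothetical helper)."""
    if isinstance(search, str):
        return search
    clauses = []
    for field, value in search.items():
        value = str(value)
        # Quote multi-word values so openFDA matches them as a phrase.
        term = f'"{value}"' if " " in value else value
        clauses.append(f"{field}:{term}")
    return " AND ".join(clauses)


query = build_openfda_search({"patient.drug.openfda.brand_name": "aspirin", "serious": "1"})
# urlencode turns the spaces into '+', giving the '+AND+' form from the openFDA docs.
print(urlencode({"search": query, "limit": 10}))
```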

fda_drug_events

fda_drug_events(
    search: Optional[Union[str, Dict]] = None,
    limit: int = 100,
    **kwargs: Any,
) -> FDAFetchedData

Search FDA drug adverse event reports (FAERS).

Parameters:

Name Type Description Default
search Optional[Union[str, Dict]]

Search query (e.g., "patient.drug.openfda.brand_name:aspirin").

None
limit int

Maximum results to return.

100
**kwargs Any

Additional parameters (sort, count, skip).

{}

Returns:

Type Description
FDAFetchedData

FDAFetchedData containing adverse event reports with patient information, drug details, and outcomes.

Example

data = fda_drug_events(search="aspirin", limit=50)
df = data.as_dataframe()


Reactome

reactome_analyze

reactome_analyze(
    identifiers: List[str],
    species: str = "Homo sapiens",
    interactors: bool = False,
    page_size: int = 100,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
    include_disease: bool = True,
    min_entities: Optional[int] = None,
    max_entities: Optional[int] = None,
) -> ReactomeFetchedData

Perform Reactome pathway over-representation analysis.

Parameters:

Name Type Description Default
identifiers List[str]

List of identifiers (gene symbols, UniProt IDs, etc.).

required
species str

Species name (e.g., "Homo sapiens", "Mus musculus").

'Homo sapiens'
interactors bool

Include interactors in analysis.

False
page_size int

Number of results to return.

100
sort_by str

Sort field (ENTITIES_FDR, ENTITIES_PVALUE, NAME).

'ENTITIES_FDR'
order str

Sort order (ASC, DESC).

'ASC'
resource str

Resource filter (TOTAL, UNIPROT, ENSEMBL, etc.).

'TOTAL'
p_value float

P-value cutoff for filtering.

1.0
include_disease bool

Include disease pathways.

True
min_entities Optional[int]

Minimum pathway size.

None
max_entities Optional[int]

Maximum pathway size.

None

Returns:

Type Description
ReactomeFetchedData

ReactomeFetchedData with pathway enrichment results.

Example

>>> genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
>>> result = reactome_analyze(genes)
>>> print(f"Found {len(result.pathways)} pathways")
Found 172 pathways
>>> df = result.significant_pathways(fdr_threshold=0.05).as_dataframe()
>>> print(df[["stId", "name", "fdr", "found", "total"]].head(3).to_string())
            stId                                           name       fdr  found  total
0  R-HSA-6796648  TP53 Regulates Transcription of DNA Repai...  1.08e-06      7     86
1  R-HSA-3700989             Transcriptional Regulation by TP53  6.45e-04      9    487
2  R-HSA-6806003  Regulation of TP53 Expression and Degradation  6.45e-04      4     46
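Reactome's analysis reports an `fdr` value alongside each pathway's raw p-value, and Reactome's documentation describes it as a Benjamini-Hochberg correction. For intuition about what `significant_pathways(fdr_threshold=0.05)` filters on, here is a minimal sketch of the BH adjustment itself; it is a textbook implementation, not Reactome's code:

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Keeping the pathways whose adjusted value is below 0.05 bounds the expected share of false positives among the kept pathways at 5%, under the procedure's usual assumptions.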


Disease Ontology

do_get_term

do_get_term(
    doid: str, use_ols: bool = True
) -> DOFetchedData

Get a disease term by DOID.

This is a convenience function that wraps the DO_Fetcher.

Parameters:

Name Type Description Default
doid str

Disease Ontology ID (e.g., "DOID:162", "162").

required
use_ols bool

If True, query the EBI Ontology Lookup Service (OLS) API for more detailed term data.

True

Returns:

Type Description
DOFetchedData

DOFetchedData containing the disease term.

Example

>>> term = do_get_term("DOID:162")  # cancer
>>> print(term.terms[0].name)
cancer
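do_get_term accepts either the prefixed ("DOID:162") or the bare ("162") form. OLS identifies Disease Ontology terms by OBO PURL IRIs of the form `http://purl.obolibrary.org/obo/DOID_162`. A minimal sketch of normalizing the accepted input forms and deriving that IRI; both helpers are hypothetical, not biodbs internals:

```python
import re


def normalize_doid(doid: str) -> str:
    """Return "DOID:<digits>" for inputs like "DOID:162", "doid_162", or "162" (hypothetical)."""
    match = re.fullmatch(r"(?:DOID[:_])?(\d+)", doid.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a Disease Ontology ID: {doid!r}")
    return f"DOID:{match.group(1)}"


def doid_iri(doid: str) -> str:
    """OBO PURL IRI for a DO term, e.g. http://purl.obolibrary.org/obo/DOID_162 (hypothetical)."""
    return "http://purl.obolibrary.org/obo/" + normalize_doid(doid).replace(":", "_")


print(normalize_doid("162"), doid_iri("DOID:162"))
```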

do_get_children

do_get_children(doid: str) -> DOFetchedData

Get child terms of a disease.

Parameters:

Name Type Description Default
doid str

Disease Ontology ID.

required

Returns:

Type Description
DOFetchedData

DOFetchedData with child terms.

Example

children = do_get_children("DOID:162")  # cancer
print(f"Cancer has {len(children)} child terms")


EnrichR

enrichr_enrich

enrichr_enrich(
    genes: List[str],
    library: str,
    organism: str = "human",
    description: str = "biodbs gene list",
) -> EnrichRFetchedData

Perform gene set enrichment analysis.

Parameters:

Name Type Description Default
genes List[str]

List of gene symbols to analyze.

required
library str

Name of the gene set library (e.g., "KEGG_2021_Human").

required
organism str

Target organism (human, mouse, fly, yeast, worm, fish).

'human'
description str

Description for the gene list.

'biodbs gene list'

Returns:

Type Description
EnrichRFetchedData

EnrichRFetchedData containing enrichment results with term names, p-values, combined scores, and overlapping genes.

Example

genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = enrichr_enrich(genes, "KEGG_2021_Human")
top = result.top_terms(5)
print(top.get_term_names())
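Enrichr scores each term's overlap with Fisher's exact test, and the one-sided form of that test is the upper tail of the hypergeometric distribution. A minimal sketch of that p-value for intuition; it is not Enrichr's code (which also reports z-scores and combined scores), and the function and parameter names are illustrative:

```python
from math import comb


def overlap_p_value(overlap: int, list_size: int, term_size: int, background: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(background, term_size, list_size).

    Hypothetical helper. overlap: genes shared by the input list and the term;
    list_size: genes in the input list; term_size: genes annotated to the term;
    background: genes in the universe.
    """
    total = comb(background, list_size)
    upper = min(list_size, term_size)
    hits = sum(
        comb(term_size, k) * comb(background - term_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# 5 input genes, 3 of them in a 100-gene term, out of a 20,000-gene background.
print(overlap_p_value(3, 5, 100, 20000))
```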

enrichr_get_libraries

enrichr_get_libraries(
    organism: str = "human",
) -> EnrichRLibrariesData

Get available gene set libraries.

Parameters:

Name Type Description Default
organism str

Target organism (human, mouse, fly, yeast, worm, fish).

'human'

Returns:

Type Description
EnrichRLibrariesData

EnrichRLibrariesData containing library statistics, including library names, number of terms, gene coverage, and categories.

Example

libs = enrichr_get_libraries()
kegg = libs.search("KEGG")
print(kegg.get_library_names())


HGNC

hgnc_fetch

hgnc_fetch(field: str, term: str) -> HGNCFetchedData

Exact-match lookup by any HGNC stored field.

Returns full gene records. No wildcard expansion is performed; use hgnc_search for wildcard queries.

Parameters:

Name Type Description Default
field str

HGNC field name (e.g. "symbol", "hgnc_id", "ensembl_gene_id", "entrez_id", "uniprot_ids").

required
term str

Exact value to match.

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData containing HGNCEntry records.

Example

data = hgnc_fetch("symbol", "TP53")
entry = data[0]
print(entry.hgnc_id, entry.entrez_id, entry.ensembl_gene_id)

hgnc_search

hgnc_search(
    query_or_field: str, term: Optional[str] = None
) -> HGNCFetchedData

Wildcard / boolean search across HGNC records.

Returns lightweight summaries (hgnc_id, symbol, score). Use hgnc_fetch to retrieve full records.

Parameters:

Name Type Description Default
query_or_field str

Full Solr query string, OR a field name when term is also given.

required
term Optional[str]

Search term for the given field (supports * and ?).

None

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with is_search=True.

Example

# Symbols starting with "TP53"
hits = hgnc_search("symbol", "TP53*")
print(hits.symbols())

# Boolean query
hits = hgnc_search("status:Approved+AND+locus_group:non-coding+RNA")

hgnc_fetch_by_symbol

hgnc_fetch_by_symbol(symbol: str) -> HGNCFetchedData

Fetch a gene entry by its approved HGNC symbol.

Parameters:

Name Type Description Default
symbol str

Approved gene symbol (e.g. "TP53", "BRCA1").

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with the matching gene entry (usually one record; zero if the symbol is not found).

Example

data = hgnc_fetch_by_symbol("EGFR")
entry = data[0]
print(entry.name)  # "epidermal growth factor receptor"

hgnc_fetch_by_hgnc_id

hgnc_fetch_by_hgnc_id(hgnc_id: str) -> HGNCFetchedData

Fetch a gene entry by its HGNC ID.

Parameters:

Name Type Description Default
hgnc_id str

HGNC identifier in the form "HGNC:NNNN" (e.g. "HGNC:11998" for TP53).

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with the matching gene entry.

Example

data = hgnc_fetch_by_hgnc_id("HGNC:11998")
print(data[0].symbol)  # "TP53"

hgnc_fetch_by_entrez_id

hgnc_fetch_by_entrez_id(entrez_id: str) -> HGNCFetchedData

Fetch a gene entry by NCBI Entrez Gene ID.

Parameters:

Name Type Description Default
entrez_id str

NCBI Gene ID as a string (e.g. "7157" for TP53).

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with the matching gene entry.

Example

data = hgnc_fetch_by_entrez_id("7157")
print(data[0].symbol)  # "TP53"

hgnc_fetch_by_ensembl_id

hgnc_fetch_by_ensembl_id(
    ensembl_id: str,
) -> HGNCFetchedData

Fetch a gene entry by Ensembl stable gene ID.

Parameters:

Name Type Description Default
ensembl_id str

Ensembl gene ID (e.g. "ENSG00000141510").

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with the matching gene entry.

Example

data = hgnc_fetch_by_ensembl_id("ENSG00000141510")
print(data[0].symbol)  # "TP53"

hgnc_fetch_by_uniprot_id

hgnc_fetch_by_uniprot_id(
    uniprot_id: str,
) -> HGNCFetchedData

Fetch a gene entry by UniProt accession.

Parameters:

Name Type Description Default
uniprot_id str

UniProt accession (e.g. "P04637").

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with the matching gene entry.

Example

data = hgnc_fetch_by_uniprot_id("P04637")
print(data[0].symbol)  # "TP53"

hgnc_fetch_by_refseq

hgnc_fetch_by_refseq(
    refseq_accession: str,
) -> HGNCFetchedData

Fetch a gene entry by RefSeq accession.

Parameters:

Name Type Description Default
refseq_accession str

RefSeq accession (e.g. "NM_000546").

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with the matching gene entry.

Example

data = hgnc_fetch_by_refseq("NM_000546")
print(data[0].symbol)  # "TP53"

hgnc_search_symbol

hgnc_search_symbol(query: str) -> HGNCFetchedData

Search HGNC gene symbols using wildcard patterns.

Returns lightweight summaries; use hgnc_fetch_by_symbol for full records once you have exact symbols.

Parameters:

Name Type Description Default
query str

Symbol query supporting * (any chars) and ? (one char). Examples: "ZNF*", "BRCA?"

required

Returns:

Type Description
HGNCFetchedData

HGNCFetchedData with is_search=True.

Example

hits = hgnc_search_symbol("TP53*")
print(hits.symbols())
# ['TP53', 'TP53AIP1', 'TP53BP1', 'TP53BP2', ...]
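The `*` and `?` wildcards behave like shell-style globbing. To preview which symbols a pattern would match against a list you already hold, Python's `fnmatch.fnmatchcase` applies the same two wildcards. This is a local sketch, not HGNC's server-side matcher; it is case-sensitive, which HGNC's search may not be, and `match_symbols` is hypothetical:

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def match_symbols(pattern: str, symbols: Iterable[str]) -> List[str]:
    """Filter symbols locally: '*' = any run of characters, '?' = exactly one (hypothetical)."""
    return [s for s in symbols if fnmatchcase(s, pattern)]


symbols = ["BRCA1", "BRCA2", "BRCA1P1", "TP53", "TP53BP1", "TP63"]
print(match_symbols("BRCA?", symbols))
print(match_symbols("TP53*", symbols))
```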

hgnc_info

hgnc_info() -> dict

Return HGNC service metadata.

Includes the database last-modified timestamp, total document count, and the lists of searchable and stored fields.

Returns:

Type Description
dict

Raw JSON dict from the /info endpoint.

Example

info = hgnc_info()
print(info["response"]["numDoc"])

ClinVar

clinvar_search

clinvar_search(
    query: str, retmax: int = 500, retstart: int = 0
) -> List[str]

Find ClinVar variation UIDs matching an Entrez query.

Uses the same query language as the ClinVar website. Common field tags:

  • BRCA1[gene] — gene name
  • pathogenic[clnsig] — clinical significance
  • "Breast cancer"[dis] — disease
  • single_gene[prop] — single-gene variants

Parameters:

Name Type Description Default
query str

Entrez query string.

required
retmax int

Maximum UIDs to return (default 500).

500
retstart int

Offset for pagination.

0

Returns:

Type Description
List[str]

List of variation UID strings.

Example

uids = clinvar_search("BRCA1[gene] AND pathogenic[clnsig]")
data = clinvar_fetch_by_id(uids[:20])
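The field tags listed above combine with AND/OR/NOT in the usual Entrez way, and `retstart` pages through more UIDs than one `retmax` returns. A minimal sketch of assembling a query from optional criteria and computing the page offsets; both helpers are hypothetical, and clinvar_search_gene may build its query differently:

```python
from typing import List, Optional


def build_clinvar_query(
    gene: Optional[str] = None,
    significance: Optional[str] = None,
    disease: Optional[str] = None,
    single_gene: bool = False,
) -> str:
    """AND together ClinVar field tags for whichever criteria are given (hypothetical)."""
    clauses = []
    if gene:
        clauses.append(f"{gene}[gene]")
    if significance:
        clauses.append(f"{significance}[clnsig]")
    if disease:
        clauses.append(f'"{disease}"[dis]')
    if single_gene:
        clauses.append("single_gene[prop]")
    if not clauses:
        raise ValueError("at least one criterion is required")
    return " AND ".join(clauses)


def retstart_offsets(total: int, retmax: int = 500) -> List[int]:
    """Offsets that cover `total` records in pages of `retmax` (hypothetical)."""
    return list(range(0, total, retmax))


print(build_clinvar_query(gene="BRCA1", significance="pathogenic", single_gene=True))
print(retstart_offsets(1234))
```

Paired with clinvar_count, each offset becomes one `clinvar_search(query, retmax=500, retstart=offset)` call.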

clinvar_count

clinvar_count(query: str) -> int

Return the total number of ClinVar records matching query.

Parameters:

Name Type Description Default
query str

Entrez query string.

required

Returns:

Type Description
int

Integer count.

Example

n = clinvar_count("TP53[gene] AND pathogenic[clnsig]")
print(f"TP53 has {n} pathogenic variants in ClinVar")

clinvar_fetch_by_id

clinvar_fetch_by_id(
    ids: List[Union[str, int]],
) -> ClinVarFetchedData

Fetch ClinVar summaries for a list of variation UIDs.

Parameters:

Name Type Description Default
ids List[Union[str, int]]

ClinVar variation UIDs (integers or strings).

required

Returns:

Type Description
ClinVarFetchedData

ClinVarFetchedData (biodbs.data.ClinVar.data.ClinVarFetchedData) containing the requested variation summaries.

Example

data = clinvar_fetch_by_id([65533, 14206])
print(data.as_dataframe())

clinvar_search_gene

clinvar_search_gene(
    gene_symbol: str,
    retmax: int = 500,
    single_gene: bool = True,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData

Search and fetch ClinVar variants for a gene in one step.

Parameters:

Name Type Description Default
gene_symbol str

HGNC gene symbol (e.g. "BRCA1").

required
retmax int

Maximum variants to return.

500
single_gene bool

If True (default), restrict to single-gene variants.

True
clinical_significance Optional[str]

Optional filter (e.g. "pathogenic").

None

Returns:

Type Description
ClinVarFetchedData

ClinVarFetchedData containing the matching variant summaries.

Example

data = clinvar_search_gene("TP53", retmax=200,
                           clinical_significance="pathogenic")
df = data.as_dataframe()

clinvar_search_condition

clinvar_search_condition(
    condition: str,
    retmax: int = 500,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData

Search and fetch ClinVar variants for a disease/condition.

Parameters:

Name Type Description Default
condition str

Disease or condition name (e.g. "Lynch syndrome").

required
retmax int

Maximum variants to return.

500
clinical_significance Optional[str]

Optional significance filter.

None

Returns:

Type Description
ClinVarFetchedData

ClinVarFetchedData containing the matching variant summaries.

Example

data = clinvar_search_condition("Breast cancer",
                                clinical_significance="pathogenic")

clinvar_fetch_vcv

clinvar_fetch_vcv(accession: str) -> str

Retrieve the full VCV XML record for a variation.

Parameters:

Name Type Description Default
accession str

VCV accession (e.g. "VCV000014206" or "VCV000014206.3").

required

Returns:

Type Description
str

Raw XML string.

Example

xml = clinvar_fetch_vcv("VCV000014206")
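ClinVar accessions consist of a three-letter prefix (VCV, RCV, SCV) plus nine zero-padded digits, optionally followed by a version. For VCV records the digits are the variation ID, which is why VCV000014206 lines up with UID 14206 in the clinvar_fetch_by_id example. A minimal sketch of splitting an accession; `parse_accession` and `ClinVarAccession` are hypothetical:

```python
import re
from typing import NamedTuple, Optional

_ACCESSION_RE = re.compile(r"(VCV|RCV|SCV)(\d{9})(?:\.(\d+))?")


class ClinVarAccession(NamedTuple):
    prefix: str             # "VCV", "RCV" or "SCV"
    number: int             # numeric part; for VCV, the variation ID
    version: Optional[int]  # None when the accession is unversioned


def parse_accession(accession: str) -> ClinVarAccession:
    """Split e.g. "VCV000014206.3" into ("VCV", 14206, 3) (hypothetical helper)."""
    match = _ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a ClinVar accession: {accession!r}")
    prefix, digits, version = match.groups()
    return ClinVarAccession(prefix, int(digits), int(version) if version else None)


print(parse_accession("VCV000014206.3"))
print(parse_accession("RCV000000606"))
```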

clinvar_fetch_rcv

clinvar_fetch_rcv(accession: str) -> str

Retrieve the full RCV XML record for a variation-condition pair.

Parameters:

Name Type Description Default
accession str

RCV accession (e.g. "RCV000000606").

required

Returns:

Type Description
str

Raw XML string.

Example

xml = clinvar_fetch_rcv("RCV000000606")

clinvar_link_pubmed

clinvar_link_pubmed(
    variation_id: Union[str, int],
) -> List[str]

Return PubMed UIDs linked to a ClinVar variation.

Parameters:

Name Type Description Default
variation_id Union[str, int]

ClinVar variation UID.

required

Returns:

Type Description
List[str]

List of PubMed UID strings.

Example

pmids = clinvar_link_pubmed(65533)

Rate Limiting

Function/Class Description
RateLimiter Global rate limiter for API calls
get_rate_limiter Get the singleton rate limiter instance
request_with_retry Make HTTP request with retry logic
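The signatures of RateLimiter, get_rate_limiter, and request_with_retry are not shown in this reference. As a hedged illustration of the two ideas they name, here is a minimal, self-contained sketch with hypothetical names, not the biodbs implementation: spacing calls to a maximum rate, and retrying transient failures with exponential backoff. The clock and sleep functions are injectable so the behavior can be checked without waiting:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class MinIntervalLimiter:
    """Space calls at least 1/rate seconds apart (hypothetical, not biodbs's RateLimiter)."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


def with_retry(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying up to `retries` times with delays base_delay * 2**attempt."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("retries must be >= 0")
```

In use, `limiter.wait()` would run before each request, with the request itself wrapped as `with_retry(lambda: ...)`; `retry_on` should list the error types your HTTP client actually raises for transient failures.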