Data Fetching Overview

The biodbs.fetch module provides unified access to biological and chemical databases.

Available Databases

Database          Description                            Key Functions
UniProt           Protein sequences and annotations      uniprot_get_entry, uniprot_search
PubChem           Chemical compounds and properties      pubchem_get_compound, pubchem_search_by_name
Ensembl           Genomic data and sequences             ensembl_lookup, ensembl_get_sequence
BioMart           Gene annotations and queries           biomart_get_genes, biomart_convert_ids
KEGG              Pathways and biological systems        kegg_get, kegg_find, kegg_link
ChEMBL            Bioactive molecules and targets        chembl_get_molecule, chembl_get_target
QuickGO           Gene Ontology annotations              quickgo_search_terms, quickgo_get_terms
HPA               Protein expression data                hpa_get_gene, hpa_get_tissue_expression
NCBI              Gene and taxonomy data                 ncbi_get_gene, ncbi_symbol_to_id
FDA               Drug and device data                   fda_drug_events, fda_drug_labels
Reactome          Pathway data and analysis              reactome_analyze, reactome_get_pathway_genes
EnrichR           Gene set enrichment                    enrichr_analyze, enrichr_get_libraries
Disease Ontology  Disease terms                          do_get_term, do_search
HGNC              Authoritative human gene nomenclature  hgnc_fetch_by_symbol, hgnc_search_symbol
ClinVar           Clinical variant classifications       clinvar_search_gene, clinvar_fetch_by_id

Quick Start

Import Convention

# Import specific functions
from biodbs.fetch import uniprot_get_entry, pubchem_get_compound

# Or import fetcher classes
from biodbs.fetch.uniprot import UniProt_Fetcher
from biodbs.fetch.pubchem import PubChem_Fetcher

Basic Usage Pattern

All fetch functions follow the same pattern: call the function, then read or convert the data container it returns:

from biodbs.fetch import uniprot_get_entry

# Fetch data
data = uniprot_get_entry("P04637")

# Access results
for entry in data.entries:
    print(entry.primaryAccession)
    print(entry.protein_name)

# Convert to DataFrame
df = data.as_dataframe()

# Get as dictionary
records = data.as_dict()

Batch Operations

Fetch multiple items in a single call:

from biodbs.fetch import uniprot_get_entries

# Batch retrieval
entries = uniprot_get_entries(["P04637", "P00533", "P38398"])

# Results combined in single response
print(f"Found {len(entries)} entries")
df = entries.as_dataframe()
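
For very long identifier lists, it may help to split the input into smaller batches before passing each one to uniprot_get_entries. The sketch below is plain Python and not part of biodbs: chunked is a hypothetical helper, and the default batch size of 100 is an arbitrary assumption rather than a documented limit.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` items each.

    Hypothetical helper for illustration; not provided by biodbs.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Each batch could then be passed to a batch function such as uniprot_get_entries.
for batch in chunked(["P04637", "P00533", "P38398"], size=2):
    print(batch)
```

The final batch may be shorter than the others, so downstream code should not assume a fixed batch length.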

Search Operations

Most databases support search:

from biodbs.fetch import uniprot_search, pubchem_search_by_name

# UniProt search with query syntax
results = uniprot_search(
    "gene:BRCA1 AND organism_id:9606 AND reviewed:true",
    size=100
)

# PubChem search by name
compounds = pubchem_search_by_name("aspirin")
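
Query strings like the one above are easy to mistype when assembled by hand. As a rough sketch, a small helper can join field:value pairs with AND. build_query is a hypothetical name, not a biodbs function, and it does no quoting or escaping, so values containing spaces or special characters would need extra handling.

```python
def build_query(**fields: object) -> str:
    """Join field:value pairs with AND, lowercasing booleans (True -> "true").

    Hypothetical helper for illustration; not provided by biodbs.
    """
    parts = []
    for name, value in fields.items():
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{name}:{value}")
    return " AND ".join(parts)


query = build_query(gene="BRCA1", organism_id=9606, reviewed=True)
print(query)  # gene:BRCA1 AND organism_id:9606 AND reviewed:true
```

The resulting string can be passed as the first argument to uniprot_search.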

Output Formats

All data containers support multiple output formats:

import json

df = data.as_dataframe(engine="pandas")  # pandas DataFrame
df = data.as_dataframe(engine="polars")  # Polars DataFrame
records = data.as_dict()                 # dictionary
json_str = json.dumps(data.as_dict())    # JSON string
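
If the records contain values that json.dumps cannot serialize directly, such as dates, passing default=str is one common workaround. The sketch below assumes as_dict() returns plain Python lists and dictionaries, which this page does not confirm; records_to_json is a hypothetical helper, and the sample record is made up for illustration.

```python
import json
from datetime import date


def records_to_json(records, indent=2):
    """Serialize records to JSON, falling back to str() for unsupported types.

    Hypothetical helper for illustration; not provided by biodbs.
    """
    return json.dumps(records, indent=indent, default=str, sort_keys=True)


# Made-up sample record; real records come from data.as_dict().
sample = [{"accession": "P04637", "retrieved": date(2024, 1, 1)}]
print(records_to_json(sample))
```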

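Error Handling

The example below catches ConnectionError, raised when the API is unavailable or rate limited, and ValueError, raised for invalid input:
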
from biodbs.fetch import uniprot_get_entry

try:
    entry = uniprot_get_entry("P04637")
except ConnectionError as e:
    # API unavailable or rate limited
    print(f"Connection error: {e}")
except ValueError as e:
    # Invalid input
    print(f"Invalid input: {e}")
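
For transient failures, an application may want to retry the whole call a few times before giving up. The following is a minimal sketch under stated assumptions: fetch_with_retry is a hypothetical helper, not part of biodbs, and the attempt count and delay are arbitrary. It retries only on ConnectionError, since a ValueError signals bad input that a retry would not fix.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds or `attempts` calls have failed.

    Hypothetical helper for illustration; not provided by biodbs.
    Only ConnectionError is retried; other exceptions propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be a positive integer")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
    raise AssertionError("unreachable")
```

A call such as fetch_with_retry(lambda: uniprot_get_entry("P04637")) would wrap the fetch shown above. The sleep parameter exists so tests can replace the real delay.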

Rate Limiting

All fetchers automatically handle rate limiting:

  • Requests are throttled to respect each API's limits.
  • Requests that fail with HTTP 429 are retried automatically with exponential backoff.
  • Behavior is configurable via the RateLimiter class.

from biodbs.fetch._rate_limit import get_rate_limiter

limiter = get_rate_limiter()
# Rate limits are set per-host automatically
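
To make the term concrete, exponential backoff means the wait time grows by a constant factor after each failed attempt, usually up to a cap. The sketch below is a generic illustration, not the biodbs implementation: backoff_delays is a hypothetical function, and its base, factor, and cap values are assumptions chosen for the example.

```python
from typing import List


def backoff_delays(
    retries: int,
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 60.0,
) -> List[float]:
    """Return the wait time in seconds before each retry.

    Generic illustration of exponential backoff; the defaults are
    assumptions, not values taken from biodbs.
    """
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

Production implementations often add random jitter to each delay so that many clients do not retry at the same moment.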

Using Fetcher Classes

For more control, use fetcher classes directly:

from biodbs.fetch.uniprot import UniProt_Fetcher

# Create fetcher
fetcher = UniProt_Fetcher()

# Access all methods
entry = fetcher.get_entry("P04637")
results = fetcher.search("gene:TP53")
mapping = fetcher.gene_to_uniprot(["TP53", "BRCA1"])

# Batch operations with concurrency
sequences = fetcher.get_sequences(["P04637", "P00533"])

Next Steps