Data Fetching Overview¶

The biodbs.fetch module provides unified access to biological and chemical databases.

Related sections:

API Reference - Complete function and class documentation
ID Translation - Convert between identifier systems
Analysis - Statistical analysis of fetched data
Knowledge Graph - Build graphs from fetched data

Available Databases¶

Database	Description	Key Functions
UniProt	Protein sequences and annotations	`uniprot_get_entry`, `uniprot_search`
PubChem	Chemical compounds and properties	`pubchem_get_compound`, `pubchem_search_by_name`
Ensembl	Genomic data and sequences	`ensembl_lookup`, `ensembl_get_sequence`
BioMart	Gene annotations and queries	`biomart_get_genes`, `biomart_convert_ids`
KEGG	Pathways and biological systems	`kegg_get`, `kegg_find`, `kegg_link`
ChEMBL	Bioactive molecules and targets	`chembl_get_molecule`, `chembl_get_target`
QuickGO	Gene Ontology annotations	`quickgo_search_terms`, `quickgo_get_terms`
HPA	Protein expression data	`hpa_get_gene`, `hpa_get_tissue_expression`
NCBI	Gene and taxonomy data	`ncbi_get_gene`, `ncbi_symbol_to_id`
FDA	Drug and device data	`fda_drug_events`, `fda_drug_labels`
Reactome	Pathway data and analysis	`reactome_analyze`, `reactome_get_pathway_genes`
EnrichR	Gene set enrichment	`enrichr_analyze`, `enrichr_get_libraries`
Disease Ontology	Disease terms	`do_get_term`, `do_search`
HGNC	Authoritative human gene nomenclature	`hgnc_fetch_by_symbol`, `hgnc_search_symbol`
ClinVar	Clinical variant classifications	`clinvar_search_gene`, `clinvar_fetch_by_id`

Quick Start¶

Import Convention¶

# Import specific functions
from biodbs.fetch import uniprot_get_entry, pubchem_get_compound

# Or import fetcher classes
from biodbs.fetch.uniprot import UniProt_Fetcher
from biodbs.fetch.pubchem import PubChem_Fetcher

Basic Usage Pattern¶

All fetch functions follow a similar pattern:

from biodbs.fetch import uniprot_get_entry

# Fetch data
data = uniprot_get_entry("P04637")

# Access results
for entry in data.entries:
    print(entry.primaryAccession)
    print(entry.protein_name)

# Convert to DataFrame
df = data.as_dataframe()

# Get as dictionary
records = data.as_dict()

Batch Operations¶

Fetch multiple items in a single call:

from biodbs.fetch import uniprot_get_entries

# Batch retrieval
entries = uniprot_get_entries(["P04637", "P00533", "P38398"])

# Results are combined into a single response object
print(f"Found {len(entries)} entries")
df = entries.as_dataframe()
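
For very long ID lists it can help to split the input into smaller batches before calling a batch function. The helper below is a minimal sketch under that assumption: `chunked` is a hypothetical name, not part of biodbs, and this page does not say whether the library already splits large requests internally.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


accessions = ["P04637", "P00533", "P38398", "P01308", "P69905"]
batches = list(chunked(accessions, size=2))

# Each batch could then be passed to a batch function such as
# uniprot_get_entries(batch); that call is omitted so the sketch runs offline.
for batch in batches:
    print(batch)
```

A small batch size keeps individual requests short, at the cost of more round trips.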

Search Operations¶

Most databases support search:

from biodbs.fetch import uniprot_search, pubchem_search_by_name

# UniProt search with query syntax
results = uniprot_search(
    "gene:BRCA1 AND organism_id:9606 AND reviewed:true",
    size=100
)

# PubChem search by name
compounds = pubchem_search_by_name("aspirin")
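
The UniProt query above is a plain string of `field:value` terms joined with `AND`. When the terms come from variables, a small builder keeps the string consistent. This is only a sketch: `build_query` is a hypothetical helper, not a biodbs function, and it covers just the simple `AND` form shown above.

```python
def build_query(**fields: object) -> str:
    """Join field:value pairs with AND; booleans are lowercased (True -> true)."""
    parts = []
    for name, value in fields.items():
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{name}:{value}")
    return " AND ".join(parts)


query = build_query(gene="BRCA1", organism_id=9606, reviewed=True)
print(query)
```

The resulting string can be passed as the first argument to `uniprot_search`.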

Output Formats¶

All data containers support multiple output formats:

# pandas
df = data.as_dataframe(engine="pandas")

# Polars
df = data.as_dataframe(engine="polars")

# Dictionary
records = data.as_dict()

# JSON
import json
json_str = json.dumps(data.as_dict())

Error Handling¶

Wrap fetch calls in try/except to handle connection problems and invalid input:

from biodbs.fetch import uniprot_get_entry

try:
    entry = uniprot_get_entry("P04637")
except ConnectionError as e:
    # API unavailable or rate limited
    print(f"Connection error: {e}")
except ValueError as e:
    # Invalid input
    print(f"Invalid input: {e}")
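
When looping over many identifiers, one failure does not have to abort the whole run. The sketch below applies the same two exception types to a loop and collects failures separately. `fetch_each` and `fake_fetch` are hypothetical names used for illustration; `fake_fetch` stands in for a real fetch function so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, Tuple


def fetch_each(
    fetch: Callable[[str], object], ids: Iterable[str]
) -> Tuple[Dict[str, object], Dict[str, Exception]]:
    """Call `fetch` once per ID, returning (results, errors) keyed by ID."""
    results: Dict[str, object] = {}
    errors: Dict[str, Exception] = {}
    for identifier in ids:
        try:
            results[identifier] = fetch(identifier)
        except (ConnectionError, ValueError) as exc:
            # Record the failure and continue with the remaining IDs
            errors[identifier] = exc
    return results, errors


def fake_fetch(identifier: str) -> dict:
    """Stand-in for a real fetch function (assumption for this sketch)."""
    if not identifier:
        raise ValueError("empty identifier")
    return {"id": identifier}


results, errors = fetch_each(fake_fetch, ["P04637", "", "P00533"])
print(f"{len(results)} fetched, {len(errors)} failed")
```

In real use, a function such as `uniprot_get_entry` would be passed in place of `fake_fetch`.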

Rate Limiting¶

All fetchers automatically handle rate limiting:

Requests are throttled to respect API limits
Requests that fail with HTTP 429 (Too Many Requests) are retried automatically with exponential backoff
Configurable via the RateLimiter class

from biodbs.fetch._rate_limit import get_rate_limiter

limiter = get_rate_limiter()
# Rate limits are set per-host automatically
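
To illustrate what retrying with exponential backoff means, here is a generic sketch. It is not the biodbs implementation: `TooManyRequests`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the base delay and factor are arbitrary choices. The sleep function is injectable so the example runs instantly.

```python
import time
from typing import Callable, List


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0) -> List[float]:
    """Delays grow geometrically: base, base*factor, base*factor**2, ..."""
    return [base * factor ** attempt for attempt in range(retries)]


def call_with_backoff(
    func: Callable[[], object],
    retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> object:
    """Call `func`, waiting progressively longer after each TooManyRequests."""
    for delay in backoff_delays(retries, base):
        try:
            return func()
        except TooManyRequests:
            sleep(delay)
    return func()  # final attempt; any exception propagates to the caller


# Demo: a function that is throttled twice and then succeeds.
attempts = {"count": 0}
waits: List[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TooManyRequests()
    return "ok"


# waits.append records the delays instead of actually sleeping
result = call_with_backoff(flaky, retries=3, base=0.5, sleep=waits.append)
print(result, waits)
```

Doubling the delay after each failure gives a throttled server progressively more time to recover before the next attempt.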

Using Fetcher Classes¶

For more control, use fetcher classes directly:

from biodbs.fetch.uniprot import UniProt_Fetcher

# Create fetcher
fetcher = UniProt_Fetcher()

# Call methods on the fetcher instance
entry = fetcher.get_entry("P04637")
results = fetcher.search("gene:TP53")
mapping = fetcher.gene_to_uniprot(["TP53", "BRCA1"])

# Batch operations with concurrency
sequences = fetcher.get_sequences(["P04637", "P00533"])

Next Steps¶

UniProt Guide - Detailed UniProt fetching examples
PubChem Guide - Chemical compound data
ID Translation - Map between identifier systems
ORA Analysis - Pathway enrichment analysis