Data Fetching Overview¶
The biodbs.fetch module provides unified access to biological and chemical databases.
Related sections:
- API Reference - Complete function and class documentation
- ID Translation - Convert between identifier systems
- Analysis - Statistical analysis of fetched data
- Knowledge Graph - Build graphs from fetched data
Available Databases¶
| Database | Description | Key Functions |
|---|---|---|
| UniProt | Protein sequences and annotations | uniprot_get_entry, uniprot_search |
| PubChem | Chemical compounds and properties | pubchem_get_compound, pubchem_search_by_name |
| Ensembl | Genomic data and sequences | ensembl_lookup, ensembl_get_sequence |
| BioMart | Gene annotations and queries | biomart_get_genes, biomart_convert_ids |
| KEGG | Pathways and biological systems | kegg_get, kegg_find, kegg_link |
| ChEMBL | Bioactive molecules and targets | chembl_get_molecule, chembl_get_target |
| QuickGO | Gene Ontology annotations | quickgo_search_terms, quickgo_get_terms |
| HPA | Protein expression data | hpa_get_gene, hpa_get_tissue_expression |
| NCBI | Gene and taxonomy data | ncbi_get_gene, ncbi_symbol_to_id |
| FDA | Drug and device data | fda_drug_events, fda_drug_labels |
| Reactome | Pathway data and analysis | reactome_analyze, reactome_get_pathway_genes |
| EnrichR | Gene set enrichment | enrichr_analyze, enrichr_get_libraries |
| Disease Ontology | Disease terms | do_get_term, do_search |
| HGNC | Authoritative human gene nomenclature | hgnc_fetch_by_symbol, hgnc_search_symbol |
| ClinVar | Clinical variant classifications | clinvar_search_gene, clinvar_fetch_by_id |
Quick Start¶
Import Convention¶
# Import specific functions
from biodbs.fetch import uniprot_get_entry, pubchem_get_compound
# Or import fetcher classes
from biodbs.fetch.uniprot import UniProt_Fetcher
from biodbs.fetch.pubchem import PubChem_Fetcher
Basic Usage Pattern¶
All fetch functions follow a similar pattern:
from biodbs.fetch import uniprot_get_entry
# Fetch data
data = uniprot_get_entry("P04637")
# Access results
for entry in data.entries:
    print(entry.primaryAccession)
    print(entry.protein_name)
# Convert to DataFrame
df = data.as_dataframe()
# Get as dictionary
records = data.as_dict()
Batch Operations¶
Fetch multiple items efficiently:
from biodbs.fetch import uniprot_get_entries
# Batch retrieval
entries = uniprot_get_entries(["P04637", "P00533", "P38398"])
# Results combined in single response
print(f"Found {len(entries)} entries")
df = entries.as_dataframe()
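For very long identifier lists, it can help to split the input into smaller batches before calling a batch function. The following is a minimal sketch of such a helper. It is not part of biodbs: the name `chunked` and the batch size of 2 are illustrative assumptions, and this page does not say whether the batch functions already split requests internally.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical usage: split three accessions into batches of two.
accessions = ["P04637", "P00533", "P38398"]
batches = list(chunked(accessions, size=2))
print(batches)
```

Each batch could then be passed to a batch function such as uniprot_get_entries in turn.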
Search Operations¶
Most databases support search:
from biodbs.fetch import uniprot_search, pubchem_search_by_name
# UniProt search with query syntax
results = uniprot_search(
    "gene:BRCA1 AND organism_id:9606 AND reviewed:true",
    size=100
)
# PubChem search by name
compounds = pubchem_search_by_name("aspirin")
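Query strings like the UniProt one above can be assembled from field/value pairs rather than written by hand. The sketch below shows one way to do that. `build_query` is a hypothetical helper, not a biodbs function, and it assumes simple values that need no quoting or escaping.

```python
from typing import Mapping


def build_query(fields: Mapping[str, object]) -> str:
    """Join field:value pairs with AND, in insertion order."""
    parts = []
    for field, value in fields.items():
        # Render booleans in lowercase, as in "reviewed:true".
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{field}:{value}")
    return " AND ".join(parts)


query = build_query({"gene": "BRCA1", "organism_id": 9606, "reviewed": True})
print(query)  # gene:BRCA1 AND organism_id:9606 AND reviewed:true
```

The resulting string could be passed as the first argument to uniprot_search.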
Output Formats¶
All data containers support multiple output formats, including the as_dataframe() and as_dict() conversions shown under Basic Usage Pattern.
Error Handling¶
from biodbs.fetch import uniprot_get_entry
try:
    entry = uniprot_get_entry("P04637")
except ConnectionError as e:
    # API unavailable or rate limited
    print(f"Connection error: {e}")
except ValueError as e:
    # Invalid input
    print(f"Invalid input: {e}")
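When many lookups share the same handling, the try/except pattern above can be wrapped once and reused. The sketch below is one possible shape for such a wrapper. `fetch_or_none` and `fake_fetch` are hypothetical names introduced for illustration; the stand-in fetch function only simulates the two exception types named above so the example runs without network access.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_or_none(fetch: Callable[[str], T], identifier: str) -> Optional[T]:
    """Call `fetch(identifier)`; return None if it raises a handled error."""
    try:
        return fetch(identifier)
    except ConnectionError as exc:
        # API unavailable or rate limited
        print(f"Connection error for {identifier}: {exc}")
    except ValueError as exc:
        # Invalid input
        print(f"Invalid input {identifier!r}: {exc}")
    return None


# Stand-in for a real fetch function, used only to exercise the helper.
def fake_fetch(identifier: str) -> dict:
    if not identifier:
        raise ValueError("empty identifier")
    if identifier == "OFFLINE":
        raise ConnectionError("service unavailable")
    return {"accession": identifier}


ok = fetch_or_none(fake_fetch, "P04637")
bad = fetch_or_none(fake_fetch, "")
down = fetch_or_none(fake_fetch, "OFFLINE")
```

In real use, a function such as uniprot_get_entry would take the place of `fake_fetch`. Returning None keeps a loop over many identifiers running, at the cost of hiding which error occurred from the caller.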
Rate Limiting¶
All fetchers automatically handle rate limiting:
- Requests are throttled to respect API limits
- Automatic retry with exponential backoff on 429 errors
- Configurable via the RateLimiter class
from biodbs.fetch._rate_limit import get_rate_limiter
limiter = get_rate_limiter()
# Rate limits are set per-host automatically
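To illustrate what retrying with exponential backoff on 429 responses means in general, here is a minimal, self-contained sketch. It is not the biodbs implementation: `call_with_backoff`, its parameters, and the 0.5 second base delay are assumptions made for this example, and the request is a stand-in that returns a (status, body) tuple.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    request: Callable[[], Tuple[int, str]],
    max_retries: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry `request` while it returns HTTP 429, doubling the wait each time."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        status, body = request()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    # Retries exhausted: return the last 429 response to the caller.
    return status, body


# Stand-in request: rate limited twice, then successful.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays = []  # records requested waits instead of actually sleeping
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)  # (200, 'ok') [0.5, 1.0]
```

Passing `sleep` as a parameter lets the example record the delays without waiting. Production implementations often also add random jitter and honor a Retry-After header, neither of which is shown here.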
Using Fetcher Classes¶
For more control, use fetcher classes directly:
from biodbs.fetch.uniprot import UniProt_Fetcher
# Create fetcher
fetcher = UniProt_Fetcher()
# Access all methods
entry = fetcher.get_entry("P04637")
results = fetcher.search("gene:TP53")
mapping = fetcher.gene_to_uniprot(["TP53", "BRCA1"])
# Batch operations with concurrency
sequences = fetcher.get_sequences(["P04637", "P00533"])
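As a rough picture of what a concurrent batch operation can look like, the sketch below runs a per-ID function in a thread pool using the standard library. How UniProt_Fetcher actually implements concurrency is not described here, so `fetch_concurrently` and `fake_sequence` are hypothetical names, and the per-ID call is a stand-in rather than a real network request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_concurrently(
    fetch_one: Callable[[str], str],
    ids: List[str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run `fetch_one` for each ID in a thread pool and map ID to result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each ID with its result.
        return dict(zip(ids, pool.map(fetch_one, ids)))


# Stand-in for a per-ID network call.
def fake_sequence(accession: str) -> str:
    return f"SEQ-{accession}"


sequences = fetch_concurrently(fake_sequence, ["P04637", "P00533"])
print(sequences)
```

Threads suit this kind of I/O-bound work because each worker spends most of its time waiting on the network. Any client-side concurrency still has to stay within the API rate limits described under Rate Limiting.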
Next Steps¶
- UniProt Guide - Detailed UniProt fetching examples
- PubChem Guide - Chemical compound data
- ID Translation - Map between identifier systems
- ORA Analysis - Pathway enrichment analysis