biodbs

biodbs (Biological Database Services) is a Python library providing unified access to major biological and chemical databases with built-in support for ID translation and enrichment analysis.

Features

  • Unified API - Consistent interface across all supported databases
  • Four Namespaces - Clear separation of concerns:
    • biodbs.fetch - Data retrieval from external databases
    • biodbs.translate - ID mapping between databases
    • biodbs.analysis - Statistical analysis (ORA, enrichment)
    • biodbs.graph - Knowledge graph building and export
  • Multiple Output Formats - pandas/Polars DataFrames, CSV, JSON, SQLite
  • Enrichment Analysis - Over-representation analysis with KEGG, GO, and EnrichR
  • Batch Processing - Efficient handling of large queries with rate limiting
  • Type Safety - Pydantic models for request/response validation
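
Batch processing with rate limiting generally means splitting a long ID list into chunks and spacing out the requests. The sketch below shows that general pattern using only the standard library. The names `chunked`, `fetch_in_batches`, `batch_size`, and `min_interval` are hypothetical and chosen for illustration; they are not part of the biodbs API, and biodbs may handle batching differently internally.

```python
import time
from typing import Callable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(
    ids: Sequence[T],
    fetch_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 100,
    min_interval: float = 0.5,
) -> List[R]:
    """Call `fetch_batch` once per chunk, keeping calls `min_interval` seconds apart."""
    results: List[R] = []
    last_call: Optional[float] = None
    for batch in chunked(ids, batch_size):
        if last_call is not None:
            # Sleep only for the part of the interval that has not yet elapsed.
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.extend(fetch_batch(batch))
    return results


def fake_fetch(batch: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a real database call."""
    return [identifier.lower() for identifier in batch]


records = fetch_in_batches(["TP53", "BRCA1", "EGFR"], fake_fetch, batch_size=2, min_interval=0.1)
```

Passing the fetch function in as a parameter keeps the throttling logic independent of any particular database client.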

Supported Databases

Database          Description                                           Module
UniProt           Protein sequences, annotations, and ID mapping        biodbs.fetch.uniprot
PubChem           Chemical compounds, properties, and bioassays         biodbs.fetch.pubchem
Ensembl REST      Sequences, variants, homology, VEP, genomic features  biodbs.fetch.ensembl
BioMart           Gene annotations via Ensembl BioMart                  biodbs.fetch.biomart
KEGG              Pathways, genes, compounds, biological systems        biodbs.fetch.KEGG
ChEMBL            Bioactive molecules, drug targets, pharmacology       biodbs.fetch.ChEMBL
QuickGO           Gene Ontology annotations and relationships           biodbs.fetch.QuickGO
HPA               Human Protein Atlas - protein expression              biodbs.fetch.HPA
NCBI              Gene information, taxonomy, and genome assemblies     biodbs.fetch.NCBI
FDA               Drug events, labels, recalls, device data             biodbs.fetch.FDA
Reactome          Pathway analysis and biological reactions             biodbs.fetch.Reactome
Disease Ontology  Disease terms and cross-references                    biodbs.fetch.DiseaseOntology
HGNC              Authoritative human gene nomenclature                 biodbs.fetch.HGNC
ClinVar           Clinical variant classifications                      biodbs.fetch.ClinVar

Quick Example

from biodbs.fetch import uniprot_get_entry
from biodbs.translate import translate_gene_to_uniprot
from biodbs.analysis import ora_kegg

# Fetch protein data
protein = uniprot_get_entry("P04637")  # TP53
print(protein.entries[0].protein_name)

# Translate gene names to UniProt
mapping = translate_gene_to_uniprot(["TP53", "BRCA1", "EGFR"])
# {'TP53': 'P04637', 'BRCA1': 'P38398', 'EGFR': 'P00533'}

# Perform pathway enrichment
result = ora_kegg(
    gene_list=["TP53", "BRCA1", "BRCA2", "ATM"],
    organism="hsa",
    id_type="symbol"
)
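
The `ora_kegg` call above performs over-representation analysis (ORA). How biodbs computes its statistics is not shown here. As a general illustration, ORA is conventionally a one-sided hypergeometric test on the overlap between the query list and a pathway's gene set. The sketch below assumes that convention; `ora_p_value` and its parameters are hypothetical names, not biodbs functions, and the numbers in the usage lines are made up for illustration.

```python
from math import comb


def ora_p_value(overlap: int, query_size: int, pathway_size: int, background_size: int) -> float:
    """One-sided hypergeometric p-value: P(X >= overlap).

    X is the number of pathway genes found when `query_size` genes are drawn
    without replacement from a background of `background_size` genes, of which
    `pathway_size` belong to the pathway.
    """
    if not 0 <= pathway_size <= background_size:
        raise ValueError("pathway_size must lie between 0 and background_size")
    if not 0 <= query_size <= background_size:
        raise ValueError("query_size must lie between 0 and background_size")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")

    max_overlap = min(query_size, pathway_size)
    total = comb(background_size, query_size)
    # Sum the probability mass of every outcome at least as extreme as `overlap`.
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Illustrative numbers only: 3 of 4 query genes fall in a 70-gene pathway,
# against a background of 20,000 genes.
p = ora_p_value(overlap=3, query_size=4, pathway_size=70, background_size=20000)
print(f"p = {p:.3g}")
```

In practice this p-value is computed for every pathway and then corrected for multiple testing, since many gene sets are tested at once.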

Installation

pip install biodbs

Or with optional dependencies for the graph module:

pip install "biodbs[graph]"  # For NetworkX and RDF export support (quotes keep shells such as zsh from expanding the brackets)

Getting Help

License

MIT License