BioMart¶
Query gene annotations via Ensembl BioMart.
Overview¶
BioMart is a query-oriented data management system for large biological datasets. It excels at:
- Batch Queries - Retrieve data for many genes at once
- ID Conversion - Map between identifier systems
- Annotations - GO terms, descriptions, homologs
- Filtering - Complex queries with multiple filters
Quick Start¶
from biodbs.fetch import (
    biomart_get_genes,
    biomart_get_genes_by_name,
    biomart_convert_ids,
    biomart_get_go_annotations,
)
# Get genes by Ensembl ID
genes = biomart_get_genes(["ENSG00000141510", "ENSG00000012048"])
df = genes.as_dataframe()
Gene Retrieval¶
By Ensembl ID¶
from biodbs.fetch import biomart_get_genes
genes = biomart_get_genes(
    ["ENSG00000141510", "ENSG00000012048"],
    dataset="hsapiens_gene_ensembl"
)
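For very long ID lists it can help to split the input into smaller batches before querying. The sketch below is a minimal illustration under stated assumptions: `chunked` is a hypothetical helper (not part of biodbs), the batch size of 200 is arbitrary, and whether biodbs already batches requests internally is not stated here.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int = 200) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical input; a real list would usually be much longer.
gene_ids = ["ENSG00000141510", "ENSG00000012048", "ENSG00000146648"]
batches = list(chunked(gene_ids, size=2))
# Each batch could then be passed to biomart_get_genes(batch) in turn.
```

Each batch preserves the input order, so results can be concatenated in sequence afterwards.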
By Gene Name¶
from biodbs.fetch import biomart_get_genes_by_name
genes = biomart_get_genes_by_name(["TP53", "BRCA1"])
By Region¶
from biodbs.fetch import biomart_get_genes_by_region
genes = biomart_get_genes_by_region(
    chromosome="17",
    start=7661779,
    end=7687550
)
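Region coordinates are easy to get wrong, so a small validation step before querying may be worthwhile. The sketch below checks the inputs and formats them as `chromosome:start:end`, the form accepted by Ensembl BioMart's `chromosomal_region` filter. `format_region` is a hypothetical helper, and it is an assumption that biodbs maps its arguments to that filter.

```python
def format_region(chromosome: str, start: int, end: int) -> str:
    """Return 'chromosome:start:end' after basic sanity checks."""
    if start < 1 or end < 1:
        raise ValueError("coordinates are 1-based and must be positive")
    if start > end:
        raise ValueError("start must not exceed end")
    # Accept 'chr17' as well as '17'; Ensembl uses the bare chromosome name.
    name = chromosome[3:] if chromosome.lower().startswith("chr") else chromosome
    return f"{name}:{start}:{end}"


region = format_region("chr17", 7661779, 7687550)
# Ensembl coordinates are inclusive, so the span covers end - start + 1 bases.
span = 7687550 - 7661779 + 1
```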
ID Conversion¶
from biodbs.fetch import biomart_convert_ids
# Gene symbols to Ensembl IDs
converted = biomart_convert_ids(
    ["TP53", "BRCA1", "EGFR"],
    from_type="external_gene_name",
    to_type="ensembl_gene_id"
)
# Ensembl to Entrez
converted = biomart_convert_ids(
    ["ENSG00000141510"],
    from_type="ensembl_gene_id",
    to_type="entrezgene_id"
)
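ID conversion is not always one-to-one: a source ID can map to several targets, to none, or appear in duplicate rows. The sketch below groups (source, target) pairs into a dictionary so those cases are easy to spot. It is a minimal illustration: `group_mappings` is a hypothetical helper, the rows are hand-written rather than real query output, and the shape of the object returned by `biomart_convert_ids` is not assumed.

```python
from typing import Dict, Iterable, List, Tuple


def group_mappings(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect target IDs per source ID, dropping duplicates and empty targets."""
    grouped: Dict[str, List[str]] = {}
    for source_id, target_id in pairs:
        targets = grouped.setdefault(source_id, [])
        if target_id and target_id not in targets:
            targets.append(target_id)
    return grouped


# Hand-written rows for illustration only (not output from a real query).
rows = [
    ("TP53", "ENSG00000141510"),
    ("BRCA1", "ENSG00000012048"),
    ("TP53", "ENSG00000141510"),   # duplicate row
    ("SYMBOL_WITHOUT_HIT", ""),    # placeholder for an unmapped input
]
mapping = group_mappings(rows)
unmapped = [source for source, targets in mapping.items() if not targets]
```

Keeping unmapped inputs as keys with an empty list makes it straightforward to report which IDs found no match.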
Supported ID Types¶
| ID Type | Description |
|---|---|
| ensembl_gene_id | Ensembl gene ID |
| ensembl_transcript_id | Ensembl transcript ID |
| external_gene_name | Gene symbol |
| hgnc_symbol | HGNC symbol |
| hgnc_id | HGNC ID |
| entrezgene_id | NCBI Entrez ID |
| uniprot_gn_id | UniProt gene name |
| refseq_mrna | RefSeq mRNA ID |
Annotations¶
GO Annotations¶
from biodbs.fetch import biomart_get_go_annotations
go = biomart_get_go_annotations(["ENSG00000141510"])
df = go.as_dataframe()
Transcripts¶
from biodbs.fetch import biomart_get_transcripts
transcripts = biomart_get_transcripts(["ENSG00000141510"])
Homologs¶
from biodbs.fetch import biomart_get_homologs
homologs = biomart_get_homologs(
    ["ENSG00000141510"],
    target_species="mmusculus"
)
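In Ensembl BioMart, homolog columns are named with the target species as a prefix, for example `mmusculus_homolog_ensembl_gene`. The sketch below builds a few of those attribute names from a species prefix, which may be useful when writing a custom query. `homolog_attributes` is a hypothetical helper, and it is an assumption that `biomart_get_homologs` requests these particular attributes.

```python
from typing import List


def homolog_attributes(target_species: str) -> List[str]:
    """Build Ensembl BioMart homolog attribute names for a species prefix."""
    prefix = target_species.strip().lower()
    if not prefix.isalnum():
        raise ValueError("species prefix should look like 'mmusculus'")
    return [
        f"{prefix}_homolog_ensembl_gene",          # homolog gene stable ID
        f"{prefix}_homolog_associated_gene_name",  # homolog gene symbol
        f"{prefix}_homolog_orthology_type",        # e.g. one-to-one or one-to-many
    ]


mouse_columns = homolog_attributes("mmusculus")
```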
Custom Queries¶
from biodbs.fetch import biomart_query
data = biomart_query(
    dataset="hsapiens_gene_ensembl",
    attributes=[
        "ensembl_gene_id",
        "external_gene_name",
        "description",
        "chromosome_name",
        "start_position",
        "end_position"
    ],
    filters={"ensembl_gene_id": ["ENSG00000141510", "ENSG00000012048"]}
)
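To show how a dataset, attributes, and filters fit together, the sketch below builds the XML document that the public BioMart web service accepts for such a query. It is illustrative only: `build_query_xml` is a hypothetical helper, the formatter and header settings are arbitrary choices, and it is an assumption that biodbs constructs its requests this way.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Sequence, Union

FilterValue = Union[str, Sequence[str]]


def build_query_xml(
    dataset: str,
    attributes: Sequence[str],
    filters: Dict[str, FilterValue],
) -> str:
    """Return a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",      # tab-separated output
        "header": "1",           # include a header row
        "uniqueRows": "1",       # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        # Multiple filter values are sent as one comma-separated string.
        joined = value if isinstance(value, str) else ",".join(value)
        ET.SubElement(ds, "Filter", {"name": name, "value": joined})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


query_xml = build_query_xml(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"ensembl_gene_id": ["ENSG00000141510", "ENSG00000012048"]},
)
```

Each attribute becomes one output column and each filter restricts the rows returned, which mirrors the `attributes` and `filters` arguments of `biomart_query`.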
List Available Options¶
from biodbs.fetch import biomart_list_datasets, biomart_list_attributes, biomart_list_filters
# List datasets
datasets = biomart_list_datasets()
# List attributes for a dataset
attributes = biomart_list_attributes("hsapiens_gene_ensembl")
# List filters
filters = biomart_list_filters("hsapiens_gene_ensembl")
Using the Fetcher Class¶
from biodbs.fetch.biomart import BioMart_Fetcher
fetcher = BioMart_Fetcher()
data = fetcher.query(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "17"}
)
Related Resources¶
- Ensembl REST API - Single-gene lookup, sequences, and VEP. Use the REST API for detailed single-gene queries.
- UniProt - Protein information and annotations.
- ID Translation - Simplified interface for gene ID conversion using BioMart as a backend with translate_gene_ids(..., database="biomart").