Gene ID Translation

Translate between gene identifier systems with `translate_gene_ids`.

Quick Start

from biodbs.translate import translate_gene_ids

# Gene symbols → Ensembl IDs (using universal aliases)
result = translate_gene_ids(
    ["TP53", "BRCA1", "EGFR"],
    from_type="gene_symbol",
    to_type="ensembl_gene_id",
)

# Gene symbols → Entrez IDs
result = translate_gene_ids(
    ["TP53", "BRCA1"],
    from_type="gene_symbol",
    to_type="entrez_id",
    database="ncbi",      # default
)

Universal ID Type Aliases

Every database has its own field names for the same concept. biodbs provides a set of universal aliases that work regardless of which backend you choose — the correct native name is resolved automatically.

`GeneIDType` enum

from biodbs.translate import GeneIDType, translate_gene_ids

# Use enum members as from_type / to_type
result = translate_gene_ids(
    ["TP53", "BRCA1"],
    from_type=GeneIDType.GENE_SYMBOL,
    to_type=GeneIDType.ENSEMBL_GENE_ID,
)

# Plain strings with the same values work identically
result = translate_gene_ids(
    ["TP53", "BRCA1"],
    from_type="gene_symbol",   # same as GeneIDType.GENE_SYMBOL
    to_type="ensembl_gene_id", # same as GeneIDType.ENSEMBL_GENE_ID
)
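One common way to make enum members and plain strings interchangeable is a string-valued `Enum`. The sketch below is an assumption about how such a type could behave, not biodbs's actual implementation; `IDType` and `normalize_id_type` are hypothetical names used only for illustration.

```python
from enum import Enum


class IDType(str, Enum):
    """Hypothetical stand-in for GeneIDType; each member carries its string value."""

    GENE_SYMBOL = "gene_symbol"
    ENSEMBL_GENE_ID = "ensembl_gene_id"
    ENTREZ_ID = "entrez_id"


def normalize_id_type(value):
    """Return the plain string for an enum member, or pass a string through."""
    if isinstance(value, Enum):
        return value.value
    return str(value)
```

With this pattern, `normalize_id_type(IDType.GENE_SYMBOL)` and `normalize_id_type("gene_symbol")` yield the same string, which is why either form can be passed as `from_type` or `to_type`.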

Universal aliases and their values

`GeneIDType` member	String value	Description	Example
`GENE_SYMBOL`	`"gene_symbol"`	Approved gene symbol	`"TP53"`
`ENSEMBL_GENE_ID`	`"ensembl_gene_id"`	Ensembl stable gene ID	`"ENSG00000141510"`
`ENSEMBL_TRANSCRIPT_ID`	`"ensembl_transcript_id"`	Ensembl transcript ID	`"ENST00000269305"`
`ENSEMBL_PROTEIN_ID`	`"ensembl_protein_id"`	Ensembl protein ID	`"ENSP00000269305"`
`ENTREZ_ID`	`"entrez_id"`	NCBI Entrez Gene ID	`"7157"`
`HGNC_ID`	`"hgnc_id"`	HGNC identifier	`"HGNC:11998"`
`HGNC_SYMBOL`	`"hgnc_symbol"`	HGNC-curated symbol	`"TP53"`
`UNIPROT_ID`	`"uniprot_id"`	UniProt accession	`"P04637"`
`REFSEQ_MRNA`	`"refseq_mrna"`	RefSeq mRNA accession	`"NM_000546"`
`REFSEQ_PROTEIN`	`"refseq_protein"`	RefSeq protein accession	`"NP_000537"`
`PDB_ID`	`"pdb_id"`	PDB structure ID	`"2OCJ"`

How aliases resolve per database

When you pass a universal alias, it is automatically mapped to the native field name required by the chosen backend. Native field names are also accepted and passed through unchanged — so existing code keeps working.

Universal alias	BioMart	NCBI	UniProt	Ensembl REST	HGNC
`gene_symbol`	`external_gene_name`	`symbol`	`Gene_Name`	`HGNC`	`symbol`
`ensembl_gene_id`	`ensembl_gene_id`	`ensembl_gene_id`	`Ensembl`	`ensembl_gene_id`	`ensembl_gene_id`
`ensembl_transcript_id`	`ensembl_transcript_id`	—	—	—	—
`ensembl_protein_id`	`ensembl_peptide_id`	—	—	—	—
`entrez_id`	`entrezgene_id`	`gene_id`	`GeneID`	`EntrezGene`	`entrez_id`
`hgnc_id`	`hgnc_id`	—	—	—	`hgnc_id`
`hgnc_symbol`	`hgnc_symbol`	—	—	—	`symbol`
`uniprot_id`	`uniprot_gn_id`	`uniprot`	`UniProtKB_AC-ID`	`Uniprot_gn`	`uniprot_ids`
`refseq_mrna`	`refseq_mrna`	`refseq_accession`	—	`RefSeq_mRNA`	`refseq_accession`
`refseq_protein`	`refseq_peptide`	`refseq_accession`	`RefSeq_Protein`	`RefSeq_peptide`	`refseq_accession`
`pdb_id`	—	—	`PDB`	—	—

Native strings are always accepted

If you pass a value that is not in the alias map (e.g. `"external_gene_name"` or `"Gene_Name"`), it is forwarded to the database unchanged. Database-native field names therefore still work, but universal aliases are preferred for portability.
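The resolution rule described above (look up the alias for the chosen backend, otherwise pass the value through) can be sketched as a dictionary lookup with a default. This is a minimal illustration using a subset of the table's values; `ALIAS_MAP` and `resolve_id_type` are hypothetical names, not part of the biodbs API. The sketch does not model combinations the table marks as unsupported.

```python
# Illustrative subset of the alias table; not biodbs's internal data structure.
ALIAS_MAP = {
    "biomart": {
        "gene_symbol": "external_gene_name",
        "entrez_id": "entrezgene_id",
        "uniprot_id": "uniprot_gn_id",
    },
    "ncbi": {
        "gene_symbol": "symbol",
        "entrez_id": "gene_id",
        "uniprot_id": "uniprot",
    },
    "uniprot": {
        "gene_symbol": "Gene_Name",
        "entrez_id": "GeneID",
        "uniprot_id": "UniProtKB_AC-ID",
    },
}


def resolve_id_type(id_type, database):
    """Map a universal alias to the backend's native field name (hypothetical helper)."""
    aliases = ALIAS_MAP[database]
    # Values missing from the alias map are treated as native names
    # and returned unchanged.
    return aliases.get(id_type, id_type)
```

For example, `resolve_id_type("gene_symbol", "biomart")` gives `"external_gene_name"`, while `resolve_id_type("external_gene_name", "biomart")` returns its input untouched.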

Choosing a Database

from biodbs.translate import TranslationDatabase, translate_gene_ids

result = translate_gene_ids(
    ["TP53", "BRCA1"],
    from_type="gene_symbol",
    to_type="entrez_id",
    database=TranslationDatabase.NCBI,  # or "ncbi"
)

Database	String	Best for	Human only?
NCBI (default)	`"ncbi"`	symbol ↔ Entrez ↔ Ensembl; most stable	No
Ensembl REST	`"ensembl"`	Ensembl ID lookups; more stable than BioMart	No
UniProt	`"uniprot"`	UniProt accession, PDB, RefSeq protein	No
BioMart	`"biomart"`	Widest range of ID types; batch queries	No
HGNC	`"hgnc"`	HGNC IDs, approved symbols, aliases	Yes
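When picking a backend, it can help to check which databases have a native name for both ID types. The sketch below transcribes the coverage from the alias resolution table into sets. `SUPPORTED_ALIASES` and `databases_supporting` are hypothetical names for illustration, and covering both aliases is assumed to be necessary, though not guaranteed sufficient, for a translation to succeed.

```python
# Aliases that every backend in the alias table has a native name for.
COMMON = {
    "gene_symbol", "ensembl_gene_id", "entrez_id",
    "uniprot_id", "refseq_protein",
}

# Per-backend coverage transcribed from the alias table.
# Hypothetical structure; biodbs may store this differently.
SUPPORTED_ALIASES = {
    "ncbi": COMMON | {"refseq_mrna"},
    "ensembl": COMMON | {"refseq_mrna"},
    "uniprot": COMMON | {"pdb_id"},
    "biomart": COMMON | {
        "refseq_mrna", "hgnc_id", "hgnc_symbol",
        "ensembl_transcript_id", "ensembl_protein_id",
    },
    "hgnc": COMMON | {"refseq_mrna", "hgnc_id", "hgnc_symbol"},
}


def databases_supporting(from_type, to_type):
    """List backends whose alias table covers both ID types."""
    return [
        name
        for name, aliases in SUPPORTED_ALIASES.items()
        if from_type in aliases and to_type in aliases
    ]
```

For instance, `databases_supporting("gene_symbol", "pdb_id")` narrows the choice to UniProt, since it is the only backend in the table with a native PDB field.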