Translate Module API Reference¶

Complete reference for the biodbs.translate module.

Key Features¶

Universal ID Aliases: Use GeneIDType enum values (e.g. "gene_symbol", "entrez_id") instead of database-native field names; the correct native name is resolved automatically for each backend.
Multiple Target Types: All main translation functions accept either a single target type or a list. When a list is provided, all target IDs are returned in one call.

from biodbs.translate import translate_gene_ids, GeneIDType

# Universal alias (works with any database)
result = translate_gene_ids(["TP53"], from_type="gene_symbol", to_type="ensembl_gene_id")

# Enum members are equivalent
result = translate_gene_ids(["TP53"], from_type=GeneIDType.GENE_SYMBOL,
                             to_type=GeneIDType.ENSEMBL_GENE_ID)

# Multiple target types — more efficient than multiple calls
result = translate_gene_ids(
    ["TP53"],
    from_type="gene_symbol",
    to_type=["ensembl_gene_id", "entrez_id", "hgnc_id"]
)

Enums¶

GeneIDType¶

Bases: str, Enum

Universal gene / protein identifier types.

Use these members as from_type / to_type in translate_gene_ids instead of database-specific field names. The correct native name for the chosen database is resolved automatically.

Raw native strings (e.g. "external_gene_name", "Gene_Name") are still accepted everywhere and passed through unchanged.

Examples:

>>> from biodbs.translate import GeneIDType, translate_gene_ids
>>> translate_gene_ids(["TP53"], from_type=GeneIDType.GENE_SYMBOL,
...                    to_type=GeneIDType.ENSEMBL_GENE_ID)
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",   # same
...                    to_type="ensembl_gene_id")

TranslationDatabase¶

Bases: str, Enum

Databases available for gene ID translation.

Use these as the database parameter in translate_gene_ids and the translation_database parameter in ORA functions.

Members

NCBI: NCBI Datasets API. Most stable; best for symbol ↔ Entrez ↔ Ensembl translations. Default for translate_gene_ids.
ENSEMBL: Ensembl REST API (xrefs endpoint). More stable than BioMart; natural choice when working with Ensembl IDs.
UNIPROT: UniProt ID-mapping API. Best for protein-centric translations (UniProt accession, PDB, RefSeq protein).
BIOMART: BioMart / Ensembl query interface. Supports the widest range of ID types but is less reliable than the other options.
HGNC: HGNC REST API. Authoritative for human gene nomenclature; best for translations involving HGNC IDs, approved symbols, aliases, and previous symbols. Human only.

Examples:

>>> from biodbs.translate import TranslationDatabase, translate_gene_ids
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",
...                    to_type="ensembl_gene_id",
...                    database=TranslationDatabase.NCBI)
>>> # Raw strings still work for backwards compatibility
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",
...                    to_type="ensembl_gene_id",
...                    database="ncbi")
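
The enum derives from both str and Enum, which is why members and raw strings can be used interchangeably. A minimal sketch of that pattern follows, assuming case-insensitive matching is done through the standard `Enum._missing_` hook. `BackendSketch` and `normalize_backend` are hypothetical names for illustration and are not the biodbs implementation.

```python
from enum import Enum


class BackendSketch(str, Enum):
    """Hypothetical stand-in for a str-based backend enum (not the biodbs class)."""

    NCBI = "ncbi"
    ENSEMBL = "ensembl"
    UNIPROT = "uniprot"
    BIOMART = "biomart"
    HGNC = "hgnc"

    @classmethod
    def _missing_(cls, value):
        # Enum calls this hook when the exact value lookup fails;
        # retry with a trimmed, lower-cased string.
        if isinstance(value, str):
            lowered = value.strip().lower()
            for member in cls:
                if member.value == lowered:
                    return member
        return None  # Enum then raises ValueError for unknown names


def normalize_backend(database):
    """Accept an enum member or a raw string and return the enum member."""
    return BackendSketch(database)
```

Because the members are also str instances, `BackendSketch.NCBI == "ncbi"` holds, so code that compared raw strings keeps working after a switch to enum members.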

Functions Summary¶

Gene Translation¶

Function	Description
`translate_gene_ids`	Translate gene IDs between databases
`translate_gene_ids_kegg`	Translate gene IDs using KEGG API

Chemical Translation¶

Function	Description
`translate_chemical_ids`	Translate chemical IDs via PubChem
`translate_chemical_ids_kegg`	Translate chemical IDs using KEGG API
`translate_chembl_to_pubchem`	Map ChEMBL IDs to PubChem CIDs
`translate_pubchem_to_chembl`	Map PubChem CIDs to ChEMBL IDs

Protein Translation¶

Function	Description
`translate_protein_ids`	Translate protein IDs via UniProt ID mapping
`translate_gene_to_uniprot`	Map gene symbols to UniProt accessions
`translate_uniprot_to_gene`	Map UniProt accessions to gene symbols
`translate_uniprot_to_pdb`	Map UniProt accessions to PDB IDs
`translate_uniprot_to_ensembl`	Map UniProt accessions to Ensembl gene IDs
`translate_uniprot_to_refseq`	Map UniProt accessions to RefSeq protein IDs

Gene Translation¶

translate_gene_ids¶

translate_gene_ids(
    ids: List[str],
    from_type: Union[GeneIDType, str],
    to_type: Union[
        GeneIDType, str, List[Union[GeneIDType, str]]
    ],
    species: Union[Species, str, int] = HUMAN,
    database: Union[
        TranslationDatabase,
        Literal[
            "ncbi", "ensembl", "uniprot", "biomart", "hgnc"
        ],
    ] = NCBI,
    return_dict: bool = False,
) -> Union[
    Dict[str, str], Dict[str, Dict[str, str]], DataFrame
]

Translate gene IDs between different identifier types.

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of gene IDs to translate.	required
`from_type`	`Union[GeneIDType, str]`	Source ID type.	required
`to_type`	`Union[GeneIDType, str, List[Union[GeneIDType, str]]]`	Target ID type(s). Can be a single string or a list of strings. When a list is provided, multiple target IDs are returned.	required
`species`	`Union[Species, str, int]`	Species to translate for. Accepts a `Species` member, a common name (`"human"`), a KEGG code (`"hsa"`), a scientific name, or an NCBI taxon ID (`9606`). Defaults to `Species.HUMAN`.	`HUMAN`
`database`	`Union[TranslationDatabase, Literal['ncbi', 'ensembl', 'uniprot', 'biomart', 'hgnc']]`	Database backend for translation. Accepts a `TranslationDatabase` member or a plain string; raw strings are matched case-insensitively for backwards compatibility. `TranslationDatabase.NCBI` (default): NCBI Datasets API; most reliable, best for symbol ↔ Entrez ↔ Ensembl translations. `TranslationDatabase.ENSEMBL`: Ensembl REST API (xrefs); stable, natural when working with Ensembl IDs. `TranslationDatabase.UNIPROT`: UniProt ID-mapping API; best for protein-centric translations (UniProt, PDB, RefSeq protein). `TranslationDatabase.BIOMART`: BioMart query interface; widest ID-type coverage but less reliable than the other options. `TranslationDatabase.HGNC`: HGNC REST API; authoritative for human gene nomenclature, human only.	`NCBI`
`return_dict`	`bool`	If True, return a dict mapping from_id -> to_id (or dict of to_ids when to_type is a list). If False (default), return a DataFrame.	`False`
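
The species argument accepts several spellings of the same organism. The sketch below shows one way such inputs could be normalised to an NCBI taxon ID, assuming a small hand-written lookup table. `to_taxon_id` and `_SPECIES_ROWS` are hypothetical names and not part of biodbs, and the table covers only human and mouse for illustration.

```python
from typing import Union

# (common name, KEGG organism code, scientific name, NCBI taxon ID)
# Illustrative rows only; a real table would cover many more organisms.
_SPECIES_ROWS = [
    ("human", "hsa", "homo sapiens", 9606),
    ("mouse", "mmu", "mus musculus", 10090),
]


def to_taxon_id(species: Union[str, int]) -> int:
    """Map a common name, KEGG code, scientific name, or taxon ID to a taxon ID."""
    if isinstance(species, int):
        if any(species == row[3] for row in _SPECIES_ROWS):
            return species
        raise ValueError(f"Unknown taxon ID: {species}")
    key = species.strip().lower()
    for common, kegg, scientific, taxon in _SPECIES_ROWS:
        if key in (common, kegg, scientific):
            return taxon
    raise ValueError(f"Unknown species: {species!r}")


# All four spellings resolve to the same organism.
human_ids = {to_taxon_id(s) for s in ("human", "hsa", "Homo sapiens", 9606)}
```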

Supported ID types for NCBI

symbol / gene_symbol: Gene symbol (e.g., "TP53")
entrez_id / gene_id: NCBI Gene ID (e.g., "7157")
refseq_accession: RefSeq accession (e.g., "NM_000546.6")
ensembl_gene_id: Ensembl gene ID (output only)
uniprot / swiss_prot: UniProt accession (output only)

Supported ID types for BioMart

ensembl_gene_id: Ensembl gene ID (e.g., "ENSG00000141510")
ensembl_transcript_id: Ensembl transcript ID
ensembl_peptide_id: Ensembl protein ID
external_gene_name: Gene symbol (e.g., "TP53")
hgnc_symbol: HGNC symbol
hgnc_id: HGNC ID (e.g., "HGNC:11998")
entrezgene_id: NCBI Entrez gene ID
uniprot_gn_id: UniProt gene name
refseq_mrna: RefSeq mRNA ID
refseq_peptide: RefSeq protein ID

Supported ID types for Ensembl REST

Input (from_type): Ensembl stable IDs (ENSG*, ENST*, ENSP*)
Output (to_type): Filter by external_db name (e.g., "HGNC", "EntrezGene", "Uniprot_gn", "RefSeq_mRNA", "RefSeq_peptide")

Supported ID types for UniProt

UniProtKB_AC-ID: UniProt accession (e.g., "P04637")
Gene_Name: Gene symbol (e.g., "TP53")
GeneID: NCBI Gene ID (e.g., "7157")
Ensembl: Ensembl gene ID
RefSeq_Protein: RefSeq protein ID
PDB: PDB structure ID

Supported ID types for HGNC (human only)

symbol / gene_symbol / hgnc_symbol: Approved gene symbol
hgnc_id: HGNC ID (e.g., "HGNC:11998")
entrez_id: NCBI Gene ID
ensembl_gene_id: Ensembl stable gene ID
uniprot_id: UniProt accession, read from the uniprot_ids field (first accession returned)
refseq_mrna / refseq_protein: RefSeq accession, read from the refseq_accession field (first returned)

Returns:

Type	Description
`Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]`	When to_type is a string: Dict mapping source IDs to target IDs, or DataFrame with both columns.
`Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]`	When to_type is a list: Dict mapping source IDs to dicts of {target_type: target_id}, or DataFrame with from_type column and one column per target type.
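
The two dict shapes described above differ only in whether each source ID maps to one value or to a nested dict keyed by target type. The sketch below builds both shapes from plain row dicts. `rows_to_mapping` is a hypothetical helper for illustration, not a biodbs function, and the key names in `rows` are assumptions.

```python
from typing import Dict, List, Union

Row = Dict[str, str]


def rows_to_mapping(
    rows: List[Row], from_col: str, to_cols: Union[str, List[str]]
) -> Union[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Build the flat or nested mapping from translation rows."""
    if isinstance(to_cols, str):
        # Single target type: source ID -> target ID
        return {row[from_col]: row[to_cols] for row in rows}
    # List of target types: source ID -> {target type: target ID}
    return {row[from_col]: {col: row[col] for col in to_cols} for row in rows}


rows = [
    {"gene_symbol": "TP53", "ensembl_gene_id": "ENSG00000141510", "entrez_id": "7157"},
    {"gene_symbol": "BRCA1", "ensembl_gene_id": "ENSG00000012048", "entrez_id": "672"},
]

single = rows_to_mapping(rows, "gene_symbol", "ensembl_gene_id")
multi = rows_to_mapping(rows, "gene_symbol", ["ensembl_gene_id", "entrez_id"])
```

Here `single["TP53"]` is a string, while `multi["TP53"]` is a dict with one entry per requested target type.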

Example

Gene symbols to Ensembl IDs using the universal enum:

from biodbs.translate import GeneIDType, translate_gene_ids

result = translate_gene_ids(
    ["TP53", "BRCA1", "EGFR"],
    from_type=GeneIDType.GENE_SYMBOL,
    to_type=GeneIDType.ENSEMBL_GENE_ID,
)
print(result)
#   external_gene_name    ensembl_gene_id
# 0               TP53  ENSG00000141510
# 1              BRCA1  ENSG00000012048
# 2               EGFR  ENSG00000146648

Raw database-native strings still work (backwards compatible):

result = translate_gene_ids(
    ["TP53", "BRCA1"],
    from_type="external_gene_name",   # BioMart native
    to_type="ensembl_gene_id",
    database="biomart",   # native names pass through unchanged, so select the matching backend
)

Ensembl IDs to gene symbols (HGNC cross-references, using the Ensembl REST API):

result = translate_gene_ids(
    ["ENSG00000141510", "ENSG00000012048"],
    from_type=GeneIDType.ENSEMBL_GENE_ID,
    to_type=GeneIDType.GENE_SYMBOL,
    database="ensembl",
)

Multiple target types (BioMart):

result = translate_gene_ids(
    ["TP53", "BRCA1"],
    from_type=GeneIDType.GENE_SYMBOL,
    to_type=[GeneIDType.ENSEMBL_GENE_ID, GeneIDType.ENTREZ_ID],
    database="biomart",
)
print(result)
#   external_gene_name    ensembl_gene_id  entrezgene_id
# 0               TP53  ENSG00000141510           7157
# 1              BRCA1  ENSG00000012048            672

translate_gene_ids_kegg¶

translate_gene_ids_kegg(
    ids: List[str], from_db: str, to_db: str
) -> DataFrame

Translate gene IDs using KEGG database.

Useful for converting between KEGG gene IDs and external databases.

Supported databases

KEGG organism codes: "hsa" (human), "mmu" (mouse), "rno" (rat), etc.
ncbi-geneid: NCBI Entrez Gene ID
ncbi-proteinid: NCBI Protein ID
uniprot: UniProt accession

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of gene IDs to translate (e.g., ["hsa:7157", "hsa:672"]). If empty, converts the entire source database.	required
`from_db`	`str`	Source database. Use KEGG entry IDs or external DB name.	required
`to_db`	`str`	Target database name.	required

Returns:

Type	Description
`DataFrame`	DataFrame with source and target ID columns.

Example

KEGG gene IDs to NCBI Entrez:

result = translate_gene_ids_kegg(
    ["hsa:7157", "hsa:672"],
    from_db="hsa",
    to_db="ncbi-geneid",
)
print(result)
#       source           target
# 0  hsa:7157  ncbi-geneid:7157
# 1   hsa:672   ncbi-geneid:672

Convert entire organism's genes to UniProt:

result = translate_gene_ids_kegg([], from_db="hsa", to_db="uniprot")
print(result.head())
#    source     target
# 0   hsa:1  up:P04217
# 1   hsa:2  up:P01023
# ...
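
The target values in the outputs above keep their database prefix (for example "ncbi-geneid:7157" or "up:P04217"). Downstream tools often expect bare identifiers, so a small post-processing step can help. The sketch below is one way to do it. `strip_db_prefix` is a hypothetical helper, not part of biodbs, and it assumes the prefix is everything before the first colon.

```python
def strip_db_prefix(identifier: str) -> str:
    """Return the part after the first colon, or the input unchanged if there is none."""
    _prefix, sep, rest = identifier.partition(":")
    return rest if sep else identifier


# Source/target pairs as shown in the KEGG example output above.
pairs = [("hsa:7157", "ncbi-geneid:7157"), ("hsa:672", "ncbi-geneid:672")]

# Map bare KEGG gene numbers to bare Entrez Gene IDs.
bare = {strip_db_prefix(src): strip_db_prefix(dst) for src, dst in pairs}
```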

Chemical Translation¶

translate_chemical_ids¶

translate_chemical_ids(
    ids: List[str],
    from_type: str,
    to_type: Union[str, List[str]],
    return_dict: bool = False,
) -> Union[
    Dict[str, str], Dict[str, Dict[str, str]], DataFrame
]

Translate chemical/compound IDs between different identifier types.

Uses PubChem for ID conversion.

Supported ID types

cid: PubChem Compound ID
name: Compound name
smiles: SMILES string (canonical)
inchikey: InChIKey
inchi: InChI string
formula: Molecular formula

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of compound identifiers to translate.	required
`from_type`	`str`	Source ID type ("cid", "name", "smiles", "inchikey").	required
`to_type`	`Union[str, List[str]]`	Target ID type(s). Can be a single string or a list of strings. When a list is provided, multiple target IDs are returned. Valid types: "cid", "name", "smiles", "inchikey", "inchi", "formula".	required
`return_dict`	`bool`	If True, return dict mapping from_id -> to_id (or dict of to_ids when to_type is a list).	`False`

Returns:

Type	Description
`Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]`	When to_type is a string: Dict or DataFrame with translated IDs.
`Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]`	When to_type is a list: Dict mapping source IDs to dicts of {target_type: target_id}, or DataFrame with from_type column and one column per target type.

Example

Names to CIDs:

result = translate_chemical_ids(
    ["aspirin", "ibuprofen"],
    from_type="name",
    to_type="cid",
)
print(result)
#         name   cid
# 0    aspirin  2244
# 1  ibuprofen  3672

CIDs to SMILES:

result = translate_chemical_ids(
    ["2244", "3672"],
    from_type="cid",
    to_type="smiles",
    return_dict=True,
)
print(result)
# {'2244': 'CC(=O)OC1=CC=CC=C1C(=O)O', '3672': 'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O'}

Multiple target types:

result = translate_chemical_ids(
    ["aspirin"],
    from_type="name",
    to_type=["cid", "smiles", "inchikey"],
)
print(result)
#      name   cid                      smiles                    inchikey
# 0  aspirin  2244  CC(=O)OC1=CC=CC=C1C(=O)O  BSYNRYMUTXBXSQ-UHFFFAOYSA-N

translate_chemical_ids_kegg¶

translate_chemical_ids_kegg(
    ids: List[str], from_db: str, to_db: str
) -> DataFrame

Translate chemical/compound IDs using KEGG database.

Useful for converting between KEGG compound/drug IDs and external databases.

Supported databases

compound: KEGG Compound
drug: KEGG Drug
pubchem: PubChem CID
chebi: ChEBI ID

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of compound IDs to translate (e.g., ["cpd:C00022", "dr:D00001"]). If empty, converts entire database.	required
`from_db`	`str`	Source database (compound, drug, or entries).	required
`to_db`	`str`	Target database name.	required

Returns:

Type	Description
`DataFrame`	DataFrame with source and target ID columns.

Example

KEGG compound to PubChem:

result = translate_chemical_ids_kegg(
    ["cpd:C00022", "cpd:C00031"],
    from_db="compound",
    to_db="pubchem",
)
print(result)
#         source          target
# 0  cpd:C00022  pubchem:3324
# 1  cpd:C00031  pubchem:5793

translate_chembl_to_pubchem¶

translate_chembl_to_pubchem(
    chembl_ids: List[str], return_dict: bool = False
) -> Union[Dict[str, int], DataFrame]

Translate ChEMBL molecule IDs to PubChem CIDs.

Parameters:

Name	Type	Description	Default
`chembl_ids`	`List[str]`	List of ChEMBL IDs (e.g., ["CHEMBL25", "CHEMBL1201585"]).	required
`return_dict`	`bool`	If True, return dict mapping ChEMBL ID -> PubChem CID.	`False`

Returns:

Type	Description
`Union[Dict[str, int], DataFrame]`	Dict or DataFrame with ChEMBL IDs and corresponding PubChem CIDs.

Example

result = translate_chembl_to_pubchem(["CHEMBL25", "CHEMBL1201585"])
print(result)
#       chembl_id  pubchem_cid
# 0       CHEMBL25         2244
# 1  CHEMBL1201585      5284616

translate_pubchem_to_chembl¶

translate_pubchem_to_chembl(
    cids: List[int], return_dict: bool = False
) -> Union[Dict[int, str], DataFrame]

Translate PubChem CIDs to ChEMBL molecule IDs.

Parameters:

Name	Type	Description	Default
`cids`	`List[int]`	List of PubChem CIDs (e.g., [2244, 3672]).	required
`return_dict`	`bool`	If True, return dict mapping CID -> ChEMBL ID.	`False`

Returns:

Type	Description
`Union[Dict[int, str], DataFrame]`	Dict or DataFrame with PubChem CIDs and corresponding ChEMBL IDs.

Example

result = translate_pubchem_to_chembl([2244, 3672])
print(result)
#    pubchem_cid    chembl_id
# 0         2244     CHEMBL25
# 1         3672    CHEMBL521

Protein Translation¶

translate_protein_ids¶

translate_protein_ids(
    ids: List[str],
    from_type: str,
    to_type: Union[str, List[str]],
    organism: int = 9606,
    return_dict: bool = False,
) -> Union[
    Dict[str, str], Dict[str, Dict[str, str]], DataFrame
]

Translate protein/gene IDs using UniProt ID mapping service.

This function provides comprehensive ID translation between various biological databases using the UniProt ID mapping API.

Parameters:

Name	Type	Description	Default
`ids`	`List[str]`	List of IDs to translate.	required
`from_type`	`str`	Source ID type. Common options: - "UniProtKB_AC-ID": UniProt accession (e.g., "P04637") - "Gene_Name": Gene name/symbol (e.g., "TP53") - "GeneID": NCBI Gene ID (e.g., "7157") - "Ensembl": Ensembl gene ID (e.g., "ENSG00000141510") - "RefSeq_Protein": RefSeq protein ID - "PDB": PDB structure ID	required
`to_type`	`Union[str, List[str]]`	Target ID type(s). Can be a single string or a list of strings. When a list is provided, multiple target IDs are returned. Common options: - "UniProtKB": UniProt entry (returns accession) - "UniProtKB_AC-ID": UniProt accession - "GeneID": NCBI Gene ID - "Ensembl": Ensembl gene ID - "Ensembl_Protein": Ensembl protein ID - "RefSeq_Protein": RefSeq protein ID - "PDB": PDB structure ID - "STRING": STRING database ID - "ChEMBL": ChEMBL target ID	required
`organism`	`int`	NCBI taxonomy ID (default: 9606 for human). Only used for Gene_Name -> UniProt mapping.	`9606`
`return_dict`	`bool`	If True, return dict mapping from_id -> to_id (or dict of to_ids when to_type is a list). If False, return DataFrame.	`False`

Returns:

Type	Description
`Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]`	When to_type is a string: Dict mapping source IDs to target IDs, or DataFrame with mapping.
`Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]`	When to_type is a list: Dict mapping source IDs to dicts of {target_type: target_id}, or DataFrame with from column and one column per target type.

Example

UniProt to NCBI Gene ID:

result = translate_protein_ids(
    ["P04637", "P00533"],
    from_type="UniProtKB_AC-ID",
    to_type="GeneID",
)
print(result)
#      from     to
# 0  P04637   7157
# 1  P00533   1956

Gene names to UniProt:

result = translate_protein_ids(
    ["TP53", "EGFR"],
    from_type="Gene_Name",
    to_type="UniProtKB",
)
print(result)
#    from       to
# 0  TP53  P04637
# 1  EGFR  P00533

Multiple target types:

result = translate_protein_ids(
    ["P04637", "P00533"],
    from_type="UniProtKB_AC-ID",
    to_type=["GeneID", "Ensembl", "Gene_Name"],
)
print(result)
#      from  GeneID           Ensembl Gene_Name
# 0  P04637    7157  ENSG00000141510      TP53
# 1  P00533    1956  ENSG00000146648      EGFR

translate_gene_to_uniprot¶

translate_gene_to_uniprot(
    gene_names: List[str],
    organism: int = 9606,
    reviewed_only: bool = True,
    return_dict: bool = True,
) -> Union[Dict[str, str], DataFrame]

Translate gene names/symbols to UniProt accessions.

This is a convenience function for the common use case of mapping gene symbols to their canonical UniProt protein accessions.

Parameters:

Name	Type	Description	Default
`gene_names`	`List[str]`	List of gene names/symbols (e.g., ["TP53", "BRCA1"]).	required
`organism`	`int`	NCBI taxonomy ID (default: 9606 for human).	`9606`
`reviewed_only`	`bool`	Only return reviewed (Swiss-Prot) entries.	`True`
`return_dict`	`bool`	If True, return dict. If False, return DataFrame.	`True`

Returns:

Type	Description
`Union[Dict[str, str], DataFrame]`	Dict or DataFrame mapping gene names to UniProt accessions.

Example

mapping = translate_gene_to_uniprot(["TP53", "BRCA1", "EGFR"])
print(mapping)
# {'TP53': 'P04637', 'BRCA1': 'P38398', 'EGFR': 'P00533'}

translate_uniprot_to_gene¶

translate_uniprot_to_gene(
    accessions: List[str], return_dict: bool = True
) -> Union[Dict[str, str], DataFrame]

Translate UniProt accessions to gene names/symbols.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of UniProt accessions (e.g., ["P04637", "P00533"]).	required
`return_dict`	`bool`	If True, return dict. If False, return DataFrame.	`True`

Returns:

Type	Description
`Union[Dict[str, str], DataFrame]`	Dict or DataFrame mapping UniProt accessions to gene names.

Example

mapping = translate_uniprot_to_gene(["P04637", "P00533"])
print(mapping)
# {'P04637': 'TP53', 'P00533': 'EGFR'}

translate_uniprot_to_pdb¶

translate_uniprot_to_pdb(
    accessions: List[str], return_dict: bool = True
) -> Union[Dict[str, List[str]], DataFrame]

Translate UniProt accessions to PDB structure IDs.

Note: One protein may have multiple PDB structures.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of UniProt accessions.	required
`return_dict`	`bool`	If True, return dict. If False, return DataFrame.	`True`

Returns:

Type	Description
`Union[Dict[str, List[str]], DataFrame]`	Dict mapping accessions to lists of PDB IDs, or DataFrame.

Example

result = translate_uniprot_to_pdb(["P04637"])
print(result)
# {'P04637': ['1A1U', '1AIE', '1C26', '1DT7', ...]}
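
Because one accession can map to many structures, the dict form holds a list per accession. The sketch below shows how flat (accession, PDB ID) pairs could be grouped into that shape. `group_targets` is a hypothetical helper for illustration and the pair list is made up from the IDs in the example above, so this is not the biodbs implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_targets(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (source, target) pairs into source -> list of targets, keeping first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source, target in pairs:
        if target not in grouped[source]:  # skip repeated pairs
            grouped[source].append(target)
    return dict(grouped)


pairs = [
    ("P04637", "1A1U"),
    ("P04637", "1AIE"),
    ("P04637", "1C26"),
    ("P04637", "1A1U"),  # duplicate row, ignored
]
grouped = group_targets(pairs)
```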

translate_uniprot_to_ensembl¶

translate_uniprot_to_ensembl(
    accessions: List[str], return_dict: bool = True
) -> Union[Dict[str, str], DataFrame]

Translate UniProt accessions to Ensembl gene IDs.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of UniProt accessions.	required
`return_dict`	`bool`	If True, return dict. If False, return DataFrame.	`True`

Returns:

Type	Description
`Union[Dict[str, str], DataFrame]`	Dict or DataFrame mapping UniProt accessions to Ensembl IDs.

Example

result = translate_uniprot_to_ensembl(["P04637", "P00533"])
print(result)
# {'P04637': 'ENSG00000141510', 'P00533': 'ENSG00000146648'}

translate_uniprot_to_refseq¶

translate_uniprot_to_refseq(
    accessions: List[str], return_dict: bool = True
) -> Union[Dict[str, List[str]], DataFrame]

Translate UniProt accessions to RefSeq protein IDs.

Parameters:

Name	Type	Description	Default
`accessions`	`List[str]`	List of UniProt accessions.	required
`return_dict`	`bool`	If True, return dict. If False, return DataFrame.	`True`

Returns:

Type	Description
`Union[Dict[str, List[str]], DataFrame]`	Dict mapping accessions to lists of RefSeq IDs, or DataFrame.

Example

result = translate_uniprot_to_refseq(["P04637"])
print(result)
# {'P04637': ['NP_000537.3', 'NP_001119584.1', ...]}

ID Type Reference¶

Universal Gene ID Aliases¶

Use these values for from_type / to_type in translate_gene_ids. The correct database-native name is resolved automatically per backend.

Universal alias	BioMart	NCBI	UniProt	Ensembl REST	HGNC
`gene_symbol`	`external_gene_name`	`symbol`	`Gene_Name`	`HGNC`	`symbol`
`ensembl_gene_id`	`ensembl_gene_id`	`ensembl_gene_id`	`Ensembl`	`ensembl_gene_id`	`ensembl_gene_id`
`ensembl_transcript_id`	`ensembl_transcript_id`	—	—	—	—
`ensembl_protein_id`	`ensembl_peptide_id`	—	—	—	—
`entrez_id`	`entrezgene_id`	`gene_id`	`GeneID`	`EntrezGene`	`entrez_id`
`hgnc_id`	`hgnc_id`	—	—	—	`hgnc_id`
`hgnc_symbol`	`hgnc_symbol`	—	—	—	`symbol`
`uniprot_id`	`uniprot_gn_id`	`uniprot`	`UniProtKB_AC-ID`	`Uniprot_gn`	`uniprot_ids`
`refseq_mrna`	`refseq_mrna`	`refseq_accession`	—	`RefSeq_mRNA`	`refseq_accession`
`refseq_protein`	`refseq_peptide`	`refseq_accession`	`RefSeq_Protein`	`RefSeq_peptide`	`refseq_accession`
`pdb_id`	—	—	`PDB`	—	—

Native database strings (e.g. "external_gene_name") are also accepted and passed through unchanged.
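
The resolution rule described here can be pictured as a table lookup with a pass-through fallback. The sketch below copies a few rows from the alias table above. `ALIAS_TABLE` and `resolve_id_type` are hypothetical names, and raising an error for a "—" cell is an assumption about how an unsupported combination might be handled, not documented biodbs behaviour.

```python
from typing import Dict, Optional

# A few rows copied from the alias table above; None marks a "—" cell.
ALIAS_TABLE: Dict[str, Dict[str, Optional[str]]] = {
    "gene_symbol": {
        "biomart": "external_gene_name", "ncbi": "symbol",
        "uniprot": "Gene_Name", "ensembl": "HGNC", "hgnc": "symbol",
    },
    "entrez_id": {
        "biomart": "entrezgene_id", "ncbi": "gene_id",
        "uniprot": "GeneID", "ensembl": "EntrezGene", "hgnc": "entrez_id",
    },
    "pdb_id": {
        "biomart": None, "ncbi": None,
        "uniprot": "PDB", "ensembl": None, "hgnc": None,
    },
}


def resolve_id_type(id_type: str, database: str) -> str:
    """Resolve a universal alias to a backend-native name; pass native names through."""
    per_backend = ALIAS_TABLE.get(id_type)
    if per_backend is None:
        return id_type  # not a known alias: treat as a native name, unchanged
    native = per_backend.get(database)
    if native is None:
        # Assumption: unsupported alias/backend pairs are reported as errors.
        raise ValueError(f"{id_type!r} is not available on backend {database!r}")
    return native
```

For example, `resolve_id_type("gene_symbol", "biomart")` gives "external_gene_name", while `resolve_id_type("Gene_Name", "uniprot")` returns its input untouched.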

Protein ID Types (UniProt mapping)¶

ID Type	Description
`UniProtKB_AC-ID`	UniProt accession
`Gene_Name`	Gene symbol
`GeneID`	NCBI Gene ID
`Ensembl`	Ensembl gene ID
`RefSeq_Protein`	RefSeq protein ID
`PDB`	PDB structure ID

Chemical ID Types (PubChem)¶

ID Type	Description
`name`	Compound name
`cid`	PubChem CID
`smiles`	SMILES string
`inchikey`	InChIKey
`inchi`	InChI string
`formula`	Molecular formula