Translate Module API Reference

Complete reference for the biodbs.translate module.

Key Features

  • Universal ID aliases: Use GeneIDType enum values (e.g. "gene_symbol", "entrez_id") instead of database-native field names — the correct name is resolved per backend automatically.
  • Multiple Target Types: All main translation functions accept either a single target type or a list. When a list is provided, all target IDs are returned in one call.
from biodbs.translate import translate_gene_ids, GeneIDType

# Universal alias (works with any database)
result = translate_gene_ids(["TP53"], from_type="gene_symbol", to_type="ensembl_gene_id")

# Enum members are equivalent
result = translate_gene_ids(["TP53"], from_type=GeneIDType.GENE_SYMBOL,
                             to_type=GeneIDType.ENSEMBL_GENE_ID)

# Multiple target types — more efficient than multiple calls
result = translate_gene_ids(
    ["TP53"],
    from_type="gene_symbol",
    to_type=["ensembl_gene_id", "entrez_id", "hgnc_id"]
)

Enums

GeneIDType

Bases: str, Enum

Universal gene / protein identifier types.

Use these members as from_type / to_type in translate_gene_ids instead of database-specific field names. The correct native name for the chosen database is resolved automatically.

Raw native strings (e.g. "external_gene_name", "Gene_Name") are still accepted everywhere and passed through unchanged.

Examples:

>>> from biodbs.translate import GeneIDType, translate_gene_ids
>>> translate_gene_ids(["TP53"], from_type=GeneIDType.GENE_SYMBOL,
...                    to_type=GeneIDType.ENSEMBL_GENE_ID)
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",   # same
...                    to_type="ensembl_gene_id")
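
Because the enum derives from both str and Enum, a member compares equal to its plain string value, which is why the two call styles are interchangeable. The sketch below illustrates that Python behaviour with a stand-in class. DemoIDType and its two members are hypothetical; they mirror aliases named on this page and are not the library's actual class definition.

```python
from enum import Enum


class DemoIDType(str, Enum):
    """Hypothetical stand-in for a str-based ID-type enum."""

    GENE_SYMBOL = "gene_symbol"
    ENSEMBL_GENE_ID = "ensembl_gene_id"


# A member of a str-mixin enum compares equal to its underlying string.
member_equals_string = DemoIDType.GENE_SYMBOL == "gene_symbol"

# Looking a member up by value returns the same singleton member.
looked_up = DemoIDType("ensembl_gene_id")

print(member_equals_string)
print(looked_up is DemoIDType.ENSEMBL_GENE_ID)
```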

TranslationDatabase

Bases: str, Enum

Databases available for gene ID translation.

Use these as the database parameter in translate_gene_ids and the translation_database parameter in ORA functions.

Members

  • NCBI: NCBI Datasets API. Most stable; best for symbol ↔ Entrez ↔ Ensembl translations. Default for translate_gene_ids.
  • ENSEMBL: Ensembl REST API (xrefs endpoint). More stable than BioMart; natural choice when working with Ensembl IDs.
  • UNIPROT: UniProt ID-mapping API. Best for protein-centric translations (UniProt accession, PDB, RefSeq protein).
  • BIOMART: BioMart / Ensembl query interface. Supports the widest range of ID types but is less reliable than the other options.
  • HGNC: HGNC REST API. Authoritative for human gene nomenclature; best for translations involving HGNC IDs, approved symbols, aliases, and previous symbols. Human only.

Examples:

>>> from biodbs.translate import TranslationDatabase, translate_gene_ids
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",
...                    to_type="ensembl_gene_id",
...                    database=TranslationDatabase.NCBI)
>>> # Raw strings still work for backwards compatibility
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",
...                    to_type="ensembl_gene_id",
...                    database="ncbi")
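
The database argument accepts either an enum member or a string, and strings are described as being matched case-insensitively. The following is a minimal sketch of how such coercion could work, not the library's own code. DemoDatabase and coerce_database are hypothetical names; the five values are the backend strings listed in the translate_gene_ids signature.

```python
from enum import Enum
from typing import Union


class DemoDatabase(str, Enum):
    """Hypothetical stand-in listing the backend names from this page."""

    NCBI = "ncbi"
    ENSEMBL = "ensembl"
    UNIPROT = "uniprot"
    BIOMART = "biomart"
    HGNC = "hgnc"


def coerce_database(value: Union[DemoDatabase, str]) -> DemoDatabase:
    """Return the enum member for a member or a case-insensitive string."""
    if isinstance(value, DemoDatabase):
        return value
    try:
        # Normalise whitespace and case before the value lookup.
        return DemoDatabase(value.strip().lower())
    except ValueError:
        valid = ", ".join(member.value for member in DemoDatabase)
        raise ValueError(
            f"Unknown database {value!r}; expected one of: {valid}"
        ) from None


print(coerce_database("NCBI").value)
print(coerce_database(DemoDatabase.HGNC).value)
```

Raising ValueError for an unknown name is an assumption of this sketch; the page does not state how the library reports an unrecognised backend.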

Functions Summary

Gene Translation

Function Description
translate_gene_ids Translate gene IDs between databases
translate_gene_ids_kegg Translate gene IDs using KEGG API

Chemical Translation

Function Description
translate_chemical_ids Translate chemical IDs via PubChem
translate_chemical_ids_kegg Translate chemical IDs using KEGG API
translate_chembl_to_pubchem Map ChEMBL IDs to PubChem CIDs
translate_pubchem_to_chembl Map PubChem CIDs to ChEMBL IDs

Protein Translation

Function Description
translate_protein_ids Translate protein IDs via UniProt ID mapping
translate_gene_to_uniprot Map gene symbols to UniProt accessions
translate_uniprot_to_gene Map UniProt accessions to gene symbols
translate_uniprot_to_pdb Map UniProt accessions to PDB IDs
translate_uniprot_to_ensembl Map UniProt accessions to Ensembl gene IDs
translate_uniprot_to_refseq Map UniProt accessions to RefSeq protein IDs

Gene Translation

translate_gene_ids

translate_gene_ids(
    ids: List[str],
    from_type: Union[GeneIDType, str],
    to_type: Union[
        GeneIDType, str, List[Union[GeneIDType, str]]
    ],
    species: Union[Species, str, int] = HUMAN,
    database: Union[
        TranslationDatabase,
        Literal[
            "ncbi", "ensembl", "uniprot", "biomart", "hgnc"
        ],
    ] = NCBI,
    return_dict: bool = False,
) -> Union[
    Dict[str, str], Dict[str, Dict[str, str]], DataFrame
]

Translate gene IDs between different identifier types.

Parameters:

Name Type Description Default
ids List[str]

List of gene IDs to translate.

required
from_type Union[GeneIDType, str]

Source ID type.

required
to_type Union[GeneIDType, str, List[Union[GeneIDType, str]]]

Target ID type(s). Can be a single string or a list of strings. When a list is provided, multiple target IDs are returned.

required
species Union[Species, str, int]

Species to translate for. Accepts a Species member, a common name ("human"), a KEGG code ("hsa"), a scientific name, or an NCBI taxon ID (9606). Defaults to Species.HUMAN.

HUMAN
database Union[TranslationDatabase, Literal['ncbi', 'ensembl', 'uniprot', 'biomart', 'hgnc']]

Database backend for translation. Accepts a TranslationDatabase member or a plain string. Raw strings are matched case-insensitively for backward compatibility.

  • TranslationDatabase.NCBI (default): NCBI Datasets API. Most reliable; best for symbol ↔ Entrez ↔ Ensembl translations.
  • TranslationDatabase.ENSEMBL: Ensembl REST API (xrefs). Stable; natural when working with Ensembl IDs.
  • TranslationDatabase.UNIPROT: UniProt ID-mapping API. Best for protein-centric translations (UniProt, PDB, RefSeq protein).
  • TranslationDatabase.BIOMART: BioMart query interface. Widest ID-type coverage but less reliable than the other options.
  • TranslationDatabase.HGNC: HGNC REST API. Authoritative for human gene nomenclature. Human only.
NCBI
return_dict bool

If True, return a dict mapping from_id -> to_id (or dict of to_ids when to_type is a list). If False (default), return a DataFrame.

False
Supported ID types for NCBI
  • symbol / gene_symbol: Gene symbol (e.g., "TP53")
  • entrez_id / gene_id: NCBI Gene ID (e.g., "7157")
  • refseq_accession: RefSeq accession (e.g., "NM_000546.6")
  • ensembl_gene_id: Ensembl gene ID (output only)
  • uniprot / swiss_prot: UniProt accession (output only)
Supported ID types for BioMart
  • ensembl_gene_id: Ensembl gene ID (e.g., "ENSG00000141510")
  • ensembl_transcript_id: Ensembl transcript ID
  • ensembl_peptide_id: Ensembl protein ID
  • external_gene_name: Gene symbol (e.g., "TP53")
  • hgnc_symbol: HGNC symbol
  • hgnc_id: HGNC ID (e.g., "HGNC:11998")
  • entrezgene_id: NCBI Entrez gene ID
  • uniprot_gn_id: UniProt gene name
  • refseq_mrna: RefSeq mRNA ID
  • refseq_peptide: RefSeq protein ID
Supported ID types for Ensembl REST
  • Input (from_type): Ensembl stable IDs (ENSG*, ENST*, ENSP*)
  • Output (to_type): Filter by external_db name (e.g., "HGNC", "EntrezGene", "Uniprot_gn", "RefSeq_mRNA", "RefSeq_peptide")
Supported ID types for UniProt
  • UniProtKB_AC-ID: UniProt accession (e.g., "P04637")
  • Gene_Name: Gene symbol (e.g., "TP53")
  • GeneID: NCBI Gene ID (e.g., "7157")
  • Ensembl: Ensembl gene ID
  • RefSeq_Protein: RefSeq protein ID
  • PDB: PDB structure ID

Supported ID types for HGNC (human only)
  • symbol / gene_symbol / hgnc_symbol: Approved gene symbol
  • hgnc_id: HGNC ID (e.g., "HGNC:11998")
  • entrez_id: NCBI Gene ID
  • ensembl_gene_id: Ensembl stable gene ID
  • uniprot_id: read from the uniprot_ids field (first accession returned)
  • refseq_mrna / refseq_protein: read from the refseq_accession field (first returned)

Returns:

Type Description
Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]

When to_type is a string: Dict mapping source IDs to target IDs, or DataFrame with both columns.

Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]

When to_type is a list: Dict mapping source IDs to dicts of {target_type: target_id}, or DataFrame with from_type column and one column per target type.
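
To make the two dictionary shapes concrete, here is a small sketch that builds both from the same rows: a flat dict for a single target type and a nested dict for a list of target types. It uses only the standard library. The rows, the column names, and rows_to_dict are hypothetical illustrations with values taken from the examples on this page; the sketch does not reproduce the library's internals.

```python
from typing import Dict, List, Union

# Hypothetical rows standing in for a translation result.
rows: List[Dict[str, str]] = [
    {"gene_symbol": "TP53", "ensembl_gene_id": "ENSG00000141510", "entrez_id": "7157"},
    {"gene_symbol": "BRCA1", "ensembl_gene_id": "ENSG00000012048", "entrez_id": "672"},
]


def rows_to_dict(
    rows: List[Dict[str, str]],
    from_col: str,
    to_cols: Union[str, List[str]],
) -> Union[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Return a flat dict for one target column, a nested dict for a list."""
    if isinstance(to_cols, str):
        return {row[from_col]: row[to_cols] for row in rows}
    return {row[from_col]: {col: row[col] for col in to_cols} for row in rows}


flat = rows_to_dict(rows, "gene_symbol", "ensembl_gene_id")
nested = rows_to_dict(rows, "gene_symbol", ["ensembl_gene_id", "entrez_id"])

print(flat)
print(nested)
```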

Example

Gene symbols to Ensembl IDs using the universal enum:

from biodbs.translate import GeneIDType, translate_gene_ids

result = translate_gene_ids(
    ["TP53", "BRCA1", "EGFR"],
    from_type=GeneIDType.GENE_SYMBOL,
    to_type=GeneIDType.ENSEMBL_GENE_ID,
)
print(result)
#   external_gene_name    ensembl_gene_id
# 0               TP53  ENSG00000141510
# 1              BRCA1  ENSG00000012048
# 2               EGFR  ENSG00000146648

Raw database-native strings still work (backwards compatible):

result = translate_gene_ids(
    ["TP53", "BRCA1"],
    from_type="external_gene_name",   # BioMart native
    to_type="ensembl_gene_id",
    database="biomart",               # native names apply to this backend only
)

Ensembl IDs to HGNC gene symbols (using the Ensembl REST API):

result = translate_gene_ids(
    ["ENSG00000141510", "ENSG00000012048"],
    from_type=GeneIDType.ENSEMBL_GENE_ID,
    to_type=GeneIDType.GENE_SYMBOL,
    database="ensembl",
)

Multiple target types (BioMart):

result = translate_gene_ids(
    ["TP53", "BRCA1"],
    from_type=GeneIDType.GENE_SYMBOL,
    to_type=[GeneIDType.ENSEMBL_GENE_ID, GeneIDType.ENTREZ_ID],
    database="biomart",
)
print(result)
#   external_gene_name    ensembl_gene_id  entrezgene_id
# 0               TP53  ENSG00000141510           7157
# 1              BRCA1  ENSG00000012048            672

translate_gene_ids_kegg

translate_gene_ids_kegg(
    ids: List[str], from_db: str, to_db: str
) -> DataFrame

Translate gene IDs using the KEGG database.

Useful for converting between KEGG gene IDs and external databases.

Supported databases
  • KEGG organism codes: "hsa" (human), "mmu" (mouse), "rno" (rat), etc.
  • ncbi-geneid: NCBI Entrez Gene ID
  • ncbi-proteinid: NCBI Protein ID
  • uniprot: UniProt accession

Parameters:

Name Type Description Default
ids List[str]

List of gene IDs to translate (e.g., ["hsa:7157", "hsa:672"]). Pass an empty list to convert all genes of the organism given in from_db.

required
from_db str

Source database. Use KEGG entry IDs or external DB name.

required
to_db str

Target database name.

required

Returns:

Type Description
DataFrame

DataFrame with source and target ID columns.

Example

KEGG gene IDs to NCBI Entrez:

result = translate_gene_ids_kegg(
    ["hsa:7157", "hsa:672"],
    from_db="hsa",
    to_db="ncbi-geneid",
)
print(result)
#       source           target
# 0  hsa:7157  ncbi-geneid:7157
# 1   hsa:672   ncbi-geneid:672

Convert entire organism's genes to UniProt:

result = translate_gene_ids_kegg([], from_db="hsa", to_db="uniprot")
print(result.head())
#   source     target
# 0  hsa:1  up:P04217
# 1  hsa:2  up:P01023
# ...
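
The target values in the outputs above carry a database prefix (for example "ncbi-geneid:7157" or "up:P04217"). When only the bare identifier is needed, the prefix can be split off locally. A minimal sketch, assuming every prefixed value uses a single colon as the separator; split_kegg_id is a hypothetical helper and not part of biodbs.

```python
from typing import List, Tuple


def split_kegg_id(entry: str) -> Tuple[str, str]:
    """Split 'prefix:identifier' into (prefix, identifier)."""
    prefix, sep, identifier = entry.partition(":")
    if not sep:
        # No colon present: treat the whole string as a bare identifier.
        return "", entry
    return prefix, identifier


# Values copied from the example outputs on this page.
targets: List[str] = ["ncbi-geneid:7157", "ncbi-geneid:672", "up:P04217"]
bare_ids = [split_kegg_id(target)[1] for target in targets]

print(bare_ids)
# ['7157', '672', 'P04217']
```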

Chemical Translation

translate_chemical_ids

translate_chemical_ids(
    ids: List[str],
    from_type: str,
    to_type: Union[str, List[str]],
    return_dict: bool = False,
) -> Union[
    Dict[str, str], Dict[str, Dict[str, str]], DataFrame
]

Translate chemical/compound IDs between different identifier types.

Uses PubChem for ID conversion.

Supported ID types
  • cid: PubChem Compound ID
  • name: Compound name
  • smiles: SMILES string (canonical)
  • inchikey: InChIKey
  • inchi: InChI string
  • formula: Molecular formula

Parameters:

Name Type Description Default
ids List[str]

List of compound identifiers to translate.

required
from_type str

Source ID type ("cid", "name", "smiles", "inchikey").

required
to_type Union[str, List[str]]

Target ID type(s). Can be a single string or a list of strings. When a list is provided, multiple target IDs are returned. Valid types: "cid", "name", "smiles", "inchikey", "inchi", "formula".

required
return_dict bool

If True, return dict mapping from_id -> to_id (or dict of to_ids when to_type is a list).

False

Returns:

Type Description
Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]

When to_type is a string: Dict or DataFrame with translated IDs.

Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]

When to_type is a list: Dict mapping source IDs to dicts of {target_type: target_id}, or DataFrame with from_type column and one column per target type.

Example

Names to CIDs:

result = translate_chemical_ids(
    ["aspirin", "ibuprofen"],
    from_type="name",
    to_type="cid",
)
print(result)
#         name   cid
# 0    aspirin  2244
# 1  ibuprofen  3672

CIDs to SMILES:

result = translate_chemical_ids(
    ["2244", "3672"],
    from_type="cid",
    to_type="smiles",
    return_dict=True,
)
print(result)
# {'2244': 'CC(=O)OC1=CC=CC=C1C(=O)O', '3672': 'CC(C)CC1=CC=C(C=C1)C(C)C(=O)O'}

Multiple target types:

result = translate_chemical_ids(
    ["aspirin"],
    from_type="name",
    to_type=["cid", "smiles", "inchikey"],
)
print(result)
#      name   cid                      smiles                    inchikey
# 0  aspirin  2244  CC(=O)OC1=CC=CC=C1C(=O)O  BSYNRYMUTXBXSQ-UHFFFAOYSA-N
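
Since to_type may be a single string or a list, calling code that wraps this function often normalises the argument and checks it against the supported types before making a request. The sketch below shows one way to do that. normalize_targets and SUPPORTED_TYPES are hypothetical names; the set contents come from the "Supported ID types" list for this function, and the ValueError is an assumption of the sketch rather than documented library behaviour.

```python
from typing import List, Union

# ID types listed under "Supported ID types" for translate_chemical_ids.
SUPPORTED_TYPES = {"cid", "name", "smiles", "inchikey", "inchi", "formula"}


def normalize_targets(to_type: Union[str, List[str]]) -> List[str]:
    """Return to_type as a list, rejecting names outside SUPPORTED_TYPES."""
    targets = [to_type] if isinstance(to_type, str) else list(to_type)
    unknown = [target for target in targets if target not in SUPPORTED_TYPES]
    if unknown:
        raise ValueError(f"Unsupported target type(s): {unknown}")
    return targets


print(normalize_targets("cid"))
print(normalize_targets(["cid", "smiles", "inchikey"]))
```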

translate_chemical_ids_kegg

translate_chemical_ids_kegg(
    ids: List[str], from_db: str, to_db: str
) -> DataFrame

Translate chemical/compound IDs using the KEGG database.

Useful for converting between KEGG compound/drug IDs and external databases.

Supported databases
  • compound: KEGG Compound
  • drug: KEGG Drug
  • pubchem: PubChem CID
  • chebi: ChEBI ID

Parameters:

Name Type Description Default
ids List[str]

List of compound IDs to translate (e.g., ["cpd:C00022", "dr:D00001"]). If empty, converts entire database.

required
from_db str

Source database (compound, drug, or entries).

required
to_db str

Target database name.

required

Returns:

Type Description
DataFrame

DataFrame with source and target ID columns.

Example

KEGG compound to PubChem:

result = translate_chemical_ids_kegg(
    ["cpd:C00022", "cpd:C00031"],
    from_db="compound",
    to_db="pubchem",
)
print(result)
#         source          target
# 0  cpd:C00022  pubchem:3324
# 1  cpd:C00031  pubchem:5793

translate_chembl_to_pubchem

translate_chembl_to_pubchem(
    chembl_ids: List[str], return_dict: bool = False
) -> Union[Dict[str, int], DataFrame]

Translate ChEMBL molecule IDs to PubChem CIDs.

Parameters:

Name Type Description Default
chembl_ids List[str]

List of ChEMBL IDs (e.g., ["CHEMBL25", "CHEMBL1201585"]).

required
return_dict bool

If True, return dict mapping ChEMBL ID -> PubChem CID.

False

Returns:

Type Description
Union[Dict[str, int], DataFrame]

Dict or DataFrame with ChEMBL IDs and corresponding PubChem CIDs.

Example
result = translate_chembl_to_pubchem(["CHEMBL25", "CHEMBL1201585"])
print(result)
#       chembl_id  pubchem_cid
# 0       CHEMBL25         2244
# 1  CHEMBL1201585      5284616

translate_pubchem_to_chembl

translate_pubchem_to_chembl(
    cids: List[int], return_dict: bool = False
) -> Union[Dict[int, str], DataFrame]

Translate PubChem CIDs to ChEMBL molecule IDs.

Parameters:

Name Type Description Default
cids List[int]

List of PubChem CIDs (e.g., [2244, 3672]).

required
return_dict bool

If True, return dict mapping CID -> ChEMBL ID.

False

Returns:

Type Description
Union[Dict[int, str], DataFrame]

Dict or DataFrame with PubChem CIDs and corresponding ChEMBL IDs.

Example
result = translate_pubchem_to_chembl([2244, 3672])
print(result)
#    pubchem_cid    chembl_id
# 0         2244     CHEMBL25
# 1         3672    CHEMBL521
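
When a ChEMBL-to-CID dict is already in hand, the reverse lookup can be built locally instead of calling the service again. A minimal sketch under that assumption follows; the dict literal copies the values from the translate_chembl_to_pubchem example, and invert_mapping is a hypothetical helper. It raises if two ChEMBL IDs share a CID, because a plain inversion would silently drop one of them.

```python
from typing import Dict

# Literal copy of the mapping shown in the translate_chembl_to_pubchem example.
chembl_to_cid: Dict[str, int] = {"CHEMBL25": 2244, "CHEMBL1201585": 5284616}


def invert_mapping(mapping: Dict[str, int]) -> Dict[int, str]:
    """Invert a ChEMBL -> CID dict, refusing ambiguous (many-to-one) input."""
    inverted: Dict[int, str] = {}
    for chembl_id, cid in mapping.items():
        if cid in inverted:
            raise ValueError(
                f"CID {cid} maps to both {inverted[cid]} and {chembl_id}"
            )
        inverted[cid] = chembl_id
    return inverted


cid_to_chembl = invert_mapping(chembl_to_cid)
print(cid_to_chembl)
# {2244: 'CHEMBL25', 5284616: 'CHEMBL1201585'}
```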

Protein Translation

translate_protein_ids

translate_protein_ids(
    ids: List[str],
    from_type: str,
    to_type: Union[str, List[str]],
    organism: int = 9606,
    return_dict: bool = False,
) -> Union[
    Dict[str, str], Dict[str, Dict[str, str]], DataFrame
]

Translate protein/gene IDs using the UniProt ID mapping service.

This function provides comprehensive ID translation between various biological databases using the UniProt ID mapping API.

Parameters:

Name Type Description Default
ids List[str]

List of IDs to translate.

required
from_type str

Source ID type. Common options:

  • "UniProtKB_AC-ID": UniProt accession (e.g., "P04637")
  • "Gene_Name": Gene name/symbol (e.g., "TP53")
  • "GeneID": NCBI Gene ID (e.g., "7157")
  • "Ensembl": Ensembl gene ID (e.g., "ENSG00000141510")
  • "RefSeq_Protein": RefSeq protein ID
  • "PDB": PDB structure ID

required
to_type Union[str, List[str]]

Target ID type(s). Can be a single string or a list of strings. When a list is provided, multiple target IDs are returned. Common options:

  • "UniProtKB": UniProt entry (returns accession)
  • "UniProtKB_AC-ID": UniProt accession
  • "GeneID": NCBI Gene ID
  • "Ensembl": Ensembl gene ID
  • "Ensembl_Protein": Ensembl protein ID
  • "RefSeq_Protein": RefSeq protein ID
  • "PDB": PDB structure ID
  • "STRING": STRING database ID
  • "ChEMBL": ChEMBL target ID

required
organism int

NCBI taxonomy ID (default: 9606 for human). Only used for Gene_Name -> UniProt mapping.

9606
return_dict bool

If True, return dict mapping from_id -> to_id (or dict of to_ids when to_type is a list). If False, return DataFrame.

False

Returns:

Type Description
Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]

When to_type is a string: Dict mapping source IDs to target IDs, or DataFrame with mapping.

Union[Dict[str, str], Dict[str, Dict[str, str]], DataFrame]

When to_type is a list: Dict mapping source IDs to dicts of {target_type: target_id}, or DataFrame with from column and one column per target type.

Example

UniProt to NCBI Gene ID:

result = translate_protein_ids(
    ["P04637", "P00533"],
    from_type="UniProtKB_AC-ID",
    to_type="GeneID",
)
print(result)
#      from     to
# 0  P04637   7157
# 1  P00533   1956

Gene names to UniProt:

result = translate_protein_ids(
    ["TP53", "EGFR"],
    from_type="Gene_Name",
    to_type="UniProtKB",
)
print(result)
#    from       to
# 0  TP53  P04637
# 1  EGFR  P00533

Multiple target types:

result = translate_protein_ids(
    ["P04637", "P00533"],
    from_type="UniProtKB_AC-ID",
    to_type=["GeneID", "Ensembl", "Gene_Name"],
)
print(result)
#      from  GeneID           Ensembl Gene_Name
# 0  P04637    7157  ENSG00000141510      TP53
# 1  P00533    1956  ENSG00000146648      EGFR

translate_gene_to_uniprot

translate_gene_to_uniprot(
    gene_names: List[str],
    organism: int = 9606,
    reviewed_only: bool = True,
    return_dict: bool = True,
) -> Union[Dict[str, str], DataFrame]

Translate gene names/symbols to UniProt accessions.

This is a convenience function for the common use case of mapping gene symbols to their canonical UniProt protein accessions.

Parameters:

Name Type Description Default
gene_names List[str]

List of gene names/symbols (e.g., ["TP53", "BRCA1"]).

required
organism int

NCBI taxonomy ID (default: 9606 for human).

9606
reviewed_only bool

Only return reviewed (Swiss-Prot) entries.

True
return_dict bool

If True, return dict. If False, return DataFrame.

True

Returns:

Type Description
Union[Dict[str, str], DataFrame]

Dict or DataFrame mapping gene names to UniProt accessions.

Example
mapping = translate_gene_to_uniprot(["TP53", "BRCA1", "EGFR"])
print(mapping)
# {'TP53': 'P04637', 'BRCA1': 'P38398', 'EGFR': 'P00533'}
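
After a batch translation it is usually worth checking which inputs came back without a result. The sketch below does this for a dict result. It assumes that an unmapped gene is either absent from the dict or mapped to an empty value; this page does not specify which, so the check covers both. find_unmapped is a hypothetical helper and "NOTAGENE" is a made-up placeholder symbol.

```python
from typing import Dict, List, Optional

# Mapping copied from the example output, plus one requested symbol
# ("NOTAGENE", a made-up placeholder) that has no entry.
mapping: Dict[str, Optional[str]] = {
    "TP53": "P04637",
    "BRCA1": "P38398",
    "EGFR": "P00533",
}
requested: List[str] = ["TP53", "BRCA1", "EGFR", "NOTAGENE"]


def find_unmapped(
    requested: List[str], mapping: Dict[str, Optional[str]]
) -> List[str]:
    """Return requested IDs that are missing or mapped to an empty value."""
    return [gene for gene in requested if not mapping.get(gene)]


print(find_unmapped(requested, mapping))
# ['NOTAGENE']
```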

translate_uniprot_to_gene

translate_uniprot_to_gene(
    accessions: List[str], return_dict: bool = True
) -> Union[Dict[str, str], DataFrame]

Translate UniProt accessions to gene names/symbols.

Parameters:

Name Type Description Default
accessions List[str]

List of UniProt accessions (e.g., ["P04637", "P00533"]).

required
return_dict bool

If True, return dict. If False, return DataFrame.

True

Returns:

Type Description
Union[Dict[str, str], DataFrame]

Dict or DataFrame mapping UniProt accessions to gene names.

Example
mapping = translate_uniprot_to_gene(["P04637", "P00533"])
print(mapping)
# {'P04637': 'TP53', 'P00533': 'EGFR'}

translate_uniprot_to_pdb

translate_uniprot_to_pdb(
    accessions: List[str], return_dict: bool = True
) -> Union[Dict[str, List[str]], DataFrame]

Translate UniProt accessions to PDB structure IDs.

Note: One protein may have multiple PDB structures.

Parameters:

Name Type Description Default
accessions List[str]

List of UniProt accessions.

required
return_dict bool

If True, return dict. If False, return DataFrame.

True

Returns:

Type Description
Union[Dict[str, List[str]], DataFrame]

Dict mapping accessions to lists of PDB IDs, or DataFrame.

Example
result = translate_uniprot_to_pdb(["P04637"])
print(result)
# {'P04637': ['1A1U', '1AIE', '1C26', '1DT7', ...]}
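
Because one accession can map to many structures, the dict form holds lists. For joins or tabular export it is often easier to work with one (accession, PDB ID) pair per row. A minimal sketch of that flattening step follows; flatten_pairs is a hypothetical helper, and the input contains only the four PDB IDs visible in the truncated example output.

```python
from typing import Dict, List, Tuple

# Only the four PDB IDs shown in the (truncated) example output.
pdb_map: Dict[str, List[str]] = {"P04637": ["1A1U", "1AIE", "1C26", "1DT7"]}


def flatten_pairs(mapping: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Expand a one-to-many dict into (accession, pdb_id) pairs."""
    return [
        (accession, pdb_id)
        for accession, pdb_ids in mapping.items()
        for pdb_id in pdb_ids
    ]


pairs = flatten_pairs(pdb_map)
print(pairs)
# [('P04637', '1A1U'), ('P04637', '1AIE'), ('P04637', '1C26'), ('P04637', '1DT7')]
```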

translate_uniprot_to_ensembl

translate_uniprot_to_ensembl(
    accessions: List[str], return_dict: bool = True
) -> Union[Dict[str, str], DataFrame]

Translate UniProt accessions to Ensembl gene IDs.

Parameters:

Name Type Description Default
accessions List[str]

List of UniProt accessions.

required
return_dict bool

If True, return dict. If False, return DataFrame.

True

Returns:

Type Description
Union[Dict[str, str], DataFrame]

Dict or DataFrame mapping UniProt accessions to Ensembl IDs.

Example
result = translate_uniprot_to_ensembl(["P04637", "P00533"])
print(result)
# {'P04637': 'ENSG00000141510', 'P00533': 'ENSG00000146648'}

translate_uniprot_to_refseq

translate_uniprot_to_refseq(
    accessions: List[str], return_dict: bool = True
) -> Union[Dict[str, List[str]], DataFrame]

Translate UniProt accessions to RefSeq protein IDs.

Parameters:

Name Type Description Default
accessions List[str]

List of UniProt accessions.

required
return_dict bool

If True, return dict. If False, return DataFrame.

True

Returns:

Type Description
Union[Dict[str, List[str]], DataFrame]

Dict mapping accessions to lists of RefSeq IDs, or DataFrame.

Example
result = translate_uniprot_to_refseq(["P04637"])
print(result)
# {'P04637': ['NP_000537.3', 'NP_001119584.1', ...]}

ID Type Reference

Universal Gene ID Aliases

Use these values for from_type / to_type in translate_gene_ids. The correct database-native name is resolved automatically per backend.

Universal alias | BioMart | NCBI | UniProt | Ensembl REST | HGNC
gene_symbol | external_gene_name | symbol | Gene_Name | HGNC | symbol
ensembl_gene_id | ensembl_gene_id | ensembl_gene_id | Ensembl | ensembl_gene_id | ensembl_gene_id
ensembl_transcript_id | ensembl_transcript_id | - | - | - | -
ensembl_protein_id | ensembl_peptide_id | - | - | - | -
entrez_id | entrezgene_id | gene_id | GeneID | EntrezGene | entrez_id
hgnc_id | hgnc_id | - | - | - | hgnc_id
hgnc_symbol | hgnc_symbol | - | - | - | symbol
uniprot_id | uniprot_gn_id | uniprot | UniProtKB_AC-ID | Uniprot_gn | uniprot_ids
refseq_mrna | refseq_mrna | refseq_accession | - | RefSeq_mRNA | refseq_accession
refseq_protein | refseq_peptide | refseq_accession | RefSeq_Protein | RefSeq_peptide | refseq_accession
pdb_id | - | - | PDB | - | -

Native database strings (e.g. "external_gene_name") are also accepted and passed through unchanged.
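
The lookup described here can be pictured as a two-level dictionary: universal alias first, backend second, with unknown names passed through untouched. The sketch below encodes three rows of the table this way. ALIASES and resolve_id_type are hypothetical names, and raising ValueError when an alias has no equivalent for the chosen backend is an assumption of the sketch, since this page does not say how the library handles that case.

```python
from typing import Dict

# Three rows of the alias table, keyed by universal alias and then backend.
ALIASES: Dict[str, Dict[str, str]] = {
    "gene_symbol": {
        "biomart": "external_gene_name",
        "ncbi": "symbol",
        "uniprot": "Gene_Name",
        "ensembl": "HGNC",
        "hgnc": "symbol",
    },
    "entrez_id": {
        "biomart": "entrezgene_id",
        "ncbi": "gene_id",
        "uniprot": "GeneID",
        "ensembl": "EntrezGene",
        "hgnc": "entrez_id",
    },
    "pdb_id": {"uniprot": "PDB"},
}


def resolve_id_type(id_type: str, backend: str) -> str:
    """Map a universal alias to the backend-native name."""
    per_backend = ALIASES.get(id_type)
    if per_backend is None:
        # Not a known alias: treat it as a native string and pass it through.
        return id_type
    if backend not in per_backend:
        raise ValueError(f"{id_type!r} has no equivalent for backend {backend!r}")
    return per_backend[backend]


print(resolve_id_type("gene_symbol", "biomart"))
print(resolve_id_type("entrez_id", "uniprot"))
print(resolve_id_type("external_gene_name", "biomart"))
```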

Protein ID Types (UniProt mapping)

ID Type Description
UniProtKB_AC-ID UniProt accession
Gene_Name Gene symbol
GeneID NCBI Gene ID
Ensembl Ensembl gene ID
RefSeq_Protein RefSeq protein ID
PDB PDB structure ID

Chemical ID Types (PubChem)

ID Type Description
name Compound name
cid PubChem CID
smiles SMILES string
inchikey InChIKey
inchi InChI string
formula Molecular formula