
Analysis Module API Reference

Complete reference for biodbs.analysis module.

Summary

Classes

Class Description
ORAResult Container for over-representation analysis results
ORATermResult Single term result from ORA
Pathway Represents a biological pathway or gene set

Enums

Enum Description
Species Supported species for ORA (human, mouse, rat, etc.)
GOAspect Gene Ontology aspects (BP, MF, CC)
CorrectionMethod Multiple testing correction methods (FDR, Bonferroni)
TranslationDatabase Databases for automatic ID translation
PathwayDatabase Pathway database sources (KEGG, GO, EnrichR, Reactome)

Core ORA Functions

Function Description
ora Generic ORA against any pathway database
ora_kegg ORA against KEGG pathways
ora_go ORA against Gene Ontology terms
ora_reactome ORA against Reactome pathways (via API)
ora_reactome_local ORA against Reactome pathways (local calculation)
ora_enrichr ORA via EnrichR web service

Utility Functions

Function Description
hypergeometric_test Compute hypergeometric p-value
multiple_test_correction Apply multiple testing correction

GMT Functions

Function Description
load_gmt Load a GMT file into a dict of Pathway objects
save_gmt Write Pathway objects to a GMT file
fetch_gmt Fetch gene sets from KEGG, GO, Reactome, or EnrichR as Pathway objects, optionally saving them as GMT

Enums

Species

Supported species for ORA. Each member contains: (taxon_id, common_name, kegg_code, scientific_name).

| Member | Taxon ID | KEGG Code | Scientific Name |
|---|---|---|---|
| HUMAN | 9606 | hsa | Homo sapiens |
| MOUSE | 10090 | mmu | Mus musculus |
| RAT | 10116 | rno | Rattus norvegicus |
| ZEBRAFISH | 7955 | dre | Danio rerio |
| FLY | 7227 | dme | Drosophila melanogaster |
| WORM | 6239 | cel | Caenorhabditis elegans |
| YEAST | 559292 | sce | Saccharomyces cerevisiae |

Species

Species(
    taxon_id: int,
    common_name: str,
    kegg_code: str,
    scientific_name: str,
)

Bases: Enum

Species with their NCBI taxon IDs and common names.

Each member carries four pieces of metadata so that any naming convention (common name, KEGG code, taxon ID, scientific name) can be used interchangeably everywhere in biodbs.

Attributes:

Name Type Description
taxon_id int

NCBI taxonomy ID.

common_name str

Lower-case common name (e.g. "human").

kegg_code str

Three-letter KEGG organism code (e.g. "hsa").

scientific_name str

Binomial scientific name (e.g. "Homo sapiens").

Examples:

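>>> from biodbs import Species
>>> from biodbs.translate import translate_gene_ids
>>> from biodbs.analysis import ora_kegg, ora_go
>>> genes = ["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"]
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",
...                    to_type="ensembl_gene_id",
...                    species=Species.HUMAN)
>>> ora_kegg(genes, species=Species.HUMAN, from_id_type="symbol")
>>> ora_go(genes, species=Species.HUMAN, from_id_type="symbol")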
Source code in biodbs/_funcs/_species.py
def __init__(
    self,
    taxon_id: int,
    common_name: str,
    kegg_code: str,
    scientific_name: str,
):
    self.taxon_id = taxon_id
    self.common_name = common_name
    self.kegg_code = kegg_code
    self.scientific_name = scientific_name

from_taxon_id classmethod

from_taxon_id(taxon_id: int) -> Species

Look up a Species by its NCBI taxonomy ID.

Raises:

Type Description
ValueError

If taxon_id is not in the supported set.

Source code in biodbs/_funcs/_species.py
@classmethod
def from_taxon_id(cls, taxon_id: int) -> "Species":
    """Look up a Species by its NCBI taxonomy ID.

    Raises:
        ValueError: If *taxon_id* is not in the supported set.
    """
    for sp in cls:
        if sp.taxon_id == taxon_id:
            return sp
    raise ValueError(
        f"Unknown taxon ID: {taxon_id}. "
        f"Supported: {', '.join(f'{s.name}={s.taxon_id}' for s in cls)}"
    )

from_kegg_code classmethod

from_kegg_code(kegg_code: str) -> Species

Look up a Species by its KEGG three-letter organism code.

Raises:

Type Description
ValueError

If kegg_code is not recognised.

Source code in biodbs/_funcs/_species.py
@classmethod
def from_kegg_code(cls, kegg_code: str) -> "Species":
    """Look up a Species by its KEGG three-letter organism code.

    Raises:
        ValueError: If *kegg_code* is not recognised.
    """
    for sp in cls:
        if sp.kegg_code == kegg_code:
            return sp
    raise ValueError(
        f"Unknown KEGG code: {kegg_code!r}. "
        f"Supported: {', '.join(s.kegg_code for s in cls)}"
    )

from_name classmethod

from_name(name: str) -> Species

Look up a Species from any of its names.

Accepts the common name ("human"), scientific name ("Homo sapiens"), KEGG code ("hsa"), or the enum member name ("HUMAN"), all case-insensitive.

Raises:

Type Description
ValueError

If name does not match any known species.

Source code in biodbs/_funcs/_species.py
@classmethod
def from_name(cls, name: str) -> "Species":
    """Look up a Species from any of its names.

    Accepts the common name (``"human"``), scientific name
    (``"Homo sapiens"``), KEGG code (``"hsa"``), or the enum
    member name (``"HUMAN"``), all case-insensitive.

    Raises:
        ValueError: If *name* does not match any known species.
    """
    name_lower = name.lower().strip()
    for sp in cls:
        if name_lower in (
            sp.common_name.lower(),
            sp.scientific_name.lower(),
            sp.kegg_code.lower(),
            sp.name.lower(),
        ):
            return sp
    raise ValueError(
        f"Unknown species: {name!r}. "
        f"Supported: {', '.join(s.common_name for s in cls)}"
    )
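The lookup pattern in the three classmethods above can be tried without installing biodbs. The following is a minimal sketch, assuming a hypothetical two-member `MiniSpecies` enum and a hypothetical `resolve` helper (neither is part of the library); it mirrors the documented behaviour that an enum member, a name in any supported form, or a taxon ID all resolve to the same member.

```python
from enum import Enum
from typing import Union


class MiniSpecies(Enum):
    """Hypothetical two-member stand-in for biodbs.Species (illustration only)."""

    HUMAN = (9606, "human", "hsa", "Homo sapiens")
    MOUSE = (10090, "mouse", "mmu", "Mus musculus")

    def __init__(self, taxon_id, common_name, kegg_code, scientific_name):
        # Enum unpacks each member's tuple value into these arguments.
        self.taxon_id = taxon_id
        self.common_name = common_name
        self.kegg_code = kegg_code
        self.scientific_name = scientific_name

    @classmethod
    def from_taxon_id(cls, taxon_id: int) -> "MiniSpecies":
        for sp in cls:
            if sp.taxon_id == taxon_id:
                return sp
        raise ValueError(f"Unknown taxon ID: {taxon_id}")

    @classmethod
    def from_name(cls, name: str) -> "MiniSpecies":
        # Case-insensitive match against every naming convention.
        key = name.lower().strip()
        for sp in cls:
            if key in (
                sp.common_name.lower(),
                sp.scientific_name.lower(),
                sp.kegg_code.lower(),
                sp.name.lower(),
            ):
                return sp
        raise ValueError(f"Unknown species: {name!r}")


def resolve(value: Union[MiniSpecies, str, int]) -> MiniSpecies:
    """Hypothetical dispatcher: member passes through, int is a taxon ID, str is a name."""
    if isinstance(value, MiniSpecies):
        return value
    if isinstance(value, int):
        return MiniSpecies.from_taxon_id(value)
    return MiniSpecies.from_name(value)
```

Note that, in the source shown above, from_kegg_code compares codes exactly, whereas from_name lower-cases its input first; the sketch routes all strings through from_name for that reason.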

GOAspect

Gene Ontology aspects for filtering GO terms.

Member Value Description
BIOLOGICAL_PROCESS "biological_process" BP - Biological processes
MOLECULAR_FUNCTION "molecular_function" MF - Molecular functions
CELLULAR_COMPONENT "cellular_component" CC - Cellular components
ALL "all" All GO aspects

GOAspect

Bases: str, Enum

Gene Ontology aspects.

CorrectionMethod

Multiple testing correction methods.

Member Value Description
BONFERRONI "bonferroni" Bonferroni correction (conservative)
BH "benjamini_hochberg" Benjamini-Hochberg FDR (recommended)
BY "benjamini_yekutieli" Benjamini-Yekutieli FDR
HOLM "holm" Holm-Bonferroni method
NONE "none" No correction

CorrectionMethod

Bases: str, Enum

Multiple testing correction methods.
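Because these enums derive from both str and Enum, a member compares equal to its string value, which is why the functions on this page can accept either form. A minimal sketch, assuming a hypothetical `Correction` stand-in whose values are copied from the table above and a hypothetical `normalise` helper; whether the library also accepts aliases such as "fdr" is not stated here, so the sketch treats them as invalid.

```python
from enum import Enum


class Correction(str, Enum):
    """Hypothetical stand-in; values copied from the CorrectionMethod table."""

    BONFERRONI = "bonferroni"
    BH = "benjamini_hochberg"
    HOLM = "holm"


def normalise(method):
    # Calling the enum class performs a lookup by value, so both a member
    # and its raw string resolve to the same member; unknown strings raise
    # ValueError.
    return Correction(method)
```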

TranslationDatabase

Databases for automatic ID translation.

| Member | Value | Description |
|---|---|---|
| NCBI | "ncbi" | NCBI Datasets API; default for translate_gene_ids (the ORA functions on this page default to BIOMART); best for symbol ↔ Entrez ↔ Ensembl |
| ENSEMBL | "ensembl" | Ensembl REST xrefs; natural choice for Ensembl IDs |
| UNIPROT | "uniprot" | UniProt ID mapping; best for protein-centric translations |
| BIOMART | "biomart" | BioMart; widest ID type range, but less reliable |
| HGNC | "hgnc" | HGNC REST API; authoritative for human nomenclature (human only) |

TranslationDatabase

Bases: str, Enum

Databases available for gene ID translation.

Use these as the database parameter in translate_gene_ids and the translation_database parameter in ORA functions.

Members:

  • NCBI: NCBI Datasets API. Most stable; best for symbol ↔ Entrez ↔ Ensembl translations. Default for translate_gene_ids.
  • ENSEMBL: Ensembl REST API (xrefs endpoint). More stable than BioMart; natural choice when working with Ensembl IDs.
  • UNIPROT: UniProt ID-mapping API. Best for protein-centric translations (UniProt accession, PDB, RefSeq protein).
  • BIOMART: BioMart / Ensembl query interface. Supports the widest range of ID types but is less reliable than the other options.
  • HGNC: HGNC REST API. Authoritative for human gene nomenclature; best for translations involving HGNC IDs, approved symbols, aliases, and previous symbols. Human only.

Examples:

>>> from biodbs.translate import TranslationDatabase, translate_gene_ids
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",
...                    to_type="ensembl_gene_id",
...                    database=TranslationDatabase.NCBI)
>>> # Raw strings still work for backwards compatibility
>>> translate_gene_ids(["TP53"], from_type="gene_symbol",
...                    to_type="ensembl_gene_id",
...                    database="ncbi")

PathwayDatabase

Pathway database sources.

Member Value Description
KEGG "kegg" KEGG pathways
GO "go" Gene Ontology terms
ENRICHR "enrichr" EnrichR libraries
REACTOME "reactome" Reactome pathways

PathwayDatabase

Bases: str, Enum

Supported pathway databases.


Result Classes

ORAResult

ORAResult dataclass

ORAResult(
    results: List[ORATermResult],
    query_genes: List[str],
    mapped_genes: List[str],
    unmapped_genes: List[str],
    background_size: int,
    database: str,
    parameters: Dict[str, Any] = dict(),
)

Result container for over-representation analysis.

significant_terms

significant_terms(
    p_threshold: float = 0.05, use_adjusted: bool = True
) -> "ORAResult"

Filter to only significant terms.

top_terms

top_terms(n: int = 10) -> 'ORAResult'

Get top N terms by adjusted p-value.

as_dataframe

as_dataframe(
    engine: Literal["pandas", "polars"] = "pandas",
) -> "pd.DataFrame"

Convert results to a DataFrame using the selected engine ("pandas" by default).

summary

summary() -> str

Get a text summary of the results.
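The filtering and ranking that significant_terms and top_terms describe can be sketched on plain objects. This is a minimal illustration, assuming a hypothetical trimmed-down `Term` record and hypothetical helper names; the inclusive `<=` comparison is an assumption taken from the "adj.p <= 0.05" wording in the summary output shown elsewhere on this page, not from the library source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Term:
    """Hypothetical, trimmed-down stand-in for ORATermResult."""

    term_id: str
    p_value: float
    adjusted_p_value: float


def significant(terms: List[Term], p_threshold: float = 0.05,
                use_adjusted: bool = True) -> List[Term]:
    # Pick which p-value to compare, then keep terms at or below the threshold.
    def pick(t: Term) -> float:
        return t.adjusted_p_value if use_adjusted else t.p_value

    return [t for t in terms if pick(t) <= p_threshold]


def top(terms: List[Term], n: int = 10) -> List[Term]:
    # Rank by adjusted p-value, smallest first, and keep the first n.
    return sorted(terms, key=lambda t: t.adjusted_p_value)[:n]
```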

ORATermResult

ORATermResult dataclass

ORATermResult(
    term_id: str,
    term_name: str,
    p_value: float,
    adjusted_p_value: float,
    overlap_count: int,
    term_size: int,
    query_size: int,
    background_size: int,
    fold_enrichment: float,
    overlap_genes: List[str],
    database: str,
)

Result for a single term/pathway in ORA.

odds_ratio property

odds_ratio: float

Calculate odds ratio for enrichment.
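This page does not state the formula behind odds_ratio or fold_enrichment. The sketch below shows one common textbook definition based on the 2×2 contingency table; it is an assumption, and the library may differ (for example by applying a continuity correction). The arguments k, K, n, N correspond to overlap_count, term_size, query_size, and background_size.

```python
def contingency(k: int, K: int, n: int, N: int):
    """Cells of the 2x2 table for one term (hypothetical helper)."""
    a = k              # in query, in term
    b = n - k          # in query, not in term
    c = K - k          # not in query, in term
    d = N - K - n + k  # not in query, not in term
    return a, b, c, d


def odds_ratio(k: int, K: int, n: int, N: int) -> float:
    a, b, c, d = contingency(k, K, n, N)
    if b == 0 or c == 0:
        # Division by zero: report an infinite odds ratio in this sketch.
        return float("inf")
    return (a * d) / (b * c)


def fold_enrichment(k: int, K: int, n: int, N: int) -> float:
    # Observed overlap fraction (k/n) divided by the expected fraction (K/N).
    return (k * N) / (n * K)
```

For example, with 5 of 20 query genes falling in a 50-gene term against a 1000-gene background, the cells are (5, 15, 45, 935).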

to_dict

to_dict() -> Dict[str, Any]

Convert to dictionary.

Pathway

Pathway dataclass

Pathway(
    id: str,
    name: str,
    genes: FrozenSet[str],
    database: str,
    species: Optional[str] = None,
    url: Optional[str] = None,
)

A biological pathway or gene set.

Attributes:

Name Type Description
id str

Unique pathway identifier (e.g., "hsa04110", "R-HSA-69278", "GO:0006915")

name str

Human-readable pathway name

genes FrozenSet[str]

Set of gene identifiers in this pathway

database str

Source database (KEGG, Reactome, GO, etc.)

species Optional[str]

Species this pathway belongs to (optional)

url Optional[str]

URL to pathway page (optional)

overlap

overlap(gene_list: Set[str]) -> Set[str]

Get genes that overlap with a query gene list.

to_tuple

to_tuple() -> Tuple[str, Set[str]]

Convert to legacy tuple format (name, genes).

from_tuple classmethod

from_tuple(
    pathway_id: str,
    data: Tuple[str, Set[str]],
    database: str,
) -> "Pathway"

Create Pathway from legacy tuple format.
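The relationship between the Pathway fields and the legacy (name, genes) tuple can be sketched with a stand-in dataclass. `MiniPathway` below is hypothetical and only mirrors the signatures listed above; whether the real class is frozen, and how it stores genes internally, are assumptions. The gene symbols are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class MiniPathway:
    """Hypothetical stand-in mirroring the documented Pathway fields."""

    id: str
    name: str
    genes: FrozenSet[str]
    database: str
    species: Optional[str] = None
    url: Optional[str] = None

    def overlap(self, gene_list: Set[str]) -> Set[str]:
        # Genes shared between this pathway and the query.
        return set(self.genes) & set(gene_list)

    def to_tuple(self) -> Tuple[str, Set[str]]:
        # Legacy format keeps only the name and a mutable gene set.
        return (self.name, set(self.genes))

    @classmethod
    def from_tuple(cls, pathway_id: str, data: Tuple[str, Set[str]],
                   database: str) -> "MiniPathway":
        name, genes = data
        return cls(id=pathway_id, name=name, genes=frozenset(genes),
                   database=database)


# Round trip between the two representations.
legacy = ("Cell cycle", {"TP53", "ATM", "CDK1"})
pathway = MiniPathway.from_tuple("hsa04110", legacy, "KEGG")
```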


Core ORA Functions

ora

ora(
    genes: List[str],
    gene_sets: Union[
        Dict[str, Tuple[str, Set[str]]],
        Dict[str, Pathway],
        str,
        Path,
    ],
    background: Optional[Set[str]] = None,
    min_overlap: int = 3,
    correction_method: Union[str, CorrectionMethod] = BH,
    database_name: str = "custom",
) -> ORAResult

Perform over-representation analysis with custom gene sets.

Parameters:

Name Type Description Default
genes List[str]

List of query genes.

required
gene_sets Union[Dict[str, Tuple[str, Set[str]]], Dict[str, Pathway], str, Path]

Gene sets to test, one of:

  • Dict[str, Pathway]: as returned by fetch_gmt.
  • Dict[str, Tuple[str, Set[str]]]: legacy tuple format.
  • str or pathlib.Path: path to a .gmt file, which is loaded automatically via load_gmt.
required
background Optional[Set[str]]

Background gene set (universe). If None, uses union of all genes.

None
min_overlap int

Minimum overlap required to test a gene set.

3
correction_method Union[str, CorrectionMethod]

Multiple testing correction method.

BH
database_name str

Name of the database for result annotation.

'custom'

Returns:

Type Description
ORAResult

ORAResult with enrichment results.
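The parameters above suggest a simple per-gene-set loop. The following is a rough sketch of that loop under stated assumptions, not the library's implementation: `mini_ora` and `upper_tail` are hypothetical names, gene sets are plain dict-of-set values, the default background is taken to be the union of all gene-set members, and query genes outside the background are dropped. Multiple testing correction is left out.

```python
from math import comb
from typing import Dict, Iterable, Optional, Set


def upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for a hypergeometric X, using exact integer arithmetic."""
    total = sum(comb(K, i) * comb(N - K, n - i)
                for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


def mini_ora(genes: Iterable[str], gene_sets: Dict[str, Set[str]],
             background: Optional[Set[str]] = None,
             min_overlap: int = 3) -> Dict[str, float]:
    if background is None:
        # Assumed default universe: every gene that appears in any gene set.
        background = set().union(*gene_sets.values())
    query = set(genes) & background
    p_values = {}
    for set_id, members in gene_sets.items():
        members = set(members) & background
        overlap = query & members
        if len(overlap) < min_overlap:
            continue  # too few shared genes: the set is not tested
        p_values[set_id] = upper_tail(len(overlap), len(members),
                                      len(query), len(background))
    return p_values


demo_sets = {"A": {"g1", "g2", "g3", "g4"}, "B": {"g5", "g6", "g7"}}
demo = mini_ora(["g1", "g2", "g3", "unknown"], demo_sets)
```

In the demo, set B shares no genes with the query and is skipped, while set A is tested with k=3, K=4, n=3, N=7.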

ora_kegg

ora_kegg(
    genes: List[str],
    species: Union[Species, str, int] = HUMAN,
    from_id_type: str = "entrez",
    background: Optional[Set[str]] = None,
    min_overlap: int = 3,
    correction_method: Union[str, CorrectionMethod] = BH,
    translation_database: Union[
        str, TranslationDatabase
    ] = BIOMART,
    use_cache: bool = True,
    cache_dir: Optional[str] = None,
    organism: Optional[str] = None,
) -> ORAResult

Perform KEGG pathway over-representation analysis.

Parameters:

Name Type Description Default
genes List[str]

List of query genes.

required
species Union[Species, str, int]

Species to analyse. Accepts a Species member, a common name ("human"), a KEGG code ("hsa"), a scientific name, or an NCBI taxon ID (9606). Defaults to Species.HUMAN.

HUMAN
from_id_type str

Input gene ID type. Automatically translates to Entrez IDs. Supported: "entrez", "symbol", "ensembl", "uniprot", "kegg"

'entrez'
background Optional[Set[str]]

Background gene set. If None, uses all genes in KEGG.

None
min_overlap int

Minimum overlap required to test a pathway.

3
correction_method Union[str, CorrectionMethod]

Multiple testing correction method.

BH
translation_database Union[str, TranslationDatabase]

Database for ID translation: a TranslationDatabase member or its string value (e.g. "biomart", "uniprot", "ncbi").

BIOMART
use_cache bool

Whether to use cached pathway data.

True
cache_dir Optional[str]

Directory for cache files.

None
organism Optional[str]

Deprecated — pass species instead. A raw KEGG organism code (e.g. "hsa") still works via this argument for backwards compatibility but will be removed in a future version.

None

Returns:

Type Description
ORAResult

ORAResult with KEGG pathway enrichment results.

Example
from biodbs import Species
from biodbs.analysis import ora_kegg

genes = ["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"]
# Preferred: use the Species enum
result = ora_kegg(genes, species=Species.HUMAN, from_id_type="symbol")
# Also accepted: KEGG code, common name, or taxon ID
result = ora_kegg(genes, species="hsa",   from_id_type="symbol")
result = ora_kegg(genes, species="human", from_id_type="symbol")
result = ora_kegg(genes, species=9606,    from_id_type="symbol")
print(result.summary())

ora_go

ora_go(
    genes: List[str],
    species: Union[Species, str, int] = HUMAN,
    from_id_type: str = "uniprot",
    aspect: Union[str, GOAspect] = BIOLOGICAL_PROCESS,
    evidence_codes: Optional[List[str]] = None,
    background: Optional[Set[str]] = None,
    min_overlap: int = 3,
    min_term_size: int = 5,
    max_term_size: int = 500,
    correction_method: Union[str, CorrectionMethod] = BH,
    translation_database: Union[
        str, TranslationDatabase
    ] = BIOMART,
    use_cache: bool = True,
    cache_dir: Optional[str] = None,
) -> ORAResult

Perform Gene Ontology over-representation analysis using QuickGO.

Parameters:

Name Type Description Default
genes List[str]

List of query genes.

required
species Union[Species, str, int]

Species to analyse. Accepts a Species member, a common name ("human"), a KEGG code ("hsa"), a scientific name, or an NCBI taxon ID (9606). Defaults to Species.HUMAN.

HUMAN
from_id_type str

Input gene ID type. Automatically translates to UniProt IDs. Supported: "uniprot", "symbol", "ensembl", "entrez"

'uniprot'
aspect Union[str, GOAspect]

GO aspect to analyze.

BIOLOGICAL_PROCESS
evidence_codes Optional[List[str]]

Evidence codes to include. Default excludes IEA.

None
background Optional[Set[str]]

Background gene set. If None, uses all genes in GO.

None
min_overlap int

Minimum overlap required.

3
min_term_size int

Minimum genes per GO term.

5
max_term_size int

Maximum genes per GO term.

500
correction_method Union[str, CorrectionMethod]

Multiple testing correction method.

BH
translation_database Union[str, TranslationDatabase]

Database for ID translation.

BIOMART
use_cache bool

Whether to use cached GO data.

True
cache_dir Optional[str]

Directory for cache files.

None

Returns:

Type Description
ORAResult

ORAResult with GO term enrichment results.

Raises:

Type Description
ValueError

If the species value is not recognised.

Example
from biodbs import Species
from biodbs.analysis import ora_go

genes = ["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"]
# Preferred: use the Species enum
result = ora_go(genes, species=Species.HUMAN, from_id_type="symbol")
# Also accepted: taxon ID, common name, or KEGG code
result = ora_go(genes, species=9606,    from_id_type="symbol")
result = ora_go(genes, species="human", from_id_type="symbol")
result = ora_go(genes, species="hsa",   from_id_type="symbol")
print(result.significant_terms().as_dataframe().head())

ora_reactome

ora_reactome(
    genes: List[str],
    species: str = "Homo sapiens",
    from_id_type: str = "symbol",
    interactors: bool = False,
    include_disease: bool = True,
    min_entities: Optional[int] = None,
    max_entities: Optional[int] = None,
    fetch_overlap_genes: bool = False,
    translation_database: Union[
        str, TranslationDatabase
    ] = BIOMART,
) -> ORAResult

Perform over-representation analysis using Reactome pathway database.

Parameters:

Name Type Description Default
genes List[str]

List of gene identifiers.

required
species str

Species name (e.g., "Homo sapiens", "Mus musculus").

'Homo sapiens'
from_id_type str

Input gene ID type. Automatically translates to gene symbols. Supported: "symbol", "ensembl", "entrez", "uniprot"

'symbol'
interactors bool

Include interactors in the analysis.

False
include_disease bool

Include disease pathways.

True
min_entities Optional[int]

Minimum pathway size.

None
max_entities Optional[int]

Maximum pathway size.

None
fetch_overlap_genes bool

If True, fetch specific overlap genes (slower).

False
translation_database Union[str, TranslationDatabase]

Database for ID translation.

BIOMART

Returns:

Type Description
ORAResult

ORAResult with Reactome pathway enrichment results.

Example
from biodbs.analysis import ora_reactome

genes = ["TP53", "BRCA1", "BRCA2", "ATM"]
result = ora_reactome(genes, species="Homo sapiens")
print(result.summary())
# ORA Results Summary (Reactome)
# ========================================
# Query genes: 4
# Mapped genes: 4
# Significant (adj.p <= 0.05): 15

ora_reactome_local

ora_reactome_local(
    genes: List[str],
    species: str = "Homo sapiens",
    from_id_type: str = "symbol",
    background: Optional[Set[str]] = None,
    min_overlap: int = 3,
    min_term_size: int = 5,
    max_term_size: int = 500,
    correction_method: Union[str, CorrectionMethod] = BH,
    translation_database: Union[
        str, TranslationDatabase
    ] = BIOMART,
    use_cache: bool = True,
    cache_dir: Optional[str] = None,
) -> ORAResult

Perform local over-representation analysis using Reactome pathway data.

Parameters:

Name Type Description Default
genes List[str]

List of gene identifiers.

required
species str

Species name (e.g., "Homo sapiens", "Mus musculus").

'Homo sapiens'
from_id_type str

Input gene ID type. Automatically translates to gene symbols. Supported: "symbol", "ensembl", "entrez", "uniprot"

'symbol'
background Optional[Set[str]]

Background gene set. If None, uses all genes in pathways.

None
min_overlap int

Minimum overlap required to test a pathway.

3
min_term_size int

Minimum genes per pathway.

5
max_term_size int

Maximum genes per pathway.

500
correction_method Union[str, CorrectionMethod]

Multiple testing correction method.

BH
translation_database Union[str, TranslationDatabase]

Database for ID translation.

BIOMART
use_cache bool

Cache pathway data (recommended).

True
cache_dir Optional[str]

Directory for cache files.

None

Returns:

Type Description
ORAResult

ORAResult with Reactome pathway enrichment results.

Example
from biodbs.analysis import ora_reactome_local

genes = ["TP53", "BRCA1", "BRCA2"]
result = ora_reactome_local(genes, species="Homo sapiens")
print(result)
# ORAResult(database='Reactome', num_significant=15, query_genes=3, mapped_genes=3)

# Get top enriched pathways; top_terms returns an ORAResult, so iterate its results list
top_pathways = result.top_terms(n=5)
for term in top_pathways.results:
    print(f"{term.term_name}: p={term.p_value:.2e}")
# Cell Cycle: p=1.23e-05
# DNA Repair: p=2.45e-04
# ...

ora_enrichr

ora_enrichr(
    genes: List[str],
    gene_set_library: str = "KEGG_2021_Human",
    organism: str = "human",
    from_id_type: str = "symbol",
    translation_database: Union[
        str, TranslationDatabase
    ] = BIOMART,
) -> ORAResult

Perform over-representation analysis using EnrichR web service.

Parameters:

Name Type Description Default
genes List[str]

List of gene identifiers.

required
gene_set_library str

EnrichR library to use.

'KEGG_2021_Human'
organism str

Organism ("human", "mouse", "fly", "yeast", "worm", "fish").

'human'
from_id_type str

Input gene ID type. Automatically translates to gene symbols. Supported: "symbol", "ensembl", "entrez", "uniprot"

'symbol'
translation_database Union[str, TranslationDatabase]

Database for ID translation.

BIOMART

Returns:

Type Description
ORAResult

ORAResult with EnrichR enrichment results.

Example
from biodbs.analysis import ora_enrichr

genes = ["TP53", "BRCA1", "BRCA2", "ATM"]
result = ora_enrichr(genes, "KEGG_2021_Human")
print(result.as_dataframe()[["term_name", "adjusted_p_value"]].head())
#                               term_name  adjusted_p_value
# 0  Homologous recombination_Homo sapiens         0.00012
# 1           Breast cancer_Homo sapiens         0.00045

Utility Functions

hypergeometric_test

hypergeometric_test(
    k: int, K: int, n: int, N: int
) -> float

Perform hypergeometric test for over-representation.

Calculates P(X >= k) where X follows a hypergeometric distribution. This is a one-sided test for enrichment (over-representation).

Parameters:

Name Type Description Default
k int

Number of genes in both query and term (successes in sample).

required
K int

Total genes in the term (successes in population).

required
n int

Number of query genes (sample size).

required
N int

Total genes in background/universe (population size).

required

Returns:

Type Description
float

P-value for the hypergeometric test.
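Written out with the parameter names above, the upper-tail probability P(X >= k) is the following sum. This is the standard hypergeometric formula; the page does not show how the library evaluates it numerically.

```latex
P(X \ge k) \;=\; \sum_{i=k}^{\min(K,\,n)}
  \frac{\binom{K}{i}\,\binom{N-K}{\,n-i\,}}{\binom{N}{n}}
```

As a small worked case, take N = 10, K = 4, n = 3, k = 2:

```latex
P(X \ge 2) \;=\; \frac{\binom{4}{2}\binom{6}{1} + \binom{4}{3}\binom{6}{0}}{\binom{10}{3}}
           \;=\; \frac{36 + 4}{120} \;=\; \frac{1}{3}
```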

multiple_test_correction

multiple_test_correction(
    p_values: List[float],
    method: Union[str, CorrectionMethod] = BH,
) -> List[float]

Apply multiple testing correction to p-values.
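For orientation, the default method (BH, Benjamini-Hochberg) can be written in a few lines of plain Python. This is a textbook sketch under a hypothetical function name, not the library's code: each p-value is scaled by m/rank, and a running minimum from the largest rank downward keeps the adjusted values monotone. Results are returned in the input order.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Textbook BH adjustment (hypothetical helper, for illustration)."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted_demo = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

For the four demo p-values, the adjusted values work out to approximately 0.02, 0.04, 0.04, and 0.02.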


GMT Functions

load_gmt

load_gmt(
    path: Union[str, Path],
    database: str = "",
    species: Optional[str] = None,
) -> "Dict[str, Any]"

Load a GMT file and return a dict of Pathway objects.

Each non-empty line becomes one Pathway:

  • id ← column 1 (gene set name / pathway ID)
  • name ← column 2 (description; falls back to id if blank or "na")
  • genes ← columns 3 onwards
  • database ← database parameter (default ""; set it to the source name)
  • species ← species parameter (optional)

Parameters:

Name Type Description Default
path Union[str, Path]

Path to the .gmt file.

required
database str

Database label to attach to every Pathway (e.g. "KEGG", "MSigDB_H").

''
species Optional[str]

Species string to attach to every Pathway (e.g. "Homo sapiens").

None

Returns:

Type Description
'Dict[str, Any]'

Dict[pathway_id, Pathway] ready to pass to ora.

Raises:

Type Description
FileNotFoundError

If path does not exist.

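Example

from biodbs.analysis import load_gmt, ora

my_genes = ["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"]  # illustrative query list
gene_sets = load_gmt("h.all.v2023.1.Hs.symbols.gmt", database="MSigDB_H")
result = ora(my_genes, gene_sets)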
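The column mapping described for load_gmt can be sketched as a small parser over tab-separated lines. `parse_gmt_lines` is a hypothetical helper that returns the legacy (name, genes) tuple format instead of Pathway objects; matching "na" case-insensitively and ignoring empty trailing fields are assumptions of this sketch.

```python
from typing import Dict, FrozenSet, Iterable, Tuple


def parse_gmt_lines(lines: Iterable[str]) -> Dict[str, Tuple[str, FrozenSet[str]]]:
    """Parse GMT text lines into {id: (name, genes)} (illustrative only)."""
    gene_sets = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        set_id = fields[0].strip()                    # column 1: id
        desc = fields[1].strip() if len(fields) > 1 else ""
        # Column 2: description, falling back to the id if blank or "na".
        name = set_id if desc == "" or desc.lower() == "na" else desc
        # Columns 3 onwards: gene identifiers.
        genes = frozenset(g.strip() for g in fields[2:] if g.strip())
        gene_sets[set_id] = (name, genes)
    return gene_sets


sample = [
    "HALLMARK_APOPTOSIS\tna\tTP53\tBAX\n",
    "\n",
    "hsa04110\tCell cycle\tCDK1\tTP53\t\n",
]
parsed = parse_gmt_lines(sample)
```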

save_gmt

save_gmt(
    gene_sets: Union["Dict[str, Any]", "Dict[str, Tuple]"],
    path: Union[str, Path],
) -> Path

Save gene sets to a GMT file.

Accepts the same dict types that ora accepts:

  • Dict[str, Pathway]
  • Dict[str, Tuple[str, Set[str]]] (legacy tuple format)

Parameters:

Name Type Description Default
gene_sets Union['Dict[str, Any]', 'Dict[str, Tuple]']

Gene sets to write.

required
path Union[str, Path]

Output file path (created with parent directories if needed).

required

Returns:

Type Description
Path

Resolved pathlib.Path of the written file.

Example

from biodbs.analysis import fetch_gmt, save_gmt

gene_sets = fetch_gmt("hsa", database="kegg")
save_gmt(gene_sets, "kegg_hsa.gmt")
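The writing side can be sketched in the same spirit. `write_gmt` below is a hypothetical helper for the legacy tuple format only; it creates parent directories and returns the resolved path, as the parameter and return descriptions above state for save_gmt. Sorting genes within a line is an assumption made here for reproducible output.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple, Union


def write_gmt(gene_sets: Dict[str, Tuple[str, Iterable[str]]],
              path: Union[str, Path]) -> Path:
    """Write {id: (name, genes)} as tab-separated GMT lines (illustrative only)."""
    out = Path(path)
    # Create missing parent directories before writing.
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as fh:
        for set_id, (name, genes) in gene_sets.items():
            # One line per gene set: id, name, then the genes.
            fh.write("\t".join([set_id, name, *sorted(genes)]) + "\n")
    return out.resolve()
```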

fetch_gmt

fetch_gmt(
    name: str,
    database: Literal[
        "kegg",
        "go",
        "gene ontology",
        "reactome",
        "enrichr",
        "msigdb",
    ] = "kegg",
    save_at: Optional[str] = None,
    species: Union[str, "Species"] = "human",
    aspect: str = "biological_process",
    use_cache: bool = True,
    min_term_size: int = 5,
    max_term_size: int = 500,
) -> "Dict[str, Any]"

Fetch a gene set collection from a pathway database and return as Dict[str, Pathway].

The returned dict is immediately usable with ora. Pass save_at to also write a GMT file for use with gseapy / GSEA Desktop.

Parameters:

Name Type Description Default
name str

Database-specific identifier:

  • kegg: KEGG organism code (e.g. "hsa", "mmu").
  • reactome: species name (e.g. "human", "mouse"). Can be any value accepted by Species.
  • go: GO aspect, one of "biological_process" (default), "molecular_function", "cellular_component", or "all". Combined with the species parameter.
  • enrichr: EnrichR library name (e.g. "KEGG_2021_Human", "MSigDB_Hallmark_2020"). Call biodbs.fetch.EnrichR.funcs.enrichr_get_libraries to list all available libraries.
required
database Literal['kegg', 'go', 'gene ontology', 'reactome', 'enrichr', 'msigdb']

Source database. Documented values are "kegg", "reactome", "go", and "enrichr" (case-insensitive); the type annotation additionally lists "gene ontology" and "msigdb".

'kegg'
save_at Optional[str]

Optional file path for the GMT output. The placeholder {name} is replaced with the sanitised name argument. Examples: "./kegg_hsa.gmt", "./{name}.gmt". Pass None (default) to skip writing.

None
species Union[str, 'Species']

Species for KEGG / GO / Reactome lookups. Ignored for EnrichR (library names are already species-specific). Accepts anything that biodbs._funcs._species.resolve_species understands.

'human'
aspect str

GO aspect when database="go". Overridden by name when name is a valid aspect string.

'biological_process'
use_cache bool

Whether to use and populate the pathway cache.

True
min_term_size int

Minimum genes per pathway (KEGG / GO / Reactome only).

5
max_term_size int

Maximum genes per pathway (KEGG / GO / Reactome only).

500

Returns:

Type Description
'Dict[str, Any]'

Dict[pathway_id, Pathway], the same type accepted by ora.

Raises:

Type Description
ValueError

For unknown database names.

RuntimeError

If the EnrichR download fails.

Examples

from biodbs.analysis import fetch_gmt, ora

# All KEGG human pathways → save GMT
ks = fetch_gmt("hsa", database="kegg", save_at="./{name}.gmt")

# Reactome mouse pathways
rs = fetch_gmt("mouse", database="reactome")

# GO Biological Process (human, cached)
gs = fetch_gmt("biological_process", database="go")

# EnrichR Hallmark gene sets
hs = fetch_gmt("MSigDB_Hallmark_2020", database="enrichr",
               save_at="./hallmark.gmt")

# Run ORA immediately
my_genes = ["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"]  # illustrative query list
result = ora(my_genes, fetch_gmt("hsa", database="kegg"))

DataFrame Columns

When using ORAResult.as_dataframe():

| Column | Type | Description |
|---|---|---|
| term_id | str | Pathway/term ID |
| term_name | str | Pathway/term name |
| p_value | float | Raw p-value |
| adjusted_p_value | float | P-value adjusted for multiple testing |
| overlap_count | int | Number of overlapping genes |
| term_size | int | Total genes in term |
| query_size | int | Number of query genes |
| background_size | int | Universe size |
| fold_enrichment | float | Fold enrichment of the overlap |
| odds_ratio | float | Odds ratio |
| overlap_genes | str | Comma-separated gene IDs |
| database | str | Source database |

EnrichR Libraries

Popular gene set libraries available in EnrichR:

Library Description
KEGG_2021_Human KEGG pathways
GO_Biological_Process_2021 GO biological process
GO_Molecular_Function_2021 GO molecular function
GO_Cellular_Component_2021 GO cellular component
Reactome_2022 Reactome pathways
WikiPathways_2019_Human WikiPathways
MSigDB_Hallmark_2020 MSigDB Hallmark
GWAS_Catalog_2019 GWAS Catalog
DisGeNET Disease-gene associations
DrugMatrix Drug signatures

Get all available libraries:

from biodbs.fetch import enrichr_get_libraries
libraries = enrichr_get_libraries()