Analysis Overview¶

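The `biodbs.analysis` module provides over-representation analysis (ORA) against several pathway and ontology resources, plus helpers for loading, saving, and fetching GMT gene set files.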

Related sections:

API Reference - Complete function and class documentation
Data Fetching - Fetch gene sets and pathway data
ID Translation - Convert gene IDs before analysis
Knowledge Graph - Visualize enrichment results

Available Analyses¶

Analysis	Function	Description
ORA	`ora_kegg`, `ora_go`, `ora_enrichr`, `ora_reactome`, `ora_reactome_local`	Over-representation analysis
GMT I/O	`load_gmt`, `save_gmt`, `fetch_gmt`	Load/save/fetch GMT gene set files
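For orientation, a GMT file is tab-delimited text with one gene set per line: a set name, a description, then the member genes. The sketch below parses that layout with the standard library only. The helper `parse_gmt` and the two example sets are hypothetical and stand in for `load_gmt`, whose signature and return type are not described on this page.

```python
import io

def parse_gmt(handle):
    """Parse GMT text into {set_name: (description, [genes])}.

    Hypothetical helper for illustration; not part of biodbs.
    """
    gene_sets = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # a record needs a name, a description, and at least one gene
        name, description, *genes = fields
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets

# Made-up gene sets, used only to show the file layout.
example = io.StringIO(
    "DNA_REPAIR\tillustrative set\tTP53\tBRCA1\tATM\n"
    "CELL_CYCLE\tillustrative set\tCHEK2\tTP53\n"
)
sets = parse_gmt(example)
print(sets["DNA_REPAIR"][1])  # ['TP53', 'BRCA1', 'ATM']
```

In practice the module's own `load_gmt` and `save_gmt` are the functions to reach for; the sketch only shows what such a file contains.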

Quick Start¶

from biodbs.analysis import ora_kegg, ora_go, ora_enrichr

# KEGG pathway enrichment
result = ora_kegg(
    gene_list=["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"],
    organism="hsa",
    id_type="symbol"
)

# View results
print(result.summary())
df = result.as_dataframe()

Over-Representation Analysis¶

ORA (over-representation analysis) tests whether an input gene list contains more genes from a given pathway or functional category than would be expected by chance.
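As a rough illustration of the statistic behind ORA, the sketch below computes a one-sided hypergeometric p-value with the standard library. It assumes the common hypergeometric formulation; this page does not state which test biodbs uses internally, and the function name, argument names, and numbers are illustrative only.

```python
from math import comb

def hypergeom_sf(overlap, background_size, term_size, query_size):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from background_size genes, of which term_size belong to the term."""
    upper = min(term_size, query_size)
    total = comb(background_size, query_size)
    favourable = sum(
        comb(term_size, i) * comb(background_size - term_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total

# Toy numbers: 20 background genes, a 5-gene term, a 4-gene query, 3 genes shared.
p = hypergeom_sf(overlap=3, background_size=20, term_size=5, query_size=4)
print(round(p, 4))  # 0.032 (exactly 155/4845)
```

A small p-value means an overlap at least this large would be unlikely if the query genes were drawn at random from the background.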

Supported Resources¶

Function	Resource	Gene ID Type
`ora_kegg`	KEGG Pathways	Entrez ID, Symbol
`ora_go`	Gene Ontology (via QuickGO)	UniProt
`ora_enrichr`	EnrichR (100+ libraries)	Symbol
`ora_reactome`	Reactome (API)	Symbol
`ora_reactome_local`	Reactome (local, no API call)	Symbol

Basic Usage¶

from biodbs.analysis import ora_kegg

result = ora_kegg(
    gene_list=["TP53", "BRCA1", "BRCA2", "ATM"],
    organism="hsa",
    id_type="symbol"  # Auto-converts to Entrez
)

# Get significant pathways
significant = result.significant_terms(alpha=0.05)
print(significant.as_dataframe())

Working with Results¶

ORAResult Object¶

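from biodbs.analysis import ora_kegg

gene_list = ["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"]
result = ora_kegg(gene_list, organism="hsa", id_type="symbol")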

# Summary
print(result.summary())

# Number of terms tested
print(f"Tested: {len(result)} terms")

# As DataFrame
df = result.as_dataframe()

# Filter significant terms at alpha = 0.05
significant = result.significant_terms(alpha=0.05)
# Looser threshold, with use_fdr=True to filter on FDR-adjusted values
significant_fdr = result.significant_terms(alpha=0.1, use_fdr=True)
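Conceptually, filtering for significant terms means keeping the rows whose raw or FDR-adjusted p-value falls at or below a threshold. The sketch below mimics that on plain dictionaries. The helper `filter_significant` and the row values are hypothetical, and it assumes that `use_fdr` switches between raw and adjusted values; the actual behaviour and defaults of `significant_terms` are not spelled out on this page.

```python
# Hypothetical rows mirroring two of the documented result columns; values are made up.
rows = [
    {"term_id": "T1", "p_value": 0.001, "q_value": 0.003},
    {"term_id": "T2", "p_value": 0.040, "q_value": 0.060},
    {"term_id": "T3", "p_value": 0.200, "q_value": 0.200},
]

def filter_significant(rows, alpha=0.05, use_fdr=False):
    """Keep rows at or below alpha, most significant first."""
    key = "q_value" if use_fdr else "p_value"
    kept = [row for row in rows if row[key] <= alpha]
    return sorted(kept, key=lambda row: row[key])

raw_hits = [r["term_id"] for r in filter_significant(rows, alpha=0.05)]
fdr_hits = [r["term_id"] for r in filter_significant(rows, alpha=0.05, use_fdr=True)]
print(raw_hits)  # ['T1', 'T2']
print(fdr_hits)  # ['T1']
```

The second term passes the raw threshold but not the adjusted one, which is the usual reason to filter on adjusted values when many terms are tested.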

Result Columns¶

Column	Description
`term_id`	Pathway/term identifier
`term_name`	Pathway/term name
`p_value`	Raw p-value
`q_value`	FDR-adjusted p-value
`overlap_count`	Number of genes overlapping
`term_size`	Total genes in term
`overlap_genes`	List of overlapping genes
`fold_enrichment`	Fold enrichment of the overlap (observed relative to expected)
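To make `q_value` and `fold_enrichment` more concrete, the sketch below shows one common way to derive each: Benjamini-Hochberg adjustment for the q-values, and observed overlap divided by expected overlap for the fold enrichment. Both are assumptions about the usual definitions rather than a description of the biodbs internals. The function names and the `query_size` and `background_size` arguments are illustrative and are not columns of the result table.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

def fold_enrichment(overlap_count, term_size, query_size, background_size):
    """Observed overlap divided by the overlap expected by chance."""
    expected = term_size * query_size / background_size
    return overlap_count / expected

q = benjamini_hochberg([0.04, 0.001, 0.2])
print([round(x, 3) for x in q])  # [0.06, 0.003, 0.2]

fe = fold_enrichment(overlap_count=3, term_size=5, query_size=4, background_size=20)
print(fe)  # 3.0
```

A fold enrichment above 1 indicates more overlap than expected by chance; the p-value and q-value indicate how surprising that excess is.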

Next Steps¶

Detailed ORA documentation - Complete ORA guide with examples
API Reference - Full API documentation with enums
UniProt ID translation - Required for GO analysis
KEGG pathway data - Fetch pathway gene sets
EnrichR libraries - 100+ gene set libraries