Analysis Overview¶
The biodbs.analysis module provides statistical analysis functions for biological data.
Related sections:
- API Reference - Complete function and class documentation
- Data Fetching - Fetch gene sets and pathway data
- ID Translation - Convert gene IDs before analysis
- Knowledge Graph - Visualize enrichment results
Available Analyses¶
| Analysis | Function | Description |
|---|---|---|
| ORA | ora_kegg, ora_go, ora_enrichr, ora_reactome | Over-representation analysis |
| GMT I/O | load_gmt, save_gmt, fetch_gmt | Load/save/fetch GMT gene set files |
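For readers unfamiliar with the GMT format handled by the GMT I/O functions: each line is one gene set, with tab-separated fields for the set name, a description, and then the member genes. The sketch below parses that layout using only the standard library. `parse_gmt_text` is a hypothetical helper written for illustration, not the implementation or signature of `load_gmt`, and the two gene sets are made-up toy data.

```python
from typing import Dict, List


def parse_gmt_text(text: str) -> Dict[str, List[str]]:
    """Parse GMT-formatted text into {gene_set_name: [genes]}.

    Hypothetical helper for illustration; not part of biodbs.
    """
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # a usable row has a name, a description, and at least one gene
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


# Toy GMT content (illustrative only, not real pathway definitions)
example = (
    "DNA_REPAIR\tillustrative set\tTP53\tBRCA1\tBRCA2\tATM\n"
    "CELL_CYCLE\tillustrative set\tTP53\tCHEK2\n"
)
gene_sets = parse_gmt_text(example)
print(gene_sets["CELL_CYCLE"])  # ['TP53', 'CHEK2']
```

A dictionary keyed by set name is only one possible in-memory shape; the objects returned by the library's own GMT functions may differ.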
Quick Start¶
from biodbs.analysis import ora_kegg, ora_go, ora_enrichr
# KEGG pathway enrichment
result = ora_kegg(
    gene_list=["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"],
    organism="hsa",
    id_type="symbol",
)
# View results
print(result.summary())
df = result.as_dataframe()
Over-Representation Analysis¶
ORA (Over-Representation Analysis) tests whether genes from a specific pathway or functional category appear in your gene list more often than expected by chance.
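To make the idea concrete, ORA is commonly framed as a one-sided hypergeometric test on four counts: the background size, the term size, the query list size, and their overlap. The sketch below computes that upper-tail probability with the standard library. This page does not state which test biodbs uses internally, so treat the choice of test as an assumption; `hypergeom_upper_tail` is a hypothetical name, and the counts are toy values.

```python
from math import comb


def hypergeom_upper_tail(
    overlap: int, query_size: int, term_size: int, background_size: int
) -> float:
    """Return P(X >= overlap) for a hypergeometric draw.

    query_size genes are drawn from background_size genes,
    of which term_size belong to the pathway/term.
    Hypothetical helper for illustration; not part of biodbs.
    """
    total = comb(background_size, query_size)
    max_overlap = min(query_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy counts: 20 background genes, a 5-gene term, a 4-gene query, 3 shared genes
p = hypergeom_upper_tail(overlap=3, query_size=4, term_size=5, background_size=20)
print(f"{p:.4f}")  # 0.0320
```

The result depends strongly on the background size, so results from tools that define the background differently are not directly comparable.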
Supported Resources¶
| Function | Resource | Gene ID Type |
|---|---|---|
| ora_kegg | KEGG Pathways | Entrez ID, Symbol |
| ora_go | Gene Ontology (via QuickGO) | UniProt |
| ora_enrichr | EnrichR (100+ libraries) | Symbol |
| ora_reactome | Reactome (API) | Symbol |
| ora_reactome_local | Reactome (local, no API call) | Symbol |
Basic Usage¶
from biodbs.analysis import ora_kegg
result = ora_kegg(
    gene_list=["TP53", "BRCA1", "BRCA2", "ATM"],
    organism="hsa",
    id_type="symbol",  # Auto-converts to Entrez
)
# Get significant pathways
significant = result.significant_terms(alpha=0.05)
print(significant.as_dataframe())
Working with Results¶
ORAResult Object¶
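from biodbs.analysis import ora_kegg
# Example input; symbols need id_type="symbol", as in Basic Usage
gene_list = ["TP53", "BRCA1", "BRCA2", "ATM"]
result = ora_kegg(gene_list, organism="hsa", id_type="symbol")
# Summary
print(result.summary())
# Number of terms tested
print(f"Tested: {len(result)} terms")
# As DataFrame
df = result.as_dataframe()
# Filter significant terms at alpha = 0.05
significant = result.significant_terms(alpha=0.05)
# Filter on FDR-adjusted values with a looser threshold
significant_fdr = result.significant_terms(alpha=0.1, use_fdr=True)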
Result Columns¶
| Column | Description |
|---|---|
| term_id | Pathway/term identifier |
| term_name | Pathway/term name |
| p_value | Raw p-value |
| q_value | FDR-adjusted p-value |
| overlap_count | Number of genes overlapping |
| term_size | Total genes in term |
| overlap_genes | List of overlapping genes |
| fold_enrichment | Enrichment score |
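As a rough guide to how columns such as q_value and fold_enrichment are typically derived, the sketch below applies the Benjamini-Hochberg procedure to a list of raw p-values and computes fold enrichment as the observed overlap fraction divided by the expected fraction. Both formulas are assumptions: this page does not specify the FDR method or the enrichment formula biodbs uses, the function names are hypothetical, and the inputs are toy values.

```python
from typing import List


def fold_enrichment(
    overlap_count: int, query_size: int, term_size: int, background_size: int
) -> float:
    """Observed overlap fraction divided by the fraction expected by chance (assumed formula)."""
    observed = overlap_count / query_size
    expected = term_size / background_size
    return observed / expected


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotonic in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Toy inputs (illustrative only)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
fe = fold_enrichment(overlap_count=3, query_size=4, term_size=5, background_size=20)
print([round(x, 4) for x in q], fe)
```

Because the adjustment depends on how many terms were tested, q-values from runs against different resources or gene set libraries should not be pooled without re-adjusting.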
Next Steps¶
- Detailed ORA documentation - Complete ORA guide with examples
- API Reference - Full API documentation with enums
- UniProt ID translation - Required for GO analysis
- KEGG pathway data - Fetch pathway gene sets
- EnrichR libraries - 100+ gene set libraries