Fetch Module API Reference
Complete reference for the biodbs.fetch module.
Summary
Fetcher Classes
| Class | Description |
|---|---|
| UniProt_Fetcher | Fetch protein data from UniProt REST API |
| PubChem_Fetcher | Fetch chemical data from PubChem PUG REST/View APIs |
| Ensembl_Fetcher | Fetch genomic data from Ensembl REST API |
| BioMart_Fetcher | Query Ensembl BioMart for gene annotations |
| KEGG_Fetcher | Fetch pathway and gene data from KEGG API |
| ChEMBL_Fetcher | Fetch bioactivity data from ChEMBL API |
| QuickGO_Fetcher | Fetch GO annotations from QuickGO API |
| HPA_Fetcher | Fetch protein expression from Human Protein Atlas |
| NCBI_Fetcher | Fetch gene data from NCBI Entrez |
| FDA_Fetcher | Fetch drug/device data from openFDA |
| Reactome_Fetcher | Fetch pathway data from Reactome |
| DO_Fetcher | Fetch disease terms from Disease Ontology |
| EnrichR_Fetcher | Perform gene set enrichment via EnrichR |
| HGNC_Fetcher | Fetch gene nomenclature from HGNC |
| ClinVar_Fetcher | Fetch clinical variant data from ClinVar |
UniProt Functions
| Function | Description |
|---|---|
| uniprot_get_entry | Get a single UniProt entry by accession |
| uniprot_search | Search UniProtKB with query |
| uniprot_search_by_gene | Search by gene name |
| gene_to_uniprot | Map gene symbols to UniProt accessions |
| uniprot_map_ids | Map IDs between databases |
PubChem Functions
| Function | Description |
|---|---|
| pubchem_get_compound | Get compound record by CID |
| pubchem_search_by_name | Search compounds by name |
| pubchem_get_properties | Get compound properties |
Ensembl Functions
| Function | Description |
|---|---|
| ensembl_lookup | Lookup entity by Ensembl ID |
| ensembl_lookup_symbol | Lookup by gene symbol |
| ensembl_get_sequence | Get nucleotide/protein sequence |
| ensembl_get_xrefs | Get cross-references |
BioMart Functions
| Function | Description |
|---|---|
| biomart_get_genes | Get gene annotations by Ensembl IDs |
| biomart_convert_ids | Convert between gene ID types |
| biomart_query | Custom BioMart query |
KEGG Functions
| Function | Description |
|---|---|
| kegg_list | List entries in a KEGG database |
| kegg_get | Get KEGG entry by ID |
| kegg_link | Get cross-references between databases |
| kegg_conv | Convert between KEGG and external IDs |
ChEMBL Functions
| Function | Description |
|---|---|
| chembl_get_molecule | Get molecule by ChEMBL ID |
| chembl_search_molecules | Search molecules by name |
| chembl_get_approved_drugs | Get approved drugs list |
QuickGO Functions
| Function | Description |
|---|---|
| quickgo_search_annotations | Search GO annotations |
| quickgo_get_terms | Get GO term details |
HPA Functions
| Function | Description |
|---|---|
| hpa_get_gene | Get gene expression data |
| hpa_get_tissue_expression | Get tissue-level expression |
NCBI Functions
| Function | Description |
|---|---|
| ncbi_get_gene | Get gene info by Entrez ID |
| ncbi_symbol_to_id | Convert gene symbol to Entrez ID |
FDA Functions
| Function | Description |
|---|---|
| fda_search | Search openFDA endpoints |
| fda_drug_events | Search drug adverse events |
Reactome Functions
| Function | Description |
|---|---|
| reactome_analyze | Analyze gene list against Reactome |
Disease Ontology Functions
| Function | Description |
|---|---|
| do_get_term | Get disease term by DOID |
| do_get_children | Get child terms |
EnrichR Functions
| Function | Description |
|---|---|
| enrichr_enrich | Perform enrichment analysis |
| enrichr_get_libraries | List available gene set libraries |
HGNC Functions
| Function | Description |
|---|---|
| hgnc_fetch | Exact-match lookup by any HGNC field |
| hgnc_search | Wildcard / boolean search across HGNC |
| hgnc_fetch_by_symbol | Fetch gene by approved symbol |
| hgnc_fetch_by_hgnc_id | Fetch gene by HGNC ID |
| hgnc_fetch_by_entrez_id | Fetch gene by Entrez Gene ID |
| hgnc_fetch_by_ensembl_id | Fetch gene by Ensembl gene ID |
| hgnc_fetch_by_uniprot_id | Fetch gene by UniProt accession |
| hgnc_fetch_by_refseq | Fetch gene by RefSeq accession |
| hgnc_search_symbol | Wildcard search on gene symbols |
| hgnc_info | Return HGNC service metadata |
ClinVar Functions
| Function | Description |
|---|---|
| clinvar_search | Search ClinVar with an Entrez query string |
| clinvar_count | Count ClinVar records matching a query |
| clinvar_fetch_by_id | Fetch variant summaries by variation UID |
| clinvar_search_gene | Search and fetch variants for a gene |
| clinvar_search_condition | Search and fetch variants for a condition |
| clinvar_fetch_vcv | Fetch full VCV XML record |
| clinvar_fetch_rcv | Fetch full RCV XML record |
| clinvar_link_pubmed | Get PubMed IDs linked to a variation |
Fetcher Classes
UniProt_Fetcher
Fetcher for UniProt REST API.
Provides access to UniProtKB protein data including:
- Entry retrieval by accession
- Search by query
- ID mapping between databases
- Batch retrieval
Example
from biodbs.fetch import UniProt_Fetcher
fetcher = UniProt_Fetcher()
# Get protein by accession
entry = fetcher.get_entry("P05067")  # APP protein
print(entry.entries[0].protein_name)
# Search for proteins
results = fetcher.search("gene:TP53 AND organism_id:9606")
print(results.as_dataframe())
# Get multiple entries
entries = fetcher.get_entries(["P05067", "P04637", "P00533"])
# Map UniProt accessions to Entrez Gene IDs
mapping = fetcher.map_ids(
    ["P05067", "P04637"],
    from_db="UniProtKB_AC-ID",
    to_db="GeneID",
)
Initialize UniProt fetcher.
get_entry
Get a UniProt entry by accession.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| accession | str | UniProt accession (e.g., "P05067"). | required |
| fields | Optional[str] | Comma-separated list of fields to return. | None |
Returns:
| Type | Description |
|---|---|
| UniProtFetchedData | UniProtFetchedData with the entry. |
get_entries
Get multiple UniProt entries by accessions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| accessions | List[str] | List of UniProt accessions. | required |
| fields | Optional[str] | Comma-separated list of fields to return. | None |
Returns:
| Type | Description |
|---|---|
| UniProtFetchedData | UniProtFetchedData with all entries. |
search
search(
    query: str,
    fields: Optional[str] = None,
    sort: Optional[str] = None,
    size: int = 25,
    include_isoform: bool = False,
    cursor: Optional[str] = None,
) -> UniProtSearchResult
Search UniProtKB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query (e.g., "gene:TP53 AND organism_id:9606"). | required |
| fields | Optional[str] | Comma-separated list of fields to return. | None |
| sort | Optional[str] | Sort field and direction (e.g., "accession desc"). | None |
| size | int | Number of results per page (max 500). | 25 |
| include_isoform | bool | Include isoforms in results. | False |
| cursor | Optional[str] | Cursor for pagination. | None |
Returns:
| Type | Description |
|---|---|
| UniProtSearchResult | UniProtSearchResult with matching entries. |
search_all
search_all(
    query: str,
    fields: Optional[str] = None,
    sort: Optional[str] = None,
    max_results: int = 10000,
    include_isoform: bool = False,
) -> UniProtFetchedData
Search and retrieve all results with pagination.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query. | required |
| fields | Optional[str] | Fields to return. | None |
| sort | Optional[str] | Sort field and direction. | None |
| max_results | int | Maximum results to retrieve. | 10000 |
| include_isoform | bool | Include isoforms. | False |
Returns:
| Type | Description |
|---|---|
| UniProtFetchedData | UniProtFetchedData with all matching entries. |
search_by_gene
search_by_gene(
    gene_name: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult
Search by gene name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_name | str | Gene name to search. | required |
| organism | Optional[Union[int, str]] | Organism tax ID or name. | None |
| reviewed_only | bool | Only return reviewed entries. | False |
| size | int | Results per page. | 25 |
Returns:
| Type | Description |
|---|---|
| UniProtSearchResult | UniProtSearchResult with matching entries. |
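As a rough illustration of what a gene-name search amounts to, the sketch below builds a UniProtKB query string from the same three filter arguments. The helper `build_gene_query` is hypothetical and not part of biodbs. How `search_by_gene` composes its query internally is not documented here, so the field names (`gene:`, `organism_id:`, `organism_name:`, `reviewed:true`) are an assumption taken from the public UniProt query syntax.

```python
from typing import Optional, Union


def build_gene_query(
    gene_name: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
) -> str:
    """Hypothetical helper: compose a UniProtKB query string."""
    clauses = [f"gene:{gene_name}"]
    if organism is not None:
        # Numeric values are treated as taxonomy IDs, anything else as a name.
        if isinstance(organism, int) or str(organism).isdigit():
            clauses.append(f"organism_id:{organism}")
        else:
            clauses.append(f'organism_name:"{organism}"')
    if reviewed_only:
        clauses.append("reviewed:true")
    return " AND ".join(clauses)


query = build_gene_query("TP53", organism=9606, reviewed_only=True)
print(query)  # gene:TP53 AND organism_id:9606 AND reviewed:true
```

A string built this way has the same shape as the `query` example given for `search`.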
search_by_organism
search_by_organism(
    organism: Union[int, str],
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult
Search by organism.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| organism | Union[int, str] | Organism tax ID or name. | required |
| reviewed_only | bool | Only return reviewed entries. | False |
| size | int | Results per page. | 25 |
Returns:
| Type | Description |
|---|---|
| UniProtSearchResult | UniProtSearchResult with matching entries. |
search_by_keyword
search_by_keyword(
    keyword: str,
    organism: Optional[Union[int, str]] = None,
    reviewed_only: bool = False,
    size: int = 25,
) -> UniProtSearchResult
Search by keyword.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| keyword | str | Keyword to search (e.g., "kinase", "receptor"). | required |
| organism | Optional[Union[int, str]] | Optional organism filter. | None |
| reviewed_only | bool | Only return reviewed entries. | False |
| size | int | Results per page. | 25 |
Returns:
| Type | Description |
|---|---|
| UniProtSearchResult | UniProtSearchResult with matching entries. |
map_ids
map_ids(
    ids: List[str],
    from_db: str = "UniProtKB_AC-ID",
    to_db: str = "UniProtKB",
    poll_interval: float = 1.0,
    max_wait: float = 60.0,
) -> Dict[str, List[str]]
Map IDs between databases.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | List[str] | List of IDs to map. | required |
| from_db | str | Source database (e.g., "UniProtKB_AC-ID", "Gene_Name", "GeneID"). | 'UniProtKB_AC-ID' |
| to_db | str | Target database (e.g., "UniProtKB", "GeneID", "PDB"). | 'UniProtKB' |
| poll_interval | float | Seconds between status checks. | 1.0 |
| max_wait | float | Maximum seconds to wait for job completion. | 60.0 |
Returns:
| Type | Description |
|---|---|
| Dict[str, List[str]] | Dictionary mapping input IDs to lists of output IDs. |
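The `poll_interval` and `max_wait` parameters suggest that the mapping runs as a remote job whose status is checked repeatedly. A minimal sketch of such a wait loop follows. `wait_for_job` and its `is_finished` callback are hypothetical names, and the real method may handle timeouts differently (for example by raising an exception).

```python
import time
from typing import Callable


def wait_for_job(
    is_finished: Callable[[], bool],
    poll_interval: float = 1.0,
    max_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_finished() until it returns True or max_wait seconds elapse."""
    deadline = clock() + max_wait
    while True:
        if is_finished():
            return True
        if clock() >= deadline:
            return False  # gave up waiting
        sleep(poll_interval)


# Stand-in for a remote job that reports completion on the third status check.
checks = iter([False, False, True])
finished = wait_for_job(lambda: next(checks), poll_interval=0.01, max_wait=1.0)
print(finished)  # True
```

Passing `sleep` and `clock` as arguments keeps the loop testable without real waiting.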
gene_to_uniprot
gene_to_uniprot(
    gene_names: List[str],
    organism: int = 9606,
    reviewed_only: bool = True,
) -> Dict[str, str]
Map gene names to UniProt accessions.
Uses concurrent requests for efficient batch processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_names | List[str] | List of gene names. | required |
| organism | int | Organism tax ID (default human). | 9606 |
| reviewed_only | bool | Only return reviewed entries. | True |
Returns:
| Type | Description |
|---|---|
| Dict[str, str] | Dictionary mapping gene names to accessions. |
uniprot_to_gene
Map UniProt accessions to gene names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| accessions | List[str] | List of UniProt accessions. | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, str] | Dictionary mapping accessions to gene names. |
get_sequences
Get protein sequences for accessions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| accessions | List[str] | List of UniProt accessions. | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, str] | Dictionary mapping accessions to sequences. |
PubChem_Fetcher
Fetcher for PubChem PUG REST and PUG View APIs.
PubChem provides two REST APIs:
PUG REST - Structured data access:
- Compound records (structures, properties, synonyms)
- Substance records (deposited data)
- Bioassay data
- Gene and protein information
- Structure searches (similarity, substructure)
PUG View - Annotation/web page content:
- Detailed compound annotations
- Safety and hazards information
- Pharmacology and biochemistry
- Literature and patents
- Drug and medication information
Example
from biodbs.fetch import PubChem_Fetcher
fetcher = PubChem_Fetcher()
# Get compound by CID
aspirin = fetcher.get_compound(2244)
print(aspirin.results[0])
# Get compound properties
props = fetcher.get_properties(
    [2244, 3672],
    properties=["MolecularFormula", "MolecularWeight"],
)
df = props.as_dataframe()
# Search by name
results = fetcher.search_by_name("aspirin")
# Similarity search
similar = fetcher.similarity_search(
    smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
    threshold=90,
)
# Get safety data
safety = fetcher.get_safety_data(2244)
# Get pharmacology info
pharma = fetcher.get_pharmacology(2244)
get
get(
    domain: str,
    namespace: str,
    identifiers: Optional[Union[str, int, List[Union[str, int]]]] = None,
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    output: str = "JSON",
    search_type: Optional[str] = None,
    threshold: Optional[int] = None,
    max_records: Optional[int] = None,
) -> PUGRestFetchedData
Fetch data from PubChem PUG REST API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| domain | str | PubChem domain (compound, substance, assay, etc.). | required |
| namespace | str | Identifier namespace (cid, name, smiles, etc.). | required |
| identifiers | Optional[Union[str, int, List[Union[str, int]]]] | ID(s) to look up. | None |
| operation | Optional[str] | Operation to perform (property, synonyms, etc.). | None |
| properties | Optional[List[str]] | List of properties for property operation. | None |
| output | str | Output format (JSON, XML, CSV, SDF, PNG). | 'JSON' |
| search_type | Optional[str] | For structure searches (smiles, smarts, inchi). | None |
| threshold | Optional[int] | Similarity threshold (0-100) for similarity searches. | None |
| max_records | Optional[int] | Maximum records to return. | None |
Returns:
| Type | Description |
|---|---|
| PUGRestFetchedData | PUGRestFetchedData with parsed results. |
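The `domain`, `namespace`, `identifiers`, `operation`, and `output` arguments mirror the path segments of a PUG REST URL. The sketch below shows that correspondence for simple lookups. `build_pug_rest_url` is a hypothetical helper, not a biodbs function, and it ignores structure searches and query-string options such as `threshold` and `max_records`.

```python
from typing import List, Optional, Sequence, Union
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(
    domain: str,
    namespace: str,
    identifiers: Union[str, int, Sequence[Union[str, int]]],
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    output: str = "JSON",
) -> str:
    """Hypothetical helper: join PUG REST path segments into a URL."""
    if isinstance(identifiers, (str, int)):
        identifiers = [identifiers]
    # Multiple identifiers are sent as one comma-separated path segment.
    id_part = ",".join(quote(str(i), safe="") for i in identifiers)
    parts = [PUG_REST_BASE, domain, namespace, id_part]
    if operation:
        parts.append(operation)
        if operation == "property" and properties:
            parts.append(",".join(properties))
    parts.append(output)
    return "/".join(parts)


url = build_pug_rest_url(
    "compound", "cid", [2244, 3672],
    operation="property",
    properties=["MolecularFormula", "MolecularWeight"],
)
print(url)
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244,3672/property/MolecularFormula,MolecularWeight/JSON
```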
get_all
get_all(
    domain: str,
    namespace: str,
    identifiers: List[Union[str, int]],
    method: Literal["concat", "stream_to_storage"] = "concat",
    batch_size: int = 100,
    rate_limit_per_second: int = 5,
    operation: Optional[str] = None,
    properties: Optional[List[str]] = None,
    **kwargs: Any,
) -> Union[PUGRestFetchedData, Path]
Fetch data for many identifiers by batching.
PubChem allows multiple CIDs/SIDs in a single request (comma-separated), but there are limits. This method batches requests.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| domain | str | PubChem domain. | required |
| namespace | str | Identifier namespace. | required |
| identifiers | List[Union[str, int]] | List of IDs to fetch. | required |
| method | Literal['concat', 'stream_to_storage'] | "concat" or "stream_to_storage". | 'concat' |
| batch_size | int | IDs per request (default 100). | 100 |
| rate_limit_per_second | int | Max requests per second. | 5 |
| operation | Optional[str] | Operation to perform. | None |
| properties | Optional[List[str]] | Properties for property operation. | None |
| **kwargs | Any | Additional parameters. | {} |
Returns:
| Type | Description |
|---|---|
| Union[PUGRestFetchedData, Path] | Combined PUGRestFetchedData or Path to output file. |
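The batching described above can be pictured as slicing the ID list into chunks of `batch_size` and pausing between requests to respect `rate_limit_per_second`. The following sketch shows the "concat" flavour only. `chunked`, `fetch_in_batches`, and `fake_fetch` are hypothetical names, and the fixed pause between requests is a simple stand-in for whatever throttling biodbs actually applies.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fetch_in_batches(
    identifiers: Sequence[T],
    fetch_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 100,
    rate_limit_per_second: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call fetch_batch once per chunk and concatenate the results."""
    min_interval = 1.0 / rate_limit_per_second
    results: List[R] = []
    for index, batch in enumerate(chunked(identifiers, batch_size)):
        if index > 0:
            sleep(min_interval)  # crude throttle between consecutive requests
        results.extend(fetch_batch(batch))
    return results


batch_sizes: List[int] = []


def fake_fetch(batch: Sequence[int]) -> List[int]:
    """Stand-in for one request; records the batch size and echoes the IDs."""
    batch_sizes.append(len(batch))
    return list(batch)


records = fetch_in_batches(list(range(1, 251)), fake_fetch, sleep=lambda s: None)
print(batch_sizes)   # [100, 100, 50]
print(len(records))  # 250
```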
get_compounds
Get multiple compound records by CID.
search_by_inchikey
Search compounds by InChIKey.
search_by_formula
Search compounds by molecular formula.
get_properties
get_properties(
    cids: Union[int, List[int]],
    properties: Optional[List[str]] = None,
) -> PUGRestFetchedData
Get compound properties.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cids | Union[int, List[int]] | Compound ID(s). | required |
| properties | Optional[List[str]] | Properties to retrieve. Defaults to common properties. | None |
get_sids_for_compound
Get SIDs associated with a compound.
get_aids_for_compound
Get assay AIDs associated with a compound.
similarity_search
Find similar compounds by SMILES.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| smiles | str | Query SMILES string. | required |
| threshold | int | Similarity threshold (0-100). | 90 |
| max_records | int | Maximum records to return. | 100 |
substructure_search
Find compounds containing a substructure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| smiles | str | Query SMILES string. | required |
| max_records | int | Maximum records to return. | 100 |
get_compound_image
Get compound structure image (PNG).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
| image_size | str | Image size (small, large, or pixel size like "300x300"). | 'large' |
get_compound_sdf
Get compound structure in SDF format.
get_view
get_view(
    record_id: Union[int, str],
    record_type: str = "compound",
    heading: Optional[str] = None,
    output: str = "JSON",
) -> PUGViewFetchedData
Fetch annotation data from PubChem PUG View API.
PUG View provides detailed annotation/web page content including safety data, pharmacology, literature, patents, etc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| record_id | Union[int, str] | Record ID (CID for compounds, SID for substances, etc.). | required |
| record_type | str | Type of record (compound, substance, assay, gene, protein, etc.). | 'compound' |
| heading | Optional[str] | Optional heading to filter to a specific section. | None |
| output | str | Output format (JSON or XML). | 'JSON' |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData with hierarchical annotation data. |
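For orientation, a PUG View request addresses a record by type and ID and can narrow the response to one section through a `heading` query parameter. The sketch below builds such a URL. `build_pug_view_url` is a hypothetical helper rather than part of biodbs, and whether the fetcher encodes the heading exactly this way is an assumption.

```python
from typing import Optional, Union
from urllib.parse import urlencode

PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data"


def build_pug_view_url(
    record_id: Union[int, str],
    record_type: str = "compound",
    heading: Optional[str] = None,
    output: str = "JSON",
) -> str:
    """Hypothetical helper: compose a PUG View data URL."""
    url = f"{PUG_VIEW_BASE}/{record_type}/{record_id}/{output}"
    if heading:
        # urlencode turns spaces into '+', e.g. "Safety and Hazards".
        url += "?" + urlencode({"heading": heading})
    return url


view_url = build_pug_view_url(2244, heading="Safety and Hazards")
print(view_url)
# https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/2244/JSON?heading=Safety+and+Hazards
```

The section-specific helpers documented on this page (safety, toxicity, pharmacology, and so on) each return data filtered to one such heading.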
get_compound_annotations
Get full annotation data for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData with all annotation sections. |
get_substance_annotations
Get full annotation data for a substance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sid | int | Substance ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData with all annotation sections. |
get_safety_data
Get safety and hazards information for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData filtered to Safety and Hazards section. |
get_toxicity_data
Get toxicity information for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData filtered to Toxicity section. |
get_pharmacology
Get pharmacology and biochemistry information for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData filtered to Pharmacology and Biochemistry section. |
get_drug_info
Get drug and medication information for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData filtered to Drug and Medication Information section. |
get_literature
Get literature references for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData filtered to Literature section. |
get_patents
Get patent information for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData filtered to Patents section. |
get_names_and_identifiers
Get names and identifiers for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData filtered to Names and Identifiers section. |
get_physical_properties
Get chemical and physical properties for a compound.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGViewFetchedData | PUGViewFetchedData filtered to Chemical and Physical Properties section. |
Ensembl_Fetcher
Fetcher for Ensembl REST API.
Ensembl REST API provides access to genomic data including:
- Gene/transcript/protein lookup and information
- Genomic and protein sequences
- Feature overlap queries
- Cross-references to external databases
- Homology and comparative genomics
- Variant data and VEP (Variant Effect Predictor)
- Coordinate mapping between assemblies
- Phenotype and ontology data
Example
from biodbs.fetch import Ensembl_Fetcher
fetcher = Ensembl_Fetcher()
# Look up a gene by Ensembl ID
gene = fetcher.lookup("ENSG00000141510")
print(gene.results[0]["display_name"])  # TP53
# Get sequence for a transcript
seq = fetcher.get_sequence("ENST00000269305", sequence_type="cds")
# Find features overlapping a region
features = fetcher.get_overlap_region(
    "human", "7:140424943-140624564",
    feature=["gene", "transcript"],
)
# Get homologs for a gene
homologs = fetcher.get_homology("human", "ENSG00000141510")
# Get variant consequences
vep = fetcher.get_vep_hgvs("human", "ENST00000366667:c.803C>T")
get
get(
    endpoint: str,
    id: Optional[str] = None,
    ids: Optional[List[str]] = None,
    species: Optional[str] = None,
    symbol: Optional[str] = None,
    region: Optional[str] = None,
    gene: Optional[str] = None,
    name: Optional[str] = None,
    content_type: str = "json",
    **kwargs: Any,
) -> EnsemblFetchedData
Fetch data from Ensembl REST API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| endpoint | str | Ensembl endpoint (e.g., "lookup/id", "sequence/id"). | required |
| id | Optional[str] | Ensembl stable ID for single lookups. | None |
| ids | Optional[List[str]] | List of IDs for batch requests. | None |
| species | Optional[str] | Species name (e.g., "human", "homo_sapiens"). | None |
| symbol | Optional[str] | Gene symbol for symbol-based lookups. | None |
| region | Optional[str] | Genomic region (e.g., "X:1000000..1000100:1"). | None |
| gene | Optional[str] | Gene name or ID for phenotype endpoints. | None |
| name | Optional[str] | Name for name-based lookups. | None |
| content_type | str | Response format ("json", "fasta", "text"). | 'json' |
| **kwargs | Any | Additional endpoint-specific parameters. | {} |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with parsed results. |
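An endpoint such as "lookup/id" plus an identifier corresponds to a path on the Ensembl REST server, with the response format and optional flags in the query string. The sketch below builds such a URL. `build_ensembl_url` and the `CONTENT_TYPES` table are hypothetical and only illustrate the idea; the fetcher may send the content type as a request header instead.

```python
from typing import Any
from urllib.parse import urlencode

ENSEMBL_BASE = "https://rest.ensembl.org"

# Short names used on this page mapped to MIME types accepted by Ensembl.
CONTENT_TYPES = {
    "json": "application/json",
    "fasta": "text/x-fasta",
    "text": "text/plain",
}


def build_ensembl_url(
    endpoint: str,
    *path_args: str,
    content_type: str = "json",
    **params: Any,
) -> str:
    """Hypothetical helper: compose an Ensembl REST GET URL."""
    path = "/".join([endpoint.strip("/"), *path_args])
    query = {"content-type": CONTENT_TYPES[content_type]}
    for key, value in params.items():
        if value is None:
            continue  # unset options are left out of the query string
        # Boolean flags are sent as 1/0.
        query[key] = int(value) if isinstance(value, bool) else value
    return f"{ENSEMBL_BASE}/{path}?{urlencode(query, safe='/')}"


lookup_url = build_ensembl_url("lookup/id", "ENSG00000141510", expand=True)
print(lookup_url)
# https://rest.ensembl.org/lookup/id/ENSG00000141510?content-type=application/json&expand=1
```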
lookup
lookup(
    id: str,
    species: Optional[str] = None,
    expand: bool = False,
    format: str = "full",
    db_type: str = "core",
    phenotypes: bool = False,
    utr: bool = False,
    mane: bool = False,
) -> EnsemblFetchedData
Look up an Ensembl stable ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | Ensembl stable ID (e.g., ENSG00000141510). | required |
| species | Optional[str] | Species name/alias (optional, auto-detected from ID). | None |
| expand | bool | Include connected features (transcripts, exons). | False |
| format | str | Response format ("full" or "condensed"). | 'full' |
| db_type | str | Database type ("core" or "otherfeatures"). | 'core' |
| phenotypes | bool | Include phenotypes (genes only). | False |
| utr | bool | Include UTR features (requires expand=True). | False |
| mane | bool | Include MANE features (requires expand=True). | False |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with gene/transcript/protein information. |
lookup_batch
lookup_batch(
    ids: List[str],
    species: Optional[str] = None,
    expand: bool = False,
    format: str = "full",
    db_type: str = "core",
) -> EnsemblFetchedData
Look up multiple Ensembl stable IDs in batch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | List[str] | List of Ensembl stable IDs (max 1000). | required |
| species | Optional[str] | Species name/alias. | None |
| expand | bool | Include connected features. | False |
| format | str | Response format. | 'full' |
| db_type | str | Database type. | 'core' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with results for each ID. |
lookup_symbol
lookup_symbol(
    species: str,
    symbol: str,
    expand: bool = False,
    format: str = "full",
) -> EnsemblFetchedData
Look up a gene by symbol.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | str | Species name (e.g., "human", "mouse"). | required |
| symbol | str | Gene symbol (e.g., "BRCA2", "TP53"). | required |
| expand | bool | Include connected features. | False |
| format | str | Response format. | 'full' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with gene information. |
get_sequence
get_sequence(
    id: str,
    sequence_type: str = "genomic",
    species: Optional[str] = None,
    expand_5prime: Optional[int] = None,
    expand_3prime: Optional[int] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    mask: Optional[str] = None,
    mask_feature: bool = False,
    multiple_sequences: bool = False,
    format: str = "fasta",
) -> EnsemblFetchedData
Get sequence for an Ensembl stable ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | Ensembl stable ID (gene, transcript, exon, protein). | required |
| sequence_type | str | Type of sequence ("genomic", "cds", "cdna", "protein"). | 'genomic' |
| species | Optional[str] | Species name (optional). | None |
| expand_5prime | Optional[int] | Extend upstream (genomic only). | None |
| expand_3prime | Optional[int] | Extend downstream (genomic only). | None |
| start | Optional[int] | Trim sequence start. | None |
| end | Optional[int] | Trim sequence end. | None |
| mask | Optional[str] | Mask repeats ("hard" or "soft", genomic only). | None |
| mask_feature | bool | Mask introns/UTRs. | False |
| multiple_sequences | bool | Return multiple sequences per ID. | False |
| format | str | Output format ("fasta" or "json"). | 'fasta' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with sequence data. |
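Because the default output format is FASTA, downstream code often needs to split the returned text into records. A small parser is sketched below. How `EnsemblFetchedData` exposes the raw FASTA text is not documented in this section, so the example works on a plain string, and the record IDs in `fasta_text` are placeholders rather than real Ensembl identifiers.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record ID (first word of the header) to its joined sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence lines may wrap
    return records


# Placeholder records, not real Ensembl output.
fasta_text = """>ENST_EXAMPLE_1 cds
ATGGAGGAG
CCGCAGTCA
>ENST_EXAMPLE_2 cds
ATGTGA
"""

sequences = parse_fasta(fasta_text)
print(sequences)
# {'ENST_EXAMPLE_1': 'ATGGAGGAGCCGCAGTCA', 'ENST_EXAMPLE_2': 'ATGTGA'}
```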
get_sequence_batch
get_sequence_batch(
    ids: List[str],
    sequence_type: str = "genomic",
    species: Optional[str] = None,
    format: str = "fasta",
) -> EnsemblFetchedData
Get sequences for multiple Ensembl IDs in batch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | List[str] | List of Ensembl stable IDs (max 50). | required |
| sequence_type | str | Type of sequence. | 'genomic' |
| species | Optional[str] | Species name. | None |
| format | str | Output format. | 'fasta' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with sequences. |
get_sequence_region
get_sequence_region(
    species: str,
    region: str,
    expand_5prime: Optional[int] = None,
    expand_3prime: Optional[int] = None,
    mask: Optional[str] = None,
    coord_system: Optional[str] = None,
    format: str = "fasta",
) -> EnsemblFetchedData
Get genomic sequence for a region.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | str | Species name (e.g., "human"). | required |
| region | str | Genomic region (e.g., "X:1000000..1000100:1"). | required |
| expand_5prime | Optional[int] | Extend upstream. | None |
| expand_3prime | Optional[int] | Extend downstream. | None |
| mask | Optional[str] | Mask repeats ("hard" or "soft"). | None |
| coord_system | Optional[str] | Coordinate system filter. | None |
| format | str | Output format ("fasta" or "json"). | 'fasta' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with sequence. |
get_overlap_id
get_overlap_id(
    id: str,
    feature: Union[str, List[str]],
    species: Optional[str] = None,
    biotype: Optional[str] = None,
    logic_name: Optional[str] = None,
    db_type: str = "core",
) -> EnsemblFetchedData
Get features overlapping an Ensembl ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | Ensembl stable ID. | required |
| feature | Union[str, List[str]] | Feature type(s) to retrieve (gene, transcript, exon, etc.). | required |
| species | Optional[str] | Species name. | None |
| biotype | Optional[str] | Filter by biotype (e.g., "protein_coding"). | None |
| logic_name | Optional[str] | Filter by analysis logic name. | None |
| db_type | str | Database type. | 'core' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with overlapping features. |
get_overlap_region
get_overlap_region(
    species: str,
    region: str,
    feature: Union[str, List[str]],
    biotype: Optional[str] = None,
    logic_name: Optional[str] = None,
    so_term: Optional[str] = None,
    variant_set: Optional[str] = None,
    db_type: str = "core",
) -> EnsemblFetchedData
Get features overlapping a genomic region.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | str | Species name (e.g., "human"). | required |
| region | str | Genomic region (e.g., "7:140424943-140624564", max 5Mb). | required |
| feature | Union[str, List[str]] | Feature type(s) to retrieve. | required |
| biotype | Optional[str] | Filter by biotype. | None |
| logic_name | Optional[str] | Filter by analysis logic name. | None |
| so_term | Optional[str] | Sequence Ontology term filter. | None |
| variant_set | Optional[str] | Variant set restriction (e.g., "ClinVar"). | None |
| db_type | str | Database type. | 'core' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with overlapping features. |
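Since overlap queries are limited to 5 Mb, it can help to check a region string before sending it. The sketch below parses both region spellings that appear on this page ("7:140424943-140624564" and "X:1000000..1000100:1") and enforces an optional length cap. `Region` and `parse_region` are hypothetical helpers, and whether biodbs performs a similar check client-side is not stated here.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int
    end: int
    strand: Optional[int]


# name:start-end or name:start..end, with an optional :1 / :-1 strand suffix
_REGION_RE = re.compile(
    r"^(?P<name>[^:]+):(?P<start>\d+)(?:\.\.|-)(?P<end>\d+)(?::(?P<strand>-?1))?$"
)


def parse_region(region: str, max_length: Optional[int] = None) -> Region:
    """Parse a region string and optionally enforce a maximum length in bases."""
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError("region end is before region start")
    if max_length is not None and end - start + 1 > max_length:
        raise ValueError(f"region longer than {max_length} bases")
    strand = int(match["strand"]) if match["strand"] else None
    return Region(match["name"], start, end, strand)


braf_region = parse_region("7:140424943-140624564", max_length=5_000_000)
print(braf_region.end - braf_region.start + 1)  # 199622
```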
get_xrefs
get_xrefs(
    id: str,
    species: Optional[str] = None,
    external_db: Optional[str] = None,
    all_levels: bool = False,
    db_type: str = "core",
    object_type: Optional[str] = None,
) -> EnsemblFetchedData
Get external cross-references for an Ensembl ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | Ensembl stable ID. | required |
| species | Optional[str] | Species name. | None |
| external_db | Optional[str] | Filter by external database (e.g., "HGNC", "UniProt"). | None |
| all_levels | bool | Find all linked features. | False |
| db_type | str | Database type. | 'core' |
| object_type | Optional[str] | Filter by feature type. | None |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with cross-references. |
get_xrefs_symbol
get_xrefs_symbol(
    species: str,
    symbol: str,
    external_db: Optional[str] = None,
    db_type: str = "core",
    object_type: Optional[str] = None,
) -> EnsemblFetchedData
Look up Ensembl objects by external symbol.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | str | Species name. | required |
| symbol | str | External symbol (e.g., gene name "BRCA2"). | required |
| external_db | Optional[str] | Filter by external database. | None |
| db_type | str | Database type. | 'core' |
| object_type | Optional[str] | Filter by feature type. | None |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with matching Ensembl objects. |
get_homology
get_homology(
    species: str,
    id: str,
    homology_type: str = "all",
    target_species: Optional[str] = None,
    target_taxon: Optional[int] = None,
    aligned: bool = True,
    cigar_line: bool = True,
    sequence: str = "protein",
    compara: str = "vertebrates",
    format: str = "full",
) -> EnsemblFetchedData
Get homology information for a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | str | Source species name. | required |
| id | str | Ensembl gene ID. | required |
| homology_type | str | Type of homology ("orthologues", "paralogues", "all"). | 'all' |
| target_species | Optional[str] | Filter by target species. | None |
| target_taxon | Optional[int] | Filter by target taxon ID. | None |
| aligned | bool | Include aligned sequences. | True |
| cigar_line | bool | Return sequence in CIGAR format. | True |
| sequence | str | Sequence type ("none", "cdna", "protein"). | 'protein' |
| compara | str | Compara database name. | 'vertebrates' |
| format | str | Response format ("full" or "condensed"). | 'full' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with homology data. |
get_homology_symbol
get_homology_symbol(
    species: str,
    symbol: str,
    homology_type: str = "all",
    target_species: Optional[str] = None,
    sequence: str = "protein",
) -> EnsemblFetchedData
Get homology information for a gene by symbol.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | str | Source species name. | required |
| symbol | str | Gene symbol. | required |
| homology_type | str | Type of homology. | 'all' |
| target_species | Optional[str] | Filter by target species. | None |
| sequence | str | Sequence type. | 'protein' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData with homology data. |
get_variation
¶
get_variation(
species: str,
id: str,
genotypes: bool = False,
pops: bool = False,
population_genotypes: bool = False,
phenotypes: bool = False,
genotyping_chips: bool = False,
) -> EnsemblFetchedData
Get variant information by rsID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
id
|
str
|
Variant ID (e.g., "rs56116432"). |
required |
genotypes
|
bool
|
Include individual genotypes. |
False
|
pops
|
bool
|
Include population allele frequencies. |
False
|
population_genotypes
|
bool
|
Include population genotype frequencies. |
False
|
phenotypes
|
bool
|
Include phenotypes. |
False
|
genotyping_chips
|
bool
|
Include genotyping chip info. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with variant data. |
get_vep_hgvs
¶
get_vep_hgvs(
species: str,
hgvs_notation: str,
canonical: bool = False,
domains: bool = False,
hgvs: bool = False,
numbers: bool = False,
protein: bool = False,
refseq: bool = False,
variant_class: bool = False,
) -> EnsemblFetchedData
Get variant consequences using HGVS notation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
hgvs_notation
|
str
|
HGVS notation (e.g., "ENST00000366667:c.803C>T"). |
required |
canonical
|
bool
|
Only return canonical transcript. |
False
|
domains
|
bool
|
Include protein domains. |
False
|
hgvs
|
bool
|
Add HGVS nomenclature. |
False
|
numbers
|
bool
|
Include exon/intron numbers. |
False
|
protein
|
bool
|
Include protein position and amino acid changes. |
False
|
refseq
|
bool
|
Include RefSeq transcripts. |
False
|
variant_class
|
bool
|
Include variant class. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with VEP results. |
get_vep_id
¶
get_vep_id(
species: str,
id: str,
canonical: bool = False,
domains: bool = False,
hgvs: bool = False,
numbers: bool = False,
protein: bool = False,
) -> EnsemblFetchedData
Get variant consequences using variant ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
id
|
str
|
Variant ID (e.g., rsID). |
required |
canonical
|
bool
|
Only return canonical transcript. |
False
|
domains
|
bool
|
Include protein domains. |
False
|
hgvs
|
bool
|
Add HGVS nomenclature. |
False
|
numbers
|
bool
|
Include exon/intron numbers. |
False
|
protein
|
bool
|
Include protein position. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with VEP results. |
get_vep_region
¶
get_vep_region(
species: str,
region: str,
allele: str,
canonical: bool = False,
domains: bool = False,
hgvs: bool = False,
numbers: bool = False,
protein: bool = False,
) -> EnsemblFetchedData
Get variant consequences using genomic coordinates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
region
|
str
|
Genomic region (e.g., "9:22125503-22125502:1"). |
required |
allele
|
str
|
Variant allele (e.g., "C", "DUP"). |
required |
canonical
|
bool
|
Only return canonical transcript. |
False
|
domains
|
bool
|
Include protein domains. |
False
|
hgvs
|
bool
|
Add HGVS nomenclature. |
False
|
numbers
|
bool
|
Include exon/intron numbers. |
False
|
protein
|
bool
|
Include protein position. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with VEP results. |
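The `region` argument follows the pattern `chromosome:start-end:strand`. In the example value "9:22125503-22125502:1" the start is one greater than the end, which is how VEP-style coordinates denote an insertion between two bases. A minimal sketch of a helper that formats such strings is shown here; `vep_region` is a hypothetical name, not a biodbs function.

```python
def vep_region(chrom: str, start: int, end: int, strand: int = 1) -> str:
    """Format a region string as chromosome:start-end:strand."""
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if start > end + 1:
        # start == end + 1 is allowed: it marks an insertion point.
        raise ValueError("start may exceed end by at most 1")
    return f"{chrom}:{start}-{end}:{strand}"


snv = vep_region("9", 22125503, 22125503)        # single-base change
insertion = vep_region("9", 22125503, 22125502)  # insertion between two bases
print(snv, insertion)
```
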
map_assembly
¶
map_assembly(
species: str,
asm_one: str,
region: str,
asm_two: str,
coord_system: str = "chromosome",
target_coord_system: str = "chromosome",
) -> EnsemblFetchedData
Map coordinates between assemblies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
asm_one
|
str
|
Source assembly version (e.g., "GRCh37"). |
required |
region
|
str
|
Genomic region to map (e.g., "X:1000000..1000100:1"). |
required |
asm_two
|
str
|
Target assembly version (e.g., "GRCh38"). |
required |
coord_system
|
str
|
Input coordinate system. |
'chromosome'
|
target_coord_system
|
str
|
Output coordinate system. |
'chromosome'
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with mapped coordinates. |
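For orientation, the parameters above line up with the path segments of the Ensembl REST `map` endpoint (`map/:species/:asm_one/:region/:asm_two`). The sketch below only assembles such a URL with the standard library; it makes no request, and whether biodbs builds its requests this way internally is an assumption. `map_assembly_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def map_assembly_url(
    species: str,
    asm_one: str,
    region: str,
    asm_two: str,
    coord_system: str = "chromosome",
    target_coord_system: str = "chromosome",
    server: str = "https://rest.ensembl.org",
) -> str:
    """Assemble an Ensembl REST assembly-mapping URL (no request is sent)."""
    parts = ("map", species, asm_one, region, asm_two)
    # Keep ":" literal so the region stays readable in the path.
    path = "/".join(quote(part, safe=":") for part in parts)
    query = urlencode(
        {"coord_system": coord_system, "target_coord_system": target_coord_system}
    )
    return f"{server}/{path}?{query}"


url = map_assembly_url("human", "GRCh37", "X:1000000..1000100:1", "GRCh38")
print(url)
```
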
get_phenotype_gene
¶
get_phenotype_gene(
species: str,
gene: str,
include_associated: bool = False,
include_overlap: bool = False,
include_pubmed_id: bool = False,
include_review_status: bool = False,
include_submitter: bool = False,
) -> EnsemblFetchedData
Get phenotypes associated with a gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
gene
|
str
|
Gene name or Ensembl ID. |
required |
include_associated
|
bool
|
Include phenotypes from associated variants. |
False
|
include_overlap
|
bool
|
Include phenotypes from overlapping features. |
False
|
include_pubmed_id
|
bool
|
Include PubMed IDs. |
False
|
include_review_status
|
bool
|
Include review status. |
False
|
include_submitter
|
bool
|
Include submitter names. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with phenotype data. |
get_phenotype_region
¶
get_phenotype_region(
species: str,
region: str,
include_pubmed_id: bool = False,
include_review_status: bool = False,
) -> EnsemblFetchedData
Get phenotypes in a genomic region.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
region
|
str
|
Genomic region. |
required |
include_pubmed_id
|
bool
|
Include PubMed IDs. |
False
|
include_review_status
|
bool
|
Include review status. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with phenotype data. |
get_ontology_term
¶
get_ontology_term(
id: str,
relation: Optional[str] = None,
simple: bool = False,
) -> EnsemblFetchedData
Get ontology term information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
id
|
str
|
Ontology term ID (e.g., "GO:0005667"). |
required |
relation
|
Optional[str]
|
Relationship types to include. |
None
|
simple
|
bool
|
Don't fetch parent/child terms. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with ontology term data. |
get_ontology_ancestors
¶
get_ontology_ancestors(
id: str,
ontology: Optional[str] = None,
zero_distance: bool = False,
) -> EnsemblFetchedData
Get ancestor terms for an ontology term.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
id
|
str
|
Ontology term ID. |
required |
ontology
|
Optional[str]
|
Filter by ontology. |
None
|
zero_distance
|
bool
|
Include the term itself. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with ancestor terms. |
get_ontology_descendants
¶
get_ontology_descendants(
id: str,
ontology: Optional[str] = None,
zero_distance: bool = False,
subset: Optional[str] = None,
) -> EnsemblFetchedData
Get descendant terms for an ontology term.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
id
|
str
|
Ontology term ID. |
required |
ontology
|
Optional[str]
|
Filter by ontology. |
None
|
zero_distance
|
bool
|
Include the term itself. |
False
|
subset
|
Optional[str]
|
Filter by subset. |
None
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with descendant terms. |
get_genetree
¶
get_genetree(
id: str,
aligned: bool = False,
cigar_line: bool = False,
sequence: str = "protein",
nh_format: str = "simple",
prune_species: Optional[str] = None,
prune_taxon: Optional[int] = None,
clusterset_id: Optional[str] = None,
compara: str = "vertebrates",
) -> EnsemblFetchedData
Get gene tree by tree ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
id
|
str
|
Gene tree ID (e.g., "ENSGT00390000003602"). |
required |
aligned
|
bool
|
Include aligned sequences. |
False
|
cigar_line
|
bool
|
Return sequence in CIGAR format. |
False
|
sequence
|
str
|
Sequence type ("none", "cdna", "protein"). |
'protein'
|
nh_format
|
str
|
Newick format type. |
'simple'
|
prune_species
|
Optional[str]
|
Filter by species. |
None
|
prune_taxon
|
Optional[int]
|
Filter by taxon ID. |
None
|
clusterset_id
|
Optional[str]
|
Gene-tree resource name. |
None
|
compara
|
str
|
Compara database name. |
'vertebrates'
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with gene tree data. |
get_genetree_member
¶
get_genetree_member(
species: str,
id: str,
aligned: bool = False,
sequence: str = "protein",
compara: str = "vertebrates",
) -> EnsemblFetchedData
Get gene tree containing a gene ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
id
|
str
|
Ensembl gene ID. |
required |
aligned
|
bool
|
Include aligned sequences. |
False
|
sequence
|
str
|
Sequence type. |
'protein'
|
compara
|
str
|
Compara database name. |
'vertebrates'
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with gene tree data. |
get_assembly_info
¶
get_assembly_info(
species: str,
bands: bool = False,
synonyms: bool = False,
) -> EnsemblFetchedData
Get assembly information for a species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name. |
required |
bands
|
bool
|
Include karyotype band information. |
False
|
synonyms
|
bool
|
Include known synonyms. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with assembly information. |
get_species_info
¶
get_species_info(
division: Optional[str] = None,
strain_collection: Optional[str] = None,
hide_strain_info: bool = False,
) -> EnsemblFetchedData
Get information about available species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
division
|
Optional[str]
|
Filter by Ensembl division. |
None
|
strain_collection
|
Optional[str]
|
Filter by strain collection. |
None
|
hide_strain_info
|
bool
|
Hide strain information. |
False
|
Returns:
| Type | Description |
|---|---|
EnsemblFetchedData
|
EnsemblFetchedData with species information. |
BioMart_Fetcher¶
BioMart_Fetcher
¶
Fetcher for BioMart (Ensembl) genomic data.
BioMart provides access to:
- Gene information (IDs, names, descriptions, coordinates)
- Transcript and protein data
- Sequence data (cDNA, coding, peptide)
- Homology information
- Variation data
- GO annotations
The API has a hierarchical structure:
- Server: Contains multiple marts (e.g., ENSEMBL_MART_ENSEMBL)
- Mart: Contains multiple datasets (e.g., hsapiens_gene_ensembl)
- Dataset: Contains filters and attributes for queries
Example
fetcher = BioMart_Fetcher()
# List available marts
marts = fetcher.list_marts()
print(marts.marts)
# List datasets in a mart
datasets = fetcher.list_datasets()
print(datasets.search(contain="human"))
# Get gene info by Ensembl IDs
data = fetcher.get_genes(
ids=["ENSG00000141510", "ENSG00000012048"],
attributes=["ensembl_gene_id", "external_gene_name", "description"]
)
df = data.as_dataframe()
# Get genes by gene names
data = fetcher.get_genes_by_name(
names=["TP53", "BRCA1", "BRCA2"],
attributes=["ensembl_gene_id", "chromosome_name", "start_position"]
)
Note
The BioMart API is rate-limited and can be slow for large queries. Use batching for queries with many filter values.
Initialize BioMart fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
host
|
Union[str, BioMartHost]
|
BioMart host (default: www.ensembl.org). |
main
|
**data_manager_kws
|
Any
|
Keyword arguments for BioMartDataManager. |
{}
|
list_marts
¶
List available marts on the server.
Returns:
| Type | Description |
|---|---|
BioMartRegistryData
|
BioMartRegistryData with mart information. |
list_datasets
¶
List datasets available in a mart.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mart
|
Union[str, BioMartMart]
|
Mart name (default: ENSEMBL_MART_ENSEMBL). |
ensembl
|
Returns:
| Type | Description |
|---|---|
BioMartDatasetsData
|
BioMartDatasetsData with dataset information. |
get_config
¶
get_config(
dataset: Union[str, BioMartDataset] = hsapiens_gene,
use_cache: bool = True,
) -> BioMartConfigData
Get dataset configuration (filters and attributes).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
use_cache
|
bool
|
Whether to use cached configuration. |
True
|
Returns:
| Type | Description |
|---|---|
BioMartConfigData
|
BioMartConfigData with filters and attributes. |
list_attributes
¶
list_attributes(
dataset: Union[str, BioMartDataset] = hsapiens_gene,
contain: Optional[str] = None,
pattern: Optional[str] = None,
) -> Any
List available attributes for a dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
contain
|
Optional[str]
|
Filter attributes containing this string. |
None
|
pattern
|
Optional[str]
|
Filter attributes matching this regex pattern. |
None
|
Returns:
| Type | Description |
|---|---|
Any
|
DataFrame with attribute information. |
list_filters
¶
list_filters(
dataset: Union[str, BioMartDataset] = hsapiens_gene,
contain: Optional[str] = None,
pattern: Optional[str] = None,
) -> Any
List available filters for a dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
contain
|
Optional[str]
|
Return only filters containing this string. |
None
|
pattern
|
Optional[str]
|
Return only filters matching this regex pattern. |
None
|
Returns:
| Type | Description |
|---|---|
Any
|
DataFrame with filter information. |
query
¶
query(
dataset: Union[str, BioMartDataset] = hsapiens_gene,
attributes: Optional[List[str]] = None,
filters: Optional[
Dict[str, Union[str, List[str]]]
] = None,
unique_rows: bool = True,
) -> BioMartQueryData
Execute a BioMart query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
attributes
|
Optional[List[str]]
|
List of attributes to retrieve. |
None
|
filters
|
Optional[Dict[str, Union[str, List[str]]]]
|
Dict of filter name to value(s). |
None
|
unique_rows
|
bool
|
Whether to return unique rows only. |
True
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
BioMartQueryData with query results. |
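BioMart servers accept queries as a small XML document that names a dataset, its filters, and the attributes to return. The following sketch shows how the `dataset`, `attributes`, `filters`, and `unique_rows` arguments could map onto that XML. It is an illustration under assumptions: `build_biomart_xml` is a hypothetical helper, and the exact XML that biodbs sends is not documented here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union


def build_biomart_xml(
    dataset: str,
    attributes: List[str],
    filters: Optional[Dict[str, Union[str, List[str]]]] = None,
    unique_rows: bool = True,
) -> str:
    """Serialize a BioMart query as XML (nothing is sent to a server)."""
    query = ET.Element(
        "Query",
        {
            "virtualSchemaName": "default",
            "formatter": "TSV",
            "header": "1",
            "uniqueRows": "1" if unique_rows else "0",
            "datasetConfigVersion": "0.6",
        },
    )
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        # Multiple filter values are sent as one comma-separated string.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


query_xml = build_biomart_xml(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"ensembl_gene_id": ["ENSG00000141510", "ENSG00000012048"]},
)
print(query_xml)
```
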
batch_query
¶
batch_query(
dataset: Union[str, BioMartDataset] = hsapiens_gene,
attributes: Optional[List[str]] = None,
filter_name: str = "ensembl_gene_id",
filter_values: List[str] = None,
batch_size: int = 500,
max_workers: int = 4,
show_progress: bool = True,
) -> BioMartQueryData
Execute a batched BioMart query for many filter values.
BioMart has limits on query size, so large filter lists are split into batches and queried in parallel using threads.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
attributes
|
Optional[List[str]]
|
List of attributes to retrieve. |
None
|
filter_name
|
str
|
Name of the filter to batch. |
'ensembl_gene_id'
|
filter_values
|
List[str]
|
List of filter values. |
None
|
batch_size
|
int
|
Number of values per batch. |
500
|
max_workers
|
int
|
Number of parallel workers. |
4
|
show_progress
|
bool
|
Whether to show progress bar. |
True
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
Combined BioMartQueryData with all results. |
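The batching described above can be pictured as two steps: split the filter values into chunks of `batch_size`, then run one query per chunk on a thread pool of `max_workers` threads. The sketch below shows that pattern with a stand-in `fake_fetch` function in place of a real network call. All names are hypothetical, and the actual biodbs implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, Tuple


def chunked(values: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` values."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(values), size):
        yield list(values[i:i + size])


def run_batched(
    values: Sequence[str],
    fetch: Callable[[List[str]], List[Tuple[str, str]]],
    batch_size: int = 500,
    max_workers: int = 4,
) -> List[Tuple[str, str]]:
    """Run `fetch` once per batch in parallel and concatenate the rows."""
    batches = list(chunked(values, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in batch order, not completion order.
        results = pool.map(fetch, batches)
        return [row for rows in results for row in rows]


def fake_fetch(batch: List[str]) -> List[Tuple[str, str]]:
    """Stand-in for a network query: one row per requested ID."""
    return [(gene_id, gene_id.lower()) for gene_id in batch]


# Placeholder IDs generated for the demonstration, not real genes.
ids = [f"ENSG{i:011d}" for i in range(1200)]
rows = run_batched(ids, fake_fetch)
print(len(rows))
```
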
get_genes
¶
get_genes(
ids: List[str],
attributes: Optional[List[str]] = None,
dataset: Union[str, BioMartDataset] = hsapiens_gene,
batch_size: int = 500,
) -> BioMartQueryData
Get gene information by Ensembl gene IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ids
|
List[str]
|
List of Ensembl gene IDs. |
required |
attributes
|
Optional[List[str]]
|
Attributes to retrieve. Defaults to common gene attributes. |
None
|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
batch_size
|
int
|
Batch size for large queries. |
500
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
BioMartQueryData with gene information. |
get_genes_by_name
¶
get_genes_by_name(
names: List[str],
attributes: Optional[List[str]] = None,
dataset: Union[str, BioMartDataset] = hsapiens_gene,
batch_size: int = 500,
) -> BioMartQueryData
Get gene information by gene names (symbols).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
names
|
List[str]
|
List of gene names/symbols. |
required |
attributes
|
Optional[List[str]]
|
Attributes to retrieve. |
None
|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
batch_size
|
int
|
Batch size for large queries. |
500
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
BioMartQueryData with gene information. |
get_genes_by_chromosome
¶
get_genes_by_chromosome(
chromosome: str,
start: Optional[int] = None,
end: Optional[int] = None,
attributes: Optional[List[str]] = None,
dataset: Union[str, BioMartDataset] = hsapiens_gene,
) -> BioMartQueryData
Get genes on a chromosome, optionally within a region.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chromosome
|
str
|
Chromosome name (e.g., "1", "X", "MT"). |
required |
start
|
Optional[int]
|
Start position (optional). |
None
|
end
|
Optional[int]
|
End position (optional). |
None
|
attributes
|
Optional[List[str]]
|
Attributes to retrieve. |
None
|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
BioMartQueryData with genes in the region. |
get_transcripts
¶
get_transcripts(
gene_ids: List[str],
attributes: Optional[List[str]] = None,
dataset: Union[str, BioMartDataset] = hsapiens_gene,
batch_size: int = 500,
) -> BioMartQueryData
Get transcript information for genes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_ids
|
List[str]
|
List of Ensembl gene IDs. |
required |
attributes
|
Optional[List[str]]
|
Attributes to retrieve. |
None
|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
batch_size
|
int
|
Batch size for large queries. |
500
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
BioMartQueryData with transcript information. |
get_go_annotations
¶
get_go_annotations(
gene_ids: List[str],
dataset: Union[str, BioMartDataset] = hsapiens_gene,
batch_size: int = 500,
) -> BioMartQueryData
Get Gene Ontology annotations for genes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_ids
|
List[str]
|
List of Ensembl gene IDs. |
required |
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
batch_size
|
int
|
Batch size for large queries. |
500
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
BioMartQueryData with GO annotations. |
get_homologs
¶
get_homologs(
gene_ids: List[str],
target_species: str = "mmusculus",
dataset: Union[str, BioMartDataset] = hsapiens_gene,
batch_size: int = 500,
) -> BioMartQueryData
Get homolog information for genes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_ids
|
List[str]
|
List of Ensembl gene IDs. |
required |
target_species
|
str
|
Target species for homologs (e.g., "mmusculus"). |
'mmusculus'
|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
batch_size
|
int
|
Batch size for large queries. |
500
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
BioMartQueryData with homolog information. |
convert_ids
¶
convert_ids(
ids: List[str],
from_type: str = "ensembl_gene_id",
to_type: str = "external_gene_name",
dataset: Union[str, BioMartDataset] = hsapiens_gene,
batch_size: int = 500,
) -> BioMartQueryData
Convert between different ID types.
Common ID types:
- ensembl_gene_id
- ensembl_transcript_id
- ensembl_peptide_id
- external_gene_name
- entrezgene_id
- uniprot_gn_id
- hgnc_symbol
- hgnc_id
- refseq_mrna
- refseq_peptide
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ids
|
List[str]
|
List of IDs to convert. |
required |
from_type
|
str
|
Source ID type (also used as filter). |
'ensembl_gene_id'
|
to_type
|
str
|
Target ID type. |
'external_gene_name'
|
dataset
|
Union[str, BioMartDataset]
|
Dataset name. |
hsapiens_gene
|
batch_size
|
int
|
Batch size for large queries. |
500
|
Returns:
| Type | Description |
|---|---|
BioMartQueryData
|
BioMartQueryData with ID mappings. |
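ID conversions are not always one-to-one: a source ID can map to several target IDs, and some IDs have no mapping at all. The sketch below collapses two-column result rows into a dictionary of lists. It assumes the rows have already been extracted as `(from_type, to_type)` tuples; `build_id_map` and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_id_map(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group target IDs by source ID, dropping duplicates and blanks."""
    mapping: Dict[str, List[str]] = defaultdict(list)
    for source, target in rows:
        targets = mapping[source]  # creates an empty list for unseen sources
        if target and target not in targets:
            targets.append(target)
    return dict(mapping)


# Sample rows; the last ID is a placeholder with no mapping.
rows = [
    ("ENSG00000141510", "TP53"),
    ("ENSG00000141510", "TP53"),
    ("ENSG00000012048", "BRCA1"),
    ("ENSG00000000000", ""),
]
id_map = build_id_map(rows)
print(id_map)
```
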
KEGG_Fetcher¶
KEGG_Fetcher
¶
Fetcher for KEGG REST API.
KEGG (Kyoto Encyclopedia of Genes and Genomes) provides access to:
- Pathway information and diagrams
- Gene and protein entries
- Compound and drug data
- Disease information
- Organism-specific pathway lists
- ID conversion between databases
Operations:
- info: Get database statistics
- list: List database entries
- find: Search entries by keyword
- get: Retrieve specific entries
- conv: Convert IDs between databases
- link: Find linked entries across databases
- ddi: Drug-drug interactions
Example
fetcher = KEGG_Fetcher()
# Get database info
info = fetcher.get("info", database="pathway")
print(info.text)
# List human pathways
pathways = fetcher.get("list", database="pathway", organism="hsa")
print(pathways.to_dataframe())
# Search for genes
results = fetcher.get("find", database="genes", query="tp53")
# Get specific entries
entries = fetcher.get("get", dbentries=["hsa:7157", "hsa:672"])
for record in entries.records:
    print(record.get("ENTRY"), record.get("NAME"))
# Convert KEGG IDs to NCBI Gene IDs
mapping = fetcher.get("conv", target_db="ncbi-geneid", dbentries=["hsa:7157"])
Initialize KEGG fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
**data_manager_kws
|
Any
|
Keyword arguments for KEGGDataManager (e.g., storage_path for stream_to_storage method). |
{}
|
get
¶
Fetch data from KEGG REST API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
operation
|
str
|
KEGG operation (info, list, find, get, conv, link, ddi). |
required |
**kwargs
|
Any
|
Operation-specific parameters (database, query, dbentries, etc.). |
{}
|
Returns:
| Type | Description |
|---|---|
KEGGFetchedData
|
KEGGFetchedData with parsed results. |
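The KEGG REST API encodes each operation as a URL path of the form `https://rest.kegg.jp/<operation>/<argument>/...`, with multiple entries joined by `+`. The sketch below builds such URLs to show how the operations listed above relate to the underlying service. `kegg_url` is a hypothetical helper, and how biodbs maps its keyword arguments onto the path is an assumption.

```python
from typing import List, Union


def kegg_url(
    operation: str,
    *parts: Union[str, List[str]],
    base: str = "https://rest.kegg.jp",
) -> str:
    """Join an operation and its arguments into a KEGG REST URL."""
    segments = [
        # Lists of entries are joined with "+", as KEGG expects.
        "+".join(part) if isinstance(part, (list, tuple)) else str(part)
        for part in parts
    ]
    return "/".join([base, operation, *segments])


print(kegg_url("list", "pathway", "hsa"))
print(kegg_url("find", "genes", "tp53"))
print(kegg_url("get", ["hsa:7157", "hsa:672"]))
print(kegg_url("conv", "ncbi-geneid", ["hsa:7157"]))
```
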
get_all
¶
get_all(
operation: str,
dbentries: List[str],
method: Literal[
"concat", "stream_to_storage"
] = "concat",
batch_size: int = DEFAULT_BATCH_SIZE,
rate_limit_per_second: int = 3,
get_option: Optional[str] = None,
**kwargs: Any,
) -> Union[KEGGFetchedData, Path]
Fetch data for many entries by batching and concurrent requests.
KEGG limits certain operations (get, conv, link, ddi) to a small number of entries per request. This method splits a large entry list into batches and fetches them concurrently.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
operation
|
str
|
KEGG operation (get, conv, link, ddi). |
required |
dbentries
|
List[str]
|
List of database entry IDs to fetch. |
required |
method
|
Literal['concat', 'stream_to_storage']
|
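"concat" returns a single KEGGFetchedData. "stream_to_storage" streams each batch to storage and returns the output file Path. |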
'concat'
|
batch_size
|
int
|
Entries per request (default 10, KEGG's limit). |
DEFAULT_BATCH_SIZE
|
rate_limit_per_second
|
int
|
Max requests per second (default 3 to be conservative with KEGG). |
3
|
get_option
|
Optional[str]
|
For the get operation, an optional KEGG output option. |
None
|
**kwargs
|
Any
|
Additional parameters (target_db for conv/link, etc.). |
{}
|
Returns:
| Type | Description |
|---|---|
Union[KEGGFetchedData, Path]
|
Combined KEGGFetchedData or Path to output file. |
Example
fetcher = KEGG_Fetcher(storage_path="./data")
genes = ["hsa:10458", "hsa:7157", "hsa:672", ...] # 100+ genes
data = fetcher.get_all("get", genes)
print(len(data.records))
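To make the roles of `batch_size` and `rate_limit_per_second` concrete, the sketch below splits an entry list into batches and spaces the requests so that no more than the allowed number start per second. It runs sequentially for simplicity, whereas the method above is described as concurrent. `fetch_in_batches` is a hypothetical helper, and the `fetch`, `sleep`, and `clock` arguments are injected so the example runs without network access or real waiting.

```python
import time
from typing import Callable, List, Sequence


def fetch_in_batches(
    entries: Sequence[str],
    fetch: Callable[[List[str]], List[str]],
    batch_size: int = 10,
    rate_limit_per_second: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Call `fetch` once per batch, pausing to respect the rate limit."""
    min_interval = 1.0 / rate_limit_per_second
    results: List[str] = []
    last_start = None
    for i in range(0, len(entries), batch_size):
        if last_start is not None:
            # Wait out the remainder of the interval since the last request.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.extend(fetch(list(entries[i:i + batch_size])))
    return results


# Demonstration with a fake fetch and a frozen clock; pauses are recorded
# instead of slept. The entry IDs are placeholders.
pauses: List[float] = []
genes = [f"hsa:{n}" for n in range(1, 26)]
records = fetch_in_batches(
    genes, fetch=lambda batch: batch, sleep=pauses.append, clock=lambda: 0.0
)
print(len(records), len(pauses))
```
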
ChEMBL_Fetcher¶
ChEMBL_Fetcher
¶
Fetcher for ChEMBL REST API.
ChEMBL provides bioactivity data for drug-like molecules including:
- Molecules and their properties
- Bioactivity measurements
- Targets (proteins, cell lines, organisms)
- Assays and documents
- Drug information and indications
Example
fetcher = ChEMBL_Fetcher()
# Get a specific molecule by ChEMBL ID
aspirin = fetcher.get(resource="molecule", chembl_id="CHEMBL25")
print(aspirin.results[0]["pref_name"])
# Search for molecules
results = fetcher.get(
resource="molecule",
search_query="aspirin",
limit=10
)
# Filter activities by target
activities = fetcher.get(
resource="activity",
filters={"target_chembl_id": "CHEMBL240"},
limit=100
)
# Similarity search
similar = fetcher.get(
resource="similarity",
smiles="CC(=O)Oc1ccccc1C(=O)O", # Aspirin SMILES
similarity_threshold=70,
limit=50
)
get
¶
get(
resource: str,
chembl_id: Optional[str] = None,
search_query: Optional[str] = None,
filters: Optional[Dict[str, Any]] = None,
smiles: Optional[str] = None,
similarity_threshold: Optional[int] = None,
limit: Optional[int] = None,
offset: Optional[int] = None,
format: str = "json",
) -> ChEMBLFetchedData
Fetch data from ChEMBL REST API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
resource
|
str
|
ChEMBL resource (molecule, activity, target, etc.). |
required |
chembl_id
|
Optional[str]
|
Optional ChEMBL ID for single-entry lookup. |
None
|
search_query
|
Optional[str]
|
Optional full-text search query. |
None
|
filters
|
Optional[Dict[str, Any]]
|
Optional field filters (e.g., {"max_phase": 4}). |
None
|
smiles
|
Optional[str]
|
SMILES string for similarity/substructure search. |
None
|
similarity_threshold
|
Optional[int]
|
Threshold for similarity search (40-100). |
None
|
limit
|
Optional[int]
|
Max records to return (1-1000). |
None
|
offset
|
Optional[int]
|
Pagination offset. |
None
|
format
|
str
|
Output format (json or xml). |
'json'
|
Returns:
| Type | Description |
|---|---|
ChEMBLFetchedData
|
ChEMBLFetchedData with parsed results. |
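The ChEMBL web services expose each resource under `https://www.ebi.ac.uk/chembl/api/data/<resource>`, with field filters, `limit`, and `offset` passed as query parameters. The sketch below shows how a subset of the arguments above could translate into such a URL. `chembl_url` is a hypothetical helper that sends no request, and the way biodbs builds its requests is an assumption.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data"


def chembl_url(
    resource: str,
    chembl_id: Optional[str] = None,
    filters: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
    fmt: str = "json",
) -> str:
    """Build a ChEMBL data URL for a single entry or a filtered listing."""
    path = f"{CHEMBL_BASE}/{resource}"
    if chembl_id:
        path += f"/{chembl_id}"
    params: Dict[str, Any] = dict(filters or {})
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    url = f"{path}.{fmt}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


print(chembl_url("molecule", chembl_id="CHEMBL25"))
print(chembl_url("activity", filters={"target_chembl_id": "CHEMBL240"}, limit=100))
```
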
get_all
¶
get_all(
resource: str,
method: Literal[
"concat", "stream_to_storage"
] = "concat",
limit_per_page: int = 1000,
max_records: Optional[int] = None,
rate_limit_per_second: int = 5,
search_query: Optional[str] = None,
filters: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Union[ChEMBLFetchedData, Path]
Fetch multiple pages of results concurrently.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
resource
|
str
|
ChEMBL resource (molecule, activity, target, etc.). |
required |
method
|
Literal['concat', 'stream_to_storage']
|
"concat" returns a single ChEMBLFetchedData. "stream_to_storage" streams each batch to storage and returns the output file Path. |
'concat'
|
limit_per_page
|
int
|
Records per request (default 1000, max 1000). |
1000
|
max_records
|
Optional[int]
|
Total records to fetch. None means fetch all. |
None
|
rate_limit_per_second
|
int
|
Max concurrent requests per second. |
5
|
search_query
|
Optional[str]
|
Optional full-text search query. |
None
|
filters
|
Optional[Dict[str, Any]]
|
Optional field filters. |
None
|
**kwargs
|
Any
|
Additional parameters. |
{}
|
Returns:
| Type | Description |
|---|---|
Union[ChEMBLFetchedData, Path]
|
Combined ChEMBLFetchedData or Path to output file. |
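Fetching all pages amounts to planning a list of `(offset, limit)` requests once the total number of matching records is known. The sketch below computes that plan from a total count, `limit_per_page`, and `max_records`. It assumes the total is available from the first response, and `page_plan` is a hypothetical helper rather than part of biodbs.

```python
from typing import List, Optional, Tuple


def page_plan(
    total_count: int,
    limit_per_page: int = 1000,
    max_records: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (offset, limit) pairs that cover the wanted records."""
    wanted = total_count if max_records is None else min(total_count, max_records)
    pages = []
    offset = 0
    while offset < wanted:
        # The last page is trimmed so no more than `wanted` records are requested.
        pages.append((offset, min(limit_per_page, wanted - offset)))
        offset += limit_per_page
    return pages


print(page_plan(2500))
print(page_plan(2500, max_records=1200))
```
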
get_molecule
¶
Get a single molecule by ChEMBL ID.
search_molecules
¶
Search molecules by name or description.
get_activities_for_target
¶
Get bioactivity data for a specific target.
get_activities_for_molecule
¶
Get bioactivity data for a specific molecule.
similarity_search
¶
Find molecules similar to a given SMILES structure.
substructure_search
¶
Find molecules containing a given substructure.
get_approved_drugs
¶
Get approved drugs (max_phase = 4).
get_drug_indications
¶
Get indications for a specific drug/molecule.
get_mechanisms
¶
Get mechanisms of action for a specific molecule.
QuickGO_Fetcher¶
QuickGO_Fetcher
¶
Fetcher for QuickGO API (GO annotations, ontology, gene products).
QuickGO provides access to:
- Gene Ontology term information
- GO annotations for genes/proteins
- Gene product information
- Annotation downloads in various formats (GAF, GPAD, TSV)
Categories:
- ontology: GO term search and retrieval
- annotation: GO annotation search and download
- geneproduct: Gene product information
Example
fetcher = QuickGO_Fetcher()
# Search GO terms
data = fetcher.get(
category="ontology",
endpoint="search",
query="apoptosis"
)
# Get GO term by ID
data = fetcher.get(
category="ontology",
endpoint="terms/{ids}",
ids=["GO:0008150", "GO:0003674"]
)
# Search annotations for human
data = fetcher.get(
category="annotation",
endpoint="search",
goId="GO:0006915", # apoptotic process
taxonId=9606
)
df = data.as_dataframe()
Initialize QuickGO fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
**data_manager_kws
|
Any
|
Keyword arguments for QuickGODataManager (e.g., storage_path for stream_to_storage method). |
{}
|
get
¶
Fetch data from QuickGO API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
category
|
str
|
QuickGO category (ontology, annotation, geneproduct). |
required |
endpoint
|
str
|
API endpoint (search, terms/{ids}, downloadSearch, etc.). |
required |
**kwargs
|
Any
|
Endpoint-specific parameters. |
{}
|
Returns:
| Type | Description |
|---|---|
QuickGOFetchedData
|
QuickGOFetchedData with parsed results. |
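Endpoints such as `terms/{ids}` contain placeholders that are filled from keyword arguments. A minimal sketch of that substitution is shown below, assuming list values are joined with commas and URL-encoded. `fill_endpoint` is a hypothetical helper, and the way biodbs resolves placeholders may differ.

```python
from typing import List, Union
from urllib.parse import quote


def fill_endpoint(endpoint: str, **path_values: Union[str, List[str]]) -> str:
    """Replace {name} placeholders in an endpoint template."""
    filled = endpoint
    for key, value in path_values.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        # Percent-encode everything except the comma separator.
        filled = filled.replace("{" + key + "}", quote(str(value), safe=","))
    return filled


path = fill_endpoint("terms/{ids}", ids=["GO:0008150", "GO:0003674"])
print(path)
```
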
get_all
¶
get_all(
category: str,
endpoint: str,
method: Literal[
"concat", "stream_to_storage"
] = "concat",
limit_per_page: int = DEFAULT_LIMIT,
max_records: Optional[int] = None,
rate_limit_per_second: int = 5,
**kwargs: Any,
) -> Union[QuickGOFetchedData, Path]
Fetch multiple pages of results concurrently.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
category
|
str
|
QuickGO category (ontology, annotation, geneproduct). |
required |
endpoint
|
str
|
API endpoint (search, etc.). Note: downloadSearch doesn't support pagination; use get() directly. |
required |
method
|
Literal['concat', 'stream_to_storage']
|
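"concat" returns a single QuickGOFetchedData. "stream_to_storage" streams each batch to storage and returns the output file Path. |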
'concat'
|
limit_per_page
|
int
|
Records per request (default 100, max 10000). |
DEFAULT_LIMIT
|
max_records
|
Optional[int]
|
Total records to fetch. None means fetch all. |
None
|
rate_limit_per_second
|
int
|
Max concurrent requests per second. |
5
|
**kwargs
|
Any
|
Forwarded to the API (goId, taxonId, etc.). |
{}
|
Returns:
| Type | Description |
|---|---|
Union[QuickGOFetchedData, Path]
|
Combined QuickGOFetchedData or Path to output file. |
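If the service is paged by page number rather than by offset, the set of requests follows from the hit count and `limit_per_page`. The sketch below lists the 1-based page numbers needed. It assumes the hit count is reported by the first response, and `quickgo_pages` is a hypothetical helper. The final page may return more rows than `max_records` requires, so a caller would truncate the combined result.

```python
import math
from typing import List, Optional


def quickgo_pages(
    number_of_hits: int,
    limit_per_page: int = 100,
    max_records: Optional[int] = None,
) -> List[int]:
    """Return the 1-based page numbers needed to cover the wanted records."""
    wanted = number_of_hits if max_records is None else min(number_of_hits, max_records)
    return list(range(1, math.ceil(wanted / limit_per_page) + 1))


print(quickgo_pages(250))
print(quickgo_pages(250, max_records=100))
```
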
HPA_Fetcher¶
HPA_Fetcher
¶
Fetcher for Human Protein Atlas data.
The Human Protein Atlas provides proteomics data including:
- Tissue expression (protein and RNA)
- Subcellular location
- Cell type expression
- Blood cell expression
- Brain region expression
- Cancer/pathology data
Example
fetcher = HPA_Fetcher()
# Get gene data by Ensembl ID
tp53 = fetcher.get_gene("ENSG00000141510")
print(tp53.results[0])
# Search for genes
results = fetcher.search("TP53")
print(results.get_gene_names())
# Get specific columns for genes
data = fetcher.search_download(
search="TP53",
columns=["g", "gs", "eg", "gd", "rnats_s"]
)
df = data.as_dataframe()
# Get expression data with default columns
expr = fetcher.get_expression("BRCA1")
# Get subcellular location data
loc = fetcher.get_subcellular_location("ENSG00000141510")
get_gene
¶
Get gene data by Ensembl ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ensembl_id
|
str
|
Ensembl gene ID (e.g., "ENSG00000141510"). |
required |
format
|
str
|
Output format (json, tsv, xml). |
'json'
|
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with gene information. |
get_genes
¶
get_genes(
ensembl_ids: List[str],
format: str = "json",
rate_limit_per_second: int = 5,
) -> HPAFetchedData
Get data for multiple genes by Ensembl IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ensembl_ids
|
List[str]
|
List of Ensembl gene IDs. |
required |
format
|
str
|
Output format. |
'json'
|
rate_limit_per_second
|
int
|
Rate limit for API calls. |
5
|
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
Combined HPAFetchedData. |
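Because get_genes issues one request per ID, it can help to screen the list before calling it. The sketch below is one possible pre-check, assuming human Ensembl gene IDs of the form "ENSG" plus 11 digits with no version suffix; `split_valid_ensembl_ids` is a hypothetical helper, not part of biodbs.

```python
import re
from typing import Iterable, List, Tuple

# Human Ensembl gene IDs: "ENSG" followed by 11 digits (no ".version" suffix).
_ENSG_PATTERN = re.compile(r"ENSG\d{11}")


def split_valid_ensembl_ids(ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (valid, invalid); valid IDs are stripped and de-duplicated in order."""
    valid: List[str] = []
    invalid: List[str] = []
    seen = set()
    for raw in ids:
        candidate = raw.strip()
        if _ENSG_PATTERN.fullmatch(candidate):
            if candidate not in seen:
                seen.add(candidate)
                valid.append(candidate)
        else:
            invalid.append(raw)
    return valid, invalid


valid, invalid = split_valid_ensembl_ids(
    ["ENSG00000141510", " ENSG00000012048 ", "TP53", "ENSG00000141510"]
)
print(valid)
print(invalid)
```

Only the `valid` list would then be passed on, for example as `fetcher.get_genes(valid)`.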
search
¶
Search for genes in HPA.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query
|
str
|
Search query (gene name, etc.). |
required |
format
|
str
|
Output format (json, tsv, xml). |
'json'
|
compress
|
str
|
Whether to compress response (yes/no). |
'no'
|
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with search results. |
search_download
¶
search_download(
search: str,
columns: Optional[List[str]] = None,
format: str = "json",
compress: str = "no",
) -> HPAFetchedData
Fetch customized data using the search_download API.
This is the most flexible way to retrieve HPA data, allowing selection of specific columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
columns
|
Optional[List[str]]
|
List of column specifiers (see HPA_COLUMNS). If None, uses DEFAULT_GENE_COLUMNS. |
None
|
format
|
str
|
Output format (json or tsv). |
'json'
|
compress
|
str
|
Whether to compress response (yes/no). |
'no'
|
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with requested columns. |
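To make the parameters above concrete, the sketch below assembles the kind of request URL that search_download presumably corresponds to. The base URL is an assumption about the public HPA endpoint, and `build_search_download_url` is a hypothetical helper; how HPA_Fetcher actually builds its requests is not documented here.

```python
from typing import List
from urllib.parse import urlencode

# Assumed base URL of the public HPA endpoint; verify against the HPA documentation.
HPA_SEARCH_DOWNLOAD_URL = "https://www.proteinatlas.org/api/search_download.php"


def build_search_download_url(
    search: str,
    columns: List[str],
    fmt: str = "json",
    compress: str = "no",
) -> str:
    """Build a search_download URL from explicit column codes (hypothetical helper)."""
    if fmt not in ("json", "tsv"):
        raise ValueError("fmt must be 'json' or 'tsv'")
    if compress not in ("yes", "no"):
        raise ValueError("compress must be 'yes' or 'no'")
    if not columns:
        raise ValueError("at least one column code is required in this sketch")
    params = {
        "search": search,
        "format": fmt,
        "columns": ",".join(columns),  # column codes are sent comma-separated
        "compress": compress,
    }
    # Keep commas readable instead of percent-encoding them.
    return f"{HPA_SEARCH_DOWNLOAD_URL}?{urlencode(params, safe=',')}"


print(build_search_download_url("TP53", ["g", "gs", "eg"]))
```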
get_all
¶
get_all(
search: str,
columns: Optional[List[str]] = None,
method: Literal[
"concat", "stream_to_storage"
] = "concat",
format: str = "json",
**kwargs: Any,
) -> Union[HPAFetchedData, Path]
Fetch data with batching support.
Note: HPA's search_download API doesn't natively support pagination, so this method is mainly useful for storing results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
columns
|
Optional[List[str]]
|
List of column specifiers. |
None
|
method
|
Literal['concat', 'stream_to_storage']
|
"concat" or "stream_to_storage". |
'concat'
|
format
|
str
|
Output format. |
'json'
|
**kwargs
|
Any
|
Additional parameters. |
{}
|
Returns:
| Type | Description |
|---|---|
Union[HPAFetchedData, Path]
|
HPAFetchedData or Path to stored file. |
get_expression
¶
Get expression data for gene(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
columns
|
Optional[List[str]]
|
Expression columns to retrieve. If None, uses DEFAULT_EXPRESSION_COLUMNS. |
None
|
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with expression data. |
get_subcellular_location
¶
Get subcellular location data for gene(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
columns
|
Optional[List[str]]
|
Subcellular location columns to retrieve. If None, uses DEFAULT_SUBCELLULAR_COLUMNS. |
None
|
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with subcellular location data. |
get_pathology
¶
Get pathology/cancer prognostics data for gene(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
columns
|
Optional[List[str]]
|
Pathology columns to retrieve. If None, uses DEFAULT_PATHOLOGY_COLUMNS. |
None
|
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with pathology data. |
get_protein_class
¶
Get protein class information for gene(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with protein class information. |
get_tissue_expression
¶
Get tissue-specific RNA expression data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
tissues
|
Optional[List[str]]
|
List of tissue column names to include. If None, gets general tissue expression info. |
None
|
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with tissue expression data. |
get_blood_expression
¶
Get blood cell expression data for gene(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with blood cell expression data. |
get_brain_expression
¶
Get brain region expression data for gene(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
search
|
str
|
Gene search query. |
required |
Returns:
| Type | Description |
|---|---|
HPAFetchedData
|
HPAFetchedData with brain region expression data. |
download_bulk_data
¶
download_bulk_data(
file_type: str = "json",
version: Optional[str] = None,
output_path: Optional[str] = None,
) -> Path
Download bulk HPA data file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
file_type
|
str
|
File type to download (tsv, json, xml). |
'json'
|
version
|
Optional[str]
|
HPA version number (e.g., "24"). None for latest. |
None
|
output_path
|
Optional[str]
|
Path to save file. If None, saves to data manager path. |
None
|
Returns:
| Type | Description |
|---|---|
Path
|
Path to downloaded file. |
list_columns
staticmethod
¶
List available column specifiers for search_download API.
Returns:
| Type | Description |
|---|---|
Dict[str, str]
|
Dictionary mapping column codes to descriptions. |
NCBI_Fetcher¶
NCBI_Fetcher
¶
Fetcher for NCBI Datasets API.
Provides access to NCBI gene, taxonomy, and genome data via the Datasets REST API v2.
Example
fetcher = NCBI_Fetcher()
# Get gene information by NCBI Gene ID
genes = fetcher.get_genes_by_id([7157, 672]) # TP53, BRCA1
print(genes.as_dataframe())
# Get gene by symbol and taxon
genes = fetcher.get_genes_by_symbol(["TP53", "BRCA1"], taxon="human")
# Get taxonomy information
tax = fetcher.get_taxonomy([9606, 10090]) # Human, mouse
print(tax.as_dataframe())
# Translate gene symbols to IDs
mapping = fetcher.symbol_to_id(["TP53", "BRCA1"], taxon="human")
Initialize NCBI fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
api_key
|
Optional[str]
|
NCBI API key for higher rate limits. Can also be set via NCBI_API_KEY environment variable. |
None
|
get_genes_by_id
¶
get_genes_by_id(
gene_ids: List[int],
returned_content: Optional[str] = None,
page_size: int = 100,
query: Optional[str] = None,
types: Optional[List[str]] = None,
) -> NCBIGeneFetchedData
Get gene data reports by NCBI Gene IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_ids
|
List[int]
|
List of NCBI Gene IDs (e.g., [7157, 672]). |
required |
returned_content
|
Optional[str]
|
Content type (COMPLETE, IDS_ONLY, COUNTS_ONLY). |
None
|
page_size
|
int
|
Results per page (max 1000). |
100
|
query
|
Optional[str]
|
Additional search query. |
None
|
types
|
Optional[List[str]]
|
Gene type filter (e.g., ["PROTEIN_CODING"]). |
None
|
Returns:
| Type | Description |
|---|---|
NCBIGeneFetchedData
|
NCBIGeneFetchedData with gene reports. |
Example
fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_id([7157, 672])
print(genes.get_gene_symbols())
# ['TP53', 'BRCA1']
get_genes_by_symbol
¶
get_genes_by_symbol(
symbols: List[str],
taxon: Union[int, str] = "human",
returned_content: Optional[str] = None,
page_size: int = 100,
) -> NCBIGeneFetchedData
Get gene data reports by gene symbols and taxon.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
symbols
|
List[str]
|
List of gene symbols (e.g., ["TP53", "BRCA1"]). |
required |
taxon
|
Union[int, str]
|
Taxon ID, common name, or scientific name. |
'human'
|
returned_content
|
Optional[str]
|
Content type. |
None
|
page_size
|
int
|
Results per page. |
100
|
Returns:
| Type | Description |
|---|---|
NCBIGeneFetchedData
|
NCBIGeneFetchedData with gene reports. |
Example
fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_symbol(["TP53", "BRCA1"], taxon="human")
print(genes.to_id_mapping())
get_genes_by_accession
¶
get_genes_by_accession(
accessions: List[str],
returned_content: Optional[str] = None,
page_size: int = 100,
) -> NCBIGeneFetchedData
Get gene data reports by RefSeq accessions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
accessions
|
List[str]
|
List of RefSeq accessions (e.g., ["NM_000546.6"]). |
required |
returned_content
|
Optional[str]
|
Content type. |
None
|
page_size
|
int
|
Results per page. |
100
|
Returns:
| Type | Description |
|---|---|
NCBIGeneFetchedData
|
NCBIGeneFetchedData with gene reports. |
get_genes_by_taxon
¶
get_genes_by_taxon(
taxon: Union[int, str],
query: Optional[str] = None,
types: Optional[List[str]] = None,
page_size: int = 100,
page_token: Optional[str] = None,
) -> NCBIGeneFetchedData
Get gene data reports by taxon.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
taxon
|
Union[int, str]
|
Taxon ID, common name, or scientific name. |
required |
query
|
Optional[str]
|
Search query for gene name/symbol/description. |
None
|
types
|
Optional[List[str]]
|
Gene type filter. |
None
|
page_size
|
int
|
Results per page. |
100
|
page_token
|
Optional[str]
|
Token for pagination. |
None
|
Returns:
| Type | Description |
|---|---|
NCBIGeneFetchedData
|
NCBIGeneFetchedData with gene reports. |
Example
fetcher = NCBI_Fetcher()
genes = fetcher.get_genes_by_taxon("human", query="kinase")
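The page_token parameter implies a token-driven loop: request a page, read the token for the next one, and stop when none is returned. The sketch below shows that loop against a stand-in function. The `next_page_token` key, `iter_all_pages`, and `fake_fetch` are assumptions for illustration; this reference does not say how NCBIGeneFetchedData exposes the next token.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_all_pages(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 1000,
) -> Iterator[Page]:
    """Yield pages until no next token is returned (or max_pages is reached)."""
    token: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(token)
        yield page
        token = page.get("next_page_token")  # assumed field name
        if not token:
            return


# Stand-in for a network call; the gene names are toy values.
_FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"genes": ["A", "B"], "next_page_token": "t1"},
    "t1": {"genes": ["C"], "next_page_token": None},
}


def fake_fetch(token: Optional[str]) -> Page:
    return _FAKE_PAGES[token]


symbols = [gene for page in iter_all_pages(fake_fetch) for gene in page["genes"]]
print(symbols)
```

The max_pages guard is a design choice to keep a misbehaving token from causing an endless loop.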
get_taxonomy
¶
Get taxonomy data reports.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
taxons
|
List[Union[int, str]]
|
List of taxonomy IDs or names. |
required |
page_size
|
int
|
Results per page. |
100
|
Returns:
| Type | Description |
|---|---|
NCBITaxonomyFetchedData
|
NCBITaxonomyFetchedData with taxonomy reports. |
Example
fetcher = NCBI_Fetcher()
tax = fetcher.get_taxonomy([9606, 10090])
print(tax.as_dataframe())
get_genome_by_accession
¶
Get genome assembly data reports by accession.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
accessions
|
List[str]
|
List of assembly accessions (e.g., ["GCF_000001405.40"]). |
required |
page_size
|
int
|
Results per page. |
100
|
Returns:
| Type | Description |
|---|---|
NCBIGenomeFetchedData
|
NCBIGenomeFetchedData with genome reports. |
Example
fetcher = NCBI_Fetcher()
genomes = fetcher.get_genome_by_accession(["GCF_000001405.40"])
get_genome_by_taxon
¶
get_genome_by_taxon(
taxon: Union[int, str],
page_size: int = 100,
page_token: Optional[str] = None,
reference_only: bool = False,
assembly_source: Optional[str] = None,
) -> NCBIGenomeFetchedData
Get genome assembly data reports by taxon.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
taxon
|
Union[int, str]
|
Taxon ID, common name, or scientific name. |
required |
page_size
|
int
|
Results per page. |
100
|
page_token
|
Optional[str]
|
Token for pagination. |
None
|
reference_only
|
bool
|
If True, only return reference genomes. |
False
|
assembly_source
|
Optional[str]
|
Filter by source ("refseq", "genbank", "all"). |
None
|
Returns:
| Type | Description |
|---|---|
NCBIGenomeFetchedData
|
NCBIGenomeFetchedData with genome reports. |
get_version
¶
Get NCBI Datasets API version.
Returns:
| Type | Description |
|---|---|
str
|
Version string. |
symbol_to_id
¶
Convert gene symbols to NCBI Gene IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
symbols
|
List[str]
|
List of gene symbols. |
required |
taxon
|
Union[int, str]
|
Taxon for the genes. |
'human'
|
Returns:
| Type | Description |
|---|---|
Dict[str, int]
|
Dictionary mapping symbols to gene IDs. |
Example
fetcher = NCBI_Fetcher()
mapping = fetcher.symbol_to_id(["TP53", "BRCA1"])
print(mapping)
id_to_symbol
¶
Convert NCBI Gene IDs to gene symbols.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_ids
|
List[int]
|
List of NCBI Gene IDs. |
required |
Returns:
| Type | Description |
|---|---|
Dict[int, str]
|
Dictionary mapping gene IDs to symbols. |
Example
fetcher = NCBI_Fetcher()
mapping = fetcher.id_to_symbol([7157, 672])
print(mapping)
get_gene_info
¶
get_gene_info(
identifiers: List[Union[int, str]],
taxon: Union[int, str] = "human",
) -> NCBIGeneFetchedData
Get gene information by mixed identifiers (IDs or symbols).
Automatically detects whether input is gene IDs or symbols and routes to the appropriate endpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifiers
|
List[Union[int, str]]
|
List of gene IDs (int) or symbols (str). |
required |
taxon
|
Union[int, str]
|
Taxon for symbol lookups. |
'human'
|
Returns:
| Type | Description |
|---|---|
NCBIGeneFetchedData
|
NCBIGeneFetchedData with gene reports. |
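The routing described above can be pictured as a simple partition of the input list. The sketch below is one way to do it and is not the library's implementation: `split_identifiers` is a hypothetical helper, and treating numeric strings such as "672" as gene IDs is an assumption.

```python
from typing import List, Sequence, Tuple, Union


def split_identifiers(
    identifiers: Sequence[Union[int, str]],
) -> Tuple[List[int], List[str]]:
    """Partition mixed input into (gene_ids, symbols), preserving order."""
    gene_ids: List[int] = []
    symbols: List[str] = []
    for item in identifiers:
        if isinstance(item, bool):
            # bool is a subclass of int; reject it explicitly.
            raise TypeError("booleans are not valid gene identifiers")
        if isinstance(item, int):
            gene_ids.append(item)
            continue
        if not isinstance(item, str) or not item.strip():
            raise TypeError(f"unsupported identifier: {item!r}")
        text = item.strip()
        if text.isascii() and text.isdigit():
            gene_ids.append(int(text))  # assumption: numeric strings are IDs
        else:
            symbols.append(text)
    return gene_ids, symbols


print(split_identifiers([7157, "BRCA1", "672"]))
```

Each half could then go to the matching method, for example get_genes_by_id for the integers and get_genes_by_symbol (with taxon) for the strings.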
FDA_Fetcher¶
FDA_Fetcher
¶
Fetcher for openFDA API.
The openFDA API provides access to FDA data including:
- Drug adverse events (drug/event)
- Drug product labeling (drug/label)
- Drug recalls and enforcement (drug/enforcement)
- Device adverse events and recalls
- Food recalls and enforcement
Rate limits:
- Without API key: 240 requests/min, 1,000 requests/day per IP
- With API key: 240 requests/min, 120,000 requests/day per key
Example
fetcher = FDA_Fetcher()
# Search drug adverse events
events = fetcher.get(
    category="drug",
    endpoint="event",
    search={"patient.drug.medicinalproduct": "aspirin"},
    limit=10
)
df = events.as_dataframe(columns=["receivedate", "patient.patientsex"])
# Get drug labels
labels = fetcher.get(
    category="drug",
    endpoint="label",
    search={"openfda.brand_name": "TYLENOL"},
    limit=5
)
Initialize FDA fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
api_key
|
Optional[str]
|
openFDA API key for higher rate limits (optional). |
None
|
limit
|
Optional[int]
|
Default limit for queries. If None, uses API default. |
None
|
**data_manager_kws
|
Any
|
Keyword arguments for FDADataManager (e.g., storage_path for stream_to_storage method). |
{}
|
get
¶
get(
category: str,
endpoint: str,
stream: Optional[bool] = None,
**kwargs: Any,
) -> FDAFetchedData
Fetch data from openFDA API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
category
|
str
|
FDA category (e.g., "drug", "device", "food"). |
required |
endpoint
|
str
|
Category endpoint (e.g., "event", "label", "enforcement"). |
required |
stream
|
Optional[bool]
|
If True, stream the response (for large downloads). |
None
|
**kwargs
|
Any
|
Query parameters including: - search: Search query dict (e.g., {"field": "value"}). - limit: Maximum records to return (1-1000). - skip: Number of records to skip for pagination. - sort: Sort field and direction. - count: Field to count occurrences of. - api_key: Override default API key. |
{}
|
Returns:
| Type | Description |
|---|---|
FDAFetchedData
|
FDAFetchedData with query results. |
Example
fetcher = FDA_Fetcher()
data = fetcher.get(
    category="drug",
    endpoint="event",
    search={"patient.drug.medicinalproduct": "aspirin"},
    limit=10
)
print(data)
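openFDA itself takes the search as a single `field:value` expression, so a search dict has to be flattened at some point. The sketch below shows one plausible conversion, joining terms with AND and quoting values that contain whitespace. Both helpers are hypothetical, and how FDA_Fetcher really serializes the dict is an assumption.

```python
from typing import Mapping
from urllib.parse import urlencode


def build_search_expression(search: Mapping[str, str]) -> str:
    """Flatten {"field": "value"} pairs into an openFDA-style search expression."""
    terms = []
    for field, value in search.items():
        text = str(value)
        if '"' in text:
            raise ValueError("embedded double quotes are not handled in this sketch")
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'  # quote multi-word values so they match as a phrase
        terms.append(f"{field}:{text}")
    return " AND ".join(terms)


def build_query_string(search: Mapping[str, str], limit: int = 10) -> str:
    """Build the URL query string; limit follows the documented 1-1000 range."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    params = {"search": build_search_expression(search), "limit": limit}
    return urlencode(params, safe=':"')


print(build_query_string({"patient.drug.medicinalproduct": "aspirin"}, limit=10))
```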
get_all
¶
get_all(
category: str,
endpoint: str,
method: Literal[
"concat", "stream_to_storage"
] = "concat",
batch_size: int = 1000,
max_records: Optional[int] = None,
rate_limit_per_second: int = 4,
**kwargs: Any,
) -> Union[FDAFetchedData, Path]
Fetch multiple pages of results concurrently.
Uses schedule_process to dispatch page requests across threads while staying within the FDA rate limit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
category
|
str
|
FDA category (e.g., "drug", "device", "food"). |
required |
endpoint
|
str
|
FDA endpoint (e.g., "event", "label", "enforcement"). |
required |
method
|
Literal['concat', 'stream_to_storage']
|
"concat" or "stream_to_storage". |
'concat'
|
batch_size
|
int
|
Records per request (max 1000). |
1000
|
max_records
|
Optional[int]
|
Total records to fetch. |
None
|
rate_limit_per_second
|
int
|
Max concurrent requests per second (FDA default: 240/min ≈ 4/sec). |
4
|
**kwargs
|
Any
|
Forwarded to the API. |
{}
|
Note: openFDA allows 240 requests/min and 1,000 requests/day per IP without an API key, and 240 requests/min and 120,000 requests/day per key with one.
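The default of 4 requests per second follows from 240 requests/min divided by 60. The sketch below shows that arithmetic together with a minimal single-threaded throttle and the skip offsets a batched fetch would step through. `MinIntervalThrottle` and `skip_offsets` are hypothetical names; this is not the library's schedule_process implementation, which is described as thread-based.

```python
import time
from typing import Callable, List, Optional


class MinIntervalThrottle:
    """Space successive calls at least 1/rate seconds apart (single-threaded sketch)."""

    def __init__(
        self,
        rate_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate_per_second <= 0:
            raise ValueError("rate_per_second must be positive")
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


def skip_offsets(max_records: int, batch_size: int = 1000) -> List[int]:
    """Return the skip value for each batch needed to cover max_records."""
    return list(range(0, max_records, batch_size))


print(240 / 60)                   # requests per second allowed by 240/min
print(skip_offsets(2500, 1000))   # one request per offset
```

Injecting `clock` and `sleep` keeps the throttle testable without real waiting.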
Reactome_Fetcher¶
Reactome_Fetcher
¶
Fetcher for Reactome pathway analysis and content APIs.
Reactome provides comprehensive pathway analysis including:
- Over-representation analysis (ORA)
- Expression analysis
- Species comparison
- Pathway hierarchy and content
Example
fetcher = Reactome_Fetcher()
# Perform pathway analysis
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.analyze(genes)
print(result.significant_pathways().as_dataframe())
# Analysis with projection to human
result = fetcher.analyze_projection(genes, species="Mus musculus")
# Get top-level pathways
pathways = fetcher.get_pathways_top("Homo sapiens")
print(pathways.get_pathway_names())
# Get species list
species = fetcher.get_species()
print(species.get_species_names())
Initialize Reactome fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Default species for analysis (e.g., "Homo sapiens"). |
'Homo sapiens'
|
set_species
¶
Change the default species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
str
|
Species name (e.g., "Homo sapiens", "Mus musculus"). |
required |
analyze
¶
analyze(
identifiers: List[str],
species: Optional[str] = None,
interactors: bool = False,
page_size: int = 100,
sort_by: str = "ENTITIES_FDR",
order: str = "ASC",
resource: str = "TOTAL",
p_value: float = 1.0,
include_disease: bool = True,
min_entities: Optional[int] = None,
max_entities: Optional[int] = None,
) -> ReactomeFetchedData
Perform pathway over-representation analysis.
Submits identifiers to Reactome Analysis Service and returns enriched pathways with statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifiers
|
List[str]
|
List of identifiers (gene symbols, UniProt IDs, etc.). |
required |
species
|
Optional[str]
|
Species name. None uses default. |
None
|
interactors
|
bool
|
Include interactors in analysis. |
False
|
page_size
|
int
|
Number of results per page. |
100
|
sort_by
|
str
|
Sort field (ENTITIES_FDR, ENTITIES_PVALUE, etc.). |
'ENTITIES_FDR'
|
order
|
str
|
Sort order (ASC, DESC). |
'ASC'
|
resource
|
str
|
Resource filter (TOTAL, UNIPROT, ENSEMBL, etc.). |
'TOTAL'
|
p_value
|
float
|
P-value cutoff for filtering results. |
1.0
|
include_disease
|
bool
|
Include disease pathways. |
True
|
min_entities
|
Optional[int]
|
Minimum pathway size. |
None
|
max_entities
|
Optional[int]
|
Maximum pathway size. |
None
|
Returns:
| Type | Description |
|---|---|
ReactomeFetchedData
|
ReactomeFetchedData with pathway enrichment results. |
Example
fetcher = Reactome_Fetcher()
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.analyze(genes)
print(result.significant_pathways(fdr_threshold=0.01).as_dataframe())
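For readers unfamiliar with the FDR column used by sort_by and fdr_threshold, the sketch below shows the standard Benjamini-Hochberg adjustment, a common way to turn per-pathway p-values into FDR values. Whether the Reactome service computes its FDR exactly this way is an assumption here, and `benjamini_hochberg` is an illustrative function, not part of biodbs.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)
# Keep only entries whose adjusted value passes a 0.05 threshold.
significant = [i for i, q in enumerate(fdr) if q <= 0.05]
print(fdr)
print(significant)
```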
analyze_projection
¶
analyze_projection(
identifiers: List[str],
species: Optional[str] = None,
interactors: bool = False,
page_size: int = 100,
sort_by: str = "ENTITIES_FDR",
order: str = "ASC",
resource: str = "TOTAL",
p_value: float = 1.0,
include_disease: bool = True,
) -> ReactomeFetchedData
Analyze identifiers and project results to Homo sapiens.
This is useful for analyzing data from other species while viewing results in the context of human pathways.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifiers
|
List[str]
|
List of identifiers. |
required |
species
|
Optional[str]
|
Source species name (for mapping). |
None
|
interactors
|
bool
|
Include interactors. |
False
|
page_size
|
int
|
Results per page. |
100
|
sort_by
|
str
|
Sort field. |
'ENTITIES_FDR'
|
order
|
str
|
Sort order. |
'ASC'
|
resource
|
str
|
Resource filter. |
'TOTAL'
|
p_value
|
float
|
P-value cutoff. |
1.0
|
include_disease
|
bool
|
Include disease pathways. |
True
|
Returns:
| Type | Description |
|---|---|
ReactomeFetchedData
|
ReactomeFetchedData with human-projected pathway results. |
analyze_single
¶
analyze_single(
identifier: str,
species: Optional[str] = None,
interactors: bool = False,
) -> ReactomeFetchedData
Analyze a single identifier across species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifier
|
str
|
Single identifier to analyze. |
required |
species
|
Optional[str]
|
Species filter. |
None
|
interactors
|
bool
|
Include interactors. |
False
|
Returns:
| Type | Description |
|---|---|
ReactomeFetchedData
|
ReactomeFetchedData with pathways containing the identifier. |
get_result_by_token
¶
get_result_by_token(
token: str,
species: Optional[str] = None,
page_size: int = 100,
page: int = 1,
sort_by: str = "ENTITIES_FDR",
order: str = "ASC",
resource: str = "TOTAL",
p_value: float = 1.0,
) -> ReactomeFetchedData
Retrieve analysis results by token.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
token
|
str
|
Analysis token from previous analysis. |
required |
species
|
Optional[str]
|
Species filter. |
None
|
page_size
|
int
|
Results per page. |
100
|
page
|
int
|
Page number. |
1
|
sort_by
|
str
|
Sort field. |
'ENTITIES_FDR'
|
order
|
str
|
Sort order. |
'ASC'
|
resource
|
str
|
Resource filter. |
'TOTAL'
|
p_value
|
float
|
P-value cutoff. |
1.0
|
Returns:
| Type | Description |
|---|---|
ReactomeFetchedData
|
ReactomeFetchedData with analysis results. |
get_found_entities
¶
Get entities found in a specific pathway.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
token
|
str
|
Analysis token. |
required |
pathway_id
|
str
|
Pathway stable ID (e.g., "R-HSA-123456"). |
required |
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of found entity dictionaries. |
get_not_found_identifiers
¶
Get identifiers that were not found in Reactome.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
token
|
str
|
Analysis token. |
required |
Returns:
| Type | Description |
|---|---|
List[str]
|
List of unmapped identifier strings. |
download_results_json
¶
Download complete analysis results as JSON.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
token
|
str
|
Analysis token. |
required |
Returns:
| Type | Description |
|---|---|
Dict[str, Any]
|
Complete analysis results dictionary. |
map_identifiers
¶
Map identifiers to Reactome entities without analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifiers
|
List[str]
|
List of identifiers to map. |
required |
interactors
|
bool
|
Include interactor mapping. |
False
|
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of mapped entity dictionaries. |
get_pathways_top
¶
Get top-level pathways for a species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
Optional[str]
|
Species name (e.g., "Homo sapiens"). |
None
|
Returns:
| Type | Description |
|---|---|
ReactomePathwaysData
|
ReactomePathwaysData with top-level pathway information. |
Example
fetcher = Reactome_Fetcher()
pathways = fetcher.get_pathways_top("Homo sapiens")
print(pathways.get_pathway_names())
get_events_hierarchy
¶
Get full event hierarchy for a species.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
Optional[str]
|
Species name. |
None
|
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of event hierarchy dictionaries. |
get_pathways_for_entity
¶
Get pathways containing a specific entity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
entity_id
|
str
|
Entity identifier (UniProt, gene symbol, etc.). |
required |
Returns:
| Type | Description |
|---|---|
ReactomePathwaysData
|
ReactomePathwaysData with pathways containing the entity. |
get_species
¶
Get all species in Reactome.
Returns:
| Type | Description |
|---|---|
ReactomeSpeciesData
|
ReactomeSpeciesData with species information. |
Example
fetcher = Reactome_Fetcher()
species = fetcher.get_species()
print(species.get_species_names()[:10])
get_species_main
¶
Get main species with curated or computationally inferred pathways.
Returns:
| Type | Description |
|---|---|
ReactomeSpeciesData
|
ReactomeSpeciesData with main species information. |
get_database_version
¶
Get current Reactome database version.
Returns:
| Type | Description |
|---|---|
str
|
Database version string. |
query_entry
¶
Query a Reactome entry by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
entry_id
|
str
|
Reactome stable ID (e.g., "R-HSA-123456"). |
required |
Returns:
| Type | Description |
|---|---|
Dict[str, Any]
|
Entry details dictionary. |
get_participants
¶
Get all participants in an event (pathway/reaction).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
event_id
|
str
|
Reactome stable ID (e.g., "R-HSA-69278"). |
required |
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of participant dictionaries with physical entity info. |
Example
fetcher = Reactome_Fetcher()
participants = fetcher.get_participants("R-HSA-69278")
for p in participants[:3]:
    print(p.get("displayName"))
get_participants_physical_entities
¶
Get participating physical entities in an event.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
event_id
|
str
|
Reactome stable ID. |
required |
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of physical entity dictionaries. |
get_participants_reference_entities
¶
Get reference entities (genes/proteins) for an event.
This returns the external database references (UniProt, NCBI Gene, etc.) for all participants in a pathway or reaction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
event_id
|
str
|
Reactome stable ID (e.g., "R-HSA-69278"). |
required |
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of reference entity dictionaries containing: - identifier: External ID (e.g., UniProt accession) - databaseName: Source database (e.g., "UniProt") - displayName: Human-readable name - geneName: Gene symbol (if available) |
Example
fetcher = Reactome_Fetcher()
refs = fetcher.get_participants_reference_entities("R-HSA-69278")
for ref in refs[:5]:
    print(f"{ref.get('geneName')}: {ref.get('identifier')}")
get_pathway_genes
¶
Get gene identifiers for a pathway.
Convenience method that extracts gene IDs from reference entities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pathway_id
|
str
|
Reactome pathway stable ID. |
required |
id_type
|
str
|
Type of ID to return: - "gene_symbol": Gene symbols (default) - "uniprot": UniProt accessions - "all": Return dict with all available IDs |
'gene_symbol'
|
Returns:
| Type | Description |
|---|---|
List[str]
|
List of gene identifiers. |
Example
fetcher = Reactome_Fetcher()
genes = fetcher.get_pathway_genes("R-HSA-69278")
print(genes[:10])
# ['TP53', 'MDM2', 'CDKN1A', ...]
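Since get_pathway_genes is described as a convenience wrapper over reference entities, the extraction step can be sketched on plain dictionaries using the documented keys (identifier, databaseName, geneName). `extract_gene_ids` is a hypothetical helper and the records are illustrative; accepting geneName as either a string or a list is an assumption about the payload.

```python
from typing import Any, Dict, Iterable, List


def extract_gene_ids(
    reference_entities: Iterable[Dict[str, Any]],
    id_type: str = "gene_symbol",
) -> List[str]:
    """Collect unique gene symbols or UniProt accessions, preserving order."""
    if id_type not in ("gene_symbol", "uniprot"):
        raise ValueError("id_type must be 'gene_symbol' or 'uniprot' in this sketch")
    result: List[str] = []
    seen = set()
    for ref in reference_entities:
        if id_type == "uniprot":
            if ref.get("databaseName") != "UniProt":
                continue
            values = [ref.get("identifier")]
        else:
            raw = ref.get("geneName")
            values = raw if isinstance(raw, list) else [raw]
        for value in values:
            if value and value not in seen:
                seen.add(value)
                result.append(value)
    return result


# Illustrative records shaped like the documented reference entity dictionaries.
refs = [
    {"identifier": "P04637", "databaseName": "UniProt", "geneName": "TP53"},
    {"identifier": "Q00987", "databaseName": "UniProt", "geneName": ["MDM2"]},
    {"identifier": "15422", "databaseName": "ChEBI", "displayName": "ATP"},
    {"identifier": "P04637", "databaseName": "UniProt", "geneName": "TP53"},
]
print(extract_gene_ids(refs))
print(extract_gene_ids(refs, id_type="uniprot"))
```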
get_all_pathways_with_genes
¶
get_all_pathways_with_genes(
species: Optional[str] = None,
id_type: str = "gene_symbol",
include_hierarchy: bool = True,
) -> Dict[str, tuple]
Get all pathways with their gene members for a species.
This method builds a complete pathway-gene mapping suitable for local over-representation analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
species
|
Optional[str]
|
Species name (e.g., "Homo sapiens"). |
None
|
id_type
|
str
|
Gene ID type ("gene_symbol" or "uniprot"). |
'gene_symbol'
|
include_hierarchy
|
bool
|
If True, include all pathways in hierarchy. If False, only top-level pathways. |
True
|
Returns:
| Type | Description |
|---|---|
Dict[str, tuple]
|
Dict mapping pathway_id -> (pathway_name, set of gene IDs). |
Example
fetcher = Reactome_Fetcher()
pathways = fetcher.get_all_pathways_with_genes("Homo sapiens")
for pid, (name, genes) in list(pathways.items())[:3]:
    print(f"{pid}: {name} ({len(genes)} genes)")
Note
This method makes many API calls and may take several minutes for species with many pathways. Results should be cached.
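A mapping of pathway_id -> (pathway_name, gene set) is enough to run a basic over-representation test locally. The sketch below uses a one-sided hypergeometric test on toy data; `hypergeom_sf`, `local_ora`, the gene names, and the background set are all illustrative assumptions, and this is not necessarily the statistic the Reactome service applies.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` items without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def local_ora(
    query: Iterable[str],
    pathways: Dict[str, Tuple[str, Set[str]]],
    background: Set[str],
) -> List[Tuple[str, str, int, float]]:
    """Return (pathway_id, name, overlap, p_value) sorted by p-value."""
    query_set = set(query) & background
    results = []
    for pid, (name, genes) in pathways.items():
        members = set(genes) & background
        overlap = len(query_set & members)
        if overlap == 0:
            continue
        p = hypergeom_sf(overlap, len(background), len(members), len(query_set))
        results.append((pid, name, overlap, p))
    return sorted(results, key=lambda row: row[3])


# Toy data: ten background genes and two made-up pathways.
background = {f"g{i}" for i in range(1, 11)}
toy_pathways = {
    "P1": ("toy pathway 1", {"g1", "g2", "g3"}),
    "P2": ("toy pathway 2", {"g5", "g6", "g7", "g8"}),
}
for row in local_ora(["g1", "g2", "g5", "g6"], toy_pathways, background):
    print(row)
```

In practice the raw p-values would still need multiple-testing correction before being compared with a threshold.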
get_event_ancestors
¶
Get ancestor pathways for an event.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
event_id
|
str
|
Reactome stable ID. |
required |
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of ancestor pathway dictionaries. |
get_complex_subunits
¶
Get subunits of a complex.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
complex_id
|
str
|
Reactome complex stable ID. |
required |
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of subunit dictionaries. |
get_entity_component_of
¶
Get complexes/sets that contain an entity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
entity_id
|
str
|
Reactome entity stable ID. |
required |
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of container entity dictionaries. |
get_entity_other_forms
¶
Get other forms of a physical entity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
entity_id
|
str
|
Reactome entity stable ID. |
required |
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of other form dictionaries. |
get_diseases
¶
Get all disease objects in Reactome.
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of disease dictionaries. |
get_diseases_doid
¶
Get all Disease Ontology IDs (DOIDs) in Reactome.
Returns:
| Type | Description |
|---|---|
List[str]
|
List of DOID strings. |
map_to_reactions
¶
Map an identifier to Reactome reactions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifier
|
str
|
External identifier (e.g., UniProt accession). |
required |
resource
|
str
|
Source database ("UniProt", "NCBI", "ENSEMBL", etc.). |
'UniProt'
|
Returns:
| Type | Description |
|---|---|
List[Dict[str, Any]]
|
List of reaction dictionaries. |
DO_Fetcher¶
DO_Fetcher
¶
Fetcher for Disease Ontology API.
Provides access to Disease Ontology data via two APIs:
- Direct DO API for basic metadata
- EBI Ontology Lookup Service (OLS) for comprehensive queries
Example
fetcher = DO_Fetcher()
# Get disease term by DOID
term = fetcher.get_term("DOID:162") # Cancer
print(term.as_dataframe())
# Search for diseases
results = fetcher.search("cancer")
print(results.get_doids())
# Get term hierarchy
parents = fetcher.get_parents("DOID:162")
children = fetcher.get_children("DOID:162")
# Get cross-references
term = fetcher.get_term("DOID:162")
print(term.terms[0].mesh_id) # Get MeSH ID
print(term.terms[0].umls_cui) # Get UMLS CUI
Initialize Disease Ontology fetcher.
get_term
¶
Get a disease term by DOID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
doid
|
str
|
Disease Ontology ID (e.g., "DOID:162", "162", "DOID_162"). |
required |
use_ols
|
bool
|
If True, use OLS API for more detailed data. |
True
|
Returns:
| Type | Description |
|---|---|
DOFetchedData
|
DOFetchedData with the disease term. |
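The doid parameter accepts several spellings ("DOID:162", "162", "DOID_162"), which suggests a normalization step before the lookup. The sketch below shows one way to do that and to derive the OBO-style IRI form that ontology services commonly use. Both helpers are hypothetical, and the library's own normalization may differ.

```python
import re

# Optional "DOID:" or "DOID_" prefix followed by the numeric part.
_DOID_PATTERN = re.compile(r"(?:DOID[:_])?(\d+)", re.IGNORECASE)


def normalize_doid(doid: str) -> str:
    """Return the canonical 'DOID:<number>' form or raise ValueError."""
    match = _DOID_PATTERN.fullmatch(doid.strip())
    if match is None:
        raise ValueError(f"not a recognizable DOID: {doid!r}")
    return f"DOID:{match.group(1)}"


def doid_to_obo_iri(doid: str) -> str:
    """Return the OBO PURL form, e.g. .../obo/DOID_162."""
    return "http://purl.obolibrary.org/obo/" + normalize_doid(doid).replace(":", "_")


for raw in ["DOID:162", "162", "DOID_162"]:
    print(raw, "->", normalize_doid(raw))
print(doid_to_obo_iri("162"))
```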
get_terms
¶
Get multiple disease terms by DOIDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
doids
|
List[str]
|
List of Disease Ontology IDs. |
required |
use_ols
|
bool
|
If True, use OLS API for more detailed data. |
True
|
Returns:
| Type | Description |
|---|---|
DOFetchedData
|
DOFetchedData with all disease terms. |
get_all_terms
¶
Get all disease terms from the ontology (paginated).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
page
|
int
|
Page number (0-indexed). |
0
|
page_size
|
int
|
Number of terms per page. |
100
|
Returns:
| Type | Description |
|---|---|
DOFetchedData
|
DOFetchedData with disease terms. |
search
¶
search(
query: str,
exact: bool = False,
rows: int = 20,
start: int = 0,
obsoletes: bool = False,
) -> DOSearchFetchedData
Search for disease terms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| exact | bool | If True, search for exact matches only. | False |
| rows | int | Maximum number of results to return. | 20 |
| start | int | Starting offset for pagination. | 0 |
| obsoletes | bool | If True, include obsolete terms. | False |
Returns:
| Type | Description |
|---|---|
| DOSearchFetchedData | DOSearchFetchedData with search results. |
search_by_xref
¶
Search for disease terms by external database reference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| database | str | Database name (e.g., "MESH", "UMLS_CUI", "ICD10CM"). | required |
| external_id | str | ID in the external database. | required |
Returns:
| Type | Description |
|---|---|
| DOSearchFetchedData | DOSearchFetchedData with matching terms. |
get_parents
¶
Get parent terms of a disease.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doid | str | Disease Ontology ID. | required |
Returns:
| Type | Description |
|---|---|
| DOFetchedData | DOFetchedData with parent terms. |
get_children
¶
Get child terms of a disease.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doid | str | Disease Ontology ID. | required |
Returns:
| Type | Description |
|---|---|
| DOFetchedData | DOFetchedData with child terms. |
get_ancestors
¶
Get all ancestor terms of a disease.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doid | str | Disease Ontology ID. | required |
Returns:
| Type | Description |
|---|---|
| DOFetchedData | DOFetchedData with ancestor terms. |
get_descendants
¶
Get all descendant terms of a disease.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doid | str | Disease Ontology ID. | required |
Returns:
| Type | Description |
|---|---|
| DOFetchedData | DOFetchedData with descendant terms. |
get_hierarchical_parents
¶
Get hierarchical parent terms (includes part_of relationships).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doid | str | Disease Ontology ID. | required |
Returns:
| Type | Description |
|---|---|
| DOFetchedData | DOFetchedData with hierarchical parent terms. |
get_hierarchical_children
¶
Get hierarchical child terms (includes part_of relationships).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doid | str | Disease Ontology ID. | required |
Returns:
| Type | Description |
|---|---|
| DOFetchedData | DOFetchedData with hierarchical child terms. |
get_ontology_info
¶
doid_to_mesh
¶
Convert DOIDs to MeSH IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doids | List[str] | List of Disease Ontology IDs. | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Optional[str]] | Dictionary mapping DOIDs to MeSH IDs. |
doid_to_umls
¶
Convert DOIDs to UMLS CUIs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doids | List[str] | List of Disease Ontology IDs. | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Optional[str]] | Dictionary mapping DOIDs to UMLS CUIs. |
doid_to_icd10
¶
Convert DOIDs to ICD-10 codes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doids | List[str] | List of Disease Ontology IDs. | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Optional[str]] | Dictionary mapping DOIDs to ICD-10 codes. |
EnrichR_Fetcher¶
EnrichR_Fetcher
¶
Fetcher for EnrichR gene set enrichment analysis API.
EnrichR provides enrichment analysis against 200+ gene set libraries covering pathways, ontologies, transcription factors, and more.
Supported organisms:
- human (default)
- mouse
- fly (FlyEnrichr)
- yeast (YeastEnrichr)
- worm (WormEnrichr)
- fish (FishEnrichr)
Example
fetcher = EnrichR_Fetcher()
# Get available gene set libraries
libraries = fetcher.get_libraries()
print(libraries.get_library_names()[:10])
# Perform enrichment analysis
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.enrich(genes, library="KEGG_2021_Human")
print(result.significant_terms().get_term_names())
Initialize EnrichR fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| organism | str | Target organism (human, mouse, fly, yeast, worm, fish). | 'human' |
set_organism
¶
Change the target organism.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| organism | str | Target organism (human, mouse, fly, yeast, worm, fish). | required |
get_libraries
¶
Get available gene set libraries and their statistics.
Returns:
| Type | Description |
|---|---|
| EnrichRLibrariesData | EnrichRLibrariesData containing library information. |
Example
fetcher = EnrichR_Fetcher()
libs = fetcher.get_libraries()
kegg_libs = libs.search("KEGG")
print(kegg_libs.get_library_names())
enrich
¶
enrich(
genes: List[str],
library: str,
description: str = "biodbs gene list",
) -> EnrichRFetchedData
Perform enrichment analysis against a gene set library.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols to analyze. | required |
| library | str | Name of the gene set library (e.g., "KEGG_2021_Human"). | required |
| description | str | Description for the gene list. | 'biodbs gene list' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData containing enrichment results. |
Example
fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = fetcher.enrich(genes, "KEGG_2021_Human")
top = result.top_terms(5)
print(top.get_term_names())
enrich_multiple
¶
enrich_multiple(
genes: List[str],
libraries: List[str],
description: str = "biodbs gene list",
) -> Dict[str, EnrichRFetchedData]
Perform enrichment analysis against multiple libraries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols to analyze. | required |
| libraries | List[str] | List of library names to query. | required |
| description | str | Description for the gene list. | 'biodbs gene list' |
Returns:
| Type | Description |
|---|---|
| Dict[str, EnrichRFetchedData] | Dictionary mapping library names to EnrichRFetchedData. |
Example
fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1", "EGFR"]
results = fetcher.enrich_multiple(
    genes,
    ["KEGG_2021_Human", "GO_Biological_Process_2023"]
)
for lib, data in results.items():
    print(f"{lib}: {len(data)} terms")
enrich_with_background
¶
enrich_with_background(
genes: List[str],
background: List[str],
library: str,
description: str = "biodbs gene list",
) -> EnrichRFetchedData
Perform enrichment analysis with a custom background gene set.
Uses the speedrichr API for background enrichment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of query gene symbols. | required |
| background | List[str] | List of background gene symbols. | required |
| library | str | Name of the gene set library. | required |
| description | str | Description for the gene list. | 'biodbs gene list' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData containing enrichment results. |
Example
fetcher = EnrichR_Fetcher()
genes = ["TP53", "BRCA1"]
background = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS", ...]
result = fetcher.enrich_with_background(
    genes, background, "GO_Biological_Process_2023"
)
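In over-representation analysis the query genes are normally expected to be a subset of the background. Whether enrich_with_background checks this is not documented here, so the sketch below cleans both lists before the call. prepare_gene_lists is a hypothetical helper, and adding missing query genes to the background is one possible policy, not the library's behaviour.

```python
from typing import Iterable, List, Tuple


def _dedupe(symbols: Iterable[str]) -> List[str]:
    """Strip whitespace, drop empties, and remove duplicates while keeping order."""
    seen = {}
    for symbol in symbols:
        symbol = symbol.strip()
        if symbol:
            seen.setdefault(symbol, None)
    return list(seen)


def prepare_gene_lists(genes: Iterable[str],
                       background: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (query, background) with the query guaranteed to be a subset."""
    query = _dedupe(genes)
    bg = _dedupe(background)
    bg_set = set(bg)
    # Policy assumption: append query genes that the background lacks.
    missing = [g for g in query if g not in bg_set]
    return query, bg + missing


query, bg = prepare_gene_lists(["TP53", " BRCA1", "TP53"], ["EGFR", "TP53"])
print(query, bg)
```

The cleaned lists can then be passed as the genes and background arguments.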
view_gene_list
¶
Retrieve a previously submitted gene list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_list_id | int | The userListId from a previous addList call. | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of gene symbols. |
get_gene_map
¶
Get gene set membership for a single gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene | str | Gene symbol. | required |
| library | str | Gene set library name. | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary with gene set membership information. |
export_results
¶
Export enrichment results as text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user_list_id | int | The userListId from a previous addList call. | required |
| library | str | Gene set library name. | required |
| filename | str | Output filename (without extension). | 'enrichr_results' |
Returns:
| Type | Description |
|---|---|
| str | Tab-separated enrichment results as string. |
enrich_kegg
¶
Perform KEGG pathway enrichment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols. | required |
| year | str | KEGG library year version. | '2021' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData with KEGG pathway enrichment. |
enrich_go_bp
¶
Perform GO Biological Process enrichment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols. | required |
| year | str | GO library year version. | '2023' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData with GO BP enrichment. |
enrich_go_mf
¶
Perform GO Molecular Function enrichment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols. | required |
| year | str | GO library year version. | '2023' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData with GO MF enrichment. |
enrich_go_cc
¶
Perform GO Cellular Component enrichment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols. | required |
| year | str | GO library year version. | '2023' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData with GO CC enrichment. |
enrich_reactome
¶
Perform Reactome pathway enrichment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols. | required |
| year | str | Reactome library year version. | '2022' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData with Reactome enrichment. |
enrich_wikipathways
¶
Perform WikiPathways enrichment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols. | required |
| year | str | WikiPathways library year version. | '2023' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData with WikiPathways enrichment. |
HGNC_Fetcher¶
HGNC_Fetcher
¶
Fetcher for the HGNC REST API (rest.genenames.org).
The HGNC (HUGO Gene Nomenclature Committee) REST API provides authoritative human gene nomenclature data: approved symbols, names, aliases, previous symbols, and cross-references to Ensembl, NCBI Gene, UniProt, OMIM, etc.
Three endpoints are exposed:
- info: service metadata (last update, document count, field lists).
- fetch: exact-match lookup by any stored field; returns full records.
- search: wildcard / boolean query; returns lightweight summaries (hgnc_id, symbol, score only).
Rate limit: 10 requests per second (enforced automatically).
Example::
fetcher = HGNC_Fetcher()
# Exact lookup by symbol
data = fetcher.fetch("symbol", "TP53")
entry = data[0] # HGNCEntry
print(entry.hgnc_id) # "HGNC:11998"
print(entry.entrez_id) # "7157"
# Wildcard search
hits = fetcher.search("symbol", "ZNF*")
print(hits.num_found) # many zinc-finger genes
# Service metadata
info = fetcher.info()
print(info["response"]["numDoc"])
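The class description says the 10 requests per second limit is enforced automatically, but does not say how. As an illustration only, the sketch below shows one common approach: spacing calls by a minimum interval. MinIntervalLimiter is a hypothetical class and does not reflect the actual biodbs implementation.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Space calls at least 1/max_per_second apart (illustrative, not biodbs code)."""

    def __init__(self, max_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next call is allowed; return the delay that was applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the slot after the later of "now" and the previously reserved slot.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay


limiter = MinIntervalLimiter(max_per_second=10)
started = time.monotonic()
for _ in range(3):
    limiter.wait()  # call this before each HTTP request
print(f"3 calls took {time.monotonic() - started:.2f}s")
```

The injectable clock and sleep arguments are there so the limiter can be tested without real waiting.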
info
¶
Retrieve HGNC service metadata.
Returns the raw parsed JSON dict, which contains:
- lastModified: timestamp of last database update
- numDoc: total number of records
- searchableFields: list of fields that can be queried
- storedFields: list of fields returned by fetch
Returns:
| Type | Description |
|---|---|
dict
|
Raw JSON dict from |
Raises:
| Type | Description |
|---|---|
| APIError | On HTTP errors. |
fetch
¶
Exact-match lookup by any stored field.
Returns full gene records for all entries where field exactly equals term. No wildcard expansion is performed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
field
|
str
|
HGNC stored field name (e.g. |
required |
term
|
str
|
Exact value to match. |
required |
Returns:
| Type | Description |
|---|---|
HGNCFetchedData
|
class: |
Raises:
| Type | Description |
|---|---|
| APIValidationError | If the field name is not recognised (HTTP 400). |
| APIError | On other HTTP errors. |
Example::
data = fetcher.fetch("symbol", "BRCA1")
print(data[0].ensembl_gene_id) # ENSG00000012048
search
¶
Wildcard / boolean search.
Two calling styles are supported:
- Free-form query: search("symbol:ZNF* AND status:Approved")
- Field + term: search("symbol", "ZNF*")
Wildcard characters:
- *: zero or more characters
- ?: exactly one character
Boolean operators: AND, OR, NOT
(URL-encoded as +AND+, +OR+, +NOT+ internally).
Note
Search responses contain only hgnc_id, symbol, and score. Use fetch to retrieve complete records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query_or_field
|
str
|
A full Solr query string, OR a field name when term is also provided. |
required |
term
|
Optional[str]
|
The search term for the given field. Leave |
None
|
Returns:
| Type | Description |
|---|---|
HGNCFetchedData
|
class: |
HGNCFetchedData
|
dicts with |
Raises:
| Type | Description |
|---|---|
| APIValidationError | On an invalid query (HTTP 400). |
| APIError | On other HTTP errors. |
Example::
# All ZNF genes
hits = fetcher.search("symbol", "ZNF*")
# Approved genes on chromosome 17
hits = fetcher.search("status:Approved AND location:17*")
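The note above says boolean operators are URL-encoded as +AND+, +OR+, +NOT+ internally. The sketch below shows how such an encoding can be produced with the standard library. encode_hgnc_query is a hypothetical helper, and the choice to leave ":" and "*" unescaped is an assumption for readability, not a statement about what biodbs or the HGNC server requires.

```python
from urllib.parse import quote_plus


def encode_hgnc_query(query: str) -> str:
    """Encode a free-form query so that spaces around operators become '+'."""
    # quote_plus turns spaces into '+'; ':' and '*' are kept literal (assumption).
    return quote_plus(query, safe=":*")


print(encode_hgnc_query("symbol:ZNF* AND status:Approved"))
# symbol:ZNF*+AND+status:Approved
```

With this kind of encoding the caller writes plain `AND`, `OR`, and `NOT` separated by spaces, as in the free-form calling style.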
ClinVar_Fetcher¶
ClinVar_Fetcher
¶
Fetcher for the ClinVar E-utilities API.
Wraps the four E-utility endpoints that ClinVar supports (esearch, esummary, efetch, elink) with rate limiting and optional API key authentication.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
api_key
|
Optional[str]
|
NCBI API key for 10 req/s (vs. 3 req/s without).
Falls back to the |
None
|
Example::
fetcher = ClinVar_Fetcher()
# Search for all pathogenic BRCA1 variants
uids = fetcher.search("BRCA1[gene] AND pathogenic[clnsig]")
# Fetch summaries for the first 10
data = fetcher.fetch_summary(uids[:10])
print(data.as_dataframe())
# One-step helper
data = fetcher.search_gene("TP53", retmax=100)
for v in data:
    print(v.accession, v.clinical_significance)
search
¶
Find ClinVar variation UIDs matching an Entrez query.
Uses the same query language as the ClinVar website, so you can test a query interactively before automating it.
Common field tags:
- BRCA1[gene]: variants in a specific gene
- pathogenic[clnsig]: by clinical significance
- single_gene[prop]: single-gene variants only
- "Breast cancer"[dis]: by associated disease
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
query
|
str
|
Entrez query string (e.g.
|
required |
retmax
|
int
|
Maximum UIDs to return (default 500; max 10 000). |
500
|
retstart
|
int
|
Zero-based offset for pagination. |
0
|
Returns:
| Type | Description |
|---|---|
| List[str] | List of variation UID strings. |
Example::
uids = fetcher.search("TP53[gene] AND pathogenic[clnsig]",
retmax=200)
count
¶
Return the total number of ClinVar records matching query.
Performs an esearch with retmax=0 so no IDs are transferred.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Entrez query string. | required |
Returns:
| Type | Description |
|---|---|
| int | Integer count of matching records. |
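To make the retmax=0 trick concrete, the sketch below builds the kind of esearch URL such a count request corresponds to and reads the count from a JSON response. The endpoint and the esearchresult.count field belong to NCBI E-utilities; build_count_url and parse_count are hypothetical helpers, and the exact parameters ClinVar_Fetcher sends are an assumption.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(query: str, api_key: Optional[str] = None) -> str:
    """Build an esearch URL that asks only for the hit count (no UIDs)."""
    params: Dict[str, Any] = {
        "db": "clinvar",
        "term": query,
        "retmax": 0,        # no IDs are transferred, only the count
        "retmode": "json",
    }
    if api_key:
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_count(payload: Dict[str, Any]) -> int:
    """Extract the total hit count from a parsed esearch JSON response."""
    return int(payload["esearchresult"]["count"])


print(build_count_url("BRCA1[gene] AND pathogenic[clnsig]"))
# Hand-written sample payload for illustration, not real ClinVar data.
print(parse_count({"esearchresult": {"count": "42", "idlist": []}}))
```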
fetch_summary
¶
Retrieve document summaries for a list of variation UIDs.
Calls esummary with retmode=json to obtain structured data
including clinical significance, gene associations, conditions,
and genomic coordinates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | List[Union[str, int]] | ClinVar variation UIDs (integers or strings). | required |
| total_count | int | Optional total hit count from a preceding esearch, stored on the returned object for reference. | 0 |
Returns:
| Type | Description |
|---|---|
ClinVarFetchedData
|
class: |
ClinVarFetchedData
|
per UID. |
Raises:
| Type | Description |
|---|---|
| APIError | On HTTP errors. |
Example::
data = fetcher.fetch_summary(["65533", "14206"])
for v in data:
    print(v.title, v.clinical_significance)
fetch_vcv
¶
Retrieve the full VCV XML record for a variation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
accession
|
str
|
VCV accession with or without version
(e.g. |
required |
Returns:
| Type | Description |
|---|---|
| str | Raw XML string. |
Example::
xml = fetcher.fetch_vcv("VCV000014206")
fetch_rcv
¶
Retrieve the full RCV XML record for a variation-condition pair.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
accession
|
str
|
RCV accession with or without version
(e.g. |
required |
Returns:
| Type | Description |
|---|---|
| str | Raw XML string. |
Example::
xml = fetcher.fetch_rcv("RCV000000606")
link_to_pubmed
¶
Return PubMed UIDs linked to a ClinVar variation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variation_id | Union[str, int] | ClinVar variation UID. | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of PubMed UID strings. |
search_gene
¶
search_gene(
gene_symbol: str,
single_gene: bool = True,
retmax: int = 500,
clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData
Search for variants in a gene and return summaries in one step.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gene_symbol
|
str
|
HGNC gene symbol (e.g. |
required |
single_gene
|
bool
|
If |
True
|
retmax
|
int
|
Maximum number of variants to return. |
500
|
clinical_significance
|
Optional[str]
|
Optional filter, e.g. |
None
|
Returns:
| Type | Description |
|---|---|
ClinVarFetchedData
|
class: |
Example::
data = fetcher.search_gene("TP53", retmax=200,
clinical_significance="pathogenic")
print(data.as_dataframe()[["accession", "title",
"clinical_significance"]])
search_condition
¶
search_condition(
condition: str,
retmax: int = 500,
clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData
Search for variants associated with a disease/condition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
condition
|
str
|
Disease or condition name (e.g. |
required |
retmax
|
int
|
Maximum number of variants to return. |
500
|
clinical_significance
|
Optional[str]
|
Optional significance filter. |
None
|
Returns:
| Type | Description |
|---|---|
ClinVarFetchedData
|
class: |
Example::
data = fetcher.search_condition("Lynch syndrome", retmax=100)
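The one-step helpers search_gene and search_condition presumably combine the field tags listed under search into a single Entrez query. The sketch below shows how such a query string could be assembled. build_variant_query is a hypothetical helper, and quoting multi-word values follows the "Breast cancer"[dis] pattern shown earlier rather than any documented biodbs behaviour.

```python
from typing import Optional


def _term(value: str, tag: str) -> str:
    """Format one Entrez term, quoting values that contain spaces."""
    value = value.strip()
    if " " in value:
        value = f'"{value}"'
    return f"{value}[{tag}]"


def build_variant_query(gene: Optional[str] = None,
                        condition: Optional[str] = None,
                        clinical_significance: Optional[str] = None,
                        single_gene: bool = False) -> str:
    """Combine the optional filters into one AND-joined Entrez query."""
    parts = []
    if gene:
        parts.append(_term(gene, "gene"))
    if single_gene:
        parts.append("single_gene[prop]")
    if condition:
        parts.append(_term(condition, "dis"))
    if clinical_significance:
        parts.append(_term(clinical_significance, "clnsig"))
    if not parts:
        raise ValueError("at least one filter is required")
    return " AND ".join(parts)


print(build_variant_query(gene="TP53", single_gene=True,
                          clinical_significance="pathogenic"))
print(build_variant_query(condition="Lynch syndrome"))
```

A string built this way can be pasted into the ClinVar website search box to check it before automating the query.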
UniProt¶
uniprot_get_entry¶
uniprot_get_entry
¶
Get a UniProt entry by accession.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| accession | str | UniProt accession (e.g., "P05067"). | required |
Returns:
| Type | Description |
|---|---|
| UniProtFetchedData | UniProtFetchedData with the entry. |
uniprot_search¶
uniprot_search
¶
Search UniProtKB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query (e.g., "gene:TP53 AND organism_id:9606"). | required |
| size | int | Number of results per page (max 500). | 25 |
| reviewed_only | bool | Only return reviewed (Swiss-Prot) entries. | False |
Returns:
| Type | Description |
|---|---|
| UniProtSearchResult | UniProtSearchResult with matching entries. |
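The query argument uses UniProtKB's field syntax, as in the "gene:TP53 AND organism_id:9606" example. The sketch below assembles such a string from separate values. build_uniprot_query is a hypothetical helper, and mapping reviewed_only to a reviewed:true clause is an assumption about how the flag could be expressed in the query, not a description of biodbs internals.

```python
from typing import Optional


def build_uniprot_query(gene: str,
                        organism_id: Optional[int] = None,
                        reviewed_only: bool = False) -> str:
    """Join UniProtKB field clauses with AND."""
    parts = [f"gene:{gene}"]
    if organism_id is not None:
        parts.append(f"organism_id:{organism_id}")
    if reviewed_only:
        parts.append("reviewed:true")  # Swiss-Prot entries only
    return " AND ".join(parts)


print(build_uniprot_query("TP53", organism_id=9606))
print(build_uniprot_query("TP53", organism_id=9606, reviewed_only=True))
```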
uniprot_search_by_gene¶
uniprot_search_by_gene
¶
uniprot_search_by_gene(
gene_name: str,
organism: Optional[Union[int, str]] = 9606,
reviewed_only: bool = True,
) -> UniProtSearchResult
Search UniProt by gene name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_name | str | Gene name to search. | required |
| organism | Optional[Union[int, str]] | Organism tax ID or name (default: human). | 9606 |
| reviewed_only | bool | Only return reviewed entries. | True |
Returns:
| Type | Description |
|---|---|
| UniProtSearchResult | UniProtSearchResult with matching entries. |
gene_to_uniprot¶
gene_to_uniprot
¶
gene_to_uniprot(
gene_names: List[str],
organism: int = 9606,
reviewed_only: bool = True,
return_dict: bool = True,
) -> Union[Dict[str, str], DataFrame]
Map gene names to UniProt accessions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_names | List[str] | List of gene names. | required |
| organism | int | Organism tax ID (default: human). | 9606 |
| reviewed_only | bool | Only return reviewed entries. | True |
| return_dict | bool | If True, return dict. If False, return DataFrame. | True |
Returns:
| Type | Description |
|---|---|
| Union[Dict[str, str], DataFrame] | Dictionary or DataFrame mapping gene names to accessions. |
uniprot_map_ids¶
uniprot_map_ids
¶
Map IDs between databases using UniProt ID mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | List[str] | List of IDs to map. | required |
| from_db | str | Source database (e.g., "UniProtKB_AC-ID", "Gene_Name", "GeneID", "Ensembl"). | required |
| to_db | str | Target database (e.g., "UniProtKB", "GeneID", "PDB", "Ensembl"). | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, List[str]] | Dictionary mapping input IDs to lists of output IDs. |
Common database names
- UniProtKB_AC-ID: UniProt accession
- UniProtKB: UniProt (returns full entries)
- Gene_Name: Gene name
- GeneID: NCBI Gene ID
- Ensembl: Ensembl ID
- PDB: PDB structure ID
- RefSeq_Protein: RefSeq protein ID
PubChem¶
pubchem_get_compound¶
pubchem_get_compound
¶
Get compound data by PubChem CID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cid | int | PubChem Compound ID. | required |
Returns:
| Type | Description |
|---|---|
| PUGRestFetchedData | PUGRestFetchedData containing compound information. |
Example
data = pubchem_get_compound(2244)  # Aspirin
df = data.as_dataframe()
pubchem_search_by_name¶
pubchem_search_by_name
¶
Search compounds by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Compound name to search. | required |
Returns:
| Type | Description |
|---|---|
| PUGRestFetchedData | PUGRestFetchedData containing matching compounds. |
Example
data = pubchem_search_by_name("aspirin")
cids = data.get_cids()
pubchem_get_properties¶
pubchem_get_properties
¶
pubchem_get_properties(
cids: Union[int, List[int]],
properties: Optional[List[str]] = None,
) -> PUGRestFetchedData
Get specific properties for compounds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cids | Union[int, List[int]] | Single CID or list of CIDs. | required |
| properties | Optional[List[str]] | List of property names. If None, returns common properties. | None |
Returns:
| Type | Description |
|---|---|
| PUGRestFetchedData | PUGRestFetchedData containing property values. |
Example
data = pubchem_get_properties(2244, ["MolecularWeight", "MolecularFormula"])
df = data.as_dataframe()
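For orientation, a property lookup like this corresponds to a PUG REST URL in which CIDs and property names are comma-separated path segments. The sketch below builds that URL. build_property_url is a hypothetical helper, and whether biodbs constructs its requests exactly this way is an assumption.

```python
from typing import List, Union

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(cids: Union[int, List[int]], properties: List[str]) -> str:
    """Build a PUG REST compound-property URL that returns JSON."""
    if isinstance(cids, int):
        cids = [cids]  # accept a single CID as well as a list
    cid_part = ",".join(str(cid) for cid in cids)
    prop_part = ",".join(properties)
    return f"{PUG_BASE}/compound/cid/{cid_part}/property/{prop_part}/JSON"


print(build_property_url(2244, ["MolecularWeight", "MolecularFormula"]))
```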
Ensembl¶
ensembl_lookup¶
ensembl_lookup
¶
ensembl_lookup(
id: str,
species: Optional[str] = None,
expand: bool = False,
db_type: str = "core",
) -> EnsemblFetchedData
Look up an Ensembl stable ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | Ensembl stable ID (e.g., "ENSG00000141510"). | required |
| species | Optional[str] | Species name (optional, auto-detected from ID). | None |
| expand | bool | If True, include connected features (transcripts, exons). | False |
| db_type | str | Database type ("core" or "otherfeatures"). | 'core' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData containing gene/transcript/protein information. |
ensembl_lookup_symbol¶
ensembl_lookup_symbol
¶
Look up a gene by symbol.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| species | str | Species name (e.g., "human", "mouse"). | required |
| symbol | str | Gene symbol (e.g., "BRCA2", "TP53"). | required |
| expand | bool | If True, include connected features. | False |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData containing gene information. |
ensembl_get_sequence¶
ensembl_get_sequence
¶
ensembl_get_sequence(
id: str,
sequence_type: str = "genomic",
species: Optional[str] = None,
expand_5prime: Optional[int] = None,
expand_3prime: Optional[int] = None,
mask: Optional[str] = None,
format: str = "fasta",
) -> EnsemblFetchedData
Get sequence for an Ensembl stable ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | Ensembl stable ID (gene, transcript, exon, protein). | required |
| sequence_type | str | Type of sequence ("genomic", "cds", "cdna", "protein"). | 'genomic' |
| species | Optional[str] | Species name (optional). | None |
| expand_5prime | Optional[int] | Extend upstream (genomic only). | None |
| expand_3prime | Optional[int] | Extend downstream (genomic only). | None |
| mask | Optional[str] | Mask repeats ("hard" or "soft", genomic only). | None |
| format | str | Output format ("fasta" or "json"). | 'fasta' |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData containing sequence data. |
Example
data = ensembl_get_sequence("ENST00000269305", sequence_type="cds")
print(data.text)  # FASTA sequence
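The parameters above line up closely with the query options of the Ensembl REST sequence/id endpoint. The sketch below builds the corresponding URL. build_sequence_url is a hypothetical helper, and the assumption that sequence_type is sent as the REST parameter named type is an inference from the public Ensembl API, not from biodbs source code.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

ENSEMBL_REST = "https://rest.ensembl.org"


def build_sequence_url(stable_id: str,
                       sequence_type: str = "genomic",
                       expand_5prime: Optional[int] = None,
                       expand_3prime: Optional[int] = None,
                       mask: Optional[str] = None) -> str:
    """Build a sequence/id URL, including only the options that were set."""
    params: Dict[str, Union[str, int]] = {"type": sequence_type}
    if expand_5prime is not None:
        params["expand_5prime"] = expand_5prime
    if expand_3prime is not None:
        params["expand_3prime"] = expand_3prime
    if mask is not None:
        params["mask"] = mask
    return f"{ENSEMBL_REST}/sequence/id/{stable_id}?{urlencode(params)}"


print(build_sequence_url("ENST00000269305", sequence_type="cds"))
print(build_sequence_url("ENSG00000141510", expand_5prime=500, mask="soft"))
```

The response format (FASTA or JSON) is normally selected through the request's Content-Type header rather than the URL.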
ensembl_get_xrefs¶
ensembl_get_xrefs
¶
ensembl_get_xrefs(
id: str,
species: Optional[str] = None,
external_db: Optional[str] = None,
all_levels: bool = False,
) -> EnsemblFetchedData
Get external cross-references for an Ensembl ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | Ensembl stable ID. | required |
| species | Optional[str] | Species name. | None |
| external_db | Optional[str] | Filter by external database (e.g., "HGNC", "UniProt"). | None |
| all_levels | bool | If True, find all linked features. | False |
Returns:
| Type | Description |
|---|---|
| EnsemblFetchedData | EnsemblFetchedData containing cross-references. |
BioMart¶
biomart_get_genes¶
biomart_get_genes
¶
biomart_get_genes(
ids: List[str],
attributes: Optional[List[str]] = None,
dataset: str = "hsapiens_gene_ensembl",
) -> BioMartQueryData
Get gene information by Ensembl gene IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | List[str] | List of Ensembl gene IDs (e.g., ["ENSG00000141510"]). | required |
| attributes | Optional[List[str]] | Attributes to retrieve. If None, uses common gene attributes. | None |
| dataset | str | BioMart dataset name. Defaults to human genes. | 'hsapiens_gene_ensembl' |
Returns:
| Type | Description |
|---|---|
| BioMartQueryData | BioMartQueryData containing gene information including gene ID, symbol, description, and coordinates. |
Example
data = biomart_get_genes(["ENSG00000141510", "ENSG00000012048"])
df = data.as_dataframe()
print(df[["ensembl_gene_id", "external_gene_name"]])
biomart_convert_ids¶
biomart_convert_ids
¶
biomart_convert_ids(
ids: List[str],
from_type: str = "ensembl_gene_id",
to_type: str = "external_gene_name",
dataset: str = "hsapiens_gene_ensembl",
) -> BioMartQueryData
Convert between different gene ID types.
Supported ID types
- ensembl_gene_id, ensembl_transcript_id, ensembl_peptide_id
- external_gene_name, hgnc_symbol, hgnc_id
- entrezgene_id, uniprot_gn_id
- refseq_mrna, refseq_peptide
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | List[str] | List of IDs to convert. | required |
| from_type | str | Source ID type (used as filter). | 'ensembl_gene_id' |
| to_type | str | Target ID type. | 'external_gene_name' |
| dataset | str | BioMart dataset name. Defaults to human genes. | 'hsapiens_gene_ensembl' |
Returns:
| Type | Description |
|---|---|
| BioMartQueryData | BioMartQueryData containing ID mappings with both source and target ID columns. |
Example
data = biomart_convert_ids(
    ["TP53", "BRCA1"],
    from_type="external_gene_name",
    to_type="ensembl_gene_id"
)
df = data.as_dataframe()
biomart_query¶
biomart_query
¶
biomart_query(
dataset: str = "hsapiens_gene_ensembl",
attributes: Optional[List[str]] = None,
filters: Optional[
Dict[str, Union[str, List[str]]]
] = None,
) -> BioMartQueryData
Execute a custom BioMart query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | str | BioMart dataset name. | 'hsapiens_gene_ensembl' |
| attributes | Optional[List[str]] | List of attributes to retrieve. | None |
| filters | Optional[Dict[str, Union[str, List[str]]]] | Dict of filter name to value(s). | None |
Returns:
| Type | Description |
|---|---|
| BioMartQueryData | BioMartQueryData containing query results. |
Example
data = biomart_query(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name", "chromosome_name"],
    filters={"chromosome_name": "22", "biotype": "protein_coding"}
)
df = data.as_dataframe()
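BioMart's web service takes queries as a small XML document listing a dataset, its filters, and the attributes to return. To show how the dataset, attributes, and filters arguments relate to that document, here is a minimal sketch that builds one with the standard library. build_biomart_xml is a hypothetical helper; the formatter and header settings are illustrative choices, and joining list-valued filters with commas is an assumption about how multiple values are passed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union

XML_PROLOG = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_biomart_xml(dataset: str,
                      attributes: List[str],
                      filters: Optional[Dict[str, Union[str, List[str]]]] = None) -> str:
    """Serialize a BioMart query (dataset, filters, attributes) to XML."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # illustrative: tab-separated output
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # assumption: multiple values are comma-joined
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return XML_PROLOG + ET.tostring(query, encoding="unicode")


xml_query = build_biomart_xml(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"chromosome_name": "22", "biotype": "protein_coding"},
)
print(xml_query)
```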
KEGG¶
kegg_list¶
kegg_list
¶
List entries in a KEGG database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| database | str | Database name (e.g., "pathway", "module", "compound"). | required |
| organism | Optional[str] | Organism code for pathway/module lists (e.g., "hsa" for human). | None |
Returns:
| Type | Description |
|---|---|
| KEGGFetchedData | KEGGFetchedData containing a list of entries with IDs and descriptions. |
Example
data = kegg_list("pathway", organism="hsa")
df = data.as_dataframe()
kegg_get¶
kegg_get
¶
Retrieve entry data from KEGG database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dbentries | Union[str, List[str]] | Entry ID or list of IDs (e.g., "hsa:7157"). | required |
| option | Optional[str] | Output format ("aaseq", "ntseq", "mol", "kcf", "image", "json"). | None |
Returns:
| Type | Description |
|---|---|
| KEGGFetchedData | KEGGFetchedData containing entry data. |
Example
data = kegg_get("hsa:7157")  # TP53 gene
print(data.text)
data = kegg_get("cpd:C00022", option="mol")
print(data.text)
kegg_link¶
kegg_link
¶
Find related entries between KEGG databases.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_db | str | Target database (e.g., "pathway", "module", "disease"). | required |
| source | Union[str, List[str]] | Source database name OR list of entry IDs. | required |
Returns:
| Type | Description |
|---|---|
| KEGGFetchedData | KEGGFetchedData containing linked entries between databases. |
kegg_conv¶
kegg_conv
¶
Convert entry IDs between KEGG and external databases.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_db | str | Target database (e.g., "ncbi-geneid", "ncbi-proteinid", "uniprot"). | required |
| source | Union[str, List[str]] | Source database name OR list of entry IDs to convert. | required |
Returns:
| Type | Description |
|---|---|
| KEGGFetchedData | KEGGFetchedData containing ID mappings between databases. |
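The four KEGG functions mirror the list, get, link, and conv operations of the KEGG REST API, where the operation and its arguments are path segments and multiple entry IDs are joined with "+". The sketch below makes that mapping explicit. build_kegg_url is a hypothetical helper, and the assumption that biodbs issues exactly these URLs is not confirmed by this reference.

```python
from typing import List, Optional, Union

KEGG_REST = "https://rest.kegg.jp"


def build_kegg_url(operation: str,
                   *parts: Optional[Union[str, List[str]]]) -> str:
    """Join an operation and its arguments into a KEGG REST URL."""
    segments = []
    for part in parts:
        if part is None:
            continue  # skip optional arguments that were not given
        if isinstance(part, (list, tuple)):
            part = "+".join(part)  # several entry IDs in one request
        segments.append(part)
    return "/".join([KEGG_REST, operation, *segments])


print(build_kegg_url("list", "pathway", "hsa"))
print(build_kegg_url("get", "cpd:C00022", "mol"))
print(build_kegg_url("link", "pathway", ["hsa:7157", "hsa:672"]))
print(build_kegg_url("conv", "ncbi-geneid", "hsa:7157"))
```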
ChEMBL¶
chembl_get_molecule¶
chembl_get_molecule
¶
Get molecule data by ChEMBL ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chembl_id | str | ChEMBL molecule ID (e.g., "CHEMBL25"). | required |
Returns:
| Type | Description |
|---|---|
| ChEMBLFetchedData | ChEMBLFetchedData containing molecule information including structure, properties, and cross-references. |
chembl_search_molecules¶
chembl_search_molecules
¶
Search molecules by name, synonym, or structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query (name, synonym, or InChIKey). | required |
| limit | int | Maximum number of results to return. | 100 |
Returns:
| Type | Description |
|---|---|
| ChEMBLFetchedData | ChEMBLFetchedData containing matching molecules. |
Example
data = chembl_search_molecules("aspirin")
df = data.as_dataframe()
print(df[["molecule_chembl_id", "pref_name"]].head())
chembl_get_approved_drugs¶
chembl_get_approved_drugs
¶
Get list of approved drugs from ChEMBL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum number of drugs to return. | 1000 |
Returns:
| Type | Description |
|---|---|
| ChEMBLFetchedData | ChEMBLFetchedData containing approved drug molecules with their names, structures, and approval information. |
Example
data = chembl_get_approved_drugs(limit=100)
df = data.as_dataframe()
print(df[["molecule_chembl_id", "pref_name"]].head())
QuickGO¶
quickgo_search_annotations¶
quickgo_search_annotations(
    go_id: Optional[str] = None,
    taxon_id: Optional[int] = None,
    gene_product_id: Optional[str] = None,
    evidence_code: Optional[str] = None,
    limit: int = 100,
) -> QuickGOFetchedData
Search GO annotations with filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| go_id | Optional[str] | GO term ID to filter by. | None |
| taxon_id | Optional[int] | NCBI taxonomy ID (e.g., 9606 for human). | None |
| gene_product_id | Optional[str] | Gene product ID (e.g., "UniProtKB:P04637"). | None |
| evidence_code | Optional[str] | Evidence code (e.g., "IDA", "IEA"). | None |
| limit | int | Maximum number of results to return. | 100 |
Returns:
| Type | Description |
|---|---|
| QuickGOFetchedData | QuickGOFetchedData containing matching GO annotations with gene products, GO terms, and evidence codes. |
Example
data = quickgo_search_annotations(go_id="GO:0006915", taxon_id=9606)
df = data.as_dataframe()
print(df[["geneProductId", "goId", "goName"]].head())
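Every filter above is optional, so a wrapper would plausibly send only the filters that are set. The sketch below illustrates that idea. The endpoint URL and the camelCase parameter names are assumptions based on the public QuickGO annotation search service, and `build_annotation_query` is a hypothetical helper, not a biodbs function.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed QuickGO annotation search endpoint (not confirmed by this reference).
QUICKGO_SEARCH = "https://www.ebi.ac.uk/QuickGO/services/annotation/search"


def build_annotation_query(
    go_id: Optional[str] = None,
    taxon_id: Optional[int] = None,
    gene_product_id: Optional[str] = None,
    evidence_code: Optional[str] = None,
    limit: int = 100,
) -> str:
    """Return a search URL that contains only the filters that were set."""
    candidates = {
        "goId": go_id,
        "taxonId": taxon_id,
        "geneProductId": gene_product_id,
        "evidenceCode": evidence_code,
        "limit": limit,
    }
    # Drop unset filters so they do not appear in the query string.
    params = {key: value for key, value in candidates.items() if value is not None}
    return f"{QUICKGO_SEARCH}?{urlencode(params)}"


url = build_annotation_query(go_id="GO:0006915", taxon_id=9606)
print(url)
```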
quickgo_get_terms¶
Get GO term details by ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | Union[str, List[str]] | GO term ID or list of IDs (e.g., "GO:0008150" or ["GO:0008150", "GO:0003674"]). | required |
Returns:
| Type | Description |
|---|---|
| QuickGOFetchedData | QuickGOFetchedData containing term details including name, definition, aspect, and synonyms. |
HPA (Human Protein Atlas)¶
hpa_get_gene¶
Get protein data for a single gene.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene | str | Gene name (e.g., "TP53") or Ensembl ID (e.g., "ENSG00000141510"). | required |
| fmt | str | Response format ("json", "xml", or "tsv"). | 'json' |
Returns:
| Type | Description |
|---|---|
| HPAFetchedData | HPAFetchedData containing protein information including expression data, antibody information, and references. |
Example
data = hpa_get_gene("TP53")
print(data.results[0].keys())
hpa_get_tissue_expression¶
Get tissue expression data for genes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | Union[str, List[str]] | Gene name(s) or Ensembl ID(s). | required |
Returns:
| Type | Description |
|---|---|
| HPAFetchedData | HPAFetchedData containing tissue expression levels across different human tissues and organs. |
Example
data = hpa_get_tissue_expression("TP53")
df = data.as_dataframe()
print(df[["Gene", "Tissue", "Level"]].head())
NCBI¶
ncbi_get_gene¶
ncbi_get_gene(
    identifiers: List[Union[int, str]],
    taxon: Union[int, str] = "human",
    api_key: Optional[str] = None,
) -> NCBIGeneFetchedData
Get gene information from NCBI by gene IDs or symbols.
This is a convenience function that wraps the NCBI_Fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| identifiers | List[Union[int, str]] | List of NCBI Gene IDs (integers) or gene symbols (strings). | required |
| taxon | Union[int, str] | Taxonomy ID or name (used for symbol lookups). | 'human' |
| api_key | Optional[str] | Optional NCBI API key for higher rate limits. | None |
Returns:
| Type | Description |
|---|---|
| NCBIGeneFetchedData | NCBIGeneFetchedData containing gene reports. |
Examples:
ncbi_symbol_to_id¶
ncbi_symbol_to_id(
    symbols: List[str],
    taxon: Union[int, str] = "human",
    api_key: Optional[str] = None,
    return_dict: bool = True,
) -> Union[Dict[str, int], DataFrame]
Convert gene symbols to NCBI Gene IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbols | List[str] | List of gene symbols. | required |
| taxon | Union[int, str] | Taxonomy ID or name. | 'human' |
| api_key | Optional[str] | Optional NCBI API key. | None |
| return_dict | bool | If True, return dict. If False, return DataFrame. | True |
Returns:
| Type | Description |
|---|---|
| Union[Dict[str, int], DataFrame] | Dictionary mapping symbols to gene IDs, or DataFrame. |
Example
mapping = ncbi_symbol_to_id(["TP53", "BRCA1", "EGFR"])
print(mapping)
FDA¶
fda_search¶
fda_search(
    category: str,
    endpoint: str,
    search: Optional[Union[str, Dict]] = None,
    limit: int = 100,
    **kwargs: Any,
) -> FDAFetchedData
Search FDA openFDA database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | str | FDA category ("drug", "device", "food", etc.). | required |
| endpoint | str | Endpoint within category ("event", "label", "enforcement", etc.). | required |
| search | Optional[Union[str, Dict]] | Search query string or dict of field:value pairs. | None |
| limit | int | Maximum results per request. | 100 |
| **kwargs | Any | Additional parameters (sort, count, skip). | {} |
Returns:
| Type | Description |
|---|---|
| FDAFetchedData | FDAFetchedData containing search results. |
Example
data = fda_search("drug", "event", search="aspirin", limit=10)
df = data.as_dataframe()
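Because `search` accepts either a query string or a dict of field:value pairs, the dict form has to be flattened into openFDA query syntax at some point. The sketch below shows one plausible way to do that. Joining the pairs with `AND` and quoting each value are assumptions, and `to_search_string` and `build_openfda_url` are hypothetical helpers rather than biodbs functions.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode


def to_search_string(search: Union[str, Dict[str, str], None]) -> Optional[str]:
    """Pass strings through; flatten a dict into 'field:"value" AND ...'."""
    if search is None or isinstance(search, str):
        return search
    return " AND ".join(f'{field}:"{value}"' for field, value in search.items())


def build_openfda_url(
    category: str,
    endpoint: str,
    search: Union[str, Dict[str, str], None] = None,
    limit: int = 100,
) -> str:
    """Assemble an openFDA URL of the form /<category>/<endpoint>.json."""
    params = {}
    query = to_search_string(search)
    if query:
        params["search"] = query
    params["limit"] = limit
    return f"https://api.fda.gov/{category}/{endpoint}.json?{urlencode(params)}"


dict_query = to_search_string(
    {
        "patient.drug.openfda.brand_name": "aspirin",
        "patient.reaction.reactionmeddrapt": "nausea",
    }
)
print(dict_query)
print(build_openfda_url("drug", "event", search="aspirin", limit=10))
```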
fda_drug_events¶
fda_drug_events(
    search: Optional[Union[str, Dict]] = None,
    limit: int = 100,
    **kwargs: Any,
) -> FDAFetchedData
Search FDA drug adverse event reports (FAERS).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| search | Optional[Union[str, Dict]] | Search query (e.g., "patient.drug.openfda.brand_name:aspirin"). | None |
| limit | int | Maximum results to return. | 100 |
| **kwargs | Any | Additional parameters (sort, count, skip). | {} |
Returns:
| Type | Description |
|---|---|
| FDAFetchedData | FDAFetchedData containing adverse event reports with patient information, drug details, and outcomes. |
Example
data = fda_drug_events(search="aspirin", limit=50)
df = data.as_dataframe()
Reactome¶
reactome_analyze¶
reactome_analyze(
    identifiers: List[str],
    species: str = "Homo sapiens",
    interactors: bool = False,
    page_size: int = 100,
    sort_by: str = "ENTITIES_FDR",
    order: str = "ASC",
    resource: str = "TOTAL",
    p_value: float = 1.0,
    include_disease: bool = True,
    min_entities: Optional[int] = None,
    max_entities: Optional[int] = None,
) -> ReactomeFetchedData
Perform Reactome pathway over-representation analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| identifiers | List[str] | List of identifiers (gene symbols, UniProt IDs, etc.). | required |
| species | str | Species name (e.g., "Homo sapiens", "Mus musculus"). | 'Homo sapiens' |
| interactors | bool | Include interactors in analysis. | False |
| page_size | int | Number of results to return. | 100 |
| sort_by | str | Sort field (ENTITIES_FDR, ENTITIES_PVALUE, NAME). | 'ENTITIES_FDR' |
| order | str | Sort order (ASC, DESC). | 'ASC' |
| resource | str | Resource filter (TOTAL, UNIPROT, ENSEMBL, etc.). | 'TOTAL' |
| p_value | float | P-value cutoff for filtering. | 1.0 |
| include_disease | bool | Include disease pathways. | True |
| min_entities | Optional[int] | Minimum pathway size. | None |
| max_entities | Optional[int] | Maximum pathway size. | None |
Returns:
| Type | Description |
|---|---|
| ReactomeFetchedData | ReactomeFetchedData with pathway enrichment results. |
Example
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = reactome_analyze(genes)
print(f"Found {len(result.pathways)} pathways")
# Found 172 pathways
df = result.significant_pathways(fdr_threshold=0.05).as_dataframe()
print(df[["stId", "name", "fdr", "found", "total"]].head(3).to_string())
#             stId                                            name       fdr  found  total
# 0  R-HSA-6796648  TP53 Regulates Transcription of DNA Repai...  1.08e-06      7     86
# 1  R-HSA-3700989              Transcriptional Regulation by TP53  6.45e-04      9    487
# 2  R-HSA-6806003   Regulation of TP53 Expression and Degradation  6.45e-04      4     46
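The `significant_pathways(fdr_threshold=...)` call in the example presumably keeps only pathways whose FDR falls at or below the cutoff. The sketch below reproduces that filtering on plain dicts, using the three rows from the example output. The inclusive comparison and the `filter_by_fdr` helper are assumptions made for illustration, not the library's implementation.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

# Rows copied from the example output above (stId and fdr columns only).
pathways: List[Row] = [
    {"stId": "R-HSA-6796648", "fdr": 1.08e-06},
    {"stId": "R-HSA-3700989", "fdr": 6.45e-04},
    {"stId": "R-HSA-6806003", "fdr": 6.45e-04},
]


def filter_by_fdr(rows: List[Row], fdr_threshold: float = 0.05) -> List[Row]:
    """Keep rows with fdr <= threshold, most significant first."""
    kept = [row for row in rows if row["fdr"] <= fdr_threshold]
    return sorted(kept, key=lambda row: row["fdr"])


strict = [row["stId"] for row in filter_by_fdr(pathways, fdr_threshold=1e-4)]
print(strict)
```

With the default cutoff of 0.05 all three rows pass; tightening it to 1e-4 leaves only the first one.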
Disease Ontology¶
do_get_term¶
Get a disease term by DOID.
This is a convenience function that wraps the DO_Fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doid | str | Disease Ontology ID (e.g., "DOID:162", "162"). | required |
| use_ols | bool | If True, use OLS API for more detailed data. | True |
Returns:
| Type | Description |
|---|---|
| DOFetchedData | DOFetchedData containing the disease term. |
Example
term = do_get_term("DOID:162")  # Cancer
print(term.terms[0].name)
# 'cancer'
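Since `doid` accepts both the prefixed form ("DOID:162") and the bare number ("162"), some normalization step is implied. The sketch below shows one way such normalization could look. `normalize_doid` is a hypothetical helper, and the exact rules biodbs applies are not documented here.

```python
def normalize_doid(doid: str) -> str:
    """Return the ID in canonical 'DOID:<digits>' form (hypothetical helper)."""
    value = doid.strip()
    if value.upper().startswith("DOID:"):
        value = value[5:]  # drop the prefix, whatever its letter case
    if not value.isdigit():
        raise ValueError(f"Not a Disease Ontology ID: {doid!r}")
    return f"DOID:{value}"


print(normalize_doid("162"))
print(normalize_doid("DOID:162"))
```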
do_get_children¶
Get child terms of a disease.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doid | str | Disease Ontology ID. | required |
Returns:
| Type | Description |
|---|---|
| DOFetchedData | DOFetchedData with child terms. |
Example
children = do_get_children("DOID:162")  # Cancer
print(f"Cancer has {len(children)} child terms")
EnrichR¶
enrichr_enrich¶
enrichr_enrich(
    genes: List[str],
    library: str,
    organism: str = "human",
    description: str = "biodbs gene list",
) -> EnrichRFetchedData
Perform gene set enrichment analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| genes | List[str] | List of gene symbols to analyze. | required |
| library | str | Name of the gene set library (e.g., "KEGG_2021_Human"). | required |
| organism | str | Target organism (human, mouse, fly, yeast, worm, fish). | 'human' |
| description | str | Description for the gene list. | 'biodbs gene list' |
Returns:
| Type | Description |
|---|---|
| EnrichRFetchedData | EnrichRFetchedData containing enrichment results with term names, p-values, combined scores, and overlapping genes. |
Example
genes = ["TP53", "BRCA1", "EGFR", "MYC", "KRAS"]
result = enrichr_enrich(genes, "KEGG_2021_Human")
top = result.top_terms(5)
print(top.get_term_names())
enrichr_get_libraries¶
Get available gene set libraries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| organism | str | Target organism (human, mouse, fly, yeast, worm, fish). | 'human' |
Returns:
| Type | Description |
|---|---|
| EnrichRLibrariesData | EnrichRLibrariesData containing library statistics including library names, number of terms, gene coverage, and categories. |
Example
libs = enrichr_get_libraries()
kegg = libs.search("KEGG")
print(kegg.get_library_names())
HGNC¶
hgnc_fetch¶
Exact-match lookup by any HGNC stored field.
Returns full gene records. Wildcards are not expanded; use hgnc_search for wildcard queries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| field | str | HGNC field name (e.g. | required |
| term | str | Exact value to match. | required |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: |
Example::
data = hgnc_fetch("symbol", "TP53")
entry = data[0]
print(entry.hgnc_id, entry.entrez_id, entry.ensembl_gene_id)
hgnc_search¶
Wildcard / boolean search across HGNC records.
Returns lightweight summaries (hgnc_id, symbol, score).
Use hgnc_fetch to retrieve full records.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_or_field | str | Full Solr query string, OR a field name when term is also given. | required |
| term | Optional[str] | Search term for the given field (supports | None |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: |
Example::
# All approved TP53 family members
hits = hgnc_search("symbol", "TP53*")
print(hits.symbols())
# Boolean query
hits = hgnc_search("status:Approved+AND+locus_group:non-coding+RNA")
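The first argument of hgnc_search plays two roles: a complete query on its own, or a field name when `term` is supplied. The sketch below shows how the two calling styles could map onto request URLs. The base URL and path layout are assumptions modelled on the public HGNC REST service, and `hgnc_search_url` is a hypothetical helper, not part of biodbs.

```python
from typing import Optional

HGNC_BASE = "https://rest.genenames.org"  # assumed public HGNC REST base URL


def hgnc_search_url(query_or_field: str, term: Optional[str] = None) -> str:
    """Build a search URL for either calling style (hypothetical helper)."""
    if term is None:
        # Single argument: treat it as a complete query string.
        return f"{HGNC_BASE}/search/{query_or_field}"
    # Two arguments: field name followed by the (possibly wildcarded) term.
    return f"{HGNC_BASE}/search/{query_or_field}/{term}"


print(hgnc_search_url("symbol", "TP53*"))
print(hgnc_search_url("status:Approved+AND+locus_group:non-coding+RNA"))
```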
hgnc_fetch_by_symbol¶
Fetch a gene entry by its approved HGNC symbol.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| symbol | str | Approved gene symbol (e.g. | required |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: one record; zero if the symbol is not found). |
Example::
data = hgnc_fetch_by_symbol("EGFR")
entry = data[0]
print(entry.name) # "epidermal growth factor receptor"
hgnc_fetch_by_hgnc_id¶
Fetch a gene entry by its HGNC ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hgnc_id | str | HGNC identifier in the form | required |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: |
Example::
data = hgnc_fetch_by_hgnc_id("HGNC:11998")
print(data[0].symbol) # "TP53"
hgnc_fetch_by_entrez_id¶
Fetch a gene entry by NCBI Entrez Gene ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| entrez_id | str | NCBI Gene ID as a string (e.g. | required |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: |
Example::
data = hgnc_fetch_by_entrez_id("7157")
print(data[0].symbol) # "TP53"
hgnc_fetch_by_ensembl_id¶
Fetch a gene entry by Ensembl stable gene ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ensembl_id | str | Ensembl gene ID (e.g. | required |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: |
Example::
data = hgnc_fetch_by_ensembl_id("ENSG00000141510")
print(data[0].symbol) # "TP53"
hgnc_fetch_by_uniprot_id¶
Fetch a gene entry by UniProt accession.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uniprot_id | str | UniProt accession (e.g. | required |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: |
Example::
data = hgnc_fetch_by_uniprot_id("P04637")
print(data[0].symbol) # "TP53"
hgnc_fetch_by_refseq¶
Fetch a gene entry by RefSeq accession.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| refseq_accession | str | RefSeq accession (e.g. | required |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: |
Example::
data = hgnc_fetch_by_refseq("NM_000546")
print(data[0].symbol) # "TP53"
hgnc_search_symbol¶
Search HGNC gene symbols using wildcard patterns.
Returns lightweight summaries; use hgnc_fetch_by_symbol for full records once you have exact symbols.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Symbol query supporting | required |
Returns:
| Type | Description |
|---|---|
| HGNCFetchedData | class: |
Example::
hits = hgnc_search_symbol("TP53*")
print(hits.symbols())
# ['TP53', 'TP53AIP1', 'TP53BP1', 'TP53BP2', ...]
hgnc_info¶
Return HGNC service metadata.
Includes the database last-modified timestamp, total document count, and the lists of searchable and stored fields.
Returns:
| Type | Description |
|---|---|
| dict | Raw JSON dict from the |
Example::
info = hgnc_info()
print(info["response"]["numDoc"])
ClinVar¶
clinvar_search¶
Find ClinVar variation UIDs matching an Entrez query.
Uses the same query language as the ClinVar website. Common field tags:
- BRCA1[gene]: gene name
- pathogenic[clnsig]: clinical significance
- "Breast cancer"[dis]: disease
- single_gene[prop]: single-gene variants
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Entrez query string. | required |
| retmax | int | Maximum UIDs to return (default 500). | 500 |
| retstart | int | Offset for pagination. | 0 |
Returns:
| Type | Description |
|---|---|
| List[str] | List of variation UID strings. |
Example::
uids = clinvar_search("BRCA1[gene] AND pathogenic[clnsig]")
data = clinvar_fetch_by_id(uids[:20])
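The `retmax` and `retstart` parameters together allow paging through result sets larger than a single request returns. The sketch below shows a generic paging loop over any search callable that takes those two keyword arguments, as clinvar_search is documented to do. `iter_uid_pages` is a hypothetical helper, and `fake_search` is an offline stand-in so the loop can run without network access.

```python
from typing import Callable, Iterator, List


def iter_uid_pages(
    search: Callable[..., List[str]], query: str, page_size: int = 500
) -> Iterator[List[str]]:
    """Yield successive pages of UIDs until a short or empty page appears."""
    retstart = 0
    while True:
        page = search(query, retmax=page_size, retstart=retstart)
        if not page:
            break
        yield page
        if len(page) < page_size:
            break  # last, partially filled page
        retstart += page_size


# Offline stand-in for clinvar_search; the UIDs are placeholders.
_FAKE_UIDS = [str(i) for i in range(1, 8)]


def fake_search(query: str, retmax: int = 500, retstart: int = 0) -> List[str]:
    return _FAKE_UIDS[retstart:retstart + retmax]


pages = list(iter_uid_pages(fake_search, "BRCA1[gene]", page_size=3))
print(pages)
```

Stopping on a short page avoids one extra request when the result count is not a multiple of the page size.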
clinvar_count¶
Return the total number of ClinVar records matching query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Entrez query string. | required |
Returns:
| Type | Description |
|---|---|
| int | Integer count. |
Example::
n = clinvar_count("TP53[gene] AND pathogenic[clnsig]")
print(f"TP53 has {n} pathogenic variants in ClinVar")
clinvar_fetch_by_id¶
Fetch ClinVar summaries for a list of variation UIDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ids | List[Union[str, int]] | ClinVar variation UIDs (integers or strings). | required |
Returns:
| Type | Description |
|---|---|
| ClinVarFetchedData | class: |
Example::
data = clinvar_fetch_by_id([65533, 14206])
print(data.as_dataframe())
clinvar_search_gene¶
clinvar_search_gene(
    gene_symbol: str,
    retmax: int = 500,
    single_gene: bool = True,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData
Search and fetch ClinVar variants for a gene in one step.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_symbol | str | HGNC gene symbol (e.g. | required |
| retmax | int | Maximum variants to return. | 500 |
| single_gene | bool | If | True |
| clinical_significance | Optional[str] | Optional filter (e.g. | None |
Returns:
| Type | Description |
|---|---|
| ClinVarFetchedData | class: |
Example::
data = clinvar_search_gene("TP53", retmax=200,
                           clinical_significance="pathogenic")
df = data.as_dataframe()
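A one-step gene search like this presumably composes an Entrez query from the field tags listed under clinvar_search. The sketch below shows how the arguments could translate into such a query. The composition rules, including mapping `single_gene=True` to `single_gene[prop]`, are assumptions, and `build_gene_query` is a hypothetical helper rather than the library's own code.

```python
from typing import Optional


def build_gene_query(
    gene_symbol: str,
    single_gene: bool = True,
    clinical_significance: Optional[str] = None,
) -> str:
    """Compose an Entrez query from ClinVar field tags (hypothetical helper)."""
    parts = [f"{gene_symbol}[gene]"]
    if single_gene:
        parts.append("single_gene[prop]")
    if clinical_significance:
        parts.append(f"{clinical_significance}[clnsig]")
    return " AND ".join(parts)


query = build_gene_query("TP53", clinical_significance="pathogenic")
print(query)
```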
clinvar_search_condition¶
clinvar_search_condition(
    condition: str,
    retmax: int = 500,
    clinical_significance: Optional[str] = None,
) -> ClinVarFetchedData
Search and fetch ClinVar variants for a disease/condition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| condition | str | Disease or condition name (e.g. | required |
| retmax | int | Maximum variants to return. | 500 |
| clinical_significance | Optional[str] | Optional significance filter. | None |
Returns:
| Type | Description |
|---|---|
| ClinVarFetchedData | class: |
Example::
data = clinvar_search_condition("Breast cancer",
                                clinical_significance="pathogenic")
clinvar_fetch_vcv¶
Retrieve the full VCV XML record for a variation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| accession | str | VCV accession (e.g. | required |
Returns:
| Type | Description |
|---|---|
| str | Raw XML string. |
Example::
xml = clinvar_fetch_vcv("VCV000014206")
clinvar_fetch_rcv¶
Retrieve the full RCV XML record for a variation-condition pair.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| accession | str | RCV accession (e.g. | required |
Returns:
| Type | Description |
|---|---|
| str | Raw XML string. |
Example::
xml = clinvar_fetch_rcv("RCV000000606")
clinvar_link_pubmed¶
Return PubMed UIDs linked to a ClinVar variation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variation_id | Union[str, int] | ClinVar variation UID. | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of PubMed UID strings. |
Example::
pmids = clinvar_link_pubmed(65533)
Rate Limiting¶
| Function/Class | Description |
|---|---|
| RateLimiter | Global rate limiter for API calls |
| get_rate_limiter | Get the singleton rate limiter instance |
| request_with_retry | Make HTTP request with retry logic |
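The signature of request_with_retry is not shown in this summary, so the following is only a generic illustration of retry logic with exponential backoff, not the library's implementation. `call_with_retry`, its parameters, and the choice of `ConnectionError` as the retryable error are all assumptions made for the sketch.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call func, retrying on ConnectionError with exponential backoff."""
    attempt = 0
    while True:
        try:
            return func()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
            attempt += 1


# Demo: a call that fails twice and then succeeds. Delays are recorded
# in a list instead of actually sleeping.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays: List[float] = []
result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the backoff schedule easy to inspect in tests without slowing them down.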