Graph Module API Reference¶
Complete API reference for the biodbs.graph module.
Summary¶
Core Classes¶
| Class | Description |
|---|---|
| Node | Immutable node representing a biological entity |
| Edge | Immutable edge representing a relationship |
| KnowledgeGraph | Container for nodes and edges with graph operations |
Enums¶
| Enum | Description |
|---|---|
| NodeType | Types of biological entities (gene, protein, disease, etc.) |
| EdgeType | Types of relationships (is_a, part_of, regulates, etc.) |
| DataSource | Supported data sources for graph construction |
Builder Functions¶
| Function | Description |
|---|---|
| build_graph | Create graph from nodes and edges |
| build_disease_graph | Build from Disease Ontology data |
| build_go_graph | Build from Gene Ontology data |
| build_reactome_graph | Build from Reactome data |
| build_kegg_graph | Build from KEGG data |
| merge_graphs | Merge multiple graphs |
Export Functions¶
| Function | Description |
|---|---|
| to_networkx | Export to NetworkX graph |
| to_json_ld | Export to JSON-LD format |
| to_rdf | Export to RDF format |
| to_neo4j_csv | Export to Neo4j CSV files |
| to_cypher | Export to Cypher queries |
Utility Functions¶
| Function | Description |
|---|---|
| find_shortest_path | Find shortest path between nodes |
| find_all_paths | Find all paths up to max length |
| get_neighborhood | Get nodes within N hops |
| get_connected_component | Get connected component containing node |
| find_hub_nodes | Find high-degree hub nodes |
| get_graph_statistics | Compute graph statistics |
Enums¶
NodeType¶
Types of nodes representing biological entities.
| Member | Value | Description |
|---|---|---|
| GENE | "gene" | Gene entity |
| PROTEIN | "protein" | Protein entity |
| DISEASE | "disease" | Disease entity |
| PATHWAY | "pathway" | Biological pathway |
| GO_TERM | "go_term" | Gene Ontology term |
| REACTION | "reaction" | Biochemical reaction |
| COMPOUND | "compound" | Chemical compound |
| DRUG | "drug" | Drug/pharmaceutical |
| PHENOTYPE | "phenotype" | Phenotype |
| ORGANISM | "organism" | Organism/species |
| PUBLICATION | "publication" | Scientific publication |
| OTHER | "other" | Other entity type |
NodeType
¶
Bases: str, Enum
Types of nodes in the knowledge graph.
Each node type represents a different biological entity category.
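The `Bases: str, Enum` line means members should behave like plain strings in comparisons and JSON serialization. The sketch below re-declares a three-member stand-in to illustrate this; `NodeTypeSketch` is a hypothetical class written for this example, not the biodbs class itself.

```python
import json
from enum import Enum


class NodeTypeSketch(str, Enum):
    """Stand-in mirroring three NodeType members (hypothetical class)."""

    GENE = "gene"
    PROTEIN = "protein"
    OTHER = "other"


# Members of a str-mixin enum compare equal to their raw string values.
is_equal = NodeTypeSketch.GENE == "gene"

# Lookup by value returns the member; unknown values raise ValueError.
member = NodeTypeSketch("protein")

# Because members are str instances, json.dumps writes the plain value.
encoded = json.dumps({"node_type": NodeTypeSketch.GENE})
```

The same reasoning applies to any enum declared with the `str, Enum` bases, which is why values read from files or APIs can be compared to members directly.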
EdgeType¶
Types of relationships between biological entities.
| Member | Value | Category |
|---|---|---|
| IS_A | "is_a" | Ontology |
| PART_OF | "part_of" | Ontology |
| HAS_PART | "has_part" | Ontology |
| REGULATES | "regulates" | Regulatory |
| POSITIVELY_REGULATES | "positively_regulates" | Regulatory |
| NEGATIVELY_REGULATES | "negatively_regulates" | Regulatory |
| PARTICIPATES_IN | "participates_in" | Participation |
| HAS_PARTICIPANT | "has_participant" | Participation |
| CATALYZES | "catalyzes" | Participation |
| PRODUCES | "produces" | Participation |
| CONSUMES | "consumes" | Participation |
| ASSOCIATED_WITH | "associated_with" | Association |
| INTERACTS_WITH | "interacts_with" | Association |
| TARGETS | "targets" | Association |
| XREF | "xref" | Cross-reference |
| SAME_AS | "same_as" | Cross-reference |
| ENCODES | "encodes" | Sequence |
| TRANSCRIBES | "transcribes" | Sequence |
| TRANSLATES | "translates" | Sequence |
| RELATED_TO | "related_to" | Other |
| OTHER | "other" | Other |
EdgeType
¶
Bases: str, Enum
Types of edges (relationships) in the knowledge graph.
Each edge type represents a different kind of relationship between biological entities.
DataSource¶
Supported data sources for graph construction.
| Member | Value | Description |
|---|---|---|
| DISEASE_ONTOLOGY | "disease_ontology" | Disease Ontology |
| GENE_ONTOLOGY | "gene_ontology" | Gene Ontology |
| REACTOME | "reactome" | Reactome pathways |
| KEGG | "kegg" | KEGG database |
| QUICKGO | "quickgo" | QuickGO annotations |
| UNIPROT | "uniprot" | UniProt |
| ENSEMBL | "ensembl" | Ensembl |
| PUBCHEM | "pubchem" | PubChem |
| CHEMBL | "chembl" | ChEMBL |
| CUSTOM | "custom" | Custom data source |
DataSource
¶
Bases: str, Enum
Supported data sources for knowledge graph construction.
Core Classes¶
Node¶
Node
dataclass
¶
Node(
id: str,
label: str,
node_type: NodeType = OTHER,
source: DataSource = CUSTOM,
properties: FrozenSet[Tuple[str, Any]] = frozenset(),
xrefs: FrozenSet[str] = frozenset(),
)
A node (entity) in the knowledge graph.
Nodes are immutable (frozen) to ensure graph integrity.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | str | Unique identifier for the node (e.g., "DOID:162", "GO:0008150"). |
| label | str | Human-readable label for the node. |
| node_type | NodeType | Type of biological entity this node represents. |
| source | DataSource | Data source this node originated from. |
| properties | FrozenSet[Tuple[str, Any]] | Additional properties as a frozen set of (key, value) pairs. |
| xrefs | FrozenSet[str] | Cross-references to other databases. |
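Since `properties` is typed as `FrozenSet[Tuple[str, Any]]`, a regular dict has to be packed into (key, value) pairs before it can be stored on a frozen, hashable dataclass. The sketch below shows that round trip on a stand-in dataclass; `NodeSketch`, the property names, and the xref string are hypothetical and only mirror the field shapes documented above.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, FrozenSet, Tuple


@dataclass(frozen=True)
class NodeSketch:
    """Stand-in with the same field shapes as Node (hypothetical class)."""

    id: str
    label: str
    properties: FrozenSet[Tuple[str, Any]] = frozenset()
    xrefs: FrozenSet[str] = frozenset()


# Pack a dict into the hashable frozenset-of-pairs form.
# Every value must itself be hashable (no lists or dicts).
props = {"namespace": "illustrative", "obsolete": False}
node = NodeSketch(
    id="DOID:162",
    label="cancer",
    properties=frozenset(props.items()),
    xrefs=frozenset({"EXAMPLE:0001"}),
)

# Unpack back into a regular dict for convenient reads.
unpacked = dict(node.properties)

# Frozen dataclasses reject attribute assignment.
try:
    node.label = "neoplasm"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Storing pairs in a frozenset keeps the instance hashable, so it can be used in sets or as a dict key.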
Edge¶
Edge
dataclass
¶
Edge(
source: str,
target: str,
relation: EdgeType = RELATED_TO,
weight: float = 1.0,
evidence: FrozenSet[str] = frozenset(),
properties: FrozenSet[Tuple[str, Any]] = frozenset(),
)
A directed edge (relationship) in the knowledge graph.
Edges are immutable (frozen) to ensure graph integrity.
Attributes:
| Name | Type | Description |
|---|---|---|
| source | str | ID of the source node. |
| target | str | ID of the target node. |
| relation | EdgeType | Type of relationship. |
| weight | float | Optional edge weight (default 1.0). |
| evidence | FrozenSet[str] | Evidence supporting this relationship. |
| properties | FrozenSet[Tuple[str, Any]] | Additional properties as a frozen set of (key, value) pairs. |
KnowledgeGraph¶
KnowledgeGraph
¶
KnowledgeGraph(
name: str = "KnowledgeGraph",
description: Optional[str] = None,
source: DataSource = CUSTOM,
)
Container for a biological knowledge graph.
Provides methods for adding/removing nodes and edges, filtering, subgraph extraction, merging, and computing statistics.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of the knowledge graph. |
| description | Optional[str] | Optional description. |
| source | DataSource | Primary data source for this graph. |
Example
```python
from biodbs.graph import KnowledgeGraph, Node, Edge, NodeType, EdgeType

graph = KnowledgeGraph(name="DiseaseOntologyGraph")
graph.add_node(Node(id="DOID:162", label="cancer", node_type=NodeType.DISEASE))
graph.add_node(Node(id="DOID:1612", label="breast cancer", node_type=NodeType.DISEASE))
graph.add_edge(Edge(source="DOID:1612", target="DOID:162", relation=EdgeType.IS_A))

print(graph.summary())
# KnowledgeGraph: DiseaseOntologyGraph
# Nodes: 2
# Edges: 1
#
# Node types:
# disease: 2
#
# Edge types:
# is_a: 1
```
Initialize a new KnowledgeGraph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the graph. | 'KnowledgeGraph' |
| description | Optional[str] | Optional description. | None |
| source | DataSource | Primary data source for this graph. | CUSTOM |
add_node
¶
add_node(node: Node) -> bool
Add a node to the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node | Node | The node to add. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the node was added, False if it already existed. |
add_nodes
¶
add_nodes(nodes: List[Node]) -> int
Add multiple nodes to the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nodes | List[Node] | List of nodes to add. | required |
Returns:
| Type | Description |
|---|---|
| int | Number of nodes actually added (excludes duplicates). |
get_node
¶
get_node(node_id: str) -> Optional[Node]
Get a node by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_id | str | The node identifier. | required |
Returns:
| Type | Description |
|---|---|
| Optional[Node] | The Node if found, None otherwise. |
has_node
¶
Check if a node exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_id | str | The node identifier. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the node exists. |
remove_node
¶
Remove a node and all its connected edges.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_id | str | The node identifier. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the node was removed, False if it didn't exist. |
add_edge
¶
add_edge(edge: Edge) -> bool
Add an edge to the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| edge | Edge | The edge to add. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the edge was added; False if it already existed or if the source or target node does not exist. |
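The return contract above covers three cases: missing endpoint, successful add, and duplicate. The sketch below reproduces only that contract on a tiny stand-in; `GraphSketch` is a hypothetical class, and treating (source, target, relation) as the duplicate key is an assumption taken from the merge_graphs description, not from the add_edge source.

```python
from typing import Set, Tuple


class GraphSketch:
    """Tiny stand-in reproducing only the documented return contract (hypothetical)."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.edges: Set[Tuple[str, str, str]] = set()

    def add_node(self, node_id: str) -> bool:
        if node_id in self.nodes:
            return False
        self.nodes.add(node_id)
        return True

    def add_edge(self, source: str, target: str, relation: str) -> bool:
        # Reject edges whose endpoints are missing from the graph.
        if source not in self.nodes or target not in self.nodes:
            return False
        key = (source, target, relation)
        # Reject exact duplicates (assumed key: source, target, relation).
        if key in self.edges:
            return False
        self.edges.add(key)
        return True


g = GraphSketch()
g.add_node("DOID:1612")
results = [g.add_edge("DOID:1612", "DOID:162", "is_a")]  # target node missing
g.add_node("DOID:162")
results.append(g.add_edge("DOID:1612", "DOID:162", "is_a"))  # added
results.append(g.add_edge("DOID:1612", "DOID:162", "is_a"))  # duplicate
```

A practical consequence is ordering: nodes need to be added before the edges that reference them, otherwise the edges are rejected without an exception.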
add_edges
¶
add_edges(edges: List[Edge]) -> int
Add multiple edges to the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| edges | List[Edge] | List of edges to add. | required |
Returns:
| Type | Description |
|---|---|
| int | Number of edges actually added. |
get_edge
¶
has_edge
¶
has_edge(
source: str,
target: str,
relation: Optional[EdgeType] = None,
) -> bool
Check if an edge exists between two nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | Source node ID. | required |
| target | str | Target node ID. | required |
| relation | Optional[EdgeType] | Optional relation type to match. | None |
Returns:
| Type | Description |
|---|---|
| bool | True if the edge exists. |
remove_edge
¶
remove_edge(edge: Edge) -> bool
Remove an edge from the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| edge | Edge | The edge to remove. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the edge was removed, False if it didn't exist. |
get_outgoing_edges
¶
get_outgoing_edges(node_id: str) -> List[Edge]
Get all outgoing edges from a node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_id | str | The node identifier. | required |
Returns:
| Type | Description |
|---|---|
| List[Edge] | List of outgoing edges. |
get_incoming_edges
¶
get_incoming_edges(node_id: str) -> List[Edge]
Get all incoming edges to a node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_id | str | The node identifier. | required |
Returns:
| Type | Description |
|---|---|
| List[Edge] | List of incoming edges. |
get_neighbors
¶
Get neighboring node IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_id | str | The node identifier. | required |
| direction | str | "outgoing", "incoming", or "both". | 'both' |
Returns:
| Type | Description |
|---|---|
| List[str] | List of neighboring node IDs. |
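To make the three `direction` values concrete, here is a stand-alone sketch over plain (source, target) pairs. `neighbors_sketch` is a hypothetical helper; deduplicating the result and raising `ValueError` on an unknown direction are assumptions, since the reference does not state how get_neighbors handles either case.

```python
from typing import Iterable, List, Tuple


def neighbors_sketch(
    edges: Iterable[Tuple[str, str]],
    node_id: str,
    direction: str = "both",
) -> List[str]:
    """Return neighbor IDs of node_id over (source, target) pairs."""
    if direction not in ("outgoing", "incoming", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    found: List[str] = []
    for source, target in edges:
        # Outgoing: node_id is the source, so the neighbor is the target.
        if direction in ("outgoing", "both") and source == node_id:
            found.append(target)
        # Incoming: node_id is the target, so the neighbor is the source.
        if direction in ("incoming", "both") and target == node_id:
            found.append(source)
    # Drop repeats while keeping first-seen order.
    return list(dict.fromkeys(found))


edges = [("A", "B"), ("B", "A"), ("C", "A")]
outgoing = neighbors_sketch(edges, "A", "outgoing")
incoming = neighbors_sketch(edges, "A", "incoming")
both = neighbors_sketch(edges, "A")
```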
filter_nodes
¶
filter_nodes(
predicate: Optional[Callable[[Node], bool]] = None,
node_type: Optional[NodeType] = None,
source: Optional[DataSource] = None,
) -> List[Node]
Filter nodes by predicate or attributes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predicate | Optional[Callable[[Node], bool]] | Function that returns True for nodes to include. | None |
| node_type | Optional[NodeType] | Filter by node type. | None |
| source | Optional[DataSource] | Filter by data source. | None |
Returns:
| Type | Description |
|---|---|
| List[Node] | List of matching nodes. |
filter_edges
¶
filter_edges(
predicate: Optional[Callable[[Edge], bool]] = None,
relation: Optional[EdgeType] = None,
min_weight: Optional[float] = None,
) -> List[Edge]
Filter edges by predicate or attributes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predicate | Optional[Callable[[Edge], bool]] | Function that returns True for edges to include. | None |
| relation | Optional[EdgeType] | Filter by relation type. | None |
| min_weight | Optional[float] | Filter by minimum weight. | None |
Returns:
| Type | Description |
|---|---|
| List[Edge] | List of matching edges. |
get_nodes_by_type
¶
get_edges_by_relation
¶
subgraph
¶
Create a subgraph containing only the specified nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_ids | Set[str] | Set of node IDs to include. | required |
Returns:
| Type | Description |
|---|---|
| 'KnowledgeGraph' | A new KnowledgeGraph containing the subgraph. |
induced_subgraph
¶
Alias for subgraph(); creates an induced subgraph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_ids | Set[str] | Set of node IDs to include. | required |
Returns:
| Type | Description |
|---|---|
| 'KnowledgeGraph' | A new KnowledgeGraph containing the induced subgraph. |
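An induced subgraph keeps the selected nodes plus every edge whose two endpoints are both selected. The sketch below illustrates that rule on plain sets and tuples; `induced_subgraph_sketch` is a hypothetical helper, and silently ignoring IDs that are not in the graph is an assumption rather than documented behavior.

```python
from typing import List, Set, Tuple

EdgeTuple = Tuple[str, str, str]  # (source, relation, target)


def induced_subgraph_sketch(
    nodes: Set[str],
    edges: List[EdgeTuple],
    node_ids: Set[str],
) -> Tuple[Set[str], List[EdgeTuple]]:
    """Keep requested nodes and the edges that connect two kept nodes."""
    # Unknown IDs drop out of the intersection (assumed behavior).
    kept_nodes = nodes & node_ids
    kept_edges = [
        (source, relation, target)
        for source, relation, target in edges
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_edges


nodes = {"A", "B", "C", "D"}
edges = [("A", "is_a", "B"), ("B", "is_a", "C"), ("C", "is_a", "D")]
sub_nodes, sub_edges = induced_subgraph_sketch(nodes, edges, {"A", "B", "D", "X"})
```

Here `D` survives as an isolated node because its only edge came from `C`, which was not selected.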
merge
¶
Merge this graph with another graph into a new graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | 'KnowledgeGraph' | The graph to merge with this one. | required |
Returns:
| Type | Description |
|---|---|
| 'KnowledgeGraph' | A new KnowledgeGraph containing all nodes and edges from both. |
update
¶
Update this graph with nodes and edges from another graph.
Unlike merge(), this modifies the current graph in place.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | 'KnowledgeGraph' | The graph to merge into this one. | required |
Returns:
| Type | Description |
|---|---|
| int | Total number of new nodes and edges added. |
get_node_type_counts
¶
get_node_type_counts() -> Dict[NodeType, int]
Get counts of nodes by type.
Returns:
| Type | Description |
|---|---|
| Dict[NodeType, int] | Dictionary mapping NodeType to count. |
get_edge_type_counts
¶
get_edge_type_counts() -> Dict[EdgeType, int]
Get counts of edges by relation type.
Returns:
| Type | Description |
|---|---|
| Dict[EdgeType, int] | Dictionary mapping EdgeType to count. |
get_degree
¶
Get the degree of a node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_id | str | The node identifier. | required |
| direction | str | "outgoing", "incoming", or "both". | 'both' |
Returns:
| Type | Description |
|---|---|
| int | The degree of the node. |
summary
¶
Get a text summary of the graph.
Returns:
| Type | Description |
|---|---|
| str | A formatted string with graph statistics. |
to_dict
¶
Convert the graph to a dictionary representation.
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary containing all graph data. |
from_dict
classmethod
¶
Create a KnowledgeGraph from a dictionary representation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Dict[str, Any] | Dictionary containing graph data. | required |
Returns:
| Type | Description |
|---|---|
| 'KnowledgeGraph' | A new KnowledgeGraph instance. |
nodes_as_dataframe
¶
Convert nodes to a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| engine | str | "pandas" or "polars". | 'pandas' |
Returns:
| Type | Description |
|---|---|
| 'pd.DataFrame' | DataFrame with node data. |
edges_as_dataframe
¶
Convert edges to a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| engine | str | "pandas" or "polars". | 'pandas' |
Returns:
| Type | Description |
|---|---|
| 'pd.DataFrame' | DataFrame with edge data. |
Builder Functions¶
build_graph¶
build_graph
¶
build_graph(
nodes: List[Node],
edges: Optional[List[Edge]] = None,
name: str = "KnowledgeGraph",
description: Optional[str] = None,
source: DataSource = CUSTOM,
) -> KnowledgeGraph
Build a knowledge graph from node and edge lists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nodes | List[Node] | List of Node objects to add to the graph. | required |
| edges | Optional[List[Edge]] | Optional list of Edge objects to add. | None |
| name | str | Name for the graph. | 'KnowledgeGraph' |
| description | Optional[str] | Optional description. | None |
| source | DataSource | Data source for the graph. | CUSTOM |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A new KnowledgeGraph instance. |
Example
```python
from biodbs.graph import Node, Edge, NodeType, EdgeType, build_graph

nodes = [
    Node(id="A", label="Node A", node_type=NodeType.GENE),
    Node(id="B", label="Node B", node_type=NodeType.GENE),
]
edges = [
    Edge(source="A", target="B", relation=EdgeType.INTERACTS_WITH),
]
graph = build_graph(nodes, edges, name="MyGraph")
print(graph)
# KnowledgeGraph(name='MyGraph', nodes=2, edges=1)
```
Source code in biodbs/_funcs/graph/builders.py
build_disease_graph¶
build_disease_graph
¶
build_disease_graph(
data: "DOFetchedData",
name: str = "DiseaseOntologyGraph",
include_xrefs: bool = True,
include_synonyms: bool = False,
) -> KnowledgeGraph
Build a knowledge graph from Disease Ontology data.
Creates nodes for each disease term and edges for hierarchical relationships (is_a) when parent terms are available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | 'DOFetchedData' | DOFetchedData from Disease Ontology fetcher. | required |
| name | str | Name for the graph. | 'DiseaseOntologyGraph' |
| include_xrefs | bool | Include cross-references as node xrefs. | True |
| include_synonyms | bool | Include synonyms in node properties. | False |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A KnowledgeGraph with disease nodes. |
Example
```python
from biodbs.fetch import DO_Fetcher
from biodbs.graph import build_disease_graph

fetcher = DO_Fetcher()
data = fetcher.get_children("DOID:162")  # cancer
graph = build_disease_graph(data)
print(graph.summary())
# KnowledgeGraph: DiseaseOntologyGraph
# Nodes: 47
# Edges: 0
#
# Node types:
# disease: 47
```
Source code in biodbs/_funcs/graph/builders.py
build_disease_graph_with_hierarchy¶
build_disease_graph_with_hierarchy
¶
build_disease_graph_with_hierarchy(
parent_data: "DOFetchedData",
children_data: "DOFetchedData",
name: str = "DiseaseOntologyGraph",
include_xrefs: bool = True,
) -> KnowledgeGraph
Build a disease graph with explicit parent-child relationships.
Use this when you have fetched both parent and children terms and want to create IS_A edges between them.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parent_data | 'DOFetchedData' | DOFetchedData containing the parent term(s). | required |
| children_data | 'DOFetchedData' | DOFetchedData containing child terms. | required |
| name | str | Name for the graph. | 'DiseaseOntologyGraph' |
| include_xrefs | bool | Include cross-references as node xrefs. | True |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A KnowledgeGraph with disease nodes and IS_A edges. |
Example
```python
from biodbs.fetch import DO_Fetcher
from biodbs.graph import build_disease_graph_with_hierarchy

fetcher = DO_Fetcher()
parent = fetcher.get_term("DOID:162")  # cancer
children = fetcher.get_children("DOID:162")
graph = build_disease_graph_with_hierarchy(parent, children)
print(graph)
# KnowledgeGraph(name='DiseaseOntologyGraph', nodes=48, edges=47)
```
Source code in biodbs/_funcs/graph/builders.py
build_go_graph¶
build_go_graph
¶
build_go_graph(
data: "QuickGOFetchedData",
name: str = "GeneOntologyGraph",
include_evidence: bool = True,
create_annotation_edges: bool = True,
) -> KnowledgeGraph
Build a knowledge graph from Gene Ontology (QuickGO) data.
Creates nodes for GO terms and optionally for gene products, with edges representing annotations and ontology relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | 'QuickGOFetchedData' | QuickGOFetchedData from QuickGO fetcher. | required |
| name | str | Name for the graph. | 'GeneOntologyGraph' |
| include_evidence | bool | Include evidence codes in edge properties. | True |
| create_annotation_edges | bool | Create edges between gene products and GO terms. | True |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A KnowledgeGraph with GO term and gene nodes. |
Example
```python
from biodbs.fetch import QuickGO_Fetcher
from biodbs.graph import build_go_graph

fetcher = QuickGO_Fetcher()
data = fetcher.get(category="annotation", endpoint="search", geneProductId="UniProtKB:P04637")
graph = build_go_graph(data)
print(graph)
# KnowledgeGraph(name='GeneOntologyGraph', nodes=25, edges=24)
```
Source code in biodbs/_funcs/graph/builders.py
build_reactome_graph¶
build_reactome_graph
¶
build_reactome_graph(
data: "ReactomeFetchedData",
name: str = "ReactomeGraph",
include_species: bool = True,
include_disease_info: bool = True,
) -> KnowledgeGraph
Build a knowledge graph from Reactome pathway data.
Creates nodes for pathways and edges based on pathway relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | 'ReactomeFetchedData' | ReactomeFetchedData from Reactome fetcher. | required |
| name | str | Name for the graph. | 'ReactomeGraph' |
| include_species | bool | Include species info in node properties. | True |
| include_disease_info | bool | Include disease pathway flag in properties. | True |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A KnowledgeGraph with pathway nodes. |
Example
Source code in biodbs/_funcs/graph/builders.py
build_reactome_hierarchy_graph¶
build_reactome_hierarchy_graph
¶
build_reactome_hierarchy_graph(
hierarchy_data: List[Dict[str, Any]],
name: str = "ReactomeHierarchyGraph",
) -> KnowledgeGraph
Build a knowledge graph from Reactome hierarchy data.
Creates nodes for pathways and edges for parent-child relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hierarchy_data | List[Dict[str, Any]] | List of pathway hierarchy dictionaries from Reactome's events hierarchy endpoint. | required |
| name | str | Name for the graph. | 'ReactomeHierarchyGraph' |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A KnowledgeGraph with pathway nodes and hierarchy edges. |
Source code in biodbs/_funcs/graph/builders.py
build_kegg_graph¶
build_kegg_graph
¶
build_kegg_graph(
data: "KEGGFetchedData",
name: str = "KEGGGraph",
node_type: Optional[NodeType] = None,
) -> KnowledgeGraph
Build a knowledge graph from KEGG data.
Creates nodes from KEGG entries. The node type is inferred from the data operation (pathway, compound, drug, etc.) or can be explicitly specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | 'KEGGFetchedData' | KEGGFetchedData from KEGG fetcher. | required |
| name | str | Name for the graph. | 'KEGGGraph' |
| node_type | Optional[NodeType] | Override the inferred node type. | None |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A KnowledgeGraph with KEGG nodes. |
Example
Source code in biodbs/_funcs/graph/builders.py
build_kegg_link_graph¶
build_kegg_link_graph
¶
build_kegg_link_graph(
link_data: "KEGGFetchedData",
source_type: NodeType = GENE,
target_type: NodeType = PATHWAY,
relation: EdgeType = PARTICIPATES_IN,
name: str = "KEGGLinkGraph",
) -> KnowledgeGraph
Build a knowledge graph from KEGG link data.
Creates nodes and edges from KEGG link query results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| link_data | 'KEGGFetchedData' | KEGGFetchedData from kegg_link operation. | required |
| source_type | NodeType | Node type for source entries. | GENE |
| target_type | NodeType | Node type for target entries. | PATHWAY |
| relation | EdgeType | Edge type for the links. | PARTICIPATES_IN |
| name | str | Name for the graph. | 'KEGGLinkGraph' |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A KnowledgeGraph with nodes and edges from link data. |
Example
```python
from biodbs.fetch import kegg_link
from biodbs.graph import build_kegg_link_graph, NodeType

data = kegg_link("pathway", "hsa")  # genes to pathways
graph = build_kegg_link_graph(
    data,
    source_type=NodeType.GENE,
    target_type=NodeType.PATHWAY,
)
print(graph)
# KnowledgeGraph(name='KEGGLinkGraph', nodes=8500, edges=42000)
```
Source code in biodbs/_funcs/graph/builders.py
merge_graphs¶
merge_graphs
¶
merge_graphs(
*graphs: KnowledgeGraph,
name: str = "MergedGraph",
description: Optional[str] = None,
) -> KnowledgeGraph
Merge multiple knowledge graphs into one.
Combines all nodes and edges from the input graphs. When several graphs contain a node with the same ID, the first occurrence is kept. Edges with the same source, target, and relation are deduplicated.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *graphs | KnowledgeGraph | Variable number of KnowledgeGraph instances to merge. | () |
| name | str | Name for the merged graph. | 'MergedGraph' |
| description | Optional[str] | Optional description for the merged graph. | None |
Returns:
| Type | Description |
|---|---|
| KnowledgeGraph | A new KnowledgeGraph containing all nodes and edges. |
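The deduplication rules described for merge_graphs (first node occurrence wins, edges keyed by source, target, and relation) can be illustrated without the library. In the sketch below each graph is reduced to a (node-label dict, edge list) pair; `merge_sketch` and that reduced representation are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

EdgeTuple = Tuple[str, str, str]  # (source, relation, target)
GraphPair = Tuple[Dict[str, str], List[EdgeTuple]]


def merge_sketch(*graphs: GraphPair) -> GraphPair:
    """Merge (nodes, edges) pairs; first node wins, edges are deduplicated."""
    nodes: Dict[str, str] = {}
    edges: List[EdgeTuple] = []
    seen: Set[EdgeTuple] = set()
    for graph_nodes, graph_edges in graphs:
        for node_id, label in graph_nodes.items():
            # setdefault keeps the value from the earliest graph.
            nodes.setdefault(node_id, label)
        for edge in graph_edges:
            if edge not in seen:
                seen.add(edge)
                edges.append(edge)
    return nodes, edges


first = ({"A": "label from first", "B": "B"}, [("A", "is_a", "B")])
second = (
    {"A": "label from second", "C": "C"},
    [("A", "is_a", "B"), ("C", "is_a", "A")],
)
merged_nodes, merged_edges = merge_sketch(first, second)
```

Because the first occurrence wins, the order in which graphs are passed decides which labels and properties survive for shared IDs.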
Example
Source code in biodbs/_funcs/graph/builders.py
Export Functions¶
to_networkx¶
to_networkx
¶
to_networkx(
graph: KnowledgeGraph,
include_properties: bool = True,
include_xrefs: bool = True,
) -> "nx.DiGraph"
Export a KnowledgeGraph to a NetworkX directed graph.
Requires the networkx package to be installed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | KnowledgeGraph | The KnowledgeGraph to export. | required |
| include_properties | bool | Include node/edge properties as attributes. | True |
| include_xrefs | bool | Include cross-references as node attributes. | True |
Returns:
| Type | Description |
|---|---|
| 'nx.DiGraph' | A NetworkX DiGraph with the same nodes and edges. |
Raises:
| Type | Description |
|---|---|
| ImportError | If networkx is not installed. |
Example
Source code in biodbs/_funcs/graph/exporters.py
to_json_ld¶
to_json_ld
¶
to_json_ld(
graph: KnowledgeGraph,
include_context: bool = True,
compact: bool = False,
base_uri: Optional[str] = None,
) -> Dict[str, Any]
Export a KnowledgeGraph to JSON-LD format.
JSON-LD is ideal for KG-RAG (Knowledge Graph Retrieval Augmented Generation) applications as it provides structured, semantically rich data that can be easily processed by LLMs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | KnowledgeGraph | The KnowledgeGraph to export. | required |
| include_context | bool | Include JSON-LD @context. | True |
| compact | bool | Use compact representation (less verbose). | False |
| base_uri | Optional[str] | Base URI for node IDs. | None |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | A dictionary in JSON-LD format. |
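For readers unfamiliar with JSON-LD, the sketch below builds a small document by hand using the standard `@context`, `@graph`, `@id`, and `@type` keywords. The exact layout that to_json_ld emits is not shown in this reference, so the structure, the `json_ld_sketch` helper, and the choice of context terms are all assumptions; the base URI is borrowed from the to_rdf default.

```python
import json
from typing import Any, Dict, List, Tuple


def json_ld_sketch(
    nodes: List[Tuple[str, str, str]],  # (id, label, type)
    edges: List[Tuple[str, str, str]],  # (source, relation, target)
    base_uri: str = "http://example.org/biokg/",
    include_context: bool = True,
) -> Dict[str, Any]:
    """Build a JSON-LD style dict from plain tuples (illustrative layout)."""
    doc: Dict[str, Any] = {}
    if include_context:
        doc["@context"] = {
            "@vocab": base_uri,
            "label": "http://www.w3.org/2000/01/rdf-schema#label",
        }
    by_id: Dict[str, Dict[str, Any]] = {
        node_id: {"@id": base_uri + node_id, "@type": node_type, "label": label}
        for node_id, label, node_type in nodes
    }
    for source, relation, target in edges:
        # Each relation becomes a property holding references to target nodes.
        by_id[source].setdefault(relation, []).append({"@id": base_uri + target})
    doc["@graph"] = list(by_id.values())
    return doc


doc = json_ld_sketch(
    nodes=[("DOID:162", "cancer", "disease"), ("DOID:1612", "breast cancer", "disease")],
    edges=[("DOID:1612", "is_a", "DOID:162")],
)
serialized = json.dumps(doc, indent=2)
```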
Example
Source code in biodbs/_funcs/graph/exporters.py
to_rdf¶
to_rdf
¶
to_rdf(
graph: KnowledgeGraph,
format: Literal["turtle", "xml", "n3", "nt"] = "turtle",
base_uri: str = "http://example.org/biokg/",
) -> str
Export a KnowledgeGraph to RDF format.
Requires the rdflib package to be installed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | KnowledgeGraph | The KnowledgeGraph to export. | required |
| format | Literal['turtle', 'xml', 'n3', 'nt'] | RDF serialization format ("turtle", "xml", "n3", "nt"). | 'turtle' |
| base_uri | str | Base URI for the graph. | 'http://example.org/biokg/' |
Returns:
| Type | Description |
|---|---|
| str | RDF data as a string in the specified format. |
Raises:
| Type | Description |
|---|---|
| ImportError | If rdflib is not installed. |
Example
Source code in biodbs/_funcs/graph/exporters.py
to_neo4j_csv¶
to_neo4j_csv
¶
to_neo4j_csv(
graph: KnowledgeGraph,
output_dir: Union[str, Path],
nodes_filename: str = "nodes.csv",
edges_filename: str = "relationships.csv",
include_headers: bool = True,
) -> Tuple[Path, Path]
Export a KnowledgeGraph to CSV files for Neo4j import.
Creates two CSV files: one for nodes and one for relationships, formatted for Neo4j's LOAD CSV or neo4j-admin import.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | KnowledgeGraph | The KnowledgeGraph to export. | required |
| output_dir | Union[str, Path] | Directory to write CSV files. | required |
| nodes_filename | str | Filename for nodes CSV. | 'nodes.csv' |
| edges_filename | str | Filename for relationships CSV. | 'relationships.csv' |
| include_headers | bool | Include Neo4j import headers. | True |
Returns:
| Type | Description |
|---|---|
| Tuple[Path, Path] | Tuple of (nodes_path, edges_path). |
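As a rough picture of what "Neo4j import headers" means, the sketch below writes two CSV files by hand using the `:ID`, `:LABEL`, `:START_ID`, `:END_ID`, and `:TYPE` header fields understood by neo4j-admin import. The column set that to_neo4j_csv actually writes is not listed in this reference, so the columns and the `write_neo4j_csv_sketch` helper are assumptions.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def write_neo4j_csv_sketch(
    nodes: Iterable[Tuple[str, str, str]],  # (id, label, Neo4j label)
    edges: Iterable[Tuple[str, str, str]],  # (start id, end id, type)
    output_dir: str,
) -> Tuple[Path, Path]:
    """Write nodes.csv and relationships.csv with import-style headers."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    nodes_path = out / "nodes.csv"
    edges_path = out / "relationships.csv"
    with nodes_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id:ID", "label", ":LABEL"])
        writer.writerows(nodes)
    with edges_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([":START_ID", ":END_ID", ":TYPE"])
        writer.writerows(edges)
    return nodes_path, edges_path


with tempfile.TemporaryDirectory() as tmp:
    nodes_path, edges_path = write_neo4j_csv_sketch(
        nodes=[("DOID:162", "cancer", "Disease"), ("DOID:1612", "breast cancer", "Disease")],
        edges=[("DOID:1612", "DOID:162", "IS_A")],
        output_dir=tmp,
    )
    # Read the files back before the temporary directory is removed.
    node_rows = nodes_path.read_text(encoding="utf-8").splitlines()
    edge_rows = edges_path.read_text(encoding="utf-8").splitlines()
```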
Example
Source code in biodbs/_funcs/graph/exporters.py
to_cypher¶
to_cypher
¶
to_cypher(
graph: KnowledgeGraph,
batch_size: int = 100,
use_merge: bool = True,
) -> str
Generate Cypher queries to create the graph in Neo4j.
Creates CREATE or MERGE statements for nodes and relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | KnowledgeGraph | The KnowledgeGraph to export. | required |
| batch_size | int | Number of statements per transaction. | 100 |
| use_merge | bool | Use MERGE instead of CREATE (prevents duplicates). | True |
Returns:
| Type | Description |
|---|---|
| str | Cypher script as a string. |
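The difference between `use_merge=True` and `use_merge=False` comes down to which Cypher verb is emitted. The sketch below generates statements of that general shape from plain tuples; the statement layout, the `cypher_sketch` and `quote` helpers, and the omission of batching are assumptions and may not match the script that to_cypher produces.

```python
from typing import List, Tuple


def quote(value: str) -> str:
    """Render a Python string as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def cypher_sketch(
    nodes: List[Tuple[str, str, str]],  # (id, label, Neo4j label)
    edges: List[Tuple[str, str, str]],  # (source, relation, target)
    use_merge: bool = True,
) -> str:
    """Emit one statement per node and per relationship (no batching)."""
    verb = "MERGE" if use_merge else "CREATE"
    lines: List[str] = []
    for node_id, label, node_label in nodes:
        lines.append(
            f"{verb} (:{node_label} {{id: {quote(node_id)}, label: {quote(label)}}});"
        )
    for source, relation, target in edges:
        lines.append(
            f"MATCH (a {{id: {quote(source)}}}), (b {{id: {quote(target)}}}) "
            f"{verb} (a)-[:{relation}]->(b);"
        )
    return "\n".join(lines)


script = cypher_sketch(
    nodes=[("DOID:162", "cancer", "Disease"), ("DOID:1612", "breast cancer", "Disease")],
    edges=[("DOID:1612", "IS_A", "DOID:162")],
)
create_script = cypher_sketch([("DOID:162", "cancer", "Disease")], [], use_merge=False)
```

MERGE matches an existing pattern before creating one, so re-running a MERGE script does not duplicate nodes, whereas a CREATE script does.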
Example
Source code in biodbs/_funcs/graph/exporters.py
Utility Functions¶
find_shortest_path¶
find_shortest_path
¶
find_shortest_path(
graph: KnowledgeGraph,
source: str,
target: str,
directed: bool = True,
max_depth: Optional[int] = None,
) -> Optional[List[str]]
Find the shortest path between two nodes using BFS.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | KnowledgeGraph | The knowledge graph to search. | required |
| source | str | ID of the source node. | required |
| target | str | ID of the target node. | required |
| directed | bool | If True, follow edge direction. If False, treat as undirected. | True |
| max_depth | Optional[int] | Maximum path length to search. | None |
Returns:
| Type | Description |
|---|---|
| Optional[List[str]] | List of node IDs forming the shortest path, or None if no path exists. |
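The BFS approach named in the description can be sketched on a plain edge list. The helper below, `shortest_path_sketch`, is hypothetical and is not the biodbs implementation; in particular, measuring `max_depth` in edges rather than nodes is an assumption.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Set, Tuple


def shortest_path_sketch(
    edges: Iterable[Tuple[str, str]],
    source: str,
    target: str,
    directed: bool = True,
    max_depth: Optional[int] = None,
) -> Optional[List[str]]:
    """Breadth-first search returning one shortest path of node IDs."""
    adjacency: Dict[str, List[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        if not directed:
            # Undirected search: every edge can be walked both ways.
            adjacency.setdefault(b, []).append(a)
    if source == target:
        return [source]
    queue: Deque[List[str]] = deque([[source]])
    visited: Set[str] = {source}
    while queue:
        path = queue.popleft()
        # A path with n nodes has n - 1 edges; stop extending at the limit.
        if max_depth is not None and len(path) - 1 >= max_depth:
            continue
        for nxt in adjacency.get(path[-1], []):
            if nxt in visited:
                continue
            if nxt == target:
                return path + [nxt]
            visited.add(nxt)
            queue.append(path + [nxt])
    return None


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "A")]
direct = shortest_path_sketch(edges, "A", "C")
blocked = shortest_path_sketch(edges, "C", "A")
undirected = shortest_path_sketch(edges, "C", "A", directed=False)
limited = shortest_path_sketch(edges, "D", "B", max_depth=1)
```

BFS explores paths in order of increasing length, so the first time the target is reached the path is guaranteed to be a shortest one in terms of edge count.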
Example
Source code in biodbs/_funcs/graph/utils.py
find_all_paths¶
find_all_paths
¶
find_all_paths(
graph: KnowledgeGraph,
source: str,
target: str,
max_depth: int = 5,
directed: bool = True,
) -> List[List[str]]
Find all paths between two nodes using DFS.
Warning: This can be slow for large graphs or a high max_depth, because the number of candidate paths can grow exponentially with depth.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph` | `KnowledgeGraph` | The knowledge graph to search. | required |
| `source` | `str` | ID of the source node. | required |
| `target` | `str` | ID of the target node. | required |
| `max_depth` | `int` | Maximum path length. | `5` |
| `directed` | `bool` | If True, follow edge direction. If False, treat as undirected. | `True` |

Returns:

| Type | Description |
|---|---|
| `List[List[str]]` | List of paths, where each path is a list of node IDs. |
Example
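A standalone sketch of depth-first enumeration of simple paths (no repeated nodes) bounded by `max_depth`. `dfs_all_paths` and its edge-list input are hypothetical and independent of biodbs; whether the library restricts itself to simple paths and counts depth in edges is an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def dfs_all_paths(
    edges: Iterable[Tuple[str, str]],
    source: str,
    target: str,
    max_depth: int = 5,
    directed: bool = True,
) -> List[List[str]]:
    """Hypothetical helper: list every simple path with at most max_depth edges."""
    adjacency: Dict[str, List[str]] = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        if not directed:
            adjacency.setdefault(v, []).append(u)

    paths: List[List[str]] = []

    def visit(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(list(path))  # copy, since path is mutated on backtrack
            return
        if len(path) - 1 >= max_depth:
            return  # the path already uses max_depth edges
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # keep paths simple (no cycles)
                path.append(nxt)
                visit(nxt, path)
                path.pop()

    visit(source, [source])
    return paths
```

The `max_depth` bound is what keeps the search tractable: lowering it prunes every branch that cannot reach the target within that many edges.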
Source code in biodbs/_funcs/graph/utils.py
get_neighborhood¶
get_neighborhood(
graph: KnowledgeGraph,
node_id: str,
hops: int = 1,
directed: bool = False,
include_edges: bool = False,
) -> Dict[str, Any]
Get nodes within N hops of a starting node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph` | `KnowledgeGraph` | The knowledge graph. | required |
| `node_id` | `str` | ID of the center node. | required |
| `hops` | `int` | Number of hops (distance) to include. | `1` |
| `directed` | `bool` | If True, only follow outgoing edges. | `False` |
| `include_edges` | `bool` | If True, include edges in the result. | `False` |

Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with "nodes" (and optionally "edges") keys. |
Example
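A standalone sketch of an N-hop neighborhood computed by expanding a frontier one hop at a time. `sketch_neighborhood` is hypothetical and does not use biodbs; returning node IDs as a set and edges as `(source, target)` tuples is an assumption about the shape of the values under the "nodes" and "edges" keys.

```python
from typing import Any, Dict, List, Set, Tuple


def sketch_neighborhood(
    edges: List[Tuple[str, str]],
    node_id: str,
    hops: int = 1,
    directed: bool = False,
    include_edges: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect nodes within `hops` edges of node_id."""
    adjacency: Dict[str, Set[str]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        if not directed:
            adjacency.setdefault(v, set()).add(u)

    seen: Set[str] = {node_id}
    frontier: Set[str] = {node_id}
    for _ in range(hops):
        # Everything one step from the current frontier that is not yet seen.
        frontier = {n for f in frontier for n in adjacency.get(f, ())} - seen
        if not frontier:
            break
        seen |= frontier

    result: Dict[str, Any] = {"nodes": seen}
    if include_edges:
        # Keep only edges whose endpoints both lie inside the neighborhood.
        result["edges"] = [(u, v) for u, v in edges if u in seen and v in seen]
    return result
```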
Source code in biodbs/_funcs/graph/utils.py
get_connected_component¶
get_connected_component(
graph: KnowledgeGraph,
node_id: str,
directed: bool = False,
) -> Set[str]
Get all nodes in the same connected component.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph` | `KnowledgeGraph` | The knowledge graph. | required |
| `node_id` | `str` | ID of a node in the component. | required |
| `directed` | `bool` | If True, use strongly connected component semantics. | `False` |

Returns:

| Type | Description |
|---|---|
| `Set[str]` | Set of node IDs in the same component. |
Example
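A standalone sketch contrasting the two semantics of `directed`. In the undirected case the component is everything reachable when edge direction is ignored; in the strongly connected case it is the set of nodes that are both reachable from the start node and able to reach it. `sketch_component` and `_reachable` are hypothetical helpers that do not use biodbs, and the library may compute the result differently.

```python
from typing import Dict, List, Set, Tuple


def _reachable(adjacency: Dict[str, Set[str]], start: str) -> Set[str]:
    """Iterative traversal returning every node reachable from start."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sketch_component(
    edges: List[Tuple[str, str]], node_id: str, directed: bool = False
) -> Set[str]:
    """Hypothetical helper: component containing node_id."""
    forward: Dict[str, Set[str]] = {}
    backward: Dict[str, Set[str]] = {}
    for u, v in edges:
        forward.setdefault(u, set()).add(v)
        backward.setdefault(v, set()).add(u)
    if directed:
        # Strongly connected: reachable from node_id AND able to reach node_id.
        return _reachable(forward, node_id) & _reachable(backward, node_id)
    # Weakly connected: merge both directions into one undirected adjacency.
    undirected: Dict[str, Set[str]] = {}
    for u, v in edges:
        undirected.setdefault(u, set()).add(v)
        undirected.setdefault(v, set()).add(u)
    return _reachable(undirected, node_id)
```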
Source code in biodbs/_funcs/graph/utils.py
get_all_connected_components¶
get_all_connected_components(
graph: KnowledgeGraph, directed: bool = False
) -> List[Set[str]]
Get all connected components in the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph` | `KnowledgeGraph` | The knowledge graph. | required |
| `directed` | `bool` | If True, find strongly connected components. | `False` |

Returns:

| Type | Description |
|---|---|
| `List[Set[str]]` | List of sets, each containing node IDs in a component. |
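A standalone sketch of the `directed=False` case only, using a union-find (disjoint-set) structure rather than repeated traversal. This is a swapped-in technique chosen for brevity, not a description of the biodbs implementation; `sketch_all_components` is hypothetical, and ordering the result by component size is an assumption.

```python
from typing import Dict, Iterable, List, Set, Tuple


def sketch_all_components(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> List[Set[str]]:
    """Hypothetical helper: undirected connected components via union-find."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        # Register endpoints that were not listed in nodes.
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        parent[find(u)] = find(v)  # union the two sets

    groups: Dict[str, Set[str]] = {}
    for n in list(parent):
        groups.setdefault(find(n), set()).add(n)
    # Largest component first (an assumed ordering).
    return sorted(groups.values(), key=len, reverse=True)
```

Isolated nodes appear as singleton sets, which is why the node list is passed separately from the edge list.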
Source code in biodbs/_funcs/graph/utils.py
find_hub_nodes¶
find_hub_nodes(
graph: KnowledgeGraph,
top_n: int = 10,
direction: str = "both",
node_type: Optional[NodeType] = None,
) -> List[Tuple[str, int]]
Find the most highly connected nodes (hubs).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph` | `KnowledgeGraph` | The knowledge graph. | required |
| `top_n` | `int` | Number of top hubs to return. | `10` |
| `direction` | `str` | "outgoing", "incoming", or "both". | `'both'` |
| `node_type` | `Optional[NodeType]` | Optional filter by node type. | `None` |

Returns:

| Type | Description |
|---|---|
| `List[Tuple[str, int]]` | List of (node_id, degree) tuples, sorted by degree descending. |
Example
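A standalone sketch of degree-based hub ranking with the same `direction` and `node_type` options. `sketch_hub_nodes` is hypothetical and does not use biodbs: node types are plain strings here instead of `NodeType` members, and breaking ties by node ID is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def sketch_hub_nodes(
    nodes: Dict[str, str],
    edges: Iterable[Tuple[str, str]],
    top_n: int = 10,
    direction: str = "both",
    node_type: Optional[str] = None,
) -> List[Tuple[str, int]]:
    """Hypothetical helper: nodes maps id -> type string, edges are (source, target)."""
    if direction not in {"outgoing", "incoming", "both"}:
        raise ValueError(f"unknown direction: {direction!r}")
    # Start every known node at zero so isolated nodes are still ranked.
    degree: Counter = Counter({n: 0 for n in nodes})
    for u, v in edges:
        if direction in ("outgoing", "both"):
            degree[u] += 1
        if direction in ("incoming", "both"):
            degree[v] += 1
    ranked = [
        (n, d)
        for n, d in degree.items()
        if node_type is None or nodes.get(n) == node_type
    ]
    # Highest degree first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```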
Source code in biodbs/_funcs/graph/utils.py
compute_degree_distribution¶
compute_degree_distribution(
graph: KnowledgeGraph, direction: str = "both"
) -> Dict[int, int]
Compute the degree distribution of the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph` | `KnowledgeGraph` | The knowledge graph. | required |
| `direction` | `str` | "outgoing", "incoming", or "both". | `'both'` |

Returns:

| Type | Description |
|---|---|
| `Dict[int, int]` | Dictionary mapping degree to count of nodes with that degree. |
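A standalone sketch of how a degree-to-count mapping of this shape can be built from an edge list. `sketch_degree_distribution` is a hypothetical helper that does not use biodbs; counting isolated nodes under degree 0 is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def sketch_degree_distribution(
    nodes: Iterable[str],
    edges: Iterable[Tuple[str, str]],
    direction: str = "both",
) -> Dict[int, int]:
    """Hypothetical helper: map each degree to the number of nodes having it."""
    degree: Dict[str, int] = {n: 0 for n in nodes}
    for u, v in edges:
        if direction in ("outgoing", "both"):
            degree[u] = degree.get(u, 0) + 1
        if direction in ("incoming", "both"):
            degree[v] = degree.get(v, 0) + 1
    # Count how many nodes share each degree value.
    return dict(Counter(degree.values()))
```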
Source code in biodbs/_funcs/graph/utils.py
get_graph_statistics¶
get_graph_statistics(
graph: KnowledgeGraph, compute_centrality: bool = False
) -> Dict[str, Any]
Get comprehensive statistics about the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `graph` | `KnowledgeGraph` | The knowledge graph. | required |
| `compute_centrality` | `bool` | If True, compute betweenness centrality (slower). | `False` |

Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with various statistics. |
Example
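A standalone sketch of a few basic statistics that a dictionary like this might contain. The key names (`num_nodes`, `num_edges`, `density`, `avg_degree`, `max_degree`) and the helper `sketch_statistics` are hypothetical; the keys actually returned by `get_graph_statistics` are not listed here, and betweenness centrality is left out of the sketch.

```python
from typing import Any, Dict, Iterable, List, Tuple


def sketch_statistics(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> Dict[str, Any]:
    """Hypothetical helper: assumes every edge endpoint is listed in nodes."""
    node_list: List[str] = list(nodes)
    edge_list: List[Tuple[str, str]] = list(edges)
    n, m = len(node_list), len(edge_list)
    degree: Dict[str, int] = {node: 0 for node in node_list}
    for u, v in edge_list:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {
        "num_nodes": n,
        "num_edges": m,
        # Directed density: edges present divided by ordered node pairs.
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        # Each edge contributes to the total degree of two endpoints.
        "avg_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degree.values(), default=0),
    }
```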
Source code in biodbs/_funcs/graph/utils.py
format_statistics¶
Format graph statistics as a readable string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `stats` | `Dict[str, Any]` | Dictionary from get_graph_statistics(). | required |

Returns:

| Type | Description |
|---|---|
| `str` | Formatted string representation. |
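A standalone sketch of turning a statistics dictionary into aligned, human-readable lines. `sketch_format`, the three-decimal rounding of floats, and the example keys are assumptions made for illustration; the layout produced by `format_statistics` may differ.

```python
from typing import Any, Dict, List


def sketch_format(stats: Dict[str, Any]) -> str:
    """Hypothetical helper: one 'key : value' line per entry, keys left-aligned."""
    width = max((len(key) for key in stats), default=0)
    lines: List[str] = []
    for key, value in stats.items():
        # Round floats for readability; leave every other value as-is.
        text = f"{value:.3f}" if isinstance(value, float) else str(value)
        lines.append(f"{key.ljust(width)} : {text}")
    return "\n".join(lines)
```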