Graph Module API Reference¶

Complete API reference for the biodbs.graph module.

Summary¶

Core Classes¶

Class	Description
`Node`	Immutable node representing a biological entity
`Edge`	Immutable edge representing a relationship
`KnowledgeGraph`	Container for nodes and edges with graph operations

Enums¶

Enum	Description
`NodeType`	Types of biological entities (gene, protein, disease, etc.)
`EdgeType`	Types of relationships (is_a, part_of, regulates, etc.)
`DataSource`	Supported data sources for graph construction

Builder Functions¶

Function	Description
`build_graph`	Create graph from nodes and edges
`build_disease_graph`	Build from Disease Ontology data
`build_go_graph`	Build from Gene Ontology data
`build_reactome_graph`	Build from Reactome data
`build_kegg_graph`	Build from KEGG data
`merge_graphs`	Merge multiple graphs

Export Functions¶

Function	Description
`to_networkx`	Export to NetworkX graph
`to_json_ld`	Export to JSON-LD format
`to_rdf`	Export to RDF format
`to_neo4j_csv`	Export to Neo4j CSV files
`to_cypher`	Export to Cypher queries

Utility Functions¶

Function	Description
`find_shortest_path`	Find shortest path between nodes
`find_all_paths`	Find all paths up to max length
`get_neighborhood`	Get nodes within N hops
`get_connected_component`	Get connected component containing node
`find_hub_nodes`	Find high-degree hub nodes
`get_graph_statistics`	Compute graph statistics

Enums¶

NodeType¶

Types of nodes representing biological entities.

Member	Value	Description
`GENE`	`"gene"`	Gene entity
`PROTEIN`	`"protein"`	Protein entity
`DISEASE`	`"disease"`	Disease entity
`PATHWAY`	`"pathway"`	Biological pathway
`GO_TERM`	`"go_term"`	Gene Ontology term
`REACTION`	`"reaction"`	Biochemical reaction
`COMPOUND`	`"compound"`	Chemical compound
`DRUG`	`"drug"`	Drug/pharmaceutical
`PHENOTYPE`	`"phenotype"`	Phenotype
`ORGANISM`	`"organism"`	Organism/species
`PUBLICATION`	`"publication"`	Scientific publication
`OTHER`	`"other"`	Other entity type

NodeType ¶

Bases: str, Enum

Types of nodes in the knowledge graph.

Each node type represents a different biological entity category.

EdgeType¶

Types of relationships between biological entities.

Member	Value	Category
`IS_A`	`"is_a"`	Ontology
`PART_OF`	`"part_of"`	Ontology
`HAS_PART`	`"has_part"`	Ontology
`REGULATES`	`"regulates"`	Regulatory
`POSITIVELY_REGULATES`	`"positively_regulates"`	Regulatory
`NEGATIVELY_REGULATES`	`"negatively_regulates"`	Regulatory
`PARTICIPATES_IN`	`"participates_in"`	Participation
`HAS_PARTICIPANT`	`"has_participant"`	Participation
`CATALYZES`	`"catalyzes"`	Participation
`PRODUCES`	`"produces"`	Participation
`CONSUMES`	`"consumes"`	Participation
`ASSOCIATED_WITH`	`"associated_with"`	Association
`INTERACTS_WITH`	`"interacts_with"`	Association
`TARGETS`	`"targets"`	Association
`XREF`	`"xref"`	Cross-reference
`SAME_AS`	`"same_as"`	Cross-reference
`ENCODES`	`"encodes"`	Sequence
`TRANSCRIBES`	`"transcribes"`	Sequence
`TRANSLATES`	`"translates"`	Sequence
`RELATED_TO`	`"related_to"`	Other
`OTHER`	`"other"`	Other

EdgeType ¶

Bases: str, Enum

Types of edges (relationships) in the knowledge graph.

Each edge type represents a different kind of relationship between biological entities.
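
Because the enums derive from both str and Enum, members compare equal to their string values and can be looked up by value. The sketch below illustrates this with a stand-in enum; `MiniEdgeType` and `parse_relation` are hypothetical names that mirror three of the members listed above and do not import biodbs.

```python
from enum import Enum


class MiniEdgeType(str, Enum):
    """Hypothetical stand-in mirroring three EdgeType members."""

    IS_A = "is_a"
    PART_OF = "part_of"
    ASSOCIATED_WITH = "associated_with"


def parse_relation(value: str) -> MiniEdgeType:
    """Look a member up by value, falling back to ASSOCIATED_WITH."""
    try:
        return MiniEdgeType(value)
    except ValueError:
        return MiniEdgeType.ASSOCIATED_WITH


# Members of a str-mixin enum compare equal to their plain string values.
is_a_equals_string = MiniEdgeType.IS_A == "is_a"
parsed = parse_relation("part_of")
fallback = parse_relation("no_such_relation")
```

The try/except fallback follows the same pattern as the annotation-edge loop in the build_go_graph source listing.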

DataSource¶

Supported data sources for graph construction.

Member	Value	Description
`DISEASE_ONTOLOGY`	`"disease_ontology"`	Disease Ontology
`GENE_ONTOLOGY`	`"gene_ontology"`	Gene Ontology
`REACTOME`	`"reactome"`	Reactome pathways
`KEGG`	`"kegg"`	KEGG database
`QUICKGO`	`"quickgo"`	QuickGO annotations
`UNIPROT`	`"uniprot"`	UniProt
`ENSEMBL`	`"ensembl"`	Ensembl
`PUBCHEM`	`"pubchem"`	PubChem
`CHEMBL`	`"chembl"`	ChEMBL
`CUSTOM`	`"custom"`	Custom data source

DataSource ¶

Bases: str, Enum

Supported data sources for knowledge graph construction.

Core Classes¶

Node¶

Node `dataclass` ¶

Node(
    id: str,
    label: str,
    node_type: NodeType = OTHER,
    source: DataSource = CUSTOM,
    properties: FrozenSet[Tuple[str, Any]] = frozenset(),
    xrefs: FrozenSet[str] = frozenset(),
)

A node (entity) in the knowledge graph.

Nodes are immutable (frozen) to ensure graph integrity.

Attributes:

Name	Type	Description
`id`	`str`	Unique identifier for the node (e.g., "DOID:162", "GO:0008150").
`label`	`str`	Human-readable label for the node.
`node_type`	`NodeType`	Type of biological entity this node represents.
`source`	`DataSource`	Data source this node originated from.
`properties`	`FrozenSet[Tuple[str, Any]]`	Additional properties, stored as a frozen set of (key, value) pairs.
`xrefs`	`FrozenSet[str]`	Cross-references to other databases.

get_property ¶

get_property(key: str, default: Any = None) -> Any

Get a property value by key.

get_properties_dict ¶

get_properties_dict() -> Dict[str, Any]

Get properties as a dictionary.

with_properties ¶

with_properties(**kwargs: Any) -> 'Node'

Create a new node with additional/updated properties.

with_xrefs ¶

with_xrefs(*xrefs: str) -> 'Node'

Create a new node with additional cross-references.

to_dict ¶

to_dict() -> Dict[str, Any]

Convert node to dictionary representation.

from_dict `classmethod` ¶

from_dict(data: Dict[str, Any]) -> 'Node'

Create a Node from dictionary representation.
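
The immutability and the (key, value) storage of properties can be illustrated with a small frozen dataclass. This is a sketch only: `MiniNode` is a hypothetical stand-in with three fields, and its method bodies are one plausible way to get the documented behavior, not the biodbs implementation.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class MiniNode:
    """Hypothetical stand-in for Node with only three fields."""

    id: str
    label: str
    properties: FrozenSet[Tuple[str, Any]] = frozenset()

    def get_properties_dict(self) -> Dict[str, Any]:
        # A frozenset of pairs converts directly to a dict.
        return dict(self.properties)

    def with_properties(self, **kwargs: Any) -> "MiniNode":
        # Build a new instance; the original is left untouched.
        merged = {**self.get_properties_dict(), **kwargs}
        return replace(self, properties=frozenset(merged.items()))


original = MiniNode(id="DOID:162", label="cancer")
updated = original.with_properties(is_root=False, definition="a disease")

try:
    original.label = "changed"  # frozen dataclasses reject assignment
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Property values must be hashable to live in a frozen set, which is presumably why the build_disease_graph source converts synonyms to a tuple before storing them.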

Edge¶

Edge `dataclass` ¶

Edge(
    source: str,
    target: str,
    relation: EdgeType = RELATED_TO,
    weight: float = 1.0,
    evidence: FrozenSet[str] = frozenset(),
    properties: FrozenSet[Tuple[str, Any]] = frozenset(),
)

A directed edge (relationship) in the knowledge graph.

Edges are immutable (frozen) to ensure graph integrity.

Attributes:

Name	Type	Description
`source`	`str`	ID of the source node.
`target`	`str`	ID of the target node.
`relation`	`EdgeType`	Type of relationship.
`weight`	`float`	Optional edge weight (default 1.0).
`evidence`	`FrozenSet[str]`	Evidence supporting this relationship.
`properties`	`FrozenSet[Tuple[str, Any]]`	Additional properties, stored as a frozen set of (key, value) pairs.

get_property ¶

get_property(key: str, default: Any = None) -> Any

Get a property value by key.

get_properties_dict ¶

get_properties_dict() -> Dict[str, Any]

Get properties as a dictionary.

with_properties ¶

with_properties(**kwargs: Any) -> 'Edge'

Create a new edge with additional/updated properties.

with_evidence ¶

with_evidence(*evidence: str) -> 'Edge'

Create a new edge with additional evidence.

to_dict ¶

to_dict() -> Dict[str, Any]

Convert edge to dictionary representation.

from_dict `classmethod` ¶

from_dict(data: Dict[str, Any]) -> 'Edge'

Create an Edge from dictionary representation.
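
As with nodes, the with_* methods return a new edge rather than modifying the existing one. A minimal sketch, assuming evidence is accumulated by set union; `MiniEdge` is a hypothetical stand-in, and "IDA" and "IMP" are used purely as example evidence strings.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class MiniEdge:
    """Hypothetical stand-in for Edge (properties field omitted)."""

    source: str
    target: str
    relation: str = "related_to"
    weight: float = 1.0
    evidence: FrozenSet[str] = frozenset()

    def with_evidence(self, *evidence: str) -> "MiniEdge":
        # Union keeps existing evidence and drops repeated entries.
        return replace(self, evidence=self.evidence | frozenset(evidence))


edge = MiniEdge(source="A", target="B", relation="associated_with")
supported = edge.with_evidence("IDA").with_evidence("IDA", "IMP")
```

Frozen dataclasses with the default equality are also hashable, so edges built this way can be stored in sets or used as dictionary keys.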

KnowledgeGraph¶

KnowledgeGraph(
    name: str = "KnowledgeGraph",
    description: Optional[str] = None,
    source: DataSource = CUSTOM,
)

Container for a biological knowledge graph.

Provides methods for adding/removing nodes and edges, filtering, subgraph extraction, merging, and computing statistics.

Attributes:

Name	Type	Description
`name`	`str`	Name of the knowledge graph.
`description`	`Optional[str]`	Optional description.
`source`	`DataSource`	Primary data source for this graph.

Example

from biodbs.graph import KnowledgeGraph, Node, Edge, NodeType, EdgeType

graph = KnowledgeGraph(name="DiseaseOntologyGraph")
graph.add_node(Node(id="DOID:162", label="cancer", node_type=NodeType.DISEASE))
graph.add_node(Node(id="DOID:1612", label="breast cancer", node_type=NodeType.DISEASE))
graph.add_edge(Edge(source="DOID:1612", target="DOID:162", relation=EdgeType.IS_A))
print(graph.summary())
# KnowledgeGraph: DiseaseOntologyGraph
# Nodes: 2
# Edges: 1
#
# Node types:
#   disease: 2
#
# Edge types:
#   is_a: 1

Initialize a new KnowledgeGraph.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the graph.	`'KnowledgeGraph'`
`description`	`Optional[str]`	Optional description.	`None`
`source`	`DataSource`	Primary data source for this graph.	`CUSTOM`

nodes `property` ¶

nodes: List[Node]

Get all nodes as a list.

edges `property` ¶

edges: List[Edge]

Get all edges as a list.

node_count `property` ¶

node_count: int

Get the number of nodes.

edge_count `property` ¶

edge_count: int

Get the number of edges.

__len__ ¶

__len__() -> int

Return the number of nodes in the graph.

__contains__ ¶

__contains__(node_id: str) -> bool

Check if a node exists in the graph.

__iter__ ¶

__iter__() -> Iterator[Node]

Iterate over all nodes in the graph.

__repr__ ¶

__repr__() -> str

Return a string representation.
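
These special methods are what make len(graph), node_id in graph, and for node in graph work. The sketch below shows the mapping on a hypothetical stand-in class, `MiniGraph`, which stores plain labels keyed by node ID; the real class iterates over Node objects.

```python
from typing import Dict, Iterator


class MiniGraph:
    """Hypothetical stand-in storing node labels keyed by node ID."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}

    def add_node(self, node_id: str, label: str) -> bool:
        # Mirror the documented contract: False when the ID already exists.
        if node_id in self._nodes:
            return False
        self._nodes[node_id] = label
        return True

    def __len__(self) -> int:
        return len(self._nodes)

    def __contains__(self, node_id: str) -> bool:
        return node_id in self._nodes

    def __iter__(self) -> Iterator[str]:
        return iter(self._nodes.values())

    def __repr__(self) -> str:
        return f"MiniGraph(nodes={len(self)})"


graph = MiniGraph()
graph.add_node("A", "Node A")
graph.add_node("B", "Node B")
duplicate_added = graph.add_node("A", "Node A again")

size = len(graph)                    # calls __len__
has_a = "A" in graph                 # calls __contains__
labels = [label for label in graph]  # calls __iter__
```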

add_node ¶

add_node(node: Node) -> bool

Add a node to the graph.

Parameters:

Name	Type	Description	Default
`node`	`Node`	The node to add.	required

Returns:

Type	Description
`bool`	True if the node was added, False if it already existed.

add_nodes ¶

add_nodes(nodes: List[Node]) -> int

Add multiple nodes to the graph.

Parameters:

Name	Type	Description	Default
`nodes`	`List[Node]`	List of nodes to add.	required

Returns:

Type	Description
`int`	Number of nodes actually added (excludes duplicates).

get_node ¶

get_node(node_id: str) -> Optional[Node]

Get a node by its ID.

Parameters:

Name	Type	Description	Default
`node_id`	`str`	The node identifier.	required

Returns:

Type	Description
`Optional[Node]`	The Node if found, None otherwise.

has_node ¶

has_node(node_id: str) -> bool

Check if a node exists.

Parameters:

Name	Type	Description	Default
`node_id`	`str`	The node identifier.	required

Returns:

Type	Description
`bool`	True if the node exists.

remove_node ¶

remove_node(node_id: str) -> bool

Remove a node and all its connected edges.

Parameters:

Name	Type	Description	Default
`node_id`	`str`	The node identifier.	required

Returns:

Type	Description
`bool`	True if the node was removed, False if it didn't exist.

add_edge ¶

add_edge(edge: Edge) -> bool

Add an edge to the graph.

Parameters:

Name	Type	Description	Default
`edge`	`Edge`	The edge to add.	required

Returns:

Type	Description
`bool`	True if the edge was added; False if it already existed or if the source or target node does not exist.
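
The return contract above implies that nodes must be added before the edges that connect them. A minimal sketch of that contract, using a hypothetical `MiniGraph` and assuming that an edge is identified by its (source, target, relation) triple, which this reference does not state:

```python
from typing import Set, Tuple

EdgeKey = Tuple[str, str, str]  # (source, target, relation): assumed identity


class MiniGraph:
    """Hypothetical stand-in tracking node IDs and edge keys only."""

    def __init__(self) -> None:
        self.node_ids: Set[str] = set()
        self.edge_keys: Set[EdgeKey] = set()

    def add_edge(self, source: str, target: str, relation: str) -> bool:
        # Reject edges whose endpoints have not been added as nodes.
        if source not in self.node_ids or target not in self.node_ids:
            return False
        key = (source, target, relation)
        # Reject exact duplicates.
        if key in self.edge_keys:
            return False
        self.edge_keys.add(key)
        return True


graph = MiniGraph()
graph.node_ids.update({"A", "B"})

first = graph.add_edge("A", "B", "is_a")      # both endpoints exist
duplicate = graph.add_edge("A", "B", "is_a")  # already present
dangling = graph.add_edge("A", "C", "is_a")   # "C" was never added
```

The build_graph source listing follows the same order: it calls add_nodes before add_edges.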

add_edges ¶

add_edges(edges: List[Edge]) -> int

Add multiple edges to the graph.

Parameters:

Name	Type	Description	Default
`edges`	`List[Edge]`	List of edges to add.	required

Returns:

Type	Description
`int`	Number of edges actually added.

get_edge ¶

get_edge(
    source: str,
    target: str,
    relation: Optional[EdgeType] = None,
) -> Optional[Edge]

Get an edge between two nodes.

Parameters:

Name	Type	Description	Default
`source`	`str`	Source node ID.	required
`target`	`str`	Target node ID.	required
`relation`	`Optional[EdgeType]`	Optional relation type to match.	`None`

Returns:

Type	Description
`Optional[Edge]`	The Edge if found, None otherwise.

has_edge ¶

has_edge(
    source: str,
    target: str,
    relation: Optional[EdgeType] = None,
) -> bool

Check if an edge exists between two nodes.

Parameters:

Name	Type	Description	Default
`source`	`str`	Source node ID.	required
`target`	`str`	Target node ID.	required
`relation`	`Optional[EdgeType]`	Optional relation type to match.	`None`

Returns:

Type	Description
`bool`	True if the edge exists.

remove_edge ¶

remove_edge(edge: Edge) -> bool

Remove an edge from the graph.

Parameters:

Name	Type	Description	Default
`edge`	`Edge`	The edge to remove.	required

Returns:

Type	Description
`bool`	True if the edge was removed, False if it didn't exist.

get_outgoing_edges ¶

get_outgoing_edges(node_id: str) -> List[Edge]

Get all outgoing edges from a node.

Parameters:

Name	Type	Description	Default
`node_id`	`str`	The node identifier.	required

Returns:

Type	Description
`List[Edge]`	List of outgoing edges.

get_incoming_edges ¶

get_incoming_edges(node_id: str) -> List[Edge]

Get all incoming edges to a node.

Parameters:

Name	Type	Description	Default
`node_id`	`str`	The node identifier.	required

Returns:

Type	Description
`List[Edge]`	List of incoming edges.

get_neighbors ¶

get_neighbors(
    node_id: str, direction: str = "both"
) -> List[str]

Get neighboring node IDs.

Parameters:

Name	Type	Description	Default
`node_id`	`str`	The node identifier.	required
`direction`	`str`	"outgoing", "incoming", or "both".	`'both'`

Returns:

Type	Description
`List[str]`	List of neighboring node IDs.
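
The three direction values can be sketched over a plain list of (source, target) pairs. The `neighbors` function is a hypothetical illustration, and both the de-duplication and the ValueError for unknown directions are assumptions rather than documented behavior.

```python
from typing import List, Tuple

DIRECTIONS = ("outgoing", "incoming", "both")


def neighbors(
    edges: List[Tuple[str, str]], node_id: str, direction: str = "both"
) -> List[str]:
    """Return neighbor IDs of node_id, in edge order, without repeats."""
    if direction not in DIRECTIONS:
        raise ValueError(f"direction must be one of {DIRECTIONS}")
    found: List[str] = []
    for source, target in edges:
        # Outgoing: node_id is the source, so the neighbor is the target.
        if direction != "incoming" and source == node_id and target not in found:
            found.append(target)
        # Incoming: node_id is the target, so the neighbor is the source.
        if direction != "outgoing" and target == node_id and source not in found:
            found.append(source)
    return found


edges = [("A", "B"), ("B", "C"), ("D", "B")]
outgoing = neighbors(edges, "B", "outgoing")
incoming = neighbors(edges, "B", "incoming")
both = neighbors(edges, "B")
```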

filter_nodes ¶

filter_nodes(
    predicate: Optional[Callable[[Node], bool]] = None,
    node_type: Optional[NodeType] = None,
    source: Optional[DataSource] = None,
) -> List[Node]

Filter nodes by predicate or attributes.

Parameters:

Name	Type	Description	Default
`predicate`	`Optional[Callable[[Node], bool]]`	Function that returns True for nodes to include.	`None`
`node_type`	`Optional[NodeType]`	Filter by node type.	`None`
`source`	`Optional[DataSource]`	Filter by data source.	`None`

Returns:

Type	Description
`List[Node]`	List of matching nodes.
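
This reference does not say how the three criteria interact when more than one is given. The sketch below assumes they are combined with logical AND and that omitted criteria are ignored; `MiniNode` and this `filter_nodes` function are hypothetical stand-ins, so check the library source before relying on that behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class MiniNode:
    """Hypothetical stand-in with string-valued type and source."""

    id: str
    node_type: str
    source: str


def filter_nodes(
    nodes: List[MiniNode],
    predicate: Optional[Callable[[MiniNode], bool]] = None,
    node_type: Optional[str] = None,
    source: Optional[str] = None,
) -> List[MiniNode]:
    """Keep nodes that satisfy every criterion that was supplied (assumed AND)."""
    matches = []
    for node in nodes:
        if predicate is not None and not predicate(node):
            continue
        if node_type is not None and node.node_type != node_type:
            continue
        if source is not None and node.source != source:
            continue
        matches.append(node)
    return matches


nodes = [
    MiniNode("GO:1", "go_term", "gene_ontology"),
    MiniNode("P1", "protein", "uniprot"),
    MiniNode("P2", "protein", "gene_ontology"),
]
proteins = filter_nodes(nodes, node_type="protein")
go_proteins = filter_nodes(nodes, node_type="protein", source="gene_ontology")
uniprot_p = filter_nodes(nodes, predicate=lambda n: n.id.startswith("P"), source="uniprot")
```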

filter_edges ¶

filter_edges(
    predicate: Optional[Callable[[Edge], bool]] = None,
    relation: Optional[EdgeType] = None,
    min_weight: Optional[float] = None,
) -> List[Edge]

Filter edges by predicate or attributes.

Parameters:

Name	Type	Description	Default
`predicate`	`Optional[Callable[[Edge], bool]]`	Function that returns True for edges to include.	`None`
`relation`	`Optional[EdgeType]`	Filter by relation type.	`None`
`min_weight`	`Optional[float]`	Filter by minimum weight.	`None`

Returns:

Type	Description
`List[Edge]`	List of matching edges.

get_nodes_by_type ¶

get_nodes_by_type(node_type: NodeType) -> List[Node]

Get all nodes of a specific type.

Parameters:

Name	Type	Description	Default
`node_type`	`NodeType`	The node type to filter by.	required

Returns:

Type	Description
`List[Node]`	List of nodes with the specified type.

get_edges_by_relation ¶

get_edges_by_relation(relation: EdgeType) -> List[Edge]

Get all edges with a specific relation type.

Parameters:

Name	Type	Description	Default
`relation`	`EdgeType`	The relation type to filter by.	required

Returns:

Type	Description
`List[Edge]`	List of edges with the specified relation.

subgraph ¶

subgraph(node_ids: Set[str]) -> 'KnowledgeGraph'

Create a subgraph containing only the specified nodes.

Parameters:

Name	Type	Description	Default
`node_ids`	`Set[str]`	Set of node IDs to include.	required

Returns:

Type	Description
`'KnowledgeGraph'`	A new KnowledgeGraph containing the subgraph.

induced_subgraph ¶

induced_subgraph(node_ids: Set[str]) -> 'KnowledgeGraph'

Alias for subgraph(); creates an induced subgraph.

Parameters:

Name	Type	Description	Default
`node_ids`	`Set[str]`	Set of node IDs to include.	required

Returns:

Type	Description
`'KnowledgeGraph'`	A new KnowledgeGraph containing the induced subgraph.
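
An induced subgraph keeps the selected nodes together with every edge whose two endpoints are both selected. A minimal sketch over plain sets and pairs; the function below is a hypothetical illustration, and silently ignoring requested IDs that are not in the graph is an assumption.

```python
from typing import List, Set, Tuple

EdgePair = Tuple[str, str]


def induced_subgraph(
    node_ids: Set[str], edges: List[EdgePair], keep: Set[str]
) -> Tuple[Set[str], List[EdgePair]]:
    """Return the kept nodes and the edges that run between them."""
    kept_nodes = node_ids & keep  # unknown IDs in `keep` drop out here
    kept_edges = [
        (source, target)
        for source, target in edges
        if source in kept_nodes and target in kept_nodes
    ]
    return kept_nodes, kept_edges


nodes = {"A", "B", "C"}
edges = [("A", "B"), ("B", "C"), ("A", "C")]
sub_nodes, sub_edges = induced_subgraph(nodes, edges, {"A", "B", "Z"})
```

Edges that touch a dropped node, such as B to C here, are removed along with it.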

merge ¶

merge(other: 'KnowledgeGraph') -> 'KnowledgeGraph'

Merge this graph and another graph into a new graph.

Parameters:

Name	Type	Description	Default
`other`	`'KnowledgeGraph'`	The graph to merge with this one.	required

Returns:

Type	Description
`'KnowledgeGraph'`	A new KnowledgeGraph containing all nodes and edges from both.

update ¶

update(other: 'KnowledgeGraph') -> int

Update this graph with nodes and edges from another graph.

Unlike merge(), this modifies the current graph in place.

Parameters:

Name	Type	Description	Default
`other`	`'KnowledgeGraph'`	The graph to merge into this one.	required

Returns:

Type	Description
`int`	Total number of new nodes and edges added.
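
The difference between merge() and update() can be sketched with plain sets: one returns a new object, the other mutates the receiver and reports how many items were new. `MiniGraph` is a hypothetical stand-in, not the biodbs class.

```python
from typing import Iterable, Set, Tuple

EdgePair = Tuple[str, str]


class MiniGraph:
    """Hypothetical stand-in holding node IDs and (source, target) pairs."""

    def __init__(
        self, node_ids: Iterable[str] = (), edges: Iterable[EdgePair] = ()
    ) -> None:
        self.node_ids: Set[str] = set(node_ids)
        self.edges: Set[EdgePair] = set(edges)

    def merge(self, other: "MiniGraph") -> "MiniGraph":
        # Returns a new graph; neither input is modified.
        return MiniGraph(self.node_ids | other.node_ids, self.edges | other.edges)

    def update(self, other: "MiniGraph") -> int:
        # Modifies this graph in place and counts what was actually new.
        before = len(self.node_ids) + len(self.edges)
        self.node_ids |= other.node_ids
        self.edges |= other.edges
        return len(self.node_ids) + len(self.edges) - before


g1 = MiniGraph({"A", "B"}, {("A", "B")})
g2 = MiniGraph({"B", "C"}, {("B", "C")})

merged = g1.merge(g2)
g1_size_after_merge = len(g1.node_ids)
added = g1.update(g2)
```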

get_node_type_counts ¶

get_node_type_counts() -> Dict[NodeType, int]

Get counts of nodes by type.

Returns:

Type	Description
`Dict[NodeType, int]`	Dictionary mapping NodeType to count.

get_edge_type_counts ¶

get_edge_type_counts() -> Dict[EdgeType, int]

Get counts of edges by relation type.

Returns:

Type	Description
`Dict[EdgeType, int]`	Dictionary mapping EdgeType to count.
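
Counting by type is a natural fit for collections.Counter. The sketch below uses a hypothetical `MiniNodeType` enum and formats the counts in the same style as the summary() example output shown earlier; it is not the library's implementation.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List


class MiniNodeType(str, Enum):
    """Hypothetical stand-in mirroring two NodeType members."""

    GENE = "gene"
    DISEASE = "disease"


node_types: List[MiniNodeType] = [
    MiniNodeType.DISEASE,
    MiniNodeType.GENE,
    MiniNodeType.DISEASE,
]

# Counter tallies each member; dict() gives a plain mapping of type to count.
counts: Dict[MiniNodeType, int] = dict(Counter(node_types))
summary_lines = [f"  {node_type.value}: {count}" for node_type, count in counts.items()]
```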

get_degree ¶

get_degree(node_id: str, direction: str = 'both') -> int

Get the degree of a node.

Parameters:

Name	Type	Description	Default
`node_id`	`str`	The node identifier.	required
`direction`	`str`	"outgoing", "incoming", or "both".	`'both'`

Returns:

Type	Description
`int`	The degree of the node.

summary ¶

summary() -> str

Get a text summary of the graph.

Returns:

Type	Description
`str`	A formatted string with graph statistics.

to_dict ¶

to_dict() -> Dict[str, Any]

Convert the graph to a dictionary representation.

Returns:

Type	Description
`Dict[str, Any]`	Dictionary containing all graph data.

from_dict `classmethod` ¶

from_dict(data: Dict[str, Any]) -> 'KnowledgeGraph'

Create a KnowledgeGraph from a dictionary representation.

Parameters:

Name	Type	Description	Default
`data`	`Dict[str, Any]`	Dictionary containing graph data.	required

Returns:

Type	Description
`'KnowledgeGraph'`	A new KnowledgeGraph instance.

nodes_as_dataframe ¶

nodes_as_dataframe(
    engine: str = "pandas",
) -> "pd.DataFrame"

Convert nodes to a DataFrame.

Parameters:

Name	Type	Description	Default
`engine`	`str`	"pandas" or "polars".	`'pandas'`

Returns:

Type	Description
`'pd.DataFrame'`	DataFrame with node data.

edges_as_dataframe ¶

edges_as_dataframe(
    engine: str = "pandas",
) -> "pd.DataFrame"

Convert edges to a DataFrame.

Parameters:

Name	Type	Description	Default
`engine`	`str`	"pandas" or "polars".	`'pandas'`

Returns:

Type	Description
`'pd.DataFrame'`	DataFrame with edge data.

Builder Functions¶

build_graph¶

build_graph(
    nodes: List[Node],
    edges: Optional[List[Edge]] = None,
    name: str = "KnowledgeGraph",
    description: Optional[str] = None,
    source: DataSource = CUSTOM,
) -> KnowledgeGraph

Build a knowledge graph from node and edge lists.

Parameters:

Name	Type	Description	Default
`nodes`	`List[Node]`	List of Node objects to add to the graph.	required
`edges`	`Optional[List[Edge]]`	Optional list of Edge objects to add.	`None`
`name`	`str`	Name for the graph.	`'KnowledgeGraph'`
`description`	`Optional[str]`	Optional description.	`None`
`source`	`DataSource`	Data source for the graph.	`CUSTOM`

Returns:

Type	Description
`KnowledgeGraph`	A new KnowledgeGraph instance.

Example

from biodbs.graph import Node, Edge, NodeType, EdgeType, build_graph

nodes = [
    Node(id="A", label="Node A", node_type=NodeType.GENE),
    Node(id="B", label="Node B", node_type=NodeType.GENE),
]
edges = [
    Edge(source="A", target="B", relation=EdgeType.INTERACTS_WITH),
]
graph = build_graph(nodes, edges, name="MyGraph")
print(graph)
# KnowledgeGraph(name='MyGraph', nodes=2, edges=1)

Source code in biodbs/_funcs/graph/builders.py

def build_graph(
    nodes: List[Node],
    edges: Optional[List[Edge]] = None,
    name: str = "KnowledgeGraph",
    description: Optional[str] = None,
    source: DataSource = DataSource.CUSTOM,
) -> KnowledgeGraph:
    """Build a knowledge graph from node and edge lists.

    Args:
        nodes: List of Node objects to add to the graph.
        edges: Optional list of Edge objects to add.
        name: Name for the graph.
        description: Optional description.
        source: Data source for the graph.

    Returns:
        A new KnowledgeGraph instance.

    Example:
        ```python
        from biodbs.graph import Node, Edge, NodeType, EdgeType, build_graph

        nodes = [
            Node(id="A", label="Node A", node_type=NodeType.GENE),
            Node(id="B", label="Node B", node_type=NodeType.GENE),
        ]
        edges = [
            Edge(source="A", target="B", relation=EdgeType.INTERACTS_WITH),
        ]
        graph = build_graph(nodes, edges, name="MyGraph")
        print(graph)
        # KnowledgeGraph(name='MyGraph', nodes=2, edges=1)
        ```
    """
    graph = KnowledgeGraph(name=name, description=description, source=source)
    graph.add_nodes(nodes)
    if edges:
        graph.add_edges(edges)
    return graph

build_disease_graph¶

build_disease_graph(
    data: "DOFetchedData",
    name: str = "DiseaseOntologyGraph",
    include_xrefs: bool = True,
    include_synonyms: bool = False,
) -> KnowledgeGraph

Build a knowledge graph from Disease Ontology data.

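Creates a node for each disease term. The implementation listed below does not yet add hierarchical (is_a) edges, which is why the example output reports Edges: 0; use build_disease_graph_with_hierarchy to create IS_A edges.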

Parameters:

Name	Type	Description	Default
`data`	`'DOFetchedData'`	DOFetchedData from Disease Ontology fetcher.	required
`name`	`str`	Name for the graph.	`'DiseaseOntologyGraph'`
`include_xrefs`	`bool`	Include cross-references as node xrefs.	`True`
`include_synonyms`	`bool`	Include synonyms in node properties.	`False`

Returns:

Type	Description
`KnowledgeGraph`	A KnowledgeGraph with disease nodes.

Example

from biodbs.fetch import DO_Fetcher
from biodbs.graph import build_disease_graph

fetcher = DO_Fetcher()
data = fetcher.get_children("DOID:162")  # cancer
graph = build_disease_graph(data)
print(graph.summary())
# KnowledgeGraph: DiseaseOntologyGraph
# Nodes: 47
# Edges: 0
#
# Node types:
#   disease: 47

Source code in biodbs/_funcs/graph/builders.py

def build_disease_graph(
    data: "DOFetchedData",
    name: str = "DiseaseOntologyGraph",
    include_xrefs: bool = True,
    include_synonyms: bool = False,
) -> KnowledgeGraph:
    """Build a knowledge graph from Disease Ontology data.

    Creates nodes for each disease term and edges for hierarchical
    relationships (is_a) when parent terms are available.

    Args:
        data: DOFetchedData from Disease Ontology fetcher.
        name: Name for the graph.
        include_xrefs: Include cross-references as node xrefs.
        include_synonyms: Include synonyms in node properties.

    Returns:
        A KnowledgeGraph with disease nodes.

    Example:
        ```python
        from biodbs.fetch import DO_Fetcher
        from biodbs.graph import build_disease_graph

        fetcher = DO_Fetcher()
        data = fetcher.get_children("DOID:162")  # cancer
        graph = build_disease_graph(data)
        print(graph.summary())
        # KnowledgeGraph: DiseaseOntologyGraph
        # Nodes: 47
        # Edges: 0
        #
        # Node types:
        #   disease: 47
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from Disease Ontology",
        source=DataSource.DISEASE_ONTOLOGY,
    )

    # Track which nodes we've added to create edges
    node_ids: Set[str] = set()

    for term in data.terms:
        # Build properties
        properties: Dict[str, Any] = {}
        if term.definition:
            properties["definition"] = term.definition
        if include_synonyms and term.synonyms:
            properties["synonyms"] = tuple(term.synonyms)
        if term.is_obsolete:
            properties["is_obsolete"] = True
        if hasattr(term, "has_children") and term.has_children:
            properties["has_children"] = True
        if hasattr(term, "is_root") and term.is_root:
            properties["is_root"] = True

        # Build xrefs
        xrefs: Set[str] = set()
        if include_xrefs and term.xrefs:
            xrefs = set(term.xrefs)

        # Create node
        node = Node(
            id=term.doid,
            label=term.name,
            node_type=NodeType.DISEASE,
            source=DataSource.DISEASE_ONTOLOGY,
            properties=frozenset(properties.items()) if properties else frozenset(),
            xrefs=frozenset(xrefs),
        )
        graph.add_node(node)
        node_ids.add(term.doid)

    # If we have hierarchical data (from get_children or get_descendants),
    # the parent-child relationships are implicit in the fetch
    # We can create is_a edges based on the query structure
    # For now, edges are created if the fetcher provides relationship info

    return graph

build_disease_graph_with_hierarchy¶

build_disease_graph_with_hierarchy(
    parent_data: "DOFetchedData",
    children_data: "DOFetchedData",
    name: str = "DiseaseOntologyGraph",
    include_xrefs: bool = True,
) -> KnowledgeGraph

Build a disease graph with explicit parent-child relationships.

Use this when you have fetched both parent and children terms and want to create IS_A edges between them.

Parameters:

Name	Type	Description	Default
`parent_data`	`'DOFetchedData'`	DOFetchedData containing the parent term(s).	required
`children_data`	`'DOFetchedData'`	DOFetchedData containing child terms.	required
`name`	`str`	Name for the graph.	`'DiseaseOntologyGraph'`
`include_xrefs`	`bool`	Include cross-references as node xrefs.	`True`

Returns:

Type	Description
`KnowledgeGraph`	A KnowledgeGraph with disease nodes and IS_A edges.

Example

from biodbs.fetch import DO_Fetcher
from biodbs.graph import build_disease_graph_with_hierarchy

fetcher = DO_Fetcher()
parent = fetcher.get_term("DOID:162")  # cancer
children = fetcher.get_children("DOID:162")
graph = build_disease_graph_with_hierarchy(parent, children)
print(graph)
# KnowledgeGraph(name='DiseaseOntologyGraph', nodes=48, edges=47)

Source code in biodbs/_funcs/graph/builders.py

def build_disease_graph_with_hierarchy(
    parent_data: "DOFetchedData",
    children_data: "DOFetchedData",
    name: str = "DiseaseOntologyGraph",
    include_xrefs: bool = True,
) -> KnowledgeGraph:
    """Build a disease graph with explicit parent-child relationships.

    Use this when you have fetched both parent and children terms
    and want to create IS_A edges between them.

    Args:
        parent_data: DOFetchedData containing the parent term(s).
        children_data: DOFetchedData containing child terms.
        name: Name for the graph.
        include_xrefs: Include cross-references as node xrefs.

    Returns:
        A KnowledgeGraph with disease nodes and IS_A edges.

    Example:
        ```python
        from biodbs.fetch import DO_Fetcher
        from biodbs.graph import build_disease_graph_with_hierarchy

        fetcher = DO_Fetcher()
        parent = fetcher.get_term("DOID:162")  # cancer
        children = fetcher.get_children("DOID:162")
        graph = build_disease_graph_with_hierarchy(parent, children)
        print(graph)
        # KnowledgeGraph(name='DiseaseOntologyGraph', nodes=48, edges=47)
        ```
    """
    # First build graphs from both datasets
    parent_graph = build_disease_graph(
        parent_data, name=name, include_xrefs=include_xrefs
    )
    children_graph = build_disease_graph(
        children_data, name=name, include_xrefs=include_xrefs
    )

    # Merge them
    graph = parent_graph.merge(children_graph)
    graph.name = name
    graph.description = "Knowledge graph from Disease Ontology with hierarchy"

    # Create IS_A edges from each child to each parent
    parent_ids = {term.doid for term in parent_data.terms}
    child_ids = {term.doid for term in children_data.terms}

    for child_id in child_ids:
        for parent_id in parent_ids:
            if child_id != parent_id:  # No self-loops
                edge = Edge(
                    source=child_id,
                    target=parent_id,
                    relation=EdgeType.IS_A,
                )
                graph.add_edge(edge)

    return graph
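
The nested loop in the listing above links every child to every parent. The standalone sketch below reproduces only that pairing logic on plain strings, so the edge count is easy to predict; `is_a_pairs` is a hypothetical helper and the IDs are made up.

```python
from typing import Iterable, List, Tuple


def is_a_pairs(
    child_ids: Iterable[str], parent_ids: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return sorted (child, parent) pairs, skipping self-loops."""
    return sorted(
        (child, parent)
        for child in set(child_ids)
        for parent in set(parent_ids)
        if child != parent
    )


# One parent and 47 children, matching the counts in the example output.
one_parent = is_a_pairs([f"child:{i}" for i in range(47)], ["parent:0"])

# Two parents: every child is linked to both.
two_parents = is_a_pairs(["c1", "c2", "c3"], ["p1", "p2"])

# A term present in both inputs never points at itself.
overlap = is_a_pairs(["p1", "c1"], ["p1"])
```

With several terms in parent_data, the result is a full cross product, so this builder is best suited to a single parent and its direct children.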

build_go_graph¶

build_go_graph(
    data: "QuickGOFetchedData",
    name: str = "GeneOntologyGraph",
    include_evidence: bool = True,
    create_annotation_edges: bool = True,
) -> KnowledgeGraph

Build a knowledge graph from Gene Ontology (QuickGO) data.

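Creates nodes for GO terms and for the annotated gene products (typed as proteins when the ID has a UniProt prefix, otherwise as genes). When create_annotation_edges is True, each annotation becomes an edge from the gene product to its GO term.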

Parameters:

Name	Type	Description	Default
`data`	`'QuickGOFetchedData'`	QuickGOFetchedData from QuickGO fetcher.	required
`name`	`str`	Name for the graph.	`'GeneOntologyGraph'`
`include_evidence`	`bool`	Include evidence codes in edge properties.	`True`
`create_annotation_edges`	`bool`	Create edges between gene products and GO terms.	`True`

Returns:

Type	Description
`KnowledgeGraph`	A KnowledgeGraph with GO term and gene product nodes.

Example

from biodbs.fetch import QuickGO_Fetcher
from biodbs.graph import build_go_graph

fetcher = QuickGO_Fetcher()
data = fetcher.get(category="annotation", endpoint="search", geneProductId="UniProtKB:P04637")
graph = build_go_graph(data)
print(graph)
# KnowledgeGraph(name='GeneOntologyGraph', nodes=25, edges=24)

Source code in biodbs/_funcs/graph/builders.py

def build_go_graph(
    data: "QuickGOFetchedData",
    name: str = "GeneOntologyGraph",
    include_evidence: bool = True,
    create_annotation_edges: bool = True,
) -> KnowledgeGraph:
    """Build a knowledge graph from Gene Ontology (QuickGO) data.

    Creates nodes for GO terms and optionally for gene products,
    with edges representing annotations and ontology relationships.

    Args:
        data: QuickGOFetchedData from QuickGO fetcher.
        name: Name for the graph.
        include_evidence: Include evidence codes in edge properties.
        create_annotation_edges: Create edges between gene products and GO terms.

    Returns:
        A KnowledgeGraph with GO term and gene nodes.

    Example:
        ```python
        from biodbs.fetch import QuickGO_Fetcher
        from biodbs.graph import build_go_graph

        fetcher = QuickGO_Fetcher()
        data = fetcher.get(category="annotation", endpoint="search", geneProductId="UniProtKB:P04637")
        graph = build_go_graph(data)
        print(graph)
        # KnowledgeGraph(name='GeneOntologyGraph', nodes=25, edges=24)
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from Gene Ontology",
        source=DataSource.GENE_ONTOLOGY,
    )

    go_terms: Dict[str, Dict[str, Any]] = {}  # GO ID -> term info
    gene_products: Set[str] = set()  # gene product IDs
    annotations: List[Tuple[str, str, str, Optional[str]]] = []  # (gene, go, relation, evidence)

    for result in data.results:
        # Extract GO term info
        go_id = result.get("goId") or result.get("go_id")
        go_name = result.get("goName") or result.get("go_name") or result.get("name", "")
        go_aspect = result.get("goAspect") or result.get("aspect")

        if go_id:
            if go_id not in go_terms:
                go_terms[go_id] = {
                    "name": go_name,
                    "aspect": go_aspect,
                }

            # Extract gene product info
            gene_id = result.get("geneProductId") or result.get("db_object_id")
            if gene_id:
                gene_products.add(gene_id)

                # Determine relation type from qualifier
                qualifier = result.get("qualifier") or result.get("goEvidence")
                if qualifier:
                    if "NOT" in str(qualifier).upper():
                        relation = EdgeType.NEGATIVELY_REGULATES
                    elif "part_of" in str(qualifier).lower():
                        relation = EdgeType.PART_OF
                    elif "regulates" in str(qualifier).lower():
                        relation = EdgeType.REGULATES
                    else:
                        relation = EdgeType.ASSOCIATED_WITH
                else:
                    relation = EdgeType.ASSOCIATED_WITH

                evidence = result.get("evidenceCode") or result.get("evidence_code")
                annotations.append((gene_id, go_id, relation.value, evidence))

    # Create GO term nodes
    for go_id, info in go_terms.items():
        properties: Dict[str, Any] = {}
        if info.get("aspect"):
            properties["aspect"] = info["aspect"]

        node = Node(
            id=go_id,
            label=info.get("name", go_id),
            node_type=NodeType.GO_TERM,
            source=DataSource.GENE_ONTOLOGY,
            properties=frozenset(properties.items()) if properties else frozenset(),
        )
        graph.add_node(node)

    # Create gene product nodes
    for gene_id in gene_products:
        # Try to determine if it's a protein (UniProt) or gene
        if gene_id.startswith("UniProtKB:") or gene_id.startswith("UniProt:"):
            node_type = NodeType.PROTEIN
        else:
            node_type = NodeType.GENE

        node = Node(
            id=gene_id,
            label=gene_id.split(":")[-1] if ":" in gene_id else gene_id,
            node_type=node_type,
            source=DataSource.GENE_ONTOLOGY,
        )
        graph.add_node(node)

    # Create annotation edges
    if create_annotation_edges:
        for gene_id, go_id, relation_str, evidence in annotations:
            try:
                relation = EdgeType(relation_str)
            except ValueError:
                relation = EdgeType.ASSOCIATED_WITH

            evidence_set = frozenset([evidence]) if evidence and include_evidence else frozenset()

            edge = Edge(
                source=gene_id,
                target=go_id,
                relation=relation,
                evidence=evidence_set,
            )
            graph.add_edge(edge)

    return graph
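
The qualifier branch in the listing above can be exercised in isolation. The following is a minimal sketch rather than the library's code: `EdgeType` is a local stand-in whose member values are assumptions, and `relation_from_qualifier` is a hypothetical helper name.

```python
from enum import Enum
from typing import Optional


class EdgeType(str, Enum):
    """Local stand-in for biodbs.graph.EdgeType (member values are assumed)."""

    ASSOCIATED_WITH = "associated_with"
    PART_OF = "part_of"
    REGULATES = "regulates"
    NEGATIVELY_REGULATES = "negatively_regulates"


def relation_from_qualifier(qualifier: Optional[str]) -> EdgeType:
    """Apply the same substring checks, in the same order, as the listing."""
    if not qualifier:
        return EdgeType.ASSOCIATED_WITH
    text = str(qualifier)
    if "NOT" in text.upper():
        return EdgeType.NEGATIVELY_REGULATES
    if "part_of" in text.lower():
        return EdgeType.PART_OF
    if "regulates" in text.lower():
        return EdgeType.REGULATES
    return EdgeType.ASSOCIATED_WITH


for q in ["part_of", "NOT|involved_in", "positively_regulates", "enables", None]:
    print(q, "->", relation_from_qualifier(q).value)
```

Because the checks are ordered substring tests, a qualifier such as `positively_regulates` maps to `REGULATES`, and any qualifier containing `NOT` (in any letter case) takes the first branch regardless of what follows.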

build_reactome_graph¶

build_reactome_graph ¶

build_reactome_graph(
    data: "ReactomeFetchedData",
    name: str = "ReactomeGraph",
    include_species: bool = True,
    include_disease_info: bool = True,
) -> KnowledgeGraph

Build a knowledge graph from Reactome pathway data.

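Creates one node per pathway in `data.pathways`, with the enrichment statistics stored as node properties. No edges are created; use `build_reactome_hierarchy_graph` for parent-child relationships.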

Parameters:

Name	Type	Description	Default
`data`	`'ReactomeFetchedData'`	ReactomeFetchedData from Reactome fetcher.	required
`name`	`str`	Name for the graph.	`'ReactomeGraph'`
`include_species`	`bool`	Include species info in node properties.	`True`
`include_disease_info`	`bool`	Include the `is_disease_pathway` and `is_lowest_level` flags in node properties.	`True`

Returns:

Type	Description
`KnowledgeGraph`	A KnowledgeGraph with pathway nodes.

Example

from biodbs.fetch import Reactome_Fetcher
from biodbs.graph import build_reactome_graph

fetcher = Reactome_Fetcher()
data = fetcher.analyze(["TP53", "BRCA1", "BRCA2"])
graph = build_reactome_graph(data)
print(graph)
# KnowledgeGraph(name='ReactomeGraph', nodes=42, edges=0)

Source code in biodbs/_funcs/graph/builders.py

def build_reactome_graph(
    data: "ReactomeFetchedData",
    name: str = "ReactomeGraph",
    include_species: bool = True,
    include_disease_info: bool = True,
) -> KnowledgeGraph:
    """Build a knowledge graph from Reactome pathway data.

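    Creates one node per pathway in ``data.pathways``, with the enrichment
    statistics stored as node properties. No edges are created; use
    ``build_reactome_hierarchy_graph`` for parent-child relationships.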

    Args:
        data: ReactomeFetchedData from Reactome fetcher.
        name: Name for the graph.
        include_species: Include species info in node properties.
        include_disease_info: Include the is_disease_pathway and
            is_lowest_level flags in node properties.

    Returns:
        A KnowledgeGraph with pathway nodes.

    Example:
        ```python
        from biodbs.fetch import Reactome_Fetcher
        from biodbs.graph import build_reactome_graph

        fetcher = Reactome_Fetcher()
        data = fetcher.analyze(["TP53", "BRCA1", "BRCA2"])
        graph = build_reactome_graph(data)
        print(graph)
        # KnowledgeGraph(name='ReactomeGraph', nodes=42, edges=0)
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from Reactome pathways",
        source=DataSource.REACTOME,
    )

    for pathway in data.pathways:
        properties: Dict[str, Any] = {}

        # Add statistics
        if pathway.p_value is not None:
            properties["p_value"] = pathway.p_value
        if pathway.fdr is not None:
            properties["fdr"] = pathway.fdr
        if pathway.found_entities is not None:
            properties["found_entities"] = pathway.found_entities
        if pathway.total_entities is not None:
            properties["total_entities"] = pathway.total_entities

        # Add species info
        if include_species and pathway.species:
            properties["species"] = pathway.species.name
            properties["taxon_id"] = pathway.species.taxId

        # Add disease info
        if include_disease_info:
            properties["is_disease_pathway"] = pathway.inDisease
            properties["is_lowest_level"] = pathway.llp

        # Add database ID
        properties["db_id"] = pathway.dbId

        node = Node(
            id=pathway.stId,
            label=pathway.name,
            node_type=NodeType.PATHWAY,
            source=DataSource.REACTOME,
            properties=frozenset(properties.items()),
        )
        graph.add_node(node)

    return graph
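
The listing above adds a statistic to the node properties only when it is present, then freezes the dictionary. A minimal sketch of that pattern follows, assuming a hypothetical `PathwayStub` dataclass in place of the real pathway objects; the ID and values are placeholders, not Reactome data.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PathwayStub:
    """Hypothetical stand-in for one entry of data.pathways."""

    stId: str
    name: str
    p_value: Optional[float] = None
    fdr: Optional[float] = None


def pathway_properties(pathway: PathwayStub) -> frozenset:
    """Collect only the statistics that are set, then freeze them."""
    props: Dict[str, Any] = {}
    if pathway.p_value is not None:
        props["p_value"] = pathway.p_value
    if pathway.fdr is not None:
        props["fdr"] = pathway.fdr
    return frozenset(props.items())


frozen = pathway_properties(PathwayStub("R-HSA-0000000", "Example pathway", p_value=0.001))
recovered = dict(frozen)  # a frozenset of (key, value) pairs converts back to a dict
print(recovered)
```

Freezing the items makes the properties hashable, which is presumably what lets `Node` stay immutable; `dict(frozen)` recovers an ordinary dictionary when one is needed.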

build_reactome_hierarchy_graph¶

build_reactome_hierarchy_graph ¶

build_reactome_hierarchy_graph(
    hierarchy_data: List[Dict[str, Any]],
    name: str = "ReactomeHierarchyGraph",
) -> KnowledgeGraph

Build a knowledge graph from Reactome hierarchy data.

Creates nodes for pathways and edges for parent-child relationships.

Parameters:

Name	Type	Description	Default
`hierarchy_data`	`List[Dict[str, Any]]`	List of pathway hierarchy dictionaries from Reactome's events hierarchy endpoint.	required
`name`	`str`	Name for the graph.	`'ReactomeHierarchyGraph'`

Returns:

Type	Description
`KnowledgeGraph`	A KnowledgeGraph with pathway nodes and hierarchy edges.

Source code in biodbs/_funcs/graph/builders.py

def build_reactome_hierarchy_graph(
    hierarchy_data: List[Dict[str, Any]],
    name: str = "ReactomeHierarchyGraph",
) -> KnowledgeGraph:
    """Build a knowledge graph from Reactome hierarchy data.

    Creates nodes for pathways and edges for parent-child relationships.

    Args:
        hierarchy_data: List of pathway hierarchy dictionaries from
            Reactome's events hierarchy endpoint.
        name: Name for the graph.

    Returns:
        A KnowledgeGraph with pathway nodes and hierarchy edges.
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from Reactome pathway hierarchy",
        source=DataSource.REACTOME,
    )

    def process_node(node_data: Dict[str, Any], parent_id: Optional[str] = None):
        """Recursively process hierarchy nodes."""
        st_id = node_data.get("stId", "")
        name = node_data.get("name", node_data.get("displayName", ""))

        if not st_id:
            return

        properties: Dict[str, Any] = {}
        if node_data.get("hasDiagram"):
            properties["has_diagram"] = True
        if node_data.get("species"):
            properties["species"] = node_data["species"]

        node = Node(
            id=st_id,
            label=name,
            node_type=NodeType.PATHWAY,
            source=DataSource.REACTOME,
            properties=frozenset(properties.items()) if properties else frozenset(),
        )
        graph.add_node(node)

        # Create edge to parent
        if parent_id and parent_id in graph:
            edge = Edge(
                source=st_id,
                target=parent_id,
                relation=EdgeType.PART_OF,
            )
            graph.add_edge(edge)

        # Process children
        children = node_data.get("children", [])
        for child in children:
            process_node(child, st_id)

    # Process top-level nodes
    for node_data in hierarchy_data:
        process_node(node_data)

    return graph
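
The recursion above can be illustrated without the library. This sketch assumes input dictionaries with the keys the function reads (`stId`, `name`, `children`); the IDs are made up, and `collect_part_of` is a hypothetical helper that returns `(child, parent)` pairs instead of building `Edge` objects.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical hierarchy in the shape the builder reads.
hierarchy_data: List[Dict[str, Any]] = [
    {
        "stId": "R-TOP",
        "name": "Top-level pathway",
        "children": [
            {"stId": "R-CHILD-1", "name": "Child one"},
            {
                "stId": "R-CHILD-2",
                "name": "Child two",
                "children": [{"stId": "R-LEAF", "name": "Leaf"}],
            },
            {"name": "Entry without stId"},
        ],
    }
]


def collect_part_of(
    nodes: List[Dict[str, Any]],
    parent_id: Optional[str] = None,
    edges: Optional[List[Tuple[str, str]]] = None,
) -> List[Tuple[str, str]]:
    """Walk the hierarchy depth-first and record child -> parent pairs."""
    if edges is None:
        edges = []
    for node in nodes:
        st_id = node.get("stId", "")
        if not st_id:
            continue  # entries without an ID are skipped, along with their children
        if parent_id:
            edges.append((st_id, parent_id))
        collect_part_of(node.get("children", []), st_id, edges)
    return edges


edges = collect_part_of(hierarchy_data)
print(edges)
```

Edges point from child to parent, matching the `PART_OF` direction in the listing. An entry without `stId` is dropped together with its whole subtree.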

build_kegg_graph¶

build_kegg_graph ¶

build_kegg_graph(
    data: "KEGGFetchedData",
    name: str = "KEGGGraph",
    node_type: Optional[NodeType] = None,
) -> KnowledgeGraph

Build a knowledge graph from KEGG data.

Creates one node per KEGG record. Unless `node_type` is given, the node type is inferred from the entry ID of the first record when the data comes from a `list` operation; for any other operation it falls back to `NodeType.OTHER`.

Parameters:

Name	Type	Description	Default
`data`	`'KEGGFetchedData'`	KEGGFetchedData from KEGG fetcher.	required
`name`	`str`	Name for the graph.	`'KEGGGraph'`
`node_type`	`Optional[NodeType]`	Override the inferred node type.	`None`

Returns:

Type	Description
`KnowledgeGraph`	A KnowledgeGraph with KEGG nodes.

Example

from biodbs.fetch import kegg_list
from biodbs.graph import build_kegg_graph

data = kegg_list("pathway", organism="hsa")
graph = build_kegg_graph(data, name="HumanPathways")
print(graph)
# KnowledgeGraph(name='HumanPathways', nodes=350, edges=0)

Source code in biodbs/_funcs/graph/builders.py

def build_kegg_graph(
    data: "KEGGFetchedData",
    name: str = "KEGGGraph",
    node_type: Optional[NodeType] = None,
) -> KnowledgeGraph:
    """Build a knowledge graph from KEGG data.

    Creates one node per KEGG record. Unless ``node_type`` is given, the
    node type is inferred from the entry ID of the first record when the
    data comes from a ``list`` operation; for any other operation it
    falls back to ``NodeType.OTHER``.

    Args:
        data: KEGGFetchedData from KEGG fetcher.
        name: Name for the graph.
        node_type: Override the inferred node type.

    Returns:
        A KnowledgeGraph with KEGG nodes.

    Example:
        ```python
        from biodbs.fetch import kegg_list
        from biodbs.graph import build_kegg_graph

        data = kegg_list("pathway", organism="hsa")
        graph = build_kegg_graph(data, name="HumanPathways")
        print(graph)
        # KnowledgeGraph(name='HumanPathways', nodes=350, edges=0)
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from KEGG database",
        source=DataSource.KEGG,
    )

    # Infer node type from operation or entry IDs
    inferred_type = node_type
    if inferred_type is None:
        if data.operation == "list":
            # Try to infer from first record
            if data.records:
                entry_id = data.records[0].get("entry_id", "")
                inferred_type = _infer_kegg_node_type(entry_id)
        else:
            inferred_type = NodeType.OTHER

    if inferred_type is None:
        inferred_type = NodeType.OTHER

    for record in data.records:
        entry_id = record.get("entry_id") or record.get("ENTRY", "")
        description = record.get("description") or record.get("NAME", "")

        if not entry_id:
            continue

        # For flat file records, extract more properties
        properties: Dict[str, Any] = {}
        if data.format == "flat_file":
            for key in ["DEFINITION", "PATHWAY", "MODULE", "DISEASE", "DBLINKS"]:
                if key in record:
                    value = record[key]
                    # Convert lists to tuples so they are hashable for frozenset
                    properties[key.lower()] = tuple(value) if isinstance(value, list) else value

        # Create node
        node = Node(
            id=entry_id,
            label=description if description else entry_id,
            node_type=inferred_type,
            source=DataSource.KEGG,
            properties=frozenset(properties.items()) if properties else frozenset(),
        )
        graph.add_node(node)

    return graph
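
The list-to-tuple conversion in the flat-file branch exists because `frozenset` requires hashable members. The sketch below shows the failure it avoids; the record is an illustrative dictionary, not the output of a real KEGG query.

```python
from typing import Any, Dict

# Illustrative flat-file style record with one list-valued field.
record: Dict[str, Any] = {
    "ENTRY": "hsa00010",
    "NAME": "Glycolysis / Gluconeogenesis",
    "MODULE": ["M00001", "M00002"],
}

raw = {"module": record["MODULE"]}
try:
    frozenset(raw.items())  # the (key, list) pair cannot be hashed
    raised = False
except TypeError:
    raised = True

# Same conversion as the listing: lists become tuples, other values pass through.
safe = {k: tuple(v) if isinstance(v, list) else v for k, v in raw.items()}
frozen = frozenset(safe.items())
print(raised, dict(frozen))
```

Only lists are converted. Other unhashable values, such as nested dictionaries, would still raise `TypeError` and need their own handling.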

build_kegg_link_graph¶

build_kegg_link_graph ¶

build_kegg_link_graph(
    link_data: "KEGGFetchedData",
    source_type: NodeType = NodeType.GENE,
    target_type: NodeType = NodeType.PATHWAY,
    relation: EdgeType = EdgeType.PARTICIPATES_IN,
    name: str = "KEGGLinkGraph",
) -> KnowledgeGraph

Build a knowledge graph from KEGG link data.

Creates nodes and edges from KEGG link query results.

Parameters:

Name	Type	Description	Default
`link_data`	`'KEGGFetchedData'`	KEGGFetchedData from kegg_link operation.	required
`source_type`	`NodeType`	Node type for source entries.	`NodeType.GENE`
`target_type`	`NodeType`	Node type for target entries.	`NodeType.PATHWAY`
`relation`	`EdgeType`	Edge type for the links.	`EdgeType.PARTICIPATES_IN`
`name`	`str`	Name for the graph.	`'KEGGLinkGraph'`

Returns:

Type	Description
`KnowledgeGraph`	A KnowledgeGraph with nodes and edges from link data.

Example

from biodbs.fetch import kegg_link
from biodbs.graph import build_kegg_link_graph, NodeType

data = kegg_link("pathway", "hsa")  # genes to pathways
graph = build_kegg_link_graph(
    data,
    source_type=NodeType.GENE,
    target_type=NodeType.PATHWAY,
)
print(graph)
# KnowledgeGraph(name='KEGGLinkGraph', nodes=8500, edges=42000)

Source code in biodbs/_funcs/graph/builders.py

def build_kegg_link_graph(
    link_data: "KEGGFetchedData",
    source_type: NodeType = NodeType.GENE,
    target_type: NodeType = NodeType.PATHWAY,
    relation: EdgeType = EdgeType.PARTICIPATES_IN,
    name: str = "KEGGLinkGraph",
) -> KnowledgeGraph:
    """Build a knowledge graph from KEGG link data.

    Creates nodes and edges from KEGG link query results.

    Args:
        link_data: KEGGFetchedData from kegg_link operation.
        source_type: Node type for source entries.
        target_type: Node type for target entries.
        relation: Edge type for the links.
        name: Name for the graph.

    Returns:
        A KnowledgeGraph with nodes and edges from link data.

    Example:
        ```python
        from biodbs.fetch import kegg_link
        from biodbs.graph import build_kegg_link_graph, NodeType

        data = kegg_link("pathway", "hsa")  # genes to pathways
        graph = build_kegg_link_graph(
            data,
            source_type=NodeType.GENE,
            target_type=NodeType.PATHWAY,
        )
        print(graph)
        # KnowledgeGraph(name='KEGGLinkGraph', nodes=8500, edges=42000)
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from KEGG links",
        source=DataSource.KEGG,
    )

    sources: Set[str] = set()
    targets: Set[str] = set()

    for record in link_data.records:
        source_id = record.get("source_id", "")
        target_id = record.get("target_id", "")

        if source_id and target_id:
            sources.add(source_id)
            targets.add(target_id)

    # Create source nodes
    for source_id in sources:
        node = Node(
            id=source_id,
            label=source_id.split(":")[-1] if ":" in source_id else source_id,
            node_type=source_type,
            source=DataSource.KEGG,
        )
        graph.add_node(node)

    # Create target nodes
    for target_id in targets:
        node = Node(
            id=target_id,
            label=target_id.split(":")[-1] if ":" in target_id else target_id,
            node_type=target_type,
            source=DataSource.KEGG,
        )
        graph.add_node(node)

    # Create edges
    for record in link_data.records:
        source_id = record.get("source_id", "")
        target_id = record.get("target_id", "")

        if source_id and target_id:
            edge = Edge(
                source=source_id,
                target=target_id,
                relation=relation,
            )
            graph.add_edge(edge)

    return graph
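
The two passes over `link_data.records` can be sketched with plain dictionaries. The record keys (`source_id`, `target_id`) come from the listing above; the records themselves are illustrative, and `short_label` is a hypothetical name for the inline label expression.

```python
from typing import Dict, List, Set, Tuple

# Illustrative link records; the last one is incomplete and is skipped.
records: List[Dict[str, str]] = [
    {"source_id": "hsa:7157", "target_id": "path:hsa04115"},
    {"source_id": "hsa:7157", "target_id": "path:hsa04110"},
    {"source_id": "hsa:672", "target_id": "path:hsa03440"},
    {"source_id": "", "target_id": "path:hsa00010"},
]


def short_label(entry_id: str) -> str:
    """Drop the database prefix, as the node label expression does."""
    return entry_id.split(":")[-1] if ":" in entry_id else entry_id


complete = [r for r in records if r.get("source_id") and r.get("target_id")]
sources: Set[str] = {r["source_id"] for r in complete}
targets: Set[str] = {r["target_id"] for r in complete}
edges: List[Tuple[str, str]] = [(r["source_id"], r["target_id"]) for r in complete]

print(sorted(short_label(s) for s in sources))
print(len(targets), len(edges))
```

Collecting the IDs into sets first means each entry yields one node even when it appears in many links, while every complete record still contributes its own edge.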

merge_graphs¶

merge_graphs ¶

merge_graphs(
    *graphs: KnowledgeGraph,
    name: str = "MergedGraph",
    description: Optional[str] = None,
) -> KnowledgeGraph

Merge multiple knowledge graphs into one.

Combines all nodes and edges from the input graphs. When several graphs contain a node with the same ID, the first occurrence is kept. Edges with the same source, target, and relation are deduplicated. The merged graph takes its `source` from the first input graph.

Parameters:

Name	Type	Description	Default
`*graphs`	`KnowledgeGraph`	Variable number of KnowledgeGraph instances to merge.	`()`
`name`	`str`	Name for the merged graph.	`'MergedGraph'`
`description`	`Optional[str]`	Optional description for the merged graph.	`None`

Returns:

Type	Description
`KnowledgeGraph`	A new KnowledgeGraph containing all nodes and edges.

Example

from biodbs.graph import merge_graphs, build_disease_graph, build_go_graph

disease_graph = build_disease_graph(disease_data)
go_graph = build_go_graph(go_data)
merged = merge_graphs(disease_graph, go_graph, name="BioGraph")
print(merged)
# KnowledgeGraph(name='BioGraph', nodes=72, edges=24)

Source code in biodbs/_funcs/graph/builders.py

def merge_graphs(
    *graphs: KnowledgeGraph,
    name: str = "MergedGraph",
    description: Optional[str] = None,
) -> KnowledgeGraph:
    """Merge multiple knowledge graphs into one.

    Combines all nodes and edges from the input graphs. Duplicate nodes
    (same ID) are kept as-is (first occurrence wins). Duplicate edges
    (same source, target, relation) are deduplicated.

    Args:
        *graphs: Variable number of KnowledgeGraph instances to merge.
        name: Name for the merged graph.
        description: Optional description for the merged graph.

    Returns:
        A new KnowledgeGraph containing all nodes and edges.

    Example:
        ```python
        from biodbs.graph import merge_graphs, build_disease_graph, build_go_graph

        disease_graph = build_disease_graph(disease_data)
        go_graph = build_go_graph(go_data)
        merged = merge_graphs(disease_graph, go_graph, name="BioGraph")
        print(merged)
        # KnowledgeGraph(name='BioGraph', nodes=72, edges=24)
        ```
    """
    if not graphs:
        return KnowledgeGraph(name=name, description=description)

    if len(graphs) == 1:
        # Just copy the single graph
        merged = KnowledgeGraph(
            name=name,
            description=description or graphs[0].description,
            source=graphs[0].source,
        )
        merged.add_nodes(graphs[0].nodes)
        merged.add_edges(graphs[0].edges)
        return merged

    # Start with first graph as base
    merged = KnowledgeGraph(
        name=name,
        description=description or f"Merged from {len(graphs)} graphs",
        source=graphs[0].source,
    )

    # Add all nodes and edges from each graph
    for graph in graphs:
        merged.add_nodes(graph.nodes)
        merged.add_edges(graph.edges)

    return merged
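
The documented merge semantics (first node wins, duplicate edges collapse) can be illustrated with plain containers. This is a sketch of the described behavior only: the real deduplication happens inside `KnowledgeGraph.add_nodes` and `KnowledgeGraph.add_edges`, which are not shown in this section, and `merge_plain` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple

NodeMap = Dict[str, str]          # node ID -> label
EdgeKey = Tuple[str, str, str]    # (source, target, relation)


def merge_plain(
    node_maps: List[NodeMap], edge_lists: List[List[EdgeKey]]
) -> Tuple[NodeMap, List[EdgeKey]]:
    """Merge nodes (first occurrence wins) and edges (deduplicated, order kept)."""
    nodes: NodeMap = {}
    edges: List[EdgeKey] = []
    seen: Set[EdgeKey] = set()
    for node_map, edge_list in zip(node_maps, edge_lists):
        for node_id, label in node_map.items():
            nodes.setdefault(node_id, label)  # keep the label seen first
        for edge in edge_list:
            if edge not in seen:
                seen.add(edge)
                edges.append(edge)
    return nodes, edges


nodes, edges = merge_plain(
    [{"A": "first label", "B": "b"}, {"A": "second label", "C": "c"}],
    [[("A", "B", "is_a")], [("A", "B", "is_a"), ("C", "A", "part_of")]],
)
print(nodes["A"], len(nodes), len(edges))
```

Argument order matters under these semantics: if two graphs describe the same node differently, the version from the graph passed first is the one retained.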

Export Functions¶

to_networkx¶

to_networkx ¶

to_networkx(
    graph: KnowledgeGraph,
    include_properties: bool = True,
    include_xrefs: bool = True,
) -> "nx.DiGraph"

Export a KnowledgeGraph to a NetworkX directed graph.

Requires the networkx package to be installed.

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The KnowledgeGraph to export.	required
`include_properties`	`bool`	Include node/edge properties as attributes.	`True`
`include_xrefs`	`bool`	Include cross-references as node attributes.	`True`

Returns:

Type	Description
`'nx.DiGraph'`	A NetworkX DiGraph with the same nodes and edges.

Raises:

Type	Description
`ImportError`	If networkx is not installed.

Example

from biodbs.graph import to_networkx, build_disease_graph
import networkx as nx

graph = build_disease_graph(disease_data)
nx_graph = to_networkx(graph)
centrality = nx.degree_centrality(nx_graph)
print(f"Nodes: {nx_graph.number_of_nodes()}")
# Nodes: 47

Source code in biodbs/_funcs/graph/exporters.py

def to_networkx(
    graph: KnowledgeGraph,
    include_properties: bool = True,
    include_xrefs: bool = True,
) -> "nx.DiGraph":
    """Export a KnowledgeGraph to a NetworkX directed graph.

    Requires the networkx package to be installed.

    Args:
        graph: The KnowledgeGraph to export.
        include_properties: Include node/edge properties as attributes.
        include_xrefs: Include cross-references as node attributes.

    Returns:
        A NetworkX DiGraph with the same nodes and edges.

    Raises:
        ImportError: If networkx is not installed.

    Example:
        ```python
        from biodbs.graph import to_networkx, build_disease_graph
        import networkx as nx

        graph = build_disease_graph(disease_data)
        nx_graph = to_networkx(graph)
        centrality = nx.degree_centrality(nx_graph)
        print(f"Nodes: {nx_graph.number_of_nodes()}")
        # Nodes: 47
        ```
    """
    try:
        import networkx as nx
    except ImportError:
        raise ImportError(
            "networkx is required for to_networkx(). "
            "Install it with: pip install networkx"
        )

    G = nx.DiGraph()
    G.graph["name"] = graph.name
    G.graph["source"] = graph.source.value
    if graph.description:
        G.graph["description"] = graph.description

    # Add nodes
    for node in graph.nodes:
        attrs: Dict[str, Any] = {
            "label": node.label,
            "node_type": node.node_type.value,
            "source": node.source.value,
        }

        if include_properties:
            attrs.update(node.get_properties_dict())

        if include_xrefs and node.xrefs:
            attrs["xrefs"] = list(node.xrefs)

        G.add_node(node.id, **attrs)

    # Add edges
    for edge in graph.edges:
        attrs: Dict[str, Any] = {
            "relation": edge.relation.value,
            "weight": edge.weight,
        }

        if include_properties:
            attrs.update(edge.get_properties_dict())

        if edge.evidence:
            attrs["evidence"] = list(edge.evidence)

        G.add_edge(edge.source, edge.target, **attrs)

    return G
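
One consequence of the listing above is worth noting: with `include_properties=True`, node properties are merged into the attribute dictionary with `dict.update`, so a property whose key matches a built-in attribute (`label`, `node_type`, `source`) replaces it. The sketch below uses made-up values to show the effect.

```python
from typing import Any, Dict, List

# Built-in attributes, as assembled for each node in the listing.
base_attrs: Dict[str, Any] = {"label": "TP53", "node_type": "gene", "source": "kegg"}

# Hypothetical node properties; "source" collides with a built-in attribute.
properties: Dict[str, Any] = {"source": "curated", "score": 0.9}

attrs = dict(base_attrs)
attrs.update(properties)  # later keys overwrite earlier ones

collisions: List[str] = sorted(set(base_attrs) & set(properties))
print(collisions, attrs["source"])
```

If the original values must be preserved, one option is to check for such collisions before exporting, or to pass `include_properties=False` and attach the properties separately.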

to_json_ld¶

to_json_ld ¶

to_json_ld(
    graph: KnowledgeGraph,
    include_context: bool = True,
    compact: bool = False,
    base_uri: Optional[str] = None,
) -> Dict[str, Any]

Export a KnowledgeGraph to JSON-LD format.

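JSON-LD works well for KG-RAG (Knowledge Graph Retrieval-Augmented Generation) applications because it provides structured, semantically annotated data that can be serialized directly into LLM prompts. Each node becomes an entry in `@graph`, with its outgoing edges embedded as properties keyed by relation type.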

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The KnowledgeGraph to export.	required
`include_context`	`bool`	Include JSON-LD @context.	`True`
`compact`	`bool`	Use compact representation (less verbose).	`False`
`base_uri`	`Optional[str]`	Base URI for node IDs.	`None`

Returns:

Type	Description
`Dict[str, Any]`	A dictionary in JSON-LD format.

Example

from biodbs.graph import to_json_ld, build_disease_graph
import json

graph = build_disease_graph(disease_data)
json_ld = to_json_ld(graph)
# Use in RAG pipeline
context = json.dumps(json_ld, indent=2)
print(json_ld["@type"])
# schema:Dataset

Source code in biodbs/_funcs/graph/exporters.py

def to_json_ld(
    graph: KnowledgeGraph,
    include_context: bool = True,
    compact: bool = False,
    base_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Export a KnowledgeGraph to JSON-LD format.

    JSON-LD is ideal for KG-RAG (Knowledge Graph Retrieval Augmented
    Generation) applications as it provides structured, semantically
    rich data that can be easily processed by LLMs.

    Args:
        graph: The KnowledgeGraph to export.
        include_context: Include JSON-LD @context.
        compact: Use compact representation (less verbose).
        base_uri: Base URI for node IDs.

    Returns:
        A dictionary in JSON-LD format.

    Example:
        ```python
        from biodbs.graph import to_json_ld, build_disease_graph
        import json

        graph = build_disease_graph(disease_data)
        json_ld = to_json_ld(graph)
        # Use in RAG pipeline
        context = json.dumps(json_ld, indent=2)
        print(json_ld["@type"])
        # schema:Dataset
        ```
    """
    result: Dict[str, Any] = {}

    # Add context
    if include_context:
        result.update(BIOKG_CONTEXT)

    # Graph metadata
    result["@type"] = "schema:Dataset"
    result["schema:name"] = graph.name
    if graph.description:
        result["schema:description"] = graph.description
    result["schema:creator"] = "biodbs"
    result["schema:source"] = graph.source.value

    # Build nodes as @graph
    nodes_list: List[Dict[str, Any]] = []

    for node in graph.nodes:
        node_obj = _node_to_json_ld(node, base_uri, compact)

        # Add outgoing edges as properties
        outgoing = graph.get_outgoing_edges(node.id)
        for edge in outgoing:
            relation_key = _edge_type_to_json_ld_key(edge.relation)
            target_id = _make_uri(edge.target, base_uri)

            if relation_key in node_obj:
                # Multiple edges of same type
                if isinstance(node_obj[relation_key], list):
                    node_obj[relation_key].append(target_id)
                else:
                    node_obj[relation_key] = [node_obj[relation_key], target_id]
            else:
                node_obj[relation_key] = target_id

        nodes_list.append(node_obj)

    result["@graph"] = nodes_list

    return result
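
The edge-embedding loop above stores a single target as a plain value and switches to a list once a second edge of the same relation appears. A minimal sketch of that accumulation follows; the relation keys and IDs are placeholders, and `embed_edges` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def embed_edges(node_obj: Dict[str, Any], edges: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Attach (relation_key, target_id) pairs to a node object."""
    for relation_key, target_id in edges:
        if relation_key in node_obj:
            existing = node_obj[relation_key]
            if isinstance(existing, list):
                existing.append(target_id)
            else:
                # Second edge of this relation: promote the scalar to a list.
                node_obj[relation_key] = [existing, target_id]
        else:
            node_obj[relation_key] = target_id
    return node_obj


node = embed_edges(
    {"@id": "X:1"},
    [("is_a", "X:2"), ("part_of", "X:3"), ("is_a", "X:4"), ("is_a", "X:5")],
)
print(node)
```

Code that consumes the export therefore has to accept both shapes: a relation key may hold either a single string or a list of strings.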

to_rdf¶

to_rdf ¶

to_rdf(
    graph: KnowledgeGraph,
    format: Literal["turtle", "xml", "n3", "nt"] = "turtle",
    base_uri: str = "http://example.org/biokg/",
) -> str

Export a KnowledgeGraph to RDF format.

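Requires the rdflib package to be installed. Each edge is emitted as a single triple: edge weights, evidence, and edge properties are not serialized, and node properties are written only when their values are `str`, `int`, `float`, or `bool`.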

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The KnowledgeGraph to export.	required
`format`	`Literal['turtle', 'xml', 'n3', 'nt']`	RDF serialization format ("turtle", "xml", "n3", "nt").	`'turtle'`
`base_uri`	`str`	Base URI for the graph.	`'http://example.org/biokg/'`

Returns:

Type	Description
`str`	RDF data as a string in the specified format.

Raises:

Type	Description
`ImportError`	If rdflib is not installed.

Example

from biodbs.graph import to_rdf, build_disease_graph

graph = build_disease_graph(disease_data)
turtle = to_rdf(graph, format="turtle")
print(turtle[:200])
# @prefix base: <http://example.org/biokg/> .
# @prefix biokg: <http://example.org/biokg/vocab/> .
# ...

Source code in biodbs/_funcs/graph/exporters.py

def to_rdf(
    graph: KnowledgeGraph,
    format: Literal["turtle", "xml", "n3", "nt"] = "turtle",
    base_uri: str = "http://example.org/biokg/",
) -> str:
    """Export a KnowledgeGraph to RDF format.

    Requires the rdflib package to be installed.

    Args:
        graph: The KnowledgeGraph to export.
        format: RDF serialization format ("turtle", "xml", "n3", "nt").
        base_uri: Base URI for the graph.

    Returns:
        RDF data as a string in the specified format.

    Raises:
        ImportError: If rdflib is not installed.

    Example:
        ```python
        from biodbs.graph import to_rdf, build_disease_graph

        graph = build_disease_graph(disease_data)
        turtle = to_rdf(graph, format="turtle")
        print(turtle[:200])
        # @prefix base: <http://example.org/biokg/> .
        # @prefix biokg: <http://example.org/biokg/vocab/> .
        # ...
        ```
    """
    try:
        from rdflib import Graph as RDFGraph
        from rdflib import Literal, Namespace, URIRef
        from rdflib.namespace import OWL, RDF, RDFS, XSD
    except ImportError:
        raise ImportError(
            "rdflib is required for to_rdf(). "
            "Install it with: pip install rdflib"
        )

    g = RDFGraph()

    # Define namespaces
    BASE = Namespace(base_uri)
    OBO = Namespace("http://purl.obolibrary.org/obo/")
    SCHEMA = Namespace("http://schema.org/")
    BIOKG = Namespace(base_uri + "vocab/")

    g.bind("base", BASE)
    g.bind("obo", OBO)
    g.bind("schema", SCHEMA)
    g.bind("biokg", BIOKG)

    # Add nodes
    for node in graph.nodes:
        node_uri = URIRef(base_uri + node.id.replace(":", "_"))

        # Type
        type_uri = _node_type_to_rdf_type(node.node_type, OBO, SCHEMA)
        g.add((node_uri, RDF.type, type_uri))

        # Label
        g.add((node_uri, RDFS.label, Literal(node.label)))

        # Source
        g.add((node_uri, BIOKG.source, Literal(node.source.value)))

        # Properties
        props = node.get_properties_dict()
        if "definition" in props:
            g.add((node_uri, SCHEMA.description, Literal(props["definition"])))
        for key, value in props.items():
            if key != "definition" and isinstance(value, (str, int, float, bool)):
                g.add((node_uri, BIOKG[key], Literal(value)))

        # Xrefs
        for xref in node.xrefs:
            g.add((node_uri, OBO.hasDbXref, Literal(xref)))

    # Add edges
    for edge in graph.edges:
        source_uri = URIRef(base_uri + edge.source.replace(":", "_"))
        target_uri = URIRef(base_uri + edge.target.replace(":", "_"))
        predicate = _edge_type_to_rdf_predicate(edge.relation, OBO, RDFS, BIOKG)

        g.add((source_uri, predicate, target_uri))

        # Add edge weight if not 1.0
        if edge.weight != 1.0:
            # Create a reified statement for the weight
            pass  # Simplified for now

    return g.serialize(format=format)
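
Two details of the listing above can be checked without rdflib: how node URIs are formed, and which properties survive the scalar filter. The helpers below are hypothetical names for the inline expressions, and the sample IDs and values are illustrative.

```python
from typing import Any, Dict


def node_uri(node_id: str, base_uri: str = "http://example.org/biokg/") -> str:
    """Build the node URI as the listing does: colons become underscores."""
    return base_uri + node_id.replace(":", "_")


def scalar_properties(props: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the properties that are written as biokg:<key> literals."""
    return {
        key: value
        for key, value in props.items()
        if key != "definition" and isinstance(value, (str, int, float, bool))
    }


print(node_uri("DOID:162"))
print(scalar_properties({"definition": "text", "aspect": "P", "pathway": ("a", "b")}))
```

The colon replacement is not reversible: the IDs `A:B` and `A_B` map to the same URI, so graphs mixing both forms would collapse them into one resource. Tuple-valued properties, such as those produced by `build_kegg_graph`, are dropped by the filter.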

to_neo4j_csv¶

to_neo4j_csv ¶

to_neo4j_csv(
    graph: KnowledgeGraph,
    output_dir: Union[str, Path],
    nodes_filename: str = "nodes.csv",
    edges_filename: str = "relationships.csv",
    include_headers: bool = True,
) -> Tuple[Path, Path]

Export a KnowledgeGraph to CSV files for Neo4j import.

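Creates two CSV files: one for nodes and one for relationships. The headers follow the neo4j-admin import conventions (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`), and the files can also be read with LOAD CSV. Array columns (`xrefs`, `evidence`) are joined with `;`, and properties are stored as a JSON string.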

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The KnowledgeGraph to export.	required
`output_dir`	`Union[str, Path]`	Directory to write CSV files.	required
`nodes_filename`	`str`	Filename for nodes CSV.	`'nodes.csv'`
`edges_filename`	`str`	Filename for relationships CSV.	`'relationships.csv'`
`include_headers`	`bool`	Include Neo4j import headers.	`True`

Returns:

Type	Description
`Tuple[Path, Path]`	Tuple of (nodes_path, edges_path).

Example

from biodbs.graph import to_neo4j_csv, build_disease_graph

graph = build_disease_graph(disease_data)
nodes_path, edges_path = to_neo4j_csv(graph, "./neo4j_import/")
print(f"Nodes: {nodes_path}")
# Nodes: neo4j_import/nodes.csv
print(f"Edges: {edges_path}")
# Edges: neo4j_import/relationships.csv

Source code in biodbs/_funcs/graph/exporters.py

def to_neo4j_csv(
    graph: KnowledgeGraph,
    output_dir: Union[str, Path],
    nodes_filename: str = "nodes.csv",
    edges_filename: str = "relationships.csv",
    include_headers: bool = True,
) -> Tuple[Path, Path]:
    """Export a KnowledgeGraph to CSV files for Neo4j import.

    Creates two CSV files: one for nodes and one for relationships,
    formatted for Neo4j's LOAD CSV or neo4j-admin import.

    Args:
        graph: The KnowledgeGraph to export.
        output_dir: Directory to write CSV files.
        nodes_filename: Filename for nodes CSV.
        edges_filename: Filename for relationships CSV.
        include_headers: Include Neo4j import headers.

    Returns:
        Tuple of (nodes_path, edges_path).

    Example:
        ```python
        from biodbs.graph import to_neo4j_csv, build_disease_graph

        graph = build_disease_graph(disease_data)
        nodes_path, edges_path = to_neo4j_csv(graph, "./neo4j_import/")
        print(f"Nodes: {nodes_path}")
        # Nodes: neo4j_import/nodes.csv
        print(f"Edges: {edges_path}")
        # Edges: neo4j_import/relationships.csv
        ```
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    nodes_path = output_dir / nodes_filename
    edges_path = output_dir / edges_filename

    # Write nodes CSV
    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)

        if include_headers:
            # Neo4j import headers
            writer.writerow([
                "id:ID",
                "label",
                "node_type",
                "source",
                "properties:string",
                "xrefs:string[]",
                ":LABEL",
            ])

        for node in graph.nodes:
            # Convert properties to JSON string
            props_json = json.dumps(node.get_properties_dict()) if node.properties else ""

            # Convert xrefs to Neo4j array format
            xrefs_str = ";".join(node.xrefs) if node.xrefs else ""

            # Label for Neo4j (node type as label)
            neo4j_label = node.node_type.value.title().replace("_", "")

            writer.writerow([
                node.id,
                node.label,
                node.node_type.value,
                node.source.value,
                props_json,
                xrefs_str,
                neo4j_label,
            ])

    # Write relationships CSV
    with open(edges_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)

        if include_headers:
            # Neo4j import headers
            writer.writerow([
                ":START_ID",
                ":END_ID",
                "weight:float",
                "evidence:string[]",
                "properties:string",
                ":TYPE",
            ])

        for edge in graph.edges:
            # Convert evidence to Neo4j array format
            evidence_str = ";".join(edge.evidence) if edge.evidence else ""

            # Convert properties to JSON string
            props_json = json.dumps(edge.get_properties_dict()) if edge.properties else ""

            # Relationship type (uppercase with underscores)
            rel_type = edge.relation.value.upper()

            writer.writerow([
                edge.source,
                edge.target,
                edge.weight,
                evidence_str,
                props_json,
                rel_type,
            ])

    return nodes_path, edges_path
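
The node file written above can be read back with the standard library alone. The following is a minimal sketch, not part of biodbs: the helper name `parse_nodes_csv`, the sample row, and its values are hypothetical, and it assumes the file was written with headers enabled and the column names shown in the source.

```python
import csv
import io
import json
from typing import Any, Dict, List


def parse_nodes_csv(text: str) -> List[Dict[str, Any]]:
    """Decode rows that follow the node header layout shown above."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        raw_props = row["properties:string"]
        raw_xrefs = row["xrefs:string[]"]
        rows.append({
            "id": row["id:ID"],
            "label": row["label"],
            # Properties are stored as one JSON string; empty means none.
            "properties": json.loads(raw_props) if raw_props else {},
            # Array values are joined with ";" by the exporter.
            "xrefs": raw_xrefs.split(";") if raw_xrefs else [],
            "neo4j_label": row[":LABEL"],
        })
    return rows


# Hypothetical sample row, built the same way the exporter builds one.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id:ID", "label", "node_type", "source",
                 "properties:string", "xrefs:string[]", ":LABEL"])
writer.writerow(["EX:0001", "example term", "go_term", "example_source",
                 json.dumps({"namespace": "BP"}), "X:1;Y:2",
                 "go_term".title().replace("_", "")])

parsed = parse_nodes_csv(buffer.getvalue())
print(parsed[0]["neo4j_label"], parsed[0]["xrefs"])
```

Note that the `:LABEL` value is derived from the node type by title-casing and removing underscores, so `go_term` becomes `GoTerm`.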

to_cypher¶

to_cypher ¶

to_cypher(
    graph: KnowledgeGraph,
    batch_size: int = 100,
    use_merge: bool = True,
) -> str

Generate Cypher queries to create the graph in Neo4j.

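Emits CREATE or MERGE statements for nodes and relationships. When use_merge is True, the script also begins with a uniqueness constraint on id for each node label used in the graph.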

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The KnowledgeGraph to export.	required
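`batch_size`	`int`	Number of statements per group. In the source below, a blank line is inserted after every batch_size statements; no explicit transaction commands are emitted.	`100`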
`use_merge`	`bool`	Use MERGE instead of CREATE (prevents duplicates).	`True`

Returns:

Type	Description
`str`	Cypher script as a string.

Example

from biodbs.graph import to_cypher, build_disease_graph

graph = build_disease_graph(disease_data)
cypher = to_cypher(graph)
print(cypher[:150])
# // Cypher script generated from KnowledgeGraph: DiseaseOntologyGraph
# // Nodes: 47, Edges: 0
# ...

Source code in biodbs/_funcs/graph/exporters.py

def to_cypher(
    graph: KnowledgeGraph,
    batch_size: int = 100,
    use_merge: bool = True,
) -> str:
    """Generate Cypher queries to create the graph in Neo4j.

    Creates CREATE or MERGE statements for nodes and relationships.

    Args:
        graph: The KnowledgeGraph to export.
        batch_size: Number of statements per transaction.
        use_merge: Use MERGE instead of CREATE (prevents duplicates).

    Returns:
        Cypher script as a string.

    Example:
        ```python
        from biodbs.graph import to_cypher, build_disease_graph

        graph = build_disease_graph(disease_data)
        cypher = to_cypher(graph)
        print(cypher[:150])
        # // Cypher script generated from KnowledgeGraph: DiseaseOntologyGraph
        # // Nodes: 47, Edges: 0
        # ...
        ```
    """
    lines: List[str] = []
    command = "MERGE" if use_merge else "CREATE"

    # Header comment
    lines.append(f"// Cypher script generated from KnowledgeGraph: {graph.name}")
    lines.append(f"// Nodes: {graph.node_count}, Edges: {graph.edge_count}")
    lines.append("")

    # Create constraints for efficient MERGE
    if use_merge:
        node_types_used = {node.node_type for node in graph.nodes}
        for node_type in node_types_used:
            label = node_type.value.title().replace("_", "")
            lines.append(
                f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{label}) "
                f"REQUIRE n.id IS UNIQUE;"
            )
        lines.append("")

    # Create nodes
    lines.append("// Create nodes")
    for i, node in enumerate(graph.nodes):
        if i > 0 and i % batch_size == 0:
            lines.append("")

        label = node.node_type.value.title().replace("_", "")
        props = {
            "id": node.id,
            "label": node.label,
            "source": node.source.value,
        }
        props.update(node.get_properties_dict())

        # Escape special characters in strings
        props_str = ", ".join(
            f"{k}: {_cypher_value(v)}"
            for k, v in props.items()
        )

        lines.append(f"{command} (:{label} {{{props_str}}});")

    lines.append("")

    # Create relationships
    lines.append("// Create relationships")
    for i, edge in enumerate(graph.edges):
        if i > 0 and i % batch_size == 0:
            lines.append("")

        rel_type = edge.relation.value.upper()

        props: Dict[str, Any] = {"weight": edge.weight}
        if edge.evidence:
            props["evidence"] = list(edge.evidence)
        props.update(edge.get_properties_dict())

        props_str = ", ".join(
            f"{k}: {_cypher_value(v)}"
            for k, v in props.items()
        )

        lines.append(
            f"MATCH (a {{id: {_cypher_value(edge.source)}}}), "
            f"(b {{id: {_cypher_value(edge.target)}}}) "
            f"{command} (a)-[:{rel_type} {{{props_str}}}]->(b);"
        )

    return "\n".join(lines)
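
The source above calls a helper, `_cypher_value`, whose body is not shown in this section. Purely as an illustration, the sketch below shows one way such a helper could render Python values as Cypher literals. The function name `cypher_literal` and its escaping rules are assumptions, not the library's implementation, and special floats such as NaN or infinity are not handled.

```python
from typing import Any


def cypher_literal(value: Any) -> str:
    """Render a Python value as a Cypher literal (hypothetical helper)."""
    if value is None:
        return "null"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(cypher_literal(v) for v in value) + "]"
    # Escape backslashes first, then single quotes, and quote the string.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


# Hypothetical node properties, formatted the way to_cypher joins them.
props = {"id": "EX:1", "label": "Crohn's disease", "weight": 1.0}
print(", ".join(f"{k}: {cypher_literal(v)}" for k, v in props.items()))
```

Escaping matters here because node labels and property values are interpolated directly into the script text rather than passed as query parameters.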

Utility Functions¶

find_shortest_path¶

find_shortest_path ¶

find_shortest_path(
    graph: KnowledgeGraph,
    source: str,
    target: str,
    directed: bool = True,
    max_depth: Optional[int] = None,
) -> Optional[List[str]]

Find the shortest path between two nodes using BFS. The path is shortest by number of edges; edge weights are not considered.

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The knowledge graph to search.	required
`source`	`str`	ID of the source node.	required
`target`	`str`	ID of the target node.	required
`directed`	`bool`	If True, follow edge direction. If False, treat as undirected.	`True`
`max_depth`	`Optional[int]`	Maximum path length to search, counted in edges. None means no limit.	`None`

Returns:

Type	Description
`Optional[List[str]]`	List of node IDs forming the shortest path (including source and target), or None if either node is missing from the graph or no path exists within max_depth.

Example

path = find_shortest_path(graph, "DOID:162", "DOID:1612")
if path:
    print(" -> ".join(path))
# DOID:162 -> DOID:1612

Source code in biodbs/_funcs/graph/utils.py

def find_shortest_path(
    graph: KnowledgeGraph,
    source: str,
    target: str,
    directed: bool = True,
    max_depth: Optional[int] = None,
) -> Optional[List[str]]:
    """Find the shortest path between two nodes using BFS.

    Args:
        graph: The knowledge graph to search.
        source: ID of the source node.
        target: ID of the target node.
        directed: If True, follow edge direction. If False, treat as undirected.
        max_depth: Maximum path length to search.

    Returns:
        List of node IDs forming the shortest path, or None if no path exists.

    Example:
        ```python
        path = find_shortest_path(graph, "DOID:162", "DOID:1612")
        if path:
            print(" -> ".join(path))
        # DOID:162 -> DOID:1612
        ```
    """
    if source not in graph or target not in graph:
        return None

    if source == target:
        return [source]

    # BFS with path tracking
    queue: deque = deque([(source, [source])])
    visited: Set[str] = {source}

    while queue:
        current, path = queue.popleft()

        if max_depth is not None and len(path) > max_depth:
            continue

        # Get neighbors
        neighbors = set()

        # Outgoing edges
        for edge in graph.get_outgoing_edges(current):
            neighbors.add(edge.target)

        # Incoming edges (if undirected)
        if not directed:
            for edge in graph.get_incoming_edges(current):
                neighbors.add(edge.source)

        for neighbor in neighbors:
            if neighbor == target:
                return path + [neighbor]

            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))

    return None
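
To make the max_depth behaviour concrete, here is a self-contained sketch of the same breadth-first search over a plain adjacency dictionary instead of a `KnowledgeGraph`. The function name `bfs_shortest_path` and the toy graph are hypothetical, and only the directed case is covered. As in the source above, a partial path stops expanding once it holds more than max_depth nodes, so the limit effectively counts edges.

```python
from collections import deque
from typing import Dict, List, Optional


def bfs_shortest_path(
    adjacency: Dict[str, List[str]],
    source: str,
    target: str,
    max_depth: Optional[int] = None,
) -> Optional[List[str]]:
    """Directed BFS mirroring the control flow shown above."""
    if source not in adjacency or target not in adjacency:
        return None
    if source == target:
        return [source]

    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        current, path = queue.popleft()
        # A path of k nodes has k - 1 edges; do not expand past the limit.
        if max_depth is not None and len(path) > max_depth:
            continue
        for neighbor in adjacency[current]:
            if neighbor == target:
                return path + [neighbor]
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None


# Toy graph: A -> B -> C -> D, plus a shortcut A -> C.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": []}
print(bfs_shortest_path(toy, "A", "D"))               # ['A', 'C', 'D']
print(bfs_shortest_path(toy, "A", "D", max_depth=1))  # None
```

The two-edge route is found with `max_depth=2` but not with `max_depth=1`.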

find_all_paths¶

find_all_paths ¶

find_all_paths(
    graph: KnowledgeGraph,
    source: str,
    target: str,
    max_depth: int = 5,
    directed: bool = True,
) -> List[List[str]]

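Find all simple paths (paths with no repeated node) between two nodes using DFS.

Warning: The number of simple paths can grow exponentially with max_depth, so this can be slow on large or densely connected graphs.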

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The knowledge graph to search.	required
`source`	`str`	ID of the source node.	required
`target`	`str`	ID of the target node.	required
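`max_depth`	`int`	Maximum path length, counted in nodes: each returned path contains at most max_depth node IDs (max_depth - 1 edges). Note that find_shortest_path counts its max_depth in edges.	`5`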
`directed`	`bool`	If True, follow edge direction. If False, treat as undirected.	`True`

Returns:

Type	Description
`List[List[str]]`	List of paths, where each path is a list of node IDs.

Example

paths = find_all_paths(graph, "A", "D", max_depth=3)
for path in paths:
    print(" -> ".join(path))
# A -> B -> D
# A -> C -> D

Source code in biodbs/_funcs/graph/utils.py

def find_all_paths(
    graph: KnowledgeGraph,
    source: str,
    target: str,
    max_depth: int = 5,
    directed: bool = True,
) -> List[List[str]]:
    """Find all paths between two nodes using DFS.

    Warning: This can be slow for large graphs or high max_depth.

    Args:
        graph: The knowledge graph to search.
        source: ID of the source node.
        target: ID of the target node.
        max_depth: Maximum path length.
        directed: If True, follow edge direction. If False, treat as undirected.

    Returns:
        List of paths, where each path is a list of node IDs.

    Example:
        ```python
        paths = find_all_paths(graph, "A", "D", max_depth=3)
        for path in paths:
            print(" -> ".join(path))
        # A -> B -> D
        # A -> C -> D
        ```
    """
    if source not in graph or target not in graph:
        return []

    all_paths: List[List[str]] = []

    def dfs(current: str, path: List[str], visited: Set[str]):
        if current == target:
            all_paths.append(path.copy())
            return

        if len(path) >= max_depth:
            return

        # Get neighbors
        neighbors = set()

        for edge in graph.get_outgoing_edges(current):
            neighbors.add(edge.target)

        if not directed:
            for edge in graph.get_incoming_edges(current):
                neighbors.add(edge.source)

        for neighbor in neighbors:
            if neighbor not in visited:
                visited.add(neighbor)
                path.append(neighbor)
                dfs(neighbor, path, visited)
                path.pop()
                visited.remove(neighbor)

    dfs(source, [source], {source})
    return all_paths
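
The depth-first search with backtracking can be sketched on a plain adjacency dictionary. This is an illustration under stated assumptions, not the library's code: `dfs_all_paths` and the toy graph are hypothetical, only the directed case is shown, and membership in the current path stands in for the separate visited set used above. The depth check compares max_depth with the number of nodes in the path, as the source does.

```python
from typing import Dict, List


def dfs_all_paths(
    adjacency: Dict[str, List[str]],
    source: str,
    target: str,
    max_depth: int = 5,
) -> List[List[str]]:
    """Enumerate simple directed paths from source to target."""
    if source not in adjacency or target not in adjacency:
        return []
    found: List[List[str]] = []

    def visit(current: str, path: List[str]) -> None:
        if current == target:
            found.append(path.copy())
            return
        # The limit is compared against the number of nodes in the path.
        if len(path) >= max_depth:
            return
        for neighbor in adjacency[current]:
            if neighbor not in path:  # simple paths only
                path.append(neighbor)
                visit(neighbor, path)
                path.pop()  # backtrack

    visit(source, [source])
    return found


# Toy graph with two 3-node routes and one 4-node route from A to D.
toy = {"A": ["B", "C"], "B": ["D", "C"], "C": ["D"], "D": []}
print(sorted(dfs_all_paths(toy, "A", "D", max_depth=3)))
# [['A', 'B', 'D'], ['A', 'C', 'D']]
print(len(dfs_all_paths(toy, "A", "D", max_depth=4)))
# 3
```

Raising max_depth from 3 to 4 admits the longer route A -> B -> C -> D.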

get_neighborhood¶

get_neighborhood ¶

get_neighborhood(
    graph: KnowledgeGraph,
    node_id: str,
    hops: int = 1,
    directed: bool = False,
    include_edges: bool = False,
) -> Dict[str, Any]

Get nodes within N hops of a starting node.

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The knowledge graph.	required
`node_id`	`str`	ID of the center node.	required
`hops`	`int`	Number of hops (distance) to include.	`1`
`directed`	`bool`	If True, only follow outgoing edges.	`False`
`include_edges`	`bool`	If True, include edges in the result.	`False`

Returns:

Type	Description
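`Dict[str, Any]`	Dictionary with a "nodes" key (list of Node objects, including the center node) and, if include_edges is True, an "edges" key (list of Edge objects). The lists are empty if node_id is not in the graph.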

Example

neighborhood = get_neighborhood(graph, "DOID:162", hops=2)
print(f"Found {len(neighborhood['nodes'])} nodes within 2 hops")
# Found 25 nodes within 2 hops

Source code in biodbs/_funcs/graph/utils.py

def get_neighborhood(
    graph: KnowledgeGraph,
    node_id: str,
    hops: int = 1,
    directed: bool = False,
    include_edges: bool = False,
) -> Dict[str, Any]:
    """Get nodes within N hops of a starting node.

    Args:
        graph: The knowledge graph.
        node_id: ID of the center node.
        hops: Number of hops (distance) to include.
        directed: If True, only follow outgoing edges.
        include_edges: If True, include edges in the result.

    Returns:
        Dictionary with "nodes" (and optionally "edges") keys.

    Example:
        ```python
        neighborhood = get_neighborhood(graph, "DOID:162", hops=2)
        print(f"Found {len(neighborhood['nodes'])} nodes within 2 hops")
        # Found 25 nodes within 2 hops
        ```
    """
    if node_id not in graph:
        return {"nodes": [], "edges": []} if include_edges else {"nodes": []}

    visited: Set[str] = {node_id}
    current_level = {node_id}
    collected_edges: Set[Edge] = set()

    for _ in range(hops):
        next_level: Set[str] = set()

        for current in current_level:
            # Outgoing edges
            for edge in graph.get_outgoing_edges(current):
                if edge.target not in visited:
                    next_level.add(edge.target)
                    visited.add(edge.target)
                if include_edges:
                    collected_edges.add(edge)

            # Incoming edges (if undirected)
            if not directed:
                for edge in graph.get_incoming_edges(current):
                    if edge.source not in visited:
                        next_level.add(edge.source)
                        visited.add(edge.source)
                    if include_edges:
                        collected_edges.add(edge)

        current_level = next_level

    # Get actual node objects
    nodes = [graph.get_node(nid) for nid in visited if graph.get_node(nid)]

    result: Dict[str, Any] = {"nodes": nodes}
    if include_edges:
        result["edges"] = list(collected_edges)

    return result
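
The hop-by-hop expansion above can be illustrated on a plain adjacency dictionary. This sketch is hypothetical (`nodes_within_hops` is not a biodbs function), returns node IDs rather than Node objects, and follows outgoing edges only, which corresponds to `directed=True`; the library default is `directed=False`.

```python
from typing import Dict, List, Set


def nodes_within_hops(
    adjacency: Dict[str, List[str]], start: str, hops: int
) -> Set[str]:
    """Expand one frontier per hop, following outgoing edges only."""
    if start not in adjacency:
        return set()
    visited = {start}
    frontier = {start}
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for node in frontier:
            for neighbor in adjacency[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        # Only newly discovered nodes are expanded in the next hop.
        frontier = next_frontier
    return visited


# Toy chain: A -> B -> C -> D.
toy = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(sorted(nodes_within_hops(toy, "A", 1)))  # ['A', 'B']
print(sorted(nodes_within_hops(toy, "A", 2)))  # ['A', 'B', 'C']
```

The starting node is always part of the result, so the count for N hops includes the center node.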

get_connected_component¶

get_connected_component ¶

get_connected_component(
    graph: KnowledgeGraph,
    node_id: str,
    directed: bool = False,
) -> Set[str]

Get all nodes in the same connected component.

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The knowledge graph.	required
`node_id`	`str`	ID of a node in the component.	required
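`directed`	`bool`	If True, follow only outgoing edges, so the result is the set of nodes reachable from node_id (per the source below, a reachability set rather than a strongly connected component in the strict sense). If False, follow edges in both directions.	`False`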

Returns:

Type	Description
`Set[str]`	Set of node IDs in the same component.

Example

component = get_connected_component(graph, "DOID:162")
print(f"Component has {len(component)} nodes")
# Component has 47 nodes

Source code in biodbs/_funcs/graph/utils.py

def get_connected_component(
    graph: KnowledgeGraph,
    node_id: str,
    directed: bool = False,
) -> Set[str]:
    """Get all nodes in the same connected component.

    Args:
        graph: The knowledge graph.
        node_id: ID of a node in the component.
        directed: If True, use strongly connected component semantics.

    Returns:
        Set of node IDs in the same component.

    Example:
        ```python
        component = get_connected_component(graph, "DOID:162")
        print(f"Component has {len(component)} nodes")
        # Component has 47 nodes
        ```
    """
    if node_id not in graph:
        return set()

    visited: Set[str] = set()
    queue: deque = deque([node_id])

    while queue:
        current = queue.popleft()

        if current in visited:
            continue
        visited.add(current)

        # Outgoing edges
        for edge in graph.get_outgoing_edges(current):
            if edge.target not in visited:
                queue.append(edge.target)

        # Incoming edges (for undirected or weak connectivity)
        if not directed:
            for edge in graph.get_incoming_edges(current):
                if edge.source not in visited:
                    queue.append(edge.source)

    return visited
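
With `directed=True`, the loop above follows only outgoing edges, which yields the nodes reachable from node_id. If a strongly connected component in the strict sense is needed, one standard approach is to intersect forward and backward reachability. The sketch below illustrates the difference on plain adjacency dictionaries; the helper names are hypothetical and none of this is part of biodbs.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(adjacency: Dict[str, List[str]], start: str) -> Set[str]:
    """Nodes reachable from start by following edges in adjacency."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(n for n in adjacency.get(current, []) if n not in seen)
    return seen


def strongly_connected_component(
    adjacency: Dict[str, List[str]], start: str
) -> Set[str]:
    """Nodes that can both reach start and be reached from it."""
    # Build the reversed graph so backward reachability is a forward search.
    reverse: Dict[str, List[str]] = {node: [] for node in adjacency}
    for node, targets in adjacency.items():
        for target in targets:
            reverse.setdefault(target, []).append(node)
    return reachable(adjacency, start) & reachable(reverse, start)


# Toy graph: A and B form a cycle; C hangs off B.
toy = {"A": ["B"], "B": ["A", "C"], "C": []}
print(sorted(reachable(toy, "A")))                     # ['A', 'B', 'C']
print(sorted(strongly_connected_component(toy, "A")))  # ['A', 'B']
```

Here C is reachable from A but cannot reach A back, so it belongs to the reachability set and not to the strongly connected component.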

get_all_connected_components¶

get_all_connected_components ¶

get_all_connected_components(
    graph: KnowledgeGraph, directed: bool = False
) -> List[Set[str]]

Get all connected components in the graph.

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The knowledge graph.	required
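`directed`	`bool`	If True, each component is built by following only outgoing edges from an arbitrary remaining node. Per the source below, the resulting sets are reachability sets that can overlap and depend on iteration order, so they are not strongly connected components in the strict sense.	`False`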

Returns:

Type	Description
`List[Set[str]]`	List of sets, each containing node IDs in a component.

Source code in biodbs/_funcs/graph/utils.py

def get_all_connected_components(
    graph: KnowledgeGraph,
    directed: bool = False,
) -> List[Set[str]]:
    """Get all connected components in the graph.

    Args:
        graph: The knowledge graph.
        directed: If True, find strongly connected components.

    Returns:
        List of sets, each containing node IDs in a component.
    """
    remaining = set(node.id for node in graph.nodes)
    components: List[Set[str]] = []

    while remaining:
        # Pick any remaining node
        start = next(iter(remaining))
        component = get_connected_component(graph, start, directed)
        components.append(component)
        remaining -= component

    return components

find_hub_nodes¶

find_hub_nodes ¶

find_hub_nodes(
    graph: KnowledgeGraph,
    top_n: int = 10,
    direction: str = "both",
    node_type: Optional[NodeType] = None,
) -> List[Tuple[str, int]]

Find the most highly connected nodes (hubs).

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The knowledge graph.	required
`top_n`	`int`	Number of top hubs to return.	`10`
`direction`	`str`	"outgoing", "incoming", or "both".	`'both'`
`node_type`	`Optional[NodeType]`	Optional filter by node type.	`None`

Returns:

Type	Description
`List[Tuple[str, int]]`	List of (node_id, degree) tuples, sorted by degree descending.

Example

hubs = find_hub_nodes(graph, top_n=5)
for node_id, degree in hubs:
    print(f"{node_id}: {degree} connections")
# DOID:162: 15 connections
# DOID:4: 12 connections
# ...

Source code in biodbs/_funcs/graph/utils.py

def find_hub_nodes(
    graph: KnowledgeGraph,
    top_n: int = 10,
    direction: str = "both",
    node_type: Optional[NodeType] = None,
) -> List[Tuple[str, int]]:
    """Find the most highly connected nodes (hubs).

    Args:
        graph: The knowledge graph.
        top_n: Number of top hubs to return.
        direction: "outgoing", "incoming", or "both".
        node_type: Optional filter by node type.

    Returns:
        List of (node_id, degree) tuples, sorted by degree descending.

    Example:
        ```python
        hubs = find_hub_nodes(graph, top_n=5)
        for node_id, degree in hubs:
            print(f"{node_id}: {degree} connections")
        # DOID:162: 15 connections
        # DOID:4: 12 connections
        # ...
        ```
    """
    degrees: List[Tuple[str, int]] = []

    for node in graph.nodes:
        if node_type is not None and node.node_type != node_type:
            continue

        degree = graph.get_degree(node.id, direction)
        degrees.append((node.id, degree))

    # Sort by degree descending
    degrees.sort(key=lambda x: -x[1])

    return degrees[:top_n]
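
The ranking step can be illustrated without a `KnowledgeGraph`. The sketch below is hypothetical: `top_hubs` works on a list of (source, target) pairs and assumes that the "both" direction means in-degree plus out-degree, which this section does not confirm for `get_degree`. Unlike the function above, nodes with no edges never appear, because only edge endpoints are counted.

```python
from collections import Counter
from typing import List, Tuple


def top_hubs(edges: List[Tuple[str, str]], top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank nodes by total degree (incoming plus outgoing)."""
    degree: Counter = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # sorted() is stable, so nodes with equal degree keep first-seen order.
    return sorted(degree.items(), key=lambda item: -item[1])[:top_n]


# Toy edge list: A touches three edges, B and C two each, D one.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
print(top_hubs(toy_edges, top_n=2))  # [('A', 3), ('B', 2)]
```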

compute_degree_distribution¶

compute_degree_distribution ¶

compute_degree_distribution(
    graph: KnowledgeGraph, direction: str = "both"
) -> Dict[int, int]

Compute the degree distribution of the graph.

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The knowledge graph.	required
`direction`	`str`	"outgoing", "incoming", or "both".	`'both'`

Returns:

Type	Description
`Dict[int, int]`	Dictionary mapping degree to count of nodes with that degree.

Source code in biodbs/_funcs/graph/utils.py

def compute_degree_distribution(
    graph: KnowledgeGraph,
    direction: str = "both",
) -> Dict[int, int]:
    """Compute the degree distribution of the graph.

    Args:
        graph: The knowledge graph.
        direction: "outgoing", "incoming", or "both".

    Returns:
        Dictionary mapping degree to count of nodes with that degree.
    """
    distribution: Dict[int, int] = {}

    for node in graph.nodes:
        degree = graph.get_degree(node.id, direction)
        distribution[degree] = distribution.get(degree, 0) + 1

    return dict(sorted(distribution.items()))
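
As a quick illustration of the return shape, the same tally can be computed from a plain list of per-node degrees. The function name `degree_distribution` and the input list are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def degree_distribution(degrees: List[int]) -> Dict[int, int]:
    """Map each degree to the number of nodes having it, sorted by degree."""
    return dict(sorted(Counter(degrees).items()))


# Six hypothetical nodes with these degrees.
print(degree_distribution([3, 2, 2, 1, 0, 0]))  # {0: 2, 1: 1, 2: 2, 3: 1}
```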

get_graph_statistics¶

get_graph_statistics ¶

get_graph_statistics(
    graph: KnowledgeGraph, compute_centrality: bool = False
) -> Dict[str, Any]

Get comprehensive statistics about the graph.

Parameters:

Name	Type	Description	Default
`graph`	`KnowledgeGraph`	The knowledge graph.	required
`compute_centrality`	`bool`	If True, compute betweenness centrality (slower). Skipped for graphs with more than 1000 nodes.	`False`

Returns:

Type	Description
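`Dict[str, Any]`	Dictionary of statistics. Always contains "name", "source", "num_nodes", and "num_edges". For non-empty graphs it also contains "density", "avg_degree", "max_degree", "min_degree", "avg_out_degree", "avg_in_degree", "num_isolated", "num_components", "largest_component_size", "node_type_counts", "edge_type_counts", and "num_self_loops", plus "betweenness_centrality" when centrality is computed.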

Example

stats = get_graph_statistics(graph)
print(f"Density: {stats['density']:.4f}")
# Density: 0.0213
print(f"Components: {stats['num_components']}")
# Components: 1

Source code in biodbs/_funcs/graph/utils.py

def get_graph_statistics(
    graph: KnowledgeGraph,
    compute_centrality: bool = False,
) -> Dict[str, Any]:
    """Get comprehensive statistics about the graph.

    Args:
        graph: The knowledge graph.
        compute_centrality: If True, compute betweenness centrality (slower).

    Returns:
        Dictionary with various statistics.

    Example:
        ```python
        stats = get_graph_statistics(graph)
        print(f"Density: {stats['density']:.4f}")
        # Density: 0.0213
        print(f"Components: {stats['num_components']}")
        # Components: 1
        ```
    """
    n = graph.node_count
    m = graph.edge_count

    stats: Dict[str, Any] = {
        "name": graph.name,
        "source": graph.source.value,
        "num_nodes": n,
        "num_edges": m,
    }

    if n > 0:
        # Density (for directed graph: m / (n * (n-1)))
        max_edges = n * (n - 1) if n > 1 else 1
        stats["density"] = m / max_edges if max_edges > 0 else 0.0

        # Degree statistics
        degrees = [graph.get_degree(node.id, "both") for node in graph.nodes]
        stats["avg_degree"] = sum(degrees) / n
        stats["max_degree"] = max(degrees)
        stats["min_degree"] = min(degrees)

        # Out-degree statistics
        out_degrees = [graph.get_degree(node.id, "outgoing") for node in graph.nodes]
        stats["avg_out_degree"] = sum(out_degrees) / n

        # In-degree statistics
        in_degrees = [graph.get_degree(node.id, "incoming") for node in graph.nodes]
        stats["avg_in_degree"] = sum(in_degrees) / n

        # Isolated nodes (degree 0)
        stats["num_isolated"] = sum(1 for d in degrees if d == 0)

        # Connected components
        components = get_all_connected_components(graph)
        stats["num_components"] = len(components)
        stats["largest_component_size"] = max(len(c) for c in components) if components else 0

        # Node type distribution
        stats["node_type_counts"] = graph.get_node_type_counts()

        # Edge type distribution
        stats["edge_type_counts"] = graph.get_edge_type_counts()

        # Self-loops
        stats["num_self_loops"] = sum(
            1 for edge in graph.edges if edge.source == edge.target
        )

        # Compute centrality if requested
        if compute_centrality and n <= 1000:  # Only for smaller graphs
            stats["betweenness_centrality"] = compute_betweenness_centrality(
                graph, normalized=True
            )

    return stats
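
The density value uses the directed-graph formula m / (n * (n - 1)), with a guard for single-node graphs. A minimal worked sketch follows; the function name `directed_density` and the 47-node, 46-edge example graph are hypothetical.

```python
def directed_density(num_nodes: int, num_edges: int) -> float:
    """Edges present divided by the n * (n - 1) possible directed edges."""
    # A single node has no possible non-loop edges; use 1 to avoid
    # dividing by zero, as the source above does.
    max_edges = num_nodes * (num_nodes - 1) if num_nodes > 1 else 1
    return num_edges / max_edges


# Hypothetical tree-shaped graph: 47 nodes joined by 46 edges.
print(f"{directed_density(47, 46):.4f}")  # 0.0213
```

Because the denominator excludes self-loops and parallel edges, a graph that contains them can report a density above 1.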

format_statistics¶

format_statistics ¶

format_statistics(stats: Dict[str, Any]) -> str

Format graph statistics as a readable string.

Parameters:

Name	Type	Description	Default
`stats`	`Dict[str, Any]`	Dictionary from get_graph_statistics().	required

Returns:

Type	Description
`str`	Formatted string representation.

Source code in biodbs/_funcs/graph/utils.py

def format_statistics(stats: Dict[str, Any]) -> str:
    """Format graph statistics as a readable string.

    Args:
        stats: Dictionary from get_graph_statistics().

    Returns:
        Formatted string representation.
    """
    lines = [
        f"Graph Statistics: {stats.get('name', 'Unknown')}",
        "=" * 50,
        f"Nodes: {stats.get('num_nodes', 0)}",
        f"Edges: {stats.get('num_edges', 0)}",
        f"Density: {stats.get('density', 0):.6f}",
        "",
        "Degree Statistics:",
        f"  Average: {stats.get('avg_degree', 0):.2f}",
        f"  Maximum: {stats.get('max_degree', 0)}",
        f"  Minimum: {stats.get('min_degree', 0)}",
        "",
        f"Isolated nodes: {stats.get('num_isolated', 0)}",
        f"Self-loops: {stats.get('num_self_loops', 0)}",
        f"Connected components: {stats.get('num_components', 0)}",
        f"Largest component: {stats.get('largest_component_size', 0)} nodes",
    ]

    if stats.get("node_type_counts"):
        lines.append("")
        lines.append("Node types:")
        for node_type, count in stats["node_type_counts"].items():
            lines.append(f"  {node_type.value}: {count}")

    if stats.get("edge_type_counts"):
        lines.append("")
        lines.append("Edge types:")
        for edge_type, count in stats["edge_type_counts"].items():
            lines.append(f"  {edge_type.value}: {count}")

    return "\n".join(lines)
