Graph Module API Reference

Complete API reference for biodbs.graph module.

Summary

Core Classes

| Class | Description |
| --- | --- |
| Node | Immutable node representing a biological entity |
| Edge | Immutable edge representing a relationship |
| KnowledgeGraph | Container for nodes and edges with graph operations |

Enums

| Enum | Description |
| --- | --- |
| NodeType | Types of biological entities (gene, protein, disease, etc.) |
| EdgeType | Types of relationships (is_a, part_of, regulates, etc.) |
| DataSource | Supported data sources for graph construction |

Builder Functions

| Function | Description |
| --- | --- |
| build_graph | Create graph from nodes and edges |
| build_disease_graph | Build from Disease Ontology data |
| build_go_graph | Build from Gene Ontology data |
| build_reactome_graph | Build from Reactome data |
| build_kegg_graph | Build from KEGG data |
| merge_graphs | Merge multiple graphs |

Export Functions

| Function | Description |
| --- | --- |
| to_networkx | Export to NetworkX graph |
| to_json_ld | Export to JSON-LD format |
| to_rdf | Export to RDF format |
| to_neo4j_csv | Export to Neo4j CSV files |
| to_cypher | Export to Cypher queries |

Utility Functions

| Function | Description |
| --- | --- |
| find_shortest_path | Find shortest path between nodes |
| find_all_paths | Find all paths up to max length |
| get_neighborhood | Get nodes within N hops |
| get_connected_component | Get connected component containing node |
| find_hub_nodes | Find high-degree hub nodes |
| get_graph_statistics | Compute graph statistics |

Enums

NodeType

Types of nodes representing biological entities.

| Member | Value | Description |
| --- | --- | --- |
| GENE | "gene" | Gene entity |
| PROTEIN | "protein" | Protein entity |
| DISEASE | "disease" | Disease entity |
| PATHWAY | "pathway" | Biological pathway |
| GO_TERM | "go_term" | Gene Ontology term |
| REACTION | "reaction" | Biochemical reaction |
| COMPOUND | "compound" | Chemical compound |
| DRUG | "drug" | Drug/pharmaceutical |
| PHENOTYPE | "phenotype" | Phenotype |
| ORGANISM | "organism" | Organism/species |
| PUBLICATION | "publication" | Scientific publication |
| OTHER | "other" | Other entity type |

NodeType

Bases: str, Enum

Types of nodes in the knowledge graph.

Each node type represents a different biological entity category.
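
Because the enums list str among their bases, members should behave like their string values in comparisons and can be looked up by value. The sketch below uses a stand-in enum (DemoNodeType, a hypothetical name carrying two values from the table above) instead of importing biodbs, so it illustrates standard Python str/Enum behavior only, not the library's own code.

```python
from enum import Enum


class DemoNodeType(str, Enum):
    """Stand-in for NodeType; not the library's class."""

    GENE = "gene"
    DISEASE = "disease"


# Lookup by value returns the corresponding member.
member = DemoNodeType("gene")

# The str mixin makes members compare equal to plain strings.
is_equal = member == "gene"

# .value is the underlying string, which is convenient for serialization.
raw = DemoNodeType.DISEASE.value
```

The same pattern would apply to EdgeType and DataSource, which share the same bases.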

EdgeType

Types of relationships between biological entities.

| Member | Value | Category |
| --- | --- | --- |
| IS_A | "is_a" | Ontology |
| PART_OF | "part_of" | Ontology |
| HAS_PART | "has_part" | Ontology |
| REGULATES | "regulates" | Regulatory |
| POSITIVELY_REGULATES | "positively_regulates" | Regulatory |
| NEGATIVELY_REGULATES | "negatively_regulates" | Regulatory |
| PARTICIPATES_IN | "participates_in" | Participation |
| HAS_PARTICIPANT | "has_participant" | Participation |
| CATALYZES | "catalyzes" | Participation |
| PRODUCES | "produces" | Participation |
| CONSUMES | "consumes" | Participation |
| ASSOCIATED_WITH | "associated_with" | Association |
| INTERACTS_WITH | "interacts_with" | Association |
| TARGETS | "targets" | Association |
| XREF | "xref" | Cross-reference |
| SAME_AS | "same_as" | Cross-reference |
| ENCODES | "encodes" | Sequence |
| TRANSCRIBES | "transcribes" | Sequence |
| TRANSLATES | "translates" | Sequence |
| RELATED_TO | "related_to" | Other |
| OTHER | "other" | Other |

EdgeType

Bases: str, Enum

Types of edges (relationships) in the knowledge graph.

Each edge type represents a different kind of relationship between biological entities.

DataSource

Supported data sources for graph construction.

| Member | Value | Description |
| --- | --- | --- |
| DISEASE_ONTOLOGY | "disease_ontology" | Disease Ontology |
| GENE_ONTOLOGY | "gene_ontology" | Gene Ontology |
| REACTOME | "reactome" | Reactome pathways |
| KEGG | "kegg" | KEGG database |
| QUICKGO | "quickgo" | QuickGO annotations |
| UNIPROT | "uniprot" | UniProt |
| ENSEMBL | "ensembl" | Ensembl |
| PUBCHEM | "pubchem" | PubChem |
| CHEMBL | "chembl" | ChEMBL |
| CUSTOM | "custom" | Custom data source |

DataSource

Bases: str, Enum

Supported data sources for knowledge graph construction.


Core Classes

Node

Node dataclass

Node(
    id: str,
    label: str,
    node_type: NodeType = OTHER,
    source: DataSource = CUSTOM,
    properties: FrozenSet[Tuple[str, Any]] = frozenset(),
    xrefs: FrozenSet[str] = frozenset(),
)

A node (entity) in the knowledge graph.

Nodes are immutable (frozen) to ensure graph integrity.

Attributes:

Name Type Description
id str

Unique identifier for the node (e.g., "DOID:162", "GO:0008150").

label str

Human-readable label for the node.

node_type NodeType

Type of biological entity this node represents.

source DataSource

Data source this node originated from.

properties FrozenSet[Tuple[str, Any]]

Additional properties, stored as a frozen set of (key, value) pairs.

xrefs FrozenSet[str]

Cross-references to other databases.

get_property

get_property(key: str, default: Any = None) -> Any

Get a property value by key.

get_properties_dict

get_properties_dict() -> Dict[str, Any]

Get properties as a dictionary.

with_properties

with_properties(**kwargs: Any) -> 'Node'

Create a new node with additional/updated properties.

with_xrefs

with_xrefs(*xrefs: str) -> 'Node'

Create a new node with additional cross-references.

to_dict

to_dict() -> Dict[str, Any]

Convert node to dictionary representation.

from_dict classmethod

from_dict(data: Dict[str, Any]) -> 'Node'

Create a Node from dictionary representation.
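
The immutability described above, combined with properties held as a frozen set of (key, value) pairs, can be pictured with a small stand-in class. MiniNode below is hypothetical and is not the library's implementation; it only sketches how a frozen dataclass can offer with_properties by returning a new instance. One consequence worth noting is that property values need to be hashable, which is presumably why the builder source further down converts synonym lists to tuples.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class MiniNode:
    """Hypothetical stand-in for Node; not the library's implementation."""

    id: str
    label: str
    properties: FrozenSet[Tuple[str, Any]] = frozenset()

    def get_properties_dict(self) -> Dict[str, Any]:
        # A frozenset of (key, value) pairs converts directly to a dict.
        return dict(self.properties)

    def with_properties(self, **kwargs: Any) -> "MiniNode":
        # Merge into a dict copy, then freeze the result in a new instance.
        merged = {**self.get_properties_dict(), **kwargs}
        return replace(self, properties=frozenset(merged.items()))


node = MiniNode(id="DOID:162", label="cancer")
updated = node.with_properties(reviewed=True)

# Direct assignment on a frozen dataclass raises FrozenInstanceError.
try:
    node.label = "changed"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Here node keeps its empty property set while updated carries the new entry, which is the behavior the with_properties and with_xrefs descriptions suggest.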

Edge

Edge dataclass

Edge(
    source: str,
    target: str,
    relation: EdgeType = RELATED_TO,
    weight: float = 1.0,
    evidence: FrozenSet[str] = frozenset(),
    properties: FrozenSet[Tuple[str, Any]] = frozenset(),
)

A directed edge (relationship) in the knowledge graph.

Edges are immutable (frozen) to ensure graph integrity.

Attributes:

Name Type Description
source str

ID of the source node.

target str

ID of the target node.

relation EdgeType

Type of relationship.

weight float

Optional edge weight (default 1.0).

evidence FrozenSet[str]

Evidence supporting this relationship.

properties FrozenSet[Tuple[str, Any]]

Additional properties, stored as a frozen set of (key, value) pairs.

get_property

get_property(key: str, default: Any = None) -> Any

Get a property value by key.

get_properties_dict

get_properties_dict() -> Dict[str, Any]

Get properties as a dictionary.

with_properties

with_properties(**kwargs: Any) -> 'Edge'

Create a new edge with additional/updated properties.

with_evidence

with_evidence(*evidence: str) -> 'Edge'

Create a new edge with additional evidence.

to_dict

to_dict() -> Dict[str, Any]

Convert edge to dictionary representation.

from_dict classmethod

from_dict(data: Dict[str, Any]) -> 'Edge'

Create an Edge from dictionary representation.

KnowledgeGraph

KnowledgeGraph(
    name: str = "KnowledgeGraph",
    description: Optional[str] = None,
    source: DataSource = CUSTOM,
)

Container for a biological knowledge graph.

Provides methods for adding/removing nodes and edges, filtering, subgraph extraction, merging, and computing statistics.

Attributes:

Name Type Description
name

Name of the knowledge graph.

description

Optional description.

source

Primary data source for this graph.

Example
from biodbs.graph import KnowledgeGraph, Node, Edge, NodeType, EdgeType

graph = KnowledgeGraph(name="DiseaseOntologyGraph")
graph.add_node(Node(id="DOID:162", label="cancer", node_type=NodeType.DISEASE))
graph.add_node(Node(id="DOID:1612", label="breast cancer", node_type=NodeType.DISEASE))
graph.add_edge(Edge(source="DOID:1612", target="DOID:162", relation=EdgeType.IS_A))
print(graph.summary())
# KnowledgeGraph: DiseaseOntologyGraph
# Nodes: 2
# Edges: 1
#
# Node types:
#   disease: 2
#
# Edge types:
#   is_a: 1

Initialize a new KnowledgeGraph.

Parameters:

Name Type Description Default
name str

Name of the graph.

'KnowledgeGraph'
description Optional[str]

Optional description.

None
source DataSource

Primary data source for this graph.

CUSTOM

nodes property

nodes: List[Node]

Get all nodes as a list.

edges property

edges: List[Edge]

Get all edges as a list.

node_count property

node_count: int

Get the number of nodes.

edge_count property

edge_count: int

Get the number of edges.

__len__

__len__() -> int

Return the number of nodes in the graph.

__contains__

__contains__(node_id: str) -> bool

Check if a node exists in the graph.

__iter__

__iter__() -> Iterator[Node]

Iterate over all nodes in the graph.

__repr__

__repr__() -> str

Return a string representation.

add_node

add_node(node: Node) -> bool

Add a node to the graph.

Parameters:

Name Type Description Default
node Node

The node to add.

required

Returns:

Type Description
bool

True if the node was added, False if it already existed.

add_nodes

add_nodes(nodes: List[Node]) -> int

Add multiple nodes to the graph.

Parameters:

Name Type Description Default
nodes List[Node]

List of nodes to add.

required

Returns:

Type Description
int

Number of nodes actually added (excludes duplicates).

get_node

get_node(node_id: str) -> Optional[Node]

Get a node by its ID.

Parameters:

Name Type Description Default
node_id str

The node identifier.

required

Returns:

Type Description
Optional[Node]

The Node if found, None otherwise.

has_node

has_node(node_id: str) -> bool

Check if a node exists.

Parameters:

Name Type Description Default
node_id str

The node identifier.

required

Returns:

Type Description
bool

True if the node exists.

remove_node

remove_node(node_id: str) -> bool

Remove a node and all its connected edges.

Parameters:

Name Type Description Default
node_id str

The node identifier.

required

Returns:

Type Description
bool

True if the node was removed, False if it didn't exist.

add_edge

add_edge(edge: Edge) -> bool

Add an edge to the graph.

Parameters:

Name Type Description Default
edge Edge

The edge to add.

required

Returns:

Type Description
bool

True if the edge was added; False if it already existed or if the source or target node does not exist.
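
The return contract for add_edge means a False result can have two causes: a duplicate edge or a missing endpoint. The sketch below models that contract with a hypothetical MiniGraph that tracks only IDs; it is not the library's code, and it assumes that an edge is identified by its (source, target, relation) triple, which the reference does not state.

```python
from typing import Set, Tuple

EdgeKey = Tuple[str, str, str]  # (source, target, relation); assumed identity


class MiniGraph:
    """Hypothetical stand-in for KnowledgeGraph; not the library's code."""

    def __init__(self) -> None:
        self.node_ids: Set[str] = set()
        self.edge_keys: Set[EdgeKey] = set()

    def add_node(self, node_id: str) -> bool:
        if node_id in self.node_ids:
            return False
        self.node_ids.add(node_id)
        return True

    def add_edge(self, source: str, target: str, relation: str) -> bool:
        # Reject edges whose endpoints have not been added yet.
        if source not in self.node_ids or target not in self.node_ids:
            return False
        key = (source, target, relation)
        # Reject edges that are already present.
        if key in self.edge_keys:
            return False
        self.edge_keys.add(key)
        return True


g = MiniGraph()
g.add_node("DOID:1612")
missing_endpoint = g.add_edge("DOID:1612", "DOID:162", "is_a")  # target absent
g.add_node("DOID:162")
first = g.add_edge("DOID:1612", "DOID:162", "is_a")
duplicate = g.add_edge("DOID:1612", "DOID:162", "is_a")
```

In practice this suggests adding all nodes before their edges, and checking has_node first when the two failure cases need to be told apart.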

add_edges

add_edges(edges: List[Edge]) -> int

Add multiple edges to the graph.

Parameters:

Name Type Description Default
edges List[Edge]

List of edges to add.

required

Returns:

Type Description
int

Number of edges actually added.

get_edge

get_edge(
    source: str,
    target: str,
    relation: Optional[EdgeType] = None,
) -> Optional[Edge]

Get an edge between two nodes.

Parameters:

Name Type Description Default
source str

Source node ID.

required
target str

Target node ID.

required
relation Optional[EdgeType]

Optional relation type to match.

None

Returns:

Type Description
Optional[Edge]

The Edge if found, None otherwise.

has_edge

has_edge(
    source: str,
    target: str,
    relation: Optional[EdgeType] = None,
) -> bool

Check if an edge exists between two nodes.

Parameters:

Name Type Description Default
source str

Source node ID.

required
target str

Target node ID.

required
relation Optional[EdgeType]

Optional relation type to match.

None

Returns:

Type Description
bool

True if the edge exists.

remove_edge

remove_edge(edge: Edge) -> bool

Remove an edge from the graph.

Parameters:

Name Type Description Default
edge Edge

The edge to remove.

required

Returns:

Type Description
bool

True if the edge was removed, False if it didn't exist.

get_outgoing_edges

get_outgoing_edges(node_id: str) -> List[Edge]

Get all outgoing edges from a node.

Parameters:

Name Type Description Default
node_id str

The node identifier.

required

Returns:

Type Description
List[Edge]

List of outgoing edges.

get_incoming_edges

get_incoming_edges(node_id: str) -> List[Edge]

Get all incoming edges to a node.

Parameters:

Name Type Description Default
node_id str

The node identifier.

required

Returns:

Type Description
List[Edge]

List of incoming edges.

get_neighbors

get_neighbors(
    node_id: str, direction: str = "both"
) -> List[str]

Get neighboring node IDs.

Parameters:

Name Type Description Default
node_id str

The node identifier.

required
direction str

"outgoing", "incoming", or "both".

'both'

Returns:

Type Description
List[str]

List of neighboring node IDs.
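
The three direction values can be illustrated over plain (source, target) pairs. The helper below is a hypothetical sketch, not the library's implementation. Two points are assumptions because the reference does not specify them: that each neighbor is reported once, and that an unknown direction is rejected with ValueError.

```python
from typing import List, Tuple


def neighbors(
    edges: List[Tuple[str, str]], node_id: str, direction: str = "both"
) -> List[str]:
    """Hypothetical helper over (source, target) pairs."""
    outgoing = [t for s, t in edges if s == node_id]
    incoming = [s for s, t in edges if t == node_id]
    if direction == "outgoing":
        found = outgoing
    elif direction == "incoming":
        found = incoming
    elif direction == "both":
        found = outgoing + incoming
    else:
        # Assumption: unknown directions are rejected.
        raise ValueError(f"unknown direction: {direction!r}")
    # Assumption: each neighbor appears once, in first-seen order.
    return list(dict.fromkeys(found))


edges = [("A", "B"), ("C", "A"), ("A", "C")]
out_ids = neighbors(edges, "A", "outgoing")
in_ids = neighbors(edges, "A", "incoming")
both_ids = neighbors(edges, "A")
```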

filter_nodes

filter_nodes(
    predicate: Optional[Callable[[Node], bool]] = None,
    node_type: Optional[NodeType] = None,
    source: Optional[DataSource] = None,
) -> List[Node]

Filter nodes by predicate or attributes.

Parameters:

Name Type Description Default
predicate Optional[Callable[[Node], bool]]

Function that returns True for nodes to include.

None
node_type Optional[NodeType]

Filter by node type.

None
source Optional[DataSource]

Filter by data source.

None

Returns:

Type Description
List[Node]

List of matching nodes.

filter_edges

filter_edges(
    predicate: Optional[Callable[[Edge], bool]] = None,
    relation: Optional[EdgeType] = None,
    min_weight: Optional[float] = None,
) -> List[Edge]

Filter edges by predicate or attributes.

Parameters:

Name Type Description Default
predicate Optional[Callable[[Edge], bool]]

Function that returns True for edges to include.

None
relation Optional[EdgeType]

Filter by relation type.

None
min_weight Optional[float]

Filter by minimum weight.

None

Returns:

Type Description
List[Edge]

List of matching edges.

get_nodes_by_type

get_nodes_by_type(node_type: NodeType) -> List[Node]

Get all nodes of a specific type.

Parameters:

Name Type Description Default
node_type NodeType

The node type to filter by.

required

Returns:

Type Description
List[Node]

List of nodes with the specified type.

get_edges_by_relation

get_edges_by_relation(relation: EdgeType) -> List[Edge]

Get all edges with a specific relation type.

Parameters:

Name Type Description Default
relation EdgeType

The relation type to filter by.

required

Returns:

Type Description
List[Edge]

List of edges with the specified relation.

subgraph

subgraph(node_ids: Set[str]) -> 'KnowledgeGraph'

Create a subgraph containing only the specified nodes.

Parameters:

Name Type Description Default
node_ids Set[str]

Set of node IDs to include.

required

Returns:

Type Description
'KnowledgeGraph'

A new KnowledgeGraph containing the subgraph.

induced_subgraph

induced_subgraph(node_ids: Set[str]) -> 'KnowledgeGraph'

Alias for subgraph(); creates an induced subgraph.

Parameters:

Name Type Description Default
node_ids Set[str]

Set of node IDs to include.

required

Returns:

Type Description
'KnowledgeGraph'

A new KnowledgeGraph containing the induced subgraph.
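
Since induced_subgraph is documented as an alias of subgraph, the extraction presumably follows the usual definition of an induced subgraph: keep the selected nodes plus exactly those edges whose source and target are both selected. A minimal sketch of that rule over plain (source, target) pairs follows; induced_edges is a hypothetical helper, not a biodbs function.

```python
from typing import List, Set, Tuple

EdgePair = Tuple[str, str]  # (source, target), a simplified stand-in for Edge


def induced_edges(edges: List[EdgePair], node_ids: Set[str]) -> List[EdgePair]:
    """Keep only the edges whose source and target are both selected."""
    return [(s, t) for s, t in edges if s in node_ids and t in node_ids]


edges = [("A", "B"), ("B", "C"), ("C", "D")]
kept = induced_edges(edges, {"A", "B", "C"})
```

The edge from C to D is dropped because D is outside the selection, even though C is inside it.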

merge

merge(other: 'KnowledgeGraph') -> 'KnowledgeGraph'

Merge this graph with another graph into a new graph.

Parameters:

Name Type Description Default
other 'KnowledgeGraph'

The graph to merge with this one.

required

Returns:

Type Description
'KnowledgeGraph'

A new KnowledgeGraph containing all nodes and edges from both.

update

update(other: 'KnowledgeGraph') -> int

Update this graph with nodes and edges from another graph.

Unlike merge(), this modifies the current graph in place.

Parameters:

Name Type Description Default
other 'KnowledgeGraph'

The graph to merge into this one.

required

Returns:

Type Description
int

Total number of new nodes and edges added.
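
The difference between merge() and update() is the familiar one between a pure union and an in-place union. The sketch below shows the same contrast with plain sets of node IDs; merge_ids and update_ids are hypothetical helper names, and the real methods also handle edges, which this sketch leaves out.

```python
from typing import Set


def merge_ids(a: Set[str], b: Set[str]) -> Set[str]:
    """Return a new set; neither input is modified."""
    return a | b


def update_ids(a: Set[str], b: Set[str]) -> int:
    """Modify a in place and return how many items were new."""
    before = len(a)
    a |= b
    return len(a) - before


left = {"A", "B"}
right = {"B", "C"}

merged = merge_ids(left, right)
left_after_merge = set(left)  # snapshot: still the original two IDs
added = update_ids(left, right)
```

After the update call, left holds all three IDs and added counts only the one that was not already present, mirroring the documented return value of update().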

get_node_type_counts

get_node_type_counts() -> Dict[NodeType, int]

Get counts of nodes by type.

Returns:

Type Description
Dict[NodeType, int]

Dictionary mapping NodeType to count.

get_edge_type_counts

get_edge_type_counts() -> Dict[EdgeType, int]

Get counts of edges by relation type.

Returns:

Type Description
Dict[EdgeType, int]

Dictionary mapping EdgeType to count.

get_degree

get_degree(node_id: str, direction: str = 'both') -> int

Get the degree of a node.

Parameters:

Name Type Description Default
node_id str

The node identifier.

required
direction str

"outgoing", "incoming", or "both".

'both'

Returns:

Type Description
int

The degree of the node.

summary

summary() -> str

Get a text summary of the graph.

Returns:

Type Description
str

A formatted string with graph statistics.

to_dict

to_dict() -> Dict[str, Any]

Convert the graph to a dictionary representation.

Returns:

Type Description
Dict[str, Any]

Dictionary containing all graph data.

from_dict classmethod

from_dict(data: Dict[str, Any]) -> 'KnowledgeGraph'

Create a KnowledgeGraph from a dictionary representation.

Parameters:

Name Type Description Default
data Dict[str, Any]

Dictionary containing graph data.

required

Returns:

Type Description
'KnowledgeGraph'

A new KnowledgeGraph instance.

nodes_as_dataframe

nodes_as_dataframe(
    engine: str = "pandas",
) -> "pd.DataFrame"

Convert nodes to a DataFrame.

Parameters:

Name Type Description Default
engine str

"pandas" or "polars".

'pandas'

Returns:

Type Description
'pd.DataFrame'

DataFrame with node data.

edges_as_dataframe

edges_as_dataframe(
    engine: str = "pandas",
) -> "pd.DataFrame"

Convert edges to a DataFrame.

Parameters:

Name Type Description Default
engine str

"pandas" or "polars".

'pandas'

Returns:

Type Description
'pd.DataFrame'

DataFrame with edge data.


Builder Functions

build_graph

build_graph(
    nodes: List[Node],
    edges: Optional[List[Edge]] = None,
    name: str = "KnowledgeGraph",
    description: Optional[str] = None,
    source: DataSource = CUSTOM,
) -> KnowledgeGraph

Build a knowledge graph from node and edge lists.

Parameters:

Name Type Description Default
nodes List[Node]

List of Node objects to add to the graph.

required
edges Optional[List[Edge]]

Optional list of Edge objects to add.

None
name str

Name for the graph.

'KnowledgeGraph'
description Optional[str]

Optional description.

None
source DataSource

Data source for the graph.

CUSTOM

Returns:

Type Description
KnowledgeGraph

A new KnowledgeGraph instance.

Example
from biodbs.graph import Node, Edge, NodeType, EdgeType, build_graph

nodes = [
    Node(id="A", label="Node A", node_type=NodeType.GENE),
    Node(id="B", label="Node B", node_type=NodeType.GENE),
]
edges = [
    Edge(source="A", target="B", relation=EdgeType.INTERACTS_WITH),
]
graph = build_graph(nodes, edges, name="MyGraph")
print(graph)
# KnowledgeGraph(name='MyGraph', nodes=2, edges=1)
Source code in biodbs/_funcs/graph/builders.py
def build_graph(
    nodes: List[Node],
    edges: Optional[List[Edge]] = None,
    name: str = "KnowledgeGraph",
    description: Optional[str] = None,
    source: DataSource = DataSource.CUSTOM,
) -> KnowledgeGraph:
    """Build a knowledge graph from node and edge lists.

    Args:
        nodes: List of Node objects to add to the graph.
        edges: Optional list of Edge objects to add.
        name: Name for the graph.
        description: Optional description.
        source: Data source for the graph.

    Returns:
        A new KnowledgeGraph instance.

    Example:
        ```python
        from biodbs.graph import Node, Edge, NodeType, EdgeType, build_graph

        nodes = [
            Node(id="A", label="Node A", node_type=NodeType.GENE),
            Node(id="B", label="Node B", node_type=NodeType.GENE),
        ]
        edges = [
            Edge(source="A", target="B", relation=EdgeType.INTERACTS_WITH),
        ]
        graph = build_graph(nodes, edges, name="MyGraph")
        print(graph)
        # KnowledgeGraph(name='MyGraph', nodes=2, edges=1)
        ```
    """
    graph = KnowledgeGraph(name=name, description=description, source=source)
    graph.add_nodes(nodes)
    if edges:
        graph.add_edges(edges)
    return graph

build_disease_graph

build_disease_graph(
    data: "DOFetchedData",
    name: str = "DiseaseOntologyGraph",
    include_xrefs: bool = True,
    include_synonyms: bool = False,
) -> KnowledgeGraph

Build a knowledge graph from Disease Ontology data.

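Creates a node for each disease term. The docstring also mentions edges for hierarchical relationships (is_a) when parent terms are available, but the source listing below adds no edges; to create IS_A edges, use build_disease_graph_with_hierarchy.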

Parameters:

Name Type Description Default
data 'DOFetchedData'

DOFetchedData from Disease Ontology fetcher.

required
name str

Name for the graph.

'DiseaseOntologyGraph'
include_xrefs bool

Include cross-references as node xrefs.

True
include_synonyms bool

Include synonyms in node properties.

False

Returns:

Type Description
KnowledgeGraph

A KnowledgeGraph with disease nodes.

Example
from biodbs.fetch import DO_Fetcher
from biodbs.graph import build_disease_graph

fetcher = DO_Fetcher()
data = fetcher.get_children("DOID:162")  # cancer
graph = build_disease_graph(data)
print(graph.summary())
# KnowledgeGraph: DiseaseOntologyGraph
# Nodes: 47
# Edges: 0
#
# Node types:
#   disease: 47
Source code in biodbs/_funcs/graph/builders.py
def build_disease_graph(
    data: "DOFetchedData",
    name: str = "DiseaseOntologyGraph",
    include_xrefs: bool = True,
    include_synonyms: bool = False,
) -> KnowledgeGraph:
    """Build a knowledge graph from Disease Ontology data.

    Creates nodes for each disease term and edges for hierarchical
    relationships (is_a) when parent terms are available.

    Args:
        data: DOFetchedData from Disease Ontology fetcher.
        name: Name for the graph.
        include_xrefs: Include cross-references as node xrefs.
        include_synonyms: Include synonyms in node properties.

    Returns:
        A KnowledgeGraph with disease nodes.

    Example:
        ```python
        from biodbs.fetch import DO_Fetcher
        from biodbs.graph import build_disease_graph

        fetcher = DO_Fetcher()
        data = fetcher.get_children("DOID:162")  # cancer
        graph = build_disease_graph(data)
        print(graph.summary())
        # KnowledgeGraph: DiseaseOntologyGraph
        # Nodes: 47
        # Edges: 0
        #
        # Node types:
        #   disease: 47
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from Disease Ontology",
        source=DataSource.DISEASE_ONTOLOGY,
    )

    # Track which nodes we've added to create edges
    node_ids: Set[str] = set()

    for term in data.terms:
        # Build properties
        properties: Dict[str, Any] = {}
        if term.definition:
            properties["definition"] = term.definition
        if include_synonyms and term.synonyms:
            properties["synonyms"] = tuple(term.synonyms)
        if term.is_obsolete:
            properties["is_obsolete"] = True
        if hasattr(term, "has_children") and term.has_children:
            properties["has_children"] = True
        if hasattr(term, "is_root") and term.is_root:
            properties["is_root"] = True

        # Build xrefs
        xrefs: Set[str] = set()
        if include_xrefs and term.xrefs:
            xrefs = set(term.xrefs)

        # Create node
        node = Node(
            id=term.doid,
            label=term.name,
            node_type=NodeType.DISEASE,
            source=DataSource.DISEASE_ONTOLOGY,
            properties=frozenset(properties.items()) if properties else frozenset(),
            xrefs=frozenset(xrefs),
        )
        graph.add_node(node)
        node_ids.add(term.doid)

    # If we have hierarchical data (from get_children or get_descendants),
    # the parent-child relationships are implicit in the fetch
    # We can create is_a edges based on the query structure
    # For now, edges are created if the fetcher provides relationship info

    return graph

build_disease_graph_with_hierarchy

build_disease_graph_with_hierarchy(
    parent_data: "DOFetchedData",
    children_data: "DOFetchedData",
    name: str = "DiseaseOntologyGraph",
    include_xrefs: bool = True,
) -> KnowledgeGraph

Build a disease graph with explicit parent-child relationships.

Use this when you have fetched both parent and children terms and want to create IS_A edges between them.

Parameters:

Name Type Description Default
parent_data 'DOFetchedData'

DOFetchedData containing the parent term(s).

required
children_data 'DOFetchedData'

DOFetchedData containing child terms.

required
name str

Name for the graph.

'DiseaseOntologyGraph'
include_xrefs bool

Include cross-references as node xrefs.

True

Returns:

Type Description
KnowledgeGraph

A KnowledgeGraph with disease nodes and IS_A edges.

Example
from biodbs.fetch import DO_Fetcher
from biodbs.graph import build_disease_graph_with_hierarchy

fetcher = DO_Fetcher()
parent = fetcher.get_term("DOID:162")  # cancer
children = fetcher.get_children("DOID:162")
graph = build_disease_graph_with_hierarchy(parent, children)
print(graph)
# KnowledgeGraph(name='DiseaseOntologyGraph', nodes=48, edges=47)
Source code in biodbs/_funcs/graph/builders.py
def build_disease_graph_with_hierarchy(
    parent_data: "DOFetchedData",
    children_data: "DOFetchedData",
    name: str = "DiseaseOntologyGraph",
    include_xrefs: bool = True,
) -> KnowledgeGraph:
    """Build a disease graph with explicit parent-child relationships.

    Use this when you have fetched both parent and children terms
    and want to create IS_A edges between them.

    Args:
        parent_data: DOFetchedData containing the parent term(s).
        children_data: DOFetchedData containing child terms.
        name: Name for the graph.
        include_xrefs: Include cross-references as node xrefs.

    Returns:
        A KnowledgeGraph with disease nodes and IS_A edges.

    Example:
        ```python
        from biodbs.fetch import DO_Fetcher
        from biodbs.graph import build_disease_graph_with_hierarchy

        fetcher = DO_Fetcher()
        parent = fetcher.get_term("DOID:162")  # cancer
        children = fetcher.get_children("DOID:162")
        graph = build_disease_graph_with_hierarchy(parent, children)
        print(graph)
        # KnowledgeGraph(name='DiseaseOntologyGraph', nodes=48, edges=47)
        ```
    """
    # First build graphs from both datasets
    parent_graph = build_disease_graph(
        parent_data, name=name, include_xrefs=include_xrefs
    )
    children_graph = build_disease_graph(
        children_data, name=name, include_xrefs=include_xrefs
    )

    # Merge them
    graph = parent_graph.merge(children_graph)
    graph.name = name
    graph.description = "Knowledge graph from Disease Ontology with hierarchy"

    # Create IS_A edges from each child to each parent
    parent_ids = {term.doid for term in parent_data.terms}
    child_ids = {term.doid for term in children_data.terms}

    for child_id in child_ids:
        for parent_id in parent_ids:
            if child_id != parent_id:  # No self-loops
                edge = Edge(
                    source=child_id,
                    target=parent_id,
                    relation=EdgeType.IS_A,
                )
                graph.add_edge(edge)

    return graph
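
The nested loop in the listing above links every child ID to every parent ID. With a single parent term this gives one IS_A edge per child, but if parent_data holds several terms the edge count grows to children times parents, whether or not each pairing exists in the ontology. The sketch below restates that pairing logic on placeholder IDs; is_a_pairs is a hypothetical helper, not part of biodbs.

```python
from itertools import product
from typing import List, Set, Tuple


def is_a_pairs(child_ids: Set[str], parent_ids: Set[str]) -> List[Tuple[str, str]]:
    """Pair every child with every parent, skipping self-loops."""
    return sorted(
        (child, parent)
        for child, parent in product(child_ids, parent_ids)
        if child != parent
    )


one_parent = is_a_pairs({"c1", "c2", "c3"}, {"p1"})
two_parents = is_a_pairs({"c1", "c2", "c3"}, {"p1", "p2"})
```

Passing exactly one parent term, as in the example for this function, keeps the generated edges aligned with a get_children query.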

build_go_graph

build_go_graph(
    data: "QuickGOFetchedData",
    name: str = "GeneOntologyGraph",
    include_evidence: bool = True,
    create_annotation_edges: bool = True,
) -> KnowledgeGraph

Build a knowledge graph from Gene Ontology (QuickGO) data.

Creates nodes for GO terms and optionally for gene products, with edges representing annotations and ontology relationships.

Parameters:

Name Type Description Default
data 'QuickGOFetchedData'

QuickGOFetchedData from QuickGO fetcher.

required
name str

Name for the graph.

'GeneOntologyGraph'
include_evidence bool

Include evidence codes in edge properties.

True
create_annotation_edges bool

Create edges between gene products and GO terms.

True

Returns:

Type Description
KnowledgeGraph

A KnowledgeGraph with GO term and gene nodes.

Example
from biodbs.fetch import QuickGO_Fetcher
from biodbs.graph import build_go_graph

fetcher = QuickGO_Fetcher()
data = fetcher.get(category="annotation", endpoint="search", geneProductId="UniProtKB:P04637")
graph = build_go_graph(data)
print(graph)
# KnowledgeGraph(name='GeneOntologyGraph', nodes=25, edges=24)
Source code in biodbs/_funcs/graph/builders.py
def build_go_graph(
    data: "QuickGOFetchedData",
    name: str = "GeneOntologyGraph",
    include_evidence: bool = True,
    create_annotation_edges: bool = True,
) -> KnowledgeGraph:
    """Build a knowledge graph from Gene Ontology (QuickGO) data.

    Creates nodes for GO terms and optionally for gene products,
    with edges representing annotations and ontology relationships.

    Args:
        data: QuickGOFetchedData from QuickGO fetcher.
        name: Name for the graph.
        include_evidence: Include evidence codes in edge properties.
        create_annotation_edges: Create edges between gene products and GO terms.

    Returns:
        A KnowledgeGraph with GO term and gene nodes.

    Example:
        ```python
        from biodbs.fetch import QuickGO_Fetcher
        from biodbs.graph import build_go_graph

        fetcher = QuickGO_Fetcher()
        data = fetcher.get(category="annotation", endpoint="search", geneProductId="UniProtKB:P04637")
        graph = build_go_graph(data)
        print(graph)
        # KnowledgeGraph(name='GeneOntologyGraph', nodes=25, edges=24)
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from Gene Ontology",
        source=DataSource.GENE_ONTOLOGY,
    )

    go_terms: Dict[str, Dict[str, Any]] = {}  # GO ID -> term info
    gene_products: Set[str] = set()  # gene product IDs
    annotations: List[Tuple[str, str, str, Optional[str]]] = []  # (gene, go, relation, evidence)

    for result in data.results:
        # Extract GO term info
        go_id = result.get("goId") or result.get("go_id")
        go_name = result.get("goName") or result.get("go_name") or result.get("name", "")
        go_aspect = result.get("goAspect") or result.get("aspect")

        if go_id:
            if go_id not in go_terms:
                go_terms[go_id] = {
                    "name": go_name,
                    "aspect": go_aspect,
                }

            # Extract gene product info
            gene_id = result.get("geneProductId") or result.get("db_object_id")
            if gene_id:
                gene_products.add(gene_id)

                # Determine relation type from qualifier
                qualifier = result.get("qualifier") or result.get("goEvidence")
                if qualifier:
                    if "NOT" in str(qualifier).upper():
                        relation = EdgeType.NEGATIVELY_REGULATES
                    elif "part_of" in str(qualifier).lower():
                        relation = EdgeType.PART_OF
                    elif "regulates" in str(qualifier).lower():
                        relation = EdgeType.REGULATES
                    else:
                        relation = EdgeType.ASSOCIATED_WITH
                else:
                    relation = EdgeType.ASSOCIATED_WITH

                evidence = result.get("evidenceCode") or result.get("evidence_code")
                annotations.append((gene_id, go_id, relation.value, evidence))

    # Create GO term nodes
    for go_id, info in go_terms.items():
        properties: Dict[str, Any] = {}
        if info.get("aspect"):
            properties["aspect"] = info["aspect"]

        node = Node(
            id=go_id,
            label=info.get("name", go_id),
            node_type=NodeType.GO_TERM,
            source=DataSource.GENE_ONTOLOGY,
            properties=frozenset(properties.items()) if properties else frozenset(),
        )
        graph.add_node(node)

    # Create gene product nodes
    for gene_id in gene_products:
        # Try to determine if it's a protein (UniProt) or gene
        if gene_id.startswith("UniProtKB:") or gene_id.startswith("UniProt:"):
            node_type = NodeType.PROTEIN
        else:
            node_type = NodeType.GENE

        node = Node(
            id=gene_id,
            label=gene_id.split(":")[-1] if ":" in gene_id else gene_id,
            node_type=node_type,
            source=DataSource.GENE_ONTOLOGY,
        )
        graph.add_node(node)

    # Create annotation edges
    if create_annotation_edges:
        for gene_id, go_id, relation_str, evidence in annotations:
            try:
                relation = EdgeType(relation_str)
            except ValueError:
                relation = EdgeType.ASSOCIATED_WITH

            evidence_set = frozenset([evidence]) if evidence and include_evidence else frozenset()

            edge = Edge(
                source=gene_id,
                target=go_id,
                relation=relation,
                evidence=evidence_set,
            )
            graph.add_edge(edge)

    return graph
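
The node-typing rule for gene products in the listing above can be isolated as a small helper. This is a minimal sketch, not part of biodbs: `classify_gene_product` is a hypothetical name, and plain strings stand in for `NodeType` members.

```python
from typing import Tuple


def classify_gene_product(gene_id: str) -> Tuple[str, str]:
    """Return (node_type, label) for a gene product ID.

    Hypothetical helper mirroring the rule in build_go_graph:
    UniProt-prefixed IDs become proteins, everything else a gene,
    and the label is the part after the last colon.
    """
    # str.startswith accepts a tuple of prefixes
    is_protein = gene_id.startswith(("UniProtKB:", "UniProt:"))
    node_type = "protein" if is_protein else "gene"
    label = gene_id.split(":")[-1]  # no colon -> the whole ID
    return node_type, label


print(classify_gene_product("UniProtKB:P04637"))  # ('protein', 'P04637')
print(classify_gene_product("TP53"))              # ('gene', 'TP53')
```

Any identifier without one of the two UniProt prefixes is typed as a gene, so IDs from other protein databases would need an explicit rule.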

build_reactome_graph

build_reactome_graph(
    data: "ReactomeFetchedData",
    name: str = "ReactomeGraph",
    include_species: bool = True,
    include_disease_info: bool = True,
) -> KnowledgeGraph

Build a knowledge graph from Reactome pathway data.

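Creates one node per pathway in the analysis result, with enrichment statistics stored as node properties. No edges are created; use build_reactome_hierarchy_graph for parent-child edges.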

Parameters:

Name Type Description Default
data 'ReactomeFetchedData' ReactomeFetchedData from Reactome fetcher. required
name str Name for the graph. 'ReactomeGraph'
include_species bool Include the species name and taxon ID in node properties. True
include_disease_info bool Include the disease-pathway and lowest-level-pathway flags in node properties. True

Returns:

Type Description
KnowledgeGraph A KnowledgeGraph with pathway nodes and no edges.

Example
from biodbs.fetch import Reactome_Fetcher
from biodbs.graph import build_reactome_graph

fetcher = Reactome_Fetcher()
data = fetcher.analyze(["TP53", "BRCA1", "BRCA2"])
graph = build_reactome_graph(data)
print(graph)
# KnowledgeGraph(name='ReactomeGraph', nodes=42, edges=0)
Source code in biodbs/_funcs/graph/builders.py
def build_reactome_graph(
    data: "ReactomeFetchedData",
    name: str = "ReactomeGraph",
    include_species: bool = True,
    include_disease_info: bool = True,
) -> KnowledgeGraph:
    """Build a knowledge graph from Reactome pathway data.

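    Creates one node per pathway in the analysis result, with enrichment
    statistics stored as node properties. No edges are created; use
    build_reactome_hierarchy_graph for parent-child edges.

    Args:
        data: ReactomeFetchedData from Reactome fetcher.
        name: Name for the graph.
        include_species: Include the species name and taxon ID in node
            properties.
        include_disease_info: Include the disease-pathway and
            lowest-level-pathway flags in node properties.

    Returns:
        A KnowledgeGraph with pathway nodes and no edges.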

    Example:
        ```python
        from biodbs.fetch import Reactome_Fetcher
        from biodbs.graph import build_reactome_graph

        fetcher = Reactome_Fetcher()
        data = fetcher.analyze(["TP53", "BRCA1", "BRCA2"])
        graph = build_reactome_graph(data)
        print(graph)
        # KnowledgeGraph(name='ReactomeGraph', nodes=42, edges=0)
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from Reactome pathways",
        source=DataSource.REACTOME,
    )

    for pathway in data.pathways:
        properties: Dict[str, Any] = {}

        # Add statistics
        if pathway.p_value is not None:
            properties["p_value"] = pathway.p_value
        if pathway.fdr is not None:
            properties["fdr"] = pathway.fdr
        if pathway.found_entities is not None:
            properties["found_entities"] = pathway.found_entities
        if pathway.total_entities is not None:
            properties["total_entities"] = pathway.total_entities

        # Add species info
        if include_species and pathway.species:
            properties["species"] = pathway.species.name
            properties["taxon_id"] = pathway.species.taxId

        # Add disease info
        if include_disease_info:
            properties["is_disease_pathway"] = pathway.inDisease
            properties["is_lowest_level"] = pathway.llp

        # Add database ID
        properties["db_id"] = pathway.dbId

        node = Node(
            id=pathway.stId,
            label=pathway.name,
            node_type=NodeType.PATHWAY,
            source=DataSource.REACTOME,
            properties=frozenset(properties.items()),
        )
        graph.add_node(node)

    return graph
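
The way optional statistics end up in a node's `properties` can be sketched without biodbs. The example below assumes a stand-in object with the same attribute names as the listing above; `pack_properties` is a hypothetical helper and the values are made up.

```python
from types import SimpleNamespace
from typing import Any, Dict, FrozenSet, Tuple


def pack_properties(pathway: Any) -> FrozenSet[Tuple[str, Any]]:
    """Collect non-None statistics into a hashable frozenset of pairs."""
    properties: Dict[str, Any] = {}
    for key in ("p_value", "fdr", "found_entities", "total_entities"):
        value = getattr(pathway, key, None)
        if value is not None:  # 0 and 0.0 are kept; only None is skipped
            properties[key] = value
    return frozenset(properties.items())


# Stand-in for one Reactome analysis result (values are made up)
pathway = SimpleNamespace(p_value=0.001, fdr=None, found_entities=3, total_entities=120)
packed = pack_properties(pathway)
print(dict(packed) == {"p_value": 0.001, "found_entities": 3, "total_entities": 120})  # True
```

Because the properties are stored as a frozenset of `(key, value)` pairs, every value must be hashable, and `dict(...)` recovers the original mapping.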

build_reactome_hierarchy_graph

build_reactome_hierarchy_graph(
    hierarchy_data: List[Dict[str, Any]],
    name: str = "ReactomeHierarchyGraph",
) -> KnowledgeGraph

Build a knowledge graph from Reactome hierarchy data.

Creates one node per pathway and a PART_OF edge from each child pathway to its parent. Entries without a stId are skipped together with their descendants.

Parameters:

Name Type Description Default
hierarchy_data List[Dict[str, Any]] List of pathway hierarchy dictionaries from Reactome's events hierarchy endpoint. required
name str Name for the graph. 'ReactomeHierarchyGraph'

Returns:

Type Description
KnowledgeGraph A KnowledgeGraph with pathway nodes and hierarchy edges.

Source code in biodbs/_funcs/graph/builders.py
def build_reactome_hierarchy_graph(
    hierarchy_data: List[Dict[str, Any]],
    name: str = "ReactomeHierarchyGraph",
) -> KnowledgeGraph:
    """Build a knowledge graph from Reactome hierarchy data.

    Creates one node per pathway and a PART_OF edge from each child
    pathway to its parent. Entries without a stId are skipped together
    with their descendants.

    Args:
        hierarchy_data: List of pathway hierarchy dictionaries from
            Reactome's events hierarchy endpoint.
        name: Name for the graph.

    Returns:
        A KnowledgeGraph with pathway nodes and hierarchy edges.
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from Reactome pathway hierarchy",
        source=DataSource.REACTOME,
    )

    def process_node(node_data: Dict[str, Any], parent_id: Optional[str] = None) -> None:
        """Recursively process hierarchy nodes."""
        st_id = node_data.get("stId", "")
        # Named `label` so it does not shadow the outer `name` parameter
        label = node_data.get("name", node_data.get("displayName", ""))

        # Entries without a stable ID are skipped, together with their children
        if not st_id:
            return

        properties: Dict[str, Any] = {}
        if node_data.get("hasDiagram"):
            properties["has_diagram"] = True
        if node_data.get("species"):
            properties["species"] = node_data["species"]

        node = Node(
            id=st_id,
            label=label,
            node_type=NodeType.PATHWAY,
            source=DataSource.REACTOME,
            properties=frozenset(properties.items()) if properties else frozenset(),
        )
        graph.add_node(node)

        # Create edge to parent
        if parent_id and parent_id in graph:
            edge = Edge(
                source=st_id,
                target=parent_id,
                relation=EdgeType.PART_OF,
            )
            graph.add_edge(edge)

        # Process children
        children = node_data.get("children", [])
        for child in children:
            process_node(child, st_id)

    # Process top-level nodes
    for node_data in hierarchy_data:
        process_node(node_data)

    return graph
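
The recursive traversal can be illustrated on plain dictionaries. This is a sketch under stated assumptions: `iter_part_of` is a hypothetical generator, the stable IDs are made up, and only the child-to-parent pairs are produced rather than full `Edge` objects.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


def iter_part_of(
    nodes: List[Dict[str, Any]], parent_id: Optional[str] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (child, parent) pairs from a nested hierarchy."""
    for node in nodes:
        st_id = node.get("stId", "")
        if not st_id:
            continue  # as in the source, the whole subtree is skipped
        if parent_id:
            yield st_id, parent_id
        yield from iter_part_of(node.get("children", []), st_id)


# Made-up IDs in the shape the builder expects
hierarchy = [
    {"stId": "R-A", "name": "Top", "children": [
        {"stId": "R-B", "name": "Mid", "children": [{"stId": "R-C", "name": "Leaf"}]},
        {"name": "no stable ID", "children": [{"stId": "R-D", "name": "Orphan"}]},
    ]},
]
print(list(iter_part_of(hierarchy)))
# [('R-B', 'R-A'), ('R-C', 'R-B')]
```

Note that `R-D` never appears: its parent has no `stId`, so the subtree below it is not visited.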

build_kegg_graph

build_kegg_graph(
    data: "KEGGFetchedData",
    name: str = "KEGGGraph",
    node_type: Optional[NodeType] = None,
) -> KnowledgeGraph

Build a knowledge graph from KEGG data.

Creates one node per KEGG entry; no edges are created. Unless node_type is given, the type is inferred from the first record's entry ID for "list" results and falls back to NodeType.OTHER for any other operation.

Parameters:

Name Type Description Default
data 'KEGGFetchedData' KEGGFetchedData from KEGG fetcher. required
name str Name for the graph. 'KEGGGraph'
node_type Optional[NodeType] Node type to use for every entry, overriding the inferred type. None

Returns:

Type Description
KnowledgeGraph A KnowledgeGraph with KEGG nodes and no edges.

Example
from biodbs.fetch import kegg_list
from biodbs.graph import build_kegg_graph

data = kegg_list("pathway", organism="hsa")
graph = build_kegg_graph(data, name="HumanPathways")
print(graph)
# KnowledgeGraph(name='HumanPathways', nodes=350, edges=0)
Source code in biodbs/_funcs/graph/builders.py
def build_kegg_graph(
    data: "KEGGFetchedData",
    name: str = "KEGGGraph",
    node_type: Optional[NodeType] = None,
) -> KnowledgeGraph:
    """Build a knowledge graph from KEGG data.

    Creates one node per KEGG entry; no edges are created. Unless
    node_type is given, the type is inferred from the first record's
    entry ID for "list" results and falls back to NodeType.OTHER for
    any other operation.

    Args:
        data: KEGGFetchedData from KEGG fetcher.
        name: Name for the graph.
        node_type: Override the inferred node type.

    Returns:
        A KnowledgeGraph with KEGG nodes.

    Example:
        ```python
        from biodbs.fetch import kegg_list
        from biodbs.graph import build_kegg_graph

        data = kegg_list("pathway", organism="hsa")
        graph = build_kegg_graph(data, name="HumanPathways")
        print(graph)
        # KnowledgeGraph(name='HumanPathways', nodes=350, edges=0)
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from KEGG database",
        source=DataSource.KEGG,
    )

    # Infer node type from operation or entry IDs
    inferred_type = node_type
    if inferred_type is None:
        if data.operation == "list":
            # Try to infer from first record
            if data.records:
                entry_id = data.records[0].get("entry_id", "")
                inferred_type = _infer_kegg_node_type(entry_id)
        else:
            inferred_type = NodeType.OTHER

    if inferred_type is None:
        inferred_type = NodeType.OTHER

    for record in data.records:
        entry_id = record.get("entry_id") or record.get("ENTRY", "")
        description = record.get("description") or record.get("NAME", "")

        if not entry_id:
            continue

        # For flat file records, extract more properties
        properties: Dict[str, Any] = {}
        if data.format == "flat_file":
            for key in ["DEFINITION", "PATHWAY", "MODULE", "DISEASE", "DBLINKS"]:
                if key in record:
                    value = record[key]
                    # Convert lists to tuples so they are hashable for frozenset
                    properties[key.lower()] = tuple(value) if isinstance(value, list) else value

        # Create node
        node = Node(
            id=entry_id,
            label=description if description else entry_id,
            node_type=inferred_type,
            source=DataSource.KEGG,
            properties=frozenset(properties.items()) if properties else frozenset(),
        )
        graph.add_node(node)

    return graph
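
The list-to-tuple conversion in the flat-file branch exists because `frozenset` needs hashable members. The following sketch shows the failure and the fix on a made-up record; it uses only built-ins and does not call biodbs.

```python
from typing import Any, Dict

# Made-up flat-file record with a list-valued field
record = {"ENTRY": "hsa:7157", "PATHWAY": ["hsa04115", "hsa05200"]}

raw: Dict[str, Any] = {"pathway": record["PATHWAY"]}
try:
    frozenset(raw.items())
    hashable = True
except TypeError:
    hashable = False  # lists are unhashable, so the set cannot be built

# Converting the list to a tuple, as build_kegg_graph does, fixes this
safe = {k: tuple(v) if isinstance(v, list) else v for k, v in raw.items()}
properties = frozenset(safe.items())

print(hashable)          # False
print(dict(properties))  # {'pathway': ('hsa04115', 'hsa05200')}
```

Only lists are converted. A value of another unhashable type, such as a dict, would still raise `TypeError` at the `frozenset` call.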

build_kegg_link_graph

build_kegg_link_graph(
    link_data: "KEGGFetchedData",
    source_type: NodeType = NodeType.GENE,
    target_type: NodeType = NodeType.PATHWAY,
    relation: EdgeType = EdgeType.PARTICIPATES_IN,
    name: str = "KEGGLinkGraph",
) -> KnowledgeGraph

Build a knowledge graph from KEGG link data.

Creates one node per distinct source ID and target ID, and one edge per link record. Records missing either ID are skipped.

Parameters:

Name Type Description Default
link_data 'KEGGFetchedData' KEGGFetchedData from kegg_link operation. required
source_type NodeType Node type for source entries. NodeType.GENE
target_type NodeType Node type for target entries. NodeType.PATHWAY
relation EdgeType Edge type for the links. EdgeType.PARTICIPATES_IN
name str Name for the graph. 'KEGGLinkGraph'

Returns:

Type Description
KnowledgeGraph A KnowledgeGraph with nodes and edges from link data.

Example
from biodbs.fetch import kegg_link
from biodbs.graph import build_kegg_link_graph, NodeType

data = kegg_link("pathway", "hsa")  # genes to pathways
graph = build_kegg_link_graph(
    data,
    source_type=NodeType.GENE,
    target_type=NodeType.PATHWAY,
)
print(graph)
# KnowledgeGraph(name='KEGGLinkGraph', nodes=8500, edges=42000)
Source code in biodbs/_funcs/graph/builders.py
def build_kegg_link_graph(
    link_data: "KEGGFetchedData",
    source_type: NodeType = NodeType.GENE,
    target_type: NodeType = NodeType.PATHWAY,
    relation: EdgeType = EdgeType.PARTICIPATES_IN,
    name: str = "KEGGLinkGraph",
) -> KnowledgeGraph:
    """Build a knowledge graph from KEGG link data.

    Creates one node per distinct source ID and target ID, and one edge
    per link record. Records missing either ID are skipped.

    Args:
        link_data: KEGGFetchedData from kegg_link operation.
        source_type: Node type for source entries.
        target_type: Node type for target entries.
        relation: Edge type for the links.
        name: Name for the graph.

    Returns:
        A KnowledgeGraph with nodes and edges from link data.

    Example:
        ```python
        from biodbs.fetch import kegg_link
        from biodbs.graph import build_kegg_link_graph, NodeType

        data = kegg_link("pathway", "hsa")  # genes to pathways
        graph = build_kegg_link_graph(
            data,
            source_type=NodeType.GENE,
            target_type=NodeType.PATHWAY,
        )
        print(graph)
        # KnowledgeGraph(name='KEGGLinkGraph', nodes=8500, edges=42000)
        ```
    """
    graph = KnowledgeGraph(
        name=name,
        description="Knowledge graph from KEGG links",
        source=DataSource.KEGG,
    )

    sources: Set[str] = set()
    targets: Set[str] = set()

    for record in link_data.records:
        source_id = record.get("source_id", "")
        target_id = record.get("target_id", "")

        if source_id and target_id:
            sources.add(source_id)
            targets.add(target_id)

    # Create source nodes
    for source_id in sources:
        node = Node(
            id=source_id,
            label=source_id.split(":")[-1] if ":" in source_id else source_id,
            node_type=source_type,
            source=DataSource.KEGG,
        )
        graph.add_node(node)

    # Create target nodes
    for target_id in targets:
        node = Node(
            id=target_id,
            label=target_id.split(":")[-1] if ":" in target_id else target_id,
            node_type=target_type,
            source=DataSource.KEGG,
        )
        graph.add_node(node)

    # Create edges
    for record in link_data.records:
        source_id = record.get("source_id", "")
        target_id = record.get("target_id", "")

        if source_id and target_id:
            edge = Edge(
                source=source_id,
                target=target_id,
                relation=relation,
            )
            graph.add_edge(edge)

    return graph
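
The record handling above can be sketched on plain dictionaries. This is a single-pass variant for illustration only (the listing makes two passes over the records); `split_links` is a hypothetical name and the IDs are examples.

```python
from typing import Dict, List, Set, Tuple


def split_links(
    records: List[Dict[str, str]],
) -> Tuple[Set[str], Set[str], List[Tuple[str, str]]]:
    """Return (source IDs, target IDs, edge pairs) from link records."""
    sources: Set[str] = set()
    targets: Set[str] = set()
    edges: List[Tuple[str, str]] = []
    for record in records:
        source_id = record.get("source_id", "")
        target_id = record.get("target_id", "")
        if source_id and target_id:  # incomplete records are dropped
            sources.add(source_id)
            targets.add(target_id)
            edges.append((source_id, target_id))
    return sources, targets, edges


records = [
    {"source_id": "hsa:7157", "target_id": "path:hsa04115"},
    {"source_id": "hsa:7157", "target_id": "path:hsa05200"},
    {"source_id": "hsa:672", "target_id": ""},  # skipped: no target
]
sources, targets, edges = split_links(records)
print(len(sources), len(targets), len(edges))  # 1 2 2
```

Collecting IDs into sets first means each node is created once even when it appears in many link records.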

merge_graphs

merge_graphs(
    *graphs: KnowledgeGraph,
    name: str = "MergedGraph",
    description: Optional[str] = None,
) -> KnowledgeGraph

Merge multiple knowledge graphs into one.

Combines all nodes and edges from the input graphs. When several graphs contain a node with the same ID, the first occurrence is kept. Edges with the same source, target, and relation are deduplicated. The merged graph takes its source from the first input graph.

Parameters:

Name Type Description Default
*graphs KnowledgeGraph Variable number of KnowledgeGraph instances to merge. ()
name str Name for the merged graph. 'MergedGraph'
description Optional[str] Description for the merged graph. When omitted, a single input graph's description is reused; for several inputs it defaults to "Merged from N graphs". None

Returns:

Type Description
KnowledgeGraph A new KnowledgeGraph containing all nodes and edges.

Example
from biodbs.graph import merge_graphs, build_disease_graph, build_go_graph

disease_graph = build_disease_graph(disease_data)
go_graph = build_go_graph(go_data)
merged = merge_graphs(disease_graph, go_graph, name="BioGraph")
print(merged)
# KnowledgeGraph(name='BioGraph', nodes=72, edges=24)
Source code in biodbs/_funcs/graph/builders.py
def merge_graphs(
    *graphs: KnowledgeGraph,
    name: str = "MergedGraph",
    description: Optional[str] = None,
) -> KnowledgeGraph:
    """Merge multiple knowledge graphs into one.

    Combines all nodes and edges from the input graphs. When several
    graphs contain a node with the same ID, the first occurrence is
    kept. Edges with the same source, target, and relation are
    deduplicated. The merged graph takes its source from the first
    input graph.

    Args:
        *graphs: Variable number of KnowledgeGraph instances to merge.
        name: Name for the merged graph.
        description: Optional description for the merged graph.

    Returns:
        A new KnowledgeGraph containing all nodes and edges.

    Example:
        ```python
        from biodbs.graph import merge_graphs, build_disease_graph, build_go_graph

        disease_graph = build_disease_graph(disease_data)
        go_graph = build_go_graph(go_data)
        merged = merge_graphs(disease_graph, go_graph, name="BioGraph")
        print(merged)
        # KnowledgeGraph(name='BioGraph', nodes=72, edges=24)
        ```
    """
    if not graphs:
        return KnowledgeGraph(name=name, description=description)

    if len(graphs) == 1:
        # Just copy the single graph
        merged = KnowledgeGraph(
            name=name,
            description=description or graphs[0].description,
            source=graphs[0].source,
        )
        merged.add_nodes(graphs[0].nodes)
        merged.add_edges(graphs[0].edges)
        return merged

    # Create an empty graph that inherits its source from the first input
    merged = KnowledgeGraph(
        name=name,
        description=description or f"Merged from {len(graphs)} graphs",
        source=graphs[0].source,
    )

    # Add all nodes and edges from each graph
    for graph in graphs:
        merged.add_nodes(graph.nodes)
        merged.add_edges(graph.edges)

    return merged
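
The documented merge semantics can be sketched with built-in containers. This is an illustration of the stated rules, not of `KnowledgeGraph` internals: tuples stand in for `Node` and `Edge`, `merge` is a hypothetical function, and the labels are illustrative.

```python
from typing import Dict, List, Set, Tuple

NodeT = Tuple[str, str]       # (id, label) stand-in for Node
EdgeT = Tuple[str, str, str]  # (source, target, relation) stand-in for Edge


def merge(*graphs: Tuple[List[NodeT], List[EdgeT]]) -> Tuple[Dict[str, str], Set[EdgeT]]:
    """Merge (nodes, edges) pairs: first node wins, edges are deduplicated."""
    nodes: Dict[str, str] = {}
    edges: Set[EdgeT] = set()
    for graph_nodes, graph_edges in graphs:
        for node_id, label in graph_nodes:
            nodes.setdefault(node_id, label)  # first occurrence wins
        edges.update(graph_edges)             # identical triples collapse
    return nodes, edges


first = ([("DOID:162", "cancer")], [("DOID:1612", "DOID:162", "is_a")])
second = (
    [("DOID:162", "neoplasm"), ("DOID:1612", "breast cancer")],
    [("DOID:1612", "DOID:162", "is_a")],
)
nodes, edges = merge(first, second)
print(nodes["DOID:162"])       # cancer
print(len(nodes), len(edges))  # 2 1
```

Argument order therefore matters: pass the graph whose node labels and properties should take precedence first.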

Export Functions

to_networkx

to_networkx(
    graph: KnowledgeGraph,
    include_properties: bool = True,
    include_xrefs: bool = True,
) -> "nx.DiGraph"

Export a KnowledgeGraph to a NetworkX directed graph.

Requires the networkx package to be installed.

Parameters:

Name Type Description Default
graph KnowledgeGraph The KnowledgeGraph to export. required
include_properties bool Include node/edge properties as attributes. True
include_xrefs bool Include cross-references as node attributes. True

Returns:

Type Description
'nx.DiGraph' A NetworkX DiGraph with the same nodes and edges.

Raises:

Type Description
ImportError If networkx is not installed.

Example
from biodbs.graph import to_networkx, build_disease_graph
import networkx as nx

graph = build_disease_graph(disease_data)
nx_graph = to_networkx(graph)
centrality = nx.degree_centrality(nx_graph)
print(f"Nodes: {nx_graph.number_of_nodes()}")
# Nodes: 47
Source code in biodbs/_funcs/graph/exporters.py
def to_networkx(
    graph: KnowledgeGraph,
    include_properties: bool = True,
    include_xrefs: bool = True,
) -> "nx.DiGraph":
    """Export a KnowledgeGraph to a NetworkX directed graph.

    Requires the networkx package to be installed.

    Args:
        graph: The KnowledgeGraph to export.
        include_properties: Include node/edge properties as attributes.
        include_xrefs: Include cross-references as node attributes.

    Returns:
        A NetworkX DiGraph with the same nodes and edges.

    Raises:
        ImportError: If networkx is not installed.

    Example:
        ```python
        from biodbs.graph import to_networkx, build_disease_graph
        import networkx as nx

        graph = build_disease_graph(disease_data)
        nx_graph = to_networkx(graph)
        centrality = nx.degree_centrality(nx_graph)
        print(f"Nodes: {nx_graph.number_of_nodes()}")
        # Nodes: 47
        ```
    """
    try:
        import networkx as nx
    except ImportError as exc:
        # Chain the original error so the root cause stays visible
        raise ImportError(
            "networkx is required for to_networkx(). "
            "Install it with: pip install networkx"
        ) from exc

    G = nx.DiGraph()
    G.graph["name"] = graph.name
    G.graph["source"] = graph.source.value
    if graph.description:
        G.graph["description"] = graph.description

    # Add nodes
    for node in graph.nodes:
        attrs: Dict[str, Any] = {
            "label": node.label,
            "node_type": node.node_type.value,
            "source": node.source.value,
        }

        if include_properties:
            attrs.update(node.get_properties_dict())

        if include_xrefs and node.xrefs:
            attrs["xrefs"] = list(node.xrefs)

        G.add_node(node.id, **attrs)

    # Add edges
    for edge in graph.edges:
        # Separate name avoids re-annotating `attrs` from the node loop
        edge_attrs: Dict[str, Any] = {
            "relation": edge.relation.value,
            "weight": edge.weight,
        }

        if include_properties:
            edge_attrs.update(edge.get_properties_dict())

        if edge.evidence:
            edge_attrs["evidence"] = list(edge.evidence)

        G.add_edge(edge.source, edge.target, **edge_attrs)

    return G
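
The optional-dependency guard used here (and again in to_rdf) follows a common pattern: import lazily and re-raise with an install hint. A minimal generic sketch, assuming a hypothetical helper named `require` that is not part of biodbs:

```python
import importlib
from types import ModuleType


def require(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency or raise a helpful ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for {feature}. "
            f"Install it with: pip install {module_name}"
        ) from exc


json_module = require("json", "the demo")  # stdlib module, always importable

message = ""
try:
    require("surely_not_an_installed_package_xyz", "the demo")
except ImportError as exc:
    message = str(exc)
print(message)
```

Importing inside the function keeps `import biodbs.graph` working for users who never call the exporter.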

to_json_ld

to_json_ld(
    graph: KnowledgeGraph,
    include_context: bool = True,
    compact: bool = False,
    base_uri: Optional[str] = None,
) -> Dict[str, Any]

Export a KnowledgeGraph to JSON-LD format.

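JSON-LD is well suited to KG-RAG (Knowledge Graph Retrieval-Augmented Generation) applications because it yields structured, semantically annotated data that LLMs can consume directly. Each node's outgoing edges are embedded as properties of that node: a single target is stored as a plain value, and several targets of the same relation are stored as a list.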

Parameters:

Name Type Description Default
graph KnowledgeGraph The KnowledgeGraph to export. required
include_context bool Include JSON-LD @context. True
compact bool Use compact representation (less verbose). False
base_uri Optional[str] Base URI for node IDs. None

Returns:

Type Description
Dict[str, Any] A dictionary in JSON-LD format.

Example
from biodbs.graph import to_json_ld, build_disease_graph
import json

graph = build_disease_graph(disease_data)
json_ld = to_json_ld(graph)
# Use in RAG pipeline
context = json.dumps(json_ld, indent=2)
print(json_ld["@type"])
# schema:Dataset
Source code in biodbs/_funcs/graph/exporters.py
def to_json_ld(
    graph: KnowledgeGraph,
    include_context: bool = True,
    compact: bool = False,
    base_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Export a KnowledgeGraph to JSON-LD format.

    JSON-LD is well suited to KG-RAG (Knowledge Graph Retrieval-Augmented
    Generation) applications because it yields structured, semantically
    annotated data that LLMs can consume directly. Each node's outgoing
    edges are embedded as properties of that node.

    Args:
        graph: The KnowledgeGraph to export.
        include_context: Include JSON-LD @context.
        compact: Use compact representation (less verbose).
        base_uri: Base URI for node IDs.

    Returns:
        A dictionary in JSON-LD format.

    Example:
        ```python
        from biodbs.graph import to_json_ld, build_disease_graph
        import json

        graph = build_disease_graph(disease_data)
        json_ld = to_json_ld(graph)
        # Use in RAG pipeline
        context = json.dumps(json_ld, indent=2)
        print(json_ld["@type"])
        # schema:Dataset
        ```
    """
    result: Dict[str, Any] = {}

    # Add context
    if include_context:
        result.update(BIOKG_CONTEXT)

    # Graph metadata
    result["@type"] = "schema:Dataset"
    result["schema:name"] = graph.name
    if graph.description:
        result["schema:description"] = graph.description
    result["schema:creator"] = "biodbs"
    result["schema:source"] = graph.source.value

    # Build nodes as @graph
    nodes_list: List[Dict[str, Any]] = []

    for node in graph.nodes:
        node_obj = _node_to_json_ld(node, base_uri, compact)

        # Add outgoing edges as properties
        outgoing = graph.get_outgoing_edges(node.id)
        for edge in outgoing:
            relation_key = _edge_type_to_json_ld_key(edge.relation)
            target_id = _make_uri(edge.target, base_uri)

            if relation_key in node_obj:
                # Multiple edges of same type
                if isinstance(node_obj[relation_key], list):
                    node_obj[relation_key].append(target_id)
                else:
                    node_obj[relation_key] = [node_obj[relation_key], target_id]
            else:
                node_obj[relation_key] = target_id

        nodes_list.append(node_obj)

    result["@graph"] = nodes_list

    return result
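
The edge-embedding step stores a single target as a scalar and promotes it to a list when the same relation occurs again. A self-contained sketch of that accumulation, with a hypothetical helper `add_relation` and an illustrative key name:

```python
from typing import Any, Dict


def add_relation(node_obj: Dict[str, Any], key: str, target: str) -> None:
    """Store one target as a scalar; promote to a list on repeats."""
    if key not in node_obj:
        node_obj[key] = target
    elif isinstance(node_obj[key], list):
        node_obj[key].append(target)
    else:
        node_obj[key] = [node_obj[key], target]


node_obj: Dict[str, Any] = {"@id": "DOID:1612"}
add_relation(node_obj, "is_a", "DOID:162")  # the key name is illustrative
print(node_obj["is_a"])  # DOID:162

for target in ("DOID:3571", "DOID:4"):
    add_relation(node_obj, "is_a", target)
print(node_obj["is_a"])  # ['DOID:162', 'DOID:3571', 'DOID:4']
```

Code that reads the exported document should therefore accept either a string or a list for every relation key.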

to_rdf

to_rdf(
    graph: KnowledgeGraph,
    format: Literal["turtle", "xml", "n3", "nt"] = "turtle",
    base_uri: str = "http://example.org/biokg/",
) -> str

Export a KnowledgeGraph to RDF format.

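Requires the rdflib package to be installed. Only scalar node properties (str, int, float, bool) are exported. Edge weights, evidence, and properties are not yet written to the RDF output.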

Parameters:

Name Type Description Default
graph KnowledgeGraph The KnowledgeGraph to export. required
format Literal['turtle', 'xml', 'n3', 'nt'] RDF serialization format. 'turtle'
base_uri str Base URI prepended to node IDs and used as the root of the biokg vocabulary namespace. 'http://example.org/biokg/'

Returns:

Type Description
str RDF data as a string in the specified format.

Raises:

Type Description
ImportError If rdflib is not installed.

Example
from biodbs.graph import to_rdf, build_disease_graph

graph = build_disease_graph(disease_data)
turtle = to_rdf(graph, format="turtle")
print(turtle[:200])
# @prefix base: <http://example.org/biokg/> .
# @prefix biokg: <http://example.org/biokg/vocab/> .
# ...
Source code in biodbs/_funcs/graph/exporters.py
def to_rdf(
    graph: KnowledgeGraph,
    format: Literal["turtle", "xml", "n3", "nt"] = "turtle",
    base_uri: str = "http://example.org/biokg/",
) -> str:
    """Export a KnowledgeGraph to RDF format.

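    Requires the rdflib package to be installed. Only scalar node
    properties (str, int, float, bool) are exported. Edge weights,
    evidence, and properties are not yet written to the RDF output.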

    Args:
        graph: The KnowledgeGraph to export.
        format: RDF serialization format ("turtle", "xml", "n3", "nt").
        base_uri: Base URI for the graph.

    Returns:
        RDF data as a string in the specified format.

    Raises:
        ImportError: If rdflib is not installed.

    Example:
        ```python
        from biodbs.graph import to_rdf, build_disease_graph

        graph = build_disease_graph(disease_data)
        turtle = to_rdf(graph, format="turtle")
        print(turtle[:200])
        # @prefix base: <http://example.org/biokg/> .
        # @prefix biokg: <http://example.org/biokg/vocab/> .
        # ...
        ```
    """
    try:
        from rdflib import Graph as RDFGraph
        # Aliased so it is not confused with typing.Literal in the signature
        from rdflib import Literal as RDFLiteral
        from rdflib import Namespace, URIRef
        from rdflib.namespace import RDF, RDFS
    except ImportError as exc:
        raise ImportError(
            "rdflib is required for to_rdf(). "
            "Install it with: pip install rdflib"
        ) from exc

    g = RDFGraph()

    # Define namespaces
    BASE = Namespace(base_uri)
    OBO = Namespace("http://purl.obolibrary.org/obo/")
    SCHEMA = Namespace("http://schema.org/")
    BIOKG = Namespace(base_uri + "vocab/")

    g.bind("base", BASE)
    g.bind("obo", OBO)
    g.bind("schema", SCHEMA)
    g.bind("biokg", BIOKG)

    # Add nodes
    for node in graph.nodes:
        node_uri = URIRef(base_uri + node.id.replace(":", "_"))

        # Type
        type_uri = _node_type_to_rdf_type(node.node_type, OBO, SCHEMA)
        g.add((node_uri, RDF.type, type_uri))

        # Label
        g.add((node_uri, RDFS.label, RDFLiteral(node.label)))

        # Source
        g.add((node_uri, BIOKG.source, RDFLiteral(node.source.value)))

        # Properties (only scalar values are exported)
        props = node.get_properties_dict()
        if "definition" in props:
            g.add((node_uri, SCHEMA.description, RDFLiteral(props["definition"])))
        for key, value in props.items():
            if key != "definition" and isinstance(value, (str, int, float, bool)):
                g.add((node_uri, BIOKG[key], RDFLiteral(value)))

        # Xrefs
        for xref in node.xrefs:
            g.add((node_uri, OBO.hasDbXref, RDFLiteral(xref)))

    # Add edges
    for edge in graph.edges:
        source_uri = URIRef(base_uri + edge.source.replace(":", "_"))
        target_uri = URIRef(base_uri + edge.target.replace(":", "_"))
        predicate = _edge_type_to_rdf_predicate(edge.relation, OBO, RDFS, BIOKG)

        g.add((source_uri, predicate, target_uri))

        # Edge weights are not exported yet: attaching a weight to a
        # triple would require a reified statement.

    return g.serialize(format=format)
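
Two details of the listing above are easy to check without rdflib: how node URIs are formed and which properties survive the scalar filter. The helpers below (`node_uri`, `exportable`) are hypothetical names for this sketch, and the property values are made up.

```python
from typing import Any, Dict

BASE_URI = "http://example.org/biokg/"  # the documented default


def node_uri(node_id: str, base_uri: str = BASE_URI) -> str:
    """Form a node URI the way to_rdf does: colons become underscores."""
    return base_uri + node_id.replace(":", "_")


def exportable(props: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only scalar properties; tuples, lists, and dicts are dropped."""
    return {k: v for k, v in props.items() if isinstance(v, (str, int, float, bool))}


print(node_uri("DOID:162"))  # http://example.org/biokg/DOID_162
print(exportable({"aspect": "P", "pathway": ("a", "b"), "p_value": 0.01}))
# {'aspect': 'P', 'p_value': 0.01}
```

In to_rdf itself, a `definition` property is handled separately and mapped to `schema:description` rather than to the biokg vocabulary. Tuple-valued properties, such as those produced by build_kegg_graph for flat-file records, are silently omitted.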

to_neo4j_csv

to_neo4j_csv(
    graph: KnowledgeGraph,
    output_dir: Union[str, Path],
    nodes_filename: str = "nodes.csv",
    edges_filename: str = "relationships.csv",
    include_headers: bool = True,
) -> Tuple[Path, Path]

Export a KnowledgeGraph to CSV files for Neo4j import.

Creates two CSV files: one for nodes and one for relationships, formatted for Neo4j's LOAD CSV or neo4j-admin import.

Parameters:

Name Type Description Default
graph KnowledgeGraph The KnowledgeGraph to export. required
output_dir Union[str, Path] Directory to write CSV files; created if it does not exist. required
nodes_filename str Filename for nodes CSV. 'nodes.csv'
edges_filename str Filename for relationships CSV. 'relationships.csv'
include_headers bool Write a Neo4j import header row as the first line of each file. True

Returns:

Type Description
Tuple[Path, Path] Tuple of (nodes_path, edges_path).

Example
from biodbs.graph import to_neo4j_csv, build_disease_graph

graph = build_disease_graph(disease_data)
nodes_path, edges_path = to_neo4j_csv(graph, "./neo4j_import/")
print(f"Nodes: {nodes_path}")
# Nodes: neo4j_import/nodes.csv
print(f"Edges: {edges_path}")
# Edges: neo4j_import/relationships.csv
Source code in biodbs/_funcs/graph/exporters.py
def to_neo4j_csv(
    graph: KnowledgeGraph,
    output_dir: Union[str, Path],
    nodes_filename: str = "nodes.csv",
    edges_filename: str = "relationships.csv",
    include_headers: bool = True,
) -> Tuple[Path, Path]:
    """Export a KnowledgeGraph to CSV files for Neo4j import.

    Creates two CSV files: one for nodes and one for relationships,
    formatted for Neo4j's LOAD CSV or neo4j-admin import.

    Args:
        graph: The KnowledgeGraph to export.
        output_dir: Directory to write CSV files.
        nodes_filename: Filename for nodes CSV.
        edges_filename: Filename for relationships CSV.
        include_headers: Include Neo4j import headers.

    Returns:
        Tuple of (nodes_path, edges_path).

    Example:
        ```python
        from biodbs.graph import to_neo4j_csv, build_disease_graph

        graph = build_disease_graph(disease_data)
        nodes_path, edges_path = to_neo4j_csv(graph, "./neo4j_import/")
        print(f"Nodes: {nodes_path}")
        # Nodes: neo4j_import/nodes.csv
        print(f"Edges: {edges_path}")
        # Edges: neo4j_import/relationships.csv
        ```
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    nodes_path = output_dir / nodes_filename
    edges_path = output_dir / edges_filename

    # Write nodes CSV
    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)

        if include_headers:
            # Neo4j import headers
            writer.writerow([
                "id:ID",
                "label",
                "node_type",
                "source",
                "properties:string",
                "xrefs:string[]",
                ":LABEL",
            ])

        for node in graph.nodes:
            # Convert properties to JSON string
            props_json = json.dumps(node.get_properties_dict()) if node.properties else ""

            # Convert xrefs to Neo4j array format (sorted for stable output)
            xrefs_str = ";".join(sorted(node.xrefs)) if node.xrefs else ""

            # Label for Neo4j (node type as label)
            neo4j_label = node.node_type.value.title().replace("_", "")

            writer.writerow([
                node.id,
                node.label,
                node.node_type.value,
                node.source.value,
                props_json,
                xrefs_str,
                neo4j_label,
            ])

    # Write relationships CSV
    with open(edges_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)

        if include_headers:
            # Neo4j import headers
            writer.writerow([
                ":START_ID",
                ":END_ID",
                "weight:float",
                "evidence:string[]",
                "properties:string",
                ":TYPE",
            ])

        for edge in graph.edges:
            # Convert evidence to Neo4j array format (sorted for stable output)
            evidence_str = ";".join(sorted(edge.evidence)) if edge.evidence else ""

            # Convert properties to JSON string
            props_json = json.dumps(edge.get_properties_dict()) if edge.properties else ""

            # Relationship type (uppercase with underscores)
            rel_type = edge.relation.value.upper()

            writer.writerow([
                edge.source,
                edge.target,
                edge.weight,
                evidence_str,
                props_json,
                rel_type,
            ])

    return nodes_path, edges_path
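
The label, relationship-type, and array conventions used in the listing can be tried on an in-memory buffer. This sketch writes a reduced set of columns; `neo4j_label` and `neo4j_rel_type` are hypothetical helper names, and the xref values are made up.

```python
import csv
import io


def neo4j_label(node_type_value: str) -> str:
    """'go_term' -> 'GoTerm', matching the :LABEL column in the listing."""
    return node_type_value.title().replace("_", "")


def neo4j_rel_type(relation_value: str) -> str:
    """'part_of' -> 'PART_OF', matching the :TYPE column in the listing."""
    return relation_value.upper()


buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id:ID", "label", "xrefs:string[]", ":LABEL"])
writer.writerow([
    "GO:0006915",
    "apoptotic process",
    ";".join(["X:1", "X:2"]),  # array values are joined with ';'
    neo4j_label("go_term"),
])
print(buffer.getvalue())
print(neo4j_rel_type("part_of"))  # PART_OF
```

Values that themselves contain a semicolon would be split into separate array elements on import, so such xrefs or evidence codes need escaping or a different delimiter.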

to_cypher

to_cypher(
    graph: KnowledgeGraph,
    batch_size: int = 100,
    use_merge: bool = True,
) -> str

Generate Cypher queries to create the graph in Neo4j.

Creates CREATE or MERGE statements for nodes and relationships.

Parameters:

Name Type Description Default
graph KnowledgeGraph The KnowledgeGraph to export. required
batch_size int Number of statements per transaction. 100
use_merge bool Use MERGE instead of CREATE (prevents duplicates). True

Returns:

Type Description
str Cypher script as a string.

Example
from biodbs.graph import to_cypher, build_disease_graph

graph = build_disease_graph(disease_data)
cypher = to_cypher(graph)
print(cypher[:150])
# // Cypher script generated from KnowledgeGraph: DiseaseOntologyGraph
# // Nodes: 47, Edges: 0
# ...
Source code in biodbs/_funcs/graph/exporters.py
def to_cypher(
    graph: KnowledgeGraph,
    batch_size: int = 100,
    use_merge: bool = True,
) -> str:
    """Generate Cypher queries to create the graph in Neo4j.

    Creates CREATE or MERGE statements for nodes and relationships.

    Args:
        graph: The KnowledgeGraph to export.
        batch_size: Number of statements per transaction.
        use_merge: Use MERGE instead of CREATE (prevents duplicates).

    Returns:
        Cypher script as a string.

    Example:
        ```python
        from biodbs.graph import to_cypher, build_disease_graph

        graph = build_disease_graph(disease_data)
        cypher = to_cypher(graph)
        print(cypher[:150])
        # // Cypher script generated from KnowledgeGraph: DiseaseOntologyGraph
        # // Nodes: 47, Edges: 0
        # ...
        ```
    """
    lines: List[str] = []
    command = "MERGE" if use_merge else "CREATE"

    # Header comment
    lines.append(f"// Cypher script generated from KnowledgeGraph: {graph.name}")
    lines.append(f"// Nodes: {graph.node_count}, Edges: {graph.edge_count}")
    lines.append("")

    # Create constraints for efficient MERGE
    if use_merge:
        node_types_used = {node.node_type for node in graph.nodes}
        for node_type in node_types_used:
            label = node_type.value.title().replace("_", "")
            lines.append(
                f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{label}) "
                f"REQUIRE n.id IS UNIQUE;"
            )
        lines.append("")

    # Create nodes
    lines.append("// Create nodes")
    for i, node in enumerate(graph.nodes):
        if i > 0 and i % batch_size == 0:
            lines.append("")

        label = node.node_type.value.title().replace("_", "")
        props = {
            "id": node.id,
            "label": node.label,
            "source": node.source.value,
        }
        props.update(node.get_properties_dict())

        # Escape special characters in strings
        props_str = ", ".join(
            f"{k}: {_cypher_value(v)}"
            for k, v in props.items()
        )

        lines.append(f"{command} (:{label} {{{props_str}}});")

    lines.append("")

    # Create relationships
    lines.append("// Create relationships")
    for i, edge in enumerate(graph.edges):
        if i > 0 and i % batch_size == 0:
            lines.append("")

        rel_type = edge.relation.value.upper()

        props: Dict[str, Any] = {"weight": edge.weight}
        if edge.evidence:
            props["evidence"] = list(edge.evidence)
        props.update(edge.get_properties_dict())

        props_str = ", ".join(
            f"{k}: {_cypher_value(v)}"
            for k, v in props.items()
        )

        lines.append(
            f"MATCH (a {{id: {_cypher_value(edge.source)}}}), "
            f"(b {{id: {_cypher_value(edge.target)}}}) "
            f"{command} (a)-[:{rel_type} {{{props_str}}}]->(b);"
        )

    return "\n".join(lines)
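
The listing above relies on a helper, `_cypher_value`, whose source is not shown on this page. As a rough illustration of what such a helper has to do, here is a hedged sketch; `cypher_literal` is a hypothetical stand-in written for this example, and the real helper may differ.

```python
from typing import Any


def cypher_literal(value: Any) -> str:
    """Render a Python value as a Cypher literal (hypothetical stand-in)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(cypher_literal(v) for v in value) + "]"
    # Strings: escape backslashes first, then single quotes
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


# Assemble one node statement the same way the listing above does
props = {"id": "DOID:162", "label": "Huntington's disease", "weight": 1.0}
props_str = ", ".join(f"{k}: {cypher_literal(v)}" for k, v in props.items())
statement = f"MERGE (:Disease {{{props_str}}});"
print(statement)
# MERGE (:Disease {id: 'DOID:162', label: 'Huntington\'s disease', weight: 1.0});
```

Escaping the backslash before the quote matters: doing it in the other order would double the backslashes that the quote escaping just introduced.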

Utility Functions

find_shortest_path

find_shortest_path(
    graph: KnowledgeGraph,
    source: str,
    target: str,
    directed: bool = True,
    max_depth: Optional[int] = None,
) -> Optional[List[str]]

Find the shortest path between two nodes using BFS.

Parameters:

Name Type Description Default
graph KnowledgeGraph The knowledge graph to search. required
source str ID of the source node. required
target str ID of the target node. required
directed bool If True, follow edge direction. If False, treat edges as undirected. True
max_depth Optional[int] Maximum path length to search, counted in edges. None means no limit. None

Returns:

Type Description
Optional[List[str]] List of node IDs forming the shortest path, or None if no path exists or either node is not in the graph.

Example
path = find_shortest_path(graph, "DOID:162", "DOID:1612")
if path:
    print(" -> ".join(path))
# DOID:162 -> DOID:1612
Source code in biodbs/_funcs/graph/utils.py
def find_shortest_path(
    graph: KnowledgeGraph,
    source: str,
    target: str,
    directed: bool = True,
    max_depth: Optional[int] = None,
) -> Optional[List[str]]:
    """Find the shortest path between two nodes using BFS.

    Args:
        graph: The knowledge graph to search.
        source: ID of the source node.
        target: ID of the target node.
        directed: If True, follow edge direction. If False, treat as undirected.
        max_depth: Maximum path length to search.

    Returns:
        List of node IDs forming the shortest path, or None if no path exists.

    Example:
        ```python
        path = find_shortest_path(graph, "DOID:162", "DOID:1612")
        if path:
            print(" -> ".join(path))
        # DOID:162 -> DOID:1612
        ```
    """
    if source not in graph or target not in graph:
        return None

    if source == target:
        return [source]

    # BFS with path tracking
    queue: deque = deque([(source, [source])])
    visited: Set[str] = {source}

    while queue:
        current, path = queue.popleft()

        if max_depth is not None and len(path) > max_depth:
            continue

        # Get neighbors
        neighbors = set()

        # Outgoing edges
        for edge in graph.get_outgoing_edges(current):
            neighbors.add(edge.target)

        # Incoming edges (if undirected)
        if not directed:
            for edge in graph.get_incoming_edges(current):
                neighbors.add(edge.source)

        for neighbor in neighbors:
            if neighbor == target:
                return path + [neighbor]

            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))

    return None
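
To see how `max_depth` behaves in the search above, the same BFS steps can be run on a plain adjacency dict. This is a standalone sketch that does not use biodbs classes; `bfs_shortest_path` and the toy graph are hypothetical names for illustration only.

```python
from collections import deque
from typing import Dict, List, Optional


def bfs_shortest_path(
    adjacency: Dict[str, List[str]],
    source: str,
    target: str,
    max_depth: Optional[int] = None,
) -> Optional[List[str]]:
    """Breadth-first search over an adjacency dict, with path tracking."""
    if source not in adjacency or target not in adjacency:
        return None
    if source == target:
        return [source]

    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        current, path = queue.popleft()
        # A path of len(path) nodes has len(path) - 1 edges; expanding it adds
        # one more edge, so skip it once len(path) exceeds max_depth.
        if max_depth is not None and len(path) > max_depth:
            continue
        for neighbor in adjacency[current]:
            if neighbor == target:
                return path + [neighbor]
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None


# Hypothetical toy graph: A -> B -> C -> D, plus a shortcut A -> C
toy = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": []}

print(bfs_shortest_path(toy, "A", "D"))               # ['A', 'C', 'D']
print(bfs_shortest_path(toy, "A", "D", max_depth=1))  # None
print(bfs_shortest_path(toy, "A", "D", max_depth=2))  # ['A', 'C', 'D']
```

With `max_depth=1` only single-edge paths are allowed, so the two-edge route through C is not found.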

find_all_paths

find_all_paths(
    graph: KnowledgeGraph,
    source: str,
    target: str,
    max_depth: int = 5,
    directed: bool = True,
) -> List[List[str]]

Find all paths between two nodes using DFS.

Warning: This can be slow for large graphs or high max_depth.

Parameters:

Name Type Description Default
graph KnowledgeGraph The knowledge graph to search. required
source str ID of the source node. required
target str ID of the target node. required
max_depth int Maximum path length, counted in nodes (source and target included), so max_depth=3 allows at most two edges. 5
directed bool If True, follow edge direction. If False, treat edges as undirected. True

Returns:

Type Description
List[List[str]] List of paths, where each path is a list of node IDs. Paths never revisit a node. Empty if either node is not in the graph.

Example
paths = find_all_paths(graph, "A", "D", max_depth=3)
for path in paths:
    print(" -> ".join(path))
# A -> B -> D
# A -> C -> D
Source code in biodbs/_funcs/graph/utils.py
def find_all_paths(
    graph: KnowledgeGraph,
    source: str,
    target: str,
    max_depth: int = 5,
    directed: bool = True,
) -> List[List[str]]:
    """Find all paths between two nodes using DFS.

    Warning: This can be slow for large graphs or high max_depth.

    Args:
        graph: The knowledge graph to search.
        source: ID of the source node.
        target: ID of the target node.
        max_depth: Maximum path length.
        directed: If True, follow edge direction. If False, treat as undirected.

    Returns:
        List of paths, where each path is a list of node IDs.

    Example:
        ```python
        paths = find_all_paths(graph, "A", "D", max_depth=3)
        for path in paths:
            print(" -> ".join(path))
        # A -> B -> D
        # A -> C -> D
        ```
    """
    if source not in graph or target not in graph:
        return []

    all_paths: List[List[str]] = []

    def dfs(current: str, path: List[str], visited: Set[str]):
        if current == target:
            all_paths.append(path.copy())
            return

        if len(path) >= max_depth:
            return

        # Get neighbors
        neighbors = set()

        for edge in graph.get_outgoing_edges(current):
            neighbors.add(edge.target)

        if not directed:
            for edge in graph.get_incoming_edges(current):
                neighbors.add(edge.source)

        for neighbor in neighbors:
            if neighbor not in visited:
                visited.add(neighbor)
                path.append(neighbor)
                dfs(neighbor, path, visited)
                path.pop()
                visited.remove(neighbor)

    dfs(source, [source], {source})
    return all_paths
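
The depth limit in the DFS above is applied to the number of nodes on the path, which differs from the edge-based limit in find_shortest_path. A standalone sketch on a plain adjacency dict makes this visible; `dfs_all_paths` and the toy graph are hypothetical and not part of biodbs.

```python
from typing import Dict, List, Set


def dfs_all_paths(
    adjacency: Dict[str, List[str]],
    source: str,
    target: str,
    max_depth: int = 5,
) -> List[List[str]]:
    """Enumerate simple paths with backtracking, as in the listing above."""
    paths: List[List[str]] = []

    def dfs(current: str, path: List[str], visited: Set[str]) -> None:
        if current == target:
            paths.append(path.copy())
            return
        # len(path) counts nodes, so this caps paths at max_depth nodes
        if len(path) >= max_depth:
            return
        for neighbor in adjacency[current]:
            if neighbor not in visited:
                visited.add(neighbor)
                path.append(neighbor)
                dfs(neighbor, path, visited)
                path.pop()                 # backtrack
                visited.remove(neighbor)

    if source in adjacency and target in adjacency:
        dfs(source, [source], {source})
    return paths


# Hypothetical toy graph with two routes from A to D
toy = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

print(dfs_all_paths(toy, "A", "D", max_depth=3))  # [['A', 'B', 'D'], ['A', 'C', 'D']]
print(dfs_all_paths(toy, "A", "D", max_depth=2))  # []
```

Both routes have three nodes, so they are found with `max_depth=3` but not with `max_depth=2`.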

get_neighborhood

get_neighborhood(
    graph: KnowledgeGraph,
    node_id: str,
    hops: int = 1,
    directed: bool = False,
    include_edges: bool = False,
) -> Dict[str, Any]

Get nodes within N hops of a starting node.

Parameters:

Name Type Description Default
graph KnowledgeGraph The knowledge graph. required
node_id str ID of the center node. required
hops int Number of hops (distance) to include. 1
directed bool If True, only follow outgoing edges. If False, follow edges in both directions. False
include_edges bool If True, include the traversed edges in the result. False

Returns:

Type Description
Dict[str, Any] Dictionary with a "nodes" key (list of nodes, including the center node) and, if include_edges is True, an "edges" key. Both lists are empty if node_id is not in the graph.

Example
neighborhood = get_neighborhood(graph, "DOID:162", hops=2)
print(f"Found {len(neighborhood['nodes'])} nodes within 2 hops")
# Found 25 nodes within 2 hops
Source code in biodbs/_funcs/graph/utils.py
def get_neighborhood(
    graph: KnowledgeGraph,
    node_id: str,
    hops: int = 1,
    directed: bool = False,
    include_edges: bool = False,
) -> Dict[str, Any]:
    """Get nodes within N hops of a starting node.

    Args:
        graph: The knowledge graph.
        node_id: ID of the center node.
        hops: Number of hops (distance) to include.
        directed: If True, only follow outgoing edges.
        include_edges: If True, include edges in the result.

    Returns:
        Dictionary with "nodes" (and optionally "edges") keys.

    Example:
        ```python
        neighborhood = get_neighborhood(graph, "DOID:162", hops=2)
        print(f"Found {len(neighborhood['nodes'])} nodes within 2 hops")
        # Found 25 nodes within 2 hops
        ```
    """
    if node_id not in graph:
        return {"nodes": [], "edges": []} if include_edges else {"nodes": []}

    visited: Set[str] = {node_id}
    current_level = {node_id}
    collected_edges: Set[Edge] = set()

    for _ in range(hops):
        next_level: Set[str] = set()

        for current in current_level:
            # Outgoing edges
            for edge in graph.get_outgoing_edges(current):
                if edge.target not in visited:
                    next_level.add(edge.target)
                    visited.add(edge.target)
                if include_edges:
                    collected_edges.add(edge)

            # Incoming edges (if undirected)
            if not directed:
                for edge in graph.get_incoming_edges(current):
                    if edge.source not in visited:
                        next_level.add(edge.source)
                        visited.add(edge.source)
                    if include_edges:
                        collected_edges.add(edge)

        current_level = next_level

    # Get actual node objects
    nodes = [graph.get_node(nid) for nid in visited if graph.get_node(nid)]

    result: Dict[str, Any] = {"nodes": nodes}
    if include_edges:
        result["edges"] = list(collected_edges)

    return result
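
The level-by-level expansion above can be sketched without biodbs classes. The example below uses a hypothetical undirected toy graph stored as a symmetric adjacency dict; `nodes_within_hops` is an illustrative name, not a library function.

```python
from typing import Dict, List, Set


def nodes_within_hops(
    adjacency: Dict[str, List[str]],
    start: str,
    hops: int = 1,
) -> Set[str]:
    """Collect node IDs reachable in at most `hops` steps, one level at a time."""
    if start not in adjacency:
        return set()
    visited: Set[str] = {start}      # the center node is part of the result
    current_level: Set[str] = {start}
    for _ in range(hops):
        next_level: Set[str] = set()
        for node in current_level:
            for neighbor in adjacency[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_level.add(neighbor)
        current_level = next_level   # only newly found nodes are expanded next
    return visited


# Hypothetical chain A - B - C - D, stored symmetrically
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

print(sorted(nodes_within_hops(toy, "A", hops=1)))  # ['A', 'B']
print(sorted(nodes_within_hops(toy, "A", hops=2)))  # ['A', 'B', 'C']
```

As in the listing above, the starting node itself is always included in the result.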

get_connected_component

get_connected_component(
    graph: KnowledgeGraph,
    node_id: str,
    directed: bool = False,
) -> Set[str]

Get all nodes in the same connected component.

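Parameters:

Name Type Description Default
graph KnowledgeGraph The knowledge graph. required
node_id str ID of a node in the component. required
directed bool If True, follow only outgoing edges, so the result is the set of nodes reachable from node_id rather than a true strongly connected component. If False, treat edges as undirected (weak connectivity). False

Returns:

Type Description
Set[str] Set of node IDs in the same component, or an empty set if node_id is not in the graph.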

Example
component = get_connected_component(graph, "DOID:162")
print(f"Component has {len(component)} nodes")
# Component has 47 nodes
Source code in biodbs/_funcs/graph/utils.py
def get_connected_component(
    graph: KnowledgeGraph,
    node_id: str,
    directed: bool = False,
) -> Set[str]:
    """Get all nodes in the same connected component.

    Args:
        graph: The knowledge graph.
        node_id: ID of a node in the component.
        directed: If True, use strongly connected component semantics.

    Returns:
        Set of node IDs in the same component.

    Example:
        ```python
        component = get_connected_component(graph, "DOID:162")
        print(f"Component has {len(component)} nodes")
        # Component has 47 nodes
        ```
    """
    if node_id not in graph:
        return set()

    visited: Set[str] = set()
    queue: deque = deque([node_id])

    while queue:
        current = queue.popleft()

        if current in visited:
            continue
        visited.add(current)

        # Outgoing edges
        for edge in graph.get_outgoing_edges(current):
            if edge.target not in visited:
                queue.append(edge.target)

        # Incoming edges (for undirected or weak connectivity)
        if not directed:
            for edge in graph.get_incoming_edges(current):
                if edge.source not in visited:
                    queue.append(edge.source)

    return visited
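
The effect of the `directed` flag in the traversal above is easiest to see on a small edge list. This is a standalone sketch with hypothetical names (`component`, `edges`); it mirrors the traversal logic but does not use biodbs classes.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical directed edges: A -> B -> C and D -> B
edges = [("A", "B"), ("B", "C"), ("D", "B")]


def component(
    edge_list: List[Tuple[str, str]],
    start: str,
    directed: bool = False,
) -> Set[str]:
    """BFS that follows outgoing edges, plus incoming ones when undirected."""
    out_map: Dict[str, List[str]] = {}
    in_map: Dict[str, List[str]] = {}
    for src, dst in edge_list:
        out_map.setdefault(src, []).append(dst)
        in_map.setdefault(dst, []).append(src)

    visited: Set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        queue.extend(out_map.get(current, []))
        if not directed:
            queue.extend(in_map.get(current, []))
    return visited


print(sorted(component(edges, "A")))                 # ['A', 'B', 'C', 'D']
print(sorted(component(edges, "A", directed=True)))  # ['A', 'B', 'C']
print(sorted(component(edges, "C", directed=True)))  # ['C']
```

With `directed=True` the result is asymmetric: A reaches C, but C does not reach A, so the two calls return different sets for nodes that share one weakly connected component.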

get_all_connected_components

get_all_connected_components(
    graph: KnowledgeGraph, directed: bool = False
) -> List[Set[str]]

Get all connected components in the graph.

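Parameters:

Name Type Description Default
graph KnowledgeGraph The knowledge graph. required
directed bool If True, follow only outgoing edges when growing each component. The components are then reachability sets that depend on which start node is picked, not true strongly connected components. False

Returns:

Type Description
List[Set[str]] List of sets, each containing node IDs in a component.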

Source code in biodbs/_funcs/graph/utils.py
def get_all_connected_components(
    graph: KnowledgeGraph,
    directed: bool = False,
) -> List[Set[str]]:
    """Get all connected components in the graph.

    Args:
        graph: The knowledge graph.
        directed: If True, find strongly connected components.

    Returns:
        List of sets, each containing node IDs in a component.
    """
    remaining = set(node.id for node in graph.nodes)
    components: List[Set[str]] = []

    while remaining:
        # Pick any remaining node
        start = next(iter(remaining))
        component = get_connected_component(graph, start, directed)
        components.append(component)
        remaining -= component

    return components
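
The loop above repeatedly picks a remaining node, grows its component, and removes that component from the pool. A standalone sketch of the same idea for the undirected case follows; `all_components` and the toy graph are hypothetical, and the example picks the smallest remaining ID only to make the output order reproducible (the listing above takes an arbitrary remaining node).

```python
from collections import deque
from typing import Dict, List, Set


def all_components(neighbors: Dict[str, Set[str]]) -> List[Set[str]]:
    """Partition an undirected graph into connected components."""
    remaining = set(neighbors)
    components: List[Set[str]] = []
    while remaining:
        start = min(remaining)  # deterministic pick, for the example only
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(neighbors[node] - seen)
        components.append(seen)
        remaining -= seen       # never start again inside a found component
    return components


# Hypothetical graph: A - B, C - D, and an isolated node E
toy = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}, "E": set()}

comps = all_components(toy)
print([sorted(c) for c in comps])  # [['A', 'B'], ['C', 'D'], ['E']]
```

An isolated node forms a component of size one, which is why the number of components is at least the number of isolated nodes.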

find_hub_nodes

find_hub_nodes(
    graph: KnowledgeGraph,
    top_n: int = 10,
    direction: str = "both",
    node_type: Optional[NodeType] = None,
) -> List[Tuple[str, int]]

Find the most highly connected nodes (hubs).

Parameters:

Name Type Description Default
graph KnowledgeGraph The knowledge graph. required
top_n int Number of top hubs to return. 10
direction str "outgoing", "incoming", or "both". 'both'
node_type Optional[NodeType] Optional filter by node type. None

Returns:

Type Description
List[Tuple[str, int]] List of (node_id, degree) tuples, sorted by degree descending.

Example
hubs = find_hub_nodes(graph, top_n=5)
for node_id, degree in hubs:
    print(f"{node_id}: {degree} connections")
# DOID:162: 15 connections
# DOID:4: 12 connections
# ...
Source code in biodbs/_funcs/graph/utils.py
def find_hub_nodes(
    graph: KnowledgeGraph,
    top_n: int = 10,
    direction: str = "both",
    node_type: Optional[NodeType] = None,
) -> List[Tuple[str, int]]:
    """Find the most highly connected nodes (hubs).

    Args:
        graph: The knowledge graph.
        top_n: Number of top hubs to return.
        direction: "outgoing", "incoming", or "both".
        node_type: Optional filter by node type.

    Returns:
        List of (node_id, degree) tuples, sorted by degree descending.

    Example:
        ```python
        hubs = find_hub_nodes(graph, top_n=5)
        for node_id, degree in hubs:
            print(f"{node_id}: {degree} connections")
        # DOID:162: 15 connections
        # DOID:4: 12 connections
        # ...
        ```
    """
    degrees: List[Tuple[str, int]] = []

    for node in graph.nodes:
        if node_type is not None and node.node_type != node_type:
            continue

        degree = graph.get_degree(node.id, direction)
        degrees.append((node.id, degree))

    # Sort by degree descending
    degrees.sort(key=lambda x: -x[1])

    return degrees[:top_n]
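
The ranking above depends on `graph.get_degree`, whose implementation is not shown here. The sketch below assumes that "both" means in-degree plus out-degree; that is an assumption for illustration, and `top_hubs`, `nodes`, and `edges` are hypothetical names, not biodbs API.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]


def top_hubs(
    node_ids: List[str],
    edge_list: List[Tuple[str, str]],
    top_n: int = 10,
    direction: str = "both",
) -> List[Tuple[str, int]]:
    """Rank nodes by degree; 'both' is assumed to mean in-degree + out-degree."""
    degree: Dict[str, int] = {node: 0 for node in node_ids}
    for src, dst in edge_list:
        if direction in ("outgoing", "both"):
            degree[src] += 1
        if direction in ("incoming", "both"):
            degree[dst] += 1
    # sorted() is stable, so ties keep the order of node_ids
    ranked = sorted(degree.items(), key=lambda item: -item[1])
    return ranked[:top_n]


print(top_hubs(nodes, edges, top_n=2))                        # [('A', 3), ('B', 2)]
print(top_hubs(nodes, edges, top_n=1, direction="incoming"))  # [('C', 2)]
```

Nodes with no edges still appear with degree 0, matching the listing above, which iterates over every node in the graph.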

compute_degree_distribution

compute_degree_distribution(
    graph: KnowledgeGraph, direction: str = "both"
) -> Dict[int, int]

Compute the degree distribution of the graph.

Parameters:

Name Type Description Default
graph KnowledgeGraph The knowledge graph. required
direction str "outgoing", "incoming", or "both". 'both'

Returns:

Type Description
Dict[int, int] Dictionary mapping degree to count of nodes with that degree, sorted by degree ascending.

Source code in biodbs/_funcs/graph/utils.py
def compute_degree_distribution(
    graph: KnowledgeGraph,
    direction: str = "both",
) -> Dict[int, int]:
    """Compute the degree distribution of the graph.

    Args:
        graph: The knowledge graph.
        direction: "outgoing", "incoming", or "both".

    Returns:
        Dictionary mapping degree to count of nodes with that degree.
    """
    distribution: Dict[int, int] = {}

    for node in graph.nodes:
        degree = graph.get_degree(node.id, direction)
        distribution[degree] = distribution.get(degree, 0) + 1

    return dict(sorted(distribution.items()))
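
The shape of the returned mapping can be illustrated with a small standalone example. It counts each edge once for its source and once for its target, which assumes that is how the "both" direction is defined; the node and edge data are hypothetical.

```python
from collections import Counter

# Hypothetical star graph around A, plus an isolated node E
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("A", "D")]

# Start every node at degree 0 so isolated nodes are counted too
degree = Counter({node: 0 for node in nodes})
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

# Map each degree to the number of nodes that have it, sorted by degree
distribution = dict(sorted(Counter(degree.values()).items()))
print(distribution)  # {0: 1, 1: 3, 3: 1}
```

Here one node has degree 0 (E), three have degree 1 (B, C, D), and one has degree 3 (A).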

get_graph_statistics

get_graph_statistics(
    graph: KnowledgeGraph, compute_centrality: bool = False
) -> Dict[str, Any]

Get comprehensive statistics about the graph.

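Parameters:

Name Type Description Default
graph KnowledgeGraph The knowledge graph. required
compute_centrality bool If True, compute betweenness centrality (slower). Skipped for graphs with more than 1000 nodes. False

Returns:

Type Description
Dict[str, Any] Dictionary of statistics. Always contains name, source, num_nodes, and num_edges. For non-empty graphs it also contains density, avg_degree, max_degree, min_degree, avg_out_degree, avg_in_degree, num_isolated, num_components, largest_component_size, node_type_counts, edge_type_counts, and num_self_loops, plus betweenness_centrality when it is computed.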

Example
stats = get_graph_statistics(graph)
print(f"Density: {stats['density']:.4f}")
# Density: 0.0213
print(f"Components: {stats['num_components']}")
# Components: 1
Source code in biodbs/_funcs/graph/utils.py
def get_graph_statistics(
    graph: KnowledgeGraph,
    compute_centrality: bool = False,
) -> Dict[str, Any]:
    """Get comprehensive statistics about the graph.

    Args:
        graph: The knowledge graph.
        compute_centrality: If True, compute betweenness centrality (slower).

    Returns:
        Dictionary with various statistics.

    Example:
        ```python
        stats = get_graph_statistics(graph)
        print(f"Density: {stats['density']:.4f}")
        # Density: 0.0213
        print(f"Components: {stats['num_components']}")
        # Components: 1
        ```
    """
    n = graph.node_count
    m = graph.edge_count

    stats: Dict[str, Any] = {
        "name": graph.name,
        "source": graph.source.value,
        "num_nodes": n,
        "num_edges": m,
    }

    if n > 0:
        # Density (for directed graph: m / (n * (n-1)))
        max_edges = n * (n - 1) if n > 1 else 1
        stats["density"] = m / max_edges if max_edges > 0 else 0.0

        # Degree statistics
        degrees = [graph.get_degree(node.id, "both") for node in graph.nodes]
        stats["avg_degree"] = sum(degrees) / n
        stats["max_degree"] = max(degrees)
        stats["min_degree"] = min(degrees)

        # Out-degree statistics
        out_degrees = [graph.get_degree(node.id, "outgoing") for node in graph.nodes]
        stats["avg_out_degree"] = sum(out_degrees) / n

        # In-degree statistics
        in_degrees = [graph.get_degree(node.id, "incoming") for node in graph.nodes]
        stats["avg_in_degree"] = sum(in_degrees) / n

        # Isolated nodes (degree 0)
        stats["num_isolated"] = sum(1 for d in degrees if d == 0)

        # Connected components
        components = get_all_connected_components(graph)
        stats["num_components"] = len(components)
        stats["largest_component_size"] = max(len(c) for c in components) if components else 0

        # Node type distribution
        stats["node_type_counts"] = graph.get_node_type_counts()

        # Edge type distribution
        stats["edge_type_counts"] = graph.get_edge_type_counts()

        # Self-loops
        stats["num_self_loops"] = sum(
            1 for edge in graph.edges if edge.source == edge.target
        )

        # Compute centrality if requested
        if compute_centrality and n <= 1000:  # Only for smaller graphs
            stats["betweenness_centrality"] = compute_betweenness_centrality(
                graph, normalized=True
            )

    return stats
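
The density value computed above is the number of edges divided by the maximum number of edges in a directed graph without self-loops, n * (n - 1). A short worked sketch follows; `directed_density` is a hypothetical helper, and the node and edge counts are illustrative rather than taken from a real graph.

```python
def directed_density(num_nodes: int, num_edges: int) -> float:
    """Edges divided by n * (n - 1), with the same single-node guard as above."""
    if num_nodes == 0:
        # The listing above omits the "density" key for empty graphs;
        # this sketch returns 0.0 instead.
        return 0.0
    max_edges = num_nodes * (num_nodes - 1) if num_nodes > 1 else 1
    return num_edges / max_edges


# A hypothetical tree with 47 nodes has 46 edges: 46 / (47 * 46) = 1 / 47
print(f"{directed_density(47, 46):.4f}")  # 0.0213

# Three nodes with all six possible directed edges
print(directed_density(3, 6))  # 1.0
```

Self-loops are not excluded from the edge count, so a graph that contains them can report a density above 1.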

format_statistics

format_statistics(stats: Dict[str, Any]) -> str

Format graph statistics as a readable string.

Parameters:

Name Type Description Default
stats Dict[str, Any] Dictionary from get_graph_statistics(). required

Returns:

Type Description
str Formatted string representation.

Source code in biodbs/_funcs/graph/utils.py
def format_statistics(stats: Dict[str, Any]) -> str:
    """Format graph statistics as a readable string.

    Args:
        stats: Dictionary from get_graph_statistics().

    Returns:
        Formatted string representation.
    """
    lines = [
        f"Graph Statistics: {stats.get('name', 'Unknown')}",
        "=" * 50,
        f"Nodes: {stats.get('num_nodes', 0)}",
        f"Edges: {stats.get('num_edges', 0)}",
        f"Density: {stats.get('density', 0):.6f}",
        "",
        "Degree Statistics:",
        f"  Average: {stats.get('avg_degree', 0):.2f}",
        f"  Maximum: {stats.get('max_degree', 0)}",
        f"  Minimum: {stats.get('min_degree', 0)}",
        "",
        f"Isolated nodes: {stats.get('num_isolated', 0)}",
        f"Self-loops: {stats.get('num_self_loops', 0)}",
        f"Connected components: {stats.get('num_components', 0)}",
        f"Largest component: {stats.get('largest_component_size', 0)} nodes",
    ]

    if stats.get("node_type_counts"):
        lines.append("")
        lines.append("Node types:")
        for node_type, count in stats["node_type_counts"].items():
            lines.append(f"  {node_type.value}: {count}")

    if stats.get("edge_type_counts"):
        lines.append("")
        lines.append("Edge types:")
        for edge_type, count in stats["edge_type_counts"].items():
            lines.append(f"  {edge_type.value}: {count}")

    return "\n".join(lines)