Building Knowledge Graphs

This guide covers how to build knowledge graphs from custom nodes and edges and from four biological data sources: Disease Ontology, QuickGO, Reactome, and KEGG.

Generic Graph Building

Build a graph from custom nodes and edges:

from biodbs.graph import build_graph, Node, Edge, NodeType, EdgeType

nodes = [
    Node(id="gene:TP53", label="TP53", node_type=NodeType.GENE),
    Node(id="gene:BRCA1", label="BRCA1", node_type=NodeType.GENE),
    Node(id="pathway:apoptosis", label="Apoptosis", node_type=NodeType.PATHWAY),
]

edges = [
    Edge(source="gene:TP53", target="pathway:apoptosis", relation=EdgeType.PARTICIPATES_IN),
    Edge(source="gene:BRCA1", target="pathway:apoptosis", relation=EdgeType.PARTICIPATES_IN),
]

graph = build_graph(nodes, edges, name="MyGraph")

Disease Ontology

Build graphs from Disease Ontology data:

from biodbs.fetch import DO_Fetcher
from biodbs.graph import build_disease_graph

fetcher = DO_Fetcher()

# Get children of a disease term
cancer_data = fetcher.get_children("DOID:162")
graph = build_disease_graph(cancer_data)

print(graph.summary())

Options

graph = build_disease_graph(
    cancer_data,
    name="CancerGraph",
    include_xrefs=True,      # Include cross-references (MESH, UMLS, etc.)
    include_synonyms=False,  # Include synonyms in properties
)

With Hierarchy

To create IS_A edges from each child term to its parent:

from biodbs.graph import build_disease_graph_with_hierarchy

parent = fetcher.get_term("DOID:162")
children = fetcher.get_children("DOID:162")

graph = build_disease_graph_with_hierarchy(parent, children)
# Now has IS_A edges from each child to parent
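
The edge direction (child to parent) can be illustrated with a small standalone sketch. It does not use biodbs: the `is_a_edges` helper and the tuple layout are hypothetical, and the child IDs are placeholders rather than real Disease Ontology terms.

```python
def is_a_edges(parent_id, child_ids):
    """Return (source, relation, target) triples pointing from each child to the parent."""
    return [(child_id, "IS_A", parent_id) for child_id in child_ids]


# Placeholder child IDs, used only to show the direction of the edges
edges = is_a_edges("DOID:162", ["DOID:child-1", "DOID:child-2"])

for source, relation, target in edges:
    print(f"{source} -[{relation}]-> {target}")
```

Each child appears as the source and the parent as the target, matching the "child IS_A parent" reading.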

Gene Ontology (QuickGO)

Build graphs from GO annotations:

from biodbs.fetch import QuickGO_Fetcher
from biodbs.graph import build_go_graph

fetcher = QuickGO_Fetcher()

# Get annotations for a protein
data = fetcher.search_annotations(geneProductId="UniProtKB:P04637")
graph = build_go_graph(data)

print(graph.summary())

Options

graph = build_go_graph(
    data,
    name="TP53_GO",
    include_evidence=True,         # Include evidence codes on edges
    create_annotation_edges=True,  # Create edges between genes and GO terms
)

Reactome Pathways

Build graphs from Reactome analysis results:

from biodbs.fetch import Reactome_Fetcher
from biodbs.graph import build_reactome_graph

fetcher = Reactome_Fetcher()

# Analyze a gene list
data = fetcher.analyze(["TP53", "BRCA1", "BRCA2"])
graph = build_reactome_graph(data)

print(graph.summary())

Options

graph = build_reactome_graph(
    data,
    name="CancerPathways",
    include_species=True,       # Include species info
    include_disease_info=True,  # Include disease pathway flags
)

Pathway Hierarchy

Build from Reactome pathway hierarchy:

from biodbs.graph import build_reactome_hierarchy_graph

# Get hierarchy from fetcher
hierarchy = fetcher.get_events_hierarchy("Homo sapiens")
graph = build_reactome_hierarchy_graph(hierarchy)
# Creates PART_OF edges between pathways
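
To make the idea of hierarchy-derived edges concrete, here is a minimal sketch that walks a nested tree and emits one PART_OF triple per parent-child pair. It is not the biodbs implementation. It assumes each node is a dict with an `stId` key and an optional `children` list, and that edges point from the sub-pathway to its parent. The IDs in the toy tree are made up.

```python
def part_of_edges(node, parent_id=None):
    """Walk a nested hierarchy and yield (child, "PART_OF", parent) triples."""
    node_id = node["stId"]
    if parent_id is not None:
        yield (node_id, "PART_OF", parent_id)
    # Leaf nodes may have no "children" key at all
    for child in node.get("children", []):
        yield from part_of_edges(child, node_id)


# Toy tree with invented identifiers (not real Reactome stable IDs)
toy = {
    "stId": "R-TOY-1",
    "children": [
        {"stId": "R-TOY-2", "children": [{"stId": "R-TOY-3"}]},
        {"stId": "R-TOY-4"},
    ],
}

edges = list(part_of_edges(toy))
```

A tree with N nodes yields N - 1 edges, since every node except the root has exactly one parent.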

KEGG

Build graphs from KEGG data:

from biodbs.fetch import kegg_list, kegg_link
from biodbs.graph import build_kegg_graph, build_kegg_link_graph, NodeType, EdgeType

# From pathway list
pathways = kegg_list("pathway", organism="hsa")
graph = build_kegg_graph(pathways, name="HumanPathways")

# From gene-pathway links
links = kegg_link("pathway", "hsa")
graph = build_kegg_link_graph(
    links,
    source_type=NodeType.GENE,
    target_type=NodeType.PATHWAY,
    relation=EdgeType.PARTICIPATES_IN,
)
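
For intuition about what a link graph contains, the sketch below turns raw KEGG link text into gene-to-pathway triples. It is a standalone illustration, not biodbs code: the return type of `kegg_link` is not shown in this guide, so the sketch works on the two-column, tab-separated text that the KEGG REST `link` operation produces, and `parse_kegg_links` is a hypothetical helper.

```python
def parse_kegg_links(text):
    """Split tab-separated KEGG link output into (source_id, target_id) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        source_id, target_id = line.split("\t")
        pairs.append((source_id, target_id))
    return pairs


# Two sample lines in KEGG link format (gene ID, then pathway ID)
sample = "hsa:7157\tpath:hsa04110\nhsa:7157\tpath:hsa04115\n"

edges = [
    (gene, "PARTICIPATES_IN", pathway)
    for gene, pathway in parse_kegg_links(sample)
]
```

The first column becomes the edge source and the second the target, which mirrors the `source_type` and `target_type` arguments above.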

Merging Graphs

Combine multiple graphs into one:

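from biodbs.graph import (
    build_disease_graph,
    build_go_graph,
    build_reactome_graph,
    merge_graphs,
)

# Build individual graphs (disease_data, go_data, and reactome_data
# are fetched as shown in the sections above)
disease_graph = build_disease_graph(disease_data)
go_graph = build_go_graph(go_data)
reactome_graph = build_reactome_graph(reactome_data)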

# Merge them
merged = merge_graphs(
    disease_graph,
    go_graph,
    reactome_graph,
    name="IntegratedGraph",
)

print(merged.summary())

Note

When merging, nodes that share an ID are not combined: the first occurrence is kept and later ones are ignored. Edges with the same source, target, and relation are deduplicated.
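
The merge rules in the note above can be sketched in plain Python. This is an illustration of the stated semantics only, not the biodbs implementation. The dict-based graph layout and the `merge` helper are hypothetical.

```python
def merge(*graphs):
    """Merge toy graphs: first node with a given ID wins, duplicate edges are dropped."""
    nodes = {}
    edges = []
    seen_edges = set()
    for graph in graphs:
        for node_id, props in graph["nodes"].items():
            # setdefault keeps the existing entry, so the first occurrence wins
            nodes.setdefault(node_id, props)
        for edge in graph["edges"]:  # edge = (source, target, relation)
            if edge not in seen_edges:
                seen_edges.add(edge)
                edges.append(edge)
    return {"nodes": nodes, "edges": edges}


a = {
    "nodes": {"gene:TP53": {"label": "TP53"}, "pathway:apoptosis": {"label": "Apoptosis"}},
    "edges": [("gene:TP53", "pathway:apoptosis", "PARTICIPATES_IN")],
}
b = {
    "nodes": {"gene:TP53": {"label": "p53 (from second graph)"}},
    "edges": [("gene:TP53", "pathway:apoptosis", "PARTICIPATES_IN")],
}

merged = merge(a, b)
```

Here `merged` keeps the label from `a` for `gene:TP53` and contains the shared edge once. If properties from later graphs matter, the order in which graphs are passed matters too.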

Adding Custom Nodes and Edges

After building, you can extend the graph:

from biodbs.graph import Node, Edge, EdgeType, build_go_graph

# Add nodes
graph.add_node(Node(id="custom:1", label="Custom Node"))

# Add edges (both the source and target nodes must already exist in the graph)
graph.add_edge(Edge(
    source="custom:1",
    target="DOID:162",
    relation=EdgeType.ASSOCIATED_WITH,
))

# Update from another graph
other_graph = build_go_graph(go_data)
graph.update(other_graph)  # Modifies in place
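
The requirement that both endpoints exist before an edge is added can be pictured with a toy graph class. This sketch is not the biodbs class: `TinyGraph` is hypothetical, and raising `ValueError` for a missing endpoint is an assumption, since this guide does not say how biodbs reports the problem.

```python
class TinyGraph:
    """Hypothetical stand-in for a graph object; not the biodbs class."""

    def __init__(self):
        self.nodes = {}     # node ID -> label
        self.edges = set()  # (source, target, relation) triples

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, source, target, relation):
        # Assumed behaviour: reject edges whose endpoints are unknown
        missing = [n for n in (source, target) if n not in self.nodes]
        if missing:
            raise ValueError(f"unknown node(s): {missing}")
        self.edges.add((source, target, relation))


g = TinyGraph()
g.add_node("DOID:162", "cancer")
g.add_node("custom:1", "Custom Node")
g.add_edge("custom:1", "DOID:162", "ASSOCIATED_WITH")

# "custom:2" was never added, so this edge is rejected
try:
    g.add_edge("custom:1", "custom:2", "ASSOCIATED_WITH")
    rejected = False
except ValueError:
    rejected = True
```

In practice this means adding all custom nodes first, then the edges that connect them.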

Data Sources

  • Disease Ontology - Fetch disease terms and hierarchies for build_disease_graph().
  • QuickGO - Fetch GO annotations for build_go_graph().
  • Reactome - Fetch pathway data for build_reactome_graph().
  • KEGG - Fetch pathway and gene data for build_kegg_graph() and build_kegg_link_graph().

Next Steps