
Tutorial

A progressive walkthrough of knowledgecomplex, from schema definition through algebraic topology.

1. Define a schema

A schema declares vertex, edge, and face types with attributes. The SchemaBuilder generates OWL and SHACL automatically.

from knowledgecomplex import SchemaBuilder, vocab, text

sb = SchemaBuilder(namespace="vv")

# Vertex types (subclass of kc:Vertex)
sb.add_vertex_type("requirement", attributes={"title": text()})
sb.add_vertex_type("test_case",   attributes={"title": text()})

# Edge type with controlled vocabulary (enforced via sh:in)
sb.add_edge_type("verifies", attributes={
    "status": vocab("passing", "failing", "pending"),
})

# Face type
sb.add_face_type("coverage")

Attribute descriptors

| Descriptor | What it generates | Example |
| --- | --- | --- |
| `text()` | `xsd:string`, required, single-valued | `title: text()` |
| `text(required=False)` | `xsd:string`, optional | `notes: text(required=False)` |
| `text(multiple=True)` | `xsd:string`, required, multi-valued | `tags: text(multiple=True)` |
| `vocab("a", "b")` | `sh:in ("a" "b")`, required, single-valued | `status: vocab("pass", "fail")` |

Type inheritance and binding

Types can inherit from other user-defined types. Child types can bind inherited attributes to fixed values:

sb.add_vertex_type("document", attributes={"title": text(), "category": text()})
sb.add_vertex_type("specification", parent="document",
                   attributes={"format": text()},
                   bind={"category": "structural"})

Introspection

sb.describe_type("specification")
# {'name': 'specification', 'kind': 'vertex', 'parent': 'document',
#  'own_attributes': {'format': text()},
#  'inherited_attributes': {'title': text(), 'category': text()},
#  'all_attributes': {'title': text(), 'category': text(), 'format': text()},
#  'bound': {'category': 'structural'}}

sb.type_names(kind="vertex")   # ['document', 'specification']

2. Build a complex

A KnowledgeComplex manages instances. Every write triggers SHACL verification — the graph is always in a valid state.

from knowledgecomplex import KnowledgeComplex

kc = KnowledgeComplex(schema=sb)

# Vertices have no boundary — always valid
kc.add_vertex("req-001", type="requirement", title="Boot time < 5s")
kc.add_vertex("tc-001",  type="test_case",   title="Boot smoke test")
kc.add_vertex("tc-002",  type="test_case",   title="Boot regression")

# Edges need their boundary vertices to already exist (slice rule)
kc.add_edge("ver-001", type="verifies",
            vertices={"req-001", "tc-001"}, status="passing")
kc.add_edge("ver-002", type="verifies",
            vertices={"req-001", "tc-002"}, status="pending")
kc.add_edge("ver-003", type="verifies",
            vertices={"tc-001", "tc-002"}, status="passing")

# Faces need 3 boundary edges forming a closed triangle
kc.add_face("cov-001", type="coverage",
            boundary=["ver-001", "ver-002", "ver-003"])

What gets enforced

| Constraint | When | What happens |
| --- | --- | --- |
| Type must be registered | Before RDF assertions | ValidationError |
| Boundary cardinality (2 for edges, 3 for faces) | Before SHACL | ValueError |
| Boundary elements must exist in complex (slice rule) | SHACL on write | ValidationError + rollback |
| Vocab values must be in allowed set | SHACL on write | ValidationError + rollback |
| Face boundary edges must form closed triangle | SHACL on write | ValidationError + rollback |
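The write-then-rollback pattern can be illustrated independently of SHACL. This is a plain-Python sketch — `checked_write`, `validate`, and the dict store are hypothetical stand-ins for the concept, not library API:

```python
def checked_write(store: dict, key, value, validate):
    """Apply a write, run a validator, and roll back on failure."""
    snapshot = dict(store)          # cheap copy taken before the write
    store[key] = value
    ok, reason = validate(store)
    if not ok:
        store.clear()
        store.update(snapshot)      # restore the pre-write state
        raise ValueError(f"rejected: {reason}")

def validate(store):
    """Toy vocab-style constraint on 'status'."""
    allowed = {"passing", "failing", "pending"}
    ok = store.get("status") in allowed
    return ok, f"status {store.get('status')!r} not in vocab"

store = {"status": "passing"}
try:
    checked_write(store, "status", "unknown", validate)
except ValueError:
    pass
store   # unchanged: {'status': 'passing'}
```

The invariant is the same as the library's: a failed write leaves the store exactly as it was.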

Element handles

elem = kc.element("req-001")
elem.id      # "req-001"
elem.type    # "requirement"
elem.attrs   # {"title": "Boot time < 5s"}

kc.element_ids(type="test_case")   # ["tc-001", "tc-002"]
kc.elements(type="test_case")      # [Element('tc-001', ...), Element('tc-002', ...)]

3. Topological queries

Every query returns set[str] for natural set algebra. All accept an optional type= filter.

# Boundary operator ∂
kc.boundary("ver-001")     # {'req-001', 'tc-001'}  (edge → vertices)
kc.boundary("cov-001")     # {'ver-001', 'ver-002', 'ver-003'}  (face → edges)
kc.boundary("req-001")     # set()  (vertex → empty)

# Coboundary (inverse boundary)
kc.coboundary("req-001")   # {'ver-001', 'ver-002'}  (vertex → incident edges)

# Star: all simplices containing σ as a face
kc.star("req-001")         # req-001 + incident edges + incident faces

# Closure: smallest subcomplex containing σ
kc.closure("cov-001")      # cov-001 + 3 edges + 3 vertices

# Link: Cl(St(σ)) \ St(σ)
kc.link("req-001")

# Skeleton: elements up to dimension k
kc.skeleton(0)   # vertices only
kc.skeleton(1)   # vertices + edges

# Degree
kc.degree("req-001")   # 2

# Subcomplex check
kc.is_subcomplex({"req-001", "tc-001", "ver-001"})   # True
kc.is_subcomplex({"ver-001"})                         # False (missing vertices)

# Set algebra composes naturally
shared = kc.star("req-001") & kc.star("tc-001")
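The subcomplex rule — a set is valid iff it contains the boundary of each of its elements — is small enough to sketch directly. The boundary map below is hand-built to mirror the tutorial complex; it is not library API:

```python
# Hand-built boundary map for the tutorial complex (vertices have empty boundary)
boundary = {
    "req-001": set(), "tc-001": set(), "tc-002": set(),
    "ver-001": {"req-001", "tc-001"},
    "ver-002": {"req-001", "tc-002"},
    "ver-003": {"tc-001", "tc-002"},
    "cov-001": {"ver-001", "ver-002", "ver-003"},
}

def is_subcomplex(ids: set) -> bool:
    """Boundary-closed: every element's boundary is inside the set."""
    return all(boundary[e] <= ids for e in ids)

is_subcomplex({"req-001", "tc-001", "ver-001"})   # True
is_subcomplex({"ver-001"})                        # False — boundary vertices missing
```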

4. Local partitioning

The topological queries above use combinatorial adjacency — boundary, star, and closure walk the simplicial structure directly. Local partitioning uses diffusion instead: it spreads probability from a seed element and sweeps the result to find a natural cluster boundary. This surfaces structure that combinatorial queries miss.

Requires pip install knowledgecomplex[analysis].

Graph partitioning (vertex clusters)

Diffuse from a seed vertex using personalized PageRank or the heat kernel, then sweep the resulting distribution to find a cut with low conductance:

from knowledgecomplex.analysis import (
    approximate_pagerank, heat_kernel_pagerank,
    sweep_cut, local_partition,
)

# Approximate PageRank: push-based diffusion (Andersen-Chung-Lang)
p, r = approximate_pagerank(kc, seed="req-001", alpha=0.15)
# p is a sparse dict of vertex → probability; more mass near seed

# Heat kernel PageRank: exponential diffusion (Fan Chung)
rho = heat_kernel_pagerank(kc, seed="req-001", t=5.0)
# t controls locality: small t = tight cluster, large t = broad spread

# Sweep either distribution to find a low-conductance cut
cut = sweep_cut(kc, p)
cut.vertices      # set of vertex IDs on the small side
cut.conductance   # Cheeger ratio — lower means cleaner partition

# Or use local_partition for the full pipeline in one call
cut = local_partition(kc, seed="req-001", method="pagerank")
cut = local_partition(kc, seed="req-001", method="heat_kernel")
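Conductance itself is easy to state: the number of cut edges divided by the volume (sum of degrees) of the smaller side. A self-contained sketch on a toy adjacency list — not library API:

```python
def conductance(adj: dict, S: set) -> float:
    """phi(S) = |edges leaving S| / min(vol(S), vol(rest))."""
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol_S, vol_rest)

# Two triangles joined by a single bridge edge c-d
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
conductance(adj, {"a", "b", "c"})   # 1/7 — one cut edge over volume 7
```

A sweep cut evaluates this ratio for each prefix of vertices ordered by diffusion mass and keeps the minimum.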

Edge partitioning (simplicial clusters)

The simplicial version replaces the graph Laplacian with the Hodge Laplacian on edges. Instead of partitioning vertices, it partitions edges — finding clusters of relationships:

from knowledgecomplex.analysis import edge_local_partition

# Hodge PageRank: (βI + L₁)⁻¹ χ_e — diffusion on the edge space
cut = edge_local_partition(kc, seed_edge="ver-001", method="hodge_pagerank")

# Hodge heat kernel: e^{-tL₁} χ_e — exponential diffusion on edges
cut = edge_local_partition(kc, seed_edge="ver-001", method="hodge_heat", t=5.0)

cut.edges         # set of edge IDs in the cluster
cut.conductance   # edge conductance

The key difference: graph partitioning asks "which vertices are near this vertex?" while edge partitioning asks "which relationships are near this relationship?" — a question that only makes sense in a simplicial complex, not in a plain graph.
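To see why edge-space diffusion needs simplicial structure: in a plain graph, two edges can only be coupled through a shared vertex; the Hodge Laplacian also couples edges through shared faces. A sketch of both adjacency notions over hand-built data (not the library's internals):

```python
edges = {"ver-001": {"req-001", "tc-001"},
         "ver-002": {"req-001", "tc-002"},
         "ver-003": {"tc-001", "tc-002"}}
faces = {"cov-001": {"ver-001", "ver-002", "ver-003"}}

def lower_adjacent(e1, e2):
    """Edges coupled through a shared vertex (graph-level notion)."""
    return e1 != e2 and bool(edges[e1] & edges[e2])

def upper_adjacent(e1, e2):
    """Edges coupled through a shared face (simplicial notion)."""
    return e1 != e2 and any({e1, e2} <= b for b in faces.values())

lower_adjacent("ver-001", "ver-002")   # True — both touch req-001
upper_adjacent("ver-001", "ver-002")   # True — both bound cov-001
```

In a complex with no faces, `upper_adjacent` is always False and the Hodge Laplacian degenerates to its vertex-driven part.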

5. Algebraic topology

Requires pip install knowledgecomplex[analysis].

from knowledgecomplex.analysis import (
    boundary_matrices, betti_numbers, euler_characteristic,
    hodge_laplacian, edge_pagerank, hodge_decomposition, hodge_analysis,
)

# Boundary matrices (sparse)
bm = boundary_matrices(kc)
# bm.B1: (n_vertices × n_edges), bm.B2: (n_edges × n_faces)
# Invariant: B1 @ B2 = 0 (∂₁ ∘ ∂₂ = 0)

# Betti numbers
betti = betti_numbers(kc)     # [β₀, β₁, β₂]
chi = euler_characteristic(kc) # V - E + F = β₀ - β₁ + β₂

# Hodge Laplacian
L1 = hodge_laplacian(kc)      # B1ᵀB1 + B2B2ᵀ (symmetric PSD)
# dim(ker L₁) = β₁

# Edge PageRank
pr = edge_pagerank(kc, "ver-001", beta=0.1)   # (βI + L₁)⁻¹ χ_e

# Hodge decomposition: flow = gradient + curl + harmonic
decomp = hodge_decomposition(kc, pr)
# decomp.gradient  — im(B1ᵀ), vertex-driven flow
# decomp.curl      — im(B2), face-driven circulation
# decomp.harmonic  — ker(L₁), topological cycles

# Full analysis in one call
results = hodge_analysis(kc, beta=0.1)

All analysis functions accept an optional weights dict mapping element IDs to scalar weights, which factor into the Laplacian as diagonal weight matrices.
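Both invariants can be checked by hand for the tutorial complex (3 vertices, 3 edges, 1 face). A pure-Python sketch with signed incidence matrices as nested lists — the orientation choice here is mine for illustration, not the library's:

```python
# Rows: req-001, tc-001, tc-002; columns: ver-001, ver-002, ver-003
B1 = [[-1, -1,  0],
      [ 1,  0, -1],
      [ 0,  1,  1]]
# One face column with signs chosen so the composite boundary vanishes
B2 = [[1], [-1], [1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

matmul(B1, B2)    # [[0], [0], [0]] — the invariant ∂₁ ∘ ∂₂ = 0
V, E, F = 3, 3, 1
V - E + F         # χ = 1, matching β₀ - β₁ + β₂ = 1 - 0 + 0
```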

6. Filtrations

A filtration is a nested sequence of valid subcomplexes: C₀ ⊆ C₁ ⊆ ... ⊆ Cₘ.

from knowledgecomplex import Filtration

filt = Filtration(kc)
filt.append({"req-001"})                              # must be valid subcomplex
filt.append_closure({"ver-001"})                       # auto-closes + unions with previous
filt.append_closure({"cov-001"})                       # adds face + all boundary

filt.birth("cov-001")   # index where element first appears
filt.new_at(2)          # elements added at step 2 (Cₚ \ Cₚ₋₁)
filt[1]                 # set of element IDs at step 1

# Build from a scoring function
filt2 = Filtration.from_function(kc, lambda eid: some_score(eid))
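The nesting invariant C₀ ⊆ C₁ ⊆ ... ⊆ Cₘ and the birth index are straightforward to check on raw ID sets. A sketch of the bookkeeping, not library API:

```python
def is_filtration(steps: list) -> bool:
    """Each step must contain the previous one."""
    return all(a <= b for a, b in zip(steps, steps[1:]))

steps = [
    {"req-001"},
    {"req-001", "tc-001", "ver-001"},
    {"req-001", "tc-001", "tc-002",
     "ver-001", "ver-002", "ver-003", "cov-001"},
]
is_filtration(steps)            # True
is_filtration([{"a"}, {"b"}])   # False — not nested

# Birth = first step containing the element
birth = next(i for i, s in enumerate(steps) if "cov-001" in s)   # 2
```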

7. Parametric sequences

A ParametricSequence views a single complex through a parameterized filter. The complex holds all elements across all parameter values; the filter selects which are "active" at each value. Unlike a Filtration, subcomplexes can shrink — elements can appear and disappear.

from knowledgecomplex import ParametricSequence

# Complex has people with active_from/active_until attributes
seq = ParametricSequence(
    kc,
    values=["Q1", "Q2", "Q3", "Q4"],
    filter=lambda elem, t: elem.attrs.get("active_from", "Q0") <= t < elem.attrs.get("active_until", "Q9"),
)

seq["Q2"]                   # set of element IDs active at Q2
seq.birth("carol")          # "Q2" — first value where carol appears
seq.death("bob")            # "Q3" — first value where bob disappears
seq.active_at("bob")        # ["Q1", "Q2"]
seq.new_at(1)               # elements appearing at Q2
seq.removed_at(2)           # elements disappearing at Q3 (bob left)
seq.is_monotone             # False — people can leave
seq.subcomplex_at(0)        # is the Q1 slice boundary-closed?

for value, ids in seq:
    print(f"{value}: {len(ids)} elements")

The complex is the territory; the parameterized filter is the map.
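The birth/death bookkeeping reduces to scanning the active sets in parameter order. A sketch over raw sets, using hypothetical people matching the comments above (not library API):

```python
values = ["Q1", "Q2", "Q3", "Q4"]
active = {
    "Q1": {"alice", "bob"},
    "Q2": {"alice", "bob", "carol"},
    "Q3": {"alice", "carol"},
    "Q4": {"alice", "carol"},
}

def birth(name):
    """First parameter value where the element appears."""
    return next((v for v in values if name in active[v]), None)

def death(name):
    """First value, after a birth, where the element is absent."""
    seen = False
    for v in values:
        if name in active[v]:
            seen = True
        elif seen:
            return v
    return None

birth("carol"), death("bob")   # ('Q2', 'Q3')
```

Because the filter is not monotone, an element can in principle die and be reborn; this sketch reports only the first death.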

8. Clique inference

Discover higher-order structure hiding in the edge graph:

from knowledgecomplex import find_cliques, infer_faces

# Pure query — what triangles exist?
triangles = find_cliques(kc, k=3)

# Fill in all triangles as typed faces
added = infer_faces(kc, "coverage")

# Preview without modifying
preview = infer_faces(kc, "coverage", dry_run=True)
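Triangle finding itself is a small combinatorial routine: enumerate vertex triples and keep those that are pairwise adjacent. A sketch over a plain adjacency map (not the library's implementation):

```python
from itertools import combinations

adj = {"req-001": {"tc-001", "tc-002"},
       "tc-001": {"req-001", "tc-002"},
       "tc-002": {"req-001", "tc-001"}}

def find_triangles(adj):
    """All 3-cliques: vertex triples that are pairwise adjacent."""
    return {frozenset(t) for t in combinations(adj, 3)
            if all(b in adj[a] for a, b in combinations(t, 2))}

find_triangles(adj)   # {frozenset({'req-001', 'tc-001', 'tc-002'})}
```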

9. Export and load

# Export schema + instance to a directory
kc.export("output/my_complex")
# Creates: ontology.ttl, shapes.ttl, instance.ttl, queries/*.sparql

# Reconstruct from exported files
kc2 = KnowledgeComplex.load("output/my_complex")
kc2.audit().conforms   # True

Multi-format serialization:

from knowledgecomplex import save_graph, load_graph

save_graph(kc, "data.jsonld", format="json-ld")
load_graph(kc, "data.ttl")   # additive loading

10. Verification and audit

# Throwing verification
kc.verify()   # raises ValidationError on failure

# Non-throwing audit
report = kc.audit()
report.conforms       # bool
report.violations     # list[AuditViolation]
print(report)         # human-readable summary

# Deferred verification for bulk construction
with kc.deferred_verification():
    for item in big_dataset:
        kc.add_vertex(item.id, type=item.type, **item.attrs)
    # ... add edges, faces ...
# Single SHACL pass runs on exit

# Static file verification (no Python objects needed)
from knowledgecomplex import audit_file
report = audit_file("data/instance.ttl", shapes="data/shapes.ttl",
                    ontology="data/ontology.ttl")

11. Pre-built ontologies

Three ontologies ship with the package:

from knowledgecomplex.ontologies import operations, brand, research

sb = operations.schema()   # actor, activity, resource
sb = brand.schema()        # audience, theme
sb = research.schema()     # paper, concept, note

Gotchas

| Issue | Detail |
| --- | --- |
| Slice rule | Boundary elements must exist before the element that references them. Add vertices → edges → faces. |
| Closed triangle | A face's 3 edges must span exactly 3 vertices in a cycle. An open fan or 4-vertex path will fail. |
| remove_element | No post-removal verification. Remove faces before their edges, edges before their vertices. |
| Schema after load() | load() recovers type names, kinds, attributes, and parent relationships from OWL + SHACL. Full describe_type() introspection works after loading. |
| Deferred verification | Inside the context manager, intermediate states need not be valid. Verification runs once on exit. |
| Face orientation | Boundary matrix signs are computed internally to guarantee ∂₁∘∂₂ = 0. The orientation is consistent but not guaranteed to match external conventions. |
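The closed-triangle rule can be checked by hand before calling `add_face`: three edges form a triangle exactly when they span three vertices, each incident to exactly two of the edges. A sketch with hand-built boundaries (not library API):

```python
from collections import Counter

def is_closed_triangle(edge_boundaries):
    """True iff 3 edges span exactly 3 vertices, each hit exactly twice."""
    counts = Counter(v for b in edge_boundaries for v in b)
    return (len(edge_boundaries) == 3
            and len(counts) == 3
            and all(c == 2 for c in counts.values()))

triangle = [{"req-001", "tc-001"}, {"req-001", "tc-002"}, {"tc-001", "tc-002"}]
open_fan = [{"a", "b"}, {"a", "c"}, {"a", "d"}]   # shares 'a', never closes

is_closed_triangle(triangle)   # True
is_closed_triangle(open_fan)   # False — 4 vertices, 'a' hit 3 times
```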