A cross-runtime trust surface for LLM-rendered text.
SUM focuses on one load-bearing claim: Python, Node, and modern browsers produce byte-identical Ed25519 signatures over the same JCS-canonical bytes for signed render receipts.
Each hosted render returns a detached-JWS receipt (sum.render_receipt.v1) that can be verified offline against JWKS.
What Ships Today
- sum-engine on PyPI with sum attest, sum verify, sum resolve, sum ledger, sum inspect, and sum schema
- Cloudflare Worker APIs (/api/render, /api/complete, /api/qid) plus verification surfaces (/.well-known/jwks.json, revoked-kids)
- Browser demo for in-browser attest/verify behavior
- Cross-runtime trust triangle in CI (make xruntime + make xruntime-adversarial) validating valid-input and rejection-class equivalence
- MCP server (sum-mcp) exposing extract/attest/verify/inspect/schema tools
Headline Verification Claims
| Claim | Status | Source |
|---|---|---|
| Three-runtime byte-symmetric Ed25519 over JCS bytes | provable | docs/PROOF_BOUNDARY.md §1.2, §1.3.1 |
| Canonical round-trip reconstruct(parse(canonical_tome(S))) == S | provable | docs/PROOF_BOUNDARY.md §1.1 |
| Render receipt format (sum.render_receipt.v1) | shipped | docs/RENDER_RECEIPT_FORMAT.md |
| Slider fact preservation median 1.000 (p10 0.769 long / 0.818 short) | empirical-benchmark | docs/SLIDER_CONTRACT.md |
| Extraction F1 = 1.000 (seed_v1), 0.762 with precision 1.000 (seed_v2) | empirical-benchmark | docs/PROOF_BOUNDARY.md §2.1 |
Truthfulness Boundary (Explicit)
SUM makes a hard distinction between:
- Mechanically proven surfaces (canonical codec, cross-runtime signature equivalence, verifier parity), and
- Empirical LLM behavior (extraction quality, slider preservation, narrative round-trip fidelity).
The current upstream benchmark reports that the full LLM narrative round-trip remains unsolved on seed_v1 (107.75% drift, exact-match recall 0.12), even though generator fact scoring stays high. This boundary is documented directly in docs/PROOF_BOUNDARY.md and is part of the public claim set.
Technical Surface
- Core package: Python 3.10+ (sum-engine)
- Trust/verification: Ed25519 + JCS canonicalization + detached JWS receipts
- Runtime parity: Python CLI verifier, Node verifier, in-browser verifier
- Infra: Cloudflare Worker for hosted render/verification endpoints
- Protocol surfaces: CanonicalBundle format, render receipts, MCP tool server
Why It Matters
SUM is built for cases where downstream systems need proof that output was produced and signed by a specific issuer under a reproducible canonical format — without pretending that cryptographic attestation alone proves semantic truth.
That explicit separation between attestation guarantees and semantic-quality benchmarks is the core product stance.
License
Apache 2.0.
Repository README
SUM — verifiable bidirectional knowledge distillation
A cross-runtime trust surface for LLM-rendered text. Three runtimes (Python, Node, modern browsers) produce byte-identical Ed25519 signatures over the same JCS-canonical bytes. Every render through the hosted Worker carries a detached-JWS receipt (sum.render_receipt.v1) that any third party can verify offline against /.well-known/jwks.json. Live at https://sum-demo.ototao.workers.dev.
That is the load-bearing claim and what makes SUM different from a generic summarisation tool. The cryptographic side is mechanically proven — three independent verifier implementations agreeing byte-for-byte on every signed bundle, locked in CI on every PR. The semantic side (extraction quality, slider fact preservation) is empirically measured with explicit per-corpus numbers and explicit per-corpus boundaries; SUM does not blur the line between the two. docs/PROOF_BOUNDARY.md is the arbiter.
Headline supporting numbers (each links to its source of truth):
| Claim | Status | Source |
|---|---|---|
| Three-runtime byte-symmetric Ed25519 over JCS bytes | provable; locked by make xruntime (K1–K4) + make xruntime-adversarial (A1–A6) | docs/PROOF_BOUNDARY.md §1.2, §1.3.1 |
| Canonical round-trip reconstruct(parse(canonical_tome(S))) == S | provable; 0.00% drift on every CI run | docs/PROOF_BOUNDARY.md §1.1 |
| Render receipt: sum.render_receipt.v1, Ed25519 / JCS / detached JWS | shipped; verifier in three runtimes | docs/RENDER_RECEIPT_FORMAT.md |
| Slider fact preservation: median 1.000, p10 0.769 (long n=16) / 0.818 (short n=8) | empirical-benchmark | docs/SLIDER_CONTRACT.md |
| Extraction F1 = 1.000 (seed_v1), 0.762 with precision 1.000 (seed_v2) | empirical-benchmark | docs/PROOF_BOUNDARY.md §2.1 |
A render receipt verifies the render attestation (issuer signed this tome, these triples, this slider position, this model, at this time). It does not verify the truth of the tome's content — that is what the slider bench measures separately. See docs/RENDER_RECEIPT_FORMAT.md §5 for the explicit trust scope.
Verify it yourself in 60 seconds
The trust loop: hit the live Worker, get back a tome plus a detached Ed25519 JWS over the JCS-canonicalised receipt payload, fetch the issuer JWKS, verify.
# 1. JWKS — single Ed25519 OKP JWK, application/jwk-set+json
curl -sS https://sum-demo.ototao.workers.dev/.well-known/jwks.json | jq .
# → {"keys":[{"crv":"Ed25519","kty":"OKP","x":"...","alg":"EdDSA","use":"sig","kid":"sum-render-2026-04-27-1"}]}
# 2. Render — tome + render_receipt (signed JWS over JCS payload)
curl -sS -X POST https://sum-demo.ototao.workers.dev/api/render \
-H 'content-type: application/json' \
-d '{"triples":[["alice","graduated","2012"],["alice","born","1990"]],
"slider_position":{"density":1.0,"length":0.5,"formality":0.7,"audience":0.5,"perspective":0.5}}' \
| jq '.render_receipt | {schema, kid, payload, jws_segments: (.jws | split(".") | length)}'
A minimal Node verifier using jose + canonicalize is in docs/RENDER_RECEIPT_FORMAT.md §A.5; the same format is reachable from Python (joserfc + jcs), Go, and Rust per §3.
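For readers working in Python, the step that most often goes wrong is rebuilding the exact bytes the signature covers. The sketch below assumes the receipt's jws field is a compact detached JWS (header..signature, as in RFC 7515 Appendix F) and that the signed content is the base64url encoding of the JCS-canonical payload. Both are assumptions here; docs/RENDER_RECEIPT_FORMAT.md is authoritative. The helper canonical_json is a simplified stand-in for JCS: it sorts keys and strips whitespace but does not implement RFC 8785 number formatting or key ordering rules, so a real JCS library should replace it in practice.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWS."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def canonical_json(payload: dict) -> bytes:
    """Simplified stand-in for JCS: sorted keys, no whitespace, UTF-8."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def detached_signing_input(jws: str, payload: dict) -> tuple[bytes, bytes]:
    """Return (signing_input, signature) for a compact detached JWS."""
    header_b64, middle, signature_b64 = jws.split(".")
    if middle != "":
        raise ValueError("expected a detached JWS with an empty payload segment")
    # The signature covers "<protected header>.<base64url(payload bytes)>".
    signing_input = f"{header_b64}.{b64url(canonical_json(payload))}".encode("ascii")
    padding = "=" * (-len(signature_b64) % 4)
    signature = base64.urlsafe_b64decode(signature_b64 + padding)
    return signing_input, signature
```

The two returned values, together with the public key selected from the JWKS by kid, are what an Ed25519 verification call takes as input.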
What ships today
| Surface | Status | Verifies |
|---|---|---|
| pip install 'sum-engine[sieve]': sum attest / sum verify / sum resolve / sum ledger / sum inspect / sum schema | shipped on PyPI | structural reconstruction; HMAC-SHA256 + Ed25519 signatures (W3C VC 2.0 eddsa-jcs-2022) |
| Cloudflare Worker at sum-demo.ototao.workers.dev | shipped | /api/render → tome + render_receipt; /.well-known/jwks.json → JWKS; /api/qid → Wikidata resolver |
| Single-file browser demo (single_file_demo/index.html) | shipped | paste prose → in-browser attest → CanonicalBundle JSON; same bytes verify under node standalone_verifier/verify.js (Chrome / Firefox / Safari with WebCrypto Ed25519 support) |
| Cross-runtime trust triangle | locked by CI (make xruntime) | K1 / K1-mw / K2 / K3 / K4: Python ↔ Node ↔ Browser agree byte-for-byte on valid bundles. make xruntime-adversarial adds A1–A6 rejection-class equivalence. |
| 5-axis slider rendering surface | density actioned deterministically; length / formality / audience / perspective LLM-conditioned via the Worker (Anthropic, Cloudflare AI Gateway optional) | bench: median LLM-axis fact preservation 1.000, p10 0.769 (long, n=16) / 0.818 (short, n=8), order preservation 1.000 wherever measurable |
| MCP server (sum-mcp console script) | shipped | five tools (extract / attest / verify / inspect / schema) exposed over stdio; bundles attested via MCP verify byte-identically through the CLI / Node / browser verifiers |
The slider's product claim — axis changes do not lose facts — is the load-bearing empirical result. It is verified by NLI audit on every embedding-flagged "loss" cell; full attribution in docs/SLIDER_CONTRACT.md.
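To make the headline statistics concrete, here is a sketch of how a median and a 10th percentile could be computed from per-cell fact-preservation scores. The scores are invented for illustration, and the percentile rule (Python's statistics.quantiles with method="inclusive") is an assumption; the bench may use a different interpolation.

```python
import statistics


def summarize(scores: list[float]) -> dict[str, float]:
    """Median and 10th percentile of per-cell fact-preservation scores."""
    deciles = statistics.quantiles(scores, n=10, method="inclusive")
    return {"median": statistics.median(scores), "p10": deciles[0]}


# Hypothetical scores for eight cells; not taken from the SUM bench.
cells = [1.0, 1.0, 0.75, 1.0, 1.0, 0.8, 1.0, 1.0]
summary = summarize(cells)
print(summary)
```

A median of 1.000 alongside a lower p10 means most cells preserve every fact while a small tail of cells loses some, which is why both figures are reported together.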
What does NOT yet work — the honest line
SUM measures one capability that the rest of this README's numbers do not yet close: the full LLM narrative round-trip (text → LLM-extract → axioms → LLM-generate → prose' → LLM-extract → axioms'). On seed_v1 this loop produces:
- 107.75% drift (per-document 100 × |A Δ A'| / max(|A|, |A'|)), and
- exact-match recall = 0.12 (6 of 50 source triples appear verbatim after the round trip).
The reason both numbers exist together: the generator preserves facts (FActScore 0.94–0.96, §2.4) but the round-trip is dominated by generator elaboration — the LLM produces ~12 reconstructed axioms per source axiom and elaborates around the source claim rather than paraphrasing it.
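The two metrics above can be sketched directly from their definitions. This is a minimal illustration over sets of (subject, predicate, object) triples, not the bench's own code; the function names and the empty-set conventions are assumptions.

```python
Triple = tuple[str, str, str]


def drift_pct(source: set[Triple], reconstructed: set[Triple]) -> float:
    """100 × |A Δ A'| / max(|A|, |A'|) for one document."""
    denom = max(len(source), len(reconstructed))
    if denom == 0:
        return 0.0  # convention chosen here: two empty sets have no drift
    return 100.0 * len(source ^ reconstructed) / denom


def exact_match_recall(source: set[Triple], reconstructed: set[Triple]) -> float:
    """Fraction of source triples that reappear verbatim."""
    if not source:
        return 0.0  # convention chosen here for an empty source
    return len(source & reconstructed) / len(source)


# Toy document: one of four source triples survives, five new ones appear.
source = {("s", "p", str(i)) for i in range(4)}
reconstructed = {("s", "p", "0")} | {("s", "q", str(i)) for i in range(5)}
print(drift_pct(source, reconstructed), exact_match_recall(source, reconstructed))
```

Because the symmetric difference counts both missing and extra triples, drift exceeds 100% whenever the reconstructed set is much larger than the source and overlaps it only slightly, which is the elaboration pattern described above.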
Canonicalisation alone does not close this gap — measured 2026-04-28 (scripts/bench/runners/canonicalization_replay.py, no LLM cost, operates on cached per-doc data):
| Canonicalisation regime | drift_pct | exact-match recall |
|---|---|---|
| baseline | 107.75 % | 0.12 |
| + predicate normalisation | 107.75 % | 0.12 (zero movement — the L1 falsification) |
| + subject canonicalisation (last-word-as-key) | 106.68 % | 0.16 |
| + aggressive object normalisation (ceiling) | 106.36 % | 0.18 |
Closing the §2.5 gap requires moving the generator (constrained decoding to a pinned vocabulary, or a fidelity-objective fine-tune), not just post-hoc key normalisation. The L0–L3 receipt is the reference baseline against which any future generator-side intervention is measured. Full attribution in docs/PROOF_BOUNDARY.md §2.5.
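A toy version of the subject-canonicalisation regime shows why post-hoc key normalisation has a ceiling. The last_word_key helper and the example triples are hypothetical; the real replay lives in scripts/bench/runners/canonicalization_replay.py and may differ in every detail.

```python
Triple = tuple[str, str, str]


def last_word_key(subject: str) -> str:
    """Hypothetical subject canonicaliser: lowercased last word as the key."""
    words = subject.lower().split()
    return words[-1] if words else ""


def recall(source: set[Triple], reconstructed: set[Triple], normalise: bool) -> float:
    """Exact-match recall, optionally after canonicalising subjects."""
    def key(triple: Triple) -> Triple:
        subject, predicate, obj = triple
        return (last_word_key(subject) if normalise else subject, predicate, obj)

    src = {key(t) for t in source}
    rec = {key(t) for t in reconstructed}
    return len(src & rec) / len(src)


source = {("alice", "born", "1990"), ("alice", "graduated", "2012")}
reconstructed = {
    ("Dr. Alice", "born", "1990"),             # same fact, different subject key
    ("alice", "completed_degree_in", "2012"),  # same fact, elaborated predicate
    ("alice", "is", "a graduate"),             # generator elaboration
}
baseline = recall(source, reconstructed, normalise=False)
normalised = recall(source, reconstructed, normalise=True)
print(baseline, normalised)
```

Normalising the subject recovers the first triple but cannot recover the second, because the generator chose a different predicate; that residue is what generator-side constraints would have to address.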
The deterministic canonical round-trip (the one sum attest | sum verify exercises) is mechanically proven (§1.1, 0.00% drift). The LLM round-trip is not, and this section is here to keep that distinction above the fold.
CLI quick start
pip install 'sum-engine[sieve]'
echo "Alice likes cats. Bob owns a dog." \
| sum attest --extractor=sieve > bundle.json
sum verify --input bundle.json
# → sum: ✓ verified 2 axiom(s), state integer matches (hmac=absent, ed25519=absent)
Add cryptographic attestation with one flag:
# Ed25519 / W3C VC 2.0 (eddsa-jcs-2022)
python -m scripts.generate_did_web --domain your.example --private-key-out keys/issuer.pem
sum attest --ed25519-key keys/issuer.pem < prose.txt | sum verify --strict
# → hmac=absent, ed25519=verified
The same bundle bytes verify under sum verify (Python), node standalone_verifier/verify.js (WebCrypto), and the in-browser demo (SubtleCrypto). docs/DID_SETUP.md walks the did:key / did:web issuer setup. docs/PROOF_BOUNDARY.md §1.3.1 documents what the cross-runtime Ed25519 contract proves.
Calling SUM from MCP-aware LLM clients
pip install 'sum-engine[mcp,sieve]'
# Claude Desktop / Claude Code / Cursor / Continue: add to MCP config:
# { "mcpServers": { "sum": { "command": "sum-mcp" } } }
sum-mcp exposes extract, attest, verify, inspect, schema as MCP tools. Bundles attested via MCP verify byte-identically through the CLI / Node / browser verifiers — same canonical codec. See docs/MCP_INTEGRATION.md for the full client wiring.
Calling SUM over HTTP
The hosted Worker at https://sum.ototao.com exposes /api/render, /api/complete, /api/qid, and the /.well-known/{jwks,revoked-kids}.json verification surfaces. docs/API_REFERENCE.md is the wire spec — request/response shapes, error codes, the six-step receipt-verification flow, working Node + Python examples. Use this when the caller is a web app, mobile app, or server-side service; use the MCP server when the caller is a local LLM client.
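As a rough illustration of the HTTP path, the sketch below builds a POST to /api/render using only the Python standard library. The request body mirrors the curl example earlier in this README, and the host is the one used there; the function name and defaults are assumptions, and docs/API_REFERENCE.md remains the wire spec.

```python
import json
import urllib.request

BASE_URL = "https://sum-demo.ototao.workers.dev"  # host from the curl example


def build_render_request(triples, slider_position, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/render."""
    body = json.dumps(
        {"triples": triples, "slider_position": slider_position}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/render",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


request = build_render_request(
    [["alice", "graduated", "2012"], ["alice", "born", "1990"]],
    {"density": 1.0, "length": 0.5, "formality": 0.7,
     "audience": 0.5, "perspective": 0.5},
)
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request, timeout=30) as resp:
#       receipt = json.load(resp)["render_receipt"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.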
How the trust loop fits together
prose ─► /api/render ─► tome
+ render_receipt {kid, payload, jws}
│
▼
/.well-known/jwks.json
(Ed25519 OKP JWK by kid)
│
▼
jose.flattenedVerify(JCS(payload))
│
▼
render attested ✓ — issuer signed
(this tome, these triples, this slider
position, this model, at this time)
The receipt is a render attestation, not a truth oracle. Fact preservation is verified by the bench (NLI audit on weak cells). The receipt is what a downstream system keeps as durable proof; the tome is what a reader consumes. See docs/RENDER_RECEIPT_FORMAT.md §5.
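The key-lookup step in the loop above can be sketched as follows: pick the JWK whose kid matches the receipt and decode its x member into the raw 32-byte Ed25519 public key. The JWKS shape follows the sample response shown earlier; the function name is hypothetical, and a production verifier would presumably also consult the revoked-kids surface.

```python
import base64


def ed25519_key_for_kid(jwks: dict, kid: str) -> bytes:
    """Return the raw 32-byte Ed25519 public key for `kid`, or raise."""
    for jwk in jwks.get("keys", []):
        if jwk.get("kid") != kid:
            continue
        if jwk.get("kty") != "OKP" or jwk.get("crv") != "Ed25519":
            raise ValueError(f"key {kid!r} is not an Ed25519 OKP key")
        x = jwk["x"]
        raw = base64.urlsafe_b64decode(x + "=" * (-len(x) % 4))
        if len(raw) != 32:
            raise ValueError("Ed25519 public keys are 32 bytes")
        return raw
    raise KeyError(kid)  # fail closed when the kid is unknown
```

The returned bytes are what an Ed25519 library expects as the public key when checking the receipt signature.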
Underlying substrate
Below the slider sits the substrate that earlier phases shipped and verified. Pointers, not paraphrase — every claim links to its source-of-truth doc.
- Canonical round-trip conservation (provable). reconstruct(parse(canonical_tome(S))) == S for every Gödel state S. 0.00% drift on seed_tiny_v1 / seed_v1 / seed_v2. docs/PROOF_BOUNDARY.md §1.1.
- Cross-runtime state equivalence (provable). Python (sympy), Node (BigInt + Miller-Rabin), in-browser JS produce byte-identical state integers. Locked by 4 harnesses (make xruntime + make xruntime-adversarial). docs/PROOF_BOUNDARY.md §1.2.
- Bundle public-key attestation (provable). Ed25519-signed CanonicalBundles are tamper-detectable by any third party in any of the three runtimes. docs/PROOF_BOUNDARY.md §1.3.1.
- Merkle hash-chain integrity (provable, including under concurrent writers). docs/PROOF_BOUNDARY.md §1.7.
- Extraction F1 (empirical-benchmark). 1.000 on seed_v1 (50 simple-SVO docs); 0.762 with precision 1.000 on seed_v2 (20-doc difficulty corpus). Every remaining seed_v2 failure is a recall miss, not a truth inversion. docs/PROOF_BOUNDARY.md §2.1.
- 103 numbered features, each with a reproducible verification command, in docs/FEATURE_CATALOG.md.
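The cross-runtime bullet mentions Miller-Rabin on the Node side. For readers unfamiliar with it, here is a generic deterministic variant for integers below 2^64, written in Python for consistency with the rest of the examples. It is not SUM's implementation, and how axioms are mapped to primes is not shown in this README, so the sketch covers only the primality test itself.

```python
# Using the first twelve primes as witnesses makes Miller-Rabin
# deterministic for every n below 2**64.
_WITNESSES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime_u64(n: int) -> bool:
    """Deterministic Miller-Rabin primality test for 0 <= n < 2**64."""
    if n < 2:
        return False
    for p in _WITNESSES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in _WITNESSES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True
```

A test with a fixed witness set gives the same answer in every runtime, which is the property a byte-identical state integer depends on.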
Reproduce the bench
# Short corpus (n=8, 4–12 triples/doc, ~$0.30, ~2 min with NLI)
bash scripts/bench/run_paragraphs.sh
# Long corpus (n=16, 9–24 triples/doc, ~$1.50, ~10 min with NLI)
bash scripts/bench/run_long_paragraphs.sh
Both runners require OPENAI_API_KEY (NLI audit + extraction). Pinned model snapshots are mandatory; the harness raises SystemExit on unpinned identifiers (see docs/PROOF_BOUNDARY.md §2.6). Output is NDJSON sum.slider_drift_bench.v1, with per-cell strict / normalized / semantic / NLI fact-preservation columns.
Future developments
This roadmap names only unshipped work. Items already landed live in CHANGELOG.md [Unreleased]. Detailed sequencing lives in docs/NEXT_SESSION_PLAYBOOK.md.
Closing the LLM round-trip drift. This is the headline open problem. The full LLM round-trip (text → LLM-extract → axioms → LLM-generate → prose' → LLM-extract → axioms') currently produces 107.75 % drift and 0.12 exact-match recall on seed_v1: facts are preserved, keys are not. Post-hoc canonicalisation (predicate normalisation, subject canonicalisation, aggressive object normalisation) moves exact-match recall only from 0.12 to 0.18 in the replay measured on 2026-04-28, so closing the gap requires generator-side work (constrained decoding to a pinned vocabulary, or a fidelity-objective fine-tune); none of that is shipped yet. See docs/PROOF_BOUNDARY.md §2.5 for the full attribution and per-document failure modes.
Hardening backlog
- sha256_128_v2 activation: Node side exists, Python side not yet CURRENT_SCHEME. Pre-empts the 2³² collision frontier.
- /api/qid SPARQL disambiguation: moves entity resolution from the current wbsearchentities-only path to a target >95 % accuracy floor (the floor itself is unmeasured today).
- Threat-model validation: every documented defence in docs/THREAT_MODEL.md gets an executable test.
- Delta-bundle composition semantics: specifies what bundle.is_delta means cross-runtime.
- Sigstore / cosign signing of release artifacts.
- LLM-extraction honesty guardrails: extraction.verifiable: true | false, so signed ≠ true is visible at the consumer interface.
- Calibration-set authoring for the Venn-Abers conformal-interval implementation that already ships.
- Remaining sieve recall work on seed_v2 (apposition / relative-clause / compound-conjunct), gated on the §2.5 work; see docs/PROOF_BOUNDARY.md §6.
Platform surface (post-hardening)
Source anchoring in the bundle schema, bundle explorer / viewer, sum verify --explain, sum tutorial onboarding, shareable bundle URLs /b/{hash}, PWA-installable demo, sum attest <url> fetch mode. Each item names its dependency in docs/NEXT_SESSION_PLAYBOOK.md.
Verification surface
make help lists every dev command. Common targets:
make install # editable install with sieve + dev extras
make test # full pytest run (1000+ tests)
make xruntime # cross-runtime K1/K1-mw/K2/K3/K4 (Python ↔ Node)
make xruntime-adversarial # rejection-matrix A1–A6
make fortress # 21-check pure-math invariants
make smoke # fresh-venv install + attest|verify round-trip
make demo # open the single-file browser demo
CI runs the full suite on every push (.github/workflows/quantum-ci.yml); the cross-runtime-harness job runs K1–K4 + A1–A6 on Node 22; pypi-install-smoke builds the wheel and runs echo prose | sum attest | sum verify in a throwaway venv.
Truthfulness contract
Every claim in this repo carries an explicit epistemic status — provable, certified, empirical-benchmark, or expert-opinion. The arbiter is docs/PROOF_BOUNDARY.md. A summary surface that quotes an empirical-benchmark number alongside language like "mathematically guaranteed" is a policy violation per §5 and must be corrected.
Performance language (fast, efficient, low-latency, scalable) requires a benchmark in the same commit. Adversarial input agreement (the A-matrix) is a separate proof from valid-input agreement (the K-matrix); both run in CI.
If a number in this README disagrees with docs/PROOF_BOUNDARY.md or docs/SLIDER_CONTRACT.md, the docs are canonical and this README is wrong.
Contributing
- Fork and branch.
- make install && make test && make xruntime.
- Read docs/NEXT_SESSION_PLAYBOOK.md for principles, stop-the-line triggers, and the work-ordering rule.
- Open a PR. Every claim added to docs or commit messages must trace to a test, a measurement, or an explicit "designed, not proved" label.
CONTRIBUTING.md has the test-gate matrix and the verification-gate runbook.
License
Apache 2.0. See LICENSE.
What's new
Minor-bump feature release. Agentic-first introspection surface on the sum CLI, zero breaking changes.
New subcommands
sum ledger list --db DB [--axiom K] [--since ISO] [--limit N] # NDJSON prov_ids
sum ledger stats --db DB [--pretty] # counts, ts range, chain tip
sum ledger head --db DB [--branch NAME] [--pretty] # state integer per branch
sum inspect bundle.json [--pretty] # structural read, no crypto
sum schema {bundle|provenance|credential} # JSON Schema Draft 2020-12
Each answers a question that an LLM agent composing SUM into a larger pipeline previously had to answer with hand-written SQL or by digging through docstrings.
Why it matters for agents
- ledger list turns SQLite-backed provenance into NDJSON agents can jq.
- inspect reads bundle shape without paying Ed25519 verification cost; agents can route by bundle attributes before deciding whether to run full verify.
- schema emits ground-truth JSON Schemas so agents validate SUM output programmatically instead of guessing from prose.
State integers are always string-encoded (never JSON numbers) to preserve arbitrary precision — many agent parsers use 64-bit doubles.
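The precision concern is easy to reproduce. In the sketch below the value and the state_integer field name are stand-ins rather than the actual bundle schema; the point is only that a 64-bit double cannot hold a large integer exactly, while a string round-trips unchanged.

```python
import json

state = 2**80 + 1  # stand-in for a large state integer

# A parser that stores numbers as 64-bit doubles silently rounds the value.
as_double = int(float(state))

# String encoding survives any JSON parser unchanged.
wire = json.dumps({"state_integer": str(state)})
restored = int(json.loads(wire)["state_integer"])

print(as_double == state, restored == state)
```

Here the double-based path loses the trailing +1, whereas the string path restores the exact integer.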
Unchanged
- CLI contract for attest / verify / resolve and every flag.
- CanonicalBundle wire format (canonical_format_version 1.0.0).
- Prime scheme (sha256_64_v1).
- Cryptographic contracts (HMAC, Ed25519, W3C VC 2.0 eddsa-jcs-2022).
- Cross-runtime trust triangle: K1/K1-mw/K2/K3/K4 still green; same bundle bytes still verify in Python ↔ Node ↔ Browser.
Tests
14 new cases in Tests/test_sum_cli_agentic.py pin every branch — NDJSON shape, filter composition (--axiom, --limit), stats summary keys, head unknown-branch exit path, inspect on tampered tome reports divergence rather than rejecting, schema title + required subset matches what sum attest actually emits.
Full suite at v0.3.0: 1035 collected / 1027 passed (8-test gap is spacy-dependent cases that skip when en_core_web_sm isn't downloaded). Cross-runtime harness 5/5 PASS.
See CHANGELOG.md for the full contract.