(In)Canon Proof-of-work corpus

Federal Register Corpus (1994–2025)

Reference (Zenodo)

Tomé, D. (2025). Deterministic Federal Register Corpus (1994–2025). Zenodo. https://doi.org/10.5281/zenodo.18288508

This page is the public surface of a 30‑year proof‑run: (In)Canon applied end‑to‑end across the modern Federal Register. The output is not “documents”. It is a deterministic structural record that turns verbatim authority into computable commitments while preserving absence.

What it proves is simple and rare: across three decades, administrative authority is overwhelmingly responsibility‑addressable. Once that responsibility floor exists, you can measure complexity, drift, and non‑executable authority without importing interpretation.

The parsing phase for the corpus required 480 compute hours. It is released for free as a proof-of-work artefact. The commercial value is the engine that produced it: (In)Canon, a deterministic admissibility system you can license to run the same analysis on your own corpora.

Temporal span

1994 → 2025

Frozen scope: modern era of the Federal Register baseline.

Daily snapshots

7,873

A full sequence of dated structural observations across the run.

Actor presence rate

99.99%

A responsibility-hardened environment: “who is responsible” is almost always explicit.

Mean dependency rate

0.2052

A measurable “entanglement” factor: the accretion of cross-reference dependency over time.

Deterministic Ground Truth

Spanning 7,873 dated snapshots from 1994 to 2025, this corpus maintains a 99.99% Actor Presence rate. That is not “data quality” in the casual sense. It is a structural guarantee: a stable responsibility floor that lets you quantify dependency accretion, outcome absence, time binding, and cross‑reference drift without guessing what the law “means”.

What this enables

Most compliance and governance work quietly depends on reconstruction: humans fill gaps, infer actors, and retrofit outcomes. This artefact shows where you don’t have to. With a responsibility floor in place, structural signals become measurable inputs: where authority is executable, where it is hollow, and where complexity has accreted beyond local readability.

Federal Register (verbatim history)
        │
        ▼
(In)Canon lens (deterministic admissibility)
        │
        ├─ Baseline of responsibility (actor presence floor)
        ├─ Accretion diagnostics (dependency / entanglement)
        ├─ Absence reporting (not stated stays not stated)
        └─ Audit artefacts (hashes, versions, timestamps)

This public release is intentionally trade-secret safe: it distributes deterministic artefacts and baselines, not internal rule logic.

Responsibility baseline · Deterministic · Non-inferential · Audit-grade outputs · Frozen scope 1994–2025

Baseline of Responsibility

The (In)Canon Baseline: 99.99% responsibility mapping

Most public corpora cannot reliably answer the simplest operational question: who is responsible? They are anonymous, fragmented, or structurally inconsistent. The Federal Register is different.

Responsibility-hardened environment

This corpus demonstrates that across the modern era of the Federal Register, a responsible actor is explicitly identified at an extraordinary rate (99.99% Actor Presence). That makes the administrative state a rare kind of corpus: it is responsibility-addressable at scale.

Once you have that deterministic floor, you can measure every other structural property without inventing meaning: outcomes, time binding, dependency accretion, cross-reference debt, and the density of non-executable authority.

What this proves

Responsibility is structurally present (not implied) at state scale.
Large-corpus claims become computable without interpretation.
The baseline becomes a reference surface for anomaly detection.
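
As a rough illustration of using the baseline as a reference surface, the sketch below flags dates whose dependency rate departs from the published mean (0.2052) by more than a tolerance. The daily values, the tolerance, and the function name are hypothetical, and this simple comparison is a downstream check, not part of the (In)Canon engine.

```python
def flag_deviations(daily_rates, baseline, tolerance):
    """Return the dates whose rate differs from the baseline by more than tolerance."""
    return [
        date
        for date, rate in sorted(daily_rates.items())
        if abs(rate - baseline) > tolerance
    ]


# Hypothetical daily dependency rates; only the 0.2052 mean comes from this page.
daily = {
    "1994-01-03": 0.19,
    "1994-01-04": 0.21,
    "1994-01-05": 0.45,
}

flagged = flag_deviations(daily, baseline=0.2052, tolerance=0.10)
print(flagged)
```

The check reports where a snapshot departs from the baseline. It does not say why, which keeps it consistent with the non-inferential boundary described on this page.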

Why it matters

Most “AI compliance” pipelines still rely on reconstruction.
This corpus separates explicit responsibility from inferred responsibility.
You can audit where authority becomes non-executable without guessing why.

Structural drift (1994–2025)

Regulatory Thicket

The Z-axis: accretion of complexity (dependency entanglement)

The dependency rate is the “interest factor.” It measures the degree to which present units rely on previously stated structure. At scale, this is not a metaphor: it is a quantifiable accretion of complexity.
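
The page does not publish the formula behind the dependency rate. One plausible reading, shown here only as an assumption, is the share of units in a snapshot that carry at least one cross-reference signal. The `cross_refs` field and the sample units are hypothetical.

```python
def dependency_rate(units):
    """Share of units with at least one cross-reference signal (assumed definition)."""
    if not units:
        return None  # no units observed: report absence rather than 0.0
    dependent = sum(1 for unit in units if unit["cross_refs"] > 0)
    return dependent / len(units)


# Hypothetical snapshot: five units, one of which relies on prior structure.
snapshot = [
    {"id": "u1", "cross_refs": 0},
    {"id": "u2", "cross_refs": 3},
    {"id": "u3", "cross_refs": 0},
    {"id": "u4", "cross_refs": 0},
    {"id": "u5", "cross_refs": 0},
]

print(dependency_rate(snapshot))
```

Returning `None` for an empty snapshot, instead of a zero, mirrors the page's rule that absence is reported and not filled.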

Accretion of complexity

Over time, regulatory text does not just grow—it entangles. Cross-reference dependencies become technical debt: the present becomes executable only by reconstructing the past. The dependency rate provides a numeric surface for that entanglement.

This corpus turns the Federal Register into a 3D map: time (X), volume (Y), dependency entanglement (Z).

[Figure: Federal Register structural drift chart showing dependency accretion and action–outcome decoupling (1994–2025)]

Structural drift (1994–2025): Dependency accretion (black) and Action–Outcome decoupling (“structural friction”, red). These are measured properties of the corpus — not interpretations, not scores.

The point is not “a chart.” The point is that the corpus makes drift observable at national scale: you can see when cross‑reference debt rises and when mandates decouple from defined outcomes — without importing anyone’s theory of why.

The Red Bridge

Integrity by design: we do not invent missing structure

Readers reconstruct. Models reconstruct. Most pipelines quietly conflate reconstruction with evidence. (In)Canon does the opposite: it makes reconstruction visible by refusing to fill gaps.

Canon Evidence vs Interpolated Logic

Transparency guarantee (trade-secret safe)

Canon Evidence (Black)

Verbatim commitments latched from the source. Offsets preserved. Hashes recorded. Deterministic artefacts. This is the ground truth layer.

Interpolated Logic (Red Bridge)

Any attempt to “complete” missing structure is an interpolation. We can build bridges for downstream use, but we never confuse bridges with ground truth.

We provide a 7,873-day sequence that explicitly distinguishes “what is stated” from “what would have to be inferred.” This is how you prevent structural hallucination from becoming institutional fact.
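
A minimal sketch of how the evidence/bridge distinction could be carried in data, assuming a record type that tags every finding with its layer. The class names, field names, and sample values are hypothetical and do not reflect the engine's internal schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    CANON_EVIDENCE = "black"  # stated verbatim in the source
    RED_BRIDGE = "red"        # interpolated downstream, never ground truth


@dataclass(frozen=True)
class Finding:
    field: str
    value: Optional[str]
    layer: Layer
    offset: Optional[int] = None  # character offset of the verbatim span, evidence only


def ground_truth(findings):
    """Keep only the findings backed by verbatim source text."""
    return [f for f in findings if f.layer is Layer.CANON_EVIDENCE]


findings = [
    Finding("actor", "The Administrator", Layer.CANON_EVIDENCE, offset=0),
    Finding("deadline", "within 30 days", Layer.RED_BRIDGE),  # inferred, so marked red
]

for finding in ground_truth(findings):
    print(finding.field, finding.value)
```

Because the layer travels with each record, a downstream consumer cannot pick up a bridge value without also seeing that it is a bridge.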

[Figure: Canon Evidence (black) vs Red Bridge (red)]

Evidence is evidence, interpolation is interpolation: the chart distinguishes verbatim commitments (black) from downstream bridgework (red).

We don’t generate completeness — we report absence and preserve it.

Structural Rosetta Stone

30 years of verbatim history. One deterministic lens.

The corpus is not a rewrite. It is a translation layer that turns “verbatim authority” into “deterministic commitments” without interpretation. That’s why it’s trade-secret safe: you can publish the artefacts and still protect the engine.

A structural Rosetta Stone

This release shows that a single deterministic admissibility system can ingest an authoritative state corpus and produce: (a) reproducible commitments with verbatim anchors, and (b) explicit absence reporting that never fills gaps.

The scope is frozen (1994–2025). The lens is consistent. The result is a proof-of-work artefact: one method applied end-to-end without interpretive drift.

Verbatim text (as published)
        │
        ▼
Deterministic commitments (verbatim anchors + offsets)
        │
        ├─ presence / absence (stated vs not stated)
        ├─ dependency linkage signals (cross-reference entanglement)
        └─ audit bundle (hashes, versions, timestamps)
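
The idea of a verbatim anchor with an offset and a hash can be sketched as follows. This is an illustration under assumptions: the anchor layout, the function names, and the sample sentence are hypothetical, and SHA-256 is chosen here for the example because the page does not name a hash algorithm.

```python
import hashlib


def make_anchor(source_text, start, end):
    """Record a verbatim span together with its offsets and a SHA-256 digest."""
    span = source_text[start:end]
    return {
        "start": start,
        "end": end,
        "text": span,
        "sha256": hashlib.sha256(span.encode("utf-8")).hexdigest(),
    }


def verify_anchor(source_text, anchor):
    """Check that the recorded span still sits at the recorded offsets."""
    span = source_text[anchor["start"]:anchor["end"]]
    digest = hashlib.sha256(span.encode("utf-8")).hexdigest()
    return span == anchor["text"] and digest == anchor["sha256"]


source = "The Administrator shall publish a notice."  # hypothetical sample text
anchor = make_anchor(source, 4, 17)

print(anchor["text"])
print(verify_anchor(source, anchor))
print(verify_anchor("The Secretary shall publish a notice.", anchor))
```

An auditor holding the source text and the anchor can repeat the check without access to whatever logic selected the span.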

Downloads

Free release (trade-secret safe): the proof surface


1) Longitudinal baseline (1994–2025)

The Longitudinal Baseline is a structural record of the modern Federal Register era, spanning January 1994 to December 2025. It is a deterministic roll-up of 7,873 daily snapshots that turns 30 years of administrative history into a machine-readable, computable artefact. It does not contain the full text of documents; it tracks the structural health and evolution of authority over time. Its primary function is to establish a ground truth for how administrative authority behaves as an engineering system.

Download baseline CSV
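
A minimal sketch of summarising a baseline-style CSV with the Python standard library. The column names (`date`, `actor_presence_rate`, `dependency_rate`) and the sample rows are assumptions made for illustration; check the header of the downloaded file and adjust accordingly.

```python
import csv
import io
from statistics import mean

# Hypothetical sample in an assumed layout; the real file's columns may differ.
SAMPLE_CSV = """date,actor_presence_rate,dependency_rate
1994-01-03,1.0000,0.10
1994-01-04,0.9998,0.20
1994-01-05,1.0000,0.30
"""


def summarize_baseline(handle):
    """Return the snapshot count and mean rates from a baseline-style CSV."""
    rows = list(csv.DictReader(handle))
    return {
        "snapshots": len(rows),
        "mean_actor_presence": mean(float(r["actor_presence_rate"]) for r in rows),
        "mean_dependency": mean(float(r["dependency_rate"]) for r in rows),
    }


summary = summarize_baseline(io.StringIO(SAMPLE_CSV))
print(summary)
```

To run it against the real download, pass an open file handle in place of the `io.StringIO` sample. Whether the headline figures on this page are unweighted means of daily rates, as computed here, is also an assumption.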

2) Lateral scaling ledger (public demonstration slice)

The Lateral Scaling Ledger (public demonstration slice) serves as a technical proof-of-work for the (In)Canon engine. It demonstrates the engine's ability to sample and aggregate data laterally across the Federal Register with deterministic accuracy. This slice covers 8 sampling points between January 1994 and April 1995. Each row represents a sample of 5 narratives (administrative units) processed on that date. All data was generated using the (In)Canon pipeline, ensuring consistency across the demonstration run.

Download ledger CSV

3) Full corpus bundle (optional)

The Full Corpus Bundle is the comprehensive, machine-readable output of the entire 30-year (In)Canon proof-run. While the baseline provides daily rates, this bundle contains the actual structural data for every unit processed between 1994 and 2025. It is delivered as a compressed ZIP containing CSV or JSON files organized by year or CFR Title. It demonstrates the reach of the methodology and shows that the system is production-ready for state-scale environments.

Download full corpus ZIP

4) Reproducibility pack (optional)

This package provides the deterministic scaffolding required to verify the structural findings of the 30-year Federal Register run. It is designed for researchers, auditors, and developers who need a "ground truth" verification of the (In)Canon methodology without access to the proprietary engine logic. The contents are not an opinion: they are an engineering result that I expect to be audited.

Download reproducibility pack

Note: This pack contains the tools to verify the artefacts provided on this page. To license the full (In)Canon Engine for your own proprietary or public corpora, please see: Contact

What you are downloading

Not “documents.” A deterministic baseline that proves responsibility is explicitly named at scale, plus artefacts that preserve absence instead of filling it. This is the public demonstration surface for licensing the engine.

Integrity guarantees

Audit-grade constraints (the reason this matters)

Deterministic outputs

Same input → same output (no stochastic behaviour)
Stable artefacts support regression tests
Audit bundles: hashes, versions, timestamps
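
The "same input, same output" property lends itself to a simple regression test: serialise the output canonically, hash it, and compare digests across runs. The sketch below assumes SHA-256 over canonical JSON. `toy_pipeline` is a hypothetical stand-in and is not the (In)Canon engine.

```python
import hashlib
import json


def artefact_digest(records):
    """SHA-256 over a canonical JSON serialisation (sorted keys, fixed separators)."""
    canonical = json.dumps(
        records, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def toy_pipeline(text):
    """Hypothetical deterministic stand-in: one record per whitespace token."""
    return [{"index": i, "token": tok} for i, tok in enumerate(text.split())]


run_a = artefact_digest(toy_pipeline("the agency shall report"))
run_b = artefact_digest(toy_pipeline("the agency shall report"))

print(run_a == run_b)
```

Sorting keys and fixing separators keeps the digest independent of dictionary ordering or whitespace choices in the serialiser, so a changed digest points to a changed output.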

Non-inferential handling

No gap-filling (absence is preserved)
Binary stated vs not stated reporting
No scoring, weighting, or normative judgements in the corpus artefacts
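
Binary stated vs not stated reporting can be illustrated with a small sketch. The field list and the sample extraction are hypothetical. The point is only that a missing field is reported as not stated and is never given a default value.

```python
from enum import Enum


class Presence(Enum):
    STATED = "stated"
    NOT_STATED = "not_stated"


# Hypothetical field names chosen for illustration.
FIELDS = ("actor", "action", "outcome", "deadline")


def presence_report(extracted):
    """Map every expected field to STATED or NOT_STATED, with no gap-filling."""
    return {
        field: Presence.STATED if extracted.get(field) else Presence.NOT_STATED
        for field in FIELDS
    }


# Hypothetical unit in which no outcome or deadline appears in the text.
unit = {"actor": "The Administrator", "action": "shall publish a notice"}

report = presence_report(unit)
for field, presence in report.items():
    print(field, presence.value)
```

The report carries no score or weight. Each field has exactly one of two values, which matches the constraints listed above.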

Trade secret safe by design

Public artefacts show behaviour and outputs (what the system returns), but do not disclose internal rule logic, lexicons, or constraint sets that constitute the licensable engine.

The selling point is the boundary: Canon Evidence stays evidence. Everything else is explicitly marked as bridgework, not ground truth.

Licensing (In)Canon

The corpus is free. The engine is licensable.

This corpus exists to prove that (In)Canon can deterministically establish a responsibility baseline and structural diagnostics on an authoritative corpus at national scale. It is the proof surface, not the product.

If you want this capability on your corpus

(In)Canon can be licensed as an internal engine for:

LLM pipeline gating (prevent schema-shaped hallucination)
Regulatory / governance corpus structural baselining
Dependency accretion diagnostics (“regulatory thicket” mapping)
Evidence tooling (verbatim anchors + explicit absence reporting)
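
As a rough picture of what pipeline gating could look like, the sketch below admits a model-proposed field only when its quoted span occurs verbatim in the source text. This is a generic containment check written for illustration. The function name and sample data are hypothetical, and it does not reproduce the (In)Canon rule logic.

```python
def gate(source_text, proposed):
    """Split proposed fields into admitted (verbatim in source) and rejected."""
    admitted, rejected = {}, {}
    for field, quote in proposed.items():
        if quote and quote in source_text:
            admitted[field] = quote
        else:
            rejected[field] = quote
    return admitted, rejected


source = "The Administrator shall publish a notice."  # hypothetical sample text

# Hypothetical model output: the deadline fills a schema slot but is not in the source.
proposed = {
    "actor": "The Administrator",
    "action": "shall publish a notice",
    "deadline": "within 30 days",
}

admitted, rejected = gate(source, proposed)
print(sorted(admitted))
print(sorted(rejected))
```

Rejected fields are returned and not silently dropped, so the gap stays visible to whoever reviews the output.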

Use Contact to discuss licensing.

Boundary

What this page is not

No legal interpretation
No compliance conclusion
No statement of correctness or adequacy

No scoring or ranking presented here as “truth”
No moral, political, or normative judgement
No disclosure of internal rule logic

Core boundary statement

(In)Canon identifies structure and reports stated vs not stated. It does not assess meaning, correctness, quality, compliance, or adequacy.