The Lean Project
Notes on formalizing the C++ standard library in Lean 4, and the manifest machinery — Lean Manifests — that came out of it.
See also: Thoughts on Working with LLMs for general workflow lessons, and the full technical write-up at lean4.ai.
What the project is
Formalize the C++ standard library (N4950, about 1,000 pages of library content) in Lean 4. Replace English-prose specification with machine-checked theorems. The proofs and code are second-class — what matters is the manifest: what’s true, and how confident we are.
Working code at Lean Manifests. 340+ modules, ~950 theorem-like declarations, all building.
The problem that shaped everything
Lean has two ways to stand up a theorem you can’t fully prove:
axiom foo : T— declared, tracked, shows up in the axiom-usage report of anything that depends on it.theorem foo : T := by sorry— emits a warning, contaminates the dependency graph, gives readers no structure.
Every sorry looks the same. A deliberate assumption carrying an author’s full intent is indistinguishable from a stray scratch-hole someone forgot to clean up.
For a formalization project that spans 1,000 pages and will take years, this is untenable. Readers (human and AI) need to scan a module and know immediately: is this claim proven, tested, decomposed, derived from stated assumptions, or just hoped for? Requiring them to open each proof file defeats the point of having a library interface at all.
The three-way split
Lean Manifests impose a discipline on how a module’s files are organized:
| Category | Contents | Visibility |
|---|---|---|
| Manifests | Claims: theorem signatures with explicit evidence levels | Public |
Vocabulary (Defs/) |
Types and predicates that appear in the claims | Public |
Everything else (Code/, Proofs/, Tests/) |
How each claim is established | Hidden |
A consumer reads the manifest plus the vocabulary. That tells them what the module claims and enough to understand the claims. They never need to open a proof file.
The evidence hierarchy
Five new kinds of sorry, each less apologetic than the last:
sorry "I have nothing."
UnprovenConjecture foo "I claim this. I haven't proven it.
I intend to eventually."
TestedConjecture foo "I claim this. I've verified one concrete
case (foo_test)."
DecomposedConjecture foo "I claim this. I've proven the structure
(foo_derivation), and every lemma it
bottoms out in is at least tested."
DerivedConjecture foo "I claim this. I've proven it, modulo
assumptions declared elsewhere."
ProvenTheorem foo "I claim this. Fully apologized:
no sorry anywhere."
Each is a macro. Each checks different obligations at elaboration time. The macro name appears in the source, so scanning a module gives the evidence level at a glance. No compiler patches — about 300 lines of metaprogramming on top of stock Lean 4.
Teeth (the enforcement)
The hierarchy was useful as a convention for years. It became a contract when we added the enforcement: every macro stamps its declaration with @[manifest_entry], and DerivedConjecture / DecomposedConjecture refuse to compile if any sorry-bearing dependency lacks the attribute.
Concretely: if a derivation depends on a stray theorem X := by sorry in a proof file (instead of a declared UnprovenConjecture X : T in a manifest), the build fails with a pointed error message naming the stray and suggesting the fix.
Before the enforcement, the trust report emitted by DerivedConjecture listed sorry deps by name — but those names could point at undeclared holes, and a reader had no way to tell the difference between a carefully-declared assumption and a forgotten scratch-hole. After the enforcement, the trust report is guaranteed complete: every entry is a named, attributed manifest entry, not a stray sorry.
A reader skimming a derivation can see everything it assumes by reading the manifest files. There is no off-the-record sorry, because any off-the-record sorry would have broken the build.
Why this matters for AI
LLM agents that work on Lean code today face a comprehension crisis: the library is too big to fit in context, proof bodies are unnecessary for using a theorem, but standard Lean gives no machine-extractable distinction between “proven” and “sorry’d” at the interface. Every consumer has to open every proof file to know what’s real.
Manifests + evidence hierarchy + enforcement solves this: an AI scans the manifest, sees the evidence level of each claim, and proceeds accordingly. A claim marked DerivedConjecture points at its stated dependencies; if the AI needs higher confidence than the derivation offers, it knows exactly which assumptions to strengthen.
The 42% token savings reported in the full write-up is the quantitative measure. The qualitative win is that the evidence is machine-readable and enforced, not a stylistic convention in a coding guide.
A note on the name
The project was originally called “Levelized Lean” — a nod to John Lakos’s physical-design framework for C++ (level-based decoupling to break compile-time cascades). That name fit when the main contribution was cascade prevention. Lean 4.30’s module keyword has since moved most of the cascade-prevention work into the compiler itself. What’s left — and what’s load-bearing now — is the evidence hierarchy, the vocabulary separation, and the teeth.
So the project is now Lean Manifests. The public write-up lives at lean4.ai/lean-manifests.html.
Manifests as tripwires, not design specs
Here’s an honest admission: I treat manifests as a build-verification system, not a design system. The kernel-checked claims do real work. The non-kernel-checked claims are mostly documentation that gets out of date if I’m not careful.
The evidence hierarchy catches you after you’ve already written the wrong thing. Concrete example: I recently migrated 24 tool functions in l3m to a new capability-based API. I didn’t consult the manifests first. I just wrote the code, ran lake build, and the kernel caught every bypass — functions that still used the old unconstrained paths, derivations whose sorry-deps pointed at stray theorems that no longer existed. The build did the work. The manifests were tripwires, not blueprints.
This is fine for verification. It’s not fine for design. The manifests say “confineIO rejects symlink escapes” and “ProcessCap.runTame only spawns TameCommands” — but when you’re sitting in Tools/Lake.lean adding a --json flag, you don’t naturally think “let me go read the manifest for this function first.” You grep for the function, read the code, make the change, and find out at build time whether you violated anything.
The discoverability problem is structural: theorems are organized by concept (all path-confinement theorems together, all MemFS theorems together), not by function. If you’re modifying FsCap.writeFile, you’d need to know that confine_within_root and confine_rejects_dotdot and confineIO_rejects_symlink_escape all apply to it. Nothing in the code points you there.
We’re trying a fix: a @[theorems] attribute that tags each function with its relevant theorems, plus a generated index that maps functions to their theorem obligations. The index currently covers 23 functions and 25 theorems. You can look up L3m.confineIO and immediately see the three theorems that constrain it.
Will this change behavior? Honestly, I don’t know. It provides better infrastructure for consultation — but whether a developer (or an LLM agent) actually consults before modifying is a workflow question, not a tooling question. The build will still catch you either way. The hope is that the index makes it cheap enough to check first that people actually do it. We’ll see.
What’s next
The Lean Manifests work covers the C++ standard library. lean4.ai has the public artifacts: a headerized version of CSLib (239 files, 220+ theorems in the manifests), interval arithmetic, happens-before memory models. Lean Manifests is the underlying machinery in each.
Related: l3m — a verified coding agent built on the manifest macros. Each tool carries a manifest; each manifest bottoms out in named axioms about the environment; the trust report is a build artifact. Work in progress.