Karpathy's LLM Wiki, published in April, is the foundation for documenting a legacy DWH for a Swiss insurer

Andrej Karpathy published a gist called LLM Wiki on 3 April 2026. Through Callista, we have been running its pattern for the past few weeks on a real legacy data warehouse at a large Swiss insurance company. Last week he announced his move to Anthropic, which felt like a good moment to write this up. The pattern has held up well for us, in this order: per-program markdowns, aggregated architecture and domain pages, a parallel analysis of the target, a mapping, gap ADRs, and only then code.

Date: May 27, 2026

What Karpathy published

The gist is short. Three layers and three operations.

Three layers:

raw/ is immutable. Papers, transcripts, repositories, datasets. The LLM never edits this.
wiki/ is where the LLM lives. Markdown pages, entity files, cross-references, summaries. The LLM creates, updates and links everything in there. It owns the wiki entirely.
CLAUDE.md (or AGENTS.md for Codex) is the schema layer. Conventions, templates, workflows. It is the file that turns a model from a chatbot into a disciplined wiki maintainer.
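
The ownership rule between the layers can be enforced mechanically rather than left to the prompt. The following is a minimal sketch, assuming a repository root that contains `raw/` and `wiki/` as described above. The `write_page` helper is hypothetical and not part of the gist.

```python
from pathlib import Path


def write_page(root: Path, relative_path: str, text: str) -> Path:
    """Write a page on behalf of the agent, but only inside wiki/ (hypothetical helper)."""
    wiki_dir = (root / "wiki").resolve()
    target = (root / relative_path).resolve()
    # Refuse anything outside wiki/. This covers raw/ as well as
    # path tricks such as "wiki/../raw/file".
    if not target.is_relative_to(wiki_dir):
        raise PermissionError(f"agent may only write under wiki/: {relative_path}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

Routing every agent write through one guarded function like this keeps `raw/` immutable by construction. `Path.is_relative_to` requires Python 3.9 or newer.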

Three operations:

ingest processes a new source into the wiki.
query answers a question from the wiki, then files the answer back if it is worth keeping.
lint checks the markdown graph for contradictions, stale claims, orphans and missing links.
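
The structural half of lint is easy to automate. Below is a minimal sketch that covers only orphans and missing links; contradictions and stale claims need the model and a reviewer. It assumes pages link to each other with relative markdown links ending in `.md` and that a page named `index.md` is the entry point. Both are assumptions for illustration, not conventions taken from the gist.

```python
import re
from pathlib import Path

# Matches [text](target.md) and [text](target.md#anchor); captures the path only.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def lint_links(wiki_dir: Path) -> dict[str, list[str]]:
    """Report links that point at missing pages and pages nothing links to."""
    root = wiki_dir.resolve()
    pages = {p.resolve() for p in root.rglob("*.md")}
    linked: set[Path] = set()
    missing: list[str] = []
    for page in sorted(pages):
        for href in LINK_PATTERN.findall(page.read_text(encoding="utf-8")):
            if "://" in href:
                continue  # external links are out of scope for this sketch
            target = (page.parent / href).resolve()
            if target not in pages:
                missing.append(f"{page.relative_to(root).as_posix()} -> {href}")
            elif target != page:
                linked.add(target)  # self-links do not rescue an orphan
    orphans = sorted(
        p.relative_to(root).as_posix()
        for p in pages - linked
        if p.name != "index.md"  # assumed entry point, never reported as orphan
    )
    return {"missing_links": missing, "orphans": orphans}
```

The result is a plain report, so it can be turned into questions for a person rather than into automatic edits.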

His personal research wiki sits at roughly 100 articles and 400,000 words, with the LLM doing nearly all the editing. He has largely moved from using LLMs to write code towards using them to maintain a compounding knowledge base.

An aspect worth highlighting: the LLM Wiki is a practical answer to memory across sessions. A persistent, structured, self-maintaining markdown layer that any agent can read from and write to. That is the part that makes it interesting for legacy application documentation.

Why this lands for legacy migration

The hard part of moving a 20-year-old data warehouse is not code generation. Modern LLMs will emit PySpark from SAS without much complaint. The hard part is comprehension. You cannot map what you do not understand. You cannot understand 20 years of insurance product logic by re-reading SAS macros every session, especially when the original authors of those macros retired years ago.

Comprehension does not happen once. It compounds. Karpathy's pattern is exactly that shape.

Linear migration pipelines, drawn on conference stages as AI Doc, AI Migration, AI Verification and AI Test Execution, work well for 1:1 ports. We use them ourselves where source and target are close enough that a structural translation is the right tool. They are the right shape for the right job.

For a re-architecture, the work is different. We map the layers of a legacy DWH onto a Medallion architecture on Databricks. The business logic stays. The implementation follows the new architecture rules. There is no 1:1 path from one to the other. There is a comprehension path first, then a mapping path, then a re-implementation that builds on both.

What this looks like in our case

Here is the shape, in the order we run it.

1. Per-program markdown with frontmatter. One markdown file per legacy program or unit. The frontmatter captures what it is, what it touches, inputs, outputs, business domain, lineage. Program by program. File by file. Small files. The rawest comprehension layer, and the most important one.

2. Aggregate the program-level markdowns into architecture and domain analysis pages. Once a few hundred per-program files exist, they compound. Domain models, lineage maps, business logic clusters, integration points. The aggregation does not invent. It reads the underlying files and writes summaries that link back. The graph of linked markdown starts to mean something.

3. Analyse the target platform in the same shape. Not only the legacy. The target stack has its own architecture, capabilities and patterns. We build a parallel set of pages describing what the target already provides. Legacy pages tell us what we need. Target pages tell us what we already have.

4. Derive the mapping. Legacy "what we need" against target "what we already have". The mapping is the alignment layer between the two sides.

5. Identify the gaps. A gap is the delta where the legacy needs something the target does not naturally provide, or the other way around. Each gap is its own page, linking back to the legacy, the target and the domain analyses behind it.

6. Decide on the gaps. File ADR entries cleanly. Each architectural decision is a numbered ADR entry with a date, the decision, the alternatives considered and the consequence. In a regulated migration, this is how the work stays traceable end to end.

7. Solution Design and Spec for implementation. Only then does code generation make sense. By the time the coding agent sees a unit of work, it sees a spec written against an understood gap, against a deliberate decision, against an aggregated domain analysis, against per-program comprehension. The model is no longer guessing.
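
Steps 1, 4 and 5 can be sketched as data. The following is a minimal illustration, assuming each per-program page carries frontmatter with a list of capabilities the legacy logic relies on. The field names, program names and capability names are all made up for the example, and the sketch covers only one direction of a gap: legacy needs that the target does not provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramPage:
    """Frontmatter of one per-program page. Field names are assumptions."""
    program: str
    domain: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    needs: frozenset[str]  # capabilities the legacy logic relies on


def derive_gaps(pages: list[ProgramPage], target_provides: frozenset[str]) -> dict[str, list[str]]:
    """Return each unmet legacy need with the programs that depend on it."""
    gaps: dict[str, list[str]] = {}
    for page in pages:
        for need in sorted(page.needs - target_provides):
            gaps.setdefault(need, []).append(page.program)
    return gaps


# Made-up example data, not taken from the real warehouse.
pages = [
    ProgramPage("premium_calc", "pricing", ("policies",), ("premiums",),
                frozenset({"history_tracking", "fixed_width_import"})),
    ProgramPage("claims_load", "claims", ("claims_raw",), ("claims",),
                frozenset({"history_tracking"})),
]
target_provides = frozenset({"history_tracking"})
gaps = derive_gaps(pages, target_provides)
```

Each key in `gaps` would become its own gap page, with the listed programs as the links back to the legacy side.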

In Karpathy's terms, the DB2, SAS and COBOL files are raw/. The per-program markdowns, the architecture pages, the domain analysis pages, the target pages, the mapping, the gaps and the ADR entries are wiki/.

The load-bearing principle

Small files first, then build from there.

What stands out to me about this shape: we do not review every per-program file. We review the aggregations. The per-program markdowns give us a clean baseline underneath them, but they are not the artefact a domain expert spends an hour with. The aggregation pages are.

The baseline matters for two reasons. One, it gives the agent something to compress upwards. Aggregations that are not grounded in per-program files are confident guesses waiting to break. Two, it enables progressive disclosure. The agent can answer a question by starting at a top-level architecture page, drilling into a domain analysis, then into a specific program's frontmatter, then into the underlying code, and stopping when it has enough. The same shape supports structured queries and unstructured questions from the team in chat.
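
Progressive disclosure can be sketched as a loop that reads from coarse to fine and stops early. This is a minimal sketch under stated assumptions: the page names are hypothetical, and `has_enough` stands in for the agent's own judgement, which in practice is a model decision and not a keyword check.

```python
from typing import Callable


def progressive_read(
    levels: list[tuple[str, str]],
    has_enough: Callable[[str], bool],
) -> list[str]:
    """Read pages from coarse to fine and stop once the context suffices.

    levels: (page_name, page_text) pairs ordered from the top-level
    architecture page down to the underlying code.
    Returns the names of the pages that were actually read.
    """
    context = ""
    pages_read: list[str] = []
    for name, text in levels:
        context += text + "\n"
        pages_read.append(name)
        if has_enough(context):
            break
    return pages_read


# Hypothetical pages; the texts are placeholders.
levels = [
    ("architecture/overview.md", "Claims data flows through three layers."),
    ("domains/claims.md", "Claim reserves are recalculated nightly."),
    ("programs/claims_load.md", "inputs: claims_raw, outputs: claims"),
    ("raw/claims_load.sas", "data claims; set claims_raw; run;"),
]
pages_read = progressive_read(levels, lambda ctx: "reserves" in ctx)
```

In this example the question is answered at the domain level, so the program page and the raw code are never loaded.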

Small files at the bottom keep the comprehension layer honest. The aggregations on top keep it readable.

Adapting the pattern to enterprise legacy

Karpathy uses the LLM Wiki for personal research. He is the author and the only reader who needs to trust it. We took the same idea and pointed it at enterprise legacy documentation, where the writers are multiple agents and the readers are multiple teams across business and engineering. The shape of what we ended up with reflects that context.

What our setup includes, building on his pattern:

Deterministic frontmatter schemas, validated at write time. The documentation has to be queryable as data, not just as prose.
Explicit separation between legacy pages, target pages and gap pages. Each links to the other two. This makes the work traceable end to end.
ADR discipline at the gap level. Every meaningful decision is a numbered entry with a date, a decision and a consequence.
Enterprise endpoints for models and inference. The whole loop is gated by the client's enterprise compliance layer.
Human in the lead, always. The lint pass surfaces potential contradictions, missing links and stale claims as questions for a person, not as automated edits. The agent maintains the documentation. A human decides what to trust.
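
Write-time validation of frontmatter can be as simple as checking a page against a fixed schema before it is saved. Below is a minimal sketch using an ADR entry as the example. The schema, its field names and the `validate_frontmatter` helper are assumptions for illustration and not our production schema.

```python
from datetime import date

# Assumed schema for an ADR page: field name -> required Python type.
ADR_SCHEMA: dict[str, type] = {
    "adr": int,            # running ADR number
    "date": date,
    "decision": str,
    "alternatives": list,  # alternatives considered
    "consequence": str,
    "gap": str,            # link to the gap page this decision resolves
}


def validate_frontmatter(frontmatter: dict, schema: dict[str, type]) -> list[str]:
    """Return a list of schema violations; an empty list means the page may be written."""
    errors: list[str] = []
    for key, expected in schema.items():
        if key not in frontmatter:
            errors.append(f"missing field: {key}")
        elif not isinstance(frontmatter[key], expected):
            actual = type(frontmatter[key]).__name__
            errors.append(f"{key}: expected {expected.__name__}, got {actual}")
    # Unknown keys are rejected so the schema stays deterministic.
    for key in sorted(set(frontmatter) - set(schema)):
        errors.append(f"unknown field: {key}")
    return errors
```

Rejecting the write when the list is non-empty is what keeps the documentation queryable as data: every page of a given type is guaranteed to carry the same fields with the same types.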

All of it is downstream of the core idea Karpathy crystallised in the gist.

Where this generalises

The work above is what we do at Callista. I'm the architect there, responsible for the shape of the approach.

The reason I'm also building Shipwright is that this same shape applies far beyond legacy migration. Greenfield projects, brownfield additions, ongoing iteration. Anywhere artefacts should compound rather than evaporate, the pattern holds. Shipwright is what I'm building to operationalise it across the full SDLC, pragmatically.


Comprehension is the migration. Small files first. Build from there.

Sources:

LLM Wiki Gist - github.com (Andrej Karpathy) - 03.04.2026 - https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Beyond RAG: How Andrej Karpathy's LLM Wiki Pattern Builds Knowledge That Actually Compounds - Level Up Coding - April 2026 - https://levelup.gitconnected.com/beyond-rag-how-andrej-karpathys-llm-wiki-pattern-builds-knowledge-that-actually-compounds-31a08528665e
Andrej Karpathy joins Anthropic pre-training team - x.com (Andrej Karpathy) - 19.05.2026 - https://x.com/karpathy/status/2056753169888334312
Callista AI - Agentic legacy modernisation methodology - ai.callista.ch - 2026 - https://ai.callista.ch/