Introduction #
Karpathy’s tweet was titled “LLM Knowledge Base.” The core message in one sentence: he’d recently moved massive amounts of tokens from code generation toward knowledge manipulation, using LLMs to build his personal knowledge base.
Two days later, he wrote up his thinking as a complete idea file—the now widely-circulated llmwiki. This step was crucial: he upgraded an inspirational tweet into a design document that any agent could pick up and implement. For those of us working with AI, this means you don’t need to wait for a big tech product. Just hand this document to your agent and it can build the system from the blueprint.
The Fatal Flaw of RAG: Knowledge Doesn’t Compound #
What’s been our dominant usage pattern for LLMs over the past two years? Search box, Q&A machine, RAG frontend. You throw in a batch of PDFs and Markdown files, build a vector store, retrieve a few chunks at query time, and stitch together an answer.
This looks clever on any single interaction, but it has a structural problem: knowledge doesn’t accumulate. Every time you ask a question, the model rediscovers the world from raw documents. The same complex question asked today and again next week gets the same fragmented reassembly. You know that familiar frustration: you’ve queried these PDFs ten times, yet the system acts like an assistant with amnesia, rummaging through boxes from scratch every single time.[1]
What Karpathy wants to replace is exactly this default experience of always assembling answers on the fly without ever turning knowledge into a durable asset.
The Knowledge Compiler: A Paradigm Shift #
Karpathy’s counter-design comes down to one sentence: stop treating Q&A as the endpoint, and start treating it as an intermediate product that gets written back into a long-maintained knowledge system.
In his design, the LLM doesn’t do on-demand retrieval—it does incremental compilation. When you drop a new article into the raw directory, it doesn’t wait for you to come search someday. It immediately reads it, extracts key points, updates relevant concept pages and entity pages, adds cross-links, and even flags conflicts with existing information.
If you need to explain this to your team, Karpathy’s own analogy works perfectly:
- Obsidian is the IDE frontend. You browse raw sources, read wiki pages, and explore the graph view just like flipping through code and call graphs in VS Code
- The LLM agent is the programmer, following schemas (Claude.md, agents.md) to decide how to ingest, query, and link
- The wiki itself is the codebase—readable, diffable, version-controlled with Git. Not a vector array, but a collection of knowledge “code” you can open, refactor, and review anytime
Looking at llmwiki through this lens, you realize Karpathy has essentially ported engineering culture into knowledge management: source code, intermediate artifacts, lint checks, version history. Except this time you’re not compiling binaries; you’re compiling your own second brain.[2]
The Three-Layer Architecture #
Karpathy’s design is very engineering-minded, structured in three layers:
At the bottom: raw sources (original materials). Papers, articles, code repos, screenshots, datasets—everything goes here. The model only reads, never writes, preserving an un-AI-polished version of the ground truth.
In the middle: the wiki layer. All Markdown written by the LLM. Summaries, concept pages, entity pages, comparison tables—everything is incrementally generated, maintained, and cross-linked here. As a human, this is the layer you primarily read.
On top: the schema. Think of it like Claude.md or agents.md—team conventions that dictate directory structure, naming rules, ingest and lint workflows, determining whether the system grows chaotically or stays maintainable over time.
The three layers are simple individually, but the real magic happens in the core actions that flow through them.
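To make the layering concrete, here is a minimal sketch that scaffolds the three layers on disk. The directory names `raw/` and `wiki/`, the placement of `index.md` and `log.md` inside `wiki/`, the starter file contents, and the helper name `scaffold` are all assumptions for illustration, not a layout prescribed by the original design.

```python
from pathlib import Path

# Hypothetical starter files. The file names index.md, log.md, and agents.md
# come from the article; their locations and contents are assumptions.
STARTER_FILES = {
    "wiki/index.md": "# Index\n",
    "wiki/log.md": "# Log\n",
    "agents.md": (
        "# Schema\n\n"
        "- raw/ is read-only for the agent.\n"
        "- wiki/ is written and maintained by the agent.\n"
    ),
}


def scaffold(root: Path) -> list[Path]:
    """Create the raw and wiki layers plus the schema file under root.

    Existing files are left untouched. Returns the files newly created.
    """
    for layer in ("raw", "wiki"):
        (root / layer).mkdir(parents=True, exist_ok=True)

    created = []
    for rel, body in STARTER_FILES.items():
        path = root / rel
        if not path.exists():
            path.write_text(body, encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold(Path("my-wiki"))` a second time creates nothing new, so the sketch is safe to rerun against an existing knowledge base.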
Four Core Actions #
Ingest: Incremental Compilation #
When people hear “ingest,” many reflexively think “stuff it into a vector store.” Not here. In llmwiki, ingest is more like an incremental compilation pass.
When a new source enters raw, the agent’s workflow goes roughly like this: read the full text, identify key takeaways, create a new summary page in the wiki, update the index, then propagate new information into related entity and concept pages. A decent article can easily touch 10 to 15 pages: personal relationships get updated, timelines shift, evidence tables gain new rows.
So in this paradigm, ingest isn’t uploading a file—it’s more like committing a complete knowledge change, with the agent automatically updating a dozen references.
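The ingest pass can be sketched as a function over an in-memory wiki (a dict mapping page path to Markdown text). This is a sketch under stated assumptions: the `summary` and `entity_notes` arguments stand in for what the LLM would extract, and the page paths (`summaries/`, `entities/`), the `[[...]]` link syntax, and the function name `ingest` are hypothetical.

```python
from datetime import date


def ingest(wiki, source_id, summary, entity_notes, today=None):
    """Apply one new source to the wiki and return the pages it touched.

    wiki:         dict of page path -> Markdown text (mutated in place)
    summary:      text the model produced for this source (assumed input)
    entity_notes: dict of entity name -> one-line note (assumed input)
    """
    today = today or date.today().isoformat()
    summary_page = f"summaries/{source_id}.md"
    text = summary.strip()
    headline = text.splitlines()[0] if text else source_id

    # 1. New summary page that points back at the immutable raw source.
    wiki[summary_page] = f"# {source_id}\n\nSource: raw/{source_id}\n\n{text}\n"
    touched = [summary_page]

    # 2. Propagate each note into its entity page, citing the summary page.
    for entity, note in entity_notes.items():
        page = f"entities/{entity}.md"
        body = wiki.get(page, f"# {entity}\n")
        wiki[page] = body + f"- {note} ([[{summary_page}]], {today})\n"
        touched.append(page)

    # 3. Update the index and the activity log.
    wiki["index.md"] = (
        wiki.get("index.md", "# Index\n") + f"- [[{summary_page}]]: {headline}\n"
    )
    wiki["log.md"] = (
        wiki.get("log.md", "# Log\n")
        + f"- {today} ingest {source_id}: {len(touched)} pages\n"
    )
    return touched + ["index.md", "log.md"]
```

The returned list plays the role of a commit's changed-file list: one source in, several pages updated together.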
Query: Structure-First Retrieval #
Traditional RAG retrieval is flat: given a query, run similarity ranking across the entire corpus, grab top-k chunks, and hand them to the model. It has no idea which chunks are overviews and which are footnotes of a single page.
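For contrast, flat top-k retrieval can be sketched in a few lines. The toy two-dimensional vectors and the names `cosine` and `top_k` are assumptions for illustration; a real pipeline would get its vectors from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunks, k=2):
    """Rank every chunk in the corpus by similarity and keep the best k.

    chunks is a list of (chunk_id, vector) pairs. Nothing here knows
    whether a chunk is an overview or a footnote: the ranking is flat.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

The function's only signal is vector similarity, which is exactly the limitation described above: document structure never enters the ranking.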
Karpathy’s approach invests upfront in structure: an index.md that serves as a content-oriented table of contents, each page with a one-line summary, backlinks between entity and concept pages, and a log.md tracking recent ingest and lint activity. With this structure in place, the query flow becomes: read the index first, identify which pages are relevant, drill down through links to specific pages, and only then synthesize an answer within a narrow scope.
For the agent, this is like checking the table of contents and index first, then looking up specific chapters—rather than running KNN on a pile of fragments.
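The index-first flow might look like the following sketch. In practice the agent itself reads the index and decides which pages matter; the keyword-overlap scoring here is only a stand-in for that judgment. The index line format `- [[page]]: summary`, the `[[...]]` link syntax, and the name `select_pages` are assumptions.

```python
import re

LINK = re.compile(r"\[\[([^\]]+)\]\]")
INDEX_LINE = re.compile(r"^- \[\[([^\]]+)\]\]:\s*(.*)$")


def words(text):
    """Lowercased word set used for the toy relevance score."""
    return set(re.findall(r"\w+", text.lower()))


def select_pages(wiki, question, max_pages=5):
    """Pick pages to read: index first, then one hop along links.

    wiki is a dict of page path -> Markdown text.
    """
    terms = words(question)

    # Step 1: read only the index and score each one-line summary.
    scored = []
    for line in wiki.get("index.md", "").splitlines():
        match = INDEX_LINE.match(line)
        if not match:
            continue
        page, summary = match.groups()
        overlap = len(terms & words(summary))
        if overlap:
            scored.append((overlap, page))
    scored.sort(key=lambda item: (-item[0], item[1]))
    selected = [page for _, page in scored]

    # Step 2: drill down one hop through links on the selected pages.
    for page in list(selected):
        for target in LINK.findall(wiki.get(page, "")):
            if target in wiki and target not in selected:
                selected.append(target)

    return selected[:max_pages]
```

Only the pages returned here would be loaded into context for synthesis, which keeps the answer scoped to a small, connected part of the wiki.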
File Back: Good Answers Are Assets #
In traditional usage, you ask a great question, the model delivers a sharp analysis, and it sits in the chat window. A few days later, the conversation scrolls into oblivion and nobody can find it again.
In llmwiki’s design, good answers are assets. If an answer’s quality passes muster, it should be written back to the wiki: saved as a new concept page, a decision record, or a comparison table, tagged with sources and dates. Every follow-up question then upgrades the knowledge base. Your exploration paths and the dead ends you and the model discovered together all become usable context for the next query, instead of cooling off as forgotten chat logs.
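A file-back step could be sketched like this. The page layout, the `concepts/` default folder, the rule that an answer without sources is rejected, and the names `slugify` and `file_back` are all assumptions made for the example.

```python
import re
from datetime import date


def slugify(title):
    """Turn a page title into a lowercase, hyphenated file name stem."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def file_back(wiki, title, answer, sources, kind="concepts", today=None):
    """Save a good answer as a wiki page tagged with sources and a date.

    wiki is a dict of page path -> Markdown text (mutated in place).
    Returns the path of the new page.
    """
    if not sources:
        # Assumed policy: an unsourced answer is not worth keeping.
        raise ValueError("refusing to file an answer without sources")

    today = today or date.today().isoformat()
    page = f"{kind}/{slugify(title)}.md"

    lines = [f"# {title}", "", f"Filed: {today}", "Sources:"]
    lines += [f"- [[{source}]]" for source in sources]
    lines += ["", answer.strip(), ""]
    wiki[page] = "\n".join(lines)

    # Register the page in the index so later queries can find it.
    wiki["index.md"] = wiki.get("index.md", "# Index\n") + f"- [[{page}]]: {title}\n"
    return page
```

Because the new page is listed in the index and links to its sources, the next query can reach it the same way it reaches any ingested page.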
Lint: The Test Suite for Knowledge #
A wiki that grows without health checks turns into bloat. Lint here means essentially what it means in the code world: not writing new things, but periodically scanning the wiki for problems.
It looks for: contradictory conclusions across pages, outdated claims superseded by new material but still lingering, orphan pages with no incoming links, key concepts mentioned repeatedly but lacking their own pages. At a more advanced level, you can have the agent suggest improvements during lint—like “this pattern keeps showing up but doesn’t have a dedicated page, consider creating one.”
With lint, the system actually behaves like a long-maintained codebase rather than an auto-generated pile of notes.[3]
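Two of these checks are purely structural and can be sketched without a model: orphan pages, and concepts that are linked repeatedly but have no page yet. Spotting contradictions or stale claims needs the agent's judgment and is not covered here. The `[[...]]` link syntax, the threshold of two mentions, and the name `lint` are assumptions.

```python
import re
from collections import Counter

LINK = re.compile(r"\[\[([^\]]+)\]\]")


def lint(wiki, structural=("index.md", "log.md"), min_mentions=2):
    """Scan a wiki (dict of page path -> Markdown text) for structural issues.

    orphans: content pages that no other content page links to
    wanted:  link targets mentioned on at least min_mentions pages
             that do not exist yet
    """
    incoming = Counter()
    missing = Counter()

    for page, text in wiki.items():
        # Count each target once per page, however often it is repeated.
        for target in set(LINK.findall(text)):
            if target not in wiki:
                missing[target] += 1
            elif page not in structural and target != page:
                # Links from the index or log do not rescue an orphan.
                incoming[target] += 1

    orphans = sorted(
        page for page in wiki if page not in structural and incoming[page] == 0
    )
    wanted = sorted(t for t, count in missing.items() if count >= min_mentions)
    return {"orphans": orphans, "wanted": wanted}
```

The report is only a list of findings; deciding whether to create a wanted page or merge an orphan is left to the agent or the human reviewing the lint run.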
Fazapedia: Compiling a Life into a Wiki #
When Farza saw Karpathy talk about using LLMs as compilers for personal knowledge bases, it resonated immediately. He didn’t stop at retweeting and liking. He picked a weekend and fed nearly his entire digital life trail into Claude.
Not just a few blog posts—over 2,500 local artifacts: five years of Apple Notes, private journals, thousands of iMessage conversations with friends and co-founders, and a pile of voice memo transcriptions. The agent chewed through it for hours and eventually produced a locally-hosted Markdown wiki of roughly 400 interlinked articles, which he called Fazapedia.
The system didn’t just know his startup timeline. It knew his social network, the anime he liked—almost reconstructing him as a person. Karpathy saw it and immediately endorsed it as the gold standard, a textbook-quality implementation of the idea.
Real Work Benefits #
Farza’s subsequent usage was even more practical. Instead of writing a long prompt from scratch every time explaining “who I am, what design I like, what projects I’ve done,” he simply had the agent use Fazapedia as a knowledge foundation, browsing through his past design critiques, project summaries, and aesthetic rants. Then he only needed to say “make a landing page for this side project,” and the agent would naturally incorporate the UI anti-patterns he’d complained about, the color schemes he hated, and the tone he preferred.
This is the leap from generating one-off answers to doing sustained style alignment and reuse based on your own knowledge base.
Karpathy’s Four Endorsements #
In follow-up discussions, Karpathy gave four very engineering-minded endorsements:
- Explicit: You can clearly read what the model knows, stored as plain text—not an invisible vector array
- Yours: Data lives physically in folders you control, not trapped in any cloud SaaS
- File over app: Everything is Markdown, no vendor lock-in, any editor or toolchain can plug in
- BYOAI (Bring Your Own AI): You plug the model into your folder, rather than shoving your data into some model provider’s black box
Criticisms and Guardrails #
Isn’t This Just Rebranded RAG? #
Skeptics argue that whether you call it a wiki or a compiler, you’re still searching through a pile of Markdown, stuffing relevant content into context, and generating an answer.
Supporters emphasize three differences: first, knowledge gets written back; second, the model works on a pre-synthesized, cross-linked intermediate layer; third, there’s an entire lint process actively maintaining structure. You could put it this way: the retrieval step is indeed still “R,” but before the “G,” there’s a layer of knowledge compilation and health checking.
The AI Slop Concern #
If you keep having the model modify its own writing, won’t it eventually turn into AI-generated slop? The model distills a wiki from human originals with some loss, then next round reads from the wiki and generates new summaries with further loss stacked on top.
This concern is legitimate. That’s precisely why Karpathy emphasizes the immutability of the raw directory. Raw can’t be modified—it’s the master copy you can always return to for verification and recompilation. The key is to explicitly specify in your schema which scenarios require cross-referencing raw, and which conclusions must not self-refer only within the wiki.
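One way to check that raw really stays untouched is to keep a manifest of content hashes and compare it after each agent run. This is only a sketch of one possible guardrail, not something the original design specifies; the function names `manifest` and `modified_sources` are hypothetical.

```python
import hashlib
from pathlib import Path


def manifest(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir (by relative path) to its SHA-256 digest."""
    return {
        path.relative_to(raw_dir).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(raw_dir.rglob("*"))
        if path.is_file()
    }


def modified_sources(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return sources that were changed or deleted since the earlier manifest.

    Newly added files are allowed: raw grows, but existing files must not change.
    """
    return sorted(
        name for name, digest in before.items() if after.get(name) != digest
    )
```

Taking a manifest before an ingest or lint run and comparing afterwards gives a simple signal: a non-empty result means something in raw was rewritten or removed and should be restored from version control.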
The Scale Question #
Many ask: this works fine for a wiki of one or two hundred pages, but what about tens of thousands of pages and millions of characters?
Honestly, what Karpathy is discussing right now is indeed focused on personal research and small-team knowledge bases. At that scale, index-plus-drill-down-plus-moderate-lint is perfectly sufficient. If you want to replace a large enterprise KM system from day one, you’d need a heavier indexing layer. The more reasonable approach is to acknowledge its sweet spot—individuals and small teams—and prove it works there first before talking about distributed knowledge compilation at scale.
The Cognitive Tradeoff #
Many PKM practitioners argue that writing wikis and organizing notes isn’t just about producing an artifact—it’s about forcing yourself to rethink and restructure your understanding. If you outsource all writing and organizing to the LLM and only consume the finished product, you may gradually lose that “thinking-through-organizing” muscle.
My suggestion: draw a clear line. For propositional content that requires serious thinking, write it yourself and let AI only critique and supplement. For repetitive structural work—like unifying 50 interview transcripts into a comparison table—let the agent handle it.
Ecosystem and Outlook #
If this were just Karpathy’s tweet or Farza’s life experiment, it would be easy to dismiss as influencer hype. But what you can see now is: llmwiki already has an open-source implementation running the three-layer architecture with ingest/query/lint, tools like QMD solving the search problem as wikis scale up, and projects like Remember approaching structured brains from a cross-tool sharing angle using the same Obsidian-compatible file layer.
All of this points in one direction: the path of piling knowledge into some SaaS chat window is being steadily eroded by explicit files plus local tools plus pluggable models.
Closing Thoughts #
Starting from Karpathy’s tweet, we’ve walked through the three-layer architecture, Fazapedia, criticisms, and guardrails. If I had to close with one sentence: in the AI era, your real moat isn’t how well you write prompts or how many RAG components you can stack—it’s whether you’ve organized your knowledge into a compounding knowledge codebase.
Models will change. Frameworks will change. But as long as this wiki-style file layer stays in your hands, you can keep plugging in the next generation of AI. If you’re already using Obsidian or Notion, try asking yourself one question: starting today, would you spend a little less of your tokens on writing code, and a little more on compiling knowledge?
[1] The core RAG flow is “documents → embedding → vector store → similarity search → stitch an answer,” essentially “looking things up on the fly” with no accumulation between queries.
[2] This approach of porting software engineering methodology into knowledge management is analogous to the concept of “compilation” in software: source code is processed by a compiler to produce structured intermediate artifacts, rather than being re-interpreted from scratch at every run.
[3] The term “lint” comes from code-world lint tools (like ESLint, Pylint) that statically scan for potential issues. Here it’s borrowed as a “quality check” pass for the knowledge base.