Source: raw/wiki-community-research-2026-04-11.md
A comprehensive survey of how the developer community has implemented, extended, and stress-tested the Karpathy LLM wiki pattern. This research analyzed 50+ Reddit posts, 30+ web sources, and 12 GitHub repositories to identify the most effective patterns, common pitfalls, and proven solutions for LLM-maintained knowledge bases.
Research Scope
- Reddit channels searched: r/ObsidianMD, r/ClaudeAI, r/LocalLLaMA, r/ChatGPT, r/productivity
- GitHub: Topic searches for “llm wiki,” “obsidian AI,” “claude knowledge base”
- Other sources: Hacker News threads, blog posts, documentation sites
- Volume: 50+ Reddit posts, 30+ web articles, 12 GitHub repos with meaningful implementations
Top GitHub Implementations
| Repo | Stars | What Makes It Unique |
|---|---|---|
| SamurAIGPT/llm-wiki-agent | 1,600 | Full-featured wiki agent with plugin architecture for custom ingest processors |
| nashsu/llm_wiki | 703 | Desktop (Electron) app targeting non-technical users; drag-and-drop ingestion, visual graph |
| claude-memory-compiler | 551 | Multi-pass compilation (extract facts, resolve contradictions, generate article); provenance chains |
| claude-obsidian | 489 | Obsidian plugin with compile/query/lint commands; pioneered the “hot cache” pattern |
| Ar9av/obsidian-wiki | 294 | Lightweight ingest pipeline; introduced typed wikilinks with semantic relationships |
| karpathy-wiki-template | 234 | Opinionated template vault with pre-built topic structures and starter CLAUDE.md |
| obsidian-ai-librarian | 187 | Research-on-miss: unanswered queries auto-trigger web research |
| wiki-knowledge-graph | 156 | Neo4j-backed knowledge graph layer for complex relationship queries |
| llm-vault-tools | 142 | CLI toolkit for mass cross-link, bulk lint, migration |
| obsidian-delta-manifest | 112 | Standalone delta manifest for efficient incremental ingestion |
| smart-wiki-sync | 98 | Multi-vault sync with LLM merge conflict handling |
| wiki-lint-ci | 76 | GitHub Actions for wiki lint on every commit |
10 Patterns We Adopted
1. Hot Cache (wiki/hot.md)
A session-continuity file that persists state between LLM sessions. It stores what was done, pending actions, active threads, wiki stats, and settings, so a new session never starts cold.
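A minimal sketch of what wiki/hot.md could contain, assuming Python 3.9+ and a Markdown layout with one section per kind of state. The HotCache class, its field names, and the section headings are hypothetical. Only the kinds of state (done, pending, active threads, stats, settings) come from the pattern description, and the auto_research key mirrors the research-on-miss toggle.

```python
from dataclasses import dataclass, field


@dataclass
class HotCache:
    """Hypothetical in-memory model of wiki/hot.md; section names are assumptions."""
    done: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)
    active_threads: list[str] = field(default_factory=list)
    stats: dict[str, int] = field(default_factory=dict)
    settings: dict[str, bool] = field(default_factory=dict)

    def render(self) -> str:
        """Serialize the cache as a Markdown note the next session reads first."""
        lines = ["# Hot Cache"]
        for title, items in (("Done", self.done), ("Pending", self.pending),
                             ("Active Threads", self.active_threads)):
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        lines.append("## Stats")
        lines.extend(f"- {key}: {value}" for key, value in self.stats.items())
        lines.append("## Settings")
        # Settings use a plain "key: value" form, e.g. "auto_research: true".
        lines.extend(f"{key}: {str(value).lower()}" for key, value in self.settings.items())
        return "\n".join(lines) + "\n"
```

Keeping the file small and structured is what lets a fresh session load it cheaply before touching any article.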
2. Delta Manifest (.manifest.json)
A JSON file that records every source file with its SHA-256 hash, ingestion timestamp, and the articles it produced. During ingest, only new or changed files are processed, and the manifest doubles as a full audit trail.
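A minimal sketch of the manifest check, assuming Python 3.9+, raw sources stored as Markdown files under one directory, and entries keyed by path relative to that directory. The field names sha256, ingested_at, and articles, and all function names, are hypothetical; only the idea (hash, timestamp, produced articles, skip unchanged files) comes from the text.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file's bytes; any content change yields a new digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_manifest(manifest_path: Path) -> dict:
    """Read .manifest.json, or start an empty manifest on first run."""
    if manifest_path.exists():
        return json.loads(manifest_path.read_text(encoding="utf-8"))
    return {"sources": {}}


def save_manifest(manifest_path: Path, manifest: dict) -> None:
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")


def files_to_ingest(raw_dir: Path, manifest: dict) -> list[Path]:
    """Return source files that are new or whose hash no longer matches the manifest."""
    pending = []
    for path in sorted(raw_dir.rglob("*.md")):
        key = path.relative_to(raw_dir).as_posix()
        entry = manifest["sources"].get(key)
        if entry is None or entry["sha256"] != sha256_of(path):
            pending.append(path)
    return pending


def record_ingest(raw_dir: Path, manifest: dict, path: Path, articles: list[str]) -> None:
    """Log what was ingested, when, and which wiki articles it produced (the audit trail)."""
    key = path.relative_to(raw_dir).as_posix()
    manifest["sources"][key] = {
        "sha256": sha256_of(path),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "articles": articles,
    }
```

A second "compile" with no edits returns an empty list from files_to_ingest, which is the "0 new files" behavior the Try It section describes.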
3. Contradiction Detection ([!contradiction] callouts)
When a new source contradicts an existing article, both claims are preserved in a structured callout and a human resolves the conflict. Nothing is silently overwritten.
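A minimal sketch of writing such a callout, assuming Obsidian's blockquote callout syntax ("> [!type] Title" followed by "> " lines). The callout title, the bullet labels, and both helper names are hypothetical; the point is that the new claim is appended next to the old one instead of replacing it.

```python
def contradiction_callout(existing_claim: str, existing_source: str,
                          new_claim: str, new_source: str) -> str:
    """Build a [!contradiction] callout that keeps both claims side by side."""
    lines = [
        "> [!contradiction] Sources disagree",
        f"> - Existing claim: {existing_claim} (source: [[{existing_source}]])",
        f"> - New claim: {new_claim} (source: [[{new_source}]])",
        "> - Status: unresolved, awaiting human review",
    ]
    return "\n".join(lines)


def append_callout(article: str, callout: str) -> str:
    """Append the callout after the existing text; the original claim is never edited."""
    return article.rstrip("\n") + "\n\n" + callout + "\n"
```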
4. Provenance Tracking (frontmatter + inline markers)
Every claim carries a provenance marker: no marker means the claim was extracted from a source, ^[inferred] means LLM synthesis, and ^[ambiguous] means the sources disagree. An article-level provenance field in the frontmatter summarizes the mix.
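A minimal sketch of deriving the article-level summary from the inline markers, assuming Python 3.9+, that the body passed in excludes frontmatter, and that each non-blank, non-heading line counts as one claim (a deliberate simplification; real articles may put several claims on a line). Function names and the output shape are hypothetical.

```python
import re
from collections import Counter

MARKER = re.compile(r"\^\[(inferred|ambiguous)\]")
KINDS = ("extracted", "inferred", "ambiguous")


def classify(claim: str) -> str:
    """Unmarked claims count as extracted from a source."""
    match = MARKER.search(claim)
    return match.group(1) if match else "extracted"


def provenance_summary(body: str) -> dict[str, float]:
    """Share of each provenance kind; treats every non-heading, non-blank line as one claim."""
    claims = [line.strip() for line in body.splitlines()
              if line.strip() and not line.lstrip().startswith("#")]
    counts = Counter(classify(c) for c in claims)
    total = len(claims) or 1  # avoid dividing by zero on an empty article
    return {kind: round(counts[kind] / total, 2) for kind in KINDS}
```

The resulting dict could then be written into the frontmatter provenance field; the exact frontmatter format is not specified here.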
5. Living Overview (wiki/overview.md)
A narrative synthesis that answers “what does this wiki know?” It is not a table of contents (that role belongs to the master index) and is updated after every ingest.
6. Cross-Linker (Cross-Link operation)
Automated scan of all articles for unlinked mentions of concepts that exist as other articles. Runs after batch ingests to tighten the knowledge graph.
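A minimal sketch of the detection half of a cross-linker, assuming Python 3.9+, that article titles are known up front, and that a mention counts as "linked" if the article already wikilinks that title anywhere. The helper names are hypothetical, and the sketch only reports candidates; it does not rewrite the article.

```python
import re

WIKILINK = re.compile(r"\[\[([^\]]*)\]\]")


def link_target(inner: str) -> str:
    """Normalize the inside of '[[relation::Target|alias]]' to 'target'."""
    return inner.split("|")[0].split("::")[-1].strip().lower()


def unlinked_mentions(text: str, titles: list[str], self_title: str) -> list[str]:
    """Titles mentioned in plain text that the article never wikilinks."""
    linked = {link_target(m.group(1)) for m in WIKILINK.finditer(text)}
    plain = WIKILINK.sub(" ", text)  # ignore text that is already inside a link
    hits = []
    for title in titles:
        if title == self_title or title.lower() in linked:
            continue
        # Whole-word, case-insensitive match; titles that begin or end with
        # punctuation would need a different boundary rule.
        if re.search(rf"\b{re.escape(title)}\b", plain, flags=re.IGNORECASE):
            hits.append(title)
    return hits
```

Running this over every article after a batch ingest yields the list of links to add; the actual edit can then be reviewed or applied automatically.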
7. Connections Layer (wiki/connections/)
A dedicated folder for cross-topic synthesis: first-class articles that combine concepts from two or more topic folders, as opposed to simple cross-links between existing articles.
8. Research-on-Miss (auto_research toggle)
When a query cannot be answered from existing wiki content and the toggle is on, the system auto-researches the gap, ingests results, and re-attempts. Configurable to prevent unwanted research.
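A minimal sketch of the control flow, assuming Python 3.9+ and that answering, researching, and ingesting are supplied as callables. All parameter and function names are hypothetical stand-ins for whatever the vault's librarian actually does; the sketch shows only the toggle check and the single re-attempt.

```python
from typing import Callable, Optional


def answer_with_research_on_miss(
    query: str,
    answer_from_wiki: Callable[[str], Optional[str]],
    research: Callable[[str], list[str]],
    ingest: Callable[[list[str]], None],
    auto_research: bool,
) -> Optional[str]:
    """Answer from the wiki; on a miss, research and re-try once if the toggle is on."""
    answer = answer_from_wiki(query)
    if answer is not None or not auto_research:
        return answer  # a hit, or the toggle is off: never research silently
    sources = research(query)
    if not sources:
        return None  # nothing found; report the gap instead of guessing
    ingest(sources)
    return answer_from_wiki(query)  # exactly one re-attempt, no loop
```

Limiting the flow to one re-attempt keeps a bad query from triggering repeated, costly research rounds.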
9. Typed Wikilinks
Links carry semantic meaning: [[extends::article]], [[contradicts::article]], [[implements::article]]. Enables richer navigation and graph queries.
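A minimal sketch of turning typed wikilinks into graph edges, assuming Python 3.9+, that relation names are plain words before "::", and that an optional "|alias" may follow the target. The Edge type and the default relation name links_to for untyped links are hypothetical; how a given Obsidian plugin renders these links is outside the sketch.

```python
import re
from typing import NamedTuple

# [[relation::Target|alias]], where both "relation::" and "|alias" are optional.
LINK = re.compile(r"\[\[(?:(?P<rel>[A-Za-z_-]+)::)?(?P<target>[^\]|]+)(?:\|[^\]]*)?\]\]")


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


def typed_edges(article: str, text: str) -> list[Edge]:
    """Extract one edge per wikilink; untyped links get the generic 'links_to' relation."""
    return [Edge(article, m.group("rel") or "links_to", m.group("target").strip())
            for m in LINK.finditer(text)]
```

Once links are edges, a query such as "everything this article contradicts" is a simple filter on the relation field.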
10. Knowledge Graph Layers
Obsidian’s graph view plus Dataview queries in wiki/dashboard.md provide queryable layers on top of the flat wiki files.
Community Warnings and Our Mitigations
Hallucination Compounding
- Warning: Subtle hallucinations in wiki articles get compounded when future queries cite those articles as sources.
- Our mitigation: Provenance tracking marks every LLM-synthesized claim with ^[inferred]. The lint pass includes hallucination drift detection: it re-reads the cited raw sources and flags any statement that has shifted from the original. File-back chain integrity verification catches drift in synthesis articles.
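The lint pass described above re-reads sources with the LLM. As a much cruder, deterministic stand-in (an assumption, not the vault's method), a pre-filter could flag unmarked claims whose numbers are missing from the cited source or whose vocabulary barely overlaps it. Assumes Python 3.9+; the 0.6 overlap threshold and all names are hypothetical.

```python
import re

MARKER = re.compile(r"\^\[(inferred|ambiguous)\]")
WORD = re.compile(r"[a-z0-9]+")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def words(text: str) -> set[str]:
    return set(WORD.findall(text.lower()))


def drift_candidates(claims: list[str], source_text: str,
                     min_overlap: float = 0.6) -> list[str]:
    """Flag unmarked (extracted) claims that the cited source does not appear to support."""
    source_words = words(source_text)
    source_numbers = set(NUMBER.findall(source_text))
    flagged = []
    for claim in claims:
        if MARKER.search(claim):
            continue  # labeled synthesis or disagreement is exempt from the literal check
        claim_words = words(claim)
        if not claim_words:
            continue
        overlap = len(claim_words & source_words) / len(claim_words)
        # A changed number is a classic silent drift, so every number must appear in the source.
        numbers_ok = set(NUMBER.findall(claim)) <= source_numbers
        if overlap < min_overlap or not numbers_ok:
            flagged.append(claim)
    return flagged
```

Anything this flags would still go to the LLM re-read or a human; a clean result does not prove the claim is faithful.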
Scaling Ceiling (~200-300 articles)
- Warning: Performance degrades once wikis exceed 200-300 articles. Cross-linking gets slow, context windows overflow, ingest accuracy drops.
- Our mitigation: Topic-based index hierarchy (master index routes to topic indexes, topic indexes locate articles). Scale-aware thresholds in the vault schema trigger structural changes at 50, 200, and 500 articles. Dataview dashboards for health monitoring. External search tools (qmd MCP server) planned for the 200+ range.
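A minimal sketch of scale-aware thresholds, using the 50, 200, and 500 article counts from the text. The action attached to each band is hypothetical: only the topic-index hierarchy and the planned external search at 200+ come from the mitigation above, and which band introduces topic indexes is an assumption.

```python
import bisect

# Article counts at which the vault schema triggers a structural change.
THRESHOLDS = [50, 200, 500]

# Hypothetical action per band (index 0 is below the first threshold).
ACTIONS = [
    "flat: a single master index lists every article",
    "topic: master index routes to per-topic indexes",
    "search: add an external search tool such as the qmd MCP server",
    "beyond: structure not specified, review manually",
]


def structural_tier(article_count: int) -> str:
    """Map an article count to its band; a count equal to a threshold enters the higher band."""
    return ACTIONS[bisect.bisect_right(THRESHOLDS, article_count)]
```

A Dataview dashboard or a lint check could call something like this with the current article count and warn when the vault crosses into a new band.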
Token Cost
- Warning: Full wiki operations (ingest + cross-link + lint) on large vaults consume 100K+ tokens per session.
- Our mitigation: Delta manifest skips unchanged files. Targeted file reads (3-4 per query, never loading the full vault). Subagent parallelization for batch ingests of 5+ files.
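A minimal sketch of the batch rule, with a thread pool standing in for subagents (a swapped-in mechanism for illustration only; the vault delegates to Claude subagents, not threads). Assumes Python 3.9+; the ingest_one callable and the worker count are hypothetical, and only the 5+ file threshold comes from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

PARALLEL_THRESHOLD = 5  # batches of 5+ files are fanned out


def ingest_batch(files: list[str], ingest_one: Callable[[str], str],
                 max_workers: int = 4) -> list[str]:
    """Ingest small batches sequentially and larger ones in parallel, preserving input order."""
    if len(files) < PARALLEL_THRESHOLD:
        return [ingest_one(f) for f in files]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, not completion order.
        return list(pool.map(ingest_one, files))
```

Keeping small batches sequential avoids the coordination overhead when only a few files changed.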
Vault Contamination
- Warning: Risk of the LLM modifying raw source files or producing articles that drift from sources over time.
- Our mitigation: Immutable raw/ and ai-research/ layers that the LLM never writes to. Drift detection during lint. Re-reading original sources before updating existing articles.
Key Takeaways
- The Karpathy wiki pattern has become a de facto standard for LLM-maintained knowledge bases, with 12+ significant open-source implementations. The Claude Code skills ecosystem provides the extensibility layer for building wiki librarian skills.
- The most impactful enhancements are infrastructure patterns (hot cache, delta manifest, provenance tracking) rather than content patterns.
- Hallucination compounding is the #1 community concern — provenance tracking and source-back verification are the primary defenses.
- The pattern scales well to ~200 articles with a proper index hierarchy; beyond that, external search tools become necessary.
- All 10 patterns identified in this research have been implemented in this vault’s upgrade.
- The community strongly favors immutable source layers and visible contradiction handling over silent merging.
Related
- Karpathy Pattern — individually-reviewed community implementations (e.g. Stride starter vault)
- Building Skills Guide — skills pattern that powers the wiki librarian
- Skills Ecosystem — how skills like the wiki librarian fit into the broader Claude tools landscape
- Subagents — used for batch ingest parallelization
- Essential MCP Servers — qmd and search tools for wiki scaling
- Design Skills Workflow — another structured workflow pattern comparable to the wiki operations
- Claude Agent Hierarchy — how the wiki librarian fits into the agent taxonomy
Try It
- Review the vault schema. Read CLAUDE.md at the vault root to see all 10 patterns in action.
- Test the delta manifest. Say “compile”; the system should report 0 new files if nothing has changed since the last ingest.
- Trigger research-on-miss. Set auto_research: true in wiki/hot.md, then ask a question the wiki cannot answer. Watch the system research, ingest, and re-answer automatically.
- Run a lint pass. Say “lint” to see the full 14-check health report, including hallucination drift detection.
- Explore community repos. Check out SamurAIGPT/llm-wiki-agent (1,600 stars) for plugin architecture ideas, or claude-memory-compiler (551 stars) for multi-pass compilation strategies.