Source: raw/wiki-community-research-2026-04-11.md

A comprehensive survey of how the developer community has implemented, extended, and stress-tested the Karpathy LLM wiki pattern. This research analyzed 50+ Reddit posts, 30+ web sources, and 12 GitHub repositories to identify the most effective patterns, common pitfalls, and proven solutions for LLM-maintained knowledge bases.

Research Scope

  • Reddit channels searched: r/ObsidianMD, r/ClaudeAI, r/LocalLLaMA, r/ChatGPT, r/productivity
  • GitHub: Topic searches for “llm wiki,” “obsidian AI,” “claude knowledge base”
  • Other sources: Hacker News threads, blog posts, documentation sites
  • Volume: 50+ Reddit posts, 30+ web articles, 12 GitHub repos with meaningful implementations

Top GitHub Implementations

| Repo | Stars | What Makes It Unique |
| --- | --- | --- |
| SamurAIGPT/llm-wiki-agent | 1,600 | Full-featured wiki agent with plugin architecture for custom ingest processors |
| nashsu/llm_wiki | 703 | Desktop (Electron) app targeting non-technical users; drag-and-drop ingestion, visual graph |
| claude-memory-compiler | 551 | Multi-pass compilation (extract facts, resolve contradictions, generate article); provenance chains |
| claude-obsidian | 489 | Obsidian plugin with compile/query/lint commands; pioneered the “hot cache” pattern |
| Ar9av/obsidian-wiki | 294 | Lightweight ingest pipeline; introduced typed wikilinks with semantic relationships |
| karpathy-wiki-template | 234 | Opinionated template vault with pre-built topic structures and starter CLAUDE.md |
| obsidian-ai-librarian | 187 | Research-on-miss: unanswered queries auto-trigger web research |
| wiki-knowledge-graph | 156 | Neo4j-backed knowledge graph layer for complex relationship queries |
| llm-vault-tools | 142 | CLI toolkit for mass cross-link, bulk lint, migration |
| obsidian-delta-manifest | 112 | Standalone delta manifest for efficient incremental ingestion |
| smart-wiki-sync | 98 | Multi-vault sync with LLM merge conflict handling |
| wiki-lint-ci | 76 | GitHub Actions for wiki lint on every commit |

10 Patterns We Adopted

1. Hot Cache (wiki/hot.md)

A session-continuity file that persists state between LLM sessions. It stores what was done, pending actions, active threads, wiki stats, and settings, so a new session does not start cold.
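To make the idea concrete, here is a minimal sketch of how such a state file could be modeled and rendered. The class name `HotCache`, its field names, and the section layout are assumptions for illustration; the source does not specify the actual format of wiki/hot.md.

```python
from dataclasses import dataclass, field


@dataclass
class HotCache:
    """Session state carried between LLM sessions (hypothetical field names)."""
    done: list = field(default_factory=list)
    pending: list = field(default_factory=list)
    active_threads: list = field(default_factory=list)
    stats: dict = field(default_factory=dict)
    settings: dict = field(default_factory=dict)

    def render(self) -> str:
        """Serialize the state as Markdown sections (assumed layout)."""
        def bullets(items):
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        def pairs(mapping):
            return "\n".join(f"- {k}: {v}" for k, v in mapping.items()) or "- (none)"

        sections = [
            ("Done", bullets(self.done)),
            ("Pending actions", bullets(self.pending)),
            ("Active threads", bullets(self.active_threads)),
            ("Wiki stats", pairs(self.stats)),
            ("Settings", pairs(self.settings)),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections) + "\n"
```

Writing the rendered text at the end of a session and reading it back at the start of the next one is what lets the LLM resume without re-scanning the vault.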

2. Delta Manifest (.manifest.json)

A JSON file that records, for every source file, its SHA-256 hash, ingestion timestamp, and the articles produced from it. Only new or changed files get processed during ingest. The manifest also provides a full audit trail.
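The change detection described above can be sketched as follows. The key names (`sha256`, `ingested_at`, `articles`) and the function names are assumptions; the source only states that the manifest holds a hash, a timestamp, and the produced articles.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_needing_ingest(raw_dir: Path, manifest: dict) -> list:
    """Return source files that are new or whose content changed since last ingest."""
    pending = []
    for path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        key = path.relative_to(raw_dir).as_posix()
        entry = manifest.get(key)
        # New file (no entry) or changed content (hash mismatch) needs processing.
        if entry is None or entry["sha256"] != sha256_of(path):
            pending.append(path)
    return pending


def record_ingest(manifest: dict, raw_dir: Path, path: Path,
                  articles: list, timestamp: str) -> None:
    """Store hash, timestamp, and produced articles for one ingested source."""
    key = path.relative_to(raw_dir).as_posix()
    manifest[key] = {
        "sha256": sha256_of(path),      # assumed key name
        "ingested_at": timestamp,       # assumed key name
        "articles": list(articles),     # assumed key name
    }
```

The manifest dictionary would be loaded from and saved to .manifest.json with the standard `json` module. Hashing content rather than comparing modification times means a file that is touched but unchanged is still skipped.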

3. Contradiction Detection ([!contradiction] callouts)

When a new source contradicts an existing article, both claims are preserved in a structured callout. The human resolves the conflict. No silent overwrites.
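As an illustration, a helper that emits such a callout might look like the sketch below. The `> [!type]` blockquote form is Obsidian's callout syntax; the function name, the title text, and the three-bullet layout inside the callout are assumptions, not the vault's actual template.

```python
def contradiction_callout(existing_claim: str, existing_source: str,
                          new_claim: str, new_source: str) -> str:
    """Format both conflicting claims as one callout, leaving resolution to a human."""
    lines = [
        "> [!contradiction] Sources disagree",
        f"> - Existing: {existing_claim} (source: {existing_source})",
        f"> - New: {new_claim} (source: {new_source})",
        "> - Resolution: pending human review",
    ]
    return "\n".join(lines)
```

Because the function only appends a block and never edits the existing claim, neither side of the disagreement can be silently overwritten.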

4. Provenance Tracking (frontmatter + inline markers)

Every claim carries a provenance marker: no marker means the claim was extracted directly from a source, ^[inferred] means LLM synthesis, and ^[ambiguous] means the sources disagree. An article-level provenance field in the frontmatter summarizes the mix.
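One way the article-level summary could be computed is sketched below. It assumes one claim per non-empty line, which is a simplification; the function name and the returned key names are also assumptions.

```python
import re
from collections import Counter

# Matches the two inline markers named in the text.
MARKER = re.compile(r"\^\[(inferred|ambiguous)\]")


def provenance_mix(body: str) -> dict:
    """Share of extracted / inferred / ambiguous claims, one claim per non-empty line."""
    counts = Counter()
    for line in body.splitlines():
        if not line.strip():
            continue
        markers = set(MARKER.findall(line))
        # Assumed precedence: a line flagged ambiguous counts as ambiguous even if also inferred.
        if "ambiguous" in markers:
            counts["ambiguous"] += 1
        elif "inferred" in markers:
            counts["inferred"] += 1
        else:
            counts["extracted"] += 1
    total = sum(counts.values())
    return {
        kind: (round(counts[kind] / total, 2) if total else 0.0)
        for kind in ("extracted", "inferred", "ambiguous")
    }
```

The resulting dictionary could be written into the frontmatter provenance field so that a reader sees at a glance how much of an article is synthesis rather than extraction.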

5. Living Overview (wiki/overview.md)

Narrative synthesis answering “what does this wiki know?” — not a table of contents (that is the master index). Updated after every ingest.

6. Mass Cross-Link

An automated scan of all articles for unlinked mentions of concepts that exist as other articles. It runs after batch ingests to tighten the knowledge graph.
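The scan described above can be sketched for a single article as follows. The function name and the matching policy (case-insensitive whole-phrase match, skipping titles the article already links at least once) are assumptions.

```python
import re

WIKILINK = re.compile(r"\[\[(.*?)\]\]")


def unlinked_mentions(body: str, titles: list, own_title: str) -> list:
    """Return article titles that appear as plain text in body but are never linked."""
    # Targets already linked, ignoring any "relation::" prefix, "|alias", or "#heading".
    linked = {
        raw.split("::")[-1].split("|")[0].split("#")[0].strip().lower()
        for raw in WIKILINK.findall(body)
    }
    # Blank out existing links so their text is not counted as a plain mention.
    plain = WIKILINK.sub(" ", body)
    found = []
    for title in titles:
        if title == own_title or title.lower() in linked:
            continue
        if re.search(rf"\b{re.escape(title)}\b", plain, flags=re.IGNORECASE):
            found.append(title)
    return found
```

Running this over every article after a batch ingest yields a list of candidate links, which can then be applied automatically or queued for review.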

7. Connections Layer (wiki/connections/)

Dedicated folder for cross-topic synthesis articles. First-class articles combining concepts from 2+ topic folders. Different from simple cross-links.

8. Research-on-Miss (auto_research toggle)

When a query cannot be answered from existing wiki content and the toggle is on, the system auto-researches the gap, ingests results, and re-attempts. Configurable to prevent unwanted research.
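The control flow described above can be sketched as below. The two callables stand in for the wiki lookup and the research-plus-ingest step; their names and signatures are hypothetical, and the single retry is an assumption about how "re-attempts" is bounded.

```python
from typing import Callable, Optional


def answer_with_research(
    question: str,
    answer_from_wiki: Callable[[str], Optional[str]],
    research_and_ingest: Callable[[str], None],
    auto_research: bool,
) -> Optional[str]:
    """Answer from the wiki; on a miss, research and retry once if the toggle is on."""
    answer = answer_from_wiki(question)
    if answer is not None or not auto_research:
        # Either a hit, or a miss with research disabled: return as-is.
        return answer
    research_and_ingest(question)       # fills the gap in the wiki
    return answer_from_wiki(question)   # single re-attempt, no loop
```

Retrying only once keeps a question that research cannot answer from triggering repeated, costly research rounds.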

9. Typed Wikilinks

Links carry semantic meaning: [[extends::article]], [[contradicts::article]], [[implements::article]]. This enables richer navigation and graph queries.
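A small parser shows how the relation can be separated from the target for graph queries. This is a sketch: the `WikiLink` type and `parse_links` function are hypothetical names, and handling of `|alias` and `#heading` suffixes is an added assumption.

```python
import re
from typing import NamedTuple, Optional


class WikiLink(NamedTuple):
    relation: Optional[str]  # e.g. "extends"; None for an untyped link
    target: str


# Optional "relation::" prefix, then the target up to "]", "|", or "#".
LINK = re.compile(r"\[\[(?:(\w+)::)?([^\]|#]+)")


def parse_links(body: str) -> list:
    """Extract every wikilink in body as a (relation, target) pair."""
    return [WikiLink(m.group(1), m.group(2).strip()) for m in LINK.finditer(body)]
```

Collecting these pairs across the vault gives an edge list in which each edge carries its relation, which is the raw material for queries such as "which articles contradict this one?".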

10. Knowledge Graph Layers

Obsidian’s graph view plus Dataview queries in wiki/dashboard.md provide queryable layers on top of the flat wiki files.

Community Warnings and Our Mitigations

Hallucination Compounding

  • Warning: Subtle hallucinations in wiki articles get compounded when future queries cite those articles as sources.
  • Our mitigation: Provenance tracking marks every LLM-synthesized claim with ^[inferred]. The lint pass includes hallucination drift detection — re-reading cited raw sources and flagging any statements that shifted from the original. File-back chain integrity verification catches drift in synthesis articles.

Scaling Ceiling (~200-300 articles)

  • Warning: Performance degrades once wikis exceed 200-300 articles. Cross-linking gets slow, context windows overflow, ingest accuracy drops.
  • Our mitigation: Topic-based index hierarchy (master index routes to topic indexes, topic indexes locate articles). Scale-aware thresholds in the vault schema trigger structural changes at 50, 200, and 500 articles. Dataview dashboards for health monitoring. External search tools (qmd MCP server) planned for the 200+ range.
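The scale-aware thresholds in the mitigation above can be sketched as a simple lookup. The source does not say which structural change each threshold triggers, so this sketch reports only which thresholds have been reached; the function name and the "reached means greater than or equal" reading are assumptions.

```python
from bisect import bisect_right

# Article counts named in the text.
THRESHOLDS = (50, 200, 500)


def crossed_thresholds(article_count: int) -> tuple:
    """Return the thresholds the vault has reached (count >= threshold)."""
    return THRESHOLDS[:bisect_right(THRESHOLDS, article_count)]
```

A lint or ingest step could call this after each run and prompt for the corresponding restructuring when a new threshold first appears in the result.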

Token Cost

  • Warning: Full wiki operations (ingest + cross-link + lint) on large vaults consume 100K+ tokens per session.
  • Our mitigation: Delta manifest skips unchanged files. Targeted file reads (3-4 per query, never loading the full vault). Subagent parallelization for batch ingests of 5+ files.

Vault Contamination

  • Warning: Risk of the LLM modifying raw source files or producing articles that drift from sources over time.
  • Our mitigation: Immutable raw/ and ai-research/ layers — the LLM never writes to them. Drift detection during lint. Re-reading original sources before updating existing articles.
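The immutability rule above could be enforced with a path check before any write. This sketch assumes that raw/ and ai-research/ sit at the vault root and that paths are given relative to it; the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Top-level folders the LLM must never write to (named in the text).
IMMUTABLE_ROOTS = ("raw", "ai-research")


def is_writable(vault_relative_path: str) -> bool:
    """True only if the path stays inside the vault and outside the immutable layers."""
    path = PurePosixPath(vault_relative_path)
    parts = path.parts
    # Reject empty paths, absolute paths, and ".." segments that could escape a folder.
    if not parts or path.is_absolute() or ".." in parts:
        return False
    return parts[0] not in IMMUTABLE_ROOTS
```

Rejecting any path containing `..` is deliberately strict: it blocks a write such as wiki/../raw/note.md from reaching a protected folder indirectly.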

Key Takeaways

  • The Karpathy wiki pattern has become a de facto standard for LLM-maintained knowledge bases, with 12+ significant open-source implementations. The Claude Code skills ecosystem provides the extensibility layer for building wiki librarian skills.
  • The most impactful enhancements are infrastructure patterns (hot cache, delta manifest, provenance tracking) rather than content patterns.
  • Hallucination compounding is the #1 community concern — provenance tracking and source-back verification are the primary defenses.
  • The pattern scales well to ~200 articles with proper index hierarchy; beyond that, external search tools become necessary.
  • All 10 patterns identified in this research have been implemented in this vault’s upgrade.
  • The community strongly favors immutable source layers and visible contradiction handling over silent merging.

Try It

  1. Review the vault schema. Read CLAUDE.md at the vault root to see all 10 patterns in action.
  2. Test the delta manifest. Say “compile” — the system should report 0 new files if nothing has changed since last ingest.
  3. Trigger research-on-miss. Set auto_research: true in wiki/hot.md, then ask a question the wiki cannot answer. Watch the system research, ingest, and re-answer automatically.
  4. Run a lint pass. Say “lint” to see the full 14-check health report, including hallucination drift detection.
  5. Explore community repos. Check out SamurAIGPT/llm-wiki-agent (1,600 stars) for plugin architecture ideas, or claude-memory-compiler (551 stars) for multi-pass compilation strategies.