Source: This vault’s CLAUDE.md schema (every operation, layer, and script described in the diagram traces directly to a section there).
A single-image architecture diagram of the Karpathy LLM wiki pipeline as it stands on 2026-05-20 (v3). It captures the end-to-end flow from inputs (with the /inbox-refresh skill at the top of the fan-in, plus the new last30days Channel 9 community-signal input added 2026-05-14) through the immutable staging layer, the compile operation with its three-way fork on the triage: field, the wiki layer, the publish pipeline, and the live Cloudflare Pages + Notion outputs. The maintenance loop sits on top as a recurring band; QMD sits underneath as the retrieval substrate; bin/ scripts run the operational glue.
Diagram versions: Snapshots are immutable-by-naming. The current version is v3 (2026-05-20), embedded above; prior versions v2 (2026-05-15) at pipeline-diagram-2026-05-15.svg and v1 (2026-05-09) at pipeline-diagram-2026-05-09.svg are preserved for the historical record.
How to read this
- Six numbered columns, left-to-right: ① Inputs → ② Staging → ③ Compile/Ingest → ④ Wiki layer → ⑤ Publish → ⑥ Live.
- Maintenance loop runs across the top as a recurring band — Lint, Refresh (Tier-1 / Tier-2), Watch, Cross-link, Connection — and writes back into the wiki layer between operations. Friday 23:00 UTC and Sunday 16:00 UTC are the cloud-routine cadences.
- QMD substrate runs across the bottom: BM25 + vector + LLM rerank, ~2GB of GGUF models all-local. The 7-stage pipeline (query → expansion → parallel BM25/vec → RRF → reranker → position-aware blend → ranked results) is broken out so you can see where each stage lives. The substrate is queryable from every layer above it — by humans, by Claude during ingest, by lint checks, by cross-link sweeps.
- bin/ scripts band is the operational layer: 14 scripts across two rows. Wiki-core: post-ingest, lint-stale-sources, fix-stale-sources, build-audit-db, refresh, drain-questions, update-article-count. Ingest helpers + integrations: yt-transcript, last30dental, last30days-to-raw, output-review, sync-notion, sync-aios-wiki-stats, x-search-mcp. Mostly idempotent; invoked manually or by the maintenance loop.
- Triage routing is the core decision in compile: every staged source carries a triage: field with one of three values (ingest, refresh:topic/article, or skip:<reason>). Each routes differently: full new article, prefixed sentence in an existing article, or manifest-only trace. The skip path is load-bearing: it records what was considered and rejected, preserving the reasoning trail.
- Two-layer publish gate (the rose dashed box in column ⑤): quartz.config.ts ignorePatterns blocks topics wholesale, and per-article publish: true frontmatter is required by the ExplicitPublish plugin. Both must pass for an article to ship.
- Concrete numbers sit at the top: 342 articles (323 published), 23 topics, 554 indexed docs / 19,152 vectors in QMD (the index spans two collections, karpathy-wiki 388 + weomarketly-wiki 166, 98MB on disk); 60-120s deploy time on Cloudflare Pages (migrated 2026-05-11 from Workers).
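The triage routing described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the vault's actual compile code: the `Route` type and `route_triage` function are hypothetical names, and the parsing rules are an assumption inferred from the three value shapes (ingest, refresh:topic/article, skip:<reason>).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical result of reading a triage: value."""
    action: str  # "ingest", "refresh", or "skip"
    detail: str  # refresh target or skip reason; empty for ingest


def route_triage(value: str) -> Route:
    """Map a triage: frontmatter value to one of the three routes."""
    value = value.strip()
    if value == "ingest":
        # Full new article.
        return Route("ingest", "")
    action, sep, detail = value.partition(":")
    if sep and action in ("refresh", "skip") and detail.strip():
        # refresh -> prefixed sentence in an existing article;
        # skip    -> manifest-only trace that keeps the reason.
        return Route(action, detail.strip())
    raise ValueError(f"unrecognized triage value: {value!r}")
```

Raising on anything unrecognized, rather than defaulting to skip, keeps the skip path meaningful: a skip entry then always reflects a recorded decision, never a parsing accident.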
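The RRF stage of the QMD pipeline named in the list above merges the parallel BM25 and vector rankings into one list. The sketch below shows reciprocal rank fusion in general form; it is not QMD's code. The function name `rrf_fuse` is hypothetical, and k=60 is the constant commonly used for RRF, assumed here because the source does not state what QMD uses.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over every list it
    appears in, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties break on doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # illustrative ids only
vector_hits = ["b", "d", "a"]
fused = rrf_fuse([bm25_hits, vector_hits])  # ["b", "a", "d", "c"]
```

Documents that appear in both lists ("a" and "b") rise above those found by only one retriever, which is the point of fusing before the reranker sees the candidates.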
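The two-layer publish gate from the list above reduces to a conjunction: the path must not match an ignore pattern, and the frontmatter must opt in. The sketch below models that logic only. `should_publish` is a hypothetical helper, `fnmatch` stands in for Quartz's own glob matching (which may behave differently), and treating only a boolean true as opt-in is an assumption.

```python
from fnmatch import fnmatch


def should_publish(path, frontmatter, ignore_patterns):
    """Return True only if both publish-gate layers pass."""
    # Layer 1: topic-level block, analogous to ignorePatterns.
    if any(fnmatch(path, pattern) for pattern in ignore_patterns):
        return False
    # Layer 2: per-article opt-in, analogous to publish: true.
    return frontmatter.get("publish") is True


ignore = ["private/*"]  # illustrative pattern, not from the real config
print(should_publish("claude-ai/example.md", {"publish": True}, ignore))
print(should_publish("private/notes.md", {"publish": True}, ignore))
print(should_publish("claude-ai/draft.md", {}, ignore))
```

Checking the ignore patterns first mirrors the "blocks topics wholesale" wording: a blocked topic never ships, whatever its articles' frontmatter says.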
Why this exists
This vault is meta — it documents itself. Most of the patterns visible here (the triage: field on raw frontmatter, the post-ingest hook chaining QMD refresh + audit DB rebuild + stale-sources lint, the priority-tiered lint auto-fix loop, the carry-forward refresh) emerged from concrete operational needs and got named in this CLAUDE.md after they paid off. The diagram exists so future-me, future-Claude, and any reader who lands at this URL can see the whole system at once instead of reconstructing it from 17 sections of a 600-line schema file.
It also doubles as a synthadoc-style architecture artifact: a single-glance reference for “what does a Karpathy LLM-wiki pattern actually look like in production?” Anyone copying this pattern can use the diagram as a checklist of layers, scripts, and feedback loops — and skip the ones their use case doesn’t need.
Related
- Synthadoc — LLM-Powered Wiki Engine — the architecturally-similar Python+Obsidian implementation that contributed several patterns visible here (status frontmatter, query decomposition, lint stale-sources, audit DB)
- Karpathy Wiki Additions from Synthadoc (2026-05-05) — the pattern-by-pattern ledger of what got adopted, what got skipped, and why
- Stride Starter Vault — minimal-template counterpart at the other end of the spectrum
- QMD — Local Hybrid-Search MCP — the retrieval substrate; everything in the bottom band of the diagram is QMD
- Karpathy Techniques for Claude Code — the wiki-vs-semantic-RAG tradeoff thesis the architecture answers
- Wiki Community Enhancements — broader survey of Karpathy-pattern community implementations
Try It
- Read the diagram top-to-bottom, left-to-right. Title bar → maintenance band → six columns → QMD substrate → bin/ scripts → legend. The flow narrative at the very bottom of the legend block is the one-paragraph version.
- Pull the source SVG. pipeline-diagram-2026-05-20.svg is a single self-contained file (~52KB); embed it in slides, blog posts, READMEs without external dependencies. Prior versions: pipeline-diagram-2026-05-15.svg (v2), pipeline-diagram-2026-05-09.svg (v1).
- Diff v2 → v3 if you want the changelog: claude-ai 130 → 161 articles, ai-video-content 25 → 28, agents-agentic-systems 11 → 14, seo-content 8 → 22 (the 14-study AI-citation research cluster + new ai-seo/ hub), ai-marketing 14 → 15, ai-web-design 7 → 9, ai-industry-research 4 → 6, karpathy-pattern 4 → 5, connections 9 → 10, hermes-agent 4 → 5; topics shown 18 → 23 (added the ai-seo/ hub page and the migrated-out weo-ai-governance/ stub); QMD index 463 docs / 13,023 vec → 554 / 19,152 (karpathy-wiki 298 → 388, weomarketly-wiki 113 → 166; 73MB → 98MB); the bin/ band now shows all 14 scripts (added fix-stale-sources, drain-questions, update-article-count, sync-aios-wiki-stats, x-search-mcp, restored output-review); and the COL5 deploy box was corrected from “Cloudflare Worker” to “Cloudflare Pages” (v2 had updated the live-output box but left the publish-lane label stale).
- Date-anchor. Snapshots are date-anchored and immutable. As the schema evolves the diagram will too: newer versions get fresh pipeline-diagram-YYYY-MM-DD.svg files; old ones stay for the historical record (immutable-by-naming, the same discipline raw/ and ai-research/ use).
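The immutable-by-naming discipline in the last item can be enforced mechanically. The sketch below is one way to do it, under the assumption that snapshots are written by a script; `snapshot_path` and `write_snapshot` are hypothetical helpers, not existing bin/ scripts.

```python
from datetime import date
from pathlib import Path


def snapshot_path(directory: Path, day: date) -> Path:
    """Build the date-anchored filename for a diagram snapshot."""
    return directory / f"pipeline-diagram-{day.isoformat()}.svg"


def write_snapshot(directory: Path, day: date, svg: str) -> Path:
    """Write a new snapshot, refusing to overwrite an existing one."""
    path = snapshot_path(directory, day)
    # Mode "x" raises FileExistsError if the file already exists,
    # so an old snapshot can never be silently replaced.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(svg)
    return path
```

Calling `write_snapshot(Path("."), date(2026, 5, 20), svg_text)` would produce pipeline-diagram-2026-05-20.svg; a second call for the same date fails instead of overwriting, so a revision has to take a new date.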