Source: This vault’s CLAUDE.md schema (every operation, layer, and script described in the diagram traces directly to a section there).

A single-image architecture diagram of the Karpathy LLM wiki pipeline as it stands on 2026-05-20 (v3). It captures the end-to-end flow: inputs (with the /inbox-refresh skill at the top of the fan-in, plus the last30days Channel 9 community-signal input added 2026-05-14), the immutable staging layer, the compile operation with its three-way fork on the triage: field, the wiki layer, the publish pipeline, and the live Cloudflare Pages + Notion outputs. The maintenance loop sits on top as a recurring band; QMD sits underneath as the retrieval substrate; bin/ scripts run the operational glue.

karpathy LLM wiki pipeline

Diagram versions — Snapshots are immutable-by-naming. The current version is v3 (2026-05-20) embedded above; prior versions v2 (2026-05-15) at pipeline-diagram-2026-05-15.svg and v1 (2026-05-09) at pipeline-diagram-2026-05-09.svg are preserved for the historical record.

How to read this

  • Six numbered columns, left-to-right: ① Inputs → ② Staging → ③ Compile/Ingest → ④ Wiki layer → ⑤ Publish → ⑥ Live.
  • Maintenance loop runs across the top as a recurring band — Lint, Refresh (Tier-1 / Tier-2), Watch, Cross-link, Connection — and writes back into the wiki layer between operations. Friday 23:00 UTC and Sunday 16:00 UTC are the cloud-routine cadences.
  • QMD substrate runs across the bottom: BM25 + vector + LLM rerank, backed by ~2GB of GGUF models that all run locally. The 7-stage pipeline (query → expansion → parallel BM25/vec → RRF → reranker → position-aware blend → ranked results) is broken out so you can see where each stage lives. The substrate is queryable from every layer above it: by humans, by Claude during ingest, by lint checks, and by cross-link sweeps.
  • bin/ scripts band is the operational layer — 14 scripts across two rows. Wiki-core: post-ingest, lint-stale-sources, fix-stale-sources, build-audit-db, refresh, drain-questions, update-article-count. Ingest helpers + integrations: yt-transcript, last30dental, last30days-to-raw, output-review, sync-notion, sync-aios-wiki-stats, x-search-mcp. Mostly idempotent; invoked manually or by the maintenance loop.
  • Triage routing is the core decision in compile. Every staged source carries a triage: field with one of three values (ingest, refresh:<topic>/<article>, or skip:<reason>), and each routes differently: ingest produces a full new article, refresh adds a prefixed sentence to an existing article, and skip leaves a manifest-only trace. The skip path is load-bearing: it records what was considered and rejected, preserving the reasoning trail.
  • Two-layer publish gate (the rose dashed box in column ⑤): quartz.config.ts ignorePatterns blocks topics wholesale, and per-article publish: true frontmatter is required by the ExplicitPublish plugin. Both must pass for an article to ship.
  • Concrete numbers sit at the top: 342 articles (323 published), 23 topics, 554 indexed docs / 19,152 vectors in QMD — the index spans two collections (karpathy-wiki 388 + weomarketly-wiki 166), 98MB on disk; 60-120s deploy time on Cloudflare Pages (migrated 2026-05-11 from Workers).
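
The three triage: values described above can be sketched as a small parser. This is a minimal illustration, not the vault's actual compile code: the names Route and parse_triage are hypothetical, and the strict "topic/article" shape for refresh targets is an assumption drawn from the value format shown in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical result type: what the compile step should do with a source."""
    action: str  # "ingest", "refresh", or "skip"
    target: str  # "topic/article" for refresh, the reason for skip, "" for ingest


def parse_triage(value: str) -> Route:
    """Parse a triage: frontmatter value into a Route (illustrative sketch)."""
    value = value.strip()
    if value == "ingest":
        return Route("ingest", "")

    # Split on the first colon only, so a skip reason may itself contain colons.
    kind, _, rest = value.partition(":")
    rest = rest.strip()

    if kind == "refresh":
        # Assumption: a refresh target always has the form "<topic>/<article>".
        topic, slash, article = rest.partition("/")
        if not (topic and slash and article):
            raise ValueError(f"refresh needs topic/article, got {value!r}")
        return Route("refresh", rest)

    if kind == "skip":
        # The reason is required: the skip path exists to record why.
        if not rest:
            raise ValueError("skip needs a reason")
        return Route("skip", rest)

    raise ValueError(f"unknown triage value: {value!r}")
```

Rejecting a bare skip with no reason mirrors the point made in the list: a skip only preserves the reasoning trail if the reason is actually recorded.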

Why this exists

This vault is meta — it documents itself. Most of the patterns visible here (the triage: field on raw frontmatter, the post-ingest hook chaining QMD refresh + audit DB rebuild + stale-sources lint, the priority-tiered lint auto-fix loop, the carry-forward refresh) emerged from concrete operational needs and got named in this CLAUDE.md after they paid off. The diagram exists so future-me, future-Claude, and any reader who lands at this URL can see the whole system at once instead of reconstructing it from 17 sections of a 600-line schema file.

It also doubles as a synthadoc-style architecture artifact: a single-glance reference for “what does a Karpathy LLM-wiki pattern actually look like in production?” Anyone copying this pattern can use the diagram as a checklist of layers, scripts, and feedback loops — and skip the ones their use case doesn’t need.

Try It

  1. Read the diagram top-to-bottom, left-to-right. Title bar → maintenance band → six columns → QMD substrate → bin/ scripts → legend. The flow narrative at the very bottom of the legend block is the one-paragraph version.
  2. Pull the source SVG. pipeline-diagram-2026-05-20.svg is a single self-contained file (~52KB) — embed it in slides, blog posts, READMEs without external dependencies. Prior versions: pipeline-diagram-2026-05-15.svg (v2), pipeline-diagram-2026-05-09.svg (v1).
  3. Diff v2 → v3 if you want the changelog:
       • Article counts: claude-ai 130 → 161, ai-video-content 25 → 28, agents-agentic-systems 11 → 14, seo-content 8 → 22 (the 14-study AI-citation research cluster + new ai-seo/ hub), ai-marketing 14 → 15, ai-web-design 7 → 9, ai-industry-research 4 → 6, karpathy-pattern 4 → 5, connections 9 → 10, hermes-agent 4 → 5.
       • Topics shown: 18 → 23 (added the ai-seo/ hub page and the migrated-out weo-ai-governance/ stub).
       • QMD index: 463 docs / 13,023 vec → 554 / 19,152 (karpathy-wiki 298 → 388, weomarketly-wiki 113 → 166; 73MB → 98MB).
       • bin/ band: now shows all 14 scripts (added fix-stale-sources, drain-questions, update-article-count, sync-aios-wiki-stats, x-search-mcp; restored output-review).
       • COL5 deploy box: corrected from “Cloudflare Worker” to “Cloudflare Pages” (v2 had updated the live-output box but left the publish-lane label stale).
  4. Date-anchor. Snapshots are date-anchored and immutable. As the schema evolves the diagram will too — newer versions get fresh pipeline-diagram-YYYY-MM-DD.svg files; old ones stay for the historical record (immutable-by-naming, the same discipline raw/ and ai-research/ use).
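
The immutable-by-naming discipline can be sketched in a few lines. This is an illustrative sketch only, not one of the vault's bin/ scripts: snapshot_name, list_versions, and write_snapshot are hypothetical helpers, and deriving the v1/v2/v3 labels from date order is an assumption based on the three versions listed above.

```python
import re
from datetime import date
from pathlib import Path

# Matches the pipeline-diagram-YYYY-MM-DD.svg naming scheme.
SNAPSHOT_RE = re.compile(r"^pipeline-diagram-\d{4}-\d{2}-\d{2}\.svg$")


def snapshot_name(day: date) -> str:
    """Build the date-anchored filename for a snapshot taken on `day`."""
    return f"pipeline-diagram-{day.isoformat()}.svg"


def list_versions(names):
    """Return (label, filename) pairs, oldest first.

    ISO dates sort lexicographically, so a plain sort gives date order.
    Assumption: version labels simply count snapshots from the oldest (v1).
    """
    dated = sorted(n for n in names if SNAPSHOT_RE.match(n))
    return [(f"v{i}", name) for i, name in enumerate(dated, start=1)]


def write_snapshot(directory: Path, day: date, svg_text: str) -> Path:
    """Write a new snapshot, refusing to overwrite an existing one."""
    path = directory / snapshot_name(day)
    # Mode "x" raises FileExistsError if the file is already there,
    # which is what keeps old snapshots immutable.
    with path.open("x", encoding="utf-8") as fh:
        fh.write(svg_text)
    return path
```

Opening with exclusive-create mode is the design choice that matters here: a second write for the same date fails loudly instead of silently replacing a historical snapshot.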