Synthadoc — LLM-Powered Wiki Engine

Source: ai-research/synthadoc-axoviq-readme.md
Repo: github.com/axoviq-ai/synthadoc
Stars: 231 / License: AGPL-3.0 / Released: v0.3.0 on 2026-05-04
Languages: Python 85.7% / TypeScript 14.3%

A formalized engine implementation of the Karpathy LLM-wiki pattern — an open-source Python service that compiles raw documents into structured local Markdown wikis at ingest time, with explicit contradiction detection, orphan flagging, and a job-queue/audit-DB architecture. Released as v0.3.0 on 2026-05-04 by axoviq-ai. The README opens by quoting Karpathy’s gist directly: “The LLM should be able to maintain a wiki for you.”

Compared to the Stride starter vault (a minimal Obsidian template with a 4-operation CLAUDE.md), synthadoc sits at the architecturally complete engine end of the spectrum: a Python service, HTTP API, background job worker, multi-LLM provider abstraction, SQLite audit DB, and OpenTelemetry hooks.

Key Takeaways

Ingest-time synthesis, not query-time RAG. The README frames this as the project’s core differentiation: “compiles knowledge at ingest time. Every new source enriches and cross-links the entire corpus, not just appends a new chunk.” Same thesis as Karpathy’s gist and as this vault.
5-pass IngestAgent pipeline. Vision → Analysis → Candidate search (BM25) → Decision → Write. The Decision pass reads the analysis summary, the retrieved candidates, and the AGENTS.md scope, and can output flag_contradiction, which sets the page’s frontmatter status to contradicted and preserves both the old and new claims with ⚠ markers.
Frontmatter status field. active | contradicted | archived — a single field that lets Dataview, lint, and audit queries find conflicted pages without scanning bodies. This vault adopted the field on 2026-05-05 (see Karpathy wiki additions from synthadoc).
Query decomposition. search_decompose_agent.py splits compound questions into 1-N sub-questions (cap=4) and runs them in parallel. Identical pattern for web search via Tavily. Avoids the relevance-collision failure where multi-entity questions only return articles about the most-frequent entity. This vault adopted the pattern as a Query operation behavior on the same date.
AGENTS.md per wiki. A file containing “LLM instructions for this domain” prepended to ingest decision prompts. Scope-based filtering without per-agent branching. This vault adopted it as an optional per-topic file, not a single global one — see the AGENTS.md section in vault CLAUDE.md.
Hooks system. Shell commands triggered on on_ingest_complete and on_lint_complete events with a JSON payload on stdin (event, wiki, source, pages_created, pages_updated, tokens, cost_usd). Blocking or non-blocking.
Three-layer cache. Embedding + LLM response + provider prompt cache. Each layer addresses a different class of repeat cost. Less load-bearing for our setup, since we run on a Claude Code subscription, but the layering insight transfers.
Audit trail. Three artifacts: human-readable log.md, JSON-lines synthadoc.log (rotates by size, jq-filterable), and append-only audit.db SQLite. Tables: ingest_log, audit_events, queries. Optional OpenTelemetry OTLP backend for traces/metrics. This vault now generates a similar .audit.db via bin/build-audit-db from the existing .manifest.json + log.md + questions.md.
Obsidian plugin. TypeScript plugin built into the repo for native Obsidian integration alongside the CLI. Auto-generated Dataview dashboard for any new wiki.
Multi-source ingest. PDF, PPTX, XLSX, OCR images, Markdown, URLs, YouTube transcripts, Tavily web searches, plus a manifest-file batch format. Same content surface as our raw/ + ai-research/ ingest.
Multi-LLM support. 7 providers: Gemini Flash (free 1M tokens/day), Groq (free, rate-limited), Ollama (local), MiniMax, DeepSeek, Anthropic, OpenAI. Claude Code and Opencode CLI subscriptions also work as zero-API-key providers (the option their docs nudge users toward for a free tier).
Auto-resolution at ≥85% confidence. LintAgent auto-resolves contradictions above the threshold; below it the conflict stays flagged with status: contradicted for human review. This is the part of synthadoc’s discipline we did NOT adopt — every contradiction stays for human resolution in our flow.
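The status field and the confidence threshold described above can be sketched as a tiny state machine. This is an illustration under stated assumptions, not synthadoc's code: the `Page` class and both function names are hypothetical, and the choice to return a resolved page to `active` is a guess, since the README only says the conflict is auto-resolved.

```python
from dataclasses import dataclass

AUTO_RESOLVE_THRESHOLD = 0.85  # the >=85% figure quoted above


@dataclass
class Page:
    title: str
    status: str = "active"  # active | contradicted | archived


def flag_contradiction(page: Page) -> Page:
    """Decision-pass outcome: mark the page as conflicted."""
    page.status = "contradicted"
    return page


def lint_resolve(page: Page, confidence: float,
                 threshold: float = AUTO_RESOLVE_THRESHOLD) -> Page:
    """Auto-resolve only at or above the threshold; otherwise leave flagged.

    Assumption: a resolved page goes back to "active".
    """
    if page.status == "contradicted" and confidence >= threshold:
        page.status = "active"
    return page


high = lint_resolve(flag_contradiction(Page("Ingest pipeline")), confidence=0.91)
low = lint_resolve(flag_contradiction(Page("Cache layers")), confidence=0.60)
print(high.status, low.status)
```

In this vault's flow the equivalent would be a threshold that can never be met (for example `threshold=1.01`), since every contradiction stays flagged for a human.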

Why this matters for our wiki

Synthadoc’s release on 2026-05-04 is the most architecturally complete public reference for the Karpathy pattern to date. It validates several of our existing choices (ingest-time synthesis, append-only operation log, contradiction callouts, Obsidian as the IDE) and surfaces specific patterns we’d otherwise have had to invent independently. Five concrete improvements landed in this vault on 2026-05-05 directly informed by reading their docs/design.md:

status: active | contradicted | archived frontmatter field
Query decomposition behavior in the Query operation
bin/lint-stale-sources Python script (modeled on their LintAgent stale check)
bin/build-audit-db SQLite audit DB build script (modeled on their audit.db schema)
Optional per-topic AGENTS.md pattern

See Karpathy wiki additions from synthadoc for the full delta and rationale.
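The query decomposition behavior in the list above can be sketched roughly as follows. This is a minimal sketch, not the contents of search_decompose_agent.py: `decompose`, `search`, and `answer` are hypothetical names, the regex split is a naive stand-in for whatever the LLM does to produce sub-questions, and `search` fakes the retrieval call. Only the cap of 4 and the parallel execution come from the source.

```python
import re
from concurrent.futures import ThreadPoolExecutor

MAX_SUBQUESTIONS = 4  # the cap quoted for synthadoc's decomposition


def decompose(question: str, cap: int = MAX_SUBQUESTIONS) -> list[str]:
    """Naive stand-in for the LLM step: split on ';' or the word 'and'."""
    parts = [p.strip(" ?") for p in re.split(r";|\band\b", question)]
    parts = [p for p in parts if p]
    return parts[:cap] or [question]


def search(sub_question: str) -> str:
    """Hypothetical retrieval call; a real one would query the wiki or the web."""
    return f"results for: {sub_question}"


def answer(question: str) -> list[str]:
    subs = decompose(question)
    # One worker per sub-question; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=len(subs)) as pool:
        return list(pool.map(search, subs))


hits = answer("What is Synthadoc and what is the Stride starter vault?")
for hit in hits:
    print(hit)
```

Running each sub-question separately is what avoids the relevance collision: every entity gets its own retrieval pass instead of competing inside one ranked list.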

Compared to Stride starter vault

	Stride starter	Synthadoc	This vault
Form factor	Obsidian template	Python engine + plugin	Obsidian vault + Quartz site
Operations defined	4 (Ingest/Research/Query/Lint)	~10 (CLI commands)	11 (vault `CLAUDE.md`)
Contradiction handling	Mentioned in CLAUDE.md	`status: contradicted` + auto-resolve	`[!contradiction]` callout + `status: contradicted`
Audit trail	log.md only	log.md + JSON-lines + SQLite	log.md + JSON manifest + SQLite (new)
Job queue	None	Background worker, retryable	Claude Code session-bound
Multi-LLM	Implicit (whatever runs CLAUDE.md)	7 providers + 2 CLI subscriptions	Claude Code only
Live publishing	None	None	Quartz → Cloudflare Worker
Stars	9	231	private

Try It

Read their docs/design.md — the most substantive Karpathy-pattern engineering writeup public to date. Especially the IngestAgent pass-by-pass description and the audit-DB schema.
Compare AGENTS.md vs our topic _index.md — our index files are descriptive (what’s in the topic); theirs are directive (how to ingest into the topic). Decide per topic whether a directive file adds enough value to maintain.
Steal their --demo install pattern — synthadoc install history-of-computing --demo ships 13 prebuilt pages + bootstrap scaffold. We don’t currently have a way to publish a “starter” version of this vault for someone wanting to clone the pattern. Their pattern shows how.
Ignore their three-layer cache for now — embedding cache and LLM-response cache require running the model directly. We use Claude Code as the runtime, which already does provider prompt cache. Layers 1 and 2 only become relevant if we ever swap to a programmatic provider.
Watch for v0.4. v0.3.0 was released on 2026-05-04, so the project is still moving fast. The axoviq-ai/synthadoc repo is worth a quarterly check.
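If we ever did swap to a programmatic provider, the LLM-response layer of the cache could look something like the sketch below. It is an assumption-laden illustration, not synthadoc's implementation: `ResponseCache`, `call_model`, and `fake_model` are hypothetical names, and keying on a SHA-256 of the model name plus prompt is just one plausible choice.

```python
import hashlib


class ResponseCache:
    """Response-layer cache: identical (model, prompt) pairs skip the model call."""

    def __init__(self, call_model):
        self._call_model = call_model  # hypothetical callable: (model, prompt) -> str
        self._store: dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self._call_model(model, prompt)
        self._store[key] = result
        return result


calls = []


def fake_model(model: str, prompt: str) -> str:
    """Stand-in for a real provider call; records each invocation."""
    calls.append(prompt)
    return prompt.upper()


cache = ResponseCache(fake_model)
cache.complete("demo-model", "summarize page")
cache.complete("demo-model", "summarize page")  # served from the cache
print(len(calls), cache.hits)
```

An embedding cache would follow the same shape with vectors as values; the provider prompt cache is the one layer that lives on the provider's side rather than in our code.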

Open Questions

How does their auto-resolution at ≥85% confidence actually work in practice? The README mentions the threshold but docs/design.md § 4 doesn’t expose the rubric. Worth a deeper read.
What does audit_events track that’s not already in ingest_log or queries? The schema is referenced but not enumerated.
Is the Obsidian plugin shipped as a bundled .zip for community-plugin install, or only buildable from source?
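One way to start on the audit_events question is to inspect a generated audit.db directly rather than wait for the docs. The sketch below uses only the standard sqlite3 module; `describe_tables` is a hypothetical helper, and the demo runs against an in-memory database whose columns are invented for illustration, since the real schema is exactly what is unknown.

```python
import sqlite3


def describe_tables(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each user table to its column names."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    return {
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')]
        for t in tables
    }


# In-memory stand-in: table names come from the README, columns are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("CREATE TABLE ingest_log (id INTEGER PRIMARY KEY, source TEXT)")
schema = describe_tables(conn)
print(schema)
```

Against a real wiki, the connection would point at the generated file instead, wherever that wiki stores its audit.db.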

Related

joshpocock-vault — minimal Obsidian-template implementation of the same pattern (the other end of the spectrum)
from-vibe-coding-to-agentic-engineering — Karpathy’s Sequoia talk that explicitly endorses LLM knowledge bases as understanding tools
wiki-community-enhancements — broader ecosystem survey of Karpathy-pattern variants
karpathy-techniques-for-claude-code — applying Karpathy’s patterns to Claude Code specifically
karpathy-vault-additions-from-synthadoc — the specific 5 improvements this vault adopted on 2026-05-05 from reading synthadoc’s design doc

Jonathon's AI Wiki
