Source: wiki synthesis: How I Use LLMs, From Vibe Coding to Agentic Engineering, Karpathy’s LLM-Wiki Techniques for Claude Code, Anthropic’s Best Practices for Claude Code
Karpathy’s two pieces describe how to work with LLMs in deliberately tool-agnostic terms: a “tool ladder” of delegation and verification (How I Use LLMs) and the maturation from vibe coding to agentic engineering (his Sequoia talk). Anthropic’s Claude Code is one place where that framework stops being abstract and becomes a specific surface: /clear, the four-phase plan-mode workflow, verification-as-highest-leverage, CLAUDE.md memory, subagents, and the LLM-wiki pattern this very vault runs on. Read side by side, the concept and the concrete feature are frequently the same claim at two altitudes. Where the practitioner and the vendor derive the same rule independently (both land on “context window = working memory that degrades as it fills”), that convergence is the strongest signal that the practice is real and not house style. This article maps Karpathy’s framework onto that surface point by point.
Key Takeaways
- Same root constraint, two authors. Karpathy: “context window = working memory; start fresh chats aggressively.” Anthropic: “context window = the most important resource to manage; performance degrades as it fills.” Claude Code’s /clear, together with the “kitchen sink session” failure pattern, is Karpathy’s fresh-chat-per-task rule turned into a command.
- Verifiability is the sharpest match. Karpathy’s frame (“the latest LLMs automate what you can verify”) and Anthropic’s operating instruction (“include tests / screenshots / expected outputs so Claude can check itself; this is the single highest-leverage thing you can do”) are the same claim as theory and as practice.
- Delegation beats prompt magic → CLAUDE.md + skills. Karpathy’s “save reusable setups (custom instructions, memory, custom GPTs)” is realized as CLAUDE.md loaded every session and skills loaded on demand; the karpathy-techniques article shows that the CLAUDE.md schema is his reusable-setup idea.
- Jagged intelligence → stay in the loop. The car-wash example (“treat them as tools and stay in touch with what they’re doing”) is exactly what Anthropic’s five named failure patterns operationalize: kitchen-sink, correcting-over-and-over, over-specified CLAUDE.md, trust-then-verify gap, infinite exploration.
- Vibe coding vs agentic engineering → plan mode + permissions. Vibe coding raises the floor; agentic engineering preserves the quality bar (same security, same correctness, agents as the primary executor). Claude Code’s plan-before-code phase, permission modes (auto / allowlist / sandboxing), and the mandatory verification loop are the machinery that preserves that bar.
- Understanding is the bottleneck → the LLM-wiki. Karpathy’s closing — “you can outsource your thinking but you can’t outsource your understanding,” anchored in his LLM-built knowledge base — is the pattern the karpathy-techniques article documents as a Claude Code workflow, and the one this vault implements.
The map: concept → surface
Each row is a Karpathy abstraction and the Claude Code feature it becomes concrete in. The pairing is this article’s synthesis; every cell traces to the cited source. ^[inferred]
| Karpathy’s framework | Claude Code surface | Where |
|---|---|---|
| Fresh chat per task; context = working memory | /clear + the 8-tool context inventory; “kitchen sink” failure pattern | how-i-use-llms ↔ best-practices |
| Delegation > prompt magic; save reusable setups | CLAUDE.md every session + skills loaded on demand | how-i-use-llms ↔ best-practices / karpathy-techniques |
| “Automate what you can verify” | “Verification = single highest-leverage thing”; tests / screenshots in the prompt | from-vibe-coding ↔ best-practices |
| Choose the model tier intentionally | Explicit per-session model choice | how-i-use-llms ↔ best-practices ^[inferred] |
| Jagged intelligence → stay in the loop | The 5 named failure patterns + plan mode | from-vibe-coding ↔ best-practices |
| Vibe coding vs agentic engineering | Plan-before-code, permission modes, verification gate | from-vibe-coding ↔ best-practices |
| Delegate to keep working memory clean | Subagents (separate context, return only a summary) | how-i-use-llms ↔ best-practices |
| Understanding is the bottleneck; LLM knowledge bases | The LLM-wiki pattern in Claude Code (this vault) | from-vibe-coding ↔ karpathy-techniques |
The two convergences worth trusting
Most of the mapping is one author being abstract and the other concrete. Two rows are stronger than that — the practitioner and the vendor reached the same rule from different directions, which is what makes them load-bearing rather than stylistic. ^[inferred]
- Context as working memory. Karpathy reached “start fresh chats aggressively; old tokens distract, cost money, degrade quality” as a personal habit. Anthropic reached “context window = the most important resource; performance degrades as it fills” as the root cause behind most failure patterns. The remedy is identical: reach for /clear early (the doc’s rule: “/clear is admission you waited too long”), and after two failed corrections, clear and re-prompt with what you learned. Two people deriving the same constraint separately is not branding.
- Verification. Karpathy frames verifiability as the economic predictor of what AI automates first: frontier RL training is “a giant verification-reward loop,” so capability peaks in verifiable domains (math, code) and is jagged elsewhere. Anthropic frames verification as the operator’s single highest-leverage move: give Claude a test, a screenshot target, or an expected output so it can check itself. Same coin: the domain-level “why it works” and the session-level “what to do about it.” This is the verification-frontier thesis stated at the workflow layer. ^[inferred]
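The session-level half of the verification convergence can be sketched as a prompt builder that refuses to emit a task without a check attached. This is an illustration only: `build_prompt` and its parameters are hypothetical names, not part of Claude Code, and the three kinds of check are the ones named above (a test, a screenshot target, an expected output).

```python
def build_prompt(task: str, *, test: str = "", screenshot: str = "", expected: str = "") -> str:
    """Return a task prompt that always carries at least one verification target.

    Hypothetical helper for illustration; nothing here is a Claude Code API.
    """
    checks = []
    if test:
        checks.append(f"Run this test and make it pass: {test}")
    if screenshot:
        checks.append(f"Compare the result against this screenshot: {screenshot}")
    if expected:
        checks.append(f"The output must equal: {expected}")
    if not checks:
        # Without a check, the human is the only feedback loop.
        raise ValueError("no verification target: add a test, screenshot, or expected output")

    lines = [task.strip(), "", "Verify your work before reporting done:"]
    lines += [f"- {check}" for check in checks]
    return "\n".join(lines)
```

Calling `build_prompt("Fix the date parser", test="pytest tests/test_dates.py")` yields a prompt with the check appended, while calling it with no check raises `ValueError`. The test path is a made-up example.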
Vibe coding → agentic engineering, as a Claude Code configuration
The maturation Karpathy names is not a philosophy in Claude Code — it is a set of toggles:
- Raising the floor = anyone can prompt Claude to build something; the vibe-coding win is real and needs no configuration.
- Preserving the bar = plan mode (explore → plan → code → commit) + permission allowlists / sandboxing (the “same security guarantees” clause) + a verification loop + a pruned CLAUDE.md (Anthropic: “if removing a line wouldn’t cause a mistake, cut it”). That stack is agentic engineering made operational.
- The karpathy-techniques article’s four community-packaged operating modes (Think First / Simplicity First / Surgical Changes / Goal-Driven Execution, distilled from Karpathy’s own tweet into the forrestchang/andrej-karpathy-skills CLAUDE.md) are one file-shaped instance of the same “preserve the quality bar” instinct: think before coding, keep changes minimal and surgical, and make every action define its own success criterion (on “fix the bug,” rewrite the task as “write a failing test, make it pass”).
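The Goal-Driven Execution rewrite (“write a failing test, make it pass”) can be illustrated with a toy case. Everything here is hypothetical: `slugify` stands in for whatever function has the bug, and the test is the success criterion handed to the agent alongside the task.

```python
import re


def slugify(title: str) -> str:
    """Lowercase a title and join its alphanumeric runs with single hyphens."""
    # The (hypothetical) buggy version was: title.lower().replace(" ", "-"),
    # which kept punctuation and produced doubled hyphens on repeated spaces.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_slugify_collapses_spaces_and_drops_punctuation() -> None:
    # Written first, against the buggy version, where it failed.
    # "Make this pass" is then the whole task definition.
    assert slugify("Hello,  World!") == "hello-world"
    assert slugify("  Plan   Mode ") == "plan-mode"


test_slugify_collapses_spaces_and_drops_punctuation()
```

The point of the ordering is that the agent can tell on its own when it is finished: the test either passes or it does not.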
Not the same as Karpathy-Pattern Third-Party Adoption. That article maps who cloned the LLM-wiki knowledge-base pattern (Stride, synthadoc, arscontexta, the AIOS bundles). This one maps his coding workflow onto Claude Code’s feature surface — the delegation-and-verification craft, not the second-brain schema. They share the “understanding bottleneck” endpoint (the wiki is Karpathy’s own answer to it) but travel there from opposite ends. ^[inferred]
Try It
- Adopt fresh-context discipline as a hard rule. Run /clear between unrelated tasks; if you’ve corrected twice, /clear and re-prompt incorporating what you learned. It is Karpathy’s cheapest quality-and-cost win, as a keystroke.
- Put verification in the initial prompt every time: a test, a screenshot target, or an expected output. Anthropic calls this the single highest-leverage move, and Karpathy’s verifiability frame explains why it works; without it you are the only feedback loop.
- Treat CLAUDE.md as your saved reusable setup, and prune it like code. It is the concrete form of “delegation beats prompt magic,” and an over-stuffed one is Anthropic’s third named failure pattern, where the load-bearing rule gets lost in noise.
- Use subagents for any investigation touching 5+ files. The delegate-and-keep-working-memory-clean move is what lets you stay in the loop on jagged tasks instead of watching Claude read hundreds of files.
- Run the wiki. Karpathy’s own answer to the understanding bottleneck is an LLM-maintained knowledge base — the karpathy-techniques workflow, which this vault is.
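The first four habits above reduce to a few thresholds, which can be sketched as a pre-flight check. This is a toy under stated assumptions: `TaskState` and `preflight` are hypothetical names, the numbers (two corrections, five files) are the ones this list uses, and none of it is a Claude Code feature.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical snapshot of a session, filled in by the operator."""
    unrelated_to_previous: bool  # is this a new, unrelated task?
    failed_corrections: int      # corrections already tried on this attempt
    files_to_investigate: int    # rough count of files the investigation touches
    has_verification: bool       # does the prompt carry a test / screenshot / expected output?


def preflight(state: TaskState) -> list[str]:
    """Return the habits from the list above that apply to this task."""
    advice = []
    if state.unrelated_to_previous or state.failed_corrections >= 2:
        advice.append("/clear, then re-prompt with what you learned")
    if state.files_to_investigate >= 5:
        advice.append("delegate the investigation to a subagent")
    if not state.has_verification:
        advice.append("add a test, screenshot target, or expected output")
    return advice
```

For example, `preflight(TaskState(False, 2, 7, False))` returns all three pieces of advice, while a fresh, small, verified task returns an empty list.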
Related
- How I Use LLMs — the tool ladder, delegation, and fresh-chat discipline the workflow rows map from.
- From Vibe Coding to Agentic Engineering — verifiability, jagged intelligence, and the vibe→agentic maturation.
- Karpathy’s LLM-Wiki Techniques for Claude Code — the reusable-setup / four-operating-modes packaging and the understanding-bottleneck answer.
- Anthropic’s Best Practices for Claude Code — the concrete feature surface every row lands in.
- Claude Code Subagents — the delegate-to-keep-context-clean lever.
- The Verification Frontier — the domain-level version of the verification convergence above.
- Karpathy-Pattern Third-Party Adoption — the sibling article on the wiki pattern, differentiated above.
- Karpathy Pattern — community implementations of the LLM-wiki this workflow’s endpoint relies on.
Open Questions
- Where the model-tier row breaks down. Karpathy’s “choose the tier intentionally” maps to Claude Code’s model selection, but the specific reasoning-effort knob (low → max) isn’t covered in these four sources — the mapping is directional, not verified feature-for-feature. ^[inferred]
- Does the LLM-council habit have a first-party Claude Code form? Karpathy runs one question across ChatGPT / Claude / Gemini / Grok; Anthropic’s Writer/Reviewer pattern is a single-provider cross-check (fresh-context reviewer). Whether the multi-provider council has a native Claude Code analog is unaddressed. ^[inferred]