Source: wiki synthesis: How I Use LLMs, From Vibe Coding to Agentic Engineering, Karpathy’s LLM-Wiki Techniques for Claude Code, Anthropic’s Best Practices for Claude Code

Karpathy’s two pieces describe how to work with LLMs in deliberately tool-agnostic terms: a “tool ladder” of delegation and verification (How I Use LLMs) and the maturation from vibe coding to agentic engineering (From Vibe Coding to Agentic Engineering, his Sequoia talk). Anthropic’s Claude Code is one place where that framework stops being abstract and becomes a specific surface: /clear, the four-phase plan-mode workflow, verification-as-highest-leverage, CLAUDE.md memory, subagents, and the LLM-wiki pattern this very vault runs on. Read side by side, the concept and the concrete feature are frequently the same claim at two altitudes. Where the practitioner and the vendor derive the same rule independently (both landing on “context window = working memory that degrades as it fills”), that convergence is the strongest signal that the practice is real and not house style. This article maps Karpathy’s framework onto Claude Code point by point.

Key Takeaways

  • Same root constraint, two authors. Karpathy: “context window = working memory; start fresh chats aggressively.” Anthropic: “context window = the most important resource to manage; performance degrades as it fills.” Claude Code’s /clear — and the “kitchen sink session” failure pattern — is Karpathy’s fresh-chat-per-task rule turned into a command.
  • Verifiability is the sharpest match. Karpathy’s frame — “the latest LLMs automate what you can verify” — and Anthropic’s operating instruction — “include tests / screenshots / expected outputs so Claude can check itself; this is the single highest-leverage thing you can do” — are the same claim as theory and as practice.
  • Delegation beats prompt magic → CLAUDE.md + skills. Karpathy’s “save reusable setups (custom instructions, memory, custom GPTs)” is realized as CLAUDE.md loaded every session and skills loaded on demand; the karpathy-techniques article shows the CLAUDE.md schema is his reusable-setup idea.
  • Jagged intelligence → stay in the loop. The car-wash example (“treat them as tools and stay in touch with what they’re doing”) is exactly what Anthropic’s five named failure patterns operationalize — kitchen-sink, correcting-over-and-over, over-specified CLAUDE.md, trust-then-verify gap, infinite exploration.
  • Vibe coding vs agentic engineering → plan mode + permissions. Vibe coding raises the floor; agentic engineering preserves the quality bar (same security, same correctness, agents as the primary executor). Claude Code’s plan-before-code phase, permission modes (auto / allowlist / sandboxing), and the mandatory verification loop are the machinery that preserves that bar.
  • Understanding is the bottleneck → the LLM-wiki. Karpathy’s closing — “you can outsource your thinking but you can’t outsource your understanding,” anchored in his LLM-built knowledge base — is the pattern the karpathy-techniques article documents as a Claude Code workflow, and the one this vault implements.

The map: concept → surface

Each row is a Karpathy abstraction and the Claude Code feature it becomes concrete in. The pairing is this article’s synthesis; every cell traces to the cited source. ^[inferred]

| Karpathy’s framework | Claude Code surface | Where |
| --- | --- | --- |
| Fresh chat per task; context = working memory | /clear + the 8-tool context inventory; “kitchen sink” failure pattern | how-i-use-llms ↔ best-practices |
| Delegation > prompt magic; save reusable setups | CLAUDE.md every session + skills loaded on demand | how-i-use-llms ↔ best-practices / karpathy-techniques |
| “Automate what you can verify” | “Verification = single highest-leverage thing”; tests / screenshots in the prompt | from-vibe-coding ↔ best-practices |
| Choose the model tier intentionally | Explicit per-session model choice | how-i-use-llms ↔ best-practices ^[inferred] |
| Jagged intelligence → stay in the loop | The 5 named failure patterns + plan mode | from-vibe-coding ↔ best-practices |
| Vibe coding vs agentic engineering | Plan-before-code, permission modes, verification gate | from-vibe-coding ↔ best-practices |
| Delegate to keep working memory clean | Subagents (separate context, return only a summary) | how-i-use-llms ↔ best-practices |
| Understanding is the bottleneck; LLM knowledge bases | The LLM-wiki pattern in Claude Code (this vault) | from-vibe-coding ↔ karpathy-techniques |

The two convergences worth trusting

Most of the mapping is one author being abstract and the other concrete. Two rows are stronger than that — the practitioner and the vendor reached the same rule from different directions, which is what makes them load-bearing rather than stylistic. ^[inferred]

  • Context as working memory. Karpathy reached “start fresh chats aggressively; old tokens distract, cost money, degrade quality” as a personal habit. Anthropic reached “context window = the most important resource; performance degrades as it fills” as the root cause behind most failure patterns. The remedy is identical: reach for /clear early (the best-practices doc’s rule: “/clear is admission you waited too long”), and after two failed corrections, clear and re-prompt with what you learned. A practitioner and a vendor deriving the same constraint separately is not branding.
  • Verification. Karpathy frames verifiability as the economic predictor of what AI automates first: frontier RL training is “a giant verification-reward loop,” so capability peaks in verifiable domains (math, code) and is jagged elsewhere. Anthropic frames verification as the operator’s single highest-leverage move: give Claude a test, a screenshot target, or an expected output so it can check itself. Same coin — the domain-level “why it works” and the session-level “what to do about it.” This is the verification-frontier thesis stated at the workflow layer. ^[inferred]
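The two convergences compose into one loop, which can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anything either source ships: `run_with_verification`, `Outcome`, and the `attempt` / `verify` callables are hypothetical names, and `attempt` stands in for whatever sends a prompt to the model. The only rules taken from the sources are that a check runs on every result and that two failed corrections mean clearing the context rather than correcting a third time.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    passed: bool
    attempts: int
    action: str  # "done" or "clear_and_reprompt"
    last_feedback: Optional[str] = None


def run_with_verification(
    attempt: Callable[[Optional[str]], str],   # hypothetical: produces a result, given prior feedback
    verify: Callable[[str], Optional[str]],    # returns None on success, else a failure message
    max_corrections: int = 2,                  # the "two failed corrections" rule
) -> Outcome:
    feedback: Optional[str] = None
    total_tries = 1 + max_corrections  # first attempt plus allowed corrections
    for n in range(1, total_tries + 1):
        result = attempt(feedback)
        feedback = verify(result)
        if feedback is None:
            return Outcome(True, n, "done")
    # Corrections are exhausted: stop patching the same context and start fresh.
    return Outcome(False, total_tries, "clear_and_reprompt", feedback)


def verify_sum(result: str) -> Optional[str]:
    return None if result == "4" else f"expected 4, got {result}"


drafts = iter(["5", "3", "4"])
outcome = run_with_verification(lambda feedback: next(drafts), verify_sum)
print(outcome.action, outcome.attempts)  # prints: done 3
```

The design point is that `verify` is supplied before the first attempt, so the loop has a feedback source other than the human, and the correction budget is a parameter rather than a judgment made mid-session.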

Vibe coding → agentic engineering, as a Claude Code configuration

The maturation Karpathy names is not a philosophy in Claude Code — it is a set of toggles:

  • Raising the floor = anyone can prompt Claude to build something; the vibe-coding win is real and needs no configuration.
  • Preserving the bar = plan mode (explore → plan → code → commit) + permission allowlists / sandboxing (the “same security guarantees” clause) + a verification loop + a pruned CLAUDE.md (Anthropic: “if removing a line wouldn’t cause a mistake, cut it”). That stack is agentic engineering made operational.
  • The karpathy-techniques article’s four community-packaged operating modes — Think First / Simplicity First / Surgical Changes / Goal-Driven Execution, distilled from Karpathy’s own tweet into the forrestchang/andrej-karpathy-skills CLAUDE.md — are one file-shaped instance of the same “preserve the quality bar” instinct: think before coding, keep changes minimal and surgical, and make every action define its own success criterion (on “fix the bug,” rewrite the task as “write a failing test, make it pass”).
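The “write a failing test, make it pass” rewrite can be made concrete with a small Python sketch. The bug, the function names (`slugify_buggy`, `slugify_fixed`, `check_slugify`), and the input string are all invented for illustration; none of them come from the cited sources. The point is only the order of operations: the success criterion exists, and fails, before the fix is written.

```python
from typing import Callable


def slugify_buggy(title: str) -> str:
    # Bug: every space becomes a hyphen, so runs of spaces become runs of hyphens.
    return title.lower().replace(" ", "-")


def check_slugify(slugify: Callable[[str], str]) -> bool:
    """Success criterion, written before the fix: repeated spaces collapse to one hyphen."""
    return slugify("Hello   World") == "hello-world"


def slugify_fixed(title: str) -> str:
    # Fix: split on any whitespace run, then join with single hyphens.
    return "-".join(title.lower().split())


print(check_slugify(slugify_buggy))  # prints: False (the failing test defines the task)
print(check_slugify(slugify_fixed))  # prints: True  (the task is done when this passes)
```

Handing the agent `check_slugify` alongside “fix the bug” turns an open-ended request into one with a checkable end state.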

Not the same as Karpathy-Pattern Third-Party Adoption. That article maps who cloned the LLM-wiki knowledge-base pattern (Stride, synthadoc, arscontexta, the AIOS bundles). This one maps his coding workflow onto Claude Code’s feature surface — the delegation-and-verification craft, not the second-brain schema. They share the “understanding bottleneck” endpoint (the wiki is Karpathy’s own answer to it) but travel there from opposite ends. ^[inferred]

Try It

  1. Adopt fresh-context discipline as a hard rule. /clear between unrelated tasks; if you’ve corrected twice, /clear and re-prompt incorporating what you learned. It is Karpathy’s cheapest quality-and-cost win, as a keystroke.
  2. Put verification in the initial prompt every time: a test, a screenshot target, or an expected output. Anthropic calls this the single highest-leverage move, and Karpathy’s verifiability frame explains why it works; without it you are the only feedback loop.
  3. Treat CLAUDE.md as your saved reusable setup, and prune it like code. It is the concrete form of “delegation beats prompt magic” — and an over-stuffed one is Anthropic’s third named failure pattern, where the load-bearing rule gets lost in noise.
  4. Use subagents for any investigation touching 5+ files. The delegate-and-keep-working-memory-clean move is what lets you stay in the loop on jagged tasks instead of watching Claude read hundreds of files.
  5. Run the wiki. Karpathy’s own answer to the understanding bottleneck is an LLM-maintained knowledge base — the karpathy-techniques workflow, which this vault is.
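Steps 2 and 4 are mechanical enough to encode as a pre-flight check. The following Python sketch assumes nothing about Claude Code’s own interfaces: `build_prompt`, `should_delegate`, and `SUBAGENT_FILE_THRESHOLD` are hypothetical helpers, the test command in the usage line is a made-up example, and the only values taken from the text are the requirement for a verification target and the 5-file threshold.

```python
SUBAGENT_FILE_THRESHOLD = 5  # from step 4: delegate investigations touching 5+ files


def build_prompt(task: str, *, test_command: str = "", expected_output: str = "") -> str:
    """Refuse to build a prompt that gives the model no way to check itself."""
    if not (test_command or expected_output):
        raise ValueError("add a test command or an expected output before sending")
    lines = [task.strip(), "", "Verify your work before reporting done:"]
    if test_command:
        lines.append(f"- Run `{test_command}` and fix any failures.")
    if expected_output:
        lines.append(f"- The result must match: {expected_output}")
    return "\n".join(lines)


def should_delegate(files_to_read: int) -> bool:
    """True when an investigation is large enough to hand to a subagent."""
    return files_to_read >= SUBAGENT_FILE_THRESHOLD


# Hypothetical usage: the task and test path are illustrative only.
prompt = build_prompt("Fix the login redirect bug.", test_command="pytest tests/test_login.py")
print(prompt)
print(should_delegate(7))
```

Raising on a missing verification target is a deliberately strict choice; a softer version could warn instead, but the strict form matches the “every time” wording of step 2.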

Open Questions

  • Where the model-tier row breaks down. Karpathy’s “choose the tier intentionally” maps to Claude Code’s model selection, but the specific reasoning-effort knob (low → max) isn’t covered in these four sources — the mapping is directional, not verified feature-for-feature. ^[inferred]
  • Does the LLM-council habit have a first-party Claude Code form? Karpathy runs one question across ChatGPT / Claude / Gemini / Grok; Anthropic’s Writer/Reviewer pattern is a single-provider cross-check (fresh-context reviewer). Whether the multi-provider council has a native Claude Code analog is unaddressed. ^[inferred]