Source: Andrej Karpathy — From Vibe Coding to Agentic Engineering · YouTube 96jN2OCOfLs

Speaker: Andrej Karpathy · Interviewer: Stephanie Zhan (Sequoia) · Venue: Sequoia AI Ascent (the “very first special guest” of the event) · Length: ~30-minute interview · YouTube, 2026

A wide-ranging conversation framed by Karpathy’s now-famous claim that he has “never felt more behind as a programmer.” Walks through the December 2025 inflection point in agentic coding, the Software 1.0 → 2.0 → 3.0 trichotomy, the verifiability frame for which work AI automates first, the difference between vibe coding and agentic engineering, “animals vs ghosts” as a model for what LLMs are, and ends with a defense of human understanding as the irreducible bottleneck — explicitly anchored in his personal use of an LLM-built knowledge base, i.e., the Karpathy LLM-wiki pattern this vault implements.

Key Takeaways

  • December 2025 was the inflection. Karpathy describes a stark transition: agentic tools were “kind of helpful” through most of 2025, then in December “the chunks just came out fine and then I kept asking for more and it just came out fine and I can’t remember the last time I corrected it.” He stresses that practitioners who experienced AI in 2024 as a “ChatGPT-adjacent thing” need to look again post-December — “things have changed fundamentally” specifically on the “agentic coherent workflow” axis.
  • Software 1.0 / 2.0 / 3.0 trichotomy. 1.0 = explicit code; 2.0 = programming-by-data-set + neural net training; 3.0 = prompting an LLM as a general-purpose programmable computer with the context window as the lever. Software 3.0 includes work that couldn’t exist in earlier paradigms — not just “the same thing, faster.”
  • Two examples of 3.0 thinking. (a) OpenClaw install: instead of a bash script, it’s a copy-paste-able skill — “a little skill of copy paste this and give it to your agent and it will install OpenClaw” — which sidesteps platform fragmentation by handing the cross-environment work to the agent’s intelligence. Karpathy explicitly frames this as “the programming paradigm” of 3.0. (b) MenuGen redundancy: he built MenuGen (photograph a menu → get pictures of dishes) as a deployed app on Vercel; later realized you could just hand the photo to Gemini with “use Nanobanana to overlay the things onto the menu” and the entire app collapses into a single multimodal call. “All of my menu gen is spurious. It’s working in the old paradigm. That app shouldn’t exist.”
  • Software might flip completely. Karpathy speculates that the long extrapolation is toward computers where neural nets are the host process and CPUs are the co-processor — diffusion-rendered UIs unique to the moment, raw video/audio in, a neural-net “computer” doing the work, the deterministic CPU as a “historical appendage” for tool use. He notes the early-computing analogy: in the 1950s–60s it was genuinely unclear whether computers would look like calculators or neural nets; we went down the calculator path; that may now flip.
  • Verifiability is the framework for “what gets automated first.” “Traditional computers automate what you can specify in code. The latest LLMs automate what you can verify.” Frontier RL training is a giant verification-reward loop, so capability peaks in verifiable domains (math, code) and is jagged and rough elsewhere. He flags two confounds: lab focus (what data they decide is worth getting good at — the chess-improvement-from-3.5-to-4 jump was reportedly because OpenAI added chess data to pre-training) and economic value (more environments get built where they’re useful). For founders building in verifiable domains, “you can pull the lever” of fine-tuning + RL environments + diverse datasets to get something that works, even if labs aren’t focused there.
  • Jagged intelligence — the 50-meter car wash. Karpathy’s go-to example: state-of-the-art models will refactor a 100,000-line codebase or find zero-day vulnerabilities, and simultaneously tell you to walk to a 50-meter-away car wash. “How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line codebase or find zero-day vulnerabilities and yet tells me to walk to this car wash? This is insane.” Implication for users: “you need to actually be in the loop a little bit and treat them as tools and stay in touch with what they’re doing.”
  • Vibe Coding vs Agentic Engineering — load-bearing distinction. Vibe coding raises the floor — “everyone can vibe code anything.” Agentic engineering preserves the quality bar that already existed in professional software: same security guarantees, same correctness bar, but using agents as the primary executor. The 10× engineer trope is too small — “people who are very good at this peak a lot more than 10×.”
  • Hiring should change. The puzzle-interview format is the old paradigm. Karpathy’s proposed agentic-engineering interview: “Give me a really big project and see someone implement that big project — let’s say a Twitter clone for agents, make it really good, make it really secure, then have some agents simulate activity. I’m going to use 10 Codex 5.4 xhigh to try to break your website. They should not be able to break it.” The skills being assessed are tool-setup investment, agent orchestration, and adversarial robustness under simulated agent traffic.
  • What humans contribute (today). Taste, aesthetics, judgment, oversight, design, and spec — not API details. He’s “already forgotten” PyTorch vs NumPy specifics like keepdim vs keepdims vs dim vs axis because “the intern handles it.” But the human still has to know the underlying model — view-vs-storage in tensors, persistent user IDs vs email-as-key, etc. The agent-as-intern framing: it gets you the recall, but you supply the design constraints. Code that LLMs produce is “very bloaty, lots of copy-paste, awkward brittle abstractions” — it works but it’s “really gross.” Aesthetics aren’t yet in the RL.
  • Animals vs Ghosts. Karpathy’s framing for what LLMs are: not animal intelligences (“if you yell at them, they’re not going to work better or worse”), but “statistical simulation circuits” — pre-training as substrate, RL bolting capabilities on top. He admits the framing is “a little bit of philosophizing” and may not have “real power,” but it disciplines the user against anthropomorphizing.
  • Agent-native infrastructure is the open opportunity. Karpathy’s pet peeve: “every time I’m told ‘go to this URL or something’ — like, I don’t want to do anything, what is the thing I should copy paste to my agent?” Most software, docs, and deploy flows are still written for humans. The frontier is decomposing workloads into sensors-over-the-world / actuators-over-the-world primitives, agent-first description, LLM-legible data structures. Eventually: “I’ll have my agent talk to your agent to figure out the details of our meetings.”
  • Closing — the irreducible bottleneck is understanding. Quoting a tweet that “blew his mind” recently: “You can outsource your thinking but you can’t outsource your understanding.” Karpathy explicitly anchors this in his use of an LLM-built knowledge base — “this is one reason I was very excited about all the LM knowledge bases because I feel like that’s a way for me to process information… I really enjoy whenever I read an article I have my wiki that’s being built up from these articles and I love asking questions about things” — describing LLM knowledge bases as “tools to enhance understanding” and noting that the bottleneck is going to be “what are we trying to build, why is it worth doing, how do I direct my agents.” This is the LLM-wiki pattern that this vault implements.
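The verifiability takeaway above can be made concrete with a toy sketch: RL training rewards a candidate only when a programmatic verifier accepts it, which is why capability concentrates in domains where such verifiers exist. Everything below (the `verifier` function, the sample candidates, the unit-test reward) is an illustrative assumption, not anything from the talk.

```python
# Toy sketch of "LLMs automate what you can verify": a candidate solution
# earns reward only when a programmatic verifier accepts it. All names
# here are illustrative, not from the talk.

def verifier(candidate: str) -> bool:
    """A verifiable domain: does the candidate code pass its unit test?"""
    namespace = {}
    try:
        exec(candidate, namespace)          # run the candidate definition
        return namespace["add"](2, 3) == 5  # the unit test acts as the reward signal
    except Exception:
        return False

def reward(candidates: list[str]) -> list[int]:
    """RL-style binary reward: 1 if the verifier accepts, else 0."""
    return [1 if verifier(c) else 0 for c in candidates]

# Two hypothetical model samples: one correct, one buggy.
samples = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]
print(reward(samples))  # → [1, 0]
```

In prose-like domains no cheap `verifier` exists, so this loop cannot run — which is the structural reason the capability surface is jagged.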

Notable Quotes

  • “You can outsource your thinking but you can’t outsource your understanding.” — Karpathy, citing a tweet, framing the closing reflection on what humans still uniquely contribute.
  • “All of my menu gen is spurious. It’s working in the old paradigm. That app shouldn’t exist.” — on the MenuGen-vs-Nano-Banana realization.
  • “How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line codebase or find zero-day vulnerabilities and yet tells me to walk to this car wash? This is insane.” — anchoring the jagged-intelligence frame.
  • “Vibe coding is about raising the floor for everyone. Agentic engineering is about preserving the quality bar that existed in professional software.”
  • “I’ll have my agent talk to your agent to figure out the details of our meetings.” — on agent-native infrastructure as the long arc.

Why This Matters for the Wiki

This talk closes a loop the vault has been documenting from the third-party side. Karpathy is publicly, in 2026, confirming and personally using the LLM-wiki / knowledge-base pattern this vault implements.

It also independently sources several frames the wiki has been operating with:

  • The OpenClaw “install via copy-paste-to-agent skill” anecdote is exactly the Agent Skills format thesis at the platform level.
  • The “Software 3.0” framing rhymes with the spec-prompt vs vibe-prompt taxonomy already established for design work.
  • Verifiability + RL-environment availability ties directly into Opus 4.7 best practices and the broader effort-tier discussion.

Try It

  1. Use this talk as the strategic deck for an internal AI-engineering rollout. Pair it with Agent Skills overview for the implementation layer.
  2. Adopt the OpenClaw install pattern in your own tooling. Whenever you’re tempted to write a multi-platform shell script, ask: what’s the piece of text to copy-paste to an agent that would do this? Often the agent-spec answer is shorter, more portable, and self-debugging.
  3. Re-evaluate live products against the MenuGen test. For each “AI-flavored” product in your stack, ask: does the latest multimodal model collapse the whole pipeline into a single call? If yes, the product is in the old paradigm.
  4. Adopt Karpathy’s hiring rubric for agentic engineers. Replace puzzle interviews with adversarial big-project challenges (build a Twitter clone for agents; survive a high-effort red-team).
  5. For knowledge work, build a wiki the LLM maintains. This is the explicit endorsement Karpathy gives in the talk, and the pattern this vault implements — see Karpathy Pattern for community implementations.
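Step 5 above can be sketched as a minimal ingestion loop: each article you read is summarized by an LLM and appended to a per-topic wiki page you can later query. The `summarize` stub below stands in for a real LLM call, and every name and path is hypothetical, not something specified in the talk.

```python
# Minimal sketch of the "wiki the LLM maintains" pattern from step 5.
# `summarize` is a stub standing in for a real LLM call; all names and
# paths are hypothetical assumptions.
from pathlib import Path

def summarize(article_text: str) -> str:
    """Placeholder for an LLM summarization call."""
    first_line = article_text.strip().splitlines()[0]
    return f"- {first_line[:80]}"

def ingest(article_text: str, topic: str, wiki_dir: Path) -> Path:
    """Append an LLM-written summary to the topic's wiki page."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    page = wiki_dir / f"{topic}.md"
    with page.open("a", encoding="utf-8") as f:
        f.write(summarize(article_text) + "\n")
    return page

# Usage: each article read grows the wiki, which you can then ask questions about.
page = ingest("Agentic coding crossed a threshold in December.",
              "agentic-engineering", Path("wiki"))
print(page.read_text(encoding="utf-8"))
```

The append-only page is the simplest possible store; a fuller implementation would have the LLM merge new material into existing sections rather than just appending.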

Open Questions

  • AutoResearch positioning vs this talk’s framing. The talk frames a vibe-coding → agentic-engineering progression; Karpathy’s open-sourced AutoResearch project (covered in the Thu Vu walkthrough) goes a step further with the human as research advisor. Worth tracking whether Karpathy talks about this progression as a third tier beyond agentic engineering.
  • The unspoken “very valuable RL environment.” Karpathy explicitly hints “I don’t want to give away the answer, but there is one domain that I think is very [interesting]” — flagged for follow-up in case he names it later.
  • Talk publish date. YouTube fetch did not surface the exact upload date in this pass. Mentions of Opus 4.7 and Karpathy’s December reference suggest a 2026-Q1/Q2 recording.
  • AI Ascent context. Worth ingesting other AI Ascent talks if they surface — Sequoia’s series tends to align with concrete strategic frames usable in Intermediate Course modules.