Source: raw/reflecting-on-a-year-of-claude-code-transcript.md (YouTube Hth_tLaC2j8, official Claude channel, ~18m, published 2026-06-08; surfaced via @bcherny + @ClaudeDevs)
Type: Video / fireside
Speakers: Boris Cherny (Claude Code creator) & Cat Wu (Claude Code PM), Anthropic
Duration: ~18 min
A first-party retrospective from the two people who built Claude Code, one year after its GA. It’s the clearest official statement to date of how the Claude Code team actually works now — and the practices they’re betting the broader ecosystem moves toward: write fixes back into CLAUDE.md/skills instead of re-correcting inline, treat verification as “can the agent run the thing,” default to auto mode over plan mode, run routines that proactively fix bugs, and operate at the scale of dozens-to-thousands of agents. The throughline echoes Karpathy’s “agentic engineering”: you stop writing code, then stop talking to the agent, and start talking to a loop.
Key Takeaways
- When Claude makes a mistake, don’t correct it inline — encode the fix. Boris: “every single time Claude makes a mistake, I don’t tell Claude to do it differently. I tell it to write it to the
CLAUDE.mdor to make a skill… And if you can do this, then Claude can just run forever.” This is the single most-emphasized idea in the talk. - Verification is the misunderstood unlock — and it’s not unit tests/lint. Those were already automated. Agent verification means “can the agent run the thing?” — spin up the real app/service and check end-to-end. Examples: Opus 4 testing its own feature in a Claude CLI via bash; iOS/Android simulators; computer-use on the desktop app.
- Auto mode has replaced plan mode. Newer models (4.6, definitely 4.7) “don’t need that planning step anymore.” Boris runs auto mode for everything so he can start a Claude and immediately move to the next one.
- Auto mode is framed as more safe than per-prompt approval. Routing each tool request to a separate classifier model beats human approval, because when you accept 99% of prompts “your eyes just glaze over.” You only see the things that matter.
- Routines are the first obvious application of the Agent SDK. A teammate’s routine listens for every ticket/GitHub issue/bug report about a feature, proactively puts up a fix, and pings the PR. “Claude tells me all the time now that someone else’s Claude has already fixed it.”
- Roles are merging. PMs, designers, finance, and data science all work in Claude Code; engineers ship products end-to-end (idea → build → legal/marketing/security). “Everyone’s going to be both.”
- Put Claude at the center of every process, not on the side. The HBR computer-productivity-paradox analogy: gains came only when companies threw out the filing cabinet and ran everything through the computer. At Anthropic, onboarding means “ask Claude, not people.”
- Operate multi-agent — increasingly from a phone. Boris does “probably half my engineering now on my phone,” starting agents via remote control and voice mode; the new agent view + desktop app (auto work-tree cloning) replaced his six-terminal-tabs workflow.
- Be a context minimalist. With today’s models you skip prompt/context engineering: give the minimal system prompt + minimal tools and let the model pull in what it needs. A leaner harness leaves more room for your own prompts.
The Two Leaps
Boris frames a year and a half as two big jumps in what an engineer interacts with:
- Source code → agent. “I don’t write the source code, I talk to an agent and the agent writes the source code for me.”
- Agent → loop/routine. “I don’t talk to an agent anymore. I talk to a loop or I talk to a routine and it prompts Claude for me.” (See [[claude-ai/scheduled-tasks|
/loop]] and Routines.)
The team expects a third shift is already underway — agents running longer and more autonomously, “dozens or hundreds or thousands” at a time — with a form factor nobody has figured out yet.
Verification (the part everyone misunderstands)
- Not unit tests, lint, or type-checks — those are trivially automatable and already done.
- Agent verification = can the agent actually run and exercise the thing it built. Often non-obvious to wire up; that’s the real work.
- Cat’s desktop-development skill example: a teammate added a skill that teaches Claude to run the local desktop app; Claude uses computer use to click around, invoke new UX, test edge cases, fix issues and re-check. When staging breaks, Claude reads Slack to learn whether staging is down or someone already hit the issue — then updates the skill after debugging.
- This is why the team trusts agents to run unattended: verification + security together let you “move on and just have a second agent.”
Auto Mode + the Security Model
- The old permission-prompt model (approve every tool call) was the best available 1.5 years ago, before good classifiers and well-aligned models.
- Auto mode routes each request to a separate model that checks for safety and denies anything “a little sus”; you can allow it later. (See plan mode — the artifact-producing alternative some still prefer.)
- How they earned trust to ship it: collected thousands of full agent-trajectory transcripts + permission prompts and had auto mode classify safe/unsafe; brought in red-teamers to prompt-inject and hack Claude Code’s auto mode; turned attacks into evals; iterated until all were denied. “Not only protecting against vulnerabilities in the wild today, but the most intelligent attacks we can construct.”
- Security discipline: “always red teaming, always pentesting… always having a threat model and then using that to figure out how is this thing going to get attacked, how are people going to get prompt injected.” (Connects to the verification frontier.)
Routines & Proactive Agents
- A teammate who shipped voice mode set a routine to watch every voice-mode ticket/issue/bug and proactively put up fixes; then a second routine for all unresponded feedback; then one that picks up any bug report unanswered for 5 hours, fixes it, and merges the easy-to-verify ones.
- Result: “remember back in the day you used to actually have to respond to code review comments, fix CI, rebase? I haven’t done that in a long time.” Routines babysit PRs, do code review, and fix CI.
Try It
For WEO Marketly / any team scaling Claude Code:
- Adopt the “encode the fix” rule. Make it a team norm: when Claude errs, update
CLAUDE.mdor write a skill — don’t just re-prompt. This is what lets agents “run forever.” - Define verification per project. Write down how an agent can run and exercise each surface (start the web server, an iOS/Android sim, Claude in Chrome for web) — that harness is the prerequisite for trusting auto mode.
- Turn on auto mode and stop reading every prompt. Pair it with the self-verification harness; reserve attention for what matters.
- Stand up one routine. Start with the canonical case: watch a bug/issue label, auto-fix, ping the PR. The first concrete payoff of the Agent SDK.
- Put Claude at the center. Make “ask Claude” the default for onboarding questions and process steps — not a tool bolted onto the side.
- Go context-minimal. Trim system prompts and tool lists; give the model a way to pull context and let it figure out the rest.
Related
- Dynamic Workflows — the orchestrate-hundreds/thousands-of-agents primitive this video’s “armies of agents” framing depends on (and where Boris’s autonomous-Opus 5-tips are noted)
- Claude Code Routines — the proactive-bug-fixing primitive at the heart of the routines stories
- [[claude-ai/scheduled-tasks|
/loop]] — the “talk to a loop, not an agent” leap - Plan Mode — the mode auto mode has largely replaced for the team
- The Verification Frontier — why “can the agent run it” is the load-bearing skill
- From Vibe Coding to Agentic Engineering — Karpathy’s framing of the same source→agent→loop progression and role-merging
- 2026 AI-Work Restructuring — the org-level “Claude at the center” + roles-merging thesis
- Claude Opus 4.8 — the model generation (4.6/4.7+) that made the planning step unnecessary
- Claude Agent SDK — the programmatic Claude Code layer routines are built on
- Boris Cherny — Creator of Claude Code — entity hub; this video is one of his primary first-party appearances
Open Questions
- No speaker labels in the source. The captions don’t attribute lines; quote attributions here are inferred from context (Boris = creator, Cat Wu = PM) ^[inferred] — verify against the video before citing a specific person.
- The next form factor. The team explicitly says they don’t know what the dozens/hundreds/thousands-of-agents interface will look like — “it’s going to be up to the team [and community] to figure it out.”
/goalvs routines vs loops boundaries. The talk uses “loop” and “routine” somewhat interchangeably; the precise primitive boundaries are covered better in scheduled-tasks + claude-code-routines.