Reflecting on a Year of Claude Code — Boris Cherny & Cat Wu (Anthropic)

Source: raw/reflecting-on-a-year-of-claude-code-transcript.md (YouTube Hth_tLaC2j8, official Claude channel, ~18m, published 2026-06-08; surfaced via @bcherny + @ClaudeDevs) Follow-up source (2026-07-10 refresh): raw/x-account-claudeai-2074244664199115201.md (@claudeai, posted 2026-07-06 — “The Making of Claude Code” video + article; Boris Cherny’s companion post at x.com/bcherny/status/2074247226038063316) Type: Video / fireside Speakers: Boris Cherny (Claude Code creator) & Cat Wu (Claude Code PM), Anthropic Duration: ~18 min

A first-party retrospective from the two people who built Claude Code, one year after its GA. It’s the clearest official statement to date of how the Claude Code team actually works now — and the practices they’re betting the broader ecosystem moves toward: write fixes back into CLAUDE.md/skills instead of re-correcting inline, treat verification as “can the agent run the thing,” default to auto mode over plan mode, run routines that proactively fix bugs, and operate at the scale of dozens-to-thousands of agents. The throughline echoes Karpathy’s “agentic engineering”: you stop writing code, then stop talking to the agent, and start talking to a loop.

Refreshed 2026-07-10: @claudeai shared a follow-up “The Making of Claude Code” video + article (2026-07-06) featuring builders and early users. Boris Cherny’s companion post adds concrete origin-story detail not covered in the June retrospective above — see Origins section below.

Key Takeaways

When Claude makes a mistake, don’t correct it inline — encode the fix. Boris: “every single time Claude makes a mistake, I don’t tell Claude to do it differently. I tell it to write it to the CLAUDE.md or to make a skill… And if you can do this, then Claude can just run forever.” This is the single most-emphasized idea in the talk.
Verification is the misunderstood unlock — and it’s not unit tests/lint. Those were already automated. Agent verification means “can the agent run the thing?” — spin up the real app/service and check end-to-end. Examples: Opus 4 testing its own feature in a Claude CLI via bash; iOS/Android simulators; computer-use on the desktop app.
Auto mode has replaced plan mode. Newer models (4.6, definitely 4.7) “don’t need that planning step anymore.” Boris runs auto mode for everything so he can start a Claude and immediately move to the next one.
Auto mode is framed as more safe than per-prompt approval. Routing each tool request to a separate classifier model beats human approval, because when you accept 99% of prompts “your eyes just glaze over.” You only see the things that matter.
Routines are the first obvious application of the Agent SDK. A teammate’s routine listens for every ticket/GitHub issue/bug report about a feature, proactively puts up a fix, and pings the PR. “Claude tells me all the time now that someone else’s Claude has already fixed it.”
Roles are merging. PMs, designers, finance, and data science all work in Claude Code; engineers ship products end-to-end (idea → build → legal/marketing/security). “Everyone’s going to be both.”
Put Claude at the center of every process, not on the side. The HBR computer-productivity-paradox analogy: gains came only when companies threw out the filing cabinet and ran everything through the computer. At Anthropic, onboarding means “ask Claude, not people.”
Operate multi-agent — increasingly from a phone. Boris does “probably half my engineering now on my phone,” starting agents via remote control and voice mode; the new agent view + desktop app (auto work-tree cloning) replaced his six-terminal-tabs workflow.
Be a context minimalist. With today’s models you skip prompt/context engineering: give the minimal system prompt + minimal tools and let the model pull in what it needs. A leaner harness leaves more room for your own prompts.

The Two Leaps

Boris frames a year and a half as two big jumps in what an engineer interacts with:

Source code → agent. “I don’t write the source code, I talk to an agent and the agent writes the source code for me.”
Agent → loop/routine. “I don’t talk to an agent anymore. I talk to a loop or I talk to a routine and it prompts Claude for me.” (See loop` and Routines.)

The team expects a third shift is already underway — agents running longer and more autonomously, “dozens or hundreds or thousands” at a time — with a form factor nobody has figured out yet.

Origins: Before Claude Code Was Claude Code

[X signal — @claudeai, 2026-07-06] Source: raw/x-account-claudeai-2074244664199115201.md. A follow-up to the June retrospective above: @claudeai shared “The Making of Claude Code,” a video + article featuring builders and early users. Boris Cherny’s companion post adds origin detail not covered in the June interview:

Claude Code originated inside internal safety research — it wasn’t conceived from day one as a developer product.
It had a predecessor codenamed “Clide.”
Despite the product’s impact, the team considers itself “1% done.” Consistent with the “two leaps, and a third shift already underway” framing above — the team’s own account of its trajectory keeps landing on “early,” not “arrived.”
On the Aider question: in replies to his own post, Cherny clarified he personally never used Aider (the open-source AI pair-programming CLI), though some Clide contributors may have drawn inspiration from it. Worth recording precisely because “was Claude Code inspired by Aider” is a recurring community question this directly, if narrowly, answers.

Verification (the part everyone misunderstands)

Not unit tests, lint, or type-checks — those are trivially automatable and already done.
Agent verification = can the agent actually run and exercise the thing it built. Often non-obvious to wire up; that’s the real work.
Cat’s desktop-development skill example: a teammate added a skill that teaches Claude to run the local desktop app; Claude uses computer use to click around, invoke new UX, test edge cases, fix issues and re-check. When staging breaks, Claude reads Slack to learn whether staging is down or someone already hit the issue — then updates the skill after debugging.
This is why the team trusts agents to run unattended: verification + security together let you “move on and just have a second agent.”

Auto Mode + the Security Model

The old permission-prompt model (approve every tool call) was the best available 1.5 years ago, before good classifiers and well-aligned models.
Auto mode routes each request to a separate model that checks for safety and denies anything “a little sus”; you can allow it later. (See plan mode — the artifact-producing alternative some still prefer.)
How they earned trust to ship it: collected thousands of full agent-trajectory transcripts + permission prompts and had auto mode classify safe/unsafe; brought in red-teamers to prompt-inject and hack Claude Code’s auto mode; turned attacks into evals; iterated until all were denied. “Not only protecting against vulnerabilities in the wild today, but the most intelligent attacks we can construct.”
Security discipline: “always red teaming, always pentesting… always having a threat model and then using that to figure out how is this thing going to get attacked, how are people going to get prompt injected.” (Connects to the verification frontier.)

Routines & Proactive Agents

A teammate who shipped voice mode set a routine to watch every voice-mode ticket/issue/bug and proactively put up fixes; then a second routine for all unresponded feedback; then one that picks up any bug report unanswered for 5 hours, fixes it, and merges the easy-to-verify ones.
Result: “remember back in the day you used to actually have to respond to code review comments, fix CI, rebase? I haven’t done that in a long time.” Routines babysit PRs, do code review, and fix CI.

Try It

For WEO Marketly / any team scaling Claude Code:

Adopt the “encode the fix” rule. Make it a team norm: when Claude errs, update CLAUDE.md or write a skill — don’t just re-prompt. This is what lets agents “run forever.”
Define verification per project. Write down how an agent can run and exercise each surface (start the web server, an iOS/Android sim, Claude in Chrome for web) — that harness is the prerequisite for trusting auto mode.
Turn on auto mode and stop reading every prompt. Pair it with the self-verification harness; reserve attention for what matters.
Stand up one routine. Start with the canonical case: watch a bug/issue label, auto-fix, ping the PR. The first concrete payoff of the Agent SDK.
Put Claude at the center. Make “ask Claude” the default for onboarding questions and process steps — not a tool bolted onto the side.
Go context-minimal. Trim system prompts and tool lists; give the model a way to pull context and let it figure out the rest.

Dynamic Workflows — the orchestrate-hundreds/thousands-of-agents primitive this video’s “armies of agents” framing depends on (and where Boris’s autonomous-Opus 5-tips are noted)
Claude Code Routines — the proactive-bug-fixing primitive at the heart of the routines stories
loop` — the “talk to a loop, not an agent” leap
Plan Mode — the mode auto mode has largely replaced for the team
The Verification Frontier — why “can the agent run it” is the load-bearing skill
From Vibe Coding to Agentic Engineering — Karpathy’s framing of the same source→agent→loop progression and role-merging
2026 AI-Work Restructuring — the org-level “Claude at the center” + roles-merging thesis
Claude Opus 4.8 — the model generation (4.6/4.7+) that made the planning step unnecessary
Claude Agent SDK — the programmatic Claude Code layer routines are built on
Boris Cherny — Creator of Claude Code — entity hub; this video is one of his primary first-party appearances
Why Coding Is Solved, and What Comes Next (AI Ascent 2026) — Boris’s other 2026 first-party retrospective; the “1% done” framing added in the 2026-07-10 Origins refresh above echoes this talk’s “printing press, early days” thesis
Loop Engineering: Getting Started with Loops — the Claude Code team’s own four-type loop taxonomy, the article-length successor to this video’s “my job is to write loops” line
Loop Engineering (Cobus Greyling) — the community pattern reference built around Boris’s “my job is to write loops”: turns /loop-style usage into six primitives, seven patterns, and an L0→L3 readiness ladder.
The Loop Is the Unit of Work — the cross-topic synthesis placing Boris’s “talk to a loop” leap alongside the tool-agnostic loop-engineering naming of the same shift.
Engineering Leverage in the Agent Era (Boris Cherny) — a later (2026-07-15) thread that expands this video’s “encode the fix” idea into a fuller engineering-leverage argument about tacit knowledge as infrastructure.

Open Questions

No speaker labels in the source. The captions don’t attribute lines; quote attributions here are inferred from context (Boris = creator, Cat Wu = PM) ^[inferred] — verify against the video before citing a specific person.
The next form factor. The team explicitly says they don’t know what the dozens/hundreds/thousands-of-agents interface will look like — “it’s going to be up to the team [and community] to figure it out.”
/goal vs routines vs loops boundaries. The talk uses “loop” and “routine” somewhat interchangeably; the precise primitive boundaries are covered better in scheduled-tasks + claude-code-routines.

Jonathon's AI Wiki

Explorer

Reflecting on a Year of Claude Code — Boris Cherny & Cat Wu (Anthropic)

Key Takeaways

The Two Leaps

Origins: Before Claude Code Was Claude Code

Verification (the part everyone misunderstands)

Auto Mode + the Security Model

Routines & Proactive Agents

Try It

Open Questions

Graph View

Table of Contents

Backlinks

Jonathon's AI Wiki

Explorer

Reflecting on a Year of Claude Code — Boris Cherny & Cat Wu (Anthropic)

Key Takeaways

The Two Leaps

Origins: Before Claude Code Was Claude Code

Verification (the part everyone misunderstands)

Auto Mode + the Security Model

Routines & Proactive Agents

Try It

Related

Open Questions

Graph View

Table of Contents

Backlinks