Code with Claude 2026 — Opening Keynote

Source: Code with Claude 2026 — Opening Keynote (YouTube, 47:29), Anthropic, May 7 2026. Transcript via local Whisper fallback (no YouTube captions on day-of upload — see w19 release digest context).

Anthropic’s first developer conference keynote, recorded May 7 2026. Three layers of one story: model intelligence (Diane Penn), Claude platform agents (Angela Jiang + Caitlin Burke), and Claude Code primitives (Cat Wu + Boris Cherny). Headlines: doubled Claude Code 5-hour rate limits via SpaceX Colossus 1; three new Managed Agents primitives — multi-agent orchestration, outcomes, and dreaming; Claude Code Desktop launch; routines, AutoFix, Claude Security as compose-able primitives. No new model unveiled — the keynote was about putting existing models to work.

Key Takeaways

Three new Managed Agents primitives. Multi-agent orchestration (fleets), outcomes (rubric-graded iterate-until-success), and dreaming (Claude self-learns from previous sessions, writes lessons to memory). All three demoed live on a fictional drone-landing startup (“Lumara”).
No new model. Mythos preview was named as “the next point on the exponential” but not unveiled. Keynote was deliberately product-focused — closing the gap between model capability and shipped product.
17× year-over-year API volume growth. Cited by Ami Vora in opening. Average developer is now spending 20 hours per week running Claude.
Doubled Claude Code 5-hour rate limits for Pro / Max / Team / seat-based enterprise; “considerably raised” Opus API limits. Funded by SpaceX Colossus 1 partnership (covered in Anthropic + SpaceX Rate-Limit Increase).
Claude Code Desktop launched. Third surface after CLI and IDE — full-screen GUI, built-in preview, sidebar control plane for parallel agents (local + remote), image and rich-output rendering. Built on the same Claude Agent SDK as the IDE plugin.
Routines reframed as “higher-order prompts.” Boris Cherny: “I’m not the one doing the prompting. I’m the one creating a routine that does the prompting.” Trigger by schedule, webhook, API, or arbitrary event. Run locally or on remote Cloud Compute.
AutoFix + shift code review + Claude Security compose into agentic CI. AutoFix watches a PR from open → green → production: addresses code-review comments, fixes flaky CI (root-cause-fix in the Claude Code codebase, not just retry), auto-rebases on merge conflicts. Claude Security scans the codebase overnight and can spawn Claude Code sessions for fixes.
Anthropic’s own engineering: 200 % increase in PRs/engineer with Claude Code Tools wall-to-wall, same code-quality bar, while engineering team scaled.
Mercado Libre at scale. 23,000 engineers all on Claude Code. 500,000+ PRs reviewed with human oversight. 9,000+ apps modernized. Oscar Mullen targeting “90 % autonomous coding by Q3 2026.”
Shopify across functions. Claude Code adopted in design / product / data science alongside engineering. “The speed is just crazy.” (Andrew McNamara, director of applied AI.)
Advisor strategy as a cost lever. Smaller-model executor + bigger-model advisor pattern shipped as a Messages-API tools-array primitive. Eve Legal: “frontier-model quality at 5× lower cost” with Sonnet executing + Opus advising.
18 versions of Claude. Diane Penn: Anthropic shipped 8 frontier models in the last 12 months across Haiku / Sonnet / Opus / Mythos.
Task Horizon trajectory. Last year: minutes. Now: hours. Next: “always-on, proactive agents that know what to work on without losing the thread.”
Build for the next model, not the current one. “Make upgrades cheap” — automated evals, simple scaffolding, ambitious prototypes that don’t quite work today. The teams getting most value are the ones whose architecture absorbs intelligence jumps.

The Three Layers

The keynote was structured as three layers feeding into one story.

Layer 1 — Model intelligence (Diane Penn, Research PM)

18 Claude versions across Haiku / Sonnet / Opus / Mythos; 8 frontier models in last 12 months.
Lineage callouts: Opus 3 (JSON adherence + code), Sonnet 3.5/3.6 (computer use safely), Sonnet 3.7 (over-eager calibration), Claude 4 (thinking dials, test-time compute).
Opus 4.7 customer wins: AMP (full smart-mode migration, simpler tooling); Rakuten (3× more production engineering tasks resolved); Intuit (self-correcting during planning, faster + cleaner execution).
Claude Design by Anthropic Labs launched the day after Opus 4.7 — “Opus 4.7 has a real taste for visual design.” See Claude Design.
Three product-direction commitments: higher judgment + better quality co-taste (autonomous engineering), context windows that feel infinite when combined with high-quality memory, multi-agent coordination for goals too big for a single instance.
Mythos preview named as “the next point on the exponential. Not a small step.” See Claude Mythos Preview System Card.
Builder advice: “Design for the next version, not just the current one. Make upgrades cheap. Maintain harder evals; build ambitious prototypes that don’t quite work today — that’s how you’ll notice when the exponential is moving under you.”

Layer 2 — Claude platform agents (Caitlin Burke + Angela Jiang)

Two named problems: getting the right outcome (still requires too much prompt-opt + tool construction + harness engineering), and ship-fast-but-scalably.

Claude Managed Agents is the harness that bundles best-practice production infrastructure. Notion built in-product Claude agents on Managed Agents for long-running autonomous tasks. “10× faster prototype to production.” Memory is bundled in, automatically tuned for Claude, and portable (“memory is ultimately yours — take it wherever you’d like”).

Three new Managed Agents primitives announced today:

Multi-agent orchestration. A coordinator session spawns specialist agents on independent threads with independent context windows, then merges results. Demoed on Lumara: commander coordinates a detector (site selection) and a navigator (safe descent).
Outcomes. A markdown-rubric file specifies success criteria; a separate grader agent checks each run against the rubric; the executor iterates until the grader passes (or hits max_iterations). Demoed: drone-landing rubric (“touch down softly + land on clear ground + enough fuel to return”) drove iteration on hill-climbing.
Dreaming. See dedicated article Claude Dreaming. One-button (Dream) in the Claude Developer Console points a dreaming agent at a memory store; the agent inspects past sessions, identifies skills missed and lessons that should have been learned, writes them to the memory store. Demo’s overnight Dream produced a “descent playbook” of cross-mission heuristics that lifted hill-climb sites 3 + 4 from prior failure.

Advisor strategy. A simpler primitive announced alongside: split execution from advising. Update the tools array in the Messages API to expose a bigger model (e.g., Opus) as an advisor to a smaller executor (Sonnet). When Sonnet hits a hard step, it consults Opus. Eve Legal reports frontier-model quality at 5× lower cost.

Layer 3 — Claude Code primitives (Cat Wu + Boris Cherny)

Three surfaces, same SDK. CLI (power users / minimal text / max customization) → IDE (follow code changes) → Desktop (full GUI / preview / sidebar control plane / image rendering / local + remote session control). All three sit on the Claude Agent SDK.

shift code review — agent fleet catches critical bugs on the PR. Used by all internal Anthropic teams + thousands of external companies.

Mobile remote control — Claude Code added to the iOS and Android Claude apps. “Go to a park, touch grass, and still code.”

AutoFix — listens to PR / CI / review events, proactively writes fixes. In the Claude Code codebase itself, flaky-CI handling root-cause-fixes the flake instead of retrying. Result: scenario-owner never sees a red X.

Routines — the “higher-order prompt” primitive. Configure once; trigger by schedule / webhook / API call / arbitrary event. Local execution or remote Cloud Compute. Cherny: “The default isn’t I’m going to prompt Claude Code. The default is I will have Claude prompt Claude Code.” See also rate-limit increase making this primitive viable at scale.

Claude Security — overnight whole-codebase scan; can spawn Claude Code sessions to address discovered vulnerabilities. Integrates with existing SDLC.

Composability is the win. AutoFix + Routines + shift code review + Claude Security stack as agentic CI. The Boris demo (AcmePay refund implementation) showed: routine watches GitHub for new tickets → spawns Claude Code → Claude implements idempotent multi-currency refund + audit logging → tests its own work in a browser → AutoFix babysits the PR through review + flaky CI + merge conflicts → green PR ready to merge while engineer is away.

Customer scale callouts

Org	Detail
Anthropic (internal)	200 % increase in PRs/engineer with Claude Code wall-to-wall, same code-quality bar
Mercado Libre	23,000 engineers all on Claude Code; 500,000+ PRs reviewed; 9,000+ apps modernized; targeting 90 % autonomous coding by Q3 2026 (Oscar Mullen)
Shopify	Cross-functional rollout (engineering + design + product + data science). “The speed is just crazy.” (Andrew McNamara, dir. applied AI)
Notion	Built in-product Claude agents on Managed Agents for long-running autonomous tasks
Eve Legal	Advisor strategy — frontier-model quality at 5× lower cost
AMP	Migrated full smart-mode to Opus 4.7; simplified tooling (model no longer needs scaffold)
Rakuten	3× more production engineering tasks resolved with Opus 4.7
Intuit	Opus 4.7 self-correcting during planning, faster + cleaner execution

Speakers

Ami Vora — Chief Product Officer (opener; rate-limit + SpaceX news)
Diane Penn — Research PM (model layer)
Angela Jiang — Claude Platform Head of PM (Managed Agents demo)
Caitlin Burke ^[ambiguous] (whisper transcribed as “Caitlyn Lesse” elsewhere — likely Katelyn Lesse, Claude Platform Head of Engineering — see claude-ai _index org callouts) (Managed Agents demo)
Cat Wu — Claude Code/Cowork Head of PM (CC primitives intro)
Boris Cherny — Head of Claude Code (CC live demo)

Try It

Read the keynote announcements end-to-end in the source articles: rate-limit increase, cookbook coverage of Managed Agents multiagent + outcomes, SDK releases, Claude Code week 19 release digest. Each landed earlier this week as build-up.
Try Dreaming on a Managed Agent. See Claude Dreaming for the one-button workflow.
Wire a routine for a recurring task. Pick a workflow you currently kick off manually (issue triage, dependency-update PR). Configure as a routine with a webhook or schedule. The async-first shift is the keynote’s most concrete builder takeaway.
Test the advisor strategy. Update your Messages API tools array to expose Opus as advisor to a Sonnet executor. Measure cost vs Sonnet-alone and quality vs Opus-alone on your eval. Eve Legal’s 5× cost reduction is the headline; replicate or refute.
Build harder evals. Diane Penn’s builder advice: “build ambitious prototypes that don’t quite work today, so you’ll notice when the exponential is moving.” Pick the next model jump and pre-stage an eval that doesn’t pass on Opus 4.7 — instrument it to alert when Mythos closes the gap.

Open Questions

Mythos preview availability beyond Project Glasswing. Keynote names it as “the next point on the exponential” but Mythos Preview System Card documents the deliberate withholding. Will any Code-with-Claude-2026 attendees get access?
Claude Code Desktop pricing / subscription tier. ~~Confirm that Desktop is included in existing Pro/Max/Team plans and not a separate SKU.~~ Resolved 2026-05-09: Desktop is bundled into existing Claude Code subscriptions — not a separate SKU. Claude Pro ( $20/ m o,$ 17/mo annual) explicitly lists “Access to Claude Code in the terminal, on the web, and on the desktop” per the post-keynote pricing page. Max 5x ( $100/ m o), M a x 20 x ($ 200/mo) inherit. Team Premium ( $100/ se a t ann u a l /$ 125/mo) includes Claude Code. Late April 2026: Team Standard ( $20/ se a t ann u a l /$ 25/mo) quietly gained Claude Code access too — older third-party guides still list it as Premium-only. Free tier remains the only excluded plan. Sources: claudecodecamp.com, ssdnodes.com, findskill.ai (claude.com/pricing snapshot dated May 4 2026).
Dreaming cost model. ~~Whether Dream cycles cost the same as inference or are billed differently.~~ Resolved 2026-05-09 via Anthropic’s official pricing page: Dreaming has no separate line item; it bills under the standard Managed Agents dual-axis model — tokens at standard rates (with prompt-cache multipliers) + ** $0.08/ sess i o n - h o u r * * w hi l esess i o n s t a t u s i s ‘ r u nnin g ‘. I d l e / resc h e d u l in g / t er mina t e d t im e i s f ree . W e b se a rc hin s i d e a Dre am cos t s t h es t an d a r d$ 10/1k. The “press a button, run overnight” demo bills the cumulative running-state duration plus tokens consumed during reflection. See Claude Dreaming for the full breakdown.
Outcomes vs Subagents vs Multi-agent orchestration. Three overlapping primitives now exist. The article articulates them as distinct (Outcomes = grader, Subagents = parallel session children, Multi-agent = coordinator-managed fleet) but the boundary cases (when does a coordinator become the right call vs an outcome-graded loop) are not addressed. Worth a connection article.
Routines vs cron + GitHub Actions. Boris frames routines as a higher-order prompt primitive. The differentiator vs running Claude Code from a CI job is not explicit — likely state persistence, multi-trigger uniformity, and remote-Cloud-Compute backing, but worth confirming.
Whisper-mistranscribed names. “Caitlyn” vs Caitlin / Katelyn ambiguity. Review against the Anthropic press post when published.

Anthropic + SpaceX Rate-Limit Increase — the capacity announcement Ami Vora opens with.
Claude Code Week 19 Release Digest — six CLI releases bracketing the conference.
Anthropic SDK Releases — May 2026 — Python v0.100 / TS v0.95 land the Managed Agents API surface.
Anthropic Cookbook: Managed Agents Multiagent + Outcomes — cookbook coverage of the same APIs the keynote unveils.
Managed Agents — the harness primitive these features upgrade.
Claude Dreaming — dedicated entity article on the self-learning feature.
Claude Mythos Preview System Card — the model preview Diane Penn names.
Opus 4.7 Best Practices — current frontier model the AMP / Rakuten / Intuit case studies use.
Claude Code Routines — the primitive Boris reframes as “higher-order prompt.”
Claude Design — launched the day after Opus 4.7.
Claude Code CLI Reference — the original surface; Desktop is the third surface in the lineage.
Claude Agent Skills Overview — multi-agent + advisor patterns.
Agent Workflow Patterns — Outcomes implements the evaluator-optimizer pattern from Anthropic’s official taxonomy.
2026 Claude Code AIOS Pattern — the convergent practitioner framework that Routines + Skills + Memory feeds into.
Cost & Intelligence Levers — advisor strategy is a new entry on the cost lever.

Jonathon's AI Wiki

Explorer

Code with Claude 2026 — Opening Keynote

Key Takeaways

The Three Layers

Layer 1 — Model intelligence (Diane Penn, Research PM)

Layer 2 — Claude platform agents (Caitlin Burke + Angela Jiang)

Layer 3 — Claude Code primitives (Cat Wu + Boris Cherny)

Customer scale callouts

Speakers

Try It

Open Questions

Graph View

Table of Contents

Backlinks

Jonathon's AI Wiki

Explorer

Code with Claude 2026 — Opening Keynote

Key Takeaways

The Three Layers

Layer 1 — Model intelligence (Diane Penn, Research PM)

Layer 2 — Claude platform agents (Caitlin Burke + Angela Jiang)

Layer 3 — Claude Code primitives (Cat Wu + Boris Cherny)

Customer scale callouts

Speakers

Try It

Open Questions

Related

Graph View

Table of Contents

Backlinks