Source: ai-research/anthropic-engineering-scaling-managed-agents-2026-06-16.md (Anthropic Engineering, “Scaling Managed Agents,” Lance Martin / Gabe Cemaj / Michael Cohen, published 2026-04-08)

Anthropic Engineering’s architecture deep-dive on how Claude Managed Agents is built. The load-bearing move: decouple the brain (Claude + harness) from the hands (sandboxes/tools) and the session (an external append-only event log), so each can fail or be replaced independently. The motivating principle — “Harnesses encode assumptions that go stale as models improve” — is why the system is opinionated about interface shapes, not about what runs behind them. This is the how-it’s-built companion to the product/operator articles under Claude Managed Agents. (Lead author Lance Martin is rlancemartin, also behind the Managed Agents cookbooks.)

Key Takeaways

  • The “pet” problem → cattle. The first version coupled session + harness + sandbox in one container — “we’d adopted a pet.” A container failure lost the session, debugging was near-impossible, and untrusted code shared space with credentials. The fix virtualizes three independent interfaces.
  • Brain / Hands / Session. Session = a durable append-only event log stored outside the harness. Harness = a stateless loop calling Claude (nothing in it needs to survive a crash). Sandbox = isolated execution. “Each became an interface that made few assumptions about the others.”
  • Six primitives carry the whole system: execute(name, input) → string (universal interface to any tool/container/service), provision({resources}), wake(sessionId) (reboot a failed harness with prior state), getSession(id), emitEvent(id, event) (durable writes during the loop), and getEvents() (positional slices for context interrogation).
  • Crashes become tool errors. With containers stateless and cattle-like, “if a container died, the harness caught the failure as a tool-call error and passed it back to Claude.” Because the session log sits outside the harness, a dead harness is simply replaced. Eliminating upfront per-session container provisioning cut p50 time to first token (TTFT) by roughly 60% and p95 by over 90%.
  • Credentials never touch the harness. Two patterns: resource-bundled auth (a Git token used only at sandbox init to clone/wire remotes; Claude never sees it) and an external vault + MCP proxy (the proxy takes a session token and fetches the real credential from the vault). “The harness is never made aware of any credentials.”
  • Session ≠ context window. Context is durably stored in the session log; getEvents() lets the harness select positional slices, rewind before a moment, or transform events before passing them to Claude’s context window — context engineering + prompt caching without losing durability.
  • Many brains, many hands. The execute() interface means “the harness doesn’t know whether the sandbox is a container, a phone, or a Pokémon emulator,” and harnesses can “pass hands to one another.” Many stateless harnesses scale without proportional container provisioning — and can reach a customer’s VPC without network peering.
  • Modeled on operating systems. “Operating systems solved this problem by virtualizing hardware into abstractions — process, file — general enough for programs that didn’t exist yet.” Design for harnesses, sandboxes, and agent patterns not yet conceived.
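
The crash model in the takeaways above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `SessionLog`, `SandboxDied`, `run_harness`, the event dictionaries, and the scripted stand-in for Claude are all hypothetical names invented for this example. It shows a sandbox failure being returned as a tool-call error string, and a fresh harness picking up from the external log after the previous one stopped.

```python
from dataclasses import dataclass, field
from typing import Callable


class SandboxDied(Exception):
    """Hypothetical error a hand raises when its container is gone."""


@dataclass
class SessionLog:
    """Stand-in for the external append-only session log (in memory here)."""
    events: list[dict] = field(default_factory=list)

    def emit_event(self, event: dict) -> None:
        self.events.append(event)


def execute(hands: dict[str, Callable[[str], str]], name: str, tool_input: str) -> str:
    """execute(name, input) -> string. A dead sandbox becomes an error string."""
    try:
        return hands[name](tool_input)
    except SandboxDied as exc:
        return f"tool_error: sandbox unavailable ({exc})"


def run_harness(log: SessionLog, model, hands, max_steps: int = 10) -> None:
    """Stateless loop: every decision is derived from the log, nothing else."""
    for _ in range(max_steps):
        action = model(log.events)
        log.emit_event(action)
        if action["type"] == "done":
            return
        output = execute(hands, action["name"], action["input"])
        log.emit_event({"type": "tool_result", "name": action["name"], "output": output})


def wake(sessions: dict[str, SessionLog], session_id: str, model, hands) -> None:
    """Assumed shape of wake(sessionId): start a fresh harness on the same log."""
    run_harness(sessions[session_id], model, hands)


def scripted_model(events: list[dict]) -> dict:
    """Toy stand-in for Claude: retry after a tool error, otherwise finish."""
    results = [e for e in events if e["type"] == "tool_result"]
    if not results or results[-1]["output"].startswith("tool_error"):
        return {"type": "tool_call", "name": "shell", "input": "ls"}
    return {"type": "done", "answer": results[-1]["output"]}


def make_flaky_shell() -> Callable[[str], str]:
    """A hand whose first call fails as if the container had died."""
    calls = {"n": 0}

    def shell(cmd: str) -> str:
        calls["n"] += 1
        if calls["n"] == 1:
            raise SandboxDied("container exited")
        return f"ran {cmd}"

    return shell


if __name__ == "__main__":
    sessions = {"s1": SessionLog()}
    hands = {"shell": make_flaky_shell()}
    # First harness stops after one step, simulating a harness crash.
    run_harness(sessions["s1"], scripted_model, hands, max_steps=1)
    # A replacement harness resumes from the durable log.
    wake(sessions, "s1", scripted_model, hands)
    for event in sessions["s1"].events:
        print(event)
```

Because `run_harness` keeps no state of its own, the replacement harness needs only the session ID and the log to continue; the failed tool call is already recorded there for the model to react to.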

The six primitives

| Primitive | Job |
| --- | --- |
| execute(name, input) → string | Universal interface to any tool, container, MCP server, or service (a “hand”) |
| provision({resources}) | Spin up a new container/sandbox on demand |
| wake(sessionId) | Reboot a failed/idle harness with its prior state |
| getSession(id) | Retrieve the full event log for recovery |
| emitEvent(id, event) | Durable write to the session during the loop |
| getEvents() | Fetch positional slices of the event stream (rewind / reread / transform) |
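
To make the session-side primitives concrete, here is a hedged sketch of an in-memory store exposing `get_session`, `emit_event`, and `get_events`. The post describes these primitives conceptually, so everything specific here is an assumption for illustration: the class name `InMemorySessionStore`, the `start`/`end` parameters on `get_events`, the event dictionary shape, and the `build_context` helper that truncates long tool output before it reaches the context window.

```python
from dataclasses import dataclass, field
from typing import Optional

Event = dict  # assumed event shape: a plain dictionary


@dataclass
class InMemorySessionStore:
    """Hypothetical stand-in for the durable, append-only session log."""
    _logs: dict[str, list[Event]] = field(default_factory=dict)

    def emit_event(self, session_id: str, event: Event) -> None:
        # Append only: existing events are never edited or removed.
        self._logs.setdefault(session_id, []).append(dict(event))

    def get_session(self, session_id: str) -> list[Event]:
        # Full log, e.g. for recovering a replaced harness.
        return list(self._logs.get(session_id, []))

    def get_events(self, session_id: str, start: int = 0,
                   end: Optional[int] = None) -> list[Event]:
        # Positional slice; start/end are assumed parameters.
        return self._logs.get(session_id, [])[start:end]


def build_context(store: InMemorySessionStore, session_id: str,
                  start: int = 0, end: Optional[int] = None,
                  max_chars: int = 40) -> list[Event]:
    """Transform a slice for the context window without touching the log."""
    context = []
    for event in store.get_events(session_id, start, end):
        view = dict(event)  # copy so the stored event stays intact
        output = view.get("output")
        if isinstance(output, str) and len(output) > max_chars:
            view["output"] = output[:max_chars] + "...[truncated]"
        context.append(view)
    return context


if __name__ == "__main__":
    store = InMemorySessionStore()
    store.emit_event("s1", {"type": "user", "text": "list the repo"})
    store.emit_event("s1", {"type": "tool_result", "output": "x" * 100})
    store.emit_event("s1", {"type": "assistant", "text": "done"})

    rewound = store.get_events("s1", 0, 2)   # everything before the third event
    trimmed = build_context(store, "s1", 1)  # truncated view from event 1 onward
    print(len(rewound), len(trimmed[0]["output"]))
```

The design point is that the context passed to the model is a derived view: slicing and truncation happen on copies, so the durable log remains the full record.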

Why it matters

  • It’s the canonical first-party statement of why agent infrastructure decouples the session from the harness — the “infrastructure was the wall” thesis of the Agent Platform team made concrete.
  • The wake() + external-session-log pattern is Anthropic’s managed answer to the same crash-recovery problem you’d otherwise solve yourself with durable execution — compare the Temporal durable agentic loop. ^[inferred]

Try It

  1. Audit your own harness against the three interfaces: is your session log outside the harness? Can a crashed harness be replaced without losing state? Do credentials live outside the sandbox the model controls?
  2. Adopt the crash model: treat sandboxes as cattle — catch container death as a tool-call error and hand it back to Claude rather than nursing the instance.
  3. Adopt the credential discipline: never let the harness or model see tokens — bundle at sandbox init, or proxy through a vault keyed by session.
  4. Or don’t build it: use Managed Agents directly (platform.claude.com/docs/en/managed-agents/overview) and inherit the decoupling.
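
For step 3, the vault-plus-proxy pattern might look roughly like the sketch below. It is an illustration under assumptions, not the Managed Agents API: `Vault`, `McpProxy`, `fake_git_host`, and the token and credential values are hypothetical, and the upstream service is a local function rather than a real MCP server. The harness side holds only an opaque session token; the proxy resolves the real credential and returns only the upstream response.

```python
import secrets
from typing import Callable


class Vault:
    """Hypothetical credential vault keyed by owner and service."""

    def __init__(self) -> None:
        self._credentials: dict[tuple[str, str], str] = {}
        self._sessions: dict[str, str] = {}

    def store(self, owner: str, service: str, credential: str) -> None:
        self._credentials[(owner, service)] = credential

    def issue_session_token(self, owner: str) -> str:
        # Opaque token that carries no credential material.
        token = secrets.token_urlsafe(16)
        self._sessions[token] = owner
        return token

    def resolve(self, session_token: str, service: str) -> str:
        owner = self._sessions.get(session_token)
        if owner is None:
            raise PermissionError("unknown session token")
        return self._credentials[(owner, service)]


class McpProxy:
    """Hypothetical proxy: swaps a session token for the real credential."""

    def __init__(self, vault: Vault,
                 upstreams: dict[str, Callable[[str, str], str]]) -> None:
        self._vault = vault
        self._upstreams = upstreams

    def call(self, session_token: str, service: str, request: str) -> str:
        credential = self._vault.resolve(session_token, service)
        # Only the upstream response travels back toward the harness.
        return self._upstreams[service](request, credential)


def fake_git_host(request: str, credential: str) -> str:
    """Toy upstream that checks the credential and never echoes it."""
    return f"ok:{request}" if credential == "real-secret-123" else "denied"


if __name__ == "__main__":
    vault = Vault()
    vault.store("alice", "git", "real-secret-123")
    proxy = McpProxy(vault, {"git": fake_git_host})

    session_token = vault.issue_session_token("alice")  # all the harness holds
    print(proxy.call(session_token, "git", "list_repos"))
```

A quick audit question for your own system follows from this shape: if the harness process were dumped to disk, would anything in it be usable outside the session that issued it?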

Open Questions

  • No pricing, rate limits, or availability regions are disclosed in the post.
  • The primitive signatures (execute/provision/wake/…) are described conceptually in the engineering post; exact SDK/API names in the platform docs may differ. ^[inferred]