Source: ai-research/anthropic-engineering-scaling-managed-agents-2026-06-16.md (Anthropic Engineering, “Scaling Managed Agents,” Lance Martin / Gabe Cemaj / Michael Cohen, published 2026-04-08)
Anthropic Engineering’s architecture deep-dive on how Claude Managed Agents is built. The load-bearing move: decouple the brain (Claude + harness) from the hands (sandboxes/tools) and the session (an external append-only event log), so each can fail or be replaced independently. The motivating principle — “Harnesses encode assumptions that go stale as models improve” — is why the system is opinionated about interface shapes, not about what runs behind them. This is the how-it’s-built companion to the product/operator articles under Claude Managed Agents. (Lead author Lance Martin is rlancemartin, also behind the Managed Agents cookbooks.)
Key Takeaways
- The “pet” problem → cattle. The first version coupled session + harness + sandbox in one container — “we’d adopted a pet.” A container failure lost the session, debugging was near-impossible, and untrusted code shared space with credentials. The fix virtualizes three independent interfaces.
- Brain / Hands / Session. Session = a durable append-only event log stored outside the harness. Harness = a stateless loop calling Claude (nothing in it needs to survive a crash). Sandbox = isolated execution. “Each became an interface that made few assumptions about the others.”
- Six primitives carry the whole system: `execute(name, input) → string` (universal interface to any tool/container/service), `provision({resources})`, `wake(sessionId)` (reboot a failed harness with prior state), `getSession(id)`, `emitEvent(id, event)` (durable writes during the loop), and `getEvents()` (positional slices for context interrogation).
- Crashes become tool errors. With containers stateless and cattle-like, “if a container died, the harness caught the failure as a tool-call error and passed it back to Claude.” Because the session log sits outside the harness, a dead harness is simply replaced. Eliminating upfront per-session container provisioning dropped p50 TTFT ~60% and p95 over 90%.
- Credentials never touch the harness. Two patterns: resource-bundled auth (a Git token used only at sandbox init to clone/wire remotes; Claude never sees it) and an external vault + MCP proxy (the proxy takes a session token and fetches the real credential from the vault). “The harness is never made aware of any credentials.”
- Session ≠ context window. Context is durably stored in the session log; `getEvents()` lets the harness select positional slices, rewind before a moment, or transform events before passing them to Claude’s context window. That allows context engineering and prompt caching without losing durability.
- Many brains, many hands. The `execute()` interface means “the harness doesn’t know whether the sandbox is a container, a phone, or a Pokémon emulator,” and harnesses can “pass hands to one another.” Many stateless harnesses scale without proportional container provisioning, and can reach a customer’s VPC without network peering.
- Modeled on operating systems. “Operating systems solved this problem by virtualizing hardware into abstractions — process, file — general enough for programs that didn’t exist yet.” Design for harnesses, sandboxes, and agent patterns not yet conceived.
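The "crashes become tool errors" takeaway can be illustrated with a minimal Python sketch. Everything named here (`SandboxDied`, `run_tool_call`, `flaky_execute`, the event dictionary shape) is hypothetical and chosen for illustration; the post describes the behavior, not this code. The list `session_log` stands in for a durable `emitEvent(id, event)` write.

```python
from typing import Callable, Dict, List


class SandboxDied(Exception):
    """Hypothetical error raised when the container behind a tool is gone."""


def run_tool_call(
    execute: Callable[[str, str], str],
    log: List[Dict[str, str]],
    name: str,
    tool_input: str,
) -> Dict[str, str]:
    """Run one tool call; convert sandbox death into a tool-error event."""
    try:
        content = execute(name, tool_input)
        event = {"type": "tool_result", "tool": name, "content": content}
    except SandboxDied as exc:
        # The harness does not nurse the container: the failure becomes data
        # for the model, which can retry or ask for a fresh sandbox.
        event = {
            "type": "tool_error",
            "tool": name,
            "content": f"sandbox unavailable: {exc}",
        }
    log.append(event)  # stand-in for a durable emitEvent(id, event)
    return event


def flaky_execute(name: str, tool_input: str) -> str:
    """Fake hand: the 'bash' sandbox is dead, everything else echoes."""
    if name == "bash":
        raise SandboxDied("container exited")
    return f"{name} ok: {tool_input}"


session_log: List[Dict[str, str]] = []
ok = run_tool_call(flaky_execute, session_log, "search", "managed agents")
failed = run_tool_call(flaky_execute, session_log, "bash", "ls")
```

Both outcomes land in the log as ordinary events, so the loop keeps going and the model decides what to do about the dead sandbox.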
The six primitives
| Primitive | Job |
|---|---|
| `execute(name, input) → string` | Universal interface to any tool, container, MCP server, or service (a “hand”) |
| `provision({resources})` | Spin up a new container/sandbox on demand |
| `wake(sessionId)` | Reboot a failed/idle harness with its prior state |
| `getSession(id)` | Retrieve the full event log for recovery |
| `emitEvent(id, event)` | Durable write to the session during the loop |
| `getEvents()` | Fetch positional slices of the event stream (rewind / reread / transform) |
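To make the interface shapes concrete, here is a minimal in-memory Python sketch of the six primitives. It is an assumption-laden toy, not Anthropic's implementation: the class names `SessionStore` and `Hands`, the snake_case method names, the `start`/`end` parameters on `get_events`, and the shape of the state returned by `wake` are all invented for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]
Tool = Callable[[Any], str]


class SessionStore:
    """Append-only event log living outside any harness (in-memory stand-in)."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[Event]] = {}

    def emit_event(self, session_id: str, event: Event) -> int:
        """Append one event and return its position in the log."""
        log = self._logs.setdefault(session_id, [])
        log.append(dict(event))
        return len(log) - 1

    def get_session(self, session_id: str) -> List[Event]:
        """Return the full event log, e.g. for recovery."""
        return [dict(e) for e in self._logs.get(session_id, [])]

    def get_events(
        self, session_id: str, start: int = 0, end: Optional[int] = None
    ) -> List[Event]:
        """Return a positional slice of the event stream."""
        return self.get_session(session_id)[start:end]


class Hands:
    """Routes execute(name, input) -> str to whatever sits behind a name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def provision(self, resources: Dict[str, Tool]) -> None:
        """Register hands on demand (a real system would start sandboxes)."""
        self._tools.update(resources)

    def execute(self, name: str, tool_input: Any) -> str:
        return self._tools[name](tool_input)


def wake(store: SessionStore, session_id: str) -> Dict[str, Any]:
    """Rebuild a fresh harness's working state purely from the session log."""
    events = store.get_session(session_id)
    return {"session_id": session_id, "next_position": len(events), "events": events}


store, hands = SessionStore(), Hands()
hands.provision({"upper": lambda text: str(text).upper()})
store.emit_event("s1", {"type": "user", "content": "hello"})
result = hands.execute("upper", "hello")
store.emit_event("s1", {"type": "tool_result", "content": result})

state = wake(store, "s1")               # a replacement harness picks up here
rewound = store.get_events("s1", 0, 1)  # slice: everything before the tool result
```

The harness side holds no state of its own in this sketch: `wake` needs only the session id and the store, which is the property that lets a dead harness be replaced.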
Why it matters
- It’s the canonical first-party statement of why agent infrastructure decouples the session from the harness — the “infrastructure was the wall” thesis of the Agent Platform team made concrete.
- The `wake()` + external-session-log pattern is Anthropic’s managed answer to the same crash-recovery problem you’d otherwise solve yourself with durable execution (compare the Temporal durable agentic loop). ^[inferred]
Try It
- Audit your own harness against the three interfaces: is your session log outside the harness? Can a crashed harness be replaced without losing state? Do credentials live outside the sandbox the model controls?
- Adopt the crash model: treat sandboxes as cattle — catch container death as a tool-call error and hand it back to Claude rather than nursing the instance.
- Adopt the credential discipline: never let the harness or model see tokens — bundle at sandbox init, or proxy through a vault keyed by session.
- Or don’t build it: use Managed Agents directly (`platform.claude.com/docs/en/managed-agents/overview`) and inherit the decoupling.
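For the credential discipline, a rough Python sketch of the vault-plus-proxy idea may help when auditing your own setup. The `Vault` and `ToolProxy` classes, their methods, and the token values are hypothetical stand-ins, not the Managed Agents API; the point is only that the caller passes a session token and never receives the underlying credential.

```python
from typing import Dict, Tuple


class Vault:
    """Hypothetical secret store keyed by (session token, service)."""

    def __init__(self) -> None:
        self._secrets: Dict[Tuple[str, str], str] = {}

    def put(self, session_token: str, service: str, credential: str) -> None:
        self._secrets[(session_token, service)] = credential

    def fetch(self, session_token: str, service: str) -> str:
        return self._secrets[(session_token, service)]


class ToolProxy:
    """Sits between harness and service; only the proxy reads the vault."""

    def __init__(self, vault: Vault) -> None:
        self._vault = vault

    def call(self, session_token: str, service: str, request: str) -> str:
        credential = self._vault.fetch(session_token, service)
        return self._send(service, request, credential)

    @staticmethod
    def _send(service: str, request: str, credential: str) -> str:
        # Stand-in for the real outbound call. The response reports only
        # whether auth was attached, never the credential itself.
        authed = "yes" if credential else "no"
        return f"{service} handled {request!r} (authenticated: {authed})"


vault = Vault()
vault.put("session-abc", "github", "ghp_example_not_a_real_token")
proxy = ToolProxy(vault)

# The harness holds only the session token and the request text.
response = proxy.call("session-abc", "github", "list repos")
```

A useful audit question falls out of this shape: search everything the harness or model can read (arguments, responses, logs) and confirm the credential string never appears there.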
Related
- Claude Managed Agents — the product/operator overview this post is the architecture of.
- Managed Agents — Self-Hosted Sandboxes + MCP Tunnels — the “hands run on customer infra” + credential-isolation pattern this decoupling makes possible.
- Managed Agents Production (Jess Ann + Lance Martin) — the Agent / Environment / Session / Events mental model; this post is the deeper “why the session is its own interface.”
- Agent Platform Team — the “infrastructure was the wall” thesis this architecture answers.
- How We Contain Claude — the runtime-sandbox/containment slice that complements this session + credential slice.
- Claude Agent SDK — How the Agent Loop Works — the “stateless loop calling Claude” that this architecture virtualizes as the harness.
- Temporal — Durable Agentic Loop — the roll-your-own durable-execution counterpart to the external-session-log + `wake()` pattern.
- Running the Agentic Loop — In-Process, Durable, or Hosted — the cross-topic synthesis placing this hosted runtime against the in-process SDK loop and the durable Temporal loop.
Open Questions
- No pricing, rate limits, or availability regions are disclosed in the post.
- The primitive signatures (`execute`/`provision`/`wake`/…) are described conceptually in the engineering post; exact SDK/API names in the platform docs may differ. ^[inferred]