Source: wiki synthesis: Claude Agent SDK — How the Agent Loop Works, Temporal — Durable Agentic Loop, Scaling Managed Agents
The agentic loop is one primitive: Claude evaluates the state, calls tools, receives results, and repeats until it produces output with no tool calls. Three articles ingested together — the first-party Agent SDK loop, Temporal’s durable agentic loop, and Anthropic’s Managed Agents engineering architecture — are the same loop run three ways. What separates them is not the loop logic (identical in all three) but where the loop’s state lives and who owns failure recovery. Read together they form a spectrum from you own the harness, the run is ephemeral to Anthropic owns the harness, the state is durable by design — and, surprisingly, all three converge on the same architectural rule.
Key Takeaways
- One loop, three runtimes. The Claude tool-use cycle (
evaluate → tool_use → execute → append results → repeat → final text) is byte-for-byte the same in the SDK, the Temporal recipe, and inside Managed Agents. The runtime is the variable, not the algorithm. ^[inferred] - The axis is state durability + recovery ownership. Moving SDK → Temporal → Managed Agents, you trade control over the harness for someone else owning crash recovery, retries, scaling, and credential isolation. ^[inferred]
- All three independently arrive at one rule: the harness should be stateless; the session/state should be durable and external. Managed Agents states it outright (“nothing in the harness needs to survive a crash”); Temporal enforces it via the Workflow/Activity determinism split; the SDK advises it (resume via
session_id, and “put durable rules in CLAUDE.md, not the initial prompt, because compaction may drop them”). Same insight, three enforcement strengths. ^[inferred] - Retries are a tell. The SDK leaves retries to you/the client; Temporal explicitly takes retries away from the Anthropic client (“client retries can interfere with correct and durable error handling”); Managed Agents catches a dead sandbox as a tool-call error and hands it back to Claude. Who owns the retry tells you which runtime you’re in. ^[inferred]
- Durability is orthogonal to correctness and cost. This is a third axis alongside verification (is the output right?) and security (what does it cost, is it safe?). A production loop needs an answer on all three. ^[inferred]
- It’s the operational sequel to “the loop is the unit of work”. Once the loop is your unit of work, the next engineering decision is where it runs and who owns its failure modes. ^[inferred]
The one loop, three runtimes
| Dimension | In-process — Agent SDK | Durable — Temporal | Hosted — Managed Agents |
|---|---|---|---|
| Where the loop runs | your process | a Temporal worker on your infra | Anthropic’s infra |
| Where loop state lives | memory + conversation; opt-in session_id resume | the Workflow event history (auto-replayed) | external append-only session log (emitEvent / getEvents) |
| Crash recovery | run dies; resumes only if you persisted session_id | automatic replay from history | wake(sessionId) reboots a stateless harness |
| Who owns retries | you / the SDK client | Temporal (deliberately not the Anthropic client) | Anthropic (sandbox death → tool-call error → back to Claude) |
| Tool execution | in-process tools + MCP | each call is a retryable Activity | universal execute(); sandboxes are cattle |
| Bounding | max_turns / max_budget_usd / effort | Workflow timeouts + retry policy | service-managed |
| Credential isolation | your responsibility | your responsibility | vault + MCP proxy — harness never sees creds |
| Statelessness of harness | advised (resume + CLAUDE.md) | enforced (determinism split) | architected (session decoupled from harness) |
| Ops you own | the harness only | harness + a durable-execution cluster | almost nothing |
| Best for | prototypes, interactive, bounded tasks | long-running / unattended / side-effecting, on your stack | production without building the infra |
The shared insight: stateless harness, durable external state
The non-obvious payoff of reading these three together is that they are the same architecture discovered from three directions. The agentic loop is fragile by default — a long run holds its entire state in one process’s memory, so a crash three hours in loses everything. Each runtime fixes this by pulling state out of the harness:
- Managed Agents makes it the headline design move — the session is an append-only event log that “sits outside the harness,” so “nothing in the harness needs to survive a crash,” and
wake()reconstitutes a fresh harness from the log. - Temporal gets the same property as a side effect of its execution model: the Workflow (the loop) must be deterministic and is replayed from an event history, while all non-deterministic work (the Claude call, tool execution) lives in Activities. The loop’s state is the durable history.
- The Agent SDK offers the weak form: the loop is in-process, but you can capture
session_idand resume with “the full context from previous turns restored,” and the docs steer durable instructions into CLAUDE.md (re-injected every request) rather than the initial prompt (which compaction can summarize away).
So the spectrum is also a spectrum of how strongly statelessness is enforced: advised (SDK) → enforced by framework (Temporal) → architected into the platform (Managed Agents). ^[inferred] The lesson transfers downward: even if you stay in-process, treating the harness as disposable and persisting the session is the design that scales.
How to choose
- Default to in-process (SDK). For anything interactive or bounded, the SDK is the lowest-overhead option and
max_turns+max_budget_usdare sufficient guardrails. Don’t add durability infrastructure a well-scoped task doesn’t need. ^[inferred] - Reach for durable execution (Temporal) when the loop is long-running, unattended, or has side effects you can’t afford to lose mid-flight or double-run — and you want to own the stack. The price is operational: you run a durable-execution cluster. ^[inferred]
- Reach for Managed Agents when you’d otherwise be rebuilding sandbox isolation, transcript storage, credential vaults, and scaling yourself — the “infrastructure was the wall” case the Agent Platform team names. The price is vendor coupling and less harness control. ^[inferred]
A quick diagnostic: what happens if the process dies mid-run? If the answer is “the run is lost and that’s unacceptable,” you’ve outgrown the in-process loop and the choice is durable-vs-hosted. ^[inferred]
They compose, not just compete
These are layers, not rivals: ^[inferred]
- Inner loop / outer loop. The Managed Agents production talk frames an inner loop (outcomes iterating to a rubric, hosted) wrapped by an outer loop (you + Claude Code refining the rubric). Different runtimes can own different layers.
- A loop inside a loop. You can run an in-process SDK agent inside a Temporal Activity — durable orchestration around an ephemeral agent — so the cheap loop gets crash-recovery for free without rewriting it as a Workflow. ^[inferred]
- Orthogonal axes stay separate. Choosing a runtime here does not answer correctness or cost: still write the verifier first (verifier-first loops) and still run the token math (loop economics). The three axes compose into one production-readiness checklist.
Try It
- Map your current loop onto the table. Where does its state live, and what happens if the process dies mid-run? That single question places you on the spectrum.
- Stay in-process for bounded work. SDK loop with
max_turns+max_budget_usd; persistsession_idso you can resume even if you usually don’t. - Externalize state for unattended/side-effecting loops. Temporal Workflow (own infra) or Managed Agents (hosted) — either way, make the harness disposable and the session durable.
- Adopt the credential discipline regardless of runtime. Never let the harness or model hold tokens; Managed Agents’ vault + MCP-proxy pattern is portable to a self-hosted setup.
- Answer all three axes before you call it production: runtime (this article), correctness (verifier), and cost/security (economics).
Related
- Claude Agent SDK — How the Agent Loop Works — the in-process runtime (the loop, message types,
max_turns/max_budget_usd, session resume, hooks). - Temporal — Durable Agentic Loop — the durable runtime (Workflow = loop, Activities = Claude + tools, Temporal owns retries).
- Scaling Managed Agents — the hosted runtime (brain/hands/session decoupling, external event log,
wake()). - The Loop Is the Unit of Work — the prequel: how the leverage point moved prompt → harness → loop.
- Verifier-First Loops — the correctness axis orthogonal to this durability axis.
- Should You Build a Loop? — the cost/security axis.
- Managed Agents Production (Jess Ann + Lance Martin) — the inner-loop / outer-loop composition pattern.
- Agent Platform Team — the “infrastructure was the wall” argument behind the hosted option.
- Agent Loops — the topic this synthesis sits on top of.
Open Questions
- Where is the crossover point — in run length, value, or side-effect risk — at which durable execution’s operational cost pays for itself versus in-process
session_idresume? No source quantifies it. ^[inferred] - Do the three recovery mechanisms converge? SDK session-resume, Temporal replay, and Managed Agents
wake()all reconstruct loop state after failure — whether these become one portable “agent state” abstraction or stay three incompatible mechanisms is unsettled. ^[inferred] - No head-to-head benchmark exists. None of the three sources runs the same task across all three runtimes to compare cost, latency, and reliability — the obvious missing experiment. ^[inferred]