Source: wiki synthesis: Claude Agent SDK — How the Agent Loop Works, Temporal — Durable Agentic Loop, Scaling Managed Agents

The agentic loop is one primitive: Claude evaluates the state, calls tools, receives results, and repeats until it produces output with no tool calls. Three articles ingested together — the first-party Agent SDK loop, Temporal’s durable agentic loop, and Anthropic’s Managed Agents engineering architecture — are the same loop run three ways. What separates them is not the loop logic (identical in all three) but where the loop’s state lives and who owns failure recovery. Read together they form a spectrum from you own the harness, the run is ephemeral to Anthropic owns the harness, the state is durable by design — and, surprisingly, all three converge on the same architectural rule.

Key Takeaways

  • One loop, three runtimes. The Claude tool-use cycle (evaluate → tool_use → execute → append results → repeat → final text) is byte-for-byte the same in the SDK, the Temporal recipe, and inside Managed Agents. The runtime is the variable, not the algorithm. ^[inferred]
  • The axis is state durability + recovery ownership. Moving SDK → Temporal → Managed Agents, you trade control over the harness for someone else owning crash recovery, retries, scaling, and credential isolation. ^[inferred]
  • All three independently arrive at one rule: the harness should be stateless; the session/state should be durable and external. Managed Agents states it outright (“nothing in the harness needs to survive a crash”); Temporal enforces it via the Workflow/Activity determinism split; the SDK advises it (resume via session_id, and “put durable rules in CLAUDE.md, not the initial prompt, because compaction may drop them”). Same insight, three enforcement strengths. ^[inferred]
  • Retries are a tell. The SDK leaves retries to you/the client; Temporal explicitly takes retries away from the Anthropic client (“client retries can interfere with correct and durable error handling”); Managed Agents catches a dead sandbox as a tool-call error and hands it back to Claude. Who owns the retry tells you which runtime you’re in. ^[inferred]
  • Durability is orthogonal to correctness and cost. This is a third axis alongside verification (is the output right?) and security (what does it cost, is it safe?). A production loop needs an answer on all three. ^[inferred]
  • It’s the operational sequel to “the loop is the unit of work”. Once the loop is your unit of work, the next engineering decision is where it runs and who owns its failure modes. ^[inferred]

The one loop, three runtimes

DimensionIn-process — Agent SDKDurable — TemporalHosted — Managed Agents
Where the loop runsyour processa Temporal worker on your infraAnthropic’s infra
Where loop state livesmemory + conversation; opt-in session_id resumethe Workflow event history (auto-replayed)external append-only session log (emitEvent / getEvents)
Crash recoveryrun dies; resumes only if you persisted session_idautomatic replay from historywake(sessionId) reboots a stateless harness
Who owns retriesyou / the SDK clientTemporal (deliberately not the Anthropic client)Anthropic (sandbox death → tool-call error → back to Claude)
Tool executionin-process tools + MCPeach call is a retryable Activityuniversal execute(); sandboxes are cattle
Boundingmax_turns / max_budget_usd / effortWorkflow timeouts + retry policyservice-managed
Credential isolationyour responsibilityyour responsibilityvault + MCP proxy — harness never sees creds
Statelessness of harnessadvised (resume + CLAUDE.md)enforced (determinism split)architected (session decoupled from harness)
Ops you ownthe harness onlyharness + a durable-execution clusteralmost nothing
Best forprototypes, interactive, bounded taskslong-running / unattended / side-effecting, on your stackproduction without building the infra

The shared insight: stateless harness, durable external state

The non-obvious payoff of reading these three together is that they are the same architecture discovered from three directions. The agentic loop is fragile by default — a long run holds its entire state in one process’s memory, so a crash three hours in loses everything. Each runtime fixes this by pulling state out of the harness:

  • Managed Agents makes it the headline design move — the session is an append-only event log that “sits outside the harness,” so “nothing in the harness needs to survive a crash,” and wake() reconstitutes a fresh harness from the log.
  • Temporal gets the same property as a side effect of its execution model: the Workflow (the loop) must be deterministic and is replayed from an event history, while all non-deterministic work (the Claude call, tool execution) lives in Activities. The loop’s state is the durable history.
  • The Agent SDK offers the weak form: the loop is in-process, but you can capture session_id and resume with “the full context from previous turns restored,” and the docs steer durable instructions into CLAUDE.md (re-injected every request) rather than the initial prompt (which compaction can summarize away).

So the spectrum is also a spectrum of how strongly statelessness is enforced: advised (SDK) → enforced by framework (Temporal) → architected into the platform (Managed Agents). ^[inferred] The lesson transfers downward: even if you stay in-process, treating the harness as disposable and persisting the session is the design that scales.

How to choose

  • Default to in-process (SDK). For anything interactive or bounded, the SDK is the lowest-overhead option and max_turns + max_budget_usd are sufficient guardrails. Don’t add durability infrastructure a well-scoped task doesn’t need. ^[inferred]
  • Reach for durable execution (Temporal) when the loop is long-running, unattended, or has side effects you can’t afford to lose mid-flight or double-run — and you want to own the stack. The price is operational: you run a durable-execution cluster. ^[inferred]
  • Reach for Managed Agents when you’d otherwise be rebuilding sandbox isolation, transcript storage, credential vaults, and scaling yourself — the “infrastructure was the wall” case the Agent Platform team names. The price is vendor coupling and less harness control. ^[inferred]

A quick diagnostic: what happens if the process dies mid-run? If the answer is “the run is lost and that’s unacceptable,” you’ve outgrown the in-process loop and the choice is durable-vs-hosted. ^[inferred]

They compose, not just compete

These are layers, not rivals: ^[inferred]

  • Inner loop / outer loop. The Managed Agents production talk frames an inner loop (outcomes iterating to a rubric, hosted) wrapped by an outer loop (you + Claude Code refining the rubric). Different runtimes can own different layers.
  • A loop inside a loop. You can run an in-process SDK agent inside a Temporal Activity — durable orchestration around an ephemeral agent — so the cheap loop gets crash-recovery for free without rewriting it as a Workflow. ^[inferred]
  • Orthogonal axes stay separate. Choosing a runtime here does not answer correctness or cost: still write the verifier first (verifier-first loops) and still run the token math (loop economics). The three axes compose into one production-readiness checklist.

Try It

  1. Map your current loop onto the table. Where does its state live, and what happens if the process dies mid-run? That single question places you on the spectrum.
  2. Stay in-process for bounded work. SDK loop with max_turns + max_budget_usd; persist session_id so you can resume even if you usually don’t.
  3. Externalize state for unattended/side-effecting loops. Temporal Workflow (own infra) or Managed Agents (hosted) — either way, make the harness disposable and the session durable.
  4. Adopt the credential discipline regardless of runtime. Never let the harness or model hold tokens; Managed Agents’ vault + MCP-proxy pattern is portable to a self-hosted setup.
  5. Answer all three axes before you call it production: runtime (this article), correctness (verifier), and cost/security (economics).

Open Questions

  • Where is the crossover point — in run length, value, or side-effect risk — at which durable execution’s operational cost pays for itself versus in-process session_id resume? No source quantifies it. ^[inferred]
  • Do the three recovery mechanisms converge? SDK session-resume, Temporal replay, and Managed Agents wake() all reconstruct loop state after failure — whether these become one portable “agent state” abstraction or stay three incompatible mechanisms is unsettled. ^[inferred]
  • No head-to-head benchmark exists. None of the three sources runs the same task across all three runtimes to compare cost, latency, and reliability — the obvious missing experiment. ^[inferred]