Source: wiki synthesis: 12-Factor Agents, Claude Agent SDK — How the Agent Loop Works
HumanLayer’s 12-factor-agents is a bottom-up framework, distilled from production pain, for what a reliable LLM-application harness needs. Anthropic’s Agent SDK is a top-down, first-party implementation of exactly that kind of harness: the same loop that runs Claude Code, packaged for embedding in other applications. Nobody in the wiki has checked one against the other. Doing so is useful for anyone deciding whether to build a custom harness on the raw Messages API, adopt the Agent SDK, or roll their own 12-factor-compliant framework, because it answers three questions: which factors does the SDK hand you for free, which does it merely make possible, and where do the two philosophies actually pull in different directions?
Key Takeaways
- The SDK gives strong native support for roughly half the factors. Structured tool calls (1, 4), context-window levers (3), session pause/resume (6), subagents as small-focused-agent composition (10), and the stateless-reducer loop shape (12) are either built in or the SDK’s documented recommended pattern.
- Some factors are enabled but not opinionated: the SDK gives you the seam, but you still do the design work. Owning your prompts (2) and compact errors (9) are fully within your control (you write the system prompt; you decide what a tool result returns), but the SDK doesn’t hand you a default.
- Factor 5 (unify execution and business state) is the SDK’s clearest gap. session_id persists SDK-internal conversation state; it has no concept of your application’s business state, such as a database row or a workflow status. That unification is entirely the caller’s job. It’s the same gap the SDK-vs-Temporal-vs-Managed-Agents connection flags from the durability angle, seen here from the reliability-factor angle instead.
- Factor 8 (“own your control flow,” which argues against outsourcing your loop to a framework) is the one genuine philosophical tension, and on inspection it mostly resolves rather than contradicts. The Agent SDK is a framework that runs the loop for you, which sounds like precisely what Factor 8 warns against. But the SDK externalizes the loop as an iterable message stream: you see every AssistantMessage, tool call, and UserMessage yourself, rather than having them hidden behind an opaque .run() call, and it exposes PreToolUse/PostToolUse hooks to intercept at every step. It’s a framework built to expose the mechanics Factor 8 wants exposed, not one that hides them. ^[inferred]
- Factor 7 (contact humans with tools) has a direct primitive match. AskUserQuestion plus permission-mode approval callbacks model human-in-the-loop exactly the way 12-factor prescribes: an interrupt for input is a tool call, not an exception path.
- Factor 11 (trigger from anywhere) is architecturally true but not out-of-the-box. The SDK ships as “a standalone package (no Claude Code CLI required)” specifically so it can be embedded anywhere, but the webhook, cron, and messaging integrations themselves are the caller’s to build; the SDK doesn’t ship them.
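Because session_id covers only the conversation, the business-state half of Factor 5 has to live in a store you own. A minimal sketch, assuming a SQLite table of your own design: the `workflows` table, the `WorkflowRecord` type, and the `save`/`load` helpers are hypothetical, and the session ID string is a placeholder for whatever the SDK actually returns. The point is that the SDK handle becomes one column in a row the application owns, updated in the same transaction as the business status.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowRecord:
    workflow_id: str               # your application's key, not the SDK's
    status: str                    # business state the SDK has no concept of
    sdk_session_id: Optional[str]  # opaque handle used only to resume the SDK conversation


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflows ("
        "workflow_id TEXT PRIMARY KEY, "
        "status TEXT NOT NULL, "
        "sdk_session_id TEXT)"
    )


def save(conn: sqlite3.Connection, rec: WorkflowRecord) -> None:
    # Upsert status and session handle together so the two can never drift apart.
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO workflows (workflow_id, status, sdk_session_id) VALUES (?, ?, ?) "
            "ON CONFLICT(workflow_id) DO UPDATE SET "
            "status = excluded.status, sdk_session_id = excluded.sdk_session_id",
            (rec.workflow_id, rec.status, rec.sdk_session_id),
        )


def load(conn: sqlite3.Connection, workflow_id: str) -> Optional[WorkflowRecord]:
    row = conn.execute(
        "SELECT workflow_id, status, sdk_session_id FROM workflows WHERE workflow_id = ?",
        (workflow_id,),
    ).fetchone()
    return WorkflowRecord(*row) if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    # "sess-placeholder" stands in for a real session ID captured from the SDK.
    save(conn, WorkflowRecord("refund-42", "awaiting_approval", "sess-placeholder"))
    save(conn, WorkflowRecord("refund-42", "approved", "sess-placeholder"))
    print(load(conn, "refund-42"))
```

With this shape, "where is the refund?" is answered by your own table, and the SDK session is resumed only when the row says more agent work is needed.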
Factor-by-Factor Scorecard
| # | Factor | SDK coverage | Evidence |
|---|---|---|---|
| 1 | Natural language → tool calls | Native | AssistantMessage tool-call requests are the core message type of the loop |
| 2 | Own your prompts | Enabled, not opinionated | System prompt + CLAUDE.md are yours to write; SDK doesn’t prescribe content |
| 3 | Own your context window | Native, strong | ToolSearch defers MCP schemas; CLAUDE.md (survives compaction) vs. initial prompt (may be summarized away); subagents as context isolation |
| 4 | Tools are structured outputs | Native | UserMessage tool results are a first-class message type, fed back into the loop |
| 5 | Unify execution + business state | Gap — DIY | session_id tracks SDK conversation state only, not the caller’s business state |
| 6 | Launch / pause / resume | Native | session_id resume “restores full context from previous turns” |
| 7 | Contact humans with tools | Native primitive match | AskUserQuestion + permission-mode approval callbacks |
| 8 | Own your control flow | Tension resolves | Loop exposed as an iterable stream + hooks at every step, not hidden behind a black-box call |
| 9 | Compact errors | Enabled, not opinionated | Tool-result content is fully yours to shape; no prescribed compact-error format |
| 10 | Small, focused agents | Native, strong | Subagents get fresh context; SDK docs explicitly recommend this for cost and context reasons |
| 11 | Trigger from anywhere | Architecturally enabled | Standalone package, no CLI dependency — but the triggering integrations themselves are DIY |
| 12 | Stateless reducer | Native, exact match | Loop is documented as (context) → next_step; see the runtime-comparison connection for how this plays out across SDK, Temporal, and Managed Agents |
^[inferred] This table is this article’s own synthesis: individual cells trace to facts stated in the two source articles, but the native/enabled/gap classification is a judgment call neither source makes directly.
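The Factor 12 row connects to the Factor 6 and Factor 7 rows: if each step is a pure function of the accumulated context, then pausing for a human is just stopping and persisting that context, and resuming is appending the human’s answer and calling the same function again. A minimal sketch under stated assumptions: `Event`, `next_step`, `run`, and the `lookup_order`/`ask_human` tools are hypothetical, and a deterministic stub replaces the model call. It illustrates the loop shape, not the SDK’s internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Event:
    kind: str      # "user", "tool_call", "tool_result", "human_response", or "done"
    payload: dict


Context = Tuple[Event, ...]


def next_step(context: Context) -> Event:
    """Deterministic stand-in for the model: the next step depends only on the context."""
    last = context[-1]
    if last.kind == "user":
        return Event("tool_call", {"name": "lookup_order", "args": {"order_id": last.payload["order_id"]}})
    if last.kind == "tool_result":
        # Contacting a human is expressed as an ordinary tool call (Factor 7).
        return Event("tool_call", {"name": "ask_human", "args": {"question": f"Refund {last.payload['amount']}?"}})
    if last.kind == "human_response":
        return Event("done", {"approved": last.payload["answer"] == "yes"})
    raise ValueError(f"no next step after {last.kind!r}")


TOOLS: Dict[str, Callable[[dict], dict]] = {
    "lookup_order": lambda args: {"order_id": args["order_id"], "amount": 25},
}


def run(context: Context) -> Context:
    """Advance until done or until a human is needed; all state lives in the returned context."""
    while True:
        step = next_step(context)
        context = context + (step,)
        if step.kind == "done":
            return context
        if step.payload["name"] == "ask_human":
            return context  # pause: the caller persists the context and resumes later (Factor 6)
        result = TOOLS[step.payload["name"]](step.payload["args"])
        context = context + (Event("tool_result", result),)


paused = run((Event("user", {"order_id": "A1"}),))
print(paused[-1].payload["args"]["question"])  # Refund 25?
resumed = run(paused + (Event("human_response", {"answer": "yes"}),))
print(resumed[-1])  # Event(kind='done', payload={'approved': True})
```

Because `run` keeps nothing between calls, the paused context could sit in a database for days; resuming is the same function call with one more event appended.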
The Factor 8 Question, in More Detail
12-factor-agents’ Factor 8 argues that framework abstractions hide the failures explicit code would surface, and that you should own your control flow rather than outsource it. Read narrowly, “don’t outsource your loop to a framework” would seem to rule out the Agent SDK by definition — the SDK’s entire purpose is running the loop for you.
The more useful reading is about visibility, not ownership of the literal for-loop. What Factor 8 actually objects to is a harness that swallows the loop’s intermediate state so failures surface as an opaque final error instead of a traceable step. On that reading, the Agent SDK is closer to compliant than not: iterating query() hands you every AssistantMessage (including tool-call requests), every UserMessage (including tool results), and every SystemMessage boundary (init, compaction) as they happen — plus PreToolUse/PostToolUse hooks that run in your own process and can block or modify a call before it executes. That’s a framework, but not the kind of framework Factor 8 is warning about. The place the tension is real, not resolved, is Factor 5 above: the SDK does not expose or unify your business state the way it exposes message flow, so “own your control flow” only holds at the conversation layer, not the application layer. ^[inferred]
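The visibility argument is easier to see in code. A minimal sketch, assuming a scripted model in place of a real one: `AssistantMsg`, `ToolResultMsg`, `agent_stream`, and the `deny_destructive` hook are hypothetical names loosely echoing the SDK’s AssistantMessage/UserMessage stream and PreToolUse hook, not its actual classes or hook signatures. It shows the property Factor 8 cares about: every intermediate step reaches the caller as it happens, and a hook in the caller’s own process can veto a tool call before it runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Union


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class AssistantMsg:
    text: str
    tool_calls: List[ToolCall] = field(default_factory=list)


@dataclass
class ToolResultMsg:
    tool: str
    content: str


PreToolHook = Callable[[ToolCall], Optional[str]]  # return a reason to block, or None to allow


def deny_destructive(call: ToolCall) -> Optional[str]:
    if call.name == "bash" and "rm -rf" in call.args.get("command", ""):
        return "blocked by pre-tool hook: destructive command"
    return None


def agent_stream(
    script: List[AssistantMsg],
    pre_tool_use: PreToolHook,
    tools: Dict[str, Callable[[dict], str]],
) -> Iterator[Union[AssistantMsg, ToolResultMsg]]:
    """Yield every assistant turn and every tool result; nothing is hidden from the caller."""
    for msg in script:  # `script` stands in for successive model responses
        yield msg
        for call in msg.tool_calls:
            reason = pre_tool_use(call)  # runs in the caller's process, before the tool
            content = reason if reason is not None else tools[call.name](call.args)
            yield ToolResultMsg(call.name, content)


TOOLS = {"bash": lambda args: f"ran: {args['command']}"}  # fake tool; executes nothing

script = [
    AssistantMsg("Listing files.", [ToolCall("bash", {"command": "ls"})]),
    AssistantMsg("Cleaning up.", [ToolCall("bash", {"command": "rm -rf /tmp/scratch"})]),
]
for message in agent_stream(script, deny_destructive, TOOLS):
    print(type(message).__name__, message)  # log or persist each step as it arrives
```

A failure in the second tool call shows up here as a specific, inspectable step rather than an opaque final error, which is the distinction the Factor 8 reading above turns on.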
Try It
- If you’re choosing a harness for a new agent, use the scorecard above as a starting checklist: factors 1, 3, 4, 6, 7, 10, and 12 come close to free with the SDK; budget explicit design time for 2, 5, 9, and 11.
- If Factor 5 (business-state unification) is a hard requirement — e.g., an agent’s progress must survive independently of any SDK session — treat that as the signal to look at Temporal or a custom event-sourced state layer instead of (or alongside) the raw SDK. See Running the Agentic Loop for that comparison in full.
- Read the 12 factors even if you’re committed to the SDK. They’re a checklist for the decisions the SDK deliberately leaves to you — prompt content, error shapes, business-state unification — not just an argument against using a framework at all.
- Audit an existing SDK-based agent against Factor 9 specifically. Compact, actionable tool-error text is one of the cheapest reliability wins in the whole framework and the SDK won’t enforce it for you.
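Since the SDK leaves tool-result content to you, a Factor 9 audit can start with a wrapper like the one below. A minimal sketch, assuming your tools are plain Python functions whose string return value becomes the tool result; the `compact_errors` decorator, the `HINTS` table, the 200-character budget, and the `read_config` tool are all hypothetical choices, not SDK features.

```python
import functools
from typing import Callable, Dict, Type

MAX_ERROR_CHARS = 200  # hypothetical budget; tune to your context window

# Hypothetical hints telling the model what to try next for common failures.
HINTS: Dict[Type[BaseException], str] = {
    FileNotFoundError: "check the path; list the parent directory first",
    PermissionError: "the file is not readable; pick a different path",
}


def compact_errors(fn: Callable[..., str]) -> Callable[..., str]:
    """Turn exceptions into one short, actionable line instead of a stack trace."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs) -> str:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            parts = [f"{fn.__name__} failed: {type(exc).__name__}."]
            hint = HINTS.get(type(exc))
            if hint:
                parts.append(f"Hint: {hint}.")
            parts.append(f"Detail: {exc}")  # raw detail last, so truncation cuts it first
            return " ".join(parts)[:MAX_ERROR_CHARS]

    return wrapper


@compact_errors
def read_config(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


print(read_config("/nonexistent/config.toml"))
```

The design choice is ordering: the actionable hint comes before the raw exception text, so when the budget forces truncation it trims the least useful part first.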