12-Factor Agents — HumanLayer's Framework for Reliable LLM Applications

Source: ai-research/agentwikis-agent-workflows-2026-06-12.md — compiled by Agent Wikis from the 12-factor-agents repo, Anthropic “Building effective agents,” and Weng/Huyen agent surveys; sourced 2026-06-12

humanlayer/12-factor-agents is a framework of 12 principles for building reliable LLM applications in production, modeled on Heroku’s 12-factor app methodology. Licensed Apache-2.0 (code) and CC BY-SA 4.0 (content). Maintained by HumanLayer, whose product is the commercial pitch for Factor 7. The framework’s load-bearing claim: LLMs are stateless functions, so the state machine, control flow, and human-contact points must be built explicitly in the surrounding code, where failures stay visible instead of being hidden behind framework abstractions.

Key Takeaways

LLMs are stateless functions — an agent step is “here’s what’s happened so far, what’s the next step?” The whole design problem follows from this
Own your control flow (Factor 8) — don’t outsource the agent loop to a framework; framework abstractions hide failures that explicit code surfaces
Context engineering (Factors 2, 3) has superseded prompt engineering as the core discipline — optimizing the full token configuration, not just the instruction wording
Tools are the capability ceiling (Factors 1, 4) — tools are how agents affect the world; tool design determines agent behavior; treat tool calls as structured outputs
Reliability via composition (Factors 9, 10) — compact errors + small focused agents > one large complex agent
Human contact is first-class (Factor 7) — interrupting for human input is a designed behavior, not a sign of failure; it’s a tool call like any other

The Twelve Factors

#	Factor	Theme
1	Natural language → tool calls	Tool Design
2	Own your prompts	Context Engineering
3	Own your context window	Context Engineering
4	Tools are structured outputs	Tool Design
5	Unify execution & business state	Control Flow
6	Launch / pause / resume	Control Flow
7	Contact humans with tools	Human-in-the-Loop
8	Own your control flow	Control Flow
9	Compact errors	Reliability
10	Small, focused agents	Reliability
11	Trigger from anywhere	Human-in-the-Loop
12	Stateless reducer	Control Flow

Context Engineering (Factors 2 & 3)

The successor framing to prompt engineering (Anthropic, Sep 2025): building with LLMs is less about finding the right words and more about answering “what configuration of context is most likely to generate the desired behavior?” Context is the full set of tokens the model conditions on when it samples, and it is a critical but finite resource.

Factor 2 — Own your prompts: don’t outsource prompt engineering to a framework’s black box; production quality requires hand-controlling what goes in
Factor 3 — Own your context window: you’re not obligated to use standard message formats; custom serializations (compact event logs instead of verbose message arrays) are fair game
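As a minimal sketch of what Factor 3 can look like in practice, the Python below renders a thread of events as compact tagged blocks rather than a role-based message array. The Event type, the tag layout, and the example tool name (list_tags) are illustrative assumptions, not a format prescribed by the 12-factor repo.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str   # hypothetical kinds: "user_message", "tool_call", "tool_result"
    data: dict  # small key/value payload for this event


def serialize_event(event: Event) -> str:
    """Render one event as a compact tagged block instead of a chat message."""
    body = "\n".join(f"  {key}: {value}" for key, value in event.data.items())
    return f"<{event.kind}>\n{body}\n</{event.kind}>"


def build_context(events: list[Event]) -> str:
    """Concatenate all events into one prompt string the caller fully controls."""
    return "\n".join(serialize_event(e) for e in events)


thread = [
    Event("user_message", {"text": "Deploy backend v1.2.3 to production"}),
    Event("tool_call", {"name": "list_tags", "args": "{}"}),
    Event("tool_result", {"tags": "v1.2.3, v1.2.2"}),
]
print(build_context(thread))
```

Because the serializer is ordinary code, dropping a field, redacting a value, or changing the layout is a one-line change that can be measured against token counts and output quality.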

Practical levers: compaction/summarization as conversations grow; retrieval instead of stuffing; compact error representations (Factor 9); sub-agents as context isolation (each gets its own window); token-efficient tool responses.
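The compaction lever can be sketched as follows. This is an illustration under stated assumptions: estimate_tokens is a crude characters-divided-by-four stand-in for a real tokenizer, and summarize is a placeholder where a real system would call a model.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def summarize(entries: list[str]) -> str:
    """Placeholder summarizer; a real system would call an LLM here."""
    return f"[summary of {len(entries)} earlier entries]"


def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Collapse older entries into one summary line when the budget is exceeded."""
    total = sum(estimate_tokens(entry) for entry in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent
```

Keeping the most recent entries verbatim is a design choice: the latest tool results usually matter most for the next step, while older turns tolerate lossy summaries. The result can still exceed the budget if the recent entries alone are large, so a production version would need a second check.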

This lens explains why compiled knowledge bases (Agent Wikis, this vault) can outperform raw-source RAG: curating what enters the window beats forcing the model to search harder through noise.

Tool Design (Factors 1 & 4)

Factor 1: LLMs convert natural language to tool calls — tools are how agents affect the world, so tool design is agent design.

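Factor 4: Tools are structured outputs. The model generates a tool call the same way it generates any other token sequence; the result is structured data (typically JSON) that your deterministic code validates and executes, not an imperative command the model runs itself.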

Key principles: single responsibility (one tool per distinct action); self-describing name + description; idempotent where possible; fail loudly with compact informative errors (Factor 9).
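A minimal sketch of Factors 1 and 4 together: the model's raw output is parsed into a typed value, and ordinary code decides what that value does. The tool names (create_issue, done_for_now), the "intent" field, and the handlers are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class CreateIssue:
    title: str
    priority: str


@dataclass
class DoneForNow:
    message: str


# Hypothetical registry mapping an "intent" string to its expected shape.
TOOL_TYPES = {"create_issue": CreateIssue, "done_for_now": DoneForNow}


def parse_tool_call(raw: str):
    """Validate model output against known tool shapes; fail loudly otherwise."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("tool call must be a JSON object")
    intent = payload.pop("intent", None)
    if intent not in TOOL_TYPES:
        raise ValueError(f"unknown intent {intent!r}; expected one of {sorted(TOOL_TYPES)}")
    try:
        return TOOL_TYPES[intent](**payload)
    except TypeError as exc:
        raise ValueError(f"bad arguments for {intent}: {exc}") from exc


def execute(call) -> str:
    """Deterministic code decides what each structured call actually does."""
    if isinstance(call, CreateIssue):
        return f"created issue {call.title!r} with priority {call.priority}"
    if isinstance(call, DoneForNow):
        return call.message
    raise TypeError(f"no handler for {type(call).__name__}")
```

Each dataclass has a single responsibility and a self-describing name, and malformed output raises a short, specific error rather than being silently ignored.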

Control Flow (Factors 5, 6, 8, 12)

The cluster of factors about owning your agent’s state machine:

Factor 5 — unify execution state with business state; don’t let the agent’s internal state and your application state diverge
Factor 6 — design for launch/pause/resume; long-running agents need checkpoints
Factor 8 — own your control flow; don’t let a framework’s loop be the only place errors can surface
Factor 12 — stateless reducer; the agent function is (context) → next_step, state lives outside it
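The four factors above can be sketched in one small loop. Everything here is an assumption made for illustration: determine_next_step is a rule-based stand-in for the LLM call, and the intent names (fetch_status, request_approval, done) are hypothetical. The point is the shape: a pure next-step function, an explicit loop, and a thread that serializes to JSON so a run can pause and resume.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Thread:
    # One event list holds both execution and business state (Factor 5).
    events: list = field(default_factory=list)


def determine_next_step(thread: Thread) -> dict:
    """Stand-in for the LLM: a pure function from context to next step (Factor 12)."""
    kinds = [event["kind"] for event in thread.events]
    if "tool_result" not in kinds:
        return {"intent": "fetch_status"}
    if "human_response" not in kinds:
        return {"intent": "request_approval"}
    return {"intent": "done"}


def run(thread: Thread) -> str:
    """Explicit loop: our code decides when to continue, pause, or stop (Factor 8)."""
    while True:
        step = determine_next_step(thread)
        thread.events.append({"kind": "tool_call", "data": step})
        if step["intent"] == "fetch_status":
            thread.events.append({"kind": "tool_result", "data": "healthy"})
        elif step["intent"] == "request_approval":
            return "paused"  # break out of the loop until a human replies
        else:
            return "done"


def save(thread: Thread) -> str:
    """Checkpoint the whole thread as JSON (Factor 6)."""
    return json.dumps(asdict(thread))


def load(blob: str) -> Thread:
    """Rebuild a thread from a checkpoint so the loop can resume."""
    return Thread(**json.loads(blob))
```

A run might pause on the approval request, get stored with save, and later continue by loading the checkpoint, appending a human_response event, and calling run again. No state lives inside the next-step function, so the resumed run behaves the same as an uninterrupted one.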

These four together make the same architectural argument as Alex Krantz’s OpenClaw deep-dive (“agents-as-threads, session-as-process”) and Ryan Carson’s Clawd Chief (“agents are cron jobs and markdown files”).

Reliability (Factors 9 & 10)

Factor 9: Compact errors — full stack traces are expensive context waste; compact representations that tell the model what failed and what to do about it are faster and cheaper.
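One way to apply this, sketched under assumptions: the helper names (compact_error, run_tool), the event dictionary shape, and the hint text are all hypothetical. The idea is to reduce an exception to a single line naming the tool, the failure, and a suggested next action, and to append that line to the context instead of a traceback.

```python
def compact_error(exc: Exception, tool_name: str, hint: str = "") -> str:
    """Reduce an exception to one line: which tool failed, why, and what to try."""
    message = str(exc).strip()
    reason = message.splitlines()[0][:200] if message else "no message"
    line = f"{tool_name} failed: {type(exc).__name__}: {reason}"
    return f"{line}. Suggested action: {hint}" if hint else line


def run_tool(tool, tool_name: str, args: dict, hint: str = "") -> dict:
    """Call a tool and return either its result or a compact error event."""
    try:
        return {"kind": "tool_result", "data": tool(**args)}
    except Exception as exc:
        return {"kind": "error", "data": compact_error(exc, tool_name, hint)}
```

The full traceback can still go to application logs for humans; only the one-line version needs to enter the context window.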

Factor 10: Small, focused agents — each agent should do one thing well; composing small reliable agents beats one large unreliable one. This is also the cost argument: small scoped agents use fewer tokens per task.

Human-in-the-Loop (Factors 7 & 11)

Factor 7: Contact humans with tools — interrupting for human input is a first-class operation; model it as a tool call, not an exception handler.
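A minimal sketch of human contact modeled as one more structured tool call. The RequestHumanInput and RunQuery types, the outbox list (standing in for a messaging channel), and the returned status strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestHumanInput:
    question: str
    urgency: str = "normal"


@dataclass
class RunQuery:
    sql: str


# Stands in for Slack, email, or another delivery channel.
outbox: list[str] = []


def handle(call) -> str:
    """Dispatch a structured call; contacting a human is a normal branch."""
    if isinstance(call, RequestHumanInput):
        outbox.append(f"[{call.urgency}] {call.question}")
        return "awaiting_human"  # the caller pauses the run until a reply arrives
    if isinstance(call, RunQuery):
        return "continue"
    raise TypeError(f"no handler for {type(call).__name__}")
```

Because the request is ordinary data, it can be logged, audited, and routed the same way as any other tool call, with no special exception path.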

Factor 11: Trigger from anywhere — agents triggerable only from one surface (e.g., a CLI) are fragile; design for triggering from messaging channels, cron, webhooks, other agents.
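Factor 11 can be sketched as thin adapters around a single entrypoint. The function names and the trigger event shape below are assumptions for illustration; real adapters would also handle authentication and replies on the originating channel.

```python
def start_agent(source: str, text: str) -> dict:
    """Single entrypoint: every surface produces the same initial event."""
    return {"events": [{"kind": "trigger", "source": source, "text": text}]}


def from_cli(argv: list[str]) -> dict:
    """Adapter for command-line invocations."""
    return start_agent("cli", " ".join(argv))


def from_webhook(payload: dict) -> dict:
    """Adapter for an incoming webhook body."""
    return start_agent("webhook", payload.get("text", ""))


def from_cron(job_name: str) -> dict:
    """Adapter for scheduled runs."""
    return start_agent("cron", f"scheduled run: {job_name}")
```

Adding a new surface then means writing one more adapter; the agent loop itself does not change.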

HumanLayer’s product is the hosted infrastructure for Factor 7: routing agent approval requests to humans across surfaces (Slack, email, etc.) with audit trails.

The Five Workflow Patterns (Anthropic)

Companion taxonomy from Anthropic’s “Building effective agents” — the structural patterns that appear across production systems:

Prompt chaining — fixed sequence; gates between steps; use when task cleanly decomposes into subtasks
Routing — classify input, dispatch to specialized follow-up; also the cost lever (easy → small model, hard → capable model)
Parallelization — sectioning (speed) or voting (confidence); use when subtasks are independent
Orchestrator-workers — central LLM dynamically delegates; the bridge to full agents and multi-agent systems
Evaluator-optimizer — generate → evaluate → loop; use when evaluation criteria are clear and iteration measurably improves output
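As one concrete instance of the patterns above, here is a minimal evaluator-optimizer loop. The generate and evaluate callables would be LLM calls in a real system; the shorten and check_length stubs are hypothetical stand-ins so the control flow can run on its own.

```python
from typing import Callable


def evaluator_optimizer(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str], tuple[bool, str]],
    task: str,
    max_rounds: int = 3,
) -> str:
    """Generate a draft, evaluate it, and feed the critique back until it passes."""
    draft, feedback = "", ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        passed, feedback = evaluate(draft)
        if passed:
            break
    return draft


def shorten(task: str, feedback: str) -> str:
    """Stub generator: trims the text once it has been told it is too long."""
    return task[:10] if feedback else task


def check_length(draft: str) -> tuple[bool, str]:
    """Stub evaluator with a clear, checkable criterion."""
    passed = len(draft) <= 10
    return passed, "" if passed else "too long"
```

The max_rounds cap matters: without it, a generator that never satisfies the evaluator would loop forever and keep spending tokens.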

These compose freely (a router can front a chain whose middle step parallelizes). Note: the existing article Agent Workflow Patterns covers sequential/parallel/evaluator-optimizer; routing and orchestrator-workers are covered here. 12-factor’s Factor 8 (“own your control flow”) argues for the same explicit, composable structure from the infrastructure side.
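The composition mentioned in parentheses (a router fronting a chain whose middle step parallelizes) can be sketched like this. All functions are hypothetical stubs: a keyword rule stands in for an LLM classifier, and the outline, drafting, and assembly steps stand in for model calls.

```python
from concurrent.futures import ThreadPoolExecutor


def route(text: str) -> str:
    """Router: a keyword rule stands in for an LLM classifier."""
    return "report" if "report" in text.lower() else "chat"


def outline(text: str) -> list[str]:
    """Chain step 1: decide which sections to write."""
    return ["intro", "findings"]


def draft_section(title: str) -> str:
    """Chain step 2 worker: independent per section, so it can run in parallel."""
    return f"{title.upper()}: ..."


def assemble(sections: list[str]) -> str:
    """Chain step 3: combine the parallel results in order."""
    return "\n".join(sections)


def report_chain(text: str) -> str:
    titles = outline(text)
    with ThreadPoolExecutor() as pool:
        sections = list(pool.map(draft_section, titles))  # map preserves input order
    return assemble(sections)


def handle(text: str) -> str:
    """Entry point: route first, then run the matching workflow."""
    if route(text) == "report":
        return report_chain(text)
    return f"chat reply to: {text}"
```

Each pattern stays a plain function, so the composition is visible in the call graph rather than buried in framework configuration.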

Convergent Principles Across Sources

The 12-factor framework, Anthropic essays, and the Weng/Huyen agent surveys converge on the same set of principles:

LLMs are stateless functions; state lives outside them
Context window management is the core engineering challenge
Tools define the agent’s capability ceiling
Human oversight is architectural, not ad-hoc
Reliability comes from composing small reliable units, not from building complex monoliths
“Own your control flow” — abstractions hide failures

Try It

Read the 12-factor repo: github.com/humanlayer/12-factor-agents (factor pages under content/factor-0N.md)
Audit an existing agent against each factor: which ones does it handle explicitly vs implicitly?
Apply Factor 9 concretely: replace verbose error logging in tool responses with a one-line summary of what failed and the action to retry
Apply Factor 3 concretely: log the token count of your context window at each step and identify what’s eating it
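For the Factor 3 exercise, a rough starting point is to tally estimated tokens by event kind at each step. This sketch assumes events are dictionaries with "kind" and "data" keys and uses a characters-divided-by-four estimate; a real audit would substitute the tokenizer that matches the model in use.

```python
from collections import Counter


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def token_breakdown(events: list[dict]) -> Counter:
    """Sum estimated tokens per event kind to show what fills the window."""
    usage: Counter = Counter()
    for event in events:
        usage[event["kind"]] += estimate_tokens(str(event["data"]))
    return usage


def report(events: list[dict]) -> str:
    """Format the breakdown as one log line, largest consumer first."""
    usage = token_breakdown(events)
    parts = [f"{kind}={count}" for kind, count in usage.most_common()]
    return f"total={sum(usage.values())} " + " ".join(parts)
```

Logging this line before each model call tends to make the biggest consumer obvious, which is often verbose tool results rather than the prompt itself.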

Related

RAG and Vector Retrieval for Agents — A Practical Primer — “retrieval instead of stuffing” (Factor 3) explained end to end: embeddings, chunking, hybrid search, agentic-RAG tool-calling patterns, and why compiled knowledge bases like this vault can beat raw-source RAG.
Agent Workflow Patterns — Anthropic’s sequential/parallel/evaluator-optimizer taxonomy (overlaps with workflow patterns above)
Principles for Autonomous System Design — Alex Krantz’s complementary architectural deep-dive
Claude Agent Hierarchy — Claude-specific agent tier decision framework
Microsoft Agent Governance Toolkit — Factor 5 (unify execution + business state) and Factor 8 (own your control flow) at the governance layer
Claude Code Memory Architecture — context engineering in a coding agent context
Loop Engineering (Cobus Greyling) — operationalizes the same “own your control flow / context / prompts” thesis over time: six primitives, seven patterns, and an L0→L3 readiness ladder for the loops that prompt your agents.
Checking Anthropic’s Agent SDK Against the Community’s 12 Factors — factor-by-factor audit of how Claude’s own Agent SDK measures up against this framework.
Agent SDK — How the Agent Loop Works — Claude’s own first-party implementation of the same control-flow/context-ownership architecture, described from the loop-mechanics side.
