Source: ai-research/agentwikis-agent-workflows-2026-06-12.md — compiled by Agent Wikis from the 12-factor-agents repo, Anthropic “Building effective agents,” and Weng/Huyen agent surveys; sourced 2026-06-12
humanlayer/12-factor-agents is a framework of 12 principles for building reliable LLM applications in production, named in homage to Heroku’s 12-factor app methodology. Licensed Apache-2.0 (code) and CC BY-SA 4.0 (content). Maintained by HumanLayer (whose product is the Factor 11 pitch). The framework’s load-bearing claim: LLMs are stateless functions, so the state machine, control flow, and human-contact points must be built explicitly in the surrounding code; framework abstractions hide failures.
Key Takeaways
- LLMs are stateless functions — an agent step is “here’s what’s happened so far, what’s the next step?” The whole design problem follows from this
- Own your control flow (Factor 8) — don’t outsource the agent loop to a framework; framework abstractions hide failures that explicit code surfaces
- Context engineering (Factors 2, 3) has superseded prompt engineering as the core discipline — optimizing the full token configuration, not just the instruction wording
- Tools are the capability ceiling (Factors 1, 4) — tools are how agents affect the world; tool design determines agent behavior; treat tool calls as structured outputs
- Reliability via composition (Factors 9, 10) — compact errors + small focused agents > one large complex agent
- Human contact is first-class (Factor 7) — interrupting for human input is a designed behavior, not a sign of failure; it’s a tool call like any other
The Twelve Factors
| # | Factor | Theme |
|---|---|---|
| 1 | Natural language → tool calls | Tool Design |
| 2 | Own your prompts | Context Engineering |
| 3 | Own your context window | Context Engineering |
| 4 | Tools are structured outputs | Tool Design |
| 5 | Unify execution & business state | Control Flow |
| 6 | Launch / pause / resume | Control Flow |
| 7 | Contact humans with tools | Human-in-the-Loop |
| 8 | Own your control flow | Control Flow |
| 9 | Compact errors | Reliability |
| 10 | Small, focused agents | Reliability |
| 11 | Trigger from anywhere | Human-in-the-Loop |
| 12 | Stateless reducer | Control Flow |
Context Engineering (Factors 2 & 3)
The successor framing to prompt engineering (Anthropic, Sep 2025): building with LLMs is less about finding the right words and more about “what configuration of context is most likely to generate the desired behavior?” Context here means the full set of tokens the model conditions on when sampling, and it is a critical but finite resource.
- Factor 2 — Own your prompts: don’t outsource prompt engineering to a framework’s black box; production quality requires hand-controlling what goes in
- Factor 3 — Own your context window: you’re not obligated to use standard message formats; custom serializations (compact event logs instead of verbose message arrays) are fair game
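A minimal sketch of the custom-serialization idea in Factor 3, assuming a hypothetical `Event` record and a tagged-line format invented here for illustration (the repo does not prescribe this exact shape). It renders an event log as one compact text block instead of a verbose message array.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    """One entry in the agent's history (hypothetical shape)."""
    kind: str   # e.g. "user_message", "tool_call", "tool_result", "error"
    data: dict


def render_context(events: list[Event]) -> str:
    """Serialize the event log into one compact block of tagged lines."""
    lines = []
    for event in events:
        # Compact JSON (no spaces) keeps each event on a single short line.
        payload = json.dumps(event.data, separators=(",", ":"), sort_keys=True)
        lines.append(f"<{event.kind}> {payload}")
    return "\n".join(lines)


events = [
    Event("user_message", {"text": "deploy backend v1.2.3"}),
    Event("tool_call", {"name": "list_tags", "args": {}}),
    Event("tool_result", {"tags": ["v1.2.2", "v1.2.3"]}),
]
prompt = render_context(events) + "\nWhat is the next step?"
```

Because the format is yours, you can drop, truncate, or summarize individual event kinds before rendering, which is where the practical levers below plug in.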
Practical levers: compaction/summarization as conversations grow; retrieval instead of stuffing; compact error representations (Factor 9); sub-agents as context isolation (each gets its own window); token-efficient tool responses.
This is the lens that explains why compiled knowledge bases (Agent Wikis, this vault) outperform raw-source RAG: curating what enters the window beats forcing the model to search harder through noise.
Tool Design (Factors 1 & 4)
Factor 1: LLMs convert natural language to tool calls — tools are how agents affect the world, so tool design is agent design.
Factor 4: Tools are structured outputs — the same way an LLM produces the next token, it produces a tool call. Treat tool calls as structured data, not imperative commands.
Key principles: single responsibility (one tool per distinct action); self-describing name + description; idempotent where possible; fail loudly with compact informative errors (Factor 9).
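One way to read Factor 4 in code, as a sketch under assumptions: the model is taken to emit a JSON object with `name` and `args` fields (a shape chosen here, not mandated by the source), and `create_ticket` is a hypothetical stand-in tool. The output is validated as data first, and ordinary code then decides how to act on it.

```python
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


def parse_tool_call(raw: str) -> ToolCall:
    """Validate the model's JSON output before anything is executed."""
    obj = json.loads(raw)
    if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
        raise ValueError("tool call must be an object with a string 'name'")
    args = obj.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("'args' must be an object")
    return ToolCall(obj["name"], args)


def create_ticket(title: str) -> dict:
    # Stand-in for a real side effect.
    return {"ticket": title, "status": "open"}


TOOLS = {"create_ticket": create_ticket}


def dispatch(call: ToolCall) -> dict:
    """Deterministic code, not the model, decides how a call is carried out."""
    if call.name not in TOOLS:
        raise ValueError(f"unknown tool: {call.name}")
    return TOOLS[call.name](**call.args)


raw_output = '{"name": "create_ticket", "args": {"title": "Login page returns 500"}}'
result = dispatch(parse_tool_call(raw_output))
```

Keeping parsing and dispatch separate means an unknown or malformed call fails loudly before any side effect happens.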
Control Flow (Factors 5, 6, 8, 12)
The cluster of factors about owning your agent’s state machine:
- Factor 5 — unify execution state with business state; don’t let the agent’s internal state and your application state diverge
- Factor 6 — design for launch/pause/resume; long-running agents need checkpoints
- Factor 8 — own your control flow; don’t let a framework’s loop be the only place errors can surface
- Factor 12 — stateless reducer; the agent function is (context) → next_step, and state lives outside it
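The four control-flow factors can be sketched together, assuming a thread represented as a plain list of dicts and a hypothetical `scripted_decide` function standing in for the LLM call so the example runs offline. The loop is explicit (Factor 8), the only state is the thread (Factors 5 and 12), and pausing is just returning the thread so it can be saved and reloaded (Factor 6).

```python
import json
from typing import Callable

Thread = list[dict]           # the whole agent state: an append-only event list
NextStep = dict               # e.g. {"intent": "add", "a": 1, "b": 2}


def run(thread: Thread, decide: Callable[[Thread], NextStep], max_steps: int = 10) -> Thread:
    """Explicit agent loop: decide, append, branch on intent, repeat."""
    for _ in range(max_steps):
        step = decide(thread)                 # stateless: depends only on thread
        thread = thread + [step]              # new list; the input is not mutated
        if step["intent"] == "done":
            break
        if step["intent"] == "ask_human":
            break                             # pause: caller persists the thread
        if step["intent"] == "add":
            thread = thread + [{"intent": "result", "value": step["a"] + step["b"]}]
    return thread


def save(thread: Thread) -> str:
    return json.dumps(thread)                 # a checkpoint is plain data


def load(blob: str) -> Thread:
    return json.loads(blob)


def scripted_decide(thread: Thread) -> NextStep:
    """Hypothetical stand-in for an LLM call."""
    kinds = [e["intent"] for e in thread]
    if "result" not in kinds:
        return {"intent": "add", "a": 2, "b": 3}
    if "human_reply" not in kinds:
        return {"intent": "ask_human", "question": "Publish the result?"}
    return {"intent": "done"}


paused = run([{"intent": "user", "text": "add 2 and 3"}], scripted_decide)
blob = save(paused)                           # the process may exit here
resumed = run(load(blob) + [{"intent": "human_reply", "text": "yes"}], scripted_decide)
```

Nothing about the pause lives in memory: the second `run` call could happen in a different process, hours later, from the saved blob alone.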
These four together make the same architectural argument as Alex Krantz’s OpenClaw deep-dive (“agents-as-threads, session-as-process”) and Ryan Carson’s Clawd Chief (“agents are cron jobs and markdown files”).
Reliability (Factors 9 & 10)
Factor 9: Compact errors — full stack traces are expensive context waste; compact representations that tell the model what failed and what to do about it are faster and cheaper.
Factor 10: Small, focused agents — each agent should do one thing well; composing small reliable agents beats one large unreliable one. This is also the cost argument: small scoped agents use fewer tokens per task.
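A small sketch of Factor 9, assuming a hypothetical `compact_error` helper and a made-up one-line format (tool, exception type, reason, suggested next action). It is one possible shape for a compact error, not a format defined by the framework.

```python
def compact_error(tool_name: str, exc: Exception, hint: str, limit: int = 200) -> str:
    """Reduce an exception to one line: what failed, why, and what to try next."""
    reason = " ".join(str(exc).split())       # collapse newlines and runs of spaces
    line = f"{tool_name} failed: {type(exc).__name__}: {reason}. Next: {hint}"
    return line if len(line) <= limit else line[: limit - 3] + "..."


def fetch_order(order_id: str) -> dict:
    # Stand-in tool that always fails, to exercise the error path.
    raise LookupError(f"order {order_id} not found")


try:
    fetch_order("A-17")
except Exception as exc:
    message = compact_error("fetch_order", exc, "check the ID with list_orders")
```

The full traceback can still go to your logs; only the one-line summary needs to enter the context window.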
Human-in-the-Loop (Factors 7 & 11)
Factor 7: Contact humans with tools — interrupting for human input is a first-class operation; model it as a tool call, not an exception handler.
Factor 11: Trigger from anywhere — agents triggerable only from one surface (e.g., a CLI) are fragile; design for triggering from messaging channels, cron, webhooks, other agents.
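As an illustration of both factors, here is a sketch in which asking a human is one more branch in the step handler and any surface can resume the thread through a single entry point. The `Thread` class, the `request_approval` and `deploy` tool names, and the `channel` parameter are all hypothetical; no real messaging API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    events: list[dict] = field(default_factory=list)
    waiting_on_human: bool = False


def handle_step(thread: Thread, step: dict) -> Thread:
    """Route one model-chosen step; human contact is just another branch."""
    thread.events.append(step)
    if step["tool"] == "request_approval":
        # A real system would send a chat message, email, etc. at this point.
        thread.waiting_on_human = True
    elif step["tool"] == "deploy":
        thread.events.append({"tool": "deploy_result", "ok": True})
    return thread


def on_human_reply(thread: Thread, approved: bool, channel: str) -> Thread:
    """Entry point any surface (chat, email, webhook) can call to resume."""
    thread.events.append({"tool": "human_reply", "approved": approved, "via": channel})
    thread.waiting_on_human = False
    return thread


thread = Thread()
handle_step(thread, {"tool": "request_approval", "question": "Deploy v1.2.3 to prod?"})
on_human_reply(thread, approved=True, channel="slack")
handle_step(thread, {"tool": "deploy", "version": "v1.2.3"})
```

The human's answer lands in the same event list as every tool result, so the next model call sees it as ordinary context.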
HumanLayer’s product is the hosted infrastructure for Factor 7: routing agent approval requests to humans across surfaces (Slack, email, etc.) with audit trails.
The Five Workflow Patterns (Anthropic)
Companion taxonomy from Anthropic’s “Building effective agents” — the structural patterns that appear across production systems:
- Prompt chaining — fixed sequence; gates between steps; use when task cleanly decomposes into subtasks
- Routing — classify input, dispatch to specialized follow-up; also the cost lever (easy → small model, hard → capable model)
- Parallelization — sectioning (speed) or voting (confidence); use when subtasks are independent
- Orchestrator-workers — central LLM dynamically delegates; the bridge to full agents and multi-agent systems
- Evaluator-optimizer — generate → evaluate → loop; use when evaluation criteria are clear and iteration measurably improves output
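The routing pattern above might look like the following sketch. The keyword-based `classify` function and the three handlers are hypothetical placeholders; in practice the classifier could itself be a small-model call and the handlers would call real models or sub-workflows.

```python
from typing import Callable

Handler = Callable[[str], str]


def classify(query: str) -> str:
    """Hypothetical classifier; in practice this could be a small-model call."""
    text = query.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if len(text.split()) > 12:
        return "complex"
    return "simple"


def small_model(query: str) -> str:
    return f"[small] {query}"


def capable_model(query: str) -> str:
    return f"[capable] {query}"


def billing_flow(query: str) -> str:
    return f"[billing] {query}"


ROUTES: dict[str, Handler] = {
    "simple": small_model,
    "complex": capable_model,
    "billing": billing_flow,
}


def route(query: str) -> str:
    """Classify once, then dispatch to the matching specialized handler."""
    return ROUTES[classify(query)](query)
```

The route table is plain data, so adding a category or swapping the model behind one is a one-line change that the rest of the code never sees.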
These compose freely (a router can front a chain whose middle step parallelizes). Note: the existing article Agent Workflow Patterns covers sequential/parallel/evaluator-optimizer; the routing and orchestrator-workers patterns are addressed here. 12-factor’s Factor 8 (“own your control flow”) makes the same argument from the infrastructure side.
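For the evaluator-optimizer pattern listed above, a minimal sketch of the loop with the control flow owned in plain code. `toy_generate` and `toy_evaluate` are hypothetical stand-ins for two LLM calls, and the `max_rounds` cap is an assumption added here so the loop always terminates.

```python
from typing import Callable


def evaluate_and_refine(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str], tuple[bool, str]],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Generate, evaluate, and retry with feedback until accepted or out of rounds."""
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        ok, feedback = evaluate(draft)
        if ok:
            return draft, round_no
    return draft, max_rounds      # best effort; the caller decides what to do


# Hypothetical stand-ins for two LLM calls, so the loop runs offline.
def toy_generate(task: str, feedback: str) -> str:
    return task.upper() if "uppercase" in feedback else task


def toy_evaluate(draft: str) -> tuple[bool, str]:
    return (True, "") if draft.isupper() else (False, "use uppercase")


final, rounds = evaluate_and_refine(toy_generate, toy_evaluate, "ship it")
```

Returning the round count alongside the draft makes it easy to measure whether iteration is actually improving output, which is the condition the pattern depends on.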
Convergent Principles Across Sources
The 12-factor framework, Anthropic essays, and the Weng/Huyen agent surveys converge on the same set:
- LLMs are stateless functions; state lives outside them
- Context window management is the core engineering challenge
- Tools define the agent’s capability ceiling
- Human oversight is architectural, not ad-hoc
- Reliability comes from composing small reliable units, not from building complex monoliths
- “Own your control flow” — abstractions hide failures
Try It
- Read the 12-factor repo: github.com/humanlayer/12-factor-agents (factor pages under content/factor-0N.md)
- Audit an existing agent against each factor: which ones does it handle explicitly vs implicitly?
- Apply Factor 9 concretely: replace verbose error logging in tool responses with a one-line summary of what failed and the action to retry
- Apply Factor 3 concretely: log the token count of your context window at each step and identify what’s eating it
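A starting point for the Factor 3 exercise, as a sketch: it uses a crude four-characters-per-token estimate, which is only a rough rule of thumb for English text and is an assumption here; swap in your model's real tokenizer for accurate numbers. The section names are hypothetical.

```python
def approx_tokens(text: str) -> int:
    """Crude estimate (~4 characters per token); replace with a real tokenizer."""
    return max(1, len(text) // 4) if text else 0


def context_report(sections: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (section, tokens, share of total) rows, largest first."""
    counts = {name: approx_tokens(body) for name, body in sections.items()}
    total = sum(counts.values()) or 1
    rows = [(name, n, n / total) for name, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


sections = {
    "system_prompt": "You are a deployment assistant. " * 5,
    "tool_results": '{"log": "..."}' * 200,
    "user_messages": "deploy backend v1.2.3",
}
for name, tokens, share in context_report(sections):
    print(f"{name:15s} {tokens:6d} tokens  {share:5.1%}")
```

Logging this report at every step of the loop shows which part of the context grows fastest and is therefore the first candidate for compaction.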
Related
- Agent Workflow Patterns — Anthropic’s sequential/parallel/evaluator-optimizer taxonomy (overlaps with workflow patterns above)
- Principles for Autonomous System Design — Alex Krantz’s complementary architectural deep-dive
- Claude Agent Hierarchy — Claude-specific agent tier decision framework
- Microsoft Agent Governance Toolkit — Factor 5 (unify execution + business state) and Factor 8 (own your control flow) at the governance layer
- Claude Code Memory Architecture — context engineering in a coding agent context
- Loop Engineering (Cobus Greyling) — operationalizes the same “own your control flow / context / prompts” thesis over time: six primitives, seven patterns, and an L0→L3 readiness ladder for the loops that prompt your agents.