Source: ai-research/agentwikis-agent-workflows-2026-06-12.md — compiled by Agent Wikis from the 12-factor-agents repo, Anthropic “Building effective agents,” and Weng/Huyen agent surveys; sourced 2026-06-12

humanlayer/12-factor-agents is a framework of 12 principles for building reliable LLM applications in production, modeled on Heroku’s 12-factor apps. Apache-2.0 (code), CC BY-SA 4.0 (content). Maintained by HumanLayer (whose product is the Factor 7 pitch). The framework’s load-bearing claim: LLMs are stateless functions, so the state machine, control flow, and human-contact points must be built explicitly in the surrounding code, because framework abstractions hide failures.

Key Takeaways

  • LLMs are stateless functions — an agent step is “here’s what’s happened so far, what’s the next step?” The whole design problem follows from this
  • Own your control flow (Factor 8) — don’t outsource the agent loop to a framework; framework abstractions hide failures that explicit code surfaces
  • Context engineering (Factors 2, 3) has superseded prompt engineering as the core discipline — optimizing the full token configuration, not just the instruction wording
  • Tools are the capability ceiling (Factors 1, 4) — tools are how agents affect the world; tool design determines agent behavior; treat tool calls as structured outputs
  • Reliability via composition (Factors 9, 10) — compact errors + small focused agents > one large complex agent
  • Human contact is first-class (Factor 7) — interrupting for human input is a designed behavior, not a sign of failure; it’s a tool call like any other

The Twelve Factors

| # | Factor | Theme |
|---|---|---|
| 1 | Natural language → tool calls | Tool Design |
| 2 | Own your prompts | Context Engineering |
| 3 | Own your context window | Context Engineering |
| 4 | Tools are structured outputs | Tool Design |
| 5 | Unify execution & business state | Control Flow |
| 6 | Launch / pause / resume | Control Flow |
| 7 | Contact humans with tools | Human-in-the-Loop |
| 8 | Own your control flow | Control Flow |
| 9 | Compact errors | Reliability |
| 10 | Small, focused agents | Reliability |
| 11 | Trigger from anywhere | Human-in-the-Loop |
| 12 | Stateless reducer | Control Flow |

Context Engineering (Factors 2 & 3)

The successor framing to prompt engineering (Anthropic, Sep 2025): building with LLMs is less about finding the right words and more about “what configuration of context is most likely to generate the desired behavior?” Context is the full set of tokens the model conditions on when sampling, and it is a critical but finite resource.

  • Factor 2 — Own your prompts: don’t outsource prompt engineering to a framework’s black box; production quality requires hand-controlling what goes in
  • Factor 3 — Own your context window: you’re not obligated to use standard message formats; custom serializations (compact event logs instead of verbose message arrays) are fair game

Practical levers: compaction/summarization as conversations grow; retrieval instead of stuffing; compact error representations (Factor 9); sub-agents as context isolation (each gets its own window); token-efficient tool responses.
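Factor 3’s custom serialization and the compaction lever can be sketched together. This is a minimal illustration, not the repo’s code: the `Event` schema, the XML-style tags, and the `keep_last` cutoff are assumptions, and the “compaction” here is a simple tally where a real system might ask an LLM to summarize.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Event:
    """One entry in the agent's event log (hypothetical schema)."""
    type: str             # e.g. "user_message", "tool_call", "tool_result"
    data: dict[str, Any]  # payload, rendered as "key: value" lines


def serialize_event(event: Event) -> str:
    """Render one event as a compact XML-style block instead of a chat-message object."""
    body = "\n".join(f"{k}: {v}" for k, v in event.data.items())
    return f"<{event.type}>\n{body}\n</{event.type}>"


def build_context(events: list[Event], keep_last: int = 4) -> str:
    """Serialize the event log into one context string.

    The last `keep_last` events stay verbatim; older ones collapse into a
    one-line tally so the window stays bounded as the thread grows.
    """
    split = max(len(events) - keep_last, 0)
    older, recent = events[:split], events[split:]
    parts: list[str] = []
    if older:
        counts: dict[str, int] = {}
        for e in older:
            counts[e.type] = counts.get(e.type, 0) + 1
        tally = ", ".join(f"{n} {t}" for t, n in sorted(counts.items()))
        parts.append(f"<compacted_history>{tally}</compacted_history>")
    parts.extend(serialize_event(e) for e in recent)
    return "\n\n".join(parts)
```

The resulting string goes into a single prompt slot rather than a verbose message array, which is exactly the freedom Factor 3 grants.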

This is the lens that explains why compiled knowledge bases (Agent Wikis, this vault) outperform raw-source RAG: curating what enters the window beats forcing the model to search harder through noise.

Tool Design (Factors 1 & 4)

Factor 1: LLMs convert natural language to tool calls — tools are how agents affect the world, so tool design is agent design.

Factor 4: Tools are structured outputs. A tool call is structured text (typically JSON) that the LLM generates just as it generates any other tokens; your code parses it and decides what actually runs. Treat tool calls as structured data, not imperative commands.
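A minimal sketch of Factor 4, assuming the model has been prompted to reply with JSON carrying an `intent` field. The intents (`create_issue`, `search_issues`, `done`) and the dataclasses are hypothetical; the point is that the model only proposes a typed value and deterministic code chooses what to do with it.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class CreateIssue:
    title: str
    description: str


@dataclass
class SearchIssues:
    query: str


@dataclass
class Done:
    message: str


NextStep = Union[CreateIssue, SearchIssues, Done]
_INTENTS = {"create_issue": CreateIssue, "search_issues": SearchIssues, "done": Done}


def parse_next_step(raw: str) -> NextStep:
    """Validate the model's JSON output into a typed step; raise on anything unexpected."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    intent = payload.pop("intent", None)
    cls = _INTENTS.get(intent)
    if cls is None:
        raise ValueError(f"unknown intent: {intent!r}")
    return cls(**payload)  # TypeError if fields are missing or unexpected


def execute(step: NextStep) -> str:
    """Deterministic code, not the model, decides what each step does."""
    if isinstance(step, CreateIssue):
        return f"created issue {step.title!r}"
    if isinstance(step, SearchIssues):
        return f"searched for {step.query!r}"
    return step.message
```

Because parsing rejects unknown intents, a hallucinated tool name becomes an ordinary error your loop can handle rather than an action.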

Key principles: single responsibility (one tool per distinct action); self-describing name + description; idempotent where possible; fail loudly with compact informative errors (Factor 9).
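Three of these principles (single responsibility, self-description, idempotency) fit in one small tool. A hedged sketch: `RefundTool`, its `request_id` idempotency key, and the counter standing in for a payments call are all hypothetical.

```python
class RefundTool:
    """Hypothetical single-purpose tool: issue one refund, safe to retry."""

    name = "issue_refund"
    description = (
        "Refund amount_cents on order_id. Retrying with the same request_id "
        "returns the original result instead of refunding twice."
    )

    def __init__(self) -> None:
        self._completed: dict[str, str] = {}  # request_id -> first result
        self._next_id = 1

    def __call__(self, request_id: str, order_id: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            # Fail loudly, in one line the model can act on (Factor 9).
            return f"error: amount_cents must be positive, got {amount_cents}; retry with a positive value"
        if request_id in self._completed:
            # Idempotent retry: same answer, no second side effect.
            return self._completed[request_id]
        result = f"refund r{self._next_id} issued: {amount_cents} cents on {order_id}"
        self._next_id += 1  # stands in for the real side effect (e.g. a payments API call)
        self._completed[request_id] = result
        return result
```

The idempotency key matters because agents retry: a loop that re-sends a step after a timeout should not double the side effect.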

Control Flow (Factors 5, 6, 8, 12)

The cluster of factors about owning your agent’s state machine:

  • Factor 5 — unify execution state with business state; don’t let the agent’s internal state and your application state diverge
  • Factor 6 — design for launch/pause/resume; long-running agents need checkpoints
  • Factor 8 — own your control flow; don’t let a framework’s loop be the only place errors can surface
  • Factor 12 — stateless reducer; the agent function is (context) → next_step, state lives outside it
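The four factors can be sketched as one loop. A minimal illustration under stated assumptions: `next_step` is a deterministic stub standing in for the LLM call, and `fetch_weather` is a hypothetical tool. The thread is the only state (Factor 5), the step function is a pure function of it (Factor 12), and the loop is plain code you own (Factor 8). Since the thread is plain data, it can be persisted and the loop re-entered later (Factor 6).

```python
from dataclasses import dataclass, replace
from typing import Any, Callable

Event = dict[str, Any]  # e.g. {"type": "tool_call", "intent": "fetch_weather", ...}


@dataclass(frozen=True)
class Thread:
    """All execution state lives here; the same object is what gets persisted."""
    events: tuple[Event, ...] = ()


def append(thread: Thread, event: Event) -> Thread:
    return replace(thread, events=thread.events + (event,))


def next_step(thread: Thread) -> Event:
    """Hypothetical stub for the LLM call: (context) -> next step, no hidden state."""
    results = [e for e in thread.events if e["type"] == "tool_result"]
    if not results:
        return {"type": "tool_call", "intent": "fetch_weather", "city": "Oslo"}
    return {"type": "done", "message": f"Weather: {results[-1]['result']}"}


TOOLS: dict[str, Callable[..., str]] = {
    "fetch_weather": lambda city: f"4C and raining in {city}",  # hypothetical tool
}


def run(thread: Thread, max_steps: int = 10) -> Thread:
    """The agent loop, written out so every branch and failure is visible."""
    for _ in range(max_steps):
        step = next_step(thread)
        thread = append(thread, step)
        if step["type"] == "done":
            return thread
        args = {k: v for k, v in step.items() if k not in ("type", "intent")}
        result = TOOLS[step["intent"]](**args)
        thread = append(thread, {"type": "tool_result", "intent": step["intent"], "result": result})
    raise RuntimeError(f"no 'done' step after {max_steps} steps")
```

Owning the loop means the `max_steps` guard, the tool dispatch, and any future pause points are ordinary code you can test and log.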

These four together make the same architectural argument as Alex Krantz’s OpenClaw deep-dive (“agents-as-threads, session-as-process”) and Ryan Carson’s Clawd Chief (“agents are cron jobs and markdown files”).

Reliability (Factors 9 & 10)

Factor 9: Compact errors — full stack traces are expensive context waste; compact representations that tell the model what failed and what to do about it are faster and cheaper.
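A minimal sketch of Factor 9, assuming errors are appended to the thread as events. The event shape, the default `hint`, and the `MAX_CONSECUTIVE_ERRORS` threshold are illustrative assumptions; the idea is one short line the model can act on, plus a cap so the loop escalates instead of retrying forever.

```python
from typing import Any

Event = dict[str, Any]

MAX_CONSECUTIVE_ERRORS = 3  # assumption: illustrative threshold, tune per agent


def compact_error(tool: str, exc: Exception, hint: str = "check the arguments and retry") -> Event:
    """One short error event (what failed, what to do) instead of a full stack trace."""
    lines = str(exc).splitlines()
    first = lines[0][:200] if lines else ""
    return {"type": "error", "tool": tool, "summary": f"{type(exc).__name__}: {first}", "hint": hint}


def consecutive_errors(events: list[Event]) -> int:
    """Count error events at the tail of the thread."""
    n = 0
    for e in reversed(events):
        if e.get("type") != "error":
            break
        n += 1
    return n


def record_failure(events: list[Event], tool: str, exc: Exception) -> str:
    """Append a compact error and decide whether to retry or escalate."""
    events.append(compact_error(tool, exc))
    if consecutive_errors(events) >= MAX_CONSECUTIVE_ERRORS:
        return "escalate"  # e.g. hand off to a human via a Factor 7 tool call
    return "retry"
```

Any successful step resets the count, so a single transient failure never triggers escalation.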

Factor 10: Small, focused agents — each agent should do one thing well; composing small reliable agents beats one large unreliable one. This is also the cost argument: small scoped agents use fewer tokens per task.

Human-in-the-Loop (Factors 7 & 11)

Factor 7: Contact humans with tools — interrupting for human input is a first-class operation; model it as a tool call, not an exception handler.
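A hedged sketch of human contact as an ordinary tool call that pauses the loop (Factors 6 and 7). The intent names and event fields are hypothetical, and persistence is reduced to a JSON string; a real deployment would store it in a database and resume from whatever channel the human answers on.

```python
import json
from typing import Any, Optional

Event = dict[str, Any]

# Hypothetical intents; asking a human is just one more thing the model can emit.
HUMAN_INTENTS = {"request_human_input", "request_approval"}


def pause_if_human_needed(events: list[Event], step: Event) -> Optional[str]:
    """Record the model's step. If it asks for a human, return the serialized
    thread to persist (the loop stops here) instead of raising an exception."""
    events.append(step)
    if step.get("intent") in HUMAN_INTENTS:
        return json.dumps(events)
    return None


def resume_with_human_reply(saved: str, reply: str) -> list[Event]:
    """Called later (e.g. from a chat or email webhook): reload and append the answer."""
    events = json.loads(saved)
    events.append({"type": "human_response", "text": reply})
    return events
```

After resuming, the reply is just another event in context, so the next model call sees it like any tool result.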

Factor 11: Trigger from anywhere — agents triggerable only from one surface (e.g., a CLI) are fragile; design for triggering from messaging channels, cron, webhooks, other agents.
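One way to read Factor 11 in code: every surface is normalized onto the same first event, so a single agent loop serves them all. The payload shapes below are hypothetical simplifications, not real Slack, email, or webhook schemas.

```python
from typing import Any

Event = dict[str, Any]


def to_initial_event(source: str, payload: dict[str, Any]) -> Event:
    """Map a trigger from any surface onto the same first event of a thread.

    Payload field names are hypothetical; real integrations differ.
    """
    if source == "slack":
        text, who = payload["text"], payload["user"]
    elif source == "email":
        text, who = payload["body"], payload["from"]
    elif source == "cron":
        text, who = payload["task"], "scheduler"
    elif source == "webhook":
        text, who = payload["message"], payload.get("sender", "webhook")
    elif source == "agent":
        text, who = payload["request"], payload["agent_id"]
    else:
        raise ValueError(f"unsupported trigger source: {source}")
    # "via" lets the final response be routed back to the surface it came from.
    return {"type": "user_message", "text": text, "from": who, "via": source}
```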

HumanLayer’s product is the hosted infrastructure for Factor 7: routing agent approval requests to humans across surfaces (Slack, email, etc.) with audit trails.

The Five Workflow Patterns (Anthropic)

Companion taxonomy from Anthropic’s “Building effective agents” — the structural patterns that appear across production systems:

  1. Prompt chaining — fixed sequence; gates between steps; use when task cleanly decomposes into subtasks
  2. Routing — classify input, dispatch to specialized follow-up; also the cost lever (easy → small model, hard → capable model)
  3. Parallelization — sectioning (speed) or voting (confidence); use when subtasks are independent
  4. Orchestrator-workers — central LLM dynamically delegates; the bridge to full agents and multi-agent systems
  5. Evaluator-optimizer — generate → evaluate → loop; use when evaluation criteria are clear and iteration measurably improves output
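Two of the five patterns reduce to a few lines of ordinary control flow. A minimal sketch, assuming each LLM call is wrapped as a plain function; the classifier, handlers, generator, and evaluator are hypothetical stand-ins, and `handlers` must include a `"default"` entry.

```python
from typing import Callable

Handler = Callable[[str], str]


def route(task: str, classify: Callable[[str], str], handlers: dict[str, Handler]) -> str:
    """Routing: classify the input, then dispatch (e.g. easy -> small model)."""
    label = classify(task)
    return handlers.get(label, handlers["default"])(task)


def evaluate_optimize(
    task: str,
    generate: Callable[[str, str], str],          # (task, feedback) -> draft
    evaluate: Callable[[str], tuple[bool, str]],  # draft -> (accepted, feedback)
    max_rounds: int = 3,
) -> str:
    """Evaluator-optimizer: generate, critique, regenerate with feedback until accepted."""
    feedback, draft = "", ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        accepted, feedback = evaluate(draft)
        if accepted:
            break
    return draft  # best effort if max_rounds is exhausted
```

The `max_rounds` cap reflects the pattern’s precondition: iteration is only worth paying for while it measurably improves the output.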

These compose freely (a router can front a chain whose middle step parallelizes). Note: the existing article Agent Workflow Patterns covers the sequential, parallel, and evaluator-optimizer patterns; routing and orchestrator-workers are covered here. 12-factor’s Factor 8 (“own your control flow”) makes the same case from the infrastructure side: the developer, not a framework, should decide which of these structures the agent runs.
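The composition example in the paragraph above (a router fronting a chain whose middle step parallelizes) can be sketched with small combinators. This is an illustration under assumptions: every step is a plain `str -> str` function, and the lambdas stand in for hypothetical LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Step = Callable[[str], str]


def chain(*steps: Step) -> Step:
    """Prompt chaining: run steps in a fixed order, each feeding the next."""
    def run(x: str) -> str:
        for step in steps:
            x = step(x)
        return x
    return run


def parallel(*sections: Step, join: Callable[[list[str]], str]) -> Step:
    """Parallelization (sectioning): run independent steps concurrently, then join."""
    def run(x: str) -> str:
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda s: s(x), sections))  # map preserves order
        return join(results)
    return run


def router(classify: Callable[[str], str], routes: dict[str, Step]) -> Step:
    """Routing: pick one downstream step per input."""
    def run(x: str) -> str:
        return routes[classify(x)](x)
    return run


# Hypothetical stubs in place of LLM calls.
review = router(
    classify=lambda doc: "code" if "def " in doc else "prose",
    routes={
        "code": chain(
            str.strip,
            parallel(lambda d: "security ok", lambda d: "style ok", join="; ".join),
            lambda notes: f"review: {notes}",
        ),
        "prose": lambda d: "review: prose",
    },
)
```

Each combinator is ordinary code, so the whole structure stays inspectable in the way Factor 8 asks for.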

Convergent Principles Across Sources

The 12-factor framework, Anthropic essays, and the Weng/Huyen agent surveys converge on the same set of principles:

  • LLMs are stateless functions; state lives outside them
  • Context window management is the core engineering challenge
  • Tools define the agent’s capability ceiling
  • Human oversight is architectural, not ad-hoc
  • Reliability comes from composing small reliable units, not from building complex monoliths
  • “Own your control flow” — abstractions hide failures

Try It

  • Read the 12-factor repo: github.com/humanlayer/12-factor-agents (factor pages under content/factor-0N.md)
  • Audit an existing agent against each factor: which ones does it handle explicitly vs implicitly?
  • Apply Factor 9 concretely: replace verbose error logging in tool responses with a one-line summary of what failed and the action to retry
  • Apply Factor 3 concretely: log the token count of your context window at each step and identify what’s eating it
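A starting point for the Factor 3 exercise: tally approximate tokens per event type. The 4-characters-per-token estimate is a rough assumption for English text; swap in your model’s own tokenizer for real numbers, and treat the event schema as hypothetical.

```python
from collections import Counter
from typing import Any


def approx_tokens(text: str) -> int:
    """Rough estimate (~4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4) if text else 0


def context_budget(events: list[dict[str, Any]]) -> list[tuple[str, int]]:
    """Approximate tokens per event type, largest first."""
    totals: Counter = Counter()
    for e in events:
        totals[e["type"]] += approx_tokens(str(e))
    return totals.most_common()
```

Calling `context_budget` after each loop iteration and logging the result shows which event types (often raw tool results) grow fastest.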