Source: Anthropic Blog Agent Workflow Patterns 2026 03 05 (Anthropic blog, Mar 5 2026 — https://claude.com/blog/common-workflow-patterns-for-ai-agents-and-when-to-use-them) · raw/x-account-bcherny-2060390852619272526.md (Boris Cherny / Claude Code creator sharing Salesforce’s agentic-Claude-Code writeup, posted 2026-05-29)

Anthropic’s named taxonomy of the three agent workflow patterns that keep showing up in production — Sequential, Parallel, and Evaluator-Optimizer — plus a decision framework for picking one. Workflows don’t replace agent autonomy; they shape where and how agents apply it. Default to sequential; upgrade when requirements force it.

Key Takeaways

  • Workflow ≠ agent. An agent decides what to do and when to stop. A workflow sets the execution shape — stages, checkpoints, boundaries — that an agent operates inside.
  • Three patterns cover most cases. Sequential (dependencies), Parallel (independence), Evaluator-Optimizer (iterative quality). Everything else is usually a composition of these.
  • “Start with the simplest pattern that solves your problem. Default to sequential.” — the blog’s explicit recommendation.
  • Single-agent first. “First try your pipeline as a single agent, where the steps are just part of the prompt.” Only reach for a workflow when that breaks.
  • Patterns nest. An evaluator-optimizer can use parallel evaluation. A sequential workflow can parallelize at a bottleneck stage.
  • Aggregation strategy precedes parallelism. “Design your aggregation strategy before implementing parallel agents.” The tricky part of parallel is combining the outputs, not splitting the work.
  • Stopping criteria precede iteration. “Set clear stopping criteria before you start iterating.” An evaluator-optimizer loop without an explicit stop condition can run indefinitely or drift away from a good draft.

The three patterns

1. Sequential Workflows — “step B needs step A’s output”

When to use: tasks with natural dependencies between stages; each stage adds specific value.

Examples from the blog:

  • Marketing copy generation, then translation into multiple languages
  • Document extraction → schema validation → database load
  • Draft → review → polish cycles

Pro tip: try the whole pipeline in a single agent (steps as prompt sections) before building a multi-agent sequence.
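
As a rough illustration of the extraction → validation → load example, here is a minimal sketch in which each stage is a plain function and the runner threads one stage's output into the next. The stage bodies, the two-field schema, and the in-memory "database" are all assumptions made for the sketch; in a real pipeline each stage would wrap an agent or model call.

```python
from typing import Callable, Iterable


def extract(document: str) -> dict:
    """Stage 1 (hypothetical): pull 'key: value' pairs out of raw text."""
    fields = {}
    for line in document.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return fields


def validate(record: dict) -> dict:
    """Stage 2 (hypothetical): enforce an illustrative schema and fail fast."""
    missing = [name for name in ("name", "amount") if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return {"name": record["name"], "amount": float(record["amount"])}


def make_loader(table: list) -> Callable[[dict], dict]:
    """Stage 3 factory: the list stands in for a real database table."""
    def load(record: dict) -> dict:
        table.append(record)
        return record
    return load


def run_sequential(stages: Iterable[Callable], payload):
    """Feed each stage's output into the next; an exception stops the chain."""
    for stage in stages:
        payload = stage(payload)
    return payload


table = []
result = run_sequential(
    [extract, validate, make_loader(table)],
    "Name: Ada\nAmount: 12.5",
)
print(result, len(table))
```

Because validation raises before the load stage runs, a bad record never reaches the table. That checkpoint between stages is the main thing the sequential shape provides.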

2. Parallel Workflows — “tasks are independent but serial is slow”

When to use: divide work into independent subtasks, or get multiple perspectives on the same problem.

Sub-patterns:

  • Sectioning — different agents handle different aspects of the same problem
  • Evaluation — each agent assesses a different quality dimension
  • Voting — multiple agents analyze the same content and their verdicts are aggregated

Examples from the blog:

  • Quality-metric evals in parallel
  • Code review with agents splitting vulnerability categories
  • Document analysis: themes + sentiment + facts extracted in parallel

Pro tip: design the aggregation step first. If you can’t describe how the outputs combine, parallel is premature.
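
To make "aggregation first" concrete, the sketch below writes the combiner before the fan-out, using the code-review example with one reviewer per vulnerability category. The reviewer functions are keyword-matching stand-ins for agent calls, and the "reject if any reviewer reports a finding" rule is an assumption chosen for the illustration, not something the blog prescribes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def aggregate(findings_by_category: dict[str, list[str]]) -> dict:
    """Combiner, written first: merge findings and reject on any finding."""
    merged = [
        f"{category}: {finding}"
        for category, findings in sorted(findings_by_category.items())
        for finding in findings
    ]
    return {"verdict": "reject" if merged else "approve", "findings": merged}


def run_parallel(reviewers: dict[str, Callable[[str], list[str]]], code: str) -> dict:
    """Fan out one reviewer per category, then hand all results to the combiner."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        futures = {name: pool.submit(review, code) for name, review in reviewers.items()}
        results = {name: future.result() for name, future in futures.items()}
    return aggregate(results)


# Hypothetical reviewers: each would be an independent agent in practice.
def review_injection(code: str) -> list[str]:
    return ["query built by string concatenation"] if "+ user_input" in code else []


def review_secrets(code: str) -> list[str]:
    return ["hard-coded password"] if "password =" in code else []


reviewers = {"injection": review_injection, "secrets": review_secrets}
snippet = 'query = "SELECT * FROM t WHERE id=" + user_input\npassword = "hunter2"'
print(run_parallel(reviewers, snippet))
```

The reviewers never see each other's output, so they are safe to run concurrently. All the design effort sits in `aggregate`, which is the point of the tip above.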

3. Evaluator-Optimizer Workflows — “first draft quality isn’t good enough”

When to use: clear, measurable quality criteria an evaluator can apply consistently, and the first-attempt-to-final gap is meaningful enough to justify the extra loop.

Examples from the blog:

  • API docs against a style/accuracy standard
  • Customer comms requiring tone and precision
  • SQL against efficiency and security checks

Pro tip: set stopping criteria before you start: a maximum iteration count, a quality threshold, or a time budget. Pick at least one.
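
A minimal sketch of such a loop, with all three stopping criteria wired in, might look like the following. The function name `refine`, the checklist evaluator, and the default limits are assumptions for illustration; a real evaluator and reviser would each be a model or agent call.

```python
import time
from typing import Callable

REQUIRED_SECTIONS = ("Returns:", "Raises:", "Example:")  # illustrative docs standard


def evaluate_docs(draft: str) -> tuple[float, list[str]]:
    """Hypothetical evaluator: score is the fraction of required sections present."""
    missing = [section for section in REQUIRED_SECTIONS if section not in draft]
    return 1 - len(missing) / len(REQUIRED_SECTIONS), missing


def revise_docs(draft: str, missing: list[str]) -> str:
    """Hypothetical optimizer: address the first piece of feedback."""
    if not missing:
        return draft
    return draft + f"\n{missing[0]} (to be written)"


def refine(
    draft: str,
    revise: Callable[[str, list[str]], str],
    evaluate: Callable[[str], tuple[float, list[str]]],
    max_iterations: int = 3,
    quality_threshold: float = 0.9,
    time_budget_s: float = 30.0,
) -> tuple[str, float, str]:
    """Iterate until one stopping criterion fires; return the best draft seen."""
    deadline = time.monotonic() + time_budget_s
    score, feedback = evaluate(draft)
    best_draft, best_score = draft, score
    for _ in range(max_iterations):
        if best_score >= quality_threshold:
            return best_draft, best_score, "threshold_met"
        if time.monotonic() >= deadline:
            return best_draft, best_score, "time_budget_exhausted"
        draft = revise(draft, feedback)
        score, feedback = evaluate(draft)
        if score > best_score:  # a worse revision never replaces the best draft
            best_draft, best_score = draft, score
    reason = "threshold_met" if best_score >= quality_threshold else "max_iterations_reached"
    return best_draft, best_score, reason


final, final_score, stop_reason = refine("Fetch a user by id.", revise_docs, evaluate_docs)
print(stop_reason, round(final_score, 2))
```

Returning the stop reason alongside the draft makes it easy to log how often the loop ends on the iteration cap rather than on quality. A loop that usually hits the cap suggests the evaluator's criteria or the reviser need attention.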

Decision framework

From the blog, in order:

  1. Can a single agent handle this? → yes: skip workflows entirely
  2. Clear sequential dependencies? → Sequential
  3. Independent subtasks or multiple perspectives needed? → Parallel
  4. Quality improves meaningfully with refinement? → Evaluator-Optimizer
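
The ordered questions above can be sketched as a first-match function. The `TaskProfile` fields are hypothetical names for the four questions, and the final fallback to sequential is an assumption drawn from the blog's "Default to sequential" advice rather than a step the framework itself lists.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical yes/no answers to the four framework questions."""
    single_agent_sufficient: bool = False
    has_sequential_dependencies: bool = False
    has_independent_subtasks: bool = False
    improves_with_refinement: bool = False


def choose_pattern(task: TaskProfile) -> str:
    """Walk the questions in order and return the first pattern that applies."""
    if task.single_agent_sufficient:
        return "single-agent"
    if task.has_sequential_dependencies:
        return "sequential"
    if task.has_independent_subtasks:
        return "parallel"
    if task.improves_with_refinement:
        return "evaluator-optimizer"
    return "sequential"  # assumed fallback, per "Default to sequential"


print(choose_pattern(TaskProfile(has_independent_subtasks=True)))
```

First-match is a simplification: since patterns nest, a task that answers yes to several questions may warrant a composition rather than the single pattern returned here.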

Plus the operational considerations:

  • Failure handling and retry per step
  • Latency and cost constraints
  • Measuring improvement against a baseline
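
For the first consideration, per-step retry can be sketched as a small wrapper applied to each stage independently. The name `run_step_with_retry`, the exponential backoff schedule, and the broad `except Exception` are assumptions for the sketch; production code would normally retry only on errors known to be transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_step_with_retry(
    step: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run one workflow step, retrying with exponential backoff on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:  # sketch only: narrow this to transient errors
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


attempts = {"count": 0}


def flaky_step() -> str:
    """Hypothetical step that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


# Pass a no-op sleep so the demo does not actually wait.
print(run_step_with_retry(flaky_step, sleep=lambda _seconds: None))
```

Wrapping each step separately means a failure in a late stage does not force earlier stages to rerun, which also keeps the latency and cost of a retry bounded to one step.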

Case study — Salesforce goes agentic with Claude Code

A production data point for the patterns above. Boris Cherny (Claude Code creator) shared Salesforce’s writeup on restructuring engineering work around agentic Claude Code (X, 2026-05-29). Metrics below are first-party-reported by Salesforce via bcherny — presented as reported, not independently verified.

  • A migration originally scoped at 231 days shipped in 13 days. ^[first-party metric — Salesforce via bcherny]
  • One PR delivered 21 endpoints at 100% test coverage — a single agent owning an end-to-end deliverable rather than a hand-off chain across multiple engineers. ^[first-party metric — Salesforce via bcherny]
  • Quality rose alongside velocity: incidents dropped 5% despite more PRs shipped. ^[first-party metric — Salesforce via bcherny]
  • Security and quality guardrails were built into the agentic workflow itself — the checks live inside the loop, not as a downstream gate. This is the evaluator-optimizer pattern made operational: SQL/code against efficiency-and-security checks, with the evaluator embedded so the agent iterates before a human ever reviews.

Why it illustrates this article’s thesis. The blog’s core recommendation is to collapse before you split: “first try your pipeline as a single agent” and “Start with the simplest pattern that solves your problem.” Salesforce’s stated lesson is the same move one level up the org chart: the gains came from fundamentally changing how teams work (deleting steps, eliminating hand-offs, and letting agents own work end-to-end), not from accelerating the old multi-stage process. ^[first-party claim — Salesforce via bcherny] The 21-endpoints-in-one-PR figure is the concrete shape of that: where a legacy sequential hand-off chain (author → reviewer → QA → integrator) would have fragmented the work, a single agent with guardrails in the loop owned it. The blog’s pattern advice (design the aggregation first, set stopping criteria) is what makes that ownership safe rather than reckless.

Relation to Claude’s agent primitives

These patterns are architectural — they describe the shape of a workflow, not the primitives used to implement it. On Claude, the primitives that instantiate these patterns are:

  • Subagents — isolated parallel workers; the natural implementation for Parallel and for Sequential stages with scoped permissions
  • Agent Teams — peer-level coordination; richer than subagents for multi-perspective Parallel patterns
  • Managed Agents — server-hosted long-running loops; useful for any pattern where the workflow runs for hours
  • Advisor Strategy — upward consultation; integrates with any of the three patterns as a cost-optimized intelligence boost inside stages
  • Routines — scheduled/triggered execution; orthogonal to pattern choice (any pattern can run as a routine)

Open Questions

  • Quantitative thresholds. At what task complexity does sequential-as-prompt break down and require multi-agent sequential? No numbers in the blog.
  • Pattern combination cost. Nesting patterns (e.g. evaluator-optimizer containing parallel evaluation) compounds token costs. No guidance on when cost exceeds quality gains.
  • Failure-mode library. The blog lists operational considerations but doesn’t enumerate known failure modes per pattern (e.g. sequential with long dependency chains tends to drift; parallel aggregation struggles with conflicting outputs).
  • Linked white paper. The Anthropic “Building effective AI agents” white paper (https://resources.anthropic.com/ty-building-effective-ai-agents) likely goes deeper — not yet ingested into this wiki.
  • Salesforce’s full writeup. The bcherny thread (https://x.com/bcherny/status/2060390852619272526) links to Salesforce’s detailed case study with the underlying methodology behind the 231-days→13-days and 21-endpoint metrics. Fetch target — the thread captures headline numbers only; the linked writeup would let us verify the metrics and extract the per-pattern guardrail mechanics rather than presenting them as reported.

Try It

  1. Audit an existing agent. Classify your current Claude-driven workflow as sequential, parallel, evaluator-optimizer, or single-agent. If you can’t classify it, you may have accidentally built a hybrid that’s harder to debug than necessary.
  2. Collapse before splitting. For any multi-agent pipeline you have, try running it as a single-agent prompt with numbered steps. If quality matches, delete the orchestration.
  3. Define aggregation first for your next parallel design. Write the combiner before writing the fan-out. If the combiner is vague, parallel isn’t ready yet.
  4. Set stopping criteria for any evaluator-optimizer loop in your codebase. Max iterations, minimum quality threshold, time budget — pick at least one. Loops without stops are production incidents waiting to happen.
  5. Read the white paper at https://resources.anthropic.com/ty-building-effective-ai-agents for the long-form version of this taxonomy.
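
For step 2 (collapse before splitting), one low-effort way to run the experiment is to render the pipeline's stages as numbered steps in a single prompt. The helper name and the prompt wording below are assumptions for illustration; the blog only says the steps become "part of the prompt".

```python
def build_single_agent_prompt(task: str, steps: list[str]) -> str:
    """Render a multi-stage pipeline as one prompt with numbered steps."""
    numbered = "\n".join(f"{index}. {step}" for index, step in enumerate(steps, start=1))
    return (
        f"Task: {task}\n\n"
        "Work through these steps in order, finishing each before starting the next:\n"
        f"{numbered}\n\n"
        "Return only the output of the final step."
    )


prompt = build_single_agent_prompt(
    "Write release notes for the new version",
    ["Draft the notes from the changelog", "Review the draft for accuracy", "Polish the wording"],
)
print(prompt)
```

Send the resulting prompt to a single agent and compare its output with what the multi-agent pipeline produces on the same input. If quality matches, the orchestration is a candidate for deletion.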