Source: Introducing dynamic workflows in Claude Code (Anthropic blog, claude.com/blog/introducing-dynamic-workflows-in-claude-code; official docs code.claude.com/docs/en/workflows); operator walkthroughs raw/Opus_4.8_is_NOT_Claude_s_biggest_release_today_Ultracode_and_Dynamic_Workflows.md and raw/Claude_Code_Dynamic_Workflows_Clearly_Explained.md; best-practices deep-dive by Thariq Shihipar (@trq212, Anthropic Claude Code team), 2061907337154367865 — also published on the Claude Blog; trigger-word update via raw/x-bookmarks-recent-digest-2026-06-04.md (@ClaudeDevs, 2026-06-03); concurrency-cap + deep-research-verify specifics via raw/Claude_Code_Just_Dropped_Workflows_An_Actual_Game_Changer.md (2026-06-05)
The official Anthropic announcement of dynamic workflows — the feature the wiki previously covered only via a third-party walkthrough ([[claude-ai/claude-code-workflows-tool-walkthrough|Claude Code /workflows Walkthrough]]). Claude dynamically writes orchestration scripts that run tens to hundreds of parallel subagents in a single session, checking its work (agents refute each other until answers converge) before anything reaches you. Built for the work too big for one pass — a bug hunt across a whole service, a migration touching hundreds of files, a plan stress-tested from every angle — and explicitly framed as turning quarters of work into days. Now in research preview; enabled via the ultracode setting (on by default for Max/Team/API).
Key Takeaways
- What it is: Claude plans dynamically from your prompt, breaks the task into subtasks, and fans out across tens-to-hundreds of parallel subagents in one session, writing the orchestration script itself rather than you authoring it. (This is the official-announcement framing of the same Workflow primitive the [[claude-ai/claude-code-workflows-tool-walkthrough|`/workflows` walkthrough]] dissects at the `workflow.js` level.)
- The self-checking loop is the differentiator: agents address the problem from independent angles, other agents try to refute what they found, and the run iterates until the answers converge (“how a workflow reaches results a single pass can’t”). Adversarial verification is built into the primitive, not bolted on.
- Resumable by design: progress is saved as the run goes, so an interrupted job picks up where it left off instead of restarting. Coordination happens outside the conversation, so the plan stays on track no matter how big the task gets (no orchestrator context-rot).
- Built for long-running work — runs can extend into hours and days, doing complex engineering that previously took weeks.
- Token cost is real: dynamic workflows consume meaningfully more usage than a typical Claude Code session. The first time a workflow triggers, Claude Code shows what’s about to run and asks you to confirm. Start on a scoped task to calibrate.
- Turn on auto mode for the best experience (it’s the permission layer that lets long parallel runs proceed without per-action prompts).
- Flagship proof point — the Bun Zig→Rust port (Jarred Sumner): 99.8% of the existing test suite passing, ~750,000 lines of Rust, 11 days first-commit-to-merge, with hundreds of agents in parallel and two reviewers on each file.
- Two ways to start: ask Claude to create a workflow, or turn on the Claude Code setting `ultracode`.
- Trigger-word update (2026-06-03): the explicit trigger keyword changed from “workflow” to “ultracode”. “Use a workflow for this” still works when the intent is clear, but incidental mentions of “workflow” no longer kick off a dynamic workflow (a false-positive fix from user feedback). Say “ultracode” to trigger one explicitly (@ClaudeDevs, 2026-06-03).
- The Claude Code team’s own best-practices guide (Thariq Shihipar, ~2026-06-02) frames workflows as the fix for three single-context failure modes — agentic laziness, self-preferential bias, goal drift — and ships a reusable six-pattern taxonomy (classify-and-act, fan-out-and-synthesize, adversarial verification, generate-and-filter, tournament, loop-until-done) plus an example-prompt library skewed toward non-coding work. See Best practices from the Claude Code team below.
- Operator note: when a loop actually pays off (the “GP-loop,” 2026-06-09). [YouTube signal: Greg Isenberg pod] A bounded-loop discipline that sharpens the loop-until-done pattern above and counterweights open-ended `/goal` loops: a Claude Code skill runs a code-review loop gated by an external scorer. The agent checks the GitHub PR, reads the review, fixes it, re-pushes, and won’t stop until the review scores ≥4/5 (or it caps at 5 iterations, then gives up). The load-bearing heuristic: loops only pay off in a confined process with a fixed, binary feedback signal (“where the output is black-or-white with no creativity”, i.e., code review), and they break down past ~1,000 LOC per push (too much for the agent to fully review, so split into multiple PRs). A concrete boundary on autonomy: the scorer and the LOC ceiling are what keep the loop from running away. (Source: raw/WTF_Is_an_AI_Agent_Loop_Genius_or_Hype.md.)
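The bounded review loop in the last takeaway can be sketched in a few lines. This is a minimal illustration, not the operator's actual skill: `review_loop`, `score_review`, and `apply_fixes` are hypothetical names, the scorer is passed in as a plain callable, and treating "~1,000 LOC" as a hard `> 1000` cutoff is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ITERATIONS = 5       # from the text: cap at 5 iterations, then give up
PASSING_SCORE = 4        # from the text: stop once the review scores >= 4 of 5
MAX_LOC_PER_PUSH = 1000  # assumption: "~1,000 LOC" treated as a hard ceiling


@dataclass
class LoopResult:
    status: str        # "passed", "gave_up", or "split_required"
    iterations: int
    final_score: int


def review_loop(loc_changed: int,
                score_review: Callable[[int], int],
                apply_fixes: Callable[[int], None]) -> LoopResult:
    """Run a score-gated fix loop with a fixed iteration cap."""
    # Oversized pushes are rejected up front: split them into several PRs.
    if loc_changed > MAX_LOC_PER_PUSH:
        return LoopResult("split_required", 0, 0)

    score = 0
    for attempt in range(1, MAX_ITERATIONS + 1):
        score = score_review(attempt)      # external scorer, 1 to 5
        if score >= PASSING_SCORE:
            return LoopResult("passed", attempt, score)
        if attempt < MAX_ITERATIONS:
            apply_fixes(attempt)           # fix and re-push before rescoring
    return LoopResult("gave_up", MAX_ITERATIONS, score)
```

The two exits that do not depend on the agent's own judgment (the iteration cap and the LOC ceiling) are what make the loop bounded; the scorer supplies the binary pass/fail signal.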
How it works
- Plan dynamically — Claude reads your prompt and decomposes it into subtasks on the fly (no pre-authored script required).
- Fan out — work is distributed across subagents running in parallel.
- Check before folding in — results are verified before they’re merged into the answer. Independent agents attack the problem from different angles; other agents try to refute their findings; the run keeps iterating until the answers converge.
- Return one coordinated answer — you come back to a single result, not a pile of subagent transcripts. Because coordination lives outside the conversation, the main session’s context never fills with intermediate state (the “token tax” the walkthrough details).
- Resume on interruption — progress is checkpointed continuously; an interrupted run continues rather than restarting.
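The five steps above can be sketched as ordinary Python. This is not the Claude Code workflow API or the generated `workflow.js`; it is a plain-Python analogy in which `plan`, `run_subagent`, `verify`, and `run_workflow` are hypothetical stand-ins, subagents are stub functions, and the "journal" is a JSON file assumed here purely to illustrate checkpoint-and-resume.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan(prompt: str) -> list[str]:
    """Hypothetical planner: decompose a prompt into subtasks (stubbed)."""
    return [f"{prompt} :: part {i}" for i in range(1, 5)]


def run_subagent(subtask: str) -> str:
    """Stand-in for a subagent; a real one would call a model."""
    return f"finding for [{subtask}]"


def verify(finding: str) -> bool:
    """Stand-in for an independent checker; accepts any non-empty finding."""
    return bool(finding.strip())


def run_workflow(prompt: str, journal_path: Path) -> str:
    # Resume: load the journal so finished subtasks are not re-run.
    journal = json.loads(journal_path.read_text()) if journal_path.exists() else {}
    subtasks = plan(prompt)
    pending = [s for s in subtasks if s not in journal]

    # Fan out the remaining subtasks in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for subtask, finding in zip(pending, pool.map(run_subagent, pending)):
            # Check before folding in: only verified findings are recorded.
            if verify(finding):
                journal[subtask] = finding
                journal_path.write_text(json.dumps(journal))  # checkpoint

    # One coordinated answer, not a pile of per-agent transcripts.
    return "\n".join(journal[s] for s in subtasks if s in journal)


if __name__ == "__main__":
    journal_file = Path(tempfile.mkdtemp()) / "journal.json"
    print(run_workflow("audit service", journal_file))
```

Running `run_workflow` a second time against the same journal file skips every subtask already recorded, which is the resume behavior in miniature. The real feature also iterates verification until answers converge; this sketch checks each finding once.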
Admins can disable workflows org-wide through managed settings.
The Bun rewrite — what it unlocks at scale
Anthropic’s headline example is Jarred Sumner’s port of Bun from Zig to Rust, run entirely through dynamic workflows (not yet in production at announcement):
- One workflow mapped the correct Rust lifetime for every struct field in the Zig codebase.
- The next wrote every `.rs` file as a behavior-identical port of its `.zig` counterpart: hundreds of agents in parallel, with two reviewers on each file.
- A fix loop then drove the build and test suite until both ran clean.
- An overnight workflow addressed unnecessary data copies and opened a PR for each for final human review.
Result: ~750,000 lines of Rust, 99.8% of the existing test suite passing, 11 days from first commit to merge. This is the same Bun rewrite referenced in The Capability Curve — now attributed concretely to dynamic workflows as the mechanism.
Availability & getting started
- Research preview, available today in: Claude Code CLI, Desktop, and the VS Code extension, plus the Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry.
- Plan gating:
  - Max / Team (and Claude Code via the API): on by default.
  - Enterprise: off by default at launch; an admin enables it in Claude Code settings.
  - Enterprise admins can also disable via managed settings.
- Enable / start: ask Claude to create a workflow, or turn on the `ultracode` setting. Pair with auto mode for best results.
- Docs: `code.claude.com/docs/en/workflows`.
Customer use (early access)
- Klarna — strongest results on discovery and review across large codebases: identifying dead code and cleanup opportunities that traditional static analysis missed, speeding maintenance/refactoring.
- CyberAgent — “fills the gap between firing off a single subagent and building out a full agent team. Plan to implementation just flows, so we can trust longer runs without losing visibility.”
The two quotes bracket the sweet spot: bigger than one subagent, lighter than a standing agent team — discovery/review sweeps and plan→implementation runs.
Best practices from the Claude Code team (Thariq Shihipar, 2026-06-02)
Thariq Shihipar (@trq212) and Sid Bidasaria (@sidbid), members of technical staff at Anthropic on the Claude Code team, published a best-practices deep-dive ~a week after launch (“my initial workflows experiences and learnings”; also on the Claude Blog). It is the primary-source companion to the announcement above — the framing the team itself uses. The authors’ own caveat: “best practices are still developing! Dynamic workflows often use more tokens, so think carefully about when and how to use them.”
Why a workflow beats one big context window — three failure modes
The default harness has to plan and execute in the same context window, which is great for coding but “can break down over long-running, massively parallel and/or highly structured adversarial tasks.” The longer Claude works in one context, the more it hits three named failure modes — and a workflow combats each by “orchestrating separate Claudes with their own context windows and focused, isolated goals”:
- Agentic laziness — stopping before a multi-part task is finished and declaring done after partial progress (the author’s example: “addressing 20 of the 50 items in a security review”).
- Self-preferential bias — Claude’s tendency to prefer its own results/findings, especially when asked to verify or judge them against a rubric. (The fix: have a separate agent do the verification.)
- Goal drift — gradual loss of fidelity to the original objective across many turns, especially after compaction — each lossy summarization can drop edge-case requirements or “don’t do X” constraints.
These are the same failure modes Anthropic documents for the model itself in the Opus 4.8 system card (lazy-investigation, overconfidence/self-preference, goal-drift). Dynamic workflows are the harness-level mitigation: isolation + focused goals + cross-agent verification, instead of trusting one long context to police itself.
Six reusable patterns
The dev-recommended vocabulary for what a workflow’s orchestration script should do:
| Pattern | What it does |
|---|---|
| Classify-and-act | A classifier agent decides the task/output type and routes to specialized agents, behaviors, or final determinations. |
| Fan-out-and-synthesize | Split into many parallel subtasks (isolated agents); a synthesizer step then merges their structured outputs and acts as a synchronization barrier. |
| Adversarial verification | For each agent’s output, a separate spawned agent adversarially checks it against a rubric or criteria. |
| Generate-and-filter | Generate many candidates, then filter / dedupe / select by rubric, verification, or quality tests to keep only the best. |
| Tournament | Multiple agents attempt the same task with different approaches; a judge runs pairwise comparisons until one winner remains. |
| Loop until done | Re-spawn agents in a loop until a dynamic stop condition (no new findings, no remaining errors) rather than a fixed iteration count. |
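As one concrete instance from the table, the tournament pattern reduces to repeated pairwise judging. The sketch below is a generic single-elimination bracket, not code from the post: `tournament` and `shorter_name` are hypothetical names, and the judge is a trivial stub where a real workflow would spawn a judging agent with a rubric.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tournament(candidates: Sequence[T], judge: Callable[[T, T], T]) -> T:
    """Single elimination: pairwise comparisons until one winner remains."""
    if not candidates:
        raise ValueError("need at least one candidate")
    current = list(candidates)
    while len(current) > 1:
        next_round = []
        # Judge adjacent pairs; each comparison sees only two candidates.
        for i in range(0, len(current) - 1, 2):
            next_round.append(judge(current[i], current[i + 1]))
        if len(current) % 2 == 1:
            next_round.append(current[-1])  # odd one out gets a bye
        current = next_round
    return current[0]


def shorter_name(a: str, b: str) -> str:
    """Hypothetical judge: prefers the shorter name, the first on ties."""
    return a if len(a) <= len(b) else b


if __name__ == "__main__":
    print(tournament(["forge", "ax", "quill", "hammer"], shorter_name))
```

Pairwise judging keeps each decision small and isolated, which is the point of the pattern: no single judge has to rank the whole field in one context.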
Example prompts (straight from the author)
The post’s most reusable artifact is a library of natural-language prompts that trigger good workflows — and most are not coding tasks:
- “This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it, form theories and adversarially test them in worktrees `/goal` don’t stop until one theory works.”
- “Go through my last 50 sessions and mine them for corrections I keep making and turn the recurring ones into CLAUDE.md rules.”
- “Dig through incidents in Slack for the past six months and find recurring root causes where nobody has filed a ticket.”
- “Take my business plan and run a workflow where different agents tear it apart from an investor’s, a customer’s, and a competitor’s perspective.”
- “Here’s a folder of 80 resumes, rank them for the backend role and double-check the top ten. Interview me using the AskUserQuestion tool for a rubric.”
- “I need a name for this CLI tool. Brainstorm a bunch of options and run a tournament to pick the top 3.”
- “Rename our User model to Account everywhere.”
- “Go through my blog post draft and verify every technical claim against the codebase - I don’t want to ship anything wrong.”
“Often even more useful for non-technical work”
Thariq’s explicit claim: “workflows are sometimes even more useful for non-technical work.” The use-case catalog spans migrations/refactors (the Bun port above), deep research (the built-in /deep-research skill), deep verification, sorting (e.g. support tickets by severity via tournament/pairwise comparison), memory & rule adherence (mining sessions into CLAUDE.md rules), root-cause investigation (independent hypotheses to defeat self-preferential bias), triaging at scale (quarantine patterns), exploration & taste (design, naming, rubrics), evals, and model/intelligence routing (the workflow picks Haiku vs Opus per agent).
When NOT to use one
Workflows are new… they are not needed for every task and may end up using significantly more tokens. For regular coding tasks, try and ask yourself: does it really need more compute?
Tips
- Prompting: lean on the named patterns; the `ultracode` trigger lets Claude decide when to spin one up; say “quick workflow” for a lighter run.
- Combine with [[claude-ai/claude-code-goal-command-walkthrough|`/goal`]] and `/loop` for autonomous, repeat-until-done behavior.
- Token budgets: cap spend by prompting a budget inline; “use 10k tokens” sets the cap. (First concrete budget-control syntax beyond the walkthrough’s `budgets` knob.)
- Saving & sharing: press `s` in the workflow menu to save; check the `.js` into `~/.claude/workflows`, or distribute via a skill by putting the workflow files in the skill folder and referencing them in `SKILL.md`. Prompt Claude to treat a shared workflow as a template, not a verbatim script, for flexibility.
Hands-on demo (operator field test — 2026-05-29)
A RoboNuggets hands-on run of dynamic workflows + ultracode on Max, confirming and quantifying the announcement claims:
- `ultracode` defined in practice: a Claude Code effort setting = extra-high effort + Claude decides on its own whether to invoke a dynamic workflow. It sits on the effort slider after `max`; the VS Code extension turns the toggle purple, and typing `workflows` in the terminal triggers a bespoke animation (Anthropic coded a custom UI cue for the release).
- `/workflows` is a live status monitor: mid-run it shows the orchestrator’s drafted multi-phase plan (e.g. Phase 1 audit / Phase 2 planning / synthesis), per-agent completion state, and per-agent token consumption. It is the way to observe a long-running job’s progress.
- Observed fan-out scale, real numbers: a 3-site brand-audit fanned out 9 audit agents → 13 live-fetch agents; an open-ended bug audit under `ultracode` did a pre-assessment as a sole agent first, then fanned out 8 parallel auditors, then 88 parallel sub-agents for the verification step: 96 total sub-agents for one bug report. That sits inside the announcement’s “tens to hundreds” range rather than literally in the hundreds, but it confirms the fan-out scale is real; the orchestrator’s own words: “orchestrate a fan-out audit with adversarial per-finding verification.”
- Token cost, measured: the two heavy runs moved the account’s weekly rate limit from 2% → 6% (≈4 percentage points for two tasks), concrete backing for the “meaningfully more usage” warning. ^[inferred: operator opinion that Anthropic should show absolute token counts, not a percentage]
- Orchestrator behavior: acts as a manager — “using the wait productively, pre-building the report generator so the moment data lands it can produce deliverables fast.” Output was technically rich but “vanilla white-paper”; a second design-system pass was needed to make it presentable (a workflow produces the analysis, not the polish).
- Field thesis: “Opus 4.8 is a great incremental release, but the way we work is dictated more by updates to the harness”; `ultracode` + dynamic workflows are the real unlock, not the benchmarks. ^[inferred: creator’s editorial thesis]
Second operator walkthrough (2026-05-31) — three concrete deltas
A second hands-on explainer (skill-audit run: 41 Haiku scoring agents → one Opus synthesis agent, ~5M input tokens, HTML “worst-to-best” skills ranking) surfaces operational specifics the announcement and the RoboNuggets run don’t:
- `/deep-research` is a built-in workflow-backed command. It automatically invokes a dynamic workflow: it spins up parallel research agents, has them vote on each claim, and returns a cited deep-research report. The first concrete named command that triggers a workflow on its own (distinct from typing “set me up a dynamic workflow”). ^[inferred: single-creator claim, not in the Anthropic announcement]
- Workflow `.js` files save to a global location by default, so redirect them explicitly. The generated script lands in the global Claude Code working directory, not the current project; the operator had to tell it to save into the project’s `.claude/workflows/` to keep the reusable workflow with the repo. Non-obvious gotcha for anyone expecting the [[claude-ai/claude-code-workflows-tool-walkthrough|canonical `.claude/workflows/<name>.js`]] path inside the project. ^[inferred: single-creator observation]
- Pin all subagents to Haiku as the primary cost lever. Beyond “start scoped,” the explicit cost discipline is: bound the scope, name the deliverable, and put every subagent on Haiku. Most workflow spend is input tokens (cheaper than output), so a Haiku fan-out keeps a hundred-agent run affordable. Backs the “half my $200/month plan in one ~30-min prompt” anecdote from an unbounded desktop-wide crawl. ^[inferred: single-creator cost framing]
- `/goal` vs workflow = depth vs width. The creator frames `/goal` as a loop (re-runs until `done == true`, can run 24h+) and a workflow as a width play (N agents fan out, each executes a fixed plan slice, results synthesize once, with no per-iteration convergence check). Note this is a usage heuristic, not the official line: Anthropic’s announcement explicitly describes workflows as self-checking (agents refute findings until answers converge), so the “no convergence loop” framing is the creator’s simplification, not a contradiction of the primitive. ^[inferred: creator’s mental model; reconciled against the official self-checking claim above]
Third operator walkthrough (2026-06-05) — concurrency caps + deep-research verify internals
A third hands-on explainer (“Claude Code Just Dropped Workflows”) retreads the now-covered ground (workflow.js orchestrator, journal/resumability, ultracode auto-trigger, Max/Team-on vs Pro/Enterprise-off, model-per-phase) but adds two falsifiable operational specifics absent from the announcement and the prior walkthroughs:
- Hard concurrency caps: 16 agents concurrent (max), 1,000 agents total per run. This bounds the “tens-to-hundreds in parallel” / “96 total sub-agents observed” framing above with an actual ceiling: excess agents queue and run as slots free; the 1,000 total is a per-run lifetime cap. ^[inferred: single unnamed creator; aligns with Anthropic’s documented Workflow limits but confirm against `code.claude.com/docs/en/workflows` before relying]
- The `/deep-research` verify stage, quantified: after fanning out parallel searches and deduping, it verifies the top ~25 claims with 3 independent verify agents each, and a 2-of-3 refute vote kills a claim before synthesis. This is the concrete adversarial-verification mechanic behind the “agents vote on each claim” note above. ^[inferred: single-creator detail]
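Both specifics above (a cap on concurrent agents with queueing, and a 2-of-3 refute vote) are easy to model with `asyncio`. This is a sketch under stated assumptions, not the internals of the built-in command: `verify_claims` and the `verifier` callable are hypothetical, the numbers come from the single-creator report and are unconfirmed, and only the concurrent cap is modeled (the 1,000-total cap is not).

```python
import asyncio

MAX_CONCURRENT = 16      # reported concurrent-agent cap (unconfirmed)
TOP_CLAIMS = 25          # reported: verify roughly the top 25 claims
VERIFIERS_PER_CLAIM = 3  # reported: 3 independent verify agents per claim
REFUTES_TO_KILL = 2      # reported: 2 of 3 refutes drop the claim


async def verify_claims(claims, verifier):
    """Return (surviving claims, peak concurrency observed)."""
    top = claims[:TOP_CLAIMS]
    slots = asyncio.Semaphore(MAX_CONCURRENT)
    running = peak = 0

    async def one_vote(claim, seat):
        nonlocal running, peak
        async with slots:  # excess agents queue here until a slot frees
            running += 1
            peak = max(peak, running)
            try:
                return await verifier(claim, seat)  # True means "refuted"
            finally:
                running -= 1

    # gather preserves order, so votes arrive grouped per claim.
    votes = await asyncio.gather(
        *(one_vote(c, s) for c in top for s in range(VERIFIERS_PER_CLAIM))
    )

    survivors = []
    for i, claim in enumerate(top):
        start = i * VERIFIERS_PER_CLAIM
        refutes = sum(votes[start:start + VERIFIERS_PER_CLAIM])
        if refutes < REFUTES_TO_KILL:
            survivors.append(claim)
    return survivors, peak


async def stub_verifier(claim, seat):
    """Hypothetical verifier: two of three seats refute any 'shaky' claim."""
    await asyncio.sleep(0)
    return claim.startswith("shaky") and seat < 2


if __name__ == "__main__":
    print(asyncio.run(verify_claims(["solid A", "shaky B", "solid C"], stub_verifier)))
```

The semaphore is what turns a cap into queueing rather than failure: all votes are scheduled at once, but at most `MAX_CONCURRENT` hold a slot at any moment.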
Why it matters
- It closes the “official source” gap the walkthrough flagged in its Open Questions: the feature is now formally announced, named (dynamic workflows), and given a stable enablement path (`ultracode`).
- Adversarial convergence is now a first-class primitive. “Agents try to refute what they found until answers converge” is the same multi-angle/verify pattern the wiki documents in verification-loop autonomy and the Ara workshop; here it’s baked into the orchestrator.
- Resumability + out-of-conversation coordination is what makes hours/days-long runs viable — it removes the orchestrator-context-rot ceiling that capped model-as-orchestrator patterns.
- Fits the long-running-agent design space alongside [[claude-ai/claude-code-goal-command-walkthrough|`/goal`]] (single-target autonomous loop) and agent teams (persistent multi-instance coordination). Dynamic workflows are the structured, dynamically-planned, self-verifying fan-out in the middle.
- The token-cost warning is load-bearing: this is a power tool that bills like one; the confirm-before-first-run gate and “start scoped” guidance are the official discipline.
Try It
- Confirm access: on Max/Team or via the API, dynamic workflows are on by default; turn on `ultracode` (or just ask Claude to create a workflow). On Enterprise, ask your admin to enable them.
- Turn on auto mode before starting a long run.
- Start scoped to calibrate token usage — a single-service bug hunt or a dead-code sweep (Klarna’s use case) is the canonical first run. Expect the confirm-before-run prompt on the first trigger.
- Use it for the right shape of work: big migrations, whole-codebase discovery/review, or a plan you want stress-tested from every angle — not one-off single-pass tasks.
- For the mechanics (authoring `workflow.js`, the agent/pipeline/schema primitives, budgets, per-subagent skip/retry), read the [[claude-ai/claude-code-workflows-tool-walkthrough|`/workflows` walkthrough]]; it’s the hands-on companion to this announcement.
- Read the official docs at `code.claude.com/docs/en/workflows` for the current research-preview surface.
Refresh — Boris Cherny’s autonomous-Opus playbook (first-party, 2026-06)
[X signal: @bcherny 2026-06] Two first-party posts from Claude Code’s creator frame dynamic workflows inside the broader “run Opus autonomously for hours/days” playbook. Five tips (2063792263067754658, opening “Opus is the best model for long-running work”): (1) auto mode for permissions so Claude doesn’t stop to ask; (2) dynamic workflows to orchestrate hundreds/thousands of agents on one task; (3) /goal or /loop to nudge Claude to keep going until done; (4) Claude Code in the cloud (desktop/mobile app) so you can close your laptop; (5) give Claude a way to self-verify end-to-end: Claude in Chrome for web, iOS/Android simulator MCP for mobile, or a way to start the full web server/service for backend work. Tip 5 (the self-verification harness) is the most additive over this article’s existing coverage. Separately, his “Claude Code, one year after GA” reflection (2064034799711588805; echoed by @ClaudeDevs) reports the usage shift toward auto mode, routines that proactively fix bugs before the user sees them, coding from a phone, verification best practices, and loops. It links a @bcherny × @_catwu video, now transcribed and written up as Reflecting on a Year of Claude Code.
Related
- [[claude-ai/claude-code-workflows-tool-walkthrough|Claude Code `/workflows` Walkthrough]]: the hands-on deep-dive (workflow.js anatomy, agent/pipeline/schema, budgets, 3 worked examples). This article is the official announcement; that one is the mechanics.
- [[claude-ai/claude-code-goal-command-walkthrough|Claude Code `/goal` Walkthrough]]: the autonomous-loop primitive; the sibling long-running-agent design.
- Claude Code Agent Teams: persistent multi-instance coordination; dynamic workflows sit between a single subagent and a full agent team (the CyberAgent framing).
- The Capability Curve: context for the Bun Zig→Rust rewrite as a capability proof point.
- When AI Builds Itself (Recursive Self-Improvement): the structural “why” behind the harness race: Anthropic’s internal data on AI accelerating AI development (8× code/quarter, 76% open-ended task success), with dynamic workflows as the mechanism that turns “quarters of work into days.”
- Claude Opus 4.8 + System Card: the model-level failure modes (lazy-investigation, self-preference, goal-drift) that Thariq’s workflow rationale mirrors at the harness level.
- How We Claude Code (Ara): agent-native verification practices that complement workflow self-checking.
- Week 23 Release Digest: the v2.1.154 release that flipped dynamic workflows on in the CLI (`ultracode`).
- Week 22 Release Digest: surrounding Claude Code release context.
- CLI Reference: `auto-mode` and related surfaces dynamic workflows compose with.
Open Questions
- `ultracode` vs `CLAUDE_CODE_WORKFLOWS=1`. The walkthrough cited the env-var enablement (v2.1.147 era); the official post names the `ultracode` setting and “on by default for Max/Team/API.” Update (W23): Claude Code v2.1.154 shipped `ultracode` as the live effort-slider setting (= xhigh + Claude auto-deciding whether to invoke a workflow), and the RoboNuggets demo above confirms its in-CLI behavior. The relationship to the older `CLAUDE_CODE_WORKFLOWS=1` env var (deprecated / aliased / distinct) is still unconfirmed.
- Exact token/usage accounting. “Meaningfully more usage” and the confirm-before-run gate are stated, but no per-workflow metering detail or budget-unit definition is given. Update (2026-06-02): Thariq’s post documents an inline budget prompt (“use 10k tokens” sets a hard cap), the first concrete user-facing control beyond the walkthrough’s `budgets` knob. Per-workflow metering/reporting detail is still unspecified.
- Bun port production status. Explicitly “not yet in production” at announcement; Jarred Sumner’s promised writeup will be the deeper primary source, worth a refresh when it publishes.
- Research-preview → GA timeline and whether Pro plans get access are not stated.