Agent Loops

Agent loops are small programs (or slash-commands, workflows, schedules) that prompt an agent for you, read what it produced, decide whether it’s done, and reprompt until it is. This topic is a hands-on learning path for understanding and safely building them: the lineage, the mental model, the cost controls that keep them from burning your budget, the verification discipline that makes them trustworthy, and concrete loops you can run this week.
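As a minimal sketch of that definition, the Python below walks through the four moves (prompt, read, decide, reprompt) with a stand-in agent. The names `run_loop`, `fake_agent`, and the `is_done` callback are hypothetical and exist only for illustration; they are not part of Claude Code or any SDK, and a real loop would call an actual agent and a real check.

```python
from typing import Callable

def run_loop(agent: Callable[[str], str],
             is_done: Callable[[str], bool],
             first_prompt: str,
             max_iterations: int = 5) -> tuple[str, int]:
    """Prompt the agent, read its output, decide if it is done, reprompt if not."""
    prompt = first_prompt
    output = ""
    for attempt in range(1, max_iterations + 1):
        output = agent(prompt)          # prompt the agent and read what it produced
        if is_done(output):             # decide whether it is done
            return output, attempt
        # Not done: reprompt, feeding the previous attempt back in.
        prompt = f"{first_prompt}\nPrevious attempt was not done:\n{output}"
    return output, max_iterations

# Hypothetical stand-in agent (an assumption for this sketch): done on its third call.
calls = {"n": 0}

def fake_agent(prompt: str) -> str:
    calls["n"] += 1
    return "DONE" if calls["n"] >= 3 else f"draft {calls['n']}"

result, attempts = run_loop(fake_agent, lambda out: out == "DONE", "Fix the failing test")
print(result, attempts)
```

The `max_iterations` cap is included even in this smallest version so the sketch cannot run forever if the done-check never passes.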

The leverage claim that started the topic (Boris Cherny, creator of Claude Code): “I don’t prompt Claude anymore. I have loops running that prompt Claude… My job is to write loops.” This topic is where the wiki learns that craft.

Start Here — a learning path

Mental model + lineage — Write Loops, Not Prompts (explainer). What a loop is, how we got here (ReAct → AutoGPT → Ralph Loop → /goal → agent loops), the 3 cost controls, and 3 beginner starter loops. Begin here.
The thesis (primary source) — Loop Engineering — Addy Osmani’s Essay. The canonical essay that named the practice: “replace yourself as the prompter,” the five building blocks, and the argument that they now ship inside both Claude Code and Codex.
The official taxonomy (first-party) — Loop Engineering: Getting Started with Loops (Anthropic). The Claude Code team’s own framework, published three weeks after the essay above: four loop types — turn-based, goal-based (/goal), time-based (/loop, /schedule), proactive (composing all three + dynamic workflows) — each mapped to what you hand off and which primitive to reach for.
The reference catalog — Loop Engineering (Cobus Greyling). Six primitives, seven production patterns, the L0→L3 readiness ladder, and a failure-mode catalog you can design against. The deeper “how to build it right” layer (built on the essay above).
The verification discipline — Verifier-First Loops. Write the verifier before you launch the loop; proof outside the agent (tests, screenshots, artifacts). The topic’s load-bearing open question, answered.
Building the verifier, in engineering depth (first-party) — Harness Design for Long-Running Agents (Anthropic). Anthropic’s own two-part series: an initializer+coding-agent harness that solves cross-session memory (Nov 2025), then a GAN-inspired generator/evaluator split that solves the “agents grade their own homework generously” problem the first post didn’t touch (Mar 2026) — real cost/duration figures, code-line-level bug reports, and the discovery that a model upgrade can retire half your harness.
The decision + economics — Should You Build a Loop?. The four-condition test, the token-cost math, and the security tax — when not to build one.
The synthesis — The Loop Is the Unit of Work. Why the leverage point moved prompt → harness → loop, and how the Claude-native and tool-agnostic vocabularies are one pattern.
The first-party source — Reflecting on a Year of Claude Code. Boris Cherny & Cat Wu on “talk to a loop, not an agent,” /babysit, and auto-mode.
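The verifier-first step in the path above (write the verifier before the loop, keep proof outside the agent) can be sketched roughly as follows. This is an illustration under stated assumptions, not the method from any of the linked articles: `fake_agent`, `verifier`, `run_verified`, and the sorting task are all hypothetical. The point it shows is that the harness ignores the agent's self-report, runs its own check on the work product, and saves the result as an artifact on each pass.

```python
import json
import tempfile
from pathlib import Path

DATA = [3, 1, 2]

def verifier(work: list[int]) -> dict:
    """Written before the loop: an independent check of the work product."""
    return {"passed": work == sorted(DATA), "work": work}

def fake_agent(attempt: int) -> tuple[str, list[int]]:
    """Hypothetical agent: always claims success, only gets it right on attempt 2."""
    work = sorted(DATA) if attempt >= 2 else list(DATA)
    return "Sorted, all good!", work

def run_verified(artifact_dir: Path, max_attempts: int = 3) -> tuple[bool, int]:
    for attempt in range(1, max_attempts + 1):
        _claim, work = fake_agent(attempt)        # the self-report is not trusted
        report = verifier(work)                   # per-pass check
        artifact = artifact_dir / f"pass_{attempt}.json"
        artifact.write_text(json.dumps(report))   # saved artifact, written by the harness
        if report["passed"]:                      # done-condition
            return True, attempt
        # failure -> retry on the next pass
    return False, max_attempts

with tempfile.TemporaryDirectory() as tmp:
    ok, passes = run_verified(Path(tmp))
    saved = sorted(p.name for p in Path(tmp).iterdir())
print(ok, passes, saved)
```

In this sketch the first pass fails even though the agent reports success, which is the case a self-graded loop would miss.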

Articles in this topic

Write Loops, Not Prompts — Agent Loops Explained — lineage (ReAct, AutoGPT, Ralph Loop, /goal), the one-sentence definition, three cost controls (max iterations · no-progress detection · token ceilings), skills-as-the-thing-that-makes-loops-work, three starter loops (issue-backlog · front-end-verification · code-review/babysit), and the readiness gate. Start here.
Loop Engineering — Addy Osmani’s Essay (the canonical thesis) — the primary source that named loop engineering (X Article, 2026-06-08, 1.8M views): “replace yourself as the prompter,” the five building blocks (+ memory) mapped onto both Claude Code and Codex, a worked daily-triage loop example, and the residual risks (verification, comprehension debt, cognitive surrender). The Cobus repo below is built on this.
Loop Engineering: Getting Started with Loops (Anthropic, first-party) — the Claude Code team’s own post (claude.com/blog, ~2026-06-30), arriving three weeks after Osmani’s essay: four loop types (turn-based · goal-based · time-based · proactive), a worked SKILL.md verification example, and the “biggest cost levers are model/effort choice and loop boundaries” guidance.
Loop Engineering — Cobus Greyling’s Cross-Tool Pattern Reference + CLIs — the reference catalog: six primitives, seven production patterns, the L0→L3 readiness ladder, an S1/S2/S3 failure-mode catalog, cross-tool examples (Grok / Claude Code / Codex / GitHub Actions), and the loop-audit/loop-init/loop-cost npm CLIs.
Verifier-First Loops — Proof Outside the Agent — the verification discipline (omarsar0 · alphabatcher · Karpathy): write the verifier first (done-condition · per-pass check · saved artifact · failure→retry), keep proof outside the agent’s self-report, split plan/execute/evaluate across model families, and prefer multimodal goals. Anchored on Karpathy’s “if you can’t evaluate, you can’t auto-research it.”
Harness Design for Long-Running Agents — Anthropic’s Two-Part Engineering Series — first-party (Justin Young, Nov 2025 + Prithvi Rajasekaran, Mar 2026): an initializer-agent/coding-agent harness solving cross-session memory loss (feature-list JSON, git-commit discipline, browser-automation testing), then a GAN-inspired generator/evaluator split solving self-evaluation leniency, scaled to a planner/generator/evaluator architecture for full-stack coding with real cost tables ($9 solo vs $200 full harness vs $124.70 after a model-upgrade-driven simplification).
Should You Build a Loop? The Four-Condition Test, Cost Math & Security Tax — the decision/economics layer (plutos_eth): the four-condition test for whether to build a loop at all, token-cost ranges, cost-per-accepted-change, four silent failure modes (Ralph Wiggum · self-preference · agentic laziness · goal drift), and a 30-day loop security checklist.
The Loop Is the Unit of Work — the synthesis: how the leverage point migrated prompt → harness → loop, why the Claude-native and tool-agnostic vocabularies are one pattern, and why the maker/checker verifier is what makes a loop shippable (the verification gradient = the readiness ladder).
Temporal — Durable Agentic Loop with Claude Tool Calling (Python) — the durability/reliability dimension of the loop: implement the Claude tool-use loop as a Temporal Workflow (the loop) + Activities (Claude call, tool execution) so it survives crashes and Temporal owns retries (not the Anthropic client). The roll-your-own counterpart to Managed Agents’ external session log + `wake()`, and the durable sibling of the first-party Agent SDK loop.
The Best Way to Vibe Code (Matthew Berman) — practitioner case study: automate the prompt-review-reprompt cycle; the free Loop Library, three nightly loops (overnight docs-sweep · novel sub-50ms page-load loop · production-error-sweep), the test+docs+logging flywheel, the “wait until you see X” automation-sequencing trick, cloud-vs-local decision criteria, and the still-unsolved parallel merge/deploy problem (batch-commit workaround).
Loop Library (Forward Future / Matthew Berman) — the curated, community-submittable catalog of agent loops at signals.forwardfuture.ai/loop-library; includes the architecture-satisfaction loop (Peter Steinberger) as a worked verifier-first example. The recipe-collection companion to the loop-engineering catalog above.
LOOPS — Everything You Need to Know (Matthew Berman) — the canonical framework + full catalog reference: the trigger × goal model (3 trigger types × 2 goal types — verifiable vs LLM-as-judge), the 7-loop catalog with verbatim copy-paste prompts (sub-50ms page-load · overnight docs sweep · architecture satisfaction · logging coverage · production error sweep · SEO/GEO visibility · full product evaluation), how to run them with /goal and automations, and the two honest caveats (goal-design is the hard part / can’t build features with loops yet; loops are expensive). Berman’s third source in this topic — pulls the loop theory and exact prompts into one place.
Agent Loops, Clearly Explained (Nate Herk) — a grounded practitioner counterweight to the hype: the Reason-Act-Observe mental model, the quality-vs-attempts curve (outsource the iteration loop to the agent), the two pillars (an objective goal + verification), three topologies (solo · maker-checker · manager+helpers), and the contrarian message that most tasks don’t need loops or fleets — match the run length to the payoff, and a loop is only as good as its done-check. Worked /goal demos (thumbnail-scoring · three.js plane · Abbey Road CSS) pulled from Berman’s Loop Library.
Looper — Visual, Review-Gated Agent Loops for Claude Code — a tool (ksimback, MIT, 233★) that makes you design a loop before running it: /looper interviews you into a sharp goal + falsifiable verification + a cross-model review gate (defaults to a different model family to avoid self-evaluation bias) + termination guards, then compiles a portable loop.yaml + run-loop.py. The maintainer frames it as “the design layer that sits in front of” /goal (execution) and /loop (scheduling) — the verifier-first discipline operationalized before the loop runs.
How I Plan, Build, and Run Loops with Claude Code (Thariq Shihipar) — first-party practitioner methodology from an Anthropic Claude Code team member: the /loop vs /goal (hand off the exit condition, “power through,” guaranteed completion of a complex task) vs workflows (subagents that do + parallelize + verify; turn a non-deterministic task into a roughly-deterministic one; strongest for non-technical work) distinction, why a separate verifier beats self-grading, the Figma-MCP-vs-screenshot verifiability example, the Bun→Rust workflow, latency-as-a-deterministic-signal for /goal auto-research, and packaging a workflow-as-JS-file into a reusable skill. The demo-driven companion to the first-party loop taxonomy and Dynamic Workflows.
The Gauntlet Loop (Matt Shumer) — the viral one-prompt method behind “Claude of Duty” (a ~55k-line FPS that Claude Code + Opus 5 built from a single prompt, every asset generated in code, repo open-sourced): the agent — not you — breaks the goal into parts, gives each part a specialist builder and a fresh-context blind-critic subagent, and only passes work that beats a real-world reference (actual Call of Duty screenshots) in a blind side-by-side — split, build, judge, repeat, with no arbitrary final round. Carries the verbatim prompt, the seven-step procedure, an r/ClaudeAI reproduction (“Claude Bandicoot,” three 5-hour Opus 5 ultracode windows), and the generalization to websites / writing / backend (exemplar paragraphs or test suites as the bar). The maximalist showcase cousin of verifier-first loops — the same never-let-the-builder-grade-itself discipline, aimed at creative quality instead of correctness.
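The three cost controls named in the Write Loops, Not Prompts entry above (max iterations, no-progress detection, token ceilings) can be sketched as a small guard object that the loop consults after every pass. This is a hedged illustration only: `LoopGuard`, its default limits, and the choice to treat repeated identical outputs as "no progress" are assumptions made for the example, not definitions taken from the article.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LoopGuard:
    """Tracks three cost controls; every limit here is an illustrative value."""
    max_iterations: int = 10
    token_ceiling: int = 50_000
    no_progress_limit: int = 2      # repeated identical outputs before stopping
    iterations: int = 0
    tokens_used: int = 0
    _last_output: Optional[str] = None
    _stalled: int = 0

    def record(self, output: str, tokens: int) -> Optional[str]:
        """Record one pass; return a stop reason, or None to keep going."""
        self.iterations += 1
        self.tokens_used += tokens
        # Assumption: "no progress" means the output is identical to the previous pass.
        self._stalled = self._stalled + 1 if output == self._last_output else 0
        self._last_output = output
        if self.tokens_used >= self.token_ceiling:
            return "token ceiling"
        if self._stalled >= self.no_progress_limit:
            return "no progress"
        if self.iterations >= self.max_iterations:
            return "max iterations"
        return None

guard = LoopGuard(max_iterations=10, token_ceiling=50_000, no_progress_limit=2)
outputs = ["draft A", "draft B", "draft B", "draft B"]   # hypothetical agent outputs
reason = None
for out in outputs:
    reason = guard.record(out, tokens=1_000)
    if reason:
        break
print(reason, guard.iterations)
```

Keeping the guard separate from the loop body means the same limits can wrap any loop. In practice a no-progress check would more likely compare test results or diffs than raw output strings.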

Core mechanisms (Claude Code)

Dynamic Workflows — orchestration logic that lives outside the agent (a JS function), with per-sub-agent token caps and max-iteration variables.
`/goal` Walkthrough — the completion-condition loop (a Ralph Loop that knows when it’s done).
`/loop` & scheduled tasks and Routines — the scheduling heartbeat that turns a one-shot into a standing loop.

The hard parts (resolved 2026-07-03, one question remains)

Verification, ambiguity-handling, vision-doc planning, the human-in-the-loop line, the non-engineer minimum-viable loop, and the precise /goal, /loop, Routines, and Dynamic Workflows boundaries were this topic’s six oldest open questions — all six were drained and resolved on 2026-07-03. See The Verification Frontier for the underlying theory and each article’s own “resolved” sections for the practical answers: Verifier-First Loops (verification standard), Write Loops, Not Prompts (ambiguity, vision docs, human step-back point), Loop Engineering (non-engineer minimum-viable loop), and The Loop Is the Unit of Work (the four-primitive comparison). One question remains genuinely open — empirical per-loop-type token-cost benchmarks — tracked in this topic’s research agenda.

Related topics

Agents & Agentic Systems — the broader agent topic (frameworks, multi-agent patterns, harness self-improvement) this sub-theme grew out of.
The 2026 Claude Code AIOS Pattern — loops as the orchestration layer of a larger “agent OS.”

Jonathon's AI Wiki
