Source: raw/Claude_Code_Creator_-_Write_Loops_Not_Prompts.md (YouTube EH2MMQTaPEA, ~9-min creator explainer; channel not named in the transcript — reacting to Boris Cherny’s “write loops” clip + tweet thread)

A beginner-friendly explainer that “susses out the middle ground” on the viral “you should be designing loops that prompt your agents, not prompting agents yourself” claim. Its value is pedagogical: it traces how we got to agent loops (ReAct → AutoGPT → Ralph Loop → /goal → agent loops), defines a loop in one sentence, gives three cost controls so loops don’t “burn billions of tokens,” and ends with three concrete starter loops anyone can run this week. A good first stop for learning the topic before the heavier references. This is the seed article for the agent-loops topic.

Key Takeaways

  • A loop is a small program that prompts the agent for you, reads what it produced, decides if it’s done, and reprompts if not. That’s the whole idea — the rest is guardrails.
  • The leverage claim: building the loop that harnesses your work is more valuable than the individual prompts you send. “My job is to write loops” (Boris Cherny).
  • Lineage matters: today’s “agent loop” is the productized descendant of ReAct (2022–23), AutoGPT, and the Ralph Loop — each fixed a specific failure of the last.
  • /goal = a Ralph Loop that stops on a completion condition instead of a max-iteration count. That single change is what made loops practical.
  • Token blowups are a configuration failure, not an inherent property. Three controls prevent them: max iterations, no-progress detection, and per-sub-agent token ceilings.
  • Skills are what make loops effective — “loops are plumbing; if you pour concrete and eggshells down there you’ll have a bad time.” A loop is not a new capability by itself.
  • Readiness gate: if you can’t already run 2–3 parallel agent sessions comfortably, building a loop is a bad idea.
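
The one-sentence definition in the first takeaway can be sketched in a few lines of Python. This is a minimal sketch, not any tool's real API: `run_agent` and `is_done` are hypothetical callables standing in for whatever calls your agent and checks its output, and the `max_iterations` default is an arbitrary assumption.

```python
from typing import Callable, Tuple


def run_loop(
    base_prompt: str,
    run_agent: Callable[[str], str],  # hypothetical: sends a prompt, returns the agent's output
    is_done: Callable[[str], bool],   # hypothetical: your completion check
    max_iterations: int = 5,          # assumed guardrail, not a value from the source
) -> Tuple[str, int]:
    """Prompt the agent, read its output, decide if it is done, reprompt if not."""
    prompt = base_prompt
    output = ""
    for attempt in range(1, max_iterations + 1):
        output = run_agent(prompt)
        if is_done(output):
            return output, attempt
        # Not done: reprompt with the unfinished output attached.
        prompt = f"{base_prompt}\n\nYour previous attempt was incomplete:\n{output}"
    return output, max_iterations


# Usage with a fake agent that only succeeds on its third call.
calls = {"count": 0}


def fake_agent(prompt: str) -> str:
    calls["count"] += 1
    return "DONE" if calls["count"] >= 3 else "still working"


result, attempts = run_loop("Fix the failing test", fake_agent, lambda out: out == "DONE")
print(result, attempts)  # prints: DONE 3
```

Everything beyond these few lines (caps, stagnation checks, budgets) is the guardrail layer the remaining takeaways describe.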

The Lineage — how we got to agent loops

The explainer’s spine is a four-step evolution, each step solving the previous one’s failure:

  1. ReAct (2022–23 paper) — the first formalized agent loop: a model reasons about what to do, acts via tools, reads the output, and repeats until done. One model, one loop, a human watching the whole time. Unlocked acting on real tool output instead of hallucinating.
  2. AutoGPT — give it a goal and let it prompt itself toward completion, spawning sub-goals. Proved autonomy was possible with guardrails — but ran down rabbit holes, wasting time and tokens.
  3. The Ralph Loop — run the same exact prompt over and over until the work is complete or a max-iteration cap is hit. Its insight: discipline through forced amnesia — a fresh conversation every run means fresh eyes and an objective re-approach, instead of a history that clogs the pipes.
  4. /goal (Codex + Claude Code) — productized the Ralph Loop with one change: instead of a max-iteration count, it runs until the task and your intention for it are actually complete, judged against criteria you set. “A goal is a Ralph Loop that knows when it’s done.”

The lineage ends at the agent loop: loops orchestrating loops, effectively a multi-agent-orchestration version of a Ralph Loop. The mental model: the loop that harnesses everything is worth more than any single prompt sent through it.
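
The step from the Ralph Loop to /goal comes down to the stop condition. The sketch below is illustrative only and is not how Codex or Claude Code implement /goal. `goal_loop`, `run_fresh`, and `criteria_met` are hypothetical names, and keeping progress in a `workspace` dict (a stand-in for files on disk) rather than in chat history is an assumption about how "forced amnesia" still allows progress.

```python
from typing import Callable, Dict


def goal_loop(
    prompt: str,
    workspace: Dict[str, int],                           # stand-in for files on disk
    run_fresh: Callable[[str, Dict[str, int]], None],    # hypothetical: one fresh conversation
    criteria_met: Callable[[Dict[str, int]], bool],      # the completion criteria you set
    safety_cap: int = 20,                                # assumed backstop against endless retries
) -> int:
    """Repeat the same prompt with no carried-over history until the criteria pass."""
    for iteration in range(1, safety_cap + 1):
        # Same exact prompt every time; only the workspace changes between runs.
        run_fresh(prompt, workspace)
        if criteria_met(workspace):
            return iteration
    raise RuntimeError(f"criteria not met after {safety_cap} runs")


# Usage: a fake agent that fixes one failing test per run.
def fake_agent(prompt: str, ws: Dict[str, int]) -> None:
    if ws["failing_tests"] > 0:
        ws["failing_tests"] -= 1


ws = {"failing_tests": 3}
iterations_used = goal_loop(
    "Make all tests pass", ws, fake_agent, lambda w: w["failing_tests"] == 0
)
```

Dropping `criteria_met` and always running `safety_cap` times would turn this back into a loop that stops on a count rather than on completion.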

The Three Cost Controls

The viral backlash is mostly about money — the explainer cites Uber blowing its entire AI budget in 4 months, then capping engineers at ~$1,500 per tool per month. For those of us on a Claude Max or Codex CLI plan rather than a corporate budget, three controls keep a loop from “flying off the rails”:

  1. Max iterations: a hard cap even when you have a goal, for the case where the loop can’t reach it and would otherwise retry forever. In Claude Code workflows this is a variable you set in the workflow.
  2. No-progress detection: stagnation is the real source of runaway bills. The loop hits the same issue repeatedly, or new issues keep cropping up and it wanders down weird paths without addressing the root cause. Detect stagnation and stop.
  3. Token ceilings: a hard financial stop. In a Claude workflow you cap the max tokens each sub-agent may spend.
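
The three controls can be combined in one guard around the loop body. This is a minimal sketch under stated assumptions: `StepResult`, `guarded_loop`, and all default limits are hypothetical; the token ceiling is applied to the whole loop for simplicity, whereas the explainer describes it per sub-agent; and stagnation is detected only in its simplest form, the same issue label repeating.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    done: bool
    tokens_used: int
    issue: str  # short label for whatever is currently blocking progress


def guarded_loop(
    step: Callable[[], StepResult],  # hypothetical: one agent pass
    max_iterations: int = 10,        # control 1 (assumed value)
    max_repeats: int = 3,            # control 2 (assumed value)
    token_ceiling: int = 100_000,    # control 3 (assumed value)
) -> str:
    """Run steps until done or a guard trips; return the reason the loop stopped."""
    spent = 0
    last_issue: Optional[str] = None
    repeats = 0
    for _ in range(max_iterations):
        result = step()
        spent += result.tokens_used
        if result.done:
            return "done"
        if spent >= token_ceiling:
            return "token_ceiling"
        # Count consecutive passes that are stuck on the same issue.
        repeats = repeats + 1 if result.issue == last_issue else 1
        last_issue = result.issue
        if repeats >= max_repeats:
            return "no_progress"
    return "max_iterations"


# Usage: a step that never gets past the same failure.
def stuck_step() -> StepResult:
    return StepResult(done=False, tokens_used=1_000, issue="same failing test")


reason = guarded_loop(stuck_step)
print(reason)  # prints: no_progress
```

Returning the stop reason, rather than just halting, lets whoever reads the result tell a finished loop apart from one that was cut off.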

Key enabler: in workflows the orchestration logic lives outside the agent (a JavaScript function). The old failure mode — “the agent was responsible for orchestrating the agent,” so it locked into bad decisions and ran with them — goes away when an external function drives the loop. Framed properly, a loop “shouldn’t burn any more tokens than you doing all those steps manually would have.”
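
A rough illustration of orchestration living outside the agent follows. The explainer describes this as a JavaScript function in Claude Code workflows; the Python below is a stand-in, not that mechanism. `orchestrate`, the `SubAgent` signature, and the budget numbers are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sub-agent signature: takes a task and a token budget,
# returns (output, tokens_actually_used).
SubAgent = Callable[[str, int], Tuple[str, int]]


def orchestrate(tasks: List[str], sub_agent: SubAgent, tokens_per_agent: int) -> Dict[str, str]:
    """Plain code, not the model, decides what runs next and enforces each budget."""
    results: Dict[str, str] = {}
    for task in tasks:
        output, used = sub_agent(task, tokens_per_agent)
        if used > tokens_per_agent:
            # The orchestrator discards over-budget work instead of letting it continue.
            results[task] = "stopped: over budget"
        else:
            results[task] = output
    return results


# Usage: a fake sub-agent whose cost grows with the task description length.
def fake_sub_agent(task: str, budget: int) -> Tuple[str, int]:
    return f"ok: {task}", len(task) * 100


report = orchestrate(["lint", "refactor everything"], fake_sub_agent, tokens_per_agent=1_000)
```

Because the control flow is ordinary code, a sub-agent that makes a bad decision cannot extend its own run or spawn more work; it can only return a result.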

Skills Make Loops Work

“Loops are like plumbing — but if you’re pouring concrete and eggshells down there, you’re going to have a really bad time.”

A loop is not, by itself, a new capability that turns you into a super vibe-coder. The best loops integrate skills that already exist into how they run. Encode your context, conventions, and guardrails as skills/markdown the loop reads, so the agent knows when it may decide for itself and when it must stop and check in with you — a line the explainer stresses is personal to your skill level and what you want to stay in control of, not one-size-fits-all.
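
One way to picture "skills/markdown the loop reads" is a function that assembles each prompt from files on disk. This is a sketch for illustration only: it is not how Claude Code loads skills, and `build_prompt`, the file names, and the file contents are all made up.

```python
import tempfile
from pathlib import Path


def build_prompt(task: str, skills_dir: Path) -> str:
    """Prepend every markdown skill file to the task so each run sees the same rules."""
    sections = []
    for path in sorted(skills_dir.glob("*.md")):
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {path.stem}\n{body}")
    sections.append(f"## task\n{task}")
    return "\n\n".join(sections)


# Usage with throwaway files; names and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp)
    (skills / "conventions.md").write_text("Use pytest. Never edit migrations.", encoding="utf-8")
    (skills / "escalation.md").write_text("Stop and ask before deleting data.", encoding="utf-8")
    prompt = build_prompt("Fix the login bug", skills)

print(prompt)
```

The escalation file is where the personal line between "decide for yourself" and "stop and check in" would live, so changing that line means editing a file rather than rewriting the loop.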

Three Starter Loops You Can Run This Week

  1. Issue-backlog loop. Log every bug / feature idea / “come back to this” as a GitHub issue, then: /goal → go through the open issues, use the superpowers systematic-debugging skill, write a test case for each bug, commit it, and open a PR per bug. You don’t get the “goal achieved” notification until all of them are fixed. Cost-control caps can be added in plain natural language.
  2. Front-end verification loop. After a front-end change, kick off the Chrome extension (or an iOS simulator), snapshot the result, inspect the DOM, and cross-reference it against your specs / acceptance criteria — and especially against your designs, viewing the built UI and the Claude Design mockup side by side, iterating until near-100% fidelity. Cap the tries or wrap it in /goal.
  3. Code-review loop (Boris Cherny’s). /loop every 5 minutes with a custom /babysit command that auto-addresses code-review comments, auto-rebases, and shepherds PRs to production. Once you run parallel agents you generate far more commits and PRs; this loop is what keeps that volume from getting away from you. Note Cherny’s loops lean heavily on custom skills and slash commands.
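
The shape of the third loop, a recurring pass over open PRs, can be sketched generically. This does not reproduce /loop or /babysit: `poll_loop`, `list_open_prs`, and `babysit` are hypothetical stand-ins, `max_rounds` is an assumed cap, and only the 5-minute interval comes from the source.

```python
import time
from typing import Callable, List


def poll_loop(
    list_open_prs: Callable[[], List[str]],  # hypothetical: returns PR identifiers
    babysit: Callable[[str], None],          # hypothetical stand-in for a /babysit-style command
    interval_seconds: float = 300,           # every 5 minutes, per the source
    max_rounds: int = 3,                     # assumed cap so the sketch terminates
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Visit every open PR once per round; return how many visits were made."""
    handled = 0
    for round_number in range(max_rounds):
        for pr in list_open_prs():
            babysit(pr)
            handled += 1
        if round_number < max_rounds - 1:
            sleep(interval_seconds)
    return handled


# Usage with fakes and a no-op sleep so it runs instantly.
open_prs = ["pr-1", "pr-2"]
log: List[str] = []


def fake_babysit(pr: str) -> None:
    log.append(pr)
    open_prs.remove(pr)  # pretend the PR was shepherded to merge


total = poll_loop(lambda: list(open_prs), fake_babysit, sleep=lambda seconds: None)
```

Injecting `sleep` keeps the sketch testable; in real use the default `time.sleep` would pace the rounds.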

The Readiness Gate

A blunt prerequisite: if you can’t already comfortably run 2–3 parallel agent sessions yourself (managing the context, passing it between agents, and getting output you’re happy with purely from the context you authored), building a loop is a really bad idea. If you can, the question simply becomes how to automate that process better. Verification underpins all of it: “you have to have a really strong process for verifying and reviewing what has been done.” Which standard to verify against is the part the explainer says is still underdeveloped.

Related

  • Agent Loops (topic index) — the learning hub this article seeds.
  • Loop Engineering — Addy Osmani’s Essay — the canonical primary source the whole topic descends from: “replace yourself as the prompter,” the five building blocks mapped onto both Claude Code and Codex.
  • Loop Engineering (Cobus Greyling) — the deeper, tool-agnostic reference: six primitives, seven patterns, L0→L3 readiness ladder, failure-mode catalog. The next stop after this explainer.
  • Verifier-First Loops — the verification discipline (write the verifier first) that answers this explainer’s “against what standard?” open question.
  • Should You Build a Loop? — the four-condition test + cost math + security tax; the decision layer above the three cost controls here.
  • The Loop Is the Unit of Work — the synthesis placing this lineage in the prompt → harness → loop progression.
  • Reflecting on a Year of Claude Code — the first-party Boris Cherny source for “my job is to write loops” and the /babysit code-review loop.
  • Dynamic Workflows — the Claude Code mechanism for the externalized orchestration + per-sub-agent token caps this video leans on.
  • [[claude-ai/claude-code-goal-command-walkthrough|Claude Code /goal Walkthrough]] — the completion-condition primitive at the center of the lineage.
  • The Verification Frontier — why “against what standard do you verify?” is the load-bearing open question.

Open Questions

The explainer is candid that three things still lack clean answers (carried into the topic research agenda):

  • How do loops handle ambiguity? Without enough context, the model decides on your behalf at runtime. For business-case / end-user-problem decisions, that context must be extremely well documented in markdown the loop reads — or don’t use loops yet.
  • Where does planning live? Is the loop planning the intention, the implementation, or both? Peter Steinberger uses a vision.md per project (per Grok: core problem, intended solution, key goals, technical principles, and what success looks like) — but whether a vision doc actually erases the decisions you’d otherwise make by hand is something you have to test yourself.
  • Where does the human step back in? You still need to be in the loop to test (unless tokens are effectively infinite). The right point depends on your budget, domain knowledge, and whether you’re an engineer — there’s no one-size-fits-all line.