Source: ai-research/loop-engineering-cobusgreyling.md (github.com/cobusgreyling/loop-engineering — MIT, 162★/25 forks, created 2026-06-09, last push 2026-06-13)

cobusgreyling/loop-engineering is a practical, tool-aware reference for loop engineering — the discipline of replacing yourself as the person who prompts the agent by designing the system that does it instead. It turns the abstract idea (named in essays by Addy Osmani and Cobus Greyling, and in Boris Cherny’s “my job is to write loops” framing) into copyable assets: a six-primitive taxonomy, seven production loop patterns, an L0→L3 readiness ladder, an incident-style failure-mode catalog, side-by-side examples across Grok / Claude Code / Codex / GitHub Actions, and three published npm CLIs (loop-init, loop-audit, loop-cost). The repo dogfoods itself — it runs its own daily-triage and validate-patterns/audit loops on every push.

Key Takeaways

  • Definition. A loop is a recursive goal: you define a purpose and the agent iterates (with sub-agents, verification, and external state) until done or until it escalates to a human. “Loop engineering is replacing yourself as the prompter.”
  • The leverage shift. From crafting individual prompts → designing the control system that orchestrates agents over time. Boris Cherny (Head of Claude Code): “I don’t prompt Claude anymore. I have loops running that prompt Claude… My job is to write loops.” Peter Steinberger and Addy Osmani make the same call.
  • Six primitives (+ Memory). Automations/Scheduling · Worktrees · Skills · Plugins & Connectors (MCP) · Sub-agents (maker/checker) · + Memory/State — the durable spine outside any conversation.
  • Seven patterns, each with cadence, risk, starter kit, and week-one readiness level: Daily Triage, PR Babysitter, Issue Triage, CI Sweeper, Post-Merge Cleanup, Dependency Sweeper, Changelog Drafter.
  • Readiness ladder. L0 Draft → L1 Report-only → L2 Assisted (small auto-fixes + verifier) → L3 Unattended (with denylist, budget, metrics, human gates). Never skip L1 for a new pattern on a production repo.
  • Honest about cost and risk. Token burn, comprehension debt, and “cognitive surrender” are named failure modes, not footnotes. Two people can run the same loop and get opposite results — the loop doesn’t know; you do.
  • The CLIs are thin but real: loop-audit (0–100 Loop Readiness Score, v1.4.1), loop-init (scaffolder, v1.2.1), loop-cost (token-spend estimator, v1.0.2). All on npm; no clone required.

The Six Primitives (+ Memory)

The building blocks every loop assembles from. A minimal viable loop is scheduling + one triage skill + a state file; you add the rest only once the prior version proves its value (and its failure modes).

| Primitive | Job in the loop | Realizations |
|---|---|---|
| Automations / Scheduling | The heartbeat — discovery + triage on a cadence | /loop (Grok/Claude Code), Claude Code scheduled tasks/cron, GitHub Actions + dispatch, /goal (run until a verifiable condition) |
| Worktrees | Safe parallel execution without merge hell | isolation: "worktree" on spawned sub-agents; one worktree per fix, deleted after verify/handoff |
| Skills | Persistent memory of intent (conventions, build/test commands, “we don’t do it this way”) | SKILL.md + scripts; the cure for intent debt |
| Plugins & Connectors (MCP) | Reach into real tools — Linear/Jira, Slack, GitHub PRs, DBs, deploys | MCP as the common substrate; scope read+comment until trusted |
| Sub-agents (maker/checker) | The single most important reliability pattern — the implementer never grades its own homework | Explorer→Implementer→Verifier; verifier on a stronger model for unattended loops |
| + Memory / State | Durable spine across sessions — “what are we working on, what did we try, what’s waiting on a human” | STATE.md / LOOP-STATE.json / a Linear section; often the single most important artifact the loop produces |
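
The "scheduling + one triage skill + a state file" minimum can be sketched roughly as below. This is an illustration under stated assumptions, not the repo's implementation: the JSON layout, the field names (`watchlist`, `waiting_on_human`, `last_run`), and the `triage` stub are all hypothetical, and the scheduler (cron, GitHub Actions, or /loop) is assumed to call `run_once` on each tick.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read the durable state file, or start empty on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    # Hypothetical schema: these field names are assumptions for illustration.
    return {"watchlist": [], "waiting_on_human": [], "last_run": None}


def triage(signals: list[dict]) -> list[dict]:
    """Stand-in for the triage skill: keep only signals flagged actionable.

    In a real loop this is where an agent call would run.
    """
    return [s for s in signals if s.get("actionable")]


def run_once(path: Path, signals: list[dict]) -> dict:
    """One report-only iteration: triage, record to state, take no action."""
    state = load_state(path)
    state["watchlist"] = triage(signals)
    state["last_run"] = datetime.now(timezone.utc).isoformat()
    path.write_text(json.dumps(state, indent=2))
    return state
```

Because the only side effect is the state file, this shape stays at the report-only level: nothing here writes code or comments on a PR.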

The Seven Patterns

Each pattern ships with cadence, risk rating, a clone-and-run starter (Grok/Claude Code/Codex), week-one readiness level, and tool-specific notes. Machine-readable index lives in patterns/registry.yaml.

| Pattern | Cadence | Risk | Week 1 | Token cost |
|---|---|---|---|---|
| Daily Triage | 1d–2h | Low | L1 report | Low |
| PR Babysitter | 5–15m | Medium | L1 watch | High |
| CI Sweeper | 5–15m | Medium | L2 cautious | Very high |
| Dependency Sweeper | 6h–1d | Medium | L2 patch-only | Medium |
| Changelog Drafter | 1d / on-tag | Low | L1 draft | Low |
| Post-Merge Cleanup | 1d–6h | Low | L1 off-peak | Low |
| Issue Triage | 2h–1d | Low | L1 propose-only | Low |

Best first loops are the low-risk, low-cost, L1 ones (Daily Triage, Changelog Drafter, Post-Merge Cleanup) — high value, report-only, hard to do damage. The repo’s own pattern-picker.md and an interactive web picker route you to one.
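
As a rough illustration, the "low-risk, low-cost, L1" rule can be written as a filter over the pattern table. The rows are transcribed from the table above; the `Pattern` class and `first_loop_candidates` function are hypothetical names, and this is not how the repo's picker works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    risk: str
    week_one: str    # week-one readiness level plus its qualifier
    token_cost: str


# Rows transcribed from the pattern table in this section.
PATTERNS = [
    Pattern("Daily Triage", "Low", "L1 report", "Low"),
    Pattern("PR Babysitter", "Medium", "L1 watch", "High"),
    Pattern("CI Sweeper", "Medium", "L2 cautious", "Very high"),
    Pattern("Dependency Sweeper", "Medium", "L2 patch-only", "Medium"),
    Pattern("Changelog Drafter", "Low", "L1 draft", "Low"),
    Pattern("Post-Merge Cleanup", "Low", "L1 off-peak", "Low"),
    Pattern("Issue Triage", "Low", "L1 propose-only", "Low"),
]


def first_loop_candidates(patterns: list[Pattern]) -> list[str]:
    """Keep patterns that are low risk, low token cost, and start at L1."""
    return [
        p.name
        for p in patterns
        if p.risk == "Low" and p.token_cost == "Low" and p.week_one.startswith("L1")
    ]


print(first_loop_candidates(PATTERNS))
# ['Daily Triage', 'Changelog Drafter', 'Post-Merge Cleanup', 'Issue Triage']
```

By these three columns alone, Issue Triage also passes the filter, although the recommendation above names only the other three.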

Readiness Ladder + Upgrade Path

The spine of the operating discipline (the loop-design-checklist.md rubric scores 10 sections against these):

  • L0 — Draft: documented intent only.
  • L1 — Report: triage → state file, no auto-action. Run here 1–2 weeks before anything writes code.
  • L2 — Assisted: small auto-fixes, each gated by a separate verifier + worktree + max-attempts cap.
  • L3 — Unattended: runs without you watching — only with denylist, token budget, metrics, and explicit human gates.

loop-audit enforces this: L3 is capped until a project has a verifier, state, cost observability (budget + run-log + LOOP.md budget section), and proven loop activity (real “Last run” timestamps / loop commits / scheduled workflows — not just files on disk).
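
The gating rule described here can be sketched as follows. This is not loop-audit's actual code or scoring: the `Evidence` fields and `allowed_level` are hypothetical names, and holding a project at L2 when an L3 prerequisite is missing is an assumption, since the source says only that L3 is capped.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    has_verifier: bool
    has_state_file: bool
    has_cost_observability: bool  # budget + run-log + LOOP.md budget section
    has_loop_activity: bool       # real "Last run" timestamps, loop commits, scheduled workflows


def allowed_level(claimed: int, ev: Evidence) -> int:
    """Return the highest readiness level (0-3) the evidence supports.

    Assumption: a project missing any L3 prerequisite is held at L2 or below.
    """
    l3_ready = all([
        ev.has_verifier,
        ev.has_state_file,
        ev.has_cost_observability,
        ev.has_loop_activity,
    ])
    if claimed >= 3 and not l3_ready:
        return 2
    return claimed
```

The point of the `has_loop_activity` flag is the one the text makes: files on disk are not enough, so a project with every document in place but no evidence of real runs still does not reach L3.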

Failure-Mode Catalog (the most reusable part)

An incident-style catalog rated S1 (annoying) / S2 (harmful) / S3 (critical) — usable as a debugging checklist for any loop, not just ones built from this repo:

  • Infinite Fix Loop (S2) — same PR/CI gets 5+ fix attempts, never converges. Mitigate: hard cap (≈3) → escalate; separate/stronger verifier; classify flakes in triage; record attempt count in state.
  • Verifier Theater (S2) — verifier “approves” but CI fails. Mitigate: verifier must actually run tests/lint and report output; different instructions (“find reasons to reject”); different model+context from the implementer.
  • State Rot (S1→S2) — STATE.md references merged PRs / closed tickets; loop acts on ghosts. Mitigate: prune every run; validate IDs against live API; one state file per pattern.
  • Token Burn (S1) — sub-minute cadence runs full sub-agent chains on empty triage. Mitigate: cheap triage-only pass first, spawn sub-agents only when state says actionable; early-exit on empty watchlist (<5k tokens); daily budget → pause.
  • Notification Fatigue (S1→S2) — pings every run, team mutes the bot, real escalations missed. Mitigate: notify only when a human decision is required; digest mode.
  • Over-Reach / Wrong Scope (S2→S3) — loop refactors unrelated modules or touches denylisted paths. Mitigate: enforced path denylist; “smallest possible diff”; triage = signal only, no invention.
  • Comprehension Debt Spiral & Cognitive Surrender (S2, long-term/cultural) — velocity up but nobody can explain recent changes; “the loop handles it.” Mitigate: mandatory human review for non-trivial PRs; weekly loop-digest; success metric = time saved with the quality bar held.
  • Parallel Collision (S2) & Escalation Failure (S2) — sub-agents edit the same files; or loop retries forever and never pings a human. Mitigate: isolation: worktree for all code-editing sub-agents; connector ping on escalation + alert if an item waits >24h.
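
Two of the mitigations above, the hard attempt cap and the enforced path denylist, fit in a few lines. The sketch below is illustrative only: the denylist patterns, the `attempts` field, and the returned action strings are hypothetical, and the cap of 3 follows the catalog's "≈3" suggestion.

```python
from fnmatch import fnmatch

MAX_FIX_ATTEMPTS = 3               # the catalog suggests a hard cap of about 3
DENYLIST = ["infra/*", "*.lock"]   # hypothetical denylisted path patterns


def touches_denylist(changed_paths: list[str], denylist: list[str] = DENYLIST) -> bool:
    """True if any changed path matches a denylisted glob pattern."""
    return any(fnmatch(path, pattern) for path in changed_paths for pattern in denylist)


def next_step(item: dict, changed_paths: list[str]) -> str:
    """Decide what the loop may do with one watchlist item.

    `item` stands for a state-file entry; the `attempts` field is an assumed name.
    """
    if touches_denylist(changed_paths):
        return "escalate"          # Over-Reach: never auto-edit denylisted paths
    if item.get("attempts", 0) >= MAX_FIX_ATTEMPTS:
        return "escalate"          # Infinite Fix Loop: stop retrying, hand to a human
    item["attempts"] = item.get("attempts", 0) + 1  # record the attempt count in state
    return "attempt_fix"
```

Keeping the counter in the state entry rather than in memory is what lets the cap survive across runs; a counter that resets every session would never trip.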

Vocabulary It Standardizes

Glossary linking loop engineering to the surrounding agentic-dev ideas (mostly Osmani’s): Agent Harness Engineering (the single-session sandbox; the loop = harness + schedule + state + verification), the Factory Model (the system that builds the software), Intent Debt (cold-start guesses; paid down by skills), Comprehension Debt (gap between what exists and what you understand), Orchestration Tax (human cost of coordinating parallel agents), and Code Agent Orchestra / Adversarial Code Review (different agents for explore/implement/verify).

Examples by Tool

The #examples-by-tool section — the most directly useful comparison — implements the same patterns four ways:

  • Grok — native /loop, isolation: "worktree", scheduler_create/scheduler_delete; the repo’s primary target (the Grok Build TUI has strong native primitive support).
  • Claude Code — /loop 1d Run $loop-triage. Read STATE.md...; /goal for one-shot “get main green” with a fresh model checking the stop condition; verifier as a .claude/agents/loop-verifier.md sub-agent; isolation: worktree.
  • Codex — Automations + .codex/agents/verifier.toml.
  • GitHub Actions — cron/dispatch workflows (the repo’s own daily-triage.yml is the live example).

An MCP cookbook (examples/mcp/) supplies read-only and safe-write connector configs (GitHub read-only, GitHub propose, Linear, Slack-read) plus a “safe write pattern.”

Implementation

Tool/Service: cobusgreyling/loop-engineering (MIT). Three npm packages + a clone-and-copy reference repo + a GitHub-Pages interactive showcase. Setup (no clone needed):

npx @cobusgreyling/loop-init . --pattern daily-triage --tool grok   # scaffold starter + loop-budget.md + loop-run-log.md
npx @cobusgreyling/loop-cost  --pattern daily-triage --level L1      # estimate daily token spend
npx @cobusgreyling/loop-audit . --suggest                           # 0–100 Loop Readiness Score + next steps

Cost: free/OSS. The point of loop-cost is that the real cost is your loops’ token spend. Cadence is a linear multiplier (a 5m cadence means 288 runs/day, versus 1 run/day at 1d), and a 15m CI Sweeper that runs full sub-agent chains on every run comes to ~5M tokens/day (rated “avoid”). loop-audit exits with code 2 if the score is below 40, which makes it usable as a CI gate.

Integration notes: loop-audit scores the presence of a state file, triage skill, verifier, LOOP.md, AGENTS.md/CLAUDE.md, safety docs, workflows, MCP config, worktree evidence, registry.yaml, budget + run-log, and (v1.4) dynamic loop activity. The CLIs are intentionally thin wrappers around the registry metadata; the durable value is the docs, patterns, and checklist.^[inferred — based on reading the tool source vs the docs]
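
The cadence arithmetic can be checked with a short sketch. This is plain multiplication, not loop-cost's actual model; the function names are hypothetical, and the per-run figure is back-calculated here from the ~5M tokens/day number rather than taken from the source.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(cadence_minutes: int) -> float:
    """Number of runs in 24 hours at a fixed cadence."""
    return MINUTES_PER_DAY / cadence_minutes


def daily_tokens(cadence_minutes: int, tokens_per_run: int) -> float:
    """Daily token spend if every run costs the same amount."""
    return runs_per_day(cadence_minutes) * tokens_per_run


print(runs_per_day(5) / runs_per_day(MINUTES_PER_DAY))  # 288.0 (5m vs 1d)
print(runs_per_day(15))                                  # 96.0 runs/day at 15m
print(round(5_000_000 / runs_per_day(15)))               # 52083 tokens/run implied by ~5M/day
print(daily_tokens(15, 5_000))                           # 480000.0 if every run exits early at 5k
```

The last line uses the catalog's early-exit ceiling (under 5k tokens on an empty watchlist): at the same 15m cadence, runs that exit early cost roughly a tenth of the full-chain figure.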

Try It

  • Read the failure-mode catalog and the design checklist even if you adopt nothing else — they’re the highest-signal, tool-agnostic parts and double as a review rubric for loops you already run.
  • Score an existing loop: npx @cobusgreyling/loop-audit . --suggest on a repo where you run /loop or scheduled tasks, and see where it lands on L0–L3.
  • Map it onto Claude Code’s native primitives you already have: /loop, /schedule, /goal, ScheduleWakeup, scheduled cloud agents (Routines), worktree isolation, and sub-agents — this repo is the discipline layer on top of those mechanisms, not a replacement for them.^[inferred]
  • Start a new loop at L1 report-only (Daily Triage or Changelog Drafter) for a week before letting anything write code.

Related

  • 12-Factor Agents (HumanLayer) — sibling framework; “LLMs are stateless functions, own your control flow / context / prompts” is the same thesis loop engineering operationalizes over time.
  • Dynamic Workflows (Claude Code) — the Claude-Code-native loop mechanism (/loop, agent-loop genius-or-hype) this reference maps patterns onto.
  • Loop Engineering — Addy Osmani’s Essay — the canonical primary essay this repo turns into tooling (the repo was created 2026-06-09, the day after the essay); the five building blocks, the worked example, and the named risks originate there.
  • Verifier-First Loops — the verifier-first pre-flight that prevents this catalog’s Verifier Theater / Infinite Fix Loop failures.
  • Should You Build a Loop? — the four-condition decision test + 30-day security checklist that complement this repo’s loop-cost economics and failure catalog.
  • agent-skills (Addy Osmani) — Osmani’s companion essay; his spec-to-ship skill lifecycle is the “skills pay down intent debt” primitive made concrete.
  • Reflecting on a Year of Claude Code (Boris Cherny & Cat Wu) — primary source for the “my job is to write loops” framing and the /loop 30m /slack-feedback-style usage.
  • Ryan Carson’s Clawd Chief — “agents are cron jobs and markdown files” — the same loop thesis at solo-founder scale.
  • Browserbase Autobrowse and Reflexio — harness-self-improvement siblings: graduate a successful run into a reusable skill/playbook (maker/checker + skills-as-memory overlap).
  • 2026 Claude Code AIOS Pattern — the markdown-config + scheduled-task “agent OS” framing loops slot into.
  • The Loop Is the Unit of Work — the cross-topic synthesis placing this catalog, Claude Code’s /loop/Routines, and the verification frontier as one pattern (prompt → harness → loop).
  • Agent Loops (topic) — the wiki’s learning hub for loops; start with the Write Loops, Not Prompts explainer (lineage + 3 starter loops), then this catalog as the reference layer.

Open Questions

  • The three CLIs are days old (loop-cost v1.0.2) and lightly tested; treat scores/estimates as directional, not authoritative.^[inferred]
  • No independent production-loop case studies yet beyond the author’s own stories/ (which honestly include why-we-killed-ci-sweeper.md); adopter evidence is thin at 162★.