Hermes Autonomous SWE Workflow — Model Routing via Kanban (Shubham Saboo)

Source: saboo-shubham-hermes-swe-workflow-2026-05-22.md — X post + thread by Shubham Saboo (@Saboo_Shubham_), status/2057880306804527566 (thread root 2056577594926256618, ~2026-05-19), fetched via x_search 2026-05-22. SmallCode repo: github.com/Doorman11991/smallcode.

A worked example of Hermes Agent as an orchestrator that routes a coding task by complexity — cheap local models for scoped work, frontier models for the hard parts — with everything tracked on a single Kanban board. Shubham Saboo’s thesis: “LOCAL AI coding agent finally make sense with Hermes orchestrator.” The accompanying retro-terminal infographic walks one feature request (“add dark mode toggle to settings page and persist preference”) end-to-end through Telegram prompt → Hermes decomposition → Kanban routing → parallel execution → review → merge-ready patch.

The workflow infographic

[Infographic: HERMES-AGENT autonomous software engineering workflow. A retro amber-on-black terminal diagram showing a Telegram prompt decomposed by Hermes into a Kanban board, with simple scoped tasks routed to SmallCode plus Ollama on a Mac mini and hard or ambiguous work routed to Claude or Codex, plus an activity feed ending in a merge-ready dark-mode-toggle patch.]

Infographic by Shubham Saboo. The pipeline: (1) Telegram prompt (“add dark mode toggle + persist preference”) → (2) Hermes gathers intent/context and decomposes → (3) Kanban task decomposition (prioritize + route by complexity) → (4) Kanban board (Backlog / Todo / In Progress / In Review / Done) where Hermes tracks everything → (5) activity feed ending in a merge-ready settings_dark_mode.patch with 5 passing tests. Footer reads “MODEL ROUTING: Smart | WORKER HOST: Mac mini (Ollama).”

The routing rule

The load-bearing idea is task-complexity-based model routing, decided at decomposition time:

Task class	Routed to	Why
Simple, scoped coding (well-defined patches, tests, repo cleanup)	SmallCode + Ollama on a Mac mini	Fast, local, zero API cost — a terminal-native agent running models like Gemma 4
Hard / planning / ambiguous (complex reasoning, architecture, reviews, unclear requirements)	Claude or Codex	Frontier reasoning where it actually pays for itself

Hermes is the orchestrator/PM layer above both: it understands the request, decomposes it, assigns each card to the right worker, tracks status on the Kanban board, runs the review loop, and reports the merge-ready result. In the worked example, SmallCode generates the toggle patch + 5 tests locally while Claude does the code review and flags an edge case (“consider edge case when system theme changes”).
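
The routing rule in the table can be sketched as a small function. This is a minimal illustration under assumptions, not Hermes's actual mechanism: the source does not say how the decision is made. The names `TaskClass`, `Worker`, `Task`, and `route` are hypothetical and exist only for this example, and the task classification is done by hand here.

```python
from dataclasses import dataclass
from enum import Enum


class TaskClass(Enum):
    SCOPED = "scoped"  # well-defined patches, tests, repo cleanup
    HARD = "hard"      # planning, architecture, reviews, unclear requirements


class Worker(Enum):
    LOCAL = "smallcode+ollama"    # local tier on the Mac mini
    FRONTIER = "claude-or-codex"  # frontier tier


@dataclass(frozen=True)
class Task:
    title: str
    task_class: TaskClass


def route(task: Task) -> Worker:
    """Pick a worker once, at decomposition time (hypothetical rule)."""
    if task.task_class is TaskClass.SCOPED:
        return Worker.LOCAL
    return Worker.FRONTIER


# The worked example from the infographic, decomposed and classified by hand.
tasks = [
    Task("generate dark mode toggle patch", TaskClass.SCOPED),
    Task("write tests for toggle persistence", TaskClass.SCOPED),
    Task("code review of the patch", TaskClass.HARD),
]
assignments = {t.title: route(t) for t in tasks}
```

The point of the shape is that `route` runs once per card when the request is decomposed, so no per-prompt decision is needed later.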

Key Takeaways

This is the cheapest-successful-outcome principle applied to a coding agent. It is the same thesis the wiki tracks in Puneet Shah’s platform-optimization talk (Sonnet/Haiku executor + Opus advisor) and in Lucas’s “optimize cheapest *successful* outcome, not cheapest per token.” Here the cheap tier is pushed all the way to a local model on a Mac mini with no API cost, and the routing decision is made structurally at task-decomposition time rather than per prompt.
SmallCode is a terminal-native agent, not a model. Per Saboo’s follow-up: github.com/Doorman11991/smallcode is a harness that drives local Ollama models (Gemma 4 cited) at no cost. It occupies the same “cheap scoped worker” slot that a Haiku-tier model fills in cloud-only routing — but with zero marginal cost since inference is local. The Hermes-as-orchestrator-over-heterogeneous-workers shape mirrors the broader Hermes plugin ecosystem and the reviewer≠worker pattern from the HESO Hermes-Sisyphus-Orchestrator community project staged the same week.
Kanban as the orchestration substrate. Hermes already ships a kanban primitive (it’s one of the Herm-TUI’s 11 tabs). Saboo’s workflow uses it as the single source of truth for multi-worker coordination — every task is a card, every card has an assigned worker + status + comments, and the review loop happens in the “In Review” column. This is the concrete answer to “how does one orchestrator coordinate a local worker and a frontier worker without losing track?”
The Mac mini + Ollama detail is the practical unlock. Running the scoped-task tier on local hardware means the high-frequency, low-stakes work (patches, tests, cleanup: the bulk of any coding session) incurs no API cost and never hits an API rate limit. Frontier spend is reserved for the few cards that genuinely need it (architecture, ambiguous requirements, final review). For a self-hosted Hermes operator already running a box 24/7, this is a natural cost floor.
Worked example > abstract claim. Unlike the nainsi dwivedi overview infographic (which maps the whole framework), this one traces one real task through the system with timestamps, worker assignments, an activity feed, and a final merge-ready patch. That makes it the better artifact for someone asking “okay, but what does a Hermes coding session actually look like end to end?”
Provenance caveat. This is a creator’s polished visualization of his own setup, not Nous Research documentation. The infographic is aspirational-grade (clean timestamps, tidy comments) — a real session is messier. The routing pattern is the durable takeaway; treat the specific UI (the terminal dashboard chrome, the exact activity-feed format) as illustrative, and verify SmallCode’s actual capabilities against its repo before depending on it.
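
The card-per-task idea from the Kanban takeaway can be pictured as a small data structure. This is a minimal sketch, assuming a card holds only the fields the post names (assigned worker, status, comments) and moves through the five columns shown on the infographic. The `Card` class and its `advance` method are hypothetical and are not Hermes or Herm-TUI APIs.

```python
from dataclasses import dataclass, field

# Column names as shown on the infographic's board.
COLUMNS = ["Backlog", "Todo", "In Progress", "In Review", "Done"]


@dataclass
class Card:
    title: str
    worker: str  # illustrative label, e.g. "smallcode" or "claude"
    status: str = "Backlog"
    comments: list[str] = field(default_factory=list)

    def advance(self, comment: str = "") -> None:
        """Move the card one column to the right, optionally logging a comment."""
        idx = COLUMNS.index(self.status)
        if idx == len(COLUMNS) - 1:
            raise ValueError(f"{self.title!r} is already Done")
        self.status = COLUMNS[idx + 1]
        if comment:
            self.comments.append(comment)


card = Card("dark mode toggle patch", worker="smallcode")
for _ in range(3):
    card.advance()  # Backlog -> Todo -> In Progress -> In Review
# The reviewer's note from the worked example lands on the card itself.
card.comments.append("consider edge case when system theme changes")
```

Keeping worker, status, and review comments on one record is what lets a local worker and a frontier worker share a board without a separate coordination channel.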

Try It

Adopt the routing rule even without SmallCode: in your Hermes (or any orchestrator) config, route well-scoped patch/test/cleanup tasks to your cheapest working tier and reserve Claude/Codex for planning, architecture, and review. The decision belongs at decomposition time, not per-prompt.
Stand up the local tier: install SmallCode + Ollama on a spare machine (Mac mini, or any box with enough RAM), pull a local model (Gemma 4 cited by Saboo), and point your scoped-task lane at it. Inference then carries no API cost and no API rate limits.
Use the Kanban tab as the coordination substrate: if you run Herm-TUI, the kanban tab is already there. Make every decomposed task a card with an explicit worker assignment so a local worker and a frontier worker can run in parallel without colliding.
Pair with the cost framing: read Puneet Shah’s optimization stack and Picking the Right Model for the eval-first discipline that tells you which tasks can safely drop to the cheap tier without losing the outcome.
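
As a rough illustration of the "stand up the local tier" step, the sketch below sends one scoped prompt to a local Ollama server over its HTTP generate endpoint. It talks to Ollama directly and does not go through SmallCode, whose interface is not described in the source. The model tag is a placeholder to replace with whatever `ollama list` shows on your machine, and the helper names `build_request` and `run_scoped_task` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "your-local-model"  # placeholder: use a tag shown by `ollama list`


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_scoped_task(prompt: str) -> str:
    """Send the prompt and return the generated text (needs Ollama running)."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Building the request does not touch the network; sending it does.
req = build_request("Write a unit test for a dark mode toggle.")
```

Calling `run_scoped_task(...)` only works once Ollama is installed, running, and has the chosen model pulled.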

Related

Hermes Agent — topic landing — the framework overview (and the nainsi dwivedi whole-framework infographic)
Herm-TUI (liftaris) — ships the kanban tab this workflow coordinates on
Hermes MemoryKit — the memory-routing companion to this model-routing pattern
Hermes Codex App-Server Runtime — how Hermes delegates openai/* turns to Codex (the “Codex does heavy lifting” half of this routing)
Getting more out of the Claude Platform (Puneet Shah) — cloud-side equivalent (Sonnet/Haiku executor + Opus advisor)
Picking the Right Model — Build a Private Eval — the eval-first discipline behind safe down-routing
Hermes Agent 1-Hour Course (Nate Herk) — covers the multi-model setup + cron patterns this builds on
2026 Claude Code AIOS Pattern — the broader orchestrator-over-workers architecture this is an instance of

Open Questions

SmallCode maturity. github.com/Doorman11991/smallcode — stars, license, test coverage, and how robust the local-model code generation actually is on real repos (vs the tidy infographic). Needs a repo vet before recommending as a production scoped-task tier.
How does Hermes actually decide the routing? The infographic says “route by complexity” but the decision mechanism (a classifier prompt? a heuristic? a config rule?) isn’t specified. Worth tracing against Hermes docs / Saboo’s setup.
Is the terminal dashboard real or a mockup? The “Hermes Agent Terminal v1.0.0” chrome with the live activity feed may be a designed visualization rather than actual Hermes UI. The Herm-TUI and the built-in hermes dashboard are the real dashboard surfaces — unclear if this exact view exists.
Local-model quality ceiling. Gemma 4 via Ollama for “scoped patches and tests” — at what task complexity does the local tier start producing patches that fail review and bounce back, eroding the cost savings? No data in the thread; an eval would settle it.
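
The bounce-back question can be framed with simple arithmetic. The sketch below is a toy model under stated assumptions, not data from the thread: local inference has zero API cost, every patch gets one frontier review, and a card that fails review is redone by the frontier tier and reviewed again. All cost figures are made-up units, and the function names are hypothetical. It ignores latency, hardware, electricity, and developer time.

```python
def local_first_cost(p_bounce: float, frontier_cost: float, review_cost: float) -> float:
    """Expected API cost per card when the local tier tries first."""
    # One review always happens; with probability p_bounce the card is
    # redone on the frontier tier and reviewed a second time.
    return review_cost + p_bounce * (frontier_cost + review_cost)


def frontier_only_cost(frontier_cost: float, review_cost: float) -> float:
    """API cost per card when the frontier tier does the work and the review."""
    return frontier_cost + review_cost


def break_even_bounce_rate(frontier_cost: float, review_cost: float) -> float:
    """Bounce rate at which local-first stops being cheaper in this toy model."""
    return frontier_cost / (frontier_cost + review_cost)


# Placeholder units: a frontier implementation costs 4, a frontier review costs 1.
local = local_first_cost(p_bounce=0.25, frontier_cost=4.0, review_cost=1.0)
frontier = frontier_only_cost(frontier_cost=4.0, review_cost=1.0)
threshold = break_even_bounce_rate(frontier_cost=4.0, review_cost=1.0)
```

Under these assumptions the local tier stays cheaper in API terms until the bounce rate reaches `threshold`; an eval on real cards would supply the actual bounce rate.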

Jonathon's AI Wiki
