Source: Hermes Agent — User Stories docs (ai-research/hermes-agent-user-stories-2026-05-09.md; hermes-agent.nousresearch.com)
Nous Research’s official Hermes Agent docs include a community-curated User Stories & Use Cases page that catalogs feature requests, integrations, and real-world deployments — sourced from GitHub issues and X/Twitter posts. Useful as a roadmap signal (what users actually want) and as a deployment-pattern map (what Hermes is being used for in production).
Five categories
The page groups stories into five buckets, each of which surfaces a different aspect of the Hermes value proposition:
- Privacy & Self-Hosted — secure remote access patterns
- Business Ops — slide decks, sales outreach, inventory, email
- Integrations — protocol-level requests (JMAP, etc.)
- Personal Assistant — Google Tasks, productivity tools
- Meta & Ecosystem — migration paths from competing agents
A sixth category (General) holds testimonials.
Privacy & Self-Hosted
Tailscale serve for secure remote access (no exposed ports)
Users want secure remote access to the Hermes API server / Open WebUI without exposing ports publicly. Tailscale serve provides zero-config HTTPS tunneling over a private mesh — instead of port-forwarding 443 from a residential IP or running a ufw allow 443, Tailscale’s mesh network lets the operator’s laptop reach the Hermes box over an authenticated WireGuard tunnel.
This pairs directly with the Hermes seven-layer security model: layers 1-7 protect what runs on the box; Tailscale serve protects who can reach it.
For the full reproducible mesh + SSH + persistent-session walkthrough (not just tailscale serve’s HTTPS proxy), see Access Your Hermes Agent From Anywhere — Tailscale + Termius + tmux — Tailscale for reach, Termius for mobile control, tmux for session persistence, reaching the agent from a phone with zero exposed ports.
— @artile, GitHub, 2026
Business Ops
Create and edit Google Slides decks
Extending the google-workspace skill to Google Slides so Hermes can create and edit presentations for users already in Google Workspace. Lifts Hermes from “creates Google Docs and Sheets” to “creates the full deliverable pipeline a sales/marketing function needs.”
— @PaulTisl, GitHub, 2026
Hunter.io email-finding for sales outreach
Surface Hunter.io (email lookup/verification) via Composio MCP for sales-outreach workflows. Closes the loop between LinkedIn/Apollo prospecting and email-on-file enrichment without a manual handoff.
— @m1chaeljmk, GitHub, 2026
Live inventory tracking on Hermes
With Hermes (built by @NousResearch) providing 40+ built-in tools, persistent memory, and subagent parallelization, the development experience is best-in-class. Built for operations like inventory tracking where context, memory, and real-time inputs are non-negotiable.
Real production deployment — not a feature request. Anchors Hermes’ positioning against shorter-context coding agents for stateful business workflows.
— @akashnet, X/Twitter, 2026-04-21
Give your Hermes its own email inbox
Here’s how to give your Hermes agent its own email inbox. No SMTP/IMAP, no Google OAuth, just plug in AgentMail using MCP.
AgentMail provides a hosted email inbox over MCP — agents get a real address without the operator wiring SMTP credentials into the box. Pairs with the MCP credential scoping rules (the AgentMail credential goes only to that MCP subprocess).
— @agentmail, X/Twitter, 2026-04-07
Integrations
JMAP email for Fastmail users
Requesting JMAP support in the email integration for Fastmail users. JMAP is the modern HTTPS-native replacement for IMAP — more efficient, batchable, and what Fastmail-as-an-IMAP-source actually wraps internally. For agents that read/triage email at scale, JMAP collapses dozens of IMAP roundtrips into a single batched HTTP call.
— @zednik-max, GitHub, 2026
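To make the "single batched HTTP call" point concrete, here is a minimal sketch of one JMAP request body that both queries a mailbox and fetches the matching messages by back-reference. The method names and capability URNs come from the JMAP core and mail specifications; the account and mailbox ids are placeholders, and nothing here reflects how Hermes' email integration is actually built.

```python
import json


def build_triage_request(account_id: str, mailbox_id: str, limit: int = 20) -> dict:
    """Build one JMAP request that queries a mailbox and fetches the hits."""
    return {
        "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
        "methodCalls": [
            # Call 1: find the newest message ids in the mailbox.
            ["Email/query", {
                "accountId": account_id,
                "filter": {"inMailbox": mailbox_id},
                "sort": [{"property": "receivedAt", "isAscending": False}],
                "limit": limit,
            }, "q1"],
            # Call 2: fetch headers for those ids through a back-reference
            # to call 1, so no second round trip is needed.
            ["Email/get", {
                "accountId": account_id,
                "#ids": {"resultOf": "q1", "name": "Email/query", "path": "/ids"},
                "properties": ["subject", "from", "receivedAt", "preview"],
            }, "g1"],
        ],
    }


if __name__ == "__main__":
    # Placeholder ids; a real client would read them from the JMAP session object.
    print(json.dumps(build_triage_request("acct-demo", "mbox-inbox"), indent=2))
```

The body would be POSTed to the API URL advertised in the server's JMAP session resource; the equivalent IMAP flow needs a separate search and fetch exchange.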
Personal Assistant
Google Tasks integration
Adding a Google Tasks tool so Hermes can create, update and list tasks as part of personal productivity. Bridges the “agent did the work” → “next-action lives in my Tasks list” gap most users hit when they try to combine Hermes with their actual GTD pipeline.
— @isakcarlson5-del, GitHub, 2026
Sometimes Hermes Agent melts my heart
Sometimes Hermes Agent melts my heart @NousResearch.
Open-ended testimonial — not a feature request. Useful as a calibration on what people actually feel about the product.
— @flyingcloudliu-hub, X/Twitter, 2026
Meta & Ecosystem
Shadow-to-live migration from OpenClaw
A proposed migration path for users moving from OpenClaw to Hermes, covering shadow-mode runs before full cutover. Shadow mode is the pattern where the new agent runs alongside the old one, gets the same inputs, but doesn’t act — operators compare outputs and only flip the cutover when shadow runs match for N consecutive days.
The OpenClaw → Hermes pattern matters because Printing Press, Crabbox, and several other tools in this wiki ship as OpenClaw plugins — operators committed to OpenClaw’s plugin ecosystem need a path that doesn’t drop those integrations.
— @oangelo, GitHub, 2026
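The cutover rule described above (flip only after shadow runs match for N consecutive days) can be sketched as a streak check. The function names and the boolean-per-day input are assumptions for illustration; how "match" is judged for a given workflow is left to the operator.

```python
def consecutive_matching_days(daily_match: list[bool]) -> int:
    """Count how many of the most recent days in a row the shadow run matched."""
    streak = 0
    for matched in reversed(daily_match):
        if not matched:
            break
        streak += 1
    return streak


def ready_for_cutover(daily_match: list[bool], required_days: int) -> bool:
    """True once the current streak of matching days reaches the required N."""
    return consecutive_matching_days(daily_match) >= required_days
```

A single mismatch resets the streak, which is the conservative reading of "N consecutive days".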
Switched from OpenClaw, not looking back
A real cutover testimonial. Counts as social proof for the migration story above.
— @pfanis, X/Twitter, 2026-04-14
General
An AI employee for my hardest tasks
Hermes Agent with ChatGPT 5.5 is literally magic. I’ve thrown some of my hardest tasks at this combo and the agent has been able to handle EVERYTHING. Time to set up your AI employee.
Validates the Hermes + ChatGPT 5.5 stack pattern that Nate Herk’s course walks operators through end-to-end.
Claude Code as Hermes builder/doctor — community pattern (2026-05-21)
[Reddit signal — r/hermesagent 1tjarlz, 52 score, 42 comments, OP u/spinsilo] Source: raw/reddit-1tjarlz.md
Community workflow shipped same week as the Skill Bundles feature: install Claude Code on the VPS or Mac mini where Hermes is already running, point CC at the .hermes root directory, and use CC as your Hermes builder / doctor / advisor. Reporter calls it “a massive unlock” — after nearly giving up on Hermes for the 10th time, they installed CC on the VPS, pointed it at .hermes, asked it to fix all the bugs, and “it pretty much one shotted it.” Two operational wins: (1) the CC subscription absorbs the work (no Hermes-side tokens burned on builder/doctor/debug sessions); (2) the autonomous-executor role and the builder role are separated — Hermes remains the autonomous executor where agents/scripts/skills/API keys live (Telegram gateway, second brain, etc.), while Claude Code becomes the editor + debugger sitting alongside it. Compatible with both VPS and Mac-mini installs. Adjacent to the Codex App-Server Runtime pattern (which delegates openai/* turns to Codex CLI for ChatGPT-subscription pricing) — the spinsilo pattern is the CC analog at a different stack layer (CC operates on Hermes’s filesystem rather than being delegated to mid-turn). Companion to the “code is the source of truth” principle from Anthropic’s Claude Code team.
Two community-reported workflows added 2026-05-17
[Reddit signal — r/hermesagent 2026-05-17] Source: raw/reddit-1tfrilq.md (18 score / 7 comments, OP Little-Tea7664)
Free-model auto-rotation cron. Operator’s setup: every morning, a cron job fetches the current OpenRouter free models, does a quick API call against each to test liveness, ranks by context window size, and saves the top two as config.default + config.fallback in ~/.hermes/config.yaml. End result: each new Telegram session starts on the freshest top-context-window free model. With 1.50/session in tokens. Pattern: zero-dollar daily model selection — let the cron pick the best free model rather than locking to a paid one.
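A minimal sketch of the selection step in that cron, assuming the model list has already been fetched and that each entry carries `id`, `context_length`, and a `pricing` dict with string prices (the shape OpenRouter's models listing is understood to use; treat the field names as assumptions). The liveness probe is passed in as a callable so the ranking logic stays testable offline; writing the result into `~/.hermes/config.yaml` is left out.

```python
from typing import Callable


def pick_default_and_fallback(
    models: list[dict], is_alive: Callable[[str], bool]
) -> dict:
    """Rank live free models by context window and keep the top two."""
    free = [
        m for m in models
        if float(m["pricing"]["prompt"]) == 0
        and float(m["pricing"]["completion"]) == 0
    ]
    # Drop models that fail the quick liveness call.
    live = [m for m in free if is_alive(m["id"])]
    ranked = sorted(live, key=lambda m: m["context_length"], reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two live free models")
    return {"default": ranked[0]["id"], "fallback": ranked[1]["id"]}
```

Raising when fewer than two models survive is a design choice in this sketch: it keeps yesterday's config in place instead of writing a half-empty one.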
[Reddit signal — r/hermesagent 2026-05-17] Source: raw/reddit-1tfka2y.md (11 score / 4 comments, OP feliche93, demo video: youtu.be/2wDZ7HGzNMc)
Receipt-PDF retrieval via Agent Browser + 1Password. Bookkeeping workflow operator uses to backfill receipts the bank shows as missing: Hermes (1) finds transactions with missing receipts, (2) logs into vendor portals like Namecheap through Agent Browser (Hermes’s browser-automation skill), (3) uses a dedicated 1Password vault scoped to bookkeeping (not the personal vault — credential scoping principle), (4) handles the email verification code automatically, (5) downloads the right PDF, (6) matches it by amount/date/vendor heuristics, (7) attaches it back to the bank transaction record, (8) saves the flow as a reusable skill so the next missing-receipt sweep doesn’t rebuild the playbook. Composable pattern: Agent Browser + dedicated-vault + email-verification handling + heuristic-match + skill-as-output.
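Step (6), matching by amount/date/vendor heuristics, might look like the sketch below. The data classes, the substring vendor test, and the five-day window are illustrative assumptions; the post does not describe the operator's actual matching rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Transaction:
    vendor: str   # bank descriptor, e.g. "NAMECHEAP.COM* 1234"
    amount: float
    booked: date


@dataclass
class Receipt:
    vendor: str
    amount: float
    issued: date
    path: str


def match_receipt(
    tx: Transaction, receipts: list[Receipt],
    max_days: int = 5, tolerance: float = 0.01,
) -> Optional[Receipt]:
    """Return the receipt closest in date among those matching vendor and amount."""
    candidates = [
        r for r in receipts
        if r.vendor.lower() in tx.vendor.lower()
        and abs(r.amount - tx.amount) <= tolerance
        and abs((tx.booked - r.issued).days) <= max_days
    ]
    return min(candidates, key=lambda r: abs((tx.booked - r.issued).days), default=None)
```

Returning `None` instead of a best guess keeps ambiguous transactions in the "missing receipt" queue for a human to review.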
Community model-selection consensus (megathread, May 2026)
[Reddit signal — r/hermesagent 2026-05-18] Source: raw/reddit-1tgbsuz.md (85 score / 16 comments, OP Jonathan_Rivera, sticky Megathread flair).
r/hermesagent mod-run Models Megathread synthesizing 32 model-selection threads from Apr 30 – May 17, 2026 — split into Local vs Cloud, grouped by use case, with summary knowledge tables. Useful as a roadmap signal beyond per-story feature requests: which models the operator community actually runs against Hermes today.
Headline consensus (local):
- Qwen 3.6 (27B / 35B) — the community favorite for self-hosted primary. Runs across the GPU/RAM spectrum (8GB GPU through 128GB RAM); 27B is the sweet-spot variant.
Headline consensus (paid stacks):
- The Hermes + ChatGPT 5.5 stack repeats often enough in testimonials to count as a community-validated default for users on a $20 ChatGPT subscription (matches the Nate Herk course pattern). The dominant balance question is whether GPT-5.5 mini covers most tasks well enough to preserve quota for the full 5.5 on heavy work.
- Minimax M2.7 is currently in a degradation-complaint phase: “First week was perfect… now it keeps going around in circles, code output is bad and it only remains fast” (reddit-1tgihiq, OP unknownharris, 16 score). One data point, but worth tracking; if more reports land, factor them into orchestrator-child role assignments.
Roadmap implications:
- Community implicitly votes for Qwen 3.6 27B as the de-facto local default. Documentation + example configs that target Qwen 3.6 will reach the widest operator base.
- Multi-model balance posts (which model for which task) dominate the megathread, signaling that operators want guidance on model-routing inside Hermes — pick a different model for chat vs coding vs background tool-use. A Hermes-side router that picks the cheapest model meeting per-task quality bar would meet a real demand. Adjacent to the free-model-auto-rotation cron pattern noted in the 2026-05-17 cohort above.
The megathread itself isn’t ingested as a separate article — it’s the kind of editorial synthesis whose value comes from being read on Reddit at the source. Use this entry as a pointer.
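The router idea raised in the roadmap implications (pick the cheapest model that meets a per-task quality bar) can be sketched in a few lines. This is not an existing Hermes feature; the model names, scores, and costs a caller would pass in are hypothetical placeholders.

```python
def route(task: str, models: list[dict], quality_bar: dict[str, float]) -> str:
    """Return the name of the cheapest model whose score for `task` meets the bar."""
    bar = quality_bar[task]
    eligible = [m for m in models if m["scores"].get(task, 0.0) >= bar]
    if not eligible:
        raise LookupError(f"no model meets the bar for {task!r}")
    return min(eligible, key=lambda m: m["cost_per_mtok"])["name"]
```

With three hypothetical tiers (a free local model, a mid-priced cloud model, a frontier model), chat would typically land on the local model while coding escalates only as far as the bar requires.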
Performance — local-inference cohort (May 2026)
llama.cpp + Multi-Token Prediction beats Ollama on Strix Halo
[Reddit signal — r/hermesagent 2026-05-19] reddit-1tha1ey (CapitalIncome845, 30 score / 7 comments, “Memory & Context” flair): Operator running Hermes on a Mac mini with Qwen 3.6 MOE served by Ollama on Strix Halo reports usable-but-slow results and fell back to a paid ChatGPT subscription for serious work. After switching the serving stack to llama.cpp with multi-token prediction enabled, the operator reports it is 5–10× faster than Ollama: “actually usable now.” Tweaked llama-server config:
./build/bin/llama-server \
-m "$MODEL_PATH" \
--host 0.0.0.0 \
--port "$PORT" \
--spec-type draft-mtp \
--spec-draft-n-max 2 \
--spec-draft-p-min 0.85 \
--parallel 1 \
-ngl 99 \
-fa on \
-c 65536 \
--timeout 600 \
--keep 12000 \
--no-slots \
--no-mmap \
--jinja
Pairs with the “Qwen 3.6 as the de-facto local default” signal above: this is the same model on a faster serving stack. The MTP flags (--spec-type draft-mtp, --spec-draft-n-max 2, --spec-draft-p-min 0.85) are the levers; the rest is standard llama-server. Source video referenced in the post: youtube.com/watch?v=MI0Pm1d6YF4. Operator’s bottom line: “If you’re still using ollama, consider switching to llama cpp.”
Business Ops (continued) — content marketing CMS pipelines
Sanity + Medusa headless CMS pipeline → 12K daily impressions
[Reddit signal — r/hermesagent 2026-05-18] reddit-1tgt8g1 (Soundpulse99, 13 score / 8 comments, “Use Case” flair): Operator running a packaging-ecommerce store (propacks.net) on Sanity + Medusa reports going from 12 blog posts and no Google presence to 85 posts and ~12K daily impressions in two months — Hermes publishes directly to Sanity via the CMS API with structured portable text, SEO fields, FAQ schema, product references, and internal links. “Not drafts in a Google Doc. Directly published documents.” Three components called out:
- Self-hosted search proxy so Hermes pulls real research from competitor pages, industry sources, Reddit threads, trade pubs — citations included. Operator’s framing: “writes like an editor who did their homework.”
- Auto-updating context — Hermes already knows the product catalog, brand voice, existing content, collection structure. Every new post compounds on prior context instead of starting cold.
- Full pipeline on autopilot — idea → research → write → triage → humanize → publish to CMS → submit to Google Indexing API. Daily HARO + F5Bot scans, outreach tracking, GSC monitoring all run alongside.
Operator reports ~15+ hours/week saved vs the prior workflow. Adjacent to the Blog-Agent-Worker pattern (internal) — same “AI doesn’t draft, AI publishes” thesis, different agent runtime (Hermes vs BAW’s task-graph pipeline). Worth tracking if the operator productizes — “thinking about turning it into a product.”
Cost optimization — token-reduction cohort (May 2026)
Seven-technique token-cost reduction method (OpenClaw, ~95% claimed)
[Reddit signal — r/hermesagent 2026-05-19] Source: raw/reddit-1ths4dt.md (28 score / 25 comments, OP dxzzzzzz, Use Case flair). Operator running OpenClaw on a 155 to ~153/month or ~$1,629/year, conservative single-user estimate). Pattern applies to any system-prompt-based agent framework, including Hermes. The seven techniques, in priority order:
- B-tree bootstrap document architecture. Shrink `AGENTS.md` and `MEMORY.md` to <60-line index files; move detailed rules into a `docs/` subdirectory loaded on-demand via the `read` tool. Operator’s measured before/after: `AGENTS.md` from ~3,000 → ~570 tokens (-81%), `MEMORY.md` from ~2,000 → ~397 tokens (-80%), full bootstrap from ~6,115 → ~2,082 tokens (-66%). Same complexity logic as DB indexes: go from O(n) to O(log n) on bootstrap retrieval. Pairs with short directory aliases (`/sk/1.md` vs `/skills/long_name.md`) since path tokens count too over thousands of conversation loops.
- AI auto-compression (compaction). Configure `mode: safeguard` so early conversation history compresses to summaries when the context window approaches its limit. Measured: 100-round conversation context from ~120K → ~25K tokens; per-round consumption from ~1,200 → ~600 tokens.
- Local-model layering for lightweight tasks. Run heartbeat detection, security audits, and memory retrieval against local Ollama `qwen2.5:3b` (free, CPU inference, ~3GB RAM) + local QMD (free, semantic vector search). Reserve paid SOTA (Sonnet, GPT-5.5 Pro) for genuinely intelligence-heavy work.
- Direct script-to-API calls bypassing bootstrap. For repetitive tasks (portfolio analysis, market briefs), call OpenRouter/Anthropic/OpenAI from a Python script (`ask_openrouter.py`) and never load `AGENTS`/`SOUL`/`MEMORY` for one-off API calls. Measured: per-task tokens from ~9,000 → ~1,200 (-87%).
- Console commands replace LLM conversation. Service restarts, status checks, log views go directly to `exec`, with no LLM understanding step. The framing is “user must intervene”: operators who ask the agent to “restart openclaw” instead of typing `openclaw gateway restart` themselves are burning ~3,000 tokens per maintenance command.
- CPU-fy daily logic via Python cron. High-frequency scheduled tasks (intraday market monitoring every 10 min, crypto price every 15 min, weather 2×/day, security scan every 30 min) go to Python scripts that fetch + judge + push to QQ/notification channel without ever invoking the LLM. Operator’s measured before/after: ~485,200 tokens/day → 0 LLM tokens for the same logic.
- Heartbeat checklist-ification. Convert vague heartbeat prompts (“you are a security audit expert…”) into structured execution checklists (`1. Read cron file 2. Check key leaks 3. Output HEARTBEAT_OK`) and run against `ollama/qwen2.5:3b` with `lightContext: true` (no bootstrap). If output ≠ `HEARTBEAT_OK`, push to user.
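The percentage reductions quoted in the list above can be re-derived from the operator's own before/after token counts. The sketch below only recomputes the arithmetic; the token figures are copied from the post and the helper name is illustrative.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# (before, after) token counts as reported by the operator.
REPORTED = {
    "AGENTS.md": (3000, 570),
    "MEMORY.md": (2000, 397),
    "full bootstrap": (6115, 2082),
    "direct script call": (9000, 1200),
    "100-round context": (120_000, 25_000),
}

if __name__ == "__main__":
    for name, (before, after) in REPORTED.items():
        pct = reduction_pct(before, after)
        print(f"{name}: {before} -> {after} tokens ({pct:.0f}% smaller)")
```

The first four rows reproduce the -81%, -80%, -66%, and -87% figures in the list; the compaction row, which the post gives only as raw counts, works out to roughly 79%.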
Two ancillary patterns also surfaced: (a) RAG-vectorize skill descriptors so only relevant skill chunks load into the system prompt when many skills are installed; (b) for pandas-style data analysis, ask the LLM to write the Python rather than feeding the sheet directly to the LLM. Operator’s framing of the overall architecture: “Transform LLM from all-purpose butler to expert advisor — CPU-fy daily operations, let complex reasoning go to large models.” Adjacent to the free-model auto-rotation cron from the 2026-05-17 cohort — both are zero-dollar-where-possible patterns. The B-tree bootstrap technique applies directly to Hermes’ own AGENTS.md / MEMORY.md schema; the heartbeat checklist-ification applies to Hermes’ Heartbeat function.
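The heartbeat checklist pattern reduces to a simple output contract: the model either returns the sentinel or returns something a human should see. A minimal sketch of that contract follows, with the model call itself left out; the function names are hypothetical and the `notify` callable stands in for whatever push channel the operator uses.

```python
from typing import Callable

OK_SENTINEL = "HEARTBEAT_OK"


def build_checklist_prompt(steps: list[str]) -> str:
    """Turn a list of steps into a numbered execution checklist."""
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))


def escalate_if_needed(model_output: str, notify: Callable[[str], None]) -> bool:
    """Push to the user unless the output is exactly the OK sentinel."""
    if model_output.strip() == OK_SENTINEL:
        return False
    notify(model_output)
    return True
```

Comparing against an exact sentinel, not parsing free text, is what lets a small local model handle the check reliably.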
Multi-agent stack — coordination architecture (May 2026)
Hermes + Claude + Obsidian wiki + Hindsight on Mac Mini M4
[Reddit signal — r/hermesagent 2026-05-20] Source: raw/reddit-1tiec44.md (15 score / 4 comments, OP Froggy_legs, Use Case flair). Operator publishes a four-component multi-agent stack built around a clear synchronous/asynchronous split, deployed on a Mac Mini M4 with ~$5 total Hindsight cost over 3 weeks (1.6M tokens). The cast:
- Hermes Agent (Mac Mini M4, accessed via Telegram): persistent operational work such as cron jobs, scheduled briefings, quick fetch-and-send from phone.
- Claude (claude.ai web/desktop): complex reasoning, long drafting, anything that benefits from a bigger context window.
- Obsidian wiki at `~/wiki/`: synchronous coordination layer, based on Karpathy’s LLM-wiki pattern; both agents read and write the same files via a wiki MCP server.
- Hindsight: asynchronous memory layer; both agents write durable facts to and read from the same observation bank (`hindsight.vectorize.io/best-practices` referenced as the source of the operator’s memory-writing discipline).
The split that finally clicked: wiki is for structured documents you’d want to read (rules, protocols, page-shaped knowledge with sections and headings); Hindsight is for facts you’d want surfaced when relevant (one-liners, multi-turn captures, state). The protocol for two agents editing the same files lives in a shared-rules: true page called ai-as-partner.md and exposes four techniques worth stealing:
- `shared-rules: true` frontmatter designates which pages follow the dual-agent protocol (e.g., `ai-as-partner.md`, `hindsight-memory.md`, `SCHEMA.md`). Most wiki pages don’t need locking; only co-edited rule pages do.
- YAML soft-lock. Each shared-rules page has a `lock: true/false` field. Read → set true → edit → set false in the same write. Stale locks (updated >30 min) get overridden with a log entry. Not bulletproof, but “for two agents that rarely collide, plenty.”
- Append-only Proposals section with three-party threads. Both agents add proposals to the bottom without coordinating (appends don’t conflict). Proposals are threaded conversations between Claude, Hermes, and the human, with timestamped markers. When resolved, the entire thread moves verbatim to `log.md` under the resolution date, and the page itself is updated with the final decision merged in, so the page stays clean and full provenance is preserved.
- Rules pages stay terse, rationale lives in `log.md`. Hard 500-line budget per rules page. Wiki shows current state, log shows history. Open proposals older than 14 days auto-escalate via the Monday briefing.
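The YAML soft-lock from the list above can be modelled on a plain dict standing in for a page's parsed frontmatter. This is a sketch under stated assumptions: the `updated` field name, the function names, and the in-memory log list are illustrative, and reading or writing the actual wiki file is omitted.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=30)


def acquire(front: dict, agent: str, now: datetime, log: list[str]) -> bool:
    """Try to take the soft lock recorded in a page's frontmatter dict."""
    if front.get("lock"):
        age = now - front["updated"]
        if age <= STALE_AFTER:
            return False  # another agent is mid-edit; back off and retry later
        # Stale lock: override it, but leave a trace in the log.
        log.append(f"{now.isoformat()} {agent} overrode stale lock ({age} old)")
    front["lock"] = True
    front["updated"] = now
    return True


def release(front: dict, now: datetime) -> None:
    """Clear the lock; in practice this goes out in the same write as the edit."""
    front["lock"] = False
    front["updated"] = now
```

As the operator notes, this is not bulletproof: two agents reading the page at the same instant can both see the lock as free. It is only adequate when collisions are rare.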
The architecture also includes a division-of-labor table mapping domain → owner (long-running automation/cron → Hermes; complex coding/reasoning → Claude; wiki rule pages → Shared via proposals; email/calendar → Either; weekly wiki maintenance → Hermes via Sunday script). Hindsight-side discipline: always set context and timestamp on every retain; use document_id for clustered facts (the upsert mechanism — stable IDs like cards-rich-uses replace prior versions instead of accumulating contradictions); faithful capture (not pre-summarized); recall for pinpoint, reflect for synthesis; don’t retain and recall in the same turn (retain is async); prefix every memory with [claude] or [hermes] for audit. Each agent keeps a local mirror of the rules so both stay functional when the wiki is unreachable; the wiki remains canonical. The human stays the arbiter when agents disagree — “two AI agents will not negotiate a conflict resolution on their own.” Pattern is genuinely new in this catalog: prior entries treat Hermes as a single-agent surface; this is the first published shape of Hermes as one node in a multi-agent topology, with Karpathy-pattern wikis filling the role most operators expect MCP shared-memory to fill.
Business operator profile — Antoine running multiple businesses on Hermes
[YouTube — creator walkthrough 2026-05-19] Source: raw/How_to_have_Hermes_run_your_business.md — Antoine (founder of Fly.hermes.ai) walking a podcast host through three live businesses orchestrated by Hermes. The walkthrough is screen-shared evidence (Stripe dashboards, Telegram chats, ad library scrapes) of Hermes running operational marketing + customer-support + build workflows in production. Where the GitHub/X user-stories cohort is feature-request shaped, this profile is a fully-operationalized stack the operator is willing to expose in detail.
Revenue context — three Stripe dashboards. The operator’s “AI agents are running these businesses” claim is paired with Stripe gross-volume screenshots: 44K / $18K across three businesses. Started on OpenClaw, migrated most workflows to Hermes. Quote: “recently, the one I use the most is Hermes… I just got better output. OpenClaw often broke after updates… and I found Hermes way better at using skills and tools on repeat… the way it’s self-evolving for Hermes, I just find it more efficient.” — corroborating the OpenClaw → Hermes migration pattern from the GitHub catalog with a working stack at the other end.
Ad-creative pipeline — Meta Ad Library + Apify + Seedance + FFmpeg. Hermes runs a daily ad-research cron: scrapes the Meta Ad Library via Apify (operator explicitly chose Apify-API-as-scraper over agent-browsing for cost + reliability — “browsing with agent is good. It works most of the time, but it’s not as reliable as API calls. Takes forever, takes a lot of tokens, not as reliable”), gathers competitor ad text + images + video, ranks by appearance/disappearance over time. Daily delta example from the walkthrough: “yesterday was 42 new ads and 33 disappeared. So we know that those ads are not that good.” For video ads the operator is implementing video-watching capability — uploads competitor video to Gemini and asks Gemini to describe each scene + draft a prompt that recreates the video via AI generation. UGC video assembly stack: Seedance generates 15-30s clips, then Hermes directly calls FFmpeg from Python to stitch clips + overlay text. “Directly, Hermes can attach them together, and he can also add text on top of it… it’s calling [FFmpeg] here… and it’s using Python.” Operator’s pattern is “snippets, not full video” — generate multiple shots, split-test, edit. Also mentioned: Meta Tribe V2 model that analyzes a video and predicts neural-response patterns for viewers (“which is really good” — operator-claimed; not all machines support it).
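The daily delta ("42 new ads and 33 disappeared") is a pair of set differences over ad identifiers. A minimal sketch, assuming each scrape is reduced to a set of stable ad ids; how those ids are extracted from the scraper output is not shown in the walkthrough.

```python
def ad_delta(yesterday: set[str], today: set[str]) -> dict[str, set[str]]:
    """Compare two daily scrapes of competitor ad ids."""
    return {
        "new": today - yesterday,          # launched since the last scrape
        "disappeared": yesterday - today,  # pulled, which the operator reads as underperforming
    }
```

Ads that stay in both sets across many days are the ones the operator's ranking treats as likely winners.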
Slash command vocabulary — eight commands every operator should know. The walkthrough catalogues Hermes’s operational slash commands with concrete-use framing:
| Command | When to use | Operator’s framing |
|---|---|---|
| `/new` | Switch topic within same chat | “Most of the time you don’t need this if you have topic-channels in Telegram.” |
| `/compress` | Manual context compaction | “Hermes does it itself most of the time, so it’s not that useful.” |
| `/q` | Queue a follow-up task while a long task runs | “When you start to work on a task and sometimes it takes 15 minutes… you have another idea, you can just /q and then say the next task.” |
| `/steer` | Adjust direction mid-task without interrupting | “Going to be useful. I do /steer and it’s going to adjust the direction. But it could be also adding information… instead of stopping the current task.” |
| `/background` | Spawn a sub-task in parallel (“subagent for small tasks”) | “If it’s working on a task like… those videos taking 20 minutes to generate… I can /background and say ‘give me how much we spend in ads in the last 10 days.’” |
| `/goal` | Mark high-importance work: allow more loops + tokens | “Goal is like ‘this is really important for me. I allow you to spend a lot more time on it.’ The budget of loops go from like 20 to 150.” See [[claude-ai/claude-code-goal-command-walkthrough|Claude Code /goal]] for the Anthropic equivalent. |
| `/stop` | Hard-stop current task | “If you see that it’s doing something dangerous or whatever, you just /stop.” |
The /q + /steer distinction is the operational alpha — most users only know /stop and reach for it when they should be steering or queuing.
Stripe dispute automation. Cross-business dispute count (~18 active: 6 needing response, 12 under review). Operator switched from manual dispute handling to running a dispute skill (currently on OpenClaw but trivially portable) — agent reads the Stripe mailbox, drafts/responds to disputes per the skill’s playbook, starts winning cases. “We started to win one. Let’s see if it’s going to win the other ones.” The skill handles mailbox + email integration the same way Hermes does — no architectural difference between OpenClaw-dispute-flow and Hermes-dispute-flow once you have skills.
The “Tinder website” deep-research → one-shot site demo. Live in the walkthrough, Antoine gives Hermes the prompt: “Deep research. Find everything you can about me: YouTube channel, who I am, etc. And then search for and install any research skills you need. You have two prompts left after this one.” Hermes installs research skills it doesn’t have, runs deep research on the operator, and gathers profile data. Then: “build me a Tinder website that shows like who I am and everything around it.” One prompt → site published to GitHub + Vercel using keys the operator pre-loaded as a skill. Animation + Twitter/YouTube links + dating-site framing all assembled automatically. The point of the demo: Hermes can self-extend its skill catalog from a prompt instruction (“search for and install any research skills you need”), and then chain that into a full GitHub-to-Vercel ship.
Self-improving skill creation in the wild. When Hermes finishes a multi-stage workflow (e.g. assembling matcha-brand UGC ads with Seedance), the session log explicitly records self-improvement review skill file ai-video generator created — i.e., Hermes saw a successful new workflow and turned it into a reusable skill for future runs. The operator’s framing: “OpenClaw at the beginning… didn’t automatically build the skills unless you say like ‘build a skill around this now.’ And then it would not always use it… that’s why people had actually at the right beginning of OpenClaw… the prompt-jobs that will analyze all the chat and build some things around to make your AI agent self-evolving.” Hermes ships with self-improvement-as-default. The operator’s verdict: “already when you chat with it, when you see how it’s working and how it’s building the skills automatically, it’s just night and day.” This is the same self-improving thesis catalogued in Reflexio, native to Hermes vs bolted-on.
Smart routing + per-model switching. Fly.hermes.ai supports both smart routing (the platform picks the best model for each task) and manual model switching from Telegram — operator can flip the active model mid-conversation. Channel surfaces: Telegram for “on the go,” browser chat for richer rendering (dashboards/tables/HTML). Operator’s framing of the choice: “the browser chat has the advantage of having better displaying than the Telegram… if you have some kind of dashboard or tables or things like that, it’s not always so well displayed on Telegram.” Same operator-on-the-go vs operator-at-desk split Nate Herk’s course frames — Telegram for execution, browser for review.
Hermes for content sites, Claude Code/Codex for deep code. Operator explicitly draws the line: “If it’s anything content-wise or whatsoever, I can do it directly in Hermes, it’s fine. But if we’re going to build something that requires a lot more work, then I’m probably going to do it directly in Claude Code or Codex.” Maps to the surfaces decision framework: Hermes for stateful business agents + content + light builds; Claude Code/Codex for heavy code with long thinking windows and standards-aware project structure.
Fly.hermes.ai as the hosted product. The operator’s own business — fly.hermes.ai — packages Hermes for users who don’t want to self-host: 7-day free trial, both Telegram + in-browser chat, model catalog + smart routing, skills directory accessible in the lower-left of the UI. Pattern: same Hermes platform, hosted-as-a-service for operators avoiding VPS setup. Sister positioning to the Hostinger one-click Hermes VPS path documented in Nate Herk’s course — same agent runtime, different deployment surface (PaaS vs self-managed VPS).
Patterns to draw from this catalog
Reading the full list of user stories surfaces three operational patterns:
- The Hermes value prop is “stateful business agent,” not “coding agent.” Inventory tracking, sales outreach, email triage, Google Tasks — these are workflows where the long-running memory + multi-day session continuity matter more than raw code-generation speed. Treat Hermes as a Managed Agents alternative for self-hosted ops, not as a Claude Code competitor.
- MCP is the integration substrate. Hunter.io via Composio MCP, AgentMail via MCP, Google Slides extending google-workspace skill — every new integration ships as either an MCP server or a skill that wraps an MCP server. Operators looking to extend Hermes should design MCP-first.
- Migration from OpenClaw is real. OpenClaw → Hermes via shadow mode is a documented path. The reverse (Hermes → OpenClaw) is not. For operators committing to one ecosystem, this is a directional signal.
Try It
- Browse the live page at `https://hermes-agent.nousresearch.com/docs/user-stories`; it updates as community contributions land. The cohort of items above is a 2026-05-09 snapshot.
- Submit a use case by opening a GitHub issue on the Hermes repo or tweeting `@NousResearch` with the use case. Repeat patterns drive prioritization.
- Test the AgentMail MCP if you want Hermes to handle email without wiring SMTP. Search “AgentMail MCP Hermes”; there’s a published recipe.
- Set up Tailscale serve instead of exposing Hermes’ Open WebUI publicly. The `tailscale serve` documentation walks the zero-port-forwarding pattern.
- For shadow-mode migration, run both agents in parallel against a non-mutating workflow first (e.g., a daily briefing). Compare outputs for a week before cutting any production workflow.
Recent operator playbooks (May 2026)
Three substantive r/hermesagent posts in the same week — all Use Case / Workshop flair — surface concrete operator playbooks that go past feature-request signal:
[Reddit signal — r/hermesagent 2026-05-28] Source: raw/reddit-1tph8wg.md (87 score / 32 comments, OP jebk, Use Case flair, “You’re probably accidentally tokenmaxxing. Learn to delegate more”). OP’s pre-optimization Hermes setup via OpenRouter ran ~0.18 across the simple/standard tier. (L2) Delegation discipline — hard rule: any task consuming >50 lines of code or output in the orchestrator’s context gets delegated to a subagent; orchestrator writes spec, subagent implements, orchestrator sees only the summary (~1KB spec + 500B summary vs 15-20KB direct work). Daily cron self-audits for missed delegations (>8 web_search calls + 0 delegate_task = should-have-batched-research violation). (L3) Delegate-first tool access — disabled_toolsets in profile config replaces heavy MCP schemas with a ~50-token “delegate when needed” instruction in SOUL.md. Measured savings on OP’s setup: Browser 12 tools = ~1,800 tokens/turn; Frigate MCP 59 tools = ~8,000; HA MCP 22+ dynamic = ~10-15,000. Combined 20-25K tokens/turn saved on default profile; trivial query “say hi” goes from ~16K prompt tokens → ~6K with delegate-first profile.
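The two delegation rules quoted in that post (delegate anything over 50 lines; flag sessions with more than 8 `web_search` calls and no `delegate_task`) are easy to express as a self-audit. This is a hedged sketch: the tool names come from the post, while the function names and the flat list-of-tool-names input are assumptions about how a session log might be summarized.

```python
from collections import Counter


def should_delegate(expected_lines: int, limit: int = 50) -> bool:
    """Hard rule: work expected to exceed `limit` lines goes to a subagent."""
    return expected_lines > limit


def audit_session(tool_calls: list[str], max_searches: int = 8) -> list[str]:
    """Flag sessions that did heavy research without ever delegating."""
    counts = Counter(tool_calls)
    violations = []
    if counts["web_search"] > max_searches and counts["delegate_task"] == 0:
        violations.append("should-have-batched-research")
    return violations
```

Run daily from cron over the previous day's sessions, a check like this costs no tokens and surfaces the habit the OP is warning about.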
[Reddit signal — r/hermesagent 2026-05-28] Source: raw/reddit-1tpzpri.md (37 score / 16 comments, OP old-mike, Use Case flair, “My ultra-cheap, hybrid local/cloud stack for Hermes Agent (DeepSeek-V4-Flash & OpenRouter) + Text/Voice via Telegram”). Self-hosted home-server setup (Windows 24GB+ / Linux/Mac 16GB+ runs the same stack) targeting ~$3/month total token bill via DeepSeek-V4-Flash + OpenRouter via Telegram channel. Companion data point to the L1-router approach above — same destination (free-tier routing), different mechanism.
[Reddit signal — r/hermesagent 2026-05-28] Source: raw/reddit-1tpms69.md (37 score / 57 comments, OP Anisselbd, Use Case flair, “What cron jobs do you run with Hermes Agent? Here’s my setup”). Calendar-provider-wired (Google + iCloud) personal-assistant cron stack: 09:00 daily briefing (weather + Google + iCloud calendar + iCloud Reminders as all-day events fusion; Python-only, 0 token cost via no_agent: true); 12:00 tech-news digest (RSS from HN/TechCrunch/The Verge/Ars Technica, Hermes-summarized, Telegram-delivered); event-triggered (calendar-add → confirmation + brief) and time-based (overnight inbox sweep) crons. The no_agent: true zero-cost-cron flag is the load-bearing technique for daily summary workflows where the summarization doesn’t need an LLM — the value is the aggregation. Worth lifting as a Business Ops pattern across the topic.
[X signal — @IBuzovskyi 2026-06, citing official Hermes docs] Source: raw/x-bookmarks-recent-digest-2026-06-14.md. The inverse of no_agent is the **wakeAgent 0); a 40% jump → wake, report to Slack, act through the Stripe MCP. The economics: of 20 monitoring jobs a day where 18 find nothing, you pay for 2. Same throughline as no_agent — scripts do the mechanical work for free; the agent spends tokens only on the judgment that needs it.
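The "pay only when something changed" economics can be sketched as a threshold gate that a script evaluates before any agent is woken. The 40% threshold and the 20-jobs/2-wakes figures come from the passage above; the function names, the relative-change definition, and the handling of a zero baseline are assumptions, and nothing here reproduces Hermes' actual gate mechanism.

```python
def should_wake(previous: float, current: float, jump: float = 0.40) -> bool:
    """Wake the agent only when the metric rose by at least `jump` (relative)."""
    if previous <= 0:
        # Assumption: with no usable baseline, any positive reading is worth a look.
        return current > 0
    return (current - previous) / previous >= jump


def paid_runs(readings: list[tuple[float, float]]) -> int:
    """Count how many (previous, current) checks would actually invoke the agent."""
    return sum(should_wake(prev, cur) for prev, cur in readings)
```

With 20 checks of which 18 show no meaningful change, `paid_runs` returns 2, matching the "you pay for 2" framing.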
Related
- Hermes Agent topic index
- Hermes Agent — Security Model — Tailscale serve pairs with the seven-layer defense model
- Nate Herk’s Hermes 1-Hour Course — operator-side walkthrough validating the “Hermes + ChatGPT 5.5” stack mentioned in the General testimonial
- Printing Press — ships OpenClaw plugins that the OpenClaw-to-Hermes migration path needs to preserve
- Crabbox — OpenClaw plugin for short-lived Linux boxes; another integration that survives migration
- Managed Agents — Anthropic-hosted alternative; the comparison frame for “self-hosted Hermes vs hosted Anthropic”
Open Questions
- How fresh is the user-stories page? It pulls from GitHub issues and X — is it auto-synced (atom feed?) or manually curated? If manual, the 2026-05-09 snapshot may already be stale.
- Are upvotes/likes tracked? The page doesn’t show signal density, so a single tweet ranks alongside a 50-thumbs-up GitHub issue.
- Does Nous publish a roadmap that links explicit user stories to upcoming releases? If yes, that’s a higher-signal artifact to track.