Hermes GAPA — The Self-Improvement Loop

Source: ai-research/agentwikis-hermes-gapa-2026-06-12.md — compiled by Agent Wikis from Hermes community transcripts and v0.8.0 release notes; sourced 2026-06-12. 2026-07-14 addition: raw/reddit-1uvqmyx.md (r/hermesagent — a concrete skill-bloat failure mode and a proposed reviewer/curation stage; added to Risks and Pitfalls)

GAPA is Hermes Agent’s self-improvement mechanism: a closed cycle in which the agent observes its own behavior, identifies failures, packages successes as reusable skills, and edits its own prompts and memory — all without human intervention. “Works like back propagation but for prompts instead of model weights.” It is the architectural difference that makes Hermes a learning agent rather than a stateless one: ChatGPT, Claude, and OpenClaw “reset every time” — Hermes compounds.

Key Takeaways

5-stage loop running continuously — trajectory capture → GAPA review every ~15 tool calls → autonomous skill creation → persistent memory update → compounding reuse
Cadence is configurable — skills.creation_nudge_interval: 15 in config.yaml
Writes to disk autonomously — creates SKILL.md files under ~/.hermes/skills/custom/, updates MEMORY.md, USER.md, and state.db
Compounds per user — over time, your Hermes becomes different from anyone else’s: shaped by your workflows, preferences, and patterns
Self-diagnosed its own tool-calling bugs — v0.8.0 case study: Hermes ran GAPA on its own GPT/Codex provider interactions, identified 5 failure modes, and patched its own guidance without human input
Only fires on complex tasks — trivial requests bypass the loop; value compounds on repeatable, non-trivial work

How the Loop Works

Stage 1: Trajectory Capture. Every API call, tool decision, and output is recorded in order and saved to sessions.json and an FTS5-indexed state.db. Most agent frameworks discard this at session end; Hermes keeps it as queryable history.
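
To make the idea of a queryable trajectory history concrete, here is a minimal sketch using Python's built-in sqlite3 module with an FTS5 virtual table. The table name, column names, and helper functions are hypothetical and are not taken from Hermes; the real state.db schema is not described in the source. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite database with an FTS5 table for trajectory steps."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: only `content` is full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS steps "
        "USING fts5(session UNINDEXED, kind UNINDEXED, content)"
    )
    return conn


def record(conn, session, kind, content):
    """Append one step (API call, tool decision, or output) in order."""
    conn.execute(
        "INSERT INTO steps (session, kind, content) VALUES (?, ?, ?)",
        (session, kind, content),
    )


def search(conn, query):
    """Full-text search across all recorded sessions, in insertion order."""
    rows = conn.execute(
        "SELECT session, kind, content FROM steps "
        "WHERE steps MATCH ? ORDER BY rowid",
        (query,),
    )
    return rows.fetchall()


conn = open_store()
record(conn, "s1", "tool_call", "fetch Hacker News front page")
record(conn, "s1", "output", "summarized top ten stories")
record(conn, "s2", "tool_call", "render Manim animation")
hits = search(conn, "hacker")
```

Because every session lands in the same indexed table, a later session can search earlier ones by keyword, which is the property the cross-session search in Stage 4 relies on.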

Stage 2: GAPA Review (about every 15 tool calls). The agent pauses, reads back the recent trajectory, and evaluates what worked and what failed. The output is a set of edits to its own working prompts and memory. The 15-call cadence is configurable via skills.creation_nudge_interval.
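
The cadence described above can be sketched as a counter that hands the recent window of steps to a review function every N tool calls. This is an illustration only: ReviewCadence, on_tool_call, and the step dictionaries are hypothetical names, and the review method is a deterministic stand-in for the LLM reflection that Hermes performs.

```python
class ReviewCadence:
    """Trigger a review of recent steps every `interval` tool calls."""

    def __init__(self, interval=15):
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.interval = interval
        self.window = []   # steps seen since the last review
        self.reviews = []  # one summary per completed review

    def on_tool_call(self, step):
        """Record a step and run a review once the window is full."""
        self.window.append(step)
        if len(self.window) >= self.interval:
            self.reviews.append(self.review(self.window))
            self.window = []

    def review(self, steps):
        # Stand-in for the reflection step: count what failed in the window.
        failed = [s for s in steps if not s["ok"]]
        return {"reviewed": len(steps), "failed": len(failed)}


cadence = ReviewCadence(interval=15)
for i in range(40):
    # Every tenth call is marked as a failure for demonstration.
    cadence.on_tool_call({"tool": "shell", "ok": i % 10 != 0})
```

With 40 calls and an interval of 15, two reviews run and ten steps remain in the window awaiting the next one. Short tasks that never reach the interval never trigger a review in this sketch.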

Stage 3: Autonomous Skill Creation. When trajectory analysis finds reusable work, Hermes packages it as a SKILL.md document and writes it under ~/.hermes/skills/. Community examples:

Hacker News morning briefing: single complex prompt → autonomous skill + wired cron job
Manim animation skill: created after the first successful run that explained a complex technical topic as an animated video
X-posting workflow: writes feedback into its SKILL.md on each run and improves on the next
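
As a rough illustration of packaging reusable work as a SKILL.md file, the sketch below writes one Markdown file per skill directory. The directory layout (one folder per skill), the file contents, and the helper names are assumptions made for the example; the source does not specify the SKILL.md format. A temporary directory stands in for the real skills directory.

```python
import re
import tempfile
from pathlib import Path


def slugify(name):
    """Turn a skill name into a lowercase, hyphen-separated directory name."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def write_skill(root, name, description, steps):
    """Write <root>/<slug>/SKILL.md and return its path.

    Refuses to overwrite an existing skill so that duplicates surface
    as errors instead of silently forking into near-copies.
    """
    skill_dir = Path(root) / slugify(name)
    skill_file = skill_dir / "SKILL.md"
    if skill_file.exists():
        raise FileExistsError(f"skill already exists: {skill_file}")
    skill_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"# {name}", "", description, "", "## Steps", ""]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    skill_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return skill_file


root = tempfile.mkdtemp()  # stand-in for the real skills directory
path = write_skill(
    root,
    "Morning Briefing",
    "Summarize the Hacker News front page.",
    ["Fetch the front page", "Summarize the top stories"],
)
```

Raising on an existing file is a deliberate choice in this sketch: it forces a decision between updating the existing skill and creating a genuinely new one.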

Stage 4: Persistent Memory Update. Beyond skills, three memory layers update in parallel:

Cross-session FTS5 search — every prior conversation indexed and queryable (“six months from now you can ask ‘haven’t we solved something like this?’”)
Honcho user model — builds a model of who you are, your work style, your preferences, your domain knowledge
Memory nudges — proactively suggests skill improvements you can approve or reject

Stage 5: Compounding Reuse. The next similar request runs the existing skill, refines it on new feedback, and ratchets capability upward. Trajectories from complex tasks also feed offline RL training via the ML Research Pipeline, where they become training data for future model improvements.

The v0.8.0 Self-Diagnosis Case Study

The most concrete public proof that GAPA works on the harness itself: Hermes ran trajectories against GPT and Codex backends, the GAPA review identified 5 distinct failure modes in how those models invoked tools, and the agent generated patches to its own provider-specific guidance — without a human in the loop. Flagged in v0.8.0 release notes as “Self-Optimized GPT/Codex Tool-Use Guidance — The agent diagnosed and patched 5 failure modes in GPT and Codex tool calling through automated behavioral benchmarking.”

Key Config Parameters

skills:
  creation_nudge_interval: 15      # GAPA review cadence (tool calls)
  external_dirs: []
memory:
  memory_enabled: true
  user_profile_enabled: true       # Honcho-style user modeling
  memory_char_limit: 2200
  user_char_limit: 1375
  nudge_interval: 10               # memory-suggestion cadence
  flush_min_turns: 6
  provider: honcho
agent:
  max_turns: 60

Relevant slash commands: /skills (browse, install, inspect, create), /memory (view what Hermes knows about you), /yolo (accept dangerous commands uninterrupted)

Risks and Pitfalls

Only fires on complex tasks — trivial requests bypass the loop; don’t judge adoption on simple use
LLM-generated skills can fail — inspect with hermes skills inspect <name> before using in cron
First 7 days are rough — the loop needs runs to learn from; judge after a week of normal use
Context bloat — memory accumulates in long sessions; tune session_reset.mode: both and idle_minutes for local models
Stuck-loop bug recurs across versions; v0.8.0’s inactivity-based timeout helps but does not eliminate it
Security surface — GAPA edits prompts and writes skills to disk autonomously; use Docker backend for production, not local + /yolo
Skill naming conflicts — overlapping tasks can produce morning-briefing and morning-briefing-1 both half-working; run hermes skills audit periodically
Unbounded self-improvement can bloat a heavily used skill (community report, 2026-07-13). r/hermesagent (u/4rt_relay, 67 upvotes, raw/reddit-1uvqmyx.md): the poster’s most-used skill accrued 700+ automatic edits and grew to hundreds of KB of Markdown. Nearly every session appended another “lesson” (temporary debugging findings, one-off recovery steps, stale implementation details, duplicated rules), which made the skill worse: bloated, repetitive, and slower to load. The loop generates edits far more easily than it curates them, and there is no default garbage-collection or merge step. Proposed fix (community): make skill maintenance a default second stage that collects proposed changes, groups them by skill, and uses an isolated reviewer to merge, rewrite, reject, or defer each one, optimizing for a smaller, more accurate final skill rather than accepting every proposal. This is a concrete instance of the “who judges the self-improvement?” gap tracked in Self-Improving Agent Loops. Until Hermes ships such a stage, treat hermes skills audit and hermes skills inspect, plus manual pruning of your highest-traffic skills, as a periodic chore rather than an optional one.
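
The community-proposed curation stage can be sketched as a pass that groups proposed edits by skill and decides on each one, rather than appending everything. The code below is a simplified, hypothetical illustration and not a Hermes feature: it covers only accept, reject, and defer (merging and rewriting would need an actual reviewer model), and the function name, data shapes, and max_chars budget are all assumptions.

```python
from collections import defaultdict


def normalize(text):
    """Case-fold and collapse whitespace so near-identical lessons compare equal."""
    return " ".join(text.lower().split())


def curate(proposals, existing, max_chars=4000):
    """Group proposed lessons by skill and accept, reject, or defer each.

    proposals: list of (skill_name, lesson_text) pairs
    existing:  dict mapping skill_name to lessons already in the skill
    max_chars: hypothetical size budget for the lessons in one skill
    """
    grouped = defaultdict(list)
    for skill, lesson in proposals:
        grouped[skill].append(lesson)

    decisions = {}
    for skill, lessons in grouped.items():
        current = existing.get(skill, [])
        seen = {normalize(item) for item in current}
        size = sum(len(item) for item in current)
        result = {"accept": [], "reject": [], "defer": []}
        for lesson in lessons:
            key = normalize(lesson)
            if key in seen:
                result["reject"].append(lesson)  # duplicate of a known rule
            elif size + len(lesson) > max_chars:
                result["defer"].append(lesson)   # over budget: needs a merge first
            else:
                result["accept"].append(lesson)
                seen.add(key)
                size += len(lesson)
        decisions[skill] = result
    return decisions


existing = {"x-posting": ["Keep posts under 280 characters."]}
proposals = [
    ("x-posting", "keep posts under 280   characters."),
    ("x-posting", "Add alt text to every image."),
    ("x-posting", "Add alt text to every image."),
    ("x-posting", "Retry once on rate limit."),
    ("morning-briefing", "Skip stories already covered yesterday."),
]
decisions = curate(proposals, existing, max_chars=64)
```

The size budget is what pushes the process toward a smaller final skill: once a skill is full, new lessons are deferred until something is merged or pruned, instead of being appended indefinitely.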

Try It

Run Hermes on a repeating workflow for 7+ days and watch /skills grow
After a cron run, inspect a newly created skill: hermes skills inspect <name> before putting it in production
Ask /memory to see what your Hermes has learned about your preferences
Review the ML Research Pipeline (hermes-agent.nousresearch.com/docs) to understand how your trajectories feed offline training

Related

Hermes Memory Providers — the pluggable backend the memory stage writes to
Hermes Skill Bundles — composing autonomously-created skills into slash-command workflows
Nate Herk 1-Hour Course — practical intro covering the self-improving loop
Hermes Masterclass — hands-on trajectory → skill → reuse walkthrough
Reflexio — external harness applying similar harvest-from-trajectories pattern to any agent
Browserbase Autobrowse — browser-skill equivalent of the same compile-from-runs pattern
Self-Improving Agent Loops — GAPA, ACE, and the Question of Who Judges

Jonathon's AI Wiki
