Source: duclamvan-hermes-memorykit-readme-2026-05-22.md — full README + companion benchmark repo link, fetched 2026-05-22. Repo: github.com/duclamvan/hermes-memorykit. Version: v0.2.0 (released 2026-05-19). Stars: 1. License: MIT. Language: Python (100%). Author: duclamvan. Companion benchmark repo: github.com/duclamvan/hermes-memory-benchmarks.
Practical memory toolkit for Hermes Agent users. It addresses the failure mode “my agent stops remembering after long chats, cron runs, tool calls, and context compression.” It extends Hermes’s existing three-tier memory model (Core Memory in Context / Session Search FTS5 / External Memory Providers) with an opinionated 8-layer stack: LCM raw transcript → native memory → QMD/markdown wiki → entity graph → hybrid RRF router → focus brief → regression tests → nightly maintenance. Ships a Hermes-native plugin wrapper exposing four tools (memory_stack_status, memory_stack_route, memory_stack_focus_brief, memory_stack_regress) plus config templates for .env, ~/.hermes/config.yaml, and cron prompts. Author claims via companion benchmark repo: 100/100 A+ score, 27/27 retrieval checks passed, 35 Hermes profiles verified. Adoption signal caveat: 1 star, 0 forks, 5 commits, 3 days old at ingest, which is extremely early. Benchmark claims are self-reported. Worth tracking but not yet community-validated.
The 8-layer stack
The toolkit’s central design — quoted verbatim from the README:
raw transcript → durable notes → searchable docs → entity graph → RRF router → focus brief → regression tests → nightly maintenance

| # | Layer | What it stores / does |
|---|---|---|
| 1 | LCM raw transcript store | Exact conversation history (Hermes’s existing LCM SQLite) |
| 2 | Native memory | Compact durable facts (Hermes’s existing Core Memory) |
| 3 | QMD or Markdown wiki | Searchable project / user / system docs (e.g., this Karpathy-style wiki) |
| 4 | Entity graph | Links people, projects, topics, files, sessions (extracted from layers 1+3) |
| 5 | Hybrid RRF router | Ranks candidates across all 4 sources via Reciprocal Rank Fusion |
| 6 | Focus brief builder | Turns ranked recall into a short, cited task brief |
| 7 | Retrieval regression harness | Catches memory drift over time |
| 8 | Nightly maintenance | Updates indexes, runs health checks |
The novel layers vs Hermes’s native memory model are 4 (entity graph), 5 (hybrid RRF router), 6 (focus brief), 7 (regression harness), 8 (nightly maintenance). Layers 1-3 are wrappers around Hermes primitives that already exist. The toolkit’s value is the routing layer on top.
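
To make layers 5 and 6 concrete, here is a minimal sketch of Reciprocal Rank Fusion feeding a cited brief. It is not the toolkit's code: `rrf_fuse`, `focus_brief`, the item ids, and the per-source `weights` parameter are hypothetical, and `k=60` is the constant conventionally used for RRF, not a value confirmed from `scripts/memory_stack_router.py`.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, weights=None):
    """Fuse per-source rankings with Reciprocal Rank Fusion.

    ranked_lists: dict of source name -> list of item ids, best first.
    k: smoothing constant (60 is the conventional choice, assumed here).
    weights: optional dict of per-source multipliers, default 1.0 each.
    Returns a list of (item, score, contributing_sources), best first.
    """
    weights = weights or {}
    scores = defaultdict(float)
    sources = defaultdict(list)
    for source, items in ranked_lists.items():
        weight = weights.get(source, 1.0)
        for rank, item in enumerate(items, start=1):
            # Each source contributes weight / (k + rank) for the item.
            scores[item] += weight / (k + rank)
            sources[item].append(source)
    # Highest score first; ties broken by item id for determinism.
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return [(item, scores[item], sources[item]) for item in ordered]


def focus_brief(fused, limit=3):
    """Turn the top fused items into short lines that cite their sources."""
    lines = []
    for n, (item, _score, srcs) in enumerate(fused[:limit], start=1):
        lines.append(f"{n}. {item} [sources: {', '.join(srcs)}]")
    return "\n".join(lines)


# Hypothetical candidates from the four sources named in the table.
candidates = {
    "lcm": ["session-12", "note-rrf", "doc-router"],
    "native": ["note-rrf"],
    "qmd": ["doc-router", "note-rrf"],
    "graph": ["entity-memory", "doc-router"],
}
fused = rrf_fuse(candidates)
print(focus_brief(fused))
```

With equal weights, `note-rrf` ranks first because three sources return it near the top. Calling `rrf_fuse(candidates, weights={"graph": 2.0})` moves `doc-router` to first place, which shows why the default constant and any per-source weights matter for retrieval quality.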
Key Takeaways
- The 8-layer model is the load-bearing contribution. Hermes ships layers 1-3 natively (LCM, native memory, and an optional external memory provider such as Obsidian or Notion). MemoryKit adds layers 4-8: the routing and regression layers. The framing is that agents have memory features but no retrieval router and no drift detector. That gap is real and matches the operator pain points surfaced in the user-stories catalog.
- Hybrid RRF (Reciprocal Rank Fusion) over four sources, not vector search. The router (layer 5) ranks across LCM, native memory, QMD, and the entity graph — explicitly not a single vector DB. Matches the Karpathy LLM-wiki thesis that hybrid keyword+structure beats vector-only at the hundreds-of-docs scale this wiki sits at.
- Focus brief, not raw retrieval. Layer 6 turns ranked recall into a cited task brief, not a list of documents. That matches Anthropic’s Best Practices framing on context-as-prompt (give the agent a focused context payload, not a dump). Citations are the discipline anchor: the brief is verifiable against its sources.
- Regression harness is the differentiator. Layer 7 — “catches memory drift” — is a primitive most agent memory toolkits don’t ship. The “we trained an eval set” discipline applied to retrieval. Without this, you can’t tell if a new memory write silently degraded retrieval quality. Pairs with the eval-first methodology from Lucas’s Code-with-Claude London talk.
- Hermes plugin wrapper exposes 4 native tools. Once installed (`install_hermes_plugin.py --hermes-home ~/.hermes --repo "$PWD" --force`), the toolkit registers `memory_stack_status` / `memory_stack_route` / `memory_stack_focus_brief` / `memory_stack_regress` as Hermes-native tools. Available to skills, cron prompts, and ad-hoc invocations.
- Promotion policy lives in `docs/skills-and-memory.md`. The toolkit ships an explicit doc on when to keep a fact in LCM, when to save native memory, when to write docs, when to create a skill. That’s the hardest discipline in any memory architecture, and having a written rule helps. Pairs with Hermes Skill Bundles (which makes skills explicit at the bundle level).
- Nightly cron is the maintenance heartbeat. Layer 8 runs via Hermes cron with a specific prompt template the README provides: “Run preflight token refresh first. Then run Hermes MemoryKit nightly maintenance from the repo. Summarize failures only, and include report paths.” Matches the cron daylight-savings self-correction pattern from Nate Herk’s course.
- Hermes config snippet is small and opinionated. Two short files: `~/.hermes/config.yaml` (4 lines: memory_enabled + user_profile_enabled + LCM engine + compression) and `~/.hermes/.env` (4 LCM env vars: large-output externalization at 12K char threshold, transcript GC off, context threshold 0.70). Worth reading even if you don’t install the toolkit; it’s a baseline-config reference for any Hermes operator.
- Adoption is unproven. 1 star / 0 forks / 5 commits / 3 days old at ingest. Benchmark claims (100/100 A+) are self-reported in a companion repo by the same author. Treat the architecture as a thoughtful proposal to track, not a battle-tested toolkit. Tier-1 refresh recommended in 30 days to recheck adoption signal.
- Public-safety note is unusually thoughtful. README explicitly warns: “Do not publish your raw LCM database, private notes, session IDs, Telegram topic names, secrets, or local profile paths. Publish redacted reports and aggregate benchmark numbers only.” That’s the right anti-pattern guard for a memory-tooling project — surface-area for accidental data exfil is high if you push raw memory to GitHub.
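
The regression-harness idea from the list above (layer 7, “catches memory drift”) can be sketched as a set of golden queries that must keep returning known ids. This is an illustration under assumptions, not the implementation behind `memory_stack_regress`: `RetrievalCheck`, `run_checks`, the queries, and the ids are all hypothetical, and the retriever is a stand-in dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalCheck:
    """One golden query plus the ids that must appear in its top-k results."""
    query: str
    expected_ids: frozenset
    top_k: int = 5


def run_checks(checks, retrieve):
    """Run every check against retrieve(query) -> list of ids, best first.

    Returns (passed_count, failures), where each failure is a tuple of
    (query, sorted list of expected ids missing from the top-k).
    """
    failures = []
    for check in checks:
        top = retrieve(check.query)[: check.top_k]
        missing = check.expected_ids - set(top)
        if missing:
            failures.append((check.query, sorted(missing)))
    return len(checks) - len(failures), failures


# Stand-in retriever: a fixed mapping instead of a real memory stack.
index = {
    "what did we decide about memory?": ["note-rrf", "doc-router", "session-12"],
    "who owns the nightly cron?": ["session-03", "doc-cron"],
}
checks = [
    RetrievalCheck("what did we decide about memory?", frozenset({"note-rrf"}), top_k=2),
    RetrievalCheck("who owns the nightly cron?", frozenset({"entity-owner"}), top_k=2),
]
passed, failures = run_checks(checks, lambda q: index.get(q, []))
print(f"{passed}/{len(checks)} retrieval checks passed")
for query, missing in failures:
    print(f"DRIFT: {query!r} no longer returns {missing}")
```

Rerunning the same checks after each batch of memory writes is what turns a one-off benchmark into a drift detector: a check that passed yesterday and fails today points at the write that degraded retrieval.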
Where this fits
| Topic surface | Description | Relationship |
|---|---|---|
| Hermes Agent topic landing | Public landing page | MemoryKit is a new operator-side surface added here |
| Hermes Skill Bundles | Slash-command-driven skill composition | Complementary — bundles solve skill-invocation probabilism, MemoryKit solves retrieval-routing |
| Hermes Skins (joeynyc) | Visual themes for Hermes CLI | Unrelated — skins are cosmetic; MemoryKit is behavior + retrieval |
| Hermes Codex App-Server Runtime | Runtime delegation to Codex CLI | MemoryKit tools work regardless of runtime |
| Karpathy’s LLM-Wiki Techniques | The LLM-wiki pattern | Same retrieval-routing problem at a different layer — Karpathy pattern uses Claude Code wiki; MemoryKit packages similar discipline as Hermes plugin |
| Claude Code Memory Architectures Compared | Built-in vs memarch vs Hermes | Adjacent companion — that article is the Claude Code version of this comparison; MemoryKit is a specific Hermes implementation |
Try It
- Smallest install (read-only): clone the repo, read `docs/skills-and-memory.md` for the promotion policy (LCM → native memory → docs → skill). Useful even without installing the toolkit, since it gives you a written rule for memory hygiene.
- Full install with verify: follow the README’s Quick Start: clone, venv, `pip install -e .[dev]`, copy `.env` template, run `memory_stack_verify.py --hermes-home ~/.hermes --workspace ~/my-hermes-workspace`. Verify reports stack health before you depend on the toolkit.
- Try one query: `python scripts/memory_stack_router.py "what did we decide about memory?" --json` outputs the ranked routing across LCM/native/QMD/graph as JSON. Sanity-check before installing the plugin wrapper.
- Install the Hermes plugin wrapper: `python scripts/install_hermes_plugin.py --hermes-home ~/.hermes --repo "$PWD" --force`. Add the printed `MEMORY_STACK_REPO=...` line to `~/.hermes/.env`. Restart Hermes (`/reset`). Then call from any Hermes session.
- Wire nightly maintenance: add the README’s cron prompt to a Hermes cron job (token refresh → maintenance → failure-only summary + report paths). Pairs with the cron-creation pattern from Nate Herk’s course.
- Verify the toolkit’s benchmark claim: clone the companion benchmark repo and run it against your own Hermes deployment. The author claims 100/100 A+ — does it actually score that against your data? Surface any divergence as feedback.
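
For scripted sanity checks, the “Try one query” step above can be wrapped in a few lines of Python. This is a sketch under assumptions: it presumes the command is run from the repo root inside the project venv and that the script prints a single JSON document to stdout. The JSON shape is not documented here, so the wrapper returns whatever it parses. `route_query` and its `runner` parameter are hypothetical names, not part of the toolkit.

```python
import json
import subprocess
import sys


def route_query(query, repo_dir=".", runner=subprocess.run):
    """Run the README's router command and return its parsed JSON output.

    repo_dir: path to the cloned repo (assumed to contain scripts/).
    runner: injectable for testing; defaults to subprocess.run.
    """
    # Same command as the README, using the current interpreter as "python".
    cmd = [sys.executable, "scripts/memory_stack_router.py", query, "--json"]
    result = runner(cmd, cwd=repo_dir, capture_output=True, text=True, check=True)
    # Assumption: stdout holds exactly one JSON document.
    return json.loads(result.stdout)
```

A call such as `route_query("what did we decide about memory?", repo_dir="path/to/hermes-memorykit")` raises `subprocess.CalledProcessError` on a non-zero exit and `json.JSONDecodeError` if the output is not JSON, so both failure cases surface before the plugin wrapper is installed.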
Reddit signal — community memory-provider bake-off surfaces a NEW recommendation (Mnemosyne) (2026-05-25)
[Reddit signal — r/hermesagent 2026-05-25] Source: raw/reddit-1tms3g6.md (171 score, 76 comments, OP Lorian0x7, Memory & Context flair). OP tested every available Hermes memory provider end-to-end and lands on a recommendation that’s not currently covered by this wiki: Mnemosyne. Lorian0x7’s qualitative shake-out reads:
- Cloud providers — rejected as a class (vendor lock-in + data retention concerns)
- Hindsight — technically the best memory quality, but too heavy (many API calls, costly even on cheap models, hidden config knobs, “too many bugs”)
- OpenViking — pain to set up; OP dropped halfway
- Holographic — speed was fine, but quality was not there (“I’m still unsure if it was doing something”)
- Honcho: pain to set up; pretty good at profiling but same heaviness issues as Hindsight
- Mnemosyne (the OP’s winner) — “the easiest to setup, lightweight, fully local, and it’s the best balanced between quality and speed.” Stack: SQLite-based persistence (truncation cuts further detail). Not built-in by default — OP explicitly thinks it should be.
Note on MemoryKit (this article’s subject). MemoryKit is not on the OP’s tested list — Lorian0x7’s thread predates broad community awareness of duclamvan’s v0.2.0 release (toolkit was 3 days old at the wiki ingest 3 days before this signal). Don’t read the omission as a negative comparison; read it as “the field is now bigger than this thread captures.” The hybrid-RRF + entity-graph + regression-harness architecture MemoryKit ships is orthogonal to the bake-off dimensions Lorian0x7 ranks on (setup pain, weight, quality, speed) — both could co-exist or compose, with MemoryKit acting as the router on top of whatever native memory provider the operator picks.
Implication for the wiki. Mnemosyne is a tracked-but-uncovered provider as of the 2026-05-25 refresh. Worth a Tier-1 research pass to surface: (a) Mnemosyne’s repo + license + maintainer; (b) the exact SQLite schema + retrieval contract; (c) whether MemoryKit’s memory_stack_route can fuse Mnemosyne as one of its 4 sources, or whether it expects Hermes-native Core Memory specifically. Falsification candidate for this article: if Mnemosyne’s standalone performance materially exceeds MemoryKit’s claimed 100/100 A+ on the same benchmark, the value proposition of the 8-layer routing stack weakens — re-test on refresh.
Strict-bar caveat. Lorian0x7 is a self-reported operator, not a benchmark publisher. The ranking is qualitative (“kinda sucks”, “pain to setup”, “I liked the speed”). 171 score + 76 comments signals community interest in the question, not validation of the verdict. Treat as a triage map of providers worth investigating, not a tier list.
Reddit signal — Honcho as a community-built hybrid memory provider (2026-05-26)
[Reddit signal — r/hermesagent 2026-05-26] Source: raw/reddit-1to3req.md (13 score, 23 comments, OP MrGandalfSG, Setup & Installation flair). A second uncovered memory provider — Honcho — shipped as a fully-local, hybrid Gemini-orchestrated setup driving zero-token retrieval. Adjacent to the Mnemosyne signal above but a different architecture entirely. OP runs:
- Orchestrator: Gemini-3-Flash (high-level reasoning + subagent delegation) → DeepSeek v4 Flash fallback
- Local memory store: Honcho in Docker on WSL2 (Ubuntu), same host as Hermes
- Local chat/reasoning: `Gemma-4-e2b` via LM Studio with `n_parallel=2` (load-bearing: Honcho runs two processes, the API and the Deriver background-distillation worker, so single-parallel deadlocks)
- Embeddings: `EmbeddingGemma-300M` at 768 dimensions, with pgvector modified from default 1536 → 768 dims to match the local embedder
- Hardware: Legion Go + OneXGPU eGPU
The headline claim — “Zero-Token Memory: 100% locally on my machine. No internet calls, no Token fees for API retrieval” — is the load-bearing operator pitch. Worth tracking against MemoryKit’s claimed 100/100 A+ score: Honcho is a standalone provider, MemoryKit is a router on top — they could compose (Honcho as one of MemoryKit’s 4 layer-5 RRF sources) rather than compete. Falsification candidate: does MemoryKit’s memory_stack_route already support Honcho as a source, and if not, what’s the integration surface? Add to Open Questions on next refresh.
Adoption signal caveat. 13 score / 23 comments — significantly lower than the Lorian0x7 bake-off (171 / 76). Single operator with a specific hardware setup (Legion Go + eGPU + WSL2) that doesn’t generalize cleanly to most Hermes deployments. Treat as a worked configuration worth lifting patterns from (pgvector dim-matching, LM Studio n_parallel=2, Honcho’s API+Deriver dual-process architecture), not a community-validated provider recommendation.
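
The pgvector dim-matching pattern is the easiest one to lift: the embedder's output length must equal the declared vector column size. A minimal guard, assuming the 768-dimension figure reported in the thread, could look like the sketch below. `EXPECTED_DIM` and `check_embedding_dim` are hypothetical names, not part of Honcho or pgvector.

```python
EXPECTED_DIM = 768  # EmbeddingGemma-300M output size as reported in the thread


def check_embedding_dim(vector, expected_dim=EXPECTED_DIM):
    """Return the vector unchanged, or raise if its length does not match.

    Intended to run before any write, so a mismatch between the local
    embedder and the vector column is reported up front.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dims, column expects {expected_dim}"
        )
    return vector


# A 768-dim vector passes; a 1536-dim vector (the default the OP changed) fails.
check_embedding_dim([0.0] * 768)
try:
    check_embedding_dim([0.0] * 1536)
except ValueError as err:
    print(err)
```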
Related
- Hermes Agent — topic landing — covers the three-tier memory model this toolkit extends
- Hermes Skill Bundles — orthogonal extension primitive (skill invocation vs memory retrieval)
- Hermes Skins (joeynyc) — visual customization (unrelated, but recent ingest in the same topic)
- Hermes Agent 1-Hour Course (Nate Herk) — covers cron creation and the broader operator setup
- Hermes Codex App-Server Runtime — runtime configuration (orthogonal)
- Claude Code Memory Architectures Compared — Claude-Code-side equivalent comparison
- Karpathy’s LLM-Wiki Techniques — same retrieval-routing problem applied to Claude Code wiki workflow
- Picking the Right Model — Build a Private Eval — regression-test discipline applied to model choice; this toolkit applies it to retrieval
Open Questions
- Reproduce the 100/100 A+ benchmark. Author runs it against their own data; community has not verified. First refresh cycle should attempt reproduction or surface the benchmark methodology.
- LCM internals. The toolkit explicitly references LCM (Long Context Memory) primitives in Hermes: `LCM_LARGE_OUTPUT_EXTERNALIZATION_*` env vars + `engine: lcm` config. Worth a dedicated wiki article on Hermes LCM if/when ingested from primary docs; current coverage is implicit.
- Entity-graph extraction quality. Layer 4 extracts an entity graph from Markdown and LCM SQLite transcripts. What entity types, what disambiguation strategy, what schema? Unspecified in the README; would need a `docs/entity-graph.md` deep-dive on first refresh.
- Hybrid RRF tuning. Layer 5 fuses ranked lists from 4 sources via Reciprocal Rank Fusion. Default RRF constant? Per-source weight tuning? Affects retrieval quality materially. Worth pulling from `scripts/memory_stack_router.py` on refresh.
- Comparison against synthadoc. The synthadoc IngestAgent + Pass 3 contradiction tracking is another opinionated retrieval-discipline toolkit operating in the same problem space. Are they composable, competing, or solving subtly different problems? Worth a side-by-side once both mature further.
- Versioning + breakage risk. v0.2.0 at fetch. The toolkit depends on Hermes-internal LCM API surface — any Hermes upstream API change could break it. Worth checking compat matrix on each Hermes release.