Source: duclamvan-hermes-memorykit-readme-2026-05-22.md — full README + companion benchmark repo link, fetched 2026-05-22. Repo: github.com/duclamvan/hermes-memorykit. Version: v0.2.0 (released 2026-05-19). Stars: 1. License: MIT. Language: Python (100%). Author: duclamvan. Companion benchmark repo: github.com/duclamvan/hermes-memory-benchmarks.

Practical memory toolkit for Hermes Agent users. It addresses the failure mode “my agent stops remembering after long chats, cron runs, tool calls, and context compression.” It extends Hermes’s existing three-tier memory model (Core Memory in Context / Session Search FTS5 / External Memory Providers) with an opinionated 8-layer stack: LCM raw transcript → native memory → QMD/markdown wiki → entity graph → hybrid RRF router → focus brief → regression tests → nightly maintenance. Ships a Hermes-native plugin wrapper exposing four tools (memory_stack_status, memory_stack_route, memory_stack_focus_brief, memory_stack_regress) plus config templates for .env, ~/.hermes/config.yaml, and cron prompts. Author claims via companion benchmark repo: 100/100 A+ score, 27/27 retrieval checks passed, 35 Hermes profiles verified. Adoption signal caveat: 1 star, 0 forks, 5 commits, 3 days old at ingest, so extremely early. Benchmark claims are self-reported. Worth tracking but not yet community-validated.

The 8-layer stack

The toolkit’s central design — quoted verbatim from the README:

raw transcript → durable notes → searchable docs → entity graph → RRF router → focus brief → regression tests → nightly maintenance

| # | Layer | What it stores / does |
|---|---|---|
| 1 | LCM raw transcript store | Exact conversation history (Hermes’s existing LCM SQLite) |
| 2 | Native memory | Compact durable facts (Hermes’s existing Core Memory) |
| 3 | QMD or Markdown wiki | Searchable project / user / system docs (e.g., this Karpathy-style wiki) |
| 4 | Entity graph | Links people, projects, topics, files, sessions (extracted from layers 1+3) |
| 5 | Hybrid RRF router | Ranks candidates across all 4 sources via Reciprocal Rank Fusion |
| 6 | Focus brief builder | Turns ranked recall into a short, cited task brief |
| 7 | Retrieval regression harness | Catches memory drift over time |
| 8 | Nightly maintenance | Updates indexes, runs health checks |

Relative to Hermes’s native memory model, the novel layers are 4 (entity graph), 5 (hybrid RRF router), 6 (focus brief), 7 (regression harness), and 8 (nightly maintenance). Layers 1-3 are wrappers around Hermes primitives that already exist, so the toolkit’s value is the routing layer on top.
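
To make layer 5 concrete, here is a minimal sketch of Reciprocal Rank Fusion over four ranked lists. It is not taken from MemoryKit's router: the function name `rrf_fuse`, the `weights` parameter, the source keys, and the document IDs are all hypothetical, and `k=60` is only the value commonly used in RRF write-ups, not a confirmed MemoryKit default.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, weights=None):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    ranked_lists: dict mapping a source name to its ranked doc IDs (best first).
    k: smoothing constant; 60 is a common choice, NOT a confirmed MemoryKit default.
    weights: optional per-source multipliers (hypothetical; 1.0 when omitted).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for source, doc_ids in ranked_lists.items():
        w = weights.get(source, 1.0)
        for rank, doc_id in enumerate(doc_ids, start=1):
            # Each source contributes w / (k + rank) for every doc it returned.
            scores[doc_id] += w / (k + rank)
    # Highest score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical candidates from the four sources the article names.
candidates = {
    "lcm": ["session-12", "session-07", "note-3"],
    "native_memory": ["note-3", "fact-1"],
    "qmd": ["doc-memory", "note-3", "session-12"],
    "entity_graph": ["note-3", "doc-memory"],
}

fused = rrf_fuse(candidates)
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.4f}")
```

A document that appears in several sources at moderate ranks (here `note-3`) outranks one that tops a single source, which is the behavior that makes rank fusion attractive when the sources score on incomparable scales.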

Key Takeaways

  • The 8-layer model is the load-bearing contribution. Hermes ships layers 1-3 natively (LCM, native memory, and optional external memory such as Obsidian or Notion). MemoryKit adds layers 4-8, the routing and regression layers. The framing is that agents have memory features but no retrieval router and no drift detector. That gap is real and matches the operator pain points surfaced in the user-stories catalog.
  • Hybrid RRF (Reciprocal Rank Fusion) over four sources, not vector search. The router (layer 5) ranks across LCM, native memory, QMD, and the entity graph — explicitly not a single vector DB. Matches the Karpathy LLM-wiki thesis that hybrid keyword+structure beats vector-only at the hundreds-of-docs scale this wiki sits at.
  • Focus brief, not raw retrieval. Layer 6 turns ranked recall into a cited task brief — not a list of documents. That matches Anthropic’s Best Practices framing on context-as-prompt (give the agent a focused context payload, not a dump). Cites are the discipline anchor — the brief is verifiable against its sources.
  • Regression harness is the differentiator. Layer 7, which “catches memory drift,” is a primitive most agent memory toolkits don’t ship. It applies the “we trained an eval set” discipline to retrieval: without it, you can’t tell whether a new memory write silently degraded retrieval quality. Pairs with the eval-first methodology from Lucas’s Code-with-Claude London talk.
  • Hermes plugin wrapper exposes 4 native tools. Once installed (install_hermes_plugin.py --hermes-home ~/.hermes --repo "$PWD" --force), the toolkit registers memory_stack_status / memory_stack_route / memory_stack_focus_brief / memory_stack_regress as Hermes-native tools. Available to skills, cron prompts, and ad-hoc invocations.
  • Promotion policy lives in docs/skills-and-memory.md. The toolkit ships an explicit doc on when to keep a fact in LCM, when to save native memory, when to write docs, when to create a skill. That’s the hardest discipline in any memory architecture — having a written rule helps. Pairs with Hermes Skill Bundles (which makes skills explicit at the bundle level).
  • Nightly cron is the maintenance heartbeat. Layer 8 runs via Hermes cron with a specific prompt template the README provides: “Run preflight token refresh first. Then run Hermes MemoryKit nightly maintenance from the repo. Summarize failures only, and include report paths.” Matches the cron daylight-savings self-correction pattern from Nate Herk’s course.
  • Hermes config snippet is small and opinionated. Two short files: ~/.hermes/config.yaml (4 lines — memory_enabled + user_profile_enabled + LCM engine + compression) and ~/.hermes/.env (4 LCM env vars — large-output externalization at 12K char threshold, transcript GC off, context threshold 0.70). Worth reading even if you don’t install the toolkit — it’s a baseline-config reference for any Hermes operator.
  • Adoption is unproven. 1 star / 0 forks / 5 commits / 3 days old at ingest. Benchmark claims (100/100 A+) are self-reported in a companion repo by the same author. Treat the architecture as a thoughtful proposal to track, not a battle-tested toolkit. Tier-1 refresh recommended in 30 days to recheck adoption signal.
  • Public-safety note is unusually thoughtful. README explicitly warns: “Do not publish your raw LCM database, private notes, session IDs, Telegram topic names, secrets, or local profile paths. Publish redacted reports and aggregate benchmark numbers only.” That’s the right guard for a memory-tooling project: the surface area for accidental data exfiltration is high if you push raw memory to GitHub.
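
The regression-harness idea in the takeaways above can be sketched as a set of golden queries checked against expected document IDs. This is an illustration of the general technique, not MemoryKit's harness: `recall_at_k`, `run_regression`, the golden cases, and the stand-in retriever are all hypothetical names and data.

```python
def recall_at_k(retrieved, expected, k=5):
    """Fraction of expected doc IDs that appear in the top-k retrieved IDs."""
    if not expected:
        return 1.0
    top_k = set(retrieved[:k])
    return sum(1 for doc_id in expected if doc_id in top_k) / len(expected)


def run_regression(golden_cases, retrieve, k=5, min_recall=1.0):
    """Run every golden query through `retrieve` and collect the failures.

    golden_cases: list of {"query": str, "expected": [doc_id, ...]} dicts.
    retrieve: any callable mapping a query string to a ranked list of doc IDs.
    """
    failures = []
    for case in golden_cases:
        recall = recall_at_k(retrieve(case["query"]), case["expected"], k)
        if recall < min_recall:
            failures.append({"query": case["query"], "recall": recall})
    return failures


# Hypothetical golden set: queries paired with the docs that must be recalled.
GOLDEN = [
    {"query": "what did we decide about memory?", "expected": ["note-3"]},
    {"query": "who owns the billing project?", "expected": ["fact-1", "doc-billing"]},
]

# Stand-in retriever backed by a fixed table; "doc-billing" has drifted out.
FAKE_INDEX = {
    "what did we decide about memory?": ["note-3", "session-12"],
    "who owns the billing project?": ["fact-1", "session-07"],
}


def fake_retrieve(query):
    return FAKE_INDEX.get(query, [])


failures = run_regression(GOLDEN, fake_retrieve)
print(failures)
```

Running the same golden set after each batch of memory writes turns "retrieval got worse" from a hunch into a failing check, which is the drift-detection property the article attributes to layer 7.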

Where this fits

| Topic surface | Description | Relationship |
|---|---|---|
| Hermes Agent topic landing | Public landing page | MemoryKit is a new operator-side surface added here |
| Hermes Skill Bundles | Slash-command-driven skill composition | Complementary: bundles solve skill-invocation probabilism, MemoryKit solves retrieval-routing |
| Hermes Skins (joeynyc) | Visual themes for Hermes CLI | Unrelated: skins are cosmetic; MemoryKit is behavior + retrieval |
| Hermes Codex App-Server Runtime | Runtime delegation to Codex CLI | MemoryKit tools work regardless of runtime |
| Karpathy’s LLM-Wiki Techniques | The LLM-wiki pattern | Same retrieval-routing problem at a different layer: the Karpathy pattern uses a Claude Code wiki; MemoryKit packages similar discipline as a Hermes plugin |
| Claude Code Memory Architectures Compared | Built-in vs memarch vs Hermes | Adjacent companion: that article is the Claude Code version of this comparison; MemoryKit is a specific Hermes implementation |

Try It

  1. Smallest install (read-only): clone the repo, read docs/skills-and-memory.md for the promotion policy (LCM → native memory → docs → skill). Useful even without installing the toolkit — gives you a written rule for memory hygiene.
  2. Full install with verify: follow the README’s Quick Start: clone, venv, pip install -e .[dev], copy .env template, run memory_stack_verify.py --hermes-home ~/.hermes --workspace ~/my-hermes-workspace. The verify step reports stack health, so run it before you depend on the toolkit.
  3. Try one query: python scripts/memory_stack_router.py "what did we decide about memory?" --json — outputs the ranked routing across LCM/native/QMD/graph as JSON. Sanity-check before installing the plugin wrapper.
  4. Install the Hermes plugin wrapper: python scripts/install_hermes_plugin.py --hermes-home ~/.hermes --repo "$PWD" --force. Add the printed MEMORY_STACK_REPO=... line to ~/.hermes/.env. Restart Hermes (/reset). Then call the four tools from any Hermes session.
  5. Wire nightly maintenance: add the README’s cron prompt to a Hermes cron job (token refresh → maintenance → failure-only summary + report paths). Pairs with the cron-creation pattern from Nate Herk’s course.
  6. Verify the toolkit’s benchmark claim: clone the companion benchmark repo and run it against your own Hermes deployment. The author claims 100/100 A+ — does it actually score that against your data? Surface any divergence as feedback.
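
For step 3, the router call can be wrapped in a small Python helper for scripted sanity checks. This is a sketch under stated assumptions: the script path and the `--json` flag come from the command quoted above, but the helper names are hypothetical, and because the README's output schema is not described here, the parser deliberately assumes nothing beyond "stdout is valid JSON".

```python
import json
import subprocess
import sys


def build_router_command(query, script="scripts/memory_stack_router.py"):
    """Build the argv list for the router call quoted in step 3."""
    return [sys.executable, script, query, "--json"]


def parse_router_output(stdout):
    """Parse the router's stdout as JSON without assuming any particular schema."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise ValueError(f"router did not emit valid JSON: {exc}") from exc


def route(query, repo_dir="."):
    """Run the router from the repo root and return the parsed payload."""
    result = subprocess.run(
        build_router_command(query),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_router_output(result.stdout)
```

From a clone of the repo, `route("what did we decide about memory?")` would return whatever structure the router prints; inspect it before writing code that depends on specific keys.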

Reddit signal — community memory-provider bake-off surfaces a NEW recommendation (Mnemosyne) (2026-05-25)

[Reddit signal — r/hermesagent 2026-05-25] Source: raw/reddit-1tms3g6.md (171 score, 76 comments, OP Lorian0x7, Memory & Context flair). OP tested every available Hermes memory provider end-to-end and lands on a recommendation that’s not currently covered by this wiki: Mnemosyne. Lorian0x7’s qualitative shake-out reads:

  • Cloud providers: rejected as a class (vendor lock-in + data retention concerns)
  • Hindsight: technically the best memory quality, but too heavy (many API calls, costly even on cheap models, hidden config knobs, “too many bugs”)
  • OpenViking: pain to set up; OP dropped halfway
  • Holographic: speed was fine, but quality was not there (“I’m still unsure if it was doing something”)
  • Honcho: pain to set up; pretty good at profiling but same heaviness issues as Hindsight
  • Mnemosyne (the OP’s winner): “the easiest to setup, lightweight, fully local, and it’s the best balanced between quality and speed.” Stack: SQLite-based persistence (the source excerpt is truncated, so further stack detail is unavailable). Not built-in by default; OP explicitly thinks it should be.

Note on MemoryKit (this article’s subject). MemoryKit is not on the OP’s tested list. Lorian0x7’s thread predates broad community awareness of duclamvan’s v0.2.0 release: the toolkit was 3 days old at wiki ingest (2026-05-22), and the ingest itself was only 3 days before this signal. Don’t read the omission as a negative comparison; read it as “the field is now bigger than this thread captures.” The hybrid-RRF + entity-graph + regression-harness architecture MemoryKit ships is orthogonal to the bake-off dimensions Lorian0x7 ranks on (setup pain, weight, quality, speed). Both could co-exist or compose, with MemoryKit acting as the router on top of whatever native memory provider the operator picks.

Implication for the wiki. Mnemosyne is a tracked-but-uncovered provider as of the 2026-05-25 refresh. Worth a Tier-1 research pass to surface: (a) Mnemosyne’s repo + license + maintainer; (b) the exact SQLite schema + retrieval contract; (c) whether MemoryKit’s memory_stack_route can fuse Mnemosyne as one of its 4 sources, or whether it expects Hermes-native Core Memory specifically. Falsification candidate for this article: if Mnemosyne’s standalone performance materially exceeds MemoryKit’s claimed 100/100 A+ on the same benchmark, the value proposition of the 8-layer routing stack weakens — re-test on refresh.

Strict-bar caveat. Lorian0x7 is a self-reported operator, not a benchmark publisher. The ranking is qualitative (“kinda sucks”, “pain to setup”, “I liked the speed”). 171 score + 76 comments signals community interest in the question, not validation of the verdict. Treat as a triage map of providers worth investigating, not a tier list.

Reddit signal — Honcho as a community-built hybrid memory provider (2026-05-26)

[Reddit signal — r/hermesagent 2026-05-26] Source: raw/reddit-1to3req.md (13 score, 23 comments, OP MrGandalfSG, Setup & Installation flair). A second uncovered memory provider — Honcho — shipped as a fully-local, hybrid Gemini-orchestrated setup driving zero-token retrieval. Adjacent to the Mnemosyne signal above but a different architecture entirely. OP runs:

  • Orchestrator: Gemini-3-Flash (high-level reasoning + subagent delegation) → DeepSeek v4 Flash fallback
  • Local memory store: Honcho in Docker on WSL2 (Ubuntu), same host as Hermes
  • Local chat/reasoning: Gemma-4-e2b via LM Studio with n_parallel=2 (load-bearing — Honcho runs two processes, the API and the Deriver background-distillation worker, so single-parallel deadlocks)
  • Embeddings: EmbeddingGemma-300M at 768 dimensions, with pgvector modified from default 1536 → 768 dims to match the local embedder
  • Hardware: Legion Go + OneXGPU eGPU

The headline claim — “Zero-Token Memory: 100% locally on my machine. No internet calls, no Token fees for API retrieval” — is the load-bearing operator pitch. Worth tracking against MemoryKit’s claimed 100/100 A+ score: Honcho is a standalone provider, MemoryKit is a router on top — they could compose (Honcho as one of MemoryKit’s 4 layer-5 RRF sources) rather than compete. Falsification candidate: does MemoryKit’s memory_stack_route already support Honcho as a source, and if not, what’s the integration surface? Add to Open Questions on next refresh.

Adoption signal caveat. 13 score / 23 comments — significantly lower than the Lorian0x7 bake-off (171 / 76). Single operator with a specific hardware setup (Legion Go + eGPU + WSL2) that doesn’t generalize cleanly to most Hermes deployments. Treat as a worked configuration worth lifting patterns from (pgvector dim-matching, LM Studio n_parallel=2, Honcho’s API+Deriver dual-process architecture), not a community-validated provider recommendation.
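
The pgvector dim-matching pattern is worth spelling out: a vector column declared with one dimension rejects vectors of another, so the store and the embedder must agree. The guard below is a hypothetical pre-insert check, not part of Honcho or pgvector; only the 768 and 1536 figures come from the thread.

```python
EXPECTED_DIM = 768  # EmbeddingGemma-300M output size as reported in the thread


def check_embedding_dim(vector, expected_dim=EXPECTED_DIM):
    """Raise ValueError when a vector's length differs from the store's declared size."""
    actual = len(vector)
    if actual != expected_dim:
        raise ValueError(
            f"embedding has {actual} dimensions, but the store expects {expected_dim}"
        )
    return vector


# A 768-dim vector passes through unchanged.
ok = check_embedding_dim([0.0] * 768)
print(len(ok))

# A vector sized for the 1536-dim default is rejected before it reaches the database.
try:
    check_embedding_dim([0.0] * 1536)
except ValueError as err:
    print(err)
```

Failing fast in application code gives a clearer error than a rejected insert deep inside a background worker, which matters when a distillation process writes embeddings asynchronously.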

Open Questions

  • Reproduce the 100/100 A+ benchmark. Author runs it against their own data; community has not verified. First refresh cycle should attempt reproduction or surface the benchmark methodology.
  • LCM internals. The toolkit explicitly references LCM (Long Context Memory) primitives in Hermes — LCM_LARGE_OUTPUT_EXTERNALIZATION_* env vars + engine: lcm config. Worth a dedicated wiki article on Hermes LCM if/when ingested from primary docs — current coverage is implicit.
  • Entity-graph extraction quality. Layer 4 extracts an entity graph from Markdown and LCM SQLite transcripts. What entity types, what disambiguation strategy, what schema? Unspecified in the README; would need a docs/entity-graph.md deep-dive on first refresh.
  • Hybrid RRF tuning. Layer 5 fuses ranked lists from 4 sources via Reciprocal Rank Fusion. Default RRF constant? Per-source weight tuning? Affects retrieval quality materially. Worth pulling from scripts/memory_stack_router.py on refresh.
  • Comparison against synthadoc. The synthadoc IngestAgent + Pass 3 contradiction tracking is another opinionated retrieval-discipline toolkit operating in the same problem space. Are they composable, competing, or solving subtly different problems? Worth a side-by-side once both mature further.
  • Versioning + breakage risk. v0.2.0 at fetch. The toolkit depends on Hermes-internal LCM API surface — any Hermes upstream API change could break it. Worth checking compat matrix on each Hermes release.
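
On the RRF tuning question above, a short worked example shows why the constant matters. It uses the usual RRF formula with an optional per-source weight; whether MemoryKit uses this exact form, which default constant it picks, and whether it weights sources are all unknown from the README, and the two documents are hypothetical.

```latex
% S: set of sources; r_s(d): 1-based rank of document d in source s;
% w_s: per-source weight (an assumed extension, set to 1 below); k: RRF constant.
\[
\operatorname{RRF}(d) = \sum_{s \in S} \frac{w_s}{k + r_s(d)}
\]

% Document A is ranked 1st by one source only.
% Document B is ranked 3rd by two sources.
\[
\operatorname{RRF}(A) = \frac{1}{k+1}, \qquad \operatorname{RRF}(B) = \frac{2}{k+3}
\]
\[
\operatorname{RRF}(B) > \operatorname{RRF}(A)
\iff 2(k+1) > k+3
\iff k > 1
\]
\[
k = 60:\quad \operatorname{RRF}(A) = \tfrac{1}{61} \approx 0.0164, \qquad \operatorname{RRF}(B) = \tfrac{2}{63} \approx 0.0317
\]
\[
k = 0:\quad \operatorname{RRF}(A) = 1, \qquad \operatorname{RRF}(B) = \tfrac{2}{3} \approx 0.667
\]
```

A large constant rewards agreement across sources, while a small one rewards a single top hit, so the default in scripts/memory_stack_router.py determines which kind of evidence the router trusts.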