Source: wiki synthesis: agents-that-remember-managed-agents-memory-stores-dreaming, hermes-memorykit-duclamvan, auto-memory, Hermes Agent topic landing
Agents start isolated — each session is amnesiac, every conversation re-explores the same ground. Persistent memory is the bridge across sessions, and by mid-2026 three vendors had shipped meaningfully different answers: Anthropic’s Managed Agents memory stores + dreaming (filesystem mounted on sessions, async consolidation harness), Nous Research’s Hermes Agent (three-tier Core Memory + Session Search + External Memory Providers, with duclamvan’s MemoryKit as a community 8-layer extension), and Claude Code’s Auto Memory (per-project .claude/projects/<project>/memory/ markdown files with a MEMORY.md index loaded into every session). The architectures diverge along three axes — storage shape (filesystem vs tiered vs flat-file index), write discipline (model-driven vs operator-curated vs harvested-with-approval), and consolidation strategy (explicit batch job vs nightly cron vs in-session autodream).
Key Takeaways
- All three converge on the filesystem as the memory interface. Anthropic mounts memory stores as a filesystem so the model can use bash + grep + standard file ops. Hermes uses markdown files under ~/.hermes/. Claude Code stores memory as markdown files under .claude/projects/<project>/memory/. Bespoke memory APIs lost. The reason is the same in each case: the model already knows how to navigate a filesystem, and a custom retrieval surface is friction it has to learn.
- All three are versioned and human-editable. Anthropic exposes a console filesystem viewer where every edit creates a new version. Hermes memory and Auto Memory files are plain markdown editable in any editor. Human-in-the-loop edits are first-class everywhere, not a power-user escape hatch.
- Consolidation is async and separate from write. Anthropic’s dreaming is a batch job spawning one sub-agent per transcript over up to ~100 transcripts. Hermes runs a nightly cron that updates indexes, runs health checks, and (with MemoryKit) runs a regression harness on retrieval quality. Auto Memory has an autodream background pass that merges duplicates and converts relative time references into dates. None of them tries to consolidate during live agent work.
- The boundary unit differs across vendors. Anthropic supports multiple memory stores per organization with user-defined boundaries (per-user / per-workspace / per-domain). Hermes scopes memory per-agent profile under ~/.hermes/ and lets you isolate fully or share across agents. Auto Memory is per-project under .claude/projects/<project>/. Each vendor encodes a different opinion about what unit of work owns a memory namespace.
- Hermes ships a three-tier model out of the box. Per the topic landing infographic: Tier 1 Core Memory in Context (~2,200 chars + 1,375-char USER.md, always in the prompt), Tier 2 Session Search (full-text search across past conversations with LLM summarization, on-demand), Tier 3 External Memory Providers (Obsidian / Notion / Roam / MongoDB / Redis / Zep / Chroma, pluggable). MemoryKit then adds layers 4-8 (entity graph, RRF router, focus brief, regression tests, nightly maintenance) on top.
- Anthropic ships the explicit batch consolidation primitive. Dreaming is a non-destructive clone-then-write step: a new output store is created and dream output goes there; the input store is untouched. This is the unique architectural move: the other two consolidate in place.
- Claude Code’s Auto Memory is the lightest-weight take. No filesystem mount, no batch job, no orchestrator: just markdown files and an auto-loaded MEMORY.md index. Per auto-memory, the typology splits into user / feedback / project / reference, and the discipline is one-line index entries pointing at per-topic detail files.
- Cost models split along runtime trust. Dreaming is the only one with an explicit cost story: ~95% prompt-cache hit rate, planned 50% batch-style discount, model swap between Opus 4.7 and Sonnet 4.6. Hermes is operator-paid (run any backend on your own infra; MemoryKit adds zero hosted cost). Auto Memory is bundled into the Claude Code session cost (markdown files are tiny, so context is the only meaningful spend).
- Write discipline diverges sharply. Dreaming’s input transcripts are model-curated and the output store is model-written. Hermes lets the agent write to Tier 1 + Tier 2, but Tier 3 + MemoryKit’s QMD wiki are operator-curated. Auto Memory writes happen via the save-this-to-memory loop — the human says “save this as a feedback entry” and the agent appends. Three different opinions about who owns the memory boundary.
- None of them have a public benchmark. Anthropic’s workshop did not state token bounds or a citation-quality score. MemoryKit claims 100/100 A+ on its companion repo — self-reported, 1 star at ingest. Auto Memory is described in third-party Claude Code walkthroughs without comparative numbers. The empirical evidence layer is the missing piece across all three.
Three architectures, side by side
| | Managed Agents memory stores + dreaming | Hermes three-tier + MemoryKit | Claude Code Auto Memory |
|---|---|---|---|
| Storage shape | Filesystem mounted as a session resource; arbitrary directory tree the model organizes | Tier 1 in-context block + Tier 2 session-search SQLite + Tier 3 pluggable external provider (Obsidian / Notion / Roam / Redis / Chroma / etc.); MemoryKit adds QMD wiki + entity graph | Flat directory of markdown files under .claude/projects/<project>/memory/, indexed by MEMORY.md table of contents |
| Write discipline | Model reads first, writes new info to files in the store; access: read-only flag blocks writes; steering prompt at mount time narrows what gets written | Agent writes to Tier 1 (Core Memory) + Tier 2 (Session Search) automatically; Tier 3 + QMD wiki are operator-curated; MemoryKit’s docs/skills-and-memory.md ships an explicit promotion policy (LCM → native → docs → skill) | Continuous harvest from conversation, persisted on human approval (“save this to memory as a feedback entry”); index lines stay under ~200 chars |
| Consolidation | Dreaming — explicit async batch job. Caller supplies model (Opus 4.7 / Sonnet 4.6), input store, and up to ~100 transcript session IDs. Orchestrator spawns one sub-agent per transcript; output is a cloned store with deduplication, enrichment (back-filled dates / IDs), and stale-info removal. Non-destructive (writes only to a new output store) | Nightly cron running MemoryKit layer 8 — updates indexes, runs health checks, executes regression harness against retrieval. Hermes core ships compression + GEPA (Genetic-Pareto Prompt Evolution) for skill evolution offline | Autodream background pass between sessions — merges duplicates, drops contradicted entries, converts “yesterday” to dated entries (per nate-herk-every-level-of-claude L5 framing) |
| Retrieval interface | Model uses bash + grep + standard file ops; dreamed output includes a slug-keyed index file for cheap lookup before wide grep | MemoryKit hybrid RRF (Reciprocal Rank Fusion) router across LCM + native memory + QMD + entity graph; produces a cited focus brief rather than raw retrieved chunks | MEMORY.md index auto-loaded into every session; detail files referenced by one-line index entries; inspect with /memory in Claude Code or Cowork |
| Surface | Claude Managed Agents CLI + console; observable as a CMA resource alongside agents / environments / sessions; dream jobs themselves create a CMA session you can click into | Hermes CLI + dashboard TUIs (hermes dashboard --tui, Herm-TUI, Hermes Console); MemoryKit registers 4 Hermes-native tools (memory_stack_status, memory_stack_route, memory_stack_focus_brief, memory_stack_regress) | Claude Code CLI + Cowork — same memory state loads into both surfaces per Rick Mulready’s Memory 2.0 walkthrough |
| Cost model | Token-priced; mitigated by ~95% prompt-cache hit rate + planned batch-style 50% discount + model swap (Opus 4.7 ↔ Sonnet 4.6) + explicit token budgeting | Operator-paid — runs on your own infra against any model backend; MemoryKit adds zero hosted cost (it’s a local Python plugin); GEPA evolution roughly $2-10/run per the topic infographic | Bundled into Claude Code session cost; markdown files are tiny, so context is the only meaningful spend; index size discipline (~200 chars per line) is the operating constraint |
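The Consolidation row says autodream converts "yesterday" into dated entries. A minimal sketch of that idea follows, assuming each memory line carries the date it was written; the function name `date_relative_terms` and the two-word vocabulary are illustrative assumptions, not Claude Code's actual implementation.

```python
import re
from datetime import date, timedelta

# Hypothetical vocabulary: relative word -> days before the write date.
_RELATIVE_OFFSETS = {"today": 0, "yesterday": 1}


def date_relative_terms(text: str, written_on: date) -> str:
    """Replace 'today' / 'yesterday' with ISO dates anchored at written_on."""

    def _replace(match: re.Match) -> str:
        offset = _RELATIVE_OFFSETS[match.group(0).lower()]
        return (written_on - timedelta(days=offset)).isoformat()

    return re.sub(r"\b(today|yesterday)\b", _replace, text, flags=re.IGNORECASE)


note = date_relative_terms("Deployed yesterday; retro is today.", date(2026, 5, 10))
print(note)  # Deployed 2026-05-09; retro is 2026-05-10.
```

The point of the rewrite is that a memory entry stays correct when it is read weeks later, which a relative word cannot guarantee.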
Three axes of variation
Storage shape is the first split. Anthropic gives the model a filesystem and lets it decide the directory layout — files come out organized into subdirectories the model chose. Hermes layers three distinct stores (Core Memory in context / Session Search SQLite / pluggable external provider) and MemoryKit adds two more retrieval-oriented stores on top (QMD wiki / entity graph). Auto Memory is flatter than either — a directory of markdown files keyed by a single index file. The trade-off is legibility versus structure: a flat directory is easy to grep but harder to route across; a tiered model gives you obvious retrieval lanes but requires discipline to keep them aligned.
Write discipline is the second. Dreaming’s input is model-curated and the dreamed output store is model-written, with operator steering via the dream prompt. Hermes splits writes — the agent owns Tier 1 + Tier 2, the operator owns Tier 3 + QMD, and MemoryKit’s docs/skills-and-memory.md provides a written rule for when a fact gets promoted up the stack. Auto Memory’s writes happen on explicit human approval (“save this as a feedback entry”), so the human is in the loop on every persistent fact. The trade-off is autonomy versus auditability: model-driven writes scale faster but can drift; human-approved writes stay accurate but require the operator to be present.
Consolidation strategy is the third. Dreaming is the only architecture where consolidation is a first-class explicit primitive — a job with a model, inputs, outputs, observability, and cost levers. Hermes does it as a cron and (with MemoryKit) as a regression harness on retrieval. Auto Memory does it via an autodream pass between sessions. The trade-off is observability versus simplicity: an explicit batch job lets you debug consolidation in the console at the cost of operator complexity; a cron is simple but opaque; autodream is invisible but you have no surface to debug if memory quality degrades.
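The clone-then-write shape of dreaming can be sketched locally. This is an illustration of the non-destructive pattern only, assuming a store is a directory of markdown files and that "deduplication" means dropping repeated lines; `consolidate_non_destructive` is a hypothetical name and none of this is Anthropic's API.

```python
import shutil
from pathlib import Path


def consolidate_non_destructive(input_store: Path, output_store: Path) -> int:
    """Clone input_store into output_store, then dedupe lines in the clone.

    Returns the number of duplicate lines dropped. The input store is only read.
    """
    # Clone first, so every later write targets the copy.
    shutil.copytree(input_store, output_store)

    removed = 0
    seen: set[str] = set()
    for md_file in sorted(output_store.rglob("*.md")):
        kept = []
        for line in md_file.read_text(encoding="utf-8").splitlines():
            key = line.strip().lower()
            if key and key in seen:
                removed += 1  # same fact already kept earlier in the store
                continue
            if key:
                seen.add(key)
            kept.append(line)
        md_file.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return removed
```

Because the input directory is never written, a bad consolidation run can be discarded by deleting the output store, which is the property the in-place approaches give up.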
When to pick which
- If you’re building on Claude Managed Agents and your workflow spans multiple sessions per user → Anthropic memory stores + scheduled dreaming. The explicit primitives give you consolidation observability the other two architectures do not offer, and the cost story is plausible at scale (~95% cache hit + a planned 50% batch discount).
- If you’re self-hosting an agent and want maximum control over the storage backend → Hermes three-tier + (optionally) MemoryKit. The external-memory tier lets you point at Obsidian / Notion / Roam / Redis / Chroma without re-architecting; MemoryKit adds the routing + regression layer if you’ve outgrown vanilla Hermes recall. Caveat: MemoryKit was 3 days old at ingest with 1 star — track adoption before depending on it.
- If you’re working in Claude Code on a single project and want zero-config persistence → Auto Memory. No setup, no batch jobs, no external infra. The save-this-to-memory loop is the entire UX. The trade-off is no cross-project sharing and no consolidation surface to debug.
- If you need cross-organization or cross-agent memory sharing → none of the three solves this cleanly yet. Anthropic states memory stores are per-org but multiple per org; cross-org sharing is not addressed. Hermes profiles can share via Tier 3 external providers but you wire that yourself. Auto Memory is per-project by design.
The deeper pattern — what they all agree on
Filesystem as the memory interface. Anthropic mounts memory stores as a filesystem so the model can bash and grep. Hermes uses markdown under ~/.hermes/. Auto Memory uses markdown under .claude/projects/<project>/memory/. Three independent vendors converged on the same retrieval primitive because the model already knows how to navigate a filesystem — a bespoke memory API is just friction. The same logic that drove Karpathy’s LLM-wiki pattern is operating here: text-on-disk is the lowest-friction substrate for an LLM to work in.
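To make the "filesystem is the retrieval primitive" point concrete, here is a rough Python equivalent of what the model does with grep over a memory directory. The function name `grep_memory` and the choice of case-insensitive substring matching are assumptions for illustration; none of the three vendors ships this helper.

```python
from pathlib import Path


def grep_memory(store: Path, needle: str) -> list[tuple[str, int, str]]:
    """Case-insensitive substring search over every markdown file in a store.

    Returns (relative path, line number, stripped line) for each matching line.
    """
    hits = []
    lowered = needle.lower()
    for md_file in sorted(store.rglob("*.md")):
        lines = md_file.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if lowered in line.lower():
                hits.append((md_file.relative_to(store).as_posix(), lineno, line.strip()))
    return hits
```

Nothing here is memory-specific: the same dozen lines work on a Hermes profile directory or an Auto Memory folder, which is the convergence the paragraph describes.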
Versioning + audit-ability is non-negotiable. Anthropic versions every edit and exposes the API to inspect prior versions. Hermes stores everything as markdown so git is the version surface. Auto Memory is also markdown, also git-able. No vendor shipped a memory system you cannot audit by hand — the trust model assumes the operator (or a regulator, or a curious user) might want to read what the agent thinks it knows.
Consolidation is async and separate from write. Across all three, write-time is cheap and consolidation is the expensive step. Dreaming is a separate job. Hermes runs nightly. Auto Memory’s autodream is a between-sessions background pass. None of them try to do consolidation during live agent work — the latency budget for an interactive turn cannot absorb a full memory rewrite. This is structural: an agent that paused mid-response to dedupe its memory would be unusable.
Per-user / per-project / per-org boundary is a first-class concern. Anthropic exposes the boundary as a parameter (memory store ID, choose your own scope). Hermes scopes per-agent profile under ~/.hermes/ and supports multi-agent isolation per Nate Herk’s course. Auto Memory is keyed to the project folder. The boundary is encoded into the file path or resource ID, so memory does not cross boundaries by accident: sharing happens only when an operator deliberately wires it in, because the storage path itself is the boundary.
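A path only works as a boundary if every read and write is resolved against it. The sketch below shows one way to enforce that; `resolve_in_scope` is a hypothetical helper, not part of any vendor's tooling, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_in_scope(scope_root: Path, relative_path: str) -> Path:
    """Resolve relative_path under scope_root, rejecting anything that escapes it."""
    root = scope_root.resolve()
    candidate = (root / relative_path).resolve()
    # "../" segments and absolute paths both land outside root after resolve().
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{relative_path!r} escapes memory scope {root}")
    return candidate
```

With a check like this in the write path, a request for `../other-project/memory/notes.md` fails loudly instead of silently touching a neighbouring namespace.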
Where the architectures diverge — and what it implies
Consolidation phase explicitness. Dreaming is a first-class primitive you can schedule, inspect, debug, and budget. The Hermes cron is operator-owned and largely opaque (MemoryKit’s regression harness changes that — it’s the closest thing to dreaming in the Hermes world). Auto Memory’s autodream is a between-sessions pass you cannot observe directly. Implication: if you’re going to depend on memory consolidation, Anthropic gives you the only architecture where the consolidation step is itself an observable artifact. The others ask you to trust the loop.
Runtime ownership. Dreaming runs in Anthropic’s hosted plane (you pay per dream job, Anthropic operates the harness). Hermes runs entirely on your infra. Auto Memory runs in the Claude Code CLI process on your machine. Implication: the architecture you pick is also a deployment-and-trust decision. Anthropic memory stores require trusting their hosted runtime with your agent’s accumulated state; Hermes lets you keep memory entirely off the vendor’s machines; Auto Memory keeps memory inside your developer workstation. This maps cleanly to the same dimension visible in Managed Agents self-hosted sandboxes — the trust boundary is the real product decision.
Queryable knowledge base vs append log. MemoryKit’s RRF router across LCM + native memory + QMD + entity graph treats memory as a knowledge base to be queried — you ask a question and get a cited focus brief. Anthropic memory stores are closer to an append log the model navigates — the model uses grep and cat, not a query API. Auto Memory is the same — markdown the model reads on demand. Implication: for high-volume retrieval-heavy workflows (research agents, customer-support agents with deep history), MemoryKit’s discipline matches the workload. For low-volume sticky-context workflows (project assistants, personal agents), the append-log shape is enough — and lighter.
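Reciprocal Rank Fusion itself is a small, well-known formula: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below implements the generic algorithm, not MemoryKit's router; the constant k = 60 is the value conventionally used in the literature, and the source names in the example are placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings into one list of (doc_id, score).

    A document missing from a ranking contributes nothing for that ranking.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by doc_id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Placeholder result lists, e.g. from keyword search, a wiki index, an entity graph.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "a"], ["b", "d"]])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

RRF needs only ranks, never raw scores, which is why it suits fusing retrieval lanes whose scoring scales are not comparable.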
Try It
- Pick the architecture that matches your trust model. Run through the Where the architectures diverge section and decide whose runtime you trust with your agent’s accumulated memory. Then pick the matching architecture.
- Run the Anthropic workshop repo’s recall test — bootstrap an agent, reproduce the no-memory base case, mount a memory store, then run the dreaming pipeline end-to-end. The cheapest way to feel the trade-offs is to do all three architectures’ “hello world” in one afternoon.
- Inspect Auto Memory in your current Claude Code project: /memory lists every index line. Anything wrong, stale, or duplicated → fix it now. The discipline is the same one CLAUDE.md uses: index entries stay under ~200 chars.
- Try MemoryKit’s promotion policy doc (docs/skills-and-memory.md) even if you don’t install the plugin. The written rule for when a fact gets promoted from LCM → native memory → docs → skill is the hardest discipline in any memory architecture, and having the rule written down helps regardless of vendor.
- Schedule a nightly memory job in whichever architecture you picked: a Claude Managed Agents dream cron, a Hermes nightly maintenance task, or a between-sessions Auto Memory cleanup. The async consolidation step is what stops the memory pile from slowly degrading.
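The ~200-character index discipline mentioned in the list above is easy to check mechanically. A minimal sketch, assuming the index is plain text with one entry per line; `long_index_lines` is a hypothetical helper and it makes no assumption about the entry syntax itself.

```python
def long_index_lines(index_text: str, max_chars: int = 200) -> list[tuple[int, int]]:
    """Return (line_number, length) for each non-blank line longer than max_chars."""
    return [
        (lineno, len(line))
        for lineno, line in enumerate(index_text.splitlines(), start=1)
        if line.strip() and len(line) > max_chars
    ]
```

Feed it the text of the MEMORY.md file under .claude/projects/<project>/memory/ and trim whatever it reports; an empty list means the index is within budget.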
Open Questions
- No public benchmark across vendors. None of the three architectures has been measured against the same eval suite. The MemoryKit 100/100 A+ score is self-reported. Anthropic gave no citation-precision numbers. Auto Memory has no third-party benchmark. The first vendor-neutral eval would change the picture materially.
- Cost ceiling at production scale. Dreaming’s 95% cache hit + 50% batch discount sounds plausible for ~100-transcript dreams; cost at 1,000+ sessions/day is unknown. Hermes operator cost is whatever your hosted-LLM bill plus infra adds up to. Auto Memory’s bundled cost is invisible to operators. None of the three has published a dollar-per-million-memories number.
- Concurrent-write semantics across parallel sessions. Anthropic memory stores are versioned, but conflict resolution when two sessions write the same file simultaneously is not covered in the workshop. Hermes Tier 1 + Tier 2 writes from concurrent agents — semantics unspecified. Auto Memory in a multi-developer project — also unspecified.
- Wiki-pattern convergence. Three architectures converging on filesystem-as-memory is the same convergence that drove Karpathy’s LLM-wiki pattern and synthadoc. Open question: do these all evolve into the same primitive (markdown-on-disk + hybrid search + LLM-driven consolidation), or does the runtime-ownership split keep them architecturally distinct?
- Auto Memory growth degradation. The MEMORY.md index in this very project hit 26.8KB and was flagged for over-long index entries. At what file count or KB count does Auto Memory’s flat-directory + single-index shape stop working? No vendor has published a graceful-degradation curve.
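The cost-ceiling question above can at least be framed as arithmetic. The sketch below is a back-of-envelope model under loud assumptions: cached input tokens are billed at some fraction of the normal input price (the 0.1 used here is a placeholder, not a quoted rate), the batch discount applies on top of caching, and output tokens are ignored. None of these interactions is confirmed by the source.

```python
def effective_input_cost_multiplier(
    cache_hit_rate: float,
    cached_price_ratio: float,
    batch_discount: float,
) -> float:
    """Fraction of the uncached, undiscounted input-token price actually paid.

    Assumes caching and the batch discount stack multiplicatively (an assumption).
    """
    blended = cache_hit_rate * cached_price_ratio + (1.0 - cache_hit_rate)
    return blended * (1.0 - batch_discount)


# 95% cache hits and a 50% discount come from the text; 0.1 is a placeholder ratio.
multiplier = effective_input_cost_multiplier(0.95, 0.1, 0.5)
print(round(multiplier, 4))  # 0.0725
```

Under these assumptions input cost falls to roughly 7% of list price, but the figure is only as good as the placeholder ratio and the stacking assumption, so treat it as a template to fill in with published rates.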
Related
- Agents That Remember — Managed Agents Memory Stores + Dreaming — Kevin’s Code with Claude London workshop introducing both primitives
- Hermes MemoryKit — 8-Layer Memory Stack — duclamvan’s community extension of Hermes’s three-tier model
- Auto Memory — Claude Code’s Persistent Cross-Session Memory — file-based per-project memory layer
- Hermes Agent — topic landing — covers the base three-tier memory model (Core Memory / Session Search / External Providers)
- Managed Agents production (Jess + Lance) — primitive deep-dive on the Agent / Environment / Session stack that memory stores layer onto
- Context Management in Claude Code — sibling concern at the single-session level; persistent memory extends the same problem across sessions
- CLAUDE.md File Primer — sibling layer to Auto Memory; CLAUDE.md is prescriptive rules, Auto Memory is descriptive harvested facts