Claude Code Memory Architectures Compared — Built-in vs memarch vs Hermes (Hybrid Recommendation)

Source: raw/3_Claude_Memory_Systems_to_Get_You_Ahead_of_99%_of_People.md — YouTube tutorial rFWxRZ5D-lM (creator inferred to be the Agentic Academy / Ben channel based on identical “skill systems” terminology + commercial pitch present across the skill-chaining and five-skill AIOS videos — see Open Questions).

Side-by-side decomposition of three Claude Code memory systems against three load-bearing questions every memory system must answer — storage (when/how does information get written?), injection (when/how does it get loaded into context?), and recall (when/how does past information get retrieved?). The video’s thesis: don’t pick one — combine Claude Code automemory + memarch’s vector-DB capture-everything + Hermes’s curated memory.md + user.md + soul.md frozen snapshot, with memory.md acting as a level-0 cheap context check before falling through to memarch’s three-tier retrieval. Refines and extends prior coverage in memory-and-dreaming-self-learning-agents-talk and Hermes Agent.

Key Takeaways

The three questions every memory system must answer

Storage — when a user says something worth remembering (“our landing page is example.com/scrapes”, “we’re using Stripe not PayPal”), how does it get saved durably and reliably?
Injection — how does important context get pushed into Claude’s window at session start, without bloating the prompt?
Recall — when the user asks about something months old (“what did we decide about pricing last Tuesday?”), how does the system find it?

The three systems take very different approaches on each axis.

System 1 — Claude Code automemory (built-in)

Storage: Silent auto-detection during conversations. Writes to per-project .claude/projects/<id>/memory/ MD files with a memory.md index pointing to them. If the same preference appears 3+ times across projects, it gets promoted to ~/.claude/memory/.
Inspect locally: Run /memory in the Claude Code terminal — shows user memory (CLAUDE.md), project memory (CLAUDE.md), or the automemory folder.
Injection: Default CLAUDE.md injection at session start + a pre-tool-use hook that scans the memory.md index and pulls relevant files into context before tool calls.
Recall: Weak. Checks automemory files; if not there, it’s lost. The --resume flag works only if you know the session ID.
Capacity: “Barely saves a thing.” Selective — only triggers on what Claude thinks is important.

System 2 — memarch (open-source, vector-DB, capture-everything)

Storage: Claude Code stop hook fires after every turn → calls Haiku to summarize the turn into bullets → appends to a dated memory file with session anchors. Periodically runs memarch index (or manually) → chunks → hashes → embeds → stores in a local Mil vector DB on CPU (zero API cost).
Source-of-truth: markdown files. The vector DB is rebuildable from them.
Injection: None — relies on Claude Code’s default CLAUDE.md + memory.md injection. memarch is purely a storage + recall library.
Recall (the three-tier system, progressive disclosure):
1. memarch search — convert query to vectors, find semantic matches in the vector DB. Hybrid dense vectors (meaning) + BM25 (keyword). Returns closest matches.
2. memarch expand — if tier 1 isn’t enough, returns more context + metadata + summary around each match.
3. Raw session dialogue — if still insufficient, retrieves full session transcripts. Token-heavy but exhaustive.
Strength: Excellent for “find a needle in 6 months of conversations by meaning, not keyword.”
Weakness: No injection layer — always hits the database, slower than checking local context.

System 3 — Hermes (curated agent-controlled memory)

Storage: Agent has add / replace / remove tools on three files — memory.md (environment + things done), user.md (user profile, voice, preferences), soul.md (third file mentioned, contents not detailed in transcript). All three are character-capped to enforce consolidation. Built-in deduplication checks before writes.
Background pruning: Every turn auto-saves raw transcript to a database. Every 7 days a “curator” runs to consolidate + prune the raw transcripts down into the three MD files.
Injection: At session start, loads a frozen snapshot of CLAUDE.md + memory.md + user.md + soul.md — ~1,300 tokens cached per session, not per message. Subsequent in-session changes get written to disk but don’t update the current session’s snapshot (apply on next session).
Recall: Smart level-0 check — first searches the already-loaded memory.md in context (zero cost, instant) before dropping to database keyword search. Database tier returns top 3 sessions by relevance + Gemini Flash summary.
Strength: Cheap retrieval for in-context-recall + curated consolidation.
Weakness: Keyword-only recall (no semantic vectors), depends on agent deciding what’s worth saving.

The hybrid recommendation

Combine all three:

Layer	Borrowed from	What it does
Storage — automatic raw	memarch stop hook	Captures every turn into vector DB + raw transcripts
Storage — curated	Hermes `memory.md` + `user.md`	Agent decides what matters, character-capped
Injection	Hermes frozen snapshot	`CLAUDE.md` + `memory.md` + `user.md` + `soul.md` at session start (~1,300 tokens)
Recall L0	Hermes context check	Search already-loaded `memory.md` first (instant, zero-cost)
Recall L1	memarch hybrid search	Vector + BM25 semantic search of the vector DB
Recall L2	memarch expand	More context around each match
Recall L3	memarch raw dialogue	Full session transcripts as last resort

Cost: ~1,300 tokens per session for the injection layer. Worth it vs. retrieval-only setups, per the creator’s recommendation, “the increased performance from recent consolidated memories is worth it.”

Design principles surfaced

Markdown as source of truth, vectors as derived state. Both memarch and Hermes treat MD as authoritative; vector DBs / databases are rebuildable.
Progressive disclosure for recall. Don’t always hit the heaviest layer first. Level-0 (context) → Level-1 (semantic search) → Level-2 (expanded match) → Level-3 (raw dialogue).
Character caps enforce consolidation. Hermes’s cap is a feature, not a constraint — it forces curation.
Keep CLAUDE.md under ~200 lines. The transcript explicitly calls this out as best practice for the system-prompt cost.

Open Questions

Creator identity — the transcript names no channel. Style + commercial pitches (“Agentic Academy”, “AI accelerator”, “skill systems”) match the same author as skill-systems-orchestrator-child-pattern and ben-five-skill-aios-setup — likely Ben (channel attribution carry-forward from those companion videos).
memarch repo URL and license — not linked in transcript. Mentioned by name but pointer to GitHub / install instructions not in the transcript.
soul.md contents — third Hermes memory file mentioned but never detailed. Likely a higher-order persona/values file (see hermes-user-stories for adjacent context).
Mil vector DB — referenced as the local CPU vector store. May be a typo for Milvus or another OSS vector DB. Verification needed.

memory-and-dreaming-self-learning-agents-talk — original Anthropic memory + dreaming framework reference
_index — Hermes Agent topic root, including memory model context
hermes-user-stories — concrete Hermes deployment patterns
hermes-security-model — defense-in-depth around persistent state
skill-systems-orchestrator-child-pattern — companion video by same creator, skill architecture
ben-five-skill-aios-setup — companion video by same creator, full AIOS 5-skill bundle
nate-herk-claude-code-operating-systems-course — alternative AIOS framing (3Ms + 4Cs, AIS-OS repo)
_index — the Karpathy LLM-wiki pattern referenced in YT5’s optimizer skill

Try It

Inspect your existing automemory. Run /memory in Claude Code → open the automemory folder. Note how sparse it is by default. This is the baseline.
Install memarch (once URL is verified) for storage + recall — capture-everything stop hook + local vector DB rebuildable from markdown.
Adopt Hermes’s three-file pattern even without running Hermes — create memory.md (environment), user.md (preferences), optionally soul.md (higher-order values). Cap each at a character budget. Inject at session start via CLAUDE.md references.
Build the level-0 check. Before querying any vector DB or session log, instruct the agent (in CLAUDE.md) to first scan the already-loaded memory.md — instant zero-cost retrieval for ~30% of recall queries.
Run a 7-day curator. Schedule a routine (or Claude Routine, see claude-code-routines) that consolidates raw transcripts into the three MD files weekly. Mirrors Hermes’s curator pattern.
Audit CLAUDE.md length. Keep it under ~200 lines; offload detail to memory.md files that get pulled in by the pre-tool-use hook only when needed.

Jonathon's AI Wiki

Explorer

Claude Code Memory Architectures Compared — Built-in vs memarch vs Hermes (Hybrid Recommendation)

Key Takeaways

The three questions every memory system must answer

System 1 — Claude Code automemory (built-in)

System 2 — memarch (open-source, vector-DB, capture-everything)

System 3 — Hermes (curated agent-controlled memory)

The hybrid recommendation

Design principles surfaced

Open Questions

Try It

Graph View

Table of Contents

Backlinks

Jonathon's AI Wiki

Explorer

Claude Code Memory Architectures Compared — Built-in vs memarch vs Hermes (Hybrid Recommendation)

Key Takeaways

The three questions every memory system must answer

System 1 — Claude Code automemory (built-in)

System 2 — memarch (open-source, vector-DB, capture-everything)

System 3 — Hermes (curated agent-controlled memory)

The hybrid recommendation

Design principles surfaced

Open Questions

Related

Try It

Graph View

Table of Contents

Backlinks