Source: raw/3_Claude_Memory_Systems_to_Get_You_Ahead_of_99%_of_People.md — YouTube tutorial rFWxRZ5D-lM (creator inferred to be the Agentic Academy / Ben channel based on identical “skill systems” terminology + commercial pitch present across the skill-chaining and five-skill AIOS videos — see Open Questions).
Side-by-side decomposition of three Claude Code memory systems against three load-bearing questions every memory system must answer — storage (when/how does information get written?), injection (when/how does it get loaded into context?), and recall (when/how does past information get retrieved?). The video’s thesis: don’t pick one — combine Claude Code automemory + memarch’s vector-DB capture-everything + Hermes’s curated memory.md + user.md + soul.md frozen snapshot, with memory.md acting as a level-0 cheap context check before falling through to memarch’s three-tier retrieval. Refines and extends prior coverage in memory-and-dreaming-self-learning-agents-talk and Hermes Agent.
Key Takeaways
The three questions every memory system must answer
- Storage — when a user says something worth remembering (“our landing page is example.com/scrapes”, “we’re using Stripe not PayPal”), how does it get saved durably and reliably?
- Injection — how does important context get pushed into Claude’s window at session start, without bloating the prompt?
- Recall — when the user asks about something months old (“what did we decide about pricing last Tuesday?”), how does the system find it?
The three systems take very different approaches on each axis.
System 1 — Claude Code automemory (built-in)
- Storage: Silent auto-detection during conversations. Writes to per-project
.claude/projects/<id>/memory/MD files with amemory.mdindex pointing to them. If the same preference appears 3+ times across projects, it gets promoted to~/.claude/memory/. - Inspect locally: Run
/memoryin the Claude Code terminal — shows user memory (CLAUDE.md), project memory (CLAUDE.md), or the automemory folder. - Injection: Default
CLAUDE.mdinjection at session start + apre-tool-usehook that scans thememory.mdindex and pulls relevant files into context before tool calls. - Recall: Weak. Checks automemory files; if not there, it’s lost. The
--resumeflag works only if you know the session ID. - Capacity: “Barely saves a thing.” Selective — only triggers on what Claude thinks is important.
System 2 — memarch (open-source, vector-DB, capture-everything)
- Storage: Claude Code stop hook fires after every turn → calls Haiku to summarize the turn into bullets → appends to a dated memory file with session anchors. Periodically runs
memarch index(or manually) → chunks → hashes → embeds → stores in a local Mil vector DB on CPU (zero API cost). - Source-of-truth: markdown files. The vector DB is rebuildable from them.
- Injection: None — relies on Claude Code’s default
CLAUDE.md+memory.mdinjection. memarch is purely a storage + recall library. - Recall (the three-tier system, progressive disclosure):
memarch search— convert query to vectors, find semantic matches in the vector DB. Hybrid dense vectors (meaning) + BM25 (keyword). Returns closest matches.memarch expand— if tier 1 isn’t enough, returns more context + metadata + summary around each match.- Raw session dialogue — if still insufficient, retrieves full session transcripts. Token-heavy but exhaustive.
- Strength: Excellent for “find a needle in 6 months of conversations by meaning, not keyword.”
- Weakness: No injection layer — always hits the database, slower than checking local context.
System 3 — Hermes (curated agent-controlled memory)
- Storage: Agent has
add/replace/removetools on three files —memory.md(environment + things done),user.md(user profile, voice, preferences),soul.md(third file mentioned, contents not detailed in transcript). All three are character-capped to enforce consolidation. Built-in deduplication checks before writes. - Background pruning: Every turn auto-saves raw transcript to a database. Every 7 days a “curator” runs to consolidate + prune the raw transcripts down into the three MD files.
- Injection: At session start, loads a frozen snapshot of
CLAUDE.md+memory.md+user.md+soul.md— ~1,300 tokens cached per session, not per message. Subsequent in-session changes get written to disk but don’t update the current session’s snapshot (apply on next session). - Recall: Smart level-0 check — first searches the already-loaded
memory.mdin context (zero cost, instant) before dropping to database keyword search. Database tier returns top 3 sessions by relevance + Gemini Flash summary. - Strength: Cheap retrieval for in-context-recall + curated consolidation.
- Weakness: Keyword-only recall (no semantic vectors), depends on agent deciding what’s worth saving.
The hybrid recommendation
Combine all three:
| Layer | Borrowed from | What it does |
|---|---|---|
| Storage — automatic raw | memarch stop hook | Captures every turn into vector DB + raw transcripts |
| Storage — curated | Hermes memory.md + user.md | Agent decides what matters, character-capped |
| Injection | Hermes frozen snapshot | CLAUDE.md + memory.md + user.md + soul.md at session start (~1,300 tokens) |
| Recall L0 | Hermes context check | Search already-loaded memory.md first (instant, zero-cost) |
| Recall L1 | memarch hybrid search | Vector + BM25 semantic search of the vector DB |
| Recall L2 | memarch expand | More context around each match |
| Recall L3 | memarch raw dialogue | Full session transcripts as last resort |
Cost: ~1,300 tokens per session for the injection layer. Worth it vs. retrieval-only setups, per the creator’s recommendation, “the increased performance from recent consolidated memories is worth it.”
Design principles surfaced
- Markdown as source of truth, vectors as derived state. Both memarch and Hermes treat MD as authoritative; vector DBs / databases are rebuildable.
- Progressive disclosure for recall. Don’t always hit the heaviest layer first. Level-0 (context) → Level-1 (semantic search) → Level-2 (expanded match) → Level-3 (raw dialogue).
- Character caps enforce consolidation. Hermes’s cap is a feature, not a constraint — it forces curation.
- Keep
CLAUDE.mdunder ~200 lines. The transcript explicitly calls this out as best practice for the system-prompt cost.
Open Questions
- Creator identity — the transcript names no channel. Style + commercial pitches (“Agentic Academy”, “AI accelerator”, “skill systems”) match the same author as skill-systems-orchestrator-child-pattern and ben-five-skill-aios-setup — likely Ben (channel attribution carry-forward from those companion videos).
- memarch repo URL and license — not linked in transcript. Mentioned by name but pointer to GitHub / install instructions not in the transcript.
soul.mdcontents — third Hermes memory file mentioned but never detailed. Likely a higher-order persona/values file (see hermes-user-stories for adjacent context).- Mil vector DB — referenced as the local CPU vector store. May be a typo for Milvus or another OSS vector DB. Verification needed.
Related
- memory-and-dreaming-self-learning-agents-talk — original Anthropic memory + dreaming framework reference
- _index — Hermes Agent topic root, including memory model context
- hermes-user-stories — concrete Hermes deployment patterns
- hermes-security-model — defense-in-depth around persistent state
- skill-systems-orchestrator-child-pattern — companion video by same creator, skill architecture
- ben-five-skill-aios-setup — companion video by same creator, full AIOS 5-skill bundle
- nate-herk-claude-code-operating-systems-course — alternative AIOS framing (3Ms + 4Cs, AIS-OS repo)
- _index — the Karpathy LLM-wiki pattern referenced in YT5’s optimizer skill
Try It
- Inspect your existing automemory. Run
/memoryin Claude Code → open the automemory folder. Note how sparse it is by default. This is the baseline. - Install memarch (once URL is verified) for storage + recall — capture-everything stop hook + local vector DB rebuildable from markdown.
- Adopt Hermes’s three-file pattern even without running Hermes — create
memory.md(environment),user.md(preferences), optionallysoul.md(higher-order values). Cap each at a character budget. Inject at session start viaCLAUDE.mdreferences. - Build the level-0 check. Before querying any vector DB or session log, instruct the agent (in
CLAUDE.md) to first scan the already-loadedmemory.md— instant zero-cost retrieval for ~30% of recall queries. - Run a 7-day curator. Schedule a routine (or Claude Routine, see claude-code-routines) that consolidates raw transcripts into the three MD files weekly. Mirrors Hermes’s curator pattern.
- Audit
CLAUDE.mdlength. Keep it under ~200 lines; offload detail tomemory.mdfiles that get pulled in by thepre-tool-usehook only when needed.