Source: raw/gh-star-rohitg00-agentmemory.md (gh-stars puller stub) + ai-research/rohitg00-agentmemory-readme-2026-05-30.md (WebFetch-derived README summary, fetched 2026-05-30) + raw/The_5-Tool_Fix_for_Claude_Code_s_Worst_Habits.md (operator demo, video hqcZZuvBUSY)

agentmemory (github.com/rohitg00/agentmemory, Apache-2.0, TypeScript, ~19.8K stars) is a persistent-memory server for AI coding agents. It listens to your agent sessions in the background, compresses what happened into searchable memory, and injects the relevant pieces back into future sessions — so you stop re-explaining your project’s architecture, conventions, and past decisions every time you start fresh. It clears the wiki’s repo bar on the substance that usually trips up high-star projects: Apache-2.0 license, 950+ passing tests, a reproducible benchmark harness, and 46 releases of maintenance history.

Key Takeaways

  • What it replaces: the CLAUDE.md + rules/ folder pattern, but as a living layer. Instead of hand-maintaining static memory files, agentmemory accumulates and decays memory automatically from session activity.
  • Four memory tiers (the same shape Hermes and OpenClaw use, productized as a drop-in server): working (raw tool-use observations), episodic (compressed session summaries), semantic (extracted facts/patterns about your project), procedural (workflows + decision patterns). Memories decay — infrequently-used ones lose retrieval priority, so stale context (e.g. a framework quirk fixed in a newer model) auto-evicts.
  • Retrieval is hybrid, not just vector. Three indexes — BM25 keyword, vector similarity, and knowledge-graph traversal — fused via Reciprocal Rank Fusion, then the top results are injected at SessionStart. This is the same RRF + multi-stream pattern the wiki’s stronger memory toolkits converge on.
  • Pipeline: Capture (hooks intercept tool events, filter secrets) → Compress (LLM summarization, or synthetic BM25 compression with no LLM) → Index (3 streams) → Retrieve (RRF + SessionStart injection).
  • Benchmarks (project’s own, reproducible): 95.2% R@5 on LongMemEval-S (ICLR 2025, 500 questions) vs grep’s 86.2%; ~170K tokens/year vs ~650K for LLM-summarized competitors; 2.2× precision over grep; 14 ms p50. Harness in eval/README.md, scorecards in docs/benchmarks/. ^[inferred — the “#1 persistent memory” claim rests on selected metrics; competitors lead on some dimensions, per the repo’s own honesty note]
  • No infra tax. Built on an in-house iii function-trigger-worker primitive that replaces Express/SQLite/pm2/Prometheus; only dependency is SQLite + the iii engine. Deployment templates for Fly.io, Railway, Render, Coolify.
  • Works across 15+ agents via MCP, native plugins, or REST API; Claude Code, Codex, Cursor, and Copilot are among those named. Real-time viewer at localhost:3113; server on 3111.
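
The fusion step in the retrieval bullet can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion in general, not agentmemory's code: the function name, the memory ids, and the constant k=60 (a commonly used default for RRF) are assumptions, and the project's actual parameters are not stated in the source.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of ids into one ranking.

    Each list contributes 1 / (k + rank) for every id it contains,
    so ids ranked well by several indexes rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]

# Hypothetical per-index results for one query (ids are made up).
bm25_hits = ["m3", "m1", "m7"]
vector_hits = ["m1", "m4", "m3"]
graph_hits = ["m1", "m9"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits], top_n=3)
print(fused)  # ['m1', 'm3', 'm4']
```

Because RRF only looks at ranks, the three indexes never need comparable score scales, which is a large part of why it suits a BM25 + vector + graph mix.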

Why it matters

The wiki has tracked the “memory is the agent bottleneck” thesis from several angles: Memory & Dreaming, Managed Agents memory stores, and the build-your-own memory architecture comparison. agentmemory is the off-the-shelf, benchmarked, multi-agent answer in that space: you don’t design a memory stack, you npm install one and connect it. It is architecturally a sibling to Reflexio and AutoAgent (harness extracts a reusable artifact from past runs) and to Hermes MemoryKit (RRF router + tiered memory). Unlike those early/solo repos, it reports reproducible results on a peer-reviewed benchmark (LongMemEval-S) and ships 950+ tests, which is why it clears the strict repo bar where similar high-star projects get deferred.

Schema discipline — the structure counterpoint (Graphiti / Zep)

agentmemory’s lever is decay — solving what to forget. A complementary school argues the harder problem is what to never store, and that the fix is structure, not retrieval. ^[source: @akshay_pachaar thread, raw/x-bookmarks-recent-digest-2026-05-31.md, promoting Zep AI’s open-source Graphiti] The failure mode it targets: the default pipeline hands an LLM raw text and lets it pick entity types, labels, and relationships on its own — so the knowledge graph “behaves like an expensive vector store,” entity types collapse into generic labels, and every relationship flattens into a single RELATES_TO. The graph holds the data but no query reaches it with precision.

The proposed fix is to constrain the output space before generation, not after:

  • Entities define what the agent is allowed to remember — Pydantic models with typed fields + descriptive docstrings replace the LLM’s guesswork with domain vocabulary.
  • Edges define how things connect — source/target constraints mean an invalid relationship (e.g. Project → Competitor with no such edge in the schema) simply cannot form.
  • Temporal resolution separates what was true from what is true — fact resolution invalidates outdated edges while preserving history, so the graph never silently serves stale state. (This is the same hazard agentmemory’s decay addresses, reached from the opposite direction.)
  • Heuristic: ~10 entity types, 10 edge types, 10 fields per type — start with 3–4 of each and expand only when retrieval fails. Forces modeling the 80% that matters instead of chasing completeness.
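
To make "constrain the output space before generation" concrete, here is a minimal sketch of typed entities plus source/target edge constraints. It deliberately uses stdlib dataclasses rather than Pydantic to stay dependency-free, and it is not Graphiti's API: the entity types, edge names, and `EDGE_SCHEMA` table are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    """A human collaborator on the codebase (hypothetical entity type)."""
    name: str
    role: str

@dataclass(frozen=True)
class Project:
    """A software project the agent works on (hypothetical entity type)."""
    name: str
    language: str

# Allowed edges: edge type -> (required source type, required target type).
EDGE_SCHEMA = {
    "MAINTAINS": (Person, Project),
    "DEPENDS_ON": (Project, Project),
}

@dataclass(frozen=True)
class Edge:
    kind: str
    source: object
    target: object

    def __post_init__(self):
        # Reject edge types that are not in the schema at all.
        if self.kind not in EDGE_SCHEMA:
            raise ValueError(f"unknown edge type: {self.kind}")
        # Reject endpoints whose types violate the source/target constraint.
        src_type, dst_type = EDGE_SCHEMA[self.kind]
        if not isinstance(self.source, src_type) or not isinstance(self.target, dst_type):
            raise ValueError(
                f"{self.kind} must connect {src_type.__name__} -> {dst_type.__name__}"
            )

alice = Person(name="Alice", role="maintainer")
repo = Project(name="agentmemory", language="TypeScript")

ok = Edge("MAINTAINS", alice, repo)      # valid: Person -> Project
# Edge("MAINTAINS", repo, alice)         # would raise ValueError
# Edge("RELATES_TO", alice, repo)        # would raise ValueError: not in schema
```

The point of the design is that a generic catch-all such as RELATES_TO has nowhere to live: anything the extractor proposes outside the schema fails at construction time instead of polluting the graph.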

Zep AI’s Graphiti packages this as a fully open-source temporal knowledge-graph library (Pydantic-based ontology, schema-guided extraction at both entity- and fact-extraction points, entity/fact resolution, temporal windowing). It is the named “schema-first” alternative to agentmemory’s “capture-then-decay” approach — worth comparing directly if your memory needs domain-specific query precision rather than general session recall.
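
The temporal-resolution idea (invalidate outdated facts, keep the history) can be sketched independently of any library. The following is an assumption-laden toy, not Graphiti's implementation: `Fact`, `FactStore`, and the `valid_at` / `invalid_at` field names are illustrative choices, and contradiction detection is reduced to "same subject and predicate, different object".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means "still true"

class FactStore:
    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, subject: str, predicate: str, obj: str, at: datetime) -> None:
        """Record a fact; close out any open fact it contradicts."""
        for fact in self.facts:
            if (fact.subject, fact.predicate) != (subject, predicate):
                continue
            if fact.invalid_at is not None:
                continue  # already historical
            if fact.obj == obj:
                return  # same fact is already current; nothing to do
            fact.invalid_at = at  # superseded, but kept for history
        self.facts.append(Fact(subject, predicate, obj, valid_at=at))

    def current(self, subject: str, predicate: str) -> List[str]:
        return [f.obj for f in self.facts
                if (f.subject, f.predicate) == (subject, predicate)
                and f.invalid_at is None]

    def as_of(self, subject: str, predicate: str, when: datetime) -> List[str]:
        return [f.obj for f in self.facts
                if (f.subject, f.predicate) == (subject, predicate)
                and f.valid_at <= when
                and (f.invalid_at is None or when < f.invalid_at)]

store = FactStore()
store.assert_fact("api-service", "uses_framework", "Express", datetime(2025, 1, 1))
store.assert_fact("api-service", "uses_framework", "Fastify", datetime(2025, 6, 1))

print(store.current("api-service", "uses_framework"))                      # ['Fastify']
print(store.as_of("api-service", "uses_framework", datetime(2025, 3, 1)))  # ['Express']
```

Nothing is deleted: the superseded fact stays queryable by date, which is the difference between this approach and decay-based eviction.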

Implementation

  • Tool/Service: agentmemory (github.com/rohitg00/agentmemory), npm package @agentmemory/agentmemory, homepage agent-memory.dev.
  • Setup: npm install -g @agentmemory/agentmemory → run agentmemory (server on :3111) → agentmemory connect claude-code (wires the MCP server). Viewer at localhost:3113.
  • Cost: free / open-source (Apache-2.0). LLM-compression mode incurs model cost; the synthetic BM25 compression mode runs with no LLM if you want zero marginal cost.
  • Integration notes: MCP, native plugin, or REST API; 15+ agents supported. Memory decay is automatic — no manual pruning. Secrets are filtered at the capture hook, but treat the local memory store as sensitive (it mirrors your session content).
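
One way to picture automatic decay is a retrieval priority that halves for every fixed period a memory goes unused. The source only says that infrequently used memories lose retrieval priority, so everything below is an assumption for illustration: the exponential form, the 30-day half-life, the function name, and the sample memories.

```python
def decayed_priority(base_score: float, days_idle: float,
                     half_life_days: float = 30.0) -> float:
    """Halve a memory's retrieval priority for every half-life it sits unused."""
    if days_idle < 0 or half_life_days <= 0:
        raise ValueError("days_idle must be >= 0 and half_life_days must be > 0")
    return base_score * 0.5 ** (days_idle / half_life_days)

# Hypothetical memories: name -> (base relevance score, days since last retrieval).
memories = {
    "uses pnpm workspaces": (0.9, 2),
    "old build-tool workaround": (0.9, 120),
}

ranked = sorted(
    memories,
    key=lambda name: decayed_priority(*memories[name]),
    reverse=True,
)
print(ranked[0])  # uses pnpm workspaces
```

With equal base scores, the memory idle for 120 days drops to one sixteenth of its original priority, so it falls out of the injected top results without anyone pruning it by hand.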

Try It

  1. Install and connect to one project’s Claude Code: npm i -g @agentmemory/agentmemory, agentmemory, then agentmemory connect claude-code. Run a normal session and watch the localhost:3113 viewer populate.
  2. Run the same recurring task across two fresh sessions (e.g. “add a setting + persist it”) and check whether the second session skips the re-discovery work the first did — that’s the value test.
  3. If you want zero marginal cost, configure the BM25 (no-LLM) compression mode and compare retrieval quality against the LLM mode on your own repo.
  4. Vet before depending on it: clone the eval/ harness and reproduce a slice of the LongMemEval-S numbers rather than taking the “#1” claim at face value.
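
When reproducing a slice of the numbers, it helps to know what the metric counts. Below is a minimal sketch of one common "hit-rate" reading of R@k: a question counts as a hit if any relevant memory id appears in the top k retrieved. Whether the project's harness uses this exact definition is an assumption to check against eval/README.md; the function name and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

def recall_at_k(results: Sequence[Tuple[List[str], List[str]]], k: int = 5) -> float:
    """Fraction of questions with at least one relevant id in the top-k retrieved.

    `results` holds one (retrieved_ids_in_rank_order, relevant_ids) pair per question.
    """
    if not results:
        return 0.0
    hits = 0
    for retrieved_ids, relevant_ids in results:
        if set(retrieved_ids[:k]) & set(relevant_ids):
            hits += 1
    return hits / len(results)

# Two made-up questions: the first is answered within the top 5, the second is not.
sample = [
    (["s1", "s9", "s4", "s2", "s7", "s3"], ["s4"]),
    (["s8", "s6", "s5", "s2", "s1", "s3"], ["s3"]),
]
print(recall_at_k(sample, k=5))  # 0.5
```

Running the same function over your own repo's sessions, with both compression modes, gives a like-for-like number to set beside the published 95.2%.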

Open Questions

  • The “#1 persistent memory … based on real-world benchmarks” claim rests on LongMemEval-S R@5 + token-efficiency; independent reproduction (and head-to-heads vs Mem0/Zep/Letta on the same harness) would confirm or temper it.
  • The iii runtime primitive (replacing Express/SQLite/pm2/Prometheus) is novel and load-bearing — its maturity and failure modes under concurrent multi-agent load are unverified here.
  • Privacy posture of the local store under team/shared use (the capture hook filters secrets, but the store mirrors session content) — worth a closer look before connecting it to client-data sessions.