Agents That Remember — Managed Agents Memory Stores + Dreaming (Kevin, Anthropic)

Source: Agents_that_remember.md — “Agents that remember” workshop by Kevin (engineer at Anthropic) at Code with Claude London 2026; YouTube https://www.youtube.com/watch?v=geUv4CjPpxI; transcript fetched 2026-05-23.

Hands-on workshop introducing two new Claude Managed Agents (CMA) primitives that solve the “agents are isolated by default” problem: memory stores (a persistent filesystem-like store mounted as a resource to sessions so agents can read and write across runs) and dreaming (an asynchronous batch process that organizes, deduplicates, fact-checks, and enriches a memory store over many sessions). Both are wired directly into the existing CMA primitives stack (agent + environment + session) and are visible in both the CLI and the CMA console. Together with sessions, they compose into three layers: ephemeral isolated runs, cross-session memory, and scheduled memory improvement.

Key Takeaways

Two new CMA primitives launched: memory stores and dreaming. Both are first-class CMA resources, observable in the console alongside agents, environments, and sessions.
A memory store is a persistent filesystem-like store. It mounts as a resource on sessions, and the model gets tools to read and write to it. The filesystem interface is deliberate: the model uses bash to explore, grep to keyword-search, and standard file ops to read and write. The workshop presents this as more powerful than a bespoke memory API because the model already knows these interfaces.
Multiple memory stores per organization. Boundaries are user-defined — per-user, per-workspace, per-domain, etc. There is no built-in opinion about how to partition them.
Memory store parameters: creation takes a name and an optional description. At session mount time you can set an access field (default read-write; read-only lets the agent read but not update) and pass a steering prompt that tells the agent what to read or write (e.g. “focus on long-term investment decisions”).
Memory files are versioned — every edit creates a new version, with API endpoints for inspection. The console exposes a filesystem viewer where humans can add, edit, or remove memory files manually (human-in-the-loop edits are first-class).
Without a memory store, two sessions in the same agent/environment do not share information. The workshop reproduces this base case first: session A is told something; session B has no access to it. Memory stores are what bridge them.
Dreaming is an asynchronous batch job. It takes (a) an input memory store and (b) a list of session transcript IDs to dream over (up to ~100 per job at launch; Anthropic is working to scale this further). The transcripts can be a daily window, a per-user slice, etc. — chosen by the caller.
Dreaming is non-destructive. It clones the input memory store into a new output memory store and writes only to the output. The input is never touched. After review the caller can retire the old input store to keep the per-org count manageable.
Dreaming runs a multi-agent harness on top of CMA primitives. The orchestrator spawns one sub-agent per input transcript and runs them in parallel. The harness is exhaustive by design — Claude looks over every transcript so it does not miss information. The dream job itself creates a CMA session you can click into for observability and debugging.
Model choice: Opus 4.7 for higher quality, Sonnet 4.6 for lower cost. Latency: minutes to hours depending on transcript count. Token cost is mitigated by an expected ~95% prompt-cache hit rate (since most processing is agentic over the same transcripts) plus a planned batch-API-style 50% discount for off-hours scheduling. Additional levers: model swap, prompt steering, explicit token budgeting.
Dreaming output: a slug-keyed index file for fast lookup, enriched files with back-filled details (dates, identifiers, schedules), deduplicated and reorganized memory files, and removal of stale information. The diff is shown in the console so a human can review.
Three composable layers: session (ephemeral isolated agent run, one conversation thread) → memory store (connects information across sessions) → dreaming (organizes, enriches, and de-stales memory at scale).

Details

The base-case problem

Today on CMA, when you create an agent + environment and spin up sessions, each session is isolated. The workshop demonstrates this with two sessions in the same agent: session A is told “the CMA talk yesterday covered multi-agent orchestration, outcomes, and memory; my notes are at this URL.” The model responds “thanks, noted.” Session B (same agent, same environment, no memory store) is then asked “what did I tell you about the CMA talk?” — the model responds “I do not have access to that information; here are ways I can help.” No transfer happens. This isolation limits CMA’s usefulness for any real-world workflow that spans more than one conversation.

Creating and mounting a memory store

CLI workflow walked through in the workshop:

Create the memory store. The required parameter is name (Kevin uses CWC memory); description is optional. The store appears in the CMA console under Manage Agents → Memory Stores with status active and an empty filesystem viewer.
Mount it on a session. When creating a session via the sessions API, pass the memory_store_id. Two optional fields are available at mount time:
- prompt — steers the agent on what to read and write (e.g., “focus on investment decisions for this user”).
- access — defaults to read-write; can be set to read-only to expose the store for retrieval but block writes.
First write. When the session receives an event with new information and a memory store is mounted, the model first reads the memory store to check for existing context (typically empty on the first run). If nothing is found, it writes the new information to a file in the store. In Kevin’s demo the model creates sessions.md and saves the workshop notes there.
Recall test. A second session is created with the same memory store. When asked about the CMA talk, the model first reads the memory store (visible in the session event timeline), uses grep to search for the CMA keyword, finds the prior session’s notes, and answers correctly.
Inspection. The console’s filesystem viewer shows the directory structure (Claude organizes files into subdirectories when it chooses to). Files are clickable, editable in place, and versioned — every edit creates a new version reachable via the API. Humans can also create new memory files manually.
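The create-and-mount steps above can be sketched as plain request payloads. This is a minimal sketch under stated assumptions, not the CMA SDK: the function names, the example store ID, and the string value `read_write` are hypothetical. Only `name`, `description`, `memory_store_id`, `prompt`, `access`, and `read_only` appear in these notes, and the exact wire format was not shown.

```python
from typing import Optional

# Assumed access values: the notes mention read-write (default) and read_only.
ACCESS_MODES = {"read_write", "read_only"}


def memory_store_payload(name: str, description: Optional[str] = None) -> dict:
    """Body for a hypothetical create-memory-store request."""
    if not name:
        raise ValueError("name is required")
    payload = {"name": name}
    if description is not None:
        payload["description"] = description
    return payload


def memory_mount(memory_store_id: str, prompt: Optional[str] = None,
                 access: str = "read_write") -> dict:
    """Mount entry to pass when creating a session (shape is assumed)."""
    if access not in ACCESS_MODES:
        raise ValueError(f"access must be one of {sorted(ACCESS_MODES)}")
    mount = {"memory_store_id": memory_store_id, "access": access}
    if prompt is not None:
        mount["prompt"] = prompt
    return mount


store = memory_store_payload("CWC memory", "Notes from Code with Claude")
mount = memory_mount(
    "memstore_123",  # hypothetical ID returned by the create call
    prompt="focus on investment decisions for this user",
    access="read_only",
)
print(store)
print(mount)
```

Validating `access` locally is a design convenience: a typo fails before any request is sent.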

The deliberate design choice: mount as a filesystem rather than a memory API. The model gets bash, grep, and standard file ops — interfaces it already knows well, with no domain-specific learning curve.
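To illustrate why a filesystem interface is enough for write-then-recall, here is a local simulation that uses a temporary directory in place of a mounted store. It is a sketch only: the helper names are hypothetical, and the real agent uses bash and grep inside its session rather than Python. The file name sessions.md follows Kevin’s demo.

```python
import tempfile
from pathlib import Path


def write_note(store: Path, relative_path: str, text: str) -> Path:
    """Append text to a file in the store, creating parent directories as needed."""
    target = store / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(text + "\n")
    return target


def grep_store(store: Path, keyword: str) -> list[tuple[str, str]]:
    """Case-insensitive keyword search; returns (relative path, matching line) pairs."""
    hits = []
    for path in sorted(store.rglob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if keyword.lower() in line.lower():
                hits.append((path.relative_to(store).as_posix(), line))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    # "Session A" writes what it was told.
    write_note(store, "sessions.md",
               "CMA talk covered multi-agent orchestration, outcomes, and memory.")
    # "Session B" starts with no context and searches the store by keyword.
    results = grep_store(store, "cma")

print(results)
```

The second step needs nothing beyond file reads and a substring match, which is the point of the design: no memory-specific query language has to be learned.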

Dreaming as a memory-improvement harness

When agents read and write to a memory store over time, they tend to dump information: every task may add a record, so the store grows without bound, accumulates duplicates, and goes stale. Dreaming is the workshop’s answer to this.

CLI workflow:

Create a dream job. Required fields: model (claude-opus-4-7 or claude-sonnet-4-6), memory_store_id (the input store), and session_ids (the list of transcripts to dream over — caller’s choice; daily windows of 10-20 sessions are a typical pattern, up to ~100 per job at launch). Optional: additional instructions appended to the default dream prompt (e.g., “always back-fill specific dates and identifiers”, or “organize files under this directory structure”).
Execution. The dream job status starts at pending then moves to running. The console shows the input memory store, a live token count, and — critically — the dream’s own CMA session, which you can click into to see the orchestrator + sub-agent activity in real time. This observability surface is the same one CMA already provides for normal sessions.
Harness shape. The orchestrator launches one sub-agent per input session. Each sub-agent has a system prompt describing what to extract and consolidate. The orchestrator’s job is to spin them up in parallel and manage progress. The harness is exhaustive by design: Claude is meant to look at every transcript so nothing is missed.
Non-destructive output. Dreaming clones the input memory store into a new output memory store and writes only to that clone. The input is untouched. After review, the caller can retire the old input store via API to keep per-org memory-store counts in check.
Completion. The console shows a diff of what dreaming did: typically a new index.md (slug-keyed lookup file for future agents — cheaper than wide grep), enriched versions of existing files with back-filled metadata, new files extracting structured content (e.g., an event_logistics.md with the full conference schedule), and removed stale entries.
Use the output store. Fetch the dream resource, read output_memory_store_id, and mount that on future sessions. The workshop demonstrates this: a fresh session asked “what sessions did I attend, what resources do I have links for, what follow-ups did I flag?” — with the dreamed output store mounted, the agent reads the index first, then the relevant memory files, and returns a much richer answer than the pre-dream session would have produced.
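The dream-job fields listed above can be sketched as a validated payload plus a helper that reads the output store from a fetched dream resource. This is a sketch under assumptions, not the real API. The names `model`, `memory_store_id`, `session_ids`, and `output_memory_store_id` come from these notes. The `instructions` key, the `completed` status value, the hard cap of exactly 100, and the function names are hypothetical.

```python
from typing import Optional, Sequence

# Model identifiers and the ~100-transcript launch limit come from the notes;
# treating the limit as a hard cap of exactly 100 is an assumption.
ALLOWED_MODELS = ("claude-opus-4-7", "claude-sonnet-4-6")
MAX_SESSIONS_PER_JOB = 100


def dream_job_payload(model: str, memory_store_id: str, session_ids: Sequence[str],
                      instructions: Optional[str] = None) -> dict:
    """Build the body for a hypothetical create-dream request."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    unique_ids = list(dict.fromkeys(session_ids))  # drop duplicates, keep order
    if not unique_ids:
        raise ValueError("session_ids must not be empty")
    if len(unique_ids) > MAX_SESSIONS_PER_JOB:
        raise ValueError("too many sessions for one dream job")
    payload = {
        "model": model,
        "memory_store_id": memory_store_id,
        "session_ids": unique_ids,
    }
    if instructions:
        payload["instructions"] = instructions  # hypothetical key name
    return payload


def output_store_id(dream: dict) -> str:
    """Read the output store ID from a fetched dream resource (shape assumed)."""
    if dream.get("status") != "completed":  # "completed" is an assumed status value
        raise RuntimeError(f"dream not finished: {dream.get('status')}")
    return dream["output_memory_store_id"]


job = dream_job_payload(
    "claude-sonnet-4-6", "memstore_in", ["s1", "s2", "s1"],
    instructions="always back-fill specific dates and identifiers",
)
print(job["session_ids"])
```

Refusing to return an output store ID until the job has finished keeps a half-written store from being mounted on new sessions.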

The dreaming harness itself is built on top of standard CMA primitives — there is no separate runtime. That makes it observable, debuggable, and consistent with the rest of the CMA surface.
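The harness shape (clone the input, fan out one worker per transcript, write only to the clone) can be modeled in a few lines. This is a toy model, not Anthropic’s implementation: the sub-agent is replaced by a stub that treats each non-empty line as a fact, the store is a plain dictionary, and all names are hypothetical.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def extract_facts(transcript: str) -> list[str]:
    """Stand-in for a sub-agent: every non-empty line counts as a fact."""
    return [line.strip() for line in transcript.splitlines() if line.strip()]


def dream(input_store: dict[str, list[str]],
          transcripts: dict[str, str]) -> dict[str, list[str]]:
    """Clone the store, run one worker per transcript, merge into the clone only."""
    output_store = copy.deepcopy(input_store)  # the input is never mutated
    session_ids = list(transcripts)
    with ThreadPoolExecutor(max_workers=max(1, len(session_ids))) as pool:
        # map() preserves input order, so the merge below is deterministic.
        results = list(pool.map(extract_facts, (transcripts[s] for s in session_ids)))
    notes = output_store.setdefault("sessions.md", [])
    for facts in results:
        for fact in facts:
            if fact not in notes:  # naive deduplication
                notes.append(fact)
    return output_store


before = {"sessions.md": ["CMA talk covered memory."]}
after = dream(before, {
    "s1": "CMA talk covered memory.\nNotes are at a shared URL.",
    "s2": "Follow up with Kevin about dreaming.",
})
print(before)
print(after)
```

Because the merge touches only the deep copy, the caller can compare `before` and `after` and discard the output if the result looks wrong, which mirrors the review-then-retire flow described above.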

Cost / token budget

Dreaming is intentionally exhaustive — it spends tokens to avoid missing information. The workshop covers the cost story explicitly:

~95% prompt-cache hit rate expected. Most of the work is agentic processing over the same transcripts and the same growing memory store, so cache reuse is high.
Planned batch-style 50% discount for off-hours scheduling, analogous to Anthropic’s batch API discount.
Model swap — Sonnet 4.6 instead of Opus 4.7 cuts per-token cost; quality trade-off is workload-specific.
Prompt steering — additional instructions can narrow what dreaming focuses on, reducing wasted exploration.
Token budgeting — explicit budgets can cap a dream job’s spend.

The implicit guidance: schedule dreams asynchronously (overnight, post-batch), not live during agent work, and let cache + discount + model choice absorb most of the cost.
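A rough worked example shows how the two stated levers combine. It is a simplified sketch: it covers input tokens only, ignores cache-write surcharges and output tokens, and assumes a cached read is billed at 10% of the normal input price. That multiplier is an assumption made for this illustration and was not stated in the talk. The 95% hit rate and 50% discount are the figures from the notes above.

```python
def relative_input_cost(cache_hit_rate: float, cache_read_multiplier: float,
                        schedule_discount: float) -> float:
    """Input-token cost as a fraction of the uncached, undiscounted cost."""
    for value in (cache_hit_rate, cache_read_multiplier, schedule_discount):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all arguments must be between 0 and 1")
    # Cached tokens pay the reduced rate; the remainder pay full price.
    blended = cache_hit_rate * cache_read_multiplier + (1.0 - cache_hit_rate)
    return blended * (1.0 - schedule_discount)


ratio = relative_input_cost(
    cache_hit_rate=0.95,         # expected hit rate from the talk
    cache_read_multiplier=0.10,  # assumed price of a cached read
    schedule_discount=0.50,      # planned off-hours discount from the talk
)
print(round(ratio, 4))
```

Under these assumptions the input side of a dream job costs roughly 7% of the naive figure, which is why the notes treat caching and scheduling as the main cost levers before model choice.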

Try It

Audit your CMA agent for cross-session information loss. Identify workflows where a user or task spans two or more sessions and information silently disappears between them. Those are your first memory-store candidates.
Run the workshop repo’s recall test. Download the Code with Claude workshop repository (URL is shown in the talk), run the bootstrap script, and walk through the no-memory base case → memory-store recall test → dreaming pipeline end-to-end with your own agent.
Try read_only access for a sensitive domain. If you want an agent to retrieve from a curated memory store (e.g., compliance facts, policy documents) but not write to it, mount with access: read_only.
Schedule a periodic dream job. Write a cron job (or CMA-scheduled task) that runs nightly, takes the last N sessions, and dreams over the production memory store. Mount the output on the next day’s sessions and retire the prior input store. This is the workshop’s recommended production loop.
Use the console diff for human-in-the-loop review. Before retiring an input store, open the dream’s diff view in the console and confirm enrichments look correct — Kevin frames this as where human review fits most naturally.
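For the nightly loop, the session-selection step can be sketched independently of any API. This assumes each session record carries an `id` and a timezone-aware `ended_at` timestamp. Both field names and the function name are hypothetical, and the batch size of 100 reflects the approximate launch limit mentioned in the talk.

```python
from datetime import datetime, timedelta, timezone


def sessions_for_dream(sessions: list[dict], now: datetime, window_hours: int = 24,
                       max_per_job: int = 100) -> list[list[str]]:
    """Pick sessions that ended inside the window and split them into job-sized batches."""
    cutoff = now - timedelta(hours=window_hours)
    recent = sorted(
        (s for s in sessions if cutoff <= s["ended_at"] <= now),
        key=lambda s: s["ended_at"],
    )
    ids = [s["id"] for s in recent]
    return [ids[i:i + max_per_job] for i in range(0, len(ids), max_per_job)]


now = datetime(2026, 5, 23, 2, 0, tzinfo=timezone.utc)
sessions = [
    {"id": "s-old", "ended_at": now - timedelta(days=3)},
    {"id": "s-2", "ended_at": now - timedelta(hours=2)},
    {"id": "s-1", "ended_at": now - timedelta(hours=20)},
]
batches = sessions_for_dream(sessions, now)
print(batches)
```

Each batch would feed one dream job. How to chain the output stores of several jobs run over the same input store is not covered in the talk.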

Open Questions

Maximum memory-store size before performance degrades — the talk does not state a file count or byte ceiling.
Concurrent-write semantics across parallel sessions writing to the same memory store — versioning is mentioned, but conflict resolution for simultaneous writes is not covered.
Cost ceiling for a typical dream job at production scale (1000+ sessions/day). The 95% cache rate and 50% batch discount are stated, but no dollar figure or token bound is provided.
Versioning UI and rollback semantics — files are versioned and the console exposes a viewer, but the talk does not walk through restoring a prior version of a memory file.
Cross-organization or cross-agent memory-store sharing: the talk states that stores belong to an organization and that an organization can have several, but whether a store can be shared across distinct agents within an org is not addressed.
Eviction policy when a store grows past whatever the underlying limit turns out to be — dreaming removes stale info, but a size-bounded eviction policy is not described.

Related

Managed Agents production (Jess + Lance): primitives deep-dive on Agent / Environment / Session / Events; memory stores and dreaming layer on top of these.
Managed Agents — Self-Hosted Sandboxes + MCP Tunnels — same primitive family, opposite axis (compute infra trust boundary vs persistent state).
Code with Claude London 2026 keynote — parent event; memory stores + dreaming are part of the same CMA launch wave.
Context Management in Claude Code — sibling concern at the single-session level; memory stores extend the same problem across sessions.
Auto Memory — Claude Code’s auto-memory feature; sister concept at a different scale (in-CLI, per-developer) vs the CMA cross-session primitive.
The Capability Curve — Jeremy’s adoption framing; memory + dreaming move agents up the curve toward longer-horizon, multi-session work.
Hermes MemoryKit — community-built 8-layer memory stack; orthogonal architecture (rich local layering) to Anthropic’s CMA approach (filesystem + dream harness).
