Source: Memory and dreaming for self-learning agents (Anthropic / Mahes — PM, Platform team — Code with Claude 2026, May 7 2026 conference talk, ~10 min, YouTube RtywqDFBYnQ)

Standalone talk by Mahes (Product Manager on Anthropic’s Platform team — same team that shipped MCP and Skills) covering two announcements stacked together: Memory in Claude Managed Agents (public beta launched a few weeks before the talk), and Dreaming (research preview in Managed Agents API, launching live during the talk). The thesis: memory is the next agent primitive — the missing piece between MCP-augmented agents and continuously self-improving agents that get better at their job day-by-day. Two named customer outcomes anchor the substantive claims: Rakuten — 90% drop in first-pass mistakes with memory; Harvey — 6× increase in task completion rate on a legal benchmark with Dreaming. Goes well beyond what the Code with Claude 2026 keynote covered on either feature; this is the canonical engineering-detail source.

Key Takeaways

  • Memory is the next primitive after MCP and Skills. Mahes situates it explicitly: MCP gave agents access to external tools and data; Claude Code + Agent SDK gave them powerful harnesses; Skills (October launch) let them pick up brand-new capabilities from other agents or users. Memory is what unlocks continuous self-learning and context management over long-horizon tasks — letting agents learn about success criteria, common mistakes, working strategies, environment-specific knowledge, and (most ambitiously) learn from other agents in the same environment.
  • Memory is modeled as a file system — not a key-value store, not a vector DB. Same design rationale as Skills: agents already manage virtual environments and file systems competently, so model memory the same way. Memory lives in files with hierarchy and format; Claude reads, writes, and organizes them with the familiar bash and grep tools. Opus 4.7 is claimed to be state-of-the-art at file-system-based memory — better at discerning what’s worth remembering, structuring it across files, and keeping it organized.
  • Three layers Anthropic identified for a frontier memory system.
    1. Storage layer — where data lives, what attribution metadata sits alongside it.
    2. Structure-and-content layer — file-system model + Skills-as-procedural-memory.
    3. Process layer — how often memory updates, what triggers updates, what sources decide changes.
    The Memory API solves layers 1-2; Dreaming solves layer 3.
  • Rakuten outcome. “Dropped first-pass mistakes in their internal knowledge agents by 90%” because agents catch mistakes and share them with the next iteration of agents. Side effects of the memory deployment: better token efficiency, lower cost, and lower latency (less re-investigation of already-solved problems).
  • Memory primitive — four enterprise-grade properties.
    • Permission scopes. One agent can have read-only access to one memory store and read-write to another. Demo: SRE agent has read-only access to org-wide knowledge / runbooks / SLO guidelines; read-write access to its own SRE working-memory store; read access to the codebase memory store.
    • Optimistic concurrency. With hundreds-to-thousands of agents reading/writing the same memory, agents use a content hash precondition to verify they’re not clobbering another agent’s update before applying their own. (Same pattern as ETag / If-Match in HTTP.)
    • Version history + attribution. Full audit log of every memory update, with attribution metadata: which agent made the change, when, which session. Agents can be given access to the audit log too — not just developers. “Most sought-after property” per Mahes after talking to customers.
    • Standalone API. Portable. Customers wanted to do their own PII scanning, custom cleanup pipelines, cloning into external systems. Anthropic deliberately did not lock memory inside the Managed Agents harness.
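  The optimistic-concurrency mechanic can be sketched with a toy in-process store. The talk doesn’t show the real Memory API’s call shapes, so every name below (`ToyMemoryStore`, `write_if_match`, the file paths) is invented; only the content-hash precondition itself — the ETag / If-Match pattern — comes from the source.

```python
import hashlib

class ToyMemoryStore:
    """Toy stand-in for a shared memory store, illustrating the
    content-hash precondition (ETag / If-Match) pattern. All names
    are invented; this is not the real Memory API."""

    def __init__(self):
        self.files = {}  # path -> bytes

    def read(self, path):
        data = self.files.get(path, b"")
        return data, hashlib.sha256(data).hexdigest()

    def write_if_match(self, path, new_data, expected_hash):
        current = self.files.get(path, b"")
        if hashlib.sha256(current).hexdigest() != expected_hash:
            return False  # stale hash: another agent wrote first; re-read and retry
        self.files[path] = new_data
        return True

store = ToyMemoryStore()
store.files["sre/runbook.md"] = b"on CPU alert: check recent PRs"

# Agents A and B read the same version concurrently.
data, h = store.read("sre/runbook.md")

# A commits first; B's write is rejected because its hash is now stale.
assert store.write_if_match("sre/runbook.md", data + b"\ncheck retry config", h)
assert not store.write_if_match("sre/runbook.md", data + b"\nscale out", h)
```

  On a conflict the losing agent re-reads (picking up the winner’s update) and retries its write against the fresh hash — the same loop HTTP clients run with If-Match.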
  • Dreaming — the new primitive launched live in the talk. “A process that looks for patterns and mistakes across your recent agent sessions and their transcripts and automatically produces organized and up-to-date memory content.” Research preview in Managed Agents API.
  • Dreaming is async, batch, and out-of-band. Doesn’t run inside a session. Triggered three ways:
    1. Cron-style schedule via Console or API.
    2. Plugged into existing pipelines — kick off when an agent finishes a task and is spinning down (“save those learnings before exit”).
    3. Manual via the Claude Console UI. Out-of-band design is deliberate: keeps the hot path fast (no latency added to active sessions), separates memory-quality objective from task-completion objective, lets the dreaming agent see across multiple agents’ sessions for shared patterns no single agent could detect from its own perspective.
  • Harvey outcome. Deployed Dreaming on one of their legal benchmarks. Saw a 6× increase in task completion rate on a “pretty realistic legal scenario.” Concrete, named, dramatic.
  • Dreaming output is a diff applied to a memory store. From the live demo (an SRE-agent fleet handling P1 alerts):
    • Pattern discovery: “a bunch of these agents were triggered exactly 60 seconds after an upstream CPU spike → there’s likely retry logic that’s inefficient.” No single agent saw the pattern; Dreaming did.
    • Deduplication and curation: 5 redundant memory entries collapsed to 1.
    • Stale-entry removal: caught one entry no longer valid based on transcript evidence.
    • Verification backfill: appends a “verified at this time based on this transcript” note so downstream agents can rely on it tomorrow.
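  The diff operations above can be mirrored in a toy consolidation pass. The entry schema, the `consolidate` helper, and the example entries are all invented for illustration; only the four operations (surface a new pattern, deduplicate, drop stale entries, backfill verification notes) come from the demo.

```python
from datetime import date

# Toy consolidation pass: dedupe entries, drop stale ones, stamp the
# survivors with a verification note. Schema and names are invented.
def consolidate(entries, verified_on):
    seen, out = set(), []
    for e in entries:
        if e.get("stale"):          # stale-entry removal
            continue
        if e["text"] in seen:       # deduplication
            continue
        seen.add(e["text"])
        out.append({**e, "verified": str(verified_on)})  # verification backfill
    return out

entries = [
    {"text": "dispatch P1s fire 60s after upstream CPU spikes"},  # new pattern
    {"text": "restart dispatch on OOM"},
    {"text": "restart dispatch on OOM"},                # duplicate, collapsed
    {"text": "flag X causes latency", "stale": True},   # contradicted by transcripts
]

print(consolidate(entries, date(2026, 5, 8)))  # 2 entries survive, both stamped
```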
  • Test-time-compute analogy for memory quality. Mahes frames Dreaming using the same scaling-law shape as test-time compute / thinking models: let an agent spend more tokens on memory upkeep to get better downstream outcomes. Memory becomes a dedicated objective separate from task completion — “memory is going to be increasingly load-bearing.”
  • Search-index analogy. Dreaming is the upfront-effort step that produces the high-quality, up-to-date index; downstream agents reading from the memory store get to amortize that effort across many retrievals. “We can amortize this effort across all those agents that are reading from a memory store.”
  • The frontier memory system as Anthropic now sees it. Memory (real-time read/write during a session) on the left; Dreaming (comprehensive batch process to verify, organize, enrich memory) on the right. Dreaming is the bridge between intermediate per-task memory and large-scale shared knowledge bases that Anthropic expects to see across enterprise multi-agent fleets.

Where it fits in the wiki

  • Refreshes Claude Dreaming. The existing article was written from the keynote’s brief Dreaming demo (Caitlin’s drone-landing playbook, ~5 minutes of stage time). This talk is the engineering deep-dive — adds the Harvey 6× number, the file-system memory model, the optimistic-concurrency mechanic, the version-history requirement, the SRE demo, and the test-time-compute / search-index framing. Refreshing claude-dreaming.md with a “Memory + Dreaming as one system” section is the right downstream move.
  • Slots into the Code with Claude 2026 keynote tree. The keynote was the umbrella; this talk is a specific deep-dive session at the same conference. Other deep-dive sessions from the conference would each get their own article and link forward from the keynote.
  • Pairs with Claude Managed Agents as the canonical Memory + Dreaming reference. Memory shipped in public beta a few weeks before the talk; Dreaming shipped in research preview during the talk. Both are inside the Managed Agents API.
  • Composes with Managed Agents cookbook coverage. The cookbook ships the multiagent + outcomes patterns; memory is the substrate they both write to / read from. A memory-aware variant of the multi-agent coordinator pattern is the next likely cookbook addition.
  • Reframes the Skills story. Mahes calls Skills “procedural memory with a lightweight spec.” Memory + Dreaming + Skills now form a stacked memory taxonomy: Skills are how agents acquire reusable capabilities; Memory is how they accumulate per-task / per-environment knowledge; Dreaming is how they consolidate it.

Implementation

  • Tool/Service: Memory + Dreaming inside Claude Managed Agents API.
    • Memory: public beta (launched a few weeks before May 7 2026).
    • Dreaming: research preview (launched May 7 2026).
  • Setup:
    • Memory: enable in Managed Agents API. Stores accessed via standard tools (bash, grep) or via the standalone Memory API for out-of-harness curation.
    • Dreaming: kick off via Claude Console (UI), via Managed Agents API (cron-able), or via end-of-task hook in your existing harness.
  • Cost: Pay-as-you-go — Dreaming spends additional tokens on memory upkeep (test-time-compute-style). Vendor pitch: amortized across many retrievals from the same memory store, so per-retrieval cost falls.
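  The amortization pitch reduces to simple arithmetic. Every number below is an invented placeholder — the talk quantifies none of these figures.

```python
# Back-of-envelope for the amortization claim. All numbers are
# hypothetical placeholders, not figures from the talk.
dream_tokens   = 200_000  # one Dreaming run over a week of sessions
saved_per_read = 5_000    # tokens saved per downstream retrieval
reads_per_week = 400      # retrievals against the store that week

break_even = dream_tokens / saved_per_read          # retrievals to pay it back
net_saving = reads_per_week * saved_per_read - dream_tokens

print(break_even)  # 40.0
print(net_saving)  # 1800000
```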
  • Integration notes:
    • Permission scopes — set per agent per memory store (read-only, read-write, no-access). Demo’s three-store SRE pattern (org-wide read-only + service-specific read-write + codebase context) is the recommended starting shape for multi-agent fleets.
    • Optimistic concurrency — content-hash preconditions on every write, so simultaneous agents can’t silently overwrite each other.
    • Version history — full audit log; metadata: which agent, when, which session. Roll back / inspect / give agents access for “what changed and why.”
    • Standalone API — portable; bring your own PII scanner, cleanup pipeline, archival/clone destination.
    • Dreaming triggers — recommended pattern is “kick off when an agent finishes a task and is spinning down” so learnings don’t sit unwritten.
    • Async / out-of-band — does not block active sessions, does not add latency to the hot path.
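  The permission-scope note can be sketched as a tiny access-control table using the demo’s three-store SRE shape. The store names and the `can` helper are invented; the talk specifies only the scope semantics (read-only, read-write, no-access by default).

```python
# Toy per-agent, per-store permission scopes; names are invented,
# shaped after the demo's three-store SRE pattern.
SCOPES = {
    "sre-agent": {
        "org-knowledge": "ro",  # runbooks, SLO guidelines
        "sre-working":   "rw",  # the agent's own working memory
        "codebase":      "ro",
    },
}

def can(agent, store, op):
    scope = SCOPES.get(agent, {}).get(store, "none")
    return scope == "rw" or (scope == "ro" and op == "read")

assert can("sre-agent", "org-knowledge", "read")
assert not can("sre-agent", "org-knowledge", "write")
assert can("sre-agent", "sre-working", "write")
assert not can("sre-agent", "codebase", "write")
assert not can("other-agent", "sre-working", "read")  # no-access by default
```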
  • Demo workflow (SRE agent fleet, paraphrased from the talk):
    1. P1 alert from dispatch service → spin up SRE agent A with access to three memory stores (org-wide RO, SRE RW, codebase RO).
    2. SRE agent A investigates CPU utilization, traffic patterns, recent PRs. Writes findings to SRE memory store (RW).
    3. Same alert pages a few minutes later → SRE agent B spins up. Reads SRE memory store first. Sees note from A. Short-circuits investigation. Token efficiency + intelligence gain without code changes.
    4. Overnight: kick off Dreaming on the SRE memory store with the past 7 days of sessions. Dreaming agent spins up sub-agents to look through transcripts, identifies the 60-second-after-CPU-spike pattern, deduplicates 5 redundant entries → 1, removes 1 stale entry, adds verification note. Diff applied to memory store.
    5. Next day’s SRE agents start with a richer, deduped, verified memory store.
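  Steps 1-3 of the workflow can be replayed as a toy loop: agent A pays for the investigation and writes its finding; agent B reads the store first and short-circuits. Everything here is invented illustration of the pattern, not real harness code.

```python
# Toy replay of steps 1-3: write-then-reuse through a shared store.
# Function, alert IDs, and the finding text are all invented.
def handle_alert(agent, alert, sre_store):
    if alert in sre_store:                        # memory hit: skip re-investigation
        return f"{agent} reused: {sre_store[alert]}"
    finding = "CPU spike driven by inefficient retry logic"  # "expensive" work
    sre_store[alert] = finding                    # write to the RW store
    return f"{agent} investigated: {finding}"

sre_store = {}
print(handle_alert("agent-A", "P1-dispatch", sre_store))  # full investigation
print(handle_alert("agent-B", "P1-dispatch", sre_store))  # short-circuits via memory
```

  Step 4 (the overnight Dreaming pass) then operates on the accumulated store out-of-band, so neither agent pays latency for the consolidation.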

Open Questions

  • Dreaming token cost at scale. Mahes invokes the test-time-compute analogy — but doesn’t quantify how much compute Dreaming spends per session it considers, or per memory-store update. Worth tracking once usage data emerges.
  • Inter-agent memory contamination. Optimistic concurrency prevents silent overwrites, but doesn’t prevent one agent writing wrong/misleading content into a shared memory store and Dreaming consolidating it. What’s the abuse / bad-actor / poisoned-memory threat model?
  • Memory portability across model versions. If Memory is file-system-modeled and Opus 4.7 is “state-of-the-art at file-system memory,” what happens when Opus 5 ships? Is the memory format model-independent or do you re-Dream when models change?
  • Dreaming + Skills overlap. Skills are “procedural memory with a lightweight spec.” Dreaming reflects on past sessions and updates a memory store. When does a learning surface as a Skill (durable, portable, version-controlled) vs as Memory (per-environment, dynamic, agent-managed)? The Mahes framing implies they coexist; the harness boundary is where the next clarification belongs.
  • Standalone Memory API surface. “Customers do PII scanning, cleanup, cloning.” What auth model — does the Memory API support tenant-scoped tokens, organization-wide tokens, etc.? Not addressed in the talk.
  • Per-customer Dreaming isolation. Dreaming reflects on agent-session transcripts. For multi-tenant SaaS deployments built on Managed Agents, what boundary prevents one tenant’s transcripts from contaminating another’s memory store via Dreaming? Implicitly, customer-managed memory stores are the boundary, but worth verifying in the docs.

Try It

  1. Watch the talk (YouTube RtywqDFBYnQ, ~10 min). Mahes goes through the SRE-agent demo around 8:00 — short and concrete.
  2. Read the existing Claude Managed Agents article first for the API surface that Memory + Dreaming live inside.
  3. Try Memory in Managed Agents if you have access. Start with the three-store SRE pattern (org-wide RO + service-specific RW + codebase RO) since that’s the demo’s reference shape and the permission-scope mechanic is the easy lever for multi-agent fleets.
  4. Schedule one Dreaming run on a memory store after a multi-session experiment. Look at the diff: is the deduplication useful? Are stale-entry removals correct? Is the verification-note pattern usable downstream?
  5. For agentic CI scenarios specifically, the Mahes “trigger Dreaming when agent spins down” pattern is a clean fit — wire it into the harness’s session-end hook.
  6. Pair with the Managed Agents cookbook patterns. Multi-agent + outcomes generate more diverse session traces; Memory absorbs them; Dreaming consolidates them. The three primitives together are the “self-learning agent” loop the talk describes.
  7. Skim the Karpathy “Vibe Coding to Agentic Engineering” talk for the broader frame — Karpathy’s “LLM knowledge bases as understanding tools” thesis is the user-side mirror of what Mahes is building agent-side.