Source: Code with Claude 2026 — Opening Keynote (Anthropic, May 7 2026) + Anthropic’s New Dreams Feature Just Changed Everything (Lewis / creator interpretation, May 7 2026) + Memory and dreaming for self-learning agents (Mahes / Anthropic Platform PM, Code with Claude 2026 deep-dive talk — added 2026-05-08, canonical engineering-detail source, see the talk’s article).
Dreaming is one of three new primitives shipped with Claude Managed Agents at Code with Claude 2026 (alongside multi-agent orchestration and outcomes). It lets a managed agent inspect its own previous sessions, identify skills it missed and lessons it should have learned, and write those learnings to a memory store that subsequent sessions reference. The keynote framed it as “self-learning”: Caitlin Burke pressed a single Dream button in the Cloud Developer Console, and an overnight Dream produced a “descent playbook” of cross-mission heuristics that lifted hill-climb performance on an in-app drone-landing benchmark.
Key Takeaways
- The premise. A managed agent has run N sessions. Some succeeded, some hill-climbed only partially, some failed. Dreaming replays those sessions, asks “what skills should have been learned, what lessons were missed?”, and writes the answer to a memory store. New sessions reference that memory store and ride the captured heuristics.
- One-button activation in the Cloud Developer Console. The keynote demo: Caitlin clicked Dream, chose a memory store, the dreaming agent ran overnight, and the dashboard showed what was written to memory. No code change to the managed-agent harness.
- The output looks like a playbook, not a fine-tune. The drone-landing demo’s overnight Dream produced a descent playbook with mission-derived heuristics — readable text, not weights. Subsequent sessions retrieved the playbook from memory at inference time.
- It’s distinct from outcomes. Outcomes is per-run grading + iteration against a rubric. Dreaming is across-runs reflection that compresses a population of sessions into reusable skill / heuristic memory. Both can stack: a session graded by outcomes contributes to the memory store a Dream cycle reflects on.
- It complements multi-agent orchestration. A multi-agent fleet generates more session diversity → more Dream-able experience → faster hill-climb. The keynote shows all three new primitives stacked on the Lumara drone demo.
- Memory is portable. Caitlin reiterated the broader Managed Agents promise: “memory is ultimately yours — take it wherever you’d like.” The Dream’s output goes into the memory store the customer owns.
- One creator-interpretation framing: “back-testing 2.0.” Lewis (independent creator video) frames Dreaming as agentic back-testing: instead of replaying a strategy on historical data, the agent reflects on the why behind past good vs. bad decisions and extracts patterns. Useful framing for trading / investing use cases; the keynote’s framing is more general (self-learning across sessions).
- Use case landscape (creator-extrapolated, not Anthropic-stated): trading-strategy iteration on past trades; personal-blueprint analysis (food log + bloodwork + sleep + glucose data) where Dream surfaces “kiwis correlate with worse next-day performance”; sales / lead-gen where Dream surfaces what kinds of leads close vs not. All on top of multi-session memory the agent already has.
- Confidence note. Mechanics are documented from the keynote (canonical source). Feature naming — “Dreaming” vs “Dreams” — varies between the keynote (consistent “Dreaming”) and the Lewis video (uses both “Dreams feature” and “dreaming”). The Anthropic platform docs at time of writing use “Dreaming” as the canonical noun.
Engineering details (added 2026-05-08 from Mahes talk)
The keynote demo was the storytelling cut. Mahes’ Memory and dreaming for self-learning agents talk is the engineering deep-dive — see the dedicated article for the full breakdown. Concrete additions:
- Memory + Dreaming are one system, two layers. Memory (real-time read/write during a session) is the substrate; Dreaming (out-of-band batch consolidation across sessions) is the bridge to large-scale shared knowledge bases. The frontier-memory-system framing has three layers: storage (data + attribution metadata), structure-and-content (file-system model + Skills-as-procedural-memory), and process (when memory updates happen, what triggers them, and which sources feed them). The Memory API solves layers 1-2; Dreaming solves layer 3.
- Memory is a file system, not a key-value store or vector DB. Same design rationale as Skills. Files with hierarchy + format. Claude reads/writes via familiar `bash` and `grep` tools. Opus 4.7 is “state-of-the-art at file-system-based memory” per Mahes — better at deciding what’s worth remembering, structuring across files, keeping organized.
- Dreaming is async / batch / out-of-band by design. Three triggers: cron via Console or API; plugged into existing pipelines (recommended pattern: kick off a Dream when an agent finishes a task and is spinning down); manual via the Console UI. Running out-of-band lets it see across multiple agents’ sessions for shared patterns no single agent could detect from its own perspective. Also: no latency added to active sessions, and it separates the memory-quality objective from the task-completion objective.
- Customer outcomes Mahes named.
- Rakuten — 90% drop in first-pass mistakes in internal knowledge agents (Memory). Side effect: better token efficiency and lower cost + better latency from less re-investigation of solved problems.
- Harvey — 6× increase in task completion rate on a legal benchmark (Dreaming).
- Four enterprise-grade Memory properties.
- Permission scopes — read-only / read-write / no-access per agent per memory store. Reference shape from the SRE demo: org-wide knowledge RO + service-specific RW + codebase RO.
- Optimistic concurrency — content-hash precondition on every write (ETag / If-Match equivalent) prevents silent clobbering across hundreds-to-thousands of concurrent agents.
- Version history + attribution — full audit log: which agent, when, which session. Optionally accessible to agents themselves, not just developers.
- Standalone API — portable. Customers run their own PII scanning, cleanup pipelines, archival, cloning. No harness lock-in.
- What Dreaming actually outputs (from the SRE-agent demo): a diff applied to a memory store. The demo’s diff did four things: pattern discovery (identified a 60-second post-CPU-spike retry pattern no single agent had noticed), deduplication (5 redundant entries → 1), stale-entry removal (1 entry no longer valid per transcript evidence), verification backfill (appends “verified at this time based on this transcript” notes downstream agents can rely on).
- Test-time-compute analogy for memory quality. Mahes frames Dreaming using the same scaling-law shape as test-time compute / thinking models: let an agent spend more tokens on memory upkeep to get better downstream outcomes. The search-index analogy is even cleaner — Dreaming is the upfront-effort step producing a high-quality, up-to-date index that downstream agents amortize across many retrievals.
- Memory shipped public beta a few weeks before May 7. Dreaming shipped research preview during the May 7 talk — both inside the Managed Agents API.
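The optimistic-concurrency property above is the same compare-and-swap pattern HTTP implements with ETag / If-Match. The Memory API surface is not public, so the class and method names below are illustrative assumptions only; the sketch shows the shape of a content-hash-preconditioned read-modify-write loop that avoids silent clobbering:

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when the store's content hash no longer matches the caller's."""


class MemoryStore:
    """Toy in-memory stand-in for a memory-store file with If-Match semantics.

    Hypothetical sketch: `read`/`write` and the hash-as-ETag scheme are
    assumptions, not the real Managed Agents Memory API.
    """

    def __init__(self, content: str = ""):
        self._content = content

    @staticmethod
    def _hash(content: str) -> str:
        return hashlib.sha256(content.encode()).hexdigest()

    def read(self) -> tuple[str, str]:
        """Return (content, content_hash); the hash plays the ETag role."""
        return self._content, self._hash(self._content)

    def write(self, new_content: str, if_match: str) -> str:
        """Write only if the caller saw the latest version (If-Match)."""
        if if_match != self._hash(self._content):
            raise PreconditionFailed("memory changed since read; re-read and retry")
        self._content = new_content
        return self._hash(new_content)


def append_learning(store: MemoryStore, line: str, max_retries: int = 5) -> None:
    """Read-modify-write loop: on conflict, re-read and retry instead of clobbering."""
    for _ in range(max_retries):
        content, etag = store.read()
        try:
            store.write(content + line + "\n", if_match=etag)
            return
        except PreconditionFailed:
            continue  # another agent wrote first; merge against the fresh copy
    raise RuntimeError("gave up after repeated write conflicts")
```

With hundreds of concurrent agents, the retry loop is what turns "last writer wins" into "every learning survives or the writer knows it lost the race".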
What Dreaming actually does
The keynote’s mechanic, per Caitlin’s narration during the live demo:
- Choose a memory store. The dreaming agent will read its past sessions and write learnings to this store. (One memory store per project / per use case in the demo.)
- The dreaming agent runs. It “looks over all of those past simulation sessions” — replays the trajectory and outcome of each — and identifies skills it should add or lessons it should remember.
- It writes to memory. The output is text-based artifacts — in the demo, an explicit “descent playbook” labeled and written by the agent itself.
- New sessions retrieve from memory. Subsequent runs of the managed agent reference the dreamed memory at inference time. In the demo, the dashboard showed sites 3 and 4 (which had previously failed in the post-multi-agent-and-outcomes hill climb) succeeded after the overnight Dream — without regressing on any sites that had previously succeeded.
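The four-step mechanic above can be mocked in a few lines. Everything here is a hypothetical sketch — `Session`, `DreamMemory`, and the failure-notes-to-playbook heuristic are invented stand-ins (a real Dream would have the model extract lessons from full transcripts):

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One past managed-agent run: where it flew and how it ended."""
    site: str
    succeeded: bool
    notes: str  # trajectory summary; a real system would derive this from the transcript


@dataclass
class DreamMemory:
    """Toy memory store keyed by artifact label."""
    artifacts: dict = field(default_factory=dict)


def dream(sessions: list[Session], store: DreamMemory) -> str:
    """Across-runs reflection: mine failed sessions for lessons, write a playbook.

    Mock logic only — it promotes the notes attached to failures into
    playbook entries, standing in for model-driven heuristic extraction.
    """
    lessons = [f"- {s.site}: {s.notes}" for s in sessions if not s.succeeded]
    playbook = "descent playbook:\n" + "\n".join(lessons)
    store.artifacts["descent-playbook"] = playbook  # step 3: write to memory
    return playbook


def new_session_context(store: DreamMemory) -> str:
    """Step 4: a subsequent run retrieves the dreamed memory at inference time."""
    return store.artifacts.get("descent-playbook", "")
```

The point of the mock is the data flow, not the extraction: sessions in, a labeled text artifact out, and later sessions reading that artifact rather than rediscovering it.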
Demo trajectory (what was hill-climbed):
| Pass | Setup | Result on 6-site benchmark |
|---|---|---|
| 1 | Multi-agent (commander → detector + navigator) + outcomes (rubric: soft touchdown, clear ground, return-to-Earth fuel) | 4 / 6 sites solved; sites 3 + 4 failed |
| 2 | Same setup + overnight Dream produced a descent playbook in memory | All 6 sites solved; no regression |
The improvement from a single Dream press was striking enough that Boris Cherny’s later Claude Code segment of the same keynote explicitly framed routines as “the same hill-climbing mechanic for code.”
Why it matters
- It’s a self-improvement primitive at the platform layer. Anthropic shipped self-improving-agent patterns previously as harnessed practitioner code (skill-folder learnings folded into SKILL.md by end-of-day wrap-ups, see 2026 Claude Code AIOS Pattern). Dreaming compresses that same idea into one button against a managed agent — no skill engineering, no `learnings.md` file, no wrap-up routine. It’s the platform-layer equivalent of what Simon Scrapes / Brandon Storey / Nate Herk built by hand in Claude Code.
- It rewires the eval loop. Outcomes drives within-session iteration; Dreaming drives across-session reflection. Together they create a two-tier iteration loop where every session both contributes to and benefits from a growing memory of what works.
- It changes the cost model of “more agents.” Multi-agent orchestration generates more sessions (more parallel context windows, more inference). Dreaming compresses those sessions into reusable text-memory the next batch references — turning what would be a token-cost multiplier into a token-cost amortizer.
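The multiplier-to-amortizer claim can be made concrete with back-of-envelope arithmetic. Every number below is invented for illustration (no Anthropic pricing or token counts are implied); the shape of the comparison is the point:

```python
# Invented numbers for illustration only.
SESSIONS = 100
TOKENS_PER_SESSION = 200_000   # exploration each run would otherwise repeat
PLAYBOOK_TOKENS = 3_000        # compressed text artifact a Dream writes once
DREAM_COST_TOKENS = 500_000    # one-off reflection pass over all sessions

# Without Dreaming: every batch pays full exploration cost.
without_dream = SESSIONS * TOKENS_PER_SESSION

# With Dreaming: one reflection pass, then each session prepends the small
# playbook instead of rediscovering heuristics (assume exploration halves).
with_dream = DREAM_COST_TOKENS + SESSIONS * (TOKENS_PER_SESSION // 2 + PLAYBOOK_TOKENS)

print(f"without: {without_dream:,} tokens; with: {with_dream:,} tokens")
```

Under these made-up assumptions the batch drops from 20.0M to 10.8M tokens, and the one-off Dream cost amortizes further as the same playbook serves later batches.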
Try It
- Wait for the docs. Dreaming was demoed in the keynote; docs and SDK surface should land in the next SDK release wave. The cookbook (Multiagent + Outcomes coverage) currently shows multiagent + outcomes; a `CMA_dream_*` notebook is the predictable next entry.
- Stand up a small Managed Agent + memory store. Once the API surface is documented, ship a managed agent on a use case with multi-run memory (e.g., issue-triage, content-grading, investment-research, sales-outreach). Run 10+ sessions to build the session inventory a Dream can reflect over.
- Compare Dream-on vs Dream-off. A simple A/B: same agent, two memory stores, only one gets a Dream cycle between batches. Measure quality + iteration count per session post-Dream. The keynote demo shows visible hill-climb after one Dream; replicate or refute on your eval.
- Read the full keynote article for the Multiagent + Outcomes context Dreaming sits in.
- Watch the Lewis “Dreams Feature” video for one creator’s extrapolation to trading and personal-data use cases — useful for stretching the framing beyond Anthropic’s drone-landing demo, while keeping the speculation flagged as creator-derived.
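The Dream-on vs Dream-off A/B above can be skeletoned before the real API lands. The stub below is entirely synthetic — `run_session` fakes an agent whose success probability rises when a playbook is present; swap in your real agent + eval and keep only the two-arm comparison structure:

```python
import random


def run_session(has_playbook: bool, rng: random.Random) -> bool:
    """Synthetic stand-in for one managed-agent run.

    Assumption for illustration: a playbook bumps success odds from
    0.6 to 0.8. Replace with a real agent invocation + grader.
    """
    p = 0.8 if has_playbook else 0.6
    return rng.random() < p


def batch_success_rate(has_playbook: bool, n: int, seed: int) -> float:
    """Run a batch against one arm and report the success rate."""
    rng = random.Random(seed)  # matched seed => matched random stream per arm
    return sum(run_session(has_playbook, rng) for _ in range(n)) / n


# Two memory stores, one Dream cycle: compare matched batches per arm.
dream_off = batch_success_rate(has_playbook=False, n=500, seed=7)
dream_on = batch_success_rate(has_playbook=True, n=500, seed=7)
print(f"dream-off: {dream_off:.2%}  dream-on: {dream_on:.2%}")
```

In a real replication, also track iteration count per session (the keynote's other hill-climb signal), not just pass rate.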
Open Questions
- Cost model. Per-Dream cost vs continuous background process: the keynote demo presents a button-press, run-overnight pattern but doesn’t address billing. Resolved 2026-05-09 via Anthropic’s official pricing page (platform.claude.com/docs/en/about-claude/pricing): Dreaming has no separate line item. It runs as a Managed Agents session and is billed under the same dual-axis model — tokens at standard model rates (with prompt-caching multipliers carrying over identically) + $10/1k searches. Worked example from Anthropic: a 1-hour Opus 4.7 session with 50k input + 15k output + no cache = $0.525. Messages API modifiers that don’t apply: Batch API discount, Fast mode premium, data residency multiplier, third-party platform pricing. There is no announced `max_dream_cost` or `dream_budget_tokens` as of the May 9 snapshot — cost containment comes from the standard session-runtime accrual rules (keep the Dream out of `running` state when not actively reflecting).
- Memory-store schema. The Dream wrote a “descent playbook” as a labeled artifact in the demo dashboard. Whether the dreaming agent picks the structure (free-form playbook + heuristics + counter-examples) or the harness imposes a schema isn’t shown. Compare with skill-design patterns, which prescribe a specific SKILL.md shape.
- Session-eligibility filters. Whether the dreaming agent can be scoped to “Dream over only sessions where outcomes graded > 0.7” or similar — important if memory poisoning from low-quality runs is a concern.
- Privacy + retention. Memory store is portable per Caitlin, but what does Anthropic retain (e.g., for safety analysis on dreaming’s own behavior)? RSP v3.0/v3.1 implications not addressed.
- Multi-tenant Dreaming. Whether one memory store can be Dreamed-into by multiple distinct agents (e.g., a writer agent + a grader agent both contributing) or if it’s strictly 1:1.
- “Dreaming” term collision. Anthropic’s interpretability research has used “dreams” as a metaphor in different contexts (model representations, latent space inspection). Ensure docs clarify this is the platform-layer self-learning primitive, not the research metaphor.
Related
- Code with Claude 2026 — Opening Keynote — primary source; covers Dreaming alongside multi-agent + outcomes + advisor strategy + Routines + Desktop.
- Claude Managed Agents — the harness Dreaming upgrades.
- Cookbook: Managed Agents Multiagent + Outcomes — sibling primitive; outcomes is the within-session iteration counterpart.
- Anthropic SDK Releases — May 2026 — the SDK wave that lands Multiagent + Outcomes; Dreaming surface expected next.
- 2026 Claude Code AIOS Pattern — practitioner-built self-improving skill loops that Dreaming productizes at the platform layer.
- Anthropic + SpaceX Rate-Limit Increase — the capacity expansion that makes overnight Dream runs viable at scale.
- Agent Workflow Patterns — Dreaming sits alongside the evaluator-optimizer pattern; cross-session reflection is a new pattern not in the original taxonomy.
- Translating Claude’s Thoughts — Anthropic’s interpretability tool that turns activations into text; complementary “what is the agent thinking” surface to Dreaming’s “what should the agent learn.”