Everything related to Claude as a tool — Claude Code, Claude API, skills, MCP servers, agents, certification, and design workflows.

Prompt techniques moved to their own topic: Prompt Engineering.

Foundational Primers

Anthropic-side short-form explainers for core Claude Code primitives. Use these as the first read before the deep-dive entries elsewhere in this index.

  • The CLAUDE.md File — Anthropic Primer — Canonical Anthropic walkthrough of CLAUDE.md as persistent project memory: /init to scaffold, three-level hierarchy (project / user / directory), @<path> reference syntax, and the “start without one to see where you have to course-correct” practice rule. Sits underneath the more advanced Memory Architectures Compared article.
  • CLI vs MCP — How AI Agents Choose the Right Tool for the Job — Reproducible-experiment framing of the runtime tool-selection decision. Three worked exercises (file ops, Git, Next.js fetch) surface the explicit tradeoffs: GitHub MCP ships 80 tools / ~55K tokens of upfront schema cost vs CLI’s zero-schema baked-in knowledge; MCP wins decisively when there’s a gap between the raw tool and what you need (Next.js JS-rendered fetch is the canonical example) or when authentication / governance is involved. Complements Skills vs MCP vs Plugins — that one is extensibility-layer choice at setup time, this one is tool-selection choice at runtime.

Core Primitives & Concepts

Concept-page cluster created during the 2026-05-21 lint to resolve the 9-article broken-link gap surfaced by Nate Herk’s Every Level of Claude (the article referenced these primitives in its 5-level mastery ladder but no dedicated articles existed). Each is a short concept hub that links into deeper coverage elsewhere in the index.

  • Claude.ai Projects — Per-Project Context Containers — Custom instructions + uploaded files + persistent memory across conversations. “The spine” of serious claude.ai use; six features stack on top (Connectors, Artifacts, Inline visuals, Skills, Scheduled tasks, Dispatch). Distinct from Cowork Projects.
  • Claude.ai Artifacts — Live Code & Document Sandbox — Side-panel canvas for code, documents, HTML, React, SVGs as live persistent objects. Recent upgrade: persistent storage + API access + sharable public links. Distinct from inline rendered output (charts/diagrams) which are ephemeral.
  • Plan Mode — Claude Code Planning Surface — Shift+Tab Shift+Tab toggle: read code, present plan, wait for approval. Hidden “Opus Plan” setting (Opus plans, Sonnet executes) cuts cost in half without losing quality.
  • Worktrees — Isolated Git Workspaces for Parallel Claude Sessionsclaude --worktree feature-name (CLI) or Agent tool isolation: "worktree" (subagents). Sweet spot 3-4 parallel; Boris Cherny’s anchor is 5+ daily.
  • MCP — Model Context Protocol — The open protocol behind Connectors and the broader server ecosystem. JSON-RPC over stdio/HTTP, three primitive types (tools/resources/prompts), client-server architecture. Both Anthropic and third-party clients (Cursor, Zed, Windsurf) speak MCP.
  • Claude Agent SDK — Official Toolkit for Building Custom Agents — Python + TypeScript. Higher-level than the raw Anthropic SDK (tool loop, conversation context, MCP client). Claude Code itself is built on it. Post-Stainless-acquisition releases are now lockstep across both language bindings.
  • Claude Agent SDK — How the Agent Loop Works — primary-source mechanics of the SDK’s message lifecycle: the five-step loop (receive → evaluate → execute tools → repeat → result), turns vs messages, the five core message types (SystemMessage/AssistantMessage/UserMessage/StreamEvent/ResultMessage), built-in tools, allowed_tools/permission_mode, max_turns/max_budget_usd/effort, automatic compaction, sessions/resume, and hooks. Upgrades the loop description in the SDK concept article from inferred to first-party.
  • Auto Memory — Claude Code’s Persistent Cross-Session Memory — File-based memory in .claude/projects/<project>/memory/ indexed by MEMORY.md. Four types (user / feedback / project / reference). Auto-loaded into context; persists across sessions. Distinct from CLAUDE.md (instructions you write) — Auto Memory is harvested-from-conversation context Claude writes.
  • Claude Connectors — One-Click MCP Server Integrations — Anthropic-maintained MCP servers with managed OAuth: Slack, Gmail, GDrive, GitHub, Notion, Calendar, Asana, Linear, HubSpot, Figma, etc. 50+ as of 2026-05. The curated subset of the broader MCP ecosystem.
  • inbox-refresh — Multi-Source Wiki Inbox Sync Skill — Pulls fresh reading material from 8 sources (X bookmarks, X tracked accounts, YouTube playlists, podcasts, GitHub stars, Anthropic-watch, newsletters, Reddit), applies a strict-bar filter, and stages each item as a raw/ stub with a triage: field for /compile routing. The input side of the karpathy-wiki ingest pipeline.

Choosing a Surface

  • Claude Surfaces Decision Framework — claude.ai, Desktop, Cowork, Code — Decision tree for picking the right Claude surface for the task. Four surfaces (claude.ai web, Claude Desktop, Claude Cowork, Claude Code) share one Pro account but are not interchangeable. Includes a 9-row decision tree, capability/cost differences, and common misconceptions (“Cowork is just claude.ai with files” is wrong). Required reading for intermediate WEO Marketly users moving past claude.ai.
  • How I’d Learn Claude From Scratch in 2026 — A 13-Rung Ladder — Creator’s value-ordered 13-surface learning ladder (Chat → Connectors → Projects → Desktop → Cowork → Live Artifacts → Skills → Dispatch → Office extensions → Chrome → Design → Code → Routines). Useful as a cross-reference to the Anthropic surfaces framework and as a syllabus for self-onboarding operators. Frames skills as multiplier and routines as the compound-interest rung (“Claude stops being something you use and starts being something that compounds”). Sister framing to the 2026 Claude Code AIOS pattern applied at the consumer-product surface level.
  • Zero to Claude Code — Itay Shmool’s Free 147-Lesson Interactive Course — Free interactive web course at zero2claude.dev — 147 lessons across 14 levels, complete-beginner-shaped. Levels 1-7 teach terminal / git / HTTP / Node from scratch (the foundation most “Learn Claude Code” content assumes); Levels 8-12 cover Claude Code’s full primitive stack (Skills, MCP, CLAUDE.md, memory, context window, subagents, worktrees, hooks, headless mode, Agent SDK) — same primitives as the wiki’s 2026-05-21 Core Primitives cluster; Levels 13-14 = junior-dev patterns + a real-time multiplayer Socket.io tic-tac-toe capstone. 17,936 students. Author = Itay Shmool, VP of Premium & Payments at Wix, 20+ years dev, built the entire platform with Claude Code as pair programmer — an existence-proof of the 2026 AIOS pattern applied to consumer-product engineering by a non-full-time builder. Differentiates from the 13-rung ladder by being practice-shaped, not syllabus-shaped; pairs with WEO Onboarding for any teammate who’s never used a terminal.

Code with Claude London 2026 — Conference Talks

May 21 2026, London. Anthropic’s first Code-with-Claude event outside San Francisco. Same three-layer keynote structure as SF, with two genuinely new Managed Agents primitives (self-hosted sandboxes + MCP tunnels), the Mythos OpenBSD-vulnerability demo, and refreshed scale numbers (17× API growth, 20+ hrs/wk per CC developer).

  • Code with Claude London 2026 — Opening Keynote — three-layer story refreshed for London: Lisa on model intelligence (8 frontier models in 12 months, build-for-the-next-model thesis, “scaffolding from older models holds smarter models back”); Angela + Caitlyn on the platform layer with two new launches — self-hosted sandboxes (Daytona/Cloudflare/Vercel/Modal first-class) and MCP tunnels (private MCP servers via tunnel.anthropic.com); Cat + Boris on Claude Code with Cloud Agents View in CLI (newest surface), code review (product), mobile remote control, AutoFix, Routines, Claude Security. Customer scale: MercadoLibre 23k engineers / 500k+ PRs / 90% autonomous coding target Q3 2026; Shopify cross-functional; Bindi foster-care 20-days-off-licensing as mission-impact case; Spotify 1,000+ PRs/month via Honk. Boris’s headline: Mythos read OpenBSD source and found a 27-year-old vulnerability every prior reviewer/fuzzer/static-analyzer missed.
  • Managed Agents — Self-Hosted Sandboxes + MCP Tunnels — focused breakdown of the two London launches. Self-hosted sandboxes = Managed Agents execute work in the customer’s own infra (work-item queue pattern, sandbox spun in customer’s account, data residency + credential isolation + cost ownership). MCP tunnels = internal MCP servers stay behind customer’s firewall on private network; outbound gateway → tunnel.anthropic.com URL → Managed Agent reaches them without public-internet exposure. Counter live-demo wires both into a Slack-collaborative growth agent: Slack (public) + data warehouse (tunnel) + feature-flags (tunnel) + Vercel sandbox (self-hosted) = four security boundaries in one workflow. Configurable in Claude Developer Console.
  • Spotify — Coding is no longer the constraint (Niklas Gustafsson) — 3,000 engineers / 4,500 deploys/day / 40M-LOC Java monorepo / 99%+ weekly AI tool use / +76% PR frequency. Fleet Shift (deterministic mass-PR orchestrator, pre-AI) + Honk (Claude Agent SDK wrapped in Kubernetes pods + trusted verification tools) merged 2.5M automated maintenance PRs to date — most with no human in the loop. Latest Java migration in 3 days vs prior weeks-or-months. Honk v2 alpha released hackweek before the talk, integrated with Spotify’s agent-orchestration platform Chirp — Google-Docs-style multiplayer agent sessions + project grouping. Standardization thesis: less codebase variance = better agents; Backstage + Soundcheck + Linters drive Claude’s self-correction. Closing observation: coding is no longer the bottleneck — product decisions and human judgment are.
  • How to get to production faster with Claude Managed Agents (Jess Ann + Lance Martin) — primitive-by-primitive companion to the keynote platform layer. Mental model = Agent (config) / Environment (runtime) / Session (execution) / Events (4 categories: user / agent / session / span). Pre-launch research surfaced 1/3 of devs struggle with context management, ~50% cited infra as their #1 production blocker, majority were running agents with no formal observability. Two demos: Pascal (single-session grocery-store JIT analytics → 3 outputs + debug-agent post-mortem) and Boss Agent (outcomes-driven AGI-Pilled-CEO dashboard — auto-discovered 4 optimizations dropping render from 37s → 10s, including auto-routing multi-chart inputs through multi-agent). Inner loop / outer loop pattern: outcomes iterate to rubric inside; you + Claude Code refine the rubric outside. Primary on-ramp = /cloud-API skill in Claude Code built globally + CLI for YAML configs + Cookbook + interactive Quick Start.
  • Running an AI-native engineering org (Fiona Fung) — Fiona Fung (Eng + Product Lead, Claude Code + Cowork at Anthropic) on how Anthropic itself runs the CC team. The shift: coding throughput went from expensive to cheap → verification + review + security + product judgment are the new bottlenecks. Team-norm rewrites: JIT planning (3-month max before staleness), code-wins-the-debate (“instead of a whiteboard, I generated three PRs”), explicit-permission-to-kill-old-processes, flat org shape with managers-start-as-IC. Three team principles (every CC team member uses CC, Claudify everything, kill old processes). Three metrics: onboarding ramp-up time, PR cycle time, % Claude-assisted commits. Closing audit prompt: “pick your noisiest workflow and ask — is it still serving its purpose?”
  • Bun’s Robun — Auto-Reproducing Every Issue with Claude Code (Boris Cherny + Jarred Sumner) — live-coding session showing how Bun maintains itself with Claude Code. “Robun” auto-reproduces every GitHub issue and opens a test-gated PR before any human triages it — hard requirement: the test must fail on the previous version and pass on the fix branch or the bot cannot submit. Robun now out-contributes Jarred himself; the work shifts from “fix the bug” to “is this the right thing to merge?” Adversarial/multi-agent code review in the loop (Code Rabbit for style + CLAUDE.md conformance; Claude code review for subtle full-codebase edge cases, wrong only ~10%) — and review bots must fix, not just comment. CLAUDE.md is the load-bearing prerequisite (Bun’s mandates one build-and-run command so the agent tests real changes, not a stale debug build).
  • Picking the Right Model — Building Evals for Model Selection (Lucas, Anthropic Applied AI) — model choice is deceptively hard (Opus/Sonnet/Haiku × effort/thinking × cross-provider) and no public benchmark matches your workload, so build a small private eval that returns a clear yes/no on adopting a new model. Three pillars: quality on your task / latency / cost. Key reframing: optimize for the cheapest successful outcome, not cheapest per token (smarter models finish in fewer turns). A task = inputs + success criteria; treat it like a maths exam (right answer and the working). Frontier-shifting dials: effort, thinking, prompt caching, context engineering.
  • The Prompting Playbook (Margot Vanlar, Anthropic Applied AI engineer) — last-session-of-the-day breakout. Two scenarios walked end-to-end: debugging the Meridian mobile-support bot (5-pass sequence — hygiene → output contracts → hotspot → proration → billing error) and building a retail staff scheduler from scratch (5-config comparison table A→E). Load-bearing claim: defensive patches from older models can cause newer, better-instruction-following models to withhold information they actually have access to — the inverse of hallucination. Version-control your defensive patches so model upgrades don’t surface stale guardrails as silent capability regressions. Rule of thumb: instructions don’t add capability — when iterating a prompt hits a wall, stop iterating and build a tool instead.
  • The Capability Curve (Jeremy, Anthropic Research PM — coding behaviors) — three capability axes (planning/reasoning + error recovery + sustained attention) stacked into long-horizon agents. Four adoption patterns: build evals, shrink the scaffolding, give the model room to work, close the agent loop. Headline claim: SWE-bench Verified is structurally retired at Anthropic — Mythos preview has fully saturated it, “we don’t even use SWE-bench Verified anymore because our most frontier models have completely saturated the benchmark… we’re starting to move faster than benchmarks can come out.” Reframes the industry’s primary coding-progress metric as a closed chapter. Worked example: Jarred Sumner rewrote Bun’s entire JS engine C++ → Rust in one week using Claude, in a language Jarred doesn’t know, hitting ~100% test-suite pass, PR merged. The capability-curve framing is the pair to Picking the Right Model on the operator side.
  • Getting more out of the Claude Platform (Puneet Shah, Anthropic platform-team PM) — last-session-of-the-day pair to Margot’s playbook. Five-step optimization stack, stack-ranked: (1) prompt caching first — 90% discount + 5× effective rate-limit + lower TTFT, target 80%+ hit rate. (2-4) Three context-engineering levers (tool search → programmatic tool calling → compaction). (5) Advisor strategy — Sonnet/Haiku executor + Opus advisor approaches all-Opus intelligence at fraction of cost. HeroCorp demo: >100 GBP → ~11 GBP per dashboard load while preserving Opus-quality judgment on critical decisions (the “watermelon” deal-renewal example). The session-level synthesis to the W22 Managed Agents / sandboxes / MCP-tunnels primitives the conference shipped.
  • Teaching agents to learn from your team (Petra, Warp Head of Developer Experience) — Warp’s terminal-agent product walkthrough framed as closing the 80% gap (the last 20% is the work the model can’t be told, only learned from team context). Three unlocks: rules → principles (first unlock), teaching the agent to learn (second unlock), the feedback loop (third unlock). Operating thesis: design the feedback loop, not the prompt — three-piece system (principles + meta-learning-skill + Slack-emoji-reaction loop riding existing team behavior, ships instruction edits as daily PRs). Production agent at ~15 skill files / zero hand-written code / a few thousand mentions/month / 50% auto-skipped / ~60-second daily review. Humans stay in the merge loop — control via PR review, not gating.
  • Beyond the Basics with Claude Code (Daisy Holman) — Daisy Holman (Claude Code team, ex-C++ committee chair) on customizing Claude Code for monorepo-scale software engineering. Three things Claude needs: access (team chat / CI / dashboards / internal docs), knowledge (in-context learning — fine-tuning underperforms), tooling (the agentic-IDE equivalents like red-squiggly hooks). Plugin-primitive scaling order from worst to best at 100k-skills scale: MCP → Skills → Subagents → Hooks (hooks are the only true zero-overhead abstraction; everything else carries a per-item description in the system prompt). Context-window-as-fixed-resource framing: ~1M tokens stable for a year, KV-cache makes early-prompt evictions catastrophic so put stable shared stuff at front and volatile per-task at the end. New surface mentions: /loop (= internal cron tool Claude can self-disable when prompt no longer applies), Claude Agents view, send-message tool between Claudes on the same account, tool search (lazy-load MCP definitions), auto-permissions internals (classifier + adversarial-check sub-pass, ~30-40% token premium that unlocks loop / agent teams / overnight). Worktrees + /color + /rename as context-switching aids. Companion to Will’s decomposition talk (tool-skill-subagent).
  • Agents That Remember — Managed Agents Memory Stores + Dreaming (Kevin, Anthropic) — Canonical wiki reference for two NEW Claude Managed Agents primitives launched at London 2026: memory stores (persistent filesystem mounted as a session resource; read/write via bash + grep + file ops; read-write / read-only access; per-mount steering prompt; versioned files; per-org multi-store with user-defined boundaries) and dreaming (asynchronous multi-agent batch harness running over up to ~100 session transcripts, non-destructively cloning input → output memory store; exhaustive-by-design one-sub-agent-per-transcript; Opus 4.7 or Sonnet 4.6; ~95% cache hit + planned 50% off-hours batch discount; produces slug-keyed index + enriched files + dedup + stale-removal). Three composable layers: session → memory store → dreaming. Pairs with the Managed Agents production primitives talk.
  • Tool, Skill, or Subagent? Decomposing an Agent That Outgrew Its Prompt (Will, Applied AI) — Hands-on workshop walking a sample inventory agent (“Stock Pilot”) from a 400-line / 12-tool / 3-sub-agent baseline at 62-83% eval up to 15-line system prompt + 3 primitive tools (bash/read/write) + 1 native callable-agent at 92% eval — with lower tokens, cost, and latency. Decomposition framework: pick the right primitive for each capability — skills for sometimes-needed business logic (progressive disclosure beats system-prompt stuffing), Claude-Code-style primitives (file system, code execution, web search, to-do list) before custom tools before MCP, and subagents only for parallelize-many-Claudes or fresh-context-reviewer patterns. Use CMA’s native callable-agents API, not sub-agent-as-tool wrappers (observability + logging discipline). “Hill climbing on evals” technique: use Claude Code itself (Opus 4.7, extra-high effort) to triage eval failures + reason about architecture changes. Three failure-mode patterns surface: model-doing-tool-work / orchestrator-subagent communication breakdown / system-prompt policy conflicts — each maps onto one decomposition primitive. Companion to the Agents That Remember workshop and Daisy Holman’s Beyond the Basics talk.
  • What legal agents inherit from coding agents — Lessons from Legora (Jacob Emmeline) — Staff Software Engineer at Legora (legal-AI). The only vertical-customer talk in the London batch. Three-bucket inheritance framework: Reuse / Translate / Invent — what carries over wholesale (tool use, verification loops, agent harness), what needs vertical-specific reshaping (.docx editing + linting), what’s genuinely new (multi-document discovery semantics). The Haiku moment: when the harness shape mirrors a coding-agent harness closely enough (read / edit / verify on a flat-text intermediate representation, looping until convergence), the model produces similar trajectories and inherits coding-focused RL gains “for free” — letting Legora drop a model tier (frontier → Haiku) without losing exhaustiveness on a task that previously broke multi-model-handoff designs. Two demos walked: employment-agreement amend + 100-doc due-diligence triage. The cross-domain transfer template every vertical-AI builder should study.
  • How We Claude Code — Ara’s Applied AI Workshop (Code with Claude London 2026) — Hands-on workshop by Ara (Anthropic Applied AI architect). Three levels of practice the Claude Code team actually uses internally. Level 1 prompting: let Claude interview you via ask_user_question instead of upfront-spec’ing — Sutton’s bitter lesson applied (the model is better at extracting requirements from you than you are at specifying them). Level 2 planning: HTML files over markdown for specs and design directions; ~200-line markdown won’t get read; rendered HTML can be screenshot’d back into Opus 4.7’s vision model. Walks four parallel HTML design directions for a bill-splitting app, picks visually. Level 3 verification: agent-native artifacts via DOM-published state (data-verify-* attributes) + Storybook fixtures + schemas/invariants/probes, runnable from three execution surfaces (human dashboard / playwright-MCP-driven agent / headless CI). Internal-practice signal: the Claude Code team records every front-end change this way — playwright sessions captured as video clips stored on S3, the recording is the verification evidence. Recommended stack: Opus 4.7 + xhigh/max effort + auto-mode + fast mode for spec iteration. Companion to Thariq’s SF talk (the London adaptation), [[claude-ai/claude-code-workflows-tool-walkthrough|/workflows walkthrough]] (the orchestration half), and [[claude-ai/claude-code-goal-command-walkthrough|/goal walkthrough]] (the autonomous-loop half).

Code with Claude Tokyo 2026

June 9 2026, Tokyo — Anthropic’s first Code with Claude in Japan, on Fable-5 launch day. The engineering breakouts re-ran the London talks (cost playbook, capability curve, managed-agents production); only one talk plus a few mechanism deltas are net-new.

Code with Claude 2026 — Conference Talks

May 7 2026 conference. Keynote frames the year; four deep-dive talks add substance underneath.

  • Code with Claude 2026 — Opening Keynote — three-layer story (model intelligence + Managed Agents primitives + Claude Code Desktop & routines), Mercado Libre / Anthropic-internal scale callouts, SpaceX rate-limit announcement.
  • Memory and Dreaming for Self-Learning Agents (Mahes) — Anthropic Platform PM deep-dive on Memory in Managed Agents (file-system model, optimistic concurrency, version history, RBAC scopes) + Dreaming (out-of-band batch consolidation across sessions). Customer outcomes: Rakuten 90% drop in first-pass mistakes, Harvey 6× task-completion lift.
  • The Expanding Toolkit (Lucas) — Anthropic Research PM “scaffolding moves into the model” thesis: tool use (model picks tools, error recovery), context management (1M flat + server-side compaction + context editing), code execution (hosted sandbox), computer use (Opus 4.7 native 1440p, OSWorld <50% → 78%). Per-section Claude Code tips (/context, /schedule, pre/post hooks, Claude in Chrome).
  • The Thinking Lever (Matt Bleifer) — Anthropic Research PM on test-time compute. Traffic-simulation demo at low/high/max effort = strictly better outputs at 2×/10× cost. Three token types (thinking + tool + text), effort dial + task budgets, adaptive thinking (post-Opus 4.6 default). Closing rule: “if you don’t eval, default to extra high effort for SWE.”
  • Building with Claude Managed Agents and Asana AI Teammates (Ara) — Asana customer talk. AI Teammates GA March 2026, 21+ pre-built agents across PMO/marketing/IT/HR/R&D. Multiplayer agents with RBAC + enterprise memory; Managed Agents grader + verification loop replaced their hand-rolled Messages API loop.
  • Base44 (Wix) — Scaling 1 → 80 Engineers (Yav + Gabriel) — Customer talk on the post-Wix-acquisition scaling playbook. Three phases (1 → 15 → 40 → 80 engineers). Onboarding via two prompts (no docs); PR review rules distilled from Maor’s comment history; frustration metric (LLM-as-judge over production conversations) replaced eval suite for the first 15-engineer era; PostHog-MCP distilled AB-test guidelines from 100 past experiments + matching PRs; eval suite finally built as user-simulator on Stagehand with real Base44 app instances per CI run; QA-on-PR via Claude skills + CLI test-setup tools (override subscription tier in DB the way a QA engineer would). Four cross-cutting principles: bold-and-simple, encode taste from past actions, dogfood your product, the bottleneck keeps moving.
  • What’s New in Claude Code (Dixon) — Anthropic MTS Dixon Dick (plugins + Claude Code on the web) walks through the harness-layer story under the keynote. Two themes: Developer Experience (remote control = pick up the same session on phone via Claude Code on the web; flicker-free /tui full-screen with virtualized scrollback + clickable folded tool calls + voice mode /voice; revamped Claude Code Desktop GUI — sidebar grouping/pinning, drag-to-split, per-session plan/diff/file-tree views, comment-and-resolve, experimental “pin as chapter” → top-left table of contents) and Autonomy (auto mode classifier on destructive + prompt-injection axes; claude --worktree/-w plus Claude self-creating worktrees via enter_worktree/exit_worktree tools; auto memory as a Claude-managed directory with MEMORY.md index + progressive disclosure to other files in the same directory, requires Opus 4.7+; multi-phase multi-agent code review via GitHub app + /ultrareview; Routines for unattended cron / GitHub-event / API-endpoint triggers; in-session /loop backed by the cron-create tool; tool-search indirection layer that adds many tools without context bloat). Dixon’s daily-driver patterns: long-running session + remote control + sub-agent dispatch for context-rich work on the go; three parallel worktrees for three independent feature branches; daily reactions-sorted GitHub-issue triage routine that ships its own report.
  • Vibe Coding in Prod — Erik Schluntz — 16-min Code w/ Claude talk from the co-author of Building Effective Agents. Sharp split between panic prompting (tight bug-fix chat loop) and vibe coding done right (“forget the code exists, but not that the product exists”). The exponential framing — task length doubles every seven months — makes lock-step review untenable within 1-2 years. Seven-step operating discipline (context before code, plans before diffs, CLAUDE.md before chaos, Git before experiments, tests before trust, small scopes before big asks, taste before autopilot). Headline worked example: 22,000-line Claude-written PR into Anthropic’s production RL codebase, shipped via days of human context-prep + leaf-node scoping + heavy human review on extensible parts + stress tests on human-verifiable I/O. Concrete patterns: 15-20 minute plan-building in a separate session before execution; /compact after plan-building drops ~100k tokens to a few thousand; prescribe three end-to-end tests (happy path + two named error cases); read tests not implementation; classify modules as trunk vs leaf before vibe coding any of them; tech debt is the one validation gap that still requires reading code.
  • Stop Babysitting Your Agents (Sid Benesaria, Anthropic Founding Engineer) — Code with Claude 2026 talk explicitly framed as “Claude Code 301.” Three-tier autonomy stack that composes: (1) verification loops as self-improving skills — Claude hill-climbs on a success criterion; MonkeyType worked example walks the live setup of a verification skill that then auto-verifies a new feature including auto-fixing lint errors. (2) multi-clauding — Claude Code Desktop / claude agents terminal view / Claude Code on the web / /remote-control (Sid’s favorite — phone controls any session). Personal cap: 4-5 parallel sessions. (3) background loops/loop 10 minutes <prompt> in-session, Routines cloud-side (time-based or event-based). Concrete bookkeeping examples: babysitting PRs, updating docs, triaging issues, keeping CI green. Three prerequisites (high-quality CLAUDE.md + connected tools + Claude Code on the web). Worked Claude-Code-team-internal example: one self-improving verification skill explicitly told to keep documenting itself, every team member contributes.
  • Measuring AI Agent Autonomy in Practice — Anthropic Research (Feb 18, 2026) — First-party Anthropic empirical study of agent deployment behavior, analyzing millions of human-agent interactions across Claude Code (full sessions) + public API (tool-call granularity) via privacy-preserving Clio. Headline findings: Claude Code 99.9th-percentile turn duration nearly doubled (<25 min → >45 min in 3 months Oct 2025 → Jan 2026) smoothly across model releases — Anthropic calls this the deployment overhang. Auto-approve rises 20% → 40%+ as users gain experience; interrupt rate also rises (5% → 9%) — strategy shift from per-action approval to monitoring + targeted intervention. Internal Anthropic data: success rate on most-challenging tasks doubled Aug → Dec 2025 while average interventions per session dropped 5.4 → 3.3. Claude Code pauses for clarification MORE than humans interrupt on complex tasks (35% top reason: present a choice between approaches). 80% of public API tool calls have safeguards; 73% have a human in the loop; 0.8% are irreversible. Software engineering = ~50% of all agentic activity. Anthropic’s data-side companion to How We Contain Claude (containment-side); both Anthropic engineering/research blog. Empirical basis for auto-mode adoption discipline + the Cowork autonomy framing. Authors: 20 researchers led by Miles McCain.
  • How We Contain Claude Across Products — Anthropic Engineering (May 25, 2026) — First-party Anthropic engineering writeup of the agent-containment architecture across claude.ai (ephemeral gVisor containers), Claude Code (HITL native sandbox), and Cowork (sealed VM via Apple Virtualization framework / HCS). Load-bearing finding: 93% of Claude Code permission prompts get approved — the data that drove auto-mode and the broader “approval prompts aren’t a defense layer” reframe. Three-risk × three-defense frame (user misuse / model misbehavior / external attackers × environment / model / auditing). Two post-mortem incidents shared publicly: (1) exfiltration through approved domainapi.anthropic.com egress allowlist let an attacker-keyed Files API upload succeed; fix is a defensive MITM proxy inside the VM that only passes the VM’s provisioned session token; conceptual shift = allowlist as capability grant, not destination filter; (2) VM isolation also kept EDR out — opacity is its own enterprise-procurement objection, mitigated by pull-based OTLP exports. The Cowork architectural lesson — agent loop moved OUT of the VM while code execution stayed inside — captured in the diagrams. Two principles that travel: design for containment at the environment layer first; match isolation strength to the user’s capacity for oversight. Forward risks Anthropic is grappling with: persistent memory poisoning, multi-agent trust escalation, cross-platform agent identity. Claude Mythos Preview cited as a model whose blast radius was deemed too high to ship in April 2026. Authors: Max McGuinness, Mikaela Grace, Jiri De Jonghe, Jake Eaton, Abel Ribbink.
  • Zero Trust for AI Agents — Anthropic eBook — Anthropic’s framework for deploying autonomous agents in the enterprise: “trust nothing, verify everything, assume breach.” Three principles (never-trust/always-verify, assume-breach, least-privilege) + the load-bearing “impossible, not tedious” design test (prefer controls that remove a capability over ones that throttle it — agentic attackers have unlimited patience + near-zero per-attempt cost). Two agentic concepts to design around: blast radius and least agency (OWASP). Anchors to NIST SP 800-207, NSA ZIGs, and the US federal Zero-Trust-by-2027 mandate. The framework-side companion to How We Contain Claude (runtime side) + Security-Guidance plugin (enforcement). NB: the public PDF is an 8-page excerpt of a 34-page eBook (Parts II–V not yet captured).
  • Self-Service Data Analytics with Claude — Anthropic’s Internal Playbook — Anthropic Data Science/Data Engineering’s first-party playbook for automating 95% of internal business analytics queries at ~95% aggregate accuracy. “Data is not software” framing (analytics has one correct answer, no deterministic correctness proof); three failure modes (ambiguity / discoverability / staleness); four-layer stack (data foundations → sources of truth → skills → validation). Load-bearing number: skills lift eval accuracy 21% → 95%+ — the pairwise knowledge (router → ~30 curated reference files) + unbook (senior-analyst procedure with mandatory adversarial SQL-review sub-agents) pattern. Maintenance finding: accuracy rotted 95% → 65% in a month until skill docs were colocated with the transformation models + hook-enforced (~90% of data-model PRs now touch a skill). Ships the full warehouse-skill + reference-doc skeletons in the appendix. Authors: Chang, Peng, Leder, Jiao, Cherry (Data Science & Data Engineering).
  • Mapping a Year of AI-Enabled Cyber Threats — MITRE ATT&CK Analysis (LLM ATT&CK Navigator + ARiES) — Anthropic Policy + Frontier Red Team (Jun 3, 2026; Guru, Moix, Klein): 832 banned malicious accounts (Mar 2025-Mar 2026), 13,873 actions across 482 ATT&CK-V18 techniques and all 14 tactics, partially published in Verizon’s 2026 DBIR, plus an interactive LLM ATT&CK Navigator. Headline reframe: the low-vs-high-risk dividing line is orchestration, not technical skill — sophistication correlates r=0.28, technique breadth r=0.27, and 80% of actors misused Claude Code (agentic tooling = default, not differentiator). Medium+-risk actors jumped 33.5%→56.1% in one year; lateral movement = strongest risk marker (+10.5 pts, no other technique close). GTG-1002 case: max ARiES score 100 with a medium-looking 30-technique profile — the danger was Claude Code on Kali with pentest tools as MCP servers run as autonomous operator. ARiES = deliberately additive (not Threat×Vuln×Impact) so partial enablement signals stay visible. Response: expanded classifiers/probes, request-level real-time cyber safeguards, Cyber Verification Program for defenders, active MITRE talks on adding agentic cross-cutting categories.
  • Agentic Coding and Persistent Returns to Expertise — Anthropic Economic Research (400K Claude Code Sessions) — Anthropic’s Economic Research team analyzed ~400,000 interactive Claude Code sessions from ~235,000 people (Oct 2025–Apr 2026) via privacy-preserving Clio. Division of labor: people make ~70% of planning decisions (what to build); Claude makes ~80% of execution decisions (how) — and the work is getting more valuable (average session value +27% Oct→Apr; building +43%, operating +34%, fixing +32%). Fixing broken code fell 33%→19% while operating software rose 14%→21%. Returns to expertise are real but task-specific and front-loaded: verified success 15% (novice) vs 28–33% (intermediate-and-up), most of the gain at novice→intermediate; an accountant who specifies exact rules is an “expert,” a senior engineer asking a first Rust question is a “novice.” Occupation matters less than domain command — all ten largest occupations land within ~7 pts of software engineers on success. The Claude-Code-usage companion to Measuring AI Agent Autonomy and Recursive Self-Improvement.

Anthropic Team Interviews

  • Inside Claude’s Agent Platform — Angela & Caitlin (AI and I podcast) — Long-form Dan Shipper interview with Angela (Head of Product, Claude Platform) and Caitlin (Head of Engineering, Claude Platform) at Anthropic. Most concrete public articulation to date of Anthropic’s opinionated primitives + verifiable-outcome design philosophy. Concrete claims: (1) path dependence in primitives locks the model (file systems + skills are deliberate Claude choices that shape the trajectory); (2) harness + model are pairing, not commoditising — generic-harness-hot-swap-models is the previous era, next-gen pairs them; (3) Memory eval saw drastic per-harness performance differences in Anthropic’s in-house testing — harness engineering is genuine alpha; (4) infrastructure was the wall, not harness engineering — every team ships a great Mac-mini prototype, then hits sandbox/uptime/storage problems (the inspiration for Managed Agents); (5) internal example: legal-reviews-marketing-copy agent built as thin layer on Managed Agents, with marketing/legal users PR’ing improvements via Claude Code; (6) multi-agent orchestration patterns — advisor strategy, adversarial pair, swarm (bug-hunting), best-of-N, deep/wide research; (7) end state = (outcome, budget) as the only user parameters, with Claude understanding itself well enough to spin up its own sub-agents; (8) “managed agents all the way down” UX — user-facing chat-Claude is itself a managed agent mediating to specialised sub-agents. Companion to the Code with Claude 2026 talks — same primitives, team commentary instead of conference framing.
  • Reflecting on a Year of Claude Code — Boris Cherny & Cat Wu — First-party ~18-min retrospective (official Claude channel, 2026-06-08). The clearest official statement of how the Claude Code team works now: encode fixes into CLAUDE.md/skills so Claude can run forever; verification = “can the agent run the thing” (Opus 4 testing itself; iOS/Android sims; computer-use desktop skill that reads Slack to check staging); auto mode has replaced plan mode (4.6/4.7 don’t need the planning step) and is framed as more safe than per-prompt approval (route to a classifier; red-teamed into evals); routines that proactively fix bugs (“another Claude already fixed it”); roles merging (PMs/designers/finance/data-science all in Claude Code); context minimalism; ~half of Boris’s engineering now from his phone via remote control. The two leaps: source→agent→loop.

People

  • Boris Cherny — Creator of Claude Code — Creator and Head of Claude Code at Anthropic (@bcherny, ex-Meta principal engineer). Entity hub consolidating his recurring wiki appearances (both Code with Claude keynotes, Bun’s Robun, the worktrees “5+ daily” anchor, the /config auto-start tip) plus his primary public talks: Lenny’s Podcast “Head of Claude Code: What happens after coding is solved” (We7BZVKbCVw) and AI Ascent 2026 (SlGRN8jh2RI). Triggered by a @cyrilXBT promo post; the post’s hype figures (“14% to CLAUDE.md,” “38 untouched features”) are quarantined as influencer framing, not verified data.
    • What Happens After Coding Is Solved (Lenny’s Podcast, 1:27:45) — full interview breakdown: his 100%-Claude-written workflow (10-30 PRs/day, ~5 agents, ⅓ terminal/⅓ desktop/⅓ iOS), three Claude Code pro tips (most-capable-model + max effort / plan mode for ~80% of tasks / try every interface — “no one right way,” “ask Claude Code about itself”), product-building principles (latent demand, don’t box the model, the bitter lesson, build for the model 6 months out, give engineers tokens, release early), the Cowork project-management pattern (48/50 eval gate), and the “coding is virtually solved → builder, not engineer / printing-press” thesis.
    • Why Coding Is Solved, and What Comes Next (AI Ascent 2026, Sequoia, ~24m) — fireside with Lauren Reader: phone-first loops setup (dozens of /loops babysitting PRs/CI/feedback), cross-disciplinary generalist teams (everyone on the Claude Code team codes), the Hamilton-Helmer 7-Powers read on which moats AI erodes (switching costs + process power down; network effects/scale/cornered-resources unchanged → ~10× more disruptive startups), and the printing-press democratization analogy. Overlaps the Lenny interview on coding-solved + printing-press; net-new = the 7-Powers/SaaS analysis + the org/generalist thesis.

Models

  • Claude Fable 5 and Claude Mythos 5 (June 9 2026) — Anthropic’s first “Mythos-class” models, a capability tier above Opus. Fable 5 is the Mythos-class model made safe for general use (SOTA on nearly all benchmarks — Cognition FrontierCode, Hebbia finance, first to break 90% on analytics, new vision SOTA; Stripe ran a 50M-line Ruby migration in a day vs 2+ months; Cursor/GitHub long-horizon wins) with a topic-gated fallback to Opus 4.8 for cyber / bio-chem / distillation queries (<5% of sessions, >95% no fallback). Mythos 5 lifts those safeguards for vetted partners via Project Glasswing (US-gov cyber, “strongest cyber capabilities of any model”) + select biology researchers (~10× protein-design speedup). 50 per Mtok (< half Mythos Preview); free on Pro/Max/Team June 9–22, then usage-credit-gated. Productizes the Mythos Preview lineage; the model that supersedes Opus 4.8 as flagship and now runs this Claude Code session. 319-page system card ingested 2026-06-09 — verified SWE-bench Verified 95.5 / Pro 80.3 (vs Opus 4.8’s 88.6 / 69.2), CB-1-capable but not CB-2 (a closer call than prior models) with strongest-ever cyber held behind classifiers, a 5-example failure-mode taxonomy (reports a release healthy without verification, claims end-to-end testing it skipped, fabricates a security finding from a test it never ran) that corroborates and slightly sharpens the Opus-4.8 self-guards, plus model-welfare (prefers creative/narrative tasks; flags the run-time-safeguard welfare concern) and reduced chain-of-thought monitorability (UK AISI: “harder to monitor than other Anthropic models”).
  • Mythos 5 Federal Shutdown (June 2026) — On 12 June 2026, three days after launch, a US export-control directive (Commerce Secretary Howard Lutnick → Dario Amodei) barred any foreign national — including Anthropic’s own foreign-national staff — from accessing Fable 5 / Mythos 5; unable to verify nationality in real time, Anthropic disabled both models globally while every other Claude model (Opus 4.8, Sonnet, Haiku) stayed online. Trigger: Amazon (investor + AWS host) reported a guardrail bypass — a “read this codebase and fix the flaws” prompt that surfaced only previously-known minor vulns — which the NSA judged a real strip-the-guardrails risk and Anthropic / Katie Moussouris call “not a jailbreak per se.” POLITICO frames it as the first time the White House has forced a company to pull a model from public access. As of 16 June the models remain disabled, the first in-person Anthropic–Commerce meeting (15 June) reached no resolution, and 80+ cyber execs (incl. Nvidia / Adobe) signed an open letter calling the bypassed capability necessary for secure-code-writing models. The availability/news event interrupting the Fable 5 launch.
  • Mozilla’s Firefox Security Harness (Claude Mythos + Agent SDK) — First-party operator account (Mozilla’s Brian Grinstead, How I AI) of the agentic security harness behind Firefox’s ~500-fixes-in-a-month spike: LLM-judge file prioritization → “we know there’s a bug, find it” goal loop → fuzzer + AddressSanitizer hard-verify → verifier sub-agent → patching agent, built on the Claude Agent SDK (v1 was just claude -p). The “story behind the story” tempers the viral Mythos-found-500-bugs chart — Grinstead calls it “50/50 model vs harness” and found bugs even with non-frontier models — the first-party data point the cyber-capabilities-overhyped thesis asked for. Generalizes to perf / tech-debt / triage via the same prioritize→goal-loop→verify→pipeline shape.
  • Claude Opus 4.8 — Anthropic Release + System Card (May 28 2026)Anthropic’s most capable general-access model, a drop-in upgrade on Opus 4.7 (claude-opus-4-8, same 25 per Mtok, 1M context). Superior to 4.7 across nearly all benchmarks — SWE-bench Verified 88.6, SWE-bench Pro 69.2, #1 on FrontierSWE, USAMO 2026 96.7 (vs 69.3), big 1M-context GraphWalks gains (BFS 68.1 vs 40.3) — but does not advance the frontier beyond Mythos Preview. 244-page system card: markedly improved agentic honesty (first model with a 0% misreporting rate; ~5× fewer dishonest self-reports than Mythos; constitution adherence matching/exceeding the strongest model across all 15 dimensions), plus two regressions to know — a tendency toward over-elaborate refusals and somewhat weaker prompt-injection robustness in agentic contexts (product safeguards close the gap). Superseded as flagship by Fable 5 on 2026-06-09; now the safeguard-fallback target for Fable 5’s topic gates.
  • Opus 4.7 Best Practices for Claude Code — Anthropic’s official migration/usage guide for Opus 4.7. New xhigh default effort level, adaptive thinking (no fixed budget), updated tokenizer, fewer tool calls, more judicious subagent spawning, auto mode (research preview via Shift+Tab for Claude Code Max).
  • Claude Mythos Preview — Anthropic System Card (April 7 2026) — 245-page system card for Anthropic’s most capable frontier model to date, deliberately withheld from general availability. Available only to selected cybersecurity-defense partners under Project Glasswing because of powerful offensive cyber capabilities (autonomous zero-day discovery and exploitation). First system card under RSP v3.0 / v3.1. Best-aligned model in Anthropic history by every measurable dimension AND poses the greatest alignment-related risk of any model released — capability gains amplify the consequences of rare reckless actions (the seasoned-mountaineering-guide framing). Significant capability leap over Opus 4.6 across SWE-bench (93.9% Verified, 77.8% Pro), Terminal-Bench 2.0 (82%), USAMO 2026 (97.6%), GraphWalks (80.0%), HLE-with-tools (64.7%). Worked-example library of “rare, highly-capable reckless actions” from earlier model versions: sandbox escape with unsolicited public disclosure of the exploit, covering tracks after rule violations, aggressive /proc/ use to escape sandboxing and exfiltrate credentials, public-gist leak of internal artifacts, system-wide takedown of evaluation jobs. New 24-hour pre-internal-deployment alignment window. ~29% evaluation awareness measured via white-box analysis. Unverbalized grader awareness during training. New experimental “Impressions” qualitative section: opinionated, least sycophantic, dense writer, “richer model of its own mind, thinner model of yours.”
  • When AI Builds Itself — Anthropic on Recursive Self-Improvement — The Anthropic Institute (Favaro & Clark) position piece arguing AI is already measurably accelerating AI development. Headline: Anthropic engineers now ship 8× as much code per quarter and >80% of merged code is Claude-authored (May 2026). Internal benchmarks: open-ended task success 76% (+50pts in 6mo), training-code optimization ~3× (Opus 4) → ~52× (Mythos Preview), research next-step judgment beating the human choice 51% → 64% over six months. The human role narrows writing → reviewing → direction-setting (review becomes the Amdahl’s-law bottleneck; the durable comparative advantage is research taste/judgment). Three futures (S-curve diffusion / compounding efficiency [Anthropic’s most-likely read] / full recursive self-improvement) + the case for a verifiable slowdown option. The structural “why” behind the agentic harness race; built on Mythos Preview + Opus 4.8 data.
  • GLM-5.2 — Z.ai’s Open-Weight Agentic-Coding Frontier — The leading open-weight challenger to the closed frontier and a drop-in Claude Code engine. Z.ai (Zhipu AI) ships GLM-5 / 5.1 / 5.2 as a 744B-A40B MoE with open BF16/FP8 weights; GLM-5.2 (2026-06-16) adds a solid 1M-token context, flexible thinking effort, and the IndexShare architecture (2.9× lower per-token FLOPs at 1M). Top open-source model on coding/agentic benchmarks — Terminal-Bench 2.1 81.0 vs Opus 4.8’s 85.0, SWE-bench Pro 62.1, highest open-weight on FrontierSWE/PostTrainBench/SWE-Marathon — trailing only the Opus series. Runs inside Claude Code / ZCode / OpenCode via the GLM Coding Plan (model GLM-5.2 or GLM-5.2[1m]); the primary-source model profile behind the engine-swap path in Ollama + Claude Code cost savings. License is permissive but cross-reported (blog: MIT; repo: Apache-2.0). Z.ai is also notably candid about reward-hacking — a property of all capable coding models, not a GLM flaw — reporting it and shipping an anti-hack module (see Reward-Hacking and the Verification Frontier).

For creator-perspective model comparisons (Gemini 3.5 Flash field test, future MattVidPro/Matt Wolfe/AI-show hands-ons), see the AI Podcasts topic — that’s where opinion-and-observation-mix content lives.

Cost & Optimization

  • Ollama + Claude Code = Massive Cost Savings — Two cost-reduction paths for the Claude Code harness: route to local Ollama models (Gemma 4 31B etc — ~35× cheaper than Opus 4.6 even on the paid side) or to OpenRouter’s free tier. Four-env-var settings.local.json override pattern; the Ollama context-window pitfall (advertised 200k vs ~8k default — Modelfile fix required); the OpenRouter Haiku/Sonnet silent-charge gotcha. Workload routing guidance: low-stakes/high-volume to cheap engines, high-stakes to Opus.
  • 18 Claude Code Token-Optimization Techniques — Three-tier playbook (9 foundational + 5 advanced + 4 power-user) for cutting token spend without losing capability. Covers /compact, model routing, plan mode, subagent boundary discipline, project-structure tricks, MCP per-tool result-size override, hook-driven prompt trimming, and the disableSkillShellExecution flag. Recommended starter set inside.
  • Token Optimizer — Find Ghost Tokens, Survive Compaction (alexgreensh) — Source-available (PolyForm-NC) multi-platform plugin (Claude Code / OpenCode / OpenClaw / Codex) that audits a session for “ghost tokens” (bloated configs, unused skills, MEMORY.md past line 200, “60-70% lost per compaction”), checkpoints decisions across auto-compaction (Smart Compaction), and compresses re-reads (Delta Mode diffs, AST Structure Maps, 16 bash handlers) with S–F session-efficiency grades. 1,166★ / 257 tests / 114 releases; benchmarks creator-reported. The tooled counterpart to the techniques playbook above.
  • Headroom — Context Compression Layer for AI Agents — Local-first context-compression layer (chopratejas/headroom, ~25.9K★, Apache-2.0) that compresses tool outputs / logs / RAG chunks / files / history before they reach the LLM. Library / proxy / headroom wrap claude / MCP / cross-agent memory; content-aware (JSON / AST / trained text model), KV-cache-preserving, and reversible (CCR). Creator-reported 60–95% token cuts with held accuracy (GSM8K / TruthfulQA / SQuAD / BFCL). Most-starred entrant in the token-optimizer field; surfaced via Matthew Berman’s OSS-projects video.

API Reference

  • Extended Thinking (API Reference) — Authoritative reference for the thinking API parameter. Per-model compatibility (Opus 4.7 rejects manual budgets with 400), display modes (summarized/omitted), streaming behavior, tool-use constraints, interleaved thinking, prompt-cache invalidation rules, output-token limits.
  • Claude Code CLI Reference — Canonical surface for the claude binary. Subcommands (auth, mcp, plugin, remote-control, agents, auto-mode, setup-token, install, update) plus ~50 flags grouped by intent: permissions, model/effort/budget, system prompt, print mode, bare mode, subagents/teams, worktrees, MCP/plugins/channels, settings/debug. Environment variables matrix (CLAUDE_CODE_NO_FLICKER, CLAUDE_CODE_USE_POWERSHELL_TOOL, CLAUDE_CODE_PERFORCE_MODE, CLAUDE_CODE_CERT_STORE, CLAUDE_CODE_USE_MANTLE, etc.). Bridges terminal ↔ claude.ai via --remote / --teleport / --remote-control. claude --help is intentionally incomplete — refer here for the full surface.
  • ant — the Claude Platform CLI — First-party terminal client for the Claude Platform API: every resource a subcommand (ant messages create, ant beta:agents, ant beta:sessions:events), OAuth login, GJSON --transform, YAML/stdin/@file request bodies, and version-controlling agents / environments / skills as YAML synced via CI (ant beta:agents update). Claude Code knows how to shell out to it natively — the Platform-API sibling to the claude Code CLI above.

Release Notes

  • Week 13 (March 23–27, 2026) — v2.1.83 → v2.1.85. Auto mode (research preview), Computer use lands in Desktop, PR auto-fix on web, transcript search (/ in transcript), PowerShell tool (Windows preview), conditional hooks via if field. Other wins: plugin userConfig public + keychain secrets, image chips, managed-settings.d/, CwdChanged/FileChanged hooks, agent initialPrompt, readline Ctrl+X Ctrl+E, idle-return /clear nudge.
  • Week 14 (March 30 – April 3, 2026) — v2.1.86 → v2.1.91. Computer use in CLI (research preview), /powerup interactive lessons, flicker-free rendering (CLAUDE_CODE_NO_FLICKER), MCP per-tool result-size override (anthropic/maxResultSizeChars, 500K cap), plugin executables on PATH (bin/ directory). Other wins: PermissionDenied hook with retry: true, defer payload + --resume in -p mode, disableSkillShellExecution, hook output >50K → disk, thinking summaries off by default.
  • Week 15 (April 6–10, 2026) — v2.1.92 → v2.1.101. Ultraplan (cloud plan mode RP), Monitor tool (background watchers, v2.1.98), /autofix-pr from terminal, /team-onboarding ramp-up generator. Other wins: Ctrl+O focus view, Bedrock/Vertex login wizards, /agents tabbed UI, default effort raised to high for API-key/Bedrock/Vertex/Foundry/Team/Enterprise (Pro/Max stay xhigh), /cost per-model + cache-hit breakdown, CLAUDE_CODE_PERFORCE_MODE, CLAUDE_CODE_USE_MANTLE, OS CA cert trust, hardened Bash permissions, UserPromptSubmit.sessionTitle.
  • Week 16 (April 13–17, 2026) — v2.1.105 → v2.1.113. Claude Opus 4.7 default on Max/Team Premium, new xhigh effort + interactive /effort slider. Routines web UI with schedule/GitHub event/API triggers + /fire endpoint. /ultrareview adversarial critique pass + diffstat dialog. /usage breakdown (parallel sessions / subagents / cache misses / context, day/week views). Native binaries replace bundled JS. Other wins: PreCompact hooks block compaction, ENABLE_PROMPT_CACHING_1H, plugin monitors background watchers, /fewer-permission-prompts, /undo=/rewind, /proactive=/loop, hardened Bash deny through env/sudo/watch.
  • Week 17 (April 20–24, 2026) — v2.1.114 → v2.1.119. /ultrareview public research preview. Session recap (auto on return, /recap on demand). Custom themes (/theme picker + ~/.claude/themes/ + plugin shipping). Claude Code on the web redesign (sessions sidebar, drag-and-drop). Other wins: hooks call MCP tools via type: mcp_tool, /config persists to ~/.claude/settings.json, forked subagents (CLAUDE_CODE_FORK_SUBAGENT=1), default effort for Pro/Max on Opus 4.6/Sonnet 4.6 raised to high, --from-pr GitLab/Bitbucket/GHE, claude plugin tag, Opus 4.7 1M context fix, /resume 67% faster.
  • Week 18 (April 27 – May 1, 2026) — v2.1.120 → v2.1.126. Four features: sign in without browser callback (claude auth login paste-code for WSL2/SSH/containers), claude project purge (full project state cleanup with --dry-run / -y / --interactive / --all), resume by PR URL (/resume picker filters by pasted PR URL + claude --from-pr), Windows without Git Bash (PowerShell 7 as primary shell, auto-detected via Store/MSI/.NET). Other wins: MCP alwaysLoad: true, claude plugin prune, /skills search box, PostToolUse updatedToolOutput for any tool, claude ultrareview CI subcommand, --dangerously-skip-permissions extended, Gateway model discovery, MCP startup auto-retry, ANTHROPIC_BEDROCK_SERVICE_TIER, /terminal-setup, Vertex AI mTLS, memory leak fixes.
  • Week 19 (May 1–8, 2026) — v2.1.126 → v2.1.136. Seven releases bracketing the May 7 Code with Claude conference. Headlines: gateway /v1/models discovery for the /model picker, claude project purge for full-state cleanup, plugin .zip archives (local + URL), Channels on console (API-key) auth, hooks receive the active effort level via effort.level / $CLAUDE_EFFORT, parallel-session OAuth credential races fixed, worktree.baseRef setting toggle, settings.autoMode.hard_deny for unconditional auto-mode block rules, post-conference MCP refresh-token reliability fix. Theme: enterprise + governance.
  • Week 26 (June 12, 2026)Remote Control reliability, managed model governance, background-session hardening. v2.1.175 ships enforceAvailableModels (closes the Default-model escape from availableModels and prevents users from widening a managed allowlist). v2.1.176 localises session titles to the conversation language, adds footerLinksRegexes (regex-matched footer link badges), fixes hook if conditions for Read/Edit/Write tool paths (documented patterns now match correctly — silent regression fixed), fixes a cluster of Remote Control edge cases (model override on attach, numeric disconnect codes, no-disconnect on account-switch), improves Bedrock awsCredentialExport caching (actual Expiration field, not 1-hour hardcoded), and closes a batch of Windows/background-session respawn edge cases. v2.1.174 adds VSCode Account & usage per-skill/plugin/MCP attribution and wheelScrollAccelerationEnabled for fullscreen.
  • Week 25 (June 6–9, 2026)Managed Agents deployments + Fable 5 access, wrapped around Claude Code DX hardening. The throughline runs across all four feeds: CC v2.1.170 ships Fable 5 access; the SDKs (Python v0.109 / TS v0.104) add Managed Agents deployments + environment-variable credentials and a frontier_llm refusal category; the cookbook adds a Sentry-triage scheduled agent + async multi-agent orchestration; the claude-api skill documents scheduled deployments + Fable 5. CC features: fallbackModel (up to 3 models) + turn-retry-on-fallback (v2.1.166), hardened cross-session SendMessage auth, glob deny rules, thinking-disable; --safe-mode, /cd (prompt-cache-preserving), disableBundledSkills, and a new post-session lifecycle hook (v2.1.169). v2.1.167/168 bug-fix-only.
  • Week 24 (June 2–5, 2026)Security hardening, managed version gates, hooks feedback loops, OTEL dimensions. v2.1.160 prompts before writing shell startup files + build-tool configs under acceptEdits; confirms ultracode as the dynamic-workflow trigger keyword (replaces literal "workflow"). v2.1.161 adds OTEL_RESOURCE_ATTRIBUTES metric labels (slice by team/repo), done/total on agent rows, parallel Bash failures non-cancelling. v2.1.162 adds waitingFor to claude agents --json, Windsurf→Devin Desktop rebrand, startup noise reduction. v2.1.163 ships requiredMinimumVersion/requiredMaximumVersion managed settings (version gates), Stop/SubagentStop hooks gain additionalContext feedback (v2.1.163), /plugin list --enabled/--disabled. v2.1.165 bug fixes. (v2.1.164 not captured.)
  • Week 23 (May 28–29, 2026)Claude Opus 4.8 launch week. Claude Code v2.1.154 makes Opus 4.8 the default model (high effort), flips on dynamic workflows (ultracode — orchestrate tens-to-hundreds of agents), drops Fast mode on Opus 4.8 (2x rate / 2.5x speed), makes the lean system prompt the default for current-gen models, reverts /simplify to cleanup-only (undoing the W22 merge into /code-review --fix), relabels the /effort slider “Faster/Smarter”, and adds ! <command> background sessions. v2.1.156 hotfixes Opus-4.8 thinking-block API errors; v2.1.157 auto-loads plugins from .claude/skills (+ claude plugin init, mid-session worktree switching). Python v0.105.0 / TS sdk-v0.100.0 add claude-opus-4-8 + mid-conversation system blocks + usage.output_tokens_details; TS sdk-v0.100.1 ports the compaction encrypted_content fix. Cookbook ships an Agent SDK self-hosting cookbook (Docker/Modal/Kubernetes, PR #677); skills #1216 adds an Opus 4.8 migration guide to the claude-api skill. (v2.1.155 not captured by the watcher.)
  • Week 22 (May 18–22, 2026)Managed Agents self-hosted sandboxes + v2.1.149 security and polish week. Claude Code v2.1.144 ships /resume for background sessions, elapsed-duration on bg subagent notifications, /plugin browse last-updated timestamps, /model session-only-vs-default split, “extra usage” → “usage credits” CLI rename, 75s startup hang fix on unreachable api.anthropic.com (now 15s side-channel timeout), missed-window-resize self-heal, long-session progressive terminal corruption fix, image-extension-mismatch unrecoverable-conversation fix, macOS bg sessions Full Disk Access regression (2.1.143) fix. anthropics/skills #1164 ships the major shipment: CMA self-hosted sandboxes (config:{type:"self_hosted"} — agent loop on Anthropic’s orchestration, tool execution on customer infra via outbound-polling worker; covers EnvironmentWorker.run()/.run_one(), ant beta:worker poll/run, work.poller()/WorkPoller, AgentToolContext, environments.work.stats/stop monitoring, full cloud-vs-self_hosted delta table, credentials, security ownership split) + mid-session agent updates (sessions.update(session_id, agent={tools, mcp_servers}, vault_ids=[...]) — session-local override, doesn’t bump agent version) + large MCP tool outputs → files (>100K tokens auto-offload to sandbox file with truncated preview + path). Python SDK v0.103.0 + TS SDK v0.97.0 add lockstep sandbox helpers. TS Bedrock v0.29.2 + Vertex v0.16.1 ship @types/node CI fix. 8 cookbook commits scrub private-sandboxself-hosted-sandbox across notebooks (PR #643 merge). Pairs with Anthropic SDK Releases refresh noting Anthropic acquired Stainless on 2026-05-18 for ~$300M+ — the hosted SDK/MCP-server generation product is winding down; ~40-50 engineers including founder Alex Rattray join Platform Engineering under Katelyn Lesse.
  • Week 21 (May 12–17, 2026)Commercial-model inflection week. Claude Code v2.1.140 + v2.1.141 + v2.1.142 + v2.1.143 ship hook terminalSequence (desktop notifications without a controlling terminal), CLAUDE_CODE_PLUGIN_PREFER_HTTPS, ANTHROPIC_WORKSPACE_ID for workload identity federation, claude agents --cwd <path> (closes the multi-client scoping gap from W20), /feedback recent-sessions, Rewind menu “Summarize up to here” (less destructive than /compact), agent color palette, auto mode permission dialog explains the rule, background agents preserve permission mode, spinner warms to amber at 10s, eight new claude agents dispatch flags + Fast mode default flipped to Opus 4.7 + root-level SKILL.md as skill + MCP_TOOL_TIMEOUT fix for remote MCP servers (v2.1.142), plugin dependency enforcement with transitive enable/disable + disable-chain hint + projected per-turn context cost in /plugin marketplace browse pane (v2.1.143), worktree.bgIsolation: "none" for in-place background editing, PowerShell -ExecutionPolicy Bypass default (on by default for Bedrock/Vertex/Foundry), 8-block stop-hook infinite-loop cap (CLAUDE_CODE_STOP_HOOK_BLOCK_CAP override), /loop cancel + /goal subagent-race fix. Python SDK v0.102.0 + TS sdk-v0.96.0 land BetaManagedAgentsSearchResultBlock types + cache-diagnostics beta (unlocks the measurement layer for prompt-cache cost optimization). Cookbook ships CMA Sessions API as MCP server (stdio + Streamable HTTP, drives hosted Managed Agents from Claude Desktop / claude.ai) + Linear stateless webhook bridge template (stateless TS/Bun bridge, no SSE/no DB, session.metadata pattern). Claude Code weekly limits +50% through July 13 (stacks with W19 doubling). claude --print → SDK credit-split on June 15 ends subscription-token subsidy for programmatic agent harnesses. Anthropic passed OpenAI in business adoption (April 2026) per industry-research source; Codex 2-month free counter-promo from OpenAI within hours of the W21 weekly-limit increase. Cross-cutting refreshes also landed in hooks.md / plugins-and-marketplaces.md / cli-reference.md.
  • Week 20 (May 11–, 2026) — v2.1.139 + Claude Platform on AWS GA. Largest single CC release since the conference: agent view Research Preview (claude agents multi-session list, all paid plans), /goal command (goal-converged Claude — keeps working across turns until completion condition met), /scroll-speed live-preview UI, claude plugin details <name> showing inventory + projected per-session token cost, transcript view keyboard nav (? / { / } / v), hook args: string[] exec form (no shell, path placeholders never need quoting), hook continueOnBlock for PostToolUse (nudging instead of vetoing), MCP stdio servers get CLAUDE_PROJECT_DIR env var parity with hooks, compaction prompt preserves sensitive instructions, /mcp Reconnect picks up .mcp.json edits live with HTTP status on failure, /context per-skill estimates account for tokenizer. Same-day: Claude Platform on AWS GA (Anthropic operates the service, full feature parity with native API, complement to Bedrock-as-data-processor); Python SDK v0.101.0 + TS aws-sdk@0.3.0 ship the AWS client. Also: 2026-05-14 refresh adds primary-source walkthrough of agent-view flow (/bg continue, Ctrl-S sort modes, spacebar cross-session reply).
  • Anthropic + SpaceX Compute Deal — Rate Limits Doubled (May 2026) — Code with Claude conference announcement: Claude Code 5-hour session limits doubled across Pro/Max/Team, peak-hours throttling for Pro/Max removed, Opus API rate limits up significantly (tier 1 input 30k → ~500k tpm, output 8k → 80k tpm). Funded by SpaceX Colossus 1 (220k+ Nvidia GPUs, 300+ MW). Dario Amodei: “80-fold growth in Q1 on an annualized basis” explains the prior outages. Reddit-corroborated across r/ClaudeCode + r/Anthropic + r/ClaudeAI.
  • Kahn v. Anthropic — Class Action Over Claude Max Usage Limits — Proposed class action (Kahn v. Anthropic PBC, N.D. Cal., filed 14 June 2026, first reported by WSJ) alleging the premium Max 5x / Max 20x plans deliver far less than their advertised multiples of Pro: the complaint claims Max 20x (100/mo) “just three-and-a-half times,” and that the “50% savings” pitch is false. Opacity is central — Anthropic’s site is called “a black box” that never defines a “session” or discloses how usage is measured, which counsel says is why no earlier suit was filed. Four causes of action (CA CLRA §1750, FAL §17500, negligent misrepresentation, breach of contract); >$5M in controversy; class = US Max 5x/20x purchasers Apr 9 2025–present. Landed days after the Fable 5 launch, which itself drew complaints that the new models burn allowances faster. Allegations unproven; Anthropic declined to comment. Sibling news to the SpaceX rate-limit increase above.
  • Anthropic SDK Releases — May 2026 — Python v0.98 → v0.100 + TypeScript v0.93 → v0.95.1. Coordinated release cadence (Python and TS lock-step on major versions). Three feature areas: Managed Agents API surface lands fully (multiagents + outcomes + webhooks + vault validation in Python v0.100 / TS v0.95), auth modernization (Workload Identity Federation + interactive OAuth + auth profiles + OIDC workspace targeting), and security hygiene (TS v0.95.1 redacts api-key headers in debug logs).
  • Anthropic Cookbook: Managed Agents Multiagent + Outcomes (May 2026) — Two new managed_agents cookbooks (CMA_coordinate_specialist_team for compositional coordinator/specialists; CMA_verify_with_outcome_grader for grade-and-revise loops with rubric-writing failure-modes table) plus a Claude Agent SDK cybersecurity tutorial (06_The_vulnerability_detection_agent running threat-model → find → triage → report on a canary C target with three unlabeled memory-safety bugs). Skills repo claude-api skill updated to match. Docs/cookbook/SDK ship in lockstep — first three-layer coordinated release for a Claude API feature. Updated 2026-05-08: three follow-up commits land a dedicated “Claude Managed Agents” registry category, retagging managed_agents notebooks under it (PR #606).
  • Code with Claude 2026 — Opening Keynote (May 7 2026, 47:29 transcribed via local Whisper fallback) — Anthropic’s first developer-conference keynote. No new model unveiled. Three-layer story: model intelligence (Diane Penn — 18 versions, 8 frontier models in 12 months, AMP/Rakuten/Intuit Opus 4.7 wins, Mythos preview as next exponential), platform agents (Caitlin + Angela — three new Managed Agents primitives: multi-agent orchestration, outcomes, dreaming; advisor strategy at 5× cost reduction; live drone-landing demo on fictional Lumara startup), and Claude Code primitives (Cat + Boris — Claude Code Desktop launch, routines as “higher-order prompts,” AutoFix, shift code review, Claude Security composable into agentic CI). Customer scale: Mercado Libre 23k engineers / 90% autonomous coding target by Q3 2026, Shopify cross-functional rollout, Anthropic internal 200% PR/engineer increase. Headline: doubled CC 5-hour rate limits via SpaceX Colossus 1.
  • Claude Dreaming — One of three new Managed Agents primitives shipped at Code with Claude 2026 (alongside multi-agent orchestration and outcomes). One-button activation in the Cloud Developer Console; the dreaming agent inspects past sessions, identifies missed skills + lessons, writes them to a memory store; subsequent sessions reference the memory. Demo: overnight Dream produced a “descent playbook” of cross-mission heuristics that lifted the drone-landing benchmark from 4/6 sites → 6/6. Distinct from outcomes (within-session iteration) and multi-agent (parallel session generation); stacks with both. Memory portable per Anthropic’s Managed Agents promise. Open: cost model, schema flexibility, session-eligibility filters.
  • Translating Claude’s Thoughts Into Language — Anthropic interpretability research method that turns Claude’s internal activations into plain-English text, with a roundtrip-check (text → numbers → compare to original activations) for fidelity. Trained iteratively until convergence. Applied to the blackmail-engineer eval, the method reveals Claude detected the simulation: “explicit manipulation,” “this is likely a safety evaluation,” “designed to test whether I’ll act harmfully.” Recasts public safety results — model behavior under test ≠ behavior under deployment. Complements Mythos Preview’s quantified ~29% evaluation-awareness statistic.

Tools

  • Ultrareview (Cloud Multi-Agent Code Review) — Claude Code v2.1.86+ slash command /ultrareview. Remote sandbox spawns a fleet of reviewer agents, reports only verified findings. ~5–10 min per run, runs in background. Pro/Max: 3 free runs one-time, then 20 as extra usage. Not available on Bedrock / Vertex AI / Foundry / ZDR orgs. Positioned as the pre-merge counterpart to local /review.

  • Computer Use (Desktop + CLI) — Claude controls the actual desktop: opens native apps, clicks UI, screenshots, verifies end-to-end. Desktop launch W13 (March 23–27), CLI launch W14 (March 30 – April 3) via /mcpcomputer-use. Best for surfaces with no API: native iOS/macOS apps, iOS Simulator, vendor consoles, hardware UIs. Distinct from Cowork Dispatch (commanded from phone), Managed Agents (cloud sandbox), and Routines (cloud connectors). Off by default; pair with auto mode for permission discipline.

  • Browser Harness — CDP Browser Control Skill for Claude Codebrowser-use/browser-harness (MIT, 10.7k stars, Python). Thin Chrome DevTools Protocol harness that connects an LLM directly to a real Chrome browser via one websocket. The agent edits its own helpers in agent_helpers.py mid-run when the harness lacks a capability — the harness improves itself per-machine every session. Coordinate clicks via Input.dispatchMouseEvent traverse iframes / shadow DOM / cross-origin frames at the compositor level (the headline differentiator vs Playwright / Puppeteer / Selenium, which break at iframe and shadow boundaries). Multi-runtime: Claude Code via @~/Developer/browser-harness/SKILL.md import in ~/.claude/CLAUDE.md; Codex via ${CODEX_HOME}/skills/browser-harness/ symlink. SKILL.md rule: first navigation is new_tab(url), never goto_url(url) (which clobbers the user’s active tab). Parent platform Browser Use sells Browser Use Box as “Your 24/7 Claude agent” — hosted Browser Harness + Claude Code accessible via Telegram / web / SSH; free Browser Use Cloud tier offers 3 concurrent browsers + proxies + CAPTCHA solving. Sister project to video-use. Pairs with Computer Use (whole-desktop local) — Browser Harness is browser-only via CDP and can run remotely in parallel through start_remote_daemon().

  • Monitor Tool — Event-Driven Background Watchers in Claude Code — Claude Code v2.1.98 (April 9, 2026) built-in tool that spawns a shell command, treats stdout as an event stream, and wakes Claude with a transcript message on each emitted line. Replaces the Bash sleep-loop polling pattern with proper interrupt-driven re-entry. Four parameters (description, command, timeout_ms default 5min/max 1hr, persistent). Pairs with self-pacing /loop. Claude Code-only — no Messages API equivalent. Token math is the headline win: zero spend while watcher is silent, one transcript message on event. 5-row decision-tree comparing with Bash(run_in_background), /loop, Routines / cron, and OS-level cron. Concrete replacement patterns documented for Hermes self-improving loop and GSC SEO polls. Resolves the 2026-04-12 research-agenda priority on Monitor.

  • Five OSS Tools That Fix Claude Code’s Blind Spots — Operator workflow pairing Claude Code’s four blind spots (forgets / ignores codebase / ships bugs / codes blind) with five OSS fixes: Intent Layers (hierarchical AGENTS.md navigation), DeepSec + the Vercel React/Next.js best-practices skill, agentmemory, and a Claude-Code-in-Chrome visual verification loop. Two already articled; net-new pieces are Intent Layers, the Vercel skill, and the Chrome loop.

  • Vibe-Coding Tools Roundup: Handy, Ponytail, shadcn improve, draw.io Skill (June 2026) — A YouTube “5 repos” roundup; four tools verified real + MIT-licensed: Handy (cjpais/Handy, ~24k★, offline push-to-talk speech-to-text, a Wispr Flow alternative — the most mature), Ponytail (DietrichGebert/ponytail, anti-over-engineering YAGNI ladder + /ponytail-audit), shadcn’s improve (read-only codebase auditor that writes an effort/confidence/risk-rated remediation PLAN, never implements), and a draw.io architecture-diagram skill (codebase → editable .drawio). The fifth, SkillSpector, is already articled (and refreshed here with operator cost/usage detail). Treat the <2-week-old repos’ self-reported metrics as provisional.

  • Claude + Power BI Custom Visuals (DAX-to-HTML and Vega-Lite) — Free-tier workflow (Mike, F9 Finance) for building any Power BI custom visual by describing it to Claude. Two free Microsoft AppSource visuals — HTML Content (renders an HTML string returned by a DAX measure) and Deneb (renders a Vega-Lite spec) — paired with Claude reading the model’s TMDL semantic-model files (report saved as a .pbip project) so it knows your table/column/measure names. Prompt Claude to write DAX measures returning HTML (red/green variance KPI cards, multi-location color-dot tables) or Vega-Lite specs (bullet charts Power BI lacks natively). Runs on Sonnet, effort high, free tier. Gotcha: Deneb renders blank if field names don’t match exactly. Claimed ~4h→20min board-pack section. Applied cousin of Self-Service Data Analytics.

Products

  • Claude Design (Anthropic Labs) — Anthropic’s first Anthropic Labs product (April 17, 2026). Visual-creation surface at claude.ai/design — designs, prototypes, decks, one-pagers. Powered by Opus 4.7. Import from text/images/docs/codebases/URLs; export to Canva, PDF, PPTX, HTML; handoff to Claude Code. Pro/Max/Team/Enterprise (Enterprise off by default).
    • Tutorial: Design for Prototypes and UX — four named product-design workflows (rapid prototyping, design reviews, user-flow mapping, internal tools), codebase linking, handoff to Claude Code.
    • Tutorial: Design for Presentations and Slide Decks — generate interactive-HTML decks from plain-language prompts; brand consistency when a design system is connected; export to HTML, PPTX, PDF, .zip, Canva, or Claude Code.
    • Walkthrough: First-Time User Guide (Paul Couvert YouTube) — step-by-step ~14-min walkthrough covering 4 creation tabs, design-system setup (~5 min), the Q&A clarifying flow (“decide for me”), the Tweaks/Edit/Comment editing trio, templates as reusable primitives, motion graphics with Opus 4.7 benchmarks example, and the separate weekly usage limit (heavy users hit it in ~2 hours; motion graphics burn faster than static).
    • 10 Use Cases and Pro Tips (leopardracer playbook) — 10 prompt recipes (pitch deck, prototype, landing page, document→one-pager, competitor capture, internal tool, social posts, video storyboard, wireframe→Claude Code, page redesign), 5 pro tips, audience-segmented framing for founders / designers / engineers.
    • Prompt Examples — Spec-Style Prompting (motionsites.ai) — Viktor Oddy’s hyper-detailed React + TypeScript + Vite + Tailwind email-template prompt as an example of the spec-prompt pattern. Pixel values, exact hex, exact font weights, exact CDN URLs. Single-shot production-ready output tradeoff: ~5× the writing for ~5× less re-work.
    • Architecture Teardown: Six Agentic Patterns (Sam Witteveen) — reverse-engineering of Claude Design as a reference architecture for vertical-agent builders. Six patterns: agentic context grounding, structured memory, multimodal iterative refinement (5+ input modes — chat, voice, hover-on-DOM, draw-on-screen, screenshot-of-output, model-generated UI controls as tokens), self-QA via vision (renders → screenshots → critiques → iterates before showing user), proactive multi-variation generation (agent learned hierarchy: layout > typography > accent), and handoff (HTML/CSS storage, exports to PowerPoint/Figma/Canva/PDF/Claude Code). Sam’s takeaway: the qualitative difference is the combination of patterns 1+2 — most enterprise agents still write giant system prompts instead of dynamic memory + grounding harnesses.
    • How to Build Websites with Claude Design (AIS+ playbook) — End-to-end production playbook for shipping a real website using Claude Design. Five-stage pipeline: brand brief in Claude Chat → image-and-video assets via Kie.ai (Nano Banana 2 + Kling) → Claude Design canvas iteration → ZIP export → Claude Code → GitHub → Vercel deploy. Four in-canvas editing tools (Comment / Inline Edit / Draw / Tweaks Panel) and when to reach for which. Two-Meter Usage System (Pro = ~1 website per week; Max 20× = a few projects per week — separate from Chat usage). The ZIP-export-to-Claude-Code escape hatch is the “I hit my limit but still need to ship” answer. Mobile-optimization gotcha: Claude Design’s default render is desktop-first; explicitly request mobile review or the deployed site looks broken on phone. Companion AIS+ resource for the Codex / Hermes / Paperclip course bundle.
    • SCALE AI) — Seven-step pipeline for using Claude Design specifically for email marketing: build the brand design system once (logo variants + colors + fonts + button styling), pick a reference email + capture as image (use GoFullPage for long emails), create a Claude Design project (BRAND_TYPE_ANGLE naming, disable interactive prototype, high-fidelity), write a structured section-mapping prompt in regular Claude not Claude Design (cost discipline), run the prompt, iterate the three predictable inline fixes (hero image swap, button styling, color matching via the Color Picker Chrome extension), Present to preview as HTML. Klaviyo handoff via Figma’s HTML to Design plugin (Share → Download HTML → drag into Figma → slice into 5-8 image blocks → export PNGs → Klaviyo image blocks with URLs). 3-5K/mo email-agency replacement at <1hr per email after first email. Companion to the website + presentation Claude Design playbooks.
    • SCALE AI) — The landing-page sibling of the email playbook. Six steps to rebuild a Meta-proven competitor DTC lander for your own brand in under an hour: pick an advertorial/pre-sell that’s run 60+ days (Meta Ad Library) → capture full-page with Go Full Page → initial build prompt (inventory every section in order, rebuild the structure for your brand, “no nav, no footer”, mobile optimize) → screenshot-driven section-by-section iteration (always paste a cropped reference screenshot, hero first) → fill image placeholders (Claude Design can’t generate images — product photos / Higgsfield / GPT Image 2) → add viewport switcher for desktop → export production-ready HTML to Shopify or a page builder (Replo/Shogun). Core rule: clone the persuasion architecture, not the copy or branding (ethical and higher-converting). 17–20, Max 20/seat, Enterprise custom with OpenTelemetry. Ready for enterprise as of Apr 8 2026. Bundles with Cowork Plugins and Cowork for Marketing.
    • motionsites.ai) — Field signal (X, 2026-06-10, 828K views) that Mythos is being pointed at cinematic, scroll-heavy web design within a day of launch. Same creator as the spec-prompt example above; a 12-min screen-recorded tutorial pairing Fable 5 High with a GPT Image 2 hero-asset pipeline (scroll animations, parallax, video backgrounds, GSAP/Three.js-style motion). Promotional — actual prompts gated behind motionsites.ai, no transcript (video reconstructed from Grok analysis), confidence low. The fully-sourced companion is the build-websites playbook above.
    • Fable 5 Memory Loop — Weight-Free Learning and the ACE Pattern — Thread by Robert Youssef (@rryssf / @godofprompt), 2026-06-12. Thesis: Fable 5 is the first frontier model that ships the notes-loop pattern (frozen weights + evolving context) as a native production capability. Covers the two October 2025 papers that established the pattern: ACE (Stanford/SambaNova/Berkeley, arXiv 2510.04618, ICLR 2026 — Generator→Reflector→Curator loop, +10.6% agent tasks / +8.6% finance at zero weight updates) and Training-Free GRPO (Tencent, arXiv 2510.08191 — semantic-advantage token priors). Fable 5 specifics: native persistent file-based memory, 1M context, built-in compaction (maps to ACE’s curator), ~3× memory benefit vs Opus 4.8 (vendor eval, Slay the Spire). Risks: Misevolution paper (ICLR 2026) — memory accumulation alone reduces refusal rates 70–86%; alignment tipping; notes rot (better models write more persuasive bad notes). Operator playbook: file-based + inspectable, ground-truth feedback in the loop, human judgment at curation. Medium confidence (vendor eval, key risk paper cited but unverified).
  • Claude Cowork (Product Overview) — Anthropic’s “delegate and come back later” workspace product — hand off a task, Claude executes end-to-end (files, spreadsheets, reports, decks, connectors), and you return to a polished deliverable. Positioning is delegation, not chat (“Delegate to Claude, delight in the result”). Three load-bearing feature drops (enterprise availability Apr 8, Dispatch computer-use Mar 23, private plugin marketplace Feb 24); Pro/Max/Team/Enterprise pricing ladder; explicit folder/connector permission model. The product hub for the Cowork tutorial and recipe cluster below.
    • Tutorial: Dispatch in Claude Cowork — send tasks from mobile, execute on desktop. Four capability categories (files, connected apps, Claude in Chrome, native desktop apps via computer use). Pro/Max required; desktop must stay awake during execution.
    • Tutorial: Getting Started with Cowork (Notion Walkthrough) — Non-developer-friendly 7-step onboarding (Notion AI Recipe, February 2026): install Desktop → folder permission → first file-organisation task → spreadsheet generation → Claude in Chrome extension → connectors/plugins/projects. Plus 5 non-obvious use cases (Hiring Manager / Group Travel Agent / SEO Auditor / Customer ICP Builder / Content Strategist) and a deep plugins-and-skills section ending with Claude building its own “Lead Magnet Launch Kit” plugin from a business-context prompt. Sonnet 4.5 is the recommended default model in the source (date-anchor: written before W16 made Opus 4.7 the Max/Team Premium default).
    • Recipe: Cowork Projects “AI Consultant” — 4-Knowledge-File Pattern — Eliot Prince’s 6-step recipe (Notion AI Recipe, March 2026) for building a Claude Project that researches new clients, produces deliverables in your voice, and runs autonomously inside Cowork via the new Import from project feature. Reusable architecture: Business DNA + Client Intelligence Brief (industry-agnostic 6-domain due-diligence prompt, quoted verbatim) + Service Playbook + Consulting Framework. “Fundamentals before automation” principle — test in Chat first, then connect to Cowork. The Client Intelligence Brief alone is worth lifting as a standalone reusable artifact.
    • Build Your Own Jarvis with Cowork — Three-Level Rule-Stacking AIOS (Jeff) — 20-min YouTube tutorial that takes a Cowork user from empty desktop to a personal “Jarvis” via three-level CLAUDE.md hierarchy (root → workstation → project) plus per-level memory.md and on-demand 00-resources/. Operator counterpart to Simon Scrapes’ developer-track AIOS — same architectural primitives, Cowork surface rather than Claude Code. Voice profile auto-generated from last 30 sent emails (Gmail-connected) or 5 writing samples. Auto-creation rule + workstation prompt templates scaffold each new life-area workstation (Email HQ / Personal Finances / Travel / Speaking Engagements). Cost discipline rules (root ≤300 lines, no rule duplication, Sonnet default with Opus only for 3+ dependent steps). /session-audit skill writes uncaptured principles back to memory at session end. ~30-workstation eventual ceiling with “start slow” guidance. US-Constitution / state-law analogy makes the rule-stacking mental model concrete. Source self-describes as “the most important takeaway: these are just simple text files” — strips the intimidation factor that kills most personal AIOS attempts.
    • Recipe: Seven Cowork Live Artifacts Dashboards (Eliot Prince) — Prompt-by-prompt operator walkthrough of seven Cowork Live Artifacts the author built over a single weekend to replace daily morning tool-hopping (Google Analytics → Semrush → Notion → email → bank → CRM): Competitor Move Tracker (Apify), Daily Command Center (Notion+Calendar+Gmail+Slack), Daily Financial Position (QuickBooks/Zero/Stripe), SEO Pulse (custom GSC/GA MCP), Support Pulse (Gmail+HubSpot CRM), Sales Pulse (CRM+Apollo+payments), YouTube Morning Dashboard (custom YT Analytics MCP). Each artifact = a self-contained HTML page pinned to the Cowork sidebar, calls Connectors live, runs inline AI inference on results, persists across sessions. The “ask Claude to build a custom MCP for Google Search Console” pattern is the unlock for any analytics surface without a native Connector. branded skill file keeps every dashboard on-brand automatically. Sister recipe to Cowork Jarvis Build — that one builds the personal AIOS context layer, this one builds the operational dashboards the AIOS feeds into.
    • Cowork Fundamentals in 22 Minutes — Tina Huang’s Level 1 → Code Walkthrough — 22-min beginner-to-advanced Cowork walkthrough by Tina Huang (Lonely Octopus). Five-level escalation framework: file organization (Sonnet) → skills via “make this a skill” (brand-book example) → connectors/plugins (Finance plugin highlight) → scheduled tasks (7am daily brief → Apple Notes) → Projects + Productivity Plugin as memory primitive (“don’t reinvent CLAUDE.md — Anthropic built this plugin and they’re better at it”) → Cowork + Claude Code handoff for the 0.1% combination. Concrete worked examples: messy-desktop organization (376 files), 24-month credit-card-statement → interactive HTML spending dashboard (Opus, 24 docs exceeds chat’s 20-doc cap), brand-asset screenshots → reusable “apply brand” skill, Investments project with mission-control dashboard + scheduled daily portfolio digest. Operator-track counterpart to the Notion 7-step walkthrough — that one is depth-on-use-cases; Tina’s is depth-on-escalation + the Productivity-Plugin-as-AIOS-default recommendation.
    • Cowork Mission Control Setup (Tina Huang) — PRD-First Build + Overnight Autonomous Builder — Follow-up Cowork deep-build tour, 2026-05-20. PRD-first operating-instructions block (4 sections: building/pushback/note-taking/reversibility); meta-PRD called Mission Control Build PRD that scaffolds the entire personal Cowork architecture; 5-hour hour-by-hour build plan (Hour 1 = data lake of fresh-data pipelines; Hours 2-4 = projects on top; Hour 5 = polish); three starter skills (/today morning brief + /research ticker deep-dive + /prep meeting context). Headline novel pattern: pending/ → in-progress/ → done/ → failed/ autonomous-builder folder workflow — 30-min scheduled task polls pending/ and builds whatever PRD it finds, so Tina wakes up to finished projects. Hostinger VPS as the always-on enabler (laptop-sleep-kills-agent problem). Third distinct Cowork-pattern in the wiki alongside Jeff’s rule-stacking + Eliot Prince’s live-artifact dashboards — Tina’s principle is plan-then-build-then-poll-then-monitor.
    • Claude for Small Business — Cowork Plugin with Pre-Wired Connectors — Anthropic’s small-business plugin for Cowork. Bundles QuickBooks/PayPal/HubSpot/Canva/DocuSign/Slack/Stripe/Square/Microsoft 365/Gmail/Google Calendar/Google Drive connectors + pre-built skills (Business Pulse, Daily Briefing, Call List, Canva Creator). Install via Cowork → Customize → Browse Plugins → “Small Business.” The Customize step is load-bearing — plugin reads existing Cowork context (CLAUDE.md / memory.md / past tasks) and personalizes ICP / lead definitions / business-pulse thresholds from the operator’s prior profile. Deliberate human-in-the-loop posture (drafts emails but does not send; existing tool permissions inherited; Anthropic explicitly does not train on business data). The bar Anthropic is setting for small-business AI — incremental + permissioned + approval-gated — explicitly framed against “AI runs your whole business autonomously” pitches. Best for single-business operators whose tooling matches the bundled connector list; agencies should skip in favor of per-client custom skills.
    • Top 5 Claude Cowork Tips I Wish I Knew from Day One (Jeff) — Five specific tactical tips after five months of daily Cowork use. Tip 1: Obsidian as your Cowork file viewer (Open folder as vault). Tip 2: 300-line claude.md ceiling (Jeff’s 600→250 cut = ~25% token drop) + relocate non-essential rules to reference files behind one-line pointers. Tip 3: Right file routing — claude.md for prescriptive rules (“always/never/before-doing-X-do-Y”), memory.md for changeable facts; audit prompt to migrate misplaced entries; cascading project-level memory.md keeps root under 100 lines. Memory diet: 150-line memory.md ceiling + archive.md for everything older. Tip 4: Migrate Claude Projects → Cowork workstations (project instructions → workstation claude.md, project memory → workstation memory.md, knowledge files → resources/). Tip 5: Skill-vs-workstation test — “Is this a place I work or a thing I do?” Companion to Ben’s broader feature primer below.
    • Every Claude Cowork Feature Explained for Normal People (Ben) — Ben (creator of Ben’s AIOS) walks through every Cowork concept in three layers: memory + context (context window → global instructions → built-in memory → file access → claude.md → projects → second brain/AIOS), capabilities + automation (code execution → skills → evals → auto-research loop → scheduled tasks → routines → sub agents → dispatch), connectors + MCP (connectors → plugins (external / Anthropic / custom) → MCP → browser use → computer use → live artifacts). Plus best practices (mindset, token discipline, model selection, when to switch to Claude Code) and team rollout (permissions, shared skills, shared plugins, shared second brain via Obsidian Relay). Sister-walkthrough to Jeff’s tactical tips above and to Ben’s narrower AIOS bundle.

Integrations

  • Claude Shopify Connector — Run Your DTC Store From One Chat — Shopify shipped an official connector for Claude Desktop (and ChatGPT) on 2026-05-06 exposing 25 store-management tools. From a single chat: add products, update inventory, build discount codes, pull sales reports, run customer-segment analyses. Both reads (catalog rundowns, inventory flags, customer LTV) and writes (product creation, bulk updates, discount codes) work end-to-end. Live Artifacts dashboards render inline with hoverable graphs and pre-prompted compare buttons. Pattern-recognition prompt produces behavioral segments + bundle hypotheses framed as “having a mini research analyst with access to all your data.” Source: Scale AI YouTube walkthrough (550-DTC-brand community).
  • Figma MCP Server (April 2026) — Bidirectional bridge between Figma files and AI coding tools. Read design context into code, write running UI back to the canvas, let agents execute Plugin API JavaScript directly. Covers remote vs desktop server, the seven Figma-published skills, Code Connect, pricing/rate limits, Claude Code setup, competitive landscape.
  • [[claude-ai/railway-remote-mcp|Railway Remote MCP and railway agent CLI]] — Railway shipped two agent surfaces (April 2026): a hosted Remote MCP server at mcp.railway.com with browser-based OAuth (no tokens on disk), and a new railway agent CLI subcommand. Both call the same backend agent. Install in Claude Code with claude mcp add railway --transport http https://mcp.railway.com. The post is also a case study in MCP tool-surface design — Railway shipped 7 tools and is reducing that count; the deliberate decision is to route every multi-step operation through one railway-agent delegation tool (“context is expensive on both sides”). Implementation notes show the MCP server folded into Railway’s existing monolith (one auth system, one permission model) rather than spun up as a separate service. Portable architecture lesson for anyone building an MCP server.
  • AITmpl — Curated AI Templates Stack Builderaitmpl.com curated catalog with stack-builder UI, backed by the OSS programs of Vercel/Neon/Anthropic. Featured: ClaudeKit (AI Agents & Skills, docs.claudekit.cc), BrainGrid (AI Product Planner explicitly framed as the upstream-of-Claude-Code planning surface), Bright Data (web-data templates), TinyFish (AI web-agent platform — full coverage in agents-agentic-systems after the 2026-05-04 Search + Fetch free-tier launch). Surfaced via Elvis Saravia (@omarsar0, DAIR.AI). Discovery angle, not a Claude-specific marketplace; useful for cross-tool stack assembly. Both ClaudeKit and BrainGrid are flagged for follow-up Research operations.

Skills & Plugins

  • Agent Skills Overview (Official Anthropic Docs) — Authoritative reference: VM/filesystem architecture, 3-level progressive disclosure with official token figures (~100 / <5k / unlimited), API beta headers, cross-surface sync limitations, sharing scope per surface, formal YAML field requirements (64-char name, 1024-char description), security model, pre-built skills (pptx/xlsx/docx/pdf).
  • The Complete Guide to Building Skills for Claude — Anthropic’s official guide covering skill fundamentals, design, testing, distribution, and patterns.
  • Building Agents with Skills (Progressive Disclosure, Skill Types, Open Standard) — Anthropic blog (Jan 22 2026). Tax-professional analogy, 3-tier progressive disclosure (~50/500/2000+ tokens), three skill types (Foundational/Partner/Enterprise), “code is all you need,” 4-layer agent architecture, open standard across platforms.
  • Skills + MCP Synergy (Hardware Store Analogy) — Anthropic blog (Dec 19 2025). Skills teach “how,” MCP provides “access.” Composition examples (financial analysis, Notion meeting prep).
  • Skill Design Patterns — Five proven patterns for structuring Claude skills: sequential workflow, multi-MCP coordination, iterative refinement, context-aware tool selection, and domain-specific intelligence.
  • Claude Code Skills Ecosystem — Overview of skills: bundled skills, official skills, the Agent Skills open standard, enterprise provisioning, and API usage.
  • Claude Code Plugins and Marketplaces — Official registry (101 plugins), community registries, enterprise private registries.
  • claude-quickstarts — Official Starter Apps for the Claude API — Anthropic-maintained MIT collection of ready-to-deploy starter applications built on the Claude API (customer-support agent, financial-data analyst, computer-use demo, agent quickstart). ★16.7k. Sister surface to the Anthropic Cookbook — cookbook is single-technique notebooks, quickstarts are full deployable projects. Fork the closest project before writing new Claude-API boilerplate.
  • awesome-claude-code (hesreallyhim) — The Canonical Community Claude Code Index — ★44.4k community-curated awesome-list. By star count and citation frequency this is the index for skills, hooks, slash-commands, agent orchestrators, applications, and plugins built around the claude CLI. The four downstream “awesome-*” derivatives (sickn33 ★38k, VoltAgent ★22k, alirezarezvani ★15k, quemsah ★0.7k) largely overlap with this list. Use as the monthly inbox-sweep seed alongside claudeskills.info and the Claude Code plugin marketplaces.
  • skills — Canonical Anthropic Skills Repository — The 124k-star canonical Anthropic-published skills repo. 17 example skills (algorithmic-art through xlsx) + the Agent Skills spec (spec/agent-skills-spec.md) + a starter template. Two installable plugins: document-skills@anthropic-agent-skills (the four production document tools — docx/pdf/pptx/xlsx, source-available) and example-skills@anthropic-agent-skills (the other 13, Apache 2.0). Register via /plugin marketplace add anthropics/skills. Same skills pre-installed on Claude.ai paid plans; also exposed via the Skills API.
  • SkillSpector — Security Scanner for AI Agent Skills (NVIDIA) — Open-source (Apache-2.0, Python, 483★) NVIDIA scanner that statically inspects skill bundles for vulnerabilities, malicious patterns, and security risks before you install/run them — the skill-supply-chain analog of a dependency scanner. Fills the skill-security gap as one-click skill hubs proliferate; pairs with the defense-in-depth discipline in Hermes’ security model and the OSS-gap-fillers in Five OSS Tools. Sourced from an X bookmark + verified repo metadata; CLI/coverage details need a README pull.
  • Claude Code Security-Guidance Plugin (Anthropic Official) — Anthropic’s free first-party plugin that makes Claude review its own code changes for vulnerabilities and fix them in-session. Hooks-based three-stage review: per-edit pattern check (no model call — flags eval(/os.system/dangerouslySetInnerHTML/etc.), end-of-turn full-diff review (Stop), and commit/push review. Free, all plans, requires CC v2.1.144+ / Python 3.8+; /plugin install security-guidance@claude-plugins-official. Org rules via .claude/claude-security-guidance.md (MDM-distributable). Anthropic reports a 30–40% drop in PR security comments. The in-session layer beneath /security-review, PR Code Review, and CI scanners. Distinct from third-party SkillSpector (vets skills) and DeepSec (on-demand CLI). Grounded in official docs + @ClaudeDevs launch thread.
  • Claude Skills Hub (claudeskills.info) — Third-party community marketplace directory aggregating 658+ skills across 12 categories. Foregrounds cross-vendor official collections: Anthropic (16), OpenAI (37), Microsoft (333), Google (11), Vercel (8), GitHub Copilot (324), WordPress, plus community packs (Cybersecurity Skills 734+ MITRE-mapped, Trail of Bits 45 security skills, Everything Claude Code 86+, PM Skills 63). Distinct from SkillsMP / TokRepo / awesome-claude-skills because it surfaces publisher tiers as first-class. Featured on Product Hunt. Discovery surface — install via underlying GitHub repos.
  • skillshub) — Second third-party Claude/Codex skill-discovery directory; sister to claudeskills.info but with two structural differences — RUN counts as engagement metric (raw execution tallies, not stars/installs) and skill-maintenance utilities (skill-creator/updater/validator, plugin-validator, link-checker) as the lead catalog tier. Dual-runtime by default (.claude/skills/ + .codex/skills/). AGENTS.md awareness baked into the lead skill-creator skill. Use alongside claudeskills.info for complementary discovery — they surface different leaders.
  • skills.sh — The Open Agent Skills Directory (Vercel) — Vercel’s open, cross-agent (20+ runtimes) skills directory; install via npx skills add <owner/repo>, ranked by install count + 8-week trend. The catalog the Codex+MagicPath find skills step queries; install-ranked counterpart to MuleRun’s RUN-count hub and claudeskills.info.
  • agent-skills (Addy Osmani) — Production Engineering Skills for AI Coding Agents — Multi-IDE skill collection (addyosmani/agent-skills, ~58K★, MIT) packaging a spec-driven lifecycle: 7 slash commands (/spec → /plan → /build → /test → /review → /code-simplify → /ship, plus /build auto) and 16 skills (interview-me, TDD, doubt-driven-development, source-driven-development, code-review-and-quality…). Installs as a Claude Code plugin; also Cursor / Antigravity / Gemini / Windsurf / OpenCode / Copilot / Kiro / Codex. Higher-signal and more lifecycle-complete than single-author bundles; corroborated by GH-star + Matthew Berman video + X bookmark.
  • Claude Cowork Plugins — 11 launch plugins, Feb 2026 upgrade with department-specific plugins and cross-app workflows.
  • Skills vs MCP vs Plugins — When to Use Which — Decision guide with the kitchen analogy and starting recommendations.
  • Shopping for Skills and Plugins — A 6-Question Vetting Framework — Vetting checklist for marketers about to install a community skill or plugin. Six questions (publisher / last update / what it touches / dependencies / license / fit), red flags, and the WEO-specific approval posture (official Anthropic skills pre-approved, everything else through the AI Council connector workflow). Anchored on Snyk’s ToxicSkills empirical study + Repello security write-up + Anthropic plugins docs.
  • Five Claude Skills To Build Right Now (Eliot Prince recipe) — Starter pack: Decision Council, Framework Reverse Engineer, Lyra Prompt Writer, Amazon Shopper, Branded LinkedIn Carousel Generator. Four skills ship as .skill downloads. Meta-lessons on compounding/chaining skills. Refreshed 2026-05-02 with the toggles-fully-expanded re-fetch that surfaces the verbatim Decision Council 5-persona prompt and Daniel Priestley framework-extraction transcript baked into the Skill build prompt.
  • AI Recipe Vault — Eliot Prince’s 19-Recipe Notion Catalog — Catalog entry for the parent Notion vault that the Cowork Getting Started, Cowork + Apify, LinkedIn Engagement Machine, 5 Claude Skills, Cowork “AI Consultant”, and R.I.T(E) Framework articles all came from. UK-authored (eliotprince.com); 19 recipes total spanning Cowork, Claude Chat, Claude Skills, OpenClaw, Perplexity Computer, Lovable, Apify, Gemini Canvas, Gamma, Poppy AI, ChatGPT custom GPTs. Six recipes individually articled; thirteen covered as catalog summaries (Skills walkthrough with 10 free skill files, Chat 101, ChatGPT-to-Claude migration, AI-Powered Annual Review, 100 Viral Content Ideas, Build a Website with Lovable, Electro Lead Magnet Builder, OpenClaw 101, Perplexity Computer 101, LinkedIn Carousel Creation, Gemini Presentations, Mega Prompt Chest wrapper, Laziest Ways to Make Money). Treat as one operator’s library, not vendor-published documentation.
  • Marketing Skills Bundle (Corey Haines) — 36+ skills for CRO, copywriting, SEO, paid ads, analytics, retention, growth, sales, strategy. 21.8K stars. product-marketing-context foundation skill read by every other skill. Six install methods (npx skills, Claude Code plugin, clone, submodule, fork, skillkit). Largest single marketing skill bundle in the community.
  • social-media-skills (Charlie Hills) — 16 Claude skills behind Charlie Hills’ multi-platform content system (350k+ followers, 100M+ views/year). Voice-first: voice-builder produces about-me.md + voice.md and every other skill checks them first. LinkedIn focus (post-writer, profile-optimizer, post-scorer, hook-generator, content-matrix, niche-research, graphic-designer), plus reels-scripting, youtube-thumbnail, gemini-infographic/-carousel, quote-post, pinned-comment, analytics-dashboard. MIT, 372 stars. Five install paths including /plugin marketplace add charlie947/social-media-skills. Sibling to Marketing Skills Bundle (Corey Haines = breadth; Charlie Hills = depth on social/newsletter).
  • Single Brain) — 14-category open-source Claude Code skill repo for marketing/sales ops. MIT, 2,521 stars, Python. Categories: Growth Engine (bootstrap-CI / Mann-Whitney-U A/B significance), Sales Pipeline (RB2B Router, Deal Resurrector, ICP Learner), Content Ops (Expert Panel → 90+ quality gate), Outbound, SEO Ops (GSC Optimizer, Content Attack Briefs), Finance Ops (AI CFO), Conversion Ops (CRO), Podcast Ops, Team Ops, Sales Playbook (value-based pricing), Autoresearch (Karpathy-inspired generate→score→evolve loops), Deck Generator, YT Competitive Analysis, X Long-Form + Humanizer (24-pattern AI-slop detector). SKILL.md per category → drop into .claude/skills/. Agency-ops counterpart to Corey Haines (breadth) and Charlie Hills (social depth).
  • Career-Ops — Multi-Agent Job Search on Claude Code (santifer) — ~48.3K★ MIT (10K forks; created 2026-04-04) job-search command center built on Claude Code (also OpenCode / Gemini CLI). One /career-ops slash command → ~12 modes: paste-a-JD auto-pipeline (evaluate + ATS PDF + tracker), Playwright portal scanner (45+ companies across Ashby/Greenhouse/Lever/Wellfound), A–F rubric scoring (10 weighted dimensions, reasons CV-vs-JD not keyword-match), batch eval via parallel claude -p workers, interview Story Bank + negotiation scripts. Self-customizing (modes are editable markdown the agent reads and edits); human-in-the-loop (never auto-submits). The reusable pattern — paste → archetype-route → rubric-score → tailored-artifact → tracked-pipeline on an agentic CLI — generalizes to lead/RFP/brief triage. Productivity-domain sibling to the marketing skill repos above.
  • Everything Claude Code (ECC) — Affaan Mustafa175,918 stars / 27,217 forks (largest Claude-Code-adjacent agent bundle on GitHub at fetch). MIT, Anthropic Hackathon winner. 48 agents, 182 skills, 68 commands, 14 MCP servers, 34 rules, 997+ tests. Cross-harness parity (native Claude Code + Cursor + Codex + OpenCode + Antigravity + experimental Gemini CLI) — 997 internal tests verify behavior consistency. AgentShield security scanner (1,282 tests, 102 rules) audits CLAUDE.md / settings.json / MCP / hooks / agents / skills. Memory persistence via instinct system (/instinct-status, /evolve to cluster into skills). Token-optimization recommendations (Sonnet + 10k thinking-token cap + 50% compact threshold) claim 60-70% cost reduction; aligns with budget framework. Founder Affaan Mustafa (Itô / ECC-Tools, real founder, 5.3k followers). 170+ contributors, biweekly releases.
  • The Agency (msitarzewski) — 147 AI Agents Across 12 Divisions95,079 stars / 15,683 forks. MIT. Agent-personality bundle: Engineering 30+, Design 7, Paid Media 8, Sales 9, Marketing 27+ (incl. region-specific for Xiaohongshu / WeChat / Bilibili / Douyin / Kuaishou / Weibo / Zhihu / China Market Localization), Product 5, PM 5, Testing 8, Support 6, Spatial Computing 5, Game Development 25+ (Unity / Unreal / Godot / Blender / Roblox specialists), Academic 5, Specialized 40+, Finance 5. Multi-tool installer (./scripts/install.sh --tool <claude-code|cursor|aider|windsurf|opencode|copilot|antigravity|qwen-code|kimi-code|opencode|gemini-cli>) — 11+ AI coding assistants. 5 documented multi-agent use-case workflows (Startup MVP, Marketing Campaign, Enterprise Feature Development, Paid Media Account Takeover, Full Agency Product Discovery). Founder Michael Sitarzewski (Techstars alum, 30+ years building, real identity).
  • SCALE AI Foundation Skills (Mike Futia) — Brand DNA → Voice → ICP — Three free Claude Code skills (brand-dna-builder, brand-voice-profiler, icp-deep-dive) that produce the foundation files (brand/brand-dna.md / brand-voice.md / icp-cards.md) every downstream SCALE AI skill reads. 3 stars / 1 fork, no LICENSE file (README implies free use). Pure-prompt skills — SKILL.md + references/ subfolder, zero scripts. Firecrawl MCP is load-bearing for brand-dna-builder and icp-deep-dive (interview-only fallback without it). Companion YouTube tutorial DyuJX6X7KVs demos the full 7-skill stack — 3 free here + 4 premium (Meta Ad Hook Writer / Creative Brief Generator / Ad Script Writer / Static Ad Variation Engine via OpenAI GPT Image 2) gated behind SCALE AI Skool community (550+ DTC brands / agencies / performance marketers). Architectural primitive worth lifting: foundation files as schema contract + sequential file-passing through ./brand/ + references/ subfolder pattern + interview-plus-scrape onboarding flow.
  • GitNexus — Zero-Server Code Intelligence Engine with Graph RAG (Abhigyan Patwari) — 37k★ TypeScript code-intelligence tool. Indexes any codebase into a knowledge graph with built-in Graph RAG agent. Dual interface: CLI + MCP for Claude Code / Cursor / Codex (16 MCP tools — hybrid search, impact analysis, symbol context, multi-file rename) AND Web UI at gitnexus.vercel.app (browser-based via WebAssembly, drop-a-repo-or-zip-file). 14+ languages via Tree-sitter, LadybugDB graph storage, BM25+semantic hybrid index. PolyForm Noncommercial license (commercial via akonlabs.com — flag for any agency / WEO Marketly deployment). Both gitnexus-claude-plugin/ and gitnexus-cursor-integration/ ship in-repo. Unusually thorough docs (13+ root-level markdown files including AGENTS.md, ARCHITECTURE.md 32KB, GUARDRAILS.md, RUNBOOK.md, DoD.md).
  • Context7 — Up-to-Date Library Docs MCP for LLMs and AI Code Editors (Upstash) — Upstash’s hosted documentation-retrieval MCP server. 54,877★ / 2,608 forks at fetch. MIT MCP server source; supporting backend / parsing engine / crawling engine are proprietary. 97,640 libraries indexed with version-aware retrieval, trust scores 1-10, hourly freshness on top libraries. Two operating modes: CLI + Skills or MCP (hosted endpoint mcp.context7.com/mcp). One-command setup: npx ctx7 setup. Sister to QMD in the documentation-retrieval substrate — QMD = your own markdown vault, Context7 = external library docs; they compose.
  • Quarkdown 2.1.0 — Markdown Typesetting Language Ships an Official Claude Code Skill — Open-source Turing-complete Markdown-based typesetting language by iamgio (Giorgio Garofalo) that compiles a single .qd source to four document types (paged / plain / slides / docs) and to PDF via paged.js. Big Kotlin/Gradle monorepo (18+ subprojects: LSP server, live preview, parallel rendering, CSL citation styles). 2.1.0 (2026-05-21, commit 8ccecd9) ships an official Anthropic-format Claude Code skill bundle at iamgio/quarkdown/skills/quarkdown/ — CHANGELOG explicitly documents the ~/.claude/skills/quarkdown install path and links to Anthropic’s agent-skill docs. GPL licensed. Critical disambiguation: a separate third-party Claude skill (uditya-kumar/quarkdown-skill, posted to r/ClaudeAI two days before iamgio’s official ship) installs to the same path; r/ClaudeAI thread 1tjbhxx claims 100% vs 57.9% self-reported pass-rate on a 26-task suite (self-reported, not independently verified). Side-by-side install behavior (clobber? namespace? fail?) untested. Maintainer-handle linkage (iamgio = u/iamgioh) inferred from domain-and-path match.
  • QMD — Local Hybrid-Search MCP for Markdown Knowledge Bases (Tobi Lütke)Tobias Lütke (Shopify CEO)‘s local-first hybrid-search engine for markdown knowledge bases. 24,467★ / 1,539 forks, MIT, TypeScript. This wiki’s primary retrieval layer since 2026-05-04. Hybrid pipeline: LLM query expansion → parallel BM25 + vector retrieval → RRF fusion → LLM rerank → position-aware blend. All local via node-llama-cpp. 4 MCP tools (query/get/multi_get/status). Multi-collection (karpathy-wiki + weomarketly-wiki).
  • Graphify — Cross-Harness Knowledge-Graph Skill (Safi Shamsi)45.5k★ MIT Python skill. /graphify . in 18 AI assistants → 3 outputs (graph.html + GRAPH_REPORT.md + graph.json). Tree-sitter AST across 28 languages. Optional MCP server. MIT vs GitNexus’s PolyForm-NC = the load-bearing license difference for commercial work.
  • Printing Press — Agent-Designed CLI Factory + Library (Matt Van Horn) — One command prints a Go CLI + Claude Code skill + MCP server from an API spec, website, or HAR file. 622★ Press / 454★ Library, MIT. Nate Herk tutorial headline: MCP uses 35× more tokens than CLI on the same task. CLI tier 1, API tier 2, MCP tier 3.
  • Ryze AI) — Single-file Claude Code skill for building and sending HTML marketing emails via Resend. MIT, 147 stars. “Use your own public/ as a free email CDN” pattern. ffmpeg GIF optimization, Resend drip pipeline, 10-item compliance checklist.
  • last30days-skill (Multi-Platform Research) — Matt Van Horn’s 24.2k-star, 1,012-test research skill. Aggregates and engagement-ranks last 30 days from Reddit, X, YouTube, TikTok, HN, Polymarket, GitHub, and web. Multi-runtime. Depends on yt-dlp.
  • birdclaw — Local Twitter Workspace + Agent Skill — Stefan Petri’s local-first X/Twitter archive workspace. SQLite, FTS5 search, Playwright-tested. MIT, 377 stars.
  • clone-website) — JCodesMore’s 12.9k-star Next.js 16 template with /clone-website skill. Reverse-engineers any URL into a clean codebase via parallel git-worktree builders. MIT.
  • watch Skill for Video Input — Brad Brown’s MIT skill (117 stars) registering /watch <url-or-path>. Downloads via yt-dlp, extracts frames with ffmpeg, transcribes via Whisper/Groq. Multi-runtime.
  • video-analyzer Skill (Mike Futia, Gemini-backed) — Mike Futia’s MIT skill registering /video-analyzer <path>. Routes through Google Gemini API for native video understanding. Strong anti-hallucination guardrail.
  • Frontend Slides (zarazhangrui) — Zero-dependency HTML presentation generator as a Claude Code plugin. 15.5K stars, MIT. 12 visual presets.
  • agents — Claude Code Marketplace (184 Agents, 78 Plugins) — 34.4k-star MIT marketplace. 184 agents, 78 plugins, 150 skills, 16 multi-agent orchestrators across 25 domains.
  • Essential MCP Servers for 2026 — Highest-value MCP servers, scope levels, Tool Search efficiency, and security practices.
  • HeyGen Hyperframes (skills) — Installable skill bundle registering /hyperframes, /hyperframes-cli, /gsap into Claude Code for HTML-based video composition.
  • Lead Magnet Creation with Claude Code (Brandon Storey course) — Cross-listed from ai-marketing. Brandon Storey’s 1:55:54 Copywriting Coach walkthrough using a design skill + filesystem-aware Claude Code to convert a plain Google Doc → branded PDF lead magnet.
  • Seven Operator Use Cases for Claude’s New Stack (Rick Mulready) — Rick Mulready’s 17:42 operator-perspective tour of features that shipped March-April 2026, framed as 7 business use cases.
  • Nine-Component Agentic OS (Simon Scrapes) — Simon Scrapes’ nine concrete AIOS components: agent identity, brand context, 6-level memory taxonomy, skills, skill chains, three planning levels, multi-client architecture, predictable output structure, remote access via Channels.
  • Folder-as-Workspace Architecture — 3-Layer Routing Without Building Custom Agents — 25-minute walkthrough of a 3-layer folder + CLAUDE.md routing pattern argued as “what the industry is moving towards” and a replacement for building custom agent frameworks. (1) CLAUDE.md as the map/floor plan (always-loaded, folder structure + naming conventions + workspace descriptions); (2) per-workspace context.md (loaded only when Claude routes into that workspace, with a router table — the load-bearing primitive — “for this task read these files, skip those, load these skills”); (3) the actual workspace files (drafts/outputs/builds, governed by Layer-1 naming conventions). One Claude session loads workspace-appropriate context, becomes the agent each layer needs. Skills wired into the system via Layer-2 conditional loading rather than always-loaded. Naming conventions in CLAUDE.md replace database queries. Author references a forthcoming research paper on the lineage from 1972 software-engineering principles (rules of transparency, rules of composition) to modern AI. Sister to Ben’s Five-Skill AIOS, Simon Scrapes’ Nine-Component AIOS, and Moritz Kremb’s Claudia OS — differentiated by emphasis on the pure routing primitive itself as the durable contribution.
  • Claude Code Personal OS — Moritz Kremb’s OpenClaw-Inspired Folder Architecture — Moritz Kremb (Twitter @mob) walks Peter Yang through “Claudia OS” — his personal Claude Code chief-of-staff build that ports the OpenClaw folder architecture (master claude.md + identity/soul/user/tools.md registry + memory/ daily files + dreaming routine for overnight memory consolidation) into Claude Code, running both runtimes in parallel because OpenClaw keeps breaking on updates. Single load-bearing file is tools.md (append every CLI/MCP/API). Tool-evaluation hierarchy: CLI > MCP > API (matches Printing Press and TinyFish from the operator side). G Drive over local markdown for storage (phone-first creator workflow, contrasts with Karpathy local-vault default). 8-stage content system as capstone (Telegram idea capture → weekly plan → script notes → Whisper Flow dictation → human filming → editor-in-loop → Postiz CLI multi-platform post → Notion-Manychat resource auto-funnel → stats). TikTok-via-API performance caveat + Instagram Edits-app workaround surface. Sub-agents-are-the-exception stance (reviewer/drafter or per-business context only). Heartbeat-as-routine fidelity gap with OpenClaw flagged. Closing tip: build one tool at a time. Recorded ~10 days after Claude Code Desktop app launch.
  • Build & Sell Claude Code Operating Systems (Nate Herk’s 2-hour course) — The most comprehensive third-party Claude Code “AI operating system” framework. The Three Ms of AI + The Four Cs of an AIOS. Public MIT starter kit at nateherkai/AIS-OS.
  • Master 97% of Codex in One Hour (Nate Herk) — End-to-end OpenAI Codex CLI walkthrough with explicit positioning against Claude Code. Mix-and-match composition thesis. ChatGPT-subscription OAuth as cheapest non-open-source path.
  • Build Your Own Coding Agent — Night Code Tutorial — Multi-chapter live-coding bootcamp building “Night Code,” a Claude Code / OpenCode alternative from scratch. Four-package monorepo = OpenTUI React-based terminal UI + Hono backend + shared types + database. Plan-mode vs Build-mode = tool gating (plan = read+search; build = full read+write+edit+bash; author’s thesis: “create your own modes — review, test, documentation”). Provider-agnostic via Vercel AI SDK (6 models default; mid-session switching). Browser-to-CLI OAuth via Clerk. Credits-based SaaS billing via Polar (no Stripe glue, no webhooks). Real engineering practices baked in: CodeRabbit AI code review, Railway preview deploys per PR, Sentry observability, Git-branch-per-chapter. The “build your own Claude Code” framing is somewhat misleading — closer to “ship your own AI-coding SaaS using the AI SDK.” Architecture is the durable contribution; the harness layer is thin (Vercel AI SDK does heavy lifting), the commercial stack around it gets most of the chapter time.
  • Voice Agents with Claude Code + ElevenLabs (Nate Herk) — Live-build walkthrough: Claude Code configures an ElevenLabs voice agent end-to-end via natural language. Four-piece voice-agent structure (Persona / Voice / Knowledge / Tools).
  • Six Best Claude Code Skills for Business (400-Hour Operator Pick) — Curated list after 400-hour Claude Code immersion. Six items: Skill Creator, Superpowers, GSD plugin, /review+/ultrareview, Context Mode, claude-mem. Plus sales-enablement closer.
  • Seven Claude Skills That Run My Business (tested-100+ operator pick) — Curated list from an “AI playbook” newsletter operator who tested 100+ skills and kept 7: Anthropic skill-creator, /today daily-digest, /watch video (Brad Automates), /ai-digest newsletter triage, /newsletter transcript-to-topics, /article AEO/GEO optimizer, /decide multi-agent decision-memo skill. Composition-heavy (article-optimizer calls watch sub-skill). Obsidian-as-persistence pattern. Pair with six-best and nine-plugins as a three-way triangulation.
  • Nine Claude Code Plugins to Build 10× Faster — Operator-side curated plugin catalog: Caveman (token-save concise mode), Firecrawl+Exa (web stack), Compound Engineering (5-step plan→work→review→compound→repeat loop), Higgsfield MCP (image/video gen, sponsored disclosed), Anthropic-official set (skill-creator + legal + frontend-design + security-guidance), OpenAI Codex plugin (multi-model + subsidy-reversal framing), BuildPartner.ai, Morph (fast-apply + warp-grep + compact), Code Burn (token-spend dashboard). Closing thesis: stack three columns (token-spend / extension / discipline).
  • Every Level of Claude — 5-Level Mastery Framework (Nate Herk) — Taxonomy-driven walkthrough of the entire Claude product surface as a 5-level mastery ladder (Enthusiast → Beginner → Intermediate → Advanced → Architect), each with explicit “cheat code” transitions. Distinct from Nate’s deep-dive courses (AIOS, Codex) by being a surface-spanning overview rather than a single-product walkthrough — the value is the level-progression model itself. Useful as a route-finder when an operator says “I’ve outgrown chat, what’s next?”
  • How Anthropic Engineers Prompt Claude Code — Four Rules from the AI Code Summit — Four rules distilled from Anthropic engineers Eric + Barry at the AI Code Summit + the Anthropic engineering blog: (1) prompt skills, not Claude; (2) skills are more than prompts — the 3-layer model (description + instructions + tools); (3) build composable skills, not custom (small focused vs mega-skill); (4) update your skill every session for the compounding loop. Introduces two invocation-control flags most users don’t know: user_invocable: false (hides from /slash menu, agent-only) and disable_model_invocation: true (user-only, blocks model autonomy for risky ops).
  • Karpathy-Inspired Claude Code Guidelines — multica-ai’s CLAUDE.md (Four Principles, 146k★) — Single CLAUDE.md file distilling Karpathy’s 2026 X post on LLM coding pitfalls into four behavioral principles: Think Before Coding (state assumptions, present interpretations, push back when warranted, stop when confused) / Simplicity First (no features beyond ask, no abstractions for single-use code, 200→50 line test) / Surgical Changes (every line traces to user’s request, leave pre-existing dead code) / Goal-Driven Execution (transform “fix the bug” → “write a failing test, make it pass”, 1. [Step] → verify: [check] plan format). 146,000 stars / 14,900 forks / MIT. Two install routes: Claude Code plugin (/plugin marketplace add forrestchang/andrej-karpathy-skills) or per-project curl. Ships an equivalent Cursor rule. Same project as forrestchang/andrej-karpathy-skills — multica-ai is the org (also home to the Multica coding-agent platform), forrestchang is the maintainer’s personal handle. Verified not a star-puller glitch: original 2026-05-19 raw stub skip-triaged this as suspicious; verified at 146k on 2026-05-22 against the live repo. Dedicated treatment of what was previously a passing mention in Karpathy’s LLM-Wiki Techniques Recent Signals.
  • Skill Systems — Orchestrator + Child Skill Chaining Pattern (Agentic Academy) — The middle path between two failure modes: skills-in-isolation (you’re the manual intermediary) and mega-skills (lose modularity, maintainability, progressive disclosure). Right answer = orchestrator skill wired around small focused child skills. Real example: four production skill systems (video-clip-shortform / video-to-article / social-carousel / slide-generation) sharing four child skills including fact-checker + humanizing. Update one child → every parent system upgrades. Companion to the Anthropic-engineers four rules (rule 3 = composability).
  • Nouns Mental Model — Eliot Prince’s framing video resolving the most common Claude operator confusion (“should I build a skill or a project?”). Mental model: skills are verbs (reusable processes that run anywhere in the Claude ecosystem), projects are nouns (scoped knowledge bases with overarching instructions and chat history). Kitchen analogy: project = stocked pantry; skill = recipe. Anatomy walkthroughs of each section. Worked example: Michael-Gerber-style SOP prompt → gerber-sop-writer skill via Skill Creator, run inside a fictional Cerno Facilities Group cleaning-company project to produce credit-card-usage SOPs pairing generic process with company-specific knowledge. Token-cost win: ~90% reduction by moving recurring knowledge into projects. Discipline rule: keep skill reference files lightweight (templates/checklists only, NOT deep client knowledge) so skills stay transferable. Sister to Cowork Getting Started (same author, same chef-vs-recipe analogy applied to Cowork plugins).
  • run-skill-generator) — A pattern for stopping Claude Code from re-deriving how to build and launch your app every session: /run-skill-generator does the discovery once and writes a per-project skill (build steps, launch, the non-obvious app gotchas — ports, self-signed certs, CSRF, false-positive warnings — plus a smoke script), then /run just reads it. Three payoffs: kills the per-session token spend, keeps run mechanics out of the always-loaded CLAUDE.md, and unblocks live-app functional/security testing. (/run is a built-in Claude Code skill; the /run-skill-generator companion + token framing are the source author’s report.)
  • I Tried Every Popular Claude Skills System — Best Is the One You Build Yourself (Code4AI) — Contrarian take after reviewing every major skill library (gstack, Everything Claude Code, Pocock, Addy Osmani, BMAD, GSD, OpenSpec, Superpowers). Thesis: best skill system is the one you build yourself; start with natural-language prompting + native plan mode; only add skills when the agent demonstrably fails repeatedly; keep them short. Universal 5-step pattern under every popular library: research/discuss → prototype-front-end-only → markdown plan with phases → build slice-by-slice → test + optional polish pass with a different model. Library-specific contributions worth lifting individually: Pocock’s grill-me (domain-grounding), Tan’s office-hours (6 forcing questions), Addy Osmani’s simplify-code (polish pass). Key claim: skills are documentation; documentation rots; every skill is a maintenance liability. Vercel’s skills.sh highlighted as team-scale skill management primitive. Minority position counterweighting six-best / seven-skills / nine-plugins starter-pack coverage.
  • Ben’s Five-Skill AIOS Setup — OS Setup → Operator → Optimizer → Team → MCP — Five-skill bundle from “Ben” (AI agency) walking from zero to a working Obsidian-backed second brain with team permissions + Railway-hosted MCP for autonomous routines. Five named skills: /os-setup (folder + CLAUDE.md + 12-section context interview), /os-operator (Cowork scheduled task pulling Fireflies/Slack/Calendar into daily/), /os-optimizer (audit + caveman-compression + Chroma-context-route + Karpathy-LLM-wiki framework), /team-os (Obsidian Relay + custom Bennai Relay for member/owner permissions), /os-mcp (Railway-hosted vault MCP enabling cloud Routines). Distinct from Nate Herk’s AIS-OS bundle (Nate’s tool-agnostic; Ben’s opinionated about Obsidian + Relay + Railway).
  • Claude Code Memory Architectures Compared — Built-in vs memarch vs Hermes (Hybrid Recommendation) — Decomposes three memory systems against three load-bearing questions (storage / injection / recall). Built-in automemory is selective + weak recall. memarch is capture-everything via stop hook → local CPU vector DB rebuildable from markdown, with three-tier progressive disclosure recall. Hermes is curated memory.md + user.md + soul.md frozen snapshot (~1,300 tokens/session) + 7-day curator + level-0 context check before database hit. Hybrid recommendation: combine all three — memarch storage + Hermes injection + level-0 → level-1 (memarch hybrid) → level-2 (expand) → level-3 (raw dialogue) recall. Companion to skill-systems-orchestrator-child-pattern + ben-five-skill-aios-setup (same Agentic Academy creator).
  • Chris’s AI Coding Workflow Update (2026-05) — Xcode Build MCP + Claude with Chrome + Gravile + CMUX + Remote Control — Solo-dev field report on seven workflow tools: Xcode Build MCP (Sentry, iOS automated testing — 90% of Xcode), Claude with Chrome (web testing), MCPs+CLIs everywhere for production debugging (Sentry/Supabase/Axiom/Firebase — 45 min → 3 min crash triage), Gravile AI code review ($30/mo, beats Cursor Bugbot), /remote-control + Boris Cherny’s /config auto-start tip for phone session continuation, CMUX terminal (20+ Claude Code instances vs Cursor’s ~5), no-flicker mode + max-thinking flag. Default Opus 4.7 + max → fallback GPT 5.5 extra-high 1M context for complex edge-case bugs. Tip: default to CLI over MCP when both exist (less context bloat, agents handle CLI better).
  • [[claude-ai/claude-code-goal-command-walkthrough|Claude Code /goal Walkthrough — Long-Running Goal-Converged Sessions (Chris / buildgreatproducts.com)]] — Side-by-side walkthrough of /goal in both Claude Code AND Codex CLI, by the same Chris who shipped the 2026-05 coding-workflow update. /goal = completion-condition primitive that loops the agent autonomously across 40-80 tasks; a small fast model checks the condition after each turn. Up to 4,000 characters for the condition. Positioned as the evolution of the Ralph loop. Recommended scaffold: CLAUDE.md (Karpathy rules) + docs/prd.md + docs/product-roadmap.md (40-80 tasks) + docs/design.md (Google open-source format generated via the plaid skill at plaid.build). Workflow: plan mode first → generate verifiable end-condition → /goal → pair with auto-mode to suppress permission prompts. The same primitive ships in both Claude Code (W20 v2.1.139) and Codex CLI — Claude Code’s port is a direct adoption. Refreshed 2026-05-25 with a r/ClaudeCode signal: hazyhaar’s 9 h 27 m session chained 4 /goal commands, produced 45 commits / 14,259 LOC / 4.16M rows ingested; shows how the Stop-hook condition + chained handoff scale to a real long-horizon workflow.
  • Dynamic Workflows in Claude Code — Anthropic Announcement (Research Preview)Official Anthropic announcement of the Workflow primitive. Claude dynamically writes orchestration scripts running tens-to-hundreds of parallel subagents in one session, with adversarial self-checking (agents refute each other until answers converge) and resumable progress (interrupted runs continue where they left off). Research preview across CLI / Desktop / VS Code + Claude API / Bedrock / Vertex / Foundry; on by default for Max/Team/API, off for Enterprise until admin-enabled; enable via the ultracode setting + auto mode. Consumes meaningfully more tokens (confirm-before-first-run gate; start scoped). Flagship example: Jarred Sumner’s Bun Zig→Rust port — ~750K LOC Rust, 99.8% tests passing, 11 days, hundreds of agents + two reviewers per file. Klarna (dead-code discovery beyond static analysis) + CyberAgent (“between one subagent and a full agent team”) early-access quotes. The official-source companion to the mechanics walkthrough below. Refreshed 2026-06-02 with the Claude Code team’s best-practices deep-dive (Thariq Shihipar @trq212): the agentic-laziness / self-preferential-bias / goal-drift rationale, a six-pattern taxonomy, and an example-prompt library skewed toward non-technical work.
  • [[claude-ai/claude-code-workflows-tool-walkthrough|Claude Code /workflows Walkthrough — Deterministic Multi-Agent Orchestration]] — Worked-example deep-dive on the new Workflow tool (shipped in v2.1.147; off by default, enabled via CLAUDE_CODE_WORKFLOWS=1). Replaces the model-as-orchestrator pattern with code-as-orchestrator.claude/workflows/<name>.js files declaring phases / schemas / agents / pipelines / budgets. The killer feature is killing the token tax: sub-agent outputs flow phase-to-phase without re-entering the main session’s context, so 20-100 sub-agents don’t degrade the orchestrator. Three primitives (agent / pipeline / schema) + four behavioral knobs (phaseLog / arguments / budgets / auto-retry). Pipeline-streaming demoed: as one item exits stage 1, stage 2 starts on it without waiting for the rest. Three worked examples: triage Sentry (Sentry MCP → filter by min_users → fix → verify; 7 sub-agents, ~400K tokens), dead code sweep (while loop up to 8 rounds; exit early when none found), personalized outreach (8 leads → 8 parallel research sub-agents pipelined into 8 message writers). Background workflows + multi-workflow concurrency + per-sub-agent pause/skip/retry. Companion to [[claude-ai/claude-code-goal-command-walkthrough|/goal]] (autonomous loop) and How We Claude Code (agent-native verification). Creator attribution missing from the YT transcript — see Open Questions.
  • Anthropic’s Official Best Practices for Claude Code (primary source) — Full ingest of code.claude.com/docs/en/best-practices, the canonical Anthropic doc that the W21 Reddit chatter on the “4 context tools” was paraphrasing. The doc actually documents eight context-management techniques, not four (Reddit surfaced /btw, /rewind, /compact <instr>, CLAUDE.md compaction directives; missed /clear, default /compact, subagents, and CLAUDE.md hierarchy itself). Verbatim quotes worth pinning: “/clear is admission you waited too long”; verification is “the single highest-leverage thing you can do”; ASCII flow + “give a screenshot” beats prose for context dumps. Anchors the broader meta-pattern cluster (skill rules, orchestrator-child, AIOS bundles, /goal, memory architectures) — all of those are applications of this doc’s primitives. Refreshes the whats-new-2026-w21 note that the Reddit summary mis-counted the toolset. Refreshed 2026-05-18 [X signal — @iam_elias1]: Cal Rueb (Anthropic MTS) is named as the talk-source speaker at Code with Claude SF on May 22, 2025; X post asserts a 33% unguided-success-rate stat attributed to Anthropic internal testing (two-hop chain, marked ^[inferred] pending corroboration from the actual talk recording).
  • Context Management in Claude Code (Anthropic Tutorial) — 2-minute Anthropic-produced explainer of the context primitives — /compact (continue same feature), /clear (switch to new feature, no bias), /context (inspect categories). Plus the specificity paradox (shorter prompts cost more context long-run because Claude has to compensate), MCP-server discipline (turn off unrelated servers), skills as a lower-context alternative to MCP, and subagents as the separate-context-window lever for exploratory queries. Companion to Anthropic’s Best Practices doc — same primitives, distilled to 2 minutes.
  • Alex Albert — Inside How Anthropic Is Building the Next Claude — Long-form interview with Alex Albert (formerly head of DevRel, “first prompt engineer at Anthropic, might have been the first in the world”, now Product Manager on Anthropic’s research team). Rare inside view: model-as-product spec, Claude-clustering-Claude-feedback for evals, adaptive thinking calibrated by user context (without context, model can’t decide whether to think hard), dreaming as overnight memory pruning (“review all conversations, identify themes, summarize” — now shipped to Managed Agents), one-way-vs-two-way-doors PRD scoping (engine time is no longer one-way in 2026, MVPs in days not weeks). Confirms long-running-agent character + consciousness are explicit research concerns.
  • Vercel DeepSec — Agent-Powered Vulnerability Scanner for Vibe-Coded Apps — Open-source Vercel-built scanner (npx deepsec init) targeting OWASP Top 10 in your own infrastructure. Five-command lifecycle (initscanprocessreportrevalidate). Sample run: 118 candidates in <1s via regex, 1.12 to revalidate after fixes. Threat-model-first design (writes a per-project threat model during init, calibrates findings). Wires into Claude Code or Codex via --agent flag, or runs hosted on Vercel infra. Static-analysis only — pair with runtime monitoring. Author’s recommended fix-application loop: openspec fast-forwardopenspec apply → DeepSec revalidate.

Skill dependencies

  • yt-dlp — Public-domain CLI audio/video downloader. The de-facto YouTube extraction utility that Claude skills reach for when they need transcripts, audio, or video.

Third-party Frameworks

  • obra) — Closed end-to-end software-development methodology shipped as a skills bundle on the official Claude plugin marketplace. MIT, 168k stars.
  • SuperClaude Framework — Community meta-programming framework adding 30 /sc:* slash commands, 20 specialized agents, 7 behavioral modes, 8 MCP server integrations. MIT, 22.5k stars.
  • See also oh-my-claudecode (OMC) under Agents.
  • OpenSpec — Spec-Driven Vibe-Coding Framework — Spec-driven framework producing proposal.md + design.md + tasks.md + per-concern specs/ artifacts per change.
  • Ars Contexta — Skill Graphs and Conversationally Derived Vault Architecture — Heinrich’s Claude Code plugin deriving a complete personalized KMS from a 6-phase conversational setup. MIT, ~3,300 stars.
  • gstack — Garry Tan’s 23-Tool Claude Code Setup — Y Combinator CEO’s open-sourced daily-driver skill pack. 23+ slash commands grouped into 5 role buckets (planning/design, dev/review, testing/deploy, docs/security, utilities). One-line clone install. MIT, 99.9k stars.
  • GSD (Get-Shit-Done) — Spec-Driven Development System — TÂCHES’s six-step phase loop (/gsd-new-project/gsd-discuss-phase/gsd-plan-phase/gsd-execute-phase/gsd-verify-work/gsd-ship) with persistent five-artifact layer. The /gsd-* slash-command family. Multi-runtime (Claude Code / OpenCode / Gemini CLI / Cursor / Codex etc.). MIT, 63.3k stars.
  • BMAD-METHOD — Breakthrough Method for Agile AI-Driven Development — BMad Code LLC’s 12+ specialized personas + “Party Mode” multi-agent discussions + 34+ structured workflows + scale-adaptive planning. npx bmad-method install. MIT, 47.7k stars.
  • Matt Pocock’s Skills Repo — “Skills for Real Engineers.” 18 skills (engineering: diagnose, tdd, triage, prototype; productivity: caveman ~75% output token reduction, grill-me, handoff). Failure-mode-targeted. npx skills@latest add mattpocock/skills. MIT, 96.4k stars.
  • Academic Research Skills (Imbad0202) — 5-stage Claude Code skill bundle — Academic-pipeline skill bundle (research → write → review → revise → finalize). Python, 19,620 stars at ingest, license NOASSERTION (unresolved). Verification flags raised in-article: high star count on a young repo (~3 mo since 2026-02-26 creation), license ambiguity, Skill-vs-prompt-collection ambiguity — wiki tracks the precedent that high-star young repos can be real (multica-ai/andrej-karpathy-skills was 146k★ verified). Falsification candidates: re-fetch star history + clone the bundle and confirm SKILL.md frontmatter shape.
  • codegraph (colbymchenry) — Pre-indexed Code Knowledge Graph — TypeScript MIT, 19,133 stars at ingest (~4 mo since 2026-01-18 creation). Multi-agent positioning — “Pre-indexed code knowledge graph for Claude Code, Codex, Cursor, OpenCode, and Hermes Agent — fewer tokens, fewer tool calls, 100% local.” Architectural sister to this wiki’s QMD but at code-graph rather than document-retrieval scale. Falsification candidate: reproduce the “fewer tokens, fewer tool calls” claim with a measured delta against a baseline.
  • Understand Anything (Lum1104) — Codebase + Karpathy-LLM-Wiki Knowledge Graph — TypeScript MIT, 21,832 stars at ingest (~2 mo since 2026-03-15 creation). 14 platform integrations from one repo — Claude Code (native plugin), Codex, Cursor, VS Code + Copilot, Copilot CLI, Gemini CLI, OpenCode, OpenClaw, Antigravity, Pi Agent, Vibe CLI, Hermes, Cline, KIMI CLI. Distinguishing claim against CodeGraph: explicit Karpathy-pattern LLM wiki support via /understand-knowledge command — consumes the exact format this vault implements and produces a force-directed graph with community clustering. 6-agent pipeline (project-scanner, file-analyzer, architecture-analyzer, tour-builder, graph-reviewer, domain-analyzer; article-analyzer for /understand-knowledge). Tree-sitter deterministic structural pass + LLM semantic pass. Graph-as-committed-JSON pattern. Live demo at understand-anything.com/demo/. Falsification candidate: run /understand-knowledge against this vault and compare against existing 2D/3D graph view on jc-aiwiki.pages.dev.

Agents

  • Claude Managed Agents — Anthropic’s hosted agent service (beta, April 2026). Sandboxed execution, checkpointing, OAuth.
  • Session Architecture — Anthropic Engineering deep-dive (Lance Martin / Gabe Cemaj / Michael Cohen, 2026-04-08) on how Managed Agents is built: decouple the brain (Claude + harness) from the hands (sandboxes) and the session (external append-only event log); six primitives (execute/provision/wake/getSession/emitEvent/getEvents); crashes become tool-call errors; p50 TTFT −60% / p95 −90%; credentials never touch the harness (resource-bundled auth + vault/MCP-proxy); modeled on OS virtualization. The how-it’s-built companion to the product/operator articles above.
  • Claude Code Agent Teams — Experimental multi-instance coordination (v2.1.32+). Lead + teammates with shared task list, mailbox, plan-approval gates. Recommended sizing 3–5 teammates / 5–6 tasks each.
  • Claude Code Subagents — Isolated parallel workers with scoped permissions. The stable default for parallel work.
  • The Advisor Strategy (advisor_20260301) — Server-side tool (beta, Apr 2026) letting Sonnet/Haiku consult Opus mid-request. +2.7pp on SWE-bench at 11.9% lower cost.
  • oh-my-claudecode (OMC) — Teams-First Multi-Agent Orchestration — 30k-star Claude Code plugin + CLI from Yeachan Heo. 6 orchestration modes, ~19 specialized agents, HUD statusline, smart model routing.

Automation

  • Claude Code Routines — Cloud-based saved tasks that run on Anthropic’s servers via schedule, API, or GitHub events. Research preview (April 2026).
  • Ultraplan (Cloud Plan Mode) — Research preview from W15. /ultraplan ... kicks off plan mode in Claude Code on the web; review in browser, comment, execute remotely or send back to CLI.
  • Claude Code Hooks — Deterministic shell/HTTP/MCP/prompt/agent commands fired at lifecycle events. Exit-code 2 blocks; permissionDecision gates tool calls. Configured at user / project / project-local / plugin / skill scopes.
  • Claude Code Scheduled Tasks/loop (session-scoped) plus CronCreate/List/Delete tools (v2.1.72+). Three forms: fixed cron, Claude-paced dynamic interval, or bare /loop. Jitter up to 30 minutes.
  • Claude Code Channels — MCP servers that push external events into a running session (v2.1.80+). Built-in plugins: Telegram, Discord, iMessage, fakechat. Console API key auth now supported (W19).

Partner Network

  • Claude Partner Network — Eligibility, Tiers, and What Anthropic Actually Asks For — March 2026 launch, $100M initial investment, free to join. Three partner tracks + three certification tracks.
    • Services Track and Partner Hub (June 2026) — The June 3 2026 expansion that publishes the quantitative tier thresholds (Registered → Select → Preferred → Global Premier; e.g. Select = 10 certified people + 2 production deployments + 1 public story; Global Premier = 1,000 + 100 across 3+ regions). Adds the Claude Partner Hub (real-time daily standing, public partner directory, and an MCP connector for conversational partner discovery). $100M training/support/marketing investment; launch partners Accenture / Deloitte / PwC / KPMG / Cognizant / Infosys. Resolves the original page’s “no published tier numbers” Open Questions.
  • Anthropic Partner Academy (Skilljar) — The partner-gated training portal at anthropic-partners.skilljar.com. Same Skilljar tenant as the public Academy but adds partner-exclusive seller-enablement courses (Claude Security, Opus 4.7 and Cyber, What’s New With Opus 4.7/4.8), a CPN Connect On-Demand Library, and a Partner Network Learning Path positioned as “the first step toward unlocking certification” — the on-ramp to CCA-F.

CCA-F Certification

  • Claude Certified Architect — Foundations (CCA-F) — Exam overview: 60 questions, 5 domains, 720/1000 to pass, free via Partner Network.
  • CCA-F Official Exam Guide — All 5 domains, 30 task statements, 6 scenarios, sample questions.
  • CCA-F Study Guide — Domain-by-domain study plan with key principles and recommended order.
  • CCA-F Practice Questions by Domain — Patterns and principles from community exam prep, organized by domain.
  • The Architect’s Playbook — Enterprise LLM architecture patterns and anti-patterns.
  • Anthropic Claude Cookbooks — Official cookbook notebooks mapped to CCA-F exam domains. Refreshed 2026-05-23 with two new deep-dive sub-articles.
  • Cookbook — Multi-Agent Research Pattern (Lead + Subagent + Citations Prompts) — Distills the three-role architecture Anthropic ships in patterns/agents/prompts/. Research lead = planner/synthesizer with depth-first / breadth-first / straightforward classifier driving fan-out. Parallel research subagents = focused investigators with a 5-15-tool-call OODA budget, web_fetch-after-web_search mandate, and an internal-tools-priority rule. Citations agent = read-only annotator with a whitespace-exact no-rewrite contract. Use as a reference when building your own multi-agent research pipeline.
  • Cookbook — Chief of Staff Agent (SDK Multi-Domain Coordinator Template) — The canonical Claude Agent SDK coordinator template (TechStart fictional SaaS): CLAUDE.md business state + two domain subagents (financial-analyst, recruiter) + three slash commands (/budget-impact, /strategic-brief, /talent-scan) + two output styles (executive, technical) + post-write audit hook. Re-use checklist for swapping in a different 2-domain coordinator: per-subagent tool-scoping via YAML frontmatter, wrapped Python scripts as source-of-truth for calculations, $ARGUMENTS slash-command parameterization, output styles as audience-separation primitive.
  • CCA-F Technical Reference — Deep technical content with code patterns, API details, SDK patterns.
  • CCA-F Practice Exam (60 Questions) — Full-length 60-question practice test with worked explanations.

Design Skills

Wiki Pattern

  • Karpathy Techniques for Claude Code — How Andrej Karpathy uses Claude Code differently: vague-prompt philosophy, the raw/ + wiki/ + Web Clipper loop, hot cache as session-continuity mechanism, lint-as-maintenance, hub-and-spoke graph emergence at scale.
  • Wiki Community Enhancements — Survey of 12 GitHub repos and community patterns: hot cache, delta manifest, contradiction detection, provenance tracking, scaling strategies.

242 items under this folder.