Agents & Agentic Systems

Tool use, planning, multi-agent patterns, agent frameworks, and practical agent deployments. Covers both Claude-specific agent features and general agentic architecture patterns.

Articles

Claude Agent Hierarchy — When to Use Which — Comparison of Claude’s three agent tiers (Managed Agents, Agent Teams, Subagents) with decision framework for choosing the right one.
Agent Workflow Patterns — Sequential, Parallel, Evaluator-Optimizer — Anthropic’s official taxonomy of the three workflow shapes that keep showing up in production, plus the decision framework. Default to sequential. “Start with the simplest pattern that solves your problem.”
AI Agents Unleashed — 2026 Playbook (Mindstream × Futurepedia) — Platform-agnostic implementation guide: chatbot-vs-agent reframe, precision framework, “Is this an agent job?” decision tree, 4-phase roadmap, 7 pitfalls, human-AI relationship timeline, 7 training competencies. Authors: Adam Biddlecombe + Kevin Hutson.
Nous Research Hermes Agent — Self-hosted autonomous agent with persistent memory, auto-generated skills, 47+ tools, 6 sandbox backends, 15+ messaging platforms, and MCP integration. Model-agnostic (Nous Portal, OpenRouter, OpenAI, or any OAI-compatible endpoint). MIT, 97K+ stars.
Adaline — End-to-End AI Agent Platform — Single platform for the four-stage agent lifecycle: iterate, evaluate, deploy, monitor. Provider-agnostic prompt management, multi-modal + dynamic-variable testing, AI-assisted test-suite generation, multi-environment deployments with smart diffing and instant rollbacks, full traces/spans, human-annotation loop tied directly to monitoring. Recently went GA with $1MM API-credit promotion. Customers: McKinsey (Lilli), Discord, Coframe, Reforge. Stats claimed: 200M+ API calls/day, 5B+ tokens/day, 300+ models, 99.998% uptime. Sits alongside LangSmith / PromptLayer / Helicone / Braintrust / Galileo in the LLMOps space.
TinyFish — Web Infrastructure APIs for AI Agents — Four-product platform under one API key: Search, Fetch, Browser, Agent. Search + Fetch went free May 4 2026 across REST / MCP / SDKs / CLI / Skill (free-tier 5 q/min Search, 25 URLs/min Fetch). Custom Chromium fleet with 28 C++-level anti-bot mechanisms; sub-250ms browser cold start, P50 488ms search. Vendor-reported 87% token reduction and 2× completion rate when using CLI + Skill over MCP — concrete data on context-window economics. $47M Series A from ICONIQ; customers Google / DoorDash / Cigna / Volkswagen / Grubhub / NEC; integrates with Hermes / OpenClaw / Cline / Goose / Antigravity / n8n / Dify / LangChain / CrewAI. Direct competitors named in launch coverage: Browserbase (uses Exa for search), Firecrawl (agent reliability issues).
ScrapeCreators — Social Media Scraping API for AI Pipelines — Adrian Horning’s (Austin, TX) social-scraping API across 20+ platforms (TikTok 20 endpoints, Instagram 12, YouTube 12, Facebook 9 + Ad Library, X/Twitter 6, LinkedIn 4 + Ad Library, Reddit 5, Pinterest 4, Threads 5; plus Bluesky/Truth Social/Twitch/Spotify/Snapchat/Kick + 4 ad libraries + 5 link-in-bio platforms). 100 free credits no-card; pay-as-you-go from $47/25 k cre d i t s \to$ 497/500k credits → enterprise. Single x-api-key header, no rate limits, JSON-only, ~3.1s avg response, claimed 1M+ req/day at 98.2% success. Ships official MCP server (@scrape-creators/mcp) + CLI + first-party Claude Code skill. Sister to TinyFish (web infra) — ScrapeCreators is the social-platform-deep counterpart. Karpathy last30days SessionStart hook calls it out by name as the gap-filler for Reddit comments + TikTok + Instagram (note: hook quotes 10k free credits, landing page shows 100 — flagged for refresh).
Crabbox — Remote Testbox for OpenClaw Maintainers and AI Agents — github.com/openclaw/crabbox (MIT, Go, 299★ at 10 days old, created 2026-04-30, last push same-day as ingest). Short-lived Linux box for every run on shared cloud capacity: lease, sync, run, release. CLI (Go binary on the laptop) + Broker (Cloudflare Worker + 1 Durable Object) + Runner (Hetzner / AWS Spot / Azure / static-SSH / Blacksmith-testbox). Brokered mode keeps provider creds off laptops; CLI carries only a bearer token. Cost guardrails first-class — TTL caps + monthly spend caps + per-user/org/provider tracking via crabbox usage. Ships as standalone CLI (brew install openclaw/tap/crabbox) AND native OpenClaw plugin exposing 5 agent tools (crabbox_run / _warmup / _status / _list / _stop). The OpenClaw answer to the infrastructure-was-the-wall thesis Anthropic’s Platform team articulates — open + self-hosted + multi-cloud counterpart to Anthropic’s Managed Agents. Notable design choice: crabbox actions hydrate reuses existing GitHub Actions setup steps so local Crabbox runs land in the same hydrated workspace as CI (no duplicate local + CI bootstrap config). Same loop for agents and humans.
Paperclip — Multi-Agent Company Orchestration Platform — Paperclip frames AI agent management as running an AI company rather than configuring a single coding assistant. Heartbeat system (9-step protocol per agent: receive task → check budget → load skills → plan → execute → log → checkpoint → return → sleep), goal cascade (org-level → team-level → agent-level), full org-chart UI showing reporting structure and inter-agent message volume, five agent configuration areas (Instructions / Configuration / Skills / Budget / Runs), 16 pre-built example “companies” including Agency Agents and Fullstack Forge. Native Claude Code REST API integration on localhost:3100 — Paperclip can dispatch work to a local Claude Code instance instead of going through the Anthropic API directly. Closes the “I want one agent that runs the whole company while I sleep” loop that single-agent frameworks struggle with. AIS+ resource bundle entry; companion course to Codex 1-Hour and Hermes 1-Hour from the same operator.
Autobrowse — Self-Improving Browser-Agent Harness (Browserbase) — Browserbase’s harness that runs a browser agent against a real task on a real site, iterates the strategy via a strategy.md scratchpad until the workflow converges, then graduates the winning approach into a markdown SKILL.md plus deterministic helper scripts. Frames the loop as the Karpathy autoresearch ratchet applied to browser-skill discovery. Concrete benchmarks (Browserbase-reported): Craigslist task $0.22/71 s ba se l in e \to$ 0.12/27s graduated; form-fill $1.40 \to$ 0.24 in 4 iterations; federal grants portal collapsed a 28-page scrape into a single browse fetch after Autobrowse surfaced an undocumented JSON endpoint. Cap iterations low (~3-5), short-circuit aggressively. Honest failure mode: deterministic-parsing tasks (167-row static HTML state catalog cost ~$24 across 4 iters before pivoting to 200 lines of Python with browse fetch + BeautifulSoup) — lesson written into the skill itself: probe with fetch first, escalate to Autobrowse only if the response is empty / dynamic / gated. Output is small readable markdown (frontmatter with recommended_method + alternative_methods + source trace listing iters/convergence date/cross-region prod-validation; body sections for Purpose / When to Use / Workflow / Site-Specific Gotchas) — same format Browserbase’s internal generalist agent bb already loads on demand for feature requests / session investigations / PRs / sales triage. Skills as customer handoff — durable, debuggable, human-auditable, ownable; both engineers and non-engineers (technical PM, VP of tech, grants manager) can read them. Same memory-as-bottleneck thesis as Memory & Dreaming and the Platform team interview, applied to browser agents specifically. Roadmap: smarter stopping (let the agent reason about own convergence by trace structure, not just cost/turns), better priors (push the agent toward fetch/search primitives before browser sessions, and inspect network events / CDP logs to discover internal APIs), recursive Autobrowse (improving the harness itself).

Adjacent: long-running agent showcases

ClaudePlaysPokemon — [Reddit signal — r/ClaudeCode 2026-05-07] Opus 4.7 run currently streaming live at twitch.tv/claudeplayspokemon. Passion project by David Hershey (Anthropic Applied AI team), started June 2024 to learn agent development; went public when Sonnet 3.7 launched February 2025. Anthropic doesn’t own it but promotes it and subsidizes the API costs since Claude is the model. Useful as a publicly-observable benchmark of long-horizon agent capability — what the model does on a single complex environment given multi-day continuous compute. Source: raw/reddit-1t5y55h.md (r/ClaudeCode, 41 upvotes).

Jonathon's AI Wiki

Explorer

Agents & Agentic Systems

Articles

Adjacent: long-running agent showcases

Autobrowse — Self-Improving Browser-Agent Harness (Browserbase)

Crabbox — Remote Testbox for OpenClaw Maintainers and AI Agents

Paperclip — Run an AI-First Business as a Multi-Agent Company (Nate Herk / AIS+ Resource Guide)

ScrapeCreators — Social Media Scraping API for AI Pipelines

TinyFish — Web Infrastructure APIs for AI Agents

Adaline — End-to-End AI Agent Platform

Agent Workflow Patterns — Sequential, Parallel, Evaluator-Optimizer

Nous Research Hermes Agent

AI Agents Unleashed — 2026 Playbook (Mindstream × Futurepedia)

Claude Agent Hierarchy — When to Use Which