Source: ai-research/crabbox-homepage-2026-05-09.md, ai-research/crabbox-how-it-works-2026-05-09.md, ai-research/crabbox-architecture-2026-05-09.md (crabbox.sh docs site, fetched 2026-05-09 + gh api repos/openclaw/crabbox)
Crabbox (github.com/openclaw/crabbox, MIT, Go, 299★ at 10 days old, last push 2026-05-09) is a shared remote-testbox system for OpenClaw maintainers and AI agents. The pitch: keep the local developer story unchanged — edit, save, run — while moving compute and tests onto owned cloud capacity. A crabbox run command leases a Linux machine, syncs the dirty checkout, executes remotely, streams stdout/stderr back, and releases the machine. A small Cloudflare-hosted broker owns provider credentials, lease state, cleanup, usage, and cost guardrails so individual machines and CLIs never need to. Same loop for agents and humans — the OpenClaw plugin exposes Crabbox as five agent tools (crabbox_run / _warmup / _status / _list / _stop).
Key Takeaways
- OpenClaw-native remote-runner infra. Created Apr 30, 2026 by the OpenClaw GitHub org; ships as both a standalone CLI and a native OpenClaw plugin. Same project ecosystem as Peter Steinberger’s Printing Press and OpenClaw on Rabbit R1. 299 stars in ten days suggests strong OpenClaw-community uptake.
- Three-layer architecture. CLI (Go binary, on the laptop) — owns config, per-lease SSH key, sync, command exec, heartbeats, release. Broker (Cloudflare Worker + one Durable Object) — owns auth, lease state, provider credentials, expiry, cleanup, usage, and spend caps. Runner (vanilla Ubuntu, Hetzner / AWS Spot / Azure / static-SSH) — leaf node; holds nothing durable; provisioned, used, deleted. Runners never call back to the broker.
- Brokered, not BYO creds. Provider secrets stay in Worker environment (Cloudflare Worker secrets for AWS + Hetzner). The CLI carries only a bearer token. This is the central design choice — it’s why Crabbox can serve a fleet of agents without per-laptop AWS keys.
- Cost guardrails are first-class, not bolted on. Two cost numbers per lease:
estimatedUSD(elapsed runtime) andreservedUSD(worst-case TTL cost reserved before provisioning). Hourly price source:CRABBOX_COST_RATES_JSONoverride → provider live pricing (EC2 Spot history / Hetzner server-type) → built-in fallback rates.crabbox usagesummarizes spend by user / org / provider / server type. TTL-bounded machines + monthly spend caps + per-user/org/provider tracking — all enforced by the broker, not the CLI. - Five-phase run lifecycle. Plan (load config, mint per-lease SSH key) → Lease (
POST /v1/leases, broker reserves cost + provisions machine) → Sync (rsync dirty checkout, hydrate base ref, skip-if-fingerprint-matches) → Run (heartbeats + stream output) → Release (broker terminates runner, frees reserved cost). Bootstrap-failure safety:crabbox runretries once with a new machine on a fresh non-kept lease but never duplicates commands on--keepor--idreuse. - Warm-machine reuse is explicit.
crabbox warmupprovisions a long-lived box; subsequentcrabbox run --id <slug>/crabbox ssh --id <slug>reuses it. Idle-timeout alarm releases it if untouched. Pattern aimed at iterative dev loops + CI hydration where cold-start cost matters. - Sync is local-first, never requires clean checkout. Seeds remote Git from configured origin/base ref → overlays dirty files via rsync (with checksum-mode option) → skips no-ops via fingerprints → excludes heavy directories from repo config → guards against suspicious mass tracked deletions → hydrates base-ref history for changed-test workflows. Same model for humans editing in their IDE and agents writing patches in a sandbox.
- Brokered vs direct vs static-SSH. Brokered = normal (provider secrets in Worker, central usage history). Direct = debug fallback (local AWS SDK chain or
HCLOUD_TOKEN, no central tracking). Static-SSH =provider: sshagainst existing machines, bypasses the broker entirely. macOS + WSL2 use POSIX rsync; native Windows uses PowerShell + tar archive sync. - GitHub Actions hydration is a first-class feature.
crabbox actions hydratereuses the repo’s existing GitHub Actions setup steps so local Crabbox runs land in the same hydrated workspace as CI — no duplicate “what does our CI install?” config in repo. Notable because most remote-runner products require parallel local + CI bootstrap definitions. - OpenClaw plugin exposes 5 agent tools. The repository root is a native OpenClaw plugin package: once installed in OpenClaw, Crabbox operations show up as
crabbox_run/crabbox_warmup/crabbox_status/crabbox_list/crabbox_stop. The plugin shells out to the configuredcrabboxbinary with argv arrays — local CLI config, broker login, repo claims, and sync behavior stay owned by the CLI. This is the design pattern Anthropic’s Agent Platform team articulates as “the right way to expose tools to agents — thin shells over fat CLIs,” echoing the Printing Press CLI-tier-1 thesis. - Resource-conscious by design. Cloud-init bootstrap = SSH + Git + rsync + curl + jq + writable
/work/crabbox. Project runtimes (language, Docker, deps) come from repo-owned setup (Actions hydration / devcontainers / Nix / mise / asdf), not the Crabbox base. Bootstraps stay slim; full runtime lives where it should.
How it fits together
+----------------------+ HTTPS / JSON +--------------------------+
| your laptop | ------------------> | Cloudflare Worker |
| crabbox CLI | bearer + owner | Fleet Durable Object |
| per-lease SSH key | | provider creds |
| repo checkout | | lease + usage state |
+----------+-----------+ +------------+-------------+
| | provider API
v v
+------------------------------+ Hetzner Cloud / AWS EC2 /
| SSH (primary + fallback) | Azure / static-SSH host
+----------- rsync ------------> Ubuntu runner (/work/crabbox/<slug>/)
The CLI talks to the broker over HTTPS, then talks directly to the leased runner over SSH and rsync. The runner never calls the broker; that path stays one-way.
| Layer | Owns |
|---|---|
| CLI | config + flags; per-lease SSH key; SSH readiness; Git seeding + rsync; sync fingerprints + sanity checks; remote command + streaming; heartbeats; release |
| Broker | request auth + identity; serialized lease state; provider credentials; machine create/delete; lease expiry; pool/status/inspect; usage; spend caps |
| Provider | raw compute: Hetzner Cloud servers / AWS EC2 / Azure VMs / static-SSH |
| Runner | nothing durable; Linux prepared by cloud-init with SSH + Git + rsync + curl + jq + /work/crabbox; project runtimes come from repo-owned setup |
A run, end to end
- Plan — Load config (flags > env > repo-local
crabbox.yaml> user config > defaults). Mint a temporary lease ID and per-lease SSH key. - Lease —
POST /v1/leaseswith class, provider, TTL, idle timeout, slug, bootstrap options, SSH public key. Worker authenticates → forwards to Fleet Durable Object → DO checks active-lease + monthly spend caps, asks provider for live pricing, reserves worst-case TTL cost, provisions the machine, returns host / SSH user / port / workdir / expiry / lease ID / slug. - Sync — Wait for SSH +
crabbox-readyprobe. Seed remote Git when possible. Compare local + remote sync fingerprints; skip rsync if nothing changed. Otherwise rsync the dirty checkout into/work/crabbox/<slug>/for POSIX targets, or send a manifest tar archive for native Windows. Run sanity checks. Hydrate the configured base ref when supported. - Run — Start heartbeats in the background. Run the requested command over SSH. Stream stdout/stderr.
- Release — Release the lease unless
--keep. Broker terminates the runner and frees provider-side state. Bootstrap-failure on a fresh non-kept lease retries once with a new machine; kept or--idleases never duplicate commands.
Backends
| Backend | Status | Use case |
|---|---|---|
hetzner-ephemeral | Implemented | Default — created per lease or as overflow |
aws | Implemented | Burst capacity, managed Windows/WSL2, EC2 Mac |
azure | Implemented | Linux + native Windows SSH/sync/run |
hetzner-static | Interfaces ready | Pre-created warm-pool machines (planned) |
ssh-static | Implemented | Manually managed SSH-reachable hosts |
blacksmith-testbox | Implemented (wrapper) | Forwards lifecycle to the Blacksmith CLI; crabbox list/status normalize Blacksmith data into Crabbox’s common view |
github-actions | Future | Register/dispatch real Actions-backed runner work for workflow parity |
external-runner | Future | Adapter boundary for other hosted-runner systems |
Coordinator API
Implemented endpoints (/v1/...):
- Health/identity:
GET /health,GET /pool(admin),GET /whoami - Leases:
POST /leases,GET /leases,GET /leases/{id-or-slug},POST /leases/{id-or-slug}/heartbeat,POST /leases/{id-or-slug}/release - Runs:
GET /runs,POST /runs,GET /runs/{run-id},GET /runs/{run-id}/logs,POST /runs/{run-id}/finish,POST /runs/{run-id}/telemetry - Usage:
GET /usage(groups by user / org / provider / server type) - Admin:
GET /admin/leases,POST /admin/leases/{id-or-slug}/release,POST /admin/leases/{id-or-slug}/delete(separate admin token)
Auth: signed GitHub login tokens for normal users (crabbox login flow, requires allowed-org membership) OR shared bearer tokens for trusted automation. Cloudflare Access can sit in front of the Worker; only verified Access JWT email becomes the bearer-token owner.
Heartbeats may carry a telemetry object (Linux load average, memory, root-disk, uptime); coordinator stores latest snapshot + bounded telemetryHistory ring of 60 samples for portal trend charts.
Why this matters for AI agents
Crabbox is the concrete OpenClaw answer to the problem the Anthropic Platform team identified as “the wall” in the AI and I podcast: “everyone hits the same problem of like, oh, wow, I either need to keep a server constantly running or I need to use infrastructure that will spin up and spin down, and I need to store the transcript data, and I need secure sandboxing.”
Anthropic’s Managed Agents is the closed-source, hosted answer for builders on the Claude API. Crabbox is the open-source, self-hosted answer for OpenClaw. Both solve the same wall — sandbox isolation, lease lifecycle, transcript persistence, cost guardrails — but for different agent runtimes.
For a single user with a Mac mini running oh-my-claudecode or Hermes, Crabbox isn’t strictly needed. The sandbox-on-demand pattern earns its keep when:
- You have multiple people / agents who need isolated runs and don’t want to share one Mac mini’s filesystem.
- You want cost ceilings the agent can’t bust through (a runaway loop on a leased Hetzner box stops at the TTL cap; on your Mac mini it just keeps burning your CPU).
- You need per-run cleanup guarantees — Hetzner deletes the machine on release; your Mac mini accumulates state.
- You want workflow parity with CI — Actions hydration means local Crabbox runs land in the same workspace your CI exercises.
Try It
- Install + login.
brew install openclaw/tap/crabbox crabbox login # bearer token in OS keychain crabbox doctor # validate config + network + SSH key - One-shot run.
crabbox run -- pnpm test(or your test command). Watch the output stream. Confirm the machine is gone after release:crabbox listshows nothing. - Warm-box reuse.
crabbox warmupreturns a slug likeblue-lobster. Thencrabbox run --id blue-lobster -- pnpm test:changedreuses the box. SSH in for debugging:crabbox ssh --id blue-lobster. Stop withcrabbox stop blue-lobster. - Cost-aware monthly cap. Set
CRABBOX_COST_RATES_JSONif you want explicit pricing. Run a few jobs, thencrabbox usageto see spend grouped by provider/server type. - OpenClaw plugin path. If you’re already on OpenClaw, install the Crabbox plugin from the same repo. Five new agent tools (
crabbox_run/_warmup/_status/_list/_stop) become available; configureplugins.entries.crabbox.config.binaryonly ifcrabboxisn’t onPATH. - Actions hydration. If your repo has GitHub Actions setup steps, run
crabbox actions hydrateto mirror them on the leased box — local runs match CI without duplicating bootstrap config. - Direct fallback. When debugging the broker itself, use direct provider mode (
HCLOUD_TOKENor AWS SDK chain in env). No central usage history, no heartbeat — diagnosis only.
Open Questions
- Public coordinator vs. self-host. The docs assume operators run their own Cloudflare Worker + Durable Object. There’s no obvious “hosted Crabbox” SaaS — every team running Crabbox in brokered mode runs their own broker. Cost guardrails work because you set the spend caps; an external SaaS broker would need a different trust model.
- Comparison to Anthropic Managed Agents. Both solve the same wall. Crabbox = open + self-hosted + OpenClaw-native + multi-cloud (Hetzner, AWS, Azure, SSH); Managed Agents = closed + hosted + Anthropic-API-native + Anthropic infra. Pricing comparison not made in this article — needs separate research when Managed Agents pricing is published.
- Blacksmith testbox positioning. Why ship a
blacksmith-testboxprovider wrapper? Either Crabbox views Blacksmith as a peer/competitor and wants compatibility, or there’s an OpenClaw-Blacksmith partnership not surfaced in the docs. - Active-lease ceiling and queueing behavior. Docs mention active-lease and monthly spend caps but don’t specify queue behavior when a lease is denied — silent fail, retry-with-backoff, fast-fail to caller? Worth confirming before exposing to many concurrent agents.
- Telemetry granularity. Heartbeat telemetry is “Linux load average, memory, root-disk use, uptime, source, capture timestamp.” No CPU per-process / network / GPU metrics. For multi-agent training-loop scenarios, that’s likely enough; for production-runtime SLA monitoring, probably not.
- Stars trajectory. 299 stars in 10 days for a niche infra tool is fast — likely OpenClaw-community-driven. Re-check at 30 days to see if uptake holds outside the OpenClaw maintainer cohort.
Related
- OpenClaw on Rabbit R1 — Crabbox is one piece of the OpenClaw infrastructure surface; R1 is the consumer-hardware voice surface.
- Printing Press — Agent-Designed CLI Factory — Same OpenClaw ecosystem; Press is for generating agent CLIs, Crabbox is for running them. Peter Steinberger acknowledged in Press; openclaw GitHub org publishes both.
- Inside Claude’s Agent Platform — Angela & Caitlin — Articulates the infrastructure-was-the-wall thesis Crabbox solves for the OpenClaw side. Anthropic’s Managed Agents is the closed-source counterpart.
- TinyFish — Web infra APIs that already integrate with OpenClaw. Different layer — TinyFish provides agent-readable web (Search/Fetch/Browser/Agent), Crabbox provides agent-writable Linux machines.
- ScrapeCreators — Sister social-platform-deep API. Different layer; same “agent-native infra under one API key” framing.
- oh-my-claudecode — Local self-hosted Claude Code harness. Crabbox is the remote-isolation upgrade path when oh-my-claudecode’s Mac mini setup hits the wall.
- Hermes Agent (internal) — Self-hosted agent framework; could pair with Crabbox for sandboxed runs. WEO Marketly internal.
- Claude Managed Agents — Anthropic’s hosted answer to the same infrastructure-wall problem. Closed + Anthropic-API-native vs. Crabbox open + multi-cloud.
- Code with Claude 2026 Keynote — Where Anthropic announced Managed Agents. Crabbox lands as the OpenClaw response.
- 2026 Claude Code AIOS Pattern — Cross-topic synthesis on convergent agent-OS implementations; Crabbox fits as the isolated-execution-environment primitive in that pattern.
- Agent Workflow Patterns — Sequential / parallel / evaluator-optimizer. Crabbox provides the underlying compute; the patterns ride on top.
- Claude Code CLI Reference — Crabbox runs whatever command you give it; “claude” is a common payload.