Source: Hermes Agent — Security docs (ai-research/hermes-agent-security-2026-05-09.md; hermes-agent.nousresearch.com)
Nous Research’s official security documentation for Hermes Agent articulates a seven-layer defense-in-depth model for running self-hosted, self-improving agents. The documentation is public, so any operator deploying Hermes against real production data can cite it. The security boundary is what differentiates “I have a coding agent” from “I have an autonomous AI employee with shell + browser + messaging access on a VPS.”
The seven layers
- Dangerous command approval: every command is matched against a curated pattern list before execution; matches require user approval.
- Container/sandbox isolation: agents run inside Docker containers, not against the host shell directly.
- MCP scoped credentials: each MCP server’s `env` block is the only environment passed to that subprocess; no host-env leakage.
- Credential redaction: error messages are sanitized (`ghp_...`, `sk-...`, `token=`, `key=`, `password=`, `secret=` all replaced with `[REDACTED]`) before reaching the LLM.
- Website access policy: explicit allow/deny list for which URLs the browser/web tools may visit.
- User authorization on messaging channels: Telegram/Slack/etc. require permission before the agent acts on inbound messages.
- Encrypted secrets at rest: API keys decrypted only at heartbeat time when needed.
Approval modes (~/.hermes/config.yaml)
```yaml
approvals:
  mode: manual   # manual | smart | off
  timeout: 60    # seconds to wait for user response
```

| Mode | Behavior | When to use |
|---|---|---|
| `manual` | Every dangerous command requires user approval | Default. New agents, untrusted skill sets, production data. |
| `smart` | Contextual judgment auto-approves safe variants | Trusted operator with stable skill library and good logs. |
| `off` | Disables the approval gate | Not recommended. Documented but explicitly flagged as such. |
The timeout field gates how long Hermes waits before treating non-response as a denial — short timeouts force the operator to be near a Telegram channel; long ones risk runaway loops if the operator is offline.
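The fail-closed timeout described above can be sketched in a few lines. This is a minimal illustration, not Hermes code: `wait_for_approval` and the queue standing in for the operator channel are hypothetical names, and the accepted answer strings are an assumption. The only behavior taken from the docs is that no response within `timeout` seconds counts as a denial.

```python
import queue


def wait_for_approval(responses: queue.Queue, timeout: float) -> bool:
    """Return True only if the operator explicitly approves within `timeout` seconds.

    Hypothetical helper: `responses` stands in for whatever channel
    (Telegram, CLI prompt) delivers the operator's answer.
    """
    try:
        answer = responses.get(timeout=timeout)
    except queue.Empty:
        # No response in time: treat it as a denial (fail closed).
        return False
    # Assumed set of affirmative answers; anything else is a denial.
    return answer.strip().lower() in {"y", "yes", "approve"}


if __name__ == "__main__":
    channel = queue.Queue()
    channel.put("yes")
    print(wait_for_approval(channel, timeout=0.1))  # operator answered in time
    print(wait_for_approval(channel, timeout=0.1))  # queue is empty, so this times out
```

Failing closed means an offline operator stalls the agent rather than unblocking it, which is the trade-off the `timeout` value tunes.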
Dangerous pattern list (curated)
Every command Hermes attempts gets pattern-matched against this list. Matches require approval (per the configured approvals.mode):
| Pattern | Why dangerous |
|---|---|
| `rm -rf` and variants | Recursive force delete |
| `pkill -9` | Force-kill processes |
| Fork bomb patterns | DoS the host |
| `bash -c` / `sh -c` / `zsh -c` / `ksh -c` (incl. combined `-lc`) | Shell command injection via `-c` |
| `python -e` / `perl -e` / `ruby -e` / `node -c` | Script execution via `-e`/`-c` |
| `curl ... \| sh` / `wget ... \| sh` | Pipe remote content to shell |
| `bash <(curl ...)` / `sh <(wget ...)` | Execute remote script via process substitution |
| `tee` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Overwrite sensitive files |
| `>` / `>>` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Same, via redirection |
| `xargs rm` | Recursive delete via xargs |
| `find -exec rm` / `find -delete` | Find with destructive actions |
The list is curated: Nous maintains it, and operators cannot define their own entries. Updates ship through Hermes releases.
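To make the matching step concrete, here is a rough sketch of how a command string could be checked against a pattern list. The regexes are illustrative approximations of a few rows from the table, written for this note; they are not the patterns Hermes ships, and `match_dangerous` is a hypothetical name.

```python
import re

# Illustrative regexes only: rough approximations of a few table rows,
# NOT the patterns Hermes actually ships.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
     "recursive force delete"),
    (re.compile(r"\bpkill\s+-9\b"), "force-kill processes"),
    (re.compile(r"\b(?:bash|sh|zsh|ksh)\s+-[a-zA-Z]*c\b"), "shell command via -c"),
    (re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:ba)?sh\b"),
     "pipe remote content to shell"),
    (re.compile(r"\bfind\b.*(?:-delete\b|-exec\s+rm\b)"),
     "find with destructive action"),
]


def match_dangerous(command: str):
    """Return the reason for the first matching pattern, or None if nothing matches."""
    for pattern, reason in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return reason
    return None


if __name__ == "__main__":
    for cmd in ("rm -rf /tmp/build", "bash -lc 'echo hi'", "ls -la"):
        print(cmd, "->", match_dangerous(cmd))
```

A regex gate like this is easy to evade with quoting or indirection, which is why the docs pair it with container isolation rather than relying on it alone.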
MCP server credential scoping
Each MCP server gets only the env block its config declares — never the full host environment. Example:
```yaml
mcp_servers:
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..."  # ONLY this is passed
```

The GitHub MCP subprocess sees `GITHUB_PERSONAL_ACCESS_TOKEN` and nothing else from the host. Your OpenAI key, Anthropic key, Stripe key, `~/.aws/credentials`, and any other host env var stay invisible to that subprocess.
This matters because MCP servers run third-party code (npx -y @some/server) that your agent invokes. Without scoping, a compromised or malicious MCP package could exfiltrate every credential on the box.
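The general mechanism behind this kind of scoping can be shown with Python's standard `subprocess` module: passing `env=` replaces the inherited environment instead of extending it. The sketch below is not how Hermes launches MCP servers (the docs do not say); `run_scoped` and `HOST_ONLY_SECRET` are hypothetical names used for illustration.

```python
import os
import subprocess
import sys


def run_scoped(command: list, declared_env: dict) -> str:
    """Run `command` with only `declared_env` as its environment and return stdout.

    Passing `env=` to subprocess.run replaces the inherited environment
    rather than extending it, so host variables are not handed to the child.
    """
    result = subprocess.run(
        command, env=dict(declared_env), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    os.environ["HOST_ONLY_SECRET"] = "should-not-leak"  # hypothetical host secret
    # The child prints the names of the variables it can see. The runtime may
    # add a few of its own, but HOST_ONLY_SECRET is not passed down.
    child = [sys.executable, "-c", "import os; print(sorted(os.environ))"]
    print(run_scoped(child, {"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_placeholder"}))
```

In practice a launcher may also need to pass a minimal `PATH` so that a command such as `npx` resolves; whether Hermes does so is not stated in the docs.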
Credential redaction in error messages
When an MCP tool errors, the error message goes back to the LLM as context for retry/recovery. Redaction sanitizes it first:
- `ghp_...` (GitHub PAT prefix) → `[REDACTED]`
- `sk-...` (OpenAI key prefix) → `[REDACTED]`
- `token=` → `[REDACTED]`
- `key=` → `[REDACTED]`
- `API_KEY=` → `[REDACTED]`
- `password=` → `[REDACTED]`
- `secret=` → `[REDACTED]`
Without this, a misconfigured GitHub call returning 401 Bad credentials: token=ghp_realToken would put the live token into the LLM’s working context — and from there into transcripts, into Telegram messages, into whatever else the LLM logs or summarizes.
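A redaction pass of this kind can be approximated with two regular expressions. The patterns below are assumptions reconstructed from the list above, and `redact` is a hypothetical name; the docs do not say whether the key name is kept or whether the whole `key=value` pair is replaced, so this sketch replaces the whole pair.

```python
import re

# Assumed patterns, reconstructed from the documented list; the real
# implementation may differ.
_KEY_VALUE = re.compile(r"(?i)(?:token|api_key|key|password|secret)=\S+")
_TOKEN_PREFIXES = re.compile(r"\b(?:ghp_|sk-)[A-Za-z0-9_\-]+")


def redact(message: str) -> str:
    """Replace credential-looking substrings with [REDACTED].

    The key=value pattern deliberately over-matches (e.g. "access_token=..."),
    on the assumption that redacting too much is safer than too little.
    """
    message = _KEY_VALUE.sub("[REDACTED]", message)
    return _TOKEN_PREFIXES.sub("[REDACTED]", message)


if __name__ == "__main__":
    print(redact("401 Bad credentials: token=ghp_realToken"))
    print(redact("request with sk-abc123 was rejected"))
```

Running the key=value pass first means a value such as `token=ghp_...` is removed as one unit, and the prefix pass then catches bare tokens that appear without a key name.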
Encrypted secrets at rest
API keys and secrets are stored encrypted on disk and decrypted only at heartbeat time, when the agent actually needs them. Combined with credential redaction (which prevents leakage out of the runtime) and MCP scoping (which prevents leakage into untrusted subprocesses), this closes the loop on the secret-handling story.
Why this matters operationally
Self-hosted agents are dual-use by design — the same shell access that lets Hermes install dependencies and ship code also lets a bad command path destroy the box. The seven layers exist because at least one of them will fail eventually:
- Dangerous-command approval is the spine, but it depends on the pattern list staying current as new attack vectors emerge.
- Container isolation contains blast radius when commands escape approval (e.g., a clever `eval` that doesn’t pattern-match).
- MCP scoping limits what compromised third-party tools can access.
- Credential redaction limits what slips into LLM context if an MCP call errors with a leaky message.
- Website access policy prevents the browser tool from being weaponized for SSRF or for accessing internal services.
- Authorization on messaging channels prevents anyone-with-a-Telegram-username from issuing commands.
- Encrypted secrets at rest cap the damage if the host is compromised offline.
Each layer is independently load-bearing.
Try It
- Default to manual approval mode when first deploying. The friction is the feature — you’re calibrating which commands you actually want Hermes to run autonomously before relaxing.
- Audit `~/.hermes/config.yaml` regularly. Every MCP server’s `env` block is a credential. List them: which keys live there, what’s their blast radius if leaked, are they scoped to least-privilege?
- Test the redaction. Deliberately misconfigure a GitHub PAT and watch how the error surfaces in the agent’s reply. If you see the literal token in the response, redaction broke or the LLM cached it before redaction ran.
- Set `approvals.timeout` to match your responsiveness. Operators on Telegram with phone notifications can run 60s. Operators reviewing logs once daily should run 600s+ or stick with `mode: manual` and accept that overnight runs will halt.
- Set up the website access policy explicitly. Default-allow makes the browser tool an SSRF vector against your VPS’s internal network (RFC 1918 ranges, metadata endpoints). Block private IP space first.
- Cross-reference Nate Herk’s course — it walks the operator-side discipline (separate Gmail per agent, least-privilege API keys, stale-memory diagnosis) that pairs with the platform-side defenses documented here.
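As a starting point for the "block private IP space first" step, the check below uses Python's standard `ipaddress` module to flag URLs whose host is a literal private, loopback, or link-local address. It is a sketch under stated assumptions: `is_blocked_target` is a hypothetical helper, and it does not reflect the Hermes policy syntax, which the docs do not enumerate.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_target(url: str) -> bool:
    """Return True if the URL's host is a literal IP in private, loopback, or link-local space.

    Sketch only: hostnames are not resolved here, so a real policy would also
    need to check the addresses a name resolves to (and re-check on redirects).
    """
    host = urlsplit(url).hostname
    if host is None:
        return True  # no parseable host: fail closed
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal; not covered by this sketch
    return addr.is_private or addr.is_loopback or addr.is_link_local


if __name__ == "__main__":
    for target in (
        "http://169.254.169.254/latest/meta-data/",  # cloud metadata endpoint
        "http://10.0.0.5:8080/admin",                # RFC 1918 address
        "https://example.com/",
    ):
        print(target, "->", is_blocked_target(target))
```

Checking literal addresses is only the first layer; DNS names that resolve to internal addresses need the same test applied after resolution.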
Related
- Hermes Agent topic index
- Nate Herk’s Hermes 1-Hour Course — operator-side discipline complementing platform-side defenses
- Hermes Agent — User Stories & Use Cases — community feature requests including Tailscale serve for secure remote access
- Printing Press — generates Claude Code skills + OpenClaw skills + MCP servers from one spec; same MCP-scoping concerns apply to generated servers
- Crabbox — short-lived Linux box per agent run; complementary isolation pattern to Hermes container model
- Managed Agents — Anthropic’s hosted alternative; bypasses much of the self-hosted security burden but trades sovereignty for it
Open Questions
- The docs claim “encrypted secrets at rest” but do not specify the encryption algorithm, key derivation, or whether the master key is stored in TPM / Keychain / a passphrase prompt. Worth a follow-up against the Hermes source code.
- The website access policy is described conceptually; the doc does not enumerate the syntax (regex? glob? exact host match? port-aware?). Operators deploying production policies will need to read the source or open a docs issue.
- “Smart” approval mode’s contextual-judgment heuristics are unspecified in public docs. For a security-sensitive feature, this is a gap: operators relaxing from `manual` are doing so without knowing the model.