Source: Hermes Agent — Security docs (ai-research/hermes-agent-security-2026-05-09.md; hermes-agent.nousresearch.com)

Nous Research’s official security documentation for Hermes Agent describes a seven-layer defense-in-depth model for running self-hosted, self-improving agents. The documentation is public, so any operator deploying Hermes against real production data can cite it. The security boundary is what differentiates “I have a coding agent” from “I have an autonomous AI employee with shell + browser + messaging access on a VPS.”

The seven layers

  1. Dangerous command approval — every command is matched against a curated pattern list before execution; matches require user approval.
  2. Container/sandbox isolation — agents run inside Docker containers, not against the host shell directly.
  3. MCP scoped credentials — each MCP server’s env block is the only environment passed to that subprocess; no host-env leakage.
  4. Credential redaction — error messages are sanitized (ghp_..., sk-..., token=, key=, password=, secret= all replaced with [REDACTED]) before reaching the LLM.
  5. Website access policy — explicit allow/deny list for which URLs the browser/web tools may visit.
  6. User authorization on messaging channels — Telegram/Slack/etc. require permission before the agent acts on inbound messages.
  7. Encrypted secrets at rest: API keys are decrypted only at heartbeat time, when the agent needs them.

Approval modes (~/.hermes/config.yaml)

approvals:
  mode: manual       # manual | smart | off
  timeout: 60        # seconds to wait for user response
Mode    Behavior                                         When to use
manual  Every dangerous command requires user approval   Default. New agents, untrusted skill sets, production data.
smart   Contextual judgment auto-approves safe variants  Trusted operator with stable skill library and good logs.
off     Disables the approval gate                       Not recommended. Documented but explicitly flagged as such.

The timeout field gates how long Hermes waits before treating non-response as a denial — short timeouts force the operator to be near a Telegram channel; long ones risk runaway loops if the operator is offline.
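
The timeout behavior described above (no answer within the window counts as a denial) can be sketched in a few lines. This is a minimal illustration, not Hermes code: the function name wait_for_approval, the use of a queue.Queue as the operator channel, and the accepted reply strings are all assumptions made for the example.

```python
import queue


def wait_for_approval(responses: "queue.Queue[str]", timeout: float = 60.0) -> bool:
    """Return True only if the operator explicitly approves within `timeout` seconds.

    `responses` is a hypothetical channel (for example, fed by a messaging
    bot) that delivers the operator's reply as a string.
    """
    try:
        answer = responses.get(timeout=timeout)
    except queue.Empty:
        # Non-response is treated as a denial, mirroring the documented behavior.
        return False
    # Anything other than an explicit yes is also a denial (fail closed).
    return answer.strip().lower() in {"y", "yes", "approve"}
```

The design choice worth noting is that every path other than an explicit approval returns False, so an offline operator or a garbled reply can never widen what the agent is allowed to run.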

Dangerous pattern list (curated)

Every command Hermes attempts gets pattern-matched against this list. Matches require approval (per the configured approvals.mode):

Pattern                                                 Why dangerous
rm -rf and variants                                     Recursive force delete
pkill -9                                                Force-kill processes
Fork bomb patterns                                      DoS the host
bash -c / sh -c / zsh -c / ksh -c (incl. combined -lc)  Shell command injection via -c
python -e / perl -e / ruby -e / node -c                 Script execution via -e/-c
curl ... | sh / wget ... | sh                           Pipe remote content to shell
bash <(curl ...) / sh <(wget ...)                       Execute remote script via process substitution
tee to /etc/, ~/.ssh/, ~/.hermes/.env                   Overwrite sensitive files
> / >> to /etc/, ~/.ssh/, ~/.hermes/.env                Same, via redirection
xargs rm                                                Recursive delete via xargs
find -exec rm / find -delete                            Find with destructive actions

The list is curated: Nous maintains it, and it is not user-defined. Updates ship through Hermes releases.
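
To make the matching step concrete, here is a small sketch of how a command string could be checked against a pattern list. The regexes below are illustrative approximations of a few rows from the table and are an assumption of this note; they are not the patterns Nous ships, and the names DANGEROUS_PATTERNS and match_dangerous are hypothetical.

```python
import re

# Illustrative regexes only. These approximate a few table rows and are
# NOT the curated list that ships with Hermes.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
     "recursive force delete"),
    (re.compile(r"\bpkill\s+-9\b"), "force-kill processes"),
    (re.compile(r"\b(bash|sh|zsh|ksh)\s+-[a-zA-Z]*c\b"), "shell command via -c"),
    (re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba|z|k)?sh\b"),
     "pipe remote content to shell"),
    (re.compile(r"\bxargs\s+rm\b"), "delete via xargs"),
    (re.compile(r"\bfind\b.*(-delete\b|-exec\s+rm\b)"),
     "find with destructive action"),
]


def match_dangerous(command):
    """Return the reason for the first matching pattern, or None if none match."""
    for pattern, reason in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return reason
    return None


# A match would route the command to the approval gate; None lets it run.
for cmd in ["ls -la", "bash -lc 'make deploy'", "curl https://example.com/i.sh | sh"]:
    print(cmd, "->", match_dangerous(cmd))
```

Pattern matching of this kind is inherently incomplete (an obfuscated command can slip past a regex), which is why the source pairs it with container isolation rather than relying on it alone.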

MCP server credential scoping

Each MCP server gets only the env block its config declares — never the full host environment. Example:

mcp_servers:
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..."   # ONLY this is passed

The GitHub MCP subprocess sees GITHUB_PERSONAL_ACCESS_TOKEN and nothing else from the host. Your OpenAI key, Anthropic key, Stripe key, ~/.aws/credentials, and any other host env var stay invisible to that subprocess.

This matters because MCP servers run third-party code (npx -y @some/server) that your agent invokes. Without scoping, a compromised or malicious MCP package could exfiltrate every credential on the box.
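
The mechanism behind this kind of scoping is simple to demonstrate with Python's standard library: passing env= to subprocess.run replaces the inherited environment instead of extending it. The sketch below is not Hermes's launcher; run_scoped and visible_env_names are hypothetical helpers, and a real launcher would probably also need to pass a minimal baseline such as PATH so that commands like npx resolve, which the source does not discuss.

```python
import subprocess
import sys


def run_scoped(argv, env_block):
    """Run argv with ONLY env_block as its environment (no host inheritance)."""
    return subprocess.run(
        argv,
        env=dict(env_block),  # replaces, rather than extends, os.environ
        capture_output=True,
        text=True,
        check=False,
    )


def visible_env_names(env_block, names):
    """Ask a child Python process which of `names` it can actually see."""
    probe = "import os, sys; print(','.join(n for n in sys.argv[1:] if n in os.environ))"
    result = run_scoped([sys.executable, "-c", probe, *names], env_block)
    return [n for n in result.stdout.strip().split(",") if n]
```

Calling visible_env_names({"GITHUB_PERSONAL_ACCESS_TOKEN": "dummy"}, ["GITHUB_PERSONAL_ACCESS_TOKEN", "OPENAI_API_KEY"]) on a host that has OPENAI_API_KEY exported should report only the GitHub variable, which is a quick way to check the property on your own machine. On Windows the child process may need variables such as SYSTEMROOT to start at all.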

Credential redaction in error messages

When an MCP tool errors, the error message goes back to the LLM as context for retry/recovery. Redaction sanitizes it first:

  • ghp_... (GitHub PAT prefix) → [REDACTED]
  • sk-... (OpenAI key prefix) → [REDACTED]
  • token=... → token=[REDACTED]
  • key=... → key=[REDACTED]
  • API_KEY=... → API_KEY=[REDACTED]
  • password=... → password=[REDACTED]
  • secret=... → secret=[REDACTED]

Without this, a misconfigured GitHub call returning 401 Bad credentials: token=ghp_realToken would put the live token into the LLM’s working context — and from there into transcripts, into Telegram messages, into whatever else the LLM logs or summarizes.
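
A redaction pass of this shape can be sketched with two regular expressions: one for prefixed tokens and one for key=value pairs. The exact expressions Hermes uses are not given in the source, so the patterns and the function name redact below are assumptions for illustration.

```python
import re

# Assumed patterns; the source lists what gets redacted, not how.
_REDACTIONS = [
    # Prefixed tokens (GitHub PAT, OpenAI-style key): replace the whole token.
    (re.compile(r"\b(?:ghp_|sk-)[A-Za-z0-9_\-]+"), "[REDACTED]"),
    # key=value pairs: keep the key name, mask the value.
    (re.compile(r"\b((?:api_key|token|key|password|secret)=)[^\s&;,]+",
                re.IGNORECASE),
     r"\1[REDACTED]"),
]


def redact(message):
    """Return `message` with credential-looking substrings masked."""
    for pattern, replacement in _REDACTIONS:
        message = pattern.sub(replacement, message)
    return message


print(redact("401 Bad credentials: token=ghp_realToken"))
# prints: 401 Bad credentials: token=[REDACTED]
```

Prefix-based redaction only catches credential formats it knows about; a secret with an unrecognized shape that appears outside a key=value pair would pass through, so this layer reduces leakage rather than eliminating it.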

Encrypted secrets at rest

API keys and secrets are stored encrypted on disk and decrypted only at heartbeat time, when the agent actually needs them. Combined with credential redaction (which prevents leakage out of the runtime) and MCP scoping (which prevents leakage into untrusted subprocesses), this closes the loop on the secret-handling story.

Why this matters operationally

Self-hosted agents are dual-use by design — the same shell access that lets Hermes install dependencies and ship code also lets a bad command path destroy the box. The seven layers exist because at least one of them will fail eventually:

  • Dangerous-command approval is the spine, but it depends on the pattern list staying current as new attack vectors emerge.
  • Container isolation contains blast radius when commands escape approval (e.g., a clever eval that doesn’t pattern-match).
  • MCP scoping limits what compromised third-party tools can access.
  • Credential redaction limits what slips into LLM context if an MCP call errors with a leaky message.
  • Website access policy prevents the browser tool from being weaponized for SSRF or for accessing internal services.
  • Authorization on messaging channels prevents anyone-with-a-Telegram-username from issuing commands.
  • Encrypted secrets at rest cap the damage if the host is compromised offline.

Each layer is independently load-bearing.

Try It

  1. Default to manual approval mode when first deploying. The friction is the feature — you’re calibrating which commands you actually want Hermes to run autonomously before relaxing.
  2. Audit ~/.hermes/config.yaml regularly. Every MCP server’s env block holds credentials. List them: which keys live there, what is the blast radius if each one leaks, and is each scoped to least privilege?
  3. Test the redaction. Deliberately misconfigure a GitHub PAT and watch how the error surfaces in the agent’s reply. If you see the literal token in the response, redaction broke or the LLM cached it before redaction ran.
  4. Set approvals.timeout to match your responsiveness. Operators on Telegram with phone notifications can run 60s. Operators reviewing logs once daily should run 600s+ or stick with mode: manual and accept that overnight runs will halt.
  5. Set up the website access policy explicitly. Default-allow makes the browser tool an SSRF vector against your VPS’s internal network (RFC 1918 ranges, metadata endpoints). Block private IP space first.
  6. Cross-reference Nate Herk’s course. It walks through the operator-side discipline (separate Gmail per agent, least-privilege API keys, stale-memory diagnosis) that pairs with the platform-side defenses documented here.
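
For step 5, the check "block private IP space first" can be prototyped independently of Hermes. The sketch below is an operator-side illustration using Python's ipaddress module; it is not Hermes's policy syntax (which the docs do not specify), and the names is_internal_address and url_allowed, plus the injectable resolve parameter, are assumptions of this example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal_address(ip_text):
    """True for RFC 1918, loopback, link-local (incl. 169.254.169.254), etc."""
    ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 zone id
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def url_allowed(url, resolve=socket.getaddrinfo):
    """Allow only http(s) URLs whose host resolves exclusively to public IPs."""
    try:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return False
        infos = resolve(parts.hostname, parts.port or 443)
    except (OSError, ValueError):
        return False  # unresolvable or malformed: fail closed
    addresses = {info[4][0] for info in infos}
    # Deny if ANY resolved address is internal, not just the first one.
    return bool(addresses) and not any(is_internal_address(a) for a in addresses)
```

Checking the resolved addresses rather than the hostname string matters because a public-looking name can point at an internal IP. A check like this still leaves a gap if the name resolves differently between the check and the actual request, so it complements network-level egress rules rather than replacing them.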

Open Questions

  • The docs claim “encrypted secrets at rest” but do not specify the encryption algorithm, the key derivation, or whether the master key is stored in a TPM, a Keychain, or behind a passphrase prompt. Worth a follow-up against the Hermes source code.
  • The website access policy is described conceptually; the doc does not enumerate the syntax (regex? glob? exact host match? port-aware?). Operators deploying production policies will need to read the source or open a docs issue.
  • “Smart” approval mode’s contextual-judgment heuristics are unspecified in public docs. For a security-sensitive feature, this is a gap — operators relaxing from manual are doing so without knowing the model.