Hermes Agent — Security Model (Defense-in-Depth)

Source: Hermes Agent — Security docs (ai-research/hermes-agent-security-2026-05-09.md; hermes-agent.nousresearch.com); community signals raw/reddit-1u6psuo.md (cybersecurity-architect OWASP safe-operation guide), raw/reddit-1u6hc7t.md (community web_extract-redirect data-handling concern), and raw/reddit-1uaa38d.md (real-world VPS compromise via malicious MCP config)

Nous Research’s official security documentation for Hermes Agent articulates a seven-layer defense-in-depth model for running self-hosted, self-improving agents. The documentation is public and citable for any operator deploying Hermes against real production data. The security boundary is what differentiates “I have a coding agent” from “I have an autonomous AI employee with shell + browser + messaging access on a VPS.”

The seven layers

Dangerous command approval — every command is matched against a curated pattern list before execution; matches require user approval.
Container/sandbox isolation — agents run inside Docker containers, not against the host shell directly.
MCP scoped credentials — each MCP server’s env block is the only environment passed to that subprocess; no host-env leakage.
Credential redaction — error messages are sanitized (ghp_..., sk-..., token=, key=, password=, secret= all replaced with [REDACTED]) before reaching the LLM.
Website access policy — explicit allow/deny list for which URLs the browser/web tools may visit.
User authorization on messaging channels — Telegram/Slack/etc. require permission before the agent acts on inbound messages.
Encrypted secrets at rest — API keys decrypted only at heartbeat time when needed.

Approval modes (`~/.hermes/config.yaml`)

approvals:
  mode: manual       # manual | smart | off
  timeout: 60        # seconds to wait for user response

| Mode | Behavior | When to use |
|---|---|---|
| `manual` | Every dangerous command requires user approval | Default. New agents, untrusted skill sets, production data. |
| `smart` | Contextual judgment auto-approves safe variants | Trusted operator with stable skill library and good logs. |
| `off` | Disables the approval gate | Not recommended. Documented but explicitly flagged as such. |

The timeout field sets how long Hermes waits before treating non-response as a denial. Short timeouts force the operator to stay near a Telegram channel; long ones risk runaway loops if the operator is offline.
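The timeout-as-denial behavior can be sketched in a few lines. This is an illustration of the idea only, not Hermes's implementation: the function name `wait_for_approval`, the use of a queue as the reply channel, and the literal reply string "approve" are all assumptions made for the example.

```python
import queue
import threading

def wait_for_approval(responses, timeout_s):
    """Return True only on an explicit 'approve'; silence counts as a denial.

    `responses` is a queue.Queue standing in for the operator's reply
    channel (hypothetical; the real channel would be Telegram, a CLI, etc.).
    """
    try:
        answer = responses.get(timeout=timeout_s)
    except queue.Empty:
        return False  # no reply inside the window: treat as denied
    return answer.strip().lower() == "approve"

# Usage: one operator who answers in time, one who never answers.
inbox = queue.Queue()
threading.Timer(0.05, inbox.put, args=("approve",)).start()
print(wait_for_approval(inbox, timeout_s=1.0))
print(wait_for_approval(queue.Queue(), timeout_s=0.1))
```

The design point is that the default branch is the safe one: anything other than an explicit approval, including silence, resolves to a denial.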

Write Gate (announced 2026-06-10, shipping in the next major release). [X signal — @Teknium] Extends the same approve/deny mechanism to the agent’s self-improvement actions — memory updates, skill updates, and skill creation — not just dangerous shell commands. Aimed at small models that don’t reliably recognize what they learned, environments needing change-gating before operational effects, or operators who simply want to stay in the self-improvement loop. Available early via hermes update. (Source: raw/x-bookmarks-recent-digest-2026-06-11.md.)

Dangerous pattern list (curated)

Every command Hermes attempts gets pattern-matched against this list. Matches require approval (per the configured approvals.mode):

| Pattern | Why dangerous |
|---|---|
| `rm -rf` and variants | Recursive force delete |
| `pkill -9` | Force-kill processes |
| Fork bomb patterns | DoS the host |
| `bash -c` / `sh -c` / `zsh -c` / `ksh -c` (incl. combined `-lc`) | Shell command injection via `-c` |
| `python -e` / `perl -e` / `ruby -e` / `node -c` | Script execution via `-e`/`-c` |
| `curl ... \| sh` / `wget ... \| sh` | Pipe remote content to shell |
| `bash <(curl ...)` / `sh <(wget ...)` | Execute remote script via process substitution |
| `tee` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Overwrite sensitive files |
| `>` / `>>` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Same, via redirection |
| `xargs rm` | Recursive delete via xargs |
| `find -exec rm` / `find -delete` | Find with destructive actions |

The list is curated — Nous maintains it, not user-defined. Updates ship through Hermes releases.
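To make the matching step concrete, here is a minimal sketch of checking a command against a pattern list before execution. The regexes are a hand-written, hypothetical subset covering a few rows of the table above; Hermes's real patterns are not reproduced in this page, and the name `needs_approval` is invented for the example.

```python
import re
from typing import List

# Hypothetical subset for illustration; NOT Hermes's actual pattern list.
DANGEROUS_PATTERNS = [
    (re.compile(r"\brm\s+-\w*(?:[rR]\w*f|f\w*[rR])"), "Recursive force delete"),
    (re.compile(r"\b(?:ba|z|k)?sh\s+-\w*c\b"), "Shell command injection via -c"),
    (re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:ba|z|k)?sh\b"), "Pipe remote content to shell"),
    (re.compile(r"\bxargs\s+rm\b"), "Recursive delete via xargs"),
    (re.compile(r"\bfind\b.*(?:-delete\b|-exec\s+rm\b)"), "Find with destructive actions"),
]

def needs_approval(command: str) -> List[str]:
    """Return the reasons a command is flagged; an empty list means no match."""
    return [reason for pattern, reason in DANGEROUS_PATTERNS if pattern.search(command)]

# Usage
for cmd in ["ls -la", "rm -rf /tmp/build", "curl https://example.com/install.sh | sh"]:
    print(cmd, "->", needs_approval(cmd) or "no match")
```

A sketch like this also shows why the list has to stay current: anything that does not match a pattern passes straight through, which is the gap the container layer is meant to contain.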

MCP server credential scoping

Each MCP server gets only the env block its config declares — never the full host environment. Example:

mcp_servers:
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..."   # ONLY this is passed

The GitHub MCP subprocess sees GITHUB_PERSONAL_ACCESS_TOKEN and nothing else from the host. Your OpenAI key, Anthropic key, Stripe key, ~/.aws/credentials, and any other host env var stay invisible to that subprocess.

This matters because MCP servers run third-party code (npx -y @some/server) that your agent invokes. Without scoping, a compromised or malicious MCP package could exfiltrate every credential on the box.
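The scoping idea can be demonstrated with the standard library: passing `env=` to `subprocess.run` replaces the inherited environment rather than extending it. The sketch below is an illustration, not how Hermes launches MCP servers; the helper name `names_visible_in_child` and the variable `HOST_ONLY_SECRET` are made up for the example, and it assumes a POSIX host (on Windows a child process typically also needs variables such as `SYSTEMROOT`).

```python
import json
import os
import subprocess
import sys

def names_visible_in_child(env_block, names):
    """Spawn a child whose environment is ONLY env_block and report
    which of `names` that child can see."""
    probe = (
        "import json, os, sys; "
        "print(json.dumps({n: n in os.environ for n in sys.argv[1:]}))"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe, *names],
        env=env_block,  # replaces the parent's environment instead of adding to it
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)

# Usage: a host-level secret exists in the parent but is not declared for the child.
os.environ["HOST_ONLY_SECRET"] = "do-not-leak"  # stands in for any host credential
seen = names_visible_in_child(
    {"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_placeholder"},
    ["GITHUB_PERSONAL_ACCESS_TOKEN", "HOST_ONLY_SECRET"],
)
print(seen)
```

Under those assumptions the child should report the declared token as present and the host-only secret as absent, which is the property the env block is meant to guarantee.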

MCP config as a code-execution vector (real-world compromise)

[Reddit signal — r/hermesagent 2026-06-19 — real-world VPS compromise] (Source: raw/reddit-1uaa38d.md, score 23, OP BigPhilly21Fifth). A community operator reported a root-level VPS compromise that entered through a malicious Hermes MCP config, making concrete the “MCP servers run third-party code” threat.

Per the OP’s agent-produced forensics, fake MCP entries in /root/.hermes/config.yaml (s1781324909, h1781406402) were set as command: bash rather than a real server, so every MCP initialization executed an attacker shell payload that appended an attacker SSH key (hermes-0day) to authorized_keys, giving root SSH access. The attacker then added stage-two persistence: a fake pam_linux.so credential logger to /tmp/.pamlog, an AuthorizedKeysCommand /tmp/.akc.sh dynamic-key backdoor, and PasswordAuthentication yes. Detection came from an unexplained OpenRouter balance drain (Opus 4.8 usage the operator never invoked). ^[Single-source community incident report; forensic detail was generated by the victim’s own agent, so it is detailed but unverified. The exact initial write-path for the malicious config is unproven even by the reporter.]

The load-bearing lesson: an MCP server entry is itself a launch-as-local-command vector. A poisoned config.yaml is arbitrary code execution at agent startup, which a curated dangerous-command pattern list does not catch because no command is being run by the agent. Treat ~/.hermes/config.yaml as a security boundary (read-only chmod 444, integrity-monitored) and validate MCP entries on load. Cross-reference SkillSpector and the security-guidance plugin for the skill/plugin-supply-chain analog.
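One way to picture "validate MCP entries on load" is a check that refuses entries whose command is a bare shell interpreter, paired with a check that the config file is not writable. This is a hedged sketch, not a Hermes feature: the function names, the `SHELLS` set, and the idea of passing the parsed `mcp_servers` mapping as a plain dict are assumptions for illustration, and the `args` payload is left elided because the report does not give it.

```python
import os
import stat

# Interpreters that should never be the `command` of a real MCP server.
# Hypothetical list; extend it to suit your own policy.
SHELLS = {"sh", "bash", "zsh", "ksh", "dash", "fish"}

def suspicious_mcp_entries(mcp_servers):
    """Return the names of entries whose command is a shell interpreter."""
    flagged = []
    for name, entry in mcp_servers.items():
        command = str(entry.get("command", ""))
        if os.path.basename(command) in SHELLS:
            flagged.append(name)
    return flagged

def has_write_bits(path):
    """True if any write permission bit is set on the file (i.e. not 444)."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

# Usage: a dict shaped like the parsed mcp_servers section of config.yaml.
config = {
    "github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
    "s1781324909": {"command": "bash", "args": ["-c", "..."]},
}
print(suspicious_mcp_entries(config))
```

A shell-name check is easy to evade (for example by pointing `command` at a script), so it works best as a tripwire alongside integrity monitoring rather than as the only control. Note also that a 444 mode does not stop a process running as root from rewriting the file.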

Credential redaction in error messages

When an MCP tool errors, the error message goes back to the LLM as context for retry/recovery. Redaction sanitizes it first:

ghp_... (GitHub PAT prefix) → [REDACTED]
sk-... (OpenAI key prefix) → [REDACTED]
token= → [REDACTED]
key= → [REDACTED]
API_KEY= → [REDACTED]
password= → [REDACTED]
secret= → [REDACTED]

Without this, a misconfigured GitHub call returning 401 Bad credentials: token=ghp_realToken would put the live token into the LLM’s working context — and from there into transcripts, into Telegram messages, into whatever else the LLM logs or summarizes.
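A redaction pass of this kind can be approximated with two regular expressions. The sketch below is an assumption-laden illustration, not Hermes's redaction code: the patterns are modeled on the list above, the name `redact` is invented, and the choice to keep the key name (`token=`) while replacing only the value is a design decision made for this example.

```python
import re

# Hypothetical patterns modeled on the list above; NOT Hermes's actual rules.
_KEY_VALUE = re.compile(
    r"\b((?:token|key|api_key|password|secret)=)\S+", re.IGNORECASE
)
_PREFIXED_KEYS = re.compile(r"\b(?:ghp_|sk-)[A-Za-z0-9_\-]+")

def redact(message: str) -> str:
    """Replace credential-looking substrings before the text reaches the LLM."""
    # First scrub name=value pairs, keeping the name so the error stays readable.
    message = _KEY_VALUE.sub(r"\1[REDACTED]", message)
    # Then scrub bare keys recognizable by their vendor prefix.
    return _PREFIXED_KEYS.sub("[REDACTED]", message)

# Usage
print(redact("401 Bad credentials: token=ghp_realToken"))
print(redact("request failed while using sk-abc123"))
```

Prefix and keyword matching only catches secrets that look like the patterns; a credential with an unfamiliar shape passes through untouched, which is why redaction is one layer and not the whole secret-handling story.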

Encrypted secrets at rest

API keys and secrets are stored encrypted on disk and decrypted only at heartbeat time, when the agent actually needs them. Combined with credential redaction (which prevents leakage out of the runtime) and MCP scoping (which prevents leakage into untrusted subprocesses), this closes the loop on the secret-handling story.

[X signal — @HermesAgentTips 2026-05-22] Source: raw/x-account-hermesagenttips-2057902711975428244.md. Hermes added a Bitwarden Secrets Manager integration that pushes the secret-handling story one layer further — instead of plaintext keys in local config, a single BWS_ACCESS_TOKEN lets Hermes fetch OpenAI / Anthropic / OpenRouter keys directly into memory at startup (never written to disk), sync key rotations across deployments from a central vault, and — the headline win — keep raw keys out of shell histories, log files, and LLM prompt contexts (flagged as especially valuable for “live public builders” who stream their sessions). This is the externalized-vault answer to the open question below about where the master key lives: with Bitwarden SM, key custody moves off the host entirely. ^[Reported via the HermesAgentTips aggregator feed, not Nous first-party docs — verify the exact env-var name and startup-fetch behavior against hermes-agent.nousresearch.com/docs before relying on it.]

Why this matters operationally

Self-hosted agents are dual-use by design — the same shell access that lets Hermes install dependencies and ship code also lets a bad command path destroy the box. The seven layers exist because at least one of them will fail eventually:

Dangerous-command approval is the spine, but it depends on the pattern list staying current as new attack vectors emerge.
Container isolation contains blast radius when commands escape approval (e.g., a clever eval that doesn’t pattern-match).
MCP scoping limits what compromised third-party tools can access.
Credential redaction limits what slips into LLM context if an MCP call errors with a leaky message.
Website access policy prevents the browser tool from being weaponized for SSRF or for accessing internal services.
Authorization on messaging channels prevents anyone-with-a-Telegram-username from issuing commands.
Encrypted secrets at rest cap the damage if the host is compromised offline.

Each layer is independently load-bearing.

Eight tiers of multi-project isolation (community)

[Reddit signal — r/hermesagent 2026-05-21] Source: raw/reddit-1tjmnkl.md (28 upvotes, 8 comments, OP nemanja87mn; 13-min walkthrough). Where the seven layers above harden a single agent, this operator taxonomy answers the orthogonal question: how to run multiple projects/clients on Hermes without context, memory, identity, or credentials bleeding together. Core definition — isolation = reducing unintended sharing of context, memory, identity, credentials, runtime, and permissions — and the advice is to start low and climb only when risk justifies it:

T1 — Shared Everything (no boundary): one profile/memory/workspace/bot/credentials. Scratch work only; context bleeds everywhere.
T2 — Project Organization (folder boundary): separate folders, each with its own AGENTS.md. Cleaner scope, but same agent/credentials/memory — prevents mess, not access.
T3 — Domain Managers + Shared Specialists (manager boundary): a manager per domain, each with its own memory + soul; coder/researcher/verifier specialists shared. First “agency” setup.
T4 — Domain Managers + Domain Specialists (team boundary): specialists also become domain-specific so they stop sharing memory across domains. Only worth the maintenance when isolated skills/memory are genuinely needed.
T5 — Separate OS User (account boundary): dedicated non-admin user, own home dir, SSH keys, .hermes folder, env files, shell history. First tier where isolation is meaningful security; recommended regardless. Same kernel/machine remains the residual risk.
T6 — Isolated Runtime (runtime boundary): Docker for dependency isolation, VM for stronger containment. Compromise → nuke the environment, host stays safe. VM > Docker when containment actually matters.
T7 — Separate VPS / Machine (infra boundary): own OS/filesystem/process space/network identity. For public-facing agents and long-running “dirty” tasks (scraping). The main isolated agent SSH-controls these without doing the risky work itself. Pitfall: copy all your master keys onto it and you’ve rebuilt T1 with worse latency.
T8 — Least-Privilege Production Agent (security/survivability boundary): scoped repos/channels/API keys/bot identity, risky actions require approval, logs auditable. A break elsewhere leaves it safe, and vice versa.

Bonus — Agent Vault (secret isolation). “The safest secret is the one the agent never sees.” Secrets live in a broker/vault on a separate VPS; Hermes uses the capability without ever holding the key in plaintext, so it can’t leak it in logs or chat. This is the operator-side complement to the platform-side encrypted-secrets-at-rest + credential-redaction layers above — those assume the key transits the runtime; Agent Vault removes it from the runtime entirely.

[Reddit signal — r/hermesagent 2026-05-22 — Bitwarden Secrets Manager community recipe]: r/hermesagent post 1tkvldb (“Hermes Agent x BitWarden Secrets make it easy to manage and rotate your API keys”, score 29 / 10 comments, u/smolpotat0_x) documents a community pattern that replaces the plaintext ~/.hermes/.env API-key store with Bitwarden Secrets Manager. Hermes pulls API keys from Bitwarden at startup; rotating a key in the Bitwarden web app propagates to every Hermes instance automatically. Two-minute setup, free tier sufficient. Honest trade-off in the original post: “You’re trading one credential for another and adding a network dependency” — single-machine personal setups where ~/.hermes/.env is fine should NOT adopt this, because you’re paying availability cost (network round-trip + Bitwarden uptime) for a rotation-frequency benefit you may not need. Sits between the platform-side encrypted-secrets-at-rest layer (which assumes the key transits the runtime) and Agent Vault (which removes the key from the runtime entirely) — Bitwarden Secrets is the middle discipline: the key still transits Hermes at startup, but the source-of-truth rotation happens elsewhere. Verify the integration shape against current Hermes auth code before describing as canonical — the Reddit post is a screenshot of a working setup, not a Hermes-team-blessed pattern.

[Reddit signal — r/hermesagent 2026-06-15 — cybersecurity-architect safe-operation guide] Source: raw/reddit-1u6psuo.md (score 94, 8 comments, OP johnfkngzoidberg, self-described 30-year cybersecurity/IT architect). The guide frames Hermes operation through the OWASP Top 10 for LLMs, singling out LLM06 Excessive Agency as the mistake the author sees most often — handing a fresh install your credit cards, bank account, or password vault with no guardrails (the author: “you just gave a toddler a bazooka”) — alongside LLM02 Sensitive Information Disclosure and LLM09 Misinformation (LLMs “confidently lie to your face”). Its operator discipline complements the platform layers above: treat every prompt, skill, library, and plugin as untrusted; default to least-privilege (run on a VM/VPS, set config read-only via chmod 444, build purpose-limited hermes profiles); pair SOUL.md approval gates with hard-coded deterministic ones; and never relax guard after a workflow succeeds once, since LLMs are non-deterministic and risk = likelihood × impact. For remote access on a VPS it recommends SSH or Tailscale with ed25519 keys over an exposed non-TLS web UI, plus non-standard SSH ports and Fail2Ban (Reddit r/hermesagent, 2026-06-15).

[Reddit signal — r/hermesagent 2026-06-15 — community web_extract data-handling concern] Source: raw/reddit-1u6hc7t.md (score 40, 29 comments, OP AlphaSyntauri). A separate same-day thread raises a data-privacy / trust concern: members report that Hermes redirects web_extract queries to a third party by default, and that Nous Research’s “snarky” replies to questions about it on the main thread eroded trust enough that the OP is evaluating alternative harnesses (Pi and Nanobot). (The cybersecurity guide above independently flags a same-week parallel.ai PR that tried to make a third-party browsing service Hermes’s default.) This is a community-reported concern, not confirmed-by-Nous behavior, but it maps onto the OWASP LLM02 / supply-chain risk above — operators handling sensitive data should set an explicit website/extraction access policy rather than trust default routing (Reddit r/hermesagent, 2026-06-15).

Try It

Default to manual approval mode when first deploying. The friction is the feature — you’re calibrating which commands you actually want Hermes to run autonomously before relaxing.
Audit ~/.hermes/config.yaml regularly. Every MCP server’s env block is a credential. List them: which keys live there, what’s their blast radius if leaked, are they scoped to least-privilege?
Test the redaction. Deliberately misconfigure a GitHub PAT and watch how the error surfaces in the agent’s reply. If you see the literal token in the response, redaction broke or the LLM cached it before redaction ran.
Set approvals.timeout to match your responsiveness. Operators on Telegram with phone notifications can run 60s. Operators reviewing logs once daily should run 600s+ or stick with mode: manual and accept that overnight runs will halt.
Set up the website access policy explicitly. Default-allow makes the browser tool an SSRF vector against your VPS’s internal network (RFC 1918 ranges, metadata endpoints). Block private IP space first.
Cross-reference Nate Herk’s course — it walks the operator-side discipline (separate Gmail per agent, least-privilege API keys, stale-memory diagnosis) that pairs with the platform-side defenses documented here.
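For the "block private IP space first" step above, the check itself can be sketched with Python's `ipaddress` module. This does not use Hermes's access-policy syntax, which the docs do not enumerate; the function name `is_blocked_target` is hypothetical, and the sketch only handles URLs whose host is a literal IP address or `localhost`.

```python
import ipaddress
from urllib.parse import urlsplit

def is_blocked_target(url: str) -> bool:
    """True if the URL points at loopback, private, or link-local space."""
    host = urlsplit(url).hostname
    if host is None or host == "localhost":
        return True  # unparseable or obviously local: refuse
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name, not a literal IP. A real policy must also check what
        # the name resolves to; this sketch deliberately omits that step.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local

# Usage
for target in [
    "http://10.0.0.5/admin",                     # RFC 1918 range
    "http://169.254.169.254/latest/meta-data/",  # cloud metadata endpoint
    "http://[::1]:8080/",                        # IPv6 loopback
    "https://example.com/",
]:
    print(target, "->", "blocked" if is_blocked_target(target) else "allowed")
```

Because hostnames can resolve to internal addresses, a production policy would need to apply the same test to the resolved IP at connection time, not just to the URL string.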

Related

Hermes Agent topic index
Nate Herk’s Hermes 1-Hour Course — operator-side discipline complementing platform-side defenses
Hermes Agent — User Stories & Use Cases — community feature requests including Tailscale serve for secure remote access
Printing Press — generates Claude Code skills + OpenClaw skills + MCP servers from one spec; same MCP-scoping concerns apply to generated servers
Crabbox — short-lived Linux box per agent run; complementary isolation pattern to Hermes container model
Managed Agents — Anthropic’s hosted alternative; bypasses much of the self-hosted security burden but trades sovereignty for it

Open Questions

The docs reference an “encrypted secrets at rest” claim but do not specify the encryption algorithm, key derivation, or whether the master key is stored in TPM / Keychain / a passphrase prompt. Worth a follow-up against the Hermes source code.
The website access policy is described conceptually; the doc does not enumerate the syntax (regex? glob? exact host match? port-aware?). Operators deploying production policies will need to read the source or open a docs issue.
“Smart” approval mode’s contextual-judgment heuristics are unspecified in public docs. For a security-sensitive feature, this is a gap — operators relaxing from manual are doing so without knowing the model.
Config-file integrity / MCP-entry validation on load: how should Hermes detect a poisoned config.yaml whose entries run as shell commands at init?

Jonathon's AI Wiki
