Jonathon's AI Wiki

Tag: claude-mythos-preview

3 items with this tag.

  • Jun 11, 2026

    Claude Mythos Preview — Anthropic System Card (April 7 2026)

    • claude-mythos-preview
    • system-card
    • anthropic
    • rsp-3-0
    • frontier-model
    • alignment
    • model-welfare
    • project-glasswing
    • cybersecurity
    • opus-4-6
    • defensive-cyber
    • not-generally-available
    • evaluation-awareness
    • recursive-self-improvement
    • primary-source
  • May 27, 2026

    Anthropic Engineering — How We Contain Claude Across Products (3-Risk × 3-Defense Frame)

    • anthropic-engineering
    • agent-containment
    • sandboxing
    • blast-radius
    • claude-ai-runtime
    • claude-code-runtime
    • cowork-runtime
    • gvisor
    • vm-isolation
    • egress-allowlist
    • claude-mythos-preview
    • prompt-injection
    • agent-identity
    • post-mortem
    • x-sourced
    • reddit-sourced
    • r-claudecode
  • May 08, 2026

    Translating Claude's Thoughts Into Language — Activation-to-Text Interpretability

    • interpretability
    • mechanistic-interpretability
    • activations
    • alignment
    • safety-evals
    • anthropic-research
    • blackmail-test
    • self-introspection
    • claude-mythos-preview

