Jonathon's AI Wiki

Tag: alignment

5 items with this tag.

  • Jun 16, 2026

    Claude Fable 5 and Claude Mythos 5

    • claude
    • model-release
    • mythos
    • fable-5
    • mythos-5
    • safeguards
    • project-glasswing
    • frontier-model
    • pricing
    • system-card
    • swe-bench
    • alignment
    • model-welfare
    • cyber
    • rsp
  • Jun 15, 2026

    Claude Opus 4.8 — Anthropic Release + System Card (May 28, 2026)

    • opus-4-8
    • claude
    • model-release
    • system-card
    • anthropic
    • benchmarks
    • alignment
    • safety
    • agentic
    • long-context
    • pricing
    • mythos-preview
  • Jun 11, 2026

    Claude Mythos Preview — Anthropic System Card (April 7 2026)

    • claude-mythos-preview
    • system-card
    • anthropic
    • rsp-3-0
    • frontier-model
    • alignment
    • model-welfare
    • project-glasswing
    • cybersecurity
    • opus-4-6
    • defensive-cyber
    • not-generally-available
    • evaluation-awareness
    • recursive-self-improvement
    • primary-source
  • Jun 10, 2026

    When AI Builds Itself — Anthropic on Recursive Self-Improvement

    • anthropic
    • anthropic-institute
    • recursive-self-improvement
    • ai-accelerating-ai
    • mythos-preview
    • benchmarks
    • agentic-coding
    • research-automation
    • alignment
    • ai-safety
    • capability-curve
    • jack-clark
  • May 08, 2026

    Translating Claude's Thoughts Into Language — Activation-to-Text Interpretability

    • interpretability
    • mechanistic-interpretability
    • activations
    • alignment
    • safety-evals
    • anthropic-research
    • blackmail-test
    • self-introspection
    • claude-mythos-preview

