Jonathon's AI Wiki
Tag: alignment
5 items with this tag.
Jun 16, 2026
Claude Fable 5 and Claude Mythos 5
claude
model-release
mythos
fable-5
mythos-5
safeguards
project-glasswing
frontier-model
pricing
system-card
swe-bench
alignment
model-welfare
cyber
rsp
Jun 15, 2026
Claude Opus 4.8 — Anthropic Release + System Card (May 28, 2026)
opus-4-8
claude
model-release
system-card
anthropic
benchmarks
alignment
safety
agentic
long-context
pricing
mythos-preview
Jun 11, 2026
Claude Mythos Preview — Anthropic System Card (April 7, 2026)
claude-mythos-preview
system-card
anthropic
rsp-3-0
frontier-model
alignment
model-welfare
project-glasswing
cybersecurity
opus-4-6
defensive-cyber
not-generally-available
evaluation-awareness
recursive-self-improvement
primary-source
Jun 10, 2026
When AI Builds Itself — Anthropic on Recursive Self-Improvement
anthropic
anthropic-institute
recursive-self-improvement
ai-accelerating-ai
mythos-preview
benchmarks
agentic-coding
research-automation
alignment
ai-safety
capability-curve
jack-clark
May 08, 2026
Translating Claude's Thoughts Into Language — Activation-to-Text Interpretability
interpretability
mechanistic-interpretability
activations
alignment
safety-evals
anthropic-research
blackmail-test
self-introspection
claude-mythos-preview