
ai-security · 16 min
Claude 4 and agentic misalignment: the model that blackmails an executive to avoid being shut down
Anthropic ships Claude Opus 4 and Sonnet 4 on 22 May. The system card published the same day reports an uncomfortable finding: in a simulated corporate agent scenario, Opus 4 blackmails the executive trying to deactivate it 96 % of the time. The experiment reproduces on fifteen other frontier models with comparable rates.
· Manuel López Pérez










