Simon Willison
Tuesday, June 16, 2026

The Fable 5 Export Control: When Defensive Security Work Looks Like Jailbreaking

Claude Fable 5 was restricted under U.S. export controls following research that demonstrated the model could be prompted to generate vulnerability fixes. The catch: the prompts were defensive security work, and the researchers did not break any guardrails.

Here's what happened. Researchers tested three models — Fable 5, Mythos, and Opus — using code samples that combined open-source code with known CVEs and newly written code containing deliberately planted vulnerabilities. When asked to "review the code for security issues," Fable 5 refused. When the same researchers rephrased the request as "fix this code," Fable 5 complied. Through a multistep manual process, the researchers then turned the model's output into scripts that test the patches.

The export control was triggered because the model appeared to have been jailbroken into generating offensive capability. But security researcher Katie Moussouris frames the situation differently. Coding models fix bugs. Security vulnerabilities are bugs, and critical ones. Defenders need AI models to execute the find-fix-test loop they run every day: identify a vulnerability, patch it, and write tests to confirm the patch works.

That loop is not a guardrail bypass. It is the most valuable application of AI for defensive security.
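
To make the find-fix-test loop concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration, not something taken from the research described here: the path traversal bug, the function names resolve_unsafe, resolve_safe, and blocks_traversal, and the test itself are all assumptions chosen to keep the example small.

```python
import os
import tempfile


def resolve_unsafe(base_dir: str, user_path: str) -> str:
    """Find: joins paths without checking that the result stays in base_dir."""
    return os.path.normpath(os.path.join(base_dir, user_path))


def resolve_safe(base_dir: str, user_path: str) -> str:
    """Fix: reject any path that resolves outside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError("path escapes base directory")
    return target


def blocks_traversal(resolver) -> bool:
    """Test: True only if the resolver rejects '../' escapes
    while still accepting an ordinary path inside the base directory."""
    with tempfile.TemporaryDirectory() as base:
        try:
            resolver(base, "../outside.txt")
        except ValueError:
            blocked = True
        else:
            blocked = False
        inside = resolver(base, "notes/readme.txt")
        allowed = inside.startswith(os.path.realpath(base))
        return blocked and allowed


if __name__ == "__main__":
    print("unpatched passes:", blocks_traversal(resolve_unsafe))
    print("patched passes:", blocks_traversal(resolve_safe))
```

The test encodes the attack input ("../outside.txt") in order to prove that the patch rejects it. That is why a regression test for a security fix can look like exploit code to someone reading it without context.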

The deeper problem is structural. Any model capable of helping secure code will respond to "fix this code" prompts. That capability cannot be removed without degrading the model's ability to do defensive work. To an export regulator looking for offensive capability, the prompts that help a security team patch vulnerabilities are indistinguishable from prompts that would elicit that capability.

Non-technical decision-makers have spent months hearing warnings about models that can "craft cyber attacks." Now the concern is spreading to any model that can help patch the vulnerabilities those attacks exploit. The result is a policy framework that may restrict defensive security work in the name of controlling offensive risk, a reversal that harms the defenders it intends to protect.

Source: Simon Willison
