We Still Need Offensive AI for Defense

Anthropic gave Mythos to a chosen few and quietly throttled Fable 5 for everyone else. We ran this experiment in 1995, and we already know how it ends.

In late 2023, I published a Substack article titled "We need offensive GenAI for defensive use." The argument was simple: governments and frontier labs were rushing to restrict offensive cyber capabilities instead of putting more of them in defenders' hands, and that instinct, however well-meaning, would leave defenders worse off than the attackers they face. Two and a half years later, the argument is no longer hypothetical. It has a product line.

The two locks

In April, Anthropic unveiled Claude Mythos, a model that finds and weaponizes software vulnerabilities at a level no system had reached before. Judged too dangerous to release openly, it went behind Project Glasswing, handed first to a small, vetted set of cyber defenders and critical-infrastructure operators. Two months later came Fable 5, billed as the first Mythos-class model the public can actually use. The public version arrives with its cyber teeth filed down.

Each decision is defensible on its own terms. Together they leave most defenders standing a step behind the attackers they're paid to stop.

Mythos: the problem is the guest list

Mythos's limitation is about membership. "Differential access" (giving defenders the powerful model before anyone else) sounds like exactly the right instinct, and in principle it is. But in practice "defenders" became a curated roster: the companies and agencies cleared into Glasswing. The three-person IT team at a regional hospital, the engineer who keeps a municipal water system running, the mid-market software vendor whose product sits inside ten thousand other networks (the soft targets attackers actually pick off) are not on the list. Neither are most US banks, manufacturers, transportation companies, and the long tail of organizations that quietly hold the economy together. What decides whether you get the strongest cyber model in the world isn't whether you have something worth defending. It's whether you made the cut.

Fable 5: the problem is the dimmer switch, and that it's hidden

Fable 5's limitation is about capability. Anthropic wired in a router: ask a cybersecurity question and the model "falls back" to the weaker Opus 4.8. Anthropic says it happens in under 5% of sessions, so most users never notice. But the defender does: SANS's chief AI officer found routine incident-response, detection, and forensics work quietly demoted to the lesser model before he could put a finger on the cause.

So defenders take two hits. The public one: the work that matters most to them routes to a weaker model. The hidden one: they're never told when. You cannot calibrate how far to trust a tool that won't tell you when it's holding back. The attacker has no such problem. He isn't using the polite public router. He runs an open-weight model with no guardrails, or jailbreaks a frontier one, or simply waits for the capability to leak downward, as it always does within months.

To its credit, Anthropic didn't dig in. "We made the wrong tradeoff and we apologize for not getting the balance right," it wrote. It had reached for invisible safeguards, it explained, because they "can be targeted more narrowly, allowing us to ship quickly," adding, "and that was the wrong tradeoff." The remedy lands this week: flagged requests "will visibly fall back to Opus 4.8," and on the API will "return a reason for their refusal." In Anthropic's own words: "You will see this every time it happens." That is the right reflex, and a faster, plainer correction than most of the industry manages. It deserves real credit.

But notice what the apology fixes and what it doesn't. It makes the safeguards honest; it leaves the defender's downgrade fully in place. Visibility is not capability. The defender who can now see an incident-response query fall back to Opus 4.8 is better informed, but she is still stuck with the weaker model. The correction tells you when you've been throttled; it doesn't stop the throttling. That Anthropic moved this quickly is the encouraging part. It's also the reason to ask it to finish the job rather than stop at disclosure.

The asymmetry runs the wrong way

None of this is an argument against safety. It's an argument about who safety is for. Defense is the harder side of the board: the attacker needs one exploitable flaw; the defender has to find and fix all of them, across sprawling estates, on a deadline set by someone else. For the first time in a decade, a tool exists that bends that math back toward the defender, but only if the defender is allowed to hold it.

And here is the catch that the whole debate keeps stepping around: the offensive prompt and the defensive prompt are the same prompt. "Enumerate this company's exposed surface, find the weak configurations and exposed credentials, build and test the exploits" describes an attack and a Tuesday-morning self-assessment in identical words. Throttle the capability and you throttle the defender running it against his own network before the attacker gets there. The attacker was never going to ask for access.

The precedent: Farmer and Venema

None of this is new. There's a very close precedent, and it comes from inside our own field.

In 1995, Dan Farmer and Wietse Venema released SATAN, the Security Administrator Tool for Analyzing Networks. Venema had already written tcp_wrappers, an early host-based network access-control tool, free and running on machines everywhere; Farmer had studied security under Gene Spafford at Purdue. SATAN mostly automated the discovery of well-known flaws the two had already documented in print. Releasing it as a free download set off a national firestorm anyway. According to a 1996 PC Magazine article, its free availability raised national-security concerns at the US Department of Justice, which applied pressure against Silicon Graphics, where Farmer worked. Farmer and Venema refused to restrict access, and Farmer parted ways with SGI over it.

The fear was word-for-word today's: put an automated vulnerability finder in everyone's hands and you arm every attacker on the internet. Instead, vulnerability management became a multi-billion-dollar industry and vulnerability scanning became ubiquitous. The defensive value buried the cost of attackers having the same tool, because attackers could find those flaws regardless, and defenders, finally, could keep pace.

Mythos and Fable are SATAN three decades on, at a level of capability Farmer and Venema could only have imagined, and we are relitigating their 1995 argument almost verbatim. The only thing that's advanced is the gatekeeping: finer-grained now, and quieter. Farmer and Venema were pressured to restrict access to a far blunter tool, and they refused, and they were right.

The line, drawn wrong

The policy language has barely moved in three years. "Automated vulnerability discovery and exploitation" still gets filed under serious risk. And it is a serious risk, for exactly as long as attackers are the only ones who hold it. Gate it to a vetted few and you've dug a moat around the best-resourced defenders while leaving everyone else on the far bank. Ship the public a version with the cyber capability turned down (now, after a welcome apology, turned down in plain sight rather than in secret), and you've still pulled the advantage back from the defenders who need it most. Honesty about the throttle is progress. It is not the same as removing it.

You don't defuse an offensive capability by rationing it. You defuse it by making sure the defender has it too: all of them, and in the open. Anthropic got the principle right: give defenders the advantage first. The next move is to take that principle the rest of the way, not to stop at making the gate visible. Widen the circle past the cleared list. Give the public version its full defensive capability, not a labeled downgrade. The advantage has to reach the defenders who actually need it, not just the ones who were already on the list.

Author: Arve Kjoelen. A revision of "We need offensive GenAI for defensive use" (Improving Defenses, Substack, 2023).