The capability gap between the two frontier vuln-finding models has effectively closed, and a UK government lab is the one putting numbers next to that claim. Bruce Schneier flagged on May 13 that the UK’s AI Security Institute (AISI), in a fresh evaluation of OpenAI’s GPT-5.5, found that the model’s cyber capability is “comparable to Claude Mythos.” The framing matters because Mythos is the model Anthropic chose not to ship to the public — it gated access to roughly fifty critical-infrastructure vendors through Project Glasswing — while GPT-5.5 is the variant OpenAI is selling at consumer pricing.
What AISI actually measured
AISI’s published evaluations cover three angles in this round: GPT-5.5 head-to-head against Mythos on vulnerability-finding tasks, a standalone capability profile of Mythos, and a separate run on a smaller, cheaper model that approximates the same workflow when wrapped in additional prompt scaffolding. The Institute reports that the smaller-model setup, with more direction from the operator, reaches the same end states on the same tasks — slower, but successful. Hard performance numbers are not included in the public excerpt, but the qualitative finding is unambiguous: GPT-5.5 is in the same class as Mythos, and a smaller model with the right harness is not far behind.
Two release strategies, one capability tier
Anthropic’s April Mythos rollout was built around restriction: a closed allowlist, a structured access tier (Glasswing), and a published rationale that frontier vuln-finding had crossed a threshold the company would not put on its public API. Per Schneier’s prior coverage of Glasswing, Anthropic acknowledged that the model “can autonomously find and weaponize software vulnerabilities, turning them into working exploits without expert guidance” and framed restricted access as an incremental safety measure rather than a permanent answer.
OpenAI’s bet, made visible by AISI’s evaluation, runs the other direction. The capability is sold openly, and the safety story sits with the deployment-time wrapper — see Daybreak’s “Trusted Access for Cyber” tier, which exposes the same model to vetted defenders without the standard refusals. The AISI result is the first independent finding that says, at the capability layer, those two strategies are now describing the same thing.
What this means
For defenders building procurement plans around AI-driven vulnerability management, the message is that vendor exclusivity on a frontier vuln-finding tier is no longer a durable moat — the public model is at parity. For policy teams tracking dual-use risk, the timing matters: a capability Anthropic considered too sharp to release in April is generally available from a competitor in May, and the offense-defense calculus has to be re-priced against that. And for engineering teams who never made the Glasswing allowlist, the practical question shifts from “how do we get access” to “what scaffolding does our SAST/red-team workflow need to wrap around GPT-5.5 to match what Glasswing partners are doing internally.” AISI’s finding suggests that scaffolding, not raw model access, is now the variable that matters.