There Is No 'Security Meter' for AI — Schneier Makes the Case for Assurance

As organizations rush to put a number on how secure their AI systems are, Bruce Schneier’s essay “On AI Security” offers a blunt corrective: that number does not exist, and pretending otherwise is its own risk. The piece is short on alarm and long on process, which is precisely why it is worth the attention of anyone being asked to sign off on an AI deployment.

Why benchmarks miss the point

Schneier’s central claim is that benchmarks do not actually measure AI capabilities — and security is the hardest capability of all to benchmark. A test suite can tell you how a model performs on a fixed set of prompts, but it cannot capture emergent, systemic properties: the behaviors that surface only when a model is wired into tools, data, and other agents inside a live business process. A high score on a jailbreak benchmark says little about how the same system behaves when an attacker controls part of its input or chains it to a downstream action.

The problem is structural, not a matter of building a better benchmark. Security failures in deployed AI tend to live in the gaps between components and in adversarial conditions the benchmark author never anticipated. A single “security meter,” Schneier argues, gives a false sense of completion for a problem that is never finished.

Borrow the playbook from software security

Rather than treat this as a dead end, the essay reframes it through the history of application security. Software security matured over roughly thirty years by moving through stages: from penetration testing, to static and dynamic code analysis, to process-driven frameworks like the Building Security In Maturity Model (BSIMM), which describes what mature organizations actually do. The lesson is that assurance came from process, not from one definitive measurement.

Schneier points to the Berryville Institute of Machine Learning’s work on the limits of AI security measurement as a starting point for that same evolution. The framing he favors is to “clean up our WHAT piles” — get clear on what you are actually trying to protect and assure — and then manage risk by identifying and applying good assurance processes, rather than reducing the question to a leaderboard.

What this means

For practitioners, the takeaway is procedural. If a vendor or internal team is presenting an AI system as “secure” on the strength of benchmark scores, that is a yellow flag, not a green light: the score reflects performance on a fixed set of prompts, not the behavior of the deployed system. Stronger assurance means threat modeling the full system, including its tool integrations and agent connections; red-teaming under adversarial input rather than canned prompts; and adopting the kind of maturity-model thinking that software security took decades to reach.

The uncomfortable corollary is that AI security, like software security before it, has no finish line. There is no meter that reads “safe.” There is only the discipline of continuous vigilance and the honest accounting of what you have and have not assured.
