Why a Decade of Writing Detection Logic Makes the Mythos Exploit Numbers Less Scary
Mythos is finding thousands of vulnerabilities. Defenders aren’t doomed. Detection has never been 1:1 with exploits, and here’s why I think the numbers are a little* less scary than they’re being made out to be.
Anthropic’s marketing team has been pushing its new Mythos cybersecurity model and the volume of vulnerabilities it’s finding. According to Mozilla, those findings appear to be legitimate. A lot of people inside and outside the industry are worried, with good reason, and wondering whether this is the new normal if the pace holds up in the near term.
As someone who’s been writing detection logic for cybersecurity vendors for nearly a decade, I find these numbers less scary and less world-ending than they appear. I’ve managed SOCs that regularly went up against state-sponsored actors, including in a role where our organization won the Cogswell Award from the Defense Counterintelligence Agency. I’ve worked for a Fortune 100 doing detection at an enterprise scale most engineers never get to see, and I put out the first public white paper on detection as code. All of that to say, I’ve been at this for quite some time. While I think the short-term impact of models like Mythos is going to be rough, I also believe it’s a lot less bad than people are making it out to be.
New Exploit Releases Have Always Far Exceeded Defenders’ Ability to Write Detection
Writing detection logic has always been whack-a-mole. David Bianco’s Pyramid of Pain, one of our industry’s foundational write-ups, argues exactly this. You lean on behavioral detection over individual IoCs and exploits because new exploit disclosures have always outrun defenders’ ability to write rules. One-off exploit coverage isn’t where detection engineers spend most of their time. People still do it. The ET Open ruleset is a decent look at how many individual rules exist for historical CVEs. Rules typically get written for the major vulnerabilities, anything actively used against your industry, and the handful of cases where automation makes the work cheap.
Adversaries Haven’t Needed Zero-Days
Threat actors haven’t needed zero-days to compromise their targets. Old exploits have worked just fine for decades. One of the most prevalent initial-access techniques today, ClickFix, doesn’t rely on zero-days at all: it tricks users into pasting malicious code into PowerShell or the Run dialog and executing it themselves.
Detection Logic Doesn’t Map 1-to-1 to Exploits
For anyone who hasn’t written detection logic before, my favorite example of why behavioral detection beats signature-based hunting on individual exploits and IoCs is Remote Code Execution (RCE) bugs in Microsoft Office. Office products like Word and Excel have produced some of the most impactful and most abused vulnerabilities in the industry for two decades: more than 1,000 distinct RCE CVEs and counting.
Despite the prevalence of these vulnerabilities and their impact, detecting their abuse is a lot less difficult than one might think. For example, in 2022 Microsoft changed the default so that Office documents arriving from the internet, those tagged with Mark of the Web (MOTW), would no longer run macros, requiring the user to right-click the document and choose Unblock or run Unblock-File in PowerShell. While some may think of this as an exploit mitigation or hardening rather than detection, I disagree. From a detection engineer’s perspective, before Microsoft made this change I could have written a custom detector for that same behavior. After Microsoft implemented it, there was a major drop in macro-based malicious document delivery.
This, combined with modern EDR tooling that makes profiling behaviors easy, lets you build baselines and detections for behaviors like an Office application spawning a child process, a hallmark of an Office document executing code. Like the MOTW change, this dramatically reduces a threat actor’s ability to get code execution through an Office document regardless of the exploit used.
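To make that concrete, here’s a minimal sketch of what that behavior looks like as a detector. The event schema and field names are hypothetical, not tied to any specific product, and in practice this logic usually lives in your EDR or SIEM’s query language rather than Python, but the shape of the check is the same.

```python
# A minimal sketch of the "Office app spawns a scripting child" behavior,
# written against a hypothetical EDR process-creation event. The field names
# (parent_image, image, host) are illustrative, not from any specific product.

OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe"}
SCRIPT_CHILDREN = {"powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe", "mshta.exe"}

def office_spawns_script(event: dict) -> bool:
    """Flag process-creation events where an Office parent launches a script host."""
    parent = event.get("parent_image", "").lower().rsplit("\\", 1)[-1]
    child = event.get("image", "").lower().rsplit("\\", 1)[-1]
    return parent in OFFICE_PARENTS and child in SCRIPT_CHILDREN

# Example process-creation event
event = {
    "host": "ACCT-LAPTOP-07",
    "parent_image": r"C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE",
    "image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
}
if office_spawns_script(event):
    print("behavioral hit: office_child_process on", event["host"])
```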
Overlapping these two behaviors makes successful code execution exponentially harder. From there you can layer more, such as PowerShell executing a .ps1 file downloaded from the web. As a detection engineer, my job is to overlap enough behaviors that when one fires, the others raise the confidence it’s actually malicious, typically by tying them to scores in a Risk-Based Alerting model, where each new detection raises the cumulative likelihood of malicious activity.
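Here’s a rough sketch of that scoring model. The detection names, scores, and threshold below are made up for illustration; real Risk-Based Alerting implementations live in the SIEM and get tuned per environment.

```python
from collections import defaultdict

# Illustrative risk scores per behavioral detection; real values are tuned per environment.
RISK_SCORES = {
    "office_child_process": 40,     # Office app spawned a scripting interpreter
    "motw_macro_unblocked": 30,     # user removed Mark of the Web from a document
    "ps1_downloaded_from_web": 20,  # PowerShell ran a script fetched from the internet
}
ALERT_THRESHOLD = 70  # hypothetical threshold

def score_detections(hits):
    """Sum risk scores per entity (host or user) and return those worth alerting on."""
    risk = defaultdict(int)
    for entity, detection in hits:
        risk[entity] += RISK_SCORES.get(detection, 0)
    return {entity: score for entity, score in risk.items() if score >= ALERT_THRESHOLD}

hits = [
    ("ACCT-LAPTOP-07", "motw_macro_unblocked"),
    ("ACCT-LAPTOP-07", "office_child_process"),
    ("HR-DESKTOP-02", "ps1_downloaded_from_web"),
]
print(score_detections(hits))  # {'ACCT-LAPTOP-07': 70} -- overlapping behaviors cross the threshold
```

No single behavior here is enough to page anyone; it’s the overlap on one entity that pushes the score over the line.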
Machine Learning and Anomaly Detection Are Unlikely to Be the Answer
Organizations are scrambling because of the headlines showing the sky is falling for blue teams, and mature detection teams aren’t an exception. Most are looking at ways to transition from individual behavioral detectors to machine learning based models. I think that’s a mistake, and there is research to back it up.
Two papers from the security research community laid out the case against ML-based intrusion detection long before the current AI wave (something anyone who’s worked in a SOC doesn’t need a research paper to know):
- Robin Sommer and Vern Paxson’s “Outside the Closed World: On Using Machine Learning for Network Intrusion Detection”
- Stefan Axelsson’s “The Base-Rate Fallacy and the Difficulty of Intrusion Detection”
Sommer and Paxson’s critique runs to five points, but only three really matter here.
The first is that ML is good at classification, deciding which of several known categories an input belongs to. Anomaly detection inverts the problem. You train on benign traffic and ask the system to flag everything that doesn’t fit. The textbook they cite calls this the closed-world assumption, and they note plainly that it isn’t of much use in real-life problems. Spam classification works because you can train on both spam and ham. Recommendation systems work because they’re surfacing similar items, not novel ones. Network intrusion detection is the opposite shape of problem.
The second is the diversity of network traffic. Real traffic is heavy-tailed, bursty, and variable on every time scale that matters operationally. There is no stable “normal” to learn against. A model that performed well in March will start drifting by June because the application mix shifted, the workforce moved, a new SaaS rolled out, or a major holiday changed user behavior. That drift inflates the false positive rate, which Axelsson tells us is the thing you cannot afford to inflate.
The third is what they call the semantic gap. Even when an anomaly detector flags something correctly, it tells the analyst that an event was unusual, not that it was malicious, not what it was trying to do, and not what to do about it. The analyst still has to do the work of figuring out whether the unusual event matters. In a real SOC, that work is the bottleneck.
If you’re going to use ML in this space, Sommer and Paxson have a few practical recommendations on how to do it well.
Their first recommendation, and the one I’d put above the others, is to understand what the system is actually doing. The PEAK Threat Hunting Framework walks through performing structured threat hunts that can help both document and achieve this understanding.
Their second is to keep the scope as narrow as possible. Don’t ask the model to detect “attacks” in general, ask it to detect a specific, well-defined activity.
Their third is the one that tends to get missed. They argue that machine learning is often most useful as a feature-discovery tool rather than the detector itself: you use ML to find which features of benign and malicious traffic carry the most signal, then build a non-ML detector on top of those features.
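A quick sketch of what that workflow can look like, with invented feature names and synthetic data: the model is only used to rank features, and the thing that ships is a plain rule an analyst can read and tune.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Invented feature names and synthetic data, purely for illustration.
FEATURES = ["office_child_proc", "bytes_out_mb", "offhours_login", "dns_entropy"]
rng = np.random.default_rng(0)
X_benign = rng.normal([0.0, 5.0, 0.1, 2.0], 1.0, size=(500, 4))
X_malicious = rng.normal([3.0, 6.0, 0.3, 2.2], 1.0, size=(50, 4))
X = np.vstack([X_benign, X_malicious])
y = np.array([0] * 500 + [1] * 50)

# Step 1: use ML only to rank which features separate benign from malicious.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(FEATURES, clf.feature_importances_), key=lambda pair: -pair[1])
print("feature importances:", ranked)

# Step 2: ship a plain, explainable detector built on the top-ranked feature,
# not the model itself. The threshold is chosen and validated by an analyst.
top = FEATURES.index(ranked[0][0])
def detect(event_row) -> bool:
    return event_row[top] > 1.5
```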
The other point that becomes relevant is the Base-Rate Fallacy paper, which Sommer and Paxson reference:
“In intrusion detection, the relative cost of any misclassification is extremely high compared to many other machine learning applications. A false positive requires spending expensive analyst time examining the reported incident only to eventually determine that it reflects benign underlying activity. As argued by Axelsson, even a very small rate of false positives quickly renders an NIDS unusable.”
This paper, in my opinion, is required reading for detection engineers. To understand why it came to this conclusion, let’s break it down with an easy-to-understand example.
NOTE: For the purposes of this paper, a True Positive is considered any investigation that leads to a malicious outcome and a False Positive is any investigation that turns out to be benign activity. I recommend avoiding a binary True/False Positive framing when it comes to security monitoring, however.
Take a small environment of a million events per day, with two actual intrusions per day. Assume each intrusion creates ten events, meaning twenty intrusive events out of a million total. The probability that any given event is intrusive works out to 20 / 1,000,000 = 0.00002. That tiny probability is what makes the false positive rate the most important measurement of the effectiveness of a piece of detection logic.
Detection rate and false positive rate are commonly confused and thought to be inverses, but they are not. Detection rate is true positives over actual intrusive events, whereas the false positive rate is false positives over actual benign events. The two numbers can move independently. The reason false positive rate ends up dominating isn’t that it matters more in some abstract sense, it’s that the benign population is roughly 50,000 times larger than the intrusive one. A perfect detection rate only buys you twenty hits because there are only twenty intrusive events to hit. A false positive rate of 0.001 generates a thousand false hits because there are nearly a million benign events to fire on. False positive rate gets multiplied by a much bigger number than detection rate does.
The true positive rate (TPR) is TPR = TP / actual intrusive events. Using our example, 20 / 20 = 1.0 (a perfect detector catches all 20 intrusive events). The false positive rate (FPR) is FPR = FP / actual benign events. Using our example with 1,000 false positives, the FPR is 1,000 / 999,980 ≈ 0.001.
With a detection rate of 1.0, a perfect detector, and a false positive rate of 0.00001, you catch all twenty intrusive events as true positives. You also throw roughly ten false positives on benign traffic, since 1,000,000 × 0.00001 = 10. Twenty real alerts out of thirty total. A Bayesian detection rate around 66%.
Raise the false positive rate to 0.001, which still sounds respectable on paper, and the alert queue explodes. The twenty true positives don’t move, but the false positive count jumps to 1,000,000 × 0.001 = 1,000. Twenty real alerts out of 1,020 total, or roughly 2%.
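The arithmetic is easy to check. Here’s a small sketch that reproduces both scenarios:

```python
def bayesian_detection_rate(total_events, intrusive_events, detection_rate, fpr):
    """P(real intrusion | alert): true positives over all alerts raised."""
    benign_events = total_events - intrusive_events
    true_positives = detection_rate * intrusive_events
    false_positives = fpr * benign_events
    return true_positives / (true_positives + false_positives)

TOTAL, INTRUSIVE = 1_000_000, 20
for fpr in (0.00001, 0.001):
    bdr = bayesian_detection_rate(TOTAL, INTRUSIVE, detection_rate=1.0, fpr=fpr)
    print(f"FPR={fpr}: P(intrusion | alert) ≈ {bdr:.1%}")
# FPR=1e-05: P(intrusion | alert) ≈ 66.7%
# FPR=0.001: P(intrusion | alert) ≈ 2.0%
```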
That 2% is more brutal than it looks. The 2% is the probability that any single alert sitting in the analyst’s queue is part of a real intrusion. Not “2% of intrusions get detected.” Both intrusions are technically in the alert queue, and at a perfect detection rate you’d fire on at least one event from each.
The problem is the analyst can’t tell which twenty alerts out of 1,020 are the real ones without working through every alert. They drown in a thousand false positives to find the twenty real ones. Per-alert trust is too low to act on, even though the intrusions are detected.
The analyst learns to ignore the system within a week. This is why detection rate isn’t what kills you. False positive rate is.
Behavioral Detections Have Less Drift
A well-scoped behavioral rule keys on something with no legitimate business purpose, and “no legitimate business purpose” is a property that rarely drifts. winword.exe spawning powershell.exe is the example I keep coming back to. There’s almost no workflow that needs Word to launch a scripting interpreter. That’s true on a hospital network in 2014 and a law firm in 2026. Traffic volume can double, the workforce can go remote, a new SaaS can roll out, and the rule’s false positive rate rarely moves. None of those shifts generate winword.exe → powershell.exe pairs.
This is how a rule lands near the FPR Axelsson’s math actually requires and stays there. The detector isn’t learning what’s normal from current traffic; it’s encoding a structural fact about the system. ML anomaly detection doesn’t have that property. Its “normal” is a snapshot of the traffic it was trained on, and when the environment shifts the false positive rate spikes, not because anything malicious is happening, but because the baseline moved. Every drift is another retrain, and every retrain is another chance to raise your FPR.
Defenders Also Have AI/LLMs…
Defenders have access to the same models. Just as exploit developers are using them to hunt for zero-days, blue teams are using them to identify new behaviors and work through their behavioral backlogs much faster. And as covered above, detection isn’t 1:1 with exploits, even zero-days.
While I’ve been critical of anomaly detection and ML for detection engineering, they do have their place. As Sommer and Paxson’s paper points out, they can work when targeted at specific, well-scoped use cases. It’s not a binary choice between AI/ML and behaviors; it’s both.
What Scares Me with LLMs Isn’t a Surge in Exploits
The biggest worry I have with LLMs isn’t the surge in new exploits, but the increase in attack surfaces that aren’t well understood, and the level of access these agents are being given. I also worry that AI agents will make anomaly detection systems even more prone to false positives, since they’re non-deterministic by nature.
For example, it’ll likely become the new normal for people in non-technical roles to use these agents. If a member of accounting’s agent gets prompt injected and instructed to initiate a wire transfer using the user’s legitimate browser cookies, that’s difficult to detect. And because the user isn’t the one carrying out the task, they may not even know it’s happening, or be able to tell the security team whether it was intentional.
Closing / TL;DR
In the short term, I think the increase in exploit availability will negatively impact defenders while the industry and defensive tooling catch up. Most organizations are still figuring out how to use LLMs for detection, and that’s typically much less straightforward than exploit development: environments are dynamic and varied, and high-quality training data is hard to come by.
Long term, I think the gap between new exploits and new detections will start to even out, even though the relationship was never 1:1 to begin with. The thing I actually worry about isn’t the exploit count, it’s the level of access these agents are being given, and the fact that the attack surface they introduce still isn’t well understood.