The Carlini Loop & Mythos: Inside the AI Pipelines Hunting for Decades-Old Bugs
For decades, finding zero-days required deep expertise in assembly, heap internals, and processor microarchitecture. This complexity created a natural barrier to entry.
You could not shortcut your way to a working exploit. The constraint did invisible work, holding the ecosystem together under the weight of its own bugs. That world ended in 2025. Quietly. Then suddenly. Today, AI is democratizing vulnerability discovery, but without the right guardrails, it’s also democratizing risk.
The signal wasn’t a dramatic announcement. It was a receipt: a $2,418 bug bounty paid to a researcher who spent about five dollars in API credits and typed, more or less, “please look for security issues in this code.” Three confirmed CVEs in two of the most widely audited Python frameworks alive. One prompt. One afternoon.
Here’s what changed, who changed it, and what it means for everyone who ships code.
The Carlini Loop: When a Bash Script Outperformed a Career
Nicholas Carlini is a research scientist at Anthropic and one of the most respected figures in adversarial machine learning. In mid-2025, working with Anthropic’s Frontier Red Team, he found something almost insulting in its simplicity.
The method is a bash for-loop. Iterate through every source file in a codebase.
For each file, send it to an LLM with a prompt framed not as “is this code safe?” (models tend to answer yes) but as a CTF challenge: “You are a world-class security researcher competing for a bug bounty. Find an exploitable vulnerability in this code.”
Then let it run. Thousands of files. Hundreds of concurrent API calls. Fresh context each time. No accumulated assumptions.
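The loop is simple enough to sketch. The version below is a Python stand-in for the bash original, not Carlini’s actual script. `ask_model` is a hypothetical placeholder for whatever LLM client you use, and the list of source-file extensions is an assumption. The prompt text is the CTF framing quoted above. Each file goes out in its own request, so no context carries over between files, and a thread pool supplies the concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterator

# Prompt text quoted from the article; the "File:" header is our own addition.
CTF_PROMPT = (
    "You are a world-class security researcher competing for a bug bounty. "
    "Find an exploitable vulnerability in this code.\n\n"
    "File: {path}\n\n{source}"
)

# Assumption: which extensions count as "source" depends on the codebase.
SOURCE_SUFFIXES = {".c", ".h", ".py", ".go", ".rs", ".js", ".ts"}


def iter_source_files(root: Path) -> Iterator[Path]:
    """Yield every source file under root, in a stable order."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            yield path


def scan(root: Path, ask_model: Callable[[str], str], workers: int = 32) -> dict[str, str]:
    """Send each file to the model in its own request and collect the replies.

    ask_model is a placeholder for an LLM client: it takes one prompt string
    and returns the model's text. Nothing is shared between calls, so every
    file is reviewed with a fresh context.
    """
    def review(path: Path) -> tuple[str, str]:
        rel = path.relative_to(root)
        source = path.read_text(encoding="utf-8", errors="replace")
        return str(rel), ask_model(CTF_PROMPT.format(path=rel, source=source))

    # Threads suit this workload: each call spends its time waiting on the network.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(review, iter_source_files(root)))
```

In a real run, `ask_model` would wrap an API client and add rate limiting and retries. This sketch leaves both out so the loop itself stays visible.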
“I’ve found more bugs in the last few weeks with Mythos than in the rest of my career.” — Nicholas Carlini, Anthropic
Three things make it work, and they’re worth separating.
Scale. A bash loop dispatching parallel calls processes thousands of files in hours. No human team comes close.
Fresh eyes. A reviewer who has read fifty files in a codebase has built a mental model of it, and that model creates blind spots — familiar patterns stop triggering alarm. An LLM analysing each file in isolation carries none of that bias.
Adversarial framing. CTF mode activates a different cognitive register than defensive review. The model hunts instead of audits.
Using this method, Anthropic’s team documented over 500 previously unknown, high-severity zero-days in production open-source codebases. Among them: a 23-year-old Linux kernel NFS heap buffer overflow that had been hiding in the codebase since 2003, and 22 Firefox CVEs in a single month — more than Mozilla’s entire external bug bounty programme had produced in any month of the prior year.
Mythos: The Model That Moved the Ceiling
On April 7, 2026, Anthropic announced Claude Mythos Preview alongside Project Glasswing, a controlled-access consortium of roughly fifty organisations including AWS, Apple, Google, Microsoft, the Linux Foundation, Cisco, CrowdStrike, NVIDIA, JPMorgan Chase, and Palo Alto Networks. The model was not publicly released. The reasoning was straightforward: the capability is too dangerous to hand to anyone who asks.
What Mythos found during internal testing reads like science fiction that turned out to be real.

The number worth remembering isn’t the bug count. It’s the cost. Scanning all of OpenBSD across a thousand parallel runs cost under $20,000. A working Linux kernel privilege escalation cost under $2,000. A single-file vulnerability survey runs under $50. At those prices, the bottleneck in offensive security has shifted permanently, from researcher expertise to model access.
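Spreading the OpenBSD total across its thousand runs shows how cheap each unit of work is. On average, each run cost less than twenty dollars:

```latex
\text{average cost per run} < \frac{\$20{,}000}{1{,}000\ \text{runs}} = \$20
```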
What’s Already Possible Without Mythos
Mythos is not public. It may never be. But the results being achieved with publicly available models — Claude Opus 4.6, GPT-4o, and their contemporaries — are already significant enough to change how security research works.
The $5 story
An independent researcher (not a professional pen-tester, not a corporate red teamer) used a standard LLM API to audit Django and FastAPI/Starlette. The method was file by file, with adversarial framing and simple follow-up prompts to validate each finding. Result: three confirmed CVEs (CVE-2025-64458, CVE-2025-64460, CVE-2025-62727).
Total API spend: roughly five dollars. Total bounty: $2,418. Django and FastAPI are not obscure. They have active security teams, disclosure programmes, and years of community scrutiny. The bugs were real, and they were new.
AISLE on OpenSSL
OpenSSL is the cryptographic backbone of a large fraction of internet traffic and one of the most heavily audited libraries in existence. The AISLE autonomous AI security system began analysing it in August 2025. By February 2026 it had been credited with 13 of the 14 CVEs OpenSSL assigned that year. A 93% hit rate on the hardest target available isn’t a benchmark. It’s a new baseline.
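Both percentages follow directly from the count of 13 CVEs out of 14. The share AISLE missed therefore amounts to a single CVE:

```latex
\frac{13}{14} \approx 0.929 \approx 93\%, \qquad \frac{1}{14} \approx 0.071 \approx 7\%
```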
ProjectDiscovery’s Neo
A commercial AI-assisted scanner. 22 confirmed CVEs across 13 open-source projects, with 24 verified findings that no other approach caught, whether SAST, fuzzing, or manual review. Its false positive rate was low enough that findings went straight into disclosure.
The Tools You Can Actually Run Today
The gap between what a research lab can do and what a developer with an API key can do has collapsed. Two open-source tools represent the current state of the art.
1. RAPTOR
RAPTOR (MIT licence, github.com/gadievron/raptor) chains four technologies, each doing what it does best. Semgrep handles pattern matching with deterministic precision. CodeQL does deep dataflow. AFL++ discovers crashes through binary fuzzing. The LLM provides semantic reasoning, exploit chain construction, and patch generation. The orchestration sits inside Claude Code’s agent harness.
Among its authors: Halvar Flake (Thomas Dullien), formerly of Google’s Project Zero. When someone of that calibre ships a tool he describes as “held together with enthusiasm and duct tape” but says he “can’t stop using,” the industry pays attention.
RAPTOR’s distinguishing feature is a nine-stage exploitability validation pipeline — alternating LLM analysis with mechanical binary checks (ROP gadget quality, RELRO coverage, input handler constraints, one-gadget satisfiability via Z3 SMT solving). Each finding is classified Unlikely, Difficult, or Likely exploitable, with explicit notes on what would bridge the gaps.
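The idea of turning mechanical checks into a three-level verdict with notes on the gaps can be illustrated with a toy classifier. This is not RAPTOR’s code or logic. The field names, the gadget threshold, and the decision rules are invented for illustration. Each field stands in for one of the kinds of check the paragraph above lists.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    UNLIKELY = "Unlikely"
    DIFFICULT = "Difficult"
    LIKELY = "Likely"


@dataclass
class MechanicalChecks:
    # Illustrative stand-ins for the checks named in the article.
    input_reaches_bug: bool    # attacker input satisfies the input handler's constraints
    full_relro: bool           # GOT is read-only, so GOT overwrites are off the table
    useful_rop_gadgets: int    # gadgets judged usable for a ROP chain
    one_gadget_sat: bool       # a solver found the one-gadget constraints satisfiable


MIN_GADGETS = 10  # arbitrary threshold, chosen only for this example


def classify(c: MechanicalChecks) -> tuple[Verdict, list[str]]:
    """Combine mechanical checks into a verdict plus a list of open gaps."""
    if not c.input_reaches_bug:
        return Verdict.UNLIKELY, ["attacker input cannot reach the bug"]
    gaps = []
    if c.full_relro:
        gaps.append("full RELRO blocks GOT overwrite")
    if c.useful_rop_gadgets < MIN_GADGETS:
        gaps.append(f"only {c.useful_rop_gadgets} usable ROP gadgets")
    if not c.one_gadget_sat:
        gaps.append("no satisfiable one-gadget")
    if not gaps:
        return Verdict.LIKELY, gaps
    if len(gaps) == 3:
        return Verdict.UNLIKELY, gaps
    return Verdict.DIFFICULT, gaps
```

The returned list of gaps plays the role of the “explicit notes on what would bridge the gaps”. A Difficult verdict tells the analyst which obstacle to work on next.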
2. OpenAnt
OpenAnt (Apache 2.0, github.com/knostic/OpenAnt) was built around an uncomfortable insight: LLMs are agreeable by default. Ask one “is this code vulnerable?” and it will say yes whether the code is actually vulnerable or not. Naive prompting produces a 60–90% false positive rate, no better than the worst SAST tools.
OpenAnt’s answer is a two-stage architecture. Stage 1 is deliberately aggressive: it over-reports on purpose, because missing a real bug costs more than passing a fake one along. Stage 2 then attacks each finding: the LLM generates a targeted test case and tries to actually trigger the vulnerability. What survives both stages is real. Knostic reports up to 99.98% false positive elimination across a five-stage progressive funnel. The whole thing runs as a single command: openant scan > verify.
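The two-stage shape can be sketched in a few lines. This is not OpenAnt’s code or API. `Finding`, `stage1_overreport`, `stage2_verify`, and the toy `resolve` target are hypothetical names. Here the trigger tests are written by hand, whereas in OpenAnt the LLM writes them. Stage 1 keeps every candidate, and stage 2 keeps only the ones whose test actually fires.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Finding:
    location: str                # where the model thinks the bug is
    claim: str                   # the model's hypothesis, in prose
    trigger: Callable[[], bool]  # stage-2 test: returns True only if the bug fires


def stage1_overreport(raw: Iterable[Finding]) -> list[Finding]:
    # Deliberately keep everything: a missed real bug costs more than a false lead.
    return list(raw)


def stage2_verify(candidates: list[Finding]) -> list[Finding]:
    confirmed = []
    for f in candidates:
        try:
            fired = f.trigger()
        except Exception:
            # A broken test proves nothing, so it does not confirm the finding.
            fired = False
        if fired:
            confirmed.append(f)
    return confirmed


# Toy target with one real flaw: it never rejects ".." path components.
def resolve(user_path: str) -> str:
    return "/srv/files/" + user_path


candidates = stage1_overreport([
    Finding("resolve", "path traversal via '..'", lambda: ".." in resolve("../etc/passwd")),
    Finding("resolve", "SQL injection", lambda: "DROP" in resolve("report.txt")),
])
print([f.claim for f in stage2_verify(candidates)])  # ["path traversal via '..'"]
```

The invented “SQL injection” claim survives stage 1, because stage 1 keeps everything, but stage 2 drops it: its test never fires.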
The Problem Nobody Wants to Talk About
There’s a failure mode that almost never makes it into the enthusiasm around these tools, and it deserves to be said plainly: language models hallucinate vulnerabilities with the same confidence they report real ones.
Ask a model whether code is vulnerable. It will find a way to say yes. It will construct a plausible attack scenario, sketch an exploit path, suggest a CVE category, and hand you a report that looks authoritative and is completely fabricated. Traditional SAST has a 30–60% false positive rate, which is already bad. Naive LLM prompting can be worse because the false positives come with expert-sounding explanations instead of just flagged lines.
The same capability that makes LLMs useful at finding bugs (generating plausible hypotheses about how code might fail) makes them dangerous at inventing bugs that don’t exist. Without a verification step that attempts to actually trigger the bug, you cannot tell the difference.
This is why every serious system in this space — OpenAnt’s two-stage architecture, RAPTOR’s nine-stage pipeline, the Carlini Loop’s validator pass — treats verification as non-optional. The methodology that produces real CVEs is never “prompt and read.” It’s “prompt, treat the output as a hypothesis, then run a second pass that confirms or refutes it.” That second pass is what separates findings from noise.
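The “treat the output as a hypothesis” step ultimately means running something. Below is a minimal harness sketch under stated assumptions. The PoC is Python source written by the model. The `BUG-REPRODUCED` marker is a hypothetical convention: the PoC prints it only when it observes the bug’s effect. A PoC that errors out or hangs is reported as inconclusive, not as confirmed.

```python
import subprocess
import sys

# Hypothetical convention: a PoC prints this marker only if it observed the bug.
MARKER = "BUG-REPRODUCED"


def verify_hypothesis(poc_source: str, timeout_s: float = 10.0) -> str:
    """Run a model-written Python PoC in a child process and judge the result.

    Returns "confirmed", "refuted", or "inconclusive". A separate process is
    not a sandbox: run untrusted PoCs inside a disposable VM or container.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", poc_source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "inconclusive"
    if proc.returncode != 0:
        # The PoC itself crashed or errored; that says nothing about the target.
        return "inconclusive"
    return "confirmed" if MARKER in proc.stdout else "refuted"


poc = (
    "import posixpath\n"
    "p = posixpath.normpath('/srv/files/' + '../etc/passwd')\n"
    "if not p.startswith('/srv/files/'):\n"
    "    print('BUG-REPRODUCED')\n"
)
print(verify_hypothesis(poc))                  # confirmed
print(verify_hypothesis("print('all good')"))  # refuted
```

For native targets, the PoC would launch the target itself and print the marker when it sees the crash or corruption it predicted. The harness logic stays the same.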
What This Means and What It Doesn’t
The capability ceiling is real and lower than the marketing suggests. The $5 Django CVEs were injection-class bugs — subtle enough to escape prior scrutiny, but within the family of taint-flow vulnerabilities LLMs reason about well because that pattern appears constantly in training data.
The 27-year-old OpenBSD TCP SACK bug that Mythos found required seeing that two seemingly contradictory arithmetic conditions could hold at once, because of signed integer overflow at the 32-bit boundary. That class of reasoning is qualitatively more demanding, and even Opus 4.6 largely fails on it.
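A toy example shows how two “contradictory” conditions can both be true. This is not the OpenBSD bug, whose details aren’t given here. BSD-derived TCP stacks compare 32-bit sequence numbers with a macro of the form `SEQ_LT(a, b)`, defined as `((int)((a)-(b)) < 0)`. The Python below emulates that comparison, including the 32-bit signed wraparound.

```python
MASK32 = 0xFFFFFFFF


def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def seq_lt(a: int, b: int) -> bool:
    """Wraparound-aware 'a comes before b', in the style of BSD's SEQ_LT."""
    return to_int32(a - b) < 0


print(seq_lt(1, 2), seq_lt(2, 1))    # True False: ordinary ordering
print(seq_lt(0xFFFFFFF0, 0x10))      # True: still "earlier" across the wrap
a, b = 0, 1 << 31                    # exactly 2**31 apart
print(seq_lt(a, b), seq_lt(b, a))    # True True: "a < b" and "b < a" both hold
```

At a distance of exactly 2³¹, the subtraction lands on the most negative 32-bit value in both directions. Any code that assumes `seq_lt(a, b)` rules out `seq_lt(b, a)` is then reasoning about an impossible state that is, in fact, reachable.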
AISLE found 93% of OpenSSL’s CVEs. The 7% it missed (one CVE out of 14) is not random noise: the bugs that slip past these systems are systematically the hardest ones, requiring deep compositional reasoning about memory, concurrency, and hardware. The ceiling exists. It just isn’t where the slide decks place it.
What has shifted is the floor, not the ceiling. The minimum level of vulnerability discovery available to someone with a subscription and a well-crafted prompt is now far higher than it was two years ago. Bugs that previously needed top-tier researchers to find are now within reach of anyone who understands how to set up the verification step.
For researchers who want in without Mythos-class access, the path is clear. Pick a moderately audited target — a library with 1,000 to 10,000 GitHub stars, active development, no dedicated security team, a disclosure programme. Clone it. Run OpenAnt with an API key. Follow up with RAPTOR’s agentic scan on anything that survives. Read the survivors manually. Trigger them in a test environment. If they’re real, disclose responsibly.
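The target-selection criteria can be written down as a filter. This is a sketch under assumptions. The `Repo` fields would have to be filled in by hand or from repository metadata; no hosting API is called here. “Active development” is interpreted as a commit within the last 90 days, which is our assumption. The repository names and dates in the usage example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Repo:
    name: str
    stars: int
    last_commit: date
    has_security_team: bool
    has_disclosure_policy: bool  # e.g. a SECURITY.md or a published security contact


def is_good_target(repo: Repo, today: date, active_days: int = 90) -> bool:
    """Apply the criteria above; 'active' = a commit within active_days (our assumption)."""
    return (
        1_000 <= repo.stars <= 10_000
        and today - repo.last_commit <= timedelta(days=active_days)
        and not repo.has_security_team
        and repo.has_disclosure_policy
    )


today = date(2026, 5, 1)
repos = [
    Repo("example/parser", 4_200, date(2026, 4, 20), False, True),
    Repo("example/giant-framework", 80_000, date(2026, 4, 30), True, True),
    Repo("example/abandoned", 2_500, date(2023, 1, 1), False, True),
]
print([r.name for r in repos if is_good_target(r, today)])  # ['example/parser']
```

Requiring a disclosure policy keeps the last step of the workflow (responsible disclosure) possible before any scanning starts.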
The tools are open source. The methodology is documented. The results are proven.
