Key Takeaways
- Attackers now weaponize new vulnerabilities in days, sometimes before the fix is even public. A test from January tells you nothing about a flaw disclosed in March.
- IBM’s 2025 breach research found it still takes organizations 241 days on average to spot and shut down a breach. A single test twelve months ago can’t cover that gap.
- Annual red teaming gives you a snapshot. Continuous, autonomous red teaming and adversary emulation give you a running video of your real exposure.
- MITRE ATT&CK is the shared language for mapping real adversary behavior to what your team actually tests for, on a schedule that matches how fast your environment changes.
- Offensive security is shifting from a once-a-year event to an ongoing practice, and the tools built for that shift are what make it affordable for lean teams.
In July 2025, attackers slipped into Allianz Life’s third-party CRM platform through a social engineering trick and walked out with data on most of its 1.4 million US customers. A few weeks later, a similar story played out at TransUnion, where a flaw in a connected CRM system exposed records for 4.4 million people. Neither breach happened because nobody had tested the company’s defenses. They happened because the test that mattered was run months before the attack.
That’s the problem with red teaming on a calendar. It answers “were we safe on that one day,” not “are we safe right now.” And right now is the only day that counts.
Why Annual Red Teaming Is No Longer Enough
A red team test is supposed to answer one question: could a real attacker get in and do damage? For a long time, doing that once a year, or maybe twice, felt like enough. Book the engagement, get the report, patch the findings, move on with confidence.
That confidence is the problem. Environments don’t sit still for twelve months. New code ships. Vendors get onboarded. Permissions drift. Cloud configurations change every week, sometimes every day. A red team report from January describes a system that, by June, barely exists anymore.
Attackers know this better than anyone. Recent research from Mandiant put the average time between a vulnerability’s disclosure and its first real-world exploitation in 2025 at negative seven days, meaning attackers were often exploiting flaws as zero-days before vendors had disclosed them. This is why CISA’s Known Exploited Vulnerabilities (KEV) catalog has become critical for prioritization: it tracks what is being actively weaponized, not just what is theoretically exploitable. Rapid7’s 2026 threat report found that the median time for a new vulnerability to land in the KEV catalog dropped from 8.5 days to 5. Compare that to a testing cycle measured in months, and the math stops working.
Here’s how fast that window has closed:
[Figure: Time to exploit. The window attackers need keeps shrinking. Median days between a vulnerability’s disclosure and its first real-world exploitation. Source: Mandiant research, which put 2025’s median disclosure-to-exploit gap at roughly negative seven days.]
An annual test simply can’t move at that speed. It was built for a world where attackers took months to act. That world is gone.
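The KEV-based prioritization mentioned above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Finding` type, the sample assets, and the placeholder CVE IDs are all hypothetical, and the `cveID` and `dateAdded` field names are assumed to mirror the public KEV JSON feed, so check them against the current schema before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str
    asset: str
    cvss: float


def prioritize(findings, kev_entries):
    """Sort findings so CVEs listed in the KEV catalog come first.

    Within each group (in KEV / not in KEV), higher CVSS sorts earlier.
    """
    kev_ids = {entry["cveID"] for entry in kev_entries}
    return sorted(findings, key=lambda f: (f.cve_id not in kev_ids, -f.cvss))


# Hypothetical sample data: these CVE IDs are placeholders, not real entries.
SAMPLE_KEV = [{"cveID": "CVE-0000-0002", "dateAdded": "2025-03-10"}]
SAMPLE_FINDINGS = [
    Finding("CVE-0000-0001", "billing-api", 9.8),
    Finding("CVE-0000-0002", "crm-gateway", 7.5),
    Finding("CVE-0000-0003", "intranet-wiki", 5.3),
]

ranked = prioritize(SAMPLE_FINDINGS, SAMPLE_KEV)
```

The design choice worth noticing is that the actively exploited 7.5 outranks the theoretical 9.8. Whether that ordering suits your environment is a judgment call, not a rule.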
The real cost of testing once a year
The gap between “we tested in January” and “we got breached in September” isn’t theoretical. According to IBM’s 2025 Cost of a Data Breach Report, organizations took an average of 241 days to identify and contain a breach, made up of roughly 158 days to even notice something was wrong and another 83 days to shut it down. That’s most of a year spent exposed, with a red team report sitting on a shelf the whole time saying “you were fine six months ago.”
[Figure: Coverage, visualized. A snapshot vs. a running video of your exposure. Annual testing: one test, one day a year, with everything after it untested until the next engagement, set against an average of 241 days to identify and contain a breach. Continuous testing: ATT&CK-mapped attack chains run 24/7, so every change, new asset, and disclosed technique gets tested as it happens. Source: IBM Cost of a Data Breach Report, 2025 (158 days to notice, 83 more to contain).]
Security firm Bishop Fox put it plainly in its own research: for any company with several business units, a single annual red team exercise just isn’t enough anymore. Large, complex organizations are moving toward continuous offensive testing instead, running multiple focused engagements across the year rather than one big one.
The Gap Between Point-in-Time Testing and Real Attacks
Annual testing has another blind spot beyond speed. It tests a fixed scope, agreed on weeks in advance, against a system that will have changed by the time testers show up. Real attackers don’t work from a scope document. They probe whatever is exposed on the day they show up, whether that’s a new cloud bucket, a forgotten API, or a vendor integration nobody remembered to review.
A few gaps show up again and again in organizations that still rely on a once-a-year test:
- New assets go untested for months. A cloud instance spun up in February won’t see a red teamer until next year’s engagement, if it survives that long.
- Detection drift goes unnoticed. Security tools get reconfigured, alerts get tuned down to cut noise, and nobody checks whether the detections that passed last year’s test still fire today.
- Findings age out. A vulnerability flagged and “fixed” in the spring can reappear after a rollback, a misconfigured deploy, or a forgotten dependency.
- New attacker techniques don’t get tested. Adversary groups adjust their playbooks constantly. A scope built a year ago tests last year’s threat model.
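Detection drift, the second gap in the list above, is the easiest one to check mechanically. The sketch below assumes each test run is recorded as a mapping from ATT&CK technique ID to whether the detection fired. That record format and the function name `detection_drift` are assumptions made for illustration, not the output of any particular tool.

```python
def detection_drift(baseline, current):
    """Compare two test runs keyed by ATT&CK technique ID.

    Each argument maps a technique ID to True (detected) or False (missed).
    Returns three sorted lists:
      regressed    - detected in the baseline, missed now
      not_retested - present in the baseline, absent from the current run
      new          - tested now, never tested in the baseline
    """
    regressed = sorted(
        tid for tid, detected in baseline.items()
        if detected and current.get(tid) is False
    )
    not_retested = sorted(tid for tid in baseline if tid not in current)
    new = sorted(tid for tid in current if tid not in baseline)
    return {"regressed": regressed, "not_retested": not_retested, "new": new}


# Illustrative runs: last year's engagement vs. this week's automated pass.
last_year = {"T1566": True, "T1078": True, "T1021": False, "T1059": True}
this_week = {"T1566": True, "T1078": False, "T1021": False, "T1190": True}

drift = detection_drift(last_year, this_week)
```

Here `T1078` (Valid Accounts) passed last year and fails now, which is exactly the kind of silent regression an annual cycle would carry for months.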
Where MITRE ATT&CK and adversary emulation come in
This is why more security teams anchor their testing to the MITRE ATT&CK framework instead of a fixed annual checklist. ATT&CK is a living, publicly maintained catalog of the tactics and techniques real attacker groups actually use, from initial access to lateral movement to data theft. Instead of testing an arbitrary list of controls, adversary emulation uses ATT&CK to recreate the specific behaviors of the threat groups most likely to target your industry.
Open source tools like MITRE’s own Caldera project were built for exactly this: running adversary emulation on a schedule that matches how threats evolve, not how often a vendor can staff a team. That’s a meaningful shift. Testing stops being “did we pass an audit” and starts being “can we detect and stop the exact playbook a real group would run against us this month.”
The market is already voting with its budget here. The global red teaming services market was valued at roughly $3.2 billion in 2025 and is projected to nearly triple to $9.7 billion by 2034, largely driven by the move from one-time exercises to ongoing validation. Gartner’s Continuous Threat Exposure Management framework, now widely adopted, treats red team findings as a constant feed into a live risk picture rather than a report that gets filed away.
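To make the ATT&CK-anchored approach in this section concrete, here is a rough sketch of how a team might compute which techniques from its threat model have never been exercised. The group names and their technique sets are invented for illustration. Only the technique IDs themselves (for example T1566 Phishing or T1190 Exploit Public-Facing Application) come from ATT&CK, and a real profile would be built from current threat intelligence.

```python
# Hypothetical threat profiles: group names are made up, technique IDs are ATT&CK IDs.
THREAT_PROFILES = {
    "group-a": {"T1566", "T1078", "T1021"},  # phishing, valid accounts, remote services
    "group-b": {"T1190", "T1059", "T1041"},  # public-facing exploit, scripting, exfiltration
}


def coverage_gaps(relevant_groups, tested_techniques, profiles=THREAT_PROFILES):
    """Return techniques used by the relevant groups that no test has covered yet."""
    wanted = set()
    for group in relevant_groups:
        wanted |= profiles[group]
    return sorted(wanted - set(tested_techniques))


# Techniques the current playbooks already exercise (illustrative).
already_tested = {"T1566", "T1190"}

gaps = coverage_gaps(["group-a", "group-b"], already_tested)
```

The result is a list of technique IDs to turn into emulation playbooks next, which is a more defensible backlog than an arbitrary control checklist.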
What Continuous, Autonomous Red Teaming Looks Like
Continuous red teaming doesn’t mean hiring five times the staff to run tests every week. It means using automation to handle the repeatable, high-volume parts of adversary emulation, so human testers can spend their time on the creative, judgment-heavy attacks that automation still can’t replicate.
In practice, autonomous red teaming platforms:
- Launch focused tests automatically when something changes, like a new segmentation rule, a new cloud region, or a newly disclosed exploit relevant to your stack.
- Run mapped-to-ATT&CK attack chains around the clock instead of waiting for a scheduled engagement window.
- Score whether your detection stack actually caught the simulated attack, not just whether the exploit technically worked.
- Feed every miss straight into a detection engineering backlog, so gaps get closed instead of just documented.
This doesn’t replace human red teamers. It changes what they’re for. Automation handles the volume: the daily and weekly checks against known techniques. People handle the parts that require creativity, like chaining an odd business logic flaw with a social engineering angle nobody scripted for. Most mature programs run both side by side, using periodic human-led engagements to validate what the automated layer is finding and to catch what it can’t.
One caution worth naming here: automated testing is only as good as the visibility behind it. A red team program layered on top of a security stack that only monitors part of the environment will report clean results that don’t mean much. Before scaling up testing frequency, it’s worth confirming which log sources and assets actually feed your detection tools in the first place.
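A quick way to run that visibility check is to compare the asset inventory against the assets that actually send logs to the detection stack. The sketch below assumes both lists can be exported. The asset names and the shape of the log source records are hypothetical.

```python
def visibility_gaps(asset_inventory, log_sources):
    """Return inventory assets that send no logs to the detection stack."""
    monitored = {source["asset"] for source in log_sources}
    return sorted(set(asset_inventory) - monitored)


def visibility_ratio(asset_inventory, log_sources):
    """Fraction of inventory assets with at least one log source (1.0 if empty)."""
    assets = set(asset_inventory)
    if not assets:
        return 1.0
    return 1 - len(visibility_gaps(assets, log_sources)) / len(assets)


# Illustrative exports from an asset inventory and a log pipeline.
inventory = ["crm-gateway", "billing-api", "intranet-wiki", "vendor-sftp"]
sources = [
    {"asset": "crm-gateway", "type": "auth"},
    {"asset": "billing-api", "type": "application"},
    {"asset": "billing-api", "type": "network"},
]

blind_spots = visibility_gaps(inventory, sources)
```

A clean automated test result for anything in `blind_spots` says nothing about whether an attack there would be seen.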
Building a Red Team Program That Keeps Pace
Moving from an annual test to a continuous one doesn’t happen overnight, and it doesn’t need to. A workable path usually looks like this:
- Start with your riskiest assets. Identify the systems where a breach would hurt the most, and put those under continuous or monthly testing first.
- Anchor every test to ATT&CK. Map your playbooks to the tactics your actual threat model considers realistic, not a generic kill chain template.
- Automate the repeatable layer. Use autonomous tooling for known techniques and change-triggered tests, freeing your team for deeper manual engagements.
- Close the loop on every finding. A missed detection should turn into a written rule and a retest, not a line item in a report nobody rereads.
- Keep a human check on the automation. Quarterly manual red team exercises validate that the automated layer is finding what matters and not just generating noise.
None of this requires a massive security budget. It requires treating red teaming as an ongoing practice instead of a once-a-year box to check.
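The risk-first ordering in the steps above amounts to giving each asset a testing cadence and flagging whatever has fallen behind. The tier names and day counts below are placeholders rather than recommendations, and `next_test_due` and `overdue` are hypothetical helper names.

```python
from datetime import date, timedelta

# Hypothetical cadence per risk tier, in days between tests.
CADENCE_DAYS = {"critical": 1, "high": 7, "medium": 30, "low": 90}


def next_test_due(last_tested, risk_tier):
    """Date by which an asset in the given tier should be tested again."""
    return last_tested + timedelta(days=CADENCE_DAYS[risk_tier])


def overdue(assets, today):
    """Names of assets whose next test date has already passed, sorted.

    `assets` maps an asset name to a (risk_tier, last_tested) pair.
    """
    return sorted(
        name
        for name, (tier, last_tested) in assets.items()
        if next_test_due(last_tested, tier) < today
    )


# Illustrative portfolio.
portfolio = {
    "payments-db": ("critical", date(2025, 6, 1)),
    "partner-api": ("high", date(2025, 6, 5)),
    "intranet-wiki": ("low", date(2025, 5, 1)),
}

late = overdue(portfolio, today=date(2025, 6, 10))
```

Starting with a short list of critical assets on a tight cadence and widening from there keeps the first iteration small.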
Secure.com
Close the gap that opens up after the test ends
Red teaming proves a misconfiguration is exploitable. The Infrastructure Teammate makes sure it doesn’t quietly drift back open.
- Discovers assets continuously across AWS, Azure, GCP, and SaaS as they appear.
- Benchmarks against CIS & DISA STIGs and flags insecure defaults on appearance, not at the next review.
- Routes and remediates drift — auto-fixing low-risk changes, routing the rest with SLAs and audit trails.
- Maps blast radius and attack paths so a proven weakness comes with a known owner and fix.
Conclusion
The math is simple. Attackers move in days. A once-a-year test moves in months. That gap is where breaches happen. Annual red teaming isn’t useless; it’s just no longer the whole answer. Pairing it with continuous, ATT&CK-mapped adversary emulation, and a team or tool ready to act on what it finds, is what closes the distance between “we tested this” and “we’re actually protected.”