DORA 2025: AI Lifts Epics-per-Developer 66% but Bugs Climb 54%

The DORA 2025 data finally shows AI moving roadmaps, not just task counts - epics per developer up 66.2%. But bugs per developer are up 54%, incidents per PR up 242%, and PR review time up 441%. My honest read on what that means for engineering leaders, and why governance has just become the load-bearing problem.

Date

May 12, 2026

The DORA 2025 numbers finally settle one debate and open a much sharper one. AI-assisted development is moving the needle at portfolio level, not just on isolated tasks: epics completed per developer are up 66.2%. At the same time, bugs per developer are up 54%, incidents per PR are up 242.7%, and median time in PR review is up 441%. I want to walk through what those four numbers mean together, because read in isolation each one supports a different fairy tale.

The "is AI actually helping?" question is closed

For two years, engineering leaders kept circling the same question. We see the GitHub Copilot dashboards, we see the Claude invoices, we see engineers in standup say they're shipping faster - but where is it in the roadmap? Where is it in throughput? In the 2024 data, the honest answer was: not really visible yet. Productivity studies kept landing at single-digit gains, and the cynical reading was that AI was making individual keystrokes faster without changing what got delivered.

The 2026 Faros AI telemetry analysis, which sits alongside the DORA 2025 survey of around 5'000 developers, changes that picture. Faros measured the actual behaviour of more than 10'000 developers across 1'255 teams - so the picture is no longer self-reported sentiment; it is repository data. Epics, not commits. Epics, not lines of code. 66.2% more completed epics per developer is a portfolio-level signal, and it survives the obvious objections: epic size hasn't been redefined, the comparison is year-over-year on the same organisations, and the gain is consistent across the sample.

So that question is closed. AI-assisted development, deployed seriously, shifts roadmap output. The DORA team puts a rough financial frame around it: a modelled first-year ROI of 39% for a 500-person engineering organisation, with a payback period around eight months. The three-year average ROI in their dataset is 727%. Those are the kind of numbers a CFO can finally hold onto.
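DORA's underlying cost model isn't spelled out here, but the arithmetic behind "39% ROI, roughly eight-month payback" is easy to sanity-check. A back-of-envelope sketch - the $3.6M cost and $5.0M benefit below are my illustrative placeholders, not DORA's actual inputs:

```python
# Back-of-envelope ROI arithmetic. The benefit and cost figures are
# illustrative placeholders, not DORA's actual model inputs.

def first_year_roi(annual_benefit: float, annual_cost: float) -> float:
    """Simple ROI = (benefit - cost) / cost, as a fraction."""
    return (annual_benefit - annual_cost) / annual_cost

def payback_months(annual_benefit: float, annual_cost: float) -> float:
    """Months until cumulative benefit covers the annual cost,
    assuming the benefit accrues evenly through the year."""
    return annual_cost / (annual_benefit / 12)

# Hypothetical 500-developer org: $3.6M/yr spend, $5.0M/yr recovered capacity.
cost, benefit = 3_600_000, 5_000_000
print(f"first-year ROI: {first_year_roi(benefit, cost):.0%}")   # ~39%
print(f"payback: {payback_months(benefit, cost):.1f} months")   # ~8.6
```

The point of the sketch is only that figures in this range are internally consistent: a modest net margin on a large tooling spend is enough to produce the reported ROI and payback window.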

The bill arrives in a different envelope

Now the harder reading. The same dataset that gives you 66% more epics gives you:

  • Bugs per developer up 54% (versus +9% in the prior year)
  • Incidents per PR up 242.7%
  • Median time in PR review up 441% (versus +91% the year before)
  • 31% more pull requests merging with no review at all
  • Pull request size up 51.3%

These are not contradictory. They are the same phenomenon. When you accelerate code production by an order of magnitude and leave the rest of the system unchanged, the bottleneck just relocates. It moves from "writing the code" to "reviewing the code", and then from "reviewing the code" to "operating the code in production". The 242% incident rise is the bottleneck arriving in the on-call rotation.

Two findings deserve to be highlighted because they break a comfortable assumption.

First: organisations with mature pre-AI engineering performance experience the same quality degradation as everyone else. Strong foundations are necessary but not sufficient. That punctures the easy narrative that "good teams will be fine, weak teams will burn".

Second: the negative relationship between AI adoption and delivery stability holds even as adoption saturates. The DORA team's phrase is that AI is an amplifier - it magnifies what's already in the organisation, both the strengths and the weaknesses. DORA team lead Nathen Harvey put it more pragmatically: "The greatest returns on AI investment come not from the tools themselves but from a strategic focus on the underlying organisational system: the quality of the internal platform, the clarity of workflows, and the alignment of teams."

Why the bugs are actually showing up

I want to be specific about the mechanism, because "AI writes buggy code" is the lazy reading and it isn't quite right.

A few things are happening at once. PRs are getting bigger because an agent can confidently produce 600 lines of plausible diff in the time a human used to produce 80. Reviewers face larger surface area per PR and the same calendar time per review, so review either degrades (the 31% no-review merges) or stalls (the 441% review time). The result is that the bugs that ship are not "AI bugs" in some special category. They are ordinary integration bugs, missed edge cases, and silent contract changes - the bugs that a slower human-paced workflow used to catch through friction. AI didn't introduce them. It removed the friction that used to surface them.

The same dynamic explains the incident-per-PR jump. Each PR now changes more, touches more, and rides on less review. The blast radius per change has grown faster than the safety net around it.
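One way to see the mechanism is a toy model of reviewer attention - my own illustration, not a DORA construct. If a reviewer can genuinely scrutinise a roughly fixed number of lines per session, coverage falls off as PR size grows:

```python
# Toy model of review degradation (illustrative; the 200-line attention
# budget is an assumption, not an empirical constant).

def review_coverage(pr_lines: int, attention_budget: int = 200) -> float:
    """Fraction of a PR one reviewer can genuinely scrutinise in a session."""
    return min(1.0, attention_budget / pr_lines)

for size in (80, 300, 600):
    print(f"{size:>4}-line PR -> {review_coverage(size):.0%} scrutinised")
```

Under these assumptions, the 51% growth in PR size translates directly into unreviewed surface area - consistent with both the no-review merges and the review-time stall showing up in the same dataset.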

Gergely Orosz framed this years ago with a line that has aged extremely well: "Speed of typing out code has never been the bottleneck for software development." The DORA data is now the empirical receipt for that statement. Typing speed went up 10x. Throughput went up 66%. Quality went down. The delta is the bottleneck that was never about typing in the first place: design, review, integration, operation.

What this means if you run engineering at a Swiss or European organisation

If you're a CIO, a head of engineering or an architect reading the AI productivity claims with appropriate scepticism, here is my honest read.

The 66% epic throughput is real and you should plan for it. If you have not modelled what doubled feature output means for your QA capacity, your security review pipeline, your compliance evidence collection, and your incident response - you are about to find out the hard way. Most organisations have provisioned downstream functions assuming roughly linear growth in upstream output. That assumption just broke.

The 54% bug rise and 242% incident rise are also real, but they are policy choices, not laws of physics. They are what happens when you put a turbocharger on the engine and leave the brake lines stock. The DORA team's recommendation reads almost defensively: invest in automated testing, in continuous integration, in small batches, in disciplined version control, in observability. None of that is new. What's new is that the cost of not doing it has gone up by roughly the same multiplier as your throughput.

For regulated organisations - banks, insurers, healthcare, public sector - there's a sharper edge. A 242% rise in incidents per PR is not just an operational problem. It's an audit problem, a SOC 2 problem, a regulatory problem. "We shipped twice as much with the same review process" is not a defensible story when the regulator asks for the audit trail.

Where I land, and a note on what I'm building

I read the DORA 2025 data as the moment AI in software development stops being a productivity story and starts being a governance story. The productivity question - "does it actually help?" - is answered yes. The governance question - "what happens to quality, accountability and audit trail when the engine runs ten times faster?" - is wide open, and the early answer is "nothing good unless you intervene".

This is exactly the problem I'm building Shipwright to address. The premise is that the lifecycle gates which used to be optional politeness - structured requirements, planned implementation, generated tests, code review, audit-grade event logs, traceability matrices - become load-bearing when AI agents are doing most of the typing. Not because the AI is bad, but because the friction it removed was doing useful safety work, and that work has to be reconstituted somewhere. Shipwright reconstitutes it in the development workflow itself: every change carries its own evidence trail, every phase has a quality gate, and the speed gain shows up in features delivered rather than in incidents per PR.
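To make "every change carries its own evidence trail" concrete, here is a generic sketch of what an audit-grade change event might record. This is my illustration of the idea, not Shipwright's actual schema; every field name is an assumption:

```python
# Generic sketch of an audit-grade change event (illustrative field set;
# not Shipwright's actual schema).
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class ChangeEvent:
    change_id: str           # e.g. PR or commit identifier
    requirement_ids: tuple   # structured requirements this change satisfies
    tests_generated: int     # tests added alongside the change
    reviewed_by: str         # human (or agent) accountable for the review
    gates_passed: tuple      # quality gates cleared before merge
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

event = ChangeEvent(
    change_id="PR-1042",
    requirement_ids=("REQ-17", "REQ-23"),
    tests_generated=6,
    reviewed_by="j.muster",
    gates_passed=("lint", "tests", "security-scan"),
)
print(asdict(event)["change_id"])  # PR-1042
```

The value of a record like this is that the traceability matrix falls out of the event stream for free: every merged change links back to requirements, tests, and an accountable reviewer.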

→ Explore Shipwright

There's a second-order implication in the DORA data I haven't seen discussed enough. The negative stability impact persists across all organisational maturity levels. That means the answer is not "buy better tools and hire better engineers". It's structural. The DevOps practices that took a decade to embed - trunk-based development, small batches, CI/CD, observability, blameless postmortems - need to be reinforced and extended, not replaced, when AI enters the loop. Anyone selling you a story where AI replaces the discipline is selling you the 242% incident curve.

The boring answer is the right answer. AI accelerates everything, including failure. The organisations that turn the 66% throughput gain into shipped product, rather than into a longer incident queue, are going to be the ones that treated their engineering foundations as infrastructure to invest in - not overhead to cut. The 2025 DORA report is clear evidence that the gap between those two camps is about to widen sharply.

 


Sources:

  • New DORA Report Claims Strong Engineering Foundations Drive AI Return on Investment - InfoQ - 08.05.2026 - https://www.infoq.com/news/2026/05/dora-roi-ai-assisted-dev-report/
  • DORA Report 2025 Key Takeaways: AI Impact on Dev Metrics - Faros AI - 2026 - https://www.faros.ai/blog/key-takeaways-from-the-dora-report-2025
  • State of AI-assisted Software Development 2025 - DORA / Google Cloud - 2025 - https://dora.dev/dora-report-2025/
  • 2025 DORA State of AI Assisted Software Development - Google Cloud - 2025 - https://cloud.google.com/resources/content/2025-dora-ai-assisted-software-development-report
  • DORA: ROI of AI-assisted Software Development report - Google Cloud - 2026 - https://cloud.google.com/resources/content/dora-roi-of-ai-assisted-software-development
  • AI Is Amplifying Software Engineering Performance, Says the 2025 DORA Report - InfoQ - 2026 - https://www.infoq.com/news/2026/03/ai-dora-report/