What AI is really doing to engineering velocity
New data from 400+ engineering orgs shows AI lifts pull request throughput by single digits, not 10x — because coding was never the real bottleneck.
There is a number that gets repeated so often it has started to sound like a fact: AI will make engineers ten times more productive. It shows up on conference stages, in board decks, and in the pitch for nearly every coding tool shipped in the last two years. It's a great story. The trouble is that when you actually measure what happens inside real engineering organizations, the ten quietly disappears.
DX — the engineering intelligence company co-founded by Abi Noda — spent sixteen months doing exactly that measurement. Working with Microsoft researcher Brian Houck, the team tracked more than 400 engineering organizations as their AI adoption matured, and shared the early findings at the company's first DX Annual. It is the most grounded look we have at what AI is really doing to engineering velocity, and the short version is this: the gains are real, they're worth having, and they're nowhere near the headline.
That gap — between the promise and the measured reality — is worth sitting with, because it changes what you should do next. If you believe typing speed is the bottleneck, you'll buy more licenses and wait for a 10x that never arrives. Once you see where the time actually goes, you start spending your effort somewhere far more valuable.
Key takeaways
- The gains are single digits, not multiples. Across 400+ organizations studied over 16 months, median pull request throughput rose just under 8%, with most landing in a 5–15% range — a long way from the 10x in the headlines. (DX, 2026)
- Coding was never the bottleneck. Developers spend only about 14% of their time writing code, so speeding up that slice can move the whole system only so far. (DX, 2026)
- Lab speed is not delivery speed. In a controlled GitHub study, developers finished a task 55% faster with Copilot, a reminder that the eye-catching numbers tend to come from isolated tasks, not end-to-end delivery. (GitHub, 2022)
- Faster code can make delivery worse. DORA's 2024 research found that each 25% increase in AI adoption was associated with an estimated 1.5% drop in delivery throughput and a 7.2% drop in stability, which DORA hypothesizes is largely because AI inflates batch sizes. (DORA, 2024)
- The new costs are review burden and cognitive debt. More generated code shifts pressure onto reviews and QA, and can quietly erode engineers' understanding of the systems they own. (MIT Media Lab, 2025)
- The real prize is the other 86% — planning, review, testing, documentation, and coordination, plus agents that work in parallel rather than just typing faster.
The 10x promise meets the org chart
Start with the number everyone wants. After AI tool usage climbed by an average of 65% across the companies in the study, median pull request throughput went up just under 8%. Most organizations landed somewhere in the 5–15% range, and the mean came in around 11%. Meaningful — but a different universe from "ten times."
What makes the finding credible is how it was gathered. DX restricted the sample to organizations that had actually reached AI maturity — more than 75% monthly active usage of AI coding tools — so this isn't a story about teams that never adopted. It covered companies with more than 100 engineers, ran for sixteen months, and deliberately excluded organizations going through major liquidity events, M&A, IPOs, or regulatory change, where velocity swings for reasons that have nothing to do with AI. Crucially, it measured at the organization level over time rather than comparing individuals in a one-off snapshot, which is the only honest way to see whether a tool moved the whole system.
Hold that 8% next to the most-quoted lab result in the field. In GitHub's controlled experiment, developers asked to build an HTTP server in JavaScript finished 55% faster with Copilot — an hour and eleven minutes versus two hours and forty-one. Both numbers are real. They just measure different things. One is a single, self-contained coding task with a clear finish line. The other is what happens to a real backlog once code has to be reviewed, tested, integrated, and shipped by an organization full of people. The distance between 55% and 8% isn't a contradiction. It's the map of everything that surrounds the act of writing code.
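As a quick check on the lab figure, the two task times can be turned into a percentage directly. The sketch below uses only the durations quoted above; the variable names are illustrative.

```python
# Task times quoted from the GitHub experiment, converted to minutes.
with_copilot = 1 * 60 + 11      # 1 h 11 min = 71 minutes
without_copilot = 2 * 60 + 41   # 2 h 41 min = 161 minutes

time_saved = 1 - with_copilot / without_copilot   # fraction of time saved
speedup = without_copilot / with_copilot          # how many times faster

print(f"time saved: {time_saved:.1%}")   # time saved: 55.9%
print(f"speedup:    {speedup:.2f}x")     # speedup:    2.27x
```

Even inside the lab, then, the speed-up on one self-contained task is a little over 2x. A throughput rise of just under 8% corresponds to roughly 1.08x at the organization level.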
Coding was never the bottleneck
Here is the statistic that reframes the whole conversation: developers spend only about 14% of their time writing code. Even a tool that made that 14% instant — free, perfect, zero seconds — would leave the other 86% of engineering work untouched.
That 86% is not idle time. It's planning and scoping, waiting for and giving code review, writing and fixing tests, chasing down flaky builds, documenting, answering questions, sitting in alignment meetings, and switching context between all of it. This is exactly the terrain that productivity researchers mapped years before the current AI wave. The SPACE framework, published in 2021 by Nicole Forsgren, Margaret-Anne Storey, Brian Houck and colleagues, argued that productivity is never one number; it spans satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow. (SPACE, ACM Queue, 2021) Its successor, the DevEx framework from Abi Noda and co-authors, distilled the day-to-day experience down to three things that actually govern throughput: feedback loops, cognitive load, and flow state. (DevEx, ACM Queue, 2023)
Notice what's on those lists and what isn't. "Speed of typing code" appears nowhere. So when AI gets very good at the typing and leaves feedback loops slow, review queues deep, and cognitive load high, the system barely moves — exactly what the throughput data shows.
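One way to see the ceiling implied by the 14% figure is Amdahl's law, which bounds the overall speedup when only part of a process gets faster. This is a rough sketch, not part of the DX study: it assumes the 14% time share behaves like a fixed fraction of total work and that nothing else in the system changes. The function name and the sample speed-up factors are illustrative.

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl's law: whole-system speedup when only one share of the work gets faster."""
    return 1.0 / ((1.0 - coding_share) + coding_share / coding_speedup)


CODING_SHARE = 0.14  # share of developer time spent writing code, as cited in the article

# Try a modest speed-up, the headline 10x, and the limiting case of instant coding.
for factor in (2, 10, float("inf")):
    gain = overall_speedup(CODING_SHARE, factor) - 1
    print(f"coding {factor}x faster -> overall {gain:.1%} faster")
```

Under these assumptions, making coding 2x faster yields about 7.5% overall, 10x yields about 14.4%, and infinitely fast coding tops out near 16.3%. The single-digit and low-double-digit gains in the data sit inside that ceiling.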
When AI accelerates writing but not reviewing, the saved time piles up in front of the review queue.
The 10x story vs. the measured reality
| Dimension | The 10x story | What the data shows |
|---|---|---|
| Headline claim | Engineers become 10× more productive | Median PR throughput up just under 8%; most in a 5–15% range |
| Where it's measured | An isolated coding task in a lab | End-to-end delivery across 400+ real organizations, over 16 months |
| What gets faster | Writing code | A ~14% slice of how developers actually spend their time |
| Effect on delivery | Faster code, faster roadmaps | Throughput and stability can dip as batch sizes grow |
| Hidden costs | None mentioned | Review burden, technical debt, token spend, cognitive debt |
| The real bottleneck | Typing speed | Planning, review, testing, coordination — the other 86% |
Why the gains are smaller than leaders expected
When the curve flattens below what the slide deck promised, it's rarely because the model is weak. The constraints sit around the model.
- Coding is not the bottleneck. Optimizing 14% of the work caps how far the whole system can move, no matter how good the autocomplete gets.
- Automation creates new bottlenecks. Generate code faster and you don't remove the constraint; you relocate it. The pressure lands on reviewers, QA, and the senior engineers who have to vouch for code they didn't write.
- Social friction slows real adoption. Skepticism, inconsistent usage, and inflated expectations all blunt the benefit. A tool used by 75% of engineers some of the time is not the same as a tool woven into how the team works.
- Tool and skill gaps compound. Getting value out of AI is itself a skill. Teams that never learn the workflows (what to delegate, how to prompt, when to distrust the output) leave most of the upside on the table.
- The model still lacks your context. Without a real grasp of business logic, conventions, and the quirks of a large codebase, AI produces plausible code that a human then has to reconcile with reality. That reconciliation is where the saved minutes go.
The trap of false velocity
The most expensive mistake is mistaking motion for progress. As Abi Noda put it, too many teams are "focused on showing off how much faster and more prolific we are with AI, without asking whether it's leading to meaningful improvement." More pull requests, more lines, more commits — a dashboard that climbs while the roadmap doesn't.
The hard evidence here is sobering. For the second year running, Google's DORA research found that increasing AI adoption was associated with worse software delivery: for every 25% increase in AI adoption, an estimated 1.5% decrease in delivery throughput and a 7.2% decrease in stability. (DORA, 2024) In DORA's reading, the likely cause isn't mainly bad AI code. It's that AI makes it easy to write more code at once, so batch sizes swell, and large changes have always been riskier and slower to land safely. You can feel faster and deliver slower at the same time.
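To make the batch-size intuition concrete, here is a deliberately simple toy model. It is an assumption for illustration, not DORA's method: it treats every changed line as carrying the same small, independent chance of introducing a defect. The function name and the per-line probability are hypothetical.

```python
def p_at_least_one_defect(lines_changed: int, p_defect_per_line: float) -> float:
    """Chance that a change contains at least one defect, assuming independent lines."""
    return 1.0 - (1.0 - p_defect_per_line) ** lines_changed


P_DEFECT = 0.002  # hypothetical per-line defect probability, for illustration only

# Compare a small, a medium, and a large change under the same per-line risk.
for size in (50, 200, 800):
    risk = p_at_least_one_defect(size, P_DEFECT)
    print(f"{size:>4} lines -> {risk:.0%} chance of at least one defect")
```

With these made-up numbers, the risk climbs from roughly 10% for a 50-line change to roughly 80% for an 800-line one. Real defects are not independent per line, so the exact figures mean little; the point is only that risk compounds as a single change grows.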
More code is not more value. A pull-request count can climb all year while the roadmap barely moves — and the bill for the extra code arrives later, in review time, token spend, and maintenance.
Cost is the quiet companion to false velocity. Teams that ship more AI-generated code are watching technical debt, token spend, and long-term maintainability with new attention — because the full price of code written quickly often isn't visible for months, when someone has to understand it, change it, or debug it under pressure.
Cognitive debt: the bill you can't see yet
There's a subtler cost, and it may be the most important thing in the entire study. AI can increase output while decreasing understanding. Developers ship working code faster while building a weaker mental model of the systems they're responsible for. Noda named the risk plainly: "the loss of human understanding of the systems we're building is a really interesting risk."
This isn't only an engineering hunch. A 2025 MIT Media Lab study coined the term "cognitive debt" after measuring brain activity during AI-assisted writing: participants who leaned on a large language model showed the weakest neural connectivity and recalled less of what they'd produced than those who worked unaided. (MIT Media Lab, 2025) The work is early and not yet peer-reviewed, so hold it loosely — but the pattern it describes will be familiar to anyone who has shipped a feature they could no longer fully explain a month later. Short-term efficiency, long-term comprehension cost. The debt comes due the first time something breaks at 2 a.m. and nobody on the team has a map of the territory.
The real opportunity is the other 86%
The optimistic reading of all this is that the biggest gains are still on the table — because almost no one has gone after them yet. If coding is 14% of the work, then planning, review, testing, documentation, incident response, and coordination are the frontier. The leaders furthest along have stopped asking "how do we type faster?" and started asking "what else in the software lifecycle can AI carry?"
Two shifts matter most. The first is moving from acceleration to augmentation — from speeding up a developer to letting autonomous agents take on whole strands of work in parallel, so capacity grows instead of just cadence. The second is unglamorous and decisive: developer experience. Faster feedback loops, better documentation, less workflow friction, more protected focus time — the very levers the DevEx research identified — are what let any AI gain actually compound instead of evaporating into the queue.
Measuring what matters now
If the work is changing, the scorecard has to change with it. Some signals are constant: velocity, quality, and developer experience still tell you whether engineering is healthy. But DX argues for a sharper distinction. As Noda described it, leaders should "separate how you're measuring and framing AI's impact" into two buckets: "acceleration" (humans doing their work faster) and "augmentation" (work done autonomously by agents). Blend them and you can't tell whether your numbers reflect people getting better tools or machines quietly doing the job; the two demand very different decisions.
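In practice, keeping the two buckets apart can start with something as simple as tagging merged work by who authored it. The sketch below is one hypothetical way to do that, not DX's implementation: the `MergedPR` record, the `authored_by` labels, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MergedPR:
    pr_id: int
    authored_by: str  # hypothetical label: "human" (AI-assisted or not) or "agent"


def split_throughput(prs: list[MergedPR]) -> dict[str, int]:
    """Count merged PRs separately for human-authored and agent-authored work."""
    counts = Counter(pr.authored_by for pr in prs)
    return {
        "acceleration": counts["human"],  # humans doing their work, possibly faster
        "augmentation": counts["agent"],  # work completed autonomously by agents
    }


# Made-up sample data, purely illustrative.
sample = [MergedPR(1, "human"), MergedPR(2, "agent"), MergedPR(3, "human")]
print(split_throughput(sample))  # {'acceleration': 2, 'augmentation': 1}
```

Reporting the two counts side by side, rather than one blended total, makes it visible when a rise in throughput comes from agents rather than from people working faster.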
The genuinely new frontier is what DX calls agent experience. If agents are becoming workers in the system, you can study their bottlenecks the way you study a team's. "We've applied a similar approach to agents," Noda noted — "surveying agents, which is pretty wild." Strange as it sounds, asking an agent where it gets stuck may become as ordinary as the developer surveys we run today.
A note for leaders — The 10x number was never going to come from a tool. It was always going to come from rebuilding the work around what the tool is good at. The companies that internalize that — and measure honestly — will pull away from the ones still waiting for autocomplete to save them.
A more honest playbook
You don't capture AI's value by buying more seats. You capture it by aiming the technology at the 86% and measuring the right things on the way.
- Measure outcomes, not output (start here) — Pull requests and lines tell you there's motion. Track whether the roadmap moves, quality holds, and customers feel it. Output without outcomes is the definition of false velocity.
- Split acceleration from augmentation — Keep human speed-ups and autonomous agent work in separate buckets. Mixing them hides what's actually happening and leads to the wrong bets.
- Watch your batch size — Heed the DORA signal: smaller changes, strong review and testing. It's the most direct defense against speed quietly eroding stability.
- Point AI at the other 86% — Planning, code review, test generation, documentation, incident response. That's where the time is, so that's where the next real gains are.
- Protect understanding — Treat cognitive debt like technical debt: make it visible and pay it down. Keep humans owning the mental model of the systems that matter most.
Frequently asked questions
How much does AI actually improve engineering velocity? According to DX's longitudinal study of more than 400 engineering organizations over 16 months, median pull request throughput increased just under 8% as AI adoption matured, with most organizations in a 5–15% range and a mean near 11%. These are meaningful but incremental gains — far from the 10x improvements commonly claimed. The research measured organizations over time rather than comparing individuals in isolated tasks. (DX, 2026)
Why isn't AI making developers 10x more productive? Because writing code is only about 14% of a developer's time. AI mostly accelerates that slice, while the majority of engineering work — planning, review, testing, documentation, and coordination — is untouched or even slowed as more generated code increases review burden. Headline figures like GitHub's "55% faster" come from isolated lab tasks, not end-to-end delivery across a real organization.
What is "false velocity"? False velocity is the illusion of progress created by rising output metrics (more pull requests, commits, and lines of code) that don't translate into faster roadmap delivery or business value. DORA's 2024 research found that each 25% increase in AI adoption was associated with an estimated 1.5% drop in delivery throughput and a 7.2% drop in stability. DORA's hypothesis is that AI inflates batch sizes, and larger changes are riskier to ship.
What is cognitive debt in software engineering? Cognitive debt is the gradual loss of understanding that can occur when developers ship AI-generated code faster than they build a mental model of how it works. It trades short-term speed for long-term comprehension, making future debugging, maintenance, and incident response harder. A 2025 MIT Media Lab study popularized the term after observing reduced neural engagement and recall among people who relied on AI assistance.
Where should engineering leaders apply AI beyond coding? In the other 86% of the software lifecycle: planning, code review, test generation, documentation, incident response, and coordination — plus autonomous agents that handle strands of work in parallel. Leaders should also separate "acceleration" (humans working faster) from "augmentation" (agents working autonomously) when measuring impact, and protect developer experience so gains compound.
Sources
- AI productivity gains: more modest than expected — DX (2026)
- The current impact of AI on engineering velocity — Engineering Enablement by Abi Noda / DX (2026)
- 8 myths on software engineering and AI — Brian Houck, DX (2026)
- The SPACE of Developer Productivity — Forsgren, Storey, Maddila, Zimmermann, Houck, Butler, ACM Queue (2021)
- DevEx: What Actually Drives Productivity — Noda, Storey, Forsgren, Greiler, ACM Queue (2023)
- Research: quantifying GitHub Copilot's impact on developer productivity and happiness — GitHub (2022)
- Accelerate State of DevOps Report 2024 — DORA / Google Cloud (2024)
- Your Brain on ChatGPT: Accumulation of Cognitive Debt — MIT Media Lab (2025)