
The Return of the Lines of Code Metric

AI vendors are reviving volume-based developer metrics, with massive real-world consequences for engineering teams.

Lenn Voss
Cloud & Infrastructure Writer · Jun 11, 2026 · 4 min read

For decades, the quickest way to get laughed out of an engineering leadership meeting was to suggest measuring developer productivity by lines of code. We all knew the punchline: measuring programming progress by lines of code is like measuring aircraft building progress by weight. We spent twenty years training management to look at outcomes (features shipped, customer satisfaction, system reliability, and cycle time) rather than raw keyboard-mashing volume.

Yet, look at the marketing billboards dominating the industry today. The volume metric has returned, and it has a world-class publicist.

Instead of tracking what got built, the industry is suddenly obsessed with how much is being generated. Google proudly claims that 75% of its new code is AI-generated. Anthropic boasts that Claude writes roughly 80% of merged production code, enabling engineers to ship "8x more code per quarter." OpenAI claims a similar ~80% figure, while Cursor touts "100M+ lines of enterprise code written per day."

We have swapped out "lines of code" for "percent of code written by AI." It is the exact same volume-centric metric, repackaged for the generative era.

The Pivot From Outcomes to Volume

This is a massive shift in how the industry talks about developer tools. A few years ago, when GitHub launched Copilot, the flagship marketing claim was an outcome: developers completed tasks 55% faster. Whether you agreed with the study's methodology or not, it was a bold, falsifiable claim about efficiency and value. If it was wrong, you could measure it and prove it wrong.

The new wave of volume-based claims cannot fail. If a company claims 75% of its new code is AI-written, that number can keep climbing regardless of whether the software actually gets better, ships faster, or has fewer bugs. It is a metric that only disappoints if adoption stalls, and adoption is the one thing everyone agrees is happening.
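To make that concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Quarter` record, the `ai_share` helper, and the numbers are invented for illustration and do not come from any vendor's reporting. It only shows that a share-of-code figure is computed from volume alone, so two quarters with very different outcomes can report the identical headline number.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    """Hypothetical quarterly snapshot; all fields are illustrative."""
    ai_lines: int          # merged lines attributed to AI tools
    human_lines: int       # merged lines written by hand
    features_shipped: int  # an outcome measure
    escaped_defects: int   # another outcome measure


def ai_share(q: Quarter) -> float:
    """Fraction of merged lines attributed to AI (0.0 if nothing merged)."""
    total = q.ai_lines + q.human_lines
    return q.ai_lines / total if total else 0.0


# Two made-up quarters: the second merges twice the code,
# ships half the features, and leaks far more defects.
healthy = Quarter(ai_lines=75_000, human_lines=25_000,
                  features_shipped=12, escaped_defects=3)
bloated = Quarter(ai_lines=150_000, human_lines=50_000,
                  features_shipped=6, escaped_defects=20)

for name, q in (("healthy", healthy), ("bloated", bloated)):
    print(f"{name}: AI share {ai_share(q):.0%}, "
          f"features {q.features_shipped}, defects {q.escaped_defects}")
```

Both quarters report a 75% AI share. The outcome fields never enter the calculation, which is why the headline figure cannot register whether the software got better or worse.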

The marketing shifted to volume because proving actual, repeatable productivity gains turned out to be incredibly complicated.

The Messy Science of AI Productivity

When researchers try to measure the actual impact of AI on software engineering, the results are all over the map.

On the optimistic side, a large-scale study by Cui et al. of nearly 5,000 developers showed a 26% increase in completed tasks, with junior developers seeing the most significant boost. But other studies paint a far more troubling picture. Research from GitClear revealed that as Copilot adoption deepened, code churn rose and refactoring collapsed—suggesting that developers are writing more code but also throwing more of it away.

Then there is the famous study by the research organization METR. They initially found that experienced open-source developers were 19% slower when using AI in their own codebases, even though those same developers believed they were 20% faster.

However, in February 2026, METR effectively walked back those findings. In a follow-up, their estimates flipped to a speedup, but with massive error bars. More importantly, they abandoned the study design entirely. Why? Because developers now flatly refuse to work without AI assistance, making it impossible to establish a clean control group, and developers cannot reliably self-report time spent on agentic workflows. METR's current stance is that AI probably speeds developers up, but we can no longer cleanly measure by how much.

Meanwhile, at the organizational level, the gains seem to evaporate. A National Bureau of Economic Research (NBER) survey of approximately 6,000 executives found that while 69% of firms are actively using AI, roughly nine in ten report no measurable productivity impact. The broad consensus across various studies points to organizational gains of around 10%—a solid, useful improvement, but a far cry from the "8x" claims on the billboards.

Even Anthropic's own research arm has highlighted this tension. In a randomized controlled trial (RCT), they found that AI-assisted developers scored 17% lower on comprehension of the code they had just shipped, with no statistically significant productivity gain.

Maturity Models and the Definition Deficit

To fill the gap between marketing hype and messy reality, the industry has turned to "maturity models." Carnegie Mellon's Software Engineering Institute (SEI) and Accenture recently launched an AI Adoption Maturity Model featuring five levels and eight dimensions, marketed alongside a statistic showing that 95% of organizations currently see no returns on their AI investments.

Similarly, industry figures like Steve Yegge have proposed frameworks like the "8 levels of AI-assisted development," which rank engineering organizations based on the tools they run and the level of human supervision required.

The trouble with these maturity ladders is that they measure adoption intensity and call it maturity. The top rung is almost always "use more of our product."

We do not even agree on what we are building. When developer tool vendor Augment surveyed 219 engineering leaders to define "AI-native engineering," they received 219 different answers.

The Real-World Cost of Vanity Metrics

These volume metrics and adoption ladders might seem like harmless marketing fluff, but they are actively shaping corporate strategy, engineering budgets, and headcount.

In February, Jack Dorsey cut over 40% of Block's workforce, more than 4,000 people, citing AI as the explicit core thesis. Dorsey stated that "a significantly smaller team, using the tools we're building, can do more and do it better," even while noting that the business was strong and gross profit was growing.

A couple of weeks later, Atlassian cut 10% of its workforce (around 1,600 people), conceding that "it would be disingenuous to pretend AI doesn't change the mix of skills we need or the number of roles required."

When leadership teams make massive cuts based on the premise that AI has made everyone exponentially more productive, they are betting on volume claims rather than verified outcomes. There is very little evidence that large portions of the engineering workforce are sitting idle because AI is doing their jobs. In reality, most software companies have endless roadmaps and backlogs that could easily absorb any modest productivity gains.

By treating "lines of code generated" as a proxy for value, the industry risks repeating the exact same management mistakes of the late 1990s—only this time, the code is being generated at a scale we cannot easily comprehend, let alone maintain.

Sources & further reading

  1. Lines of Code Got a Better Publicist — curlewis.co.nz