Why lines accepted is the wrong metric for your engineering org

Every AI coding tool vendor publishes an "acceptance rate" metric. Every engineering leader who has adopted one of these tools watches it. It feels like a productivity signal: the higher the rate, the more the AI is contributing, the better the investment is performing.

We think this metric is worse than useless. It's actively misleading, and it errs in a consistent direction: it makes high-risk situations look healthy.

What "lines accepted" actually measures

Acceptance rate measures how often a developer pressed Tab (or the equivalent) to take a suggestion instead of writing the code themselves. It measures how often developers take up the tool's interruptions to their workflow, not productivity. A high acceptance rate tells you a developer is engaged with the tool; it says nothing about whether the resulting code is correct, whether the developer understands it, or whether the productivity gain is real.

Think about what maximises acceptance rate: a developer who is tired, under deadline pressure, or working in an unfamiliar codebase where anything the AI suggests looks plausible. Now think about what correlates with low acceptance rate: a senior engineer who reads every suggestion carefully, modifies most of them, and rejects the ones that don't fit the architecture. The metric punishes the behaviour you want and rewards the behaviour that produces risk.

The structural incentive problem

Here's the uncomfortable part: AI coding tool vendors have a strong commercial incentive to encourage high acceptance rates. They're selling inference. Higher acceptance rates mean more tokens consumed, more API calls, and more subscription renewals. The metric they're teaching you to optimise for is perfectly aligned with their revenue model and weakly anti-aligned with your engineering quality.

This isn't a conspiracy. It's just what happens when you measure what's easy to measure rather than what matters. But it's worth being clear-eyed about.


What to track instead

We've been thinking about this for a year. Here's the measurement stack we think matters:

1. Solo throughput trend

Run a monthly "fly-solo" session: one day where developers work without AI assistance. Track their throughput and plot the trend over time. If it's rising, the AI tool is working as a training device. If it's flat or falling, the tool is probably making your developers dependent on it. This is the most direct measure of whether the human, not just the human-plus-tool, is getting better.

2. Concept retention rate

Thirty days after a significant AI-authored diff, can the developer explain the code? Reproduce the logic? Make a targeted modification? This is hard to automate but not impossible; our explain-to-merge system is designed to produce this data as a side effect. A retention rate below 60% at 30 days is a warning sign.
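A sketch of how the 30-day checks might be recorded and scored, in Python. The `RetentionCheck` schema is hypothetical, and so is the rule that a concept counts as retained only when the developer passes all three checks (explain, reproduce, modify); a team could reasonably weight them differently. The 60% threshold is the warning line described above.

```python
from dataclasses import dataclass


# Hypothetical record of one 30-day follow-up on a significant AI-authored diff.
@dataclass
class RetentionCheck:
    diff_id: str
    can_explain: bool
    can_reproduce: bool
    can_modify: bool

    @property
    def retained(self) -> bool:
        # Assumption: the concept counts as retained only if all three checks pass.
        return self.can_explain and self.can_reproduce and self.can_modify


WARNING_THRESHOLD = 0.60  # below 60% at 30 days is the warning sign


def retention_rate(checks: list[RetentionCheck]) -> float:
    """Fraction of followed-up diffs whose concepts the developer retained."""
    if not checks:
        raise ValueError("no retention checks recorded")
    return sum(check.retained for check in checks) / len(checks)


def retention_warning(checks: list[RetentionCheck]) -> bool:
    """True when the retention rate falls below the warning threshold."""
    return retention_rate(checks) < WARNING_THRESHOLD
```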

3. Modification ratio

What fraction of AI suggestions are accepted verbatim versus modified before merge? A higher modification ratio suggests more engagement, not less productivity. We'd argue a 40% modification ratio is healthier than a 10% one, even if the latter looks better on the acceptance-rate dashboard.
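One way to compute the modification ratio, assuming you can pair each AI suggestion that reached merge with the text that was actually merged (how you capture those pairs depends on your tooling and is not shown). Whitespace is normalised so that pure reformatting doesn't count as a modification; that rule, and the function names, are illustrative assumptions.

```python
from collections.abc import Iterable


def normalise(code: str) -> str:
    # Collapse all runs of whitespace so pure reformatting isn't counted as a change.
    return " ".join(code.split())


def modification_ratio(pairs: Iterable[tuple[str, str]]) -> float:
    """Fraction of merged AI suggestions that were changed before merge.

    pairs: (suggested_text, merged_text) for each AI suggestion that reached merge.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no merged suggestions to compare")
    modified = sum(normalise(suggested) != normalise(merged) for suggested, merged in pairs)
    return modified / len(pairs)
```

Under this rule a suggestion that was only reindented counts as verbatim; a stricter team might count any edit at all.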

4. Incident attribution

When you run an incident postmortem, annotate whether the relevant code was AI-authored or human-authored. Over six months, do you see a pattern? We've talked to teams where AI-authored code is over-represented in incidents, compared with its share of the code, by a factor of 2 to 3. None of them knew until they started tracking it.
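Over-representation only means something against a baseline: AI-authored code should appear in incidents roughly in proportion to how much of the code it accounts for. A minimal sketch of that comparison follows. The counting unit (merged changes here) and the `overrepresentation` function are assumptions, and both counts should cover the same time window.

```python
def overrepresentation(incidents_ai: int, incidents_total: int,
                       changes_ai: int, changes_total: int) -> float:
    """Ratio of AI-authored code's share of incidents to its share of merged changes.

    1.0 means AI-authored code shows up in incidents exactly as often as its
    share of the code predicts; 2.0 means twice as often.
    """
    if incidents_total == 0 or changes_total == 0 or changes_ai == 0:
        raise ValueError("need non-zero incident and change counts")
    incident_share = incidents_ai / incidents_total
    code_share = changes_ai / changes_total
    return incident_share / code_share
```

With hypothetical counts, say 10 of 20 incidents touching AI-authored code while AI-authored changes make up 250 of 1,000 merged changes, the factor is 2.0.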

5. Bus factor on AI-authored code

Pick 20 AI-authored modules in your codebase at random. For each, ask: who on the team can confidently modify this if the original author is unavailable? If the answer is frequently "nobody," you have a bus-factor problem that's invisible to standard metrics. The AI wrote the code; nobody really owns it.
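The audit itself is a conversation, but the bookkeeping around it can be scripted. This sketch, with hypothetical function names, draws a random sample of AI-authored modules (so nobody hand-picks the familiar ones) and reports the share for which no one other than the original author says they could confidently make a change. How you identify AI-authored modules in the first place is left as an assumption.

```python
import random
from typing import Optional


def sample_modules(ai_modules: list[str], k: int = 20, seed: Optional[int] = None) -> list[str]:
    """Draw up to k AI-authored modules at random for the bus-factor audit."""
    rng = random.Random(seed)
    return rng.sample(ai_modules, min(k, len(ai_modules)))


def orphaned_modules(answers: dict[str, list[str]]) -> tuple[float, list[str]]:
    """answers maps each audited module to the people, other than its original
    author, who say they could confidently modify it.

    Returns the share of modules with nobody, and those modules sorted by name.
    """
    if not answers:
        raise ValueError("no modules audited")
    orphaned = sorted(module for module, people in answers.items() if not people)
    return len(orphaned) / len(answers), orphaned
```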

A note on fairness to the tools

We want to be fair: the throughput gains from AI coding tools are real. Our own research showed a mean throughput improvement of 34% across 200 developers over 12 months, a substantial gain. The tools are good. The problem isn't the tools; it's the measurement layer that sits above them.

The question isn't whether to use AI coding tools. The question is whether to measure them honestly. The current measurement orthodoxy doesn't. And until engineering leaders demand better metrics from their vendors and their own dashboards, the conversation will stay stuck on acceptance rate, a metric that tells you how much your developers are clicking Tab, not whether your engineering organisation is getting better at its job.

We built aiworklab specifically because we believe the measurement layer matters as much as the tool itself. If you're an engineering leader who wants to run the retention assessment or the fly-solo experiment on your own team, write to us. We'll help.
