Hao Li, Haoxiang Zhang, and Ahmed E. Hassan from Queen's University studied what happens when AI becomes your coding teammate:
Although agents frequently outperform humans in speed, our analysis shows their pull requests are accepted less frequently, revealing a stark gap between benchmark performance and real-world trust and utility. Moreover, while agents can massively accelerate code submission—one developer submitted as many Agentic-PRs in three days as they submitted without Agentic help in the previous three years—these contributions tend to be structurally simpler.
They analyzed 456,000 pull requests from AI agents like Devin and GitHub Copilot. Real-world data on our new AI teammates.
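To make the measurement concrete, here's a minimal sketch of how one might estimate acceptance rates for agent-authored PRs using the GitHub Search API. This is an illustration, not the paper's pipeline: the bot logins are guesses, "accepted" is approximated as "merged", and the study's actual agent-identification heuristics are surely more careful.

```python
# Minimal sketch (not the paper's methodology): estimate merge rates for
# agent-authored PRs via the GitHub Search API. Assumptions: the bot
# logins below are illustrative guesses, and "accepted" is approximated
# as "merged".
import os

import requests

SEARCH_URL = "https://api.github.com/search/issues"
HEADERS = {"Accept": "application/vnd.github+json"}
if os.environ.get("GITHUB_TOKEN"):  # optional token raises the rate limit
    HEADERS["Authorization"] = f"Bearer {os.environ['GITHUB_TOKEN']}"

# Hypothetical agent accounts; PRs authored by a GitHub App may need the
# "author:app/NAME" qualifier form instead.
AGENT_LOGINS = ["devin-ai-integration[bot]", "copilot-swe-agent[bot]"]


def pr_count(query: str) -> int:
    """Return total_count for a GitHub issue/PR search query."""
    resp = requests.get(
        SEARCH_URL, headers=HEADERS, params={"q": query, "per_page": 1}
    )
    resp.raise_for_status()
    return resp.json()["total_count"]


for login in AGENT_LOGINS:
    total = pr_count(f"type:pr author:{login}")
    merged = pr_count(f"type:pr is:merged author:{login}")
    if total:
        print(f"{login}: {merged:,}/{total:,} merged ({merged / total:.1%})")
```

Even this toy version surfaces the metric that matters: not how many PRs an agent can open, but what fraction actually lands.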
The results are fascinating and slightly concerning. AI agents can generate massive volumes of code at incredible speed, but their PRs get accepted far less often than human contributions.
We've solved for speed but not for trust. AI can pump out code faster than ever, but much of it gets rejected. There's a fundamental tension here: quantity versus quality, speed versus acceptance.
The challenge isn't just making AI smarter. It's designing the entire human-AI collaboration around this new reality.
The bottleneck has shifted from writing code to reviewing it. One person can now generate the output of an entire team.
How do we manage that firehose of productivity? How do we review code that arrives faster than we can possibly evaluate it? That's the problem we need to solve next.