Source URL: https://simonwillison.net/2025/Jul/12/ai-open-source-productivity/#atom-everything
Source: Simon Willison’s Weblog
Title: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
Feedly Summary:
METR – for Model Evaluation & Threat Research – are a non-profit research institute founded by Beth Barnes, a former alignment researcher at OpenAI (see Wikipedia). They’ve previously contributed to system cards for OpenAI and Anthropic, but this new research represents a slightly different direction for them:
We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.
The full paper (PDF) has a lot of details that are missing from the linked summary.
METR recruited 16 experienced open source developers for their study, with varying levels of exposure to LLM tools. They then assigned them tasks from their own open source projects, randomly assigning whether AI was allowed or not allowed for each of those tasks.
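To make that design concrete, here's a minimal sketch of the kind of per-issue random assignment described; this is an illustration only, not METR's actual protocol, and the function name and simple coin-flip scheme are my own assumptions:

```python
import random

def assign_conditions(issues, seed=None):
    """Randomly assign each issue to an AI-allowed or AI-disallowed condition.

    A hypothetical illustration of per-issue randomization, not METR's
    actual assignment procedure.
    """
    rng = random.Random(seed)
    return {issue: rng.choice(["AI allowed", "AI disallowed"]) for issue in issues}

# Hypothetical issue list for one developer (participants worked on ~15 each)
issues = [f"issue-{n}" for n in range(1, 16)]
for issue, condition in assign_conditions(issues, seed=42).items():
    print(f"{issue}: {condition}")
```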
They found a surprising difference between developer estimates and actual completion times:
After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%—AI tooling slowed developers down.
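To put those two percentages side by side, here's a quick back-of-the-envelope calculation; the 60-minute baseline task is an invented number, and only the 20% and 19% figures come from the paper:

```python
# Hypothetical baseline: a task that takes 60 minutes without AI
baseline_minutes = 60

# Developers estimated that AI reduced completion time by 20%
expected_with_ai = baseline_minutes * (1 - 0.20)  # 48 minutes

# The measured result: AI increased completion time by 19%
observed_with_ai = baseline_minutes * (1 + 0.19)  # 71.4 minutes

gap = observed_with_ai - expected_with_ai
print(f"Expected: {expected_with_ai:.0f} min, observed: {observed_with_ai:.0f} min")
print(f"Perception gap: {gap:.0f} min (~{gap / baseline_minutes:.0%} of the baseline)")
```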
I shared my initial intuition about this paper on Hacker News the other day:
My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect.
This study had 16 participants, with a mix of previous exposure to AI tools – 56% of them had never used Cursor before, and the study was mainly about Cursor.
They then had those 16 participants work on issues (about 15 each), where each issue was randomly assigned a “you can use AI” vs. “you can’t use AI” rule.
So each developer worked on a mix of AI-tasks and no-AI-tasks during the study.
A quarter of the participants saw increased performance; three quarters saw reduced performance.
One of the top performers with AI was also the developer with the most previous Cursor experience. The paper acknowledges that here:
However, we see positive speedup for the one developer who has more than 50 hours of Cursor experience, so it’s plausible that there is a high skill ceiling for using Cursor, such that developers with significant experience see positive speedup.
My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is steep enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.
I got an insightful reply there from Nate Rush, one of the authors of the study, which included this note:
In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor — there’s a bunch of factors that contribute to this result — at least 5 seem likely, and at least 9 we can’t rule out (see the factors table on page 11).
Here’s their table of the most likely factors:
I think Nate’s right that jumping straight to a conclusion about a single factor is a shallow and unproductive way to think about this report.
That said, I can’t resist the temptation to do exactly that! The factor that stands out most to me is that these developers were all working in repositories they have a deep understanding of already, presumably on non-trivial issues since any trivial issues are likely to have been resolved in the past.
I think this is a really interesting paper. Measuring developer productivity is notoriously difficult. I hope this paper inspires more work that applies a similar level of detail to analyzing how professional programmers spend their time:
To compare how developers spend their time with and without AI assistance, we manually label a subset of 128 screen recordings with fine-grained activity labels, totaling 143 hours of video.
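As a rough illustration of what those fine-grained activity labels enable, here's a sketch that totals time per activity across labeled segments of a recording; the (start, end, label) structure and the label names are my own invention, not the paper's actual taxonomy:

```python
from collections import defaultdict

# Invented example segments: (start_seconds, end_seconds, activity_label)
segments = [
    (0, 310, "reading issue"),
    (310, 900, "writing code"),
    (900, 1180, "prompting AI"),
    (1180, 1500, "reviewing AI output"),
    (1500, 1740, "testing"),
]

# Sum the time spent on each activity across all labeled segments
time_per_activity = defaultdict(float)
for start, end, label in segments:
    time_per_activity[label] += end - start

for label, seconds in sorted(time_per_activity.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {seconds / 60:.1f} min")
```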
Via Hacker News
Tags: open-source, productivity, ai, generative-ai, llms, ai-assisted-programming, paper-review
AI Summary and Description: Yes
Summary: The study conducted by METR reveals that early-2025 AI tools may actually reduce the productivity of experienced open-source developers, contradicting common expectations of improved efficiency through AI assistance. This insight emphasizes the need to consider the steep learning curve associated with integrating AI tools into existing workflows.
Detailed Description:
The research by METR, a non-profit organization with a focus on model evaluation and threat research, investigates the impact of early-2025 AI tools on experienced open-source developers’ productivity. This study is notable for its unexpected conclusion that AI tools, rather than enhancing speed, significantly slowed down developers, leading to a 19% increase in completion time compared to working without AI.
Key points from the study include:
– **Study Design**:
– A randomized controlled trial involving 16 experienced open-source developers was conducted.
– Each developer worked on tasks from their own projects, with some tasks allowing the use of AI tools (like Cursor) and others not allowing AI.
– **Findings**:
– Even after completing the study, developers estimated that AI had reduced their completion time by 20%; the measured results showed a 19% increase in time taken when AI was used.
– A significant proportion (75%) of participants experienced reduced performance when using AI tools.
– Only 25% of the developers saw an increase in performance; notably, the developer with the most previous Cursor experience was among them.
– **Learning Curve**:
– The findings suggest that the integration of AI into developers’ workflows involves a steep learning curve, which can initially hinder productivity.
– The one developer with more than 50 hours of Cursor experience achieved a positive speedup, hinting at a high skill ceiling for effective AI assistance in programming.
– **Factors Contributing to Slowdown**:
– The paper discusses numerous factors influencing the difference in performance, indicating that there is no single reason for the slowdown: at least five factors seem likely to contribute, and at least nine cannot be ruled out.
– A critical insight is that the developers were working in repositories they already understood deeply, which may limit how much additional value AI tools can provide.
– **Call for Further Research**:
– The author of the post expresses hope that this study prompts additional research to better understand developer productivity, particularly how developers spend their time with and without AI assistance.
– The research methodology, including manual labeling of screen recordings, indicates a detailed approach that could inspire further studies in this area.
This study challenges prevailing notions regarding AI-assisted development and underscores the complexity of integrating new technology into established workflows. For security and compliance professionals in AI and software development, it’s vital to recognize these learning challenges and productivity implications, as they may impact project timelines and tool adoption strategies.