Slashdot: AI Can Write Code But Lacks Engineer’s Instinct, OpenAI Study Finds

Feb 19, 2025

—

Source URL: https://developers.slashdot.org/story/25/02/19/1212257/ai-can-write-code-but-lacks-engineers-instinct-openai-study-finds
Source: Slashdot
Title: AI Can Write Code But Lacks Engineer’s Instinct, OpenAI Study Finds

Summary: A study by OpenAI researchers evaluated how well leading AI models fix real-world code. The models show promise but fall well short of replacing human software engineers.

Detailed Description:
The article outlines the findings of a study that measured how effective AI models are at real-world software engineering tasks. The key insights are:

– **Testing Methodology**: OpenAI's researchers introduced a testing framework called SWE-Lancer, built from 1,488 actual software fixes in Expensify's codebase that together represent around $1 million in freelance engineering effort.

– **Performance Metrics**: Even the most advanced AI model tested, Claude 3.5 Sonnet, succeeded on only 26.2% of hands-on coding tasks and 44.9% of technical management decisions.

– **Limitations Observed**: The models excelled at identifying relevant code snippets but struggled to understand how different software components interact, so their fixes tended to be superficial and did not account for broader implications.

– **Complexity of Tasks**: Unlike benchmarks built on simpler programming puzzles, OpenAI's tests required the models to handle tasks ranging from minor bug fixes priced at $50 to elaborate feature implementations valued at $32,000.

– **Validation Process**: Each proposed solution underwent rigorous end-to-end testing designed to simulate real user interactions, ensuring the results reflected practical software engineering scenarios.
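
Because each task carries a price and a pass/fail outcome from end-to-end testing, a success percentage and the dollar value of completed work can diverge sharply. The following is a minimal sketch of that arithmetic, not the study's own scoring code: the `TaskResult` type, the function names, and the sample results are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One task: its price and whether the proposed fix passed end-to-end tests."""
    task_id: str       # hypothetical identifier
    price_usd: float   # price attached to the task
    passed: bool       # True only if the end-to-end tests passed


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks whose proposed solution passed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def value_completed(results: list[TaskResult]) -> float:
    """Total price of the tasks that passed."""
    return sum(r.price_usd for r in results if r.passed)


# Made-up results for illustration only; these are not data from the study.
sample = [
    TaskResult("bug-1", 50.0, True),
    TaskResult("bug-2", 250.0, False),
    TaskResult("feature-1", 32_000.0, False),
    TaskResult("bug-3", 500.0, True),
]

total = sum(r.price_usd for r in sample)
print(f"pass rate: {pass_rate(sample):.1%}")
print(f"value completed: ${value_completed(sample):,.0f} of ${total:,.0f}")
```

In this invented sample, half of the tasks pass, yet the passing tasks account for only a small share of the total value because the single expensive feature fails. A percentage of tasks completed says little, on its own, about how much of the hard work was done.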

The study is a reminder that although AI has come a long way in assisting software engineers, a significant gap remains in its ability to understand and carry out complex engineering tasks independently. For professionals in AI, software security, and infrastructure, this underscores the importance of human oversight when using AI for software development.
