The Register: JetBrains wants to train AI models on your code snippets

Oct 1, 2025

—

Source URL: https://www.theregister.com/2025/10/01/jetbrains_wants_your_code_to_train_ai/
Source: The Register
Title: JetBrains wants to train AI models on your code snippets

Feedly Summary: Dangles free product licenses in return for code-related data for its training
IDE and developer tools biz JetBrains believes training AI models on public datasets is insufficient, and is offering free product licenses to organizations that are willing to share detailed code-related data.…

AI Summary and Description: Yes

Summary: JetBrains is offering free product licenses to organizations that share detailed code-related data, arguing that public datasets alone are insufficient for training AI models. The initiative links AI training to developer engagement as a way to make AI more effective in software development.

Detailed Description:
JetBrains, a company known for its integrated development environments (IDEs) and developer tools, has announced a program that might influence how AI models are trained, especially for software development. The company believes that relying only on publicly available datasets is inadequate for building powerful, reliable AI models. In response, it is offering free product licenses to organizations willing to contribute detailed code-related data.

Key points include:

– **Incentivization of Data Sharing**: By offering free licenses, JetBrains aims to create a collaborative ecosystem where developers and organizations can contribute valuable data that enhances AI training.

– **Limitations of Public Datasets**: The initiative reflects the view that public datasets may lack the depth and specificity of proprietary code-related data.

– **Impact on AI Security and Development**: This move could lead to improvements in the security of AI applications, as more comprehensive datasets could result in better-trained models that are aware of diverse coding practices and security implications.

– **Potential for Improved AI Tools**: With access to real-world data, AI models can become more adept at understanding and predicting coding needs, leading to innovative tools that could significantly improve productivity and coding practices for developers.

– **Long-term Implications for Software Security**: As models trained on richer datasets emerge, they may offer stronger security features that help developers identify vulnerabilities, follow secure coding practices, and comply with regulatory standards more effectively.

This initiative is likely to prompt discussion of the ethics of using data for AI training and of the potential benefits of collaborative data sharing among developers and organizations, which could in turn advance AI security and software development practices.
