Source URL: https://www.developer-tech.com/news/microsoft-copilot-continues-to-expose-private-github-repositories/
Source: Hacker News
Title: Microsoft Copilot continues to expose private GitHub repositories
Feedly Summary: Comments
AI Summary and Description: Yes
**Summary:** Lasso’s investigation into claims that ChatGPT and Microsoft Copilot could access private GitHub repositories highlights critical data-privacy concerns and a phenomenon it calls “Zombie Data”: AI tools surfacing cached information that was once public. This poses significant challenges for organizations trying to protect sensitive data.
**Detailed Description:** Lasso’s investigation examines the implications of AI tools accessing previously public data, revealing vulnerabilities that stem from the persistence of cached data, which the firm terms “Zombie Data.” Key insights from the findings include:
– **Background of the Investigation:**
– A LinkedIn post alleged that ChatGPT and Microsoft Copilot could access private GitHub data.
– Lasso found that these AI tools relied on search-engine indexing, particularly Bing’s, to generate responses, potentially drawing on content from repositories that had since reverted to private.
– **Concept of “Zombie Data”:**
– Data that had once been public can remain retrievable via caches despite being made private or deleted.
– Information believed to be secure might actually be accessible through AI tools like Microsoft Copilot.
– **Testing Internal Systems:**
– Lasso examined their own repositories, discovering that some had been indexed despite being secured.
– ChatGPT’s capabilities differed from Copilot’s: ChatGPT surfaced only indexed mentions, whereas Copilot could retrieve actual repository content, raising greater privacy concerns.
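A practical first step in this kind of audit is to check, from outside the organization, whether a repository believed to be private still resolves publicly. A minimal sketch in Python, using the public GitHub REST API (which, for unauthenticated callers, returns 200 for public repositories and 404 for private or deleted ones); the `example-org/internal-tools` name is a placeholder, not a real finding:

```python
import urllib.error
import urllib.request

def classify_status(code: int) -> str:
    """Map a GitHub API status code to a visibility verdict."""
    if code == 200:
        return "public"
    if code == 404:
        return "private-or-missing"  # GitHub hides private repos behind 404
    return "unknown"

def check_visibility(owner: str, repo: str) -> str:
    """Query the GitHub REST API without credentials and classify the result."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    req = urllib.request.Request(url, headers={"User-Agent": "visibility-audit"})
    try:
        with urllib.request.urlopen(req) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        return classify_status(err.code)

# Placeholder repository, not a real finding:
# print(check_visibility("example-org", "internal-tools"))
```

Note that a 404 here only confirms the repository is private *now*; as the investigation shows, search-engine caches and AI tools may still hold copies from when it was public.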
– **Findings from Widespread Investigation:**
– Lasso identified 20,580 GitHub repositories whose data remained accessible through Bing’s cache, raising concerns about exposure of sensitive information.
– The investigation revealed numerous organizations, including major global companies, were affected.
– **Microsoft’s Response:**
– After being informed of the findings, Microsoft classified the issue as “low severity” but removed Bing’s cached-link feature and disabled the cc.bingj.com domain.
– However, this did not fully remediate the problem: Copilot could still retrieve the sensitive data even after Microsoft’s removals.
– **Implications for Organizations:**
– Organizations should treat any data that was ever made public as permanently exposed.
– Security intelligence must adapt to consider LLMs and AI tools as part of their monitoring frameworks.
– Strict access controls must be enforced to prevent oversharing by AI systems.
– Basic cyber hygiene practices remain essential to mitigate risks associated with sensitive data exposure.
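Part of that basic hygiene can be automated: scanning content for likely credentials before it is ever pushed to a public repository. A minimal sketch of regex-based secret detection in Python; the patterns are illustrative only, and production scanners such as gitleaks or GitHub’s own secret scanning use far larger, vetted rule sets:

```python
import re

# Illustrative patterns only -- not an exhaustive or authoritative rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_string) pairs for likely secrets."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits

# Example with a fabricated, non-functional key:
# scan_text("aws_key = AKIAABCDEFGHIJKLMNOP")
```

Running such a check in a pre-commit hook or CI pipeline helps keep secrets out of history entirely, which matters because, as the “Zombie Data” findings show, removing them after publication may come too late.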
– **Future Considerations:**
– As AI technologies evolve, organizations must remain vigilant and proactive about the security of their data, recognizing that once information is public, it could be subject to future retrieval and misuse.
Lasso’s findings and Microsoft’s partial response underscore a pressing data-security challenge in the age of generative AI and cloud computing. Organizations must reevaluate their data-management strategies to address the emerging threats posed by AI retrieval mechanisms and caching systems.