Source URL: https://www.developer-tech.com/news/microsoft-copilot-continues-to-expose-private-github-repositories/
Source: Hacker News
Title: Microsoft Copilot continues to expose private GitHub repositories
Feedly Summary: Comments
AI Summary and Description: Yes
**Summary:** Lasso’s investigation into claims that ChatGPT and Microsoft Copilot could access private GitHub repositories highlights critical data-privacy concerns and a phenomenon it calls “Zombie Data”: AI tools surfacing cached information that was once public. This poses significant challenges for organizations trying to protect sensitive data.
**Detailed Description:** Lasso’s investigation examines the implications of AI tools accessing previously public data, revealing vulnerabilities that stem from the persistence of cached data, which the firm terms “Zombie Data.” Key insights from the findings include:
– **Background of the Investigation:**
– A LinkedIn post alleged that ChatGPT and Microsoft Copilot could access private GitHub data.
– Lasso found that these AI tools relied on search-engine indexing, particularly Bing’s, to generate responses, potentially drawing on content from repositories that had since reverted to private.
– **Concept of “Zombie Data”:**
– Data that had once been public can remain retrievable via caches despite being made private or deleted.
– Information believed to be secure might actually be accessible through AI tools like Microsoft Copilot.
– **Testing Internal Systems:**
– Lasso examined their own repositories, discovering that some had been indexed despite being secured.
– ChatGPT’s capabilities differed from Copilot’s: ChatGPT surfaced only indexed mentions, whereas Copilot could retrieve actual repository content, raising greater privacy concerns.
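A practical first step in this kind of audit is to check, from outside the organization, whether a repository believed to be private still resolves publicly. A minimal sketch in Python, using the public GitHub REST API (which, for unauthenticated callers, returns 200 for public repositories and 404 for private or deleted ones); the `example-org/internal-tools` name is a placeholder, not a real finding:

```python
import urllib.error
import urllib.request

def classify_status(code: int) -> str:
    """Map a GitHub API status code to a visibility verdict."""
    if code == 200:
        return "public"
    if code == 404:
        return "private-or-missing"  # GitHub hides private repos behind 404
    return "unknown"

def check_visibility(owner: str, repo: str) -> str:
    """Query the GitHub REST API without credentials and classify the result."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    req = urllib.request.Request(url, headers={"User-Agent": "visibility-audit"})
    try:
        with urllib.request.urlopen(req) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        return classify_status(err.code)

# Placeholder repository, not a real finding:
# print(check_visibility("example-org", "internal-tools"))
```

Note that a 404 here only confirms the repository is private *now*; as the investigation shows, search-engine caches and AI tools may still hold copies from when it was public.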
– **Findings from Widespread Investigation:**
– Lasso identified 20,580 GitHub repositories whose data remained accessible through Bing’s cache, raising concerns about exposure of sensitive information.
– The investigation revealed numerous organizations, including major global companies, were affected.
– **Microsoft’s Response:**
– After being informed of the findings, Microsoft classified the issue as “low severity” but removed Bing’s cached-link feature and disabled the cc.bingj.com domain.
– However, this did not fully remediate the problem: Copilot could still retrieve the sensitive data even after Microsoft’s removals.
– **Implications for Organizations:**
– Organizations should treat any data that was ever made public as permanently exposed.
– Security intelligence must adapt to consider LLMs and AI tools as part of their monitoring frameworks.
– Strict access controls must be enforced to prevent oversharing by AI systems.
– Basic cyber hygiene practices remain essential to mitigate risks associated with sensitive data exposure.
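Part of that basic hygiene can be automated: scanning content for likely credentials before it is ever pushed to a public repository. A minimal sketch of regex-based secret detection in Python; the patterns are illustrative only, and production scanners such as gitleaks or GitHub’s own secret scanning use far larger, vetted rule sets:

```python
import re

# Illustrative patterns only -- not an exhaustive or authoritative rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_string) pairs for likely secrets."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits

# Example with a fabricated, non-functional key:
# scan_text("aws_key = AKIAABCDEFGHIJKLMNOP")
```

Running such a check in a pre-commit hook or CI pipeline helps keep secrets out of history entirely, which matters because, as the “Zombie Data” findings show, removing them after publication may come too late.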
– **Future Considerations:**
– As AI technologies evolve, organizations must remain vigilant and proactive about the security of their data, recognizing that once information is public, it could be subject to future retrieval and misuse.
Lasso’s findings and Microsoft’s partial response underscore a pressing data-security challenge in the age of generative AI and cloud computing. Organizations must reevaluate their data-management strategies to address the emerging threats posed by AI retrieval mechanisms and caching systems.