GitHub has launched an AI-powered secret scanning feature within Copilot, integrated into GitHub Secret Protection, that uses context analysis to significantly improve the detection of leaked passwords in code. This approach addresses the shortcomings of traditional regular-expression-based methods, which often miss varied password structures and generate numerous false positives.
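To see why pattern matching alone falls short, consider a minimal regex-based scanner in the style of traditional rules. The pattern below is purely illustrative, not one of GitHub's actual detection rules: it flags placeholder values as leaks while missing secrets assigned to unconventional variable names.

```python
import re

# Illustrative pattern (not GitHub's actual rules): flag quoted values
# assigned to password-like variable names.
PASSWORD_RE = re.compile(
    r"""(?i)\b(password|passwd|pwd)\s*[:=]\s*["']([^"']{8,})["']"""
)

def scan(line: str):
    """Return the captured value if the line matches, else None."""
    m = PASSWORD_RE.search(line)
    return m.group(2) if m else None

# A real leak matches...
assert scan('password = "hunter2hunter2"') == "hunter2hunter2"
# ...but so does a harmless placeholder (a false positive)...
assert scan('password = "CHANGE_ME_LATER"') == "CHANGE_ME_LATER"
# ...while a secret under a different variable name is missed entirely.
assert scan("token = 'xK9#mP2$vL5@'") is None
```

Context-aware analysis aims to resolve exactly these two failure modes: recognizing that a placeholder is not a leak, and that an oddly named assignment may be.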
According to a GitHub blog post detailing the development, the system now analyzes the usage and location of potential secrets to filter out irrelevant alerts and surface notifications critical to repository security. Sorin Moga, a senior software engineer at Sensis, commented on LinkedIn that this marks a new era in platform security, where AI not only assists in development but also safeguards code integrity.
A key challenge identified during the private preview of GitHub’s AI-powered secret scanning was its struggle with unconventional file types and structures, highlighting the limitations of relying solely on the large language model’s (LLM) initial training data. GitHub’s initial approach involved “few-shot prompting” with GPT-3.5-Turbo, where the model was provided with examples to guide detection.
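GitHub's actual prompt is not public, but the general shape of few-shot prompting can be sketched as follows: a handful of labeled examples precede the candidate line, steering the model toward contextual judgments (for example, distinguishing a hard-coded value from one read from the environment). The example lines and labels here are hypothetical.

```python
# Hypothetical labeled examples (not GitHub's real prompt) that show the
# model how context changes the verdict.
FEW_SHOT_EXAMPLES = [
    ('password = "P@ssw0rd123!"', "SECRET"),
    ('password = os.environ["DB_PASSWORD"]', "NOT_SECRET"),  # read at runtime
    ('example_password = "hunter2"  # docs sample', "NOT_SECRET"),
]

def build_prompt(candidate_line: str) -> str:
    """Assemble a few-shot classification prompt for a chat model
    such as GPT-3.5-Turbo."""
    parts = ["Classify each line as SECRET or NOT_SECRET based on context.\n"]
    for line, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Line: {line}\nLabel: {label}\n")
    parts.append(f"Line: {candidate_line}\nLabel:")
    return "\n".join(parts)

prompt = build_prompt('conn_str = "Server=db;Pwd=s3cr3t!"')
```

The prompt ends at the `Label:` stub so the model's completion is the classification itself; the private preview showed this alone struggles on unconventional file formats the model's training data did not cover well.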
To address these early challenges, GitHub significantly enhanced its offline evaluation framework by incorporating feedback from private preview participants to diversify test cases and leveraging the GitHub Code Security team’s evaluation processes to build a more robust data collection pipeline. They even used GPT-4 to generate new test cases based on learnings from existing secret scanning alerts in open-source repositories. This improved evaluation allowed for better measurement of precision (reducing false positives) and recall (reducing false negatives).
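Precision and recall, the two metrics the improved evaluation framework tracks, reduce to a simple computation over true positives, false positives, and false negatives. The counts below are invented for illustration only.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision: of the alerts raised, how many were real secrets?
    Recall: of the real secrets present, how many were alerted on?"""
    precision = tp / (tp + fp)  # fewer false positives -> higher precision
    recall = tp / (tp + fn)     # fewer false negatives -> higher recall
    return precision, recall

# Hypothetical alert counts for illustration:
p, r = precision_recall(tp=90, fp=10, fn=5)
# p = 0.90 (10% of alerts were noise), r ~ 0.947 (5% of secrets were missed)
```

The tension between the two is what makes the evaluation pipeline necessary: filtering aggressively to raise precision risks suppressing true leaks and lowering recall.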
GitHub experimented with various techniques to improve detection quality, including trying different LLMs (such as GPT-4 as a confirming scanner), repeated prompting ("voting"), and diverse prompting strategies. Ultimately, they collaborated with Microsoft, adopting its MetaReflection technique, a form of offline reinforcement learning that blends Chain of Thought (CoT) and few-shot prompting to enhance precision.
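The "voting" idea can be sketched in a few lines: query the model several times for the same candidate and take the majority label, smoothing over nondeterministic answers. The `classify` callable here is a stand-in for a real LLM call; the run count and labels are assumptions for illustration.

```python
from collections import Counter

def vote(classify, line: str, runs: int = 5) -> str:
    """Repeated prompting ('voting'): ask the model several times and
    return the majority label."""
    votes = Counter(classify(line) for _ in range(runs))
    return votes.most_common(1)[0][0]

# Simulated model responses standing in for five LLM calls:
answers = iter(["SECRET", "SECRET", "NOT_SECRET", "SECRET", "SECRET"])
label = vote(lambda line: next(answers), 'password = "s3cr3t"')
# -> "SECRET" (4 of 5 votes)
```

A confirming scanner works similarly but asymmetrically: a second, stronger model (GPT-4, per the article) re-checks only the positives, trading extra cost on a small set of candidates for higher precision.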
As stated in the GitHub blog post:
We ultimately ended up using a combination of all these techniques and moved Copilot secret scanning into public preview, opening it widely to all GitHub Secret Protection customers.
To further validate these improvements and gain confidence for general availability, GitHub implemented a “mirror testing” framework. This involved testing prompt and filtering changes on a subset of repositories from the public preview. By rescanning these repositories with the latest improvements, GitHub could assess the impact on real alert volumes and false positive resolutions without affecting users.
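Conceptually, mirror testing is a before-and-after diff over the same repositories: run both the old and new detection logic, then compare which alerts disappear, which are newly raised, and which persist, all without surfacing anything to users. The sketch below assumes scanners that return alert identifiers per repository; the function and repository names are hypothetical.

```python
def mirror_test(repos, old_scan, new_scan):
    """Rescan the same repositories with old and new detection logic and
    compare alert volumes without affecting users."""
    results = []
    for repo in repos:
        before = set(old_scan(repo))
        after = set(new_scan(repo))
        results.append({
            "repo": repo,
            "resolved": len(before - after),  # alerts the new logic drops
            "new": len(after - before),       # alerts only the new logic raises
            "kept": len(before & after),      # alerts both versions agree on
        })
    return results

# Toy alert data standing in for real scan results:
old = {"repo-a": ["a1", "a2", "a3"], "repo-b": ["b1"]}
new = {"repo-a": ["a1"], "repo-b": ["b1", "b2"]}
report = mirror_test(["repo-a", "repo-b"], old.__getitem__, new.__getitem__)
# repo-a: 2 alerts resolved, 1 kept; repo-b: 1 new alert, 1 kept
```

A large "resolved" count paired with a stable "kept" count is the signal GitHub was looking for: false positives dropping while true detections survive.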
This testing revealed a significant drop in both overall detections and false positives, with a 94% reduction in false positives in some cases and minimal impact on the detection of actual passwords. The blog post concludes that:
This before-and-after comparison indicated that all the different changes we made during private and public preview led to increased precision without sacrificing recall, and that we were ready to provide a reliable and efficient detection mechanism to all GitHub Secret Protection customers.
The lessons learned during this development include prioritizing accuracy, using diverse test cases based on user feedback, managing resources effectively, and fostering collaboration. These learnings are also being applied to Copilot Autofix. Since the general availability launch, Copilot secret scanning has been part of security configurations, allowing users to manage which repositories are scanned.