GitHub has announced that starting April 24, interaction data from Copilot Free, Pro, and Pro+ users will be used to train and improve its AI models. Users are opted in by default and must manually disable the setting if they do not want their data used for training. Copilot Business and Enterprise users are excluded from the change.
The announcement, published by GitHub Chief Product Officer Mario Rodriguez, describes the data that may be collected when the setting is enabled: accepted or modified outputs, inputs and code snippets sent to Copilot, code context surrounding the cursor position, comments and documentation, file names, repository structure, navigation patterns, interactions with Copilot features including chat and inline suggestions, and thumbs up/down feedback on suggestions. Users who previously opted out of GitHub’s prompt and suggestion collection setting will have their preference carried over.
GitHub frames the change as necessary to improve model performance. The company says it has already been incorporating interaction data from Microsoft employees and has seen increased suggestion acceptance rates across multiple languages as a result. The FAQ accompanying the announcement states that the change will go into effect on April 24, giving users 30 days’ advance notice.
The scope of the data collection has drawn scrutiny. Private repository code can be collected and used for training when a user is actively working with Copilot in that repository. GitHub distinguishes between code "at rest," which it says it does not access, and code actively sent to Copilot during a session, which falls within the scope of the new policy. The collected data may also be shared with GitHub affiliates, defined in the FAQ as companies in the same corporate family, primarily Microsoft and its subsidiaries. Third-party model providers do not receive this data for their own training purposes.
Community reaction has been largely negative. In the GitHub community discussion, developers criticized the opt-in-by-default approach, with several calling it a dark pattern. User burnhamup wrote:
Dark pattern to not actually link to the page to update your settings in the email with instructions to disable it.
Another commenter, inakarmacoma, noted that the opt-out setting was not available through GitHub’s mobile app. On Reddit, a thread with over 1,000 upvotes raised concerns about model collapse from training on AI-generated code, which now makes up a growing share of GitHub repositories, as well as questions about whether the opt-out toggle provides any enforceable guarantee.
The policy also raises questions for organizations using personal-tier Copilot licenses. One developer in the GitHub discussion noted that individual users within an organization typically do not have the authority to license their employer’s source code to third parties, yet the opt-out is enforced at the user level, not the organization level. A single team member who does not opt out could potentially expose proprietary code through their Copilot interactions. GitHub’s FAQ addresses this partially, stating that interaction data from users whose accounts are members of, or outside collaborators with, a paid organization will be excluded from model training, and that data from paid organization repositories is never used, regardless of the user’s subscription tier.
One Reddit commenter, NeatRuin7406, framed the competitive dimension of the issue more broadly, arguing that the opt-out framing misses the structural problem:
When you use copilot, you’re not just getting suggestions, you’re implicitly teaching the model what good code looks like in your domain. Your proprietary patterns, architecture decisions, domain-specific idioms, naming conventions, all get folded into a general model. That model then improves suggestions for everyone else, including your direct competitors who use the same tool.
Another commenter raised potential GDPR concerns, arguing that GitHub’s stated lawful basis of “legitimate interest” for processing personally identifiable information may not hold up under EU law, since the rights and freedoms of data subjects could be considered overriding in this case.
GitHub’s FAQ acknowledges the comparison to competitors, noting that Microsoft, Anthropic, and JetBrains take similar approaches to using interaction data for model training. Users can opt out at any time through their Copilot settings under the “Allow GitHub to use my data for AI model training” heading.
