GitHub’s Product Security Engineering team secures the code behind GitHub by developing tools like CodeQL to detect and fix vulnerabilities at scale. They’ve shared insights into their approach so other organizations can learn how to use CodeQL to better protect their own codebases.
CodeQL enables automated security analyses by allowing users to query code in a way similar to querying a database. This method is more effective than simple text-based searches as allows it to follow how data moves through the code, spot insecure patterns, and detect vulnerabilities that wouldn’t be obvious from text alone. This provides a deeper understanding of code patterns and uncovers potential security issues.
The team employs CodeQL in various ways to ensure the security of GitHub’s repositories. The standard configuration uses default and security-extended query suites, which are sufficient for the majority of the company’s repositories. This setup allows CodeQL to automatically review pull requests for security concerns.
For certain repositories, such as GitHub’s large Ruby monolith, additional measures are required. In these cases, the team uses a custom query pack tailored to specific security needs. Additionally, multi-repository variant analysis (MRVA) is used to conduct security audits and identify code patterns that warrant further investigation. Custom queries are written to detect potential vulnerabilities unique to GitHub’s codebase.
Initially, custom CodeQL queries were published directly within the repository. However, this approach presented several challenges, including the need to go through the production deployment process for each update, slower analysis times in CI, and issues caused by CodeQL CLI updates. To address these challenges, the team transitioned to publishing query packs in the GitHub Container Registry (GCR). This change streamlined the process, improved maintainability, and reduced friction when updating queries.
When developing a custom query pack, consideration is given to dependencies such as the ruby-all package. By extending classes from the default query suite, the team avoids unnecessary duplication while maintaining concise and effective queries. However, updates to the CodeQL library API can introduce breaking changes, potentially affecting query performance. To mitigate this risk, the team develops queries against the latest version of ruby-all but locks a specific version before release. This ensures that deployed queries run reliably without unexpected issues arising from unintended updates.
To maintain query stability, unit tests are written for each new query. These tests are integrated into the CI pipeline for the query pack repository, enabling early detection of potential issues before deployment. The release process involves several steps, including opening a pull request, writing unit tests, merging changes, incrementing the pack version, resolving dependencies, and publishing the updated query pack to GCR. This structured approach balances development flexibility with the need for stability.
The method of integrating the query pack into repositories depends on the organization’s deployment strategy. Rather than locking a specific version of the query pack in the CodeQL configuration file, GitHub’s security team opted to manage versioning through GCR. This approach allows repositories to automatically use the latest published version while providing the ability to quickly roll back changes if necessary.
One challenge encountered when publishing query packs in GCR was ensuring accessibility across multiple repositories within the organization. Several solutions were considered, including manually granting access permissions, using personal access tokens, and linking repositories to the package for inherited access permissions. The team ultimately implemented the linked repository approach, which efficiently managed permissions across multiple repositories without manual intervention.
GitHub’s security team writes a variety of custom queries to enhance security analysis. These queries focus on identifying high-risk APIs, enforcing secure coding practices, and detecting missing authorization controls in API endpoints. Some queries serve as educational tools rather than strict enforcement mechanisms, using lower severity levels to alert engineers without blocking deployments. This approach allows developers to assess security concerns while ensuring that the most critical vulnerabilities are addressed promptly.