Cloudflare has eliminated manual configuration errors across hundreds of production accounts by implementing Infrastructure as Code with automated policy enforcement. The system processes approximately 30 merge requests daily and catches security violations before deployment rather than after incidents occur.
The company’s Customer Zero team faced a critical problem: a single misconfiguration could propagate across Cloudflare’s global edge in seconds, potentially locking out employees or taking down production services. Manual dashboard management across hundreds of accounts created too many opportunities for human error at this scale.
The solution centered on treating all infrastructure configurations as code with mandatory peer review and automated security checks. Every production change now goes through a validation pipeline that enforces approximately 50 security policies before deployment. Teams still use the dashboard for analytics and observability, but critical production changes require code commits tied to users, tickets, and automated compliance checks.
According to Chase Catelli, Ryan Pesek, and Derek Pitts from Cloudflare’s team, this shift-left approach moves security validation to the earliest stages of development, catching issues when remediation costs are lowest. The model prevents incidents rather than responding to them, while actually increasing engineering velocity by giving teams confidence that their changes are compliant.
The implementation centers on Terraform with the Cloudflare Terraform Provider, integrated into a custom continuous integration and deployment pipeline running on Atlantis with GitLab. All production account configurations live in a centralized monorepo, with individual teams owning and deploying their specific sections as designated code owners.
A custom Go program called tfstate-butler acts as an HTTP backend for Terraform, serving as a secure state file broker. The design prioritizes security by ensuring unique encryption keys per state file, limiting the potential blast radius from any compromise.
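The blast-radius argument can be illustrated with a minimal sketch. It is written in Python for brevity and is not tfstate-butler's actual code (which is a Go program whose internals the article does not describe); the class name `StateKeyStore`, the method `key_for`, and the state paths are all hypothetical. The sketch only shows the idea of issuing an independent random key per state file, so that a leaked key exposes one state file and nothing else.

```python
import secrets


class StateKeyStore:
    """Hypothetical in-memory key store: one independent key per state file."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def key_for(self, state_path: str) -> bytes:
        # Create a fresh random 256-bit key the first time a state file is seen.
        # Keys are never derived from one another, so compromising one key
        # reveals nothing about the keys protecting other state files.
        if state_path not in self._keys:
            self._keys[state_path] = secrets.token_bytes(32)
        return self._keys[state_path]


store = StateKeyStore()
dns_key = store.key_for("accounts/example-a/dns")   # hypothetical path
waf_key = store.key_for("accounts/example-b/waf")   # hypothetical path
```

A real broker would also need durable, access-controlled key storage and an authenticated encryption step, both of which are outside the scope of this sketch.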
Policy enforcement uses the Open Policy Agent (OPA) framework, with policies written in its Rego language, to validate security requirements. Policies run automatically on every merge request in one of two modes: warnings, which add a comment but still allow deployment, or denials, which block the change entirely. Exception handling requires formal Jira-based approval followed by a pull request to document the deviation.
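The two enforcement modes can be sketched in a few lines. Cloudflare's real policies are written in Rego and evaluated by OPA; the Python below is only an illustration of the warn-versus-deny logic, and the policy names, the `change` fields, and the `evaluate` function are assumptions made for the example rather than anything taken from Cloudflare's pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    name: str
    mode: str                          # "warn" or "deny"
    violates: Callable[[dict], bool]   # True when the change breaks the policy


def evaluate(change: dict, policies: list[Policy]) -> dict:
    """Run every policy; only violated 'deny' policies block the change."""
    warnings = [p.name for p in policies if p.mode == "warn" and p.violates(change)]
    denials = [p.name for p in policies if p.mode == "deny" and p.violates(change)]
    return {"allowed": not denials, "warnings": warnings, "denials": denials}


# Hypothetical policies, for illustration only.
POLICIES = [
    Policy("require_modern_tls", "deny",
           lambda c: c.get("min_tls_version") not in {"1.2", "1.3"}),
    Policy("require_description", "warn",
           lambda c: not c.get("description")),
]

blocked = evaluate({"min_tls_version": "1.0", "description": "legacy host"}, POLICIES)
warned = evaluate({"min_tls_version": "1.3"}, POLICIES)
```

In this sketch `blocked` is rejected because a deny policy fired, while `warned` is allowed through with a warning that a reviewer would see as a merge request comment.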
The migration revealed critical lessons about scaling Infrastructure as Code. High barriers to entry initially stalled adoption as Terraform fluency varied across teams. The cf-terraforming command-line utility, which automatically generates Terraform code from the Cloudflare API, significantly accelerated onboarding by eliminating manual resource imports.
Configuration drift emerged when teams made urgent dashboard changes during incidents, leaving Terraform state out of sync. Cloudflare implemented automated drift detection, which continuously compares state files with deployed configurations and automatically creates remediation tickets with service-level agreements when discrepancies are detected.
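At its core, drift detection of this kind is a comparison between what the state file records and what is actually deployed. The sketch below assumes both sides have already been flattened into simple key-value settings; the function names, the ticket fields, and the `sla_days` value are hypothetical and do not reflect Cloudflare's tooling or its actual service-level targets.

```python
from typing import Optional

ABSENT = "<absent>"  # marker for a setting that exists on only one side


def detect_drift(state: dict, deployed: dict) -> list[dict]:
    """Return one finding per setting whose recorded and live values differ."""
    findings = []
    for setting in sorted(set(state) | set(deployed)):
        recorded = state.get(setting, ABSENT)
        live = deployed.get(setting, ABSENT)
        if recorded != live:
            findings.append({"setting": setting, "recorded": recorded, "live": live})
    return findings


def remediation_ticket(resource: str, findings: list[dict], sla_days: int) -> Optional[dict]:
    """Build a ticket payload when drift exists; return None when in sync."""
    if not findings:
        return None
    return {
        "summary": f"Configuration drift on {resource}",
        "sla_days": sla_days,  # hypothetical SLA, chosen by the caller
        "findings": findings,
    }


state = {"ssl": "strict", "min_tls_version": "1.2"}
deployed = {"ssl": "flexible", "min_tls_version": "1.2", "always_use_https": "off"}

findings = detect_drift(state, deployed)
ticket = remediation_ticket("zone/example.com", findings, sla_days=7)
```

Here the dashboard change to `ssl` and the unmanaged `always_use_https` setting are both reported, while the matching `min_tls_version` is not.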
The Cloudflare Terraform Provider also lagged behind the API's capabilities, which created friction as Cloudflare's rapid product innovation outpaced Terraform support. The v5 provider release resolved this by automatically generating code from OpenAPI specifications, maintaining continuous alignment between product APIs and infrastructure code capabilities.
The shift-left model demonstrates how organizations can scale Infrastructure as Code while maintaining strict security governance. By moving validation from reactive audits to proactive automated checks, Cloudflare achieved both increased security and engineering velocity.
Many companies are adopting the shift-left approach. Google Cloud points out that discovering security issues only in production can lead to significant financial penalties, such as GDPR fines of up to 4% of global revenue. Early detection through automated CI/CD security checks can greatly lower remediation costs and reduce the need for architectural changes. OpsMx notes challenges such as implementation barriers, gaps in automation, complex tools, and organizational silos, while emphasizing that automated policy enforcement using frameworks like NIST and OWASP helps teams identify and prioritize risks without burdening developers. According to Splunk's research, 73% of companies see a lack of automation as their main challenge in shift-left practices, but AI-driven tools are quickly improving security testing through smart automation, with adoption rates rising from 64% to 78% in just one year.
The shift-left movement has evolved beyond simply moving security checks earlier. Organizations are now pursuing continuous security validation through automated scanning (SAST, SCA, DAST, secrets management), policy-as-code enforcement, and AI-driven vulnerability prioritization that provides developers with immediate, actionable feedback within their existing workflows.
