Salesforce has completed a phased migration of more than 1,000 Amazon Elastic Kubernetes Service (EKS) clusters from the Kubernetes Cluster Autoscaler to Karpenter, AWS’s open-source node-provisioning and autoscaling solution. The large-scale transition aimed to reduce scaling latency, simplify operations, cut costs, and enable more flexible, self-service infrastructure for internal developers across the company’s extensive Kubernetes fleet.
Facing limitations with Auto Scaling group-based autoscaling and the Cluster Autoscaler, including slow scale-up times, poor utilization across availability zones, and a proliferation of thousands of node groups, Salesforce’s platform team built custom tooling to automate and manage the migration safely and reliably. This approach combined carefully orchestrated node transitions with automation that respected Pod Disruption Budgets (PDBs), supported rollback paths, and integrated with the company’s CI/CD provisioning pipelines.
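In Kubernetes, a PDB caps how many pods of an application may be voluntarily evicted at once, so migration automation has to verify that rotating a node will not violate one. The sketch below, written against the official Kubernetes Python client, shows one way such a pre-check can work; it is an illustrative reconstruction rather than Salesforce's actual tooling, and it only inspects matchLabels selectors for brevity:

```python
# pdb_precheck.py - check whether every pod on a node can currently be
# disrupted without violating a PodDisruptionBudget. Illustrative sketch only.
from kubernetes import client, config

def node_is_safe_to_rotate(node_name: str) -> bool:
    config.load_kube_config()  # or load_incluster_config() when run in-cluster
    core = client.CoreV1Api()
    policy = client.PolicyV1Api()

    # List the pods currently scheduled on the node.
    pods = core.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={node_name}"
    ).items

    for pod in pods:
        labels = pod.metadata.labels or {}
        # Fetch PDBs in the pod's namespace and see if any selects this pod.
        pdbs = policy.list_namespaced_pod_disruption_budget(
            pod.metadata.namespace
        ).items
        for pdb in pdbs:
            selector = pdb.spec.selector.match_labels if pdb.spec.selector else None
            if selector and all(labels.get(k) == v for k, v in selector.items()):
                # disruptions_allowed == 0 means an eviction would block.
                if pdb.status and (pdb.status.disruptions_allowed or 0) == 0:
                    return False
    return True
```

A fleet-level tool would run a check like this before each node transition and route blocked nodes to a retry or escalation path instead of stalling the rollout.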
The migration journey began in mid-2025 with lower-risk environments and progressed through testing and validation phases before production adoption in early 2026. Salesforce’s engineers developed an in-house Karpenter transition tool and patching checks that handled node rotation, Amazon Machine Image (AMI) validation, and graceful pod eviction, enabling repeatable and consistent conversion across diverse node pool configurations.
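Salesforce has not published the transition tool itself, but the node-rotation pattern it describes (cordon a node, then evict its pods gracefully) can be sketched with the Kubernetes Eviction API, which enforces PDBs server-side and rejects evictions that would violate them. Function names and flow here are assumptions for illustration:

```python
# rotate_node.py - cordon a node and evict its pods via the Eviction API.
# A minimal sketch of the general pattern, not Salesforce's in-house tool.
from kubernetes import client, config
from kubernetes.client.rest import ApiException

def rotate_node(node_name: str) -> None:
    config.load_kube_config()
    core = client.CoreV1Api()

    # Cordon: mark the node unschedulable so no new pods land on it.
    core.patch_node(node_name, {"spec": {"unschedulable": True}})

    pods = core.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={node_name}"
    ).items

    for pod in pods:
        # Skip DaemonSet pods; they are bound to the node itself.
        owners = pod.metadata.owner_references or []
        if any(o.kind == "DaemonSet" for o in owners):
            continue
        eviction = client.V1Eviction(
            metadata=client.V1ObjectMeta(
                name=pod.metadata.name, namespace=pod.metadata.namespace
            )
        )
        try:
            core.create_namespaced_pod_eviction(
                name=pod.metadata.name,
                namespace=pod.metadata.namespace,
                body=eviction,
            )
        except ApiException as e:
            if e.status == 429:
                # A PDB currently blocks this eviction; retry later.
                print(f"PDB blocked eviction of {pod.metadata.name}; will retry")
            else:
                raise
```

An eviction rejected with HTTP 429 signals that a PDB currently blocks the disruption, which is the natural hook for the retry and rollback paths described above.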
Through this transition, the team resolved operational challenges such as misconfigured PDBs that blocked node replacements, Kubernetes label length constraints that caused automation failures, and workload patterns where Karpenter’s efficient bin-packing needed adjustments to prevent disruptions for single-replica applications. These insights led to refined practices, including proactive policy validation and workload-aware disruption strategies.
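The label failures stem from Kubernetes capping label keys and values at 63 characters, a limit fleet automation must validate before stamping node metadata. For single-replica workloads, Karpenter's documented karpenter.sh/do-not-disrupt pod annotation is one way to opt out of voluntary consolidation; the deployment name below is a hypothetical example:

```python
# Patch a single-replica Deployment so Karpenter will not voluntarily
# disrupt its pod during consolidation. "reporting-svc" is a hypothetical name.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

patch = {
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    # Karpenter skips pods carrying this annotation when it
                    # consolidates or replaces nodes (voluntary disruption only;
                    # it does not protect against spot interruptions).
                    "karpenter.sh/do-not-disrupt": "true"
                }
            }
        }
    }
}
apps.patch_namespaced_deployment("reporting-svc", "default", patch)
```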
Salesforce reported measurable operational and cost improvements following the migration. With Karpenter's dynamic provisioning model, cluster scaling latency decreased from minutes to seconds, node utilization improved through smarter bin-packing, and reliance on static Auto Scaling groups was significantly reduced.
Operational overhead decreased by approximately 80% as automated processes replaced manual node group management, enabling developers to declare node pool configurations themselves and onboard workloads without depending on central platform teams. Additionally, initial results showed cost savings of about 5% in FY2026, with an anticipated further 5–10% reduction in FY2027 as Karpenter's bin-packing and spot instance utilization continue to optimize resources.
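Salesforce has not shared its exact self-service interface, but with Karpenter a node pool declaration typically amounts to applying a NodePool custom resource. The sketch below uses the Kubernetes Python client with a hypothetical pool name, illustrative limits, and a placeholder EC2NodeClass; the requirements also show how a single pool can span spot and on-demand capacity and both x86 and ARM architectures:

```python
# Declare a Karpenter (v1 API) NodePool as a custom resource. The pool name,
# CPU limit, and EC2NodeClass reference ("default") are illustrative only.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

node_pool = {
    "apiVersion": "karpenter.sh/v1",
    "kind": "NodePool",
    "metadata": {"name": "team-payments"},
    "spec": {
        "template": {
            "spec": {
                "requirements": [
                    # Let Karpenter choose x86 or ARM, spot or on-demand.
                    {"key": "kubernetes.io/arch",
                     "operator": "In", "values": ["amd64", "arm64"]},
                    {"key": "karpenter.sh/capacity-type",
                     "operator": "In", "values": ["spot", "on-demand"]},
                ],
                "nodeClassRef": {
                    "group": "karpenter.k8s.aws",
                    "kind": "EC2NodeClass",
                    "name": "default",
                },
            }
        },
        # Cap the pool so one team cannot consume the whole cluster.
        "limits": {"cpu": "1000"},
        "disruption": {
            "consolidationPolicy": "WhenEmptyOrUnderutilized",
            "consolidateAfter": "1m",
        },
    },
}

# NodePool is cluster-scoped, so use the cluster-level API call.
api.create_cluster_custom_object(
    group="karpenter.sh", version="v1", plural="nodepools", body=node_pool
)
```

In a self-service model, a manifest like this would live in a team's repository and flow through the CI/CD provisioning pipelines mentioned earlier, with policy validation gating what limits and instance requirements teams may set.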
Salesforce’s migration highlights broader trends in large-scale Kubernetes operations, where traditional autoscaling mechanisms struggle to keep pace with dynamic workloads and heterogeneous infrastructure demands. Karpenter’s real-time decision-making, support for heterogeneous instance types (including GPU and ARM), and tighter integration with cloud APIs enable faster responsiveness and more efficient node usage compared to Cluster Autoscaler.
Other organizations undertaking large-scale transitions from traditional Kubernetes autoscaling to more dynamic solutions like Karpenter have faced many of the same structural challenges that Salesforce documented. For example, Coinbase has publicly described its move toward Karpenter to handle complex mixed-workload clusters with variable demand patterns, citing improvements in scale-up latency and resource efficiency while reducing operational friction caused by static node groups. In a similar vein, BMW Group shared how adopting Karpenter across its automotive platforms allowed better use of spot instances and workload-aware scheduling, enabling faster developer feedback loops and reduced infrastructure cost volatility. These cases echo Salesforce’s observation that Cluster Autoscaler’s reliance on predefined Auto Scaling groups and slower decision paths can hinder responsiveness in environments with diverse and bursty workloads.
What sets Salesforce’s migration apart is its scale: transitioning over 1,000 distinct EKS clusters required bespoke automation to handle policy validation, Pod Disruption Budget constraints, Kubernetes label limits, and incremental rollout at a fleet level. Other companies have reported benefits from Karpenter in individual clusters or smaller fleets, but Salesforce’s approach emphasizes repeatable, automated conversion at enterprise scale with integrated rollback and compliance safeguards. In practice, this meant not just replacing autoscaling logic, but also harmonizing workload patterns, governance controls, and developer self-service expectations across a global platform. While the end goals of faster scaling, better utilization, and reduced manual overhead are shared across these migrations, Salesforce’s blueprint highlights the operational discipline and custom automation required to bring such benefits to large, production-critical environments.
As enterprises increasingly adopt Kubernetes for mission-critical services, Salesforce’s experience offers a blueprint for other organizations weighing similar transitions, demonstrating that automated, federated autoscaling can lead to substantial gains in performance, cost efficiency, and developer velocity, provided that careful planning and tooling support underpin the change.
