IBM Cloud Code Engine, the company’s fully managed, strategic serverless platform, has introduced Serverless Fleets with integrated GPU support. With this new capability, the company directly addresses the challenge of running large-scale, compute-intensive workloads such as enterprise AI, generative AI, machine learning, and complex simulations on a simplified, pay-as-you-go serverless model.
Historically, serverless platforms have struggled to support these demanding, parallel workloads efficiently, as noted in academic work including a recent Cornell University paper; such workloads often require thousands or even millions of tasks to execute simultaneously on specialized hardware. With Serverless Fleets, IBM aims to close this gap by offering high-performance computing resources without the operational complexity of managing dedicated infrastructure.
Michael Behrendt, CTO for Serverless and IBM Distinguished Engineer, commented in a LinkedIn post:
The architecture of this capability was informed and driven a lot by running large real-world workloads with 100,000s of processors. It is built in such a robust way that it can run these workloads with essentially zero SRE staff.
Serverless Fleets simplifies how data scientists and developers execute compute-intensive tasks by providing a single endpoint for submitting a large number of batch jobs. In a blog post, IBM mentions that Code Engine then automatically handles the infrastructure orchestration:
- The service automatically provisions the necessary compute resources, including virtual machines (VMs) and serverless GPUs such as the NVIDIA L40, to run many tasks simultaneously.
- Serverless Fleets is designed for run-to-completion tasks that scale elastically: the system determines the optimal number of worker instances and deploys them to handle parallel execution efficiently.
- Once the workloads complete, the resources are automatically removed, so users are charged only for what they consume during execution.
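The run-to-completion model described above mirrors Code Engine's existing batch-job pattern, in which each task instance receives its position in the array via a `JOB_INDEX` environment variable. The sketch below is a minimal, hypothetical worker illustrating that pattern; the chunking scheme and the item/task counts are illustrative assumptions, not part of IBM's Serverless Fleets API.

```python
import os


def assigned_items(total_items: int, total_tasks: int) -> range:
    """Return the slice of work items this task instance should process.

    Code Engine batch jobs inject a 0-based JOB_INDEX into each
    instance, letting a fleet of identical workers partition the
    work without coordinating with one another.
    """
    index = int(os.environ.get("JOB_INDEX", "0"))
    chunk = (total_items + total_tasks - 1) // total_tasks  # ceiling division
    start = index * chunk
    end = min(start + chunk, total_items)
    return range(start, end)


if __name__ == "__main__":
    # Illustrative numbers: 10 work items split across 4 task instances.
    for item in assigned_items(total_items=10, total_tasks=4):
        print(f"processing item {item}")
    # The process exits once its slice is done -- the run-to-completion
    # behavior that lets the platform tear the resources back down.
```

Because each instance simply exits when its slice is finished, the platform can observe completion and deprovision the underlying VMs and GPUs, which is what makes the pay-per-use billing described above possible.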
With the launch of IBM Cloud Code Engine's Serverless Fleets, IBM enters a competitive field. Other hyperscalers offer comparable options: AWS Fargate runs containers on serverless compute (often paired with ECS or EKS for orchestration), and Azure provides serverless GPUs in Container Apps. IBM, however, emphasizes a unified environment: a single platform for web apps, functions, and now large-scale, GPU-accelerated batch jobs.
Where competitors may require developers to stitch together multiple services (for example, a serverless runtime, a container service, and a batch orchestrator), Serverless Fleets fully manages the provisioning and elastic scaling of GPU-backed virtual machines from a single endpoint, reducing the complexity and operational overhead of running elastic, GPU-intensive workloads in the cloud. In a Medium blog post, Luke Roy concluded:
Whether you’re working on media processing, AI inference, or scientific workloads, IBM Cloud Code Engine Serverless Fleets provides a robust and developer-friendly solution.
The company stated in a blog post that, in today’s competitive landscape, enterprises across industries need to deliver services quickly and conveniently while prioritizing security, resilience, and cost savings.
