Cloud platform-as-a-service company Vercel has published a deep dive into Hive, its new low-level compute platform that powers the infrastructure for its customers’ builds. Vercel has used Hive since November 2023 for untrusted and ephemeral computing tasks.
Writing in the blog post, Vercel’s Product Manager for Builds and Compute, Mariano Cocirio, and Principal Engineer Guðmundur Bjarni Ólafsson explain that they developed Hive to address the growing demands of customers while maintaining security in a multi-tenant environment.
“We built Hive because we needed finer control and more granular management to continuously improve Vercel’s infrastructure.”
The platform is made up of a system of regional clusters called ‘hives’, each functioning independently with its own failure boundary. Each cluster contains several key components, including ‘boxes’ (bare metal servers), ‘cells’ (virtual machines), a control plane for orchestration, and a dedicated API for each Hive instance.
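As a rough illustration of that hierarchy, the sketch below models hives, boxes, and cells as nested Python dataclasses. The field names, the region label, and the API URL are all hypothetical and chosen only for this example; Vercel has not published its internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A virtual machine that runs a build (field names are illustrative)."""
    cell_id: str
    vcpus: int
    memory_mib: int


@dataclass
class Box:
    """A bare metal server that hosts several cells."""
    box_id: str
    cells: list[Cell] = field(default_factory=list)


@dataclass
class Hive:
    """A regional cluster with its own API and failure boundary."""
    region: str   # hypothetical region label
    api_url: str  # hypothetical: the source only says each hive has a dedicated API
    boxes: list[Box] = field(default_factory=list)

    def cell_count(self) -> int:
        # Total number of cells across every box in this hive.
        return sum(len(box.cells) for box in self.boxes)


hive = Hive(
    region="region-a",
    api_url="https://hive-region-a.example.invalid",
    boxes=[
        Box("box-1", [Cell("cell-a", 4, 8192), Cell("cell-b", 4, 8192)]),
        Box("box-2", [Cell("cell-c", 8, 16384)]),
    ],
)
print(hive.cell_count())
```

Because each hive owns its boxes and cells outright, losing one `Hive` object in this model leaves the others untouched, which mirrors the independent failure boundary described above.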
At the technical level, Hive uses Kernel-based Virtual Machine (KVM) technology and Firecracker, an open-source virtual machine monitor for running microVMs that was initially developed at AWS to improve AWS Lambda and AWS Fargate. Together, these technologies create secure, isolated environments for running build processes. The system uses a ‘box daemon’ to manage provisioning and to communicate with ‘cell daemons’, which control the build containers that execute customer workloads.
Vercel has further streamlined the build process by pre-warming cells, allowing most builds to start immediately without having to wait for the cells to be created. When a build is initiated, the system selects an appropriate Hive cluster based on customer and build configurations, and then executes the build within a container inside a cell. After completion, the cell is destroyed, maintaining the platform’s ephemeral nature.
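The pre-warming idea can be sketched as a small pool that boots cells ahead of time and never reuses them. This is a minimal model under stated assumptions, not Vercel's implementation: `WarmCellPool`, `run_build`, and the fixed target size are hypothetical names and choices, and booting a cell is simulated by generating an ID.

```python
from collections import deque
from itertools import count


class WarmCellPool:
    """Keeps a fixed number of pre-booted cells ready (hypothetical design)."""

    def __init__(self, target_size: int) -> None:
        self._ids = count(1)
        self._target = target_size
        self._ready: deque[str] = deque()
        self.refill()

    def _boot_cell(self) -> str:
        # Stand-in for the slow step of booting a real virtual machine.
        return f"cell-{next(self._ids)}"

    def refill(self) -> None:
        # Boot new cells until the pool is back at its target size.
        while len(self._ready) < self._target:
            self._ready.append(self._boot_cell())

    def acquire(self) -> tuple[str, bool]:
        """Return (cell_id, was_warm); fall back to a cold boot if empty."""
        if self._ready:
            return self._ready.popleft(), True
        return self._boot_cell(), False


def run_build(pool: WarmCellPool, command: str) -> str:
    cell, was_warm = pool.acquire()
    try:
        # A real system would run the build container inside the cell here.
        return f"{cell} ran {command!r} (warm={was_warm})"
    finally:
        # The used cell is never returned to the pool: it is treated as
        # destroyed, and a fresh one is booted to replace it.
        pool.refill()


pool = WarmCellPool(target_size=2)
print(run_build(pool, "npm run build"))
print(run_build(pool, "npm run build"))
```

The key design point is that `acquire` hands out a cell exactly once. Replenishing the pool with new cells rather than recycling old ones keeps each build's environment ephemeral while still hiding the boot time from most builds.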
Vercel has focused on security and isolation in Hive’s architecture. Each virtual machine is assigned dedicated CPUs and memory, while disk and network throughput are rate-limited based on overall capacity and box division. This approach allows potentially malicious code to be executed safely in a multi-tenant environment without compromising either system integrity or performance.
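To make the idea of dividing a box concrete, the sketch below splits a box's resources evenly among its cells. The even split, the capacity figures, and every name in the code are assumptions made for illustration; the source says only that limits depend on overall capacity and how the box is divided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxCapacity:
    """Total resources of one bare metal server (figures are made up)."""
    cpus: int
    memory_gib: int
    disk_mbps: int
    network_mbps: int


@dataclass(frozen=True)
class CellAllocation:
    """Dedicated CPU and memory, plus throughput caps, for one cell."""
    cpus: int
    memory_gib: int
    disk_mbps_limit: int
    network_mbps_limit: int


def divide_box(box: BoxCapacity, cells: int) -> CellAllocation:
    # Assumption: every cell on the box gets an equal share.
    if cells <= 0:
        raise ValueError("a box must be divided into at least one cell")
    return CellAllocation(
        cpus=box.cpus // cells,
        memory_gib=box.memory_gib // cells,
        disk_mbps_limit=box.disk_mbps // cells,
        network_mbps_limit=box.network_mbps // cells,
    )


box = BoxCapacity(cpus=64, memory_gib=256, disk_mbps=4000, network_mbps=10000)
print(divide_box(box, cells=8))
```

With these example figures, eight cells each receive 8 CPUs, 32 GiB of memory, and caps of 500 MB/s disk and 1,250 Mb/s network throughput, so no single build can starve its neighbours.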
In a YouTube video discussing the Vercel blog post, Mehul Mohan from Codedamn explains the practical workflow: when developers push code to GitHub, a webhook triggers Vercel’s build process, which illustrates why Hive has to treat every build as untrusted and ephemeral. Mohan draws a parallel to Codedamn’s playgrounds, which face similar challenges in running untrusted code, to show why Vercel’s architecture uses multiple layers of isolation.
In a post on Sum Of Bytes, Arpit Kumar discusses various approaches to virtualisation in serverless computing, with a particular focus on MicroVMs as a solution that balances security, isolation, and performance. Traditional containers, while fast and efficient, share the host kernel and can be a security risk if malicious code causes kernel panics. This led to the development of MicroVMs, with Firecracker being a notable example that can start in as little as 300ms.
Kumar compares the isolation approaches used by large companies: AWS uses Firecracker for Lambda and Fargate, Cloudflare uses V8 isolates to run Workers, and Google offers gVisor, a security-focused sandbox for running containers. He also references WebAssembly as another sandboxed approach with near-native speed, and other emerging Rust-based MicroVM alternatives.
Hive also integrates with Vercel’s Secure Compute product, enabling organizations to use private network connections for sensitive build processes. The integration represents a significant improvement over previous solutions, particularly in provisioning times for secure builds.
In an additional post, Cocirio and Ólafsson explain how this works:
“Each Hive cell initiates a secure tunnel with the connector’s WireGuard interface. Keys are generated at instance boot and distributed via the Key Exchange Service, with no persistence or reuse.”
The platform’s security model uses Linux network namespaces, with each WireGuard interface operating in its own namespace within the box. This architecture ensures that all cell traffic is encrypted and correctly routed through a secure tunnel back to the customer infrastructure.
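One common way to place a WireGuard interface in its own namespace on Linux uses the standard `ip` and `wg` tools. The sketch below only builds the command lines and runs nothing. The naming scheme, key path, peer key, endpoint, and allowed IP range are placeholders, and the exact sequence Vercel uses has not been published.

```python
import shlex


def tunnel_commands(
    cell_id: str,
    private_key_path: str,
    peer_public_key: str,
    endpoint: str,
    allowed_ips: str,
) -> list[list[str]]:
    """Build (but do not run) commands for a per-cell WireGuard namespace."""
    ns = f"ns-{cell_id}"     # hypothetical naming scheme
    iface = f"wg-{cell_id}"  # Linux interface names are limited to 15 characters
    return [
        # Create an isolated network namespace for this cell.
        ["ip", "netns", "add", ns],
        # Create the WireGuard interface, then move it into the namespace.
        ["ip", "link", "add", iface, "type", "wireguard"],
        ["ip", "link", "set", iface, "netns", ns],
        # Configure the tunnel's key and its single peer (the connector).
        ["ip", "netns", "exec", ns, "wg", "set", iface,
         "private-key", private_key_path,
         "peer", peer_public_key,
         "endpoint", endpoint,
         "allowed-ips", allowed_ips],
        # Bring the interface up inside the namespace.
        ["ip", "-n", ns, "link", "set", iface, "up"],
    ]


# All values below are placeholders for illustration only.
for cmd in tunnel_commands(
    cell_id="c42",
    private_key_path="/run/keys/c42.key",
    peer_public_key="PEER_PUBLIC_KEY_PLACEHOLDER",
    endpoint="connector.example.invalid:51820",
    allowed_ips="10.0.0.0/8",
):
    print(shlex.join(cmd))
```

Keeping each interface in a separate namespace means a cell sees only its own tunnel, so traffic from one customer's build cannot be routed through another customer's connection.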
The gains over the previous Fargate-based solution are most evident in provisioning time. Secure builds previously required up to 90 seconds of provisioning time within the private network. The new Hive-based architecture has reduced this to just 5 seconds while maintaining the same level of network security, alongside a 30% improvement in overall build performance compared with previous solutions. Vercel partly attributes these gains to optimisations such as Docker image caching, which alone has reduced startup times by approximately 45 seconds.
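To put the provisioning figures in perspective, the short calculation below works out what a drop from 90 seconds to 5 seconds means in relative terms. It uses only the two numbers quoted above; the function name is hypothetical.

```python
def provisioning_gain(before_s: float, after_s: float) -> tuple[float, float, float]:
    """Return (seconds saved, fractional reduction, speed-up factor)."""
    saved = before_s - after_s
    return saved, saved / before_s, before_s / after_s


saved, reduction, speedup = provisioning_gain(before_s=90, after_s=5)
print(f"{saved} s saved, {reduction:.1%} reduction, {speedup:.0f}x faster")
```

In other words, provisioning for secure builds takes roughly one-eighteenth of the time it used to, a reduction of about 94%.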
The platform’s success has led Vercel to consider expanding the use of Hive to other areas of its business. While currently focused on infrastructure for customer builds, the company describes Hive as a general-purpose compute platform with the potential for broader applications in the future. Work to further enhance caching strategies and optimise repo cloning is underway as Vercel continues exploring improvements and new use cases.