Amazon Web Services (AWS) has officially launched the Amazon EC2 P5e instances, powered by NVIDIA H200 Tensor Core GPUs, to enhance its computing infrastructure for AI, machine learning, and high-performance computing (HPC) applications.
According to the company, the EC2 P5e instances bring significant improvements in performance, cost-efficiency, and scalability over their predecessors, the EC2 P5 instances, which were already known for their powerful computing capabilities.
The P5e instances are equipped with eight NVIDIA H200 GPUs, offering larger GPU memory capacity and higher memory bandwidth than the P5 instances. They support up to 3,200 Gbps of networking using second-generation Elastic Fabric Adapter (EFA) technology and are deployed in Amazon EC2 UltraClusters for low-latency, large-scale processing.
(Source: AWS Machine Learning blog post)
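For readers who want to confirm these specifications programmatically, the following sketch (not part of the AWS blog post) queries the published instance-type metadata with boto3; the region choice is an assumption based on the availability noted at the end of this article.

```python
# Minimal sketch: inspect the p5e.48xlarge specs (GPU count/memory, EFA support)
# via the EC2 DescribeInstanceTypes API. Region is assumed to be US East (Ohio).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")

resp = ec2.describe_instance_types(InstanceTypes=["p5e.48xlarge"])
info = resp["InstanceTypes"][0]

for gpu in info["GpuInfo"]["Gpus"]:
    print(gpu["Manufacturer"], gpu["Name"], "x", gpu["Count"],
          "-", gpu["MemoryInfo"]["SizeInMiB"], "MiB each")

net = info["NetworkInfo"]
print("Network performance:", net["NetworkPerformance"])
print("EFA supported:", net["EfaSupported"])
```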
Organizations can leverage the P5e instances for a variety of advanced use cases, such as large language model (LLM) training and inference (for example, OpenAI’s GPT or Google’s BERT) and high-performance simulations, including weather forecasting, genomics research, and fluid dynamics modeling.
The authors of an AWS Machine Learning blog post on the EC2 P5e instances write:
The higher memory bandwidth of the H200 GPUs in the P5e instances allows the GPU to fetch and process data from memory more quickly. This reduces inference latency, which is critical for real-time applications like conversational AI systems where users expect near-instant responses. The higher memory bandwidth enables higher throughput, allowing the GPU to process more inferences per second.
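As a rough illustration of the two metrics the quote distinguishes, the sketch below times a hypothetical generate() call; it is not tied to the P5e instances or any particular model, and the stand-in function merely simulates an inference delay.

```python
# Illustrative only: measuring per-request latency and overall throughput for a
# hypothetical inference function. generate() is a placeholder for a real model call.
import time

def generate(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for actual model inference
    return "response"

prompts = ["hello"] * 20

start = time.perf_counter()
latencies = []
for p in prompts:
    t0 = time.perf_counter()
    generate(p)
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

print(f"avg latency: {sum(latencies) / len(latencies) * 1000:.1f} ms")
print(f"throughput: {len(prompts) / elapsed:.1f} inferences/sec")
```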
When launching P5e instances, users can utilize AWS Deep Learning AMIs (DLAMI), which provide ML practitioners and researchers with the infrastructure and tools to quickly build scalable, secure, distributed ML applications in pre-configured environments. Users can also run containerized applications on P5e instances using AWS Deep Learning Containers, with libraries designed for Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS).
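One common pattern is to look up a recent Deep Learning AMI and launch an instance from it with boto3, as in the sketch below; the AMI name filter and the subnet and security-group identifiers are illustrative assumptions, not values from the blog post.

```python
# Sketch (assumptions noted in comments): launching a P5e instance from a
# Deep Learning AMI. The AMI name pattern and network IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")

# Find the most recent Amazon-owned Deep Learning AMI matching a name pattern
# (the exact pattern varies by framework and OS version).
images = ec2.describe_images(
    Owners=["amazon"],
    Filters=[{"Name": "name", "Values": ["Deep Learning AMI GPU PyTorch*"]}],
)["Images"]
latest = max(images, key=lambda img: img["CreationDate"])

ec2.run_instances(
    ImageId=latest["ImageId"],
    InstanceType="p5e.48xlarge",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",        # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder
)
```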
Azure and Google Cloud offer instances comparable to the AWS EC2 P5e instances, designed for high-performance computing (HPC) and AI/ML workloads. Azure provides NDv5-series virtual machines equipped with NVIDIA Tensor Core GPUs, while Google Cloud offers A3 instances powered by NVIDIA H100 GPUs.
Sanjay Siboo, a director of cloud solutions at Tata Communications, tweeted:
GPUs have become increasingly important for several large software firms, such as AWS, Google, and OpenAI, as the demand for generative AI continues to grow steadily.
Currently, P5e instances in the p5e.48xlarge size are available in the US East (Ohio) AWS region through EC2 Capacity Blocks for ML.
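For teams planning capacity, the sketch below shows one way this reservation flow could look with boto3; it assumes the EC2 Capacity Blocks APIs (DescribeCapacityBlockOfferings and PurchaseCapacityBlock) and their parameter names as documented at the time of writing, and is not taken from the AWS post.

```python
# Sketch (assumed APIs/parameters): find and purchase a Capacity Block for one
# p5e.48xlarge instance in US East (Ohio) for a 24-hour window.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")

offerings = ec2.describe_capacity_block_offerings(
    InstanceType="p5e.48xlarge",
    InstanceCount=1,
    CapacityDurationHours=24,
    StartDateRange=datetime.now(timezone.utc),
    EndDateRange=datetime.now(timezone.utc) + timedelta(days=14),
)["CapacityBlockOfferings"]

if offerings:
    offering = offerings[0]  # pick the first available offering for simplicity
    ec2.purchase_capacity_block(
        CapacityBlockOfferingId=offering["CapacityBlockOfferingId"],
        InstancePlatform="Linux/UNIX",
    )
```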