Together Computer Inc., a startup building a cloud service optimized for artificial intelligence model development and deployment, today announced the general availability of Instant Clusters, a service that automates the provisioning of clusters of graphics processing units.
The company, which operates as Together AI, stated that its service allows customers to access GPU clusters, ranging from a single node with eight GPUs to large, multi-node systems with hundreds of processors, using a single application programming interface. It supports the latest Nvidia Corp. hardware, including Hopper and Blackwell GPUs, and is optimized for use cases such as distributed training and elastic inference.
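Together AI has not published the exact call behind that single API in the announcement, but a minimal sketch of what a one-call cluster request could look like, using a hypothetical endpoint, field names and token, is shown below:

```python
# Minimal sketch of provisioning a GPU cluster through a single API call.
# The endpoint, field names and token variable are hypothetical and meant
# to illustrate the pattern, not Together AI's actual API.
import os
import requests

API_URL = "https://api.example.com/v1/instant-clusters"  # hypothetical endpoint
headers = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

payload = {
    "name": "training-cluster",
    "gpu_type": "h100",      # Hopper- or Blackwell-class GPUs
    "nodes": 8,              # eight nodes x eight GPUs = 64 GPUs
    "orchestrator": "kubernetes",
}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()
print(resp.json())  # cluster ID, status and connection details would be returned here
```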
The service has been in beta testing since early summer, and the GA release includes several updates based on user feedback, said Charles Zedlewski, chief product officer at Together AI. Among them are improved autoscaling features, the ability to extend reserved infrastructure dynamically and support for the infrastructure-as-code tools SkyPilot and Terraform.
“We added Terraform support so that people could build their own automations around these GPU clusters,” Zedlewski said. “We also added the ability to recreate clusters and remount them with the original data and storage.”
This remounting capability supports episodic training workloads, in which users pause and resume training jobs over extended periods, a pattern common in large-scale model development.
GPU cloud
Instant Clusters are designed to emulate the user experience of conventional cloud infrastructure while handling the specific demands of AI workloads. Clusters come preloaded with drivers, schedulers and networking components, including GPU Operator, Nvidia Network Operator and InfiniBand interconnects. Configuring those components manually can take days, the company said.
Zedlewski said that because GPU infrastructure differs fundamentally from traditional CPU environments, setup and configuration have remained largely manual processes. “The whole stack of virtualization and automation around GPU infrastructure is meaningfully different than the equivalent stack that we’ve known for a long time with x86 CPU infrastructure,” he said. Cloud computing providers have spent 20 years fine-tuning CPU infrastructure but are still learning how to optimize for AI.
Together AI said it performs hardware checks, stress tests and inter-node communication validations before making clusters available. “If you provisioned an eight-node, 64-GPU cluster, we basically pretest every node before it shows up in your environment,” Zedlewski said.
Instant Clusters are optimized for use with Kubernetes, Slurm and other orchestration tools. Customers can lock in specific driver and Nvidia Cuda versions and reuse custom container images to simplify reproducibility across training and inference phases.
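Because the clusters are Kubernetes-ready, reusing a pinned container image amounts to referencing it in an ordinary pod spec. Below is a minimal sketch using the Kubernetes Python client; the image tag, GPU count and job command are chosen purely for illustration, not drawn from Together AI's documentation:

```python
# Minimal sketch: launching a pod on a Kubernetes-orchestrated GPU cluster
# with a pinned CUDA container image. Image tag and GPU count are illustrative;
# any custom image built against a known driver/CUDA pair follows the same pattern.
from kubernetes import client, config

config.load_kube_config()  # assumes the cluster's kubeconfig is available locally

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-job"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:24.06-py3",  # pinned CUDA/framework image
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "8"}  # request all eight GPUs on the node
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```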
Storage can be mounted to clusters on demand. Though customers must use Together AI’s POSIX-compliant parallel file system, storage and compute can be scaled independently.
The service supports variable pricing based on usage duration, with hourly, daily and multimonth commitments available. A low-end Nvidia HGX H100 inference cluster ranges from $1.76 to $2.39 per hour depending on the length of the customer’s commitment. Nvidia’s high-end HGX B200 costs $4 per hour with a long-term commitment and $5.50 per hour for on-demand usage.
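To put the gap between committed and on-demand rates in perspective, a quick back-of-the-envelope comparison using the quoted B200 figures, and assuming a month of continuous use at a flat hourly rate, works out as follows:

```python
# Back-of-the-envelope comparison of the quoted HGX B200 rates over a month
# of continuous use. Hourly rates come from the article; the 30-day window
# is an assumption for illustration.
HOURS_PER_MONTH = 30 * 24          # 720 hours

b200_committed = 4.00              # $/hour with a long-term commitment
b200_on_demand = 5.50              # $/hour on demand

committed_cost = b200_committed * HOURS_PER_MONTH
on_demand_cost = b200_on_demand * HOURS_PER_MONTH

print(f"Committed: ${committed_cost:,.2f}")                     # $2,880.00
print(f"On-demand: ${on_demand_cost:,.2f}")                     # $3,960.00
print(f"Savings:   ${on_demand_cost - committed_cost:,.2f}")    # $1,080.00
```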
Zedlewski said most organizations would struggle to match the infrastructure’s cost-efficiency by building in-house: “I’d be very surprised if anyone attempts to roll their own,” he said.