Hugging Face has introduced its latest offering, Hugging Face Generative AI Services (HUGS), aimed at simplifying the deployment and scaling of generative AI applications using open-source models.
Built on Hugging Face technologies such as Transformers and Text Generation Inference (TGI), HUGS promises optimized performance across various hardware accelerators.
For developers using AWS or Google Cloud, the service is available at $1 per hour per container, with a five-day free trial on AWS to help users get started.
Streamlining AI with zero-configuration inference
HUGS offers developers a solution to run AI models on their own infrastructure without the need for manual configuration. One of the primary challenges when deploying large language models (LLMs) is optimizing them for specific hardware environments. Each accelerator, whether it is an NVIDIA GPU or an AMD GPU, requires fine-tuning to extract maximum performance.
With HUGS, these optimizations are managed automatically, delivering high throughput out of the box. In addition to NVIDIA and AMD GPUs, the company promises that its support will soon extend to AWS Inferentia and Google TPUs.
Hugging Face aims to ease the transition from black-box APIs to open, self-hosted solutions with support for a wide array of models, including well-known LLMs like Llama and Gemma, with plans to introduce multimodal models such as Idefics and Llava soon. In the future, the company says it will include embedding models like BGE and Jina, giving developers even more options to customize their AI applications.
This service uses standardized APIs compatible with OpenAI’s model interfaces, therefore, developers can migrate their own code.
For startups in particular, HUGS provides an opportunity to build AI applications without incurring the high costs associated with proprietary platforms. The availability of one-click deployments on DigitalOcean makes it even easier for small teams to experiment with generative AI technologies.
Meanwhile, larger enterprises can leverage HUGS to scale their applications without being locked into a single cloud provider or proprietary API. On DigitalOcean, HUGS is included at no extra charge beyond the standard cost of GPU Droplets. Hugging Face also offers custom deployment solutions for enterprises through its Enterprise Hub.