Hugging Face has launched the integration of four serverless inference providers, Fal, Replicate, SambaNova, and Together AI, directly into its model pages. These providers are also integrated into Hugging Face's client SDKs for JavaScript and Python, allowing users to run inference on various models with minimal setup.
This update enables users to select their preferred inference provider, either by using their own API keys for direct access or by routing requests through Hugging Face. The integration covers a range of models, including DeepSeek-R1, and provides a unified interface for managing inference across providers.
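As a minimal sketch of what this looks like in the Python SDK, the snippet below uses the huggingface_hub InferenceClient, which accepts a provider argument, to run a chat completion against DeepSeek-R1 through SambaNova. The token placeholder, prompt, and max_tokens value are illustrative.

```python
from huggingface_hub import InferenceClient

# Select the provider when constructing the client. Pass either a
# Hugging Face token (requests billed through Hugging Face) or the
# provider's own API key (billed directly by the provider).
client = InferenceClient(
    provider="sambanova",  # other options: "fal-ai", "replicate", "together"
    api_key="hf_xxx",      # placeholder; substitute your own token
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
)
print(completion.choices[0].message.content)
```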
Developers can access these services through the website UI, the SDKs, or direct HTTP calls. Switching between providers is a one-line change: only the provider name in the API call changes, while the rest of the implementation stays the same. Hugging Face also offers a routing proxy that exposes an OpenAI-compatible API, as sketched below.
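Switching providers in the earlier snippet means changing only the provider string. For the OpenAI-compatible route, the following sketch points the official openai client at Hugging Face's routing proxy; the base URL is an assumption based on Hugging Face's documentation, and the prompt is illustrative.

```python
from openai import OpenAI

# The routing proxy speaks the OpenAI chat-completions protocol, so the
# standard openai client works unchanged; authenticate with a Hugging
# Face token rather than an OpenAI key.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",  # assumed router endpoint
    api_key="hf_xxx",                             # placeholder HF token
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```

Because the protocol is OpenAI-compatible, existing applications built on the openai client can adopt the routed providers by changing only the base URL and the API key.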
Rodrigo Liang, Co-Founder & CEO at SambaNova, stated:
We are excited to be partnering with Hugging Face to accelerate its Inference API. Hugging Face developers now have access to much faster inference speeds on a wide range of the best open source models.
And Zeke Sikelianos, Founding Designer at Replicate, said:
Hugging Face is the de facto home of open-source model weights, and has been a key player in making AI more accessible to the world. We use Hugging Face internally at Replicate as our weights registry of choice, and we’re honored to be among the first inference providers to be featured in this launch.
Fast and accurate AI inference is essential for many applications, especially as demand for tokens grows with test-time compute and agentic AI. According to SambaNova, open-source models optimized for its RDU (Reconfigurable Dataflow Unit) hardware enable developers to achieve up to 10x faster inference with improved accuracy.
Billing is handled directly by the inference provider when users supply their own API keys. For requests routed through Hugging Face, charges are passed through at the provider's standard rates with no additional markup.