Hugging Face has launched the integration of four serverless inference providers, Fal, Replicate, SambaNova, and Together AI, directly into its model pages. These providers are also integrated into Hugging Face's client SDKs for JavaScript and Python, allowing users to run inference on various models with minimal setup.
This update enables users to select their preferred inference provider, either by using their own API keys for direct access or by routing requests through Hugging Face. The integration covers a range of models, including DeepSeek-R1, and provides a unified interface for managing inference across providers.
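As a minimal sketch of what this looks like in the Python SDK, the snippet below uses the huggingface_hub InferenceClient, which accepts a provider argument, to run a chat completion against DeepSeek-R1 through SambaNova. The token placeholder, prompt, and max_tokens value are illustrative.

```python
from huggingface_hub import InferenceClient

# Select the provider when constructing the client. Pass either a
# Hugging Face token (requests billed through Hugging Face) or the
# provider's own API key (billed directly by the provider).
client = InferenceClient(
    provider="sambanova",  # other options: "fal-ai", "replicate", "together"
    api_key="hf_xxx",      # placeholder; substitute your own token
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
)
print(completion.choices[0].message.content)
```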
Developers can access these services through the website UI, the SDKs, or direct HTTP calls. Switching between providers is a one-line change: only the provider name in the API call changes, while the rest of the implementation stays the same. Hugging Face also offers a routing proxy that exposes an OpenAI-compatible API, as sketched below.
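Switching providers in the earlier snippet means changing only the provider string. For the OpenAI-compatible route, the following sketch points the official openai client at Hugging Face's routing proxy; the base URL is an assumption based on Hugging Face's documentation, and the prompt is illustrative.

```python
from openai import OpenAI

# The routing proxy speaks the OpenAI chat-completions protocol, so the
# standard openai client works unchanged; authenticate with a Hugging
# Face token rather than an OpenAI key.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",  # assumed router endpoint
    api_key="hf_xxx",                             # placeholder HF token
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```

Because the protocol is OpenAI-compatible, existing applications built on the openai client can adopt the routed providers by changing only the base URL and the API key.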
Rodrigo Liang, Co-Founder & CEO at SambaNova, stated:
We are excited to be partnering with Hugging Face to accelerate its Inference API. Hugging Face developers now have access to much faster inference speeds on a wide range of the best open source models.
And Zeke Sikelianos, Founding Designer at Replicate, said:
Hugging Face is the de facto home of open-source model weights, and has been a key player in making AI more accessible to the world. We use Hugging Face internally at Replicate as our weights registry of choice, and we’re honored to be among the first inference providers to be featured in this launch.
Fast and accurate AI inference is essential for many applications, especially as demand for tokens grows with test-time compute and agentic AI. According to SambaNova, open-source models optimized for its RDU (Reconfigurable Dataflow Unit) hardware enable developers to achieve up to 10x faster inference with improved accuracy.
Billing is handled directly by the inference provider when users supply their own API keys. For requests routed through Hugging Face, charges are passed through at the provider's standard rates with no additional markup.