Artificial intelligence chip startup Cerebras Systems Inc. today said it has begun deploying its wafer-scale AI accelerator chips across six new cloud data centers in North America and France to provide ultrafast AI inference.
The company also announced a new partnership with Hugging Face Inc., best known for hosting open-source machine learning and AI models, that will bring Cerebras' inference platform to Hugging Face Hub.
Cerebras is best known for its specialized architecture built on dinner plate-sized silicon wafers for high-performance computing, or HPC, systems. That architecture underpins an inference service capable of serving models such as Meta Platforms Inc.'s Llama 3.3 70B at more than 2,000 tokens per second.
“Cerebras is turbocharging the future of U.S. AI leadership with unmatched performance, scale and efficiency – these new global data centers will serve as the backbone for the next wave of AI innovation,” said Dhiraj Mallick, chief operating officer of Cerebras Systems.
Launched in August 2024, the company’s AI inference service swiftly gained traction with major AI clients. Notable customers include Mistral AI, a leading French startup that offers the AI assistant and chatbot Le Chat, and the AI-powered search engine Perplexity AI Inc.
The company is expanding by launching the new data centers in Texas, Minnesota, Oklahoma and Georgia, along with campuses in Montreal and in France. Cerebras said it will retain full ownership of the facilities in Oklahoma City and Montreal. The other centers will be operated with its strategic partner G42.
As demand for reasoning models such as OpenAI's o3 and DeepSeek's R1 continues to increase, the need for faster inference will follow. These models use a "chain of thought" technique to solve complex problems by breaking them down into smaller, logical steps and display their "thinking" as they go along. That also means a model can take minutes to reach a final answer, but with Cerebras inference it can execute deep reasoning in seconds.
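To put the minutes-versus-seconds claim in perspective, here is a rough back-of-envelope calculation. Only the 2,000 tokens-per-second figure comes from Cerebras' claims; the 30,000-token reasoning trace and the 60 tokens-per-second baseline are illustrative assumptions, not numbers from the announcement.

```python
# Back-of-envelope: time to emit a long chain-of-thought trace at different speeds.
# The 2,000 tokens/sec figure is Cerebras' claimed rate; the trace length and the
# baseline throughput below are illustrative assumptions only.
reasoning_tokens = 30_000  # hypothetical length of a deep reasoning trace

for label, tokens_per_sec in [("typical GPU cloud (assumed)", 60),
                              ("Cerebras Inference (claimed)", 2_000)]:
    seconds = reasoning_tokens / tokens_per_sec
    print(f"{label}: {seconds:.0f} seconds (~{seconds / 60:.1f} minutes)")
```

Under those assumptions, the same trace that takes roughly eight minutes on the slower service finishes in about 15 seconds.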
Hugging Face partnership
A new partnership between Hugging Face and Cerebras will bring high-speed AI inference to millions of developers around the world.
Cerebras Inference is capable of running the industry's most popular models at more than 2,000 tokens per second. The company said that's more than 70 times faster than comparable cloud-based solutions that use Nvidia Corp.'s most powerful graphics processing units.
Being able to use the service directly within Hugging Face, without going to an outside party, will make it easier for developers to experiment with models and build their own solutions faster.
That’s especially important as agentic AI becomes the norm. It’s a type of AI that can take action and achieve goals without human supervision. AI agents “reason” through complex tasks, use external tools and sift through data to complete goals. That type of problem-solving requires a lot of AI computing power.
“By making Cerebras Inference available through Hugging Face, we’re empowering developers to work faster and more efficiently with open-source AI models, unleashing the potential for even greater innovation across industries,” said Cerebras Chief Executive Andrew Feldman.
Developers can turn on Cerebras Inference on Hugging Face Hub by selecting "Cerebras" as the provider for any supported open-source model when using the inference application programming interface.
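In practice, the provider selection is a one-line change in the Hugging Face client. The following is a minimal sketch, assuming a recent huggingface_hub release that supports the provider argument; the model name and access token are illustrative placeholders, not details from the announcement.

```python
# Minimal sketch: routing a chat completion through Cerebras from Hugging Face Hub.
# Assumes a recent `huggingface_hub` release with provider selection; the model
# name and the api_key value are placeholders for illustration.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cerebras",   # select Cerebras as the inference provider
    api_key="hf_xxx",      # your Hugging Face access token
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # an open-source model hosted on the Hub
    messages=[{"role": "user", "content": "Summarize wafer-scale AI chips in one sentence."}],
)
print(completion.choices[0].message.content)
```

The request is made against the standard inference API; only the provider setting changes where the tokens are generated.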
Photo: Cerebras Systems