At Meta’s first-ever LlamaCon event, the company announced several new tools for building with their Llama AI models: a limited preview of the Llama API that allows developers to experiment with different models, and new Llama Protection Tools for securing AI applications.
LlamaCon was a one-day virtual event that featured a keynote by chief product officer Chris Cox and two one-on-one chats between Meta CEO Mark Zuckerberg and other CEOs: Satya Nadella of Microsoft and Ali Ghodsi of Databricks. Besides announcing the API, Meta also announced a collaboration with Cerebras and Groq to bring fast inference capability to the API. They also announced an integration of LlamaStack with NVIDIA NeMo microservices. Meta also announced new open-source AI safeguard tools: Llama Guard 4, LlamaFirewall, and Llama Prompt Guard 2. According to Meta:
We’re committed to being a long-term partner for enterprises and developers and providing a seamless transition path from closed models. Llama is affordable, easy-to-use, and enabling more people to access the benefits of AI regardless of their technical expertise or hardware resources. We believe in the potential of AI to transform industries and improve lives, which is why we’re excited to continue supporting the growth and development of the Llama ecosystem for the benefit of all. We can’t wait to see what you’ll build next.
The Llama protection tools are a suite of safeguards that developers can use to make their AI applications more secure. The LlamaCon release includes Llama Guard 4, a content moderation model; Prompt Guard 2, a tool for preventing jailbreaks and prompt injection; and LlamaFirewall, an orchestration component for integrating multiple protection tools into an AI application.
The Llama API has been released as a free preview. Meta touts its “easy one-click API key creation and interactive playgrounds.” Available models include the recently released Llama 4 Scout and Maverick MoE models. The release also includes Python and Typescript SDKs, and the API is compatible with OpenAI’s SDK, “making it easy to convert existing applications.”
The API includes resources for fine-tuning and evaluating custom models. Meta claims they will not use any uploaded prompts or model outputs in their own training. They also say that developers can download and run their custom models anywhere.
Some users in Hacker News discussion about LlamaCon lamented the restrictions on the Llama license, making it in their opinion not fully open-source. In reference to the API announcement, another user remarked:
Feels like Meta is going to Cloud services business but in AI domain. They resisted entering cloud business for so long, with the success of AWS/Azure/GCP I think they are realizing they can’t keep at the top only with social networks without owning a platform (hardware, cloud).
In a Reddit thread about the announcements, users reacted positively to the fast inference news:
I think the future lies with speed for sure. You can do some wild things when you are able to pump out hundreds if not thousands of tokens a second.
Developers interested in gaining access to the Llama API Preview can join the waitlist. The Llama Stack code is available on GitHub.