AWS recently announced the availability of Meta’s latest foundation models, Llama 4 Scout and Llama 4 Maverick, in Amazon Bedrock and Amazon SageMaker JumpStart. Both models provide multimodal capabilities and follow a mixture-of-experts architecture.
Launched by Meta in April 2025, Llama 4 Scout and Maverick each have 17 billion active parameters, distributed across 16 and 128 experts, respectively. Llama 4 Scout is optimized to run on a single NVIDIA H100 GPU for general-purpose tasks. According to Meta, Llama 4 Maverick provides enhanced reasoning and coding capabilities and outperforms other models in its class. Amazon highlights the value of the mixture-of-experts architecture in reducing compute costs, making advanced AI more accessible and cost-effective:
Thanks to their more efficient mixture of experts (MoE) architecture—a first for Meta—that activates only the most relevant parts of the model for each task, customers can benefit from these powerful capabilities that are more compute efficient for model training and inference, translating into lower costs at greater performance.
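To make the idea of "activating only the most relevant parts of the model" concrete, the sketch below shows top-k expert routing in a generic mixture-of-experts layer. It is a simplified illustration, not Meta's implementation; the expert functions and router weights are placeholders.

```python
import numpy as np

def moe_layer(x, experts, router_weights, top_k=1):
    """Toy mixture-of-experts routing sketch.

    x              : (hidden_dim,) token representation
    experts        : list of callables, one per expert (e.g. small MLPs)
    router_weights : (hidden_dim, num_experts) learned routing matrix
    top_k          : number of experts activated per token
    """
    # The router scores one logit per expert and keeps only the top-k.
    logits = x @ router_weights
    top = np.argsort(logits)[-top_k:]

    # Softmax over the selected experts only.
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()

    # Only the chosen experts run; every other expert's parameters stay idle,
    # which is what keeps the active-parameter count (and compute) low.
    return sum(w * experts[i](x) for i, w in zip(top, weights))

# Tiny usage example: four "experts", each just a different linear map.
rng = np.random.default_rng(0)
dim, n_experts = 8, 4
experts = [(lambda W: (lambda v: W @ v))(rng.normal(size=(dim, dim)))
           for _ in range(n_experts)]
router = rng.normal(size=(dim, n_experts))
print(moe_layer(rng.normal(size=dim), experts, router, top_k=2))
```

In Llama 4, this mechanism is what lets Scout and Maverick keep 17 billion active parameters per token while the full models hold far more parameters across their 16 and 128 experts.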
While Llama 4 Scout supports a context window of up to 10 million tokens, Amazon Bedrock currently allows up to 3.5 million tokens, with plans to expand that limit shortly. Llama 4 Maverick supports a maximum of one million tokens. In both cases, these limits represent a significant increase over the 128K context window available for Llama 3 models.
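Once access to the models is enabled, invoking them on Amazon Bedrock works like any other Bedrock model, for example through the Converse API in boto3. The snippet below is a minimal sketch; the model identifier shown is an assumption, so check the Bedrock console for the exact model or inference-profile ID available in your region.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    # Illustrative ID; confirm the Llama 4 Scout identifier in your region.
    modelId="us.meta.llama4-scout-17b-instruct-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Summarize the Llama 4 model family."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
)

# The Converse API returns the assistant message under output.message.content.
print(response["output"]["message"]["content"][0]["text"])
```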
On Amazon SageMaker JumpStart, you can use the new models with SageMaker Studio or the Amazon SageMaker Python SDK, depending on your use case. Both models default to an ml.p5.48xlarge instance, which features NVIDIA H100 Tensor Core GPUs. Alternatively, you can choose an ml.p5en.48xlarge instance powered by NVIDIA H200 Tensor Core GPUs. Llama 4 Scout also supports the ml.g6e.48xlarge instance type, which uses NVIDIA L40S Tensor Core GPUs.
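With the SageMaker Python SDK, deployment follows the usual JumpStart pattern, sketched below. The model_id is illustrative and should be confirmed in the SageMaker Studio JumpStart catalog, and Meta's end user license agreement must be accepted at deployment time.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Illustrative JumpStart model ID for Llama 4 Scout; look up the exact
# identifier in the SageMaker Studio JumpStart catalog before deploying.
model = JumpStartModel(model_id="meta-vlm-llama-4-scout-17b-16e-instruct")

# accept_eula=True acknowledges Meta's license terms for the model.
predictor = model.deploy(
    instance_type="ml.p5.48xlarge",  # default; ml.p5en.48xlarge also works
    accept_eula=True,
)

# Payload shape follows the common JumpStart text-generation schema.
print(predictor.predict({
    "inputs": "Explain mixture-of-experts in one paragraph.",
    "parameters": {"max_new_tokens": 256},
}))

# Delete the endpoint when finished to stop incurring charges.
predictor.delete_endpoint()
```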
Llama 4 models are available on several other cloud providers, including Databricks, GroqCloud, Lambda.ai, Cerebras Inference Cloud, and others. Additionally, you can access them on Hugging Face.
Scout and Maverick are joined by Behemoth, the third model in the Llama 4 family, which features 288 billion active parameters distributed across 16 experts. Meta describes Behemoth, currently in preview, as its most intelligent teacher model for distillation, having used it to train both Scout and Maverick.