Google has announced Gemma 4, a new family of AI models developed in the same research branch as its proprietary Gemini 3 model. However, unlike the latter, the Gemma family of models is open source and can be used commercially.
It is worth noting that, unlike previous Gemma releases, which shipped under specific terms of use and were not truly open source in the strict sense, Gemma 4 is released under the fully permissive Apache 2.0 license, with no commercial restrictions. With this, Google directly challenges Meta’s Llama models, which also use the Apache license.
What’s new in Gemma 4 and its versions
Google has released its new model in four different sizes, designed to cover everything from inference on mobile devices to workstation-class deployments. All four models are multimodal, process video and images natively, and are trained on more than 140 languages. “To drive the next generation of research and pioneering products, we have sized Gemma 4 models specifically to run and optimize efficiently on hardware, from billions of Android devices around the world to laptop GPUs, workstations and developer accelerators,” says Google.
One of the most important new features of Gemma 4 is its focus on agent-based workflows. All models offer native support for function calling, structured JSON output, and system instructions. This allows developers to create autonomous AI agents that can reliably execute complex logic and interact with external APIs, all locally.
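To make the agent pattern concrete, here is a minimal sketch of the dispatch side of such a workflow. The model's response is mocked, and the tool name, schema, and `get_weather` helper are hypothetical illustrations, not part of any Gemma 4 API: the idea is that the model emits a structured JSON object naming a tool, and local code parses it and calls the matching function.

```python
import json

# Hypothetical tool registry for a local agent. A runtime that supports
# function calling would be prompted with these tool schemas; here we
# only show the local dispatch side.
def get_weather(city: str) -> str:
    # Stand-in for a real external API call.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse the model's structured JSON output and invoke the named tool."""
    call = json.loads(model_output)  # relies on the model emitting valid JSON
    func = TOOLS[call["tool"]]
    return func(**call["arguments"])

# Mocked model response in the structured format the agent asked for:
mock_output = '{"tool": "get_weather", "arguments": {"city": "Madrid"}}'
print(dispatch(mock_output))  # -> Sunny in Madrid
```

In a real agent loop, the tool's return value would be fed back to the model as context for its next step; structured output makes that loop reliable because the dispatch code never has to guess at free-form text.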
As for performance, Google claims that the 31-billion-parameter (31B) Gemma 4 model currently ranks third among open models in the Arena AI ranking, while the 26B model sits in sixth place, significantly outperforming competitors up to 20 times its size. The unquantized weights of both the 26B and 31B models fit on a single 80GB NVIDIA H100 GPU.
For local development, the 26B Mixture of Experts (MoE) model is heavily optimized to minimize latency, activating only 3.8 billion of its parameters during inference. This allows it to generate tokens at high speed, which is useful for powering local coding assistants on consumer graphics cards.
Google has also focused on the multimodality of these models. Expanding on last year’s Gemma 3n, designed for mobile devices, the entire Gemma 4 family natively processes high-resolution video and images. The E2B and E4B edge models go a step further by incorporating native audio input for smooth, near-zero-latency speech recognition. These models feature a 128K-token context window for edge devices and up to 256K tokens for the larger 26B/31B models.
Gemma 4 is compatible with platforms such as Hugging Face, Ollama and vLLM, and benefits from hardware optimizations by NVIDIA, AMD, Qualcomm and MediaTek. For mobile app developers, the models are ready for prototyping in the AICore Developer Preview, ensuring compatibility with the upcoming Gemini Nano 4.
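For readers who want to try the models locally, the usual workflow on the platforms mentioned above looks like the sketch below. The model tags and repository id are assumptions for illustration; check the Ollama library and Hugging Face for the actual names Google publishes.

```shell
# Pull and chat with a model via Ollama (hypothetical tag "gemma4:26b"):
ollama pull gemma4:26b
ollama run gemma4:26b "Summarize the Gemma 4 launch in one sentence."

# Or serve an OpenAI-compatible endpoint with vLLM
# (hypothetical Hugging Face repo id "google/gemma-4-26b"):
pip install vllm
vllm serve google/gemma-4-26b
```

Both paths expose the model behind a local HTTP API, which is what makes the fully local agent workflows described earlier practical.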
