Google has released a new AI model called DolphinGemma, developed to help researchers analyze and interpret dolphin vocalizations. The project is part of an ongoing collaboration between Google, the Wild Dolphin Project (WDP), and researchers at Georgia Tech, and it focuses on identifying patterns in the natural communication of Atlantic spotted dolphins.
DolphinGemma is based on Google’s Gemma language model architecture and adapted specifically for audio data. It uses the SoundStream tokenizer to convert dolphin sounds into machine-readable token sequences, enabling the model to detect recurring patterns and predict the likely next sound in a sequence. With approximately 400 million parameters, the model is small enough to run locally on smartphones, including the Google Pixel devices used by WDP in the field.
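To make that pipeline concrete, here is a minimal, self-contained sketch of the tokenize-then-predict idea. It assumes nothing about Google’s actual implementation: a toy energy quantizer stands in for SoundStream (which is a learned neural codec), and smoothed bigram counts stand in for the Gemma-based predictor.

```python
import numpy as np

def quantize_frames(audio: np.ndarray, frame_len: int = 320, n_bins: int = 64) -> list[int]:
    """Toy stand-in for SoundStream: map fixed-length frames to discrete token IDs."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sqrt((frames ** 2).mean(axis=1))           # per-frame RMS energy
    edges = np.linspace(energy.min(), energy.max() + 1e-9, n_bins + 1)
    return list(np.digitize(energy, edges[1:-1]))          # one token ID per frame

def next_token_distribution(tokens: list[int], n_bins: int = 64) -> np.ndarray:
    """Smoothed bigram counts as a placeholder for the language model's prediction."""
    counts = np.ones((n_bins, n_bins))                     # Laplace smoothing
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Synthetic noise stands in for a hydrophone clip.
audio = np.random.randn(16000).astype(np.float32)
tokens = quantize_frames(audio)
probs = next_token_distribution(tokens)
print("most likely successor of token", tokens[-1], "is", int(probs[tokens[-1]].argmax()))
```

The real model works with learned codec tokens and a transformer, but the shape of the task is the same: discrete audio tokens in, a distribution over likely next tokens out.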
The WDP has compiled one of the most comprehensive datasets of wild dolphin behavior and vocalizations, collected over nearly four decades. The dataset includes audio and video recordings linked to known individual dolphins, their social relationships, and observed behaviors. Researchers have linked specific sound types, such as signature whistles, burst-pulse squawks, and buzzes, to particular behavioral contexts. DolphinGemma is trained on this data to help identify statistical regularities and potentially meaningful structures within the sound sequences.
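A record in a dataset like this might look like the following. The layout is hypothetical, since WDP’s actual schema is not public, but it reflects the linkage the project describes between recordings, known individuals, and observed behavior.

```python
from dataclasses import dataclass

@dataclass
class DolphinClip:
    """Hypothetical record layout; field names are illustrative, not WDP's schema."""
    clip_id: str
    audio_path: str                  # hydrophone recording
    video_path: str | None           # synchronized video, if available
    individuals: list[str]           # IDs of dolphins present in the encounter
    sound_type: str                  # e.g. "signature_whistle", "burst_pulse", "buzz"
    behavior_context: str            # e.g. "mother_calf_reunion", "play"
    recorded_year: int

clip = DolphinClip(
    clip_id="wdp-1994-0173",
    audio_path="clips/wdp-1994-0173.wav",
    video_path="clips/wdp-1994-0173.mp4",
    individuals=["SP-012", "SP-047"],
    sound_type="signature_whistle",
    behavior_context="mother_calf_reunion",
    recorded_year=1994,
)
```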
In addition to analyzing natural communication, DolphinGemma is integrated into the CHAT (Cetacean Hearing Augmentation Telemetry) system developed by Georgia Tech. CHAT enables a basic form of symbolic exchange with dolphins by using synthetic whistles tied to objects the animals like to play with, such as seaweed or scarves. If dolphins mimic these sounds, researchers can interpret them as object requests. DolphinGemma supports the system by improving the accuracy of sound recognition and the speed of response, both of which are critical in fast-moving underwater interactions.
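The core of such a request loop can be sketched as follows. This is an illustration under stated assumptions, not CHAT’s actual code: the whistle-to-object map, the confidence threshold, and the stubbed `recognize_mimic` recognizer are all invented for the example.

```python
WHISTLE_TO_OBJECT = {"whistle_A": "seaweed", "whistle_B": "scarf"}
CONFIDENCE_THRESHOLD = 0.8   # assumed cutoff; both accuracy and latency matter here

def recognize_mimic(audio_window) -> tuple[str | None, float]:
    """Stub for the model-backed recognizer; returns (whistle ID, confidence)."""
    return "whistle_A", 0.93  # hard-coded so the sketch runs without real audio

def chat_step(audio_window) -> str | None:
    """Map a detected mimic to the object the dolphin appears to be requesting."""
    whistle_id, confidence = recognize_mimic(audio_window)
    if whistle_id in WHISTLE_TO_OBJECT and confidence >= CONFIDENCE_THRESHOLD:
        return WHISTLE_TO_OBJECT[whistle_id]   # cue the researcher to offer this object
    return None

requested = chat_step(audio_window=None)
if requested:
    print(f"Mimic detected: offer the {requested}")
```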
The model runs on current-generation smartphones like the Google Pixel 9, reducing the need for custom hardware. This simplifies deployment in field conditions and helps lower the cost and size of the system. The phone’s built-in processing capabilities allow DolphinGemma to operate in real time during fieldwork, assisting researchers as they track and respond to dolphin vocalizations.
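In practice, real-time field processing of this kind typically means consuming audio in short chunks over a rolling context window. The sketch below illustrates that pattern; the chunk size, buffer length, and both stubs are arbitrary choices for the example, not details of the actual deployment.

```python
from collections import deque
import numpy as np

SAMPLE_RATE = 16000
CHUNK = SAMPLE_RATE // 10          # 100 ms of audio per step (assumed)
buffer = deque(maxlen=50)          # rolling 5-second context window (assumed)

def read_hydrophone_chunk() -> np.ndarray:
    """Stub for the phone's audio input stream."""
    return np.random.randn(CHUNK).astype(np.float32)

def run_model(context: np.ndarray) -> None:
    """Stub for on-device inference over the buffered context."""
    pass

for _ in range(3):                 # a field session would loop until stopped
    buffer.append(read_hydrophone_chunk())
    run_model(np.concatenate(buffer))
```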
Google has stated its intention to release DolphinGemma as an open model later in 2025. Although it is currently trained on Atlantic spotted dolphin vocalizations, the model can be fine-tuned for other species. This may support broader research on cetacean communication, though practical application will depend on the availability of well-labeled datasets for each species.
While the model does not interpret meaning in dolphin communication, it may help researchers identify structural features that guide further investigation. The announcement has drawn considerable interest among AI researchers, some of whom see it as a meaningful step toward understanding non-human communication.