TL;DR
- Whisk generates an AI image by combining subject, scene, and style visual inputs.
- It uses Gemini and Imagen 3 to reinterpret the uploaded images.
- You can tweak the underlying prompts to refine the final output.
AI image generators are a modern marvel, but you can’t always find the right words to describe your creative vision. Google has introduced Whisk for just such occasions. Instead of asking for a written prompt, this new experimental tool from Google Labs lets users upload images for the subject, scene, and style to create unique results.
Unveiling Whisk in a Labs blog post, Google explains how it works: Once you’ve uploaded two or three images, Gemini analyzes them and generates detailed captions describing the key characteristics of each input. In that sense, you’re just getting Whisk to describe the images for you. These captions are then passed to Imagen 3, Google’s latest image generation model, which generates a new image that blends the provided subject, scene, and style.
For example, a user might combine an image of a cat, a lily pad scene, and a sparkly aesthetic to create a fantastical creature resting on a pond. The tool captures the essence of the input images rather than replicating them exactly, but you can ask it to try again if it’s a long way off what you had in mind.
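The flow Google describes (caption each image, merge the captions, hand the result to an image model) can be sketched in a few lines of Python. This is only an illustration of the idea, not Whisk's actual code: `WhiskInputs`, `blend_prompt`, `run_pipeline`, the file names, and the prompt wording are all hypothetical, and the captioner and generator are stand-ins rather than real Gemini or Imagen 3 calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WhiskInputs:
    """Hypothetical container for the three image roles."""
    subject: str  # identifier of the subject image
    scene: str    # identifier of the scene image
    style: str    # identifier of the style image


def blend_prompt(captions: dict[str, str]) -> str:
    """Merge the per-role captions into one text prompt (assumed wording)."""
    return (
        f"{captions['subject']}, set in {captions['scene']}, "
        f"rendered in the style of {captions['style']}"
    )


def run_pipeline(
    inputs: WhiskInputs,
    caption_image: Callable[[str], str],
    generate_image: Callable[[str], str],
) -> str:
    """Caption every input image, blend the captions, then generate."""
    captions = {role: caption_image(image) for role, image in vars(inputs).items()}
    return generate_image(blend_prompt(captions))


# Stand-in captioner: a lookup table playing the part of the captioning model.
fake_captions = {
    "cat.png": "a fluffy orange cat",
    "pond.png": "a lily pad pond",
    "glitter.png": "a sparkly, glittering aesthetic",
}

# Stand-in generator: returns a placeholder string instead of an image.
result = run_pipeline(
    WhiskInputs(subject="cat.png", scene="pond.png", style="glitter.png"),
    caption_image=fake_captions.__getitem__,
    generate_image=lambda prompt: f"<image for: {prompt}>",
)
print(result)
```

Because the image model only ever sees the blended text, the result reflects what the captions say about the inputs, which fits the article's point that the tool captures the essence of the images rather than copying them.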
If Whisk is in the right ballpark with the final image, you can refine it by modifying the underlying written prompts or adding further instructions, for instance to tweak colors, patterns, or other stylistic elements. That lets you experiment and iterate until you have an image you’re satisfied with.
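To picture what that refinement step amounts to, here is a rough Python sketch that treats the underlying prompts as editable text. Everything in it is an assumption made for illustration: `refine_prompt`, the caption wording, and the way extra instructions are appended are hypothetical and do not reflect how Whisk stores or combines its prompts.

```python
def refine_prompt(captions, edits=None, extra_instructions=()):
    """Apply caption edits and optional extra instructions, then rebuild the prompt.

    captions: dict with 'subject', 'scene', and 'style' text (assumed shape).
    edits: optional dict replacing the text for one or more of those roles.
    extra_instructions: optional free-form tweaks appended to the prompt.
    """
    edits = edits or {}
    unknown = set(edits) - set(captions)
    if unknown:
        raise ValueError(f"unknown caption roles: {sorted(unknown)}")

    updated = {**captions, **edits}
    prompt = (
        f"{updated['subject']}, set in {updated['scene']}, "
        f"in the style of {updated['style']}"
    )
    if extra_instructions:
        prompt += ". " + ". ".join(extra_instructions)
    return prompt


captions = {
    "subject": "a fluffy orange cat",
    "scene": "a lily pad pond",
    "style": "a sparkly aesthetic",
}

# First attempt: the captions as generated.
first = refine_prompt(captions)

# Second attempt: swap the style text and add one instruction.
second = refine_prompt(
    captions,
    edits={"style": "a soft pastel watercolor"},
    extra_instructions=["make the cat's fur teal"],
)
print(first)
print(second)
```

Each pass keeps the parts that already worked and changes only the text you touched, which is the kind of iteration the article describes.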
Whisk is now available in the US through labs.google/whisk. You can try the tool for free and download your creations directly from the platform. Feedback from early adopters will help Google refine it further.