Apple recently published a new paper on how it is training AI to edit images from natural-language instructions. Titled “Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing,” the paper offers another glimpse at Apple’s AI efforts. Even though the company still seems to lag behind the major players, the study may tie into rumors that Apple Intelligence will be able to take image-editing instructions, like telling Siri to crop an image, adjust the balance, and so on.
In the paper, Apple describes building around 400,000 high-quality examples of text-guided image editing, using the new Nano-Banana model to perform the image edits, Gemini-2.5-Flash to generate the edit instructions, and Gemini-2.5-Pro to judge the quality of the results. The dataset is organized around 35 detailed edit types, such as changing colors, applying styles, adding objects, and more.
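In other words, the paper describes a generate, edit, and judge loop. Below is a minimal sketch of what that kind of pipeline could look like in Python; the helper functions (generate_instruction, apply_edit, judge_edit), the EditExample structure, and the 0.7 score threshold are hypothetical placeholders standing in for calls to Gemini-2.5-Flash, Nano-Banana, and Gemini-2.5-Pro, not Apple’s actual implementation.

```python
# Hypothetical sketch of the generate-edit-judge pipeline described in the paper.
# The three helpers stand in for calls to an instruction model (Gemini-2.5-Flash),
# an image-editing model (Nano-Banana), and a judge model (Gemini-2.5-Pro); their
# names, signatures, and the 0.7 threshold are assumptions, not Apple's code.

from dataclasses import dataclass

EDIT_TYPES = ["color change", "style transfer", "add object"]  # the paper uses 35 types


@dataclass
class EditExample:
    source_image: str   # path to a real source photo
    instruction: str    # natural-language edit instruction
    edited_image: str   # path to the edited result
    score: float        # judge model's quality score


def generate_instruction(image_path: str, edit_type: str) -> str:
    """Placeholder: ask an instruction model for an edit prompt of the given type."""
    raise NotImplementedError


def apply_edit(image_path: str, instruction: str) -> str:
    """Placeholder: ask an image-editing model to apply the instruction."""
    raise NotImplementedError


def judge_edit(source: str, edited: str, instruction: str) -> float:
    """Placeholder: ask a judge model how faithfully the edit follows the instruction."""
    raise NotImplementedError


def build_dataset(image_paths: list[str], keep_threshold: float = 0.7) -> list[EditExample]:
    """Generate, apply, and judge edits, keeping only high-scoring examples."""
    kept = []
    for path in image_paths:
        for edit_type in EDIT_TYPES:
            instruction = generate_instruction(path, edit_type)
            edited = apply_edit(path, instruction)
            score = judge_edit(path, edited, instruction)
            if score >= keep_threshold:  # strict filtering keeps only high-quality edits
                kept.append(EditExample(path, instruction, edited, score))
    return kept
```

The key idea is the final filtering step: a separate judge model scores every edit, and only examples above the threshold make it into the dataset.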
Apple’s researchers found that high-quality editing data can be produced at scale by starting from real images and applying strict filtering with a judge model. In their tests, style edits proved the most reliable, while moving objects and changing text had much lower success rates.
Here’s how Apple could benefit from this study
The study sheds light on how Apple could approach AI image editing. So far, the company offers tools like Clean Up and Image Playground, and with iOS 26 it added support for more ChatGPT styles, but Apple hasn’t gone as deep into AI image editing as Google, Samsung, and other competitors.
The paper suggests Apple could use this dataset to train or fine-tune its future multimodal models, and also as a benchmark to evaluate new AI models on image-editing precision. If the company keeps building on this work, we may get more natural, powerful image-editing tools that understand users’ requests.
Time will tell how Apple implements these possible changes, but we will likely get a better idea early next year, once the company rolls out its long-awaited revamped Siri. In that first stage, the personal assistant is expected to have on-screen awareness and, hopefully, the ability to search data on your device.