Copyright © All Rights Reserved. World of Software.

Apple taught an AI model to reason about app interfaces – 9to5Mac

News Room
Published 16 July 2025 · Last updated: 2025/07/16 at 2:05 AM

A new Apple-backed study, in collaboration with Aalto University in Finland, introduces ILuvUI: a vision-language model trained to understand mobile app interfaces from screenshots and from natural language conversations. Here’s what that means, and how they did it.

ILuvUI: an AI that outperformed the model it was based on

In the paper, ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations, the team tackles a long-standing challenge in human-computer interaction (HCI): teaching AI models to reason about user interfaces the way humans do, which in practice means both visually and semantically.

“Understanding and automating actions on UIs is a challenging task since the UI elements in a screen, such as list items, checkboxes, and text fields, encode many layers of information beyond their affordances for interactivity alone. (…) LLMs in particular have demonstrated remarkable abilities to comprehend task instructions in natural language in many domains, however using text descriptions of UIs alone with LLMs leaves out the rich visual information of the UI.”

Currently, as the researchers explain, most vision-language models are trained on natural images, such as photos of dogs or street signs, so they don’t perform as well when asked to interpret more structured environments like app UIs:

“Fusing visual with textual information is important to understanding UIs as it mirrors how many humans engage with the world. One approach that has sought to bridge this gap when applied to natural images are Vision-Language Models (VLMs), which accept multimodal inputs of both images and text, typically output only text, and allow for general-purpose question answering, visual reasoning, scene descriptions, and conversations with image inputs. However, the performance of these models on UI tasks fall short compared to natural images because of the lack of UI examples in their training data.”

With that in mind, the researchers fine-tuned the open-source VLM LLaVA and adapted its training method to specialize it for the UI domain.

They trained it on text-image pairs that were synthetically generated based on a few “golden examples”. The final dataset included Q&A-style interactions, detailed screen descriptions, predicted action outcomes, and even multi-step plans (such as “how to listen to the latest episode of a podcast” or “how to change brightness settings”).

Once trained on this dataset, the resulting model, ILuvUI, was able to outperform the original LLaVA in both machine benchmarks and human preference tests.

What’s more, it doesn’t require a user to specify a region of interest in the interface. Instead, the model understands the entire screen contextually from a simple prompt:

“ILuvUI (…) does not require a region of interest, and accepts a text prompt as input in addition to the UI image, which enables it to provide answers for use cases such as visual question answering.”

How will users benefit from this?

Apple’s researchers say their approach could prove useful for accessibility and for automated UI testing. They also note that while ILuvUI is still based on open components, future work could involve larger image encoders, better resolution handling, and output formats, such as JSON, that work seamlessly with existing UI frameworks.

And if you’ve been keeping up with Apple’s AI research papers, you might be thinking of a recent investigation into whether AI models could not only understand, but also anticipate, the consequences of in-app actions.

Put the two together, and things start to get… interesting, especially if you rely on accessibility features to navigate your devices, or just wish the OS could autonomously handle the more fiddly parts of your in-app workflows.
