Copyright © All Rights Reserved. World of Software.

Apple researchers develop local AI agent that interacts with apps – 9to5Mac

News Room · Published 21 February 2026 · Last updated 2026/02/21 at 3:57 AM

Despite having just 3 billion parameters, Ferret-UI Lite matches or surpasses the benchmark performance of models up to 24 times larger. Here are the details.

A bit of background on Ferret

In December 2023, a team of nine researchers published a study called “FERRET: Refer and Ground Anything Anywhere at Any Granularity”. In it, they presented a multimodal large language model (MLLM) capable of understanding natural-language references to specific parts of an image:

Image: Apple

Since then, Apple has published a series of follow-up papers expanding the Ferret family of models, including Ferret-v2, Ferret-UI, and Ferret-UI 2.

Specifically, the Ferret-UI variants expanded on FERRET’s original capabilities and were trained to overcome what the researchers identified as a shortcoming of general-domain MLLMs.

From the original Ferret-UI paper:

Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet, these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate “any resolution” on top of Ferret to magnify details and leverage enhanced visual features.
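The quoted abstract only says that “any resolution” is used to magnify details. As a rough illustration, the sketch below assumes a simple variant in which a screenshot is kept whole and also split into two halves along its longer side, so a tall phone screen yields top and bottom sub-images. The function name `any_resolution_boxes` and the exact split rule are assumptions for illustration, not Apple’s code; see the paper for the actual scheme.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


def any_resolution_boxes(width: int, height: int) -> List[Box]:
    """Return the full-screen box followed by two sub-image boxes.

    Assumption: a simplified two-way split along the longer side,
    not Apple's actual preprocessing code.
    """
    full = (0, 0, width, height)
    if height >= width:
        # Portrait (tall) screen: split into top and bottom halves.
        mid = height // 2
        halves = [(0, 0, width, mid), (0, mid, width, height)]
    else:
        # Landscape (wide) screen: split into left and right halves.
        mid = width // 2
        halves = [(0, 0, mid, height), (mid, 0, width, height)]
    return [full] + halves


# Example: a 1170 x 2532 portrait, phone-style screenshot.
for box in any_resolution_boxes(1170, 2532):
    print(box)
```

For a 1170 × 2532 portrait screenshot this yields the full frame plus (0, 0, 1170, 1266) and (0, 1266, 1170, 2532). In a real pipeline each sub-image would presumably be encoded alongside the full screenshot, which is what gives small icons and text more visual detail.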

Image: Apple
The original Ferret-UI study included an interesting application of the technology, where the user could talk to the model to better understand how to interact with the interface, as seen on the right.

A few days ago, Apple expanded the Ferret-UI family of models even further with a study called “Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents”.

Ferret-UI was built on a 13B-parameter model and focused primarily on mobile UI understanding with fixed-resolution screenshots. Ferret-UI 2 then expanded the system to support multiple platforms and higher-resolution perception.

By contrast, Ferret-UI Lite is a much more lightweight model, designed to run on-device, while remaining competitive with significantly larger GUI agents.

Ferret-UI Lite

According to the researchers of the new paper, “the majority of existing methods of GUI agents […] focus on large foundation models.” That is because “the strong reasoning and planning capabilities of large server-side models allow these agentic systems to achieve impressive capabilities in diverse GUI navigation tasks.”

They note that there has been a lot of progress on both multi-agent and end-to-end GUI systems, which take different approaches to streamlining the many tasks involved in agentic interaction with GUIs (“low-level GUI grounding, screen understanding, multi-step planning, and self-reflection”). However, these systems are generally too large and compute-hungry to run well on-device.

So, they set out to develop Ferret-UI Lite, a 3-billion-parameter variant of Ferret-UI, which “is built with several key components, guided by insights on training small-scale” language models.

Ferret-UI Lite leverages:

  • Real and synthetic training data from multiple GUI domains;
  • On-the-fly (that is, inference-time) cropping and zooming-in techniques to better understand specific segments of the GUI;
  • Supervised fine-tuning and reinforcement learning techniques.

The result is a model that closely matches or even outperforms competing GUI agent models with up to 24 times as many parameters.

Image: Apple

While the entire architecture (which is thoroughly detailed in the study) is interesting, the real-time cropping and zooming-in techniques are particularly noteworthy.

The model makes an initial prediction, crops around it, then re-predicts on that cropped region. This helps such a small model compensate for its limited capacity to process large numbers of image tokens.

Image: Apple
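The crop-and-re-predict loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper’s implementation: `Predictor` stands in for the model, and `noisy_predictor` is a toy that is always off by 2% of whatever region it sees, which is enough to show why a second pass on a smaller crop tightens the estimate. A real system would also resize the crop up to the model’s input resolution before the second pass; the sketch only tracks the coordinate bookkeeping.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
Point = Tuple[float, float]

# Hypothetical model interface: given the region the model "sees", return the
# predicted click point normalized to [0, 1] within that region.
Predictor = Callable[[Box], Point]


def to_absolute(region: Box, norm: Point) -> Point:
    """Map a point normalized to `region` back to full-screen pixels."""
    left, top, right, bottom = region
    return (left + norm[0] * (right - left), top + norm[1] * (bottom - top))


def crop_around(center: Point, size: float, screen: Box) -> Box:
    """Crop of side `size` centered on `center`, shifted to stay on screen."""
    s_left, s_top, s_right, s_bottom = screen
    w = min(size, s_right - s_left)
    h = min(size, s_bottom - s_top)
    left = min(max(center[0] - w / 2, s_left), s_right - w)
    top = min(max(center[1] - h / 2, s_top), s_bottom - h)
    return (left, top, left + w, top + h)


def ground_with_zoom(predict: Predictor, screen: Box, crop_size: float) -> Point:
    # Pass 1: coarse prediction on the whole screenshot.
    coarse = to_absolute(screen, predict(screen))
    # Pass 2: crop around the coarse guess and predict again on that region.
    # (A real pipeline would resize the crop up to the model's input size.)
    region = crop_around(coarse, crop_size, screen)
    return to_absolute(region, predict(region))


# Toy usage: the true target is at (300, 900).
TARGET = (300.0, 900.0)


def noisy_predictor(region: Box) -> Point:
    # Always off by 2% of the region's size, so absolute error shrinks
    # as the region gets smaller.
    left, top, right, bottom = region
    return ((TARGET[0] - left) / (right - left) + 0.02,
            (TARGET[1] - top) / (bottom - top) + 0.02)


screen = (0.0, 0.0, 1000.0, 2000.0)
print(ground_with_zoom(noisy_predictor, screen, crop_size=200.0))
```

With a 1000 × 2000 screen and a 200-pixel crop, the toy first pass misses the target by 20 px horizontally and 40 px vertically; the second pass, made on the crop, lands about 4 px off on each axis. Shifting the crop to stay on screen, rather than padding it, is a simplification chosen for this sketch.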

Another notable contribution of the paper is how the researchers generate their own training data: they built a multi-agent system that interacts directly with live GUI platforms to produce synthetic training examples at scale.

A curriculum task generator proposes goals of increasing difficulty, a planning agent breaks them into steps, a grounding agent executes them on-screen, and a critic model evaluates the results.

Image: Apple

With this pipeline, the training system captures the fuzziness of real-world interaction (such as errors, unexpected states, and recovery strategies), which would be much harder to capture with clean, human-annotated data.
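The data-generation loop described above can be sketched as follows. Every component here is a hypothetical toy: `propose_goals` stands in for the curriculum task generator, `plan` for the planning agent, the `execute` callback for the grounding agent acting on a live GUI, and `critic` for the critic model. Keeping failed trajectories alongside successful ones is an assumption about how errors end up in the data, not a detail confirmed by the article.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    instruction: str
    succeeded: bool


@dataclass
class Trajectory:
    goal: str
    difficulty: int
    steps: List[Step] = field(default_factory=list)
    critic_score: float = 0.0


def propose_goals(max_difficulty: int) -> List[Tuple[str, int]]:
    # Curriculum generator (toy): difficulty = number of steps required.
    return [(f"goal needing {d} step(s)", d) for d in range(1, max_difficulty + 1)]


def plan(goal: str, difficulty: int) -> List[str]:
    # Planning agent (toy): break the goal into `difficulty` steps.
    return [f"{goal} / step {i + 1}" for i in range(difficulty)]


def critic(traj: Trajectory) -> float:
    # Critic (toy): fraction of steps that succeeded.
    if not traj.steps:
        return 0.0
    return sum(step.succeeded for step in traj.steps) / len(traj.steps)


def generate_data(max_difficulty: int,
                  execute: Callable[[str], bool]) -> List[Trajectory]:
    """`execute` stands in for the grounding agent acting on a live GUI."""
    dataset = []
    for goal, difficulty in propose_goals(max_difficulty):
        traj = Trajectory(goal, difficulty)
        for instruction in plan(goal, difficulty):
            traj.steps.append(Step(instruction, execute(instruction)))
        traj.critic_score = critic(traj)
        # Failed trajectories are kept too, so errors stay in the data.
        dataset.append(traj)
    return dataset


# Toy environment in which the third step of any plan fails.
data = generate_data(3, lambda instruction: not instruction.endswith("step 3"))
for traj in data:
    print(traj.difficulty, round(traj.critic_score, 2))
```

In this toy run, the one- and two-step goals score 1.0 and the three-step goal scores about 0.67, because its third step fails. In a real pipeline the critic’s score could be used to filter or weight examples before training.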

Interestingly, while Ferret-UI and Ferret-UI 2 used iPhone screenshots and other Apple interfaces in their evaluations, Ferret-UI Lite was trained and evaluated on Android, web, and desktop GUI environments, using benchmarks like AndroidWorld and OSWorld.

The researchers don’t explicitly say why they chose this route for Ferret-UI Lite, but it likely reflects where reproducible, large-scale GUI-agent testbeds are available today.

Be that as it may, the researchers found that while Ferret-UI Lite performed well on short-horizon, low-level tasks, it did not perform as strongly on more complex, multi-step interactions, a largely expected trade-off given the constraints of a small, on-device model.

On the other hand, Ferret-UI Lite offers a local agent that autonomously interacts with app interfaces based on user requests. By extension, it is also private, since no data needs to be sent to the cloud and processed on remote servers. By all accounts, that is pretty cool.

To learn more about the study, including benchmark breakdowns and results, see the full paper, “Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents”.
