For a brief period last year, it seemed that AI-powered gadgets like the Rabbit R1 were going to be the next big thing. People were fascinated by the idea of replacing their smartphones with tiny wearable boxes they could talk to, but unfortunately, these gadgets failed to live up to the hype. They failed for a variety of reasons — like price, redundancy, and utility — but the seeds they planted for a future of fully automated apps stuck around.
These seeds grew into the “agentic AI” trend we see today. Companies are racing to build AI products that can perform tasks on your behalf, like helping you code a new project, booking an appointment, or ordering items online. As one of the leaders in the AI race, Google is also working on its own AI agents, with Gemini in Chrome being its most notable offering.
Gemini in Chrome, of course, can only perform actions within the browser and not in other applications. If you want to automate Android apps, your options are limited to third-party tools like Tasker, which often have a steep learning curve. Even then, these tools must be meticulously configured to take specific actions in predetermined apps. Unlike newer AI agents, these existing automation tools can’t perform generalized tasks from a single, natural language prompt.
That’s why Project Astra, Google’s experimental universal AI project, is so exciting. At Google I/O, the company showed off a version of Astra that can control your Android phone. In the demo, the assistant found a document online, scrolled through it to find specific information, and then searched YouTube for related videos — all completely hands-free. To accomplish this, Astra recorded the screen for analysis and then sent tap or swipe inputs to launch apps or scroll through pages.
Google’s demo showed the immense potential for an AI agent that can perform tasks in Android apps, but it also revealed that the company still has a lot of work to do. For starters, the parts of the video featuring the AI agent were sped up 2x, suggesting it’s quite slow. This wasn’t an issue in the scenario concocted for the demo, where the user clearly had their hands full, but it will be a problem in the real world. A slow agent means your phone will be occupied while it works, and common interruptions like a notification, an incoming call, or an alarm could disrupt the process by interfering with its screen analysis or inputs.
The purpose of the I/O demo was simply to show off Project Astra’s capabilities rather than detail how an on-device AI agent would actually work. Google had to hack together a prototype that took advantage of existing Android APIs in unintended ways — the MediaProjection API for screen recording and the Accessibility API for screen input — which resulted in the issues mentioned above.
Over the past few months, however, Google has been working on a new, standardized framework for AI agents to control Android apps. This framework, called Computer Control, is designed to enable automated control of Android apps in the background, sidestepping those problems. Although Google probably won’t announce Computer Control until next year’s Android 17 release, we managed to uncover a lot of information about it by digging through Android code. Here’s what we know so far.
You’re reading the Authority Insights Newsletter, a weekly newsletter that reveals some new facet of Android that hasn’t been reported on anywhere else. If you’re looking for the latest scoops, the hottest leaks, and breaking news on Google’s Android operating system and other mobile tech topics, then we’ve got you covered.
Subscribe here to get this post delivered to your email inbox every Saturday.
How Google’s Computer Control feature enables automated control of Android apps
With the release of Android 13 in 2022, Google introduced a new system-level service called the Virtual Device Manager (VDM). This service enables the creation of virtual displays that are separate from the primary, visible display the user sees. Apps can be launched on these virtual displays and then streamed to a remote device. In turn, that remote device can send back input events like clicks or keyboard presses for the system to inject into the app.
VDM already forms the backbone of the App Streaming feature in Chrome OS, which lets you stream an app from your Android phone to your Chromebook. It also enables the Connected Camera feature on Pixel devices, as Google quietly upgraded the service in last year’s release to support virtual cameras. The service plays a key role in Google’s long-term efforts to improve cross-device compatibility, and it seems the company now aims to leverage it to power its new Computer Control feature.
Google has added code for starting a “Computer Control Session” to facilitate the automated control of Android apps. Each session involves a single trusted virtual display that hosts the app being automated, as well as virtual input devices for sending touch and key events.
Client apps using the Computer Control framework must specify the virtual display’s properties, including its name, height, width, and density. They must also specify whether the display should remain unlocked and interactive even when the host device is locked — a crucial feature for truly unattended control. (However, the device must first be unlocked for an automation session to be initiated.)
Furthermore, client apps must set an output surface for the virtual display’s content. This allows them to access the raw display frames, which can then be streamed to a remote connected device for analysis.
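Piecing those requirements together, a session configuration might look something like the following sketch. Every class, field, and method name here is hypothetical, since the real API has not been published, but it models the constraints described above: positive display dimensions and density, a mandatory output surface, and the rule that a session can only begin while the device is unlocked.

```java
import java.util.Objects;

// Hypothetical model of a Computer Control session configuration.
// These names are illustrative, not the real framework's API.
public class SessionConfig {
    final String displayName;
    final int width, height, densityDpi;
    final boolean keepUnlockedWhenDeviceLocked;
    final Object outputSurface; // stands in for android.view.Surface

    SessionConfig(String name, int width, int height, int densityDpi,
                  boolean keepUnlockedWhenDeviceLocked, Object outputSurface) {
        // The virtual display needs valid dimensions and a pixel density.
        if (width <= 0 || height <= 0 || densityDpi <= 0) {
            throw new IllegalArgumentException(
                    "display dimensions and density must be positive");
        }
        this.displayName = Objects.requireNonNull(name, "a display name is required");
        this.width = width;
        this.height = height;
        this.densityDpi = densityDpi;
        this.keepUnlockedWhenDeviceLocked = keepUnlockedWhenDeviceLocked;
        // Frames must go somewhere: the client reads them from this surface.
        this.outputSurface = Objects.requireNonNull(outputSurface,
                "an output surface is required to receive display frames");
    }

    // A session can only be initiated while the device is unlocked, even if
    // keepUnlockedWhenDeviceLocked lets the display survive a later lock.
    boolean canStart(boolean deviceCurrentlyUnlocked) {
        return deviceCurrentlyUnlocked;
    }
}
```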
Another crucial component of this framework is its ability to mirror the trusted virtual display onto a separate, interactive virtual display. This interactive display can have different dimensions than the original, and the system will automatically map input events between them. This architecture allows users to see and manually interact with the app being automated, crucially without interfering with the automation process itself.
This separation is key. If the original trusted virtual display were mirrored directly to a user — on a connected PC, for example — then resizing the window could break the automation, as changing display dimensions can trigger a configuration change that forces apps to restart. By creating a second, interactive display that mirrors the trusted one, users can remotely view and send inputs to the app without disrupting the process.
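To illustrate why a differently sized mirror still works, here is a minimal sketch that scales a tap on the interactive mirror back into the trusted display's coordinate space. The linear-scaling math is an assumption on our part; Google has not documented how the framework actually maps events between the two displays.

```java
// Illustrative sketch of mapping input from an interactive mirror display
// back onto the trusted virtual display it mirrors. Assumes simple linear
// scaling; the real framework's mapping logic is not public.
public class DisplayInputMapper {
    final int trustedW, trustedH; // trusted virtual display size
    final int mirrorW, mirrorH;   // interactive mirror display size

    DisplayInputMapper(int trustedW, int trustedH, int mirrorW, int mirrorH) {
        this.trustedW = trustedW;
        this.trustedH = trustedH;
        this.mirrorW = mirrorW;
        this.mirrorH = mirrorH;
    }

    // Scale a tap at (x, y) on the mirror into trusted-display coordinates,
    // so the user's input lands on the right spot in the automated app no
    // matter how the mirror window has been resized.
    int[] mapTap(int x, int y) {
        int tx = Math.round((float) x * trustedW / mirrorW);
        int ty = Math.round((float) y * trustedH / mirrorH);
        return new int[] { tx, ty };
    }
}
```

For example, with a 1080×2400 trusted display mirrored into a 540×1200 window, a tap at (270, 600) on the mirror maps to (540, 1200) on the trusted display.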
Given the sensitive nature of this framework, access is restricted to highly privileged, trusted applications. To use it, an app must first hold the new ACCESS_COMPUTER_CONTROL permission. This permission can only be held by apps signed with a digital certificate that has been explicitly allowlisted in the OS. Apps holding this permission must then ask the user to explicitly approve their use of the Computer Control feature; this approval can be granted for a single session or for all future sessions.
Once an app’s request to start a Computer Control session has been granted, the framework can restrict it from launching or interacting with apps other than the one being automated. This prevents automation sessions from being exploited to access other sensitive apps on the device.
In practice, this means only select apps trusted by Google or your device’s manufacturer will have access to the Computer Control framework. Regular apps won’t be able to discreetly launch and control other apps in the background without your knowledge.
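The layered checks described above can be summarized in a short sketch. Only the ACCESS_COMPUTER_CONTROL permission name comes from the Android code we examined; the class, its fields, and the exact order of the checks are our own illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the access gating described above: a signing
// certificate allowlist baked into the OS, the ACCESS_COMPUTER_CONTROL
// permission, explicit user approval (per-session or persistent), and
// confinement of the session to the single app being automated.
public class ComputerControlGate {
    static final String PERMISSION = "ACCESS_COMPUTER_CONTROL";

    final Set<String> allowlistedCerts = new HashSet<>();
    final Set<String> persistentApprovals = new HashSet<>(); // client packages

    boolean maySendEvent(String clientCert, boolean hasPermission,
                         boolean userApprovedThisSession, String clientPkg,
                         String sessionTargetPkg, String eventTargetPkg) {
        // 1. Only apps signed with an allowlisted certificate qualify.
        if (!allowlistedCerts.contains(clientCert)) return false;
        // 2. The app must also hold the ACCESS_COMPUTER_CONTROL permission.
        if (!hasPermission) return false;
        // 3. The user must have approved this session, or all sessions.
        boolean approved = userApprovedThisSession
                || persistentApprovals.contains(clientPkg);
        if (!approved) return false;
        // 4. The session is confined to the one app being automated.
        return sessionTargetPkg.equals(eventTargetPkg);
    }
}
```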
Computer Control — A new level of automation for Android apps?
While we can conclude that the Computer Control framework is designed to let trusted clients analyze screen data and automate tasks on your behalf, there’s still a lot we don’t know. For example, it’s unclear how exactly these clients will control apps.
Does the “computer” in “Computer Control” literally refer to a PC, suggesting Google plans to stream apps to a remote PC or server for automation? This approach would be similar to how the Rabbit R1 works, though in that case, the apps ran entirely on Rabbit’s servers. Or is “computer” meant more generally, with an on-device AI model analyzing the screen and performing actions locally? The former is more likely given the architecture of the Computer Control framework and where the code resides, but the latter is certainly possible with a multimodal model like Gemini Nano. The on-device approach would also be more private, though it would put more strain on the device’s memory and battery.
Whatever the case, we’re excited to see Google building a proper framework for true agentic AI on Android. The Computer Control framework opens the door to fully automating your apps, which is a big deal as it can not only save time but also dramatically improve accessibility. Of course, AI agents won’t always get things right. This is likely why Google included the ability to mirror the automation onto an interactive display, allowing users to supervise the process and make adjustments as needed.
Again, our understanding of this feature comes from code we examined in the latest Android build (i.e., Android 16 QPR2 Beta 2). We may have missed some details, and it’s unclear when Google plans to launch it. If we learn more about Computer Control, we’ll be sure to let you know — so consider subscribing to the Authority Insights Newsletter so you don’t miss a beat.
Want more?
Authority Insights is more than a newsletter — it’s the hub for all our best content. If you care about Android, you won’t want to miss any of our other exclusive reports.
Don’t have time to read them all? Subscribe to our Authority Insights Podcast to hear me and my co-host, C. Scott Brown, break down our top stories of the week.
This week’s top Authority Insights
The Pixel Watch 4 will bring new watch faces…and potentially hypertension alerts?!
Here are the 8 new watch faces coming with the Pixel Watch 4
Google is preparing its own Apple-style hypertension alert system for the Pixel Watch
Nano Banana, Aerial view, and more new features coming to Google apps
Google’s viral image generator is coming to AI Mode
This upcoming Google Maps feature could make virtual exploration easier, and here’s a look
Google Photos’ Collage tool is about to get way more flexible and easier to use
Google Photos will soon let you choose how you want your photos animated
Here’s the latest progress on how Google Messages @mentions are going to work
Gemini’s home screen could soon get a Discovery-style redesign
This simple change could make Gemini Scheduled Actions a joy to use
Better late than never…
Android 16 QPR2 may let you flip your Pixel’s navigation bar like on Samsung phones (Update: Demo)
Spotify is finally fixing one of its most frustrating shuffle problems
Last minute leaks…
Samsung’s One UI 8.5 update is an even bigger deal than you thought
Exclusive first look: This is the canceled Google Pixel 4 5G
Other top stories
What the heck is going on with Samsung wearables?!
Galaxy Ring battery scare leaves user stranded and hospitalized
Either Galaxy Watch users are suddenly sleeping better, or Samsung changed how sleep scores work
Are you willing to give Google Home another chance?
Gemini for Home is here to transform how you use Google Home
Fool me twice: Smart home users say they’re done trusting Google
I used the new Google Home Speaker. Here’s why you should save your money
Paying for Google Home Premium is an absolute non-starter for me
The more we learn about Android’s sideloading restrictions, the more worried we get
Google’s new rules could wipe out sideloading and alternative app stores, F-Droid warns
We finally know how Android’s new app verification rules will actually work
Check out these new features!
Your Pixel phones can now easily connect to two earbuds for simultaneous audio
The Galaxy Tab S11’s best new feature turns it into a powerful Linux computer
The big Google Play Store update is out now, making space for ‘You’ and your gaming skills
This Niagara Launcher feature should be standard across all Android phones
Maybe it’s time to switch services?
I compared Apple Visual Intelligence to Circle to Search, and the winner won’t surprise you
I’m finally quitting Duolingo after the latest controversial change