Google announced Private AI Compute, a system designed to process AI requests with Gemini cloud models while keeping user data private. The company describes Private AI Compute as a technology built to “unlock the full speed and power of Gemini cloud models for AI experiences” and claims it “allows you to get faster, more helpful responses, making it easier to find what you need, get smart suggestions and take action.” The announcement positions Private AI Compute as Google’s answer to privacy concerns around cloud-based AI, building on what the company calls privacy-enhancing technologies (PETs) developed for AI use cases.
Google designed Private AI Compute with multiple layers of protection for processing. The system uses an AMD-based hardware Trusted Execution Environment (TEE) for CPU and TPU workloads to “encrypt and isolate memory and processing from the host.” To meet Private AI Compute’s requirements, the company extended its Titanium Hardware Security Architecture to TPU hardware, starting with the sixth-generation Google Cloud TPU, known as Trillium. The architecture also establishes encrypted communication channels between trusted nodes using protocols such as Noise and Application Layer Transport Security (ALTS). Each trusted node is attested to verify its integrity before these channels are established, which Google says shields user data from the rest of its infrastructure.
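The attest-then-connect pattern described above can be sketched in a few lines. This is an illustrative model only: the measurement values, key-derivation labels, and function names below are invented for this example, and the real system uses Noise/ALTS handshakes with TEE-issued attestation evidence rather than this toy logic.

```python
# Simplified model of gating an encrypted session on attestation:
# verify the node's code measurement first, and only then derive a
# channel key bound to the handshake transcript. All names here are
# hypothetical; this is not Google's implementation.
import hashlib
import hmac
import os

# Hypothetical allow-list of known-good code measurements
# (in a real TEE flow these would be signed launch digests).
TRUSTED_MEASUREMENTS = {hashlib.sha256(b"trusted-node-build-1.0").hexdigest()}

def verify_attestation(measurement_hex: str) -> bool:
    """Accept the node only if its reported code measurement is allow-listed."""
    return measurement_hex in TRUSTED_MEASUREMENTS

def derive_session_key(shared_secret: bytes, transcript: bytes) -> bytes:
    """Bind the session key to the handshake transcript (Noise-style binding)."""
    return hmac.new(shared_secret, b"session-key|" + transcript,
                    hashlib.sha256).digest()

def establish_channel(measurement_hex: str, shared_secret: bytes) -> bytes:
    """Attest first; refuse to derive a key for an unverified node."""
    if not verify_attestation(measurement_hex):
        raise PermissionError("attestation failed: refusing to send user data")
    return derive_session_key(shared_secret, measurement_hex.encode())

# A client would encrypt its queries under the resulting key.
good = hashlib.sha256(b"trusted-node-build-1.0").hexdigest()
key = establish_channel(good, os.urandom(32))
assert len(key) == 32
```

The important design point the sketch captures is ordering: user data is never sent until attestation succeeds, so a node running unexpected code simply never receives a channel key.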
Source: Private AI Compute Chain of Trust
Private AI Compute includes protections designed to address privileged access misuse. The system operates on an ephemeral basis, where “inputs, model inferences, and computations are only kept as long as needed to fulfill the user’s query,” which Google says prevents attackers from accessing past data. Key services run on a confidential computing platform built on AMD’s hardware TEE, with frontend services running in confidential virtual machines. Google states this approach protects the workload in a guest virtual machine from the host and verifies its code through attestation. The system also routes traffic to Private AI Compute through IP-blinding relays operated by third parties, which Google claims prevents linking a user’s IP address or other network-identifying information to specific queries.
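The split-knowledge property behind an IP-blinding relay can be illustrated with a small sketch: the relay sees the client's network address but only ciphertext, while the server decrypts the query but only ever sees the relay. The class names and the toy cipher below are assumptions made for illustration, not Google's protocol.

```python
# Conceptual model of an IP-blinding relay: no single party sees both
# the client's IP and the plaintext query. The "encryption" is a toy
# XOR stream cipher for demonstration only -- never use in production.
import hashlib

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher (XOR with a hash-derived keystream), illustration only."""
    stream = hashlib.sha256(key).digest() * (len(plaintext) // 32 + 1)
    return bytes(p ^ s for p, s in zip(plaintext, stream))

toy_decrypt = toy_encrypt  # XOR with the same keystream is its own inverse

class Relay:
    """Forwards ciphertext; observes only the client's network address."""
    def __init__(self):
        self.seen_client_ips = []
    def forward(self, client_ip: str, ciphertext: bytes) -> bytes:
        self.seen_client_ips.append(client_ip)  # the relay learns the IP...
        return ciphertext                       # ...but never the query

class Server:
    """Decrypts the query under a key the relay does not hold."""
    def __init__(self, key: bytes):
        self.key = key
    def handle(self, ciphertext: bytes) -> bytes:
        return toy_decrypt(self.key, ciphertext)

# Key established end-to-end between client and server, unknown to the relay.
key = b"client-and-server-shared-key"
relay, server = Relay(), Server(key)
ciphertext = toy_encrypt(key, b"summarize my recording")
query = server.handle(relay.forward("203.0.113.7", ciphertext))
assert query == b"summarize my recording"
```

The design choice this models is separation of observers: compromising the relay reveals who connected but not what they asked, and compromising the server reveals queries without the network identities behind them.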
Private AI Compute allows on-device features to access extended capabilities while maintaining privacy protections. Google states the technology makes Magic Cue “more helpful with more timely suggestions” on the latest Pixel 10 phones. The Recorder app on Pixel uses Private AI Compute to “summarize transcriptions across a wider range of languages,” according to the company.
Private AI Compute reflects a broader industry trend toward privacy-focused AI systems. Apple’s Private Cloud Compute and Meta’s Private Processing pursue similar objectives of offloading AI workloads to the cloud while implementing cryptographic and hardware-based protections.
One commenter on Hacker News noted that
there are a few research papers detailing how Trusted Execution Environments can be attacked—aside from the obvious risk that the TEE manufacturer holds the keys and could, if compelled or willing, share access with others.
NCC Group, serving as an external auditor, validated that Private AI Compute’s system design meets privacy and security guidelines. The audit included an architecture review of the Private AI Compute system, a cryptography security assessment of the Oak Session Library, and a security analysis of the IP-blinding relay.
Developers interested in private AI inference can explore OpenPCC, an open-source framework available on GitHub. The repository offers technical details for those looking to examine or experiment with private AI architecture.
