Google DeepMind has released a new paper outlining its approach to safety and security in the development of artificial general intelligence (AGI). AGI refers to AI systems that are as capable as humans at most cognitive tasks. The company anticipates that AGI, especially when integrated with agentic capabilities, could soon enable autonomous reasoning, planning, and execution of tasks.
The paper describes a systematic approach to managing four key risk areas: misuse, misalignment, accidents, and structural risks. The company places particular emphasis on mitigating misuse, where AI systems could be deliberately used for harmful purposes, and misalignment, where AI systems might pursue goals not intended by humans.
DeepMind is working on various strategies to prevent misuse, such as restricting access to dangerous capabilities, implementing stronger security measures to protect model weights, and developing a cybersecurity evaluation framework. The company is also researching threat modeling to identify critical capability thresholds that require enhanced security.
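As a purely illustrative sketch (not taken from the paper), a capability-threshold check of this kind could amount to gating extra security measures on dangerous-capability evaluation scores; the capability names and thresholds below are hypothetical.

```python
# Hypothetical sketch: gating additional security measures on
# dangerous-capability evaluation scores. Capability names and
# thresholds are illustrative, not DeepMind's.
from dataclasses import dataclass


@dataclass
class EvalResult:
    capability: str   # e.g. "cyber_offense"
    score: float      # normalized 0..1 evaluation score


# Illustrative thresholds at which enhanced security would be required.
CRITICAL_THRESHOLDS = {
    "cyber_offense": 0.6,
    "bio_uplift": 0.4,
}


def required_mitigations(results: list[EvalResult]) -> list[str]:
    """Return the extra security measures a given set of eval results would trigger."""
    mitigations = []
    for r in results:
        threshold = CRITICAL_THRESHOLDS.get(r.capability)
        if threshold is not None and r.score >= threshold:
            mitigations.append(f"restrict access to {r.capability} capabilities")
            mitigations.append("harden protection of model weights")
    return mitigations


print(required_mitigations([EvalResult("cyber_offense", 0.7)]))
```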
To address misalignment, DeepMind is exploring methods to ensure AI systems accurately follow human instructions without taking unintended shortcuts. Techniques include amplified oversight, in which AI systems themselves help evaluate the quality of AI outputs, and robust training practices that prepare AI systems for a wide range of real-world scenarios. Monitoring mechanisms are also being developed to detect and flag unsafe actions taken by AI systems.
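To make the monitoring idea concrete, here is a minimal sketch, assuming a separate model scores each proposed agent action for safety before it is executed; the function names and threshold are placeholders rather than DeepMind's implementation.

```python
# Minimal sketch of an action monitor: a safety scorer reviews each
# proposed agent action and flags low-scoring ones for human review
# instead of executing them. All names here are hypothetical.
from typing import Callable, Optional


def monitored_step(
    propose_action: Callable[[str], str],
    safety_score: Callable[[str], float],   # higher = safer; placeholder scorer
    task: str,
    threshold: float = 0.8,
) -> Optional[str]:
    action = propose_action(task)
    if safety_score(action) < threshold:
        # Flag for review rather than executing the action.
        print(f"Flagged for review: {action!r}")
        return None
    return action


# Usage with toy stand-ins for the agent and the monitor model.
result = monitored_step(
    propose_action=lambda t: f"run shell command for: {t}",
    safety_score=lambda a: 0.3 if "shell" in a else 0.95,
    task="clean up temp files",
)
```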
Research into interpretability and transparency is also underway to make AI decision-making processes more understandable. For example, the company is working on methods like Myopic Optimization with Nonmyopic Approval (MONA) to maintain transparency even as AI systems develop long-term planning abilities.
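The core idea behind MONA, sketched very roughly below, is to train the agent on immediate rewards plus a per-step approval signal from an overseer, rather than on returns propagated back from long-horizon outcomes; the function and its parameters are illustrative assumptions, not DeepMind's code.

```python
# Rough sketch of the MONA idea, assuming a per-step overseer approval signal.
# The agent's training signal at each step is the immediate reward plus the
# overseer's approval of that step; no discounted sum over future steps is
# used, so multi-step strategies only pay off if the overseer can recognize
# and approve of them step by step.
def mona_step_reward(
    immediate_reward: float,
    overseer_approval: float,   # overseer's judgment of the step's long-run value
    approval_weight: float = 1.0,
) -> float:
    return immediate_reward + approval_weight * overseer_approval


print(mona_step_reward(immediate_reward=0.2, overseer_approval=1.0))
```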
The AGI Safety Council at DeepMind, led by co-founder Shane Legg, is tasked with analyzing risks and recommending safety practices. The council collaborates with internal teams and external organizations, including nonprofits such as Apollo Research and Redwood Research, to incorporate broader perspectives on safety.
DeepMind is also engaging with governments, civil society groups, and industry organizations to promote collaboration on AI safety standards. Efforts include contributing to international policy discussions and participating in joint safety initiatives through groups like the Frontier Model Forum.
Anca Dragan, senior director of AI safety and alignment at Google DeepMind, posted on X:
The goal with this was primarily to have a systematic breakdown of what we need to do and common ground among the team. Ofc an AGI safety approach is near-impossible to write, so there’s plenty I’d still change + our understanding of this keeps evolving and updating.
Meanwhile, Tom Bielecki, CTO at Aligned Outcomes, commented:
AI Safety has a narrative problem. You’ve got this incredible engine in a high speed arms race. Yet they’re positioning the “steering, suspension, and brakes” as a necessary evil, regulatory requirement, or risk management. It’s not OSHA, it’s F1 engineering. Hell, they even turned Deepmind into a three letter acronym. Get some branding folks onboard to reframe this all as a performance enabling grand engineering frontier.
DeepMind states that ongoing research, collaboration, and careful preparation will be necessary to ensure the responsible development of AGI.