By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
World of SoftwareWorld of SoftwareWorld of Software
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Search
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
Reading: Google Adds Multi-Layered Defenses to Secure GenAI from Prompt Injection Attacks
Share
Sign In
Notification Show More
Font ResizerAa
World of SoftwareWorld of Software
Font ResizerAa
  • Software
  • Mobile
  • Computing
  • Gadget
  • Gaming
  • Videos
Search
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Have an existing account? Sign In
Follow US
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
World of Software > Computing > Google Adds Multi-Layered Defenses to Secure GenAI from Prompt Injection Attacks
Computing

Google Adds Multi-Layered Defenses to Secure GenAI from Prompt Injection Attacks

News Room
Last updated: 2025/06/23 at 7:39 AM
News Room Published 23 June 2025
Share
SHARE

Google has revealed the various safety measures that are being incorporated into its generative artificial intelligence (AI) systems to mitigate emerging attack vectors like indirect prompt injections and improve the overall security posture for agentic AI systems.

“Unlike direct prompt injections, where an attacker directly inputs malicious commands into a prompt, indirect prompt injections involve hidden malicious instructions within external data sources,” Google’s GenAI security team said.

These external sources can take the form of email messages, documents, or even calendar invites that trick the AI systems into exfiltrating sensitive data or performing other malicious actions.

The tech giant said it has implemented what it described as a “layered” defense strategy that is designed to increase the difficulty, expense, and complexity required to pull off an attack against its systems.

These efforts span model hardening, introducing purpose-built machine learning (ML) models to flag malicious instructions and system-level safeguards. Furthermore, the model resilience capabilities are complemented by an array of additional guardrails that have been built into Gemini, the company’s flagship GenAI model.

Cybersecurity

These include –

  • Prompt injection content classifiers, which are capable of filtering out malicious instructions to generate a safe response
  • Security thought reinforcement, which inserts special markers into untrusted data (e.g., email) to ensure that the model steers away from adversarial instructions, if any, present in the content, a technique called spotlighting.
  • Markdown sanitization and suspicious URL redaction, which uses Google Safe Browsing to remove potentially malicious URLs and employs a markdown sanitizer to prevent external image URLs from being rendered, thereby preventing flaws like EchoLeak
  • User confirmation framework, which requires user confirmation to complete risky actions
  • End-user security mitigation notifications, which involve alerting users about prompt injections

However, Google pointed out that malicious actors are increasingly using adaptive attacks that are specifically designed to evolve and adapt with automated red teaming (ART) to bypass the defenses being tested, rendering baseline mitigations ineffective.

“Indirect prompt injection presents a real cybersecurity challenge where AI models sometimes struggle to differentiate between genuine user instructions and manipulative commands embedded within the data they retrieve,” Google DeepMind noted last month.

“We believe robustness to indirect prompt injection, in general, will require defenses in depth – defenses imposed at each layer of an AI system stack, from how a model natively can understand when it is being attacked, through the application layer, down into hardware defenses on the serving infrastructure.”

The development comes as new research has continued to find various techniques to bypass a large language model’s (LLM) safety protections and generate undesirable content. These include character injections and methods that “perturb the model’s interpretation of prompt context, exploiting over-reliance on learned features in the model’s classification process.”

Another study published by a team of researchers from Anthropic, Google DeepMind, ETH Zurich, and Carnegie Mellon University last month also found that LLMs can “unlock new paths to monetizing exploits” in the “near future,” not only extracting passwords and credit cards with higher precision than traditional tools, but also to devise polymorphic malware and launch tailored attacks on a user-by-user basis.

The study noted that LLMs can open up new attack avenues for adversaries, allowing them to leverage a model’s multi-modal capabilities to extract personally identifiable information and analyze network devices within compromised environments to generate highly convincing, targeted fake web pages.

At the same time, one area where language models are lacking is their ability to find novel zero-day exploits in widely used software applications. That said, LLMs can be used to automate the process of identifying trivial vulnerabilities in programs that have never been audited, the research pointed out.

According to Dreadnode’s red teaming benchmark AIRTBench, frontier models from Anthropic, Google, and OpenAI outperformed their open-source counterparts when it comes to solving AI Capture the Flag (CTF) challenges, excelling at prompt injection attacks but struggled when dealing with system exploitation and model inversion tasks.

“AIRTBench results indicate that although models are effective at certain vulnerability types, notably prompt injection, they remain limited in others, including model inversion and system exploitation – pointing to uneven progress across security-relevant capabilities,” the researchers said.

“Furthermore, the remarkable efficiency advantage of AI agents over human operators – solving challenges in minutes versus hours while maintaining comparable success rates – indicates the transformative potential of these systems for security workflows.”

Cybersecurity

That’s not all. A new report from Anthropic last week revealed how a stress-test of 16 leading AI models found that they resorted to malicious insider behaviors like blackmailing and leaking sensitive information to competitors to avoid replacement or to achieve their goals.

“Models that would normally refuse harmful requests sometimes chose to blackmail, assist with corporate espionage, and even take some more extreme actions, when these behaviors were necessary to pursue their goals,” Anthropic said, calling the phenomenon agentic misalignment.

“The consistency across models from different providers suggests this is not a quirk of any particular company’s approach but a sign of a more fundamental risk from agentic large language models.”

These disturbing patterns demonstrate that LLMs, despite the various kinds of defenses built into them, are willing to evade those very safeguards in high-stakes scenarios, causing them to consistently choose “harm over failure.” However, it’s worth pointing out that there are no signs of such agentic misalignment in the real world.

“Models three years ago could accomplish none of the tasks laid out in this paper, and in three years models may have even more harmful capabilities if used for ill,” the researchers said. “We believe that better understanding the evolving threat landscape, developing stronger defenses, and applying language models towards defenses, are important areas of research.”

Found this article interesting? Follow us on Twitter  and LinkedIn to read more exclusive content we post.

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.
By signing up, you agree to our Terms of Use and acknowledge the data practices in our Privacy Policy. You may unsubscribe at any time.
Share This Article
Facebook Twitter Email Print
Share
What do you think?
Love0
Sad0
Happy0
Sleepy0
Angry0
Dead0
Wink0
Previous Article Jeff Bezos’ luxury superyacht heads to Venice for world’s most lavish wedding
Next Article Apple updates iPhone, iPad pages with these labels to comply with eu rules
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Stay Connected

248.1k Like
69.1k Follow
134k Pin
54.3k Follow

Latest News

Over 700k people hit in major healthcare data breach — full names, SSNs, medical info and more exposed
News
New Bill Would Require US Truck Drivers to Take English Test
News
IBM Already Working On What Is Likely Power12 Support For The GCC Compiler
Computing
A deal for those who love big ass gaming laptops
News

You Might also Like

Computing

IBM Already Working On What Is Likely Power12 Support For The GCC Compiler

2 Min Read
Computing

Oppo seeks trademark registration for “ophone” · TechNode

1 Min Read
Computing

Know Your Product: A Practical Guide to Functional Decomposition | HackerNoon

14 Min Read
Computing

Chinese startup Sharge unveils first mass-produced AI glasses in China ahead of Xiaomi and Baidu · TechNode

1 Min Read
//

World of Software is your one-stop website for the latest tech news and updates, follow us now to get the news that matters to you.

Quick Link

  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact

Topics

  • Computing
  • Software
  • Press Release
  • Trending

Sign Up for Our Newsletter

Subscribe to our newsletter to get our newest articles instantly!

World of SoftwareWorld of Software
Follow US
Copyright © All Rights Reserved. World of Software.
Welcome Back!

Sign in to your account

Lost your password?