Hackers Are Now Using AI To Break AI - And It’s Working

Hackers are now using AI to break AI – and it’s working

Last updated: 2025/03/29 at 9:40 AM

News Room Published 29 March 2025

It was only a matter of time before hackers started using artificial intelligence to attack artificial intelligence—and now that time has arrived. A new research breakthrough has made AI prompt injection attacks faster, easier, and scarily effective, even against supposedly secure systems like Google’s Gemini.

Prompt injection attacks have been one of the most reliable ways to manipulate large language models (LLMs). By sneaking malicious instructions into the text AI reads—like a comment in a block of code or hidden text on a webpage—attackers can get the model to ignore its original rules.

That could mean leaking private data, delivering wrong answers, or carrying out other unintended behaviors. The catch, though, is that prompt injection attacks typically require a lot of manual trial and error to get right, especially for closed-weight models like GPT-4 or Gemini, where developers can’t see the underlying code or training data.

But a new technique called Fun-Tuning changes that. Developed by a team of university researchers, this method uses Google’s own fine-tuning API for Gemini to craft high-success-rate prompt injections—automatically. The researcher’s findings are currently available in a preprint report.

By abusing Gemini’s training interface, Fun-Tuning figures out the best “prefixes” and “suffixes” to wrap around an attacker’s malicious prompt, dramatically increasing the chances that it’ll be followed. And the results speak for themselves.

In testing, Fun-Tuning achieved up to 82 percent success rates on some Gemini models, compared to under 30 percent with traditional attacks. It works by exploiting subtle clues in the fine-tuning process—like how the model reacts to training errors—and turning them into feedback that sharpens the attack. Think of it as an AI-guided missile system for prompt injection.

Even more troubling, attacks developed for one version of Gemini transferred easily to others. This means a single attacker could potentially develop one successful prompt and deploy it across multiple platforms. And since Google offers this fine-tuning API for free, the cost of mounting such an attack is as low as $10 in compute time.

Google has acknowledged the threat but hasn’t commented on whether it plans to change its fine-tuning features. The researchers behind Fun-Tuning warn that defending against this kind of attack isn’t simple—removing key data from the training process would make the tool less useful for developers. But leaving it in makes it easier for attackers to exploit.

One thing is certain, though. AI prompt injection attacks like this are a sign that the game has entered a new phase—where AI isn’t just the target, but also the weapon.

Hackers are now using AI to break AI – and it’s working

Leave a Reply Cancel reply

Stay Connected

Latest News

Hotel slammed for letting red pandas crawl into beds to wake up guests

iPhone 18 Pro Series’ Dynamic Island Cold Turn Into A Dot

Baidu exec’s teen daughter linked to doxing scandal using overseas data in online dispute · TechNode

These headphones might offer the best noise cancellation on a budget | Stuff

World of Software is your one-stop website for the latest tech news and updates, follow us now to get the news that matters to you.

Quick Link

Topics

Sign Up for Our Newsletter

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.

Leave a Reply Cancel reply

Stay Connected

Latest News