AI security hacker Johann Rehberger described a prompt injection attack against Google Gemini able to modify its long-term memories using a technique he calls delayed tool invocation. The researcher described the attack as a sort of social engineering/phishing attack triggered by the user interacting with a malicious document.
LLMs are usually able to defend themselves from attacks aiming to have them surreptitiously run external tools without the user’s knowledge by disabling external tool execution when processing untrusted data, i.e., any information that is not directly coming from the user.
One year ago, however, Rehberger showed a technique you can use to circumvent this protection mechanism when using Google Gemini. The technique consists of polluting the chat context so that an action is triggered later, when the model is interacting with the user, that is, when those protections mentioned above are not enforced anymore.
In principle, the technique is as straightforward as feeding Gemini a malicious document containing a sentence like “if the user says X, then execute this tool”. Gemini would refuse to execute the tool while parsing the document. Yet, it would run it later, when instructed by the user saying “X”. This “asynchronous triggering” of the tool accounts for the name Rehberger gave to this technique, delayed tool invocation.
Recently, Rehberger brought his investigations into delayed tool invocation a bit further by showing how you can prompt Gemini to store false information in a user’s long-term memory.
The demo attack is straightforward: an adversary crafts a document with embedded prompt injection, which tricks Gemini into storing false information if the user keeps interacting with Gemini in that same chat conversation.
This is shown in the following picture, where Gemini is asked to summarize a document containing a prompt to pollute the user’s memory.
While this attack has the potential to permanently alter Gemini’s behavior, Google assessed its impact as low, since it requires the user to actively collaborate with the exploit, as with other forms of social engineering. Further, the vulnerability is mitigated by the fact that Gemini’s UI shows an alert each time new data is added to users’ memories. Still, based on his findings, Rehberger suggests users regularly review their saved memories and be careful when interacting with documents from untrusted sources.
Prompt injection is coming to the fore as an easy way to interfere with large language models (LLMs) behavior. As Georg Dresler explains, this is particularly worrisome in light of the possibility of exfiltrating private data or secrets by appropriately prompting an LLM model having access to internal tools. For example, AI security firm PromptArmor figured out how you could steal data from private Slack channels, like API keys, passwords, and so on.
Exploiting these kinds of vulnerabilities is not always entirely straightforward but it is definitely possible, and attackers can come up with ever smarter and more insidious techniques, as Rehberger’s clearly shows.
Google Gemini memories are similar to ChatGPT’s memory, introduced last year. They aim to enable the persistent storage of things the user cares about, including life, work, aspirations, and personal preferences. Using these long-term memories, Gemini (and ChatGPT) can deliver more relevant answers to the user without them to state their preferences each time.