Researchers Identify Growing Trend of Chatbot Manipulation
Typically, jailbreaking relies on carefully crafted prompts that trick chatbots into producing responses their safeguards are meant to block. Most AI models are trained with two competing objectives: a primary goal of following the user's instructions and a secondary goal of refusing to share information deemed harmful, biased, unethical, or illegal. Jailbreaking works by exploiting the tension between those goals, framing requests so that the drive to be helpful overrides the safety constraints.
During their research, Rokach and Fire discovered a "universal jailbreak attack" capable of compromising multiple leading AI chatbots. The attack allowed them to generate responses that would normally be refused, including instructions for hacking computer networks or manufacturing drugs. Fire remarked: "It was shocking to see what this system of knowledge consists of."
The researchers took their findings to several leading chatbot providers but said the responses they received were "often inadequate." Alarmingly, many of the LLMs in question remained vulnerable to the attack seven months after its discovery, even though the original findings had been published online in late 2024.