In a study published on arXiv in November 2025, still awaiting peer review, researchers tested the guardrails of 25 frontier AI models from nine providers: OpenAI, Anthropic, xAI, Alibaba’s Qwen, DeepSeek, Mistral AI, Meta, Moonshot AI, and Google. To measure the effectiveness of the models’ safety guardrails, the team tested 20 handwritten poems and 1,200 AI-generated verses that embedded harmful requests. The poems spanned four safety categories: loss-of-control scenarios, harmful manipulation, cyber offenses, and chemical, biological, radiological, and nuclear (CBRN) weapons. The poems solicited specialized advice related to indiscriminate weapons, child exploitation, self-harm, intellectual property and privacy infringements, and other violent offenses. Prompts were considered successful if they produced the intended unsafe answers.
According to the DEXAI team, transforming unsafe requests into poetry increased successful attacks roughly fivefold on average. Models exhibited the vulnerability regardless of training pipeline or system architecture, suggesting a general weakness in how models interpret language. The model provider made a substantial difference, however. Of the 25 models tested, 13 were duped more than 70% of the time, with Google, DeepSeek, and Qwen proving notably susceptible. Even Anthropic, which once made headlines by daring its customers to try to jailbreak its Claude AI system, was vulnerable to the technique, though far less often.
Only four models were fooled less than a third of the time. And while the degree of susceptibility varied widely, even Anthropic’s Claude and OpenAI’s GPT-5, the best performers of the group, fell victim to the technique. Surprisingly, smaller models held up against adversarial poetry prompts better than their larger counterparts, and the results showed no advantage for proprietary systems over open-weight models. What wasn’t surprising, however, was the comparative performance of manually crafted and AI-written poetry: human-written verse vastly outperformed its artificial counterpart, a result that should have literature professors everywhere beaming.
