Artificial intelligence large language models are being deployed more frequently in sensitive, public-facing roles, and sometimes they go very wrong.
Recently Grok 4, the LLM developed by X.AI Corp. and deployed on X, made headlines for all the wrong reasons. During the second week of July, Grok went on what can only be described as a rampage, spouting antisemitic remarks and even calling itself “MechaHitler.”
Preventing this sort of misbehavior is quite possible, according to a recent report by Holistic AI Inc., a company founded by University College London alumni that provides solutions for organizations to adopt AI responsibly.
In an exclusive interview, Holistic AI researchers explained that the key is red teaming, or structured adversarial testing designed to stress-test AI systems before deployment. “Red teaming is one of the most tangible assessments we have,” explained Zekun Wu, an AI research scientist at Holistic AI.
Most LLMs are customer-facing or designed to chat with people. This process lets research teams see if a model can respond safely to user requests before it ever goes live.
Unlike standard benchmarking, red teaming focuses on hostile and manipulative prompts, including deliberate attempts to bypass safety features, known as jailbreaking, or to elicit harmful responses like hate speech, bias or instructions for illegal activities.
These could include getting the AI model to spew racism, threaten the user, leak sensitive information or even elaborate how to build a bomb.
Holistic AI’s findings paint a poor picture for Grok 4. Compared with other models, Grok scored extremely low on jailbreaking defenses. Roughly 90% of the jailbreak attempts were successful, meaning almost anyone can trick it into saying or doing almost anything.
This isn’t just a Grok problem. Wu emphasized that current AI development practices often suffer from systemic flaws: “It’s like a rusty car, it doesn’t really matter what kind of paint you put on the car, you’re just painting it. Current model architecture needs to be improved.”
In the past, the public has been able to twist AI models into misbehaving within hours without proper preparation. A decade ago, even before the popularity of generative AI, Microsoft Tay, an AI bot for teens, also descended directly into racism.
Modern-day generative AI models, however, are much more sophisticated and are capable of holding what appear to be highly nuanced conversations with customers and employees. They can also have access to sensitive company information or tools that can allow them to cause more than just public relations damage to a company’s brand image.
More current examples of AI applications going haywire can get a little bit more devastating. New York City’s AI chatbot MyCity, touted as one of the first examples of a citywide generative AI helper that could assist businesses with trusted information, was giving out illegal advice in 2024. The same year, Air Canada lost a court case when an AI chatbot gave a customer an inconsistent answer about an airline bereavement policy and offered a customer a discount the airline then attempted to withdraw.
OpenAI also got itself some press when its flagship AI model, ChatGPT 4o, became overly sycophantic — meaning that it was agreeing with users too often even on topics that would be dangerous or harmful to them. In one example, ChatGPT urged one user to take themselves off their medication. OpenAI quickly moved in late April to roll back that version of its model to tone down this behavior.
Holistic AI only used 100 prompts across three categories — standard harmful requests, overtly malicious requests and jailbreak attempts — for its initial tests. Even on this small battery of prompts, Grok 4 didn’t hold up. To the researchers this said that right now AI companies are pushing too quickly to get models out without testing.
Holistic AI has spent five years building a proprietary library of over 300,000 adversarial prompts that can stress test AI models.
Current application software development needs a cybersecurity portion and oversight to its lifecycle where software is stress-tested by subjecting it to data it’s expected to see, potential malicious attacks to discover vulnerabilities that might have been injected during development, and continuous monitoring for emerging threats.
The implications for businesses are clear: Without robust red teaming, LLMs can become legal, reputational and operational nightmares. Weak safety layers make models vulnerable to manipulation, exploitation and brand damage.
As for the future, Wu said it’s unclear if regulatory frameworks will demand red teaming, but for most companies the bigger motivation is brand protection over compliance. Looking at the past examples listed above, the cost of generative AI failures for business has been reputational damage, not fines.
“A single misstep in an AI application can erode customer trust, trigger media backlash, or put business partnerships at risk,” Wu said. “In many cases, those consequences are existential. That’s why leading organizations won’t wait for mandates.”
Image: News/Microsoft Designer
Support our open free content by sharing and engaging with our content and community.
Join theCUBE Alumni Trust Network
Where Technology Leaders Connect, Share Intelligence & Create Opportunities
11.4k+
CUBE Alumni Network
C-level and Technical
Domain Experts
Connect with 11,413+ industry leaders from our network of tech and business leaders forming a unique trusted network effect.
News Media is a recognized leader in digital media innovation serving innovative audiences and brands, bringing together cutting-edge technology, influential content, strategic insights and real-time audience engagement. As the parent company of News, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — such as those established in Silicon Valley and the New York Stock Exchange (NYSE) — News Media operates at the intersection of media, technology, and AI. .
Founded by tech visionaries John Furrier and Dave Vellante, News Media has built a powerful ecosystem of industry-leading digital media brands, with a reach of 15+ million elite tech professionals. The company’s new, proprietary theCUBE AI Video cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.