Elon Musk’s xAI has released its first agentic coding model, which it describes as “speedy and economical.” However, the model also has “a higher dishonesty rate” than the company’s flagship chatbot model, Grok 4.
The AI startup designed the new model, grok-code-fast-1, specifically for coding tasks. It’s free now for a limited time and accessible within GitHub Copilot, Cursor, Cline, Roo Code, Kilo Code, opencode, and Windsurf. “Grok-code-fast-1 has mastered the use of common tools like grep, terminal, and file editing, and thus should feel right at home in your favorite IDE,” xAI says.
But its propensity not to tell the truth could create problems for users. “We find that the dishonesty rate exceeds that of Grok 4,” says the model card. The company attributes this in part to its “safety training, which teaches the model to answer all queries that do not express [a] clear intent to engage in specified prohibited activities.”
Translation: if it doesn’t know the answer to your question, it might lie.
If programmers ask the model whether a certain part of the codebase is working, and it doesn’t know, it may say “yes” when, in fact, the opposite is true. It might also confirm that it completed a test the engineer asked it to do when it did not. This could create blind spots and double work.
It’s not a major concern for xAI, which says it doesn’t expect the model “to be widely used as a general-purpose assistant,” like ChatGPT or the Grok chatbot.
Vibe-coding agents are a new trend that stands to revolutionize the field, but they’re far from perfect. One tool deleted a startup’s entire client database on its own and deceived the user multiple times along the way. In fact, most of the large language models in the market today have behavioral issues, including blackmail, sabotage, lying, and telling the user what they want to hear (sycophancy). In a recent test, Anthropic and OpenAI examined each other’s models and found these issues in almost all of them.
Another eye-catching part of the Grok Code Fast 1 model card discusses the risk of someone using it to develop biological weapons. The company tested for this before release, along with issues related to cybersecurity and chemical knowledge. But bioweapons are the biggest risk, and “have the potential for the greatest scale of harm, [since] frontier models significantly lower the barrier to entry to the creation of bioweapons,” xAI says.
The results showed that Grok Code Fast 1 was worse than a human at “identifying issues in biological protocols,” but it was better at “troubleshooting wet lab virology experiments.” Again, xAI downplayed the issue, claiming that since the capabilities are similar to Grok 4, the new model “does not meaningfully change the risk landscape.”
Earlier this month, Anthropic updated the usage policy of its Claude chatbot to forbid using it to “synthesize, or otherwise develop, high-yield explosives or biological, chemical, radiological, or nuclear weapons or their precursors.”
Grok Code Fast 1 has quietly been out in the wild for the past week under the code name “sonic.” The xAI team says it “carefully monitored” feedback and deployed fixes, and plans to keep up a high rate of improvements “in days rather than weeks.” At the same time, lying seems to be a particularly tough problem for AI companies to solve completely, at least in the short term.