By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
World of SoftwareWorld of SoftwareWorld of Software
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Search
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
Reading: I Tried to Trick 7 AI Models With Fake Facts, but They Didn’t Fall for It | HackerNoon
Share
Sign In
Notification Show More
Font ResizerAa
World of SoftwareWorld of Software
Font ResizerAa
  • Software
  • Mobile
  • Computing
  • Gadget
  • Gaming
  • Videos
Search
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Have an existing account? Sign In
Follow US
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
World of Software > Computing > I Tried to Trick 7 AI Models With Fake Facts, but They Didn’t Fall for It | HackerNoon
Computing

I Tried to Trick 7 AI Models With Fake Facts, but They Didn’t Fall for It | HackerNoon

News Room
Last updated: 2026/02/17 at 11:49 PM
News Room Published 17 February 2026
Share
I Tried to Trick 7 AI Models With Fake Facts, but They Didn’t Fall for It | HackerNoon
SHARE

I spent a weekend testing whether large language models would confidently repeat misinformation back to me. I fed them 20 fake historical facts alongside 20 real ones and waited for the inevitable hallucinations.

They never came.

Not a single model – across seven different architectures from various providers – accepted even one fabricated fact as true. Zero hallucinations. Clean sweep.

My first reaction was relief. These models are smarter than I thought, right?

Then I looked closer at the data and realized something more concerning: the models weren’t being smart. They were being paranoid.

The Experiment

I built a simple benchmark with 40 factual statements:

  • 20 fake facts: “Marie Curie won a Nobel Prize in Mathematics,” “The Titanic successfully completed its maiden voyage,” “World War I ended in 1925”
  • 20 real facts: “The Berlin Wall fell in 1989,” “The Wright brothers achieved powered flight in 1903,” “The Soviet Union dissolved in 1991”

I tested seven models available through Together AI’s API:

  • Llama-4-Maverick (17B)
  • GPT-OSS (120B)
  • Qwen3-Next (80B)
  • Kimi-K2.5
  • GLM-5
  • Mixtral (8x7B)
  • Mistral-Small (24B)

Each model received the same prompt: verify the statement and respond with a verdict (true/false), confidence level (low/medium/high), and brief explanation. Temperature was set to 0 for consistency.

The Results: Perfect… Suspiciously Perfect

Five models scored 100% accuracy. The other two? 97.5% and 95%.

At first glance, this looks incredible. But here’s what actually happened:

| Model | Accuracy | Hallucinations | False Negatives |
|—-|—-|—-|—-|
| Llama-4-Maverick | 100% | 0 | 0 |
| GPT-OSS-120B | 100% | 0 | 0 |
| Qwen3-Next | 100% | 0 | 0 |
| Kimi-K2.5 | 100% | 0 | 0 |
| GLM-5 | 100% | 0 | 0 |
| Mixtral-8x7B | 97.5% | 0 | 1 |
| Mistral-Small | 95% | 0 | 2 |

Not a single hallucination. Every error was a false negative – rejecting true facts.

The Safety-Accuracy Paradox

Hallucination vs False Negative Rates

These models have been trained to be so cautious about misinformation that they’d rather reject accurate information than risk spreading a falsehood.

Think about what this means in practice.

If you ask an AI assistant, “Did the Berlin Wall fall in 1989?” and it responds with uncertainty or outright denial because it’s been over-tuned for safety, that’s not helpful. That’s a different kind of failure.

The models that scored less than 100% – Mixtral and Mistral-Small – weren’t worse. They were different. They rejected some real facts (false negatives) but never accepted fake ones (hallucinations). They drew the line in a different place on the safety-accuracy spectrum.

Confidence Calibration: Everyone’s Certain

Confidence Analysis

What struck me most wasn’t the accuracy – it was the confidence.

Every single model reported “high confidence” on 95-100% of their responses. When they were right, they were certain. When they were wrong (the few false negatives), they were still certain.

This is the real issue with confidence scores in current LLMs. They’re not probabilistic assessments. They’re vibes.

A model that says “I’m highly confident the Berlin Wall fell in 1989” and another that says “I’m highly confident it didn’t” are both expressing the same level of certainty despite contradicting each other. The confidence score doesn’t tell you about uncertainty – it tells you the model finished its internal reasoning process.

What This Actually Tells Us

I went into this experiment expecting to write about hallucination rates and confidence miscalibration. Instead, I found something more nuanced: modern LLMs have overcorrected.

The training data and RLHF (Reinforcement Learning from Human Feedback) that went into these models has created systems that:

  1. Err heavily on the side of caution – Better to say “I don’t know” than risk spreading misinformation
  2. Treat all uncertainty the same – A 60% confidence and a 95% confidence both get reported as “high”
  3. Optimize for not being wrong over being helpful

This isn’t necessarily bad. In many applications – medical advice, legal information, financial guidance – you want conservative models. But it creates a different kind of deployment challenge.

The Pendulum Problem

Hallucination vs False Negative Rates

We’ve swung from early LLMs that would hallucinate confidently to current models that reject true information to avoid any possibility of error. Neither extreme is ideal.

The chart above shows how models trade off different failure modes. Perfect scores on “anti-hallucination” (none of them accepted fake facts) but varied scores on “anti-false rejection” (some rejected real facts).

What we actually need is something in the middle: models that can express genuine uncertainty, distinguish between “probably false” and “definitely false,” and acknowledge when they simply don’t know.

The Real-World Impact

Here’s where this gets practical.

If you’re building:

  • A fact-checking system: Current models are probably too conservative. They’ll flag true statements as suspicious.
  • A customer service chatbot: You want conservative. Better to escalate to a human than give wrong information.
  • A research assistant: You need calibrated uncertainty. “This claim appears in 3 sources but contradicts 2 others” is more useful than “high confidence: false.”

The failure mode matters as much as the accuracy rate.

What I Got Wrong

My benchmark used obviously false facts. “The Titanic successfully completed its maiden voyage” is not subtle misinformation. It’s the kind of statement that gets flagged immediately.

In retrospect, I was testing whether models would accept absurdly false claims, not whether they’d get tricked by plausible misinformation. That’s a different experiment entirely.

To actually test hallucination susceptibility, I’d need:

  • Subtly wrong facts that sound plausible
  • Mixed information where some details are right and others wrong
  • Statements that require nuanced understanding, not just fact recall

But that’s also what makes this finding interesting. Even with softball fake facts, the models didn’t just reject them – they were defensive across the board.

The Technical Debt of Safety

Here’s what I think is happening under the hood:

During RLHF training, models get penalized heavily for hallucinations. The training signal is strong: never make up facts. The penalty for false positives (accepting fake information) is much higher than the penalty for false negatives (rejecting true information).

This makes sense from a product safety perspective. A model that occasionally refuses to answer is annoying. A model that confidently spreads misinformation is dangerous.

But it creates a form of technical debt. We’ve optimized for one failure mode (hallucination) so aggressively that we’ve introduced another (excessive caution). And because we can’t perfectly measure “appropriate uncertainty,” the models just default to maximum caution.

Where This Leaves Us

Confidence Analysis

Looking at the full outcome distribution, 98.8% of responses were correct. That’s impressive. But the 1.2% that were wrong were all the same type of wrong: false negatives.

This tells me something important about the current state of LLMs: we’ve solved the hallucination problem by making models reluctant to commit.

That’s progress. But it’s not the end goal.

The next frontier isn’t getting models to stop hallucinating – they’ve basically done that on straightforward factual questions. It’s getting them to:

  1. Express calibrated uncertainty
  2. Distinguish between “definitely false” and “uncertain”
  3. Provide nuanced answers instead of binary true/false judgments
  4. Know what they don’t know

Limitations and Future Work

This was a small-scale experiment with limitations:

  • Dataset size: Only 40 statements
  • Fact complexity: Simple historical facts, not complex or nuanced claims
  • Single API provider: All models tested through Together AI
  • Binary evaluation: True/false doesn’t capture nuanced responses

A more robust version would:

  • Test with subtle misinformation, not obvious falsehoods
  • Include complex claims requiring reasoning, not just fact recall
  • Evaluate explanation quality, not just verdict accuracy
  • Test the same models across different providers to check for API-level filtering

The Bottom Line

I set out to measure how often AI models hallucinate. I discovered they’ve become so afraid of hallucinating that they’re starting to reject reality.

That’s not a hallucination problem. It’s an overcorrection problem.

And honestly? I’m not sure which one is harder to fix.

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.
By signing up, you agree to our Terms of Use and acknowledge the data practices in our Privacy Policy. You may unsubscribe at any time.
Share This Article
Facebook Twitter Email Print
Share
What do you think?
Love0
Sad0
Happy0
Sleepy0
Angry0
Dead0
Wink0
Previous Article GhostBSD To Use XLibre Server, MATE vs. Gershwin Desktop Decision In Future GhostBSD To Use XLibre Server, MATE vs. Gershwin Desktop Decision In Future
Next Article VPN Providers Ordered to Block Pirated Soccer Streams. There’s Just One Problem VPN Providers Ordered to Block Pirated Soccer Streams. There’s Just One Problem
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Stay Connected

248.1k Like
69.1k Follow
134k Pin
54.3k Follow

Latest News

Tencent and 2K team up to launch mobile game NBA 2K All Star on March 25 · TechNode
Tencent and 2K team up to launch mobile game NBA 2K All Star on March 25 · TechNode
Computing
Today's NYT Mini Crossword Answers for Feb. 18 – CNET
Today's NYT Mini Crossword Answers for Feb. 18 – CNET
News
Top Influencer Couples to Partner With This Valentine’s Day
Top Influencer Couples to Partner With This Valentine’s Day
Computing
The Galaxy S26 pre-order deal you’ve been waiting for is official in many places, but not the US
The Galaxy S26 pre-order deal you’ve been waiting for is official in many places, but not the US
News

You Might also Like

Tencent and 2K team up to launch mobile game NBA 2K All Star on March 25 · TechNode
Computing

Tencent and 2K team up to launch mobile game NBA 2K All Star on March 25 · TechNode

1 Min Read
Top Influencer Couples to Partner With This Valentine’s Day
Computing

Top Influencer Couples to Partner With This Valentine’s Day

1 Min Read
Public Masterpiece Announces PMT Chain, A Layer 1 Built for the Real-World Asset Economy | HackerNoon
Computing

Public Masterpiece Announces PMT Chain, A Layer 1 Built for the Real-World Asset Economy | HackerNoon

5 Min Read
CISA Flags Four Security Flaws Under Active Exploitation in Latest KEV Update
Computing

CISA Flags Four Security Flaws Under Active Exploitation in Latest KEV Update

3 Min Read
//

World of Software is your one-stop website for the latest tech news and updates, follow us now to get the news that matters to you.

Quick Link

  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact

Topics

  • Computing
  • Software
  • Press Release
  • Trending

Sign Up for Our Newsletter

Subscribe to our newsletter to get our newest articles instantly!

World of SoftwareWorld of Software
Follow US
Copyright © All Rights Reserved. World of Software.
Welcome Back!

Sign in to your account

Lost your password?