When chatbots say “I don’t know”: A statistical trick improves reliability

News Room · Published 27 April 2026 · Last updated 27 April 2026, 12:24 AM

Researchers at MIT in Cambridge, Massachusetts, have developed a method that makes a language model's self-assessment mathematically measurable, and therefore correctable. The approach, called Reinforcement Learning with Calibration Rewards (RLCR for short), directly targets a root cause of hallucinations in reasoning models.

The systematic weakness of guessing

Previous training approaches for so-called reasoning models, such as those developed by OpenAI, have a systematic weakness: the models are traditionally trained to find correct answers without ever assessing how confident they are in their own judgment.

A team led by graduate students Mehul Damani and Isha Puri at MIT CSAIL finds that such simple reward systems encourage guessing: a model receives the same reward whether it reaches an answer through logical deduction or simply gets lucky.

The Brier score as a corrective

To correct this behavior, the team adds the so-called Brier score as an additional component of the reward function. This statistic penalizes the gap between the certainty the model states and the actual correctness of its answer.
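The article does not spell the statistic out, but the textbook definition is simple. For a single answer with stated confidence $q \in [0,1]$ and correctness indicator $o \in \{0,1\}$, the Brier score is

$$\mathrm{BS} = (q - o)^2,$$

so a model that claims 95 percent certainty on a wrong answer pays $(0.95 - 0)^2 \approx 0.90$, while a hedged wrong answer at 10 percent confidence pays only $0.01$.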

During training, the model not only learns to solve a problem but must also supply a numerical assessment of its own uncertainty. An answer given with high confidence that turns out to be wrong incurs a significant point deduction.
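The exact reward formula is not given in the article, but the mechanism it describes can be sketched as a binary correctness term minus the Brier penalty. A minimal illustration in Python (the equal weighting of the two terms is an assumption, not necessarily the paper's choice):

```python
def rlcr_reward(is_correct: bool, confidence: float) -> float:
    """Sketch of a calibration-aware reward in the spirit of RLCR.

    Combines a binary correctness term with a Brier penalty on the
    model's stated confidence. Equal weighting is assumed here purely
    for illustration.
    """
    outcome = 1.0 if is_correct else 0.0
    return outcome - (confidence - outcome) ** 2

print(rlcr_reward(True, 0.95))   #  0.9975 (confident and correct)
print(rlcr_reward(False, 0.95))  # -0.9025 (confidently wrong: punished hardest)
print(rlcr_reward(False, 0.10))  # -0.0100 (hedged wrong answer loses little)
```

Under a plain correctness-only reward, these three answers would score 1, 0, and 0, which is exactly the indifference to confidence the researchers criticize.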


Significant reduction in the error rate

The results, published on the preprint server arXiv, show that the method reduces calibration error by up to 90 percent. Particularly noteworthy: the models' overall accuracy on the tasks does not suffer from the new honesty.
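The article does not name the metric behind the 90 percent figure. A common way to quantify calibration error is the expected calibration error (ECE), which bins answers by stated confidence and compares each bin's average confidence with its actual accuracy; a minimal sketch, offered as one plausible reading rather than the paper's exact metric:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin answers by stated confidence, then average the
    |bin accuracy - bin confidence| gaps, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Five answers all stated at 90% confidence, but only two correct -> ECE = 0.5
print(expected_calibration_error([0.9] * 5, [1, 1, 0, 0, 0]))
```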

The researchers were also able to show that conventional training methods actively worsen a model's self-assessment even as the model becomes more capable. “What is striking is that ordinary reinforcement learning not only fails to improve calibration but actively damages it,” explains Isha Puri in an MIT press release.

Added value through reflection

By integrating uncertainty analysis directly into the AI’s reasoning process, the method produces information that goes well beyond a decorative add-on. According to the study, smaller models benefit particularly strongly when they are forced to reflect explicitly on their own ignorance.

However, a critical look at practical suitability remains necessary: the method slightly increases the computational cost of training, and even impressive calibration results do not mean the model stops making errors altogether.

A signal for human choice

Better calibration simply provides a more reliable signal for the moment when users should seek a second opinion. Especially in sensitive areas such as medicine or finance, knowing what the model does not know could make the crucial difference.
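In practice, that signal can drive a simple deferral policy: show the answer when the calibrated confidence is high, and escalate otherwise. A toy sketch, in which the threshold and the wording are illustrative assumptions:

```python
def answer_or_defer(answer: str, confidence: float, threshold: float = 0.8) -> str:
    """Toy deferral policy on top of a calibrated model: `confidence` is the
    model's self-reported probability of being correct, and `threshold` is an
    application-specific cutoff (presumably far stricter in medicine or finance)."""
    if confidence >= threshold:
        return answer
    return f"I'm not sure (confidence {confidence:.0%}); please seek a second opinion."
```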


Previous attempts to make AI more trustworthy with post-hoc filters have often proved inadequate compared with corrections built into the model itself. MIT’s method instead starts at the foundation of the learning process, ensuring the integrity of the output from the outset.

Whether the technique is adopted broadly will also depend on development teams’ willingness to fold the additional complexity into their pipelines. In any case, this work lays the scientific groundwork for a more honest artificial intelligence.
