Language models tend to invent answers instead of admitting what they don’t know. Any regular user of this technology knows it, and it is one of the reasons why trust in handing everything over to AI has not gone further. In technical jargon these are called hallucinations, and it is OpenAI itself that acknowledges them, as reported by The Register.
By itself, that would not be news. What the sector’s flagship company is accepting is not the existence of hallucinations, a problem it has been grappling with since the explosion of generative AI, but why they happen, although that was not exactly a secret either. And it is not a minor flaw: it is a direct consequence of how these artificial intelligences are trained and evaluated.
In short, OpenAI has acknowledged that its AI is programmed to please the user: it prioritizes giving an answer, even if that means inventing things, over admitting its ignorance. The “finding” appears in a recently published academic paper (PDF) which states that “most conventional evaluations reward hallucinatory behavior.”
Evaluation systems, modeled on multiple-choice exams, penalize uncertainty: they reward a model that ventures a wrong answer more than one that admits it does not know the solution. The researchers illustrate this with a specific example: asked for the birth date of one of the study’s authors, an OpenAI model returned three different answers, all of them false. Instead of saying “I don’t know”, the engine is designed to take a gamble. “Over thousands of test questions, the guessing model ends up looking better on scoreboards than a careful model that admits uncertainty,” the authors point out.
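A minimal sketch of the arithmetic behind that claim, using hypothetical numbers: under accuracy-only grading, a wrong answer and an “I don’t know” both score zero, so guessing can never do worse than abstaining.

```python
# Hypothetical illustration of accuracy-only (binary) benchmark scoring:
# a correct answer scores 1, while a wrong answer and an abstention both score 0.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score per question under right=1, wrong/blank=0 grading."""
    return 0.0 if abstain else p_correct

p = 0.2  # assume the model's guess is right only 20% of the time
print(expected_score(p, abstain=False))  # 0.2 -> guessing wins on the scoreboard
print(expected_score(p, abstain=True))   # 0.0 -> admitting uncertainty is never rewarded
```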
Hallucinations are, therefore, the result of a structural bias in the design. The tendency toward this failure does not only arise in the pretraining phase, where models absorb data of uneven quality, but is reinforced in the later fine-tuning phase. At that stage, benchmarks usually replicate standardized exams that punish admitting ignorance.
The result of this way of operating is a perverse incentive: a model gets a better score by inventing a “plausible” answer than by abstaining. It is a reward for initiative and creativity that carries serious problems with it, because the guidelines these services are trained on can override even the user’s explicit instructions. That is an obvious risk for trust in these tools, and it is not limited to OpenAI.
But at least it is now being acknowledged. The paper’s conclusion points to modifying both the evaluation criteria and the training itself so that the appropriate response is rewarded, even if it is an “I don’t know.” OpenAI says it is already applying changes in GPT-5 (this was one of the new version’s selling points) to increase how often the system chooses that answer, although it admits that hallucinations have not disappeared.
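A hedged sketch of the kind of scoring change the paper argues for: if wrong answers carry a penalty instead of simply scoring zero, answering only pays off when the model is confident enough, and “I don’t know” becomes the rational choice otherwise. The penalty value below is illustrative, not OpenAI’s exact formula.

```python
# Illustrative scoring rule: correct = +1, wrong = -penalty, abstain = 0.
# With a penalty for errors, guessing at low confidence no longer wins.

def expected_score(p_correct: float, abstain: bool, wrong_penalty: float = 1.0) -> float:
    """Expected score when correct=+1, wrong=-wrong_penalty, abstain=0."""
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# With a penalty of 1, answering only pays off above 50% confidence:
for p in (0.2, 0.5, 0.8):
    print(p, expected_score(p, abstain=False), expected_score(p, abstain=True))
# 0.2 -> -0.6 vs 0.0  (abstaining is better)
# 0.8 -> +0.6 vs 0.0  (answering is better)
```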