Large language models provide unreliable answers about public services, Open Data Institute finds | Computer Weekly

News Room | Published 12 February 2026 | Last updated 12 February 2026, 11:08 AM

Popular large language models (LLMs) are unable to provide reliable information about key public services such as health, taxes and benefits, the Open Data Institute (ODI) has found.

Drawing on more than 22,000 LLM prompts designed to reflect the kinds of questions people ask artificial intelligence (AI)-powered chatbots, such as “How do I apply for universal credit?”, the data raises concerns about whether chatbots can be trusted to give accurate information about government services.

The publication of the research follows the UK government’s announcement of partnerships with Meta and Anthropic at the end of January 2026 to develop AI-powered assistants for navigating public services.

“If language models are to be used safely in citizen-facing services, we need to understand where the technology can be trusted and where it cannot,” said Elena Simperl, the ODI’s director of research.

Responses from models – including Anthropic’s Claude-4.5-Haiku, Google’s Gemini-3-Flash and OpenAI’s ChatGPT-4o – were compared directly with official government sources. 

The results showed many correct answers, but also a significant variation in quality, particularly for specialised or less-common queries.
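The evaluation described above can be pictured as a simple harness that scores each model answer against the official government text and flags low-agreement responses for review. The sketch below is illustrative only: the prompts, answers and the token-overlap scoring rule are invented for this example and are not the ODI's actual method or data.

```python
# Illustrative sketch of an answer-vs-reference evaluation harness.
# All prompts, answers and the scoring rule are hypothetical.

def token_overlap(answer: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the model answer."""
    ref = set(reference.lower().split())
    ans = set(answer.lower().split())
    return len(ref & ans) / len(ref) if ref else 0.0

def flag_unreliable(results, threshold=0.5):
    """Return prompts whose answers overlap too little with official guidance."""
    return [prompt for prompt, answer, reference in results
            if token_overlap(answer, reference) < threshold]

# Hypothetical (prompt, model answer, official reference) triples
results = [
    ("How do I apply for universal credit?",
     "You can apply for universal credit online through the official website",
     "Apply for universal credit online through the official website"),
    ("Do I need a court order to amend a birth certificate?",
     "Yes a court order is always essential",
     "You do not normally need a court order to add a parent"),
]
flagged = flag_unreliable(results)
```

A real study would use far more robust scoring (semantic similarity, human review) than bag-of-words overlap, but the structure — prompt, model response, authoritative reference, threshold — is the same.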

They also showed that chatbots rarely admitted when they did not know the answer to a question, attempting to answer every query even when their responses were incomplete or wrong.

Burying key facts

Chatbots also often provided lengthy responses that buried key facts or extended beyond the information available on government websites, increasing the risk of inaccuracy.

Meta’s Llama 3.1 8B incorrectly stated that a court order is essential to add an ex-partner’s name to a child’s birth certificate. If followed, this advice would lead to unnecessary stress and financial cost.

ChatGPT-OSS-20B incorrectly advised that a person caring for a child whose parents have died is only eligible for Guardian’s Allowance if they are the parent of a child who has died.

It also incorrectly stated that the applicant was ineligible if they received other benefits for the child. 

Simperl said that for citizens, the research highlights the importance of AI literacy, while for those designing public services, “it suggests caution in rushing towards large or expensive models, which carry a risk of vendor lock-in, given how quickly the technology is developing. We also need more independent benchmarks, more public testing, and more research into how to make these systems produce precise and reliable answers.”

The second International AI safety report, published on 3 February, made similar findings regarding the reliability of AI-powered systems, noting that while there have been improvements in recalling factual information since the 2025 safety report, “even leading models continue to give confident but incorrect answers at significant rates”.

Following incorrect advice

It also highlighted users’ propensity to follow incorrect advice from automated systems in general, including chatbots, “because they overlook cues signalling errors or because they perceive the automation system as superior to their own judgement”.

The ODI’s research also challenges the idea that larger, more resource-intensive models are always a better fit for the public sector, with smaller models delivering comparable results at a lower cost than large, closed-source models such as ChatGPT in many cases.

Simperl warned that governments should avoid locking themselves into long-term contracts when models only temporarily outperform one another on price or benchmarks.

Commenting on the ODI’s research during a launch event, Andrew Dudfield, head of AI at Full Fact, highlighted that because the government’s position is pro-innovation, regulation is currently framed around principles rather than detailed rules.

“The UK may be adopting AI faster than it is learning how to use it, particularly when it comes to accountability,” he said.

Trustworthiness 

Dudfield noted that what makes this work compelling is that it focuses on real user needs, but that trustworthiness needs to be evaluated from the perspective of the person relying on the information, not from the perspective of demonstrating technical capability.

“The real risk is not only hallucination, but the extent to which people trust plausible-sounding responses,” he said.

Asked at the same event if the government should be building its own systems or relying on commercial tools, Richard Pope, researcher at the Bennett School of Public Policy, said the government needs “to be cautious about dependency and sovereignty”.

“AI projects should start small, grow gradually and share what they are learning,” he said, adding that public sector projects should prioritise learning and openness rather than rapid expansion.

Simperl highlighted that AI creates the potential to tailor information for different languages or levels of understanding, but that those opportunities “need to be shaped rather than left to develop without guidance”.

With new AI models launching every week, a January 2026 Gartner study found that the increasingly large volume of unverified and low-quality data generated by AI systems was a clear and present threat to the reliability of LLMs.

Large language models are trained on data scraped from the web, books, research papers and code repositories. Many of these sources already contain AI-generated data, and at the current rate of expansion they may all become populated with it.

Highlighting how future LLMs will be trained more and more with outputs from current ones as the volume of AI-generated data grows, Gartner said there is a risk of models collapsing entirely under the accumulated weight of their own hallucinations and inaccurate realities. 

Gartner managing vice-president Wan Fui Chan said that organisations could no longer implicitly trust data, or assume it was even generated by a human.

Chan added that as AI-generated data becomes more prevalent, regulatory requirements for verifying “AI-free” data will intensify in many regions.
