How zero-cost, authoritative-sounding text is breaking institutional accountability
On January 16, the chief constable of West Midlands Police in the United Kingdom stepped down after admitting that an AI tool had been used to help draft an official safety assessment that cited a football match that never took place. The report was used to justify banning supporters of Maccabi Tel Aviv from attending a Europa League match against Aston Villa in November 2025. Embedded in the assessment was a reference to a fixture between Maccabi Tel Aviv and West Ham that did not exist.
Under questioning, the chief constable acknowledged that part of the document had been drafted using Microsoft Copilot. Political attention followed fast: the home secretary said she no longer had confidence in his leadership, and the region’s police and crime commissioner announced a public accountability hearing. What initially looked like a technical error quickly escalated into a leadership crisis. How could an official-sounding, finished-looking document enter a high-stakes decision process without clear verification?
Much of the public debate has focused on bias, judgment, and individual responsibility, but the episode points to a structural problem that has been developing for a few years now.
From human diligence to synthetic diligence
For decades, modern institutions relied on an implicit assumption: if a document existed – especially one that looked formal, reasoned, structured – someone had spent time producing it. Reports, legal filings, safety assessments, and policy briefings were costly to generate, and even low-quality work required hours of human attention. That cost function created an informal but reliable signal of accountability.
But generative AI breaks that assumption.
Draft-quality pieces can now be produced in seconds, arguments and citations included, and look convincing even when the underlying claims are entirely fabricated or misinterpreted. The issue in this case is not that automated systems sometimes hallucinate; humans make mistakes, too. The issue is that institutions have no scalable way to distinguish between text produced by a person reasoning through a problem and text produced by a model optimised to mimic that reasoning.
As the cost of producing authoritative-sounding text approaches zero, institutions take in synthetic work faster than they can verify it. Safety assessments, legal briefs, student essays, internal reports, and consultancy deliverables all start to look finished long before anyone has actually done the work their appearance implies.
That fluency substitutes for human judgment, and verification becomes the new bottleneck.
Where failures are already visible
The West Midlands case is not an isolated one, and similar failures are already forcing adjustments across institutions: courts, universities, government bodies, professional services, and even journalism have all been caught out.
Courts
Judges in several jurisdictions have sanctioned lawyers for submitting filings containing AI-generated, non-existent case law. In the United States, the Mata v. Avianca case led to fines after attorneys relied on fabricated citations produced by ChatGPT and legal analysis judged “gibberish.” In response, some federal judges, like Judge Brantly Starr in Texas, have introduced standing orders requiring lawyers to certify that they have personally reviewed and verified any AI-assisted content. Courts in England and Wales have issued warnings that submitting fictitious AI-generated case law may amount to professional misconduct or contempt. These measures are not bans on AI tools; they are attempts to re-establish a clear line of human accountability in the court record.
Universities
Higher education institutions face a similar verification problem. Many have concluded that detecting AI use in take-home assignments is unreliable. One student described the difficulty to ABC News: “We’re looking at the same piece of legislation, we’re quoting the same cases, we’re looking at the same issues.” Another simply gave up: “I just decided to take the punishment because I was simply too scared to argue further.”
Some departments have reintroduced handwritten or supervised exams, expanded oral assessments, and shifted evaluation into in-person settings. Oxford’s Faculty of Medieval and Modern Languages reinstated closed-book, handwritten exams. The University of Sydney now treats unauthorised AI use as an academic integrity violation and has tightened assessment design accordingly. Regulators in Australia have explicitly advised universities to move away from assessment formats where authorship cannot be reliably established.
Public bodies
Governments are beginning to formalise disclosure and auditability requirements for algorithmic tools. In the UK, the Algorithmic Transparency Recording Standard (ATRS) requires public bodies to document and publish information about the automated systems they deploy. The government’s AI Playbook emphasises accountability, human oversight, and transparency in public-sector AI use. And, at the European level, the EU AI Act introduces obligations to disclose AI-generated or manipulated content in certain contexts. These frameworks are early attempts to ensure that official decisions can later be scrutinised – not just for what they say, but for how they were produced.
Private sector
The private sector is encountering the same problem, often with direct financial consequences. In Australia, Deloitte produced an AU$440,000 review for the Department of Employment and Workplace Relations that was later found to contain fabricated references and made-up court quotes. The firm acknowledged that parts of the report were drafted using a generative AI toolchain and refunded a portion of its fee after the errors were exposed. The model behaved as designed: it generated plausible text. What failed was the workflow. AI-assisted output passed through internal checks and into a government decision environment without adequate human verification.
Similar episodes have surfaced elsewhere. Media outlets including CNET and MSN have retracted AI-generated articles containing factual errors or fake bylines. Air Canada was held liable after its website chatbot gave a customer incorrect information about refund eligibility. In academic publishing, papers have been found to include fabricated citations linked to automated text generation.
Across these cases, we see a consistent pattern. Institutions assumed that efficiently produced text was a reliable signal of underlying work. But that assumption no longer holds.
Why institutions are adding friction
The emerging responses – manual attestation, in-person assessment, disclosure requirements, limits on undeclared AI use – can look like resistance to innovation. They are not. They are an attempt to restore a basic institutional function we still rely on: linking text to responsibility.
When verification capacity is scarce, adding friction is rational rather than Luddite. If an organisation can generate more documents than anyone can realistically check, it accumulates decisions that no one can truly own. Over time, that erodes internal trust and external legitimacy: colleagues stop believing that reports reflect real expertise, and courts, regulators, and the public lose confidence that official records rest on accountable judgment.
The West Midlands episode illustrates this dynamic clearly. The political fallout was not caused solely by an incorrect reference. It was caused by the revelation that a document carrying real consequences had entered an official process without anyone being able to say, with confidence, who – if anyone – had verified it.
The structural change coming
Generative AI does not simply make institutions faster. It changes what is scarce: production is now abundant, verification is not.
And that shift requires a redesign of institutional workflows. Provenance – how a document was produced, who edited it, who checked it, and who stands behind it – now needs to become explicit rather than assumed. Some categories of work will need clear boundaries where identifiable human authorship remains non-negotiable. Others may accommodate automation, but only within review limits that match available oversight.
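What an explicit provenance record might contain is still an open design question. As a purely illustrative sketch – not any existing standard, and with every field name here an assumption – it could be as simple as a structured record attached to each document before it is allowed to inform a decision:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProvenanceRecord:
    """Illustrative only: a minimal record of how a document was produced."""
    document_id: str
    drafted_by: str                                          # person or team behind the first draft
    ai_tools_used: list[str] = field(default_factory=list)   # any disclosed generative tools
    sections_ai_assisted: list[str] = field(default_factory=list)
    reviewed_by: list[str] = field(default_factory=list)     # people who checked the claims
    claims_verified_against_sources: bool = False
    approved_by: str | None = None                           # the person who stands behind the text
    approved_at: datetime | None = None

# Hypothetical usage: a draft exists, but no one has yet verified or approved it,
# so it should not enter a decision process.
record = ProvenanceRecord(
    document_id="safety-assessment-2025-11",
    drafted_by="analysis team",
    ai_tools_used=["generative drafting assistant"],
    sections_ai_assisted=["background"],
)
assert record.approved_by is None  # looks finished, but no one owns it yet
```

The point of such a record is not the specific fields, which would vary by institution, but that the gap between "a document exists" and "someone has verified and approved it" becomes visible rather than assumed.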
This is not a temporary adjustment. Synthetic diligence is cheap and convincing, and failures like the one in West Midlands are likely to recur. Each event will test public trust – in AI tools and, more importantly, in institutions and their safeguards.
The institutions that adapt will be those that accept a slower, more verification-centric mode of operation in high-stakes contexts. Those that don’t will continue to produce documents that look finished – until the moment they are forced to explain who actually did the work.
:::info
Lead image credit: AdobeStock |132785912
:::
