The White House’s recent executive orders on artificial intelligence and the launch of America’s AI Action Plan have made one thing clear: The United States intends to lead the world in AI.
To be sure, the policy direction is ambitious: Accelerate innovation, strengthen infrastructure and ensure fairness and safety. But as important as these goals are, rules and principles alone cannot deliver trust.
The next frontier in AI governance is not about writing more rules. It’s about making them work.
What policymakers need now is reliable, independent evidence of whether systems actually meet the standards they set. In other words, AI needs the same thing that financial markets and public health already depend on: reliable evaluation infrastructure.
Why rules aren’t enough
History shows us that regulation without verification rarely works. Imagine if Wall Street firms were allowed to audit their own books, or if pharmaceutical companies could approve their own drugs. The risks would be obvious and unacceptable. Yet, in AI today, much of the information policymakers see about model performance and safety comes straight from the companies developing those systems, leaving regulators dependent on the very firms they are meant to oversee.
Self-reporting, intentionally or not, creates structural blind spots. Developers have incentives to highlight strengths and minimize weaknesses, and even honest disclosures can leave out important context. The result is that policymakers may receive an incomplete or overly favorable picture of how systems behave. Because different companies disclose different things, selective reporting also undermines fair competition, rewarding opacity over openness.
A separate problem is that most evaluations are static. One-off benchmarks or demonstrations capture only a moment in time, often under narrow or idealized conditions. But AI systems evolve quickly, and their shortcomings — bias, drift or regressions — often surface only after deployment. Without continuous evaluation, regulators are left with snapshots that can’t keep pace with real-world change.
The question, then, is what credible oversight should demand.
What trust in AI looks like
Building trust in AI oversight doesn’t require reinventing the wheel. Other sectors already set the precedent: Financial markets rely on independent audits, medicine relies on clinical trials, and aviation relies on safety testing that continues long after certification. AI governance should be held to the same standard.
The first requirement is independence. Oversight must be based on information that does not come solely from the companies themselves: data that can be inspected, verified and trusted as neutral. Without that independence, even well-intentioned disclosures risk being selective or incomplete.
The second requirement is continuity. AI systems evolve quickly, and their performance often shifts once they are deployed in the wild. Benchmarks conducted at launch can’t capture how models change over time, or how they behave across different languages, domains and user needs. Effective oversight depends on visibility into those dynamics, not just snapshots.
Until governance reflects both independence and continuity, rules risk becoming symbolic: ambitious in principle, but unenforceable in practice.
What independent oversight delivers
Embedding neutral evaluation infrastructure into AI policy has three immediate benefits:
- Better evidence for regulators. Instead of relying on duplicative filings or unverified vendor reports, agencies can draw on independent, continuous evaluation to make faster, better-informed decisions.
- More confident adoption by industry. Enterprises and governments need assurance that AI systems are reliable before they deploy them at scale. Transparent evaluation provides that assurance, reducing the fear of hidden risks and accelerating responsible adoption.
- Public trust. People need to know that AI systems are tested against the challenges they face in the real world. Neutral evaluation grounded in diverse user and expert communities ensures that oversight reflects actual needs, not just technical claims.
The time for real governance is now
The executive orders issued this year — whether focused on removing barriers to innovation (EO 14179), accelerating data center infrastructure (EO 14318), or ensuring neutrality in federal AI (EO 14319) — all highlight the same underlying need: an independent referee.
The U.S. can’t afford to wait. If oversight doesn’t keep up, risks will grow faster than our ability to manage them.
By embedding independent evaluation into procurement, export licensing and critical infrastructure oversight, policymakers can close the gap between ambition and enforcement.
The bottom line
AI policy is at a crossroads. The U.S. has set bold goals, but without reliable evaluation, those goals risk becoming little more than rhetoric. Rules set the direction. Proof provides the trust.
If the U.S. wants to lead not just in AI innovation but in AI governance, it must treat independent evaluation as essential civic infrastructure. The opportunity is not to draft new regulations but to make existing ones enforceable, turning reasonable rules into practical criteria. Done right, evaluation strengthens innovation by ensuring that compliance does not become a bureaucratic burden, and it keeps both proprietary and open-source systems on a level playing field.
Just as financial audits underpin markets and safety tests underpin medicine, transparent evaluation must underpin AI. This is how America can innovate boldly, govern wisely and build the public confidence that this moment demands.
Ion Stoica is a professor in the EECS Department and the Xu Bao Chancellor Chair at the University of California at Berkeley, the director of Sky Computing Lab, co-founder and executive chairman of LMArena, Databricks and Anyscale, and co-founder of Conviva. He wrote this article for News.
