QCon London 2026: Running AI at the Edge – Running Real Workloads Directly in the Browser

By News Room | Published 23 March 2026 (last updated 12:54 PM)

At QCon London 2026, James Hall presented "Running AI at the Edge: Running Real Workloads Directly in the Browser", demonstrating how browser-native inference using tools like Transformers.js, WebLLM, and WebGPU can deliver practical AI workloads without sending data to third-party cloud providers. Hall is the founder and tech director at Parallax and the creator of jsPDF.

Hall opened by framing the downsides of server-side inference in concrete terms: sending prompts and user data to third parties creates privacy concerns, every request incurs network round trips that can make real-time experiences feel sluggish, and usage-based cloud inference costs rise with success rather than falling away.

He then explored the motivations for running AI locally in the browser, emphasizing privacy, reduced latency, and cost efficiency. He argued that local processing provides “architectural privacy,” where the design itself makes data upload impossible rather than relying on policy promises. For real-time audio and video applications, eliminating round-trip delays to data centres proves critical, while cloud cost scaling means successful products become increasingly expensive to operate.

The presentation covered several categories of local AI technology. Bring-your-own-model approaches using Transformers.js from Hugging Face, WebLLM, and ONNX Runtime allow developers to quantize and cache models directly in the browser. Hugging Face recently released Transformers.js v4, which delivers a 4x speedup for BERT models via the WebGPU runtime and supports 20-billion parameter models at 60 tokens per second. Chrome’s built-in Prompt API with Gemini Nano offers inference with no model download required, alongside translator, summarizer, and language detector capabilities. Hardware acceleration through WebGPU is now well supported across Safari, Firefox, and Chromium browsers, while the WebNN API, currently a W3C Candidate Recommendation, promises access to specialised NPUs on mobile devices.
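The bring-your-own-model pattern can be sketched with the Transformers.js pipeline API. This is a minimal illustration, not code from the talk: the model id and the `device`/`dtype` options below are example choices for a small quantised classifier.

```typescript
import { pipeline } from "@huggingface/transformers";

// Download (once) and cache a small quantised model, then run all inference
// locally. "webgpu" uses hardware acceleration where the browser supports it.
const classify = await pipeline(
  "sentiment-analysis",
  "Xenova/distilbert-base-uncased-finetuned-sst-2-english",
  { device: "webgpu", dtype: "q8" } // quantised weights shrink the download
);

// The input text never leaves the machine.
const result = await classify("Browser-native inference keeps my data local.");
```

After the first visit the model is served from the browser cache, so subsequent loads avoid the download entirely.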

Hall demonstrated several practical use cases, including near-human-quality transcription running Whisper models locally, with access to probability scores for hallucination detection. For data analytics, he combined DuckDB, running analytical SQL workloads in-browser via WebAssembly, with a local LLM generating queries, enabling data exploration without sending information to servers.
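The DuckDB half of that analytics pattern can be sketched with DuckDB-Wasm's standard instantiation flow; the query string here is hard-coded for illustration, whereas in the pattern Hall described it would be generated by the local LLM.

```typescript
import * as duckdb from "@duckdb/duckdb-wasm";

// Resolve a browser-appropriate WASM bundle and start DuckDB inside a
// Web Worker, so analytical SQL runs off the main thread.
const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
const worker = new Worker(bundle.mainWorker!);
const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

const conn = await db.connect();
// Hypothetical stand-in for LLM-generated SQL; all data stays in the browser.
const result = await conn.query("SELECT 42 AS answer");
await conn.close();
```

Because both the database engine and the model run client-side, the raw data never crosses the network.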

The talk also addressed design principles that Hall considers essential for browser AI applications. He cautioned against defaulting to chatbot interfaces, noting user fatigue, and instead recommended identifying what the model excels at and presenting structured suggestions. He advocated hiding model loading time using perceived performance techniques and only reaching for AI when problems are genuinely difficult and fuzzy.

On the topic of testing and evaluation, Hall emphasised that most AI project work lies in measurement and validation rather than model integration. He recommended using stronger frontier models to evaluate weaker local models, and building visual evaluation suites that domain experts can review rather than relying solely on engineering tools. Model optimisation through quantisation can reduce 7GB models to 2GB with modest quality loss.
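The "stronger model evaluates weaker model" idea can be sketched as a tiny scoring loop. `localModel` and `frontierJudge` are hypothetical async wrappers around your local and cloud models; neither name nor the prompt format comes from the talk.

```typescript
type EvalCase = { prompt: string; reference: string };

// Run each case through the local model, then ask the stronger model to
// grade the answer against the reference; return the mean score.
async function evaluate(
  cases: EvalCase[],
  localModel: (prompt: string) => Promise<string>,
  frontierJudge: (prompt: string) => Promise<number>
): Promise<number> {
  let total = 0;
  for (const c of cases) {
    const answer = await localModel(c.prompt);
    total += await frontierJudge(
      `Reference: ${c.reference}\nAnswer: ${answer}\nScore from 0 to 1:`
    );
  }
  return total / cases.length; // mean score across the suite
}
```

Rendering these per-case scores and answers in a simple web page, rather than a test runner, is one way to build the expert-reviewable evaluation suites Hall recommends.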

Hall closed with a practical rule of thumb for choosing in-browser inference: use it when privacy, latency, offline capability, or cost predictability matter enough to outweigh the constraints of running smaller models on client hardware, and benchmark that trade-off against real workloads rather than assuming a server call is always necessary.
