Artificial intelligence startup aiOla says it’s breaking new ground in its quest to develop machines that can understand human speech as well as people can.
Today it has announced a new “Speech Intelligence Gateway” that’s able to facilitate more reliable speech recognition by dynamically routing each audio request to the model that’s best suited to understand it.
The startup first came to attention last year when it debuted Drax, a new kind of voice AI model that uses a parallel flow-matching training technique to enhance voice recognition. It reconstructs human speech from a noisy representation, processing the entire sequence of spoken words at once rather than predicting one token at a time as sequential methods do. The method exposes the model to realistic, acoustically plausible errors, which improves its ability to understand accented speech and cope with background noise.
The company is now going further with its speech intelligence gateway, dubbed “QUASAR,” which stands for “quality-weighted unsupervised ASR assessment and ranking.” According to aiOla, QUASAR will identify the speaker’s characteristics, such as their accent, as well as the audio conditions and domain context, and send their audio signal to the most suitable automatic speech recognition system so it can be transcribed with greater accuracy.
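aiOla has not published how QUASAR makes its routing decisions, so the following is only a minimal sketch of the general idea, assuming each candidate ASR comes with a function that predicts how well it will handle a given request. The names `AudioContext`, `route`, `us_clean_model` and `noise_robust_model`, along with every score in them, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AudioContext:
    """Hypothetical description of one audio request."""
    accent: str         # e.g. "en-US" or "en-GB"
    noise_level: float  # 0.0 (clean) to 1.0 (very noisy)
    domain: str         # e.g. "support" or "finance"


# A quality estimator predicts how well one ASR will do on a request.
QualityEstimator = Callable[[AudioContext], float]


def route(context: AudioContext, estimators: Dict[str, QualityEstimator]) -> str:
    """Return the name of the ASR with the highest predicted quality."""
    if not estimators:
        raise ValueError("at least one candidate ASR is required")
    return max(estimators, key=lambda name: estimators[name](context))


# Two made-up candidates with made-up scoring rules (assumptions, not real data).
def us_clean_model(ctx: AudioContext) -> float:
    base = 0.9 if ctx.accent == "en-US" else 0.6
    return base - 0.5 * ctx.noise_level  # degrades quickly with noise


def noise_robust_model(ctx: AudioContext) -> float:
    return 0.75 - 0.1 * ctx.noise_level  # lower ceiling, but steadier in noise


ESTIMATORS: Dict[str, QualityEstimator] = {
    "us_clean_model": us_clean_model,
    "noise_robust_model": noise_robust_model,
}

if __name__ == "__main__":
    quiet_us_call = AudioContext(accent="en-US", noise_level=0.0, domain="support")
    noisy_uk_call = AudioContext(accent="en-GB", noise_level=0.8, domain="support")
    print(route(quiet_us_call, ESTIMATORS))
    print(route(noisy_uk_call, ESTIMATORS))
```

In this toy setup, the routing choice changes from one request to the next as the accent and noise level change, which is the behavior the company describes. A production system would presumably estimate quality from the audio itself rather than from hand-written rules.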
It’s a powerful capability, because the voice AI model market has become highly fragmented, with hundreds of competing ASR systems that have all been trained differently. OpenAI’s Whisper, Amazon’s Transcribe, Alibaba’s Qwen2 and Deepgram constantly try to outdo each other with successive new releases, striving to improve their accuracy across different accents, noise conditions and contexts. Yet most businesses don’t take advantage of this rich variety of options. Instead of using the best ASR for each scenario, they simply adopt the one that performs best in benchmarks as a one-size-fits-all approach.
Co-founder and President Amir Haramaty said most enterprises simply accept the blind spots of whatever ASR engine they choose. But that’s a bad idea, he argues. For instance, though their chosen ASR might be great at interpreting speakers with a U.S. accent, it may fall short when trying to understand British English speakers. Alternatively, some ASRs work well in perfect conditions, but throw in background noise such as a busy airport or a poor-quality connection, and they’re no longer able to make sense of what people are saying.
That unreliability just doesn’t cut it in many situations. For instance, a customer support agent needs to understand the customer’s problem so it doesn’t mistakenly give them the runaround.
“QUASAR treats speech recognition as a dynamic problem, where the best option can shift from one interaction to the next based on real conditions, not averages,” Haramaty said. “This is a major leap for the industry and potentially a massive disruption for how ASRs are being consumed.”
The startup said it has carried out extensive internal evaluations across diverse benchmarks spanning clean read speech, varied accents, professional talks, institutional audio and domain-heavy financial content. During those tests, QUASAR was able to select the best-performing ASR on 88.8% of calls, enabling more accurate automated conversations between AI agents and humans.
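The article does not say how aiOla defines "best-performing," so the sketch below assumes the common convention of word error rate (WER), where lower is better, and counts a routing choice as correct when the chosen ASR matches the lowest WER on that call. The function names and the sample values in the usage section are hypothetical, not aiOla's data.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def selection_accuracy(picks: List[str], wers: List[Dict[str, float]]) -> float:
    """Fraction of calls where the picked ASR had the lowest WER (ties count)."""
    if not picks or len(picks) != len(wers):
        raise ValueError("picks and wers must be non-empty and the same length")
    hits = sum(
        1 for pick, per_model in zip(picks, wers)
        if per_model[pick] == min(per_model.values())
    )
    return hits / len(picks)


if __name__ == "__main__":
    # Illustrative numbers only.
    per_call_wer = [
        {"model_a": 0.10, "model_b": 0.20},
        {"model_a": 0.10, "model_b": 0.30},
        {"model_a": 0.05, "model_b": 0.05},
    ]
    router_picks = ["model_a", "model_b", "model_a"]
    print(f"{selection_accuracy(router_picks, per_call_wer):.1%}")
```

Under that assumption, a figure such as 88.8% would mean the router's pick matched the lowest-error ASR on roughly nine calls out of 10.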
Haramaty says QUASAR is an important development because voice is fast becoming the default way for humans to interact with AI models. Organizations simply can’t tolerate faulty speech recognition systems, but there’s no single, all-powerful ASR that’s able to understand voices perfectly in every scenario.
“ASRs must function as living infrastructure, and QUASAR brings that vision to life by operationalizing speech recognition at scale, improving consistency across diverse populations and environments,” he said. “The result is a platform that can transform the entire voice ecosystem, from individual developers building captioning tools to global contact centers processing billions of minutes of audio each year.”
Image: News/Gemini
