OpenAI Group PBC is reportedly developing a new artificial intelligence model optimized for audio generation tasks.
The Information reported today, citing sources, that the model will launch by the end of March. According to the publication, it’s expected to produce more natural-sounding speech than OpenAI’s current models. The model is also expected to be better at handling real-time back-and-forth interactions with users.
OpenAI will reportedly base the model on a new architecture. The company’s current flagship real-time audio model, GPT-realtime, uses the ubiquitous transformer architecture. It’s unclear whether the company will pivot to an entirely different algorithm design or simply adopt a new transformer implementation.
Some transformer-based audio models process speech directly. Others, such as the Whisper algorithm that OpenAI released in 2022, first turn audio into spectrograms, which chart how a signal’s frequency content changes over time, and then process those. Whisper and the company’s newer audio models are all available in multiple editions with varying output quality. It’s possible OpenAI will also offer multiple versions of the algorithm it’s expected to release this quarter.
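For readers unfamiliar with spectrograms, the idea can be illustrated with a minimal Python sketch. It is not Whisper's actual preprocessing, which uses a log-Mel variant; the function name `spectrogram`, the frame and hop sizes, and the test tone are all assumptions chosen for illustration. The code slices a signal into overlapping frames and measures the strength of each frequency in every frame.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    # A Hann window tapers each frame to reduce spectral leakage at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive discrete Fourier transform; keep the non-negative frequency bins.
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical input: a 1 kHz sine tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

spec = spectrogram(tone)
# Find the strongest frequency bin in the first frame and convert it to hertz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

In this toy case the strongest bin in each frame corresponds to the 1 kHz tone. A production pipeline would typically use a fast Fourier transform from a numerical library rather than the naive loop shown here.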
The company has reportedly combined several engineering, product and research teams to support its audio model push. The initiative is said to be led by Kundan Kumar, a former researcher at venture-backed AI provider Character.AI Inc. Many of the startup’s other staffers joined Google LLC in late 2024 as part of a $2.7 billion reverse acquihire.
It’s possible OpenAI’s upcoming model will not focus solely on speech generation use cases. The nascent AI-generated music segment is currently experiencing rapid growth: The Wall Street Journal recently reported that one market player, startup Suno Inc., is generating more than $200 million in annual revenue. Joining the fray may help OpenAI boost its consumer business.
The upcoming audio model is part of a broader effort by the company to enter the consumer electronics market. According to The Information, OpenAI plans to launch an “audio-first personal device” in about a year. It’s believed the company could eventually introduce an entire portfolio of devices that includes a smart speaker and smart glasses.
Last May, OpenAI acquired product design startup io Products Inc. to support its consumer hardware push. The transaction valued the Jony Ive-founded startup at $6.5 billion. In October, the Financial Times reported that Ive is working on a smartphone-sized device that is designed to sit on a desk or table.
OpenAI may seek to develop a lightweight, on-device audio model to support its move into consumer hardware. Processing prompts locally can be more cost-efficient than sending them to the cloud because it avoids data center inference costs. Google has taken a similar approach with its Pixel smartphone series, which uses an on-device model called Gemini Nano to power some AI features.
Image: OpenAI
