Since the beginning of the year, the goal of the big tech companies has been clear: to get us talking to artificial intelligence (AI). OpenAI, Microsoft, Google and Meta have added voice features to their assistants. But this seems to be just the beginning. The industry is advancing at a frantic pace, and the way we interact with these tools keeps evolving.
Say hello to voice agents. For months, Sam Altman's company has been betting on text-based agents with tools such as Operator and its Computer-Using Agent. However, OpenAI already has its next big move ready to stay ahead in the AI development race: pushing a new and powerful generation of voice agents.
New models on stage. OpenAI has announced new audio models that convert speech into text and vice versa. They are not available in ChatGPT but in the API, where developers can use them to build voice agents. What matters most? They aim to be much more accurate and to take customization to the next level.
The new OpenAI models, built on GPT-4o and GPT-4o mini, promise to improve on Whisper and on the company's previous text-to-speech tools, which will remain available through the API. But it is not just a matter of performance: the models can now also adjust their tone to sound, for example, "like an empathetic customer service agent."
Destination: call centers. OpenAI makes it clear where it is aiming with this launch. The company states that "for the first time, developers can tell the model not only what to say, but also how to say it, enabling more personalized experiences for use cases ranging from customer service to creative storytelling."
According to OpenAI, this technology will make it possible to create much richer "conversational experiences." Considering that ChatGPT, powered by GPT-3.5, arrived in November 2022, progress has clearly been dizzying. And everything suggests that these models will end up in call centers.
At first, the interactions may still be somewhat limited, but they should be well ahead of current voice systems. They will move away from traditional automated assistants and feel much more natural. Over time, the line between talking to a person and talking to an AI could become almost imperceptible.
Images | Charanjeet Dhiman | OpenAI