Tavus Inc., an artificial intelligence research startup developing real-time AI technology that mimics the experience of talking to another person, today announced the release of a family of AI models it says pushes that experience further.
The company said it’s building what it calls an operating system for human-AI interaction with its “Conversational Video Interface,” which allows AI to perceive, interpret and respond naturally, as if the user were talking to another person on a Zoom or FaceTime call. Tavus’ mission is to give AI the ability to understand facial expressions, tone and body language and interpret their meaning, while also being responsive enough with its own expressions and tone to convey meaning back.
“Humans are evolutionarily designed to communicate face to face. So, we want to teach machines how to be able to do that,” Chief Executive Hassaan Raza told News in an interview. “If we believe in the science fiction future where there are AI coworkers, friends and assistants, we need to build the interfaces for that to happen.”
Today’s release features three models: Phoenix-3, the first full-face AI rendering model capable of conveying subtle expressions; Raven-0, a groundbreaking AI perception model that can see and reason like a human; and Sparrow-0, a state-of-the-art turn-taking model that adds the “spark of life” to conversations.
Phoenix-3 is the company’s flagship foundation model, designed to create “digital twins,” or highly realistic representations of individuals, equipped with AI-driven human expression capabilities, Raza explained. Now in its third iteration, it offers full-face animation, cloning people and accurately representing every muscle in the face, which is essential for mimicking subtle expressions. Most commercial facial animation models don’t handle full faces, he said; the result is that the lower half doesn’t match the upper half, which breaks the immersive quality.
“Phoenix-3 is a full-face expression model that also has emotion control, and it’s the first model that can do this without requiring a ton of data,” Raza said.
Most importantly, Phoenix-3’s high fidelity and fine-grained control of facial muscles mean it can accurately emulate “micro-expressions,” the brief, involuntary facial expressions that result from emotional responses. This capability makes the generated video feel more emotional and expressive, and far more realistic than simply animated faces.
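For illustration only, here is a minimal sketch of what driving a Phoenix-3-style model with emotion control might look like from an application’s point of view. The endpoint, field names and emotion labels below are hypothetical assumptions, not Tavus’ documented API:

```typescript
// Hypothetical sketch: requesting a Phoenix-3-style render with emotion control.
// The endpoint, field names and emotion labels are illustrative assumptions,
// not Tavus' actual API.

interface RenderRequest {
  replicaId: string;        // the "digital twin" to animate
  script: string;           // what the twin should say
  emotion: {
    label: "neutral" | "happy" | "concerned";
    intensity: number;      // 0..1, how strongly the full face expresses it
  };
}

async function renderClip(req: RenderRequest): Promise<string> {
  const res = await fetch("https://api.example.com/v1/renders", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Render failed: ${res.status}`);
  const { videoUrl } = await res.json();
  return videoUrl; // URL of the generated full-face video
}

// Example: the same line delivered with a subtle, concerned expression.
renderClip({
  replicaId: "twin-123",
  script: "Let's go over that step again.",
  emotion: { label: "concerned", intensity: 0.4 },
}).then((url) => console.log("Video ready:", url));
```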
To enable Phoenix-3 to respond similarly to a human, Raven-0 gives the AI the capability to see and interpret what is happening in a scene. Rather than taking individual snapshots, it continuously observes and understands the context of the events in the video. That includes recognizing emotions on the user’s face and detecting changes in their environment.
For instance, an AI tutor could identify when a student appears confused or frustrated by monitoring their expressions, and adjust its explanations accordingly. Similarly, a support assistant could observe a customer as they work with a product and offer guidance on how to resolve any problems.
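As a rough sketch of that pattern, the loop below continuously samples video frames, classifies the user’s apparent emotional state and surfaces changes to the conversation layer. The frame type and classifier here are placeholder assumptions standing in for a real perception model such as Raven-0:

```typescript
// Hypothetical sketch of a continuous perception loop in the spirit of Raven-0.
// The Frame type and classifyFrame() are placeholder assumptions, not the real model.

type Frame = Uint8Array; // raw video frame bytes (placeholder)
type Emotion = "neutral" | "confused" | "frustrated" | "engaged";

interface PerceptionEvent {
  emotion: Emotion;
  timestamp: number;
}

async function classifyFrame(frame: Frame): Promise<Emotion> {
  // Placeholder: a real system would run a perception model on the frame here.
  return "neutral";
}

async function watchUser(
  frames: AsyncIterable<Frame>,
  onChange: (event: PerceptionEvent) => void,
): Promise<void> {
  let last: Emotion = "neutral";
  // Observe continuously rather than taking one-off snapshots, so the system
  // tracks context over time instead of reacting to isolated moments.
  for await (const frame of frames) {
    const emotion = await classifyFrame(frame);
    if (emotion !== last) {
      onChange({ emotion, timestamp: Date.now() });
      last = emotion;
    }
  }
}

// Example: an AI tutor slows down when the student looks confused.
// watchUser(cameraFrames, (e) => {
//   if (e.emotion === "confused") tutor.simplifyExplanation();
// });
```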
Sparrow-0 tries to handle something that many AI systems get wrong, Raza said. Natural conversation has a flow, a give-and-take in which one participant waits for the other to stop talking and then jumps in.
AI, however, can jump in too quickly, sometimes right on top of the other person. That suddenness happens because AI models respond faster than humans do, and because developers work hard to reduce latency, the time it takes a model to respond. But an AI that answers too fast seems uncanny.
The Sparrow model works to make the conversation feel natural by understanding the rhythm of speech to know when to pause, when to talk and when to listen. It won’t react to filler words like “uh” or wait for a long silence – instead, it adjusts to the tone, pacing and context.
“If it’s very sure you’re having a fast-paced friendly conversation, it’ll respond quickly,” Raza explained. “But, if you say, ‘Hey, let me think about that,’ the AI will give you space. So, it just makes the conversation more natural.”
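To make that behavior concrete, here is a simplified turn-taking heuristic in the spirit of what Raza describes: it ignores filler words, gives extra space after an explicit “let me think,” and responds faster when the pace of conversation is high. The thresholds and signals are illustrative assumptions, not Sparrow-0 itself:

```typescript
// Simplified turn-taking heuristic in the spirit of Sparrow-0.
// Thresholds and signals are illustrative assumptions, not the real model.

interface SpeechState {
  lastUtterance: string;   // most recent transcribed speech from the user
  silenceMs: number;       // how long the user has been silent
  wordsPerMinute: number;  // recent speaking pace
}

const FILLERS = new Set(["uh", "um", "hmm"]);

function shouldRespond(state: SpeechState): boolean {
  const words = state.lastUtterance.toLowerCase().split(/\s+/).filter(Boolean);
  const lastWord = words[words.length - 1] ?? "";

  // Don't treat a filler word as the end of the user's turn.
  if (FILLERS.has(lastWord)) return false;

  // An explicit request for time gets extra space before the AI jumps in.
  if (state.lastUtterance.toLowerCase().includes("let me think")) {
    return state.silenceMs > 3000;
  }

  // In a fast-paced exchange, respond quickly; otherwise wait a bit longer.
  const threshold = state.wordsPerMinute > 160 ? 300 : 900;
  return state.silenceMs > threshold;
}

// Quick reply in a rapid back-and-forth...
console.log(shouldRespond({ lastUtterance: "Sounds good!", silenceMs: 400, wordsPerMinute: 180 })); // true
// ...but space is given after "let me think".
console.log(shouldRespond({ lastUtterance: "Hey, let me think about that", silenceMs: 1000, wordsPerMinute: 120 })); // false
```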
Unlike other companies that stitch separate technologies together, Raza said, Tavus has built a single system in which these models work in concert. The result is a highly immersive experience that feels more like talking to another person and is less uncanny than other human avatar AI systems.
Raza said there is still a long way to go on model capabilities, which means continuously improving the AI’s ability to perceive and understand humans.
“It’s not perfect today, but it’s best-in-class,” Raza added. “However, in the future, what we’re going for is having a model that so deeply understands humans so that you wouldn’t know if it was a model unless you asked it.”
Image: Tavus