Humanoid robots are progressing at high speed. They are becoming more agile, more intelligent, and better at interacting with us. But as soon as they open their mouths, the illusion often collapses.
The problem of robots that speak… badly
The problem is not the voice: modern speech synthesis can already sound very natural. What often betrays the machine are the lips. In many robots, mouth movements do not properly follow the rhythm of speech, creating a slight lag, much like bad dubbing.
However, for robots intended to interact with humans (assistants, companions or educational robots), this detail matters a lot. Believable synchronization between voice and mouth can make the conversation much more natural.
This is precisely what a team of researchers from AheadForm is trying to improve with a new method based on machine learning. Their F1 robot uses an artificial-intelligence technique that analyzes an audio recording and automatically derives the corresponding lip movements. In other words, the robot listens to the sentence, then works out for itself how its mouth should move to pronounce the words.
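The article does not detail AheadForm's model, but the general idea of mapping audio to mouth motion can be sketched as a pipeline: frame the audio, extract per-frame spectral features, and feed them to a small regression network that outputs mouth parameters. The sketch below is illustrative only; the function names are hypothetical, and the network weights are random placeholders where a real system would use weights trained on paired audio and mouth-motion data.

```python
import numpy as np

def frame_audio(signal, sr, frame_ms=40, hop_ms=20):
    """Split a mono signal into overlapping frames (40 ms windows, 20 ms hop)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = max(1, 1 + (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def spectral_features(frames, n_bands=16):
    """Crude log-energy features over linearly spaced frequency bands."""
    spec = np.abs(np.fft.rfft(frames, axis=1))
    bands = np.array_split(spec, n_bands, axis=1)
    return np.log1p(np.stack([b.sum(axis=1) for b in bands], axis=1))

class LipRegressor:
    """Toy two-layer MLP mapping per-frame features to two mouth parameters
    (e.g. jaw opening, lip width). Weights are random placeholders: a real
    system would train them on paired audio / mouth-motion recordings."""
    def __init__(self, n_in=16, n_hidden=32, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.w2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
    def __call__(self, feats):
        h = np.tanh(feats @ self.w1)
        return 1.0 / (1.0 + np.exp(-(h @ self.w2)))  # squash outputs into (0, 1)

# Usage: one second of synthetic "speech-like" audio at 16 kHz
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
params = LipRegressor()(spectral_features(frame_audio(audio, sr)))
print(params.shape)  # one (jaw, width) pair per 20 ms frame
```

Because the model emits a fresh pair of parameters every 20 ms, the mouth trajectory can follow the actual phonetic content of the audio rather than just its loudness.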
One question remains, of course: do these movements actually look more credible? To find out, the researchers ran a large-scale experiment. The principle was simple: show volunteers several videos of a talking robot and ask them which one seemed best synchronized with the audio. Participants first watched a reference video, a synthetic lip movement considered ideal, then three different versions of the physical robot uttering the same sentence. Their task: pick the most convincing version. In total, 1,300 people took part in the experiment.
The result is quite clear: the new method was chosen in more than 60% of cases, far ahead of the two baseline approaches tested in the study. One, driven by the amplitude of the voice, received only 23% of the votes; the other, based on matching facial landmarks, fell to 14%. Of course, not everything is perfect yet. The researchers acknowledge that measuring the "quality" of a lip movement remains difficult: there is no standard metric for judging this kind of realism, which is precisely why they chose to rely on the judgment of human observers.
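The amplitude baseline that scored 23% is worth a quick sketch, because it shows why it loses: the jaw simply tracks how loud the voice is, with no notion of which sound is being made. A minimal version, with a hypothetical function name, could look like this:

```python
import numpy as np

def amplitude_lipsync(signal, sr, hop_ms=20, smooth=0.6):
    """Amplitude baseline: mouth opening tracks smoothed per-frame RMS energy.
    Every 20 ms frame gets one opening value in [0, 1]."""
    hop = int(sr * hop_ms / 1000)
    n = len(signal) // hop
    rms = np.array([np.sqrt(np.mean(signal[i * hop : (i + 1) * hop] ** 2))
                    for i in range(n)])
    opening = rms / (rms.max() + 1e-9)  # normalize loudest frame to 1.0
    # Exponential smoothing so the jaw does not jitter from frame to frame
    for i in range(1, n):
        opening[i] = smooth * opening[i - 1] + (1 - smooth) * opening[i]
    return opening

# Usage: one second of synthetic audio at 16 kHz
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
opening = amplitude_lipsync(audio, sr)
print(len(opening))  # one opening value per 20 ms frame
```

The limitation is visible in the code itself: an "m" hummed loudly opens the mouth wide, while a quiet "ah" barely moves it, which is exactly the bad-dubbing effect viewers penalized.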
