Vasco Pedro had always believed that, despite the rise of artificial intelligence (AI), getting machines to translate languages as well as professional translators do would always need a human in the loop. Then he saw the results of a competition run by his Lisbon-based startup, Unbabel, pitting its latest AI model against the company’s human translators. “I was like…no, we’re done,” he says. “Humans are done in translation.” Mr Pedro estimates that human labour currently accounts for around 95% of the global translation industry. In the next three years, he reckons, human involvement will drop to near zero.
It is hardly a surprise that the AI model-makers are bullish, but the optimism feels apt. Machine translation has become so reliable and ubiquitous so fast that many users no longer see it. The first computerised translations were attempted more than 70 years ago, when an IBM computer was programmed with a vocabulary of 250 words of English and Russian and six grammatical rules. That “rules-based” approach was superseded in the 1990s by a “statistical” approach, based on crunching large datasets, which was still the state of the art when Google Translate was launched in 2006. The field exploded in 2016, though, when Google switched to a “neural” engine—the forebear of today’s large language models (LLMs). The influence flowed both ways: as LLMs improved, so too did machine translation.
In Unbabel’s test, human and machine translators were asked to translate everything from casual text messages to dense legal contracts and the archaic English of an old translation of “Meditations” by Marcus Aurelius. Unbabel’s AI model held its own. Measured by Multidimensional Quality Metrics, a framework that tracks translation quality, humans were better than machines if they were fluent in both languages and also experts in the material being translated (specialist legal translators dealing with contracts, for instance). But the lead was small, says Mr Pedro, who finds it hard to see how, two or three years from now, machines will not have overtaken humans entirely.
Marco Trombetti, boss of Translated, based in Rome, has created a different measure for the quality of machine translations, called Time to Edit (TTE). This is the amount of time it takes a human translator to check a translation produced by a machine. The more errors in the machine’s output, the slower the human has to go. Between 2017 and 2022 TTE dropped from three seconds per word to two across the ten most-translated languages. Mr Trombetti predicts it will fall to one second in the next two years. At that point, a human is adding little to the process for most tasks other than what Madeleine Clare Elish, head of responsible AI at Google Cloud, calls a “moral crumple zone”: a face to take the blame when things go wrong, but with no reasonable expectation of improving outcomes.
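In arithmetic terms, TTE is simply editing time divided by word count. The minimal sketch below (the function and figures are illustrative, not Translated’s own tooling) shows what the fall from two seconds per word to one would mean for a 1,000-word document:

```python
# Illustrative sketch of the Time to Edit (TTE) idea described above: the
# average number of seconds a human reviewer spends per word when checking
# machine output. Function name and sample figures are invented for
# illustration, not Translated's actual methodology.

def time_to_edit(editing_seconds: float, word_count: int) -> float:
    """Average seconds of human editing per machine-translated word."""
    return editing_seconds / word_count

# A 1,000-word document that takes a reviewer 2,000 seconds to check
# implies a TTE of 2 seconds per word, roughly the 2022 figure cited above.
tte = time_to_edit(editing_seconds=2_000, word_count=1_000)
print(f"TTE: {tte:.1f} s/word")                      # -> TTE: 2.0 s/word

# At the predicted 1 second per word, checking the same document would
# take about 17 minutes rather than 33.
print(f"At 2 s/word: {2 * 1_000 / 60:.0f} minutes")  # -> 33 minutes
print(f"At 1 s/word: {1 * 1_000 / 60:.0f} minutes")  # -> 17 minutes
```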
The problem of translating individual sentences is “pretty close to solved” for “high-resource” languages, those with the most training data, says Isaac Caswell, a research scientist at Google Translate. But going beyond this to make machine translation as good as a multilingual person—especially for languages that do not have reams of available training data—will be a more daunting task.
Complex translations face the same problems that plague LLMs in general. Without the ability to plan, refer to long-term memory, draw from factual sources or revise their output, even the best translation tools struggle with book-length work, or with precision tasks such as keeping a translated headline to a certain length. Even tasks that a human would find trivial still trip them up. They will, for instance, “forget” translations for static phrases such as shop names, translating them afresh, and often differently, each time they are encountered. They may also hallucinate information they do not have in order to fit the grammatical structures of the target language. “To have the perfect translation, you also have to have human-level intelligence,” says Mr Caswell. It is difficult, after all, to translate a haiku without being a competent poet.
That is if users can even agree on what a perfect translation is. Translation has long been a struggle between “fidelity” and “transparency”—the choice between rendering sentences exactly as they are in the original language, or exactly as they would feel to the target audience. A faithful translation would leave an idiomatic phrase as it is, letting English speakers hear a Pole dismiss a problem as “not my circus, not my monkeys”; a transparent one may go so far as to change whole cultural references, so that Americans aren’t taken off guard by “football-shaped” being used to describe a spherical object.
Even if there were a simple dial to turn between transparency and fidelity, perfecting the interface of such a system would itself require AI assistance. Translating between languages can sometimes require more information than is present in the source material: to translate “I like you” from English to Japanese, for instance, a translator needs to know the gender of the speaker, their relationship to the person they are addressing and, ideally, that person’s name, so as to avoid the impolite use of the word “you”. A perfect machine translator would need to be able to interpret and replicate all these subtle cues and inflections.
Adding checkboxes and dials to an interface would bamboozle users. In practice, therefore, a perfect machine translator would be human-level in the quality of its output as well as the method of its input. The requirement to ask follow-up questions, to know when to trade transparency for fidelity, and to understand what a translation is for, means that advanced translation will need more information than just the source text, says Jarek Kutylowski, founder of DeepL, a German startup. “If we can see the address you’re emailing, maybe the conversation history, we can say, ‘Hey, this person is actually your boss’ and tailor it to that.” (DeepL also works with The Economist to provide translations in “Espresso”, our daily news app, which is free for students.)
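Mr Kutylowski’s point can be made concrete with a rough sketch. The snippet below is hypothetical (the names and fields are invented, and it does not use DeepL’s actual API); it simply shows how a translation request might carry context, such as who the recipient is and what has already been said, which an LLM-based system could use to choose an appropriate register:

```python
# Hypothetical illustration of context-aware translation: the request bundles
# metadata (recipient, prior messages) with the source text. All names and
# fields here are invented for illustration and are not DeepL's API.
from dataclasses import dataclass, field

@dataclass
class TranslationRequest:
    source_text: str
    target_language: str
    recipient_relationship: str = "unknown"   # e.g. "boss", "friend"
    conversation_history: list[str] = field(default_factory=list)

def build_prompt(req: TranslationRequest) -> str:
    """Assemble instructions an LLM-based translator could use to pick a register."""
    history = "\n".join(req.conversation_history) or "(none)"
    return (
        f"Translate into {req.target_language}.\n"
        f"The recipient is the writer's {req.recipient_relationship}; "
        f"choose a suitably formal or casual register.\n"
        f"Previous messages:\n{history}\n"
        f"Text to translate: {req.source_text}"
    )

print(build_prompt(TranslationRequest(
    source_text="Can we move the meeting to Friday?",
    target_language="Japanese",
    recipient_relationship="boss",
)))
```

The hard part, of course, is not assembling such a request but gathering that context in the first place without burying the user in checkboxes and dials.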
Then there is the issue of “low-resource” languages, where the paucity of written text means that the accuracy of translations is not being improved by the LLM breakthroughs that have transformed the rest of the industry. Less data-hungry approaches are being tested. A team at Google, for instance, built a system to add speech-to-speech translation for 15 African languages. Rather than being trained on gigabytes of audio data, it learns to read written words the same way a child would, associating speech sounds with sequences of characters in written form.
Live translation is also in the works. DeepL launched a voice-to-voice translation system in November, offering interpretation for one-on-one conversations in person and for multi-member video chats. Unbabel, meanwhile, has demonstrated a device capable of reading small muscle movements in the wrists or eyebrows and pairing them with LLM-generated text to allow communication without the need to speak or type. The firm intends to build the technology into an assistive device for people with motor-neurone disease who can no longer speak unaided.
Despite the progress, and his part in it, Mr Caswell is hopeful that the value in speaking other languages will not disappear entirely. “Translation tools are very useful for navigating the world, but they’re a tool,” he says. “They can’t replace the human experience of learning a language in terms of actually understanding where other people are coming from, understanding what a different place is like.”