Translation By Technology

U.S. president Barack Obama using simultaneous interpretation equipment during a meeting with a foreign head of state.

A number of companies now offer apps that can sensibly translate spoken phrases in one language into an enormous variety of other languages.

When Jonathan Werner, a technology specialist at Cape Elizabeth High School in Maine, was asked to work with two new students from Italy last year, he found that his once-sharp command of their language had lost its edge. Rather than stumbling through his sentences, however, Werner chose to download Google Translate, an app that quickly transforms spoken dialogue in one language to text or speech in another. He encouraged the students to do the same, and the results were immediate: "they could instantly interact with their teachers," Werner says. "This tool kept them functioning in real time instead of lagging behind."

Researchers have been working on automated translation for decades, but until recently a universal, high-functioning translation tool seemed to be the stuff of science fiction. Now a number of companies, including Google and Microsoft, offer apps that can sensibly translate spoken phrases in one language into an enormous variety of others. Similarly, an English-speaking caller can use Skype Translator to have a long-distance conversation with someone in Spanish, Italian, or Mandarin Chinese.

In each case, a speech recognition engine first transforms the raw audio into a transcript; the system then translates this text into the listener’s language and either pronounces it or spells it out on screen. These basic steps have not changed in decades, but the technology has become both more accurate and more versatile, quickly incorporating an increasing number of languages, because of the adoption of machine learning techniques.
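The three stages described above can be pictured as a simple chain of functions. The sketch below is only an illustration in Python: the class `SpeechTranslator` and the stand-in functions `fake_recognizer`, `toy_translate`, and `show_on_screen` are hypothetical names invented here, and none of them reflects how Google's or Microsoft's products are actually built.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Hypothetical three-stage pipeline: recognize, translate, render."""
    recognize: Callable[[bytes], str]  # raw audio -> source-language transcript
    translate: Callable[[str], str]    # source text -> target-language text
    render: Callable[[str], str]       # target text -> on-screen or spoken output

    def run(self, audio: bytes) -> str:
        transcript = self.recognize(audio)
        translated = self.translate(transcript)
        return self.render(translated)


# Toy stand-ins so the sketch runs; a real system would plug in trained models.
def fake_recognizer(audio: bytes) -> str:
    # Assumption: pretend the "audio" bytes already contain the spoken words.
    return audio.decode("utf-8")


TOY_LEXICON = {"hello": "hola", "friend": "amigo"}


def toy_translate(text: str) -> str:
    # Word-for-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def show_on_screen(text: str) -> str:
    return f"[screen] {text}"


pipeline = SpeechTranslator(fake_recognizer, toy_translate, show_on_screen)
result = pipeline.run(b"Hello friend")
print(result)
```

Keeping each stage behind its own function mirrors the point made above: the steps stay the same while the component inside each step can be replaced by a better-trained one.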

Previously, automated translation was based largely on rules specific to each language; expanding to a new dialect meant coding in new rules. Now, given data in the form of parallel texts, or documents translated from one language to another, machine learning algorithms can do this work independently. Computer scientist Atefeh Farzindar of NLP Technologies says the quality of these texts is hugely important. NLP, a Canadian company, trains its algorithms on official court documents that have been translated from French into English. Yet she also notes that the size of the corpus is critical, since the algorithms benefit from training on more data. In this sense, Google has a huge advantage: "Their corpus is the whole Internet," she notes.

Google Translate now covers 90 languages, and the company constantly trawls the Web for new pages that look like mirrors of each other (ideal texts include official United Nations transcripts or books). Google’s translation algorithms match up sentences in the two parallel texts, then look for phrases, sequences of a few words, and individual words that seem to appear together more frequently. When comparing Spanish and English texts, for example, the frequency with which "dog" and "perro" occur in parallel would suggest they have the same meaning. The technology tracks all these words, phrases, and sequences. "Then when we go to translate a sentence, we find the words and phrases in it that we think we have translations for, and we look at the most probable ones," says Macduff Hughes, director of engineering for Google Translate.
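The "dog" and "perro" idea can be tried on a toy scale. The sketch below counts how often words appear in the same aligned sentence pair and scores each pairing with the Dice coefficient, a simple association measure chosen here for illustration. It is an assumption standing in for the statistics a production system would use, and the four-sentence corpus and the function names `cooccurrence_scores` and `best_match` are made up for the example.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(pairs):
    """Score (source word, target word) pairs by the Dice coefficient.

    pairs: list of (source sentence, target sentence) that are already aligned.
    """
    src_counts, tgt_counts, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_counts.update(src_words)                     # sentences containing each source word
        tgt_counts.update(tgt_words)                     # sentences containing each target word
        joint.update(product(src_words, tgt_words))      # sentences containing both
    return {
        (s, t): 2 * n / (src_counts[s] + tgt_counts[t])
        for (s, t), n in joint.items()
    }


def best_match(scores, word):
    """Return the target word most strongly associated with `word`."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get)


# Toy parallel corpus (invented for illustration).
parallel = [
    ("the dog runs", "el perro corre"),
    ("the dog eats", "el perro come"),
    ("the cat eats", "el gato come"),
    ("a dog sleeps", "un perro duerme"),
]

scores = cooccurrence_scores(parallel)
print(best_match(scores, "dog"), best_match(scores, "cat"))
```

Raw co-occurrence counts alone would also link "dog" to "el", which appears in most sentences; dividing by how often each word occurs overall is what lets "perro" win. With only four sentences the result is fragile, which is consistent with Farzindar's point that corpus size matters.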

The system also runs the candidate sentences through a language model that corrects grammar and adheres to conventions. When translating the phrase "the big dog" from English to Spanish, for example, the language model flips the noun and adjective, as one would in the latter tongue. This in itself is not a huge advance. Previously, though, a language expert would have had to write in such rules. "The advance was to machine-learn it by looking at the data," says Hughes. "We were able to do word re-ordering for languages that we don’t speak."
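One way to picture a learned language model choosing word order is a bigram model that scores candidate sentences by how familiar their word sequences are. The sketch below is a minimal illustration with add-one smoothing, trained on three invented Spanish sentences; the function names are hypothetical, and the article does not say which kind of model Google uses.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count bigrams in target-language text; no grammar rules are written by hand."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])               # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))    # adjacent word pairs
    vocab = {tok for s in sentences for tok in s.split()} | {"</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed log-probability of a sentence under the bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


# Toy monolingual Spanish corpus (invented for illustration).
spanish_text = [
    "el perro grande corre",
    "el gato negro duerme",
    "un perro grande come",
]
model = train_bigram_model(spanish_text)

# Two candidate translations of "the big dog", differing only in word order.
candidates = ["el grande perro", "el perro grande"]
best = max(candidates, key=lambda c: log_prob(c, model))
print(best)
```

The model prefers the noun-adjective order simply because "perro grande" occurs in its training text and "grande perro" does not, which is the sense in which the ordering is learned from data instead of being coded as a rule.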

Skype Translator, still in beta, faces a slightly different set of challenges because it is designed specifically for active conversations. When people are using a translation app, they enunciate. On the telephone, however, we mutter and pause; we fill conversational spaces with "umms" and "ahhs" as we formulate our next thought. To learn the difference between real words and mumbles, the Skype team trains its system on both exact transcriptions of speeches and cleaned-up versions, without repeated or partial words. "That teaches us the corrections we need to apply," explains Chris Wendt, program manager at Microsoft Research.
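The idea of learning corrections from paired transcripts can be sketched in a few lines. The example below aligns each verbatim transcript with its cleaned-up version using Python's standard `difflib` module and treats any word that is removed nearly every time it occurs as a filler. The transcripts, the 0.9 threshold, and the function names are assumptions made for illustration, and the rule that collapses repeated words is hand-written here, not learned; the actual Skype training procedure is not described in detail in the article.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_fillers(pairs, min_rate=0.9):
    """Find tokens that are (almost) always deleted when a transcript is cleaned up."""
    seen, deleted = Counter(), Counter()
    for verbatim, cleaned in pairs:
        raw, clean = verbatim.split(), cleaned.split()
        seen.update(raw)
        matcher = SequenceMatcher(a=raw, b=clean, autojunk=False)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
            if tag == "delete":
                deleted.update(raw[i1:i2])
    return {tok for tok, n in deleted.items() if n / seen[tok] >= min_rate}


def clean_utterance(text, fillers):
    """Drop learned fillers, then collapse immediate word repetitions (fixed rule)."""
    out = []
    for tok in text.split():
        if tok in fillers:
            continue
        if out and out[-1] == tok:
            continue
        out.append(tok)
    return " ".join(out)


# Toy paired transcripts (invented): verbatim speech and its cleaned-up version.
training_pairs = [
    ("umm i i want a ticket", "i want a ticket"),
    ("so ahh where is the the station", "so where is the station"),
    ("umm the train is late", "the train is late"),
]

fillers = learn_fillers(training_pairs)
print(sorted(fillers))
print(clean_utterance("umm i i think ahh it works", fillers))
```

Words such as "i" and "the" are also deleted once each in the training pairs, but only when repeated, so their deletion rate stays well below the threshold and they are not mistaken for fillers.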

Wendt says the speech recognition engine also uses deep learning techniques that identify and then segregate different layers of patterns within the data. In the past, the system might have mastered one language only to sacrifice accuracy when forced to incorporate a heavily accented version of that same speech. With deep learning, the system recognizes variations as different versions of the same language, and keeps them separate but linked. "Applying deep neural networks has enabled us to broaden the training data with a variety of accents without impacting the quality of the translation of the non-accented speakers," says Wendt. "That’s really the major practical gain."

To maintain a conversational flow, the Skype technology has to be fast. As the speaker talks, the speech recognition system works in conjunction with the language model, and the system churns out the most likely translations. When the speaker pauses for 500 milliseconds, the system assumes he or she is finished, then chooses the translation with the highest probability of being correct.
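The pause rule can be illustrated with a short sketch. It assumes the audio arrives as timestamped frames already labeled as speech or silence, and that the running hypotheses are available as a mapping from candidate translation to probability; both representations, along with the function names and the sample values, are assumptions made for the example. Only the 500-millisecond threshold comes from the description above.

```python
PAUSE_MS = 500  # pause length after which the speaker is assumed to be finished


def find_endpoint(frames, pause_ms=PAUSE_MS):
    """Return the timestamp at which silence has lasted `pause_ms`, or None.

    frames: iterable of (timestamp_ms, is_speech) in chronological order.
    """
    silence_start = None
    for timestamp, is_speech in frames:
        if is_speech:
            silence_start = None          # any speech resets the pause timer
            continue
        if silence_start is None:
            silence_start = timestamp     # first silent frame of this pause
        if timestamp - silence_start >= pause_ms:
            return timestamp
    return None


def pick_translation(hypotheses):
    """Choose the candidate translation with the highest probability."""
    return max(hypotheses, key=hypotheses.get)


# Toy input: one frame every 100 ms; speech until 400 ms, silence afterwards.
frames = [(t, t < 500) for t in range(0, 1500, 100)]

# Invented candidate translations with made-up probabilities.
hypotheses = {
    "donde esta la estacion": 0.82,
    "donde es la estacion": 0.11,
    "donde esta la emisora": 0.07,
}

endpoint = find_endpoint(frames)
if endpoint is not None:
    print(endpoint, pick_translation(hypotheses))
```

A shorter threshold would make the system respond faster but would cut speakers off at ordinary mid-sentence hesitations; a longer one would do the opposite.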

Neither Skype Translator nor Google offers exact metrics on accuracy. Both say their technology is steadily improving, but they do not claim to have developed the perfect universal translator; there are still mistakes. Yet Werner, the high school technology specialist, says he has already seen a significant change from when he and his students began using Google Translate at the start of 2014 until today. "There were some flat-out hilarious translations at the outset," he says, noting one case in which the app seemed to suggest that one of the teachers was swearing (he was not). "But it has gotten way better just in the course of this year."

Gregory Mone is a Boston, MA-based writer and the author of the novel Dangerous Waters.