Artificial Intelligence and Machine Learning

Lost in Translation?

A number of services can convert spoken or typed sentences and display or read a reasonably accurate translation.
Fueled by improvements in speech recognition, machine learning, better algorithms, cloud processing, and more powerful computing devices, the quality of machine translations is improving.

Learning another language has never been a simple proposition. It can take months of study to absorb the basics and years to become fluent. Of course, there's the added headache that learning a language doesn't help if a person encounters one of the world's other 7,000 or so languages.

 "There has always been a need for human translators and interpreters," says Andrew Ochoa, CEO of translation technology firm Waverly Labs.

However, thanks to digital technology, the accent is now shifting to automated machine translation.

The dream of producing devices and software that can translate from Japanese to Spanish, or from Farsi to Hindi, is taking shape. Fueled by improvements in speech recognition, machine learning, better algorithms, cloud processing, and more powerful computing devices, the quality of machine translations is improving.

For example, Google Translate and Bing Translate can convert spoken or typed sentences and read a reasonably accurate translation out loud. Google Translate on a smartphone can also convert text from a foreign language menu, sign, or document into a user's native language. At the same time, startups such as Waverly Labs have introduced earphones and earbuds that allow participants to converse in different languages but hear everything in their own language in near-real time.

Words Matter

From casual travel to international business and global conferences, machine language translation is changing the face of human interaction. Alex Waibel, a computer science professor at Carnegie Mellon University and the Karlsruhe Institute of Technology, says that while the technology is nothing new (he and others began exploring it in the mid-1980s), the last few years have fueled enormous advances in the field.

In the past, machine language translation systems have mostly used rules-based methods to determine the most accurate combination of words. Yet, it's impossible to program the millions (or even billions) of rules needed to address every translation.

The next advance was statistical, learning-based machine translation systems. Although still imperfect, they led to broad adoption of machine translation and made the translation of spoken language possible.

A major breakthrough occurred in 2009 when Waibel and his team developed an iPhone app called Jibbigo, which translated voice and text for vocabularies of more than 40,000 words in near-real time on a smartphone. This brought even speech dialog translation to the masses. By 2015, recurrent neural networks began to reshape the field, producing performance gains of 30% or more.

Achieving a higher level of accuracy isn't about simply dumping words into machine learning systems and allowing them to parse through all the combinations. Accents, dialects, regional variations, synonyms, idioms, and colloquialisms require close scrutiny. Further complicating the task is the fact that new words and slang constantly appear, such as the term Brexit, and meanings of words sometimes change over time. Consequently, researchers often use specific datasets that focus on context and industry-specific terms. This helps a system distinguish between a river bank and a financial bank, or between a construction crane and a bird.

Today's machine language frameworks accommodate millions of words and, in many cases, produce increasingly high-quality translations, as measured by a BLEU (bilingual evaluation understudy) score, an algorithmic metric that rates machine-translated text by comparing it with human reference translations.
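As a rough illustration of how a BLEU-style score is computed, the Python sketch below counts how many word sequences (n-grams) of a machine translation also appear in a human reference, then penalizes output that is too short. It is a simplified, sentence-level version under stated assumptions: a single reference, whitespace tokenization, and no smoothing. The names `ngrams` and `simple_bleu` are hypothetical and chosen for this example; real evaluations typically score whole test sets against standardized tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference.

    Illustrative only: whitespace tokens, one reference, no smoothing.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a correct word cannot inflate the score.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives 0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: candidates shorter than the reference are scaled down.
    if len(cand) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, times the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on a mat", reference, max_n=2))
```

A score of 1.0 means the output matches the reference exactly, and a single changed word lowers it. The score measures only word overlap, which is one reason tone and intent go unmeasured.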

Still, perfection remains elusive.

"It's difficult to communicate speaking style, tone, and intent through machine translation. Direct text output, such as a translation that takes place at a social media site, doesn't convey the true feeling," says Ge Gao, an assistant professor at the College of Information Studies, University of Maryland College Park.

Lingua Franca

In labs around the world, research continues into how to build better machine translators and scale up the number of languages that machines can accommodate. Although tools such as Google Translate are suitable for casual use, business and diplomatic interpretation requires a level of precision and handling of stylistic, social, and diplomatic variations and subtleties that are lacking in today's technology. "You have people who can interpret multiple languages appropriately, including communicating politeness, sarcasm, humor, insults, formality and anger," Waibel says.

Training systems to translate accurately across several thousand languages and hundreds of thousands of dialects becomes an almost impossible task. Speech and image recognition present additional challenges; Waibel describes the task as a "software nightmare." At present, Waverly Ambassador supports 20 languages in 42 dialects. The $149 wireless over-the-ear units stream translations in near-real time by tapping cloud processing. Ochoa says 5G cellular technology and further enhancements in chips and machine learning will fuel better systems.

Gao believes it may one day be possible to hit 100% accuracy, and incorporate tone and emotions into machine language translation. Even then, the need for humans in translation likely will persist. Devices and machines are not suitable, or even available, for every environment and situation.

Waibel believes machine language translation technology won't replace the desire to learn languages; he says such systems have actually produced an uptick in interest in gaining language skills. "The more people have the technology, the more they venture into other languages without fear," he says. "They learn phrases, they seek out travel opportunities, and they venture into a culture."

Samuel Greengard is an author and journalist based in West Linn, OR, USA.

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Computing Applications BLOG@CACM

Lost in Translation

The Communications Web site features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts.

Daniel Reed on straddling the intellectual divide between technology experts and policymakers.
Daniel Reed, "Being Bilingual: Speaking Technology and Policy"
Microsoft Research Director Daniel Reed
June 30, 2011

There is an old joke:

Q: What do you call someone who speaks only one language?

A: An American.

Certainly, being monolingual limits and constrains one’s exposure to and understanding of the cultural and linguistic diversity that is our global human heritage. Alas, I fear the same is true in far too many domains where cross-cultural fertilization would inform and enlighten all parties.

I will spare you a meandering discourse on the theory of language origins, the Sapir-Whorf hypothesis, or the Indo-European language tree. Nor will I digress to discuss the intellectual divide that separates the sciences from the arts, for C.P. Snow has written far more eloquently about that than I can. Rather, I want to focus on a far more constrained and practical intellectual concern, the cultural gap separating technologists and policy experts.

Divergent Ontology

Over the years, I have learned that being bilingual in matters of science and technology and in matters of strategy and policy is far rarer than I might have first hypothesized. Those of us with Ph.D.s, Sc.D.s, or research M.D.s speak a particular argot largely incomprehensible to the general public and even to the learned and sophisticated in other domains. Similarly, those who live in the legislative and policy world depend on a vernacular that seems obscure and obtuse to those in technical domains.

The technical and policy communities lack shared cultural referents, created all too often by endemic pressure to differentiate. In consequence, the communities are often estranged, lacking an ontology of discourse to address their common problems and exploit their complementary skills. The power of consilience has long been known, as the tale of the Tower of Babel makes clear.

And the Lord said, Behold, the people is one, and they have all one language; and this they begin to do; and now nothing will be restrained from them, which they have imagined to do. Go to, let us go down, and there confound their language, that they may not understand one another’s speech.

Today, we have our own Babel of misperceptions and miscommunications at the intersection of technology and technical policy. (For a few thoughts on consilience in a technological world, see my blog “Consilience: The Path To Innovation,” Nov. 9, 2009.)

Interpreting the Signs

The linguistic and cultural divergence of technologists and policy experts is no more evident than in the way they identify and select outcomes. If you have ever felt compelled to explain quantum efficiency when discussing silicon solar cells and renewable energy sources, electron mobility and leakage current when discussing the future of smartphones, Shannon’s theorem and the Heaviside layer when explaining wireless communication, or transcriptional gene regulation when discussing the future of health care, you live on the technological side of the communication chasm. Conversely, if you are facile with poll dynamics and sampling error, macro- and microeconomics and their shifting theories of global economic impact, trade imbalances and structural unemployment; if you understand the distinct and important roles of the World Bank and the International Monetary Fund; and if you are adept at reading the nuance of diplomatic language, then you live on the policy side of the communication chasm.

There are, of course, other telltale signs. If you own more than one T-shirt covered with Maxwell’s equations (and you can explain them) and a statistically significant fraction of your wardrobe is festooned with technical conference logos, then you are definitely on the geek side. Conversely, if you own a closet full of suits, perhaps some bespoke, and you choose the suit, matching tie, shoes, and fountain pen based on your mood, those you expect to meet, and the venue, then you are likely a policy wonk.

I exaggerate, of course, on both counts to illustrate a point, though elements of the humorous stereotypes are real. I resonate with both caricatures, for my closet is filled with both conference T-shirts and a variety of suits. But, and this is important, I do not wear them at the same time.

Bridging the Divide

How do we cross the intellectual divide, providing technical advice to policy experts in ways that they find useful and actionable? Equally importantly, how do we translate policy constraints—political, economic, and social—into contexts intelligible and actionable by technical experts?

The key in both cases is to respect the differences and the value each brings, and to place oneself in the other’s situation. If you are a technical expert, this often means finding intuitive analogies that capture the key elements of the technical idea. For example, I recently explained the potential economic advantage of cloud computing by saying that it brought to organizational IT some of the efficiencies that big box retailers brought to consumer goods.

Had I explained the design of a cloud data center, the networking and content distribution network, and the infrastructure optimizations, I would have bewildered my audience. I simply wanted them to understand that familiar economic forces were driving the cloud transition and raise awareness of the policy implications. Any new technology can both create new jobs and spur economic development and disrupt an industry, creating unemployment, which then strains the educational and social safety net.

Likewise, if you are a policy expert or a technical person facile in policy, you must explain the constraints and practicalities of government actions and budgets to your technical partners. As a technologist, one must respect those realities rather than disparage them. The policy world is a complex, dynamic system with deep and unexpected consequences from almost any major change.

It is a critical fallacy to believe that if a legislator or staff member had access to the same facts as a scientist or technologist, then he or she would draw the same conclusions about policy implications as the technologist. Policy discussions may begin with data gathering, but their outcomes are based on values, priorities, and trade-offs. Any government’s agenda must be balanced against a myriad of social and political constraints, including education, social welfare, and national defense. Often this means finding ways to compromise and achieve part of one’s goal. Ten percent of the objective can be victory and should be celebrated as such. There is often time to seek the next 10% at a future engagement.

Moving Forward: Technical Policy Ambassadors

For those of us in science and technology, I believe we must encourage more of our colleagues to become bilingual. An increasing fraction of our world is shaped by technology, and it is incumbent on us to facilitate the discussion of technology, social welfare, economic development, environmental policy, security and privacy, health care and medicine, defense and protection, and innovation and discovery. If we are respectful of political constraints, we can help policymakers understand the pace of technological change and its possibilities and recognize that even a national legislative body cannot overturn the laws of physics.

Only by being ambassadors to both our technical colleagues and our policy partners can we constructively shape our future. We must find and expand those shared cultural referents, creating an ontology of technical policy discourse, and we must reward our colleagues for such engagements.

Remember, it’s okay to wear the conference T-shirt and the suit, just not at the same time.


Reader Comment

This article shows one of the many reasons that central planning actually does not work. No matter how much geekspeak a policy wonk understands, or how much economics a technogeek understands, one size does not fit all. Community organizers and career politicians make for very poor technology bosses. Even in economics, there is only one U.S. Congressman at this time that fully understands the damage that false (fiat) FED currency does to the body politic, for example.

