Sign In

Communications of the ACM

ACM Careers

Research Finds Automated Voice Imitation Can Fool Humans and Machines


voice impersonation attack, illustration

University of Alabama at Birmingham researchers have found that automated and human verification for voice-based user authentication systems are vulnerable to voice impersonation attacks. This new research is described in "All Your Voices Are Belong to Us: Stealing Voices to Fool Humans and Machines," being presented at ESORICS 2015, the European Symposium on Research in Computer Security.

Using an off-the-shelf voice-morphing tool, the researchers developed a voice impersonation attack to attempt to penetrate automated and human verification systems.

A person's voice is an integral party of daily life. It enables people to communicate in physical proximity, as well as in remote locations using phones or radios, or over the Internet using digital media.

"Because people rely on the use of their voices all the time, it becomes a comfortable practice," says Nitesh Saxena, director of the Security and Privacy In Emerging computing and networking Systems (SPIES) lab and associate professor of computer and information sciences at UAB. "What they may not realize is that level of comfort lends itself to making the voice a vulnerable commodity. People often leave traces of their voices in many different scenarios. They may talk out loud while socializing in restaurants, giving public presentations or making phone calls, or leave voice samples online."

A person with potentially malicious intentions can record a person’s voice by being in physical proximity of the speaker, by making a spam call, by searching and mining for audiovisual clips online, or even by compromising servers in the cloud that store audio information.

This study from researchers within the UAB College of Arts and Sciences Department of Computer and Information Sciences and Center for Information Assurance and Joint Forensics Research explores how an attacker in possession of audio samples from a victim's voice could compromise the victim's security, safety, and privacy.

Advances in technology, specifically those that automate speech synthesis such as voice morphing, allow an attacker to build a very close model of a victim's voice from a limited number of samples. Voice morphing can be used to transform the attacker's voice to speak any arbitrary message in the victim's voice.

"As a result, just a few minutes' worth of audio in a victim's voice would lead to the cloning of the victim's voice itself," Saxena says. "The consequences of such a clone can be grave. Because voice is a characteristic unique to each person, it forms the basis of the authentication of the person, giving the attacker the keys to that person's privacy."

As a case study for their paper, the researchers investigated the aftermaths of stealing voices in two important applications and contexts that rely upon voices as the basis for authentication.

The first application is a voice-biometrics, or speaker-verification, system that uses the potentially unique features of an individual's voice to authenticate that individual.

"Voice biometrics is the new buzzword among banks and credit card companies," Saxena says. "Many banks and credit card companies are striving for giving their users a hassle-free experience in using their services in terms of accessing their accounts using voice biometrics."

The technology has now also been deployed on smartphones as a replacement to traditional
PIN locks, and is being used in many government organizations for building access control.

Voice biometrics is based on the assumption that each person has a unique voice that depends not only on his or her physiological features of vocal cords but also on his or her entire body shape, and on the way sound is formed and articulated.

Once the attacker defeats voice biometrics using fake voices, he could gain unfettered access to the system, which may be a device or a service, employing the authentication functionality.

Secondly, the research team looked at the implications stealing voices had on human communications as its other application for the paper's case study. The voice-morphing tool imitated two famous celebrities, Oprah Winfrey and Morgan Freeman, in a controlled study environment.

If an attacker can imitate a victim's voice, the security of remote conversations could be compromised. The attacker could make the morphing system speak literally anything that the attacker wants to, in the victim's tone and style of speaking, and can launch an attack that can harm a victim's reputation, his or her security, and the safety of people around the victim.

"For instance, the attacker could post the morphed voice samples on the Internet, leave fake voice messages to the victim's contacts, potentially create fake audio evidence in the court, and even impersonate the victim in real-time phone conversations with someone the victim knows," Saxena says. "The possibilities are endless."

The results show that the state-of-the-art automated verification algorithms were largely ineffective to the attacks developed by the research team. The average rate for rejecting fake voices was less than 10 to 20 percent for most victims. Even human verification was vulnerable to the attacks. According to two online studies with about 100 users, researchers found that study participants rejected the morphed voice samples of celebrities as well as somewhat familiar users about half the time.

"Our research showed that voice conversion poses a serious threat, and our attacks can be successful for a majority of cases," Saxena says. "Worryingly, the attacks against human-based speaker verification may become more effective in the future because voice conversion/synthesis quality will continue to improve, while it can be safely said that human ability will likely not."

While the results of this study show how vulnerable a person can be to voice attacks, there are ways to prevent one's voice from being stolen. Saxena suggests people increase their awareness of the possibility of these attacks, and also that they be wary of posting audio clips of their voices online.

"Ultimately, the best defense of all would be the development of speaker verification systems that can completely resist voice imitation attacks by testing the live presence of a speaker," Saxena says. "Our future research will examine this and other defense strategies."

UAB graduate students Dibya Mukhopadhyay and Maliheh Shirvanian, researchers in UAB's SPIES Lab co-authored the article with Saxena.


 

No entries found

Read CACM in a free mobile app!
Access the latest issue, plus archived issues and more
ACM Logo
  • ACM CACM apps available for iPad, iPhone and iPod Touch, and Android platforms
  • ACM Digital Library apps available for iOS, Android, and Windows devices
  • Download an app and sign in to it with your ACM Web Account
Find the app for your mobile device
ACM DL Logo