Hiding Data in Music

A spectrogram of the 1999 song "Nannou" by Aphex Twin, demonstrating the ability to hide an image in the music. — A spectrogram of the song Nannou by Aphex Twin yields this image of the performing artist.

Data buried deep within the background music played in cafes, shops, sports venues and airports could soon be providing a raft of new services to smartphone users, such as seamlessly pairing phones with wireless or Bluetooth networks, or receiving flight updates or sports news and stats when cellphone networks are swamped with too many users.

This sonic feat has been made possible by a team of acoustics engineers at ETH Zurich, the Swiss Federal Institute of Technology in Zurich, who used a technique normally used to prevent interference in digital radio broadcasts and 4G signaling to conceal data streams in music played over loudspeakers.

ETH researchers led by Manuel Eichelberger and Simon Tanner were inspired in part by an XKCD comic strip (see below) highlighting just how complicated it remains, more than 20 years after the advent of Wi-Fi and Bluetooth appeared, to pair phones with wireless routers and Bluetooth devices you have never encountered before while on the move.

Credit: XKCD

The researchers wondered if they could use sound to make it easier for people to set up such links, which they described as “a hassle in countless ad hoc communications situations.” In research presented at the ACM HotMobile conference in Santa Cruz, CA, in February, the ETH team revealed its response to the problem: a smart method of encoding data in music such that it cannot be heard.

The ETH team was not starting with a blank slate, however; people have hidden data in music before (but they have hidden it amongst the 0s and 1s of computer music files, rather than in the sound emitted by a loudspeaker). These files are normally used in digital “steganography” applications: the hiding of data in plain sight to keep messages secret, such as illicit bank account numbers or encryption keys. Play that data through speakers, though, and the error rate is so high, it is not possible to reconstruct the digital message accurately.

However, the ETH researchers were not seeking secrecy: its aim is to embed useful data in music tracks in a way that is as audibly unobtrusive as possible, and at a high data rate. To achieve this, the ETH team’s technology plays two major tricks.

First, they harness a property of the human auditory system called psychoacoustic masking, in which the ear cannot discern some sounds if they are within a certain range (called the critical bandwidth) of high amplitude (louder) sounds. The louder sounds in this case are the harmonics of a particular piece of music, and the data ETH wants to hide piggybacks on signals within the critical masking bandwidth of those harmonics.

Second, they do not simply create one masked data-carrying signal within a critical bandwidth of the loud harmonics in a piece of music. Instead, they create a great many very narrow signals near each harmonic frequency, allowing the data rate to be boosted as each signal carries different bits of data simultaneously. Called orthogonal frequency-division multiplexing (OFDM), this encoding method (more typically used in digital radio and 4G cellular transmissions) also has the effect of making signal reception more robust against interference from echoes as music reverberates around a room.

“OFDM has at least two advantages,” says Eichelberger. “One is its echo resistance. The other is specific to our method: the narrow frequency bands used as our OFDM channels allow us to efficiently use the masking frequencies. This allows for a high data transmission rate.”

Calling the data rate “high” is, admittedly, a relative term, and the speed achieved sounds distinctly like a pre-1990 modem specification: the method achieves a data rate of just 900 bits per second. Yet that is enough for Wi-Fi or Bluetooth configuration data, and is four times faster than any previously suggested method has achieved in simulations.

In initial audibility tests with 40 volunteers across the classical, jazz, hip hop, electronic and hard rock genres, 18% said they could hear that a track they listened to over loudspeakers had been modified. However, after tweaking the algorithms to cope, in particular, with the high dynamic range of classical music, the ETH team said they have made it all but impossible to tell a track is tainted with data.

You can see what you think by downloading ETH’s original and modified sound clips, here.

“With the improved masking, it looks perfect now,” says Eichelberger. “In fact, a larger fraction of participants think that the modified music is the original than the actual original.”

The ETH team will be revealing the details on how that marked improvement was achieved at the IEEE’s International Conference on Acoustics, Speech, and Signal Processing in Brighton, UK, in May, he says.

Observers are guardedly impressed with the ETH work so far. “You can indeed hide data in music streams. You look at masking models of the ear and find bits that the brain won’t notice if it is changed,” says Trevor Cox, a professor of acoustic engineering at Salford University in the U.K. “People inaudibly hide watermarks for music in a similar way, so if the ETH system is designed well, I see no reason for it not to work if conditions are good.”

Cox explains, “There is a trade-off between data bit rate and audibility. The other issue will be the effect of the spaces it is used in, as reverberation can boost error rates. And above a certain speed, I would also anticipate that a moving phone would cause problems.”

Another issue is security. Although it was never confirmed, fears that audio can be used to transmit malware were raised in 2013 when Canadian developer Dragos Ruiu discovered a new, air gapped MacBook was nevertheless being infected by malware that he dubbed BadBIOS, seemingly via its built-in microphone. In addition, researchers at Germany’s Fraunhofer Institute have shown that covert mesh networks can indeed be established between computers using audio transmission alone.

Still, Eichelberger is doubtful the issue could be any worse with audio than it is with standard wireless communications. “Our technique works like any other data transmission method, like LTE or Wi-Fi, so it is no more or less secure than these protocols,” he says. “Problems cannot arise from the transmission itself, but only from the handling of the data in the smartphone app accessing the microphone data stream.”

Despite such issues, things are looking up at one end of the audio spectrum: ultrasound-based wireless transmission. In the vanguard here is Chirp, a Hoxton, U.K-based startup providing software developer kits that allow devices like acoustic beacons, mobiles, and laptops to generate inaudible ultrasonic tones that allow data exchange between IoT devices. “No pre-existing music or audio is needed as a carrier,” says Chirp CTO Daniel Jones.

In one major application, Chirp provides connectivity the full length of the 80-meter-long turbine halls of EDF Energy’s nuclear power stations in France, where radio-based wireless signaling is not allowed for fear of creating interference with the high power generating kit. The ultrasound beacons are also finding roles in IoT-based smart workplaces, transport, and ticketing applications.

Jones sees such uptake as an encouraging sign for developments like the ETH technology, as well as for Chirp. “Audio is often overlooked as a carrier for the information that’s always around us, particularly as smart speakers and voice interfaces become ubiquitous,” he says.

Paul Marks is a technology journalist, writer, and editor based in London, U.K.