University of Surrey researchers have used recent advances in deep neural networks to separate human voices from the background in a wide range of songs. The researchers say they have solved the cocktail party effect, the ability to focus on a specific human voice while filtering out other voices or background noise, which has been a challenge for computer engineers.
The new method relies on a database of 63 songs, each available as a set of individual tracks that each contain a different instrument or voice, along with the fully mixed version of the song. The researchers divided each track into 20-second segments and created a spectrogram of each segment showing how the frequencies in the sound vary over time; the spectrogram acts as a unique fingerprint that identifies the instrument or voice. They then trained a deep convolutional neural network to pick out the voice's spectrogram from the other spectrograms present. The researchers used 50 songs, which generated more than 20,000 spectrograms, to train the network, and held out the remaining 13 songs to test it.
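The data preparation described above (cutting tracks into 20-second segments, turning each segment into a spectrogram, and holding out whole songs for testing) can be sketched roughly as follows. This is an illustrative sketch, not the Surrey team's code: the helper names, the STFT frame length and hop size, the Hann window, and the random song-level split are all assumptions. In a setup like this, the mixture's spectrogram would typically be the network input and the vocal track's spectrogram the training target, but the article does not spell that out.

```python
import random

import numpy as np


def segment(signal, sample_rate, seconds=20.0):
    """Split a mono signal into non-overlapping fixed-length segments.

    Any trailing remainder shorter than `seconds` is dropped (an assumption).
    """
    seg_len = int(seconds * sample_rate)
    n_segments = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def magnitude_spectrogram(clip, frame_len=1024, hop=512):
    """Magnitude of a short-time Fourier transform.

    Rows are time frames and columns are frequency bins; frame_len and hop
    are illustrative choices, not values from the article.
    """
    if len(clip) < frame_len:
        raise ValueError("clip is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(clip) - frame_len) // hop
    frames = np.stack(
        [clip[t * hop:t * hop + frame_len] * window for t in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def split_songs(song_ids, n_train=50, seed=0):
    """Hold out whole songs so no test song contributes training segments."""
    shuffled = list(song_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

Splitting by song rather than by segment matters here: if segments from the same song landed in both sets, the test score would overstate how well the network generalizes to unseen music, which is the property the researchers highlight.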
"These results demonstrate that a convolutional deep neural network approach is capable of generalizing voice separation, learned in a musical context, to new musical contexts," the researchers say.
From Technology Review
Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA