Using artificial-intelligence techniques, a new system from Disney Research and ETH Zurich is capable of learning the association between images and the sounds the depicted objects make. According to the researchers, a system that can successfully recognize and return the sound matching an image, such as a slamming door or a car engine, could be used to add sound effects to film or give audio feedback to people with visual impairments.
To train the system, the researchers collected data from videos with audio tracks. A key challenge was the presence of extraneous sounds that were not associated with the visual content, such as background music, narration, and off-screen noises. The team was able to filter out the extraneous sounds by looking for redundancies between videos; for example, a collection of car videos will contain recurring car engine sounds, while uncorrelated sounds generally are not repeated across videos and can be filtered out. After the video frames containing uncorrelated sounds are removed, the algorithm learns which sounds match a given image.
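The redundancy-based filtering idea can be sketched as follows. This is a minimal illustration, not the researchers' actual method: it assumes each audio segment is represented by a unit-normalized feature vector (e.g., a spectrogram embedding, a hypothetical choice), and keeps a segment only if a similar-sounding segment occurs in a *different* video, on the intuition that correlated sounds such as car engines recur across a collection while narration or background music does not.

```python
import numpy as np

def filter_redundant_segments(segments, threshold=0.8):
    """Keep only audio segments whose features recur in other videos.

    segments: list of (video_id, feature_vector) pairs, where each
    feature vector is unit-normalized (a hypothetical representation).
    A segment is kept when its cosine similarity to some segment from
    a different video is at least `threshold`.
    """
    feats = np.array([f for _, f in segments])
    vids = [v for v, _ in segments]
    # Pairwise cosine similarities (dot products of unit vectors).
    sims = feats @ feats.T
    kept = []
    for i, (vid, feat) in enumerate(segments):
        # Only compare against segments from OTHER videos.
        cross = [sims[i, j] for j in range(len(segments)) if vids[j] != vid]
        if cross and max(cross) >= threshold:
            kept.append((vid, feat))
    return kept
```

For example, two engine-like vectors appearing in different videos would survive the filter, while a narration-like vector appearing in only one video would be dropped.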
The researchers found that a system trained on the filtered videos returned better results than one trained on the original, unfiltered video collection.
From Disney Research
View Full Article
Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA