Researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a deep-learning algorithm that can take a still image of a scene and model a video simulating the future of that scene.
The algorithm was trained on 2 million unlabeled videos comprising about a year's worth of footage, and human subjects judged its output to be 20% more realistic than that of a baseline model.
CSAIL doctoral student Carl Vondrick says the algorithm can help machines identify human activities without costly annotations.
The researchers taught the model to generate multiple frames by producing the foreground separately from the background, then positioning objects within the scene, which lets the model distinguish animate from inanimate objects.
They used adversarial learning, a technique in which two competing neural networks are trained: one generates videos while the other discriminates between real and simulated ones.
Vondrick says over time the generator can learn to deceive the discriminator. "In the future, this will let us scale up vision systems to recognize objects and scenes without any supervision, simply by training them on video," he says.
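The adversarial dynamic described above can be sketched on a toy problem. This is a hypothetical one-dimensional example, not the researchers' video model: real "data" are scalars drawn from a Gaussian, the generator is a linear map of noise, and the discriminator is a logistic classifier; all parameter names and learning rates here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generator: g(z) = a*z + b, mapping noise z ~ N(0,1) to a sample.
a, b = 1.0, 0.0
# Discriminator: logistic model D(x) = sigmoid(w*x + c),
# the estimated probability that x is a real sample.
w, c = 0.1, 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(x):
    return sigmoid(w * x + c)

lr = 0.05
for step in range(2000):
    real = rng.normal(3.0, 1.0, size=32)   # "real" data: N(3, 1)
    z = rng.normal(0.0, 1.0, size=32)
    fake = a * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    # (gradients of the binary cross-entropy for a logistic model).
    dr, df = discriminator(real), discriminator(fake)
    grad_w = -np.mean((1 - dr) * real) + np.mean(df * fake)
    grad_c = -np.mean(1 - dr) + np.mean(df)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: push D(fake) toward 1, i.e. try to fool
    # the discriminator (non-saturating generator loss).
    df = discriminator(a * z + b)
    grad_fake = -(1 - df) * w
    a -= lr * np.mean(grad_fake * z)
    b -= lr * np.mean(grad_fake)

# After training, the generator's offset b has drifted toward
# the real data's mean, as the generator learns to fool the
# discriminator.
print(b)
```

As training proceeds, the discriminator's feedback pulls the generated distribution toward the real one, which is the mechanism Vondrick describes: the generator improves precisely by learning to deceive the discriminator.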
From MIT News
Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA