One of the most important components of training machine-learning models is data. The amount of training data, how clean it is, its diversity, how well it reflects the real world—all can have a dramatic effect on the performance of a trained model. Hence, collecting new datasets and finding reliable sources of supervision have become imperative for advancing the state of the art in many computer vision and graphics tasks, which have become highly dependent on machine learning. However, collecting such data at scale remains a fundamental challenge.
In this article, we focus on an intriguing source of training data—online videos. The Web hosts a large volume of video content, spanning an enormous space of real-world visual and auditory signals. The existence of such abundant data suggests the following question: How can we imbue machines with visual knowledge by directly observing the world through raw video? There are a number of challenges faced in exploring this question.
No entries found