
Communications of the ACM

Research highlights

Technical Perspective: A Better Way to Learn Features


A typical machine learning program uses weighted combinations of features to discriminate between classes or to predict real-valued outcomes. The art of machine learning is in constructing the features, and a radically new method of creating features constitutes a major advance.

In the 1980s, the new method was backpropagation, which uses the chain rule to backpropagate error derivatives through a multilayer, feed-forward neural network and adjusts the weights between layers by following the gradient of the backpropagated error. This worked well for recognizing simple shapes, such as handwritten digits, especially in convolutional neural networks that use local feature detectors replicated across the image.5 For many tasks, however, it proved extremely difficult to optimize deep neural nets with many layers of non-linear features, and a huge number of labeled training cases was required for large neural networks to generalize well to test data.
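The backpropagation procedure described above can be sketched in a few lines. This is a minimal illustration, not code from the article: the layer sizes, learning rate, and toy regression target are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny dense network: input -> tanh hidden layer -> linear output.
X = rng.normal(size=(64, 3))
y = (X[:, 0] * X[:, 1]).reshape(-1, 1)       # a simple nonlinear target

W1 = rng.normal(scale=0.5, size=(3, 8))      # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1))      # hidden -> output weights
lr, losses = 0.1, []

for _ in range(500):
    h = np.tanh(X @ W1)                      # forward pass
    pred = h @ W2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Backward pass: the chain rule carries the error derivative
    # back through each layer in turn.
    dpred = 2 * err / len(X)
    dW2 = h.T @ dpred
    dh = dpred @ W2.T
    dW1 = X.T @ (dh * (1 - h ** 2))          # tanh'(a) = 1 - tanh(a)^2

    W2 -= lr * dW2                           # follow the gradient
    W1 -= lr * dW1

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The same two-step pattern (forward pass to compute activations, backward pass to compute weight gradients) scales to the deep convolutional networks the article discusses, which is where the optimization difficulties arise.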

In the 1990s, Support Vector Machines (SVMs)8 introduced a very different way of creating features: the user defines a kernel function that computes the similarity between two input vectors, then a judiciously chosen subset of the training examples is used to create "landmark" features that measure how similar a test case is to each training case. SVMs have a clever way of choosing which training cases to use as landmarks and deciding how to weight them. They work remarkably well on many machine learning tasks even though the selected features are non-adaptive.
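The "landmark" idea can be made concrete with a small sketch. This is not a full SVM solver: instead of selecting a sparse set of support vectors, it uses every training point as a landmark and fits the weights by regularized least squares, so only the kernel-feature construction matches the text. The data, kernel width, and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two Gaussian blobs with labels -1 and +1.
X = np.vstack([rng.normal(-1, 0.5, size=(30, 2)),
               rng.normal(+1, 0.5, size=(30, 2))])
y = np.array([-1.0] * 30 + [+1.0] * 30)

def rbf(a, b, gamma=1.0):
    """Kernel function: similarity of each row of a to each row of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Each training point acts as a landmark: a case is represented by its
# similarity to every landmark. (An SVM would keep only a weighted
# subset of these landmarks, the support vectors.)
F = rbf(X, X)                                  # samples x landmarks
w = np.linalg.solve(F + 1e-3 * np.eye(len(X)), y)

preds = np.sign(F @ w)
acc = float((preds == y).mean())
print(f"training accuracy: {acc:.2f}")
```

Note that the landmark features themselves never change during training, which is the sense in which the text calls them non-adaptive; only the weights on them are learned.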

The success of SVMs dampened the earlier enthusiasm for neural networks. More recently, however, it has been shown that multiple layers of feature detectors can be learned greedily, one layer at a time, by using unsupervised learning that does not require labeled data. The features in each layer are designed to model the statistical structure of the patterns of feature activations in the previous layer. After learning several layers of features this way without paying any attention to the final goal, many of the high-level features will be irrelevant for any particular task, but others will be highly relevant, because high-order correlations are the signature of the data's true underlying causes and the labels are more directly related to these causes than to the raw inputs. A subsequent stage of fine-tuning using backpropagation then yields neural networks that work much better than those trained by backpropagation alone and better than SVMs for important tasks such as object or speech recognition.1,2,4 The neural networks outperform SVMs because the limited amount of information in the labels is not being used to create multiple layers of features from scratch; it is only being used to adjust the class boundaries by slightly modifying the features.
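The greedy layer-by-layer scheme can be sketched as follows. The article's deep belief nets use restricted Boltzmann machines for each layer; as a simpler stand-in, this sketch trains a tied-weight autoencoder per layer, which captures the same recipe: each layer learns to model its input, and its activations then become the training data for the next layer. All sizes and rates here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 16))               # unlabeled training data

def train_autoencoder(data, n_hidden, lr=0.05, steps=300):
    """One-hidden-layer autoencoder with tied weights: a simple
    stand-in for the RBM-style unsupervised learner in the text."""
    W = rng.normal(scale=0.1, size=(data.shape[1], n_hidden))
    for _ in range(steps):
        h = np.tanh(data @ W)                # encode
        recon = h @ W.T                      # decode with the same weights
        err = recon - data
        # Gradient of the reconstruction error w.r.t. the tied weights:
        dW = data.T @ ((err @ W) * (1 - h ** 2)) + err.T @ h
        W -= lr * dW / len(data)
    return W

# Greedy stacking: no labels are used at any stage.
W1 = train_autoencoder(X, 8)                 # layer 1 models the raw data
H1 = np.tanh(X @ W1)
W2 = train_autoencoder(H1, 4)                # layer 2 models layer 1's codes
H2 = np.tanh(H1 @ W2)
print(H2.shape)
```

After stacking, the weights W1 and W2 would initialize a feed-forward network that is fine-tuned with backpropagation, which is the combination the text credits for beating backpropagation alone.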

The following paper by Lee et al. is the first impressive demonstration that greedy layer-by-layer feature creation can be applied to large images. To make this work, they had to use replicated local feature detectors, and they had to solve a tricky technical problem in probabilistic modeling of convolutional neural networks. These networks summarize the outputs of nearby copies of the same feature detector by simply reporting their maximum value. For unsupervised learning to work properly, this operation must be given a sensible probabilistic interpretation. The authors solve this problem by using a soft, probabilistic version of the maximum function, and they show that this allows them to learn an impressive feature hierarchy in which the first layer represents oriented edge filters, the second layer represents object parts, and the third represents larger parts or whole objects. They also show their model can combine bottom-up and top-down inference, using more global context to select appropriately between local features.
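The idea of a soft, probabilistic maximum can be illustrated in miniature: the detector units in a pooling block, together with an extra "all off" state, compete in a softmax, so the block reports a distribution that concentrates on the strongest detector rather than a hard maximum. This is a sketch of the idea only, not the authors' exact formulation.

```python
import numpy as np

def prob_max_pool(activations):
    """Soft max-pooling over one block: units plus an 'off' state
    (logit 0) compete in a numerically stable softmax."""
    z = np.concatenate([activations, [0.0]])
    e = np.exp(z - z.max())
    p = e / e.sum()
    return p[:-1], p[-1]        # per-unit "on" probabilities, "off" prob

on, off = prob_max_pool(np.array([0.5, 3.0, -1.0]))
print(on.round(3), round(float(off), 3))
```

Because the result is a proper probability distribution, it supports the bottom-up and top-down probabilistic inference the authors need, while still behaving like a maximum when one detector dominates its block.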

The learning algorithm used by the authors is designed to produce a composite generative model called a "deep belief net,"3 but they perform top-down inference as if it were a different generative model called a "deep Boltzmann machine." They achieve quite good results at image completion, and even better results might be obtained if they fine-tuned their generative model as a deep Boltzmann machine using a recent algorithm developed by Salakhutdinov.7

Machine learning still has some way to go before it can efficiently create the complicated features like SIFT6 used in many leading systems for computer vision. However, this paper should seriously worry those computer vision researchers who still believe that hand-engineered features have a long-term future. Further improvements from unsupervised learning also seem likely: biology tells us that applying high-resolution filters across an entire image is not the best way to use a neural net, even if it has billions of neurons.

References


1. Bengio, Y., Lamblin, P., Popovici, D. and Larochelle, H. Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems 19. B. Schoelkopf, J. Platt, and T. Hoffman, Eds. MIT Press, Cambridge, MA, 2007.

2. Dahl, G., Mohamed, A. and Hinton, G.E. Acoustic modeling using deep belief networks. IEEE Trans. on Audio, Speech, and Language Processing 19, 8 (2011).

3. Hinton, G.E., Osindero, S. and Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Computation 18 (2006).

4. Hinton, G.E. and Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 313 (2006), 504–507.

5. LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11 (1998), 2278–2324.

6. Lowe, D.G. Object recognition from local scale-invariant features. In Proc. International Conference on Computer Vision, 1999.

7. Salakhutdinov, R. Learning Deep Generative Models. PhD thesis, University of Toronto, 2009.

8. Vapnik, V.N. The Nature of Statistical Learning Theory. Springer, New York, NY, 2000.



Geoffrey E. Hinton is a professor of computer science at the University of Toronto, Canada.

©2011 ACM  0001-0782/11/1000  $10.00

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2011 ACM, Inc.

