Computational Learning Techniques Becoming Mainstream

In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. — Usually a support vector machine works in a multi-dimensional hyperplane, but here it separates two classes of points in two dimensions: H1 (green) does not separate the classes at all, while H2 (blue) does but only by a small margin (small black lines pe

Machine learning today is one of the fastest ways to cut through the enormous and growing amounts of big data that would take years of study by a human analyst to make sense. It is no wonder, then, that one of the most-cited ACM papers — and the 10th most-cited of any computer science paper (according to CiteSeer) — is addressing machine learning by quickly dividing seemingly random data points into coherent multidimensional clusters.

That paper, LIBSVM: a library for support vector machines, by Chih-Chung Chang and Chih-Jen Lin of the department of computer science of National Taiwan University, appeared in the May 2011 issue of ACM Transactions on Intelligent Systems and Technology; it has since been dounloaded more than 14,000 times, according to the ACM Digital Library.

For the past 15 years, Lin has devoted himself to developing this highly regarded software library. The library has broken through the labyrinth of machine-learning debacles by allowing users without extensive experience to easily apply support vector machines (SVMs) to tasks of predictive analytics.

Lin has helped to make SVMs one of the most widely used data classification methods since they were invented by Vladimir Vapnik and his colleagues at Bell Labs in the early 1990s. The original idea was to construct a set of hyperplanes in a high-dimensional space that could then be used for effective classification. However, this setting (often referred to as kernel tricks) results in a difficult numerical optimization problem. Lin’s group was among the earliest to develop efficient and effective algorithms to solve this optimization problem; he proved the theoretical convergence of some popular optimization algorithms currently being used for SVMs.

In its original form, an SVM can handle only data in two classes (for example, positive and negative), rather than multi-class scenarios such as handwritten digit recognition. Nor could an SVM give probability outputs to show the possibility a data instance is in a specific class. Lin’s group proposed extensions that allowed SVM to become a more complete classification technique; all these functionalities were included and remain properly maintained in the LIBSVM package.

"The support vector machine is not really a machine, but a classifier," Lin said. "SVMs narrow down important data that in turn create the model used for future prediction. This subset of data instances retained in the model are called support vectors and that is from which the name of the method comes."

Aside from its use in SVMs, numerical optimization is applied in many successful classification techniques. For example, neural network-based "deep learning" creates models with optimization methods–mainly back-propagation of training errors.

A decade ago, optimization and machine learning were considered completely separate areas, but optimization has become just as ubiquitous in the development of many new machine-learning methods, according to Lin. As data becomes bigger and more heterogeneous, he expects more advanced optimization techniques, including parallel ones with distributed execution, will be needed for the next generation of machine learning.

Unfortunately, the learning curve of machine-learning techniques is still relatively steep. For example, the practical use of SVMs involves tunable parameters, but users often have trouble selecting suitable values, which prompts them to seek ways of automating the machine-learning process. Some simple tools for automatic parameter selection have been provided in LIBSVM, which significantly simplify the process, contributing to its popularity among users, Lin said.

For applications such as speech recognition and text mining, "developers have to deal with issues such as data pre-processing, feature selection, and model selection," Lin said. "We hope to automate many of these issues so that people who are not well-versed in machine-learning methods can easily use our machine-learning tools to choose the best features, the most suitable methods, as well as precisely tuning the parameters. Democratizing machine learning is our ultimate goal."

Lin said he has seen has some promising preliminary results on automatic machine learning, adding that some forward-looking companies already are creating software for the same purpose. Integration with web-based interfaces and cloud data storage are also being actively developed.

He anticipates such systems will encourage domain experts to adopt machine learning, rather than it being used solely by computer scientists.

R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.