During the last Super Bowl, Emotient, a San Diego, CA-based company, hosted an unusual party. The company recruited 30 volunteers to a local bar to watch the game, eat, and drink. As the annual spectacle progressed on two televisions, a camera attached to each flat screen monitored the viewers. Behind the scenes, the Emotient Analytics system identified and tracked each face within the camera’s view, then determined the changing emotional state of each individual over time. Whether they were amused, ambivalent, or surprised, the system matched those reactions to what was happening on screen, determining which commercials seemed to bore the viewers and which ones they liked enough to share. “We were able to predict which advertisements were likely to go viral based on facial behavior,” says lead scientist Marian Bartlett.
Both the Emotient technology and a similar system from Waltham, MA-based Affectiva are products of the burgeoning field of affective computing, in which researchers are developing systems that estimate the internal emotional state of an individual based on facial expressions, vocal inflections, gestures, or other physiological cues. Affectiva and Emotient are catering to advertising and market research companies, but affective computing stands to impact a number of areas. The technique could lead to online learning systems that notice when a student is growing frustrated with a problem, healthcare applications that measure how a depressed patient is responding to a new medication, or in-home assistance robots that closely monitor the emotional state of the elderly.
The field has been around for two decades; Affectiva co-founder and Massachusetts Institute of Technology professor Rosalind Picard coined the term “affective computing” in the mid-1990s. Yet experts say a combination of improved data, increased processing power, and more robust machine learning techniques has led to significant recent advances. “Affective computing has really exploded in the last five years,” says signal processing expert Shrikanth Narayanan of the University of Southern California.
Focusing on Faces
When people communicate with each other, they study each other’s vocal intonations, gestures, facial expressions, and posture, each of which offers some information about the other person’s internal state—whether they are engaged, distracted, or annoyed. Sometimes those signals contradict one another; the fact that a child is smiling, for example, might seem to indicate she is happy. Yet Narayanan’s work has shown how computers trained to pick up clues in the pitch, intonation, or rhythm of a child’s speech can uncover hidden emotions lurking behind that smile.
This sort of detection is natural for humans; we track multiple cues to guess what someone is really thinking or feeling. “We’re implicitly integrating all these pieces of information,” Narayanan says. “We’re not picking one or the other.”
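To make the speech side concrete, the sketch below shows the kind of prosodic statistics (pitch, loudness, and pauses) an emotion-from-speech system might extract before handing them to a classifier. It is a minimal illustration assuming the librosa library and a placeholder audio file, not Narayanan’s actual pipeline.

```python
# Sketch: extracting simple prosodic cues (pitch, loudness, speaking rhythm)
# of the kind emotion-from-speech systems feed to a classifier.
# Not Narayanan's actual pipeline; "child_speech.wav" is a placeholder file.
import numpy as np
import librosa

y, sr = librosa.load("child_speech.wav", sr=16000)

# Pitch (fundamental frequency) track, frame by frame.
f0, voiced, _ = librosa.pyin(y, fmin=80, fmax=500, sr=sr)

# Loudness proxy: root-mean-square energy per frame.
rms = librosa.feature.rms(y=y)[0]

features = {
    "pitch_mean_hz": float(np.nanmean(f0)),
    "pitch_var": float(np.nanvar(f0)),
    "energy_mean": float(rms.mean()),
    "voiced_ratio": float(np.mean(voiced)),  # speech vs. pauses, a crude rhythm cue
}
print(features)  # summary statistics a classifier could use
```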
At this point, though, the most popular affective computing systems focus on visual input, and on the face in particular. Until recently, simply recognizing and tracking faces in a crowded, dimly lit environment such as a Super Bowl party was a serious challenge. Early systems could be confused by something as pedestrian as a beard. Today’s software can delineate the contours of the lips, eyes, nose, and more, even in poor lighting conditions, according to Akshay Asthana, a facial recognition researcher at Seeing Machines in Canberra, Australia.
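As an illustration of what such face tracking involves, the following sketch detects faces in a frame and marks lip, eye, and nose landmarks using dlib’s freely available 68-point model. It is not the Seeing Machines, Emotient, or Affectiva software; the image path is a placeholder and the .dat model file is dlib’s standard download.

```python
# Sketch: locating faces in a frame and delineating lip/eye/nose contours
# with dlib's off-the-shelf 68-point landmark model. Illustrative only,
# not the commercial software discussed in the article.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("party_frame.jpg")   # placeholder image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Find every face in the (possibly crowded) scene.
for face in detector(gray, 1):
    shape = predictor(gray, face)
    # Points 36-47 outline the eyes, 27-35 the nose, 48-67 the lips.
    for i in range(68):
        p = shape.part(i)
        cv2.circle(frame, (p.x, p.y), 1, (0, 255, 0), -1)

cv2.imwrite("landmarks.jpg", frame)
```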
Emotient and Affectiva benefit from this technology, since it allows them to pick out multiple faces in a single crowded scene, but before they focus their systems on those targets, they have to train them to recognize the different expressions and emotions. Both technologies are based on psychologist Paul Ekman’s facial action coding system, or FACS, which details the involuntary muscle movements we generate in response to different stimuli, and how these subtle cues are linked to different emotional states.
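FACS describes facial behavior as numbered action units (AUs), and systems built on it typically map combinations of detected AUs to prototypical emotions. The sketch below uses commonly cited EMFACS-style combinations purely for illustration; the detectors and mappings inside Emotient’s and Affectiva’s products are proprietary.

```python
# Sketch: mapping detected FACS action units (AUs) to prototypical emotions,
# using commonly cited EMFACS-style combinations. Illustrative only; the
# mappings used by commercial systems are proprietary.
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},     # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
    "anger":     {4, 5, 7, 23},  # brow lowerer + lid raiser/tightener + lip tightener
}

def infer_emotion(detected_aus):
    """Return the emotion whose prototype AUs are best covered by the detection."""
    detected = set(detected_aus)
    best, best_score = "neutral", 0.0
    for emotion, prototype in EMOTION_PROTOTYPES.items():
        score = len(prototype & detected) / len(prototype)
        if score > best_score:
            best, best_score = emotion, score
    return best, best_score

print(infer_emotion([6, 12, 25]))   # -> ('happiness', 1.0)
```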
Generating Quality Data
Emotient employed trained consultants to produce and analyze hundreds of thousands of pictures of people spontaneously displaying emotions. Then the company trained its algorithms on even more data. To teach the system to recognize joy, for example, it fed the algorithms 100,000 images of people looking or feeling joyful, then another million pictures of individuals who were not quite so happy.
The Emotient Analytics technology segments each frame into a grid of 2,304 pixels. Each pixel is then analyzed in terms of its brightness, which can range from zero up to 255. Next, the system searches across that array of pixel values for higher-level features such as edges. This provides the system with what Bartlett calls the dense texture features of the image: a precise measure of the patterns of light and darkness across the entire face. “We end up with a very high-dimensional description,” she says.
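A rough approximation of that step is sketched below. It assumes the 2,304-pixel grid is a square 48x48 crop, and it substitutes simple brightness and gradient features for Emotient’s proprietary dense texture features.

```python
# Sketch: converting a face crop into a high-dimensional "texture" description.
# Assumes the 2,304-pixel grid is a square 48x48 crop; the brightness/gradient
# features below stand in for Emotient's proprietary dense texture features.
import cv2
import numpy as np

def dense_texture_features(face_bgr):
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    patch = cv2.resize(gray, (48, 48)).astype(np.float32)  # 2,304 brightness values, 0-255

    # Edge-like structure: horizontal and vertical brightness gradients.
    gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)

    # Concatenate raw brightness and edge magnitude into one long vector.
    return np.concatenate([patch.ravel() / 255.0, magnitude.ravel() / magnitude.max()])

face = cv2.imread("face_crop.jpg")   # placeholder image path
vec = dense_texture_features(face)
print(vec.shape)                     # (4608,) -- a high-dimensional description
```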
Once the system breaks down each image, it compares the different bodies of data—joy and not-joy. Then it uses statistical pattern recognition to figure out what joy actually looks like in the raw data. Emotient has used this general approach for years, but more recent advances in its capabilities can be traced in part to its switch to deep learning methods. “Deep learning is much more powerful at being able to pick up patterns in the data,” Bartlett says.
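The statistical pattern recognition step can be pictured as follows: a linear support vector machine, trained here on synthetic feature vectors with scikit-learn, stands in for the joy versus not-joy discriminator (Emotient’s current classifier, as noted, relies on deep learning).

```python
# Sketch: the joy / not-joy discrimination step as a statistical pattern
# recognizer. A linear SVM on synthetic feature vectors stands in for
# Emotient's proprietary (now deep-learning-based) classifier.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, dim = 2000, 4608                      # pretend feature vectors like the ones above
X = rng.normal(size=(n, dim)).astype(np.float32)
y = rng.integers(0, 2, size=n)           # 1 = joy, 0 = not-joy (synthetic labels)
X[y == 1, :50] += 0.5                    # give "joy" a faint, learnable signature

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LinearSVC(C=0.01, max_iter=5000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```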
Deep Learning
In the past, scientists had to instruct a system to look for specific features. Deep learning analyzes the raw data without such instructions, using a hierarchical approach. First the system searches for patterns at the pixel level. Then it looks across groups of pixels and picks out edges, corners, gradients, and so forth. Once these and other layers are scrutinized, and the emergent patterns are linked to specific emotions, the chance of miscategorizing or failing to recognize an expression drops significantly.
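The hierarchy can be sketched as a small convolutional network. The PyTorch model below is illustrative only; the layer sizes and seven expression classes are chosen for the example rather than taken from any commercial system.

```python
# Sketch: the hierarchical idea behind deep learning for expressions, in PyTorch.
# Early convolutional layers pick up edges and corners in raw pixels, deeper
# layers respond to larger patterns, and the final layer maps to expression classes.
import torch
import torch.nn as nn

class ExpressionNet(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # low-level: edges, gradients
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 48x48 -> 24x24
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level: corners, curves
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 24x24 -> 12x12
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # higher-level: facial parts
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 12x12 -> 6x6
        )
        self.classifier = nn.Linear(64 * 6 * 6, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

net = ExpressionNet()
scores = net(torch.randn(1, 1, 48, 48))   # one grayscale 48x48 face crop
print(scores.shape)                       # torch.Size([1, 7])
```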
Affectiva’s Daniel McDuff likens deep learning to the way an infant identifies connections in the real world. When babies start to see cars on the road or on television, they do not necessarily know each of these is called an automobile, but their brains begin to associate the unique combinations of edges and shapes with each other. They start to link all of these images of cars together, even though some are two-dimensional representations on a screen, because they identify shared features at different levels. “That’s what our algorithms are doing,” says McDuff. “They’re looking at the pixel information, and loads and loads of examples, and they’re figuring out what types of pixel structures coincide with different facial expressions.”
At Emotient, Bartlett says deep learning has increased the system’s range. Originally, if a person turned more than 15 degrees to the side, or tilted or rolled their head back by the same amount, the system could not accurately estimate that person’s facial actions. With deep learning, however, the system can capture expressions even when the individual’s face is beyond 30 degrees off-center.
The deep learning approach also scales more effectively than previous methods. Like Emotient, Affectiva has its own dataset of hundreds of thousands of hand-coded images, and this is a huge advantage with deep learning. “As you increase the amount of training data or the number of images you give the algorithm, the performance just continues to improve,” McDuff says.
Applications and Multimodal Approaches
The power of this new measurement tool for market research and advertising has not been lost on businesses; both Emotient and Affectiva have signed a number of major clients. Yet affective computing experts inside and outside the commercial sphere are also pressing forward with many other applications. Bartlett, who is also a computer scientist at the University of California, San Diego, has been working with doctors and researchers to improve pain management for children. “There’s a significant problem of undermanaged pain in pediatrics,” she explains. Children do not always accurately express how they feel, but a facial analytics system could help find the truth. “The idea is to be able to have a system that monitors pain levels the same way you keep track of heart rate and blood pressure.”
The number of applications will also expand as researchers incorporate more physiological input, going beyond facial expressions, gestures, voice, and other more obvious cues. At the University of Notre Dame, computer scientist Sydney D’Mello has conducted studies that gauge a person’s emotional state based on the frequency and intensity of their keystrokes during an online learning session. McDuff has shown it is possible to gain insights into someone’s emotions based on who they text, email, or call during the day, and how frequently they do so. Other scientists, such as Robert Jenke at the University of Munich, have conducted studies trying to pinpoint a subject’s true emotions using electroencephalogram (EEG) electrodes.
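As a hypothetical illustration of the keystroke idea, the sketch below computes typing rate, pauses, and key-hold durations from a made-up event log; it is not D’Mello’s actual feature set.

```python
# Sketch: keystroke statistics of the kind an affect-aware tutoring system
# might compute from timestamped key events -- typing rate, pauses, and
# key-hold duration as a rough proxy for intensity. The event log is made up.
import numpy as np

# (key, press_time_s, release_time_s) -- hypothetical logged events
events = [("t", 0.00, 0.09), ("h", 0.21, 0.28), ("e", 0.36, 0.47),
          (" ", 1.90, 1.98), ("a", 2.10, 2.24)]

press = np.array([e[1] for e in events])
release = np.array([e[2] for e in events])

hold_times = release - press             # how long each key is held down
gaps = np.diff(press)                    # pauses between keystrokes

features = {
    "keys_per_second": len(events) / (release[-1] - press[0]),
    "mean_hold_s": float(hold_times.mean()),
    "mean_gap_s": float(gaps.mean()),
    "longest_pause_s": float(gaps.max()),
}
print(features)
```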
Eventually, scientists hope the technology will incorporate multiple sensor input streams—face, voice, gesture—to create a more accurate picture of a person’s internal emotional state. This added data would be especially valuable when evaluating people from different parts of the world. In some cultures, for example, social norms dictate that people mask their emotions when in groups. Estimating the emotional state of such individuals based on visual feedback alone could be misleading. For now, using multiple input streams remains a major challenge, but humans stand as proof of the potential benefits. “Computers are still miles away from what a human can do in terms of intuition and understanding emotion,” says McDuff. “Computers still just have the understanding of babies.”
Further Reading
Picard, R.W.
Affective Computing, The MIT Press, 2000.
Calvo, R.A., D’Mello, S.K., Gratch, J., and Kappas, A. (Eds.)
The Oxford Handbook of Affective Computing, Oxford University Press, 2015.
Bartlett, M., Littlewort, G., Frank, M., and Lee, K.
Automated Detection of Deceptive Facial Expressions of Pain, Current Biology, 2014.
Narayanan, S., et al.
Analysis of Emotion Recognition Using Facial Expressions, Speech and Multimodal Information, Proceedings of the International Conference on Multimodal Interfaces, 2004.
Emotient https://vimeo.com/67741811