Computing Applications Research highlights

Emotion Recognition Using Wireless Signals

By Mingmin Zhao, Fadel Adib, and Dina Katabi

Posted Sep 1 2018

Abstract

This paper demonstrates a new technology that can infer a person’s emotions from RF signals reflected off his body. EQ-Radio transmits an RF signal and analyzes its reflections off a person’s body to recognize his emotional state (happy, sad, etc.). The key enabler underlying EQ-Radio is a new algorithm for extracting the individual heartbeats from the wireless signal at an accuracy comparable to on-body ECG monitors. The resulting beats are then used to compute emotion-dependent features which feed a machine-learning emotion classifier. We describe the design and implementation of EQ-Radio, and demonstrate through a user study that its emotion recognition accuracy is on par with state-of-the-art emotion recognition systems that require a person to be hooked to an ECG monitor.

1. Introduction

Emotion recognition is an emerging field that has attracted much interest from both the industry and the research community.^{13, 18, 22, 35, 40} It is motivated by a simple vision: Can we build machines that sense our emotions? If we can, such machines would enable smart homes that react to our moods and adjust the lighting or music accordingly. Movie makers would have better tools to evaluate user experience. Advertisers would learn customer reaction immediately. Computers would automatically detect symptoms of depression, anxiety, and bipolar disorder, allowing early response to such conditions. More broadly, machines would no longer be limited to explicit commands, and could interact with people in a manner more similar to how we interact with each other.

Existing approaches for inferring a person’s emotions either rely on audiovisual cues, such as images and audio clips,^{22, 42, 48} or require the person to wear physiological sensors like an electrocardiogram (ECG) monitor.^{7, 21, 26, 36} Both approaches have their limitations. Audiovisual techniques leverage the outward expression of emotions, but cannot measure inner feelings.^{12, 16, 36} For example, a person may be happy even if she is not smiling. Also, people differ widely in how expressive they are in showing their inner emotions, which further complicates this problem.²⁵ The second approach recognizes emotions by monitoring the physiological signals that change with our emotional state. Intuitively, a person’s heart rate increases with anger or excitement; there are also more complex changes that appear as variability in the duration of a heartbeat.^{12, 39} This approach uses on-body sensors (for example, ECG monitors) to measure these signals and correlate their changes with joy, anger, etc. This approach is more correlated with the person’s inner feelings since it taps into the interaction between the autonomic nervous system and the heart rhythm.^{27, 39} However, the use of body sensors is cumbersome and can interfere with user activity and emotions, making this approach unsuitable for regular usage.

In this paper, we introduce a new method for emotion recognition that achieves the best of both worlds – that is, it directly measures the interaction of emotions and physiological signals, but does not require the user to carry sensors on his body. Our design uses Radio Frequency (RF) signals to sense emotions. Specifically, RF signals reflect off the human body and get modulated with bodily movements. Recent research has shown that such RF reflections can be used to measure a person’s breathing and average heart rate without body contact.^{6, 15, 19, 23, 33} However, the periodicity of the heart signal (i.e., its running average) is not sufficient for emotion recognition. To recognize emotions, we need to measure the minute variations in each individual beat length.^{12, 29, 39}

Yet, extracting individual heartbeats from RF signals incurs multiple challenges, which can be seen in Figure 1. First, RF signals reflected off a person’s body are modulated by both breathing and heartbeats. The impact of breathing is typically orders of magnitude larger than that of heartbeats, and tends to mask the individual beats (see the top graph in Figure 1); to separate breathing from heart rate, past systems operate over multiple seconds (e.g., 30sec in Ref. Adib et al.⁶) in the frequency domain, forgoing the ability to measure the beat-to-beat variability. Second, heartbeats in the RF signal lack the sharp peaks which characterize the ECG signal, making it harder to accurately identify beat boundaries. Third, the difference in Inter-Beat-Intervals (IBI) is only a few tens of milliseconds. Thus, individual beats have to be segmented to within a few milliseconds. Obtaining such accuracy is particularly difficult in the absence of sharp features that identify the beginning or end of a heartbeat. Our goal is to address these challenges to enable RF-based emotion recognition.

Figure 1. Comparison of RF signal with ECG signal. The top graph plots the RF signal reflected off a person’s body. The envelope of the RF signal follows the inhale-exhale motion. The small dents in the signal are due to heartbeats. The bottom graph plots the ECG of the subject measured concurrently with the RF signal. Individual beats are marked by grey and white shades. The numbers report the beat-length in seconds. Note the small variations in consecutive beat lengths.

We present EQ-Radio, a wireless system that infers people’s emotions from the radio signals that bounce off their bodies. EQ-Radio’s key enabler is a new algorithm for extracting individual heartbeats and their differences from RF signals. Our algorithm first mitigates the impact of breathing. While chest displacement due to the inhale-exhale process is orders of magnitude larger than minute vibrations due to heartbeats, the acceleration of breathing is smaller than that of heartbeats. This is because breathing is slow and steady while a heartbeat involves rapid contraction of the muscles (which happens at localized instants in time). Hence, EQ-Radio operates on the acceleration of RF signals to dampen the breathing signal and emphasize the heartbeats.

Next, EQ-Radio needs to segment the RF reflection into individual heartbeats. In contrast to the ECG signal which has a known expected shape (see the bottom graph in Figure 1), the shape of a heartbeat in RF reflections is unknown and varies depending on the person’s body and exact posture with respect to the device. Thus, we cannot simply look for a known shape as we segment the signal; we need to learn the beat shape as we perform the segmentation. We formulate the problem as a joint optimization, where we iterate between two sub-problems: the first sub-problem learns a template of the heartbeat given a particular segmentation, while the second finds the segmentation that maximizes resemblance to the learned template. We keep iterating between the two sub-problems until we converge to the best beat template and the optimal segmentation that maximizes resemblance to the template. Finally, we note that our segmentation takes into account that beats can shrink and expand and hence vary in beat length. Thus, the algorithm finds the beat segmentation that maximizes the similarity in the morphology of a heartbeat signal across consecutive beats while allowing for flexible warping (shrinking or expansion) of the beat signal.

We have built EQ-Radio into a full-fledged emotion recognition system. EQ-Radio’s system architecture has three components. The first component is a Frequency Modulated Carrier Waves (FMCW) radio that transmits RF signals and receives their reflections. The radio leverages the approach in Ref. Adib et al.⁶ to zoom in on human reflections and ignore reflections from other objects in the scene. Next, the resulting RF signal is passed to the beat extraction algorithm described above. The algorithm returns a series of signal segments that correspond to the individual heartbeats. Finally, the heartbeats – along with the captured breathing patterns from RF reflections – are passed to an emotion classification sub-system as if they were extracted from an ECG monitor. The emotion classification sub-system computes heartbeat-based and respiration-based features recommended in the literature^{12, 26, 36} and uses a Support Vector Machine (SVM) classifier to differentiate various emotional states.

We evaluate EQ-Radio by conducting user experiments with 30 subjects. We design our experiments in accordance with the literature in the field.^{12, 26, 36} Specifically, the subject is asked to evoke a particular emotion by recalling a corresponding memory (e.g., sad or happy memories). She/he may use music or photos to help evoke the appropriate memory. In each experiment, the subject reports the emotion she/he felt, and the period during which she/he felt that emotion. During the experiment, the subject is monitored using both EQ-Radio and a commercial ECG monitor. Further, a video of the subject is recorded and then passed to the Microsoft image-based emotion recognition system.¹

Our experiments show that EQ-Radio’s emotion recognition is on par with state-of-the-art ECG-based systems, which require on-body sensors.²¹ Specifically, if the system is trained on each subject separately, the accuracy of emotion classification is 87% in EQ-Radio and 88.2% in the ECG-based system.^a

Our results also show that EQ-Radio’s performance is due to its ability to accurately extract heartbeats from RF signals. Specifically, even errors of 40–50ms in estimating heartbeat intervals would reduce the emotion recognition accuracy to 44%. In contrast, our algorithm achieves an average error in IBI of 3.2ms, which is about 0.4% of the average beat length.

2. Background and Related Work

2.1. Emotion recognition

Existing approaches for extracting emotion-related signals fall under two categories: audiovisual techniques and physiological techniques. Audiovisual techniques rely on facial expressions, speech, and gestures.^{17, 48} The advantage of these approaches is that they do not require users to wear any sensors on their bodies. However, because they rely on outwardly expressed states, they tend to miss subtle emotions and can be easily controlled or suppressed.²⁶ Moreover, vision-based techniques require the user to face a camera in order for them to operate correctly. On the other hand, physiological measurements, such as ECG and EEG signals, are more robust because they are controlled by involuntary activations of the Autonomic Nervous System (ANS).¹⁰ However, existing sensors that can extract these signals require physical contact with a person’s body, and hence interfere with the user experience and affect her emotional state. In contrast, EQ-Radio can capture physiological signals without requiring the user to wear any sensors by relying purely on wireless signals reflected off her/his body.

2.2. RF-based sensing

RF signals reflect off the human body and are modulated by body motion. Past work leverages this phenomenon to sense human motion: it transmits an RF signal and analyzes its reflections to track user locations,⁵ gestures,^{4, 8, 38, 43} activities,⁴⁶ and vital signs.^{6, 15} Our work is closest to prior art that uses RF signals to extract a person’s breathing rate and average heart rate.^{6, 15, 19, 23, 33} In contrast to this past work, which recovers the average period of a heartbeat (which is of the order of a second), emotion recognition requires extracting the individual heartbeats and measuring small variations in the beat-to-beat intervals with millisecond-scale accuracy. Unfortunately, prior research that aims to segment RF reflections into individual beats either cannot achieve sufficient accuracy for emotion recognition^{11, 20, 31} or requires the monitored subjects to hold their breath.⁴¹ In particular, past work that does not require users to hold their breath has an average error of 30–50ms,^{11, 20, 31} whereas EQ-Radio achieves an average error of 3.2ms.

3. EQ-Radio Overview

EQ-Radio is an emotion recognition system that relies purely on wireless signals. It operates by transmitting an RF signal and capturing its reflections off a person’s body. It then analyzes these reflections to infer the person’s emotional state. It classifies the person’s emotional state according to the known arousal-valence model into one of four basic emotions^{26, 30}: anger, sadness, joy, and pleasure (i.e., contentment).

EQ-Radio’s system architecture consists of three components that operate in a pipelined manner, as shown in Figure 2:

Figure 2. EQ-Radio Architecture. EQ-Radio has three components: a radio for capturing RF reflections (Section 4), a heartbeat extraction algorithm (Section 5), and a classification subsystem that maps the learned physiological signals to emotional states (Section 6).

A radio sensor that transmits RF signals and captures their reflections off a person’s body.
A beat extraction algorithm, which takes the captured reflections as input and returns a series of signal segments that correspond to the person’s individual heartbeats.
An emotion-classification subsystem, which computes emotion-relevant features from the captured physiological signals – that is, the person’s breathing pattern and heartbeats – and uses these features to recognize the person’s emotional state.

4. Capturing the RF Signal

EQ-Radio operates on RF reflections off the human body. To capture such reflections, EQ-Radio uses a radar technique called FMCW.⁵ There is a significant literature on FMCW radios and their use for obtaining an RF signal that is modulated by breathing and heartbeats.^{6, 9, 37} We refer the reader to Ref. Adib et al.⁶ for a detailed description of such methods, and summarize below the basic information relevant to this paper.

The radio transmits a low power signal and measures its reflection time. It separates RF reflections from different objects/bodies into buckets based on their reflection time. It then eliminates reflections from static objects, which do not change across time, and zooms in on human reflections. It focuses on time periods when the person is quasi-static. It then looks at the phase of the RF wave, which is related to the traveled distance as follows⁴⁵:

$$\phi(t) = 2\pi \frac{d(t)}{\lambda}$$

where ϕ(t) is the phase of the signal, λ is the wavelength, d(t) is the traveled distance, and t is the time variable. The variations in the phase correspond to the compound displacement caused by chest expansion and contraction due to breathing, and body vibration due to heartbeats.
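As a small worked illustration of how phase maps to motion, the sketch below inverts the standard relation ϕ = 2π·d/λ to turn a phase change into a change in traveled distance. The relation's exact form, the function name, and the example wavelength and phase values are assumptions made for illustration; they are not parameters reported in the text.

```python
import math


def phase_to_distance(phase_rad, wavelength_m):
    """Invert phi = 2*pi*d/lambda to get the traveled distance d in meters."""
    return phase_rad * wavelength_m / (2.0 * math.pi)


# Hypothetical numbers: a 0.05 m wavelength and a 0.1 rad phase change.
delta_d = phase_to_distance(0.1, 0.05)
print(f"distance change: {delta_d * 1000:.3f} mm")
```

With a wavelength of a few centimeters, a phase change of a fraction of a radian corresponds to sub-millimeter motion, which is the scale of the heartbeat-induced vibrations discussed here.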

The phase of the RF signal is illustrated in the top graph in Figure 1. The envelope shows the chest displacement due to the inhale-exhale process. The small dents are due to minute skin vibrations associated with blood pulsing. EQ-Radio operates on this phase signal.

5. Beat Extraction Algorithm

A person’s emotions are correlated with small variations in her/his heartbeat intervals; hence, to recognize emotions, EQ-Radio needs to extract these intervals from the RF phase signal described above.

The main challenge in extracting heartbeat intervals is that the morphology of heartbeats in the reflected RF signals is unknown. Said differently, EQ-Radio does not know what these beats look like in the reflected RF signals. Specifically, these beats result in distance variations in the reflected signals, but the measured displacement depends on numerous factors including the person’s body and her exact posture with respect to EQ-Radio’s antennas. This is in contrast to ECG signals where the morphology of heartbeats has a known expected shape, and simple peak detection algorithms can extract the beat-to-beat intervals. However, because we do not know the morphology of these heartbeats in RF a priori, we cannot determine when a heartbeat starts and when it ends, and hence we cannot obtain the intervals of each beat. In essence, this becomes a chicken-and-egg problem: if we know the morphology of the heartbeat, that would help us in segmenting the signal; on the other hand, if we have a segmentation of the reflected signal, we can use it to recover the morphology of the human heartbeat.

This problem is exacerbated by two additional factors. First, the reflected signal is noisy; second, the chest displacement due to breathing is orders of magnitude higher than the heartbeat displacements. In other words, we are operating in a low Signal-to-Interference-and-Noise Ratio (SINR) regime, where “interference” results from the chest displacement due to breathing.

To address these challenges, EQ-Radio first processes the RF signal to mitigate interference from breathing. It then formulates and solves an optimization problem to recover the beat-to-beat intervals. The optimization neither assumes nor relies on perfect separation of the respiration effect. In what follows, we describe both of these steps.

5.1. Mitigating the impact of breathing^b

The goal of the preprocessing step is to dampen the breathing signal and improve the SINR of the heartbeat signal. Recall that the phase of the RF signal is proportional to the composite displacement due to the inhale-exhale process and the pulsing effect. Since displacements due to the inhale-exhale process are orders of magnitude larger than minute vibrations due to heartbeats, the RF phase signal is dominated by breathing. However, the acceleration of breathing is smaller than that of heartbeats. This is because breathing is usually slow and steady while a heartbeat involves rapid contraction of the muscles. Thus, we can dampen breathing and emphasize the heartbeats by operating on a signal proportional to acceleration as opposed to displacement.

By definition, acceleration is the second derivative of displacement. Thus, we can simply operate on the second derivative of the RF phase signal. Since we do not have an analytic expression of the RF signal, we have to use a numerical method to compute the second derivative. There are multiple such numerical methods which differ in their properties. We use the following second order differentiator because it is robust to noise²:

where f''_0 refers to the second derivative at a particular sample, f_i refers to the value of the time series i samples away, and h is the time interval between consecutive samples.
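To make the preprocessing step concrete, here is a minimal sketch of a noise-robust numerical second derivative. It assumes a symmetric 7-point smooth differentiator, f''_0 ≈ [(f_3 + f_-3) + 2(f_2 + f_-2) − (f_1 + f_-1) − 4f_0] / (16h²). This stencil is one known member of the family of smooth noise-robust differentiators and is an assumption here, not necessarily the exact formula the authors used. The function name is hypothetical.

```python
def second_derivative(samples, h):
    """Estimate the second derivative at each interior sample.

    Assumed 7-point stencil (an illustration, not taken from the text):
        f''_0 ~ [(f_3 + f_-3) + 2(f_2 + f_-2) - (f_1 + f_-1) - 4 f_0] / (16 h^2)
    The first and last three samples are dropped because the stencil needs
    three neighbors on each side.
    """
    s = samples
    out = []
    for k in range(3, len(s) - 3):
        num = ((s[k + 3] + s[k - 3])
               + 2 * (s[k + 2] + s[k - 2])
               - (s[k + 1] + s[k - 1])
               - 4 * s[k])
        out.append(num / (16 * h * h))
    return out


# Sanity check on f(t) = t^2, whose second derivative is 2 everywhere.
h = 0.1
quadratic = [(i * h) ** 2 for i in range(12)]
accel = second_derivative(quadratic, h)
```

Compared with the plain three-point difference, the wider stencil averages over more samples, which is what makes it less sensitive to sample-level noise in the phase signal.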

In Figure 3, we show an example RF phase signal with the corresponding acceleration signal. The figure shows that in the RF phase, breathing is more pronounced than heartbeats. In contrast, in the acceleration signal, there is a periodic pattern corresponding to each heartbeat cycle, and the breathing effect is negligible.

Figure 3. RF Signal and Estimated Acceleration. The figure shows the RF signal (top) and the acceleration of that signal (bottom). In the RF acceleration signal, the breathing motion is dampened and the heartbeat motion is emphasized. Note that while we can observe the periodicity of the heartbeat signal in the acceleration, delineating beat boundaries remains difficult because the signal is noisy and lacks sharp features.

5.2. Heartbeat segmentation

Next, EQ-Radio segments the acceleration signal into individual heartbeats. Recall that the key challenge is that we do not know the morphology of the heartbeat to bootstrap this segmentation process. To address this challenge, we formulate an optimization problem that jointly recovers the morphology of the heartbeats and the segmentation.

The intuition underlying this optimization is that successive human heartbeats should have the same morphology; hence, while they may stretch or compress due to different beat lengths, they should have the same overall shape. Below we formalize this intuition.

Let x = (x₁, x₂, . . ., x_n) denote a sequence of length n. A segmentation S = {s₁, s₂, . . .} of x is a partition of it into non-overlapping contiguous subsequences (segments), where each segment s_i consists of |s_i| points. The goal of our algorithm is to find the optimal segmentation S^* that minimizes the variance of segments, which can be formally stated as follows:

$$S^* = \arg\min_{S} \mathrm{Var}(S)$$

We can rewrite it as the following optimization problem:

$$\min_{S,\,\mu} \sum_{s_i \in S} \left\| s_i - \omega(\mu, |s_i|) \right\|^2 \quad \text{subject to} \quad b_{\min} \le |s_i| \le b_{\max}, \; s_i \in S$$

The term μ in the definition above represents the template for the beat shape (i.e., its morphology), and ω(μ, |s_i|) is linear warping of μ into length |s_i|. The terms b_min and b_max are constraints on the length of each heartbeat cycle. The optimization aims to find the optimal segmentation S and template μ that minimize the sum of the square differences between segments and template. This optimization problem is difficult as it involves both combinatorial optimization over S and numerical optimization over μ.

Solving the optimization problem^c. Instead of estimating the segmentation S and the template μ simultaneously, our algorithm alternates between updating one of them while holding the other fixed. During each iteration, our algorithm updates the segmentation given the current template, then updates the template given the new segmentation. For each of these two sub-problems, our algorithm obtains the global optimum in linear time.

Update segmentation S. In the l-th iteration, segmentation S^{l+1} is updated given template μ^l as follows:

$$S^{l+1} = \arg\min_{S} \sum_{s_i \in S} \left\| s_i - \omega(\mu^{l}, |s_i|) \right\|^2$$

Though the number of possible segmentations grows exponentially with the length of x, the above optimization problem can be solved efficiently using dynamic programming. The recursive relationship for the dynamic program is as follows: if D_t denotes the minimal cost of segmenting sequence x_{1:t}, then:

$$D_t = \min_{\tau \in \tau_{t,B}} \left\{ D_\tau + \left\| x_{\tau+1:t} - \omega(\mu, t - \tau) \right\|^2 \right\}$$

where τ_{t,B} specifies possible choices of τ based on segment length constraints. The time complexity of the dynamic program based on this recurrence (Equation 5) is O(n) and the global optimum is guaranteed.
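The segmentation update can be sketched as a short dynamic program. The sketch assumes the recurrence D_t = min over allowed τ of D_τ + ‖x_{τ+1:t} − ω(μ, t − τ)‖², with ω implemented as plain linear interpolation. As a simplification, it also requires the segments to cover the whole sequence exactly. The function names `warp` and `segment` and the toy data are hypothetical.

```python
def warp(signal, length):
    """Linearly resample `signal` to `length` points (linear warping)."""
    m = len(signal)
    if length == 1:
        return [signal[0]]
    out = []
    for j in range(length):
        pos = j * (m - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def segment(x, template, b_min, b_max):
    """Split x into segments of length b_min..b_max closest to the template.

    D[t] is the minimal cost of segmenting x[:t]; prev[t] remembers where the
    last segment of that optimal solution starts.
    """
    n = len(x)
    inf = float("inf")
    warped = {L: warp(template, L) for L in range(b_min, b_max + 1)}
    D = [inf] * (n + 1)
    prev = [-1] * (n + 1)
    D[0] = 0.0
    for t in range(1, n + 1):
        for L in range(b_min, b_max + 1):
            tau = t - L
            if tau < 0 or D[tau] == inf:
                continue
            w = warped[L]
            cost = D[tau] + sum((x[tau + j] - w[j]) ** 2 for j in range(L))
            if cost < D[t]:
                D[t], prev[t] = cost, tau
    if D[n] == inf:
        raise ValueError("no segmentation satisfies the length constraints")
    bounds, t = [], n
    while t > 0:
        bounds.append((prev[t], t))
        t = prev[t]
    return bounds[::-1], D[n]


# Toy signal: the same beat shape stretched to 4, 6, and 5 samples.
beat = [0.0, 1.0, 0.0, -1.0, 0.0]
x = warp(beat, 4) + warp(beat, 6) + warp(beat, 5)
bounds, cost = segment(x, beat, b_min=4, b_max=6)
```

Each position t considers only the handful of segment lengths allowed by the constraints, so the work per position is bounded by a constant and the total grows linearly with the signal length.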

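Update template μ. In the l-th iteration, template μ^{l+1} is updated given segmentation S^{l+1} as follows:

$$\mu^{l+1} = \arg\min_{\mu} \sum_{s_i \in S^{l+1}} \left\| s_i - \omega(\mu, |s_i|) \right\|^2 = \arg\min_{\mu} \sum_{s_i \in S^{l+1}} |s_i| \cdot \left\| \mu - \omega(s_i, m) \right\|^2$$

where m is the required length of the template. The above optimization problem is a weighted least squares problem with the following closed-form solution:

$$\mu^{l+1} = \frac{\sum_{s_i \in S^{l+1}} |s_i| \, \omega(s_i, m)}{\sum_{s_i \in S^{l+1}} |s_i|}$$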
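The template update can be illustrated with a few lines of code. The sketch assumes that the closed-form weighted least squares solution is the length-weighted average of the segments after each is linearly warped to the template length m. That reading follows from the description above but is an assumption about the exact formula. The names `warp` and `update_template` are hypothetical.

```python
def warp(signal, length):
    """Linearly resample `signal` to `length` points (linear warping)."""
    m = len(signal)
    if length == 1:
        return [signal[0]]
    out = []
    for j in range(length):
        pos = j * (m - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def update_template(segments, m):
    """Average the segments warped to length m, weighting each by its length."""
    total = sum(len(s) for s in segments)
    template = [0.0] * m
    for s in segments:
        w = warp(s, m)
        for j in range(m):
            template[j] += len(s) * w[j] / total
    return template


# Two constant segments: the longer one pulls the average toward its value.
template = update_template([[2.0, 2.0], [5.0, 5.0, 5.0, 5.0]], m=2)
```

Alternating this update with the segmentation step, starting from some initial template, gives the iterative procedure described in this section.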

Figure 4 shows the final beat segmentation for the data in Figure 3. The figure also shows the ECG data of the subject. The segmented beat length matches the ECG of the subject to within a few milliseconds. There is a small delay since the ECG measures the electric signal of the heart, whereas the RF signal captures the heart’s mechanical motion as it reacts to the electric signal.⁴⁷

Figure 4. Segmentation Result Compared to ECG. The figure shows that the length of our segmented beats in RF (top) is very similar to the length of the segmented beats in ECG (bottom). There is a small delay since the ECG measures the electric signal of the heart, whereas the RF signal captures the heart’s mechanical motion as it reacts to the electric signal.

6. Emotion Classification

After EQ-Radio recovers individual heartbeats from RF reflections, it uses the heartbeat sequence along with the breathing signal to recognize the person’s emotions.

2D Emotion Model: EQ-Radio adopts a 2D emotion model whose axes are valence and arousal; this model serves as the most common approach for categorizing human emotions in past literature.^{26, 30} The model classifies between four basic emotional states: Sadness (negative valence and negative arousal), Anger (negative valence and positive arousal), Pleasure (positive valence and negative arousal), and Joy (positive valence and positive arousal).
Feature Extraction: EQ-Radio extracts features from both the heartbeat sequence and the respiration signal. There is a large literature on extracting emotion-dependent features from human heartbeats,^{3, 26, 36} where past techniques use on-body sensors. These features can be divided into time-domain analysis, frequency-domain analysis, time-frequency analysis, Poincaré plot,²⁴ Sample Entropy,²⁸ and Detrended Fluctuation Analysis.³⁴ EQ-Radio extracts 27 features from IBI sequences as listed in Table 1. These particular features were chosen in accordance with the results in Ref. Kim and André.²⁶ We refer the reader to Ref. Acharya et al.³ and Ref. Kim and André²⁶ for a detailed explanation of these features.
EQ-Radio also employs respiration features. To extract the irregularity of breathing, EQ-Radio first identifies each breathing cycle by peak detection after low pass filtering. Since past work that studies breathing features recommends time-domain features,³⁶ EQ-Radio extracts the time-domain features in the first row of Table 1.

Table 1. Features used in EQ-Radio.

Handling Dependence: Physiological features differ from one subject to another for the same emotional state. Further, those features could be different for the same subject on different days. This is caused by multiple factors, including caffeine intake, sleep, and baseline mood of the day. In order to extract better features that are user-independent and day-independent, EQ-Radio incorporates a baseline emotional state: neutral. The idea is to leverage changes of physiological features instead of absolute values. Thus, EQ-Radio calibrates the computed features by subtracting from each feature its corresponding value calculated at the neutral state for a given person on a given day.
Feature Selection and Classification: As mentioned earlier, the literature has many features that relate IBI to emotions. Using all of those features with a limited amount of training data can lead to over-fitting. Thus, EQ-Radio uses l₁-SVM⁵⁰ which selects a subset of relevant features while training an SVM classifier. Table 1 shows the selected IBI and respiration features in bold and italic respectively. The performance of the resulting classifier is evaluated in Section 7.2.
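The baseline calibration described under Handling Dependence amounts to a per-feature subtraction. Below is a minimal sketch, assuming features are stored as name-to-value mappings; the feature names and numbers are made up for illustration and are not taken from Table 1.

```python
def calibrate(features, neutral):
    """Subtract each feature's neutral-state value (same person, same day)."""
    return {name: value - neutral[name] for name, value in features.items()}


# Hypothetical feature values for one recording and its neutral baseline.
neutral_state = {"mean_ibi_ms": 740.0, "rmssd_ms": 30.0}
recording = {"mean_ibi_ms": 700.0, "rmssd_ms": 22.0}
calibrated = calibrate(recording, neutral_state)
```

The classifier then sees changes relative to the neutral state rather than absolute values, which is what reduces the dependence on the person and the day.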

7. Evaluation

All experiments in this section were approved by our IRB.

7.1. Evaluation of heartbeat extraction

First, we assess the accuracy of EQ-Radio’s segmentation algorithm in extracting heartbeats from RF signals.

Experimental setup. Participants: We recruited 30 participants (10 females). The subjects are between 19 and 77 years old. The subjects had no restrictions on their clothing.

Experimental Environment: We perform our experiments in five different rooms in a standard office building. The evaluation environment contains office furniture including desks, chairs, couches, and computers. The experiments are performed while other users are present in the room. The change in the experimental environment and the presence of other users had a negligible impact on the results because the FMCW radio described in Section 4 eliminates reflections from static objects (e.g., furniture) and isolates reflections from different humans.⁶

Metrics: To evaluate EQ-Radio’s heartbeat extraction, we use metrics that are common in emotion recognition:

Inter-Beat-Interval (IBI): The IBI measures the accuracy in identifying the boundaries of each individual beat.
Root Mean Square of Successive Differences (RMSSD): This metric focuses on differences between successive beats. RMSSD is typically used as a measure of the parasympathetic nervous activity that controls the heart.⁴⁴ We calculate RMSSD for IBI sequences in a 2min window.
Standard Deviation of NN Intervals (SDNN): The term NN-interval refers to the IBI. Thus, SDNN measures the standard deviation of the beat length over a window of time. We use a window of 2min.
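The two variability metrics above can be computed directly from a list of inter-beat intervals. The sketch below uses the usual definitions. Using the sample standard deviation (dividing by n − 1) for SDNN is an assumption, since the text does not say which convention is used, and the example intervals are made up.

```python
import math


def rmssd(ibis_ms):
    """Root mean square of successive differences between adjacent IBIs."""
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def sdnn(ibis_ms):
    """Sample standard deviation of the IBIs in the window."""
    mean = sum(ibis_ms) / len(ibis_ms)
    return math.sqrt(sum((v - mean) ** 2 for v in ibis_ms) / (len(ibis_ms) - 1))


# Hypothetical IBIs in milliseconds; a real window would span about 2 minutes.
window = [800.0, 810.0, 790.0, 800.0]
print(f"RMSSD = {rmssd(window):.2f} ms, SDNN = {sdnn(window):.2f} ms")
```

Because RMSSD depends only on beat-to-beat differences, a timing error of a few tens of milliseconds per beat is comparable to the quantity being measured, which is why millisecond-scale segmentation matters.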

Baseline: We obtain the ground truth for the above metrics using a commercial ECG monitor. We use the AD8232 evaluation board with a 3-lead ECG monitor to get the ECG signal. The ECG device and the FMCW radio are connected to a shared clock to keep them synchronized.

Accuracy in comparison to ECG. We run experiments with 30 participants, collecting over 130,000 heartbeats. Each subject is simultaneously monitored with EQ-Radio and the ECG device. We process the data to extract the above three metrics.

We first compare the IBIs estimated by EQ-Radio to the IBIs obtained from the ECG monitor. Figure 5(a) shows a scatter plot where the x and y coordinates are the IBIs derived from EQ-Radio and the ECG respectively. The color indicates the density of points in a specific region. Points on the diagonal have identical IBIs in EQ-Radio and ECG, while the distance to the diagonal is proportional to the error. It can be visually observed that all points are clustered around the diagonal, and hence EQ-Radio can estimate IBIs accurately irrespective of their lengths.

Figure 5. Comparison of IBI Estimates Using EQ-Radio and a Commercial ECG Monitor. The figure shows various metrics for evaluating EQ-Radio’s heartbeat segmentation accuracy in comparison with an FDA-approved ECG monitor. Note that the CDF in (b) jumps at 4ms intervals because the RF signal was sampled every 4ms.

We quantitatively evaluate the errors in Figure 5(b), which shows a Cumulative Distribution Function (CDF) of the difference between EQ-Radio’s IBI estimate and the ECG-based IBI estimate for each beat. The CDF has jumps at 4ms intervals because each FMCW sweep takes 4ms. The CDF shows that the 97th percentile error is 8ms. Our results further show that EQ-Radio’s mean IBI estimation error is 3.2ms. Since the average IBI in our experiments is 740ms, on average, EQ-Radio estimates a beat length to within 0.43% of its correct value.

In Figure 5(c), we report results for beat variation metrics that are typically used in emotion recognition. The figure shows the CDF of errors in recovering the SDNN and RMSSD from RF reflections in comparison to contact-based ECG sensors. The plots show that the median error for each of these metrics is less than 2% and that even the 90th percentile error is less than 8%. The high accuracy of these emotion-related metrics suggests that EQ-Radio’s emotion recognition accuracy will be on par with contact-based techniques, as we indeed show in Section 7.2.

7.2. Evaluation of emotion recognition

We evaluate EQ-Radio’s ability to recognize emotions.

Experimental Setup. Participants: We recruited 12 participants (6 females). Among them, 6 participants (3 females) have 3 to 7 years of acting experience. People with acting experience are more skilled in emotion management, which helps in gathering high-quality emotion data and providing a reference group.³⁶ All subjects were compensated for their participation, and all experiments were approved by our IRB.

Experiment design: Obtaining high-quality data for emotion analysis is difficult, especially in terms of identifying the ground truth emotion.³⁶ Thus, it is crucial to design experiments carefully. We designed our experiments in accordance with previous work on emotion recognition using physiological signals.^{26, 36} Specifically, before the experiment, the subjects individually prepare stimuli (e.g., personal memories, music, photos, and videos); during the experiment, the subject sits alone in one of the five conference rooms and elicits a certain emotional state using the prepared stimuli. Some of these emotions are associated with small movements like laughing, crying, smiling, etc. After the experiment, the subject reports the period during which she/he felt that type of emotion. Data collected during the corresponding period are labeled with the subject’s reported emotion.

Throughout these experiments, each subject is monitored using three systems: (1) EQ-Radio, (2) AD8232 ECG monitor, and (3) a video camera focused on the subject’s face.

Ground Truth: As described above, subjects are instructed to evoke a particular emotion and report the period during which they felt that emotion. The subject’s reported emotion is used to label the data from the corresponding period. These labels provide the ground truth for classification.

Metrics & Visualization: When tested on a particular data point, the classifier outputs a score for each of the emotional states. The data point is assigned the emotion that corresponds to the highest score. The classification accuracy is the percent of test data that is assigned the correct emotion.

We visualize the output of the classification as follows: Recall that the four emotions in our system can be represented in a 2D plane whose axes are valence and arousal. Each emotion occupies one of the four quadrants: Sadness (negative valence and negative arousal), Anger (negative valence and positive arousal), Pleasure (positive valence and negative arousal), and Joy (positive valence and positive arousal). Thus, we can visualize the classification result for a particular test data point by plotting it in the 2D valence-arousal space. If the point is classified correctly, it falls in the correct quadrant.

For any data point, we calculate the valence and arousal scores as: S_valence = max(S_joy, S_pleasure) – max(S_sadness, S_anger) and S_arousal = max(S_joy, S_anger) – max(S_pleasure, S_sadness), where S_joy, S_pleasure, S_sadness, and S_anger are the classification scores output by the classifier for the four emotions. For example, consider a data point with the scores S_joy = 1, S_pleasure = 0, S_sadness = 0, and S_anger = 0; that is, this data point is one unit of pure joy. Such a data point falls on the diagonal in the upper right quadrant. A data point that has a high joy score but small scores for the other emotions would still fall in the joy quadrant, but not on the diagonal.
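The two formulas can be checked with a short sketch. The function names and the dictionary layout below are hypothetical; only the two max-difference expressions and the quadrant assignment come from the text.

```python
# Illustrative sketch only: names and data layout are hypothetical.
def valence_arousal(s):
    """Map four emotion scores to (valence, arousal) using the formulas above."""
    valence = max(s["joy"], s["pleasure"]) - max(s["sadness"], s["anger"])
    arousal = max(s["joy"], s["anger"]) - max(s["pleasure"], s["sadness"])
    return valence, arousal


def quadrant(valence, arousal):
    """Name the quadrant of the valence-arousal plane (axes excluded)."""
    if valence > 0 and arousal > 0:
        return "joy"
    if valence > 0 and arousal < 0:
        return "pleasure"
    if valence < 0 and arousal > 0:
        return "anger"
    if valence < 0 and arousal < 0:
        return "sadness"
    return None  # the point lies on an axis


if __name__ == "__main__":
    pure_joy = {"joy": 1, "pleasure": 0, "sadness": 0, "anger": 0}
    print(valence_arousal(pure_joy))  # equal coordinates: on the diagonal

    mostly_joy = {"joy": 0.8, "pleasure": 0.3, "sadness": 0.1, "anger": 0.05}
    v, a = valence_arousal(mostly_joy)
    print(quadrant(v, a))  # still the joy quadrant, but off the diagonal
```

In the second example, valence is 0.8 − 0.1 and arousal is 0.8 − 0.3, so the two coordinates differ and the point moves off the diagonal while staying in the upper right quadrant.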

EQ-Radio’s emotion recognition accuracy. To evaluate EQ-Radio’s emotion classification accuracy, we collect 400 two-minute signal sequences from 12 subjects, 100 sequences for each emotion. We train two types of emotion classifiers: a person-dependent classifier and a person-independent classifier. Each person-dependent classifier is trained and tested on data from a particular subject. Training and testing are done on mutually exclusive data points using leave-one-out cross validation.¹⁴ The person-independent classifier is trained on 11 subjects and tested on the remaining subject, and the process is repeated for different test subjects.
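The two evaluation protocols differ only in how the data is split. The sketch below generates both kinds of split from index lists; the function names are hypothetical, and holding out every subject exactly once is an assumption about how "repeated for different test subjects" is carried out.

```python
# Illustrative sketch only: names are hypothetical, no classifier is trained here.
def leave_one_out(n):
    """Person-dependent protocol: yield (train_indices, test_index)
    for the n data points of a single subject."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def leave_one_subject_out(subject_ids):
    """Person-independent protocol: yield (train_indices, test_indices, subject),
    holding out all data points of one subject at a time."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, test, held_out


if __name__ == "__main__":
    for train, test in leave_one_out(3):
        print("train", train, "test", test)

    subjects = ["a", "a", "b", "c"]  # subject of each data point
    for train, test, who in leave_one_subject_out(subjects):
        print("held out", who, "train", train, "test", test)
```

In both cases the training and test indices never overlap, which is the property the evaluation relies on.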

We first report the person-dependent classification results. Using the valence and arousal scores as coordinates, we visualize the person-dependent classification in Figure 6. Different types of points indicate different emotions. We observe that emotions are well clustered and segregated, suggesting that they are distinctly encoded in valence and arousal, and can be decoded from features captured by EQ-Radio. We also observe that the points tend to cluster along the diagonal and anti-diagonal, showing that our classifiers have high confidence in the predictions. Finally, the accuracy of person-dependent classification for each subject is also shown in the figure with an average accuracy of 87.0%.

Figure 6. Visualization of EQ-Radio’s Person-dependent Classification Results. The figure shows the person-dependent emotion-classification results for each of our 12 subjects. The x-axis in each of the scatter plots corresponds to the valence, and the y-axis corresponds to the arousal. For each data point, the label is our ground truth, and the coordinate is the classification result. At the bottom of each sub-figure, we show the classification accuracy for the corresponding subject.

The results of person-independent emotion classification are shown in Figure 7. EQ-Radio can recognize a subject’s emotion with an average accuracy of 72.3% purely based on data from other subjects, meaning that EQ-Radio succeeds in learning person-independent features for emotion recognition.

Figure 7. Visualization of EQ-Radio’s Person-independent Classification Results. The figure shows the results of person-independent emotion-classification. The x-axis corresponds to valence, and the y-axis corresponds to arousal.

As expected, the accuracy of person-independent classification is lower than that of person-dependent classification. This is because person-independent emotion recognition is intrinsically more challenging: an emotional state is a rather subjective conscious experience that can differ considerably across subjects. We note, however, that our accuracy results are consistent with the literature for both person-dependent and person-independent emotion classification.²¹ Further, our results present the first demonstration of RF-based emotion classification.

To better understand the classification errors, we show the confusion matrices of both person-dependent and person-independent classification results in Figure 8. We find that EQ-Radio achieves comparable accuracy in recognizing the four types of emotions. We also observe that EQ-Radio typically makes fewer errors between emotion pairs that differ in both valence and arousal (i.e., joy vs. sadness and pleasure vs. anger).

Figure 8. Confusion Matrix of Person-dependent and Person-independent Classification Results. The diagonal of each of these matrices shows the classification accuracy and the off-diagonal grid points show the confusion error.
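For readers who want to reproduce this kind of plot, the following sketch builds a row-normalized confusion matrix from true and predicted labels. The function name is hypothetical, and normalizing each row by the number of points with that true label is an assumption; it is one common convention under which the diagonal reads as per-emotion accuracy.

```python
# Illustrative sketch only: the name and the normalization choice are assumptions.
def confusion_matrix(true_labels, predicted_labels, classes):
    """Return a matrix whose entry [i][j] is the fraction of points with
    true class i that were predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1

    rates = []
    for row in counts:
        total = sum(row)
        # Leave an all-zero row for classes that never occur as a true label.
        rates.append([c / total if total else 0.0 for c in row])
    return rates


if __name__ == "__main__":
    truth = ["joy", "joy", "anger", "anger"]
    preds = ["joy", "anger", "anger", "anger"]
    for row in confusion_matrix(truth, preds, ("joy", "anger")):
        print(row)
```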

Comparison to ECG and vision-based systems. Table 2 shows the accuracy of EQ-Radio in comparison to an ECG-based emotion classifier. Both classifiers use the same set of features and the same decision-making process. However, the ECG-based classifier uses heartbeat information extracted directly from the ECG monitor. In addition, we allow the ECG-based classifier to access the breathing signal from EQ-Radio and use EQ-Radio’s breathing features. The results in Table 2 show that EQ-Radio achieves accuracy comparable to that of emotion recognition systems that use on-body sensors. Thus, by using EQ-Radio, one can eliminate body sensors without jeopardizing the accuracy of emotion recognition based on physiological signals.

Table 2. Comparison with the ECG-based Method^d.

Next, we compare EQ-Radio with vision-based emotion recognition. We use the Microsoft Project Oxford Emotion API to process the images of the subjects collected during the experiments, and analyze their emotions based on facial expressions. Since the Microsoft Emotion API and EQ-Radio use different emotion models, we use the following four emotions that both systems share for our comparison: joy/pleasure, sadness, anger, and neutral. For each data point, the Microsoft Emotion API outputs scores for eight emotions. We consider its scores for the above four shared emotions and use the label with the highest score.
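The label-selection step can be sketched as follows. The score keys are an assumption about how the API names its eight emotions, and mapping its "happiness" score to the shared joy/pleasure category is likewise an assumption; the text specifies only that the four shared emotions are compared and the highest score wins.

```python
# Illustrative sketch only: the key names and the happiness -> joy/pleasure
# mapping are assumptions, not taken from the article.
SHARED = {
    "happiness": "joy/pleasure",
    "sadness": "sadness",
    "anger": "anger",
    "neutral": "neutral",
}


def shared_label(api_scores):
    """Ignore the non-shared emotions and return the shared label
    with the highest score."""
    best = max(SHARED, key=lambda key: api_scores.get(key, 0.0))
    return SHARED[best]


if __name__ == "__main__":
    scores = {
        "anger": 0.01, "contempt": 0.01, "disgust": 0.01, "fear": 0.01,
        "happiness": 0.10, "neutral": 0.25, "sadness": 0.01, "surprise": 0.60,
    }
    # "surprise" has the highest raw score but is not a shared emotion.
    print(shared_label(scores))
```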

Figure 9 compares the accuracy of EQ-Radio (both person-dependent and person-independent) with the Microsoft Emotion API. The figure shows that the Microsoft Emotion API does not achieve high accuracy for the first three categories of emotions, but achieves very high accuracy for the neutral state. This is because vision-based methods can recognize an emotion only when the person explicitly expresses it on her face; they fail to recognize inner emotions and hence report such emotions as neutral. We also note that the Microsoft Emotion API has higher accuracy for positive emotions than for negative ones. This is because positive emotions typically have more visible features (e.g., smiling), while negative emotions are visually closer to a neutral state.

Figure 9. Comparison of EQ-Radio with Image-based Emotion Recognition. The figure shows the accuracies (on the y-axis) of EQ-Radio and Microsoft’s Emotion API in differentiating among the four emotions (on the x-axis).

8. Conclusion

This paper presents a technology for inferring a person’s emotions from the wireless signals reflected off her/his body. We believe this marks an important step in the nascent field of emotion recognition. Furthermore, while we used the heartbeat extraction algorithm to determine the beat-to-beat intervals and exploited these intervals for emotion recognition, our algorithm recovers the entire human heartbeat from RF, and the heartbeat displays a very rich morphology. We envision that this result paves the way for exciting research on understanding the morphology of the heartbeat, both in the context of emotion recognition and in the context of non-invasive health monitoring and diagnosis.

Figure. Watch the authors discuss their work in this exclusive Communications video. https://cacm.acm.org/videos/emotion-recognition-using-wireless-signals

Copyright held by owners/authors. Publication rights licensed to ACM.
Request permission to publish from permissions@acm.org

DOI

10.1145/3236621

September 2018 Issue

Published: September 1, 2018

Vol. 61 No. 9

Pages: 91-100
