Micro-expressions are very short, involuntary facial expressions that reveal emotions people try to hide (see Figure 1). They can be used for lie detection, and trained officials at US airports use them to detect suspicious behavior. For example, a terrorist trying to conceal a plan to commit suicide would very likely show a brief expression of intense anguish. However, human recognition accuracy is low: even highly trained human detectors are notoriously inaccurate, achieving a recognition accuracy of only 47%.1 This poor performance makes an automatic computer detector very attractive.
We have developed the first system for recognizing real, spontaneous facial micro-expressions. It relies on a temporal interpolation method that enables multiple kernel learning and other machine learning algorithms to classify micro-expressions even with a normal 25 frames per second (fps) camera (see Figure 2 for an illustration of the method). Our system achieves very promising results that compare favorably to human micro-expression recognition accuracy. We also provide the first publicly available database of micro-expressions.2
The major challenges in recognizing micro-expressions are twofold. The first is involuntariness: how can we get human training data for our algorithm when the expressions are involuntary? We cannot rely on actors, as they cannot act out involuntary expressions. The second is short duration: a normal camera captures only a very limited number of frames of each expression, making recognition very challenging.
To obtain samples of involuntary expressions, we conducted an induced emotion suppression experiment in which subjects were recorded as they attempted to suppress their facial expressions while watching 16 emotion-eliciting film clips. They were told that experimenters would be watching their faces and that, if their facial expression leaked so that the experimenter correctly guessed which clip they were watching, they would be asked to fill in a dull 500-question survey. This induced 77 micro-expressions in six subjects, recorded with a 100fps camera. To solve the problem of the expressions' short duration, we devised a temporal interpolation method that interpolates each micro-expression to a larger number of frames. This method enables us to detect micro-expressions even with a standard camera.
Figure 1. The bottom image shows a temporal cross-section during the six-frame-long facial micro-expression depicted in the top image. The cross-section is positioned at a given x-coordinate on the upper lip of the subject.
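A temporal cross-section of the kind shown in Figure 1 can be sketched by fixing one image column and stacking it over time. The snippet below is only a minimal illustration, assuming the video is held as a NumPy array of shape (frames, height, width); the function name and the array layout are hypothetical and not taken from the authors' code.

```python
import numpy as np

def temporal_cross_section(video, x):
    """Return the image column at x-coordinate `x` for every frame.

    video: array of shape (frames, height, width) -- an assumed layout.
    Result: array of shape (height, frames); each column is one time step.
    """
    video = np.asarray(video)
    if video.ndim != 3:
        raise ValueError("expected a (frames, height, width) array")
    # Slice the same column out of every frame, then put time on the
    # horizontal axis so the result reads like the lower image in Figure 1.
    return video[:, :, x].T

# Hypothetical six-frame clip of 4x5-pixel images.
clip = np.arange(6 * 4 * 5).reshape(6, 4, 5)
section = temporal_cross_section(clip, x=2)
print(section.shape)
```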
Figure 2. An example of a facial micro-expression (above left) being interpolated through graph embedding (above right). Spatiotemporal local texture descriptors are then extracted from the result (lower right), enabling recognition with multiple kernel learning.
To address the large variations in the spatial appearances of micro-expressions, we crop and normalize the face geometry according to the eye positions from a Haar eye detector and the feature points from an active shape model (ASM)3 deformation. Using 68 ASM feature points—see Figure 3—we compute a local weighted mean4 transformation of the frames. We further temporally normalize all micro-expressions to a given set of frames. To do this, we view a video of a micro-expression as a set of images sampled along a curve and create a continuous function in a low-dimensional manifold by representing the micro-expression video as a path graph. We then sample a new interpolated video along the graph (see Figure 4). We refer to this process as a temporal interpolation method (TIM or TIMn, where n is the resulting number of frames).
Figure 3. Spatial normalization of faces. Facial feature points derived from an active shape model are mapped onto a model face.
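The eye-based part of the spatial normalization can be sketched as a similarity transform (rotation, uniform scale, and translation) that moves the two detected eye centres onto fixed positions in a model face. This is a simplified illustration only: it does not cover the ASM feature points or the local weighted mean warp, and the function names and target coordinates are hypothetical choices rather than values from the original system.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye,
                         target_left=(30.0, 40.0),
                         target_right=(70.0, 40.0)):
    """Build a 2x3 similarity transform mapping detected eye centres
    onto target positions. The target coordinates are assumptions."""
    # Treat 2-D points as complex numbers: z -> a*z + b is a similarity.
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*target_left), complex(*target_right)
    if p1 == p2:
        raise ValueError("eye positions must be distinct")
    a = (q2 - q1) / (p2 - p1)   # rotation and scale
    b = q1 - a * p1             # translation
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag,  a.real, b.imag]])

def apply_transform(matrix, points):
    """Apply a 2x3 transform to an (N, 2) array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    return points @ matrix[:, :2].T + matrix[:, 2]

# Hypothetical eye detections in a slightly tilted face.
M = eye_alignment_matrix((12.0, 20.0), (52.0, 30.0))
print(apply_transform(M, [(12.0, 20.0), (52.0, 30.0)]))
```

Applying the same matrix to every frame of a clip would bring all faces into a common coordinate frame before any finer, feature-point-based warping.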
Figure 4. Temporal interpolation method. The video is mapped onto a curve along which a new video is sampled.
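The idea of sampling a new video along a curve derived from a path graph can be illustrated as follows. The eigenvectors of a path-graph Laplacian have a known closed form (cosines), so each pixel's time series can be expanded in those functions and then evaluated at new time positions. This is a simplified sketch under that assumption, not the authors' implementation; the function names and the (frames, height, width) array layout are hypothetical.

```python
import numpy as np

def path_graph_basis(n, t):
    """Evaluate the n cosine eigenfunctions of an n-vertex path-graph
    Laplacian at positions t, where frame i (1-based) sits at t = i / n."""
    t = np.asarray(t, dtype=float)
    k = np.arange(n)
    # Column k is cos(pi * k * (t - 1/(2n))); at t = i/n this equals the
    # k-th Laplacian eigenvector of the path graph evaluated at vertex i.
    return np.cos(np.pi * np.outer(t - 1.0 / (2 * n), k))

def interpolate_video(frames, n_out):
    """Resample a (frames, height, width) video to n_out frames by
    evaluating its path-graph expansion at evenly spaced positions."""
    frames = np.asarray(frames, dtype=float)
    n = frames.shape[0]
    if n < 2 or n_out < 2:
        raise ValueError("need at least two input and two output frames")
    flat = frames.reshape(n, -1)                 # one row per frame
    basis_in = path_graph_basis(n, np.arange(1, n + 1) / n)
    coeffs = np.linalg.solve(basis_in, flat)     # expansion coefficients
    t_out = np.linspace(1.0 / n, 1.0, n_out)     # new sample positions
    out = path_graph_basis(n, t_out) @ coeffs
    return out.reshape((n_out,) + frames.shape[1:])

# Hypothetical example: stretch a 6-frame clip to 20 frames ("TIM20").
rng = np.random.default_rng(0)
clip = rng.random((6, 8, 8))
print(interpolate_video(clip, 20).shape)
```

Because the original frame positions are among the points where the curve is defined, resampling to the original number of frames returns the input unchanged, while a larger `n_out` fills in intermediate frames along the same curve.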
In an ideal case, spontaneous micro-expression recognition would work with standard cameras without special hardware. An illustrative comparison of results using different machine learning algorithms shows that TIM enables high recognition accuracy even when using a standard 25fps frame rate. In the detection phase we distinguish micro-expressions from other facial data (see Figure 5). The best results were achieved by combining the ‘random forest’ classifier with TIM to 20 frames using 25fps data, yielding 78.9% accuracy.
In the classification phase we classify the recognized micro-expression as negative or positive. With a support vector machine alone, we achieve a rather poor accuracy of 54.2% (chance level is 50%). However, using multiple kernel learning together with TIM instead, we improve the result to 71.4%.
Figure 5. Subject cross-validation results. MKL: Multiple kernel learning. RF: Random forests classifier. TIMn: Temporal interpolation to n frames. SVM: Support vector machine.
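One plausible reading of "subject cross-validation" is leave-one-subject-out evaluation, in which every clip from one subject is held out for testing while the classifier is trained on the remaining subjects. The sketch below illustrates that protocol with a random forest from scikit-learn; the protocol itself, the function name, and the synthetic features are assumptions for illustration and do not reproduce the reported numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def subject_cross_validation(features, labels, subject_ids,
                             n_trees=100, seed=0):
    """Return one accuracy score per held-out subject."""
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    # LeaveOneGroupOut keeps all samples of one subject out of training,
    # so no subject appears in both the training and the test split.
    return cross_val_score(clf, features, labels,
                           groups=subject_ids, cv=LeaveOneGroupOut())

# Synthetic stand-in data: 6 subjects, 10 clips each, 8 features per clip.
rng = np.random.default_rng(0)
subject_ids = np.repeat(np.arange(6), 10)
labels = np.tile([0, 1], 30)          # 1 = micro-expression, 0 = other
features = rng.normal(size=(60, 8))
features[:, 0] += 3.0 * labels        # make one feature informative

scores = subject_cross_validation(features, labels, subject_ids)
print(scores.mean())
```

Splitting by subject rather than by clip avoids the optimistic bias that arises when clips of the same person end up in both the training and the test set.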
In summary, we have described the first system to successfully recognize spontaneous facial micro-expressions. We use temporal interpolation to counter short video lengths and an induced emotion suppression experiment to induce spontaneous micro-expressions in human subjects. Our system achieves very promising results that compare favorably to human micro-expression recognition accuracy. Future work includes expanding the micro-expression corpus to more participants, comparing our system to the performance achieved by trained humans on our dataset and enabling real-time recognition.
The authors are grateful to Infotech Oulu and the Academy of Finland for their funding.
Tomas Pfister, Matti Pietikäinen
Machine Vision Group
University of Oulu
Tomas Pfister received a BA in computer science from the University of Cambridge, UK. He is also completing his PhD at the Visual Geometry Group at the University of Oxford, UK. His research interest is human-centered computer vision, in particular affect recognition from facial features and sign language recognition.
1. M. G. Frank, M. Herbasz, K. Sinuk, A. Keller, A. Kurylo, C. Nolan, I see how you feel: training laypeople and professionals to recognize fleeting emotions, 2009. Paper presented at annual meeting of the International Communication Association. http://www.allacademic.com/meta/p15018_index.html
2. T. Pfister, X. Li, G. Zhao, M. Pietikäinen, Recognising spontaneous facial micro-expressions, Poster presented at Int'l Conference on Computer Vision (ICCV), 2011. Available from http://tomas.pfister.fi
3. T. F. Cootes, C. J. Taylor, D. H. Cooper, J. Graham, Active shape models–their training and application, Comput. Vision Image Understanding 61, pp. 38-59, 1995. doi:10.1006/cviu.1995.1004
4. A. Goshtasby, Image registration by local approximation methods, Image Vision Comput. 6, pp. 255-261, 1988. doi:10.1016/0262-8856(88)90016-9