Recognizing faces like humans
Humans are remarkably good at recognizing faces that they see often. During an encounter, we either recognize a face or reject it as unfamiliar. Praise for this ability still echoes in the literature:1 “The only system that seems to work well under challenging conditions is the human visual system.” While humans do this routinely, rejecting previously unseen faces as unfamiliar remains a particularly challenging problem in face-recognition research. A system with this ability has long been desired:2 “The similarity measure used in a face-recognition system should be designed so that humans' ability to perform face recognition and recall are imitated as closely as possible by the machine.”
The prevailing approach is based on matching and ranking images. Given a test image, a face-recognition algorithm uses some similarity measure to find the closest match in a database of stored images. In ‘closed-world’ applications, where the person in the test image is guaranteed to be in the database, finding the closest match correctly means identifying the person correctly. By contrast, in ‘open-world’ applications, where the person in the test image may not be in the database (as might occur with watchlist surveillance, where we are interested only in recognizing ‘wanted’ subjects), misidentification may occur regardless of the outcome of the search, because the closest stored image may still belong to someone else.
A threshold on the similarity score could be used to decide whether the best match is a correct match. However, establishing a threshold value that works well for previously unseen data is very difficult. We have developed an approach that uses an artificial neural network to replicate the human ability to recognize faces. In contrast to most existing approaches, ours is particularly useful for open-world applications.
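The difference between the closed-world and open-world settings can be illustrated with a minimal sketch. It assumes faces have already been reduced to small feature vectors and uses Euclidean distance as the similarity measure; the names `closest_match` and `identify`, the two-entry gallery, and the threshold value are all hypothetical and are not taken from any particular system.

```python
import math

def closest_match(probe, gallery):
    """Return (name, distance) of the gallery entry nearest to the probe."""
    return min(
        ((name, math.dist(probe, vec)) for name, vec in gallery.items()),
        key=lambda pair: pair[1],
    )

def identify(probe, gallery, threshold=None):
    """Match-and-rank identification with an optional rejection threshold."""
    name, distance = closest_match(probe, gallery)
    if threshold is not None and distance > threshold:
        return None  # best match judged too far away: reject as unfamiliar
    return name

# Hypothetical two-person gallery in a toy 2-D feature space.
gallery = {"alice": (0.0, 0.0), "bob": (4.0, 0.0)}
known_probe = (0.3, 0.1)     # another image of alice
stranger_probe = (1.5, 0.5)  # someone who is not in the gallery

closed_world = identify(known_probe, gallery)                 # correct match
open_world = identify(stranger_probe, gallery)                # stranger still gets a name
with_threshold = identify(stranger_probe, gallery, threshold=1.0)  # rejected
```

Without a threshold the stranger is always assigned the nearest gallery identity. The threshold of 1.0 rejects the stranger here only because it was picked by hand for these toy points, which is exactly the tuning problem on unseen data that the text describes.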
Much like humans, our face-recognition algorithm can recognize a certain number of pre-specified ‘target persons’ while rejecting everyone else. In a surveillance application, those individuals may be people on a watch list, and in an access-control application they will be people with authority. Humans do not match or rank faces/images as a precursor to recognition. Likewise, our approach is not based on ranking images, but rather on identifying and enclosing the region in the human face space that belongs to the target person. Consequently, at the test phase, an image that projects inside that region is identified as the target; otherwise it is rejected.
During training, the region is identified with the help of a human critic. We take the image of the target person and morph it towards different facial images from a large database until it becomes borderline acceptable, i.e., significantly different from the target but still recognizable as such (according to the human critic). Next, we morph it even further until it becomes borderline unacceptable, i.e., with some resemblance to the target but not enough to be recognizable as that person.3 Figure 1 illustrates the process.
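The search for the two borderline exemplars can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes a face is a plain feature vector, models morphing as a linear blend between two vectors (real image morphing also warps geometry), and replaces the human critic with a hypothetical stand-in function. For simplicity it reports the last accepted and first rejected morphing fractions along one morph path.

```python
import math

def morph(target, other, fraction):
    """Blend target toward other; fraction 0 is the target, 1 is the other face."""
    return [(1 - fraction) * t + fraction * o for t, o in zip(target, other)]

def borderline_fractions(target, other, critic, step=0.05):
    """Walk the morph away from the target in fixed steps.

    Returns (last fraction the critic accepted, first fraction it rejected).
    The second value is None if the critic never rejects.
    """
    last_accepted = 0.0
    for i in range(1, round(1 / step) + 1):
        fraction = i * step
        if critic(morph(target, other, fraction)):
            last_accepted = fraction
        else:
            return last_accepted, fraction
    return last_accepted, None

# Toy data: the target at the origin and one database face 10 units away.
target = [0.0, 0.0]
other = [10.0, 0.0]

def stand_in_critic(face):
    """Hypothetical substitute for the human judgment 'still recognizable'."""
    return math.dist(face, target) < 3.2

accept_at, reject_at = borderline_fractions(target, other, stand_in_critic)
```

Averaging such fractions over a handful of database faces would give the morphing percentages that are later applied automatically to the rest of the database.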
Typically, a human critic needs to examine morphs of the target person toward only some 10 to 20 images from the database to determine the average morphing percentages for the borderline acceptable and unacceptable exemplars. (We have found that different human critics do not produce significantly different morphing percentages with regard to a given face.) Applying those percentages to images in the database, the computer then automatically generates and labels a sufficiently large training set. In practice, it is adequate if most of the generated exemplars are projected where intended because the neural net will fit hyperplanes to the generated data. For greater accuracy, the database may be divided into subgroups according to gender, race, and age, and the human critic may pick slightly different morphing percentages for each subgroup. Next, a three-layer neural network is trained on the two sets of positive and negative exemplars.
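The automatic labeling step and the network training can be sketched in a few lines of plain Python. Everything concrete here is an assumption made for illustration: faces are 2-D feature vectors, morphing is a linear blend, the morphing fractions 0.30 and 0.50 are invented, and the network (one tanh hidden layer, one sigmoid output, stochastic gradient descent) is a generic three-layer design rather than the architecture actually used. No accuracy is claimed for this toy setup.

```python
import math
import random

def morph(target, other, fraction):
    """Linear blend standing in for image morphing (an assumption)."""
    return [(1 - fraction) * t + fraction * o for t, o in zip(target, other)]

def make_training_set(target, database, accept_fraction, reject_fraction):
    """Label morphs at accept_fraction as 1 (target) and at reject_fraction as 0."""
    data = []
    for other in database:
        data.append((morph(target, other, accept_fraction), 1))
        data.append((morph(target, other, reject_fraction), 0))
    return data

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ThreeLayerNet:
    """Input layer, one tanh hidden layer, one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, output probability of 'target')."""
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, label, lr):
        """One stochastic gradient step on the cross-entropy loss."""
        hidden, out = self.forward(x)
        delta_out = out - label  # gradient at the output pre-activation
        for j, h in enumerate(hidden):
            # Hidden-unit gradient uses the output weight before it is updated.
            delta_h = delta_out * self.w2[j] * (1 - h * h)
            self.w2[j] -= lr * delta_out * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
            self.b1[j] -= lr * delta_h
        self.b2 -= lr * delta_out

def is_target(net, x):
    """Accept when the network places x inside the target's region."""
    return net.forward(x)[1] >= 0.5

# Toy setup: target at the origin, 12 database faces on a circle of radius 10.
rng = random.Random(0)
target = [0.0, 0.0]
database = [[10 * math.cos(2 * math.pi * k / 12),
             10 * math.sin(2 * math.pi * k / 12)] for k in range(12)]
training_set = make_training_set(target, database, 0.30, 0.50)

net = ThreeLayerNet(n_in=2, n_hidden=6, rng=rng)
for _ in range(500):
    rng.shuffle(training_set)
    for x, label in training_set:
        net.train_step(x, label, lr=0.05)
```

In this picture the positive exemplars lie just inside the target's region and the negative exemplars just outside it, so the hidden units have to place their hyperplanes in the gap between the two sets, enclosing the target on all sides.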
Over the past several years, we have collected several thousand facial images at the Naval Research Laboratory (NRL). Furthermore, we have made extensive use of the Face Recognition Grand Challenge dataset4 as well as other databases. Extensive algorithmic experiments involving 10 target persons under different indoor lighting conditions and over 10,000 nontarget images have indicated a false accept (false alarm) rate of one in 100,000. The false reject rate is, however, more difficult to assess. In our experiments, such errors appeared to be almost all due to head poses and expressions outside the expected range, such as looking far to the side. Of course, in real-life applications of automatic surveillance, cameras will be installed at many different locations, and each is likely to capture more than one image per person. Therefore, it may be of no consequence if the target person is missed in one image or by one camera. That is, the true accept rate does not have to be 100%, but the false accept rate must be extremely low.
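A short calculation makes the asymmetry between the two error rates concrete. It assumes, purely for illustration, that the outcomes on different images are statistically independent; the per-image true accept rate of 0.8 and the image counts are hypothetical, while the false accept rate of one in 100,000 is the figure quoted above.

```python
def at_least_once(per_image_rate, n_images):
    """Probability that an event with the given per-image rate occurs
    in at least one of n_images independent images."""
    return 1.0 - (1.0 - per_image_rate) ** n_images

# A target who is recognized in only 80% of single images (hypothetical rate)
# is still very likely to be caught somewhere across three images.
caught = at_least_once(0.8, 3)

# A nontarget, by contrast, accumulates false-alarm risk with every image.
# At one in 100,000 per image, 1,000 images give roughly a 1% chance.
false_alarm = at_least_once(1e-5, 1000)
```

Under these assumptions, repeated views rapidly compensate for a modest true accept rate, but they also multiply the opportunities for a false alarm, which is why the false accept rate is the figure that has to be extremely low.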
The history of face recognition, however, indicates that it is considerably more challenging to obtain a good recognition rate in real-life scenarios, i.e., with live systems facing people in person (with many inherent unpredictabilities), than with algorithms recognizing pre-recorded facial images. We have been conducting a large-scale system test by taking the system to different buildings (under different lighting conditions) at the NRL. So far, close to 200 people have participated in live experiments. The system has recognized those on its watch list while rejecting all others with no error. The next steps involve expanding the testing, continuing our effort to improve the quality of automated morph images, and incorporating age progression.
The support of the US Naval Research Laboratory and the Office of Naval Research is gratefully acknowledged.
Behrooz Kamgar-Parsi received his PhD in physics from the Catholic University of America. He is currently a senior research scientist at the Navy Center for Artificial Intelligence, where he is engaged in research in computer vision. He is an associate editor of the journal Pattern Recognition Letters.