A face-tracking system to detect falls in the elderly
As life expectancy increases and birth rates fall, most industrialized countries anticipate a growing elderly population in the coming century. In Western Europe, for example, people aged over 60 represented 20% of the total population in 2000, and this proportion is projected to reach 42% by 2050.1 Given these projections, and the costs and logistics of caring for the elderly, it is generally recommended that the healthiest dependent people remain in their own homes rather than move to an institutional setting. To realize this aim, care applications, or 'smart homes,' have evolved in recent decades.2–12 These heterogeneous systems, designed to assist dependent people in everyday life, include automatic detection methods for falls, the primary cause of accidental death among isolated dependent people. Currently available solutions include wearable sensors (push-buttons8 or accelerometers5,8,9), but these technologies have major drawbacks. Carelessness or cognitive impairment can lead to the device being worn only intermittently, and the wearer must be conscious to press the button. Furthermore, a loss of consciousness that occurs slowly is undetectable by this kind of technology.
Consequently, we require a system that can interpret a situation and detect and analyze movement. We propose an automated, stand-alone surveillance method, fully integrated within the environment (see Figure 1). A large number of sensors set up in the home would collect different kinds of data: audio, video, IR, or pressure (from sensors embedded in furniture). Information from these would pass to a local calculation unit for testing and analysis, making it possible to consider a large variety of situations such as falls, unusual inaction, or a sudden change in habits. Information about these events would go to emergency services and would provide diagnostic information to health practitioners. An alert would also go to relatives by Short Message Service or email.
Based on the concept of a fall as a transition from standing to lying down, we tracked the position of a subject's face to assess temporal and spatial information. At present, our work focuses on this tracking stage.13 Our system has the advantages of being relatively simple and able to simultaneously process detection, identification, and localization. In this study we use a joint transform correlator (JTC),14 an image processing technique that can compare several images in parallel and is particularly suitable for tracking. A Fourier transform is applied to an entry plane composed of a reference image and a target image (the scene containing the face to be recognized), and an inverse Fourier transform of the resulting joint power spectrum yields a correlation plane.
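To make the principle concrete, the following is a minimal numerical sketch of a classical JTC in Python with numpy. It illustrates the technique only and is not the authors' optical implementation; the image sizes and the margin between the two images in the entry plane are arbitrary choices.

```python
import numpy as np

def jtc_correlation_plane(reference, target):
    """Classical joint transform correlator (JTC) sketch.

    The reference and target images (same shape) are placed side by
    side in a single entry plane. The squared magnitude of the plane's
    Fourier transform (the joint power spectrum) is inverse-transformed,
    yielding a correlation plane with two cross-correlation peaks whose
    positions encode the relative offset of the two images.
    """
    h, w = reference.shape
    # Entry plane: reference on the left, target on the right,
    # separated by a margin of one image width.
    entry = np.zeros((h, 3 * w))
    entry[:, :w] = reference
    entry[:, 2 * w:] = target

    joint_spectrum = np.abs(np.fft.fft2(entry)) ** 2
    correlation = np.abs(np.fft.ifft2(joint_spectrum))
    # Shift so the zero-order (autocorrelation) term sits at the centre;
    # the two cross-correlation peaks appear symmetrically about it.
    return np.fft.fftshift(correlation)
```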

Table 1. Tracking results for the iterative JTC algorithm with and without the histogram similarity stage.

| Tracking | Without histogram | With histogram |
|---|---|---|
| Tracked frames (%) | 58.11 | 81.46 |
| Non-tracked frames (%) | 41.89 | 18.54 |
The correlation plane given by a JTC implementation contains two cross-correlation peaks whose locations depend on the relative position of the reference and target images in the entry plane. This allows a target motif (namely, a face) to be localized in a scene. An iterative algorithm, in which the reference image at each time step t is replaced by the face detected in the target image at t−1, makes it possible to track the face in every video frame while accommodating variations of the tracked motif over time (see Figure 2). The algorithm is initialized (t=0) by means of the Viola-Jones object detection framework.15 To avoid false detections (correlation with the scene background, for example), we perform a histogram comparison between the target and reference images (the person's face). A large inter-frame variation in the histogram indicates a loss of tracking and triggers re-initialization of the algorithm.
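As a rough illustration of this loop (not the authors' code), the Python/OpenCV sketch below combines Viola-Jones initialization with a histogram-similarity check. Here `locate_with_jtc` is a hypothetical stand-in for the JTC correlation stage, and the similarity threshold is an arbitrary placeholder.

```python
import cv2

# Viola-Jones face detector, used only to (re-)initialize tracking;
# the cascade file ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def grey_histogram(patch):
    """Normalized grey-level histogram of an image patch."""
    hist = cv2.calcHist([patch], [0], None, [64], [0, 256])
    return cv2.normalize(hist, hist).flatten()

def detect_face(grey):
    faces = cascade.detectMultiScale(grey, 1.1, 4)
    return tuple(faces[0]) if len(faces) else None  # (x, y, w, h)

def track(frames, locate_with_jtc, similarity_threshold=0.5):
    """Iterative JTC tracking with a histogram-similarity check.

    locate_with_jtc(reference, grey) stands in for the correlation
    stage: it returns the bounding box of the reference patch in the
    current frame.
    """
    box, reference = None, None
    for frame in frames:
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if box is None:
            box = detect_face(grey)  # t = 0, or tracking was lost
            if box is None:
                yield None
                continue
        else:
            box = locate_with_jtc(reference, grey)
        x, y, w, h = box
        patch = grey[y:y + h, x:x + w]
        # A large inter-frame histogram variation suggests a false
        # detection (e.g. correlation with the background): drop the
        # track so Viola-Jones can re-initialize on the next frame.
        if reference is not None and cv2.compareHist(
                grey_histogram(reference), grey_histogram(patch),
                cv2.HISTCMP_CORREL) < similarity_threshold:
            box, reference = None, None
            yield None
            continue
        reference = patch  # reference at t is the face found at t-1
        yield box
```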
We produced an experimental setup to test the reliability of our approach. First, we created a reproduction of a hospital or retirement home (see Figure 3). Second, we devised a wide variety of scenarios (where the subject faces away from the detector, rotates, or falls, or where the face is hidden by another object), comprising 21,087 frames (see Figure 4). We manually localized and registered the head position in every frame to produce ground-truth data against which the algorithm could be evaluated. Table 1 compares the iterative JTC algorithm with and without the histogram similarity stage. The effect of the histogram correction is noticeable, giving an improvement of 23 percentage points.
Finally, we experimented with fall detection using a naive method based on speed measurement. A fall is detected when the downward vertical speed of the face across successive video frames exceeds a certain threshold. To account for elliptical falls, we also considered the horizontal speed, weighting it by a 1/4 factor, yielding the formula

$$v_t = (y_t - y_{t-1}) + \frac{1}{4}\,\lvert x_t - x_{t-1}\rvert,$$

where $x_t$ and $y_t$ are the face coordinates at time $t$ (with the image's $y$ axis pointing downward, so that downward motion gives a positive vertical term).
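A minimal sketch of this detector, assuming per-frame pixel displacements as the speed estimate; the threshold value is a placeholder that would need tuning to the camera set-up:

```python
def detect_fall(track, speed_threshold=40.0):
    """Naive fall detector based on the speed formula above.

    track: list of (x, y) face centres, one per frame, in image
    coordinates (y grows downward). The threshold is in pixels per
    frame and is an arbitrary placeholder.
    """
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        v = (y1 - y0) + 0.25 * abs(x1 - x0)  # downward speed estimate
        if v > speed_threshold:
            return True
    return False
```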
The results, obtained on a set of 60 falls (vertical, left, and right), are shown in Table 2. We correctly detected 58% of the falls. Several factors affected the results: the measured speed depends on the distance between the subject and the camera; if the face is obscured during a fall, the Viola-Jones detector may be unable to re-initialize the algorithm; and the naive detection method is unsuitable for slow falls and for falls in which the face follows an elliptical trajectory.
Table 2. Fall detection results on a set of 60 falls (20 each: vertical, left, and right).

| | Vertical | Left | Right | Total |
|---|---|---|---|---|
| Recognized falls | 13 | 10 | 12 | 35 |
| Recognized falls (%) | 65 | 50 | 60 | 58.34 |
Our method can simultaneously detect, localize, and identify the person, and it can accurately perform the tracking process. That process still suffers from some limitations, however, and the correlation should be considered a baseline method to be improved in future work. Background subtraction (defining the background scene with a fixed camera and eliminating it from the results) may be an appropriate enhancement of our system, as would silhouette and skeleton detection for posture identification, which could be fused with our system. Finally, we need to compare our technique with other fall detection systems, using an extended experimental database.16
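For the background-subtraction enhancement, a minimal sketch using OpenCV's built-in MOG2 model (one possible choice; the authors do not specify a method, and the parameters here are defaults):

```python
import cv2

# Running background model for a fixed camera: static scene pixels
# are suppressed so that correlation operates on moving regions only.
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def foreground_only(frame):
    """Mask out the static background, keeping moving regions."""
    mask = subtractor.apply(frame)
    return cv2.bitwise_and(frame, frame, mask=mask)
```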
This work is supported by Project 0OR251, a collaboration between the Malakoff Médéric Group, Open Society, and ISEN-Brest.
Institut Supérieur d'Électronique et du Numérique (ISEN)
Philippe Katz received his engineering diploma from ISEN-Brest and his MSc in signals and images in biology and medicine from the University of Brest in 2011. Since then, he has been a PhD student at ISEN. His research interests include image and signal processing and smart homes.
Michael Aron received an engineering diploma in 2002 from Polytech-Sophia, University of Nice-Sophia Antipolis. He received his PhD in computer science from the University of Lorraine in 2009, and conducted his image processing post-doctoral research at The French Research Institute for Exploitation of the Sea. Since 2011, he has been an associate professor at ISEN-Brest. His research topics include computer vision and image processing.
Ayman Alfalou's research interests are in optical engineering, optical information processing, signal and image processing, telecommunications, and optoelectronics. He has published more than 110 refereed journal articles or conference papers, and is a senior member of SPIE, the Optical Society of America, and the Institute of Electrical and Electronics Engineers, and is a member of the Institute of Physics.