A low-cost visual sensor network for elderly care
As the population ages, the incidence of impaired mobility and of cognitive disorders such as Alzheimer's disease is growing. Elderly people with these conditions often need to move to care facilities, where round-the-clock assistance is available. Many dread this prospect, and it also comes at a huge economic cost to society.
Technology can provide increased safety for elderly people living at home, postponing the move to institutional care, or even eliminating it completely. Simple devices, such as wearable panic buttons, are cheap and useful, but they fail when patients forget to wear them, forget how to use them, or become unconscious. Multisensor networks in the living environment, comprising pressure sensors (on a bed, chair, or toilet), door- and window-opening sensors, and motion sensors,1 can provide basic location data, but cameras would allow even richer safety and behavioral monitoring. Camera images can localize a person and support detailed analysis of their pose2 and behavior. However, such systems are expensive, not so much because of the cameras themselves, but because of the associated infrastructure (networks, cabling, and computers) and installation costs.
Our solution is a sensor network based on very low-resolution (900-pixel) visual sensors and low-bit-rate wireless communication. Distributed processing algorithms running on microcontrollers and microcomputers analyze changes in motion and behavior patterns over time and detect possible emergency situations. At that point, family members or caregivers can activate (low-quality) video streaming to assess the situation. The low resolution of the sensor poses significant technical challenges, but it enables a cheap, battery-powered, wireless system.
A key component of our approach is analysis of location, motion, and pose. Figure 1 shows the distributed processing pipeline. First, a microcontroller in each sensor performs preprocessing, including devignetting (correcting for lower brightness at the periphery of the image), automatic gain control, and noise reduction, and then runs video analysis algorithms to separate the silhouettes of moving persons from the static background.3,4 Handling the low resolution, noise, and poor, rapidly changing lighting conditions properly is particularly challenging. A single-board, low-cost computer runs a robust people tracker5 based on recursive maximum likelihood principles. The tracker requires only the bounding boxes of the silhouettes as seen by each camera, so the sensors avoid data communication in the absence of changes, prolonging battery life.
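As a rough illustration of the per-sensor processing, the following minimal Python sketch maintains a running-average background model for a 30x30 frame and reports only the bounding box of the changed pixels. This is not the actual firmware; the function names, learning rate, and threshold are illustrative assumptions.

```python
# Hypothetical sketch of per-sensor foreground extraction on a 900-pixel frame.
# A running-average background model is subtracted from each frame; pixels whose
# difference exceeds a threshold count as foreground, and only the bounding box
# of the foreground mask needs to leave the sensor.

W = H = 30  # 30x30 = 900-pixel sensor

def update_background(bg, frame, alpha=0.05):
    """Exponential running average background model (assumed update rule)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]

def foreground_bbox(bg, frame, thresh=20):
    """Return (xmin, ymin, xmax, ymax) of changed pixels, or None if static."""
    xs, ys = [], []
    for y in range(H):
        for x in range(W):
            if abs(frame[y][x] - bg[y][x]) > thresh:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing moved: the radio can stay idle
    return (min(xs), min(ys), max(xs), max(ys))
```

Because only a bounding box (or nothing at all) leaves the sensor when the scene is static, the radio duty cycle stays low, which is what makes battery operation feasible.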
To detect behavioral changes over time, we first cluster person trajectories in time and space (see Figure 2). Specifically, our system automatically detects ‘hot spots’—frequently occupied locations—and computes mobility statistics for these, and for the tracks between them. Figure 3 shows the changes in activity level of an elderly person recovering from a stroke over 40 days. In this case, activity levels decrease in the sitting area, but increase in the kitchen,6,7 indicating improved mobility. Figure 4 shows the evolution of other behavior-related statistics.8
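The hot-spot idea can be sketched as follows. This is a hypothetical simplification, not the clustering method we use; the grid size and threshold factor are invented for illustration.

```python
# Hypothetical hot-spot detection: quantize tracked (x, y) positions onto a
# coarse grid and flag cells that are occupied far more often than average.
from collections import Counter

def hot_spots(positions, cell=1.0, factor=3.0):
    """Return the grid cells whose occupancy count exceeds `factor` times
    the mean count over all visited cells."""
    counts = Counter((int(x // cell), int(y // cell)) for x, y in positions)
    if not counts:
        return set()
    mean = sum(counts.values()) / len(counts)
    return {c for c, n in counts.items() if n > factor * mean}
```

Mobility statistics (dwell times in each hot spot, transit times between them) can then be accumulated per cell and tracked over days or weeks to reveal trends like those in Figure 3.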
Pose and motion analysis may indicate possible emergencies such as falls or wandering behavior, but to reduce the cost of false alarms any emergency response should adopt a cascaded approach. For example, as a first step, family members or caregivers may attempt to contact the person by phone. If there is no response, they can activate video transmission to assess the situation.
We designed the system's video codec (coder and decoder) specifically for extremely low-resolution data. It allows high-quality but very low-bandwidth wireless transmission, and it can still run on microcontrollers, despite their limited computing power. The main functional units implement only the most probable coding options,9 namely a single intra-frame prediction mode and a single data-block size. Moreover, the video coder has reduced computational needs because it avoids motion estimation (computing the extent to which objects move in the picture), the most time-consuming operation in traditional codecs.10 Hence, our system avoids mode-decision mechanisms, predicting the inter-coded frames from the corresponding blocks in the preceding pictures.
We ensured error-resilient transmission using a row-column bit interleaver, which spreads transmission losses over multiple packets, and systematic forward-error-correction codes that protect each chunk of video data. We adjusted the protection level to the network properties by randomly omitting (puncturing) a number of the parity bits generated by the error-correction coder. This also reduces the amount of memory needed to store the generator matrices (whose rows form the basis of the linear code) at the sensor node.10
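A row-column interleaver itself is simple. The sketch below (illustrative only, with the matrix dimensions as free parameters) writes bits into a matrix row by row and reads them out column by column, so a burst of channel losses lands in many different FEC-protected chunks.

```python
# Hypothetical row-column bit interleaver: write row-major, read column-major.
# A burst of consecutive channel erasures then maps back to bits that are far
# apart in the original stream, each protected by a different FEC chunk.

def interleave(bits, rows, cols):
    """Transmit order: read the rows-by-cols matrix column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    """Inverse permutation, applied at the receiver."""
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]
```

With a 3x4 matrix, for example, losing three consecutive transmitted bits erases at most one bit per original row, so each FEC chunk faces only a single erasure.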
Despite significant technical challenges, low-resolution visual sensor networks are a viable solution to monitor people's behavior at home. They provide sufficiently rich information to detect health-related behavioral changes and even enable low-quality video transmission to assess emergency situations. They can provide this functionality without cabling, significantly reducing installation cost.
Our future work will focus on rudimentary semantic activity classification, using temporal probability models.11 For example, frequent motion within the kitchen area followed by a period of sitting may indicate cooking followed by eating.
This system was developed in the iMinds research project 'Little Sister: Low-cost monitoring for care and retail'12 and is currently being evaluated in the Ambient Assisted Living Joint Programme project SONOPA (Social Networks for Older adults to Promote an Active life).13
Francis Deboeverie received a Master of Science in electronics and ICT engineering technology in 2007, and a PhD in engineering in 2014. He is currently a postdoctoral researcher.
Richard Kleihorst received a PhD from Delft University in 1994. He worked at Philips, NXP, and VITO, and is a guest professor at Ghent University. His main research topic is smart camera networks, which form the basis of two companies he started. He founded the IEEE/Association for Computing Machinery International Conference on Distributed Smart Cameras and the Workshop on Architecture of Smart Cameras.
Wilfried Philips is a senior professor and heads the Image Processing and Interpretation research group. His main research interests are image and video restoration and multi-camera computer vision. He has received several scientific awards, including the Alumni award of the Belgian-American Educational Foundation.
Vrije Universiteit Brussel/iMinds
Jan Hanca received MSc and engineering degrees in electronics and telecommunications from Poznan University of Technology, Poland, in 2010. He is currently a PhD researcher, benefiting from a grant from the Flemish agency Innovation by Science and Technology.
Adrian Munteanu is a professor in the Electronics and Informatics Department. His area of expertise is data compression, on which he has published more than 200 scientific articles, patent applications, and contributions to standards. He currently serves as associate editor for IEEE Transactions on Multimedia.