The human vision system fuses our eyesight with the gravity-based pose and movement sensing capabilities provided by the vestibular system in our inner ear. This gives us an accurate perspective on the 3-D world and allows us to function effectively within it. Recent developments at the Institute of Systems and Robotics at the University of Coimbra (Coimbra, Portugal) have fused micro-electro-mechanical system (MEMS)-based inertial sensors and stereoscopic cameras to provide similar capabilities to artificial systems. "Combining these two sensing modalities will benefit applications ranging from robotics and bioimplants to autonomous vehicles and movie-making," says Jorge Dias, an associate professor at the university. The group has built prototype vision systems to develop the sensor fusion techniques and the algorithms central to their use.
Inertial sensors provide important information on linear and angular motion, changes in momentum, and gravity. Gravity is a key practical reference for the vision system. The cameras observe the structured 3-D world, including visual references and spatial orientation. Fusion of this sensory data occurs at an early processing stage in biological vision systems. In humans, for example, the sense of motion is derived from a combination of the vestibular system and retinal visual flow. Since the MEMS inertial sensors are single-chip devices, they can be incorporated alongside the camera's imaging sensor so that the combination can occur at an early stage in the machine vision case as well. Motion can be sensed as a combination of image flow, scene features, and inertial data on movement.
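The vestibular/retinal fusion described above is often approximated in engineering practice by a complementary filter: the gyroscope rate is integrated for fast response but drifts, while the vision-derived angle is slow but stable. The sketch below is a minimal illustration of that idea, not the Coimbra group's actual algorithm; the function name and the blending constant `alpha` are assumptions for the example.

```python
# Minimal complementary-filter sketch (illustrative only): blend a
# fast-but-drifting integrated gyro rate with a slow-but-stable
# orientation angle recovered from visual features.

def complementary_filter(angle, gyro_rate, visual_angle, dt, alpha=0.98):
    """One update step: high-pass the inertial path, low-pass the visual path.

    angle        -- current fused angle estimate (deg)
    gyro_rate    -- gyroscope angular rate (deg/s)
    visual_angle -- absolute angle estimated from image features (deg)
    """
    predicted = angle + gyro_rate * dt          # dead-reckon from inertial data
    return alpha * predicted + (1.0 - alpha) * visual_angle

# toy usage: a drifting gyro is continually pulled toward the visual estimate
angle = 0.0
for _ in range(100):
    angle = complementary_filter(angle, gyro_rate=0.5, visual_angle=5.0, dt=0.1)
```

The high-frequency inertial term supplies responsiveness between video frames, while the low-frequency visual term removes the accumulated gyro drift, mirroring the early-stage fusion seen in biological vision.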
The researchers used capacitive MEMS accelerometers and MEMS gyroscopes as the inertial sensors; together these provide important cues about scene structure, such as vertical and horizontal references. The gyroscopes sense angular motion with a vibrating MEMS structure that measures the Coriolis effect induced by rotation. Recent low-cost MEMS inertial sensors fabricated with micromachining techniques perform comparably to the human vestibular system, which makes them suitable for vision applications. The gyroscopes can resolve rotation rates on the order of 0.14 deg/s in yaw and 0.5 deg/s in roll and pitch.
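The Coriolis sensing principle mentioned above can be illustrated with a few lines of vector arithmetic. This is a generic textbook relation (a_c = -2 Ω × v for a proof mass driven at velocity v in a frame rotating at rate Ω), not a model of the specific devices the group used.

```python
# Illustrative sketch of the MEMS gyro sensing principle: a proof mass is
# driven at velocity v along one axis; an external rotation rate omega
# produces a Coriolis acceleration along the orthogonal sense axis, which
# the capacitive pickoff converts to an angular-rate reading.
import numpy as np

def coriolis_acceleration(v_drive, omega):
    """Coriolis acceleration on the proof mass, a_c = -2 * omega x v (m/s^2)."""
    return -2.0 * np.cross(omega, v_drive)

# usage: 0.1 m/s drive along x, 1 rad/s rotation about z
# -> acceleration appears along y, proportional to the rotation rate
a = coriolis_acceleration(np.array([0.1, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
```

Because the Coriolis term is linear in the rotation rate, measuring the sense-axis displacement gives the angular rate directly.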
The image sensors provide spatial directions, and the gravity vector derived from the inertial sensors gives the vision system a unique reference. The system can be constructed such that the relative positions of the image and inertial sensors are known accurately or calibrated. Knowing the relationship between the vertical (gravity) reference and the stereoscopic cameras allows the system to establish a ground plane for the 3-D scene. All positions and motions within the scene, or for the vision system itself, are measurable from this reference data. The researchers can segment and reconstruct the 3-D image based on the combined sensor information, and a variety of image-processing techniques can be used. "The sensor data is quite good, so many of the next key advances will be in the areas of segmentation of the image and 3-D reconstruction," explains Dias.
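The ground-plane construction described above can be sketched as follows. The calibration rotation `R_cam_imu` and the function name are assumptions for the example; the only inputs the text guarantees are a (near-static) accelerometer reading dominated by gravity and a known camera-IMU relationship.

```python
# Hedged sketch: derive the ground-plane normal in the camera frame from
# the accelerometer's gravity vector and a calibrated IMU-to-camera
# rotation (assumed known, as the article says the relative positions
# are known accurately or calibrated).
import numpy as np

def ground_plane_normal(accel_imu, R_cam_imu):
    """Return the unit ground-plane normal in camera coordinates.

    accel_imu -- 3-vector measured while (nearly) static, so it is
                 dominated by gravity
    R_cam_imu -- 3x3 rotation taking IMU-frame vectors to the camera frame
    """
    g_imu = accel_imu / np.linalg.norm(accel_imu)   # unit gravity direction
    g_cam = R_cam_imu @ g_imu                       # rotate into camera frame
    return -g_cam                                   # "up" opposes gravity

# usage: identity calibration, gravity sensed along -z in the IMU frame
n = ground_plane_normal(np.array([0.0, 0.0, -9.81]), np.eye(3))
```

With this normal fixed, every 3-D point reconstructed by the stereo pair can be assigned a height above the ground plane, which is what makes positions and motions in the scene measurable against a common reference.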
Axel Pinz of the Graz University of Technology (Graz, Austria) agrees. "Inertial and visual sensor hardware has been developed to the required levels of performance. The real tricks will come in the areas of algorithm development for the 3-D processing," he explains. His group has worked in the area for many years, and he sees great potential for the hybrid-sensor approach. Both groups are working toward robust image segmentation and 3-D reconstruction for the benefit of autonomous robots, real-time tracking systems, and even head-mounted 3-D visual systems for gaming and entertainment.