Video analysis tools on mobile communication devices
Video applications for mobile communications have been designed mostly for content creation, access, and replay. For instance, recent examples replicate the functionalities of portable video cameras, recorders, and digital TV receivers. Many current devices also have two cameras built in, one for capturing high-resolution images and the other for lower-resolution video telephony, typically at VGA resolution (640×480 pixels). This technology represents an attractive platform for building and demonstrating embedded computer vision solutions. Indeed, computational resources and camera application programming interfaces have recently become considerably more capable. Even so, few truly innovative and usable video or image analysis applications are available for mobile phones.
We have designed four novel solutions that can be implemented on mobile communication devices. The first is a real-time motion-based user interface for browsing large images or documents such as maps on small screens. The motion information is extracted from the image sequence captured by the camera. The second solution is a real-time panorama builder that uses individual video frames, while the third assembles document panoramas, also from single frames. The fourth development is a real-time face and eye detector that can be used with autofocusing and red-eye reduction techniques.

Mobile video applications
The motion-based user interface estimates movement between video frames recorded by the camera (see Figure 1).1 Together with accelerometer information, it provides a new approach to constructing user interfaces for games and large-document browsing applications. The panorama-building solution analyses video frames for motion and moving objects, quantifies the quality of each frame, and stitches up to 360° views from the best available images.2 Figure 2 shows two examples. The document panorama builder is essentially a camera-based scanner (see Figure 3).3 In this case, it interactively guides the user in moving the device over a document, such as a newspaper page, to assemble a high-quality image from individual video frames.
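The article does not disclose how the inter-frame motion is computed; as an illustrative sketch only (the function name, exhaustive search, and sum-of-absolute-differences criterion are our assumptions, not the authors' published method), a dominant global translation between consecutive frames can be recovered by scoring a small set of candidate shifts:

```python
import numpy as np

def estimate_global_motion(prev, curr, max_shift=4):
    """Estimate the dominant (dx, dy) translation between two grayscale
    frames by exhaustive search over small shifts, minimizing the mean
    absolute difference on the overlapping region."""
    h, w = prev.shape
    best_err, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping windows of prev and curr under the shift (dx, dy)
            a = prev[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            b = curr[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            err = np.abs(a.astype(int) - b.astype(int)).mean()
            if best_err is None or err < best_err:
                best_err, best_shift = err, (dx, dy)
    return best_shift

# Synthetic check: shift a random frame right by 2 and down by 1 pixel
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (48, 64), dtype=np.uint8)
moved = np.roll(frame, shift=(1, 2), axis=(0, 1))  # dy=1, dx=2
print(estimate_global_motion(frame, moved))  # -> (2, 1)
```

A real implementation on a phone would work on subsampled luminance data and refine the estimate hierarchically to stay within the device's computational budget.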
Figure 1. A motion-based user interface with zooming functionalities estimates the motion of the device relative to the user (a). It can be used to browse large image documents on the screen of a handheld device (b).
Figure 2. An efficient panorama builder stitches high-quality images even when the scene contains moving objects.
Figure 3. A mobile device can be used as a camera-based scanner. (a) Building a large map image document. (b) An example output of the page-scanning function.
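The stitching step itself is not detailed in the article. As a minimal, hypothetical sketch (assuming purely horizontal inter-frame motion with a known shift, e.g., from motion estimation), two frames can be composited onto a wider canvas:

```python
import numpy as np

def stitch_pair(left, right, dx):
    """Composite two grayscale frames into one canvas, assuming the right
    frame is the left frame translated dx pixels horizontally."""
    h, w = left.shape
    canvas = np.zeros((h, w + dx))
    canvas[:, :w] = left
    canvas[:, dx:dx + w] = right  # right frame overwrites the overlap
    return canvas

# Two overlapping crops of a synthetic scene, offset by 2 pixels
scene = np.arange(6 * 10, dtype=float).reshape(6, 10)
pano = stitch_pair(scene[:, :8], scene[:, 2:], dx=2)
print(pano.shape)  # -> (6, 10): the full scene is reassembled
```

A production panorama builder additionally handles vertical drift and rotation, blends the seam instead of overwriting it, and, as noted above, selects the best-quality frames before stitching.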
The real-time face and eye detector/tracker can be employed in autofocusing, exposure control, and image content indexing.4 In addition, it incorporates efficient software-based automatic red-eye removal, as shown in Figure 4. Robust face and eye location information provides another basis for motion-related user interfaces, since detecting the presence and motion of a human face in the camera view can be a powerful application enabler.
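The detector's internals are not described here; real-time face detectors typically scan the image with a classifier over windows at multiple positions and scales. The following toy sketch (the scanning structure is standard, but the trivial brightness "classifier" and all names are our illustrative assumptions) shows only that scaffolding:

```python
import numpy as np

def sliding_window_detect(image, score_fn, win=8, step=4, thresh=0.5):
    """Scan the image with a fixed-size window, score each patch with a
    classifier, and return the windows whose score exceeds a threshold."""
    detections = []
    h, w = image.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            s = score_fn(image[y:y + win, x:x + win])
            if s > thresh:
                detections.append((x, y, win, win, s))
    return detections

# Toy stand-in: a bright square on a dark background plays the "face",
# and mean patch intensity plays the classifier score.
img = np.zeros((32, 32))
img[12:20, 16:24] = 1.0
hits = sliding_window_detect(img, score_fn=lambda p: p.mean())
print([(x, y) for x, y, w, h, s in hits])  # -> [(16, 12)]
```

A practical embedded detector replaces the toy score with a learned classifier, evaluates an image pyramid to handle faces of different sizes, and merges overlapping detections.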
We have demonstrated that proper video analysis and machine intelligence can be used to create new mobile device applications. All the described solutions have been implemented on Nokia Nseries mobile phones. More information about these computer vision developments is available on our Web site, including a demonstration video.
Olli Silvén obtained his MS and PhD degrees in electrical engineering from the University of Oulu in 1982 and 1988, respectively. Since 1996, he has been professor of signal processing engineering. His research interests focus on machine vision and signal processing.
Markus Turtinen obtained his MS and PhD degrees in electrical engineering from the University of Oulu in 2002 and 2007, respectively. He is the co-owner of Visidon Ltd. and specializes in computer vision, pattern recognition, and embedded software.