Although multifunctional hand-held devices are proliferating rapidly, for reasons of cost they generally feature video services that merely offer resized content for mobile consumption.1 Experiments to assess image resolution and bit-rate requirements indicate that the primary defect of the reduced image resolution is loss of visual detail.2 Field sports are more problematic than music, news, and animation. Soccer, one of the most popular broadcast sports (see Figure 1), is a prime example.
Figure 1. Long-shot camera angles in a soccer video display on a Digital Multimedia Broadcasting device with loss of visual detail.
For some types of content, an intelligent display scheme would generate a magnified view of regions of interest (ROIs), defined as some portion of a scene to which the viewer will pay special attention. Determining ROIs is valuable for context-aware content adaptation, transcoding, and intelligent information management. In addition, it can serve as a first step to semantic-level interpretation.
Rather than relying on generic visual saliency, we take a domain-specific approach that exploits the particular attributes of soccer video. We distinguish the long shot, in which the ball and players are tiny when viewed on a small display, as in Figure 2(a), from the other camera angles. A long-shot scene would be more readily understood if an ROI were extracted and magnified. Other frames, whether mid-shots or close-ups, are easily understood as they are: see Figure 2(b) and (c). Thus, for improved display on a small LCD panel, the frames can be sorted into just two categories, and only part of the whole image need be extracted from the long-shot frames.
Figure 2. The long-shot frame (a) requires a magnified view of ROI, which is unnecessary for mid-shot frames (b) or close-ups (c).
Raising the quality of the viewing experience
Our proposed algorithm is outlined in Figure 3, in which detection of ground pixels is the first step in analysis.3 A block-based approach speeds up processing. Shot boundaries are readily detected by comparing ground blocks at the same locations in temporally consecutive frames. If the center part of a frame is occupied by non-ground blocks, the frame is not considered a long shot and needs no further processing before display. Long-shot frames are sent to the next stage, in which more precise analysis is required. It is reasonable to expect the ROI to be located around the ball, which is found via pixel-based segmentation and ball detection by means of predefined attributes. Detection determines the ROI window, which is positioned with the ball at its center.
Figure 3. A simple block diagram indicates the proposed algorithm.
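The long-shot test and the ball-centered ROI window can be illustrated with a minimal sketch. This is not the authors' implementation: the green-dominance rule in `is_ground_pixel`, the 16-pixel block size, the "middle half" definition of the center part, and every threshold are assumptions chosen only to make the idea concrete, and all function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), an assumed frame representation


def is_ground_pixel(p: Pixel) -> bool:
    # Assumed rule: a pixel is "ground" when green clearly dominates.
    r, g, b = p
    return g > r and g > b


def ground_block_map(frame: List[List[Pixel]], block: int = 16,
                     min_ratio: float = 0.5) -> List[List[bool]]:
    """Mark each block x block tile as ground (True) or non-ground (False)."""
    h, w = len(frame), len(frame[0])
    grid = []
    for by in range(h // block):
        row = []
        for bx in range(w // block):
            count = sum(
                is_ground_pixel(frame[y][x])
                for y in range(by * block, (by + 1) * block)
                for x in range(bx * block, (bx + 1) * block)
            )
            row.append(count >= min_ratio * block * block)
        grid.append(row)
    return grid


def is_long_shot(grid: List[List[bool]], max_non_ground: float = 0.25) -> bool:
    """A frame is a long shot unless non-ground blocks occupy its center."""
    rows, cols = len(grid), len(grid[0])
    # Assumed "center part": the middle half of the grid in each dimension.
    r0, r1 = rows // 4, rows - rows // 4
    c0, c1 = cols // 4, cols - cols // 4
    center = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return center.count(False) <= max_non_ground * len(center)


def roi_window(ball_x: int, ball_y: int, frame_w: int, frame_h: int,
               roi_w: int, roi_h: int) -> Tuple[int, int, int, int]:
    """Center the ROI on the ball, clamped so it stays inside the frame."""
    left = min(max(ball_x - roi_w // 2, 0), frame_w - roi_w)
    top = min(max(ball_y - roi_h // 2, 0), frame_h - roi_h)
    return left, top, roi_w, roi_h
```

Working on the block grid rather than on individual pixels keeps the per-frame decision cheap, which matters on a low-end device. The clamping in `roi_window` means the ball is only approximately centered when it is near the frame border.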
The process of ROI determination includes several additional considerations. First, window movement should be smooth to avoid erratic shifts in display. Second, the ball as followed by a camera most probably remains around the center area of the shot. Finally, an acceleration scheme is needed to take into account moments when the ball is flying through long-shot frames.
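One way to picture these three considerations together is a per-frame update of the ROI center. The sketch below is only an illustration under stated assumptions: the article does not say how smoothing or acceleration is done, so the exponential-smoothing form, the gain values, the distance threshold, and the fallback to the frame center when no ball is detected are all hypothetical, as is the function name `update_roi_center`.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def update_roi_center(prev: Point, ball: Optional[Point], frame_center: Point,
                      base_gain: float = 0.2, fast_gain: float = 0.6,
                      fast_dist: float = 40.0) -> Point:
    """Move the ROI center a fraction of the way toward its target.

    - Smoothness: only a fraction (gain) of the offset is applied per frame.
    - Center prior: with no detected ball, drift toward the frame center
      (an assumed reading of "the ball remains around the center area").
    - Acceleration: a larger gain is used when the target is far away,
      for example when the ball is flying across a long-shot frame.
    """
    target = ball if ball is not None else frame_center
    dx = target[0] - prev[0]
    dy = target[1] - prev[1]
    dist = (dx * dx + dy * dy) ** 0.5
    gain = fast_gain if dist > fast_dist else base_gain
    return (prev[0] + gain * dx, prev[1] + gain * dy)
```

With these assumed defaults, a ball 10 pixels away moves the window by 2 pixels in the next frame, while a ball 100 pixels away moves it by 60 pixels, so the window stays steady during slow play yet catches up quickly after a long pass.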
Implementation on a PDA
The proposed algorithm has been tested on Microsoft Embedded Visual Studio 4.0 for a PDA application. Specifications for the PDA are summarized in Table 1. To test performance, we used ten soccer video clips, running 120s and encoded in H.264 Baseline Profile. The average processing speed, including decoding and rendering on the display, is currently about 28 frames per second.
Table 1. The environment for PDA implementation does not require high-end hardware.
Conclusions and outlook
The system we have developed provides more comfortable viewing of soccer video on small-display devices by identifying the frames that require ROI extraction and magnifying the ROI. The system runs in near real time on a low-end PDA, and the code and algorithm are currently being optimized.
Our next step will be to combine additional intelligent schemes to further improve the viewing experience. We hope, for example, to relocate the score box when it disappears due to ROI extraction. Another challenge will be to extend the intelligent display scheme to other types of videos.
Figure 4. Screen shots from the demo system are shown with the original image frame and ROI window overlapped, left, and the ROI window magnified to fit the screen, right, in (a) close-up shot, (b) mid-shot, (c) long shot, and (d) long shot with ground shadow.
Visual Information Processing Laboratory, Information and Communications University
Changick Kim received his BS and MS degrees from Yonsei University and POSTECH. After receiving a PhD from the University of Washington, Seattle, in 2000, he joined Epson Research and Development Palo Alto Laboratory. Currently he is an assistant professor at the School of Engineering of Information and Communications University (ICU) in South Korea.
2. H. Knoche, J. D. McCarthy, M. A. Sasse, Can small be beautiful? Assessing image resolution requirements for mobile TV, Conf. Proc. 13th Ann. ACM Intl. Conf. Multimedia, pp. 829-838, ACM Press, New York, NY, USA, 2005.