Proceedings Volume 5685

Image and Video Communications and Processing 2005


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 14 March 2005
Contents: 16 Sessions, 108 Papers, 0 Presentations
Conference: Electronic Imaging 2005
Volume Number: 5685

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Video Surveillance I
  • Video Surveillance II
  • Image Coding
  • H.264 Video Coding
  • Investigative Image Processing
  • Video Coding
  • Scalable Video Coding
  • Selective Encryption for Image/Video
  • Object Tracking
  • Computer Vision
  • Media over Networks
  • Challenges in Real-World Media Streaming
  • Wyner-Ziv Video Coding
  • Video Processing I
  • Video Processing II
  • Poster Session
Video Surveillance I
Technologies for multimedia and video surveillance convergence
In this paper, we present an integrated system for video surveillance developed within the European IST WCAM project, using only standard multimedia and networking tools. The advantage of such a system, besides cost reduction and interoperability, is that it benefits from the fast technological evolution of video encoding and distribution tools.
WCAM: smart encoding for wireless surveillance
Jerome Meessen, C. Parisot, C. Le Barz, et al.
In this paper, we present an integrated system for smart encoding in video surveillance. This system, developed within the European IST WCAM project, aims at defining an optimized JPEG 2000 codestream organization based directly on the semantic content extracted by the video surveillance analysis module. The proposed system produces a fully compliant Motion JPEG 2000 stream in which region-of-interest data (typically mobile objects) is placed in a separate layer from regions of less interest (e.g. static background). First, the system performs a real-time unsupervised segmentation of mobiles in each frame of the video. The smart encoding module then uses these region-of-interest maps to construct a Motion JPEG 2000 codestream that allows optimized rendering of the video surveillance stream in low-bandwidth wireless applications, allocating more quality to mobiles than to the background. Our integrated system improves the coding representation of the video content without data overhead. It can also be used in applications requiring selective scrambling of regions of interest, as well as in any other application dealing with regions of interest.
WCAM: secured video surveillance with digital rights management
Yulen Sadourny, Vania Conan, Carlos Serrao, et al.
The WCAM project aims to provide an integrated system for secure delivery of video surveillance data over a wireless network, while remaining scalable and robust to transmission errors. To achieve these goals, the content is encoded in Motion-JPEG2000 and streamed with a specific RTP protocol encapsulation to prevent the loss of packets containing the most essential data. Protection of the video data is performed at the content level using the standardized JPSEC syntax, along with flexible encryption of quality layers or resolution levels. This selective encryption respects the JPEG2000 structure of the stream, not only ensuring end-to-end ciphered delivery, but also enabling dynamic content adaptation within the wireless network (quality of service, adaptation to the user's terminal). DRM modules from the OPENSDRM platform will be added to manage all authenticated peers on the WLAN (from end-users to cameras), as well as to manage the rights to conditionally display the video data. This integrated architecture addresses several security problems such as data encryption, integrity and access control. Using several protection layers, the level of confidentiality can depend on both the content characteristics and the user rights, thus also addressing the critical issue of privacy.
Seamless wireless networking for video surveillance applications
The EU FP6 WCAM (Wireless Cameras and Audio-Visual Seamless Networking) project aims to study, develop and validate a wireless, seamless and secured end-to-end networked audio-visual system for video surveillance and multimedia distribution applications. This paper describes the video transmission aspects of the project, with contributions in the areas of H.264 video delivery over wireless LANs. The planned demonstrations under WCAM include transmission of H.264 coded material over 802.11b/g networks with TCP/IP and UDP/IP being employed as the transport and network layers over unicast and multicast links. UDP based unicast and multicast transmissions pose the problem of packet erasures while TCP based transmission is associated with long delays and the need for a large jitter buffer. This paper presents measurement data that have been collected at the trial site along with analysis of the data, including characterisation of the channel conditions as well as recommendations on the optimal operating parameters for each of the above transmission scenarios (e.g. jitter buffer sizes, packet error rates, etc.). Recommendations for error resilient coding algorithms and packetisation strategies are made in order to moderate the effect of the observed packet erasures on the quality of the transmitted video. Advanced error concealment methods for masking the effects of packet erasures at the receiver/decoder are also described.
Smart video surveillance system preserving privacy
In this paper, we present a smart video surveillance system based on standard technologies and wired or wireless IP networking. The camera performs the following operations. First, a video analysis module detects suspicious events and identifies regions of interest in the scene. The video is then compressed using JPEG 2000. The resulting compressed stream is signed and scrambled for the purposes of data integrity verification and confidentiality. Finally, the stream is protected for robustness to transmission errors. The system is designed to protect the privacy of people under surveillance. More specifically, during the analysis stage, people in the scene are detected by change detection or face detection techniques. Scrambling is then applied only to the corresponding regions. Furthermore, the amount of distortion can be controlled by restricting scrambling to some resolution levels or quality layers. In this way, people under surveillance cannot be recognized, but the remainder of the scene is clear. The encryption key is kept under the control of judicial authorities, who can grant authorization to unlock the protection and view the whole scene. Consequently, this system successfully addresses the loss-of-privacy issue.
Video Surveillance II
Systematic acquisition of audio classes for elevator surveillance
We present a systematic framework for arriving at audio classes for the detection of crimes in elevators. We use a time series analysis framework to analyze the low-level features extracted from the audio of elevator surveillance content to perform an inlier/outlier based temporal segmentation. Since suspicious events in elevators are outliers against a background of usual events, such a segmentation helps bring out these events without any a priori knowledge. Then, by performing automatic clustering on the detected outliers, we identify consistent patterns for which we can train supervised detectors. We apply the proposed framework to a collection of elevator surveillance audio data to systematically acquire audio classes such as banging, footsteps, non-neutral speech and normal speech. Based on the observation that the banging and non-neutral speech classes are indicative of suspicious events in the elevator data set, we are able to detect all of the suspicious activities without any misses.
Object tracking in low-frame-rate video
Fatih Porikli, Oncel Tuzel
In this paper, we present an object detection and tracking algorithm for low-frame-rate applications. We extend the standard mean-shift technique so that it is not limited to a single kernel but uses multiple kernels centered around high-motion areas obtained by change detection. We also improve the convergence properties of mean-shift by integrating two additional likelihood terms. Our simulations demonstrate the effectiveness of the proposed method.
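The standard mean-shift iteration that this work extends can be illustrated with a minimal mode-seeking sketch on 2-D points. This is not the authors' multi-kernel tracker; the Gaussian kernel, bandwidth and synthetic data below are illustrative assumptions:

```python
import numpy as np

def mean_shift(points, start, bandwidth=1.0, n_iter=50, tol=1e-6):
    """Standard mean-shift: repeatedly move the estimate to the
    kernel-weighted mean of the points until it stops moving."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        d2 = np.sum((points - x) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)   # Gaussian kernel weights
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Two synthetic clusters; a seed started near each one converges to its mode.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([0, 0], 0.1, (100, 2)),
                 rng.normal([5, 5], 0.1, (100, 2))])
m1 = mean_shift(pts, [1.0, 1.0])   # converges near (0, 0)
m2 = mean_shift(pts, [4.0, 4.0])   # converges near (5, 5)
print(m1, m2)
```

The paper's extension replaces the single seed with multiple kernels placed on high-motion areas, so convergence no longer depends on small inter-frame displacement.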
Embedded vision platform for video surveillance systems
Axel Techmer, Hans-Martin Bluethgen, Cyprian Grassmann, et al.
Automatic video surveillance systems are used in many different fields of application, such as intelligent vehicles, intelligent highways and security tasks. An automatic and correct interpretation of video sequences is based on the detection, tracking and classification of various objects under highly diverse conditions. This requires highly sophisticated algorithms as well as high computational performance to fulfil real-time constraints. To this end, Infineon Technologies AG is developing a fully programmable, scalable multi-processor architecture optimized for video processing, which provides processing performance similar to that of current PCs but at much lower cost, lower power consumption and smaller physical size. This architecture supports data parallelism for low-level image processing, task parallelism at the medium and high levels, as well as control-oriented processing. A cycle-accurate, virtual prototype of the architecture is available. A library of optimized image processing functions supports comfortable application development and the reuse of existing application software. Besides the implementation of standard low-level operators, new efficient approaches for motion estimation, object detection and tracking are developed and tested in applications for intelligent vehicle and intelligent highway scenarios. The integration of these application-specific tasks into the image processing library results in a powerful embedded vision platform for video surveillance systems.
Modeling of pedestrian motion for recognition
Good pedestrian classifiers that analyze static images for the presence of pedestrians already exist. However, even a low false-positive error rate is sufficient to flood a real system with false warnings. We address the problem of pedestrian motion (gait) modeling and recognition using sequences of images rather than static individual frames, thereby exploiting information in the dynamics. We use two different representations, and corresponding distances, for gait sequences. In the first, a gait is represented as a manifold in a lower-dimensional space corresponding to gait images. In the second, a gait image sequence is represented as the output of a dynamical system whose underlying driving process is an action such as walking or running. We examine distance functions corresponding to these representations. For dynamical systems, we formulate distances derived from the parameters of the system, taking into account both the structure of the output space and the dynamics within it. Given appearance-based models, we present results demonstrating the discriminative power of the proposed distances.
Cooperative multisensor system for real-time face detection and tracking in uncontrolled conditions
Luca Marchesotti, Stefano Piva, Andrea Turolla, et al.
The presented work describes an innovative architecture for multi-sensor distributed video surveillance applications. The aim of the system is to track moving objects in outdoor environments with a cooperative strategy exploiting two video cameras. The system can also focus its attention on the faces of detected pedestrians, collecting snapshot frames of face images by segmenting and tracking them over time at different resolutions. The system is designed to employ two video cameras in a cooperative client/server structure: the first camera monitors the entire area of interest and detects the moving objects using change detection techniques. The detected objects are tracked over time and their position is indicated on a map representing the monitored area. The objects' coordinates are sent to the server sensor in order to point its zooming optics towards the moving object. The second camera tracks the objects at high resolution. Like the client camera, this sensor is calibrated, and the position of the object detected in the image-plane reference system is translated into coordinates referred to the same area map. In the map's common reference system, data fusion techniques are applied to achieve a more precise and robust estimation of the objects' tracks and to perform face detection and tracking. The work's novelty and strength reside in the cooperative multi-sensor approach, in the high-resolution long-distance tracking, and in the automatic collection of biometric data such as a person's face clip for recognition purposes.
A software for complete calibration of multicamera systems
We present software for the complete metric calibration of synchronized multicamera setups. The minimum is three cameras, but there is no upper limit. The method is fully automatic, and a freely moving bright spot is the only calibration object. No camera pre-calibration is required. The software computes the complete set of intrinsic camera parameters, including the parameters of non-linear distortion, as well as the external orientation of the cameras in one common coordinate system. The software is written in Matlab, runs in non-interactive mode and produces both textual and intuitive graphical output. It is very robust and may be run with the same set of parameters in different setups. The software is free. We suggest a trick for adapting a very ordinary laser pointer to be a useful calibration object. Projections of this calibration object are localized with sub-pixel accuracy. An average reprojection error of around 1/5 pixel was reached in well-synchronized multicamera setups, even for wide-angle lenses with severe non-linear distortion. We show several virtual reality and telepresence applications demonstrating the usefulness of fully calibrated multicamera setups.
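For reference, the reprojection error quoted in such calibration results is the average pixel distance between observed image points and the points predicted by the calibrated model. A minimal pinhole-model sketch (the helper names, intrinsics and point coordinates below are made-up illustrations, and lens distortion is omitted):

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3-D points X (N x 3) into pixel coordinates."""
    Xc = X @ R.T + t              # world -> camera coordinates
    x = Xc[:, :2] / Xc[:, 2:3]    # perspective division
    return x @ K[:2, :2].T + K[:2, 2]

def mean_reprojection_error(K, R, t, X, observed):
    """Average Euclidean distance between projected and observed pixels."""
    diff = project(K, R, t, X) - observed
    return np.mean(np.linalg.norm(diff, axis=1))

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # toy intrinsics
R, t = np.eye(3), np.array([0.0, 0.0, 0.0])                  # toy pose
X = np.array([[0.1, 0.2, 2.0], [-0.3, 0.1, 3.0], [0.0, -0.2, 2.5]])
obs = project(K, R, t, X) + 0.2       # shift every observation by 0.2 px
err = mean_reprojection_error(K, R, t, X, obs)
print(round(err, 3))                  # 0.2 * sqrt(2) ≈ 0.283
```

The reported ~1/5 pixel figure is this quantity averaged over all cameras and all detections of the moving bright spot.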
JPEG2000-based scalable summary for understanding long video surveillance sequences
Jerome Meessen, Jean-Francois Delaigle, Li-Qun Xu, et al.
This paper presents a new method for remote and interactive browsing of long video surveillance sequences. The solution is based on interactive navigation in JPEG 2000 coded mega-images. We assume that the video 'key-frames' are available through automatic detection of scene changes or abnormal behaviors. These key-frames are concatenated in raster scanning order, forming a very large 2D image, which is then compressed with JPEG 2000 to produce a scalable video summary of the sequence. We then exploit a mega-image navigation platform, designed in full compliance with JPEG 2000 part 9 "JPIP", to search and visualize desirable content based on client requests. The flexibility offered by JPEG 2000 allows highlighting the key-frames corresponding to the required content within a low-quality, low-resolution version of the whole summary. Such fine-grained scalability is a unique feature of our proposed JPEG 2000 video summaries. The possibility to visualize key-frames of interest and play back the corresponding video shots within the context of the whole sequence enables the user to understand the temporal relations between semantically similar events. It is thus particularly suited to analyzing complex incidents consisting of many successive events spread over a long period.
Image Coding
Reduced memory multi-layer multi-component rate allocation for JPEG2000
Hyperspectral images are acquired incrementally in a “push-broom” fashion by on-board sensors. Since these images are highly voluminous, buffering an entire image before compression requires a large buffer and causes latency. Incremental compression schemes work on small chunks of raw data as soon as they are acquired and help reduce buffer memory requirements. However, incremental processing leads to large variations in quality across the reconstructed image. The solution to this problem lies in using carefully designed rate control algorithms. We propose two such “leaky bucket” rate control algorithms that can be employed when incrementally compressing hyperspectral images using the JPEG2000 compression engine. They are the Multi-Layer Sliding Window Rate Controller (M-SWRC) and the Multi-Layer Extended Sliding Window Rate Controller (M-EWRC). Both schemes perform rate control using the fine granularity afforded by JPEG2000 bitstreams. The proposed algorithms have low memory requirements since they buffer compressed bitstreams rather than raw image data. Our schemes enable SNR scalability through the use of quality layers in the codestream and produce JPEG2000 compliant multi-layer codestreams at a fraction of the memory used by conventional schemes. Experiments show that the proposed schemes provide significant reduction in quality variation with no loss in mean overall PSNR performance.
Integrated lossy, near-lossless, and lossless compression of medical volumetric data
We propose an integrated, wavelet based, two-stage coding scheme for lossy, near-lossless and lossless compression of medical volumetric data. The method presented determines the bit rate for the lossy layer during encoding and without any iteration. It is in the spirit of "lossy plus residual coding" and consists of a wavelet-based lossy layer followed by arithmetic coding of the quantized residual to guarantee a given pixel-wise maximum error bound. We focus on the selection of the optimum bit rate for the lossy coder to achieve the minimum total (lossy plus residual) bit rate in the near-lossless and the lossless cases. We propose a simple and practical method to estimate the optimal bit rate online and provide a theoretical justification for it. Experimental results show that the proposed scheme provides improved, embedded lossy, and lossless performance competitive with the best results published so far in the literature, with the added feature of near-lossless coding.
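The pixel-wise maximum error bound in "lossy plus residual coding" comes from quantizing the residual with a uniform quantizer of step 2δ+1, which bounds the reconstruction error by δ. A minimal sketch (the random "lossy layer" and array sizes are illustrative stand-ins, not the paper's wavelet coder):

```python
import numpy as np

def near_lossless_residual(original, lossy, delta):
    """Quantize the residual (original - lossy reconstruction) with a
    uniform quantizer of step 2*delta + 1; the decoder-side result
    then differs from the original by at most delta per pixel."""
    step = 2 * delta + 1
    residual = original.astype(int) - lossy.astype(int)
    q = np.round(residual / step).astype(int)      # indices to entropy-code
    reconstruction = lossy.astype(int) + q * step  # decoder-side result
    return q, reconstruction

rng = np.random.default_rng(1)
orig = rng.integers(0, 256, (8, 8))
# Stand-in for a lossy layer: the original corrupted by errors up to +/-10.
lossy = np.clip(orig + rng.integers(-10, 11, (8, 8)), 0, 255)
delta = 2
q, rec = near_lossless_residual(orig, lossy, delta)
print(np.max(np.abs(rec - orig)))   # never exceeds delta = 2
```

Setting delta = 0 (step 1) makes the same machinery lossless, which is why a single framework covers both cases.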
Adaptive SPIHT for image coding based on curved wavelet transform
Liang Zhang, Demin Wang, Andre Vincent
The curved wavelet transform performs 1-D filtering along curves and exploits orientation features of edges and lines in an image to improve the compactness of the wavelet transform. This paper investigates the issue of efficient data organization and representation of the curved wavelet coefficients. We present an adaptive zero-tree structure that exploits the cross-subband similarity of the curved wavelet transform. The child positions in the adaptive zero-tree structure are not restricted to a square of 2x2 pixels; they vary with the curves along which the wavelet transform is performed. Five child patterns have been determined according to different combinations of curve orientations. A new image coder, using the curved wavelet transform, is then developed based on this adaptive zero-tree structure and the set partitioning technique. Experimental results using synthetic and natural images show the effectiveness of the proposed adaptive zero-tree structure for encoding of the curved wavelet coefficients. The coding gain of the proposed coder can be as high as 1.2 dB in terms of PSNR compared to the SPIHT coder.
H.264 Video Coding
Fast intermode prediction for P slices in the H.264 video coding standard
Ming Yuan Yang, Christos Grecos
We propose an inter-mode decision scheme for P slices in the H.264 video coding standard. Our scheme initially exploits neighbourhood information jointly with a set of skip mode conditions for enhanced skip mode decision. It subsequently performs inter-mode decision for the remaining macroblocks by using a gentle set of smoothness constraints. For RD performance very close to the standard, we achieve 35-58% reduction in run times and 33-55% reduction in CPU cycles for both the rate-controlled and the non-rate-controlled versions of H.264. Compared to other work that has been proposed as input to the standard, gains of 9-23% in run times and 7-22% in CPU cycles are also reported.
Stereo-view video coding using H.264 tools
Shijun Sun, Shawmin Lei
The amount of data that has to be stored and transmitted for stereo-view video applications can be double that of conventional mono-view video applications if a mono-view video coding method is applied directly. Although not designed for stereo-view video coding, the H.264 coding tools can be arranged to take advantage of the correlations between the pair of views of a stereo-view video, and provide very reliable and efficient compression performance as well as stereo/mono-view scalability. This paper introduces methods for coding stereo-view video sequences efficiently using the H.264 coding tools, such as the interlace-coding tools and the sub-sequence coding concept. With the help of the stereo video Supplemental Enhancement Information (SEI) message defined in the H.264 Fidelity Range Extensions (FRExt), a decoder can easily synchronize the views, and a streaming server or a decoder can easily detect the scalability of a coded stereo video bitstream. Coding performances are shown in the paper for Main and Baseline profiles using both scalable and non-scalable coding options. The scalable coding options that can be signaled by the H.264 SEI messages can have coding performance similar to or better than that of proprietary system configurations for non-scalable coding.
Multiple global affine motion model for H.264 video coding with low bit rate
Xiaohuan Li, Joel R. Jackson, Aggelos K. Katsaggelos, et al.
A multiple global affine motion model is proposed for low bit rate video compression. Block-wise motion segmentation is first performed with the number of motion objects (MOs) L predefined. The affine motion models for the multiple MOs are estimated and coded in the frame header. The scaling parameters a1, a2, a4 and a5 are coded with a 4-dimensional vector quantizer (VQ), whose 16 most recently used code words are maintained online and searched for a VQ match, with the 300-word main code book stored offline. The translational parameters a3 and a6 are coded predictively, as a classical motion vector. L new macro-block modes are added to the standard's list of 7 intra and inter modes. No segmentation information is transmitted, since the mode already indicates whether one of the affine modes was selected by Lagrangian rate-distortion optimization. A metric S is defined to measure the locality of the motion; it disables the use of affine models when a threshold is surpassed. Simulation shows that more than 50% of the MBs choose one of the affine modes. When bandwidths of 100 kbps or lower are available, the proposed codec not only saves 1-18% in bit rate, but also enhances error resilience in multiple-slice frames and notably reduces blocking artifacts.
Synchronous backward error tracking algorithm for H.264 video coding
Ming-Kuang Tsai, Tien-Hsu Lee, Jong-Tzy Wang, et al.
The objective of this paper is to develop a robust error-resilient algorithm, called Synchronous Backward Error Tracking (SBET), to completely terminate error propagation effects in error-prone environments for H.264 video coding. The motivation is that if the state of the decoder is available to the encoder, i.e., the state of the encoder can synchronize to the state of the decoder, the effect of error propagation can be entirely terminated, because all predictions are based on the same references. Therefore, we assume that a feedback channel is available and that the encoder can be made aware of the decoder's error concealment by any external means. The pixel-based Precise Backward Error Tracking (PBET) is modified and utilized to track the error locations and reconstruct the state of the decoder in the encoder. The proposed method only involves memory access and simple addition and multiplication operations for the error-contaminated pixels to achieve encoder-decoder synchronization. Simulation results show that the rate-distortion performance of the proposed algorithm is always better than that of the conventional algorithms. Specifically, SBET outperforms PBET by up to 1.21 dB under a 3% slice error rate for the QCIF-format Foreman sequence. In addition, because forced INTRA refreshing is not required, bursts in bit rate can be avoided.
Rate-distortion characteristics of MPEG-2 and H.264
Recent advances in digital video coding tools have led to the introduction of the H.264 video coding standard, which promises increased visual quality and reduced bandwidth. In this paper, we analyze and compare MPEG-2 and H.264 video compression methods. Although H.264 is similar to MPEG-2 in that both are hybrid coders that use motion compensation, H.264 also includes advanced features such as improved entropy encoding, in-loop filtering of reference frames, flexible macroblock sizing, and multiple reference frame capability. Many experiments were performed to illustrate the coding gains of each feature in H.264 as well as to compare H.264 to MPEG-2. Quality was measured using two different objective video metrics: peak signal-to-noise ratio and the Structural Similarity Index. A variety of natural video test sequences were used with varying resolutions and data rates. TM5 and JM reference software were used to create MPEG-2 and H.264 compressed bitstreams. Results for all experiments show significant coding gain with H.264 versus MPEG-2 when compressing natural video sequences, especially at low data rates.
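Of the two objective metrics used in such comparisons, peak signal-to-noise ratio is the simpler; it is fully determined by the mean squared error between the reference and the compressed frame. A minimal implementation (the toy 4x4 images are illustrative):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
test = ref + 5.0                     # uniform error of 5 gray levels -> MSE = 25
print(round(psnr(ref, test), 2))     # 10*log10(255^2 / 25) ≈ 34.15
```

The Structural Similarity Index, by contrast, compares local luminance, contrast and structure statistics, which is why the two metrics can rank codecs differently on the same sequences.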
Investigative Image Processing
Did Georges de la Tour use optical projections while painting Christ in the Carpenter’s Studio?
Recently it has been theorized that some painters as early as 1420 used concave mirrors (and, later, converging lenses) to project real inverted images onto their supports, which they then traced and painted over. We consider a specific painting adduced as evidence for this bold theory, the Lorraine Baroque master Georges de la Tour's Christ in the Carpenter's Studio (1645). We perform analyses of the reflections and shadows -- “cast” shadows and “form” shadows -- to infer the source(s) of illumination. We find compelling evidence that this source is the candle flame depicted within the painting and held by Christ. We find it implausible that the source is direct solar illumination, which has the intensity demanded by the projection theory, or artificial illumination as hypothesized by the theory's proponents. Similar analyses of several other paintings by de la Tour uniformly support the conclusion that the illumination is small and artificial within the space of the tableau (i.e., a candle), not extremely powerful illumination from outside the tableau. We created a very simple computer graphics model to test and illustrate part of our conclusions. Our research is the first application of technical shadow analysis to the question of whether artists as early as the 15th century used optical projections when painting. A careful reading of the historical record of de la Tour's working methods supports our technical results and extends the growing body of image-analytic methods and historical sources rebutting the theory.
Computer-assisted handwriting style identification system for questioned document examination
Sung-Hyuk Cha, Sungsoo Yoon, Charles C. Tappert, et al.
Handwriting originates from a particular copybook style such as Palmer or Zaner-Bloser that one learns in childhood. Since questioned document examination plays an important investigative and forensic role in many types of crime, it is important to develop a system that helps objectively identify a questioned document’s handwriting style. Here, we propose a computer vision system that can assist a document examiner in the identification of a writer’s handwriting style and therefore the origin or nationality of an unknown writer of a questioned document. We collected 33 Roman alphabet copybook styles from 18 countries. Each character in a questioned document is segmented and matched against all of the 33 handwriting copybook styles. The more characters present in the questioned document, the higher the accuracy observed.
Segmentation of cartridge cases based on illumination and focus series
Cartridge cases are important forensic specimens for the identification of weapons. The illumination conditions in the areas of the firing pin marks and the breech face marks are very different and have to be treated separately to achieve an image quality appropriate for visual inspection. Furthermore, not only the comparison but also the detection of the different and independent forensic marks should be automated. Both problems lead to the task of segmenting the different parts of the cartridge case bottom. In this paper, two automated approaches for the segmentation of cartridge cases are investigated and compared. The aim of the segmentation is the detection of the cartridge case border, the primer, the firing pin mark and, additionally, the letters around the primer. The first approach uses images obtained under systematically varied illumination conditions. After a preprocessing step, circle detection is applied to find the circular structures. The analysis of illumination series, combined with a connected-components labeling method, detects the letters. In the second approach, the depth-from-focus method is used to obtain 2½-D data. This data is segmented by applying a plane estimation technique, which directly results in the detection of the letters. Afterwards, a circle detection algorithm identifies the parameters of the circular structures. With the introduced methods it is possible to optimize the illumination in order to achieve higher contrast both for the striation marks on the cartridge case surface and for the indentation of the firing pin, independently. The improved image quality helps the examiner in identifying weapons and will help to improve automated comparison strategies.
Semiautomatic reconstruction of strip-shredded documents
Patrick De Smet, Johan De Bock, Wilfried Philips
Until recently, the forensic or investigative reconstruction of shredded documents has always been dismissed as an important but unsolvable problem. Manual reassembly of the physical remnants can always be considered, but for large numbers of shreds this can quickly become an intractable task that requires vast amounts of time and/or personnel. In this paper we propose and discuss several image processing techniques that can be used to enable the reconstruction of strip-shredded documents stored within a database of digital images. The technical content of this paper mainly revolves around the use of feature-based matching and grouping methods for classifying the initial database of shreds, and the subsequent procedure for computing more accurate pairing results for the obtained classes of shreds. Additionally, we discuss the actual reassembly of the different shreds on top of a common image canvas. We illustrate our algorithms with example matching and reconstruction results obtained for a real shred database containing various types of shredded document pages. Finally, we briefly discuss the impact of our findings on secure document management strategies and the possibilities for applying the proposed techniques within the context of forensic investigation.
Determining digital image origin using sensor imperfections
In this paper, we demonstrate that it is possible to use the sensor's pattern noise for digital camera identification from images. The pattern noise is extracted from the images using a wavelet-based denoising filter. For each camera under investigation, we first determine its reference noise, which serves as a unique identification fingerprint. This could be done using the process of flat-fielding, if we have the camera in our possession, or by averaging the noise obtained from multiple images, which is the option taken in this paper. To identify the camera from a given image, we consider the reference pattern noise as a high-frequency spread-spectrum watermark, whose presence in the image is established using a correlation detector. Using this approach, we were able to identify the correct camera out of 9 cameras without a single misclassification for several hundred images. Furthermore, it is possible to perform reliable identification even from images that underwent subsequent JPEG compression and/or resizing. These claims are supported by experiments on 9 different cameras, including two cameras of exactly the same model (Olympus C765).
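The detection pipeline described above — average noise residuals into a reference pattern, then correlate a query image's residual against it — can be sketched as follows. Note that the box-filter denoiser and the synthetic pattern noise below are simplifying stand-ins for the paper's wavelet denoiser and a real sensor's PRNU:

```python
import numpy as np

def noise_residual(img):
    """Residual after crude 3x3 box-filter denoising; the residual
    retains the high-frequency fixed pattern noise. (A stand-in for
    the wavelet-based filter used in the paper.)"""
    pad = np.pad(img, 1, mode='edge')
    smooth = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
                 for i in range(3) for j in range(3)) / 9.0
    return img - smooth

def correlation(a, b):
    """Normalized correlation used as the detection statistic."""
    a = a - a.mean(); b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

rng = np.random.default_rng(2)
prnu = rng.normal(0, 2, (32, 32))                 # simulated fixed pattern noise
images = [rng.normal(128, 5, (32, 32)) + prnu for _ in range(30)]
reference = np.mean([noise_residual(im) for im in images], axis=0)

query = rng.normal(128, 5, (32, 32)) + prnu       # image from the same camera
other = rng.normal(128, 5, (32, 32))              # image from another camera
c_same = correlation(noise_residual(query), reference)
c_other = correlation(noise_residual(other), reference)
print(c_same > c_other)   # True: the query shares the camera's pattern noise
```

In practice the decision is made by thresholding the correlation, with the threshold chosen from the statistics of known non-matching images.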
Geometrical methods for accurate forensic videogrammetry: Part I. Measuring with non-point features
Lenny Rudin, Ping Yu, Theo Papadopoulo
In forensic applications of photogrammetric techniques, any measurable information is a precious commodity. In this paper, we describe the development of a new geometrical approach to photogrammetric measurements. While classical photogrammetry requires knowledge of point measurements, our approach exploits other geometrical constraints that are often available in photo and video forensic evidence. In this Part I paper, the first in the series, we introduce line constraints and demonstrate algorithms that combine point and line measurements.
Video Coding
MPEG motion picture coding with long-term constraint on distortion variation
A highly desirable feature in storage video applications is uniform video quality. Variable bit rate (VBR) coding has the potential to produce nearly constant quality throughout an entire movie. This can be formulated as a bit allocation problem with a long-term constraint on distortion variation. We consider optimal bit allocation with multiple constraints, including disk capacity and distortion bounds on the individual frames. We derive the theoretical optimality conditions and propose a practical iterative solution based on Lagrangian methods. While average distortion and distortion variation cannot both be minimized simultaneously for a given bit budget, the proposed algorithms efficiently trade off between the two goals. The computational complexity of obtaining exact rate-distortion (R-D) functions for real movies is addressed by a statistical R-D model proposed in this work. The model consists of a rate-quantization (R-Q) function and the corresponding distortion-quantization (D-Q) function. A novel two-pass MPEG-2 VBR encoder based on the proposed algorithms is developed for coding with long-term, nearly constant quality. Experimental results are promising: the encoder effectively achieves the fit-to-disc target while controlling objective quality variation. By incorporating basic subjective coding techniques into the encoder, significant visual quality improvement was observed during preliminary subjective tests.
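A toy version of the constant-quality idea can make the constraint concrete. The sketch below (not the paper's Lagrangian iteration) allocates bits under the classical Gaussian model D(R) = sigma^2 * 2^(-2R) by bisecting on a single distortion target shared by all frames; all names and the model are illustrative:

```python
import math

def constant_quality_allocation(variances, budget, tol=1e-9):
    """Find the common distortion target D that meets the bit budget;
    each frame then needs R_i = max(0, 0.5*log2(sigma_i^2 / D)) bits."""
    def bits_needed(d):
        return sum(max(0.0, 0.5 * math.log2(v / d)) for v in variances)
    lo, hi = 1e-12, max(variances)
    while hi - lo > tol * hi:
        mid = math.sqrt(lo * hi)      # geometric bisection, since D > 0
        if bits_needed(mid) > budget:
            lo = mid                  # target too ambitious: raise D
        else:
            hi = mid
    d = hi
    return d, [max(0.0, 0.5 * math.log2(v / d)) for v in variances]
```

Frames with higher complexity (variance) automatically receive more bits, while every frame lands at the same distortion.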
Using game theory for perceptual tuned rate control algorithm in video coding
Jiancong Luo, Ishfaq Ahmad
This paper proposes a game-theoretic rate control technique for video compression. Using a cooperative gaming approach, which has been utilized in several branches of the natural and social sciences because of its enormous potential for solving constrained optimization problems, we propose a dual-level scheme to optimize the perceptual quality while guaranteeing “fairness” in bit allocation among macroblocks. At the frame level, the algorithm allocates target bits to frames based on their coding complexity. At the macroblock level, the algorithm distributes bits to macroblocks by defining a bargaining game: macroblocks play cooperatively, competing for shares of the resource (bits) to optimize their quantization scales while taking the perceptual properties of the human visual system into account. Since the whole frame is an entity perceived by viewers, macroblocks compete cooperatively under the global objective of achieving the best quality within the given bit constraint. The major advantage of the proposed approach is that the cooperative game leads to an optimal and fair bit allocation strategy based on the Nash Bargaining Solution. Another advantage is that it allows multi-objective optimization with multiple decision makers (the macroblocks). The simulation results confirm the algorithm's ability to achieve an accurate bit rate with good perceptual quality, and to maintain a stable buffer level.
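A minimal two-macroblock illustration of the Nash Bargaining Solution by grid search follows; the logarithmic utilities and all names are hypothetical stand-ins for the paper's perceptual model:

```python
import math

def nash_split(u1, u2, d1, d2, budget, steps=1000):
    """Split `budget` bits between two macroblocks so as to maximize
    the Nash product (u1(b) - d1) * (u2(budget - b) - d2), where d1, d2
    are the disagreement (minimum acceptable) utilities."""
    best, best_b = float("-inf"), None
    for k in range(steps + 1):
        b = budget * k / steps
        g1, g2 = u1(b) - d1, u2(budget - b) - d2
        if g1 > 0 and g2 > 0 and g1 * g2 > best:
            best, best_b = g1 * g2, b
    return best_b
```

With identical utilities and disagreement points the bargaining outcome is the fair, equal split; raising one player's disagreement point shifts bits toward that player.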
Analysis of motion-compensated temporal filtering versus motion-compensated prediction
In previous work, a performance bound for multi-hypothesis motion-compensated prediction (MCP) was derived based on a video signal model with independent Gaussian displacement errors. A simplified form of that result is derived in this work. A performance bound for optimal motion-compensated temporal filtering (MCTF) has also been proposed, based on a signal model with correlated Gaussian displacement errors; there, the optimal MCTF (KLT) was found to perform better than one-hypothesis MCP but not better than infinite-hypothesis MCP. In this work, we derive the performance of multi-hypothesis MCP again, based on the signal model with correlated Gaussian displacement errors. With the same signal model, we now find that optimal MCTF has the same performance as infinite-hypothesis MCP.
Rate distortion performance of leaky prediction layered video coding: theoretic analysis and results
Yuxin Liu, Josep Prades-Nebot, Gregory W. Cook, et al.
Leaky prediction layered video coding (LPLC) incorporates a scaled version of the enhancement layer in the motion compensated prediction (MCP) loop, using a leaky factor between 0 and 1 to balance coding efficiency against error resilience. In this paper, we address the theoretical analysis of LPLC using two different approaches: one based on rate-distortion theory and one based on quantization noise modeling. In both approaches, an alternative block diagram of LPLC is first developed, which significantly simplifies the analysis. We consider two scenarios of LPLC, with and without prediction drift in the enhancement layer, and obtain closed-form rate-distortion functions for both scenarios. We evaluate both closed-form expressions, which are shown to conform to the operational results.
Optimization of transform coefficient selection and motion vector estimation considering interpicture dependencies in hybrid video coding
Considering inter-picture dependencies when selecting transform coefficient levels in hybrid video coding can be done by formulating the decoding process as a linear signal model and solving a quadratic program. The basic method assumes motion estimation and quantization parameters as given and then selects the transform coefficient levels. However, when motion vectors are determined in advance, motion estimation must be conducted on uncoded reference pictures, which is known to deliver inferior results compared to motion estimation on decoded reference pictures. In this work, we extend the basic method to the case where motion estimation considers decoded reference pictures. We propose an approach that iterates between transform coefficient selection and motion estimation, and find that a simple two-pass iteration works reasonably well. Our simulation results using an H.264/AVC-conforming encoder show coding gains of up to 1 dB in comparison to the quantization method specified in the test model of H.264/AVC.
Scalable Video Coding
Scalable motion vector coding for MC-EZBC
In the scalable video coder MC-EZBC, no scalability was provided for motion vectors, and this greatly impacts its performance when scaling down to very low bit rates and resolutions. Here we enhance MC-EZBC with scalable motion vector coding using the context-based adaptive binary arithmetic coder (CABAC). Both a layered structure for motion vector coding and an alphabet general partition (AGP) of the motion vector symbols are employed for SNR and resolution scalability of the motion vector bitstream. With these two new features and a careful arrangement of the motion vector bitstream output from the existing MC-EZBC, we obtain temporal, SNR, and resolution scalability for motion vectors. This significantly improves both visual and objective performance at low bit rates and resolutions, with only a slight PSNR loss (about 0.05 dB), and no detectable visual loss, at high bit rates.
Generic modeling of complexity for motion-compensated wavelet video decoders
Motion-compensated wavelet video coders have been shown to exhibit good coding efficiency over a large range of bit-rates, in addition to providing spatial and temporal scalability. While the rate-distortion performance provided by these coders is well understood, their complexity scalability behavior is not well studied. In this paper, we first analyze the complexity of such wavelet video coders and determine what the critical components are and how they vary with the transmission bit-rate. Subsequently, we construct generic complexity models for the critical components of scalable wavelet video decoders, so that optimal rate-distortion-complexity bitstreams can be created that fulfill not only various network constraints but also resource constraints such as memory and power. The generic complexity metrics are independent of the hardware architecture and implementation details of the decoders, and capture both the time-varying video content characteristics and the corresponding encoding parameters. The generic complexity measures can be converted into platform-specific complexity measures, such as execution time, with limited overhead at runtime. Preliminary results show that the proposed models can predict the complexity of the various components of wavelet video decoders with high accuracy.
Importance of motion in motion-compensated temporal discrete wavelet transforms
Discrete wavelet transforms (DWTs) applied temporally under motion compensation (MC) have recently become a very powerful tool in video compression, especially when implemented through lifting. A recent theoretical analysis has established conditions for perfect reconstruction in the case of transversal MC-DWT, and also for the equivalence of lifted and transversal implementations of MC-DWT. For Haar MC-DWT these conditions state that motion must be invertible, while for higher-order transforms they state that motion composition must be a well-defined operator. Since many popular motion models do not obey these properties, thus inducing errors (prior to compression), it is important to understand the impact of motion non-invertibility or quasi-invertibility on the performance of video compression. In this paper, we present new experimental results of a study aiming at a quantitative evaluation of this impact in the case of block-based motion. We propose a new metric to measure the degree to which two motion fields are not inverses of each other. Using this metric we investigate several motion inversion schemes, from simple temporal sample-and-hold, through spatial nearest-neighbor, to advanced spline-based inversion, and we compare the compression performance of each method to that of independently estimated forward and backward motion fields. We observe that compression performance improves monotonically with the reduction of the proposed motion inversion error, by up to 1-1.5 dB for the advanced spline-based inversion. We also generalize the problem of "unconnected" pixels by extending it to both the update and prediction steps, as opposed to the update step only, as used in conventional methods. Initial tests show favorable results compared to previously reported techniques.
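One way to quantify such an inversion error (the paper's exact metric may differ; this version is purely illustrative) is the mean distance between a pixel and its position after applying the forward and then the backward motion field:

```python
import numpy as np

def inversion_error(fwd, bwd):
    """Mean round-trip displacement for integer motion fields of shape
    (H, W, 2); exactly inverse fields give 0 away from the borders."""
    H, W, _ = fwd.shape
    ys, xs = np.mgrid[0:H, 0:W]
    y2 = np.clip(ys + fwd[..., 0], 0, H - 1)   # apply forward motion
    x2 = np.clip(xs + fwd[..., 1], 0, W - 1)
    y3 = y2 + bwd[y2, x2, 0]                   # then backward motion
    x3 = x2 + bwd[y2, x2, 1]
    return float(np.mean(np.hypot(y3 - ys, x3 - xs)))
```

The study's observation is that compression performance improves monotonically as an error of this kind is reduced.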
Fully scalable video compression with sample-adaptive lifting and overlapped block motion
David Taubman, Reji Mathew, Nagita Mehrseresht
Motion compensated temporal lifting is a highly effective means for exploiting motion in wavelet-based video compression algorithms. One way to achieve both spatial and temporal scalability attributes is to apply a conventional spatial DWT to an initial set of temporal subbands. This "t+2D" paradigm may be reversed, performing the spatial transform first and temporally transforming its spatial subbands. In this paper, we show how the two paradigms can be bridged by a family of "vector" motion compensation operators. Different members of this family have different implications for compression efficiency and for artifacts which can appear at reduced resolutions. We show how the vector motion compensation operators can be adaptively selected so as to achieve high compression efficiency, while simultaneously suppressing artifacts which might otherwise occur during scaling. The vector motion compensation paradigm provides an efficient framework for in-band block-based motion modeling, which suppresses the appearance of blocking artifacts. The proposed adaptive motion compensation operators have an added advantage in automatically selecting between different resolution-dependent motion models, so as to maximize energy compaction while avoiding the appearance of artifacts at reduced resolutions. Resolution-dependent motion models extend the useful range of bit-rate scalability over many orders of magnitude.
Fully scalable video coding with packed stream
Manuel F. Lopez, Sebastian G. Rodriguez, Juan Pablo Ortiz, et al.
Scalable video coding is a technique that allows a compressed video stream to be decoded in several different ways. This ability allows users to adaptively recover a specific version of a video depending on their own requirements. Video sequences have temporal, spatial, and quality scalabilities. In this work we introduce a novel fully scalable video codec. It is based on motion-compensated temporal filtering (MCTF) of the video sequences and uses some of the basic elements of JPEG 2000. This paper describes several specific proposals for video-on-demand and video-conferencing applications over non-reliable packet-switching data networks.
Scalable video transmission over Rayleigh fading channels using LDPC codes
Manu Bansal, Lisimachos P. Kondi
In this paper, we investigate the important problem of efficiently utilizing the available resources for video transmission over wireless channels while maintaining good decoded video quality and resilience to channel impairments. Our system consists of a video codec based on the 3-D set partitioning in hierarchical trees (3-D SPIHT) algorithm and employs two different schemes using low-density parity-check (LDPC) codes for channel error protection. The first method uses the serial concatenation of a constant-rate LDPC code and rate-compatible punctured convolutional (RCPC) codes; a cyclic redundancy check (CRC) is used to detect transmission errors. In the other scheme, we use a product code structure consisting of a constant-rate LDPC/CRC code across the rows of the 'blocks' of source data and an erasure-correcting systematic Reed-Solomon (RS) code as the column code. In both schemes, we use fixed-length source packets protected with unequal forward error correction coding, ensuring strictly decreasing protection across the bitstream. A Rayleigh flat-fading channel with additive white Gaussian noise (AWGN) is modeled for the transmission. A rate-distortion optimization algorithm is developed and carried out for the selection of source and channel coding rates using Lagrangian optimization. The experimental results demonstrate the effectiveness of this system under different wireless channel conditions, and both proposed methods (LDPC+RCPC/CRC and RS+LDPC/CRC) outperform more conventional schemes such as those employing RCPC/CRC.
Selective Encryption for Image/Video
A layered complexity-aware scalable video encryption scheme
In this paper, we propose a layered complexity-aware encryption scheme that partially encrypts scalable video bitstreams, achieving the level of security a specific application requires within the computational complexity it can afford. The proposed scheme naturally combines wavelet-based scalable video coding with the concept of selective encryption. We also study the relationship between rate, distortion, and the involved encryption complexity (R-D-EC). Here, distortion is used to measure the level of security of the encrypted video bitstream, and the percentage of the bitstream that is encrypted denotes the encryption complexity. Our simulation results indicate that selective encryption using the prioritized bitstream structure provided by scalable video coding can achieve almost bitrate-independent encryption complexity control.
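The complexity knob can be sketched as encrypting only a leading fraction of the priority-ordered bitstream. In this illustrative sketch, SHA-256 in counter mode serves as a stand-in keystream, not the cipher used in the paper:

```python
import hashlib

def keystream(key, n):
    """Illustrative keystream: SHA-256 in counter mode."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def selective_encrypt(stream, key, fraction):
    """XOR-encrypt only the first `fraction` of the scalable bitstream
    (its most important layers); the tail stays in the clear. Applying
    the function twice with the same key decrypts."""
    n = int(len(stream) * fraction)
    ks = keystream(key, n)
    head = bytes(a ^ b for a, b in zip(stream[:n], ks))
    return head + stream[n:]
```

Raising `fraction` buys more security at proportionally higher computational cost, which is exactly the rate-distortion-encryption-complexity trade-off the paper studies.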
Object Tracking
Detection-based particle filtering for real-time multiple-head tracking applications
We present a novel detection-based particle filtering framework for real-time multi-object tracking (MOT). It integrates object detection and motion information with a particle filter, detecting and tracking multiple objects dynamically and simultaneously. To demonstrate the approach, we concentrate on complex multi-head tracking, although the framework is general for any kind of object. Three novel contributions are made: 1) Unlike the conventional particle filter, which generates particles from the prior density, we propose a novel importance function based on up-to-date detection and motion observations, which is much closer to the desired posterior. 2) By integrating detection, the tracker can initialize automatically and handle new object appearances and hard occlusions for MOT; by using motion estimation, it can track fast motion. 3) Hybrid observations including color and detection information are used to calculate the likelihood, which makes the approach more stable. The proposed method is superior to the available multi-head tracking methods and can handle not only changes of scale, lighting, zoom, and orientation, but also fast motion, appearance changes, and hard occlusion.
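A detection-driven importance function of this general kind is often written as a mixture proposal; the sketch below is a standard textbook form, with parameter names and values that are illustrative rather than the paper's:

```python
import numpy as np

def propose(particles, detections, rng, mix=0.5,
            sigma_dyn=2.0, sigma_det=1.0):
    """Mixture importance function: each new particle is drawn either
    from the motion prior (diffusion around its previous 2-D state) or
    from a Gaussian centred on a current detection."""
    out = np.empty_like(particles, dtype=float)
    for i in range(len(particles)):
        if len(detections) and rng.random() < mix:
            d = detections[rng.integers(len(detections))]
            out[i] = d + rng.normal(0.0, sigma_det, size=2)
        else:
            out[i] = particles[i] + rng.normal(0.0, sigma_dyn, size=2)
    return out
```

Setting `mix` high lets fresh detections dominate (automatic initialization, recovery from occlusion), while a low `mix` trusts the motion prior.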
Sensor specific distributions for improved tracking of people
Rik Bellens, Sidharta Gautama, Johannes P.F. D'Haeyer
In this paper, we examine sensor-specific distributions of local image operators (edge and line detectors), which describe the appearance of people in video sequences. The distributions are used to describe a probabilistic articulated motion model to track the gestures of a person in terms of arm and body movement. The distributions build on the work of Sidenbladh, where general distributions were examined, collected over images found on the internet. In our work, we focus on the statistics of one sensor, in our case a standard webcam, and examine the influence of image noise and scale. We show that although the general shape of the distributions published by Sidenbladh is recovered, important anomalies occur that are due to image noise and reduced resolution. Taking into account the effects of noise and blurring on the scale-space response of edge and line detectors improves the overall performance of the model. The original distributions introduced a bias towards small sharp boundaries over large blurred boundaries. In the case of arms and legs, which often appear blurred in the image, this bias is unwanted. Incorporating our modifications in the distributions removes the bias and makes the tracking more robust.
Particle filtering with multiple cues for object tracking in video sequences
Paul A. Brasnett, Lyudmila Mihaylova, Nishan Canagarajah, et al.
In this paper we investigate object tracking in video sequences by using the potential of particle filtering to process features from video frames. A particle filter (PF) and a Gaussian sum particle filter (GSPF) are developed based upon multiple information cues, namely colour and texture, which are described with highly nonlinear models. The algorithms rely on likelihood factorisation as a product of the likelihoods of the cues. We demonstrate the advantages of tracking with multiple independent complementary cues compared to tracking with individual cues. The advantages are increased robustness and improved accuracy. The performance of the two filters is investigated and validated over both synthetic and natural video sequences.
Salient points for tracking moving objects in video
Chandrika Kamath, Abel Gezahegne, Shawn Newsam, et al.
Detection and tracking of moving objects is important in the analysis of video data. One approach is to maintain a background model of the scene and subtract it from each frame to detect the moving objects which can then be tracked using Kalman or particle filters. In this paper, we consider simple techniques based on salient points to identify moving objects which are tracked using motion correspondence. We focus on video with a large field of view, such as a traffic intersection with several buildings nearby. Such scenes can contain several salient points, not all of which move between frames. Using public domain video and two types of salient points, we consider how to make these techniques computationally efficient for detection and tracking. Our early results indicate that salient regions obtained using the Lowe keypoints algorithm and the Scale-Saliency algorithm can be used successfully to track vehicles in moderate resolution video.
Computer Vision
Pixels to objects: a generic vision front-end
Vision implies objects, so a vision system front-end needs to produce a feature-based description of image objects. The functional boundaries and specifications for the front-end are derived from analyzing: 1) What feature information can be extracted from context-free video? 2) What feature information will reduce the probability distribution model complexity for statistical object recognition and tracking? 3) How should the feature information be encoded? Segmentation is a flexible tool for extracting features. Recently proposed segmentation algorithms can be adapted to high-performance, low-cost hardware. Inexpensive segmentation will have a multiplying effect on vision system performance and complexity. Two examples are techniques for extending hardware functions into both parallel pixel processes and object tracking.
A real-time 3D interface using uncalibrated cameras
This paper proposes a real-time 3D user interface using multiple possibly uncalibrated cameras. It tracks the user’s pointer in real-time and solves point correspondences across all the cameras. These correspondences form spatio-temporal “traces” that serve as a medium for sketching in a true 3-D space. Alternatively, they may be interpreted as gestures or control information to elicit some particular action(s). Through view synthesis techniques, the system enables the user to change and seemingly manipulate the viewpoint of the virtual scene even in the absence of camera calibration. It also serves as a flexible, intuitive, and portable mixed-reality display system. The proposed system has numerous implications in interaction and design, especially as a general interface for creating and manipulating various forms of 3-D media.
An accurate semi-automatic segmentation scheme based on watershed and change detection mask
This paper presents a region-based segmentation method that automatically extracts moving objects from video sequences. Non-moving objects can also be segmented by using a graphical user interface. The segmentation scheme is inspired by existing methods based on the watershed algorithm. The over-segmented regions resulting from the watershed are first organized in a binary partition tree according to a similarity criterion; this tree determines the fusion order. Every region is then fused with its most similar neighbour according to a spatio-temporal criterion regarding region colors and their temporal continuity. The fusion can be stopped either by fixing the final number of regions a priori, or by markers given through the graphical user interface. Markers are also used to assign a class to non-moving objects. Classification of moving objects is obtained automatically by computing the change detection mask. To achieve better accuracy on the contours of the segmented objects, we apply a simple post-processing filter to refine the edges between different video object planes.
Optimized mean shift algorithm for color segmentation in image sequences
The application of the mean shift algorithm to color image segmentation was proposed in 1997 by Comaniciu and Meer. We apply mean shift color segmentation to image sequences as the first step of a moving object segmentation algorithm. Previous work has shown that it is well suited for this task, because it provides better temporal stability of the segmentation result than other approaches; the drawback is higher computational cost. To speed up processing on image sequences, we exploit the fact that subsequent frames are similar and use the cluster centers of previous frames as initial estimates, which also enhances spatial segmentation continuity. In contrast to other implementations, we use the originally proposed CIE LUV color space to ensure high-quality segmentation results. We show that moderate quantization of the input data before conversion to CIE LUV has little influence on the segmentation quality but results in a significant speed-up. We also propose changes in the post-processing step to increase the temporal stability of border pixels. We perform an objective evaluation of the segmentation results to compare the original algorithm with our modified version, and show that our optimized algorithm reduces processing time and increases the temporal stability of the segmentation.
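The warm start can be sketched as running mean shift only from the previous frame's cluster centres rather than from every pixel. The toy below uses a Gaussian kernel on 2-D points for brevity; the actual algorithm operates on CIE LUV colour vectors, and all names are illustrative:

```python
import numpy as np

def mean_shift_modes(points, seeds, bandwidth=1.0, iters=30):
    """Run mean shift from the given seeds (e.g. the previous frame's
    cluster centres) instead of from every sample: each seed iterates
    to the kernel-weighted mean of its neighbourhood."""
    modes = np.array(seeds, dtype=float)
    for _ in range(iters):
        for i in range(len(modes)):
            d2 = np.sum((points - modes[i]) ** 2, axis=1)
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))
            modes[i] = (w[:, None] * points).sum(axis=0) / w.sum()
    return modes
```

Because the seeds already sit near the modes of the new frame, far fewer iterations are needed than with a cold start, which is the claimed speed-up.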
Estimating physical camera parameters based on multisprite motion estimation
Global-motion estimation algorithms, as employed in the MPEG-4 or H.264 video coding standards, describe motion with a set of abstract parameters. These parameters model the camera motion, but they cannot be directly related to physical quantities such as rotation angles or the focal length. We present a two-step algorithm to factorize these abstract parameters into physically meaningful operations. The first step applies a fast linear estimation method; in an optional second step, the parameters can be refined with a non-linear optimization algorithm. The appeal of our algorithm is its combination with the multi-sprite concept, which allows for unrestricted rotational camera motion, including varying focal lengths. We present results for several sequences, including the well-known Stefan sequence, which can only be processed with the multi-sprite approach.
Feature-based techniques for real-time morphable model facial image analysis
Siddhartha Chaudhuri, Randhir Kumar Singh, Edoardo Charbon
We present an algorithm to quickly analyse and compress facial images using a 2-dimensional morphable model. It runs in real-time on reasonable resources and offers considerable opportunities for parallelization. A morphable model associates a "shape vector" and a "texture vector" with each image of a sample set. The model is used to analyze a novel image by estimating the model parameters via an optimization procedure. The novel image is compressed by representing it with the set of best-match parameters. For real-time performance, we separate the novel image into shape and texture components by computing correspondences between the novel image and a reference image, and match each component separately using eigenspace projection. This approach can be easily parallelized. We improve the speed of the algorithm by exploiting the fact that facial correspondence fields are smooth. By computing correspondences only at a number of "feature points" and using interpolation to approximate the dense fields, we drastically reduce the dimensionality of the vectors in the eigenspace, resulting in much smaller compression times. As an added benefit, this system reduces spurious correspondences, since weak features that may confuse the correspondence algorithm are not tracked.
View morphing and interpolation through triangulation
Xiaoyong Sun, Eric Dubois
This paper presents a method for view morphing and interpolation based on triangulation. View morphing is treated here as a basic tool for view interpolation. The feature points in each source image are first detected. Based on these feature points, each source image is segmented into a set of triangular regions, and local affine transformations are then used for texture mapping of each triangle from the source image to the destination image. This is called triangulation-based texture mapping. One of the significant problems associated with this approach, however, is texture discontinuity between adjacent triangles. To solve this problem, the triangular patches that might cause such discontinuities are first detected, and optimal affine transformations for these triangles are then applied. In the subsequent view interpolation step, all source images are transferred to the novel view through view morphing, and the final novel view is the combination of all these candidate novel views. The major improvement over the traditional approach is a feedback-based method for determining the weights of the texture combination from the different views. Simulation results show that our method can reduce the discontinuities in triangle-based view morphing and significantly improve the quality of the interpolated views.
Media over Networks
Techniques for generating multiresolution repositories of XML subschemas
Techniques for generating multi-resolution repositories of reusable XML subschemas are investigated in this research. There are three major components to achieve this goal: weighting XML elements, decomposing XML schemas, and clustering the decomposed subschemas. Two approaches are proposed to calculate the weights of XML elements: one is based on the analysis of links and their attributes, while the other is based on the information accumulated from all reachable nodes. Elements with higher weights are selected as roots of reusable subschemas, which are further clustered into K groups according to the amount of information they carry, to obtain multi-resolution repositories.
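The link-based weighting is in the spirit of PageRank over the schema's reference graph; a minimal power-iteration sketch that ignores link attributes (all names are illustrative, and every element must appear as a key of `out_links`):

```python
def element_weights(out_links, damping=0.85, iters=60):
    """PageRank-style weights over a schema reference graph:
    out_links maps each element to the elements it references.
    Heavily-referenced elements accumulate weight."""
    nodes = list(out_links)
    n = len(nodes)
    w = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in out_links.items():
            share = damping * w[v] / (len(outs) if outs else n)
            for m in (outs if outs else nodes):  # dangling: spread evenly
                new[m] += share
        w = new
    return w
```

Elements that many others reference end up with the highest weights and would be picked as subschema roots.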
Image transmission system using adaptive joint source and channel decoding
In this paper, an adaptive joint source and channel decoding method is designed to accelerate the convergence of the iterative log-domain sum-product decoding procedure of LDPC codes as well as to improve the reconstructed image quality. Error-resilience modes are used in the JPEG 2000 source codec, which make it possible to provide useful source-decoded information to the channel decoder. After each iteration, a tentative decoding is made and the channel-decoded bits are sent to the JPEG 2000 decoder. Due to the error-resilience modes, some bits are known to be either correct or in error, and the positions of these bits are fed back to the channel decoder. The log-likelihood ratios (LLRs) of these bits are then modified by a weighting factor for the next iteration. By observing the statistics of the decoding procedure, the weighting factor is designed as a function of the channel condition: for lower channel SNR, a larger factor is assigned, and vice versa. Results show that the proposed joint decoding method can greatly reduce the number of iterations, and thereby reduce the decoding delay considerably. At the same time, this method consistently outperforms non-source-controlled decoding by up to 5 dB in terms of PSNR for various reconstructed images.
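The feedback step can be sketched as follows; the linear SNR-to-factor mapping is an assumption for illustration, standing in for the statistically designed factor in the paper:

```python
def reweight_llrs(llrs, known_correct, known_error, snr_db,
                  base=1.5, slope=0.1):
    """Boost the LLR magnitude of bits the JPEG 2000 error-resilience
    modes flag as correct, and flip-and-boost bits flagged as wrong.
    The factor grows as channel SNR drops (illustrative mapping)."""
    factor = base + slope * max(0.0, 10.0 - snr_db)
    out = list(llrs)
    for i in known_correct:
        out[i] *= factor
    for i in known_error:
        out[i] *= -factor  # push the soft value toward the other bit
    return out
```

The strengthened LLRs give the next sum-product iteration more reliable messages to propagate, which is what accelerates convergence.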
Multimedia proxy adaptive scheduler driven by perceptual quality for multi-user environment
Broadband multimedia communications will be a key element of the 3rd generation of wireless services. A primary challenge is the support of interactive services requiring synchronous playout of multimedia content with a maximum acceptable delay in a multi-user scenario. Subjective tests show that subjects prefer a lower-quality reproduction to a “hopping” video: the pauses needed to re-buffer severely affect the overall quality. To maximize the number of active users that can be served with a predefined quality of service (QoS) level by packet-oriented radio access, we propose the use of a multimedia proxy. This component may transcode the original video stream to match a predefined quality level based on the quality of each user’s channel.
Rate-distortion-based scheduling of video with multiple decoding paths
Huisheng Wang, Antonio Ortega
We present a general rate-distortion-based scheduling framework that can accommodate cases where multiple encoded versions of the same video are available for transmission. Previous work on video scheduling has mostly focused on encoding techniques, such as layered coding, that generate only one set of dependent packets. However, it is sometimes preferable to have a codec that produces redundant video data, where multiple different decoding paths are possible; examples include multiple description layered coding and multiple independently encoded video streams. A new source model called the Directed Acyclic HyperGraph (DAHG) is introduced to describe the relationship between different video data units with multiple decoding paths. Based on this model, we propose two low-complexity scheduling algorithms: the greedy algorithm and the M-T algorithm. Experiments are conducted to compare the performance of these algorithms. It is shown that, in the case of multiple decoding paths, the M-T algorithm outperforms the greedy algorithm by taking into account some of the transmission possibilities available in the near future before making a decision.
Multiple description distributed image coding with side information for mobile wireless transmission
In this paper, we propose a multiple description distributed image coding system for mobile wireless transmission. The innovations of the proposed system are twofold. First, when MDC is applied to wavelet subband-based image coding, it is possible to introduce correlation between the descriptions in each subband. At the encoder, the correlation information is encoded by a systematic Reed-Solomon (RS) encoder, and only the parity check bits are sent over the channel. At the receiver, when some descriptions are lost but their correlation information is available, the Wyner-Ziv decoder can still recover a lost description by treating the partly received description as the noisy version and the correlation information as side information. Second, within each description we use multiple-bitstream image coding to achieve error-robust transmission. In conventional entropy subband coding, the first bit error may cause the decoder to discard all subsequent bits. We develop a multiple-bitstream image encoding based on the decomposition of images in the wavelet domain, and show that such decomposition reduces error propagation in transmission, thus achieving scalable PSNR performance over a range of BERs. Experimental results show that the PSNR is improved at the same coding rate.
Challenges in Real-World Media Streaming
icon_mobile_dropdown
Robust media processing on programmable power-constrained systems
To achieve consumer-level quality, media systems must process continuous streams of audio and video data while maintaining exacting tolerances on sampling and frame rate, jitter, and synchronization. While it is relatively straightforward to design fixed-function hardware implementations to satisfy worst-case conditions, there is a growing trend to utilize programmable multi-tasking solutions for media applications. The flexibility of these systems enables support for multiple current and future media formats, which can reduce design costs and time-to-market. This paper seeks to provide practical engineering solutions to achieve robust media processing on such systems, with specific attention given to power-constrained environments. The techniques covered in this article utilize the fundamental concepts of software optimization, software/hardware partitioning, stream buffering, hierarchical prioritization, and system resource and power management. A novel enhancement to dynamically adjust processor voltage and frequency based on buffer fullness to reduce system power consumption is examined in detail. The application of these techniques is provided in a case study of a portable video player implementation based on a general-purpose processor running a non real-time operating system that achieves robust playback from local storage and streaming over 802.11.
Building adaptive applications: on the need for congestion control
While the convergence of video communication networks and the Internet has many benefits, a number of issues arise when adapting video conferencing and streaming media systems to run on IP networks. One of the major problems is the lack of deployed quality-of-service support, meaning video applications must compete for resources with other best-effort traffic. Since the other traffic is overwhelmingly TCP/IP-based, video applications must either use TCP/IP themselves or behave in a TCP-friendly manner to ensure the stability of the network. This paper describes why using TCP/IP directly is often inappropriate, explores the problems inherent in adapting video codec output to match the behaviour of the network using a TCP-friendly protocol, and discusses the trade-off between network-friendly behaviour and video quality. Potential solutions such as TFRC and DCCP are discussed, and their standardization status is noted. The work highlights problems and areas where further research is needed by network protocol designers, video codec developers, and application authors alike, with the aim of fostering better network-aware codecs.
Media rights and media security
Digital Rights Management (DRM) systems typically do not treat rights management as a security problem. DRM uses cryptographic techniques but not security relationships. Instead, DRM systems use "tamper-resistant mechanisms" to discourage unauthorized access to rights-managed content. Although proven ineffective in practice, tamper-resistant mechanisms penalize legitimate customers with added complexity and costs that arise from tamper-resisting data or program code. This paper explores how a security relationship between provider and consumer might be more effective for managing rights to content works on two-way networks.
Digital media in the home: technical and research challenges
This article attempts to identify some of the technology and research challenges facing the digital media industry in the future. We first discuss several trends in the industry, such as the rapid growth of broadband Internet networks and the emergence of networking and media-capable devices in the home. Next, we present technical challenges that result from these trends, such as effective media interoperability in devices, and provide a brief overview of Windows Media, which is one of the technologies in the market attempting to address these challenges. Finally, given these trends and the state of the art, we argue that further research on data compression, encoder optimization, and multi-format transcoding can potentially make a significant technical and business impact in digital media. We also explore the reasons that research on related techniques such as wavelets or scalable video coding is having a relatively minor impact in today’s practical digital media systems.
Media delivery and media service overlays
Multimedia communication and streaming media services will become mainstream network infrastructure applications in the coming decade. However, there are many challenges that must be overcome. These challenges include the Internet’s limited ability to handle real-time, low-latency media streams, the need for media security, and an uncertainty of the killer app. The nature of these challenges lends itself to enabling technology innovations in the media delivery and media processing space. Specifically, we envision an overlay infrastructure that supports networked media services that couple media delivery with in-network media processing. The media overlay should be programmable to allow rapid deployment of new applications and services and manageable so as to support the evolving requirements of the resulting usage models. Furthermore, the media overlay should allow for the delivery of protected media content for applications that have security requirements. A properly architected infrastructure can enable real-time multimedia communication and streaming media services in light of the inherent challenges.
Challenges in media delivery systems and servers
Although multimedia compression formats and protocols to stream such content have been around for a long time, there has been limited success in the adoption of open standards for streaming over IP (Internet Protocol) networks. The elements of such an end-to-end system will be introduced, outlining the responsibilities of each element. The technical and financial challenges in building a viable multimedia streaming end-to-end system will be analyzed in detail in this paper, outlining some solutions and areas for further research. Also, the recent migration to IP in the back-end video delivery network infrastructure has made it possible to use IP-based media streaming solutions in non-IP last-mile access networks, such as cable and wireless networks, in addition to DSL networks. The advantages of using IP streaming solutions in such networks will be outlined; however, such applications pose a different set of challenges. The real-time constraints are acute in each element of the media delivery end-to-end system, and meeting them on general-purpose, non-real-time server systems is quite demanding. Quality of service, resource management, session management, fail-over, reliability, and cost are important but challenging requirements in such systems; these will also be analyzed, with suggested solutions. Content protection and rights management requirements are also very challenging for open-standards-based multimedia delivery systems. Interoperability unfortunately interferes with security in most current-day systems, and some approaches to solving these interoperability problems will also be presented. The requirements, challenges, and possible solutions for delivering broadcast, on-demand, and interactive video delivery applications over IP-based media streaming systems will be analyzed in detail.
Wyner-Ziv Video Coding
icon_mobile_dropdown
Network-driven Wyner-Ziv video coding using forward prediction
In some video applications, such as video surveillance, a simple encoder is preferred and the computationally intensive work is left to the decoder. Wyner and Ziv showed that this goal is achievable by exploiting video source statistics only at the decoder. In many existing Wyner-Ziv video coding schemes, many frames must be intra-coded so that the decoder can derive sufficiently accurate side information from the I-frames. In this paper we present a new network-driven Wyner-Ziv method using forward prediction. The basic idea is to perform motion estimation at the decoder and send the motion information back to the encoder through a feedback channel. We implement our approach by modifying the H.264 reference codec JM8.0 with different configurations. The results show that the proposed approach improves coding efficiency compared to other Wyner-Ziv video coding schemes.
Video coding scheme using irregular binning and iterative decoding
Kang-Sun Choi, Antonio Ortega
We propose a novel video coding scheme requiring a simple encoder and a complex decoder, in which a video frame is intra-coded periodically and the intermediary frames between successive intra-coded frames are coded efficiently by a proposed irregular binning. We investigate a method of forming an irregular binning that is capable of quantizing any value effectively with only a small number of bins. Based on the fact that successive frames in a video are highly correlated, the video reconstructed at the decoder is enhanced gradually by applying POCS (projection onto convex sets). After an image frame is reconstructed with the irregular binning information at the proposed decoder, we can further improve the resulting quality by modifying the reconstructed image with motion-compensated data from the neighboring frames. In the proposed decoder, several iterations of this modification and re-projection can be performed. Experimental results show that the performance of the proposed coding scheme is only slightly lower than that of H.264; it can thus be an alternative to H.264 in applications requiring a simple encoder.
Systematic lossy error protection versus layered coding with unequal error protection
Shantanu D. Rane, Bernd Girod
In this paper we compare two schemes for error-resilient video transmission: systematic lossy error protection, and layered coding with unequal error protection. In the first scheme, the systematic portion consists of the compressed video signal transmitted without channel coding. For error resilience, an additional bitstream, generated by Wyner-Ziv encoding of the video signal, is transmitted. In the event of channel errors, the Wyner-Ziv description is decoded using the error-prone systematic description as side information. In the second scheme, the video bitstream is partitioned into two or more layers, and each layer is assigned a different amount of parity information depending upon its relative significance. Since the base layer has heavy protection, a certain minimum video quality is guaranteed at the receiver. We derive information-theoretic conditions for the optimality of each of the above systems. We also compare experimentally the performance of the competing schemes for a particular application, error-resilient digital video broadcasting. It is shown that systematic lossy error protection outperforms layered (scalable) coding by ensuring graceful degradation of video quality without incurring the loss in rate-distortion performance observed in practical layered video coding schemes.
Video Processing I
icon_mobile_dropdown
Architectural implications for high-quality video format conversion
Erwin B. Bellers, Johan G.W.M. Janssen
Video format conversion is required if the received video format mismatches the display format (spatially and/or temporally). In addition, a high-quality video format converter can successfully be used to eliminate movie judder (2:2 and 3:2 pull-down) by applying motion-compensated temporal up-conversion techniques, which provide smooth motion portrayal. In this paper we present our architecture, the associated algorithms that provide high-quality video format conversion, and their mutual implications.
Subjective evaluation of de-interlacing techniques
Interlace has been part of television standards since the very start of TV broadcast. The advent of new display principles that cannot handle interlaced video, the wish to up-scale standard-definition video for display on large high-definition screens, and the introduction of video in traditionally non-interlaced multimedia PCs all call for advanced de-interlacing techniques. De-interlacing techniques can be categorized into non-motion-compensated and motion-compensated methods. The former includes linear techniques such as spatial filtering, temporal filtering, and vertical-temporal filtering, and non-linear techniques like motion-adaptive filtering, edge-dependent interpolation, implicitly adapting methods, and hybrid methods. The latter category includes temporal backward projection, time-recursive de-interlacing, adaptive-recursive de-interlacing, the generalized-sampling-theorem de-interlacing method, and hybrid methods. An objective comparison of the above methods, based on the Mean Square Error (MSE) and Motion Trajectory Inconsistency (MTI) metrics, has been given previously. In this paper, we describe a subjective assessment in which a number of de-interlacing techniques were ranked by a group of viewers (typically twenty persons). The experiment was set up according to the recommendations of the ITU. Combined with the objective scores presented in earlier publications, this yields a thorough analysis of each selected de-interlacing algorithm, and improves the relevance and reliability of our knowledge concerning the performance of these algorithms.
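As a concrete instance of the simplest non-motion-compensated category mentioned above (intra-field spatial filtering), a minimal line-averaging de-interlacer might look like the sketch below. This is a generic textbook method used only to illustrate the category, not one of the paper's evaluated implementations:

```python
import numpy as np

def line_average(field, top_field=True):
    """De-interlace one field by line averaging: each missing line is
    the mean of its vertical neighbours. `field` holds only the
    transmitted lines; the output frame has twice as many rows.
    Borders are handled by replicating the nearest known line.
    """
    field = np.asarray(field, dtype=float)
    h, w = field.shape
    out = np.empty((2 * h, w))
    if top_field:
        out[0::2] = field                          # known lines: rows 0,2,4,...
        below = np.vstack([field[1:], field[-1:]])  # next known line (clamped)
        out[1::2] = 0.5 * (field + below)
    else:
        out[1::2] = field                          # known lines: rows 1,3,5,...
        above = np.vstack([field[:1], field[:-1]])  # previous known line (clamped)
        out[0::2] = 0.5 * (above + field)
    return out
```

Such purely spatial methods halve vertical resolution on static content, which is exactly the weakness the motion-compensated methods in the abstract address.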
A multilevel projective image registration technique
This paper proposes a projective image registration algorithm, oriented to consumer devices. It exploits a “multi-resolution feature based method” for estimating the projective parameters through a 2D Daubechies Discrete Wavelet Transform (DWT). The algorithm has been fully tested with real image sequences acquired by CMOS sensors and compared to other registration techniques. The obtained results highlight the accuracy of the registration parameters.
An improved adaptive deblocking filter for MPEG video decoder
Do-Kyoung Kwon, Mei-Yin Shen, C.-C. Jay Kuo
A highly adaptive de-blocking algorithm is proposed for MPEG video, improved in three ways compared to previous algorithms. First, the proposed algorithm is adaptive to quantization parameter (QP) change. Since blocking artifacts between two blocks encoded with different QPs tend to be more visible due to the quality difference, filters should be able to dynamically adapt to QP changes between blocks. Second, the proposed algorithm classifies each block boundary into one of three region modes based on local region characteristics: active, smooth, and dormant. By applying filters of different smoothing strengths to each region mode, the proposed algorithm minimizes undesirable blur so that both subjective and objective quality improve for various types of sequences over a wide range of bitrates. Finally, the proposed algorithm also provides threshold determination methods. Adaptive de-blocking algorithms require several thresholds for the mode decision as well as the filtering decision, and since the quality of the filtered video depends largely on them, each threshold should be determined carefully. In the proposed algorithm, thresholds are determined adaptively to the strength of the blocking artifact, and therefore to various encoding parameters such as QP, the absolute difference between QPs, and the block coding type, which are closely related to the strength of blocking artifacts. The experimental results show the proposed algorithm can achieve a 0.2-0.4 dB gain for I- and P-frames, and a 0.1-0.3 dB gain for B-frames, when bitstreams are encoded using the TM5 rate control algorithm.
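The three-way region classification might be sketched as follows. The activity measure and the fixed thresholds here are illustrative stand-ins; the paper's actual thresholds adapt to QP and the other encoding parameters listed above:

```python
def classify_boundary(p, q, t_active=8.0, t_smooth=2.0):
    """Classify an 8x8 block boundary into one of three region modes.

    p, q: sequences of pixel values on either side of the boundary.
    Local activity (mean absolute difference between adjacent samples)
    decides the mode; t_active and t_smooth are assumed constants.
    """
    samples = list(p) + list(q)
    diffs = [abs(a - b) for a, b in zip(samples, samples[1:])]
    activity = sum(diffs) / len(diffs)
    if activity > t_active:
        return "active"    # strong detail: weakest filter, to avoid blur
    if activity > t_smooth:
        return "smooth"    # moderate texture: medium-strength filter
    return "dormant"       # flat region: strongest smoothing
```

Routing each mode to a filter of matching strength is what lets the algorithm smooth block edges in flat regions without blurring genuine detail.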
Video Processing II
icon_mobile_dropdown
Motion estimation based on spatio-temporal correlations and pixel decimation
The full-search block-matching algorithm is the simplest, but computationally very intensive, approach. In recent years there has been substantial progress in block motion estimation algorithms; important milestones include two-dimensional logarithmic search, three-step search, four-step search, and diamond search. All these methods try to minimize the number of search points, applying the sum of absolute differences (SAD) or an equivalent metric at each point. Some works have studied partial SAD (PSAD), but mostly with a constant decimation factor. In this work we study the performance of one of the best block-matching search algorithms in combination with adaptive PSAD as the matching metric: we use the original motion estimation method based on spatio-temporal correlation, but instead of SAD we use PSAD with an adaptively chosen decimation factor. Our simulation results show that for high-motion sequences the PSNR degradation between full search and the proposed method was around 0.1-0.7 dB. The computational complexity reduction of 650-1700 times (compared with full search) and 9 times (compared to the original method) is substantial and may well be worth this decrease in video quality. For more static sequences, the PSNR degradation between full search and the original motion estimation method was around 0 dB; comparing the original method with the proposed one, the degradation increases to 0.1 dB, while the computational complexity reduction was around 1600-1700 times (compared with full search) and 5-7 times (compared to the original method).
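A partial-SAD metric with an adaptive decimation factor can be sketched as below. The adaptation rule and its thresholds are illustrative assumptions (driven here by a hypothetical neighbouring-block SAD), not the paper's actual criterion:

```python
def partial_sad(block_a, block_b, decimation=1):
    """Partial SAD: accumulate |a - b| over every `decimation`-th pixel
    in both directions, instead of over all pixels as plain SAD does.
    decimation=1 reduces to the full SAD.
    """
    total = 0
    for y in range(0, len(block_a), decimation):
        for x in range(0, len(block_a[0]), decimation):
            total += abs(block_a[y][x] - block_b[y][x])
    return total

def choose_decimation(neighbour_sad, t_lo=200, t_hi=1000):
    """Illustrative adaptive rule: when neighbouring blocks matched
    well (low SAD), sample more sparsely; when the match was poor,
    fall back towards full SAD. Thresholds are assumed values.
    """
    if neighbour_sad < t_lo:
        return 4
    if neighbour_sad < t_hi:
        return 2
    return 1
```

With a decimation factor of 2, only a quarter of the pixels contribute to each match, which is where the complexity reduction over plain SAD comes from.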
Motion estimation on interlaced video
Motion-compensated de-interlacing and motion estimation based on Yen's generalisation of the sampling theorem (GST) have been proposed by Delogne and Vandendorpe. Motion estimation methods using three fields have been designed on a block-by-block basis, minimising the difference between two GST predictions. We show that this criterion degenerates into a two-field criterion, leading to erroneous motion vectors, when the vertical displacement per field period is an even number of pixels. We provide a solution to this problem by adding a term to the matching criterion.
Motion deblurring by coupled nonlinear diffusion with discrete calculus adaptive to a motion direction
Takahiro Saito, Hiroyuki Harada, Takashi Komatsu
We previously presented a method of selective image sharpening based on a coupled nonlinear diffusion process composed of a nonlinear diffusion term, a fidelity term, and an isotropic peaking term; it could sharpen only blurred edges without increasing noise visibility. This paper extends our method to the removal of blurs in images due to image motion. Motion blur is not only shift-variant but also anisotropic. To adapt our method to motion de-blurring, we replaced our isotropic peaking term with an anisotropic peaking term steered in the direction of motion at each pixel, and devised a discrete calculus adapted to the direction of motion. Through experiments with a test sequence of moving images containing artificial motion blurs, we quantitatively evaluated the sharpening performance. Our new method with the anisotropic peaking term performed better than our prototypal method with the isotropic peaking term, and was robust against errors in the direction of motion.
Poster Session
icon_mobile_dropdown
A processor for MPEG decoder SOC: a software/hardware co-design approach
Guojun Yu, Qingdong Yao, Peng Liu, et al.
Media processing such as real-time compression and decompression of video signals is now expected to be the driving force in the evolution of the media processor. In this paper, a hardware and software co-design approach is introduced for a 32-bit media processor, MediaDsp3201 (briefly, MD32), realized in 0.18 μm TSMC technology at 200 MHz, which can achieve 200 million multiply-accumulate (MAC) operations per second. In our design, we have merged RISC and DSP into one processor (RISC/DSP). Based on an analysis of the inherent characteristics of video processing algorithms, media enhancement instructions are adopted into the MD32 instruction set. The media extension instructions are physically realized in the processor core and improve video processing performance effectively with negligible additional hardware cost (2.7%). Considering the high complexity of the operations for media instructions, a technique named scalable super-pipelining is used to resolve the problem of the time delay of the pipeline stages (mainly the EX stage). Simulation results show that our method can reduce the instruction count for IDCT by more than 31% and 23% compared to the MMX and SSE implementations, respectively, and by 40% for MC compared to the MMX implementation.
Improved color interpolation using discrete wavelet transform
Giuseppe Spampinato, Arcangelo Bruna, Giuseppe Sanguedolce, et al.
New approaches to colour interpolation based on the Discrete Wavelet Transform are described. The Bayer data are split into the three colour components; for each component the Wavelet Coefficient Interpolation (WCI) algorithm is applied, and the results are combined to obtain the final colour-interpolated image. A further anti-aliasing algorithm can be applied in order to reduce false colours. A first approach consists of interpolating wavelet coefficients starting from a spatial analysis of the input image, using an interpolation step based on threshold levels associated with the spatial correlation of the input image pixels. A second approach consists of interpolating wavelet coefficients starting from the analysis of known coefficients of the second transform level. The resolution of each wavelet transform level is double that of the successive one, so we can assume a correspondence among wavelet coefficients belonging to successive sub-bands. The visual quality of the interpolated RGB images is improved, reducing zipper and aliasing effects. Moreover, in embedded systems that use JPEG2000 compression, a low computational cost is achieved in both cases: the first approach performs only some threshold evaluations and the IDWT step, while the second involves only the DWT and IDWT steps.
Combined data partitioning and fine granularity scalability for channel adaptive video transmission
Layered video coding is used for adaptive transmission over channels having variable bandwidths. In the two well-known methods of data partitioning (DP) and fine granularity scalability (FGS), a base layer contains essential information and one or more enhancement layers contain fine detail. FGS is continuously scalable above the base layer by successive DCT coefficient bit planes of lower significance, but suffers losses in coding efficiency at low base layer rates. DP, on the other hand, only provides a base partition for header information and low-frequency coefficients and one or more enhancement partitions for higher-frequency coefficients. This results in degraded quality when the enhancement layer is lost but offers performance near single-layer video as the transmission rate approaches the encoding rate. DP is thus suited to bandwidths that vary over a narrow range, whereas FGS performs robustly over a wider range but not as well as single-layer or DP at bandwidths near the full rate. A combination of the two methods can provide higher quality than FGS alone, over a greater bandwidth range than DP alone. This is achieved by using DP on an FGS base layer, which can now have a sufficiently high rate to improve the FGS coding efficiency. Such a combination has been investigated for one form of DP, known as Rate-Distortion optimal Data Partitioning (RDDP), which attempts to provide the best possible base partition quality for a given rate. A method for combining FGS and DP is described, along with expected and computed performances for different rates.
Fuzzy logic recursive change detection for tracking and denoising of video sequences
Vladimir Zlokolica, Matthias De Geyter, Stefan Schulte, et al.
In this paper we propose a fuzzy logic recursive scheme for motion detection and temporal filtering that can deal with Gaussian noise and unsteady illumination conditions in both the temporal and spatial directions. Our focus is on applications concerning the tracking and denoising of image sequences. We process an input noisy sequence with fuzzy logic motion detection in order to determine the degree of motion confidence. The proposed motion detector combines the membership degrees appropriately using defined fuzzy rules, where the membership degree of motion for each pixel in a 2D sliding window is determined by the proposed membership function. Both the fuzzy membership function and the fuzzy rules are defined in such a way that the performance of the motion detector is optimized in terms of its robustness to noise and unsteady lighting conditions. We simultaneously perform tracking and recursive adaptive temporal filtering, where the amount of filtering is inversely proportional to the confidence in the existence of motion. Finally, the temporally filtered frames are further processed by the proposed spatial filter in order to obtain a denoised image sequence. The main contribution of this paper is this novel, robust fuzzy recursive scheme for motion detection and temporal filtering. We evaluate the proposed motion detection algorithm using two criteria: robustness to noise and changing illumination conditions, and motion blur in temporal recursive denoising. Additionally, we make comparisons in terms of noise reduction with other state-of-the-art video denoising techniques.
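The core idea of filtering inversely proportional to motion confidence can be sketched per pixel as below. The piecewise-linear membership function and its breakpoints, and the maximum filter gain, are illustrative assumptions, not the paper's tuned rules:

```python
def motion_confidence(diff, a=4.0, b=24.0):
    """Toy fuzzy membership for 'motion': 0 below a, 1 above b,
    linear in between. `diff` is the absolute frame difference at a
    pixel; a and b are assumed constants.
    """
    if diff <= a:
        return 0.0
    if diff >= b:
        return 1.0
    return (diff - a) / (b - a)

def recursive_temporal_filter(prev_out, cur, k_max=0.8):
    """First-order recursive temporal filter whose strength is
    inversely proportional to motion confidence, as in the abstract:
    static pixels are averaged heavily, moving pixels pass through.
    """
    out = []
    for p, c in zip(prev_out, cur):
        m = motion_confidence(abs(c - p))
        k = k_max * (1.0 - m)          # no motion -> heavy averaging
        out.append(k * p + (1.0 - k) * c)
    return out
```

Letting moving pixels pass through unfiltered is what avoids the motion blur that a fixed-strength temporal filter would introduce.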
Parameterized sketches from stereo images
In this paper we present an algorithm to automatically generate sketches from stereo image pairs. Stereo analysis is initially performed on the input stereo pair to estimate a dense depth map. An Edge Combination image is computed by localising object contour edges, as indicated by the depth map, within the intensity reference image. We then approximate these pixel-represented contours by devising a new parametric curve fitting algorithm. The algorithm successfully recovers a minimum number of control points required to fit a Bezier curve onto the pixel-edge dataset in the Edge Combination image. Experiments demonstrate how the Edge Combination algorithm, used for dominant edge extraction, can be combined with a curve fitting algorithm to automatically provide parameterized artistic sketches or concept drawings of a real scene.
B-picture coding with motion-compensated frame rate upconversion
In this paper, we propose a new method to improve the coding efficiency of B-pictures by applying a motion-compensated frame rate up-conversion technique. In H.264 B-picture coding, the direct mode is one of the prediction modes. In the direct mode, a B-picture is constructed without transmitted motion vector information, which makes it the most efficient coding mode for B-pictures. On the other hand, in motion-compensated frame rate up-conversion, interpolated frames, which are similar to B-pictures, can be constructed using motion vectors detected between coded images. In our proposed method, the same motion estimation algorithm is used for detecting motion vectors between coded P-pictures on both the encoder and decoder sides. The proposed decoder can therefore use the detected motion vectors for B-picture coding without transmitted motion vector information. In the experiment, we selected prediction modes for each block using a conventional rate-distortion optimization algorithm. The proposed method improves B-picture coding efficiency by nearly 30% over that of H.264.
Rate-distortion optimized video summary generation and transmission over packet lossy networks
The goal of video summarization is to select key frames from a video sequence in order to generate an optimal summary that can accommodate constraints on viewing time, storage, or bandwidth. While video summary generation without transmission considerations has been studied extensively, the problem of rate-distortion optimized summary generation and transmission in a packet-lossy network has gained little attention. We consider the transmission of summarized video over a packet-lossy network such as the Internet. We depart from traditional rate control methods by not sacrificing the image quality of each transmitted frame but instead focusing on the frames that can be dropped without seriously affecting the quality of the video sequence. We take into account the packet loss probability, and use the end-to-end distortion to optimize the video quality given constraints on the temporal rate of the summary. Different network scenarios such as when a feedback channel is not available, and when a feedback channel is available with the possibility of retransmission, are considered. In each case, we assume a strict end-to-end delay constraint such that the summarized video can be viewed in real-time. We show simulation results for each case, and also discuss the case when the feedback delay may not be constant.
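The packet-loss-aware selection described above can be sketched with a simplified independent-loss model. The frame representation, the greedy selection, and the i.i.d. loss assumption are illustrative simplifications; the paper's formulation also covers feedback and retransmission:

```python
def expected_distortion(frames, selected, p_loss):
    """Expected end-to-end distortion of a summary under i.i.d. packet
    loss with probability p_loss. Each frame dict needs:
      'd_recv' - distortion if the frame is sent and its packet arrives,
      'd_lost' - distortion if the frame is dropped or its packet lost
                 (i.e. concealed from the nearest received summary frame).
    """
    total = 0.0
    for i, f in enumerate(frames):
        if i in selected:
            total += (1 - p_loss) * f["d_recv"] + p_loss * f["d_lost"]
        else:
            total += f["d_lost"]
    return total

def greedy_summary(frames, budget, p_loss):
    """Greedily add the frame giving the largest expected-distortion
    reduction until the temporal-rate budget is reached.
    """
    selected = set()
    while len(selected) < budget:
        best = min((i for i in range(len(frames)) if i not in selected),
                   key=lambda i: expected_distortion(frames, selected | {i},
                                                     p_loss))
        selected.add(best)
    return selected
```

The key departure from classical rate control is visible here: quality is traded by choosing which frames to keep, not by lowering the quality of the frames that are sent.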
Linear filtering of image subbands for low complexity postprocessing of decoded color images
In reference 1, image-adaptive linear minimum mean squared error (LMMSE) filtering was proposed as an enhancement-layer color image coding technique that exploited the statistical dependencies among the luminance/chrominance or Karhunen-Loeve Transform (KLT) coordinate planes of a lossy compressed color image to enhance the red, green, blue (RGB) color coordinate planes of that image. In the current work, we propose the independent design and application of LMMSE filters on the subbands of a color image as a low-complexity solution. Towards this end, only those coordinates of the neighbors of the filtered subband coefficient that are sufficiently correlated with the corresponding coordinate of the filtered subband coefficient are included in the support of the filter for each subband. Additionally, each subband LMMSE filter is selectively applied only on the high-variance regions of the subband. Simulation results show that, at the expense of an insignificant increase in the overhead rate for the transmission of the filter coefficients, and with about the same enhancement gain, subband LMMSE filtering offers a substantial complexity advantage over fullband LMMSE filtering.
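The generic design step behind such a filter is an ordinary least-squares fit of filter weights to training pairs, a standard formulation rather than the paper's specific per-subband procedure (which additionally restricts the support to sufficiently correlated neighbours):

```python
import numpy as np

def lmmse_weights(X, y):
    """Least-squares design of linear filter weights: each row of X
    holds the neighbourhood samples used as filter support, and y holds
    the corresponding target (enhanced) values. Returns the weight
    vector w minimizing ||Xw - y||^2, the sample-based LMMSE solution.
    """
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```

Designing one small filter per subband, on a restricted support, is what yields the complexity advantage over a single large fullband filter.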
Computational algebraic topology-based video restoration
This paper presents a scheme for video denoising by diffusion of gray levels in the video domain, based on the Computational Algebraic Topology (CAT) image model. Contrary to usual approaches, which use the heat-transfer PDE and discretize and solve it by a purely mathematical process, our approach considers the global expression of the heat transfer and decomposes it into elementary physical laws. Some of these laws link global quantities, integrated on some domains. They are called conservative relations, and lead to error-free expressions. The other laws depend on metric quantities and require approximations to be expressed in this scheme. However, as every step of the resolution process has a physical interpretation, the approximations can be chosen wisely depending on the desired behavior of the algorithm. We propose in this paper a nonlinear diffusion algorithm based on the extension to video of an existing 2D algorithm, thanks to the flexibility of the topological support. After recalling the physical model for diffusion and the decomposition into basic laws, these laws are modeled in the CAT image model, yielding a numerical scheme. Finally, this model is validated with experimental results and extensions of this work are proposed.
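To make the nonlinear-diffusion idea concrete, the following is a conventional Perona-Malik style finite-difference sketch for a single 2D frame. It is not the paper's CAT formulation (which decomposes the heat-transfer law into conservative and metric relations, and extends to the video domain); parameter names and the edge-stopping function are assumptions.

```python
import numpy as np

def diffuse(frame, iterations=10, kappa=20.0, dt=0.2):
    """Nonlinear diffusion of gray levels in a single 2D frame (sketch)."""
    def g(d):
        # Edge-stopping conductivity: small across strong edges,
        # so edges are preserved while flat regions are smoothed.
        return np.exp(-(d / kappa) ** 2)

    u = frame.astype(np.float64)
    for _ in range(iterations):
        # Differences toward the four neighbors (periodic borders for brevity).
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        u = u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```

A flat region passes through unchanged, while isolated noise spikes are attenuated less aggressively than a linear heat equation would blur edges.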
Fast stereo sequence compression using edge-directional joint disparity-motion estimation
Yongtae Kim, Changseob Park, Jaeho Lee, et al.
This paper presents an efficient joint disparity-motion estimation algorithm and a fast estimation scheme for a stereo sequence CODEC. In stereo sequences, frames from one camera view (usually the left) are defined as the base layer, and frames from the other as the enhancement layer. The enhancement-from-base-layer prediction is then a disparity-compensated prediction instead of a motion-compensated prediction. Even when the disparity-compensated prediction fails, it is still possible to achieve compression by motion-compensated prediction within the same channel. At the same time, the base layer represents a monoscopic sequence. Joint disparity-motion estimation can increase coding efficiency and reduce the complexity of the stereo CODEC by using the relationship between the disparity and motion fields. The disparity vectors are estimated from the left and right motion vectors and the previous disparity vectors in each time frame. In order to obtain more accurate disparity vectors, we include a spatial prediction process after the joint estimation. From joint estimation and spatial prediction, we obtain accurate disparity vectors and thus increase coding efficiency. Moreover, we propose a fast motion estimation technique that exploits the correlation among the motion vectors of neighboring blocks. Simulation results confirm that the PSNR of the proposed method increases by 0.5~1.5 dB compared to conventional methods, while the processing time is reduced to almost 1/10.
Motion-based morphological segmentation of wildlife video
Naveen M. Thomas, Nishan Canagarajah
Segmentation of objects in a video sequence is a key stage in most content-based retrieval systems. By further analysing the behaviour of these objects, it is possible to extract semantic information suitable for higher-level content analysis. Since interesting content in a video is usually provided by moving objects, motion is a key feature for segmentation prior to content analysis. A motion-based segmentation algorithm that is both efficient and robust is presented in this paper. The algorithm is also robust to the type of camera motion. The framework presented consists of three stages: motion estimation, foreground detection and refinement. Iterating the first two stages, adaptively altering the motion estimation parameters each time, results in a joint segmentation and motion estimation approach that is extremely fast and accurate. Two-dimensional histograms are used as a tool to carry out the foreground detection. The last stage uses morphological approaches as well as a prediction of foreground regions in future frames to further refine the segmentation. In this paper, results obtained from simple and traditional approaches are compared with those of the proposed framework in the wildlife domain.
MPEG-4 constant-quality constant-bit-rate controls
Cheng-Yu Pai, William E. Lynch
Most video rate-control research emphasizes constant bit-rate (CBR) applications. These aim to produce a CBR bitstream with the highest possible quality, within the bitrate constraint and with no consideration for quality variation. In this paper, two MPEG-4 Constant-Quality (CQ) CBR controls are proposed. These aim to produce a CBR bitstream that meets a target quality level whenever possible. The Frame-level Laplacian CQ (FLCQ) algorithm uses a distortion model based on a Laplacian model for DCT coefficients. In contrast, the MB-level Viterbi CQ (MVCQ) algorithm uses the Viterbi algorithm to determine the best combination of MB-QP’s. “CQ” is measured by the deviation of the mean quality from the target quality, and by quality variance over time. Simulation results suggest that the proposed algorithms perform better than Q2 and TM5 under these measures. In some cases, they produce bitstreams with fewer bits while having higher average PSNR, and smaller variance. The FLCQ algorithm has more variation in quality than the MVCQ algorithm. With extra computational complexity, the MVCQ algorithm gives the best performance over all algorithms tested. Often, it precisely meets the target PSNR with no variation. This is truly a CQ rate-control algorithm.
A complete system for head tracking using motion-based particle filter and randomly perturbed active contour
N. Bouaynaya, Dan Schonfeld
Recent advances in multimedia and communication require techniques for accurately tracking objects in video sequences. We propose a complete system for head tracking and contour refinement. Our tracking approach is based on the particle filtering framework. However, unlike existing methods that use prior knowledge or likelihood functions as proposal densities, we use a motion-based proposal, with the Adaptive Block Matching (ABM) algorithm as the motion estimation technique. Several advantages arise from this choice of proposal: (i) only a few samples are propagated; (ii) the tracking adapts to different categories of motion; (iii) off-line motion learning is not needed. Following the tracking is the contour refinement step. We want to transform the parametric estimate representing the tracked head at a given time instant into an elastic contour delineating the head's boundaries. We use an active contour framework based on a dynamic programming scheme. However, active contours are very sensitive to parameter assignment and initial conditions. Using the tracked parametric estimate, we create a set of randomly perturbed initial conditions; the optimal contour is then the one corresponding to the lowest energy. Our system demonstrates tracking a person's head in complex environments and delineates its boundaries for further use.
Server scheduler design for distributed video-on-demand service
Online media server scheduling algorithms in distributed video-on-demand (VoD) systems are studied in this work. We first identify the failure rate and the server-side network bandwidth consumption as two main cost factors in a distributed VoD service model. The proposed distributed server scheduler consists of two parts: the request migration scheme and the dynamic content update strategy. By improving the random early migration (REM) scheme, we propose a cost-aware REM (CAREM) scheme to reduce the network bandwidth consumption due to the migration process. Furthermore, to accommodate the change in video popularity and/or client population, we use the server-video affinity to measure the potential server-side bandwidth cost after placing a specific video copy on that server. The dynamic content update strategy uses the server-video affinity to reconfigure video copies on media servers. We conduct extensive simulations to evaluate the performance of the proposed algorithm. It can be shown that CAREM together with the dynamic content update strategy can improve the system performance by reducing the request failure rate as well as the server bandwidth consumption.
Joint power and distortion control in video coding
Yongfang Liang, Ishfaq Ahmad, Jiancong Luo
For video coding in futuristic ubiquitous environments, efficiently managing power consumption while preserving high video quality is crucial. To address this challenge, we formulate a multiple-objective optimization problem to model the behavior of power-distortion-optimized video coding, even though the objectives in this problem are incommensurate and in conflict with one another. By assessing the performance trade-offs as well as the collective impact of power and distortion, we propose a joint power-distortion control strategy (JPDC), in which power and distortion are jointly considered. After analyzing the approach of solving the problem statically, we adopt a sub-optimal "greedy" approach in the JPDC scheme, in which each complexity parameter is adjusted individually. The system starts coding at the highest complexity level and automatically migrates to a lower/higher level until the performance improvement saturates, leading to the optimal operating point. We perform simulations to demonstrate the effectiveness of the proposed scheme. Our results show that the proposed JPDC scheme is aware of the power constraint as well as the video content, and achieves significant power savings with well-perceived video quality. Such a feature is particularly desirable for futuristic video applications.
Error resilient video coding using virtual reference picture
Error resilience has become an important feature of video applications over networks. Since motion-compensated prediction coding is widely used, an important method to combat or conceal errors is providing a robust prediction reference. If each block uses one reference, the situation is no different from conventional prediction. If multiple references are used, the cost of motion search increases dramatically. To take advantage of both the low complexity of using one reference and the robustness of using multiple references, we propose a video coding system that composes a virtual frame from previously decoded frames and uses it as the prediction reference. The frame is generated by applying a nonlinear filter to previously coded frames. In an error-free environment, both encoder and decoder can compose an identical reference frame. In case of error, the decoder first conceals the errors by predicting the damaged blocks spatially. The effect of errors is further constrained by composing a reference frame, since correct data from decoded frames are used. Lower video quality degradation and smaller quality oscillation can be achieved. Since the composed reference may have lower correlation with the current frame and may damage some details, coding efficiency slightly decreases.
Vision-based speaker location detection
Generally, speaker location detection in video conferencing is audio-based. However, the physical room environment, which is beyond the control of the speaker detection system, can severely change room acoustics. Room acoustics introduce interference and can deteriorate the performance of an audio-based speaker detection system. In this paper, we propose a video-based speaker detection method which can be used independently or along with audio-based detection systems. The information on speaker location is intended to enable 3-dimensional audio reproduction in order to bring more realism to the video conference. In the proposed method, we detect moving lips in video sequences: we first detect lips using color information and then determine whether the lips are moving. Experiments with real videos provide promising results.
Real-time head tracking based on color and shape information
Dong-gil Jeong, Yu Kyung Yang, Dong-Goo Kang, et al.
In this paper, we propose a robust real-time head tracking algorithm using a pan-tilt-zoom camera. In the algorithm, the shape of the head is assumed to be an ellipse and a model color histogram is acquired in advance. In the first frame, a user defines the position and scale of the head. In each following frame, we take the ellipse selected in the previous frame as the initial position, and apply the mean-shift procedure to move the position to the target center, where the color histogram similarity to the reference one is maximized. Here, the reference color histogram is adaptively updated from the model color histogram by using the one from the previous frame. Then, by using gradient information, the position and scale of the ellipse are further refined. Large background motion often prevents the initial position from converging to the target position. To alleviate this problem, we estimate a reliable initial position by compensating for the background motion. To reduce the computational burden of motion estimation, we use vertical and horizontal 1-D projection datasets. Extensive experiments show that a head is well tracked even when a person moves fast and the scale of the head changes drastically.
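The color-histogram similarity that drives a mean-shift tracker of this kind can be sketched as follows. The Bhattacharyya coefficient is a common choice for comparing normalized histograms; the helper names and the 16-bin single-channel histogram are illustrative assumptions, not the paper's exact measure.

```python
import numpy as np

def bhattacharyya(p, q):
    """Similarity between two normalized histograms (1.0 means identical)."""
    return float(np.sum(np.sqrt(p * q)))

def patch_similarity(patch, reference_hist, bins=16):
    """Color-histogram similarity of a candidate patch to a reference.

    Hypothetical helper: the mean-shift iteration moves the head ellipse
    toward the location that maximizes a similarity of this kind.
    """
    h, _ = np.histogram(patch, bins=bins, range=(0, 256))
    h = h / max(h.sum(), 1)  # normalize to a probability distribution
    return bhattacharyya(h, reference_hist)
```

A patch whose histogram matches the reference scores 1.0; dissimilar patches score lower, so gradient ascent on this score pulls the ellipse onto the target.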
Rate distortion optimized slicing over bit error channels
In this paper, we study the problem of optimizing slice sizes and locations for video transmission over bit-error channels. We propose a method that estimates slice rate and distortion as a function of the inter-macroblock dependency exploited in the video codec, and we first show experimentally that this estimation is effective. Since there are numerous possible slice configurations and a complete search must check all of them, we assume a segmentation of macroblocks so that a slice can only start at segment boundaries. Although this results in a slightly suboptimal solution, it reduces the number of possibilities. Because there are still too many configurations in practice, we combine the proposed RD estimation method with a dynamic programming approach to effectively determine the optimal slice configuration. RD optimization is carried out by minimizing the expected Lagrangian using the expected distortion of a slice. We assume that the physical layer is capable of generating an error report showing which transmission units are received in error. This allows us to use NEWPRED with macroblock-level feedback and to prevent wrongful decoding of erroneous macroblocks. We compare the proposed system with the two common approaches, namely a fixed number of MBs per slice and a fixed number of bits per slice, and find that the proposed method performs better than both.
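The dynamic-programming step can be sketched as follows. This is an illustrative cost model, not the paper's exact Lagrangian: a slice covering segments i..k-1 pays a header cost plus its bits, and is assumed lost in its entirety with probability 1 - (1 - BER)^bits; all names and the loss model are assumptions.

```python
def optimal_slices(seg_bits, seg_dist, header_bits, lam, ber):
    """DP choice of slice boundaries at segment granularity (sketch)."""
    n = len(seg_bits)
    INF = float("inf")
    cost = [INF] * (n + 1)  # cost[k]: best expected Lagrangian for first k segments
    cut = [0] * (n + 1)
    cost[0] = 0.0
    for k in range(1, n + 1):
        for i in range(k):  # candidate slice covers segments i..k-1
            bits = header_bits + sum(seg_bits[i:k])
            p_err = 1.0 - (1.0 - ber) ** bits        # whole-slice loss probability
            exp_dist = p_err * sum(seg_dist[i:k])    # expected distortion of the slice
            c = cost[i] + exp_dist + lam * bits
            if c < cost[k]:
                cost[k], cut[k] = c, i
    # Recover the chosen slice boundaries by backtracking.
    bounds, k = [], n
    while k > 0:
        bounds.append((cut[k], k))
        k = cut[k]
    return cost[n], bounds[::-1]
```

The quadratic DP over segment boundaries is what makes the restricted search tractable compared with enumerating every slice configuration.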
Image segmentation using thick-fluid watersheds
Traditional watershed and marker-based image segmentation algorithms are very sensitive to noise. The main reason for this is that these segmentation algorithms are locally dependent on some type of edge indicator input image that is traditionally computed on a pixel-by-pixel basis. Additionally, as a result of raw watershed segmentation, the original image can be seriously oversegmented, and it may be difficult to reduce the oversegmentation and the impact of noise without also inducing several undesired region merges. This last problem is a typical result of local "edge gaps" that may appear along the topographic watershed mountain rims. Through these gaps the marker or watershed labels can easily leak into neighboring segments. We propose a novel pair of algorithms that use "thick fluid" label propagation in order to solve these problems. The thick fluid technique is based on considering information from multiple adjacent pixels along the topographic watershed mountain rims that separate the different objects in an initial pre-segmented image.
Energy efficient video summarization and transmission over a slow fading wireless channel
With the deployment of 2.5G/3G cellular network infrastructure and the large number of camera-equipped cell phones, the demand for video-enabled applications is high. However, for an uplink wireless channel, both the bandwidth and the battery energy of a mobile phone are limited for video communication. These technical problems need to be effectively addressed before practical and affordable video applications can be made available to consumers. In this paper we investigate an energy-efficient video communication solution through joint video summarization and transmission adaptation over a slow fading channel. Coding and modulation schemes, as well as the packet transmission strategy, are optimized and adapted to the unique packet arrival and delay characteristics of the video summaries. The operational energy efficiency versus summary distortion performance is characterized under an optimal summarization setting.
A new lossless multiresolution motion estimation algorithm using PDE, adaptive matching scan, and spiral search
Jong-Nam Kim, Kyung-Won Kang, Kwang-Seok Moon
In this paper, we propose a new lossless MRME algorithm applicable to the current international video coding standards, in which we remove only unnecessary computation in calculating the block-matching error, without any degradation of prediction quality. Our proposed algorithm employs the MRME scheme, PDE (partial distortion elimination), spiral search, and an adaptive matching scan based on the image complexity of the reference block. The important point in the PDE algorithm is how fast impossible candidates are detected by removing unnecessary computation. In this paper, we use the fact that the block-matching error is proportional to the complexity of the reference block, shown with a Taylor series expansion. The motivation of the proposed algorithm is to use image complexity to find impossible candidates faster. The local complexity of a subblock is defined as the spatial complexity of the image data in that subblock and is measured with the gradient magnitude. Experimental results show that our proposed algorithm saves 50%~80% of the computations of the original MRME algorithm, while keeping the same prediction quality. Our proposed algorithm is applicable to MPEG video codecs such as MPEG-2 and MPEG-4 AVC and will be useful in real-time video coding applications.
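The core PDE idea can be sketched in a few lines: accumulate the SAD row by row and abandon the candidate as soon as the partial sum already exceeds the best match found so far. The paper additionally reorders the scan according to local gradient magnitude to eliminate candidates even earlier; this sketch keeps a plain top-to-bottom scan for simplicity, and the function name is an assumption.

```python
import numpy as np

def sad_with_pde(block, candidate, best_so_far):
    """Row-by-row SAD with partial distortion elimination (sketch).

    Returns the full SAD if the candidate beats best_so_far,
    or None if it was eliminated early.
    """
    partial = 0
    for r in range(block.shape[0]):
        row_sad = np.abs(block[r].astype(np.int64) -
                         candidate[r].astype(np.int64)).sum()
        partial += int(row_sad)
        if partial >= best_so_far:
            return None  # candidate eliminated: cannot beat the current best
    return partial
```

Because the result is identical to computing the full SAD for every surviving candidate, the speedup is lossless, matching the paper's claim of unchanged prediction quality.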
A fast texture feature extraction method for region-based image segmentation
Hui Zhang, Jason E. Fritts, Sally A. Goldman
In region-based image segmentation, an image is partitioned into connected regions by grouping neighboring pixels of similar features. To achieve fine-grain segmentation at the pixel level, we must be able to define features on a per-pixel basis, but texture feature extraction for individual pixels is typically very computationally intensive. In this paper, we propose a new hierarchical method to reduce the computational complexity and expedite texture feature extraction by taking advantage of the similarities between neighboring pixels. In our method, an image is divided into blocks of pixels of different granularities at the various levels of the hierarchy. A representative pixel is used to describe the texture within a block. Each pixel within a block gets its texture feature values either by copying the corresponding representative pixel's texture features, if its features are deemed sufficiently similar, or by computing its own texture features if it is a representative pixel itself. This way, we extract texture features for each pixel in the image with a minimal amount of texture feature extraction computation. The experiments demonstrate the good performance of our method, which can reduce the computational time by 30% to 60% while keeping the distortion in the range of 0.6% to 3.7%. By tailoring the texture feature extraction threshold, we can balance the tradeoff between extraction speed and distortion according to each system's specific needs.
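A single level of this representative-pixel reuse might look as follows. The helper names, the top-left representative, and the intensity-difference similarity test are illustrative assumptions; the paper uses a multi-level hierarchy and a tunable threshold.

```python
import numpy as np

def block_features(img, feature_fn, block=4, tol=5.0):
    """Per-pixel features with block-level reuse (one hierarchy level, sketch)."""
    h, w = img.shape
    out = np.zeros((h, w))
    for by in range(0, h, block):
        for bx in range(0, w, block):
            rep_feature = feature_fn(img, by, bx)  # representative pixel's feature
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    if abs(float(img[y, x]) - float(img[by, bx])) <= tol:
                        out[y, x] = rep_feature    # similar enough: copy, no recompute
                    else:
                        out[y, x] = feature_fn(img, y, x)  # compute its own feature
    return out
```

In smooth regions almost every pixel copies its representative, so the expensive feature_fn runs only once per block; loosening tol trades distortion for speed, mirroring the tradeoff described above.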
Skin color constancy for illumination invariant skin segmentation
The accuracy of skin segmentation algorithms is highly sensitive to changes in lighting conditions. When the lighting condition in a scene differs from that in the training examples, the misclassification rate of skin segmentation algorithms is high. Using a color constancy approach, we aim to compensate for skin color variations to achieve accurate skin color segmentation. Skin color constancy is realized in an unsupervised manner by using the color changes observed on a face under different illuminations to drive the model. By training on a few faces of different ethnicities, our model is able to generalize the color mapping to any unseen ethnicity. The observed color changes are used to learn the color mapping from one lighting condition to another. These mappings are represented in a low-dimensional subspace to obtain basis vector fields, with which we can model the nonlinear color changes and transform skin colors in arbitrary lighting conditions to a reference lighting condition. We show a proof of concept of unsupervised skin color constancy on faces from the PIE database. Skin segmentation is performed on the color-compensated faces using a Skin Distribution Map (SDM) trained on skin colors in the reference lighting condition.
Computations of the symmetric cosine transform using Forsythe and Clenshaw's recurrence formulae
Maurice F. Aburdene, Hoang M. Le, John E. Dorband
The discrete cosine transform (DCT) is commonly used in signal processing, image processing, communication systems and control systems. We use two methods, based on the algorithms of Clenshaw and Forsythe, to compute the recursive DCT in parallel. The symmetric discrete cosine transform (SCT) is computed first and can then be used as an intermediate tool to compute other forms of the DCT. The advantage of the SCT is that both the forward SCT and its inverse can be computed by the same method and hardware implementation. Although Clenshaw's algorithm is more efficient in computational complexity, it is not necessarily the more accurate one. The computational accuracy of these algorithms is discussed. In addition, the front-to-back forms of Clenshaw's and Forsythe's algorithms are implemented in aCe C, a parallel programming language.
Adaptive update using visual models for lifting-based motion-compensated temporal filtering
Song Li, H. Kai Xiong, Feng Wu, et al.
A fundamental difference of the MCTF coding scheme from conventional motion-compensated DCT schemes is that the predicted residue is further used to update the temporal low-pass frames. If the motion prediction is inaccurate, it introduces ghosting in the low-pass frames when some high-pass frames are dropped due to limited channel bandwidth or device capability; however, removing the update step entirely definitely hurts coding efficiency. To solve this dilemma, this paper proposes a content-adaptive update scheme, where a JND (just noticeable difference) metric is used to evaluate the impact of the update steps on the visual quality of the low-pass frames. The JND thresholds are image dependent, and as long as the update information remains below these thresholds, we achieve "update residual" transparency. Therefore, the potential ghosting artifacts detected by the model can be alleviated by adaptively removing the visible part of the predicted residues. Experimental results show that the proposed algorithm not only significantly improves the subjective visual quality of the temporal low-pass frames but also maintains the PSNR performance compared with the normal full update.
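One simple way to realize "update residual transparency" is to clip the update signal to the per-pixel JND threshold, keeping the sub-threshold part (coding gain without visible ghosting) and discarding the visible excess. This is a minimal sketch under that assumed form; the paper derives the JND thresholds from a visual model, which is not reproduced here.

```python
import numpy as np

def adaptive_update(lowpass, update, jnd):
    """Apply the MCTF update step only up to the per-pixel JND threshold.

    Sketch: update magnitudes below jnd pass through unchanged;
    anything larger is clipped, so no visible ghosting is introduced.
    """
    return lowpass + np.clip(update, -jnd, jnd)
```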
Zero-motion vector-biased cross-diamond search algorithm for rapid block matching motion estimation
Xiaoquan Yi, Nam Ling
A high compression ratio and very low encoding computational complexity are key to designing a successful encoder for energy-constrained conversational video applications, since coding efficiency, speed, and energy frugality are critical. The computation-intensive motion estimation (ME) process is an obstacle to overcome for these applications. To control and optimize encoding complexity, we propose a zero-motion-vector-biased cross-diamond search (ZCDS) algorithm for rapid block matching, based on the well-known cross-diamond search (CDS) algorithm. Unlike many conventional fast block-matching algorithms (BMAs), which use either a fixed threshold or the distortion of temporally or spatially adjacent blocks for early search termination, ZCDS is based on a dynamic block distortion threshold, obtained via a linear model utilizing already computed statistics and information of the current block. A new fine-granularity halfway-stop (FGHS) method is also proposed for early termination of the search process. Designed for various motion contents, ZCDS adaptively starts with a small or large cross search pattern, chosen automatically from an initial block-matching distortion. Experimental results show that the proposed algorithm achieves smoother motion vector fields and requires significantly fewer search points, with marginal peak signal-to-noise ratio (PSNR) loss, compared to full search and other conventional fast BMAs.
Adaptive multifoveation for low-complexity video compression with a stationary camera perspective
Sriram Sankaran, Rashid Ansari, Ashfaq A. Khokhar
In the human visual system, the spatial resolution of a scene under view decreases with increasing distance from the point of gaze, also called the foveation point. This phenomenon is referred to as foveation and has been exploited in foveated imaging to allocate bits in image and video coding according to the spatially varying perceived resolution. Several digital image processing techniques have been proposed in the past to realize foveated images and video; in most cases a single foveation point is assumed in a scene. Recently there has been significant interest in dynamic as well as multi-point foveation, but the complexity of identifying foveation points is significantly high in the proposed approaches. In this paper, an adaptive multi-point foveation technique for video data based on the concept of regions of interest (ROIs) is proposed and its performance is investigated. The points of interest are taken to be the centroids of moving objects and are dynamically determined by the proposed foveation algorithm. A fast algorithm for implementing region-based multi-foveation processing is proposed. The proposed adaptive multi-foveation fully integrates with existing video coding standards in both the spatial and DCT domains.
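Multi-point foveation can be pictured as a bit-allocation weight map that peaks at each foveation point (e.g. a moving object's centroid) and decays with distance. The inverse-quadratic falloff and all names below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def foveation_weights(shape, foveation_points, falloff=0.01):
    """Resolution/bit-allocation weight map for multiple foveation points (sketch).

    Each pixel takes the weight of its most favorable foveation point,
    decaying with squared distance to that point.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    weight = np.zeros((h, w))
    for (fy, fx) in foveation_points:
        d2 = (ys - fy) ** 2 + (xs - fx) ** 2
        weight = np.maximum(weight, 1.0 / (1.0 + falloff * d2))
    return weight
```

An encoder could scale quantization step sizes inversely with this map, spending bits where perceived resolution is highest.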
JPEG2000-based image communication for modern browsing techniques
Rene Rosenbaum, Heidrun Schumann
The size of large images often exceeds the display area of the user's output device. To present such images appropriately, sophisticated image browsing techniques have been developed. To achieve better image comprehension, these techniques combine detailed and distorted image regions in a variety of ways. However, much of the commonly transmitted image data is discarded during the creation of the representation. Our proposal for remote image browsing is to limit the image transmission to the data required to represent the image within a certain browsing technique without loss of information. In this publication, a systematic view on image communication for different browsing techniques is presented. It can be shown that an appropriate linkage and combination of all communication steps can significantly improve the performance of the whole system. Based on regions of interest and levels of detail, a classification of current image browsing approaches is presented in order to characterize these techniques with regard to their representation of the image. From this, appropriate strategies for image compression and transfer are derived, and guidelines for the design of remote image browsing systems are given. Due to its excellent compression performance and numerous features, the JPEG2000 standard is adopted as the foundation of the proposed scheme for compression and efficient transmission of the image.
Improved error resilient H.264 coding scheme using SP/SI macroblocks
In this work, we improve the H.264 error-resilient coding scheme in [1, 2] with a hybrid scheme that generates alternative SP macroblocks utilizing both multiple reference frames [1] and the concealed versions of corrupted frames [2]. The new scheme is more robust and works effectively under different coding environments. Although the bit-rate overhead introduced by [1] and [2] is acceptable for some applications, we use an adaptive coding and bit-stream replacement mechanism to reduce the overhead further to meet strict bandwidth constraints. Specifically, two different versions of alternative SP macroblocks are coded using different quantization levels; they provide different levels of error-resilient performance at different bit-rate costs. When the sender attempts to replace the originally coded version of a target macroblock in the bit stream, it selects the proper version according to the importance of the macroblock, measured by its influence on the subjective quality of the current frame and its impact on subsequent frames. The implementation and the standard conformance of the proposed scheme are detailed in this work.
Compressed-domain registration techniques for MPEG video
Ming-Sui Lee, Mei-Yin Shen, C.-C. Jay Kuo
A multi-scale DCT-domain image registration technique for two MPEG video inputs is proposed in this work. Several edge detectors are first applied to the luminance component of the DC coefficients to generate so-called difference maps for each input image. Then, a threshold is selected for each difference map to filter out regions of lower activity. Following that, we estimate the displacement parameters by examining the difference maps of the two input images associated with the same edge detector. Finally, the ultimate displacement vector is calculated by averaging the parameters from all detectors. To achieve higher quality in the output mosaic, 1-D alignment is locally applied to pixels around the boundaries of the displacement decided in the previous step. It is shown that the proposed method reduces the computational complexity dramatically compared to pixel-based image registration techniques while reaching a satisfactory composition result. Moreover, we discuss how the overlapping region affects the quality of alignment.
Bayes-risk minimized intra/inter coding mode prediction for H.264
In this paper, a hierarchical prediction scheme for fast motion search is proposed in order to implement an effective encoder that pays a small search cost to achieve the best match with little rate-distortion performance degradation. The proposed method has a hierarchical tree structure, where each node has two sub-tasks: reference frame selection and spatial search region prediction. In the following, the best motion vector found by full search is called the true motion vector. The proposed fast motion search for multiple reference frames can be summarized as follows. Step 1. Initial search points in the previous frame: in the literature, the majority of true motion vectors are distributed around the center of the search window in recent frames, so we first search the previous frame. Diamond search fails to find the true motion when the initial point misleads the search path into local minima; to solve this problem, initial points are found by multiresolution motion search. Step 2. Perform a diamond search for each initial point and compare the minimum distortion measure with a stopping criterion defined by the maximum number of search points, the minimum diamond size, and a QP-based threshold. If the stopping criterion is met, the motion search finishes with the motion vector having minimum distortion; otherwise go to Step 3. Here, the diamond size depends on the motion activity between the current frame (T) and the previous frame (T-1). Step 3. Find the search center in the frame indicated by the motion trace (the motion vector field obtained from the previous encoding stage) for the best point of each diamond in the previous frame (T-1). Based on its motion activity, the diamond size is determined and the stopping criterion is checked for every diamond in the second previous frame. Iterate this step until the stopping condition is met.
Low-complexity video encoding using B-frame direct modes
Yuxin Liu, Josep Prades-Nebot, Paul Salama, et al.
Low-complexity video encoding shifts computational complexity from the encoder to the decoder and is intended for applications with scarce resources at the encoder. The Wyner-Ziv and Slepian-Wolf theorems provide the theoretical basis for low-complexity video encoding. In this paper, we propose low-complexity video encoding using B-frame direct modes. We extend the direct-mode idea, originally developed for encoding B frames, and design new B-frame direct modes. Motion vectors for B frames are obtained at the decoder and transmitted back to the encoder over a feedback channel, so no motion estimation is needed at the encoder to encode any B frame. Experimental results, obtained by modifying the ITU-T H.26L software, show that our approach achieves rate-distortion performance competitive with that of conventional high-complexity video encoding.
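For context, the direct-mode idea the paper extends derives a B-frame block's motion vectors from the co-located block's motion vector by temporal-distance scaling. The sketch below shows that standard temporal scaling only (with the integer rounding of the real codec omitted); it is not the paper's new modes, and the function name is hypothetical.

```python
def temporal_direct_mvs(mv_col, tr_b, tr_d):
    """Derive forward/backward MVs for a B-frame direct-mode block from the
    co-located block's motion vector mv_col, where tr_b is the temporal
    distance from the B frame to its forward reference and tr_d the
    distance between the two reference frames."""
    mvx, mvy = mv_col
    mv_fwd = (tr_b * mvx / tr_d, tr_b * mvy / tr_d)
    mv_bwd = ((tr_b - tr_d) * mvx / tr_d, (tr_b - tr_d) * mvy / tr_d)
    return mv_fwd, mv_bwd
```

Because both vectors are derived rather than searched, a direct-mode block costs the encoder no motion estimation, which is what the feedback-channel scheme above exploits.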
MPEG-4 outer-inner lip FAP interpolation
This paper presents a novel inner-outer lip facial animation parameter (FAP) interpolation technique within the framework of the MPEG-4 standard that allows realistic animation at very low bit rates. The proposed technique applies maximum a posteriori (MAP) estimation to interpolate the inner-lip FAPs given the outer-lip FAPs, or vice versa. The vectors of lip FAPs, together with their first- and second-order temporal derivatives over frames, and the conditional probabilities of these vectors are modeled as Gaussian distributions. As a result, the MAP estimate of the inner-lip FAPs can be obtained as a linear transformation of the outer-lip FAPs, or vice versa. When the data obtained from the training process are stored at the receiver, the inner (or outer) lip FAPs can be interpolated from the outer (or inner) lip FAPs, which are all that needs to be transmitted. Thus, without any increase in the amount of transmitted information, a more realistic talking animation that provides more visual information for speech reading can be achieved. In our experiments, the proposed technique reduces the percentage normalized mean error (PNME) by half compared with the standard technique.
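The "linear transformation" claim follows from a standard property of jointly Gaussian vectors: the MAP (here, conditional-mean) estimate of one part given the other is affine in the observed part. A minimal sketch, assuming the joint mean and covariance blocks have been learned during training (argument names are hypothetical):

```python
import numpy as np

def map_interpolate(outer, mu_o, mu_i, cov_oo, cov_io):
    """MAP estimate of inner-lip FAPs given outer-lip FAPs under a joint
    Gaussian model: mu_i + Cov(i,o) Cov(o,o)^-1 (outer - mu_o)."""
    w = cov_io @ np.linalg.inv(cov_oo)   # fixed linear transform, precomputable
    return mu_i + w @ (outer - mu_o)
```

Since `w` depends only on training statistics, the receiver precomputes it once and interpolation per frame is a single matrix-vector product.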
A genetic algorithm for layered multisource video distribution
Lai-Tee Cheok, Alexandros Eleftheriadis
We propose a genetic algorithm -- MckpGen -- for rate scaling and adaptive streaming of layered video streams from multiple sources in a bandwidth-constrained environment. A genetic algorithm (GA) consists of several components: a representation scheme; a generator for creating an initial population; a crossover operator for producing offspring solutions from parents; a mutation operator to promote genetic diversity; and a repair operator to ensure the feasibility of the solutions produced. We formulate the problem as a Multiple-Choice Knapsack Problem (MCKP), a variant of the Knapsack Problem (KP) and a decision problem in combinatorial optimization. MCKP has many successful applications, including fault tolerance, capital budgeting, and resource allocation for conserving energy on mobile devices. Genetic algorithms have been used to solve NP-complete problems such as the KP effectively; however, to the best of our knowledge, there is no GA for MCKP. We utilize a binary chromosome representation scheme for MCKP and design and implement the components, exploiting problem-specific knowledge. In addition, for the repair operator, we propose two schemes (RepairSimple and RepairBRP), and results show that RepairBRP yields significantly better performance. We further show that the average fitness of the entire population converges towards the best (optimal) fitness value, and we compare performance at various bit rates.
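The MCKP structure underlying the GA can be illustrated as follows. In MCKP, the items are partitioned into classes and exactly one item must be chosen per class, subject to a shared capacity (here, bandwidth). The sketch below is a hypothetical simplification: it uses an index-per-class encoding rather than the paper's binary chromosomes, and its greedy repair is only a stand-in for RepairSimple/RepairBRP, whose details are not given in the abstract.

```python
import random

# Hypothetical MCKP instance: classes[c] is a list of (weight, profit)
# choices for class c; a chromosome picks one choice index per class.
def fitness(chrom, classes, capacity):
    """Total profit of the selection; infeasible solutions score zero."""
    w = sum(classes[c][g][0] for c, g in enumerate(chrom))
    p = sum(classes[c][g][1] for c, g in enumerate(chrom))
    return p if w <= capacity else 0

def repair_simple(chrom, classes, capacity):
    """Greedy repair: while over capacity, reset successive classes to
    their lightest choice so the offspring becomes feasible."""
    chrom = list(chrom)
    for c in range(len(classes)):
        if sum(classes[k][g][0] for k, g in enumerate(chrom)) <= capacity:
            break
        chrom[c] = min(range(len(classes[c])), key=lambda g: classes[c][g][0])
    return chrom

def crossover(a, b):
    """One-point crossover: splice a prefix of one parent onto the other."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]
```

Choosing exactly one item per class is what distinguishes MCKP from plain KP; the repair operator is essential because crossover freely produces over-capacity offspring.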
A multiresolutional algorithm for halftone detection
In this paper we present an algorithm that segments a scanned compound document into halftone and non-halftone areas. Our work is intended as a precursor to sophisticated document processing applications (descreening, compression, document content analysis, etc.) for which undetected halftones may cause serious adverse effects. Our method has very low computational and memory complexity and performs only a single pass over the scanned document. Halftone regions of arbitrary size and shape are detected in compound multilingual documents in a completely automated fashion, without any prior knowledge of the type or resolution of the halftones to be detected. The proposed technique can be adjusted to detect halftones of a particular dpi resolution or to decompose detected halftones into their constituent resolutions. We obtain high detection probabilities on compound multilingual documents containing halftones and fine text.
Hand based biometry
Erdem Yoruk, Helin Dutagaci, Bulent Sankur
A biometric scheme based on the silhouettes and/or textures of the hands is developed. The crucial part of the algorithm is the accurate registration of the deformable shape of the hands, since subjects are not constrained in pose or posture during acquisition. A host of shape and texture features are comparatively evaluated, such as independent component analysis (ICA) features, principal component analysis (PCA) features, angular radial transform (ART) features, and distance transform (DT) based features. It is shown that, even with a limited amount of training data, this biometric scheme can perform reliably for populations of up to several hundred subjects.
Investigative Image Processing
Geometrical methods for accurate forensic videogrammetry. Part II. Reducing complexity of Cartesian scene measurements via epipolar registration
Lenny Rudin, Pascal Monasse, Ping Yu
We present the technical steps of a new photogrammetry method that requires far fewer measurements in the observed scene, no access to the original camera's calibration, and no prior knowledge of its position and orientation. The practical setup involves minimal human measurement intervention. This largely automatic configuration yields accurate length, angle, and area measurements in the original scene without surveying the 3-D Cartesian coordinates of known points. The crucial observation is that the two snapshots, even if not simultaneous and taken by different cameras, form a stereo system. Therefore the correspondence of 8 points across the views gives the epipolar geometry of the stereo setup, and since one of the cameras is calibrated, the calibration can be "transferred" to the other camera through the epipolar matrix. This transfer yields a calibration of the original camera (internal parameters and position in the scene) even if the camera is no longer available, its settings have changed, its orientation is different, or it has been moved. We thus replace the technically difficult, time-consuming, and potentially error-prone data collection with epipolar registration and some rudimentary scene measurements. This new technique can also be applied to the task of photo-comparison.
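The 8-point step mentioned above is the standard normalized 8-point algorithm for the fundamental matrix, which encodes the epipolar geometry that the calibration transfer relies on. A minimal sketch (the subsequent transfer of intrinsics to the second camera is not shown, and function names are our own):

```python
import numpy as np

def normalize(pts):
    """Translate points to zero mean and scale so the mean distance
    from the origin is sqrt(2) (Hartley normalization)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return ph, T

def fundamental_8point(x1, x2):
    """Estimate F with x2^T F x1 = 0 from >= 8 correspondences."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence gives one linear constraint on the 9 entries of F.
    A = np.column_stack([p2[:, 0:1] * p1, p2[:, 1:2] * p1, p1])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null vector of A
    U, s, Vt = np.linalg.svd(F)       # enforce the rank-2 constraint
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt
    return T2.T @ F @ T1              # undo the normalization
```

With F in hand and one camera's intrinsics known, the essential matrix and hence the second camera's pose can be recovered, which is the "transfer" the abstract describes.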