Object-based video: extraction tools, evaluation metrics and applications
Author(s):
Andrea Cavallaro;
Touradj Ebrahimi
The representation of video information in terms of its content is at
the foundation of many multimedia applications, such as broadcasting,
content-based information retrieval, interactive video, remote
surveillance and entertainment. In particular, object-based
representation consists in decomposing the video content into a
collection of meaningful objects. This approach offers a broad range
of capabilities in terms of access, manipulation and interaction with
the visual content. The basic difference when compared with
pixel-based procedures is that instead of processing individual
pixels, image objects are used in the representation. To exploit the
benefits of object-based representation, multimedia applications need
automatic techniques for extracting such objects from video data, a
problem that still remains largely unsolved. In this paper, we first
review the extraction techniques that enable the separation of
foreground objects from the background. Their field of applicability
and their limitations are discussed. Next, automatic tools for
evaluating their performances are introduced. The major applications
that benefit from an object-based approach are then analysed. Finally,
we discuss some open research issues in object-based video.
Automatic video object segmentation for MPEG-4
Author(s):
Wei Wei;
King Ngi Ngan
This paper presents a novel method to automatically extract moving objects using motion and color information. Based on pattern recognition and object tracking principles, the proposed method can handle sequences with both moving and stationary backgrounds. It derives for each physical object a two-dimensional binary model, with the model points corresponding to the edges detected by the Canny operator. The resulting binary models, combined with temporal and spatial information, then guide the extraction of the actual VOPs from the video sequence. The performance of the segmentation technique is illustrated by simulations carried out on standard video sequences.
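As a rough illustration of the model-derivation step only (the tracking and VOP-extraction stages are omitted), a minimal sketch assuming an OpenCV-style Canny detector and a hypothetical bounding box supplied by a motion stage:

```python
import cv2
import numpy as np

def binary_object_model(frame_gray, bbox, lo=100, hi=200):
    """Derive a 2-D binary model whose model points are the Canny
    edges found inside the object's bounding box (x, y, w, h)."""
    x, y, w, h = bbox
    roi = frame_gray[y:y + h, x:x + w]
    edges = cv2.Canny(roi, lo, hi)   # binary edge map of the region
    return edges > 0                 # True at model points
```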
Multimetric evaluation protocol for user-assisted video object extraction systems
Author(s):
Na Li;
Shipeng Li;
Chun Chen
In this paper, a multi-metric evaluation protocol is proposed to evaluate the performance of user-assisted video object extraction systems. Evaluation metrics are the essential element of a performance evaluation methodology. Recent works on video object segmentation/extraction are mostly restricted to a single objective metric to judge the overall performance of algorithms. Motivated by a novel framework for performance evaluation of image segmentation using the Pareto front, we propose a multi-metric evaluation protocol, including metrics for contour-based spatial matching, temporal consistency, user workload and time consumption. Taking the characteristics of a user-assisted video object extraction system into consideration, we formulate the metrics in a way that is simple yet close to the assessment of the human visual system. For spatial matching, we define three types of errors: sharp error, smooth error and mass error, which can precisely score an extraction result. Temporal consistency is introduced to evaluate the stability of a system over time. In addition, since a user-assisted system is concerned, the user workload is also included in the metrics. Incorporating the multiple metrics into one 4-D fitness space, we adopt the Pareto front to find the best choice of a system with optimal parameters. Tests of our evaluation method show that the multi-metric protocol is effective.
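A minimal sketch of Pareto-front selection over the 4-D fitness space the protocol describes; the metric ordering and orientation (lower is better) are our assumptions, not the authors' exact formulation:

```python
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated systems.
    scores: (n_systems, 4) array of [spatial error, temporal
    inconsistency, user workload, time]; lower is better."""
    keep = []
    for i in range(scores.shape[0]):
        dominated = np.any(
            np.all(scores <= scores[i], axis=1) &
            np.any(scores < scores[i], axis=1))
        if not dominated:
            keep.append(i)
    return keep
```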
Performance measures for video object segmentation and tracking
Author(s):
Cigdem Eroglu Erdem;
Bulent Sankur;
A. Murat Tekalp
We propose measures to evaluate the performance of video object segmentation and tracking methods quantitatively without
ground-truth segmentation maps. The proposed measures are based on
spatial differences of color and motion along the boundary of the
estimated video object plane and temporal differences between the
color histogram of the current object plane and its neighbors.
They can be used to localize (spatially and/or temporally) regions
where segmentation results are good or bad; and/or combined to
yield a single numerical measure to indicate the goodness of the
boundary segmentation and tracking results over a sequence. The validity of the proposed performance measures without ground truth has been demonstrated by canonical correlation analysis of the proposed measures with another set of measures computed with ground truth, on a set of sequences where ground-truth information is available. Experimental results are presented to
evaluate the segmentation maps obtained from various sequences
using different segmentation and tracking algorithms.
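As a loose illustration of the temporal component of such ground-truth-free measures, a sketch that compares colour histograms of the object plane in consecutive frames using OpenCV; the inputs are assumed to be pre-masked object-plane images, and the Bhattacharyya distance is our choice, not necessarily the authors':

```python
import cv2

def temporal_histogram_diff(prev_plane, curr_plane, bins=32):
    """Dissimilarity between colour histograms of the object plane
    in consecutive frames (inputs already masked to the object).
    A large jump suggests a tracking/segmentation failure."""
    hists = []
    for plane in (prev_plane, curr_plane):
        h = cv2.calcHist([plane], [0, 1, 2], None,
                         [bins] * 3, [0, 256] * 3)
        hists.append(cv2.normalize(h, None).flatten())
    return cv2.compareHist(hists[0], hists[1],
                           cv2.HISTCMP_BHATTACHARYYA)
```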
Real-time segmentation of video objects for mixed-reality interactive applications
Author(s):
Xavier Marichal;
Toshiyuki Umeda
The present paper introduces a very specific and pragmatic approach to segmentation. It is driven by a particular application context: in the framework of mixed reality, Transfiction (“transportation into fictional spaces”) is designed to mix synthetic and natural images in real time while allowing users to interact with these input/output screens. Segmentation is therefore used to provide both the immersion and the interaction capabilities. The former is achieved by composing the image of the user within the projected virtual scenes, while the latter is achieved through basic body/gesture analysis on the segmented silhouettes. Depending on indoor or outdoor usage, two real-time techniques are developed. Results are analyzed with respect to the overall application, not only in terms of absolute quality but also in terms of perception by the users.
Human detection and tracking for video surveillance applications in a low-density environment
Author(s):
Lionel Carminati;
Jenny Benois-Pineau;
Marc Gelgon
In this paper, we describe a new way to create an object-oriented video surveillance system that monitors activity in a site. The process is performed in two steps: first, human faces are detected as a guess for objects of interest; second, these entities are tracked through the video stream. The guideline here is not to perform very accurate detection and tracking, based on contours for example, but to provide a global image-processing system on a simple personal computer that takes advantage of the co-operation between detection and tracking. The scheme we propose thus provides a simple, fast solution that tracks a few specific points of interest on the object boundary and possibly engages motion-based detection in order to recover the object of interest in the scene or to detect new objects of interest. The tracker also enables learning motion activities, detecting unusual activities, and supplying statistical information about motion in a scene.
The unbearable lightness of being there: contrasting approaches to presence engineering
Author(s):
Wijnand A. IJsselsteijn;
Joy van Baren;
Natalia Romero;
Panos Markopoulos
The emergence and proliferation of email, mobile communication devices, internet chatrooms, shared virtual environments, advanced tele-conferencing platforms and other telecommunication systems underline the importance of developing measurement methods that are sensitive to the human experience with these systems. In this paper, we discuss the concepts of social presence and connectedness as complementary notions, each relating to a different set of media properties that serve distinct communication needs. We aim to broaden the scope of current presence technologies and applications, illustrating the various factors that play a role in establishing, enhancing, and enriching the experience of human connectedness through communication media. Based on existing literature, we discuss a number of user requirements for home communication and awareness systems. To make these ideas tangible, we finish the paper by briefly discussing the ASTRA project as a case study in designing and evaluating an awareness system for the home.
Immersive 3D video conferencing: challenges, concepts, and implementations
Author(s):
Peter Eisert
In this paper, a next generation 3-D video conferencing system is
presented that provides immersive tele-presence and natural
representation of all participants in a shared virtual meeting space.
The system is based on the principle of a shared virtual table
environment which guarantees correct eye contact and gesture
reproduction and enhances the quality of human-centered communication. The virtual environment is modeled in MPEG-4 which also allows the seamless integration of explicit 3-D head models for a low-bandwidth connection to mobile users. In this case, facial
expression and motion information is transmitted instead of video
streams, resulting in bit-rates of a few kbit/s per participant. Besides low bit-rates, the model-based approach enables new possibilities for image enhancement such as digital make-up, digital dressing, or modification of the scene lighting.
Studio production system for dynamic 3D content
Author(s):
Oliver Grau
This contribution describes a studio system that supports the production of dynamic 3D content. The system is based on a multi-camera approach to capture dynamic scenes in a studio equipped with a chroma-keying facility. This paper focuses on the real-time functionality of the system. Three different methods for the computation of the visual hull from silhouette images were implemented and compared. Two methods use a volumetric representation (3D-array and octree representation). The third method uses a new line representation and has been shown to deliver more accurate surface approximations of objects.
In addition to the capture functionality, the system provides visualisation tools for production planning and on-set visualisation. The latter includes a preview of the composed programme for the director and immersive feedback for the actor. The immersive feedback is implemented using a view-dependent projection onto a special retro-reflective chroma-keying cloth and does not interfere with the shape-capturing sub-system.
Hue feature-based stereo scheme for detecting half-occlusion and order reversal of objects
Author(s):
Tsan Po Siu;
Tardi Tjahjadi;
Irene Yu-Hua Gu
One of the most challenging tasks in stereopsis is to find corresponding points in a stereo image pair to obtain depth information. Half-occlusion and order-reversal of objects further complicate the problem. In this paper, we propose the use of the hue and intensity of images for stereo correspondence. Stereo correspondence is performed row by row. Planar first-order least-squares curve fitting is exploited to extract line-segments with similar hue. A sloping intensity profile of a line-segment is used to indicate the presence of a sloping surface with the same hue. A Left-Left Right-Right (LL-RR) constraint is introduced for matching a line-segment pair: the possible occluded part should be located both at the leftmost part of the left line-segment and at the rightmost part of the right line-segment. This constraint is used to detect half-occlusion in line-segment pair matching. Finally, an object is formed from a block of contiguous line-segments separated from other blocks by gaps. Experiments were performed on stereo image pairs, and the estimated disparity maps are used to evaluate the performance of the proposed algorithm.
A contour-based stereovision algorithm for video surveillance applications
Author(s):
Christian Woehler;
Lars Krueger
In this paper, we describe a real-time stereo vision algorithm that determines the disparity map of a given scene by evaluating object contours, relying on a reference image showing the scene without objects. Contours are extracted from the full-resolution absolute difference image between the current and reference images by binarization with several locally adaptive thresholds. To estimate disparity values, contour segments extending over several epipolar lines are used. This approach leads to very accurate disparity values. The algorithm can be configured such that no image region that differs from the reference image by more than a given minimum statistical significance is overlooked, which makes it especially suitable for safety applications.
We successfully apply this contour-based stereo vision (CBS) algorithm to the task of video surveillance of hazardous areas in a production environment, using several thousand test images. Under the harsh conditions encountered in this setting, the CBS algorithm faithfully detects objects entering the scene and determines their three-dimensional structure. Moreover, it copes well with small objects and very difficult illumination and contrast conditions.
Investigation on the effect of disparity-based asymmetrical filtering on stereoscopic video
Author(s):
Gi Mun Um;
Filippo Speranza;
Liang Zhang;
Wa James Tam;
Ron Renaud;
Lew B. Stelmach;
Chung-Hyun Ahn
Current binocular stereoscopic displays cause visual discomfort when objects with large disparities are present in the scene. Improvements in visual comfort have been reported when the far background and foreground in the scene are blurred; however, this technique has the drawback of degrading overall image quality. To lessen the visual discomfort caused by large disparities while maintaining high perceived image quality, we use a novel disparity-based asymmetrical filtering technique. Asymmetrical filtering, which refers to filtering applied to the image of one eye only, has been shown to maintain the sharpness of a stereoscopic image, provided that the amount of filtering is low. Disparity-based asymmetrical filtering uses the disparity information in a stereoscopic image to control the severity of blurring. We investigated the effects of this technique on stereoscopic video by measuring visual comfort and apparent sharpness. Our results indicate that disparity-based asymmetrical filtering does not always improve visual comfort, but it does maintain image quality.
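A toy sketch of one plausible form of disparity-based asymmetrical filtering: only the right-eye image is blurred, with strength growing with disparity magnitude beyond a comfort threshold. The threshold, blur levels and blending strategy are assumptions, not the authors' parameters:

```python
import cv2
import numpy as np

def asymmetric_filter(right_img, disparity, d0=16.0, max_sigma=3.0):
    """Blur only the right-eye image, with blur strength growing
    with |disparity| beyond a comfort threshold d0 (pixels)."""
    excess = np.clip(np.abs(disparity) - d0, 0, None)
    sigma_map = max_sigma * excess / (excess.max() + 1e-6)
    out = right_img.copy()
    # quantize sigma into a few levels and paste pre-blurred copies
    for s in (1.0, 2.0, 3.0):
        blurred = cv2.GaussianBlur(right_img, (0, 0), s)
        mask = sigma_map >= s
        out[mask] = blurred[mask]
    return out
```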
Virtual control of optical axis for stereo HDTV
Author(s):
Jong-Il Park;
Sang Hyo Han;
Gi Mun Um;
Chung Hyun Ahn;
Soo In Lee
In stereoscopic television, there is a trade-off between visual comfort and 3D impact with respect to the baseline-stretch of the 3D camera. It has been reported that an optimal condition is reached when the baseline-stretch is set at about the distance between the human pupils [1]. However, such a distance cannot be obtained when the lens and CCD module are large. In order to overcome this limitation, we attempt to control the baseline-stretch of a stereoscopic camera by synthesizing virtual views at the desired interval between the two cameras. The proposed technique is based on stereo matching and view synthesis. We first obtain a dense disparity map using hierarchical stereo matching with an edge-adaptive shifted window, and then synthesize virtual views using the disparity map. Simulation results with various stereoscopic images demonstrate the effectiveness of the proposed technique.
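A minimal sketch of the underlying view-synthesis idea: forward-warp one view by a fraction alpha of the disparity to emulate a shorter baseline. Hole filling and the matching stage are omitted; the function and parameter names are hypothetical:

```python
import numpy as np

def synthesize_view(left, disp, alpha=0.5):
    """Forward-warp the left image by alpha * disparity to mimic a
    camera placed at a fraction alpha of the original baseline.
    Holes (disocclusions) remain zero; real systems inpaint them."""
    h, w = disp.shape
    out = np.zeros_like(left)
    xs = np.arange(w)
    for y in range(h):
        xt = np.clip((xs + alpha * disp[y]).astype(int), 0, w - 1)
        out[y, xt] = left[y, xs]
    return out
```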
High-flexibility scalable image coding
Author(s):
Pascal Frossard;
Pierre Vandergheynst;
Rosa M. Figueras i Ventura
This paper presents a new, highly flexible, scalable image coder based on a Matching Pursuit expansion. The dictionary of atoms is built by translation, rotation and anisotropic refinement of Gaussian functions, in order to efficiently capture edges in natural images. At the same time, the dictionary is invariant under isotropic scaling, which interestingly leads to very simple spatial resizing operations. It is shown that the proposed scheme is comparable to state-of-the-art coders when the compressed image is transcoded to a lower (octave-based) spatial resolution. Contrary to common compression formats, our bit-stream can moreover easily and efficiently be decoded at any spatial resolution, even with irrational re-scaling factors. At the same time, the Matching Pursuit algorithm provides an intrinsically progressive stream. This worthy feature allows for easy rate filtering operations, where the least important atoms are simply discarded to fit restrictive bandwidth constraints. Our scheme is finally shown to compare favorably to state-of-the-art progressive coders for moderate to large rate reductions.
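For readers unfamiliar with Matching Pursuit, a generic sketch of the greedy expansion over a unit-norm dictionary; the paper's anisotropically refined Gaussian dictionary and coding stages are not reproduced here:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=50):
    """Greedy MP expansion: at each step pick the dictionary atom
    (columns, unit norm) best correlated with the residual.
    Coefficients arrive in decreasing importance, which is what
    makes the stream intrinsically progressive."""
    residual = signal.astype(float).copy()
    atoms, coeffs = [], []
    for _ in range(n_atoms):
        corr = dictionary.T @ residual
        k = int(np.argmax(np.abs(corr)))
        atoms.append(k)
        coeffs.append(corr[k])
        residual -= corr[k] * dictionary[:, k]
    return atoms, coeffs, residual
```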
SNR scalability by transform coefficient refinement for block-based video coding
Author(s):
Till Halbach;
Thomas R. Fischer
Various techniques for SNR scalability in hybrid block-based
video coding exist in the literature and in different standards.
A new approach based on transform coefficient refinement is proposed
in this article. The coefficient difference is computed after quantization, subsequently entropy-encoded, and transmitted for reconstruction of a high-quality layer at the decoder. In most cases the approach incurs only a moderate increase in bit rate compared to other schemes. The bit rate of our two-layer framework converges towards the rate of a single-layer system as the quality gap between the two layers increases. The gains come at the cost of increased computational complexity and memory requirements.
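A toy numeric sketch of coefficient refinement between two quantization levels, in the spirit of the scheme described above; the exact quantizer design and entropy coding of standardized codecs are not modelled:

```python
import numpy as np

def refinement_layer(coeffs, q_base, q_enh):
    """Two-layer coefficient refinement: the enhancement layer
    carries the difference between a fine and a coarse
    quantization, computed after quantization."""
    base = np.round(coeffs / q_base).astype(int)
    fine = np.round(coeffs / q_enh).astype(int)
    # refinement = fine-layer indices minus rescaled base indices
    refine = fine - np.round(base * q_base / q_enh).astype(int)
    return base, refine

def reconstruct(base, refine, q_base, q_enh):
    """High-quality layer: rescaled base indices plus refinement."""
    return (np.round(base * q_base / q_enh) + refine) * q_enh
```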
Wavelet based scalable video coding with spatially scalable motion vectors
Author(s):
Huipin Zhang;
Frank Bossen
The paper studies scalable video coding based on multiresolution
video representations generated by multi-scale subband motion
compensated temporal filtering (MCTF) and spatial wavelet
transform. Since MCTF is performed subband by subband in the
spatial wavelet domain, motion vectors are available for
reconstructing video sequences of any possible reduced spatial
resolution, restricted by the dyadic decomposition pattern and the
maximal spatial decomposition level. The multiresolution
representations naturally provide a framework with which both
spatial scalability and temporal scalability can be very
conveniently and efficiently supported by a video coder that
utilizes such multiresolution video representations. Such video
coders can be fully scalable by incorporating wavelet-domain
bit-plane image coding techniques. This paper examines the
performance, including scalability and coding efficiency, of a
scalable video coder that utilizes such multi-scale video
representations together with the EZBC image coder. A
wavelet-domain variable block size motion estimation algorithm is
introduced to enhance the performance of the subband MCTF.
Experiments show that the proposed coder outperforms the state-of-the-art fully scalable coder MC-EZBC in terms of spatial scalability.
Robust video transmission over lossy packet networks using block-based fine granularity scalable coding
Author(s):
Yuwen He;
Shiqiang Yang
This paper proposes a robust video transmission system for lossy packet-switched networks based on block-based fine granularity scalable (B-FGS) coding, including source coding, efficient rate allocation and packetization, which works effectively when packet loss occurs. The compressed B-FGS bit-stream can be truncated at fine grain thanks to its block-based bit-plane coding at the enhancement layer. Channel-adaptive rate allocation can therefore be implemented, and the time-varying bandwidth can be exploited efficiently. A distortion-associated rate allocation method is proposed as a substitute for rate-distortion-optimal rate allocation in order to reduce the streaming server's complexity. In addition, packetizing with independent data provides robust video transmission by restraining error propagation across packets. Furthermore, in order to mitigate the quality flickering of decoded video induced by packet loss, spatial interleaving is applied during packetization. Thus, the proposed video transmission system deals well with available-bandwidth variation and packet loss, two key problems in video streaming applications. To validate its effectiveness, it is compared in simulations against the MPEG-4 FGS transmission system, both with and without conditional retransmission. The proposed transmission system is more robust to packet-loss errors than the other two, especially when the packet-loss rate is high.
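A minimal sketch of the fine-grain truncation property that bit-plane coding gives an FGS enhancement layer: planes are ordered most significant first, so the stream can be cut anywhere. The helper names are hypothetical and magnitudes are assumed to fit in 8 bits:

```python
import numpy as np

def to_bitplanes(residual, n_planes=8):
    """Split absolute enhancement-layer residuals (|r| < 256) into
    bit planes, most significant first; FGS transmits planes in
    this order so the stream can be truncated at any point."""
    mag = np.abs(residual).astype(np.uint8)
    return [(mag >> p) & 1 for p in range(n_planes - 1, -1, -1)]

def from_bitplanes(planes, sign):
    """Reassemble whatever planes survived truncation."""
    mag = np.zeros_like(planes[0], dtype=np.uint8)
    for p, plane in zip(range(len(planes) - 1, -1, -1), planes):
        mag |= plane.astype(np.uint8) << p
    return sign * mag.astype(int)
```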
End-to-end embedded approach for digital video multicast/broadcast over CDMA cellular networks
Author(s):
Yee Sin Chan;
James W. Modestino
We investigate multicast/broadcast of digital video over spread-spectrum CDMA cellular networks, a platform targeted at various kinds of multimedia services. In particular, we propose an end-to-end embedded transmission scheme which combines a scalable video source coder, adaptive power allocation, adaptive channel coding and an embedded multiresolution modulation strategy to simultaneously deliver a basic quality-of-service (QoS) to less capable receivers and an enhanced QoS for more capable receivers. We demonstrate the efficacy of this approach using the ITU-T H.263+ video source coder, although the approach is generally applicable to other scalable source coding schemes as well.
FGS encoder for efficiency improvement and decoder for drift reduction
Author(s):
Kenji Matsuo;
Koichi Takagi;
Atsushi Koike;
Shuichi Matsumoto
FGS is a video coding scheme that can control picture quality with high scalability, but it is less efficient than single-layer video coding because of its hierarchical structure. This paper proposes the application of three methods for efficiency improvement to the FGS encoder and decoder. The first is a bit-plane coding method using separation into two bit groups based on the distribution of significant bits. The second is the introduction of motion-compensated estimation in the enhancement layer. The third is a complement method that builds a high-quality reference picture from both the base layer and the enhancement layer. In conclusion, the proposed methods improve the coding efficiency in the enhancement layer of conventional FGS by 2.2% and the PSNR performance of conventional FGS by 1.1 dB, and also reduce propagation of the drift error caused by low-rate transmission and channel congestion. The results show that the proposed method is more efficient and better suited for video streaming on a heterogeneous network with channel deviation.
Layered self-identifiable and scalable video codec for delivery to heterogeneous receivers
Author(s):
Wei Feng;
Ashraf A. Kassim;
Chen-Khong Tham
This paper describes the development of a layered structure for a multi-resolution scalable video codec based on the Color Set Partitioning in Hierarchical Trees (CSPIHT) scheme. The new structure is designed so that it supports network Quality of Service (QoS) by allowing packet marking in a real-time layered multicast system with heterogeneous clients. It also provides spatial resolution and frame-rate scalability from a single embedded bit stream. The codec is self-identifiable since it labels the encoded bit stream according to resolution. We also introduce asymmetry between the CSPIHT encoder and decoder, which makes it possible to decode lossy bit streams at heterogeneous clients.
Bitstream switching for progressive fine granularity scalable video coding
Author(s):
Jizheng Xu;
Feng Wu;
Shipeng Li
In this paper, we propose a bitstream switching scheme for progressive fine granularity scalable (PFGS) video coding that aims to improve the streaming performance of scalable bitstreams. By switching among PFGS video bitstreams that have different enhancement-layer reference rates but share the same base layer, a significant performance gain can be achieved, especially when the bandwidth fluctuates widely. Furthermore, compared with other bitstream switching schemes, the proposed scheme offers flexibility and various advantages owing to the use of the common base layer. The experimental results illustrate that the proposed scheme can significantly improve streaming performance.
Toward optimal rate control: a study of the impact of spatial resolution, frame rate, and quantization on subjective video quality and bit rate
Author(s):
Demin Wang;
Filippo Speranza;
Andre Vincent;
Taali Martin;
Phil Blanchfield
Multi-dimensional rate control schemes, which jointly adjust two or three coding parameters, have been recently proposed to achieve a target bit rate while maximizing some objective measures of video quality. The objective measures used in these schemes are the peak signal-to-noise ratio (PSNR) or the sum of absolute errors (SAE) of the decoded video. These objective measures of quality may differ substantially from subjective quality, especially when changes of spatial resolution and frame rate are involved. The proposed schemes are, therefore, not optimal in terms of human visual perception. We have investigated the impact on subjective video quality of the three coding parameters: spatial resolution, frame rate, and quantization parameter (QP). To this end, we have conducted two experiments using the H.263+ codec and five video sequences. In Experiment 1, we evaluated the impact of jointly adjusting QP and frame rate on subjective quality and bit rate. In Experiment 2, we evaluated the impact of jointly adjusting QP and spatial resolution. From these experiments, we suggest several general rules and guidelines that can be useful in the design of an optimal multi-dimensional rate control scheme. The experiments also show that PSNR and SAE do not adequately reflect perceived video quality when changes in spatial resolution and frame rate are involved, and are therefore not adequate for assessing quality in a multi-dimensional rate control scheme. This paper describes the method and results of the investigation.
Improved rate control for MPEG-4 video transport over wireless channel
Author(s):
Zhenzhong Chen;
King Ngi Ngan;
Chengji Zhao
Video streaming over wireless channels is challenging due to a number of factors such as limited bandwidth and loss sensitivity. In this paper, we develop a novel rate control algorithm for MPEG-4 video coding. Unlike traditional rate control schemes, we jointly consider the encoding complexity variation, the buffer variation and human visual properties to optimize rate control efficiency. We also analyze the sensitivity of a macroblock (MB) to bit errors and calculate its error-sensitivity metric. This metric is used for unequal error protection (UEP) of the MB. Simulation results show that the proposed approach can improve the decoded picture quality in wireless video coding and transmission.
Spatio-temporal rate allocation for hybrid video coding
Author(s):
Markus Beermann
State-of-the-art video coders feature a high level of adaptivity to meet the properties of their input signals. Video signal statistics can vary, sometimes discontinuously, in all three dimensions: two spatial and one temporal. Consequently, for video compression, different coding tools are employed at different coordinates of the signal to adaptively and maximally reduce redundancy. Since the decision for a certain tool and for the appropriate parameters is highly signal-dependent, a video coder forms a non-linear system whose optimization is not trivial. Lagrangian rate-distortion optimization (RDO) has become an important tool for video encoding and has achieved high gains in coding efficiency when applied to the independent encoding of macroblocks (MBs). For a chosen quantizer, bit rate is allocated to each MB's coefficients and prediction parameters subject to minimization of a Lagrangian cost function of rate and distortion. In effect, the cheaper option is chosen: either a more precise prediction, i.e. motion vectors and block tiling, or more bit rate spent on the coefficients.
Often, the video coding objective is constant quality. This is approximately achieved by a constant quantizer for all MBs and RDO within each MB. Due to the recursive structure of hybrid video coders, however, MBs depend on each other, calling for dependent RDO. This was formulated early on, but is hardly solvable due to the high dimensionality of the problem. In the following, possible simplified ways to take dependencies into account are explored. The variations of the signal properties in all three dimensions are differentiated, and simplified models are used for a dependent optimization.
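A minimal sketch of the independent per-macroblock RDO decision described above, minimizing the Lagrangian cost J = D + lambda * R over candidate coding options; the candidate list and lambda value are illustrative only:

```python
def choose_mode(candidates, lam):
    """Independent per-MB RDO: pick the coding option minimizing
    the Lagrangian cost J = D + lambda * R.
    candidates: list of (mode, distortion, rate) tuples."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# hypothetical options for one macroblock: (mode, D, R)
options = [("intra", 120.0, 96), ("inter_16x16", 80.0, 140),
           ("skip", 200.0, 2)]
print(choose_mode(options, lam=0.85))
```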
Virtual frame rate control for efficient MPEG-2 video encoding
Author(s):
Byung Cheol Song;
Kang Wook Chun
This paper presents a virtual frame rate control algorithm and a frame-level bit allocation algorithm for efficient MPEG-2 video encoding. The proposed frame rate control scheme is composed of three steps. In the first step, the scan format of an input video sequence is converted into progressive scan format before video encoding. In the second step, the average motion activity of the frames within a previous temporal window of a pre-defined size is examined, and a proper frame rate for the current temporal window is adaptively determined based on the computed average motion activity. In the final step, the frames located at particular positions in the current temporal window are virtually skipped according to the determined frame rate. We can skip the selected frames by deliberately fixing the coding types of all the macroblocks in those frames to 'skipped MB.' We also propose a modified version of the MPEG-2 TM5 bit allocation algorithm suitable for virtual frame rate control. Experimental results show that the proposed algorithm noticeably improves coding efficiency in comparison with conventional MPEG-2 video encoding.
Using shot-change information to prevent an overload of platform resources
Author(s):
Rob H. M. Wubben;
Christian Hentschel
Future consumer terminals (TV sets, set-top boxes, displays) will increasingly be based on programmable platforms instead of only dedicated hardware. When the available resources are insufficient to deal with worst-case requirements, resource overloads might lead to system instability and reduced output quality. For robust and cost-effective media processing on programmable platforms, dynamic resource management is required together with resource-quality scalable video algorithms.
A way to prevent resource overloads is to stop processing when the assigned resources have been used up. This may lead to drops and changes in quality. We propose to use shot-change information prior to the actual processing to predict difficult situations and react to them in a proper way. For a scalable motion estimator we show that, after a shot-change, generating motion vectors that indicate zero or close-to-zero motion prevents overloads, saves resources and can lead to higher quality.
A scalable representation for motion vectors for rate adaptation to network constraints
Author(s):
Ugur Sezer;
Seyfullah Halit Oguz
There is a growing need for mechanisms to shape the rate of precompressed video for adaptation to network constraints. The motion vector data component in an encoded bitstream may constitute a considerable portion of P and especially B type pictures, yet no technique currently exists to scale it. Motion vectors of neighboring macroblocks tend to be well correlated in both the horizontal and vertical directions. However, the current standards exploit only the horizontal correlations through predictive coding, introducing dependency and thus resulting in a nonscalable representation. In this paper, we propose a new representation and encoding algorithm for the motion vector information that enables the exploitation of two-dimensional (2-D) correlations, realizing the signal-to-noise ratio (SNR) scaling of an already encoded video object to an alternate (lower) rate - (higher) distortion level. In the proposed approach, the values of the motion vectors are represented in a binary format. The individual bit planes of this representation are then coded through a novel 2-D hierarchical bit plane encoding scheme. Scaling, when necessary, is achieved by discarding the least significant bit plane(s). Simulations based on the MPEG-2 standard demonstrate that up to 20% scaling is achievable without visually noticeable distortion.
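A toy sketch of the scaling operation itself: representing motion vector magnitudes in binary and discarding least significant bit plane(s). The actual 2-D hierarchical encoding of the planes is not reproduced:

```python
import numpy as np

def scale_motion_vectors(mv, drop_planes=1):
    """SNR-scale motion vectors by zeroing the least significant
    bit plane(s) of their binary magnitude representation."""
    sign = np.sign(mv)
    mag = np.abs(mv).astype(int)
    coarse = (mag >> drop_planes) << drop_planes  # zero out LSBs
    return sign * coarse
```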
Bit-rate adaptation flow control and client-based congestion control for multimedia-on-demand
Author(s):
Siu-Ping Chan;
Chi-Wah Kok
A flow control scheme for streaming multimedia data over UDP on IP networks is presented. The bitrate adaptation algorithm embedded in the protocol is an end-to-end mechanism at the application level. The flow control system constantly maintains the streaming buffer at a prescribed capacity, even under bursty network losses, by adapting the multimedia bitrate B at the streaming codec. A congestion control algorithm, located at a lower level than the flow control mechanism, is also presented. It works together with the presented flow control to resolve network congestion problems while maintaining a degree of TCP-friendliness by changing the sending rate R at the server. Simulation results obtained from NS2 show that better resource allocation can be obtained, along with an overall increase in the average sending rate and hence better quality of the streamed media.
Label prediction and local segmentation for accurate video object tracking
Author(s):
Guillaume Foret;
Pascal Bertolino
This paper presents an approach dedicated to accurately tracking one or several semantic objects in a video sequence. Accurate tracking of the partition object boundary is obtained by label prediction. This prediction is performed using motion vectors obtained from two different uses of block matching. In the predicted partition, a local segmentation is necessary only where matching failed and close to the predicted boundaries, in order to get the most accurate boundaries. This local segmentation is then followed by a classification step, during which a backward projection is used to decide whether or not a region is assigned to a given object.
Tracking soccer players based on homography among multiple views
Author(s):
Sachiko Iwase;
Hideo Saito
In this paper, we propose a method of tracking soccer players using multiple views. As much research on soccer scene analysis makes use of the trajectories of the players and the ball, it is desirable to track soccer players robustly. Soccer player tracking enables strategy analysis, scene recovery, the making of scenes for broadcasting, and automatic camera control. However, soccer is a sport in which occlusion occurs frequently, and tracking often fails because of occlusion between players; it is difficult to track the players using a single camera alone. We therefore use multiple view images to avoid the occlusion problem and obtain robustness in player tracking. As a first step, intra-camera operation is performed independently in each camera to track the players. Whenever a player cannot be tracked within a camera, inter-camera operation is performed as a second step. The tracking information of all cameras is integrated using the geometrical relationship between cameras known as a homography. Inter-camera operation makes it possible to locate a player who is not detected in an image, who is occluded by another player, or who is outside the field of view. Experimental results show that robust player tracking is achieved by taking advantage of multiple cameras.
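A minimal sketch of the homography transfer at the heart of the inter-camera operation, assuming OpenCV and four hypothetical point correspondences between two camera views (e.g. pitch-line intersections):

```python
import cv2
import numpy as np

# four or more correspondences between camera A and camera B
pts_a = np.float32([[102, 230], [480, 215], [630, 400], [60, 420]])
pts_b = np.float32([[ 88, 240], [455, 228], [610, 415], [40, 430]])
H, _ = cv2.findHomography(pts_a, pts_b)

def transfer(point_a):
    """Map a player position seen in camera A into camera B, e.g.
    to recover a player occluded or out of view in B."""
    p = np.float32([[point_a]])              # shape (1, 1, 2)
    return cv2.perspectiveTransform(p, H)[0, 0]
```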
Adaptive stereo object tracking system based on block-based SSD algorithm and camera configuration parameter
Author(s):
Jung-Hwan Ko;
Young Huh;
Chang-Ju Park;
Eun-Soo Kim
In this paper, an adaptive stereo object tracking system based on a block-based SSD (sum of squared differences) algorithm and camera configuration parameters is proposed. By applying the SSD algorithm to the reference image of the previous frame and the input image of the current frame, the location coordinates of a moving target in the right and left images are extracted, and the respective shifted distances from the reference target position, which is assumed to be at the origin in the initial frame, are detected. Using the pan/tilt moving angle calculated from the target's shifted distance and the configuration parameters of the stereo camera system, the block-mask size of a target object can be adaptively determined. The target image segmented with this block mask is used as the reference image in the next stage of tracking, and it is automatically updated by the same procedure during the course of target tracking. Experiments on 48 sequential frames of a dynamic stereo sequence show that the horizontal and vertical stereo disparities on the target object after stereo tracking remain very low, at 1.5 and 0.4 pixels on average, respectively.
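A straightforward sketch of the block-based SSD search such a tracker performs per view; the adaptive block-mask sizing from camera parameters is omitted and all names are ours:

```python
import numpy as np

def ssd_track(ref_block, frame, prev_xy, search=16):
    """Find the block in `frame` minimizing the sum of squared
    differences to `ref_block`, searching +/- `search` pixels
    around the previous position (x, y)."""
    bh, bw = ref_block.shape[:2]
    x0, y0 = prev_xy
    best, best_xy = np.inf, prev_xy
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = x0 + dx, y0 + dy
            if (x < 0 or y < 0 or
                    y + bh > frame.shape[0] or x + bw > frame.shape[1]):
                continue
            cand = frame[y:y + bh, x:x + bw].astype(float)
            ssd = np.sum((cand - ref_block) ** 2)
            if ssd < best:
                best, best_xy = ssd, (x, y)
    return best_xy
```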
Interacting multiple-model-based tracking of multiple point targets using expectation maximization algorithm in infrared image sequence
Author(s):
Mukesh A Zaveri;
Uday B Desai;
Shabbir N Merchant
Data association and model selection are important factors for tracking multiple targets in a dense clutter environment without using a priori information about the target dynamics. We propose an Interacting Multiple Model-Expectation Maximization (IMM-EM) algorithm that incorporates different dynamic models for the target and a Markov Random Field (MRF) for data association, making it possible to track maneuvering and non-maneuvering targets simultaneously in a single sequential batch mode. Moreover, it can be used in real-time applications.
The proposed method overcomes the data association problem by incorporating all validated measurements together using the EM algorithm and exploiting the MRF; it treats data association as an incomplete-data problem. All validated measurements are used to update the target state. Only the measurement association is used as missing data, which simplifies the E-step and M-step of the algorithm. In the proposed approach, the probability density function (pdf) of an observed datum given the target state and measurement association is treated as a mixture pdf. This allows the likelihoods of a measurement under each model to be combined, and the association process is defined to incorporate IMM; consequently, it is possible to track any arbitrary trajectory. We also consider two different cases for the association of measurements to targets. Case I: the association of each measurement to a target is independent of the others. Case II: the association of a measurement influences the association of its neighboring measurements.
An object-oriented software approach for a distributed human tracking motion system
Author(s):
Daniela L. Micucci
Tracking is a composite job involving the co-operation of autonomous activities which exploit a complex information model and rely on a distributed architecture. Both information and activities must be classified and related in several dimensions: abstraction levels (what is modelled and how information is processed); topology (where the modelled entities are); time (when entities exist); strategy (why something happens); responsibilities (who is in charge of processing the information). A proper Object-Oriented analysis and design approach leads to a modular architecture where information about conceptual entities is modelled at each abstraction level via classes and intra-level associations, whereas inter-level associations between classes model the abstraction process. Both information and computation are partitioned according to level-specific topological models. They are also placed in a temporal framework modelled by suitable abstractions. Domain-specific strategies control the execution of the computations. Computational components perform both intra-level processing and inter-level information conversion. The paper overviews the phases of the analysis and design process, presents the major concepts at each abstraction level, and shows how the resulting design turns into a modular, flexible and adaptive architecture. Finally, the paper sketches how the conceptual architecture can be deployed into a concrete distributed architecture by relying on an experimental framework.
Robust head tracking using hybrid color and 3D under natural and unspecified environments
Author(s):
Gwang-Myung Kim;
Sung-Ho Yoon;
Jung-Hyun Kim;
Gi-Taek Hur
We present a hybrid feature-based head tracking technique for natural and unspecified environments. The Kalman filter is a well-known estimation technique used in many areas to predict the path of a moving object. We developed and tested a Kalman filter to track unpredictable, fast-moving objects. Depth information can produce robust tracking results that are little affected by background texture and color. However, it is also limited by conditions such as distance, the accuracy of the stereo camera, and object occlusion at the same depth. To overcome these restrictions, we combined multiple features into a single tracking system that depends largely on the depth feature. We consider a multi-person environment with rapid walking paths.
Nonrigid object tracking using a likelihood spatio-temporal model
Author(s):
Dubhe Chavira-Martinez;
Stephane Pateux
In this paper, we present a technique for tracking non-rigid video objects in a sequence. It assumes that the object in the initial image has been previously defined by an object partition. Most object tracking methods rely on the motion homogeneity of the object to be tracked; ours does not assume that the selected objects present either motion or spatial homogeneity. Methods that do make such assumptions employ a spatial or a temporal criterion separately to achieve object tracking. Our proposed object tracking approach relies on the concept of backward partition projection using a likelihood joint spatial-temporal model to deal with occlusions, uncovered areas and fast-motion problems. Several examples in different scenarios are finally presented in order to demonstrate the performance of the proposed method.
Unsupervised motion-based object segmentation refined by color
Author(s):
Matthijs C Piek;
Ralph Braspenning;
Chris Varekamp
For various applications, such as data compression, structure from motion, medical imaging and video enhancement, there is a need for an algorithm that divides video sequences into independently moving objects. Because our focus is on video enhancement and structure from motion for consumer electronics, we strive for a low complexity solution. For still images, several approaches exist based on colour, but these lack in both speed and segmentation quality. For instance, colour-based watershed algorithms produce a so-called oversegmentation with many segments covering each single physical object. Other colour segmentation approaches exist which somehow limit the number of segments to reduce this oversegmentation problem. However, this often results in inaccurate edges or even
missed objects.
Most likely, colour is an inherently insufficient cue for real world object segmentation, because real world objects can
display complex combinations of colours. For video sequences, however, an additional cue is available, namely the motion of
objects. When different objects in a scene have different motion, the motion cue alone is often enough to reliably
distinguish objects from one another and the background. However, because of the lack of sufficient resolution of efficient
motion estimators, like the 3DRS block matcher, the resulting segmentation is not at pixel resolution, but at
block resolution. Existing pixel-resolution motion estimators are more sensitive to noise, suffer more from aperture problems, correspond less well to the true motion of objects than block-based approaches, or are too computationally expensive.
From its tendency to oversegmentation it is apparent that colour segmentation is particularly effective near edges of
homogeneously coloured areas. On the other hand, block-based true motion estimation is particularly effective in heterogeneous
areas, because heterogeneous areas improve the chance a block is unique and thus decrease the chance of the wrong position
producing a good match. Consequently, a number of methods exist which combine motion and colour segmentation. These methods use
colour segmentation as a base for the motion segmentation and estimation or perform an independent
colour segmentation in parallel which is in some way combined with the motion segmentation. The presented
method uses both techniques to complement each other by first segmenting on motion cues and then refining the segmentation
with colour. To our knowledge few methods exist which adopt this approach. One example is \cite{meshrefine}. This method uses an
irregular mesh, which hinders its efficient implementation in consumer electronics devices. Furthermore, the method produces
a foreground/background segmentation, while our applications call for the segmentation of multiple objects.
NEW METHOD
As mentioned above we start with motion segmentation and refine the edges of this segmentation with a pixel resolution colour
segmentation method afterwards. There are several reasons for this approach:
+ Motion segmentation does not produce the oversegmentation which colour segmentation methods normally produce, because
objects are more likely to have colour discontinuities than motion discontinuities. In this way, the colour segmentation only
has to be done at the edges of segments, confining the colour segmentation to a smaller part of the image. In such a part, it
is more likely that the colour of an object is homogeneous.
+ This approach restricts the computationally expensive pixel resolution colour segmentation to a subset of the image.
Together with the very efficient 3DRS motion estimation algorithm, this helps to reduce the computational
complexity.
+ The motion cue alone is often enough to reliably distinguish objects from one another and the background.
To obtain the motion vector fields, a
variant of the 3DRS block-based motion estimator which analyses three frames of input was used. The 3DRS motion estimator is
known for its ability to estimate motion vectors which closely resemble the true motion.
BLOCK-BASED MOTION SEGMENTATION
As mentioned above we start with a block-resolution segmentation based on motion vectors. The presented method is inspired by
the well-known $K$-means segmentation method \cite{K-means}. Several other methods (e.g. \cite{kmeansc}) adapt $K$-means for
connectedness by adding a weighted shape-error. This adds the additional difficulty of finding the correct weights for the
shape-parameters. Also, these methods often bias the result toward one particular pre-defined shape. The presented method, which we call
$K$-regions, encourages connectedness because only blocks at the edges of segments may be assigned to another segment. This
constrains the segmentation method to such a degree that it allows the method to use least squares for the robust fitting of
affine motion models for each segment. Contrary to \cite{parmkm}, the segmentation step still operates on vectors instead of
model parameters. To make sure the segmentation is temporally consistent, the segmentation of the previous frame will be used
as initialisation for every new frame. We also present a scheme which makes the algorithm independent of the initially chosen number of segments.
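As a baseline for contrast with the $K$-regions method described above, a plain $K$-means clustering of block motion vectors; $K$-regions differs by restricting reassignment to segment-edge blocks and fitting affine motion models per segment:

```python
import numpy as np

def kmeans_motion_blocks(vectors, k=4, iters=20):
    """Plain K-means on per-block motion vectors (n_blocks, 2);
    a simplified baseline, not the K-regions algorithm itself."""
    rng = np.random.default_rng(0)
    centres = vectors[rng.choice(len(vectors), k,
                                 replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(vectors[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = vectors[labels == j].mean(axis=0)
    return labels, centres
```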
COLOUR-BASED INTRA-BLOCK SEGMENTATION
The block resolution motion-based segmentation forms the starting point for the pixel resolution segmentation. The pixel
resolution segmentation is obtained from the block resolution segmentation by reclassifying pixels only at the edges of
clusters. We assume that an edge between two objects can be found in either one of two neighbouring blocks that belong to
different clusters. This assumption allows us to do the pixel resolution segmentation on each pair of such neighbouring blocks
separately. Because of the local nature of the segmentation, it largely avoids problems
with heterogeneously coloured areas. Because no new segments are introduced in this step, it also does not suffer from
oversegmentation problems. The presented method has no problems with bifurcations. For the pixel resolution segmentation
itself we reclassify pixels such that we optimize an error norm which favours similarly coloured regions and straight edges.
SEGMENTATION MEASURE
To assist in the evaluation of the proposed algorithm we developed a quality metric. Because the problem does not have an
exact specification, we decided to define a ground truth output which we find desirable for a given input. We define the measure for the segmentation quality as being how different the segmentation
is from the ground truth. Our measure enables us to evaluate oversegmentation and undersegmentation separately. Also, it
allows us to evaluate which parts of a frame suffer from oversegmentation or undersegmentation. The proposed algorithm has been tested on several typical sequences.
CONCLUSIONS
In this abstract we presented a new video segmentation method which performs well in the segmentation of multiple
independently moving foreground objects from each other and the background. It combines the strong points of both colour and
motion segmentation in the way we expected. One of the weak points is that the segmentation method suffers from
undersegmentation when adjacent objects display similar motion. In sequences with detailed backgrounds the segmentation will
sometimes display noisy edges. Apart from these results, we think that some of the techniques, and in particular the
$K$-regions technique, may be useful for other two-dimensional data segmentation problems.
Object extraction in video sequences based on the spatio-temporal independent component analysis
Author(s):
Zhenhe Chen;
Xiao-Ping Zhang
Compression and content-based video retrieval (CBVR) are essential for the efficient and intelligent utilization of vast multimedia databases over the Internet. In video sequences, object-based extraction techniques are gaining importance for achieving compression and performing content-based video retrieval. In this paper, a novel technique is developed to extract objects from video sequences based on spatiotemporal independent component analysis (stICA) and multiscale analysis. stICA is used to extract preliminary source images containing the moving objects in the video sequence. The source image data obtained after stICA analysis are further processed using wavelet-based multiscale image segmentation and region detection techniques to improve the accuracy of the extracted objects. Preliminary results demonstrate great potential for stICA-based object extraction in content-based video processing applications.
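As a simplified stand-in for the stICA stage, a sketch using spatial ICA over flattened grayscale frames via scikit-learn's FastICA; the paper's spatiotemporal formulation and wavelet refinement are not reproduced:

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_sources(frames, n_sources=3):
    """Recover spatially independent source images from a stack of
    frames (requires at least n_sources frames); a rough
    approximation of the preliminary extraction step."""
    X = np.stack([f.ravel() for f in frames]).astype(float)
    ica = FastICA(n_components=n_sources, random_state=0)
    sources = ica.fit_transform(X.T)        # (pixels, n_sources)
    h, w = frames[0].shape
    return [s.reshape(h, w) for s in sources.T]
```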
A segmentation system with model-assisted completion of video objects
Author(s):
Dirk Farin;
Peter H. N. de With;
Wolfgang Effelsberg
This paper presents a new algorithm for video-object segmentation,
which combines motion-based segmentation, high-level object-model
detection, and spatial segmentation into a single framework.
This joint approach overcomes the disadvantages of these algorithms
when applied independently. These disadvantages include the low semantic accuracy of spatial segmentation and the inexact object boundaries obtained from object-model matching and motion segmentation. The proposed algorithm alleviates three problems common to all motion-based segmentation algorithms. First, it completes object areas that cannot be clearly distinguished from the background because their color is close to the background color. Second, parts of the object that would not be considered as belonging to it because they are not moving are still added to the object mask. Finally, when several objects are moving but only one is of interest, the remaining regions are detected as belonging to no object-model and are removed from the foreground. This suppresses regions erroneously considered as moving, as well as objects that are moving but completely irrelevant to the user.
Cast-shadows detection on Lambertian surfaces in video sequences
Author(s):
Jean-Marie Pinel;
Henri Nicolas
This paper proposes a new method for the detection of moving cast shadows in natural video sequences. The variations of illumination generated in a shadow area are modelled assuming that the light source is fixed and unique, and that the surface on which the shadow is cast is planar and Lambertian. A local and global matching of this model is then performed on the current image in order to obtain a first detection of the moving shadow areas. This matching process is based on a reference image which is assumed to contain no moving shadows. A spatio-temporal follow-up of the obtained areas is applied in order to remove false detections. The proposed segmentation method was tested and validated on real video sequences.
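A rough sketch of the kind of chromatic test that Lambertian shadow models lead to: a shadow darkens the reference background by a roughly constant factor while preserving chromaticity. The HSV formulation and thresholds here are our assumptions, not the authors' model:

```python
import cv2
import numpy as np

def shadow_mask(frame, background, lo=0.4, hi=0.9, sat_tol=30):
    """Label pixels as cast shadow when they attenuate the
    reference background's brightness within (lo, hi) while
    keeping similar saturation (chromaticity roughly preserved)."""
    f = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV).astype(int)
    b = cv2.cvtColor(background, cv2.COLOR_BGR2HSV).astype(int)
    ratio = (f[..., 2] + 1) / (b[..., 2] + 1)   # value attenuation
    chroma_ok = np.abs(f[..., 1] - b[..., 1]) < sat_tol
    return (ratio > lo) & (ratio < hi) & chroma_ok
```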
Recognition and location of real objects using eigenimages and a neural network classifier
Author(s):
Maria Asuncion Vicente;
Oscar Reinoso;
Carlos Perez;
Cesar Fernandez;
Jose Maria Sabater
A representation using eigenimages that identifies an object and locates its pose in two stages is addressed. In this paper we demonstrate how a mixture of two eigenspace-based approaches, with some small modifications, resolves the problem of identifying and locating an object. In the first stage we recognize the object by means of the PCA (Principal Component Analysis) method combined with a neural network classifier; in the second stage, the object's pose is obtained using a modification of typical PCA (which we name the PCA2 method). We present results obtained using a database of 25 real objects.
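A compact sketch of the two ingredients of the first stage, eigenimage projection plus a neural-network classifier, using scikit-learn; the authors' PCA variants (including PCA2 for pose) are not reproduced:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

def train_eigen_recognizer(images, labels, n_eig=20):
    """Project flattened object images onto their principal
    components (eigenimages), then train a neural-network
    classifier on the coefficients. Needs >= n_eig samples."""
    X = np.stack([im.ravel() for im in images]).astype(float)
    pca = PCA(n_components=n_eig).fit(X)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                        random_state=0).fit(pca.transform(X), labels)
    return pca, clf
```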
Human eye interfaced multiple video objects coding
Author(s):
Jae Jeong Hwang;
Sang-Gyu Cho;
Chi-Gyu Hwang;
Hong Ren Wu
In a general multiple video object coder, the more interesting objects, such as a speaker or a moving object, are consistently coded with higher priority. Since the priority of each object may not be fixed over the whole sequence and may vary on a frame-by-frame basis, it must be adjusted within each frame. In this paper, we analyze the independent rate control algorithm and the global algorithm, in which the QP value is controlled by static parameters: object importance or priority, the target PSNR, and the weighted distortion. The priority among the static parameters is analyzed and adjusted into dynamic parameters according to the visual interest or importance obtained through a camera interface. The target PSNR and the weighted distortion are proportional to magnitude, motion, and distortion. We apply these parameters to weighted-distortion control and priority-based control, leading to an efficient bit-rate distribution. As a result, fewer bits are allocated to video objects of lower importance and more bits to those of higher visual importance. The period needed to reach stable visual quality is reduced to fewer than 15 frames of the coded sequence. With respect to PSNR, the proposed scheme shows over 2 dB higher quality than conventional schemes. The coding scheme interfaced to the human eye thus proves to be an efficient video coder for multiple video objects.
Sport video shot segmentation and classification
Author(s):
Rozenn Dahyot;
Niall Rea;
Anil C. Kokaram
This paper considers the statistics of local appearance-based measures that are suitable for the visual parsing of sport events. The moments of the colour information are computed, and the shape content in the frames is characterised by the moments of local shape measures. Their generation process is very low-cost. The temporal evolution of the features is then modelled with a Hidden Markov Model (HMM). The HMM is used to generate higher-level information by classifying the shots as close-ups, court views, crowd shots and so on. The paper illustrates how these simple features, coupled with the HMM, can be used for parsing snooker and tennis footage.
Moving objects self-generated illumination variations removal for automatic video-surveillance systems
Author(s):
Lucio Marcenaro;
Luca Marchesotti;
Carlo S Regazzoni
Show Abstract
This paper proposes a novel technique for the elimination of moving objects' self-generated illumination variations with respect to a fixed reference frame (background). In particular, the proposed techniques can be used by a video-based surveillance system able to automatically detect potentially dangerous or unusual situations happening within a guarded area. The proposed techniques are based on the chromatic properties of cast shadows and illumination, and the achieved results demonstrate that the proposed approach can greatly improve the performance of a video-surveillance system in terms of the precision of detected objects. The presented processing techniques are necessary for a real system that must be able to work with good performance twenty-four hours a day.
Content-based image retrieval using a Gaussian mixture model in the wavelet domain
Author(s):
Hua Yuan;
Xiao-Ping Zhang;
Ling Guan
Show Abstract
Research on Content-based Image Retrieval (CBIR) has been very active in recent years. The performance of a CBIR system can be significantly improved by selecting a good indexing feature space to represent image characteristics. In this paper, we introduce a statistical-model based technique for analyzing and extracting image features in the wavelet domain. The images are decomposed into a set of wavelet subspaces, and for each wavelet subspace a two-component Gaussian mixture model is developed to describe the statistical characteristics of the wavelet coefficients. The model parameters, which are a good reflection of image features in the wavelet subspaces, are obtained by an EM (Expectation-Maximization) algorithm and employed to construct the indexing feature space for a CBIR system. We apply the new method to the Brodatz image database to demonstrate its performance. The experimental results indicate that our indexing feature space is very effective in representing image characteristics and provides a high retrieval rate in the CBIR system. When compared with some other conventional feature extraction methods, the new method achieves comparable retrieval performance with fewer features in the feature space, which means it is more computationally efficient.
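To make the modelling step concrete, the following hedged sketch fits a two-component, zero-mean Gaussian mixture to a set of subband coefficients with EM; the zero-mean, variance-only parameterisation and the synthetic data are assumptions, not the paper's exact formulation.

```python
# EM fit of p*N(0, s1^2) + (1-p)*N(0, s2^2) to 1-D wavelet coefficients.
import numpy as np

def fit_gmm2(x, iters=50):
    p, s1, s2 = 0.5, x.std() * 0.5, x.std() * 2.0
    for _ in range(iters):
        # The 1/sqrt(2*pi) constant cancels in the responsibility ratio.
        g1 = p * np.exp(-x**2 / (2 * s1**2)) / s1
        g2 = (1 - p) * np.exp(-x**2 / (2 * s2**2)) / s2
        r = g1 / (g1 + g2)                       # E-step: responsibilities
        p = r.mean()                             # M-step: mixture weight
        s1 = np.sqrt((r * x**2).sum() / r.sum())
        s2 = np.sqrt(((1 - r) * x**2).sum() / (1 - r).sum())
    return p, s1, s2   # three numbers per subband -> compact index entry

rng = np.random.default_rng(1)
coeffs = np.concatenate([rng.normal(0, 1, 5000), rng.normal(0, 5, 1000)])
print(fit_gmm2(coeffs))
```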
Framework for image mining and retrieval
Author(s):
Rokia Missaoui;
Madenda Sarifuddin;
Youssef Hamouda;
Jean Vaillancourt;
Hayet Laggoune
Show Abstract
In this paper, we describe a three-step content-based approach to image retrieval and mining. In the first step, visual features such as color and shape are generated from images by improving a few existing feature extraction techniques. Then, both visual features and descriptive data (i.e., metadata) are considered for a coarse-grain similarity analysis using a conceptual clustering approach called formal concept analysis (concept lattice theory). This approach is designed and implemented so that exploratory mechanisms such as browsing, zooming/shrinking and visualization allow the user to discover and refine the cluster which is most appropriate to his/her target image. At this second stage, issues such as dimension reduction, cluster construction and association rule generation are handled. The last step consists of conducting a fine-grain similarity analysis on one or more selected clusters identified at the second stage, using two newly proposed similarity measures.
Compressed domain image retrieval: a comparative study of similarity metrics
Author(s):
Maria Hatzigiorgaki;
Athanassios N. Skodras
Show Abstract
Content-based image retrieval has gained considerable attention nowadays as a very useful tool in a plethora of applications. The Web has become the most important application, because over 70% of it is devoted to images, and looking for a specific image is a truly daunting task. The vast majority of these images are JPEG compressed. An extensive study of eighteen similarity measures used for image retrieval has been conducted, and the corresponding results are reported in the present communication. The energy histograms of the low-frequency DCT coefficients were used as the feature space for similarity testing. Query-by-image-example was used in all tests.
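The feature space can be sketched as follows; the choice of the nine lowest-frequency AC coefficients, the logarithmic bin edges and the L1 metric (only one of many candidate similarity measures) are illustrative assumptions, not the study's exact setup.

```python
# Energy histogram of low-frequency DCT coefficients, plus one similarity metric.
import numpy as np

def energy_histogram(dct_blocks, bins):
    """dct_blocks: (n_blocks, 8, 8) DCT coefficients per image.
    Histogram the energies of the 3x3 low-frequency corner, excluding DC."""
    low = dct_blocks[:, :3, :3].reshape(len(dct_blocks), -1)[:, 1:]
    hist, _ = np.histogram(low**2, bins=bins)
    return hist / hist.sum()

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

bins = np.logspace(0, 4, 17)              # assumed logarithmic energy bins
rng = np.random.default_rng(2)
a = energy_histogram(rng.normal(0, 10, (500, 8, 8)), bins)
b = energy_histogram(rng.normal(0, 12, (500, 8, 8)), bins)
print(l1_distance(a, b))
```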
Object detection in cinematographic video sequences for automatic indexing
Author(s):
Jurgen Stauder;
Bertrand Chupeau;
Lionel Oisel
Show Abstract
This paper presents an object detection framework applied to cinematographic post-processing of video sequences. Post-processing is done after production and before editing. At the beginning of each shot of a video, a slate (also called a clapperboard) is shown. The slate notably contains an electronic audio timecode that is necessary for audio-visual synchronization. This paper presents an object detection framework to detect slates in video sequences for automatic indexing and post-processing. It is based on five steps. The first two steps aim to drastically reduce the video data to be analyzed. They ensure a high recall rate but have low precision. The first step detects images at the beginning of a shot that possibly show a slate, while the second step searches these images for candidate regions with a color distribution similar to slates. The objective is not to miss any slate while eliminating long parts of video without slate appearance. The third and fourth steps use statistical classification and pattern matching to detect and precisely locate slates in the candidate regions. These steps ensure a high recall rate and high precision. The objective is to detect slates with very few false alarms in order to minimize interactive corrections. In a last step, electronic timecodes are read from the slates to automate audio-visual synchronization. The presented slate detector has a recall rate of 89% and a precision of 97.5%. By temporal integration, considerably more than 89% of shots in dailies are detected. By timecode coherence analysis, the precision can be raised too. Issues for future work are to accelerate the system beyond real-time and to extend the framework to several slate types.
Gesture recognition in video image with combination of partial and global information
Author(s):
Chil-Woo Lee;
Hyun-Ju Lee;
Sung Ho Yoon;
Jung Hyun Kim
Show Abstract
In this paper, we describe an algorithm which can automatically recognize human gestures in a sequence of natural video images by utilizing two-dimensional features extracted from the bodily region of the images. In the algorithm, we first construct a gesture model space by analyzing the statistical information of sample images with the principal component analysis method. Then, input images are compared to the model and individually symbolized to a part of the model space. In the last step, the symbolized images are recognized with an HMM as one of the model gestures. The key feature of our method is the use of a combination of partial and global information of two-dimensional abstract bodily motion; consequently it is very convenient to apply to real-world situations, and the recognition results are very robust.
A technique for mapping irregular-sized vectors applied to image collections
Author(s):
Jonathan D. Edwards;
John K. Riley;
John P. Eakins
Show Abstract
This paper describes a strategy to visualise a data-set of multi-component feature vectors using multidimensional scaling (MDS). MDS is employed, instead of more commonly applied mapping techniques, because it can utilise arbitrary distance measures and hence can easily incorporate the non-linear distance metrics employed when matching multi-component vectors. To test this mapping approach, we have applied it to a data-set of two hundred and sixty-eight images, segmented into multiple components each represented by a shape descriptor. The inter-image distances are measured using a series of simple non-location-based image distance metrics. The maps are encouraging, with well-clustered areas for duplicate or near-duplicate trademarks. This gives a clear indication that MDS can be used for this type of visualisation task. However, the maps also highlight the inadequacies of the segmentation and matching phases, particularly for images whose overall impression does not correspond to the segmented parts, for example figure/ground reversal or macro texture.
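For reference, classical (Torgerson) MDS can be computed in closed form from a distance matrix alone, which is precisely what makes MDS attractive when only pairwise distances are available. Practical systems often use iterative or non-metric variants, so the sketch below is only a minimal illustration.

```python
# Classical MDS: embed n items in 2-D from an (n, n) distance matrix.
import numpy as np

def classical_mds(D, dims=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D**2) @ J                  # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]      # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

rng = np.random.default_rng(3)
pts = rng.standard_normal((10, 5))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(classical_mds(D).shape)   # (10, 2) map coordinates
```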
How good are the visual MPEG-7 features?
Author(s):
Horst Eidenberger
Show Abstract
The study presented in this paper analyses descriptions extracted with MPEG-7 descriptors from visual content from a statistical point of view. Good descriptors should generate descriptions with high variance, a well-balanced cluster structure and high discriminance, in order to distinguish different media content. Statistical analysis reveals the quality of the description extraction algorithms used. This was not considered in the MPEG-7 design process, where optimising recall was the major goal. For the analysis, eight basic visual descriptors were applied to three media collections: the Brodatz dataset (monochrome textures), a selection of the Corel dataset (colour photos) and a set of coats-of-arms images (artificial colour images with few colour gradations). The results were analysed with four statistical methods: mean and variance of descriptor elements, distribution of elements, cluster analysis (hierarchical and topological) and factor analysis. The main results are: the best descriptors for combination are Color Layout, Dominant Color, Edge Histogram and Texture Browsing; the others are highly dependent on these. The colour histograms (Color Structure and Scalable Color) perform badly on monochrome input. Generally, all descriptors are highly redundant, and the application of complexity-reduction transformations could save up to 80% of storage and transmission capacity.
Adaptive overcomplete wavelet video coding with spatial transcaling
Author(s):
Mihaela van der Schaar;
Jong Chul Ye;
Hayder Radha
Show Abstract
The unprecedented increase in the level of heterogeneity of emerging wireless networks and the mobile Internet emphasizes the need for scalable and adaptive video solutions both for coding and transmission purposes. However, in general, there is an inherent tradeoff between the level of scalability (e.g., in terms of bitrate range, levels of spatial resolution, and/or levels of temporal resolution) and the video-coding penalty incurred by such scalable video schemes as compared to non-scalable coders. In other words, the higher the level of scalability, the lower the overall video quality of the scalable stream supporting that scalability level. In [1][2][3], we introduced the notion of TranScaling, which is a generalization of (non-scalable) transcoding. With TranScaling, a scalable video stream covering a given bandwidth range is mapped into one or more scalable video streams covering different bandwidth ranges. In this paper, we illustrate the benefits of Spatial TranScaling in the context of a recently developed scalable and adaptive inband motion-compensated temporal filtering scheme (IBMCTF) [4][5]. We show how, using TranScaling, the already high coding efficiency of such adaptive IBMCTF schemes can be further improved.
Coding mode optimization for MPEG-2 transcoding with spatial resolution reduction
Author(s):
Hao-Song Kong;
Anthony Vetro;
Huifang Sun
Show Abstract
This paper presents a coding mode decision algorithm for MPEG-2 spatial transcoding. The optimization of the coding mode and quantization scale is formulated in an operational rate-distortion sense and solved by the Lagrange multiplier method. The experimental results show that the proposed transcoder with optimized coding mode and quantizer can achieve better quality and a lower bit rate than those obtained using a cascaded transcoder or the MPEG-2 TM5 encoder.
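Lagrangian mode decision can be sketched in a few lines: for each macroblock, choose the (mode, quantiser) pair minimising D + λR. The candidate set and the distortion/rate numbers below are placeholders, not the paper's model.

```python
# Operational rate-distortion mode selection via a Lagrange multiplier.
def best_mode(candidates, lam):
    """candidates: list of (mode, qscale, distortion, rate) tuples."""
    return min(candidates, key=lambda c: c[2] + lam * c[3])

candidates = [
    ("intra", 4, 120.0, 900.0),
    ("inter", 4,  80.0, 400.0),
    ("inter", 8, 150.0, 180.0),
    ("skip", None, 400.0, 1.0),
]
for lam in (0.05, 1.0, 5.0):   # small lambda favours quality, large favours rate
    print(lam, best_mode(candidates, lam))
```

As λ grows, the choice moves from the low-distortion inter mode toward skip, which is the behaviour a rate-constrained transcoder exploits.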
EZBC video streaming with channel coding and error concealment
Author(s):
Ivan V. Bajic;
John W. Woods
Show Abstract
In this paper we present a system for streaming video content encoded using the motion-compensated Embedded Zero Block Coder (EZBC). The system incorporates unequal loss protection in the form of multiple description FEC (MD-FEC) coding, which provides adequate protection for the embedded video bitstream when the loss process is not very bursty. The adverse effects of burst losses are reduced using a novel motion-compensated error concealment method.
A video coder for low bitrate mobile video streaming
Author(s):
Fulvio Moschetti;
Kazuo Sugimoto
Show Abstract
Achieving very low bitrates for video applications in a mobile environment is essential. The evolution of video coding standards has had a tremendous impact on the development of digital video, making today's widespread usage of video data possible. At the same time, it has influenced the direction of research, biasing most efforts towards block-based video. Alternative approaches may, however, lead to different architectures able to provide higher compression efficiency. In this paper we propose a Matching Pursuit (MP) based video coder that adopts generalized sub-pixel motion compensation and arithmetic coding. We introduce a new dictionary and an adaptive grid for atom coding combined with an arithmetic coder. Comparisons with H.264 show an improvement of up to 20% in compression efficiency for the same PSNR.
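A hedged one-dimensional sketch of the matching pursuit core: atoms are greedily selected from a dictionary to approximate a residual, and the resulting (index, amplitude) pairs are what a coder would entropy-code. The random dictionary and the fixed atom count are illustrative only, not the paper's dictionary design.

```python
# Greedy 1-D matching pursuit over a unit-norm dictionary.
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """dictionary: (n_atoms_total, length) rows with unit L2 norm."""
    residual = signal.copy()
    atoms = []
    for _ in range(n_atoms):
        corr = dictionary @ residual
        k = int(np.abs(corr).argmax())         # best-matching atom
        coef = corr[k]
        residual = residual - coef * dictionary[k]
        atoms.append((k, coef))                # (index, amplitude) to be coded
    return atoms, residual

rng = np.random.default_rng(4)
D = rng.standard_normal((256, 64))
D /= np.linalg.norm(D, axis=1, keepdims=True)
x = 3.0 * D[10] - 1.5 * D[99]
atoms, res = matching_pursuit(x, D, 5)
print(atoms[0][0], np.linalg.norm(res))        # strongest atom index, residual norm
```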
Psychovisual masks and intelligent streaming RTP techniques for the MPEG-4 standard
Author(s):
Alessandro Mecocci;
Francesco Falconi
Show Abstract
In today's multimedia audio-video communication systems, data compression plays a fundamental role by reducing bandwidth waste and the costs of infrastructure and equipment. Among the different compression standards, MPEG-4 is becoming more and more accepted and widespread. Even though one of the fundamental aspects of this standard is the possibility of coding video objects separately (i.e. separating moving objects from the background and adapting the coding strategy to the video content), currently implemented codecs work only at the full-frame level. In this way, many advantages of the flexible MPEG-4 syntax are missed. This lack is due both to the difficulties in properly segmenting moving objects in real scenes (featuring arbitrary motion of the objects and of the acquisition sensor), and to the current use of these codecs, which are mainly oriented towards the market of DVD backups (a full-frame approach is sufficient for these applications).
In this paper we propose a codec for MPEG-4 real-time object streaming, that codes separately the moving objects and the scene background. The proposed codec is capable of adapting its strategy during the transmission, by analysing the video currently transmitted and setting the coder parameters and modalities accordingly. For example, the background can be transmitted as a whole or by dividing it into “slightly-detailed” and “highly detailed” zones that are coded in different ways to reduce the bit-rate while preserving the perceived quality. The coder can automatically switch in real-time, from one modality to the other during the transmission, depending on the current video content. Psychovisual masks and other video-content based measurements have been used as inputs for a Self Learning Intelligent Controller (SLIC) that changes the parameters and the transmission modalities.
The current implementation is based on the ISO 14496 standard code that allows Video Objects (VO) transmission (other Open Source Codes like: DivX, Xvid, and Cisco’s Mpeg-4IP, have been analyzed but, as for today, they do not support VO). The original code has been deeply modified to integrate the SLIC and to adapt it for real-time streaming. A personal RTP (Real Time Protocol) has been defined and a Client-Server application has been developed. The viewer can decode and demultiplex the stream in real-time, while adapting to the changing modalities adopted by the Server according to the current video content.
The proposed codec works as follows: the image background is separated by means of a segmentation module and it is transmitted by means of a wavelet compression scheme similar to that used in the JPEG2000. The VO are coded separately and multiplexed with the background stream. At the receiver the stream is demultiplexed to obtain the background and the VO that are subsequently pasted together.
The final quality depends on many factors, in particular: the quantization parameters, the Group Of Video Object (GOV) length, the GOV structure (i.e. the number of I-P-B VOPs), and the search area for motion compensation. These factors are strongly related to the following measurement parameters (defined during development): the Objects Apparent Size (OAS) in the scene, the Video Object Incidence factor (VOI), and the temporal correlation (measured through the Normalized Mean SAD, NMSAD). The SLIC module analyzes the currently transmitted video and selects the most appropriate settings by choosing from a predefined set of transmission modalities. For example, in the case of a highly temporally correlated sequence, the number of B-VOPs is increased to improve the compression ratio. The strategy for selecting the number of B-VOPs turns out to be very different from those reported in the literature for B-frames (adopted for MPEG-1 and MPEG-2), due to the different behaviour of the temporal correlation when limited to moving objects only. The SLIC module also decides how to transmit the background. In our implementation we adopted the Visual Brain theory, i.e. the study of what the "psychic eye" can get from a scene. According to this theory, a Psychomask Image Analysis (PIA) module has been developed to extract the visually homogeneous regions of the background. The PIA module produces two complementary masks, one for the visually low-variance zones and one for the highly variable zones; these zones are compressed with different strategies and encoded into two multiplexed streams. Practical experiments showed that the separate coding is advantageous only if the low-variance zones exceed 50% of the whole background area (due to the overhead of transmitting the zone masks). The SLIC module takes care of deciding the appropriate transmission modality by analyzing the results produced by the PIA module.
The main features of this codec are: low bitrate, good image quality and coding speed. The current implementation runs in real-time on standard PC platforms, the major limitation being the fixed position of the acquisition sensor. This limitation is due to the difficulty of separating moving objects from the background when the acquisition sensor moves. Our current real-time segmentation module does not produce suitable results if the acquisition sensor moves (only slight oscillatory movements are tolerated). In any case, the system is particularly suitable for tele-surveillance applications at low bit-rates, where the camera is usually fixed or alternates among some predetermined positions (our segmentation module is capable of accurately separating moving objects from the static background when the acquisition sensor stops, even if different scenes are seen as a result of the sensor displacements). Moreover, the proposed architecture is general, in the sense that when robust, real-time segmentation systems (capable of separating objects from the background while the sensor itself is moving) become available, they can easily be integrated while leaving the rest of the system unchanged.
Experimental results related to real sequences for traffic monitoring and for people tracking and safety control are reported and discussed in depth in the paper. The whole system has been implemented in standard ANSI C code and currently runs on standard PCs under the Microsoft Windows operating system (Windows 2000 Pro and Windows XP).
Real-time scheduling and online resource allocation on scalable streaming media server
Author(s):
Kui Gao;
Wen Gao;
Simin He;
Yuan Zhang
Show Abstract
In this paper, we propose a layer-based integrated real-time scheduling algorithm for a single scalable stream and an online dynamic resource allocation algorithm among multiple concurrent users, for a scalable streaming media server operating over a network with packet loss and variable delay. The layer-based real-time scheduling algorithm efficiently schedules the packets in the buffer of the scalable streaming media server for transmission. The online resource allocation algorithm can allocate the server's resources fairly among all the concurrent streams and improve the playback quality at the client.
Simulation results show that our proposed algorithms outperform the frame-based scheduling algorithm and the offline resource allocation algorithm in various situations with different round-trip times, channel errors, etc. The low complexity of the proposed algorithms also enables them to be applied in real-time applications.
Global optimization of video quality by improved rate control on IP-networks
Author(s):
Andy G. Backhouse;
Irene Gu;
Sverrir Olafsson;
Mike J. Smith
Show Abstract
This paper addresses optimal rate control in a communication network consisting of multiple users transmitting video. We suggest that rate control in the Internet should take the signal properties into account. An optimal distribution of rates should maximise the aggregate quality of the received signals. This paper addresses video sequences specifically, but the results are expected to hold for more general signals as well. The optimal transmission rate of a user will vary in time. This is due to its dependence on time-varying parameters such as the traffic routing, network topology and the demands of other users. Moreover, the optimal transmission rate depends on the metric selected to describe the video quality. We show that when congestion is controlled 'correctly' the optimal distribution of rates is an attractor. Using the proposed rate control mechanisms, the transmission rates will then converge weakly towards this attractor. The algorithm for the rate control mechanisms for video transmission is then given. The proposed control mechanisms are directly implementable in the Internet without requiring intelligent feedback from the Internet gateways. Simulations using the proposed congestion control mechanisms have been conducted using different metrics to describe the video quality.
Multimedia streaming gateway with jitter detection
Author(s):
Siu-Ping Chan;
Chi-Wah Kok;
Albert K. Wong
Show Abstract
This paper investigates a novel active buffer management scheme, "Jitter Detection" (JD), for gateway-based congestion control when streaming multimedia traffic in packet-switched networks. The quality of a multimedia presentation can be greatly degraded by network delay variation, or jitter, when transported over a packet-switched network. Jitter degrades the timing relationship among packets in a single media stream and between packets from different media streams, and hence creates multimedia synchronization problems. Moreover, too much jitter will also degrade the performance of the streaming buffer in the client: packets received by the client are rendered useless if they have accumulated enough jitter. The proposed active buffer management scheme improves the quality of service in multimedia networking by detecting and discarding useless packets that have accumulated too much jitter, so as to maintain high bandwidth for packets within the multimedia stream's jitter tolerance. Simulation results have shown that the proposed scheme can effectively lower the average received packet jitter and increase the goodput of the received packets when compared to random early detection (RED) and Droptail used in gateway-based congestion control. Furthermore, simulation results have also revealed that the proposed scheme can maintain the same TCP-friendliness as RED and Droptail when used for multimedia streams.
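One way to picture the idea (a sketch under assumed thresholds and units, not the paper's algorithm) is to track each packet's accumulated jitter with the standard RFC 3550 interarrival-jitter estimator and discard packets whose jitter exceeds the stream's tolerance before they consume further queue bandwidth.

```python
# Illustrative jitter-based packet filter at a gateway buffer.
def jd_filter(packets, tolerance_ms):
    """packets: list of (send_ts_ms, arrival_ts_ms). Returns kept packets."""
    kept, jitter, prev_transit = [], 0.0, None
    for send, arrive in packets:
        transit = arrive - send
        if prev_transit is not None:
            # RFC 3550 interarrival-jitter estimator (exponential smoothing).
            jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit
        if jitter <= tolerance_ms:
            kept.append((send, arrive))   # forward toward the client
        # else: discard now rather than deliver a useless packet
    return kept

pkts = [(0, 50), (20, 75), (40, 180), (60, 130), (80, 300)]
print(len(jd_filter(pkts, tolerance_ms=5.0)))   # later, high-jitter packets dropped
```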
Comparing subjective video quality testing methodologies
Author(s):
Margaret H. Pinson;
Stephen Wolf
Show Abstract
International recommendations for subjective video quality assessment (e.g., ITU-R BT.500-11) include specifications for how to perform many different types of subjective tests. Some of these test methods are double stimulus, where viewers rate the quality or change in quality between two video streams (reference and impaired). Others are single stimulus, where viewers rate the quality of just one video stream (the impaired). Two examples of the former are the double stimulus continuous quality scale (DSCQS) and the double stimulus comparison scale (DSCS). An example of the latter is single stimulus continuous quality evaluation (SSCQE). Each subjective test methodology has claimed advantages. For instance, the DSCQS method is claimed to be less sensitive to context (i.e., subjective ratings are less influenced by the severity and ordering of the impairments within the test session). The SSCQE method is claimed to yield more representative quality estimates for quality monitoring applications. This paper considers data from six different subjective video quality experiments, originally performed with the SSCQE, DSCQS and DSCS methodologies. A subset of video clips from each of these six experiments was combined and rated in a secondary SSCQE subjective video quality test. We give a method for post-processing the secondary SSCQE data to produce quality scores that are highly correlated to the original DSCQS and DSCS data. We also provide evidence that human memory effects for time-varying quality estimation seem to be limited to about 15 seconds.
An objective method for combining multiple subjective data sets
Author(s):
Margaret H. Pinson;
Stephen Wolf
Show Abstract
International recommendations for subjective video quality assessment (e.g., ITU-R BT.500-11) include specifications for how to perform many different types of subjective tests. In addition to displaying the video sequences in different ways, subjective tests also have different rating scales, different words associated with these scales, and many other test variables that change from one laboratory to another (e.g., viewer expertise and criticality, cultural differences, physical test environments). Thus, it is very difficult to directly compare or combine results from two or more subjective experiments. The ability to compare and combine results from multiple subjective experiments would greatly benefit developers and users of video technology, since standardized subjective databases could be expanded to include new source material, and past measurement results could be related to newer measurement results. This paper presents a subjective method and an objective method for combining multiple subjective data sets. The subjective method utilizes a large meta-test with selected video clips from each subjective data set. The objective method utilizes the functional relationships between objective video quality metrics (extracted from the video sequences) and the corresponding subjective mean opinion scores (MOSs). The objective mapping algorithm, called the iterated nested least-squares algorithm (INLSA), relates two or more independent data sets that are themselves correlated with some common intermediate variables (i.e., the objective video quality metrics). We demonstrate that the objective method can be used as an effective substitute for the expensive and time-consuming subjective meta-test.
Video quality evaluation for mobile applications
Author(s):
Stefan Winkler;
Frederic Dufaux
Show Abstract
This paper presents the results of a quality evaluation of video sequences encoded for and transmitted over a wireless channel. We selected content, codecs, bitrates and bit error patterns representative of mobile applications, focusing on the MPEG-4 and Motion JPEG2000 coding standards. We carried out subjective experiments using the Single Stimulus Continuous Quality Evaluation (SSCQE) method on this test material. We analyze the subjective data and use them to compare codec performance as well as the effects of transmission errors on visual quality. Finally, we use the subjective ratings to validate the prediction performance of a real-time non-reference quality metric.
Video quality assessment using neural network based on multi-feature extraction
Author(s):
Susu Yao;
Weisi Lin;
Zhongkang Lu;
EePing Ong;
Xiao Kang Yang
Show Abstract
In this paper, we propose a new video quality evaluation method based on multiple features and a radial basis function neural network. Multiple features are extracted from a degraded image sequence and its reference sequence, including error energy, activity-masking and luminance-masking, as well as blockiness and blurring features. Based on these factors we apply a radial basis function neural network as a classifier to give quality assessment scores. After training with the subjective mean opinion score (MOS) data of the VQEG test sequences, the neural network model can be used to evaluate video quality with good correlation performance in terms of accuracy and consistency measurements.
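A minimal sketch of the feature-to-score mapping, assuming Gaussian RBF units centred on training feature vectors and a least-squares output layer; the feature definitions, network size and toy MOS values are assumptions, not the trained model from the paper.

```python
# RBF network mapping per-sequence feature vectors to quality scores.
import numpy as np

def rbf_design(X, centers, width):
    d2 = ((X[:, None, :] - centers[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2 * width**2))

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 5))           # rows: extracted feature vectors
mos = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(40)   # toy MOS data
centers, width = X[::4], 1.5               # every 4th sample as a centre
Phi = rbf_design(X, centers, width)
w, *_ = np.linalg.lstsq(Phi, mos, rcond=None)   # fit the output layer
pred = rbf_design(X[:3], centers, width) @ w    # predicted quality scores
print(np.round(pred, 2), np.round(mos[:3], 2))
```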
Mixed variables modeling method to estimate network video quality
Author(s):
Hiroaki Ikeda;
Takeshi Yoshida;
Tomoko Kashima
Show Abstract
This paper describes a model to estimate human subjective video quality scores within a double-ended, full-reference framework for end-to-end network video systems, where both the sent and the received videos are available. The model incorporates multiple dimensions and multiple variables, all of which are objectively measurable. Its aim is to estimate the subjective score as the difference mean opinion score between a reference video and the corresponding degraded videos. The model is developed through an application of multivariate analysis and trained on a set of a priori known relationships between physically obtainable parameters and the corresponding subjective scores. The paper describes how the model incorporates a mathematical algorithm as an executable programme module, and to what degree the model is applicable to actual examples of video pairs, both for the training video set and for foreign video sets. Referring to an application of the developed model to a set of foreign videos, it reports the capability of the model across different quality ranges. In addition to the worked results, some issues for further study are highlighted.
No-reference quality metric for degraded and enhanced video
Author(s):
Jorge E. Caviedes;
Franco Oberti
Show Abstract
In this paper we present a no-reference objective quality metric (NROQM) that has resulted from extensive research on impairment metrics, image feature metrics, and subjective image quality in several projects in Philips Research, and participation in the ITU Video Quality Experts Group. The NROQM is aimed at requirements including video algorithm development, embedded monitoring and control of image quality, and evaluation of different types of display systems. NROQM is built from metrics for desirable and non-desirable image features (sharpness, contrast, noise, clipping, ringing, and blocking artifacts), and accounts for their individual and combined contributions to perceived image quality. We describe our heuristic, incremental approach to modeling quality and training the NROQM, and its advantages to deal with imperfect data and imperfect metrics. The results of training the NROQM using a large set of video sequences, which include degraded and enhanced video, show high correlation between objective and subjective scores, and the results of the first performance test show good objective-subjective correlations as well. We also discuss issues that require further research such as fully content-independent metrics, measuring over-enhanced video quality, and the role of temporal impairment metrics.
PQSM-based RR and NR video quality metrics
Author(s):
Zhongkang Lu;
Weisi Lin;
Eeping Ong;
Xiaokang Yang;
Susu Yao
Show Abstract
This paper presents a new and general concept, the PQSM (Perceptual Quality Significance Map), to be used in measuring visual distortion. It makes use of the selectivity characteristic of the HVS (Human Visual System), which pays more attention to certain areas/regions of a visual signal due to one or more of the following factors: salient features in the image/video, cues from domain knowledge, and association with other media (e.g., speech or audio). The PQSM is an array whose elements represent the relative perceptual-quality significance levels of the corresponding areas/regions of an image or video. Due to its generality, the PQSM can be incorporated into any visual distortion metric: to improve the effectiveness and/or efficiency of perceptual metrics, or even to enhance a PSNR-based metric. A three-stage PQSM estimation method is also proposed in this paper, with an implementation of motion, texture, luminance, skin-color and face mapping. Experimental results show that the scheme can improve the performance of current image/video distortion metrics.
Method for precise detection of local impairment of coded video by use of reduced reference
Author(s):
Osamu Sugimoto;
Ryoichi Kawada;
Atsushi Koike;
Masahiro Wada
Show Abstract
A novel method is proposed for detecting local impairment of coded pictures caused by transmission failures in digital television transmission, using a reduced-reference framework. The method utilizes SSSWHT, which is based on spread spectrum techniques and the extraction of WHT coefficients, and precisely estimates the MSE in a small region within the frame using reduced references, in order to detect local impairment caused by MPEG bitstream errors. Computer simulations show that the proposed method detects the impaired frames completely at a reference path bitrate of 425 kbps. The method also detects slices that have impaired pixel blocks with an accuracy of approximately 70% at the same reference path bitrate. These results confirm that the proposed method is effective for television transmission monitoring.
Successive refinement of video: fundamental issues, past efforts, and new directions
Author(s):
David Taubman
Show Abstract
The paper provides an overview of the fundamental issues confronting scalable video compression, together with some of the most promising general approaches to addressing these issues. For motivation, the paper sketches a compelling application in remote browsing of video, whose realization is not possible without efficient highly scalable video compression technology. The paper outlines some of the most important structural alternatives for wavelet-based scalable video compression systems, together with experimental findings in support of particular alternatives. In particular, the paper describes novel spatio-temporal decomposition structures, a novel finely embedded motion coding structure, and a complete compression system which addresses the needs of a wide range of potential applications. Preliminary compression performance results are provided together with information on an implementation which is capable of real-time decoding at CIF resolutions and beyond.
Fully scalable video transmission using the SSM adaptation framework
Author(s):
Debargha Mukherjee;
Peisong Chen;
Shih-Ta Hsiang;
John W. Woods;
Amir Said
Show Abstract
Recently a methodology for representation and adaptation of arbitrary scalable bit-streams in a fully content non-specific manner has been proposed on the basis of a universal model for all scalable bit-streams called Scalable Structured Meta-formats (SSM). According to this model, elementary scalable bit-streams are naturally organized in a symmetric multi-dimensional logical structure. The model parameters for a specific bit-stream along with information guiding decision-making among possible adaptation choices are represented in a binary or XML descriptor to accompany the bit-stream flowing downstream. The capabilities and preferences of receiving terminals flow upstream and are also specified in binary or XML form to represent constraints that guide adaptation. By interpreting the descriptor and the constraint specifications, a universal adaptation engine sitting on a network node can adapt the content appropriately to suit the specified needs and preferences of recipients, without knowledge of the specifics of the content, its encoding and/or encryption. In this framework, different adaptation infrastructures are no longer needed for different types of scalable media. In this work, we show how this framework can be used to adapt fully scalable video bit-streams, specifically ones obtained by the fully scalable MC-EZBC video coding system. MC-EZBC uses a 3-D subband/wavelet transform that exploits correlation by filtering along motion trajectories, to obtain a 3-dimensional scalable bit-stream combining temporal, spatial and SNR scalability in a compact bit-stream. Several adaptation use cases are presented to demonstrate the flexibility and advantages of a fully scalable video bit-stream when used in conjunction with a network adaptation engine for transmission.
Transition filtering and optimized quantization in interframe wavelet video coding
Author(s):
Thomas Rusert;
Konstantin Hanke;
Jens-Rainer Ohm
Show Abstract
In interframe wavelet video coding, wavelet-based motion-compensated temporal filtering (MCTF) is combined with spatial wavelet decomposition, allowing for efficient spatio-temporal decorrelation and temporal, spatial and SNR scalability. Contemporary interframe wavelet video coding concepts employ block-based motion estimation (ME) and compensation (MC) to exploit temporal redundancy between successive frames. Due to occlusion effects and imperfect motion modeling, block-based MCTF may generate temporal high frequency subbands with block-wise varying coefficient statistics, and low frequency subbands with block edges. Both effects may cause declined spatial transform gain and blocking artifacts. As a modification to MCTF, we present spatial highpass transition filtering (SHTF) and spatial lowpass transition filtering (SLTF), introducing smooth transitions between motion blocks in the high and low frequency subbands, respectively. Additionally, we analyze the propagation of quantization noise in MCTF and present an optimized quantization strategy to compensate for variations in synthesis filtering for different block types. Combining these approaches leads to a reduction of blocking artifacts, smoothed temporal PSNR performance, and significantly improved coding efficiency.
Inter-view wavelet compression of light fields with disparity-compensated lifting
Author(s):
Chuo-Ling Chang;
Xiaoqing Zhu;
Prashant Ramanathan;
Bernd Girod
Show Abstract
We propose a novel approach that uses disparity-compensated lifting for wavelet compression of light fields. Disparity compensation is incorporated into the lifting structure for the transform across the views, to solve the irreversibility limitation in previous wavelet coding schemes. With this approach, we obtain the benefits of wavelet coding, such as scalability in all dimensions, as well as superior compression performance. For light fields of an object, shape adaptation is adopted to improve the compression efficiency and the visual quality of reconstructed images. In this work we extend the scheme to handle light fields with arbitrary camera arrangements. A view-sequencing algorithm is developed to encode the images. Experimental results show that the proposed scheme outperforms existing light field compression techniques in terms of compression efficiency and visual quality of the reconstructed views.
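The key property of compensated lifting, invertibility regardless of the warping used inside the predict and update steps, can be shown with a one-dimensional Haar-style sketch in which an integer shift stands in for disparity compensation; real schemes warp 2-D views with sub-pixel accuracy, so this is only an illustration of the principle.

```python
# Compensated Haar-style lifting: predict/update with a shift as "disparity".
import numpy as np

def lift_forward(even, odd, shift):
    pred = np.roll(even, shift)                  # "compensated" prediction
    high = odd - pred                            # predict step
    low = even + 0.5 * np.roll(high, -shift)     # update step
    return low, high

def lift_inverse(low, high, shift):
    even = low - 0.5 * np.roll(high, -shift)     # undo update
    odd = high + np.roll(even, shift)            # undo predict
    return even, odd

rng = np.random.default_rng(6)
a, b = rng.standard_normal(8), rng.standard_normal(8)
low, high = lift_forward(a, b, shift=2)
a2, b2 = lift_inverse(low, high, shift=2)
print(np.allclose(a, a2) and np.allclose(b, b2))   # True: perfect reconstruction
```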
Advanced lifting-based motion-threading (MTh) technique for 3D wavelet video coding
Author(s):
Lin Luo;
Feng Wu;
Shipeng Li;
Zhenquan Zhuang
Show Abstract
This paper proposes an advanced motion-threading technique to improve the coding efficiency of 3D wavelet coding. We extend the original motion-threading technique to the lifting wavelet structure. This extension solves the artificial motion-thread truncation problem in long-support temporal wavelet filtering, and enables fractional-pixel accuracy of motion alignment with guaranteed perfect reconstruction. Furthermore, the mismatch problem in motion-threading caused by occlusion or scene changes is considered. In general, the temporal wavelet decomposition consists of multiple layers. Unlike the original motion-threading scheme, in the proposed scheme each layer owns one set of motion vectors, so as to achieve both high coding efficiency and temporal scalability. To reduce the motion cost, a direct mode is used to exploit the motion vector correlation. An R-D optimized technique is introduced to estimate motion vectors and select proper prediction modes for each macroblock. The proposed advanced motion-threading scheme can outperform the original motion-threading scheme by 1.5 to 5.0 dB. The experimental results also demonstrate that the 3D wavelet coding scheme is competitive with the state-of-the-art JVT video standard in coding efficiency.
Complete-to-overcomplete discrete wavelet transforms for scalable video coding with MCTF
Author(s):
Yiannis Andreopoulos;
Mihaela van der Schaar;
Adrian Munteanu;
Joeri Barbarien;
Peter Schelkens;
Jan P.H. Cornelis
Show Abstract
Techniques for full scalability with motion-compensated temporal filtering (MCTF) in the wavelet-domain (in-band) are presented in this paper. The application of MCTF in the wavelet domain is performed after the production of the overcomplete discrete wavelet transform from the critically-sampled decomposition, a process that occurs at both the encoder and decoder side. This process, which is a complete-to-overcomplete discrete wavelet transform, is critical for the efficiency of the system with respect to scalability, coding performance and complexity. We analyze these aspects of the system and set the necessary constraints for drift-free video coding with in-band MCTF. As a result, the proposed architecture permits the independent operation of MCTF within different resolution levels or even different subbands of the transform and allows the successive refinement of the video information in resolution, frame-rate and quality.
Low-rate FGS video compression based on motion-compensated spatio-temporal wavelet analysis
Author(s):
Jerome Vieron;
Christine M. Guillemot
Show Abstract
This paper describes low-rate FGS video compression algorithms based on closed-loop temporal prediction combined with motion-compensated spatio-temporal wavelet analysis. After presenting an overall coding architecture, the paper reviews different motion-compensated (MC) temporal filtering solutions and discusses their amenability to compression and fine-grain scalability. The coding architecture relies on rate-constrained hierarchical motion estimation and a 3D-EBCOT algorithm. The analysis of the different MC temporal filtering solutions is substantiated with various quantitative elements on filter length, motion models and temporal transform reliability. Some of their limitations are then addressed by designing algorithms which combine closed-loop temporal prediction with MC temporal analysis. The PSNR performance of the approaches is compared with that obtained with MPEG-4 Part 2 and H.264 JM2.1.
Experiments in JPEG 2000-based INTRA coding for H.26L
Author(s):
Roland Norcen;
Andreas Uhl
Show Abstract
In this paper we compare the coding performance of the JPEG 2000 still image coding standard with the INTRA coding method used in the H.26L project. We discuss the basic techniques of both coding schemes and show the effect of improved I-frame coding on the overall performance of an H.26L-based system. Both coding efficiency and runtime behaviour are considered in our comparison.
JPEG 2000 coding of image data over adaptive refinement grids
Author(s):
Manuel N. Gamito;
Miguel Salles Dias
Show Abstract
An extension of the JPEG 2000 standard is presented for non-conventional images resulting from an adaptive subdivision process. Samples generated through adaptive subdivision can have different sizes, depending on the amount of subdivision that was locally introduced in each region of the image. The subdivision principle allows each individual sample to be recursively subdivided into sets of four progressively smaller samples. Image datasets generated through adaptive subdivision find application in Computational Physics, where simulations of natural processes are often performed over adaptive grids. It is also found that compression gains can be achieved for non-natural imagery, like text or graphics, if it first undergoes an adaptive subdivision process. The representation of adaptive subdivision images is performed by first coding the subdivision structure into the JPEG 2000 bitstream, in a lossless manner, followed by the quantized and entropy-coded transform coefficients. Due to the irregular distribution of sample sizes across the image, the wavelet transform must be applied on irregular image subsets that are nested across all the resolution levels. Using the conventional JPEG 2000 coding standard, adaptive subdivision images would first have to be upsampled to the smallest sample size in order to attain a uniform resolution. The proposed method for coding adaptive subdivision images is shown to perform better than conventional JPEG 2000 for medium to high bitrates.
Post-processing for JPEG 2000 image coding using recursive line filtering based on a fuzzy control model
Author(s):
Susu Yao;
Susanto Rahardja;
Xiao Lin;
Keng Pang Lim;
Zhongkang Lu
Show Abstract
In this paper, we propose a new method for removing the coding artifacts that appear in JPEG 2000 coded images. The proposed method uses a fuzzy control model to control the weighting function for different image edges, according to the gradient of pixels and membership functions. A regularized post-processing approach and a recursive line algorithm are described in this paper. Experimental results demonstrate that the proposed algorithm can significantly improve image quality in terms of objective and subjective evaluation.
Resolution scalability for arbitrary wavelet transforms in the JPEG 2000 standard
Author(s):
Christopher M Brislawn;
Brendt E Wohlberg;
Allon G Percus
Show Abstract
A new set of boundary-handling algorithms has been developed for discrete wavelet transforms in the ISO/IEC JPEG-2000 Still Image Coding Standard. Two polyphase component extrapolation policies are specified: a constant extension policy and a symmetric extension policy. Neither policy requires any computations to generate the extrapolation. The constant extension policy is a low-complexity option that buffers just one sample from each end of the input being extrapolated. The symmetric extension policy has slightly higher memory and conditional-logic requirements but is mathematically equivalent to whole-sample symmetric pre-extension when used with whole-sample symmetric filter banks. Both policies can be employed with arbitrary lifted filter banks, and both policies preserve resolution scalability and reversibility. These extension policies will appear in Annex H, "Transformation of images using arbitrary wavelet transformations," in Part 2 ("Extensions") of the JPEG-2000 standard.
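The difference between the two policies can be illustrated on a one-dimensional segment as follows; the exact JPEG-2000 Part 2 indexing rules are more detailed than this sketch, which only shows the boundary behaviour.

```python
# Constant vs. whole-sample symmetric extension at a signal boundary.
import numpy as np

def extend(x, n, policy):
    if policy == "constant":
        left = np.full(n, x[0])              # buffers one sample per end
        right = np.full(n, x[-1])
    elif policy == "symmetric":              # whole-sample: abc -> cb|abc|ba
        left = x[1:n + 1][::-1]
        right = x[-n - 1:-1][::-1]
    else:
        raise ValueError(policy)
    return np.concatenate([left, x, right])

x = np.array([1.0, 2.0, 3.0, 4.0])
print(extend(x, 2, "constant"))    # [1. 1. 1. 2. 3. 4. 4. 4.]
print(extend(x, 2, "symmetric"))   # [3. 2. 1. 2. 3. 4. 3. 2.]
```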
Novel wavelet base choice in JPEG 2000
Author(s):
Qinghai Wang;
Yulong Mo;
Chunmei Han
Show Abstract
In this paper we put forward a novel V9/3 wavelet constructed from a human visual system (HVS) model. Using this wavelet in the JPEG 2000 standard yields better visual quality and lower computational complexity than the Daubechies 9/7 wavelet. The wavelet basis coefficients and the corresponding lifting scheme are discussed in the paper. Experimental results show that this wavelet achieves better visual quality.
Architecture, philosophy, and performance of JPIP: internet protocol standard for JPEG 2000
Author(s):
David S. Taubman;
Robert Prandolini
Show Abstract
JPEG2000 is a family of technologies based on the image compression system defined in IS 15444-1. Presently, the ISO/IEC Joint Photographic Experts Group (JPEG) is developing an international standard for interactivity with JPEG2000 files, called JPIP; it will become Part 9 of the standard. One of the main goals of JPIP is to exploit the multi-resolution and spatially random access properties of JPEG2000, to permit "smart dissemination" of the data for client-server based applications. The purpose of this paper is to discuss the principles and philosophy of operation underlying the JPIP standard, to outline aspects of the architecture of JPIP systems, and to report on the performance of a prototype implementation.
Importance prioritization coding in JPEG 2000 for interpretability with application to surveillance imagery
Author(s):
Anthony N. Nguyen;
Vinod Chandran;
Sridha Sridharan;
Robert Prandolini
Show Abstract
Surveillance imagery is used for supporting the decision processes in strategic, operational and tactical tasks. The visual tasks associated with the interpretation of such imagery may require the detection of targets or the recognition of image contents. It is important for an image compression scheme to be tuned to maximise the imagery's interpretability (or content recognition) performance. Importance prioritisation coding of an image, where image contents are prioritised in order of importance for interpretability, is hence a very desirable feature for such applications. The paper presents a flexible framework for such prioritisation in JPEG2000 using regions of interest (ROIs) and importance maps. The coding scheme allows multiple and arbitrarily shaped ROIs to be defined at different spatial locations, sub-band orientations and scales, and subsequently to be prioritised using arbitrary importance scores. No ROI or importance map information needs to be signalled to the decoder, and progressive lossy-to-lossless reconstruction of the ROIs and the background is also possible. Furthermore, the prioritisation maintains the JPEG2000 code-stream syntax and can be decoded by any generic JPEG2000 decoder.
Novel efficient architecture for JPEG 2000 entropy coder
Author(s):
Omid Fatemi;
Parvin Asadzadeh Birjandi
Show Abstract
With the continual expansion of multimedia and Internet applications, the needs and requirements of advanced technologies have grown and evolved. With the increasing use of multimedia technologies, image compression techniques require higher performance as well as new features. Significant progress has recently been made in image compression techniques using discrete wavelet transforms. The overall performance of these schemes may be further improved by the proper design of efficient entropy coders. In this paper, we describe an efficient architecture for the JPEG 2000 entropy coder, part of a new standard that addresses the needs of the specific area of still image encoding. Our proposed architecture consists of two main parts, the coefficient bit modeler (CBM) and the binary arithmetic coder (BAC), which communicate through a FIFO buffer. Optimizations have been made in our proposed architecture to reduce memory accesses. Our proposed architecture is fast and modular and is suitable for real-time applications.
Selective coding with controlled quality decay for 2D and 3D images in a JPEG 2000 framework
Author(s):
Alberto Signoroni;
Fabio Lazzaroni M.D.;
Riccardo Leonardi
Show Abstract
This paper presents some ideas which extend the functionality and the application fields of spatially selective coding within a JPEG2000 framework. First, the image quality drop between the Regions of Interest (ROI) and the background (BG) is considered. In a conventional approach, the reconstructed image quality drops steeply along the ROI boundary; however, this effect could be considered or perceived as objectionable in some use cases. A simple quality decay management is proposed here, which makes use of concentric ROIs with different scaling factors. This allows the technique to be perfectly consistent with the JPEG2000 Part 2 ROI definition and description. Another issue considered is the extension of selective ROI coding to 3D Volume of Interest (VOI) coding. This extension is currently under consideration for Part 10 of JPEG2000, JP3D. An easy and effective 2D-to-3D extension of the VOI definition and description is proposed here: a VOI is defined by a set composition of ROI-generated solids, where the ROIs are defined along one or more volume cutting directions, and it is described by the relative set of ROI parameters. Moreover, the quality decay management can be applied to this extension. The proposed techniques could have a significant impact on the selective coding of medical images and volumes. Image quality issues are very important but very critical factors in that field, which also constitutes the dominant market for 3D applications. Therefore, some experiments on medical images and volumes are presented in order to evaluate the benefits of the proposed approaches in terms of diagnostic quality improvement with respect to conventional ROI coding usage.
An information theoretic approach to digital image watermarking
Author(s):
Selin Aviyente
Show Abstract
With the rapid development of wireless communication systems, transmission of digital multimedia has become widespread. This brings with it the issue of copyright protection for digital works. Digital watermarking is the process of embedding data inside a host image such that it does not degrade the perceptual quality of the image. In recent years, there have been many approaches introducing different watermarking algorithms for the purposes of copyright protection, broadcast monitoring and covert communication. In this paper, a new transform-based watermarking algorithm for digital images is introduced. This method uses the Singular Value Decomposition to obtain the eigenimages of a given image, which are known to be the best orthogonal basis that can express that image in a least-squares sense. The watermark is embedded by changing the strength of the singular values. The strength of the watermark is determined based on the entropy of the eigenimages, to ensure robustness and imperceptibility simultaneously. A corresponding watermark detection and extraction algorithm is proposed. The performance of the algorithm under different types of attacks that a digital image can undergo during transmission is illustrated through an example.
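A hedged sketch of the embedding rule: perturb the singular values of the host by a scaled watermark and reconstruct. The additive rule with a single global strength alpha and the non-blind detector (which needs the original singular values) are simplifications; the paper modulates the strength per-image using the entropy of the eigenimages.

```python
# SVD-domain watermark embedding and (non-blind) extraction, toy version.
import numpy as np

def embed(host, watermark_bits, alpha):
    u, s, vt = np.linalg.svd(host, full_matrices=False)
    s_marked = s * (1 + alpha * watermark_bits[:len(s)])  # modify strengths
    return u @ np.diag(s_marked) @ vt, s                  # keep s for detection

def detect(image, s_original, alpha):
    s_now = np.linalg.svd(image, compute_uv=False)
    return (s_now / s_original - 1) / alpha               # recovered bits

# Host with well-separated singular values so small alpha keeps their order.
rng = np.random.default_rng(7)
q1, _ = np.linalg.qr(rng.standard_normal((16, 16)))
q2, _ = np.linalg.qr(rng.standard_normal((16, 16)))
host = q1 @ np.diag(np.linspace(20.0, 1.0, 16)) @ q2
bits = rng.choice([-1.0, 1.0], size=16)
marked, s_ref = embed(host, bits, alpha=0.02)
print(np.allclose(detect(marked, s_ref, alpha=0.02), bits, atol=1e-6))  # True
```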
Bayesian approach to attack characterization using robust watermarks
Author(s):
Henry D Knowles;
Dominique A. Winne;
C. Nishan Canagarajah;
David R. Bull
Show Abstract
In this paper we propose the use of a Bayesian framework to allow characterisation of image tampering from a library of attacks. We use the double watermarking strategy proposed in our previous work to derive sufficient information to drive the classifier. A non-parametric Bayesian classifier, trained on data derived from Monte Carlo simulations, is used. In addition to classification, the effects of varying the input parameters are studied. The results obtained show that the non-parametric Bayesian classifier has a very low misclassification rate for this type of problem. Explanations as to the nature of the results, and some of the practical considerations, are given.
Efficient watermarking system with increased reliability for video authentication
Author(s):
Dominique Albert Winne;
Henry D. Knowles;
David R. Bull;
C. Nishan Canagarajah
Show Abstract
The widespread adoption of digital video techniques has generated a requirement for authenticity verification in applications such as criminal evidence, insurance claims and commercial databases. This paper extends our previous work and improves the watermark estimation procedure of a spatial digital video watermarking system designed to detect and characterize time-base attacks. Most watermark extraction processes utilize the noise masking levels of the image. These levels change during transmission, especially when the host signal is compressed at low bit-rates. The blocking artifacts introduced by this process modify the noise masking levels and impair the ability to form a good estimate of the embedded watermark. This paper describes a novel filter procedure to eliminate these artifacts from the noise masking levels. Its efficiency is compared with the standard MPEG-4 deblocking and deringing filters. Extracting the watermark from only the encoded macroblocks, excluding the skipped macroblocks, improves the performance significantly without an increase in computational complexity. The functionality of this system within an MPEG-4 implementation is demonstrated with a receiver operating characteristic.
Image vectorization in digital image watermarking
Author(s):
Lin Shang
Show Abstract
Unlike most previous work, which used a random sequence of bits or an image directly as the watermark, this paper proposes a new image vectorisation method for digital image watermarking.
A watermark image (the image to be embedded) is first contourised into a sequence of contour curves by constructing a "vector" for each of the grey-level values. In the contourisation process, a topology analysis method is applied to locate local maxima, minima and saddle points and to build a topology table. It is well known that the volume of "vector" data from real image contourisation may be up to an order of magnitude greater than that of the raster representation.
Therefore, a simplification method is adopted to handle this great expansion of data and to determine which contours in a given image must be preserved and which can be discarded. With the help of the previously obtained topology table, the image is decomposed into a number of adjacent subregions known as catchment basins, each of which typically surrounds a local maximum or minimum and is defined by a contour referred to as a
watershed or watershed boundary. The simplified contour points are then embedded as the watermark in the cover image using the well-known spread spectrum technique. After the contour points are extracted from the watermarked image, the watermark image is reconstructed by building the triangle mesh defined by the contour map and rendering it using conventional rendering methods.
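The final embedding step named above is the classical additive spread-spectrum technique. A minimal, non-blind sketch of that generic step (not the paper's contour pipeline) might look as follows; the strength `alpha`, the carrier construction and the seed are illustrative assumptions.

```python
import numpy as np

def ss_embed(cover, bits, alpha=2.0, seed=7):
    """Additive spread-spectrum embedding: each payload bit modulates a
    pseudo-random +/-1 carrier spread over the whole cover image."""
    rng = np.random.default_rng(seed)
    carriers = rng.choice([-1.0, 1.0], size=(len(bits),) + cover.shape)
    signs = 2.0 * np.asarray(bits) - 1.0          # bit 1 -> +1, bit 0 -> -1
    return cover + alpha * np.tensordot(signs, carriers, axes=1)

def ss_detect(residual, n_bits, seed=7):
    """Informed (non-blind) detection: correlate the watermark residual
    (received minus cover) with each regenerated carrier."""
    rng = np.random.default_rng(seed)
    carriers = rng.choice([-1.0, 1.0], size=(n_bits,) + residual.shape)
    corr = carriers.reshape(n_bits, -1) @ residual.ravel()
    return (corr > 0).astype(int)

rng = np.random.default_rng(1)
cover = rng.uniform(0, 255, (64, 64))
bits = [1, 0, 1, 1, 0, 0, 1, 0]
marked = ss_embed(cover, bits)
print(ss_detect(marked - cover, len(bits)))       # -> [1 0 1 1 0 0 1 0]
```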
Performance issues in MPEG-4 VTC image coding
Author(s):
Roland Norcen;
Andreas Uhl
Show Abstract
In this paper, we discuss how we can enhance the performance of the MPEG-4 Visual Texture Coding algorithm (VTC). Runtime analysis reveals the major coding stages and shows a weak point within the vertical filtering stage. A useful cache-access strategy is considered, which lifts this problem almost entirely. Additionally, we
perform the DWT and the zerotree coding stage in parallel using OpenMP. The improved sequential version of the vertical filtering also significantly improves the parallel efficiency. We present results from two different multiprocessor platforms (SGI Power Challenge: 20 IP25 RISC CPUs running at 195 MHz; SGI Origin 3800: 128 MIPS RISC R12000 CPUs running at 400 MHz).
Encoding strategies for realizing MPEG-4 universal scalable video coding
Author(s):
Yi-Shin Tung;
Jin-Hau Kuo;
Ja-Ling Wu;
Wen-Huang Cheng;
Ting-Jian Pan
Show Abstract
The universal scalability, which integrates different types of scalabilities and consequently provides a large scaling range for each parameter, is of high interest to applications in current heterogeneous surroundings. In our previous work, an MPEG-4 universal scalable codec based on a layered path-tree structure [1,2] was addressed, in which a video layer and the coding order of two consecutive layers are interpreted as a node and the parent-to-child relationship in a path-tree, respectively. Since individual video layers can be coded separately using different coding tools in the MPEG-4 simple scalable profile (SSP) [3] and fine-granularity scalable profile (FGS) [4], the proposed scalable video coder may include spatial, temporal and SNR enhancements simultaneously. In this paper, based on some visual observations, we first address several encoding strategies for universal scalable coding, including spatial-temporal quality tradeoff, region sensitivity and frequency weighting. Applying these strategies takes the content characteristics into consideration and can determine better coding parameters. As a result, bit allocation becomes more sensitive to the perceptually important parts of the spatial, temporal and SNR enhancements. Next, a batch encoding process is conducted to generate universal scalable streams automatically, in which all of the above-mentioned encoding strategies are fully integrated. Preliminary experiments show that better visual quality can be obtained over the full bitrate range.
A study on rate distortion optimization scheme for JVT coder
Author(s):
Koichi Takagi;
Yasuhiro Takishima;
Yasuyuki Nakajima
Show Abstract
This paper focuses on the encoder of ITU-T H.264 | ISO/IEC MPEG-4 Part-10 AVC (hereafter “JVT”), especially the Lagrange parameter for the rate distortion optimization (RDO) scheme. Since the Lagrange optimization method is very effective from the viewpoint of coding efficiency optimization, it is specified as an essential experimental condition for performance evaluation in JVT standardization activity. After discussing the background of this mechanism, we verify this method and the currently used parameter theoretically. Computer simulation results show that the Lagrange multiplier need not be the original value; smaller values provide better performance for QCIF, for example.
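As a rough illustration of the mechanism under discussion, the sketch below applies the Lagrangian cost J = D + λR to a set of candidate coding modes, using the λ(QP) relation commonly cited for the JVT reference encoder; the candidate distortion/rate numbers are made up for illustration.

```python
def jvt_lambda(qp, scale=0.85):
    """Mode-decision Lagrange multiplier as a function of the quantizer QP.
    This is the form commonly cited for the JVT (H.264) reference encoder;
    the paper studies deviations from this default."""
    return scale * 2.0 ** ((qp - 12) / 3.0)

def best_mode(candidates, lam):
    """Pick the coding mode minimizing the Lagrangian cost J = D + lambda * R."""
    return min(candidates, key=lambda m: m["D"] + lam * m["R"])

modes = [
    {"name": "INTRA16", "D": 120.0, "R": 40.0},   # hypothetical SSD / bits
    {"name": "INTER16", "D": 150.0, "R": 18.0},
    {"name": "SKIP",    "D": 260.0, "R": 1.0},
]
for qp in (20, 28, 36):
    # higher QP -> larger lambda -> cheaper modes win
    print(qp, best_mode(modes, jvt_lambda(qp))["name"])
```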
Unequal error protection codes for wavelet video transmission over W-CDMA, AWGN, and Rayleigh fading channels
Author(s):
Minh Hung Le;
Ranjith Liyana-Pathirana
Show Abstract
The unequal error protection (UEP) codes with a wavelet-based algorithm for video compression over wide-band code division multiple access (W-CDMA), additive white Gaussian noise (AWGN) and Rayleigh fading channels are analysed. Wavelets have emerged as a powerful method for compressing video sequences. The wavelet transform
compression technique has been shown to be well suited to high-quality video applications, producing better quality output for the compressed frames of video. We use a spatially scalable video coding framework of MPEG-2 in which motion correspondences between successive video frames are exploited in the wavelet transform domain. The basic motivation for our coder is that motion fields are typically smooth and can be efficiently captured through a multiresolution
framework. Wavelet decomposition is applied to video frames, and the coefficients at each level are predicted from the coarser level through backward motion compensation. The proposed algorithms of the embedded zerotree wavelet (EZW) coder and the 2-D wavelet packet transform (2-D WPT) are investigated.
Performance evaluation of Eureka-147 with RS(204,188) code for mobile multimedia broadcasting
Author(s):
Seung-Gi Chang;
Victor H. S. Ha;
Zhiming Zhang;
Yongje Kim
Show Abstract
The demand for mobile multimedia broadcasting service is increasing consistently as more people expect seamless outdoor connections and communication capabilities. In this paper, we introduce the digital multimedia broadcasting (DMB) system based on Eureka-147 that has been tentatively adopted in Korea. Since Eureka-147 is originally
designed for broadcasting digital audio data, it provides a bit error rate (BER) of about 10^-4, while the transmission of compressed video data, for example, requires a BER of about 10^-9. To deal with this mismatch, the Korean DMB standard is considering the addition of the RS(204,188) coder to Eureka-147. In this paper, we apply the RS(204,188) coder to Eureka-147 and present simulation results on the performance of this modified system at various transmission and protection modes. We conclude that the addition of the RS(204,188) coder to Eureka-147 in the Korean DMB system results in a satisfactory level of BER for mobile multimedia broadcasting services.
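For orientation, RS(204,188) appends 16 parity bytes to each 188-byte MPEG-2 transport packet and can correct up to 8 byte errors per packet. A minimal sketch using the third-party reedsolo package (an assumption; any Reed-Solomon library would serve):

```python
# RS(204,188) outer coding sketch. Requires `pip install reedsolo`;
# recent reedsolo versions return (message, full codeword, error
# positions) from decode().
from reedsolo import RSCodec

rs = RSCodec(16)                          # 204 - 188 = 16 parity bytes

packet = bytes(range(188))                # one 188-byte transport packet
codeword = rs.encode(packet)              # 204 bytes on the channel
assert len(codeword) == 204

corrupted = bytearray(codeword)
for pos in (3, 50, 99, 170):              # four byte errors (max correctable: 8)
    corrupted[pos] ^= 0xFF

decoded, _, _ = rs.decode(bytes(corrupted))
assert bytes(decoded) == packet           # all errors corrected
```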
Bit error recovery in internet facsimile without retransmission
Author(s):
Hyunju Kim;
Abdou Youssef
Show Abstract
This paper proposes a bit error recovery method for Internet facsimile images compressed with Group 3, Extended 2 Dimensional MMR coding scheme. When an error occurs in an MMR coded bitstream, the bitstream cannot be correctly decoded after the error point. To prevent losing valid information after an error, we developed an error recovery system that detects bit errors and applies bit-inversion algorithms to correct the errors. In case the bit-inversion cannot correct the error, the system applies new algorithms that utilize syntactical structure information of the coded bitstream. Testing results show that around 95% of bit errors are corrected with our bit-inversion algorithms. The error recovery system also recovers nearly all the data for the remaining 5% of the images.
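A minimal sketch of the bit-inversion retry loop described above, assuming a hypothetical `try_decode` callback that reports whether a candidate bitstream decodes without syntax violations (the paper's syntax-based fallback for uncorrectable errors is omitted):

```python
def recover_by_bit_inversion(bits, error_hint, window=64, try_decode=None):
    """Try single-bit inversions around the detected error position and
    return the first candidate stream that decodes cleanly. `bits` is a
    list of 0/1 values; `try_decode` is a stand-in for a real MMR decoder."""
    lo = max(0, error_hint - window)
    hi = min(len(bits), error_hint + window)
    for i in range(lo, hi):
        candidate = bits.copy()
        candidate[i] ^= 1                 # invert one bit
        if try_decode(candidate):
            return candidate, i
    return None, -1                       # fall back to syntax-based repair

# Toy demo: a "stream" counts as valid iff it has even parity.
stream = [1, 0, 1, 1, 0, 1, 0, 1]         # odd parity: one bit was flipped
fixed, pos = recover_by_bit_inversion(
    stream, error_hint=4, try_decode=lambda b: sum(b) % 2 == 0)
print(pos, fixed)
```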
An adaptive error-resilient video encoder
Author(s):
Liang Cheng;
Magda El Zarki
Show Abstract
When designing an encoder for a real-time video application over a wireless channel, we must take into consideration the unpredictable fluctuation of the quality of the channel and its impact on the transmitted video data. This uncertainty motivates the development of an adaptive video encoding mechanism that can compensate for the infidelity caused by data loss and/or by the post-processing (error concealment) at the decoder. In this paper, we first explore the major factors that cause quality degradation. We then propose an adaptive progressive replenishment algorithm for a packet loss rate (PLR) feedback enabled system. Assuming the availability of a feedback channel, we discuss a video quality assessment method, which allows the encoder to be aware of the decoder-side perceptual quality. Finally, we present a novel dual-feedback mechanism that guarantees an acceptable level of quality at the receiver side with a modest increase in the complexity of the encoder.
Error-resilient method for robust video transmissions
Author(s):
Dong-Hwan Choi;
Tae-Gyun Lim;
Sang-Hak Lee;
Chan-Sik Hwang
Show Abstract
In this paper we address the problems of video transmission in error-prone environments. A novel error-resilient method is proposed that uses a data embedding scheme for header parameters in video coding standards such as MPEG-2 and H.263. When losses of data beyond header errors must also be taken into account, the video decoder conceals the visual degradation as well as possible, employing an error concealment method based on an affine transform. Header information is very important because syntax elements, tables, and decoding processes all depend on its values. Transmission errors in header information can therefore result in serious visual degradation of the output video and can also derail the decoding process. In the proposed method, the header parameters are embedded into the least significant bits (LSB) of the quantized DCT coefficients. When errors occur in the header field of the compressed bitstream, the decoder can accurately recover the corrupted header parameters provided the embedded information is extracted correctly. The error concealment technique employed in this paper uses motion estimation that accounts for actual motions in moving pictures, such as rotation, magnification, reduction, and translation. Experimental results show that the proposed error-resilient method can effectively reconstruct the original video sequence without any additional bits or modifications to the video coding standard, and that the error concealment method produces a higher PSNR value and better subjective video quality by estimating the motion of lost data more accurately.
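The LSB embedding step can be pictured as follows; the choice of coefficient positions and the toy block are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def embed_header_bits(qcoeffs, header_bits):
    """Hide header parameter bits in the LSBs of quantized DCT coefficients.
    A minimal sketch: bits go into the first coefficients of the flattened
    block; a real encoder would choose positions that survive entropy
    coding and avoid long zero runs."""
    flat = qcoeffs.astype(np.int32).ravel().copy()
    for i, b in enumerate(header_bits):
        flat[i] = (flat[i] & ~1) | b        # overwrite the LSB
    return flat.reshape(qcoeffs.shape)

def extract_header_bits(qcoeffs, n_bits):
    """Read the hidden bits back from the coefficient LSBs."""
    return [int(c) & 1 for c in qcoeffs.ravel()[:n_bits]]

rng = np.random.default_rng(2)
block = rng.integers(-32, 32, size=(8, 8))     # a quantized DCT block
bits = [1, 0, 1, 1, 0, 1, 0, 0]                # e.g. part of a picture header
marked = embed_header_bits(block, bits)
assert extract_header_bits(marked, len(bits)) == bits
```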
Clustering of singular value decomposition of image data with applications to texture classification
Author(s):
Alireza Tavakoli Targhi;
Azad Shademan
Show Abstract
In this paper, some applications of a local version of Singular
Value Decomposition (SVD) to texture classification and texture
segmentation are studied. We introduce two measures, obtained from
SVD transform, which capture some of the perceptual and conceptual
features in an image. One of the measures classifies the textures
by their roughness and structures. Experimental results show that
these measures are suitable for texture clustering and image
segmentation and they are robust relative to changes in lighting.
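One plausible instance of a patch-wise SVD measure (not necessarily either of the two measures proposed in the paper) is sketched below: the share of patch energy outside the leading singular value, which is near zero for flat or strongly structured patches and grows with texture roughness.

```python
import numpy as np

def local_svd_roughness(image, w=8):
    """Slide a w-by-w window over the image and derive, for each patch,
    a measure from its singular values: one minus the energy share of
    the leading singular value. An illustrative stand-in for the paper's
    measures."""
    H, W = image.shape
    out = np.zeros((H - w + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            s = np.linalg.svd(image[i:i+w, j:j+w], compute_uv=False)
            energy = s ** 2
            out[i, j] = 1.0 - energy[0] / max(energy.sum(), 1e-12)
    return out

rng = np.random.default_rng(3)
flat = np.full((32, 32), 128.0)
noisy = flat + 30 * rng.standard_normal((32, 32))
print(local_svd_roughness(flat).mean(), local_svd_roughness(noisy).mean())
```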
Object and event recognition for stroke rehabilitation
Author(s):
Ahmed Ghali;
Andrew S. Cunningham;
Tony P. Pridmore
Show Abstract
Stroke is a major cause of disability and health care expenditure around the world. Existing stroke rehabilitation methods can be effective but are costly and need to be improved. Even modest improvements in the effectiveness of rehabilitation techniques could produce large benefits in terms of quality of life. The work reported here is part of an ongoing effort to integrate virtual reality and machine vision technologies to produce innovative stroke rehabilitation methods. We describe a combined object recognition and event detection system that provides real time feedback to stroke patients performing everyday kitchen tasks necessary for independent living, e.g. making a cup of coffee. The image plane position of each object, including the patient’s hand, is monitored using histogram-based recognition methods. The relative positions of hand and objects are then reported to a task monitor that compares the patient’s actions against a model of the target task. A prototype system has been constructed and is currently undergoing technical and clinical evaluation.
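As a toy illustration of histogram-based recognition (the paper's exact features and matching rule are not given here), one can store a normalized histogram per object and match by histogram intersection:

```python
import numpy as np

def patch_histogram(patch, bins=32):
    """Normalized intensity histogram of an image patch, a stand-in for
    the paper's histogram-based object models."""
    h, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def recognize(patch, models):
    """Histogram intersection: return the model whose stored histogram
    overlaps most with the observed patch."""
    hist = patch_histogram(patch)
    return max(models, key=lambda name: np.minimum(hist, models[name]).sum())

rng = np.random.default_rng(10)
models = {"cup": patch_histogram(rng.beta(2, 5, (32, 32))),
          "kettle": patch_histogram(rng.beta(5, 2, (32, 32)))}
print(recognize(rng.beta(2, 5, (32, 32)), models))   # -> 'cup'
```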
Blind separation of mixed images using multiscale transforms
Author(s):
Pavel Kisilev;
Michael Zibulevsky;
Yehoshua Y. Zeevi
Show Abstract
It was previously shown that sparse representations can improve and simplify the estimation of an unknown mixing matrix of a set of images and thereby improve the quality of separation of source images. Here we propose a multiscale approach to the problem of blind separation of images from a set of their mixtures. We take advantage of the properties of multiscale transforms such as wavelet packets and decompose signals and images according to sets of local features. The resulting partial representations, organized in a tree data structure, exhibit various degrees of sparsity. We show how the separation error is affected by the sparsity of the decomposition coefficients, and by the misfit between the prior, formulated in accordance with the probabilistic model of the coefficients' distribution, and the
actual distribution of the coefficients. Our error estimator, based on the Taylor expansion of the quasi Log-Likelihood function, is used in selection of the best subsets of coefficients, utilized in turn for further separation. The performance of the proposed method is assessed by separation of noise-free and noisy data. Experiments with simulated and real signals and images demonstrate significant improvement of separation quality over previously reported results.
Intuitive strategy for parameter setting in video segmentation
Author(s):
Elisa Drelie Gelasca;
Elena Salvador;
Touradj Ebrahimi
Show Abstract
In this paper, we propose an original framework for an intuitive tuning of parameters in image and video segmentation algorithms. The proposed framework is very flexible and generic and does not depend on a specific segmentation algorithm, a particular evaluation metric, or a specific optimization approach, which are the three main components of its block diagram. This framework requires a manual segmentation input provided by a human operator as he/she would have performed intuitively. This input allows the framework to search for the optimal set of parameters which will provide results similar to those obtained by manual segmentation. On one hand, this allows researchers and designers to quickly and automatically find the best parameters in the segmentation algorithms they have developed. It helps them to better understand the degree of importance of each parameter's value on the final segmentation result. It also identifies the potential of the segmentation algorithm under study in terms of best possible performance level. On the other hand, users and
operators of systems with segmentation components can efficiently
identify the optimal sets of parameters for different classes of images or video sequences. To a large extent, this optimization can be
performed without a deep understanding of the underlying algorithm,
which facilitates its exploitation and optimization in real
applications by non-experts in segmentation. A specific implementation
of the proposed framework was obtained by adopting a video segmentation algorithm invariant to shadows as the segmentation component, a full-reference segmentation quality metric based on a perceptually motivated spatial context as the evaluation component, and a downhill simplex method as the optimization component. Simulation results on various test sequences, covering a representative set of indoor and outdoor video, show that optimal sets of parameters can be obtained efficiently and markedly improve on the results obtained by a simple implementation of the same segmentation algorithm with an ad-hoc parameter setting strategy.
Optimally smooth error resilient streaming of 3D wireframe animations
Author(s):
Socrates Varakliotis;
Stephen Hailes;
Jorn Ostermann
Show Abstract
Much research has been undertaken in the area of streaming video across computer networks in general and the Internet in particular, but relatively little has been undertaken in the field of streaming 3-D wireframe animation. Despite superficial similarities, both being visual media, the two are significantly different. Different data passes across the network, so loss affects signal reconstruction differently. Regrettably, the perceptual effects of such loss have been poorly addressed in the context of animation to date, and much of
the existing work in this field has relied on objective measures such as PSNR in lieu of measures that take subjective effects into account.
In this paper, we bring together concepts from a number of fields to address the problem of how to achieve optimal resilience to errors in terms of the perceptual effect at the receiver. To achieve this, we partition the animation stream into a number of layers and apply Reed-Solomon (RS) forward error correction (FEC) codes to each layer independently and in such a way as to maintain the same overall bitrate whilst minimizing the perceptual effects of error, as measured by a distortion metric derived from related work in
the area of static 3-D mesh compression.
Experimental results show the efficacy of our proposed scheme under varying network bandwidth and loss conditions for different layer partitionings. The results indicate that with the proposed Unequal Error Protection (UEP) combined with Error Concealment (EC) and efficient packetization scheme, we can achieve graceful degradation of streamed animations at higher packet loss rates than other approaches that do not cater for the visual importance of the layers and use only objective layering metrics. Our experiments also demonstrate how to tune the packetization parameters in order to achieve efficient layering with respect to the subjective metric of surface smoothness.
Model for unbalanced multiple description video transmission using path diversity
Author(s):
Sila Ekmekci;
Thomas Sikora
Show Abstract
Multiple State Video Coding (MSVC) is a multiple description coding scheme where the video is coded into multiple independently decodable streams, each with its own prediction process and state. The system subject to this work is composed of two subsystems: 1) multiple state encoding/decoding, and 2) a path diversity transmission system. In [1] we discussed how to optimize the rate allocation of such a system while maximizing the average reconstructed frame PSNR at the decoder and minimizing the PSNR variations between the streams, given the total bitrate RT and either balanced (equal) or unbalanced (unequal) loss probabilities p1 and p2 over the two paths. In our current work we establish a theoretical framework to estimate the rate-(decoder) distortion (R-Dd) function, taking into account the MSVC structure, the rate allocation, channel impairments and reconstruction strategies. The video sequence is modeled as an AR(1) source, and the distortion associated with each reconstructed frame in both threads in a lossy transmission environment is estimated recursively depending on the system parameters.
Recovery of incorrectly transmitted DWT-coded images
Author(s):
Yan Niu;
Ashraf Kassim
Show Abstract
This paper presents a post-processing error concealment technique for handling the transmission of DWT image coefficients. Our technique recovers the lost regularity information as well as the irregularity information. First, a harmonic function is employed for overall smoothness recovery. Next, in order to enhance local edges, the spatial coherence between DWT subbands is explored. The amplitudes of the detected singular points are then modified according to the regions they belong to. Our proposed method accurately recovers edges in areas blurred by the smoothness recovery.
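The smoothness-recovery step can be pictured as solving Laplace's equation over the lost region, with the received pixels acting as boundary conditions. A minimal Jacobi-iteration sketch of that step (the paper's subband edge enhancement is omitted):

```python
import numpy as np

def harmonic_fill(image, lost_mask, n_iter=500):
    """Recover a lost region as a harmonic (Laplace) surface: repeatedly
    replace each lost pixel by the average of its four neighbours, while
    the received pixels stay fixed."""
    u = image.astype(float).copy()
    for _ in range(n_iter):
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                      np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u[lost_mask] = avg[lost_mask]     # update only the lost pixels
    return u

img = np.outer(np.linspace(0, 255, 32), np.ones(32))   # smooth ramp
lost = np.zeros(img.shape, dtype=bool)
lost[10:20, 10:20] = True                               # interior loss
damaged = img.copy(); damaged[lost] = 0
print(np.abs(harmonic_fill(damaged, lost) - img)[lost].max())  # small
```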
Flat-scalable video communication for wireless transmission error tolerance
Author(s):
Ryoichi Kawada;
Atsushi Koike;
Masahiro Wada;
Yoshinori Hatori
Show Abstract
Measures for coping with transmission errors in wireless environments are essential for mobile video communication. This paper proposes a “flat-scalable” scheme that has transmission error tolerance and is applicable regardless of the video compression method. The proposed scheme, which uses two sets of encoders and decoders, is a diversity method in the application layer. By differentiating the temporal position of I-pictures between the two encoders, this scheme obtains two decoded pictures whose coding noises differ. When there are no errors on
either stream, it averages the two decoded pictures to reduce noise and improve PSNR. When one of the streams has errors, it outputs the decoded picture without errors. This scheme greatly improves the received picture quality compared to the conventional single-stream scheme, especially when there are transmission errors.
Error-resilient performance evaluation of MPEG-4 and H.264
Author(s):
Bongsoo Jung;
Young Hooi Hwang;
Byeungwoo Jeon;
Myung Don Kim;
Song-In Choi
Show Abstract
Recent advances in video coding technology have resulted in rapid growth of applications in mobile communication. With this explosive growth, reliable transmission and error-resilient techniques become increasingly necessary to offer high-quality multimedia services. This paper discusses the error-resilient performance of the MPEG-4 simple profile over H.324/M and of the H.264 baseline over IP packet networks. The MPEG-4 simple profile has error-resilience tools such as resynchronization marker insertion, data partitioning, and reversible VLC. The H.264 baseline has the flexible macroblock ordering scheme, among others. The objective and subjective quality of decoded video is measured under various random bit and burst error conditions.
Application of web-based visualizations to interactive television: a practical approach
Author(s):
Sepideh Chakaveh;
Olaf Geuer;
Stefan Werning;
Sorina Borggrefe;
Ralf Haeger
Show Abstract
In the near future, interactive television will provide many entertaining and innovative broadcasting formats for TV viewers. Moreover, through recent advancements in web-based visualisation techniques, complex application scenarios can already be realised. With the aid of a demonstrator called "deinewahl02", we demonstrated a challenging concept for importing web-based applications onto more complex platforms such as MHP set-top boxes. "deinewahl02", which simulates a political TV debate for the German general elections of 2002, allows viewers to be guided through the programme in a playful and entertaining way. Functionalities such as "Voting" and "Hotspots" demonstrate the possibilities of interaction within the programme and provide a two-way communication channel that can be established instantly between the viewer and the broadcaster. "deinewahl02" has successfully demonstrated how web-based applications may be implemented quickly and cheaply on much more complex platforms.
Color data visualization for color imaging
Author(s):
Alain Tremeau;
Philippe Colantoni
Show Abstract
There are several ways to display the color data of a color image. In this paper we present different methods that we have developed in order to understand and analyze color image information. These methods use traditional 2D and 3D visualization models associated with specific color transformations. We also introduce a new multidimensional visualization model useful for analysing spatiocolorimetric data.
Media handling for visual information retrieval in VizIR
Author(s):
Horst Eidenberger
Show Abstract
This paper describes how the handling of visual media objects is implemented in the visual information retrieval project VizIR. Essentially, four areas are concerned: media access, media representation in user interfaces, visualisation of media-related data and media transport over the network. The paper offers detailed technical descriptions of the solutions developed in VizIR for these areas. Unified media access for images and video is implemented through class MediaContent. This class contains methods to access the view on a media object at any point in time, as well as methods to change the colour model and to read/write format parameters (size, length, frame rate). Based on this low-level API, the class VisualCube allows random access to spatio-temporal areas of temporal media. Transformer classes allow visual objects to be modified in a very simple but effective way. Visualisation of media objects is implemented in class MediaRenderer. Each MediaRenderer represents one media object and is responsible for every aspect of its visualisation. In the paper, examples of reasonable implementations of MediaRenderer classes are presented. Visualisation of media-related data is strongly connected to MediaRenderer, which is to a large extent responsible for displaying visual panels created by other framework components. Finally, media object transport in VizIR is based on the Realtime Transfer Protocol (for media objects) and XML messaging (for XML data).
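For illustration only, the interface shape described above might look as follows; VizIR itself is not written in Python, and all method bodies here are made-up stand-ins for the classes the abstract names.

```python
import numpy as np

class MediaContent:
    """Unified access to an image or video: a view at any point in time,
    plus format parameters (size, length, frame rate)."""
    def __init__(self, frames, frame_rate):
        self.frames, self.frame_rate = frames, frame_rate   # (T, H, W)

    def view_at(self, t_seconds):
        return self.frames[int(t_seconds * self.frame_rate)]

class VisualCube:
    """Random access to spatio-temporal areas of temporal media, built
    on top of the low-level MediaContent API."""
    def __init__(self, content):
        self.c = content

    def region(self, t0, t1, y0, y1, x0, x1):
        f0 = int(t0 * self.c.frame_rate)
        f1 = int(t1 * self.c.frame_rate)
        return self.c.frames[f0:f1, y0:y1, x0:x1]

video = MediaContent(np.zeros((50, 144, 176)), frame_rate=25)
cube = VisualCube(video)
print(cube.region(0.0, 1.0, 0, 72, 0, 88).shape)    # (25, 72, 88)
```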
An efficient optimized JPEG 2000 tier-1 coder hardware implementation
Author(s):
Paul R Schumacher
Show Abstract
It is a well-known fact that the major bottleneck of a JPEG2000 encoder is the bit/context modeling and arithmetic coding tasks (also known as the tier-1 coding portion of EBCOT). While the technique of using multiple coding passes on multiple bit-planes follows a near-optimal path on the rate-distortion curve and helps create an elegant embedded codestream, this tier-1 coding requires a large amount of computation for each block of data as well as significant memory resources and memory accesses. Fortunately, the JPEG2000 standard allows us to perform a number of the tier-1 coding tasks in parallel. If this parallelism is exploited and smart data organization techniques are used, then the throughput of a JPEG2000 system can be dramatically improved. This paper discusses an efficient, optimized hardware implementation of a tier-1 coder that exploits these available parallelisms. This paper also describes implementation on Xilinx FPGA platforms. The proposed technique is approximately 50% faster than the best technique described in the literature.
Acceleration of MPEG-4 video applications with the reconfigurable HW processor XPP
Author(s):
Claus Ritter;
Eberhard Schueler;
Johannes Quast;
Klaus D. Mueller-Glaser
Show Abstract
The next generation of mobile phones needs high computational power to fulfil its primary tasks: multimedia applications and services. To achieve this goal, powerful processors with high clock frequencies are used. Although processing power capabilities have increased, the capabilities of the electrical power supply have not. The result is powerful mobile devices with insufficient batteries. The formula "higher frequency equals higher computational power" is still valid, but at the price of high power consumption. One solution is the use of specialized and therefore more compact hardware, such as ASICs, DSPs, etc. On the other hand, this greatly reduces the flexibility of the device and limits its application areas.
New technology approaches have to be found to resolve this dilemma. This paper describes an ongoing study of an SoC design in which the reconfigurable coprocessor XPP is embedded with a standard mobile phone processor. The target application for this system is a low-cost/low-power environment running an MPEG-4 encoder/decoder (Visual Profile: Simple@L1). The whole MPEG-4 encoding/decoding process is partitioned between the standard processor, which controls the system and executes control-intensive algorithms, and its XPP coprocessor, which executes the computation-intensive data-flow algorithms and sends the results back to the host processor or a shared memory bank.
A platform for distributed image processing and image retrieval
Author(s):
Mark Oliver Gueld;
Christian J. Thies;
Benedikt Fischer;
Daniel Keysers;
Berthold B. Wein;
Thomas M. Lehmann
Show Abstract
We describe a platform for the implementation of a system for content-based image retrieval in medical applications (IRMA). To cope with the constantly evolving medical knowledge, the platform offers a flexible feature model to store and uniformly access all feature types required within a multi-step retrieval approach. A structured generation history for each feature allows the automatic identification and re-use of already computed features. The platform uses directed acyclic graphs composed of processing steps and
control elements to model arbitrary retrieval algorithms. This visually intuitive, data-flow oriented representation vastly improves the interdisciplinary communication between computer scientists and physicians during the development of new retrieval algorithms.
The execution of the graphs is fully automated within the platform.
Each processing step is modeled as a feature transformation. Due to a high degree of system transparency, both the implementation and the
evaluation of retrieval algorithms are accelerated significantly.
The platform uses a client-server architecture consisting of a central database, a central job scheduler, instances of a daemon
service, and clients which embed user-implemented feature transformations. Automatically distributed batch processing and distributed feature storage enable the cost-efficient use of an existing workstation cluster.
A pipeline memory-efficient programmable architecture for the 2D discrete wavelet transform using lifting scheme
Author(s):
Sara Bolouki;
Omid Fatemi
Show Abstract
In this paper we propose a dedicated architecture to implement the 2-D Discrete Wavelet Transform (DWT) using the lifting scheme. The advantages of the lifting scheme are lower computational complexity, signal transformation without extension, and reduced memory requirements.
The proposed architecture is re-configurable for the 5/3 and 9/7 filters and employs a folded configuration to reduce the hardware cost and achieve higher hardware utilization. The architecture is suitable for VLSI implementation and various image and video applications. The design has been modeled in VHDL, simulated with ModelSim, and is fully synthesizable.
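For reference, the reversible 5/3 lifting steps that such an architecture implements can be sketched in a few lines (periodic boundary handling is used here for brevity; JPEG 2000 specifies symmetric extension):

```python
import numpy as np

def lift53_forward(x):
    """One level of the reversible 5/3 lifting transform. Predict: odd
    samples become the detail signal; update: even samples become the
    low-pass approximation. Integer arithmetic throughout."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    d = odd - ((even + np.roll(even, -1)) >> 1)          # predict
    s = even + ((np.roll(d, 1) + d + 2) >> 2)            # update
    return s, d

def lift53_inverse(s, d):
    """Exact mirror of the forward steps: undo update, then undo predict."""
    even = s - ((np.roll(d, 1) + d + 2) >> 2)
    odd = d + ((even + np.roll(even, -1)) >> 1)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

x = np.random.default_rng(5).integers(0, 256, 64)
s, d = lift53_forward(x)
assert np.array_equal(lift53_inverse(s, d), x)           # perfect reconstruction
```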
Pose estimation and 3D modeling from video by volume feedback
Author(s):
Alireza Nasiri Avanaki;
Babak Hamidzadeh;
Faouzi Kossentini
Show Abstract
Volume reconstruction and pose retrieval of an arbitrary rigid object
from monocular video sequences is addressed. Initially, the object
pose is estimated in each image by locating similar textures, assuming
a flat depth map. Then shape-from-silhouette is used to make a volume
(3-D model). This volume is used in a new round of pose estimations,
this time by a model-based method that gives better estimates. Before
repeating this process by building a new volume, pose estimates are
adjusted to reduce error by maximizing a novel quality measure for
shape-from-silhouette volume reconstruction. The feedback loop is
terminated when pose estimates do not change much, as compared to
those produced by the previous iteration. Based on the theoretical
study of the proposed system, a test of convergence to a given set of
poses is devised. Reliable performance of the system is also proved by
several experiments. No model is assumed for the object. Feature
points are neither detected nor tracked, so the problematic feature
matching and correspondence steps are avoided. Our method can also be applied to
3-D object tracking in video.
Fly-through viewpoint video system for multiview soccer movie using viewpoint interpolation
Author(s):
Naho Inamoto;
Hideo Saito
Show Abstract
This paper presents a novel method for virtual view generation that allows viewers to fly through a real soccer scene. A soccer match is captured by multiple cameras at a stadium, and images of arbitrary viewpoints are synthesized by view interpolation of two real camera images near the given viewpoint. In the proposed method, the cameras do not need to be strongly calibrated; the epipolar geometry between the cameras is sufficient for the view interpolation. The method can therefore easily be applied to a dynamic event even in a large space, because the effort for camera calibration is reduced. A soccer scene is classified into several regions, and virtual view images are generated based on the epipolar geometry in each region. Superimposition of the images completes virtual views for the whole soccer scene. The view-synthesis algorithm and experimental results are presented, along with an application for fly-through observation of a soccer match.
Real-time free-viewpoint video rendering from volumetric geometry
Author(s):
Bastian Goldluecke;
Marcus Magnor
Show Abstract
The aim of this work is to render high-quality views of a dynamic scene from novel viewpoints in real-time. An online system available at our institute computes the visual hull as a geometry proxy to guide the rendering at interactive rates. Because only a sparse set of cameras distributed around the scene is used to record it, only a coarse model of the scene geometry can be recovered.
To alleviate this problem, we render textured billboards defined by the voxel model of the visual hull, preserving details in the source images while achieving excellent performance. By exploiting multi-texturing capabilities of modern graphics hardware, real-time frame rates are attained. Our algorithm can be used as part of an inexpensive system to display 3D-videos, or ultimately even in live 3D-television. The user is able to watch the scene from an arbitrary viewpoint chosen interactively.
Light field rendering with omnidirectional camera
Author(s):
Hiroshi Todoroki;
Hideo Saito
Show Abstract
This paper presents an approach to capturing the visual appearance of a real environment, such as the interior of a room. We propose a method for generating arbitrary viewpoint images by building a light field with an omni-directional camera, which can capture a wide field of view. The omni-directional camera used in this technique is a special camera with a hyperbolic mirror in its upper part, so that it captures the luminosity of the environment over the full 360 degrees of the surroundings in a single image. We apply the light field method, a technique of image-based rendering (IBR), for generating the arbitrary viewpoint images. The light field is a kind of database that records the luminosity information in the object space. We employ the omni-directional camera for constructing the light field, so that we can collect images of many view directions in the light field. Thus our method allows the user to explore a wide scene, achieving a realistic representation of the virtual environment. To demonstrate the proposed method, we
captured an image sequence of our lab's interior environment with an omni-directional camera, and successfully generated arbitrary viewpoint images for a virtual tour of the environment.
Fully scalable 3D overcomplete wavelet video coding using adaptive motion-compensated temporal filtering
Author(s):
Jong Chul Ye;
Mihaela van der Schaar
Show Abstract
In this paper, we present a fully scalable 3-D overcomplete wavelet video coder that employs a new and highly efficient 3-D lifting structure for adaptive motion compensated temporal filtering (MCTF). Unlike conventional interframe wavelet video techniques, which apply MCTF to the spatial-domain video data and then encode the resulting temporally filtered frames using critically sampled wavelet transforms, the scheme proposed in this paper first performs the spatial-domain wavelet transform and subsequently applies MCTF to each wavelet band. To overcome the inefficiency of motion estimation in the wavelet domain, the low band shifting method (LBS) is used at both the encoder and decoder to generate an overcomplete representation of the temporal reference frames. A novel interleaving algorithm for the overcomplete wavelet coefficients is proposed that enables optimal sub-pixel accuracy motion estimation implementations. Furthermore, to achieve arbitrary-accuracy motion estimation and compensation in the overcomplete wavelet domain with perfect reconstruction, a novel 3-D lifting structure is also introduced. Simulation results show that the proposed fully scalable 3-D overcomplete wavelet video coder has comparable or better performance (up to 0.5 dB) than previously proposed interframe wavelet coders under the same coding conditions. Several techniques that can further improve the performance of the proposed overcomplete wavelet coding scheme are also discussed.
Compression of 3D integral images using wavelet decomposition
Author(s):
Meriem Mazri;
Amar Aggoun
Show Abstract
This paper presents a wavelet-based lossy compression technique for unidirectional 3D integral images (UII). The method requires the extraction of different viewpoint images from the integral image. A single viewpoint image is constructed by extracting one pixel from each microlens, then each viewpoint image is decomposed using a Two Dimensional Discrete Wavelet Transform (2D-DWT). The resulting array of coefficients contains several frequency bands. The lower frequency bands of the viewpoint images are assembled and compressed using a 3 Dimensional Discrete Cosine Transform (3D-DCT) followed by Huffman coding. This will achieve decorrelation within and between 2D low frequency bands from the different viewpoint images. The remaining higher frequency bands are Arithmetic coded. After decoding and decompression of the viewpoint images using an inverse 3D-DCT and an inverse 2D-DWT, each pixel from every reconstructed viewpoint image is put back into its original position within the microlens to reconstruct the whole 3D integral image.
Simulations were performed on a set of four different grey-level 3D UII using a uniform scalar quantizer with deadzone. The results for the average of the four UII intensity distributions are presented and compared with a previously reported 3D-DCT scheme. It was found that the algorithm achieves better rate-distortion performance, with respect to compression ratio and image quality, at very low bit rates.
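The viewpoint-extraction step, one pixel per microlens, reduces to strided slicing; a minimal sketch for a 1-D lens pitch (the pitch value is an illustrative assumption):

```python
import numpy as np

def extract_viewpoints(uii, lens_width):
    """Split a unidirectional integral image into viewpoint images by
    taking one pixel per microlens: viewpoint k collects the k-th column
    under every lens. Assumes the image width is a multiple of the 1-D
    microlens pitch `lens_width`."""
    return [uii[:, k::lens_width] for k in range(lens_width)]

def reassemble(viewpoints):
    """Put every pixel back into its original position under the lens."""
    h, n = viewpoints[0].shape
    out = np.empty((h, n * len(viewpoints)), dtype=viewpoints[0].dtype)
    for k, v in enumerate(viewpoints):
        out[:, k::len(viewpoints)] = v
    return out

uii = np.random.default_rng(6).integers(0, 256, (32, 64))
views = extract_viewpoints(uii, lens_width=8)       # eight viewpoint images
assert np.array_equal(reassemble(views), uii)
```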
Compression of LADAR imagery
Author(s):
Joseph C. Dagher;
Michael W. Marcellin;
Mark A. Neifeld
Show Abstract
We develop novel methods for compressing volumetric imagery that has been generated by single-platform (mobile) range sensors. We exploit the correlation structure inherent in multiple views in order to improve compression efficiency. We evaluate the performance of various two-dimensional (2D) compression schemes on the traditional 2D range representation. We then introduce a three-dimensional (3D) representation of the range measurements and show that, for lossless compression, 3D volumes compress about 60% more efficiently than 2D images.
Scan order and quantization for 3D-DCT coding
Author(s):
Nikola Bozinovic;
Janusz Konrad
Show Abstract
Two types of coders dominate the field of video compression research today: well-established hybrid coders, which are at the core of all MPEG and H.26x standards, and emerging three-dimensional (3D) subband coders, largely inspired by the success of wavelet-based still image compression. However, there are surprisingly few results reported on 3D Discrete Cosine Transform (DCT) based transform coders. Even while exploiting all the beneficial properties of the DCT itself (forward/inverse symmetry, fast separable implementation, and excellent energy compaction), these coders under-perform compared to competing hybrid coders, primarily due to
the inefficient quantization, scanning and entropy coding used. In this paper, we study means of improving 3D-DCT coding by proposing an adaptive scanning order and quantization of coefficients that are better matched to the 3D-DCT spectrum of a motion sequence. Our results show significant improvement in performance over previously
reported techniques.
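For orientation, a separable 3-D DCT and one simple fixed scan order, visiting coefficients by increasing frequency-index sum, can be sketched as below; this fixed scan is a baseline stand-in, not the paper's adaptive scan.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct3(block):
    """Separable 3-D DCT of a spatio-temporal block (t, y, x)."""
    return dctn(block, type=2, norm="ortho")

def low_to_high_scan(shape):
    """A fixed scan generalizing the 2-D zig-zag to 3-D: visit
    coefficients in order of increasing frequency-index sum."""
    idx = np.indices(shape).reshape(3, -1).T
    return sorted(map(tuple, idx), key=lambda t: (sum(t), t))

block = np.random.default_rng(7).standard_normal((8, 8, 8))
coeffs = dct3(block)
scan = low_to_high_scan(coeffs.shape)
stream = [coeffs[i] for i in scan]                 # 1-D coefficient stream
assert np.allclose(idctn(coeffs, type=2, norm="ortho"), block)
```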
Three-dimensional mesh simplification using normal variation error metric and modified subdivided edge classification
Author(s):
Eun-Young Chang;
Chung-Hyun Ahn;
Yo-Sung Ho
Show Abstract
In order to transmit or store three-dimensional (3-D) mesh models efficiently, we need to simplify them. Although the quadric error metric (QEM) provides fast and accurate geometric simplification of 3-D mesh models, it cannot capture discontinuities faithfully. Recently, an enhanced QEM based on subdivided edge classification has been proposed to handle this problem. Although it can capture discontinuities well, it has slight degradation in the reconstruction quality. In this paper, we propose a novel mesh simplification algorithm where we employ a normal variation error metric, instead of QEM, to resolve the quality degradation issue. We also modify the subdivided edge classification algorithm to be cooperative with the normal variation error metric while preserving discontinuities. We have tested the proposed algorithm with various 3-D VRML models. Simulation results demonstrate that the proposed algorithm provides good approximations while maintaining discontinuities well.
Model-based video coding at very low bit rates
Author(s):
XianPing Fu;
Zhao Wang
Show Abstract
Very low bit rate coding is an important part of the MPEG-4 standard. To obtain maximum compression in videoconference sequences, a technique called model-based coding can be used. With this technique, the face of the speaker is represented by a model, and its movement between frames is coded with a small set of parameters. As with
all MPEG standards, the encoder algorithm is not specified, and this leaves the door open to many possible implementations. In this paper, a summary of model-based coding is presented. Many fields have been studied, including 3D wire-frame models, face detection, facial feature extraction, motion parameter estimation, and face model
parameter coding. Several problems need to be solved before model-based video communication becomes practical. Some of them are discussed in this paper, such as how to detect and locate a face in an image, how to extract parameters from an image to adapt a face model, and how to compress the animation parameters.
Semantic transcoding of video based on regions of interest
Author(s):
Jeongyeon Lim;
Munchurl Kim;
Jong-Nam Kim;
Kyeongsoo Kim
Show Abstract
Traditional transcoding of multimedia has been performed from the perspective of user terminal capabilities, such as display size and decoding processing power, and of network resources, such as available network bandwidth and quality of service (QoS). The adaptation (or transcoding) of multimedia content to such constraints has been performed by frame dropping and resizing of audiovisual content, as well as by reduction of SNR (Signal-to-Noise Ratio) values to save bitrate. In addition to such traditional transcoding from the perspective of the user's environment, we incorporate a method of semantic transcoding of audiovisual content based on regions of interest (ROI) from the user's perspective. Users can designate the parts of images or video they are interested in, so that the corresponding video content can be adapted with a focus on the user's ROI. We incorporate the MPEG-21 DIA (Digital Item Adaptation) framework, in which such semantic information about the user's ROI is represented and delivered to the content provider side as XDI (context digital item). Our representation schema for the semantic information of the user's ROI has been adopted in the MPEG-21 DIA Adaptation Model. In this paper, we present the usage of semantic information of the user's ROI for transcoding and show our system implementation with experimental results.
Optimized sign language video coding based on eye-tracking analysis
Author(s):
Dimitris Agrafiotis;
C. Nishan Canagarajah;
David R. Bull;
Matt Dye;
Helen Twyford;
Jim Kyle;
James Chung How
Show Abstract
The imminent arrival of mobile video telephony will enable deaf people to communicate, as hearing people have been able to do for some time now, anytime and anywhere in their own language, sign language. At low bit rates, coding of sign language sequences is very challenging due to the high level of motion and the need to maintain good image quality to aid understanding. This paper presents optimised coding of sign language video at low bit rates in a way that favours comprehension of the compressed material by deaf users. Our coding suggestions are based on an eye-tracking study that we have conducted, which allows us to analyse the visual attention of sign language viewers. The results of this study are included in this paper. Analysis and results for two coding methods, one using MPEG-4 video objects and the other using foveation filtering, are presented. Results with foveation filtering are very promising, offering a considerable decrease in bit rate in a way that is compatible with the visual attention patterns of deaf people, as recorded in the eye-tracking study.
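A crude sketch of foveation filtering is given below: sharpness is preserved near an assumed fixation point (for sign language, typically the signer's face) and blended toward a blurred copy with increasing eccentricity. The linear blur schedule and the fixation coordinates are made-up stand-ins for the perceptually derived ones a real foveation filter would use.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(frame, fix_yx, sigma_max=6.0, radius=40.0):
    """Blend a frame toward a blurred copy with distance from the
    fixation point. A true foveation filter would use a bank of
    progressively blurred copies; two levels suffice as a sketch."""
    h, w = frame.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = np.hypot(yy - fix_yx[0], xx - fix_yx[1])
    weight = np.clip(ecc / radius, 0.0, 1.0)      # 0 at fovea, 1 far away
    blurred = gaussian_filter(frame.astype(float), sigma_max)
    return (1.0 - weight) * frame + weight * blurred

frame = np.random.default_rng(8).uniform(0, 255, (144, 176))  # QCIF luma
out = foveate(frame, fix_yx=(40, 88))             # fixate near the face
print(out.shape)
```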
Iterative image coding with overcomplete complex wavelet transforms
Author(s):
Nick G. Kingsbury;
Tanya Reeves
Show Abstract
Overcomplete transforms, such as the Dual-Tree Complex Wavelet
Transform, can offer more flexible signal representations than
critically-sampled transforms such as the Discrete Wavelet
Transform. However the process of selecting the optimal set of
coefficients to code is much more difficult because many different
sets of transform coefficients can represent the same decoded
image. We show that large numbers of transform coefficients can
be set to zero without much reconstruction quality loss by forcing
compensatory changes in the remaining coefficients. We develop a
system for achieving these coding aims of coefficient elimination
and compensation, based on iterative projection of signals between
the image domain and transform domain with a non-linear process
(e.g. centre-clipping or quantization) applied in the transform
domain. The convergence properties of such non-linear feedback
loops are discussed and several types of non-linearity are
proposed and analyzed. The compression performance of the
overcomplete scheme is compared with that of the standard Discrete
Wavelet Transform, both objectively and subjectively, and is found
to offer advantages of up to 0.65 dB in PSNR and significant
reduction in visibility of some types of coding artifacts.
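A minimal sketch of the iterative projection idea follows, with PyWavelets' undecimated (stationary) wavelet transform standing in for the Dual-Tree Complex Wavelet Transform and centre-clipping as the non-linearity; wavelet, level, threshold and iteration count are illustrative assumptions.

```python
import numpy as np
import pywt

def iterative_centre_clip(img, wavelet="db2", level=2, thresh=20.0, n_iter=10):
    """Iteratively project between the image domain and an overcomplete
    transform domain, centre-clipping small detail coefficients to zero
    each pass so the surviving coefficients absorb compensatory changes."""
    x = img.astype(float)
    for _ in range(n_iter):
        coeffs = pywt.swt2(x, wavelet, level=level)
        clipped = [
            (cA, tuple(np.where(np.abs(c) < thresh, 0.0, c) for c in details))
            for cA, details in coeffs
        ]
        x = pywt.iswt2(clipped, wavelet)
    return x

img = np.random.default_rng(9).uniform(0, 255, (64, 64))
rec = iterative_centre_clip(img)
print(np.sqrt(np.mean((rec - img) ** 2)))   # residual error after clipping
```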
Multiscale geometric image processing
Author(s):
Justin K. Romberg;
Michael B. Wakin;
Richard G. Baraniuk
Show Abstract
Since their introduction a little more than 10 years ago, wavelets
have revolutionized image processing. Wavelet based
algorithms define the state-of-the-art for applications
including image coding (JPEG2000), restoration, and segmentation.
Despite their success, wavelets have significant shortcomings in their
treatment of edges. Wavelets do not parsimoniously capture even the
simplest geometrical structure in images, and wavelet based processing
algorithms often produce images with ringing around the edges.
As a first step towards accounting for this structure, we will show
how to explicitly capture the geometric regularity of contours in
cartoon images using the wedgelet representation and a multiscale geometry model. The wedgelet representation builds up an image out of simple piecewise constant functions with linear discontinuities. We will show how the geometry model, by putting a joint distribution on the orientations of the linear discontinuities, allows us to weigh several factors when choosing the wedgelet representation: the error between the representation and the original image, the parsimony of the representation, and whether the wedgelets in the representation form "natural" geometrical structures. Finally, we will analyze a simple wedgelet coder based on these principles, and show that it has optimal asymptotic performance for simple cartoon images.
Geometrical image compression with bandelets
Author(s):
Erwan Le Pennec;
Stephane G. Mallat
Show Abstract
This paper introduces a new class of bases, called bandelet bases,
which decompose the image along vectors that are elongated in the
direction of a geometric flow. This geometric flow indicates the
direction in which the image grey levels have regular variations.
The image decomposition in a bandelet basis is implemented with
a fast subband filtering algorithm. Bandelet bases lead to optimal approximation rates for geometrically regular images. For image compression, the bandelet basis geometry is optimized with a fast best-basis algorithm, and comparisons are made with wavelet bases.
Discrete directional wavelet bases for image compression
Author(s):
Pier Luigi Dragotti;
Vladan Velisavljevic;
Martin Vetterli;
Baltasar Beferull-Lozano
Show Abstract
The application of the wavelet transform in image processing
is most frequently based on a separable construction. Lines and columns in an image are treated independently and the basis functions are simply products of the corresponding one dimensional functions.
Such a method keeps design and computation simple, but cannot properly capture all the properties of an image. In this paper, a new truly separable discrete multi-directional transform is proposed with a subsampling method based on lattice theory. Alternatively, the subsampling can be omitted, which leads to a multi-directional frame. This transform can be applied in many areas, such as denoising, non-linear approximation and compression. The results on non-linear approximation and denoising show very interesting gains compared to the standard two-dimensional analysis.
Overview of multimodal techniques for the characterization of sport programs
Author(s):
Nicola Adami;
Riccardo Leonardi;
Pierangelo Migliorati
Show Abstract
The problem of content characterization of sports videos is of
great interest because sports video appeals to large audiences and
its efficient distribution over various networks should contribute
to widespread usage of multimedia services. In this paper we
analyze several techniques proposed in literature for content
characterization of sports videos. We focus this analysis on the
typology of the signal (audio, video, text captions, ...) from
which the low-level features are extracted. First we consider the
techniques based on visual information, then the methods based on
audio information, and finally the algorithms based on
audio-visual cues, used in a multi-modal fashion. This analysis
shows that each type of signal carries some peculiar information,
and the multi-modal approach can fully exploit the multimedia
information associated to the sports video. Moreover, we observe
that the characterization is performed either considering what
happens in a specific time segment, observing therefore the
features in a "static" way, or trying to capture their "dynamic"
evolution in time. The effectiveness of each approach depends
mainly on the kind of sports it relates to, and the type of
highlights we are focusing on.
Semantic annotation for live and posterity logging of video documents
Author(s):
Marco Bertini;
Alberto Del Bimbo;
W. Nunziati
Show Abstract
Broadcasters usually envision two basic applications for
video databases: Live Logging and Posterity Logging. The former
aims at providing effective annotation of video in quasi-real time
and supports extraction of meaningful clips from the live stream;
it is usually performed by assistant producers working at the same
location of the event. The latter provides annotation for later
reuse of video material and is the prerequisite for retrieval by
content from video digital libraries; it is performed by trained
librarians. Both require that annotation is performed, to a great
extent, automatically. Video information structure must encompass
both low-intermediate level video organization and event
relationships that define specific highlights and situations.
Analysis of the visual data of the video stream permits the
extraction of hints, the identification of events and the detection
of highlights. All of this must be supported by a-priori knowledge
of the video domain and by effective reasoning engines capable of
capturing the inherent semantics of the visual events.
Application of MPEG-7 descriptors for content-based indexing of sports videos
Author(s):
Michael Hoeynck;
Thorsten Auweiler;
Jens-Rainer Ohm
Show Abstract
The amount of multimedia data available worldwide is increasing every day. There is a vital need to annotate multimedia data in order to allow universal content access and to provide content-based search-and-retrieval functionalities. Since supervised video annotation can be time-consuming, an automatic solution is desirable. We review recent approaches to content-based indexing and annotation of videos for different kinds of sports, and present our application for the automatic annotation of equestrian sports videos. In particular, we concentrate on MPEG-7 based feature extraction and content description. We apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information and taking specific domain knowledge into account. Having determined single shot positions as well as the visual highlights, the information is stored jointly with additional textual information in an MPEG-7 description scheme. Using this information, we generate content summaries which can be utilized in a user front-end to provide content-based access to the video stream, as well as further content-based queries and navigation on a video-on-demand streaming server.
Object-based approach to image-based rendering with linear filters using defocus information
Author(s):
Akira Kubota;
Kiyoharu Aizawa
Show Abstract
In this paper, we present a novel approach to image-based rendering (IBR) for generating an arbitrary view image, with arbitrary focus, of a scene with two approximately constant depths. The presented method differs from conventional IBR using multiple view images in that we acquire two differently focused images at each camera position
and render parallax and focus effects on each object simply by linear filtering of the acquired images, without segmentation.
Experimental results on real images acquired with four cameras arranged in parallel are presented.
Acoustic-based rendering by interpolation of the plenacoustic function
Author(s):
Thibaut Ajdler;
Martin Vetterli
Show Abstract
In the present paper, we study the spatialization of the sound
field in a room, in particular the evolution of room impulse
responses as function of their spatial positions. The presented
technique allows us to completely characterize the sound field in
any arbitrary location if the sound field is known in a certain
finite number of positions. Our technique simply starts from the
measurements of impulse responses in a finite number of positions
and with this information the total sound field can be recreated.
An analytical solution of the problem is given for any rectangular
room. Further, we determine the number and the spacing between the
microphones needed to perfectly reconstruct the sound field up to
a certain temporal frequency.
Nonuniform sampling of image-based rendering data with the position-interval-error (PIE) function
Author(s):
Cha Zhang;
Tsuhan Chen
Show Abstract
In this paper we study the non-uniform sampling problem for image-based rendering (IBR). We first propose a general position-interval error (PIE) function that can help solve practical sampling problems. We then propose two approaches to capturing IBR data non-uniformly based on PIE, namely progressive capturing (PCAP) and rearranged capturing (RCAP). PCAP is applicable for static scenes. Based on the images captured so far, PCAP determines where to take the next image. RCAP, on the other hand, is for capturing both static and dynamic scenes. The goal is to intelligently arrange the positions of a limited number of cameras such that the final rendering quality is optimal. Experimental results demonstrate that the non-uniform sampling approaches outperform the traditional uniform methods.
Inter-frame image enhancement for motion JPEG 2000
Author(s):
Wei Liu;
Wentao Zheng;
Yali Guo
Show Abstract
Video transmission over the Internet or wireless channels usually suffers from network congestion. However, for transmission of video coded by Motion JPEG 2000 (MJP2), the effect of network congestion can be greatly mitigated by exploiting progressive transmission. When network congestion occurs, data packets representing detailed information are dropped selectively, and this leads to degradation of image quality in the corresponding frames. In this paper, we present an algorithm to enhance the image quality of the degraded frames by utilizing inter-frame correlation. For a degraded frame, its detailed information is recovered from the previously decoded frame. Simulation results show that both the objective and visual quality of these frames are greatly improved.
Selective enhancement of contrast blocks for MPEG/JPEG image compression
Author(s):
Konstantin Y Kupeev;
Zohar Sivan
Show Abstract
In our work, we introduce the block significance measure reflecting
the property of the blocks to be shared by pixels from different image
regions. The blocks B with higher significance value C(B) belong to
the fields of region contrast and thus tend to attract more attention
in terms of the Human Visual System (HVS). The computation of C(B) is
based on the execution of a region merging procedure and determination
of the first partitioning containing a region completely covering
a block B. The introduced measure is related to the flat/texture/edge
block classification but reflects global image properties and differs
from the latter. We study how one may incorporate the measure in the
DCT-based image/video compression. Experimental results are
presented.
Artifact reduction for MPEG-2 encoded video using a unified metric for digital video processing
Author(s):
Lilla Boroczky;
Yibin Yang
Show Abstract
In this paper we propose a new deringing algorithm for MPEG-2 encoded video. It is based on a Unified Metric for Digital Video Processing (UMDVP) and therefore directly linked to the coding characteristics of the decoded video. Experiments carried out on various video sequences have shown noticeable improvement in picture quality and the proposed algorithm outperforms the deringing algorithm described in the MPEG-4 video standard. Coding artifacts, particularly ringing artifacts, are especially annoying on large high-resolution displays. To prevent the enlargement and enhancement of the ringing artifacts, we have applied the proposed deringing algorithm prior to resolution enhancement. Experiments have shown that in this configuration, the new deringing algorithm has significant positive impact on picture quality.
Superresolution images reconstructed from aliased images
Author(s):
Patrick Vandewalle;
Sabine E. Susstrunk;
Martin Vetterli
Show Abstract
In this paper, we present a simple method to almost quadruple the spatial resolution of aliased images. From a set of four low resolution, undersampled and shifted images, a new image is constructed with almost twice the resolution in each dimension. The resulting image is aliasing-free. A small aliasing-free part of the frequency domain of the images is used to compute the exact subpixel shifts. When the relative image positions are known, a higher resolution image can be constructed using the Papoulis-Gerchberg algorithm. The proposed method is tested in a simulation where all simulation parameters are well controlled, and where the resulting image can be compared with its original. The algorithm is also applied to real, noisy images from a digital camera. Both
experiments show very good results.
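The Papoulis-Gerchberg step can be illustrated in one dimension: alternately re-impose the known (registered, subsampled) samples and project onto the assumed bandlimit. Below is a minimal numpy sketch with illustrative names; the paper itself operates on 2-D images with subpixel shifts.

```python
import numpy as np

def papoulis_gerchberg(known_idx, known_val, n, band, iters=200):
    """Fill in a length-n bandlimited signal from samples known at the
    given high-resolution grid indices (1-D sketch of the algorithm).

    band : number of low-frequency DFT bins (per side) assumed nonzero.
    """
    x = np.zeros(n)
    x[known_idx] = known_val
    for _ in range(iters):
        X = np.fft.fft(x)
        X[band:n - band] = 0.0          # project onto the bandlimited set
        x = np.real(np.fft.ifft(X))
        x[known_idx] = known_val        # re-impose the known samples
    return x
```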
Color image sharpening based on collective time-evolution of simultaneous nonlinear reaction-diffusion
Author(s):
Takahiro Saito;
Jun Satsumabayashi;
Hiroyuki Harada;
Takashi Komatsu
Show Abstract
Previously we presented a method for selective sharpness enhancement of monochrome images. Our method is based on simultaneous nonlinear reaction-diffusion time-evolution equipped with a nonlinear diffusion term, a reaction term and an overshooting term, and it can sharpen only degraded edges blurred by several causes without increasing the visibility of nuisance factors such as random noise. This paper extends our method to selective sharpening of color images. To extend it, we consider several variations in the treatment of the three color components and in the selection of the color space. By experiments, we quantitatively evaluate the performance of these variations. Among them, the collective treatment of color components based on simultaneous full-nonlinear reaction-diffusion time-evolution achieves the best performance, and sharpens blurred color edges selectively much better than existing sharpness enhancement methods such as the adaptive peaking method.
Scalable video compression using longer motion-compensated temporal filters
Author(s):
Abhijeet V Golwelkar;
John W Woods
Show Abstract
Three-dimensional (3-D) subband/wavelet coding using a motion compensated temporal filter (MCTF) is emerging as a very effective structure for highly scalable video coding. Most previous work has used two-tap Haar filters for the temporal analysis/synthesis. To make better use of the temporal redundancies, we are proposing an MCTF scheme based on longer biorthogonal filters. We show a lifting based coder capable of subpixel accurate motion compensation.
If we retain the fixed size GOP structure of the Haar filter MCTFs, we need to use symmetric extensions at both ends of the GOP. This gives rise to a loss of coding efficiency at the GOP boundaries, resulting in significant PSNR drops there. This performance can be considerably improved by using a 'sliding window' in place of the GOP block. We employ the 5/3 filter; its non-orthogonality causes PSNR variation, which can be reduced by employing filter-based weighting coefficients.
Overall the longer filters have a higher coding gain than the Haar filters and show significant improvement in average PSNR at high bit rates. However, a doubling in the number of motion vectors to be transmitted translates to a drop in PSNR at lower video bit rates.
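For orientation, here is a rough numpy sketch of one temporal 5/3 lifting level, with motion compensation, the sliding window and the weighting coefficients all omitted; names and boundary handling are our simplifications, not the authors' coder.

```python
import numpy as np

def temporal_53_level(frames):
    """One temporal decomposition level with the 5/3 lifting filter.

    frames : (2K, H, W) array; returns (lowpass, highpass) temporal bands.
    Symmetric extension is applied at the ends, which is exactly where
    the boundary losses discussed above arise in a fixed-GOP setting.
    """
    even = frames[0::2].astype(float)
    odd = frames[1::2].astype(float)
    # Predict: each odd frame from the average of its even neighbours.
    even_right = np.concatenate([even[1:], even[-1:]])
    high = odd - 0.5 * (even + even_right)
    # Update: lift each even frame with the neighbouring highpass frames.
    high_left = np.concatenate([high[:1], high[:-1]])
    low = even + 0.25 * (high_left + high)
    return low, high
```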
Zerotree image compression using anisotropic wavelet packet transform
Author(s):
Rade Kutil
Show Abstract
The anisotropic wavelet packet transform is an extension of the conventional wavelet (packet) transform where the basis can have different scales in different dimensions. As there are certain kinds of images with different behaviour in horizontal and vertical direction, anisotropic wavelet packet bases can be adapted more precisely to these images. Zero-tree image compression has already proved its efficiency on conventional wavelet transformed data as well as for wavelet packets. In this work, zero-tree methods are extended to work with anisotropic wavelet packets and coding results are shown for several types of images.
New directions in video coding
Author(s):
Xin Li
Show Abstract
Video coding still remains a largely open problem after decades of
research. In this paper, we present some new ideas of video coding
from a geometric perspective. We focus on developing an improved
modeling of video source by studying its geometric constraints. It
is suggested that understanding the relationship between
location uncertainty models of motion discontinuity and image
singularity is critical to the efficiency of video coding. We
argue that linearity is the root of the problem in conventional motion
compensated predictive coding and propose a classification-based
nonlinear coding framework in both spatial and wavelet domains.
Nonlinear processing tools are advocated for resolving
location-related uncertainty during the exploitation of geometric
constraints.
Motion-compensated predictive subband coding of temporal lowpass frames from a 3D wavelet video coding scheme
Author(s):
Claudia Mayer
Show Abstract
In 3D wavelet video coding schemes, in which a temporal wavelet decomposition of the video data is combined with a spatial wavelet transform, temporal scalability and the reduction of temporal redundancy is often achieved at the expense of a delay. The delay increases according to the number of video frames that are jointly coded or, in other terms, according to the temporal wavelet transform depth. Depending on the system delay that is allowed by a specific application, the maximum temporal transform depth might be limited. On the other hand, consecutive temporal lowpass frames at the highest permitted temporal decomposition level might still be strongly correlated, especially in case of video material with static background or low motion that can be optimally compensated. In this case, the temporal correlation should be exploited to improve the coding efficiency without inducing an additional delay to the overall system. In this paper, we consider a 3D wavelet video coding scheme in which the temporal wavelet decomposition precedes the spatial wavelet decomposition, and investigate the application of a spatially scalable wavelet video coder with in-band prediction to the temporal lowpass frames at the maximum temporal transform depth.
Perceptually adaptive hybrid video encoding based on just-noticeable-distortion profile
Author(s):
Xiaokang Yang;
Weisi Lin;
Zhongkang Lu;
E. P. Ong;
Susu Yao
Show Abstract
In this paper, a just-noticeable-distortion (JND) profile based upon the human visual system (HVS) is exploited to guide the motion search and to introduce an adaptive filter for the residue error after motion compensation, in hybrid video coding (e.g., H.26x and MPEG-x). Because of the importance of accurate JND estimation, a new spatial-domain JND estimator (the nonlinear additivity model for masking, NAMM for short) is first proposed. The obtained JND profile is then utilized to determine the extent of the motion search and whether a residue error after motion compensation needs to be cosine-transformed. Both theoretical analysis and experimental data indicate significant improvements in motion search speedup, perceptual visual quality, and, most remarkably, objective quality (i.e., PSNR).
Image and animation cell coding using wedgelets and beamlets
Author(s):
Wei Siong Lee;
Ashraf A Kassim
Show Abstract
Edges are among the most perceptually important features in images, providing the geometric information that viewers use to distinguish objects in a scene. Wavelet-based coding algorithms have been successful in image compression, but like classical cosine transform methods, they are unable to provide a good representation of 2D edges. Cel-based animation sequences, or cartoons, contain predominantly edge information. Although these types of images are relatively less complex than real-life images, the geometrical structures within the cel images challenge even the most complex wavelet-based coders. Wedgelets and beamlets are able to capture geometrical structures in images more efficiently than wavelets. In this paper, we propose a novel hybrid image coder that utilises both wavelets and wedgelets to code cartoon images. Our prototype coder demonstrates significant improvements in visual quality of wedgelet/beamlet coded cartoon images over standard wavelet-based coding techniques.
Exploring image coding techniques for remote sensing and geographic information systems
Author(s):
Joan Serra;
Cristina Fernandez;
Francesc Auli;
Fernando Garcia
Show Abstract
Remote Sensing and Geographic Information Systems applications are becoming more and more present in everyday life. These applications are based on the growing availability of natural, multi-spectral and hyper-spectral images, but these kinds of images impose large memory demands due to their large size and increasing resolution.
Well-known lossless and lossy image coding techniques have been used to address this problem, but RS and GIS applications have some particular requirements that are not taken into account by the standard methods. There is therefore a need to investigate new approaches to image coding for RS and GIS applications.
New quadtree predictive image coding technique using pattern-based classification
Author(s):
Farhad Keissarian
Show Abstract
A new quadtree segmented predictive image coding technique is presented in this paper for exploiting the correlation between adjacent image blocks and the uniformity within variable-size image blocks. To exploit the correlation between adjacent image blocks, a predictive coding technique is used to reduce the inter-block correlation. The proposed quadtree technique decomposes an image into variable-size blocks, and each segmented block is then coded at a different bit rate according to the level of visual activity inside the block. A novel classification scheme, which operates on the distribution of the block residuals, is employed to determine the activity level inside the block. In this method, the orientation of the pattern appearing inside the block is computed as an aid to the classification. To preserve edge integrity, a block-pattern-based coding technique is proposed and incorporated into the predictive coding method for coding the high-activity blocks of the segmented image. The use at the receiver of a set of parameters associated with the pattern appearing inside a high-activity block, together with the inter-block correlation information, reduces the coding cost and saves encoding time. Experiments have been conducted to compare with other predictive and quadtree-based techniques. Results show a lower bit rate at competitive reconstruction quality.
Fast local motion estimation algorithm using elementary motion detectors
Author(s):
Eiji Nakamura;
Takehito Nakamura;
Katsutoshi Sawada
Show Abstract
This paper presents a fast local motion estimation algorithm based on so-called elementary motion detectors, or EMDs. EMDs, which model insects' visual signal processing systems, have low computational complexity and can thus be key components for realizing such a fast local motion estimation algorithm. The contribution of the presented work is to introduce dual parameter estimators, or DPEs, by configuring EMDs so that they can estimate local motions in terms of both direction and speed parameters simultaneously. The estimated local motion vectors are displayed as arrows superimposed over video image frames. The developed algorithm is implemented in a DirectShow application using Microsoft's DirectX runtime library and is evaluated on various types of video image sequences. It is found to be able to estimate local motion vectors in real time even on moderate PC computing platforms, and hence no high-profile hardware devices are needed for its real-time operation.
Feature-assisted search technique for motion estimation
Author(s):
Jae Hun Lee;
Jong Beom Ra
Show Abstract
Motion estimation (ME) is one of the most time-consuming parts in video encoding system, and significantly affects the quality of the reconstructed sequences. In this paper, we propose a new fast algorithm, or the Feature Assisted Search Technique for Motion Estimation (FASTME), which is a gradient descent search combined with a new search strategy, namely, one-dimensional feature matching based on selective integral projections (1DFM). 1DFM assists a local search around an initial search center with a low computational complexity. After performing 1DFM, a small diamond search pattern and more compact horizontal and vertical search patterns, are adaptively used according to the result of 1DFM. The proposed algorithm outperforms existing fast algorithms in terms of speed up performance. In addition, its PSNR performance is better or similar to those of existing fast algorithms.
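The projection-matching idea can be sketched as follows: compare cheap 1-D row/column integral projections instead of full 2-D blocks. The code below is an illustrative sketch (the selective projection choice and the overall FASTME search strategy are not reproduced).

```python
import numpy as np

def projection_cost(block, ref, x, y):
    """Cost of matching a block at reference position (x, y) using 1-D
    row/column integral projections instead of a full 2-D SAD."""
    h, w = block.shape
    cand = ref[y:y + h, x:x + w].astype(float)
    blk = block.astype(float)
    cost = np.abs(blk.sum(axis=0) - cand.sum(axis=0)).sum()   # column projections
    cost += np.abs(blk.sum(axis=1) - cand.sum(axis=1)).sum()  # row projections
    return cost

def best_on_row(block, ref, y, x_range):
    """Cheap 1-D horizontal scan around a search center using projections."""
    costs = [projection_cost(block, ref, x, y) for x in x_range]
    return x_range[int(np.argmin(costs))]
```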
Fast half-pixel motion estimation based on directional search and a linear model
Author(s):
Yun-Gu Lee;
Jae Hun Lee;
Jong Beom Ra
Show Abstract
As many fast integer-pixel motion estimation algorithms have become available, an integer-pixel motion vector can be found by examining fewer than 10 search points. Meanwhile, 8 half-pixel positions around the integer-pixel motion vector must be examined for half-pixel motion estimation. Hence, it becomes more meaningful to reduce the computational complexity of half-pixel motion estimation. In this paper, we propose a fast half-pixel motion estimation algorithm combining a directional search with linear modeling of the SAD curve. The proposed algorithm reduces the number of search points to 2.21 on average, while the image quality of reconstructed sequences in terms of PSNR is similar to that of existing fast half-pixel motion estimation algorithms. In addition, by adjusting a user-defined parameter, the proposed algorithm can further reduce the number of search points to 0.34 on average with only a slight PSNR degradation.
Fast disparity estimation algorithm for mesh-based stereo image/video compression with two-stage hybrid approach
Author(s):
Shao-Yi Chien;
Shu-Han Yu;
Li-Fu Ding;
Yun-Nien Huang;
Liang-Gee Chen
Show Abstract
Disparity estimation is a very important operation in stereo image and video compression. However, existing disparity estimation algorithms require too much computational power. In this paper, we propose a fast disparity estimation algorithm for mesh-based stereo image and video compression with a two-stage hybrid approach, named the Two-Stage Iterative Block and Octagonal Matching (TS-IBOM) algorithm. The first stage of this algorithm is an Iterative Block Matching algorithm, which gives a good initial guess of the disparity vectors with little computation. In the second stage, an Iterative Octagonal Matching algorithm is employed to refine the disparity vectors. A local updating scheme is also proposed to further accelerate the process by skipping nodes whose disparity vectors have already been derived. Experimental results show that the proposed algorithm gives good results and is 18 times faster than Octagonal Matching. This algorithm is well suited to mesh-based stereo image and video compression systems.
An efficient and direct nonplanar rotation estimation algorithm for video applications
Author(s):
Mireya S Garcia;
Henri Nicolas
Show Abstract
We present a new method for the estimation of non-planar rotations, i.e. rotations around axes parallel to the image plane, in the context of video compression applications. This method is based on a non-planar rotation model which assumes that the moving objects have a planar surface. The proposed block-based motion estimation approach is performed between consecutive or non-consecutive images, which may contain large displacements, and aims at minimizing the motion compensation error. The efficiency of the method has been compared to the results obtained with the classical full-search block matching approach. Experiments have been carried out on real video sequences. The results show a significant gain in terms of PSNR for the motion-compensated P or B frames, compared to the classical full-search block matching approach, while the coding cost of the additional motion information is very low, which demonstrates the interest of the proposed rotation model in the context of motion compensation for video compression applications.
Global motion estimation algorithm for video segmentation
Author(s):
Edmundo Saez;
Jose Manuel Palomares;
Jose Ignacio Benavides;
Nicolas Guil
Show Abstract
This work presents a new global motion estimation algorithm for MPEG compressed video sequences. It makes use of a feature extraction technique based on the Generalized Hough Transform, which is able to provide rotation, scale and displacement parameters when comparing two different frames from a video sequence. Thus, pan, tilt, swing (rotation along z-axis) and zoom effects will be studied using the proposed algorithm.
DC coefficients of the DCT are extracted from the MPEG stream and used to create DC images, which are the starting point for the global motion estimation algorithm. Applying the feature extraction technique to the DC images then allows motion estimation to be performed with reduced processing time, since full decompression is avoided.
Pseudocode and implementation details, as well as statistics illustrating the efficiency of the algorithm, are provided.
Fast motion estimation algorithm for H.264/MPEG-4 AVC by using multiple reference frame skipping criteria
Author(s):
Bing-Yu Hsieh;
Yu-Wen Huang;
Tu-Chih Wang;
Shao-Yi Chien;
Liang-Gee Chen
Show Abstract
The emerging video coding standard, H.264/MPEG-4 AVC, allows the use of multiple reference frames for motion estimation to enhance temporal prediction. Exhaustive search of each frame requires a large amount of computation, proportional to the number of searched frames. However, the reduction of prediction residues is highly dependent on the characteristics of the video sequence. In many cases, searching more reference frames contributes nothing but wasted computation. In this paper, we propose a novel algorithm that accelerates motion estimation by using reference frame skipping criteria. We adopt the selected macroblock mode, intra/inter frame prediction residues, compactness of motion vectors, and scene changes as the criteria. Using these criteria, each macroblock can determine whether it is necessary to keep searching further reference frames after the block matching process in the first reference frame. Simulation results show that the proposed algorithm can save up to 80% of the computation without noticeable degradation of video quality.
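A hypothetical rendering of such an early-exit test is sketched below; the criterion names and thresholds are our assumptions, not the paper's actual rules.

```python
def search_more_reference_frames(mb_mode, intra_cost, inter_cost,
                                 mv_spread, scene_change,
                                 spread_thresh=2):
    """Return False to stop after the first reference frame (sketch).

    All names and thresholds are illustrative assumptions.
    """
    if scene_change:                    # earlier frames are uncorrelated
        return False
    if mb_mode == "SKIP":               # prediction is already good enough
        return False
    if inter_cost <= intra_cost and mv_spread <= spread_thresh:
        return False                    # compact, well-predicted motion
    return True
```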
Fast motion estimation and mode decision with variable motion block sizes
Author(s):
Woong Il Choi;
Jeyun Lee;
Sungmo Yang;
Byeungwoo Jeon
Show Abstract
The adaptive coding schemes in the H.264 standard provide significant coding efficiency and additional features such as error resilience and network friendliness. Variable block size motion compensation using multiple reference frames is one of the key H.264 coding elements providing notable performance gain. However, it is also the main culprit behind the increased overall computational complexity. For this reason, this paper proposes a fast algorithm for variable block size motion estimation in H.264. In addition, we also propose a fast mode decision scheme that classifies modes based on rate-distortion cost. The experimental results show that the combined proposed methods provide a significant improvement in processing speed without noticeable coding loss.
Robust approach for color image quality assessment
Author(s):
Patrick Le Callet;
Dominique Barba
Show Abstract
This paper presents a visual color image quality assessment metric with full reference image. The metric is strongly based on human visual system properties in order to obtain the best correspondence with human judgements. Contrary to some other objective criteria, it does not use any a priori knowledge of the type of introduced degradations, so the main interest of the metric lies in its ability to produce robust results independently of the distortions. The metric can be decomposed into two steps. The first projects each image, the reference and the distorted one, into a perceptual space. The second step pools the errors between the perceptual representations of the two images in order to obtain a score for the overall quality. Since we have shown that these two steps are of equivalent importance for metric performance, we have paid particular attention to balancing them correctly. For the second step in particular, which generally receives little consideration in the literature, we have developed some new approaches. We compare the results of the metric with human judgments on images distorted with different compression schemes. High performance is obtained, confirming that the metric is robust; this approach thus constitutes a useful alternative to PSNR for image quality assessment.
New perceptual quality assessment method with reduced reference for compressed images
Author(s):
Mathieu Carnec;
Patrick Le Callet;
Dominique Barba
Show Abstract
This paper presents a new method to measure the quality of compressed images. The method is based on a Human Visual System model and extracts perceptual structural information from images. This model is implemented and perceptual representations of images are built. These representations describe the structural information of images. For quality assessment, the representation of the original image, actually a reduced reference, is compared to the representation of the distorted image using similarity measures. Similarity scores have been shown, in experiments, to be highly correlated with the image quality ratings produced by human observers. So the novelty of this method is that structural information is used to assess quality. This method has been implemented in an application called "Smart Compress" (freely available on the Internet) which allows the user to compress images in JPEG format by choosing the visual quality of the output images.
Methodologies for objective evaluation of video segmentation quality
Author(s):
Paulo Lobato Correia;
Fernando Manuel Ber Pereira
Show Abstract
The major objective when working with image and video segmentation is to design an algorithm that produces appropriate segmentation results for the particular goals of the application addressed. The problem of video segmentation quality assessment thus assumes a crucial importance in evaluating the appropriateness of a segmentation algorithm for a given application. This problem has received considerably less attention in the literature than the development of the segmentation algorithms themselves. The purpose of this paper is to propose methodologies for the objective evaluation of video segmentation quality.
Evaluation model considering static-temporal quality degradation and human memory for SSCQE video quality
Author(s):
Yuukou Horita;
Takamichi Miyata;
Irwan Prasetya Gunawan;
Tadakuni Murai;
Mohammed Ghanbari
Show Abstract
To perform Quality of Service (QoS) control of video communication more efficiently, it is necessary to develop an objective quality evaluation method for coded video. Many conventional methods for obtaining video quality require the availability of both the reference and the processed video sequence. However, when the coded video stream is re-encoded at the receiver side, where the reference video sequence is not present, such a full-reference evaluation is impossible. Therefore, we have developed a video quality evaluation model that uses a reduced reference for values obtained by the SSCQE method. In this approach, we use some features extracted from the reference video; VQEG calls this a reduced-reference method. By transmitting these features with the coded video, the proposed model can estimate the video quality even in the absence of the full original video at the decoder side. The video quality rating obtained from the proposed model shows good agreement with subjective quality.
Spatially adaptive denoising using mixture modeling of wavelet coefficients
Author(s):
Il Kyu Eom;
Yoo-shin Kim
Show Abstract
A wavelet coefficient is generally classified into one of two categories: significant (large) and insignificant (small). Therefore, each wavelet coefficient can be efficiently modelled as a random variable of a Gaussian mixture distribution with unknown parameters. In this paper, we propose an image denoising method using mixture modelling of wavelet coefficients. Each coefficient is classified as either noisy or clean by using a proper threshold [2]. Based on this classification, a binary mask value, which plays an important role in suppressing noise, is produced. The probability of being a clean signal is estimated from a set of mask values. We then apply this probability to design a Wiener filter to reduce noise, and also develop a method for selecting windows of different sizes around the coefficient. Despite the simplicity of our method, experimental results show that it outperforms other critically sampled wavelet denoising schemes.
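A minimal sketch of the mask-then-Wiener idea on a single detail subband is given below (Python with numpy/scipy); the 3-sigma threshold, the fixed window and the function shape are our assumptions, and the paper's adaptive window-size selection is omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def denoise_subband(coef, sigma, window=5):
    """Mask-based Wiener shrinkage of one wavelet detail subband (sketch).

    A coefficient is flagged significant by a hard threshold; the local
    average of the binary mask estimates the probability of clean signal,
    which then scales an empirical Wiener gain.
    """
    mask = (np.abs(coef) > 3 * sigma).astype(float)        # noisy vs. clean
    p_clean = uniform_filter(mask, size=window)            # local probability
    # Empirical local signal variance from the window energy.
    var_local = np.maximum(uniform_filter(coef ** 2, size=window) - sigma ** 2, 0.0)
    wiener = var_local / (var_local + sigma ** 2)
    return coef * p_clean * wiener
```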
On-chip digital noise reduction for integrated CMOS cameras
Author(s):
Markus Rullmann;
Jens-Uwe Schluessler;
Rene Schueffny
Show Abstract
We propose an on-line noise reduction system especially designed for noisy CMOS image sensors. Image sequences from CMOS sensors are in general corrupted by two types of noise: temporal noise and fixed pattern noise (FPN). It is shown how the FPN component can be estimated from a sequence. We studied the theoretical performance of two different approaches, called direct and indirect FPN estimation. We show that indirect estimation gives superior performance, both theoretically and by simulations. The FPN estimates can be used to improve the image quality by compensating for the FPN. We assess the quality of the estimates by the achievable SNR gains. Using these results, a dedicated filtering scheme has been designed to accomplish both temporal noise reduction and FPN correction by applying a single noise filter. It allows signal gains of up to 12dB and provides high visual quality of the results. We further analyzed and optimized the memory size and bandwidth requirements of our scheme and conclude that it is possible to implement it in hardware. The required memory size is 288kByte and the memory access rate is 70MHz. Our algorithm allows the integration of noisy CMOS sensors with digital noise reduction and other circuitry in a system-on-chip solution.
RM-filtering procedures in image and video processing
Author(s):
Volodymyr Ponomaryov D.D.S.;
Francisco J. Gallegos-Funes;
Lyubov O. Ponomaryova
Show Abstract
In this paper we present a robust Cascaded RM-filter to remove mixtures of impulsive and multiplicative noise from corrupted images. The designed filter uses combined R- and M-estimators, called RM-estimators. The capability of impulsive noise removal in image processing applications by using the RM-KNN filters (MM-KNN, WM-KNN, ABSTM-KNN or MoodM-KNN) is presented. The presented Cascaded RM-filter is a sequential connection of two filters. The first uses one of the proposed filters to provide impulsive noise rejection and detail preservation. The second uses an M-filter to provide multiplicative noise suppression. We investigated various types of influence functions used in the M-estimator of the designed filter. Extensive simulation results on reference images have demonstrated that the proposed filters can consistently outperform other nonlinear filters by balancing the tradeoff between noise suppression and detail preservation. The criteria used to compare performance were PSNR and MAE. Finally, we also present a real-time implementation of the proposed filter using the DSP TMS320C6701 to demonstrate that it potentially provides a real-time solution for quality TV/video transmission.
Three-dimensional entropy vector median filter for color video filtering
Author(s):
Rastislav Lukac;
Bogdan Smolka;
Konstantinos N. Plataniotis;
Anastasios N. Venetsanopoulos
Show Abstract
We provide a new non-motion-compensated adaptive multichannel filter for the detection and removal of impulsive noise, bit errors and outliers in color video or color image sequences. The proposed nonlinear filter takes advantage of the concept of local entropy contrast and of robust order-statistics theory. The new entropy-based vector median is computationally attractive, robust over a wide range of impulsive noise corruption, and significantly improves the signal-detail preservation capability of the standard vector median filter. Because the precision of statistical operators such as the mean or entropy increases with the number of observed samples, the spatiotemporal cube filter window used here guarantees high accuracy; the proposed method achieves excellent results in terms of commonly used objective measures and clearly outperforms standard vector filtering schemes.
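The order-statistics core that such filters build on is the vector median. A plain numpy sketch follows; the entropy-based local contrast test and the spatiotemporal cube window of the paper are not reproduced here.

```python
import numpy as np

def vector_median(window):
    """Vector median of a set of colour vectors: the sample minimizing
    the aggregated L2 distance to all other samples in the window.

    window : (N, 3) array of RGB vectors; for a spatiotemporal cube,
    N would also gather pixels from neighbouring frames.
    """
    dists = np.linalg.norm(window[:, None, :] - window[None, :, :], axis=2)
    return window[np.argmin(dists.sum(axis=1))]
```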
Application of kernel density estimation for color image filtering
Author(s):
Bogdan Smolka;
Rastislav Lukac;
Konstantinos N. Plataniotis;
Anastasios N. Venetsanopoulos
Show Abstract
This paper presents a new filtering scheme for the removal of impulsive noise in color images. It is based on estimating the probability density function for color pixels in a filter window by means of the kernel density estimation method. A quantitative comparison of the proposed filter with the vector median filter shows its excellent ability to reduce noise while simultaneously preserving fine image details.
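A rough sketch of the density test, assuming a Gaussian kernel on RGB distances (the bandwidth and the decision rule are illustrative choices, not the paper's):

```python
import numpy as np

def kde_center_weight(window, h=20.0):
    """Gaussian kernel density estimate of the central pixel among the
    colour vectors of its filter window (illustrative sketch).

    A low density marks the centre as a likely impulse, to be replaced,
    e.g., by the vector median of the window.
    """
    center = window[len(window) // 2]        # centre pixel of the window
    d2 = np.sum((window - center) ** 2, axis=1)
    return np.mean(np.exp(-d2 / (2.0 * h * h)))
```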
Modified anisotropic diffusion framework
Author(s):
Bogdan Smolka;
Rastislav Lukac;
Konstantinos N. Plataniotis;
Anastasios N. Venetsanopoulos
Show Abstract
In this paper a novel approach to the problem of edge-preserving noise reduction in color images is proposed and evaluated. The new algorithm is based on combined forward and backward anisotropic diffusion with an incorporated time-dependent cooling process. This method is able to efficiently remove image noise while preserving and even enhancing image edges. The proposed algorithm can be used as a first step of different techniques which are based on color, shape and spatial location information.
Color image denoising with wavelet thresholding based on human visual system model
Author(s):
Kai-qi Huang;
Zhen-yang Wu
Show Abstract
This paper presents a new approach to color image denoising that takes a human visual system (HVS) model into consideration. The denoising process takes place in the wavelet transform domain. A contrast sensitivity function (CSF) implementation is employed in the wavelet-based algorithm, based on an invariant single-factor weighting per subband followed by noise masking. Experimental results show that the new approach performs well in terms of perceptual error metrics and visual effect.
Lossless coding using predictors and VLCs optimized for each image
Author(s):
Ichiro Matsuda;
Noriyuki Shirai;
Susumu Itoh
Show Abstract
This paper proposes an efficient lossless coding scheme for still images. The scheme utilizes an adaptive prediction technique where a set of linear predictors is designed for a given image and an appropriate predictor is selected from the set block-by-block. The resulting prediction errors are encoded using context-adaptive variable-length codes (VLCs). Context modeling, or adaptive selection of VLCs, is carried out pel-by-pel, and the VLC assigned to each context is designed on a probability distribution model of the prediction errors. In order to improve coding efficiency, a generalized Gaussian function is used as the model for each context. Moreover, not only the predictors but also the parameters of the probability distribution models are iteratively optimized for each image so that the coding rate of the prediction errors is minimized. Experimental results show that the proposed coding scheme attains coding performance comparable to the state-of-the-art TMW scheme with much lower complexity in the decoding process.
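To give a flavour of the per-image optimisation, the numpy sketch below designs a single least-squares linear predictor from causal neighbours; the actual scheme designs a whole set of predictors, selects one per block, and iterates jointly with the VLC models.

```python
import numpy as np

def design_predictor(img, neighbours=((0, -1), (-1, 0), (-1, -1), (-1, 1))):
    """Least-squares design of one linear predictor from causal neighbours.

    Returns the coefficient vector; the prediction of a pel is the dot
    product of the coefficients with its neighbour values, and the
    prediction errors would then be coded with context-adaptive VLCs.
    """
    H, W = img.shape
    rows, cols = np.mgrid[1:H - 1, 1:W - 1]
    targets = img[rows, cols].ravel().astype(float)
    feats = np.stack([img[rows + dr, cols + dc].ravel().astype(float)
                      for dr, dc in neighbours], axis=1)
    coef, *_ = np.linalg.lstsq(feats, targets, rcond=None)
    return coef
```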
A study on multiresolution lossless video coding using inter/intra frame adaptive prediction
Author(s):
Takayuki Nakachi;
Tomoko Sawabe;
Tetsuro Fujii
Show Abstract
Lossless video coding is required in the fields of archiving and editing digital cinema or digital broadcasting contents. This paper combines a discrete wavelet transform and adaptive inter/intra-frame prediction in the wavelet transform domain to create multiresolution lossless video coding. The multiresolution structure offered by the wavelet transform facilitates interchange among several video source formats such as Super High Definition (SHD) images, HDTV, SDTV, and mobile applications. Adaptive inter/intra-frame prediction is an extension of JPEG-LS, a state-of-the-art lossless still image compression standard. Based on the image statistics of the wavelet transform domains in successive frames, inter/intra frame adaptive prediction is applied to the appropriate wavelet transform domain.
This adaptation offers superior compression performance. This is achieved with low computational cost and no increase in additional information. Experiments on digital cinema test sequences confirm the effectiveness of the proposed algorithm.
Modifying integer wavelet transforms for scalable near-lossless image coding
Author(s):
G. Charith K. Abhayaratne
Show Abstract
In near-lossless image coding, each reconstructed pixel of the
decoded image differs from the corresponding one in the original
image by not more than a pre-specified value. Such schemes are mainly based on predictive coding techniques, which are not capable of scalable decoding. Lossless image coding with scalable decoding is mainly based on integer wavelet transforms. In this paper, methods to modify integer wavelet transforms for near-lossless image coding with scalable decoding features are presented. This is achieved by incorporating the near-lossless quantisation process, driven by the pre-specified value, into lifting steps (online quantisation). Two online quantisation techniques based on 1-D and 2-D transforms are presented. They outperform the pre-quantisation based near-lossless image coding method in both bit rate and rms error performances. Further, they result in both subjectively and objectively superior performance in spatial and bit rate scalable decoding. The 2-D online
scheme results in comparable performance with JPEG-LS, which is not capable of scalable decoding. It is evident from this research that with these novel schemes, scalable decoding features can be integrated into near-lossless coding with only a small increase in
bit rate compared to those achieved in JPEG-LS.
Reversible integer-to-integer transforms and symmetric extension of even-length filter banks
Author(s):
Brendt Wohlberg;
Christopher Brislawn
Show Abstract
The recent JPEG2000 image coding standard includes a lossless coding mode based on reversible integer to integer filter banks, which are constructed by inserting rounding operations into the filter bank lifting factorisation. The baseline (Part 1) of the JPEG2000 standard supports a single reversible filter bank, the finite length input to which is symmetrically extended to avoid difficulties at the boundaries. While designing support for arbitrary filter banks for Part 2 of the standard, we discovered that reversibility is not always possible for even length integer to integer filter banks combined with symmetric pre-extension.
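For reference, the reversible 5/3 lifting of Part 1 can be sketched as below, with the rounding inserted inside the lifting steps; the boundary handling here is a simplified symmetric extension and the code is illustrative rather than standard-conformant.

```python
import numpy as np

def rev53_forward(x):
    """Reversible 5/3 lifting on an even-length integer signal, with the
    rounding inside the lifting steps (>> is a floor division by 2 or 4)."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    even_r = np.append(even[1:], even[-1])       # right neighbour (extension)
    d = odd - ((even + even_r) >> 1)             # predict step + rounding
    d_l = np.insert(d[:-1], 0, d[0])             # left neighbour (extension)
    s = even + ((d_l + d + 2) >> 2)              # update step + rounding
    return s, d

def rev53_inverse(s, d):
    """Exact inverse: the same rounded quantities are recomputed and undone."""
    d_l = np.insert(d[:-1], 0, d[0])
    even = s - ((d_l + d + 2) >> 2)
    even_r = np.append(even[1:], even[-1])
    odd = d + ((even + even_r) >> 1)
    x = np.empty(s.size + d.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

x = np.arange(16, dtype=np.int64) ** 2 % 37      # toy even-length signal
assert np.array_equal(x, rev53_inverse(*rev53_forward(x)))
```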
Orthonormal integer block transforms for lossless coding: design and performance analysis
Author(s):
G. Charith K. Abhayaratne
Show Abstract
In this paper a general framework for N-point orthonormal block transforms, where N is any positive integer power of 2, that map
integers to integers using the lifting scheme is presented. The
design of the Discrete Cosine Transform (DCT) that maps integers
to integers (I2I-DCT), the Discrete Sine Transform that maps
integers to integers (I2I-DST) and the Walsh Hadamard Transform
that maps integers to integers (I2I-WHT) is presented. The main
significance of this design is that the orthonormal property of
the transforms is maintained by proper normalisation while
preserving the integer to integer mapping and perfect reconstruction. This makes these transforms usable in both lossless and lossy image coding, especially in scalable lossless coding. The coding performance of these transforms is evaluated in lossless image coding and lossless video coding applications.
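The 2-point integer Hadamard butterfly is the classic S-transform, from which an N-point integer WHT can be built recursively. The sketch below omits the normalisation bookkeeping that the paper introduces to keep the transform orthonormal.

```python
def s_transform(a, b):
    """2-point integer Hadamard butterfly via lifting (S-transform):
    integer-to-integer and invertible despite the rounding."""
    h = a - b
    l = b + (h >> 1)          # equals floor((a + b) / 2)
    return l, h

def s_inverse(l, h):
    b = l - (h >> 1)
    return b + h, b           # recovers (a, b) exactly

def i2i_wht(x):
    """Recursive N-point integer WHT built from the 2-point butterfly
    (N a power of two; output ordering follows the recursion)."""
    if len(x) == 1:
        return list(x)
    half = len(x) // 2
    lows, highs = zip(*(s_transform(x[i], x[i + half]) for i in range(half)))
    return i2i_wht(list(lows)) + i2i_wht(list(highs))
```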
Hardware platform for region extraction in foveal images
Author(s):
Francisco J. Coslado;
Martin Gonzalez;
Pelegrin Camacho;
Francisco Sandoval
Show Abstract
Foveal sensors can substantially increase the performance of active vision systems because of their ability to handle a wide field of view while simultaneously reducing data/bandwidth through space-variant sensing. In order to process the multiresolution images and associated data structures, a new hierarchical processing scheme has been applied to minimize data communication and retrieval. In this paper we present a hardware platform implementing a level-sequential segmentation algorithm on one of these hierarchical structures, generated using a Cartesian lattice topology. This platform is designed to work at 33 frames/s, using as input the levels obtained after preprocessing the uniform-resolution images from digital cameras.
Enabling real-time H.26L video services over wireless ad hoc networks using joint admission and transmission power control
Author(s):
Yong Pei;
James W. Modestino;
Qi Qu;
Xiaochun Wang
Show Abstract
In a wireless ad hoc network, packets are sent from node to node in a multihop fashion until they reach the destination. In this paper we investigate the capacity of a wireless ad hoc network in supporting packet video transport. The ad hoc network consists of n homogeneous video users, each of them also serving as a relay node for other users. We investigate how time delay affects the video throughput in such an ad hoc network, and how to provide a delay-bounded packet video delivery service over such a network. The analytical results indicate that appropriate joint admission and power control have to be employed in order to efficiently utilize the network capacity while operating under the delay constraint as the distance between source and destination changes.
SHD digital cinema distribution over a long distance network of Internet2
Author(s):
Takahiro Yamaguchi;
Daisuke Shirai;
Tatsuya Fujii;
Mitsuru Nomura;
Tetsuro Fujii;
Sadayasu Ono
Show Abstract
We have developed a prototype SHD (Super High Definition) digital cinema distribution system that can store, transmit and display eight-million-pixel motion pictures that have the image quality of a 35-mm film movie. The system contains a video server, a real-time decoder, and a D-ILA projector. Using a gigabit Ethernet link and TCP/IP, the
server transmits JPEG2000 compressed motion picture data streams to the decoder at transmission speeds as high as 300 Mbps. The received data streams are decompressed by the decoder, and then projected onto a screen via the projector. With this system, digital cinema contents can be distributed over a wide-area optical gigabit IP network.
However, when digital cinema contents are delivered over long distances using a gigabit IP network and TCP, the round-trip time increases and network throughput either stops rising or diminishes. In a long-distance SHD digital cinema transmission experiment performed on the Internet2 network in October 2002, we adopted enlargement of the TCP window, multiple TCP connections, and a shaping function to control the amount of data transmitted. As a result,
we succeeded in transmitting the SHD digital cinema content data at about 300 Mbps between Chicago and Los Angeles, a distance of more than 3000 km.
Multimodal browsing using VoiceXML
Author(s):
Giuseppe Caccia;
Rosa C. Lancini;
Giuseppe Peschiera
Show Abstract
With the increasing development of devices such as personal computers, WAP devices and personal digital assistants connected to the World Wide Web, end users feel the need to browse the Internet through multiple modalities. We investigate how to create a user interface and a service distribution platform granting the user access to the Internet through standard I/O modalities and voice simultaneously. Different architectures are evaluated, suggesting the most suitable one for each client terminal (PC or WAP). In particular, the design of the multimodal user-machine interface considers the synchronization issue between graphical and voice contents.
Dynamic power scheduling system for JPEG 2000 delivery over wireless networks
Author(s):
Maurizio Martina;
Fabrizio Vacca
Show Abstract
The diffusion of third-generation mobile terminals is encouraging the development of new multimedia-based applications. The reliable transmission of audiovisual content will gain major interest, being one of the most valuable services. Nevertheless, the mobile scenario is severely power constrained: high compression ratios and refined energy management strategies are highly advisable. JPEG2000 as the source encoding stage assures excellent performance with extremely good visual quality. However, the limited power budget requires limiting the computational effort in order to save as much power as possible. Moreover, since the wireless environment is error prone, strong error-resilience features need to be employed. This paper investigates the trade-off between quality and power in such a challenging environment.
MPEG-7 and image understanding systems
Author(s):
Gary Kuvich
Show Abstract
Inexpensive computer hardware and optical devices have made image/video applications available even to private individuals. This has created a huge demand for image and multimedia databases and other systems that work with visual information. Analysis of visual information has not yet been completely formalized and automated. The reason for that is a long tradition of separating the vision and knowledge subsystems. However, brain research shows that vision is part of a larger information system that converts visual information into knowledge structures. These structures drive the vision process, resolve ambiguity and uncertainty in real images via feedback projections, and provide image understanding, which is an interpretation of visual information in terms of such knowledge models. It is hard to split such a system apart. The brain does not recreate a 3-D image of a visual scene, but analyzes an image as a graph-type decision structure created via multilevel hierarchical compression of visual information. Vision mechanisms can never be completely understood separately from the informational processes related to knowledge and intelligence. MPEG-7 is an industry-wide effort to incorporate knowledge into image/video code. This article describes the basic principles of integrating low-level image processing with high-level knowledge reasoning, and shows how Image Understanding systems can utilize the MPEG-7 standard. Such applications can add the power of image understanding to the standard.
MPEG-7 content-based analysis/retrieval system and its applications
Author(s):
Jin-Hau Kuo;
Ja-Ling Wu
Show Abstract
In this paper, we propose a content-based multimedia analysis/retrieval system based mainly on MPEG-7-defined features. Some new and application-specific features are also included to support the various requirements of different applications. The proposed system is capable of extracting a variety of features from different kinds of media data, such as video, audio, and images. The system design is well modularized, so that the tasks of system development and feature construction can be done independently and separately. In other words, a new feature can be added to the system without modifying its structure. In addition, a user can also input hints about what he or she is looking for, and the system will search the database and return the best-matched candidates. On the basis of this well-modularized system, several interesting applications have been investigated and realized in our lab, showing the effectiveness and flexibility of the proposed work.
Grouping viewpoint images into scenes based on similarity between frames
Author(s):
Takeshi Nagasaki;
Masashi Toda;
Toshio Kawashima
Show Abstract
We propose a method for automatically generating a snapshot sequence, describing everyday events, from a head-mounted video camera, and we discuss the relationship between the snapshot sequence and behavior. We extract a snapshot from the video frames when the head hardly moves, a condition that occurs when the subject is observing something. As eye (head) motion relates to the subject's behavior, the snapshot sequence relates to that behavior as well. Under this condition, neighboring frames are highly similar, so we judge head motion from the similarity between neighboring frames, which we estimate using local grayvalue invariants.
Video indexing application based on watermarking using turbocode and side information
Author(s):
Agnese Buccoliero;
Rosa C. Lancini;
Francesco Mapelli;
Stefano Tubaro
Show Abstract
The aim of this paper is to present a technique to hide data in digital video based on a watermarking approach. The algorithm exploits side information in the embedding phase; it uses error-correcting codes to improve the achievable robustness and different spatial and temporal masking functions to hide the watermark. The watermarking approach is based on an algorithm working in the uncompressed video domain in order to reduce the computational cost. The proposed scheme has been implemented and tested under different attacks, in particular compression and transcoding attacks. The results show that side information and error-correcting codes strongly improve the robustness.
Image retrieval using dynamic spatial chromatic histograms
Author(s):
Gianluigi Ciocca;
Raimondo Schettini
Show Abstract
Image retrieval is a two-step process: 1) indexing, in which a set or vector of features summarizing the properties of each image in the database is computed and stored; and 2) retrieval, in which the features of the query image are extracted and compared with the others in the database. The database images are then ranked in order of their similarity. We introduce an innovative image retrieval strategy, the Dynamic Spatial Chromatic Histogram, which makes it possible to take spatial information into account in a flexible way without greatly adding to computation costs. Our preliminary results on a database of about 3000 images show that the proposed indexing and retrieval strategy is a powerful approach.
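One way to picture a spatial chromatic summary: for every quantised colour, keep its relative area, barycentre and spatial spread. The numpy sketch below is illustrative only; the dynamic weighting that gives the DSCH its name is not reproduced.

```python
import numpy as np

def spatial_chromatic_histogram(img, nbins=8):
    """For each quantised colour: relative area, barycentre (normalised
    x, y) and mean spatial spread around it (illustrative sketch).

    img : (H, W, 3) uint8 image; returns {colour_label: (area, bx, by, s)}.
    """
    H, W, _ = img.shape
    q = (img // (256 // nbins)).reshape(-1, 3).astype(int)
    labels = (q[:, 0] * nbins + q[:, 1]) * nbins + q[:, 2]
    ys, xs = np.divmod(np.arange(H * W), W)
    feats = {}
    for c in np.unique(labels):
        sel = labels == c
        bx, by = xs[sel].mean() / W, ys[sel].mean() / H
        spread = np.hypot(xs[sel] / W - bx, ys[sel] / H - by).mean()
        feats[int(c)] = (sel.mean(), bx, by, spread)
    return feats
```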
Face recognition with non-negative matrix factorization
Author(s):
Menaka Rajapakse;
Lonce Wyse
Show Abstract
A face can conceptually be represented as a collection of sparsely distributed parts: eyes, nose, mouth, etc. We use Non-negative Matrix Factorization (NMF) to yield sparse representations of localized features that represent distributed parts of a human face. This paper explores the potential of NMF for face recognition and the possibilities for gender-based features in face reconstruction. Further, we compare the results of NMF with other common face recognition methods.
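NMF itself is standard: V ≈ WH with all factors non-negative, where the columns of W become part-like basis images. A compact numpy sketch of the Lee-Seung multiplicative updates that such a face-recognition pipeline could start from:

```python
import numpy as np

def nmf(V, r, iters=500, eps=1e-9):
    """Lee-Seung multiplicative updates for NMF, V ~ W @ H, all entries
    non-negative. V : (pixels, faces) data matrix; r : number of parts.
    """
    m, n = V.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update encodings
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update part-like bases
    return W, H
```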
Human face detection for automatic classification and annotation of personal photo collections
Author(s):
Bertrand Chupeau;
Vincent Tollu;
Jurgen Stauder;
Izabela Grasland
Show Abstract
The recent proliferation of digital images captured by digital cameras and, as a result, the users’ needs for automatic
annotation tools to index huge multimedia databases arouse a renewed interest in face detection and recognition technologies. After a brief state-of-the-art review, the paper details a model-based face detection algorithm for color images, based on skin color and face shape properties. We compare a stand-alone model-based approach with a hybrid approach in which this algorithm is used as a pre-processor to provide candidate faces to a supervised SVM classifier.
Experimental results are presented and discussed on two databases of 250 and 689 pictures respectively. Application to a system to automatically annotate the photos of a personal collection is eventually discussed from the human factors point of view.
Towards fast feature adaptation and localization for real-time face recognition systems
Author(s):
Fei Zuo;
Peter H.N. de With
Show Abstract
In a home environment, video surveillance employing face detection and recognition is attractive for new applications. Facial feature (e.g. eyes and mouth) localization in the face is an essential task for face recognition because it constitutes an indispensable step for face geometry normalization. This paper presents a new and efficient feature localization approach for real-time personal surveillance applications with low-quality images. The proposed approach consists of three major steps: (1) self-adaptive iris tracing, which is preceded by a trace-point selection process with multiple initializations to overcome the local convergence problem, (2) eye structure verification using an eye template with limited deformation freedom, and (3) eye-pair selection based on a combination of metrics. We have tested our facial feature localization method on about 100 randomly selected face images from the AR database and 30 face images downloaded from the Internet. The results show that our approach achieves a correct detection rate of 96%. Since our eye-selection technique does not involve time-consuming deformation processes, it yields relatively fast processing. The proposed
algorithm has been successfully applied to a real-time home video surveillance system and has proven to be an effective and computationally efficient face normalization method preceding face recognition.
Hybrid method of holistic and analytic approach for face recognition
Author(s):
Weigang Chen;
Feihu Qi
Show Abstract
A two-stage face recognition method is proposed in this paper. In the first stage, we present a hybrid method of GA and gradient-based algorithm for training NMF. The set of candidates is narrowed down with the Euclidean distance in NMF subspace serving as the global similarity. In the second stage, face image is segmented into facial bands, and synergetic approach is employed for further recognition. The similarity between a given region of the query image and a stored sample is obtained as the order parameter which serves as an element of the order vector. A modified definition of the potential function is given, and the dynamic model of recognition is derived from it. The effectiveness of the proposed method is experimentally confirmed.
Fault-induced attack on semi-fragile image authentication schemes
Author(s):
Yongdong Wu;
Changsheng Xu
Show Abstract
This paper describes an attack on semi-fragile image authentication schemes proposed in the literature. In this attack, the adversary manipulates an authentic image and queries a verifier with the corrupted image. From the answers given by the verifier, the adversary can disclose the secret relationship graphs used to produce a signature. With the disclosed relationship graphs, the adversary can impersonate an innocent person and easily forge authentic images. A countermeasure to this attack is to vary the scheme parameters with the relationship edges so that the relationship graphs reconstructed by the attacker differ from the original ones. Consequently, it becomes hard for the attacker to forge an authentic image without the correct relationship graphs.
Wavelet-based steganalysis using a computational immune system approach
Author(s):
Jacob T. Jackson;
Gregg H. Gunsch;
Roger L. Claypoole Jr.;
Gary B. Lamont
Show Abstract
Current signature-based approaches for detecting the presence of hidden messages in digital files - the initial step in steganalysis - rely on discerning "footprints" of steganographic tools. Of greater recent concern is detecting the use of novel proprietary or "home-grown" tools for which no signature has been established. This research focuses on detecting anomalies in seemingly innocuous images, applying a genetic algorithm within a computational immune system to leverage powerful image processing through wavelet analysis. The sensors developed with this new, synergistic system demonstrated a surprising level of capability to detect the use of steganographic tools for which the system had no previous exposure.
Hiding optical watermarks in hard copy images with reducing degradation of halftone quality
Author(s):
Tadahiko Kimoto
Show Abstract
A digital image printed as a binary halftone image can carry an optical watermark which becomes visible only when the associated halftone image is superimposed on it. In the template-dot halftoning scheme, such a pair of images can be generated by using different halftone cells on each image at a watermark dot. To make the watermark invisible on either image, a halftone cell should be randomly selected among those available for representing an identical halftone level. This randomness results in a noisy appearance of the generated halftone images. In this paper, to reduce the degradation of halftone quality, the template cells used for each halftone level are limited to two. One of the two template cells is randomly selected and used to generate a pair of halftone images with optical watermarks hidden in them. A set of two alternative template cells for each halftone level that yields less noisy halftone quality is investigated through computer simulation. An example of such a set of clustered-dot type cells and one of dispersed-dot type cells are presented.
Flexible authentication of images
Author(s):
Yanjiang Yang;
Feng Bao;
Robert Deng
Show Abstract
In this paper, we propose a novel authentication method for digital images that achieves fine authentication granularity (e.g., block-level granularity). Its appealing flexibility lies in the fact that with only one piece of authenticating data (e.g., a digital signature or message authentication code) spanning all blocks, each block can be authenticated independently of the others. In our method, the authenticating data are derived by means of the Chinese Remainder Theorem. To further attest to its flexibility, the method provides adjustable confidence by managing the public system parameters. Security issues, advantages, as well as limitations of the method are addressed. A wide range of potential applications is envisioned to demonstrate its practical significance.
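A minimal sketch of how the Chinese Remainder Theorem can bind independent per-block values into a single authenticating datum; the SHA-256 digests and the requirement of pairwise-coprime moduli are illustrative assumptions, not the authors' exact construction (which in practice must also tie the combined value to a signature).

```python
import hashlib

def block_digest(block_bytes, modulus):
    """Hash a block and reduce the digest modulo the block's modulus."""
    h = int.from_bytes(hashlib.sha256(block_bytes).digest(), "big")
    return h % modulus

def crt_combine(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise-coprime moduli (the CRT)."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # modular inverse of Mi mod m
    return x % M

def verify_block(auth_value, block_bytes, modulus):
    """Authenticate one block independently: reduce the single
    authenticating value modulo this block's modulus."""
    return auth_value % modulus == block_digest(block_bytes, modulus)
```

Because the combined value reduces modulo each block's modulus to that block's residue, any single block can be checked while ignoring all the others, which is exactly the flexibility described above.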
Slant transform watermarking for digital images
Author(s):
Anthony Tung Shuen Ho;
Xunzhan Zhu;
Jun Shen
Show Abstract
In this paper, we propose a digital watermarking algorithm based on the Slant transform for the copyright protection of images. Our earlier research work on the fast Hadamard transform for robust watermark embedding and retrieval of images and characters suggests that this transform could also provide a good “hidden” space for digital watermarking. The Slant transform has many properties similar to the Walsh-Hadamard transform. In terms of transform coding, the Slant transform is considered a sub-optimum orthogonal transform for energy compaction. For digital watermarking, this energy spread becomes a significant advantage, as there is a good spread of middle to higher frequencies with significant energies for robust information hiding. In this paper, an analytical comparative study of the Slant transform, adapting our earlier watermarking schemes for the fast Hadamard transform, is performed based on its robustness against various Stirmark attacks. Performance results of the Slant transform for image watermarking against other transforms, such as the Cosine transform, are also presented.
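For orientation, the 4x4 orthonormal Slant matrix and a generic additive mid-frequency embedding are sketched below. The coefficient position and embedding strength are assumptions for illustration; the paper's actual scheme follows the authors' earlier Hadamard-based method.

```python
import numpy as np

a = 1 / np.sqrt(5)
S4 = 0.5 * np.array([[1,   1,    1,    1],
                     [3*a, a,   -a,   -3*a],
                     [1,  -1,   -1,    1],
                     [a,  -3*a,  3*a, -a]])   # orthonormal 4x4 Slant matrix

def embed_bit(block, bit, alpha=4.0):
    """Embed one watermark bit in a mid-frequency Slant coefficient of a
    4x4 image block (generic additive embedding, assumed for illustration)."""
    coeffs = S4 @ block @ S4.T                    # forward 2-D Slant transform
    coeffs[1, 2] += alpha if bit else -alpha      # assumed mid-frequency slot
    return S4.T @ coeffs @ S4                     # inverse transform
```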
View-dependent transmission of three-dimensional mesh models using hierarchical partitioning
Author(s):
Sung-Yeol Kim;
Jeong-Hwan Ahn;
Yo-Sung Ho
Show Abstract
In this paper, we propose a new scheme for transmitting three-dimensional (3-D) mesh models. In order to obtain a view-dependent representation of 3-D mesh models, we combine sequential and progressive mesh transmission techniques. After we partition a 3-D mesh model into a hierarchical tree, we determine the amount of information for each submesh. Then, we can send the 3-D model information by view-dependent selection with mesh merging and splitting operations. Experimental results demonstrate that the proposed scheme can send mesh information adaptively through mesh merging and splitting operations and provides good visual quality over a limited-bandwidth channel.
Spatio-temporal view interpolation in real-time
Author(s):
Torsten Radtke
Show Abstract
An algorithm is presented for real-time view interpolation of dynamic events across time and space. Two temporal and two spatial flow fields are estimated from four images captured by two cameras at two different times. Hybrid gradient- and correlation-based motion estimation is used to compute optical flow fields with high density and accuracy. Based on the flow fields, texture coordinates of small textured squares are computed and a new image is composed at an arbitrary viewpoint and time. Real-time processing is made possible through vectorized implementation of computationally demanding functions and visualization using OpenGL and standard graphics hardware. The spatio-temporal view interpolation algorithm is applicable to non-rigid events, does not use explicit 3D models, and requires no user input.
Weighted bit allocation for multiresolution 3D mesh geometry compression
Author(s):
Frederic Payan;
Marc Antonini
Show Abstract
In this paper, we propose an efficient geometry compression method
well-adapted to densely sampled semi-regular triangular mesh. Based on
multiresolution analysis performed by wavelet transform, it includes a
low complexity model-based bit allocation across wavelet subbands. The
main contribution of this paper is the solution of the sub-optimal bit allocation problem for biorthogonal wavelet coders applied to 3D triangular meshes. Indeed, using biorthogonal filters weights the MSE distortion of the reconstructed quantized mesh. These weights can be computed from the wavelet filter bank. This yields a simple but efficient surface-adapted distortion criterion for 3D wavelet coefficient coordinates, and greatly improves the performance of MSE-based codecs.
Three-dimensional image representation of buildings utilizing heterogeneous information for multimedia ambiance communication
Author(s):
Minako Toba;
Takahiro Saito;
Kiyoharu Aizawa;
Kenji Mochizuki;
Takeshi Naemura;
Hiroshi Harashima
Show Abstract
The Multimedia Ambiance Communication project has researched and developed a 3D image space that is shared by people in different locations. In this communication, the 3D space is constructed as a layered structure defined by long-range, middle-range and short-range views. This research focuses in particular on buildings in the middle-range view. We acquire 3D data of buildings using a range scanner together with texture data, and represent them in VR space. For real buildings, practical restrictions mean that only partial data can be acquired. To compensate for this, other information such as drawings, photographs and general knowledge is used. The authors detail the construction of photo-realistic representations of buildings and of the 3D space.
Active camera tracking using affine motion compensation
Author(s):
Young-Kee Jung;
Yo-Sung Ho
Show Abstract
This paper describes a feature-based tracking system that can track moving objects with an active camera. A robust camera motion estimation algorithm is proposed to obtain a stable global motion for feature tracking. After we identify background and foreground motions based on dominant motion estimates, we estimate the camera motion on the background by applying a parametric affine motion model. After compensating for camera motion, we trace multiple corner features in the scene and segment foreground objects by clustering the motion trajectories of the corner features. We can also command the pan-tilt controller to keep the moving object at the center of the camera view.
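The parametric affine model referred to above maps each background point (x, y) to (a1·x + a2·y + a3, a4·x + a5·y + a6). A minimal least-squares sketch, assuming feature correspondences are already available (it omits the dominant-motion identification step):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine motion from point correspondences.
    src, dst : (N, 2) arrays of (x, y) feature locations."""
    N = src.shape[0]
    A = np.zeros((2 * N, 6))
    A[0::2, 0:2], A[0::2, 2] = src, 1.0   # rows for x' = a1*x + a2*y + a3
    A[1::2, 3:5], A[1::2, 5] = src, 1.0   # rows for y' = a4*x + a5*y + a6
    b = dst.reshape(-1)                   # interleaved (x', y') targets
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params  # (a1..a6); large residuals flag foreground outliers
```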
Scalable reduced dimension object segmentation based on wavelet
Author(s):
Lei Zhang;
Guo-Fang Tu
Show Abstract
Scalable reduced dimension object segmentation (SRDOS) based on wavelets is presented in this paper. The SRDOS algorithm takes advantage of the multiresolution characteristic of wavelet coefficients in the same direction, which allows it to detect video objects in a reduced-dimension image with much lower complexity and sufficient accuracy. Moreover, SRDOS can serve as a multiresolution object segmentation algorithm based on the wavelet transform, making it a fast and efficient object segmentation method. The proposed algorithm has been successfully integrated with our video-object-based wavelet color image coding algorithm.
Fast method of segmentation and indexing MPEG1-2 flow
Author(s):
Lionel Brunel;
Pierre Mathieu
Show Abstract
Multimedia data accessibility depends on precise indexing, which involves a computational cost. This paper proposes a new fast method of segmentation and indexing that automatically fills several MPEG-7 fields (e.g. camera and object movement). In order to accelerate the segmentation process, we exploit most of the information contained in the MPEG-1/2 stream; decompression is restricted to entropy decoding and inverse quantization, and the estimation of the camera movement is obtained from the MPEG-1/2 motion prediction. Segmentation into homogeneous color zones is obtained by a "split and merge" algorithm improved by a B-spline active contour segmentation regularization.
Segmentation of the breast region in mammograms using active contours
Author(s):
Michael A Wirth;
Alexei Stapinski
Show Abstract
The largest single feature on a mammogram is the skin-air interface, or breast contour. Extraction of the breast contour is useful for a number of reasons. Foremost, it allows the search for abnormalities to be limited to the breast region without undue influence from the background of the mammogram. Segmentation of the breast region from the background is made difficult by the tapering nature of the breast, such that the breast contour lies between the soft tissue and the non-breast region. This paper explores the application of active contours to the problem of extracting the breast region in mammograms.
Human activities recognition by head movement using partial recurrent neural network
Author(s):
Henry C. C. Tan;
Kui Jia;
Liyanage C. De Silva
Show Abstract
Traditionally, human activities recognition has been achieved mainly by the statistical pattern recognition methods or the Hidden Markov Model (HMM). In this paper, we propose a novel use of the connectionist approach for the recognition of ten simple human activities: walking, sitting down, getting up, squatting down and standing up, in both lateral and frontal views, in an office environment. By means of tracking the head movement of the subjects over consecutive frames from a database of different color image sequences, and incorporating the Elman model of the partial recurrent neural network (RNN) that learns the sequential patterns of relative change of the head location in the images, the proposed system is able to robustly classify all the ten activities performed by unseen subjects from both sexes, of different race and physique, with a recognition rate as high as 92.5%. This demonstrates the potential of employing partial RNN to recognize complex activities in the increasingly popular human-activities-based applications.
Spatially adaptive HOS-based motion detection for video sequence segmentation
Author(s):
Stefania Colonnese;
Alessandro Neri;
Giuseppe Russo;
Gaetano Scarano
Show Abstract
In this paper an adaptive procedure, based on a coarse-to-fine scheme, for the segmentation of a video sequence into background and moving objects, aimed at supporting content-based functionalities, is presented. The coarse stage provides a pixel-based motion detection based on non Gaussian signal extraction using Higher Order Statistics (HOS). The fine motion detection phase refines the coarse classification by introducing some topological constraints on the
segmentation map, essentially by means of simple morphological operators at low computational cost. The background model explicitly takes into account the apparent motion induced by background fluctuations typically appearing in outdoor sequences. Spatial adaptation of the algorithm is obtained by varying the threshold of the HOS-based motion detector on the basis of the local spectral characteristics of each frame, measured by a parameter representing the local spatial bandwidth. Simulation results show that the introduction of the local bandwidth to control the segmentation algorithm rejects the large apparent motion observed in outdoor sequences without degrading detection performance in indoor sequences.
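As a concrete, simplified stand-in for the HOS detector (not the authors' exact estimator), the sketch below thresholds the local fourth-order central moment of the frame difference: Gaussian background noise keeps it small, while genuine motion produces strongly non-Gaussian differences. The fixed threshold here replaces the spatially adaptive, bandwidth-driven threshold described in the abstract.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def hos_motion_mask(frame, background, win=5, thresh=250.0):
    """Coarse pixel-level motion detection via local fourth-order moments."""
    d = frame.astype(np.float64) - background.astype(np.float64)
    mean = uniform_filter(d, win)                # local mean of the difference
    m4 = uniform_filter((d - mean) ** 4, win)    # local 4th central moment
    return m4 > thresh                           # True where motion is likely
```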
Texture segmentation based on features in wavelet domain for image retrieval
Author(s):
Ying Liu;
Si Wu;
Xiaofang Zhou
Show Abstract
Texture is a fundamental feature that provides significant information for image classification, and it is an important content descriptor in content-based image retrieval (CBIR) systems. To implement texture-based image database retrieval, texture segmentation techniques are needed to segment textured regions from arbitrary images in the database. Texture segmentation has long been recognized as a difficult problem in image analysis. This paper proposes a block-wise automatic texture segmentation algorithm based on texture features in the wavelet domain. In this algorithm, texture features of each block are extracted and the L2 distance between blocks is calculated; a pre-defined threshold determines whether two blocks should be classified into the same class and hence belong to the same textured region. Results show that the proposed algorithm can efficiently capture the texture mosaics of arbitrary images. In addition, the features of each textured region can be obtained directly and used for image retrieval. By applying varying thresholds to different blocks according to their homogeneity, instead of a uniform threshold, the segmentation performance can be further improved. Applied to an image database, the proposed algorithm shows promising retrieval performance based on texture features.
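The block-wise scheme lends itself to a short sketch: per-block wavelet-energy features, then single-link merging of blocks whose L2 feature distance falls below the threshold. The Haar wavelet and energy features are assumed choices, not necessarily the authors' feature set.

```python
import numpy as np
import pywt  # PyWavelets

def block_features(block, level=2):
    """Wavelet subband energies of one image block (assumed Haar features)."""
    coeffs = pywt.wavedec2(block, "haar", level=level)
    feats = [np.mean(coeffs[0] ** 2)]
    for cH, cV, cD in coeffs[1:]:
        feats += [np.mean(cH ** 2), np.mean(cV ** 2), np.mean(cD ** 2)]
    return np.array(feats)

def segment_blocks(img, bs=32, thresh=50.0):
    """Label each bs x bs block; same label = same textured region
    whenever the L2 feature distance is below `thresh`."""
    H, W = img.shape
    grid = [(i, j) for i in range(0, H - bs + 1, bs)
                   for j in range(0, W - bs + 1, bs)]
    feats = [block_features(img[i:i+bs, j:j+bs]) for i, j in grid]
    labels, classes = [], []                 # class prototype features
    for f in feats:
        for k, c in enumerate(classes):
            if np.linalg.norm(f - c) < thresh:
                labels.append(k)
                break
        else:
            classes.append(f)
            labels.append(len(classes) - 1)
    return dict(zip(grid, labels))
```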
Recursively weighting pixel domain intra prediction on H.264
Author(s):
Hideaki Kimata;
Masaki Kitahara;
Yoshiyuki Yashima
Show Abstract
Intra coding for lossy block-based transform video coding and still-picture coding has been studied extensively. In H.264, pixel-domain prediction is applied, where all pixel values in a block are predicted from decoded images in surrounding blocks. Pixel-domain prediction has some advantages over DCT-domain prediction; for one, the residual data at block boundaries become smaller. On the other hand, in pixel-based prediction schemes for lossless coding, each pixel value is generally predicted from surrounding pixels. In Multiplicative Autoregressive Models (MAR) or JPEG-LS, each pixel is predicted from some neighboring pixels. This pixel-based prediction scheme is more effective at reducing prediction error than block-based prediction. In this paper, a new intra prediction method, Recursively Weighting pixel domain Intra Prediction (RWIP), is proposed for block-based transform coding. RWIP applies an approach similar to pixel-based prediction schemes in order to reduce prediction error beyond the conventional block-based prediction scheme, especially for blurred images or images with complicated directional edges. This paper also demonstrates the efficiency of RWIP over the normal intra prediction of H.264.
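For reference, H.264's pixel-domain intra prediction fills a block from already-decoded neighbours; a minimal sketch of the standard vertical, horizontal and DC 4x4 modes follows (the RWIP extension itself is not reproduced here).

```python
import numpy as np

def intra_4x4(top, left, mode):
    """H.264-style 4x4 intra prediction from reconstructed neighbours.
    top  : 4 pixels above the block; left : 4 pixels to its left."""
    if mode == "vertical":    # copy the row above down through the block
        return np.tile(top, (4, 1))
    if mode == "horizontal":  # copy the left column across the block
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == "dc":          # flat prediction at the neighbours' mean
        return np.full((4, 4), (top.sum() + left.sum() + 4) // 8)
    raise ValueError(mode)
```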
Fractal coding of color images using the correlation between Y and C components
Author(s):
Yuki Nakane;
Eiji Nakamura;
Katsutoshi Sawada
Show Abstract
This paper presents an efficient fractal coding scheme for color images and demonstrates its experimental results. The proposed fractal coding scheme utilizes the correlation between a luminance component (Y) and two color difference components (Cr and Cb) of an input color image. The Y, Cr and Cb components are first decomposed to low and high frequency sub-band images. Fractal block coding is performed only on the lowest frequency sub-band images of Y, Cr
and Cb. The other high-frequency sub-band images are encoded by vector quantization (VQ). In the fractal coding process for Y, each range block is encoded by a set of contractive affine transformations of its corresponding domain block. For Cr and Cb, on the other hand, only the range block average values are coded; the other fractal code data of the corresponding range block of Y are applied to Cr and Cb as well. Computer simulation results show that the decoded color images obtained by the proposed scheme give higher SNR values and better image quality than the conventional fractal coding scheme and JPEG.
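The range-domain mapping at the core of the Y-component fractal coding fits a contractive affine map R ≈ s·D + o in the least-squares sense; the sketch below is a generic rendering of that single step, not the authors' full codec. For Cr and Cb, only the offset term would be coded, with the remaining parameters reused from Y.

```python
import numpy as np

def fit_affine(range_blk, domain_blk, s_max=0.9):
    """Least-squares contractive fit R ~ s*D + o for one range/domain pair."""
    d = domain_blk.ravel().astype(np.float64)
    r = range_blk.ravel().astype(np.float64)
    var = d.var()
    s = 0.0 if var == 0 else np.clip(np.cov(d, r, bias=True)[0, 1] / var,
                                     -s_max, s_max)  # keep the map contractive
    o = r.mean() - s * d.mean()
    err = np.mean((s * d + o - r) ** 2)
    return s, o, err  # pick the domain block minimizing err
```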
Comparison of texture coding methods in a unified motion prediction/compensation video compression algorithm
Author(s):
Julien Reichel;
Francesco Ziliani
Show Abstract
Efficient video compression is based on three principles: reduction
of the temporal, spatial and statistical redundancy present in the
video signal. Most video compression algorithms (MPEGs, H.26x, ...) use the same principle to reduce the spatial redundancy, i.e. an 8x8 DCT transform. However, other transforms are capable of similar results, for instance the integer 4x4 DCT and wavelet transforms. This article compares several transforms within the same global compression scheme, i.e. the same motion estimation and compensation strategy and the same entropy coding. Moreover, the tests are conducted on sequences of different natures, such as sport, video surveillance and movies. This allows a global performance comparison of these transforms across many different scenarios.
Study of mutual scan-line fractal coding
Author(s):
Para Limmaneepraserth;
Ruttikorn Varakulsiripunth
Show Abstract
In this paper, we concentrate on mutual scan-line fractal coding (MSFC), a novel fractal coding approach, and study its performance. The purpose of this study is to examine the advantages and disadvantages of MSFC. The main idea of this approach is to make use of the similarity of each image scan-line pair. In this method, the domain pool is designed using the equivalence property of two complete metric spaces. Then the domain-range mapping, which is at the heart of the iterated contractive transformation, is implemented with a pair of contractive affine transformations that generate mutual fractal codes. This method keeps the domain pools small, thus speeding up encoding. The experimental results illustrate that the proposed approach can reduce both encoding and decoding times while maintaining coding efficiency at bit rates in the range of 1.2-2.5 bpp, without any entropy encoding at the end of the scheme.
Apple Quicktime vs. Microsoft Windows Media: an objective comparison of video encoding quality
Author(s):
Arman Aygen;
Kambiz Homayounfar
Show Abstract
This paper presents a methodology and a framework for quality assessment of compressed video. Usage of the framework is illustrated by taking five source video clips, compressing them with two commercially available video encoders, and calculating four perceptual metrics for each encoded clip. The perceptual metrics used to characterize video quality were: jerkiness, blur, block distortion, and an objective overall quality index. The paper provides over forty curves to show the shape and range of the perceptual metrics.
Sports video categorizing method using camera motion parameters
Author(s):
Shinichi Takagi;
Shinobu Hattori;
Kazumasa Yokoyama;
Akihisa Kodate;
Hideyoshi Tominaga
Show Abstract
In this paper, we propose a content-based video categorization method for broadcast sports videos using camera motion parameters. We define and introduce two new features in the proposed method: the "camera motion extraction ratio" and the "camera motion transition". Camera motion parameters in the video sequence carry very significant information for the categorization of broadcast sports video, because in most sports videos the camera motion closely follows the action, which in turn obeys rules specific to each type of sport. Based on these characteristics, we design a sports video categorization algorithm for identifying six major sports types. In our algorithm, the features automatically extracted from videos are analysed statistically. The experimental results show a clear tendency and demonstrate the applicability of the proposed method for sports genre identification.
Episode image analysis for wearable daily life recording system
Author(s):
Masashi Toda;
Takeshi Nagasaki;
Toshio Kawashima
Show Abstract
We are developing a wearable recording system that can automatically capture images from the user's viewpoint throughout everyday life. The user later refers to the images acquired with this system, and these images support the user's activities, such as remembering and reflection. The number of acquired images becomes so huge that it is difficult for the user to browse them, so a mechanism for effective viewing, such as summarization, is very important. In this research, we describe the concept of a summarization mechanism for everyday-life images.
Key frames extraction in athletic video
Author(s):
Giuseppe Caccia;
Rosa Lancini;
Stefano Russo
Show Abstract
In this paper, we present an effective framework for feature extraction from athletic sport sequences. We analyze both forward and backward motion vectors from MPEG-2 video sequences for camera movement detection. Features such as the beginning and the end of the race and the type of competition are strictly connected to the camera motion. Our algorithm is able to extract the frame number of the investigated feature with very high accuracy.
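A coarse illustration of camera-movement labelling from decoded motion vectors is sketched below. The translation/zoom decomposition and the thresholds are assumptions for illustration, not the authors' detector.

```python
import numpy as np

def classify_camera_motion(mv, pan_thresh=1.0, zoom_thresh=0.5):
    """Coarse camera-motion label from a per-macroblock motion-vector field.
    mv : (H, W, 2) array of (dx, dy) vectors decoded from the MPEG stream."""
    H, W = mv.shape[:2]
    mean = mv.reshape(-1, 2).mean(axis=0)        # global translation component
    ys, xs = np.mgrid[0:H, 0:W]
    rx, ry = xs - (W - 1) / 2, ys - (H - 1) / 2
    norm = np.hypot(rx, ry) + 1e-9
    # average radial component: vectors pointing away from centre suggest zoom
    radial = ((rx * mv[..., 0] + ry * mv[..., 1]) / norm).mean()
    if abs(radial) > zoom_thresh:
        return "zoom-in" if radial > 0 else "zoom-out"
    if np.linalg.norm(mean) > pan_thresh:
        return "pan/tilt"
    return "static"
```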
On the detection of pornographic digital images
Author(s):
Raimondo Schettini;
Carla Brambilla;
Claudio Cusano;
Gianluigi Ciocca
Show Abstract
The paper addresses the problem of distinguishing between pornographic and non-pornographic photographs, for the design of semantic filters for the web. Both decision forests of trees built according to the CART (Classification And Regression Trees) methodology and Support Vector Machines (SVM) have been used to perform the classification. The photographs are described by a set of low-level features that can be computed automatically from simple gray-level and color representations of the image. The database used in our experiments contained 1500 photographs, 750 of which were labeled as pornographic on the basis of the independent judgement of several viewers.
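A minimal sketch of the SVM branch of such a classifier, under loudly assumed features (per-channel statistics plus a common RGB skin-tone heuristic, which is not the paper's feature set):

```python
import numpy as np
from sklearn.svm import SVC

def color_features(img):
    """Simple low-level features: per-channel means, stds, and a skin-tone
    pixel ratio (an assumed heuristic, not the paper's exact feature set)."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)
    return np.array([r.mean(), g.mean(), b.mean(),
                     r.std(), g.std(), b.std(), skin.mean()])

def train(images, labels):
    """Fit an RBF-kernel SVM on labeled photographs (labels: 0/1)."""
    X = np.stack([color_features(im) for im in images])
    return SVC(kernel="rbf", C=1.0).fit(X, labels)
```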
Two new algorithms for fast computation of Legendre moments
Author(s):
Lei Qin;
Huazhong Shu;
Fenghua Jin;
Christine Toumoulin;
Limin Luo
Show Abstract
Orthogonal moments have been used successfully in the field of pattern recognition and image analysis. However, due to the complexity of their calculation, the problem of fast computation of orthogonal moments has not yet been well solved. This paper presents two fast and efficient algorithms for two-dimensional (2D) Legendre moment computation. They are based on a block representation of the image and use cumulative and integral methods, respectively. Results on 2D binary images show that these algorithms decrease the computational complexity significantly.
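For orientation, the (p, q) Legendre moment of an N x M image f, with coordinates mapped to [-1, 1], is lambda_pq = ((2p+1)(2q+1)/4) * sum_x sum_y P_p(x) P_q(y) f(x, y). The direct implementation below is the brute-force baseline that the paper's block-based algorithms accelerate; the discretization constant follows one common convention.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def legendre_moment(img, p, q):
    """Direct 2-D Legendre moment; the baseline the paper speeds up."""
    N, M = img.shape
    x = -1 + 2 * np.arange(N) / (N - 1)     # pixel coords mapped to [-1, 1]
    y = -1 + 2 * np.arange(M) / (M - 1)
    cp = np.zeros(p + 1); cp[p] = 1         # coefficient vector selecting P_p
    cq = np.zeros(q + 1); cq[q] = 1         # coefficient vector selecting P_q
    Pp, Pq = legval(x, cp), legval(y, cq)
    norm = (2 * p + 1) * (2 * q + 1) / 4.0
    # separable double sum, with dx*dy as the discretization constant
    return norm * (Pp @ img @ Pq) * (2 / (N - 1)) * (2 / (M - 1))
```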
Detection and repair of defects in range-and-color image data observed with a laser range scanner
Author(s):
Takahiro Saito;
Takashi Komatsu;
Shin-ichi Sunaga;
Masayuki Hashiguchi
Show Abstract
Some laser range scanners measure range and color data simultaneously and are used to acquire the 3D structure of outdoor scenery. However, a laser range scanner cannot provide perfect information about target objects such as buildings, and various factors cause defects in the range data. We present a defect detection scheme based on region segmentation of the observed range and color data, and apply a time-evolution method to the repair of defective range data. For defect detection, we perform segmentation to divide the observed data into several regions corresponding to buildings, the ground and so on, and use the segmentation results to determine defect regions. Given the defect regions, their range data are repaired from observed data in their neighborhoods. Reformulating the transportation-based inpainting algorithm, previously developed for defect repair in intensity images, we construct a new defect-repair method that applies interleaved sequential updates, composed of transportation-based inpainting and projection of the data onto the viewing direction of each range sample, to 3D point data converted from the observed range data. Performance evaluations using artificially damaged test range data demonstrate that our repair method outperforms existing repair methods both in quantitative performance and in subjective quality.
Shape contour description using modified radius-vector function
Author(s):
Seung-Jin Park;
Hae-Jin Park;
Jong-An Park
Show Abstract
Shape is one of the salient features of visual content and can be used in visual information retrieval. The radius-vector function is available for star-shaped objects. A modified radius-vector function that handles both star-shaped and non-star-shaped objects is presented. The center of gravity is selected as the reference point, while the principal axis is selected as the reference line. The corner points are marked as nodes, and the normalized distances from the reference point to the corner nodes are stored. The Euclidean distance between the center-to-corner distances of a test object and those of the objects in the database is used for shape matching.
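A minimal sketch of the descriptor and matching steps, with two simplifications labeled as such: corner points are assumed given, and sorting the distances stands in for the principal-axis ordering described above.

```python
import numpy as np

def shape_descriptor(corners):
    """Normalized centroid-to-corner distances (sorted for a stable order).
    corners : (N, 2) array of corner-point coordinates."""
    centroid = corners.mean(axis=0)              # center of gravity
    d = np.linalg.norm(corners - centroid, axis=1)
    return np.sort(d) / d.max()                  # scale-invariant descriptor

def match(query, database):
    """Index of the best-matching stored shape (equal-length descriptors)."""
    dists = [np.linalg.norm(query - s) for s in database]
    return int(np.argmin(dists))
```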
On edge detector using local histogram analysis
Author(s):
Abdelmagid Khalil;
Amar Aggoun;
Ahmed El-Mabrouk
Show Abstract
The objective of this paper is to present a novel edge extraction algorithm based on differentiation of the local histograms of small non-overlapping blocks of the output of the first derivative of a narrow 2D Gaussian filter. It is shown that the proposed edge extraction algorithm provides the best trade-off between noise rejection and accurate edge localisation and resolution. The proposed edge detection algorithm starts by convolving the image with a narrow 2D Gaussian smoothing filter to minimise edge displacement and increase resolution and detectability. The local histograms of small non-overlapping blocks of the edge map are then processed to perform an additional noise rejection operation and to determine the local thresholds automatically. The results obtained with the proposed edge detector are compared to those of the Canny edge detector.
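A simplified sketch of the pipeline: narrow Gaussian smoothing, first-derivative (gradient) magnitude, then a per-block local threshold derived from block statistics. The percentile rule is a stand-in for the authors' local-histogram differentiation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def local_histogram_edges(img, sigma=1.0, bs=16, pct=90):
    """Edge map via narrow Gaussian derivative + per-block local thresholds."""
    sm = gaussian_filter(img.astype(np.float64), sigma)
    grad = np.hypot(sobel(sm, axis=0), sobel(sm, axis=1))   # first derivative
    edges = np.zeros_like(grad, dtype=bool)
    for i in range(0, grad.shape[0], bs):
        for j in range(0, grad.shape[1], bs):
            blk = grad[i:i+bs, j:j+bs]
            t = np.percentile(blk, pct)     # local threshold from block stats
            edges[i:i+bs, j:j+bs] = blk > t
    return edges
```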
Detection of salient curvature features on surfaces and their applications
Author(s):
Zhongwei Yin;
Xijuan Liu;
Shouwei Jiang
Show Abstract
In this paper, we present an approach for the detection of perceptually salient curvature extrema on surfaces approximated by dense triangle meshes. Based on the properties of B-splines and NURBS, we use data points instead of scattered (control) points, together with their principal and mean curvatures, and explore an "area degenerating" method to detect the salient curvature features. This paper demonstrates the advantages of the proposed method by comparison with existing algorithms.