Proceedings Volume 4067

Visual Communications and Image Processing 2000

King N. Ngan, Thomas Sikora, Ming-Ting Sun
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 30 May 2000
Contents: 29 Sessions, 172 Papers, 0 Presentations
Conference: Visual Communications and Image Processing 2000
Volume Number: 4067

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Special Session: Image-Based Rendering: Processing, Compression, and Rendering
  • Video Coding I
  • Image Sequence Analysis I
  • Wireless Video
  • Video Coding II
  • Image Sequence Analysis II
  • Special Session: Internet Video
  • Poster Session II: Segmentation, Tracking, and Feature Extraction
  • Special Session: Internet Video
  • Object-Based Coding
  • Stereo/Multiview Imaging
  • Wireless/Internet Video
  • Image Coding I
  • Special Session: Face Segmentation and Its Applications
  • Special Session: Testing and Quality Metrics for Digital Video Services
  • Image Coding II
  • Segmentation and Tracking I
  • VLSI I
  • Content-Based Coding
  • Segmentation and Tracking II
  • Special Session: Image Analysis and Understanding
  • Motion Estimation I
  • Wavelets I and II
  • Special Session: Error-Resilient Image and Video
  • Motion Estimation II
  • Wavelets I and II
  • Image Coding I
  • Wavelets I and II
  • VLSI II
  • Synthetic Image/Video Coding
  • Application Systems
  • Poster Session I: Image and Video Coding
  • Poster Session II: Segmentation, Tracking, and Feature Extraction
  • Poster Session III: Image Processing
  • Special Session: Testing and Quality Metrics for Digital Video Services
Special Session: Image-Based Rendering: Processing, Compression, and Rendering
Review of image-based rendering techniques
Harry Shum, Sing Bing Kang
In this paper, we survey the techniques for image-based rendering. Unlike traditional 3D computer graphics, in which the 3D geometry of the scene is known, image-based rendering techniques render novel views directly from input images. Previous image-based rendering techniques can be classified into three categories according to how much geometric information is used: rendering without geometry, rendering with implicit geometry (i.e., correspondence), and rendering with explicit geometry (either approximate or accurate). We discuss the characteristics of these categories and their representative methods. The continuum between images and geometry used in image-based rendering techniques suggests that image-based rendering and traditional 3D graphics can be united in a joint image-and-geometry space.
Model-based coding of multiviewpoint imagery
Marcus Magnor, Bernd Girod
A compression scheme for calibrated images depicting a static scene from arbitrary viewpoints is presented. 3D scene geometry is reconstructed, and view-dependent texture maps are generated from all images. Texture is wavelet-coded using the SPIHT coding scheme extended to 4D, exploiting correlations within as well as between texture maps. During decoding, all texture maps are simultaneously and progressively reconstructed. The coder provides 3D scene geometry and multiple texture maps, enabling the use of graphics hardware to accelerate the rendering process. Three image sets acquired from real-world objects are used to evaluate the model-based coding scheme. Coding efficiency is shown for geometry approximations of different accuracy.
Real-time stereo rendering of concentric mosaics with linear interpolation
Minsheng Wu, Honghui Sun, Harry Shum
In this paper, we introduce an efficient image-based rendering system for concentric mosaics, which can render stereo image pairs of real scenes in real time with improved image quality. We focus mainly on related user-interface issues and on three interpolation algorithms: point sampling, and linear interpolation with infinite or with constant depth.
Rendering of 3D-wavelet-compressed concentric mosaic scenery with progressive inverse wavelet synthesis (PIWS)
Yunnan Wu, Lin Luo, Jin Li, et al.
The concentric mosaics offer a quick solution to the construction and navigation of a virtual environment. To reduce the vast data amount of the concentric mosaics, a compression scheme based on a 3D wavelet transform was proposed in a previous paper. In this work, we investigate an efficient implementation of the renderer. It is preferable not to expand the compressed bitstream as a whole, so that the memory consumption of the renderer can be reduced. Instead, only the data necessary to render the current view are accessed and decoded. The progressive inverse wavelet synthesis (PIWS) algorithm is proposed to provide this random data access and to reduce the calculation for the data access requests to a minimum. A mixed cache is used in PIWS, where entropy-decoded wavelet coefficients, intermediate lifting results, and fully synthesized pixels are all stored in the same memory units, owing to the in-place calculation property of the lifting implementation. PIWS operates as a finite state machine, where each memory unit carries a state indicating what type of content is currently stored. The computational saving achieved by PIWS is demonstrated with extensive experimental results.
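The in-place property of lifting that makes the mixed cache possible can be seen in a minimal sketch of the reversible 5/3 lifting steps (an illustration only, not the paper's PIWS implementation): every predict and update step overwrites a single array slot, so raw coefficients, intermediate results, and finished samples can share one buffer.

```python
def lift_53_forward(x):
    """In-place forward 5/3 lifting on a list of even length.
    Odd slots become high-pass, even slots low-pass coefficients."""
    n = len(x)
    for i in range(1, n, 2):                 # predict: overwrite odd slots
        right = x[i + 1] if i + 1 < n else x[i - 1]   # mirrored boundary
        x[i] -= (x[i - 1] + right) // 2
    for i in range(0, n, 2):                 # update: overwrite even slots
        left = x[i - 1] if i - 1 >= 0 else x[i + 1]
        x[i] += (left + x[i + 1] + 2) // 4
    return x

def lift_53_inverse(x):
    """Exact inverse: undo the update, then the prediction, in place."""
    n = len(x)
    for i in range(0, n, 2):
        left = x[i - 1] if i - 1 >= 0 else x[i + 1]
        x[i] -= (left + x[i + 1] + 2) // 4
    for i in range(1, n, 2):
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += (x[i - 1] + right) // 2
    return x
```

Because each step is an in-place increment of one slot, a renderer can decode a coefficient, lift it, and replace it with the synthesized pixel in the same cache entry.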
Compression and rendering of concentric mosaics with reference block codec (RBC)
Cha Zhang, Jin Li
Concentric mosaics have the ability to quickly capture a complete 3D view of a realistic environment and to enable a user to wander freely in the environment. However, the data amount of the concentric mosaics is huge. In this paper, we propose an algorithm to compress the concentric mosaic image array through motion compensation and residue coding, which we call the reference block codec (RBC). A two-level index table is embedded in the compressed bitstream for random access. During rendering, the compressed concentric mosaic scene is never fully expanded. Instead, only the contents necessary to render the current view are decoded in real time. We call this rendering scheme just-in-time rendering. Four decoder caches are implemented to speed up the rendering.
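The role of a two-level index table for random access can be sketched as follows, with zlib standing in for the actual motion-compensated block codec (the per-block compression and (offset, length) layout are illustrative assumptions, not the paper's bitstream format):

```python
import zlib

def build_stream(frames):
    """Compress each block independently and record a two-level index:
    index[f][b] = (offset, length) of block b of frame f in the stream."""
    stream, index = bytearray(), []
    for blocks in frames:
        row = []
        for block in blocks:
            comp = zlib.compress(block)
            row.append((len(stream), len(comp)))
            stream += comp
        index.append(row)
    return bytes(stream), index

def decode_block(stream, index, f, b):
    """Random access: decode only the requested block, never the whole scene."""
    off, length = index[f][b]
    return zlib.decompress(stream[off:off + length])
```

A just-in-time renderer calls `decode_block` only for the blocks covered by the current view, so memory use stays proportional to the view, not the scene.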
Video Coding I
Local statistics adaptive entropy coding method for the improvement of H.26L VLC coding
Kook-yeol Yoo, Jong Dae Kim, Byung-Sun Choi, et al.
In this paper, we propose an adaptive entropy coding method to improve the VLC coding efficiency of the H.26L TML-1 codec. First, we show that the VLC coding presented in TML-1 does not satisfy the sibling property of entropy coding. We then modify the coding method into a local-statistics-adaptive one that satisfies the property. The proposed method, based on local symbol statistics, dynamically changes the mapping between symbols and bit patterns in the VLC table according to the sibling property. Note that the codewords in the VLC table of the TML-1 codec are not changed. Since the changed mapping is also derived at the decoder side from the decoded symbols, the proposed VLC coding method does not require any overhead information. Simulation results show that the proposed method gives about 30% and 37% reductions in average bit rate for MB type and CBP information, respectively.
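The idea of re-mapping symbols to a fixed codeword table from locally measured statistics, with no side information, can be sketched as follows (a toy five-symbol prefix code, not the H.26L table; both sides recompute the same mapping from already-decoded symbols):

```python
CODEWORDS = ['0', '10', '110', '1110', '11110']   # fixed prefix-free table

def _ranking(counts):
    # symbols ordered by local frequency (ties broken by symbol index),
    # so the most frequent symbol currently gets the shortest codeword
    return sorted(range(len(counts)), key=lambda s: (-counts[s], s))

def encode(symbols, nsym=5):
    counts, bits = [0] * nsym, ''
    for s in symbols:
        order = _ranking(counts)
        bits += CODEWORDS[order.index(s)]
        counts[s] += 1                    # decoder mirrors this update
    return bits

def decode(bits, n, nsym=5):
    counts, out, pos = [0] * nsym, [], 0
    for _ in range(n):
        order = _ranking(counts)
        for rank, cw in enumerate(CODEWORDS):
            if bits.startswith(cw, pos):  # prefix-free: first match wins
                s = order[rank]
                out.append(s)
                counts[s] += 1
                pos += len(cw)
                break
    return out
```

Because the mapping is derived from decoded history on both sides, no table update ever needs to be transmitted.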
Improved single VO rate control for constant bit-rate applications using MPEG-4
Thomas Meier, King N. Ngan
This paper considers single video object rate control for MPEG-4 and presents a new rate control algorithm based on the quadratic rate-distortion model. The major innovations are a novel constraint for the least mean square estimation of the model parameters of the rate-distortion function, a new measure of encoding complexity, and an efficient frame-skipping strategy. An extension of the proposed rate control scheme to sequences with multiple video objects is possible. Experiments carried out on different video sequences showed an increase in average PSNR of 1.2 dB and, more significantly, a reduction in the number of skipped frames of about 25% compared to the rate control in the MPEG-4 video verification model. The likelihood of buffer overflows was also greatly reduced with our proposed technique.
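The quadratic model underlying this family of rate controllers writes the texture bits as R = X1·S/Q + X2·S/Q², where S is an encoding-complexity measure (such as MAD) and Q the quantizer; the quantizer for a given bit budget then follows from the positive root. A sketch (parameter names are illustrative, not the paper's notation):

```python
import math

def quantizer_from_quadratic_model(r_target, mad, x1, x2, qmin=1.0, qmax=31.0):
    """Solve r_target = x1*mad/Q + x2*mad/Q**2 for Q.
    Substituting u = 1/Q gives x2*mad*u**2 + x1*mad*u - r_target = 0."""
    a, b, c = x2 * mad, x1 * mad, -r_target
    if a == 0:
        u = r_target / b                 # model degenerates to first order
    else:
        u = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # positive root
    q = 1.0 / u
    return min(max(q, qmin), qmax)       # clamp to the legal quantizer range
```

In a real controller, x1 and x2 would be re-estimated (e.g., by least squares over past frames) as coding proceeds.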
Video transcoding for multiple clients
Jeongnam Youn, Jun Xin, Ming-Ting Sun, et al.
Most previous research efforts on video transcoding have focused on changing the bit rate of one pre-encoded bit-stream to another. However, in many applications (e.g., streaming video over a heterogeneous network), it may be necessary to transcode a pre-encoded video bit-stream into multiple bit-streams with different bit rates and features to support multiple clients with different requirements. In this paper, we discuss the case of point-to-multipoint transcoding. We compare two transcoder architectures in terms of processing speed for H.263 and MPEG-2 transcoding. We show that for point-to-multipoint transcoding, a cascaded video transcoder is more efficient, since some parts of the transcoder can be shared by the multiple clients.
Syntax-constrained rate-distortion optimization for DCT-based image encoding methods
Guobin Shen, Alexis Michael Tourapis, Ming Lei Liou
In this paper, we present a novel and effective optimization method to achieve a better trade-off between rate and distortion. The optimization is done at the encoder side alone and is transparent to the decoder; that is, the quantized DCT coefficient set is optimized with full syntax compliance and decoder compatibility. The proposed method determines both the positions and the retained values of quantized DCT coefficients according to the rate-distortion performance measured by the associated Lagrangian cost. A fast dynamic programming technique is developed to relieve the computational burden. All experiments show that the proposed method consistently outperforms other existing optimization methods.
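The per-coefficient decision driven by a Lagrangian cost J = D + λR can be sketched with a deliberately crude rate model (the constant `bits_per_nonzero` stands in for a real VLC rate table; the paper's dynamic programming over coefficient positions is not shown):

```python
def rd_threshold(coeffs, qstep, lam, bits_per_nonzero=6):
    """Zero a quantized coefficient whenever the distortion it would save
    is worth less than the rate it costs (minimizing J = D + lam * R)."""
    out = []
    for c in coeffs:
        level = round(c / qstep)
        if level != 0:
            d_keep = (c - level * qstep) ** 2   # distortion if kept
            d_zero = c * c                      # distortion if zeroed
            if d_zero - d_keep < lam * bits_per_nonzero:
                level = 0                       # zeroing is the cheaper choice
        out.append(level)
    return out
```

With λ = 0 the decision reduces to plain quantization; as λ grows, more coefficients are dropped in exchange for rate.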
Motion vector certainty reduces bit rate in backward motion estimation video coding
Astrid Lundmark, Haibo Li, Robert Forchheimer
Wavelet video coding using motion vectors estimated simultaneously at the transmitter and receiver from the transmitted image data has been reported to have good compression capability, comparable to the non-scalable version of H.263. When scalability is required, the comparison turns even more in favor of the wavelet coding scheme. This paper shows that it is possible to reduce the bit rate further in backward motion estimation schemes by using the certainty of each estimated motion vector. We report a lowering in bit rate of about 20% obtained by using the motion vector certainty as background information in the entropy coding/decoding process. We also propose a low-complexity algorithm that does not require motion estimation/compensation but uses the motion vector certainty.
Image Sequence Analysis I
Efficient detection of eye movements in video image sequences
Raimund Lakmann
A system for automatic detection of eye movements in medical applications is presented. It is designed to register exactly the dynamic motions of human eyes. In principle, the system runs independently of specific applications and may be used for different medical diagnoses. The method for motion analysis is applied to video image sequences of the retina recorded by a scanning laser ophthalmoscope (SLO). A modified block-matching algorithm has been developed for tracking the ocular fundus in the SLO sequences. The image processing strategies implemented provide high reliability of the motion detection, which is performed off-line. Several modifications have been developed in order to take into account the video quality of the SLO sequences. Additionally, some improvements have been implemented to increase the speed of the software. The system includes a phase of automatic motion detection followed by a subsequent phase of manual quality control. In this latter phase, an efficient tool for error detection and correction guarantees high quality and security of movement detection, which is of great importance in clinical applications.
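A full-search variant of block matching, the core of such tracking, can be sketched as follows (the paper's modified algorithm adds SLO-specific refinements not shown here):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized 2D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block_match(ref, cur, top, left, size, radius):
    """Full search: find the displacement (dy, dx) within +/-radius that
    minimizes SAD between the current block and the reference frame."""
    block = [row[left:left + size] for row in cur[top:top + size]]
    best_mv, best_cost = None, float('inf')
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y and y + size <= len(ref) and 0 <= x and x + size <= len(ref[0]):
                cand = [row[x:x + size] for row in ref[y:y + size]]
                cost = sad(block, cand)
                if cost < best_cost:
                    best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

Practical trackers replace the exhaustive search with faster search patterns, but the matching criterion is the same.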
Video retrieval based on the object's motion trajectory
Kyu-Won Lee, Woong-Shik You, Jinwoong Kim
This paper presents an efficient way of indexing and searching based on object-specific features for video retrieval at different semantic levels. By tracking individual objects in segmented data, we generate motion trajectories from the moving trails of objects and model them using polynomial curve fitting. The trajectory model is used as an indexing key for accessing each object at the semantic level. The proposed search system supports various types of queries, including query-by-example, query-by-sketch, and queries on weighting parameters for event-based retrieval. When retrieving the video clips of interest, the system returns the best-matched events in order of similarity. In addition, we implement a temporal event graph for directly accessing and browsing a specific event in the video sequence.
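Fitting a polynomial model to a tracked trajectory can be sketched with an ordinary least-squares quadratic fit (the degree and the normal-equation solver here are illustrative choices, not the paper's exact method):

```python
def fit_quadratic(ts, xs):
    """Least-squares fit of x(t) = a*t**2 + b*t + c via the 3x3 normal
    equations, solved with Gaussian elimination (partial pivoting)."""
    # build A^T A | A^T x for the design matrix rows (t^2, t, 1)
    m = [[0.0] * 4 for _ in range(3)]
    for t, x in zip(ts, xs):
        basis = (t * t, t, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += basis[i] * basis[j]
            m[i][3] += basis[i] * x
    # forward elimination on the augmented 3x4 system
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for j in range(col, 4):
                m[r][j] -= f * m[col][j]
    # back substitution
    coef = [0.0] * 3
    for i in (2, 1, 0):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j]
                                 for j in range(i + 1, 3))) / m[i][i]
    return coef   # [a, b, c]
```

The fitted coefficients form a compact indexing key: trajectories can be compared by the distance between their coefficient vectors rather than point by point.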
Shot modeling and clustering in MPEG-compressed video
Jesus Bescos, Francisco Lopez
Early algorithms for detecting camera shot transitions were based on calculating the difference of a single parameter between two consecutive frames. For many applications, only MPEG video is available, and in this situation, full decompression before applying pixel-based algorithms may not be strictly required to achieve reasonable results: an alternative to the pixel-based approaches is to work with frame differences extracted directly from the MPEG-compressed sequence. This paper intends to prove that shot detection in this domain can be as precise as in the uncompressed one. After a discussion of the parameters of the MPEG stream best suited to calculating an inter-frame difference, we present a distance-independent cut detection method, based on cut modeling and clustering, which obtains results similar to those obtained in the uncompressed domain.
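A distance-independent decision in the spirit of cut clustering can be sketched by splitting the inter-frame differences into two clusters and declaring the high cluster to be cuts (a simplified 1D two-means, not the paper's cut model):

```python
def detect_cuts(diffs, iters=20):
    """Classify inter-frame differences into two clusters (1D 2-means);
    frames in the high cluster are declared shot cuts, so no fixed
    threshold needs to be hand-tuned for each sequence."""
    lo, hi = min(diffs), max(diffs)      # initial cluster centers
    if lo == hi:
        return []
    for _ in range(iters):
        split = [[], []]
        for d in diffs:
            split[abs(d - hi) < abs(d - lo)].append(d)   # 1 = high cluster
        if not split[0] or not split[1]:
            break
        lo = sum(split[0]) / len(split[0])
        hi = sum(split[1]) / len(split[1])
    thr = (lo + hi) / 2.0
    return [i for i, d in enumerate(diffs) if d > thr]
```

The same clustering works whatever difference measure is extracted from the compressed stream, which is what makes the decision distance-independent.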
Automatic thresholding for change detection in digital video
Nariman Habili, Alireza Moini, Neil Burgess
In this paper we propose a thresholding technique for change detection in digital video. The technique assumes that the difference image generated from two frames of a video sequence has a trimodal Gaussian distribution, and a computationally efficient fitting criterion is employed to find the best match between the data and the model. Results show that the technique is capable of detecting true motion in video images.
Statistical approach to shot-boundary detection in an MPEG-2-compressed video sequence
Taehwan Shin, JaeGon Kim, Jinwoong Kim, et al.
In this paper, we propose an efficient shot boundary detection algorithm for direct processing of MPEG-2 video sequences. The proposed algorithm utilizes the hierarchical structure of the compressed bitstreams and the characteristics of the coded parameters, for example, picture coding type and macroblock coding mode, thus greatly reducing the computational requirement compared to pixel-domain processing with full decompression. The occurrence of a shot boundary is checked first at the sub-GOP level, and if the result is affirmative it is checked again in each picture. And, to solve the problem of selecting appropriate threshold values, we use hypothesis tests on statistical characteristics of the coded parameters, for example, the ratio of intra-coded to inter-coded macroblocks, the ratio of backward to bi-directional macroblocks, and the ratio of forward to bi-directional macroblocks.
Wireless Video
Unequal error protection for MPEG-2 video transmission over frequency-selective Rayleigh fading channels
Tran Anh Tuan, R. M.A.P. Rajatheva
In this paper, we investigate the effects of errors on different parts of each slice of coded pictures of each type in MPEG-2 video sequences. We conclude, based on the simulation results, that the bits at the beginning of each slice are far more important than those near the end of the slice. Thus, if the bits at the beginning of each slice are better protected while errors are allowed to corrupt other bits in the slice, better video quality can be achieved while minimizing the overhead due to forward error control. We then propose a new unequal error protection (UEP) scheme, which gives better video quality than the existing schemes. In our scheme, each slice is first split into a number of segments. The segments at the beginning of each slice are then better protected, while the other segments in the slice are less protected. Our simulation results show remarkable improvement in video quality (up to 4 dB in peak signal-to-noise ratio) compared to equal error protection and an existing UEP scheme, without bandwidth expansion. We conclude that the proposed UEP scheme is robust for MPEG-2 video transmission over frequency-selective Rayleigh fading channels.
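The segment-level UEP idea can be sketched with a repetition code protecting only the head of a slice (real schemes would use RCPC or similar channel codes rather than repetition, and the segment split is illustrative):

```python
def protect_slice(bits, head_len, rep=3):
    """UEP sketch: repetition-code the first head_len bits of a slice,
    leave the tail unprotected (no bandwidth spent on it)."""
    head, tail = bits[:head_len], bits[head_len:]
    return [b for b in head for _ in range(rep)] + tail

def recover_head(coded, head_len, rep=3):
    """Majority-vote decode of the protected slice header."""
    return [int(sum(coded[i * rep:(i + 1) * rep]) * 2 > rep)
            for i in range(head_len)]
```

Any single bit error per protected symbol is corrected, so the header (the most important part of the slice) survives channel conditions that corrupt the tail.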
Mixed video/data transmission over indoor wireless channels using unequal error protection method
Jae-Sung Roh, Ki-Sung Kang, Sung-Joon Cho
This paper provides a system-level description of a mixed video/data CDMA transmission model. We propose the combination of rate-compatible punctured convolutional (RCPC) codes at the wireless physical (PHY) layer and a CDMA RAKE receiver for mixed video/data traffic sent over indoor wireless channels. Through numerical analysis using a frequency-selective multipath Rayleigh fading channel model, it is shown that RCPC coding at the wireless PHY layer can be effective in providing a reliable wireless CDMA network. The proposed scheme combining the RCPC coder and the CDMA RAKE receiver provides a significant gain in BER and PER performance over multipath fading channels.
Error concealment for SNR scalable video coding in wireless communication
Andre Kaup
A mandatory requirement for future wireless multimedia communication is the availability of error-resilient media codecs. This work discusses error detection and concealment techniques for a layered scalable video decoder, based on the SNR scalability option of the H.263 standard. A concealment method is proposed which uses error frequency and error location statistics for efficient hiding of transmission errors. The designed decoder allows error-robust decoding with acceptable image quality even for highly corrupted video sequences, up to bit error rates of 10^-3, optimizing the image quality specifically for two-layer coded sequences. Simulation results show that scalable video coding outperforms single-layer coding under typical wireless conditions even if no priority or increased protection is applied to the base layer. The proposed two-layer concealment method yields a consistent improvement of up to 5 dB in image quality.
Independence of source and channel coding for progressive image and video data in mobile communications
Arjen van der Schaaf, Reginald L. Lagendijk
In this paper we assess the independence of the optimization of source and channel coding parameters. We propose a method to separate the source and channel coding optimization as much as possible while maintaining the possibility of joint optimization. We theoretically derive key parameters that must be passed through an interface between source and channel coding. This separation greatly reduces the complexity of the optimization problem and enhances the flexibility.
Video Coding II
Novel video coding scheme using adaptive mesh-based interpolation and node tracking
Eckhart Baum, Joachim Speidel
An alternative method to H.263 for encoding moving images at bit rates below 64 kbit/s is presented, using adaptive spatial subsampling, mesh-based interpolation, and node tracking; hence we call this new coding algorithm Mesh Based Interpolative Coding. Data compression is achieved by representing the image content by a number of non-equidistant sampling points (nodes) for the luminance and color difference signals. The decoder reconstructs the image by interpolating the transmitted sampling points. For a given number of nodes, which corresponds approximately to the total bit rate, the coder generates the node positions by minimizing the mean square error between the original picture and the interpolated picture. For moving images, each node is associated with a motion vector for node tracking. Simulation results of a complete encoder and decoder show that this method can provide lower bit rates than conventional schemes at a given picture quality for sequences with moderate movement.
Error-resilient video coding technique based on wavelet transform
In this paper, we propose a zerotree wavelet image/video coding technique with resilience to the transmission errors that typically occur on noisy channels. A key tool that we employ is a bit partitioning algorithm together with a bit reorganization algorithm, the EREC (Error Resilient Entropy Code). In order to take full advantage of the bit reorganization algorithm, the bit partitioning algorithm composes the data as separate code-blocks, although the zerotree wavelet coding algorithm is not a block-based compression technique. The bit reorganization algorithm requires very low redundancy for the sequential transmission of variable-length blocks and offers virtually guaranteed code and block synchronization. We present simulation results verifying the error resiliency of the proposed algorithm both for image coding using the wavelet transform and for video coding using the 3D wavelet transform. Experimental results show that the proposed coders outperform existing error-resilient coders for both noise-free and noisy channels. In addition, we confirm that the proposed algorithm is more error-resilient than previously reported error-resilient coders under various channel error conditions.
Embedded color coding for scalable 3D wavelet video compression
With the recent expansion of multimedia applications, video coding systems are expected to become highly scalable, that is, to allow partial decoding of the compressed bit-stream. Encoding techniques based on subband/wavelet decompositions offer a natural hierarchical representation for still pictures, and their high efficiency in progressively encoding images yields a scalable representation. The multiscale representation can be extended to video data by a 3D (or 2D+t) wavelet analysis, which includes the temporal dimension within the decomposition. Progressive encoding of video data represented by a 3D subband decomposition was recently proposed as an extension of image coding techniques exploiting hierarchical dependencies between wavelet coefficients. In most previous image and video coding techniques, compression is performed independently for the luminance and chrominance coordinates. In this paper we propose a new coding technique for the chrominance coefficients, which not only delivers a bit-stream with a higher degree of embedding, but also takes advantage of the dependencies between luminance and chrominance components to provide effective compression.
Low-bit-rate generalized quad-tree motion compensation algorithm and its optimal encoding schemes
Hanan Ahmed-Hosni Mahmoud, Magdy A. Bayoumi
The quad-tree-structured motion compensation technique effectively utilizes the motion content of a frame, as opposed to fixed-size block motion compensation. In this paper, we propose a novel quad-tree-structured region-wise motion compensation technique that divides a frame into equivalent triangular blocks using the quad-tree structure. Arbitrary partition shapes are achieved by allowing 4-to-1, 3-to-1, and 2-to-1 merging of sibling blocks having the same motion vector. We propose an optimal coding scheme and temporal predictive coding for the quad-tree. Simulation results show that our techniques reduce the bit rate by 40% compared to other methods.
Video codec incorporating block-based multihypothesis motion-compensated prediction
Multi-hypothesis prediction extends motion compensation with one prediction signal to the linear superposition of several motion-compensated prediction signals. These motion-compensated prediction signals are referenced by motion vectors and picture reference parameters. This paper proposes a state-of-the-art video codec based on ITU-T Recommendation H.263 that incorporates multi-hypothesis motion-compensated prediction. In contrast to B-frames, reference pictures are always previously decoded pictures. It is demonstrated that two hypotheses are efficient for practical video compression algorithms. In addition, it is shown that multi-hypothesis motion-compensated prediction and variable-block-size prediction can be combined to improve the overall coding gain. The encoder utilizes rate-constrained coder control, including rate-constrained multi-hypothesis motion estimation. The advanced 4-hypothesis codec improves coding efficiency by up to 1.8 dB when compared to the advanced prediction codec with ten reference frames for the set of investigated test sequences.
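The linear superposition at the heart of multi-hypothesis prediction can be sketched with equal weights over two motion-compensated blocks (the weights, block geometry, and single reference frame are simplified assumptions):

```python
def mc_block(ref, top, left, size):
    """Motion-compensated block: a size x size crop of the reference."""
    return [row[left:left + size] for row in ref[top:top + size]]

def multihypothesis_predict(ref, hypotheses, size):
    """Superpose several motion-compensated blocks with equal weights,
    the simplest case of the linear combination of hypotheses."""
    blocks = [mc_block(ref, t, l, size) for t, l in hypotheses]
    n = len(blocks)
    return [[sum(b[i][j] for b in blocks) / n for j in range(size)]
            for i in range(size)]
```

When the hypotheses' individual prediction errors are partly uncorrelated, averaging them reduces the residual energy, which is where the coding gain comes from.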
Image Sequence Analysis II
Analysis and tracking of human gait via a marker-free system
Elodie F. Calais, Louis Legrand
This paper presents a marker-free methodology for facilitating the analysis and tracking of human motion during gait. It consists of recognition and reconstruction of the legs of a walking human. In order to study the gait, a video system composed of three synchronized CCD cameras has been devised; it provides three different grey-level image sequences of the same scene. First of all, a model of the human, based on tapered superquadric curves, has been defined; the leg can be divided into three parts: the thigh, the calf, and the foot. The whole methodology follows this scheme: determination of the boundaries of the human in motion by successively using an optical flow process and a crest-line extraction algorithm, prediction of the location of the human body in 3D space, direct reconstruction of each part of the leg with a Least Median of Squares regression, and finally application of a spatial coherence process. The method described has been tested on synthetic images (mean error of about 1.2 mm and maximal error of about 5 mm along the coordinate axes) and on image sequences of a walking human.
MIME: a gesture-driven computer interface
Daniel Heckenberg, Brian Lovell
MIME (Mime Is Manual Expression) is a computationally efficient computer video system for recognizing hand gestures. The system is intended to replace the mouse interface on a standard personal computer to control application software in a more intuitive manner. The system is implemented in C code with no hardware acceleration and tracks hand motion at 30 fps on a standard PC. Using a simple 2D model of the human hand, MIME employs a highly efficient, single-pass algorithm to segment the hand and extract its model parameters from each frame in the video input. The hand is tracked from one frame to the next using a constant-acceleration Kalman filter. Tracking and feature extraction are remarkably fast and robust even when the hand is placed above difficult backdrops such as a typical cluttered desktop environment. Because of the efficient coding of the gesture tracking software, adequate CPU power remains to run standard application software such as web browsers and presentation software.
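A fixed-gain stand-in for a constant-acceleration Kalman filter is the g-h-k (alpha-beta-gamma) tracker; the sketch below uses standard critically damped gains and illustrates only the motion model, not MIME's actual filter:

```python
def abg_track(measurements, dt, alpha=0.875, beta=0.5625, gamma=0.0625):
    """Fixed-gain g-h-k tracker: the steady-state form of a
    constant-acceleration Kalman filter (critically damped gains for a
    fading-memory parameter of 0.5)."""
    x, v, a = measurements[0], 0.0, 0.0
    estimates = []
    for z in measurements:
        # predict with the constant-acceleration motion model
        xp = x + v * dt + 0.5 * a * dt * dt
        vp = v + a * dt
        r = z - xp                         # innovation (residual)
        x = xp + alpha * r
        v = vp + beta * r / dt
        a = a + 2.0 * gamma * r / (dt * dt)
        estimates.append(x)
    return estimates
```

Because the state includes acceleration, the tracker follows a constantly accelerating hand with no steady-state lag, which is the reason the constant-acceleration model is chosen over a constant-velocity one.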
Real-time face recognition using eigenfaces
Raphael Cendrillon, Brian Lovell
In recent years considerable progress has been made in the area of face recognition. Through the development of techniques like eigenfaces, computers can now compete favorably with humans in many face recognition tasks, particularly those in which large databases of faces must be searched. Whilst these methods perform extremely well under constrained conditions, the problem of face recognition under gross variations in expression, view, and lighting remains largely unsolved. This paper details the design of a real-time face recognition system aimed at operating in less constrained environments. The system is capable of single-scale recognition with an accuracy of 94% at 2 frames per second. A description of the system's performance and the issues and problems faced during its development is given.
Dim point target detection and tracking system in IR imagery
Temporal profiles of point-like dim targets and extended cloud pixels provide useful information for detecting the targets. Among the recent methods that utilize the temporal profile of a pixel to detect point targets are the Triple Temporal Filter (TTF) and the Continuous Wavelet Transform (CWT). TTF uses two damped sinusoidal filters and an exponential averaging filter, with six appropriate coefficients, to deal with different aspects of clutter. TTF is recursive and efficient in detecting point targets without applying any threshold techniques. The performance of CWT is comparable to TTF, but all the frames in a sequence need to be stored; it is therefore a computationally complex algorithm.
Special Session: Internet Video
Adaptive optimal intra-update for lossy video transmission
Klaus Werner Stuhlmueller, Niko Faerber, Bernd Girod
An adaptive algorithm to adjust the Intra update rate of a video encoder is presented. The PSNR at the decoder after lossy transmission is used as the optimization criterion. Based on a model of the video encoder and of error propagation at the video decoder, the optimal Intra rate can be calculated analytically for a given transmission channel. In this paper, it is shown how the parameters of these models can be measured adaptively during encoding. Thus, the Intra rate can be adjusted to changing channel conditions and sequence statistics. It is shown that a practical, robust system can be built with the presented algorithm.
Joint source and channel rate control in multicast layered video transmission
Xavier Henocq, Fabrice Le Leannec, Christine M. Guillemot
Delivering temporally-constrained multimedia streams in heterogeneous environments offering no guarantee in terms of bandwidth, packet loss, or delay is a very challenging problem faced today by both the networking and the coding communities. Layered coding is often proposed as a solution for rate-based congestion control of video transmission in heterogeneous environments. The problem addressed more specifically here is the design of a responsive mechanism for rate allocation in each layer that would guarantee the best bandwidth usage for all receivers. After a review of solutions for congestion and loss control in unicast video communications, the paper describes a rate-based congestion control mechanism for multicast layered video transmission.
Poster Session II: Segmentation, Tracking, and Feature Extraction
Network security in video communications using chaotic systems
Qurban A. Memon, Zahid Ali
In this paper, we report an implementation of signal encryption in a networked environment using synchronized chaos. The science of chaos was successfully explored and implemented in hardware for communication signals by Cuomo et al. in 1993, although they noted that the mathematics behind the phenomenon was still obscure and not very clear. We utilize this unpredictable, random phenomenon called chaos to code our information by superimposing it on a chaotic signal. We applied this technique to different formats of data, and the results were 100% successful in the sense that decoded files ran without error. This testing scenario encourages the use of chaos in information coding across networks. It is shown that, in this environment, a security breach may not be possible even if the sender and receiver have compatible scramblers and descramblers and the key is also known to the receiver.
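Chaotic masking with a synchronizing receiver, in the style of the Cuomo-Oppenheim Lorenz scheme this abstract builds on, can be sketched as follows (the parameters, step size, and message amplitude are illustrative; a forward-Euler integration stands in for the analog circuit):

```python
def lorenz_masking_demo(steps=40000, dt=0.001,
                        sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Chaotic masking: the transmitter adds a small message m to its
    Lorenz x component; the receiver, driven by the received signal s,
    synchronizes to the transmitter, so s - x_r approximates the message.
    Returns |s - x_r| over time (it shrinks to about |m| once synced)."""
    xt, yt, zt = 1.0, 1.0, 1.0           # transmitter state
    xr, yr, zr = 10.0, -5.0, 30.0        # receiver starts far away
    err = []
    for k in range(steps):
        m = 0.05 if (k // 2000) % 2 else -0.05   # small binary message
        s = xt + m                                # masked signal on the wire
        # transmitter: one forward-Euler Lorenz step
        xt, yt, zt = (xt + dt * sigma * (yt - xt),
                      yt + dt * (r * xt - yt - xt * zt),
                      zt + dt * (xt * yt - b * zt))
        # receiver: same equations, with s substituted for its own x
        xr, yr, zr = (xr + dt * sigma * (yr - xr),
                      yr + dt * (r * s - yr - s * zr),
                      zr + dt * (s * yr - b * zr))
        err.append(abs(s - xr))
    return err
```

Without a matched copy of the system, the signal on the wire looks like broadband chaos; with one, the synchronization error collapses far below the chaotic amplitude and the residual carries the message.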
Special Session: Internet Video
Three-dimensional mesh warping for natural eye-to-eye contact in Internet video communication
Insuh Lee, Byeungwoo Jeon, Jechang Jeong
A camera used in video communication over the Internet is usually placed on top of the monitor, so it is hard for a user to make natural eye contact with the peer, since the user gazes at the monitor rather than the camera lens. In this paper, we propose a single 3D mesh warping technique for gaze correction. It performs a 3D rotation of the face image by a correction angle to obtain a gaze-corrected image. The correction angle is estimated in an unsupervised way using invariant face features, and a very simple face section model is used in the 3D rotation instead of precise, but in most cases not easily attainable, 3D face models. The method is computationally simple enough to implement in real-time casual video communication applications.
Use of UDP for efficient imagery dissemination
Robert Prandolini, T. Andrew Au, Andrew K. Lui, et al.
In the defence organization, imagery represents an important information source for users in the tactical, operational, and strategic environments. Its wider dissemination may rely on deployed communication networks that are often unstable, with high bit error rates and outages. This paper presents efficient techniques for imagery dissemination using the user datagram protocol (UDP). The use of UDP is compared with the popular transmission control protocol and shown to be superior in performance for error-prone IP networks. We have employed a wavelet-based coder producing an embedded bit-stream. The packetization of the bit-stream is investigated, and we show that it is better to tile an image into independent embedded bit-streams when the network performance is poor. Variable-size tiling is compared with, and shown to be superior to, a fixed-size tiling approach. Selective re-transmission of lost packets is implemented for efficient imagery dissemination using UDP. The selective re-transmission scheme is a function of network bandwidth, delay times, error rates, and the significance of the packet.
Object-Based Coding
Perceptually most significant edge detection algorithm for object-based coding
The latest digital image coding technique, known as object-based coding, relies mainly on efficient segmentation algorithms. In this paper an edge detail detection technique is proposed that can segment the image into objects that are perceptually meaningful. It is a transform domain technique that makes use of perceptual characteristics of the transform coefficients, such as frequency distribution decomposition and edge structure decomposition; it thus detects the perceptually most significant edge details, which are useful for segmentation in object-based coding and compression. An experiment has also been conducted to test its robustness to image compression.
Three-dimensional shape-adaptive discrete wavelet transforms for efficient object-based video coding
Ji-Zheng Xu, Shipeng Li, Ya-Qin Zhang
In this paper, we present an object-based coding scheme using 3D shape-adaptive discrete wavelet transforms (SA-DWT). Rather than a straightforward extension of the 2D SA-DWT, a novel way to handle the temporal wavelet transform using a motion model is proposed to achieve higher coding efficiency. Corresponding to this transform scheme, we use a 3D entropy coding algorithm called motion-based Embedded Subband Coding with Optimized Truncation (ESCOT) to code the wavelet coefficients. Results show that ESCOT achieves coding performance comparable to the state-of-the-art MPEG-4 verification model 13.0 while retaining the scalability and flexibility of the bitstream in low bit-rate object-based video coding. At relatively higher bit rates, our coding approach outperforms MPEG-4 VM 13.0 by about 2.5 dB.
Novel object-oriented video coder employing correlation-maximizing extrapolation
Jun-Seo Lee, Seung-Seok Oh, Rin-Chul Kim, et al.
In this paper, the performance of the extrapolation block transform (EBT) and the shape-adaptive DCT (SA-DCT) for arbitrarily shaped image segment coding is theoretically compared. The comparison indicates that the EBT approach yields better performance than the SA-DCT on highly correlated images at low bit rates. Since the correlation-maximizing extrapolation (CME) algorithm maximizes the correlation of the extrapolated block, it is considered to be one of the EBT techniques that can achieve the theoretically expected performance. Thus, a novel object-oriented video coder employing the CME algorithm for texture coding is proposed in this paper, and its performance is compared with that of the MPEG-4 VM through intensive computer simulations. In the proposed video coder, a contour coding technique employing two-stage motion compensation is adopted for shape coding, and rate-constrained hierarchical grid interpolation is adopted for object-based motion compensation. The simulation results show that the proposed coder outperforms the MPEG-4 VM by about 0.84 to 1.29 dB, depending on the test sequence, at the same bit rate.
Embedded wavelet coding of arbitrarily shaped objects
Alfred Mertins, Sudhir Singh
This paper presents an embedded zerotree wavelet coding technique for the compression of arbitrarily shaped 2D objects. The wavelet decomposition is carried out with an optimized, biorthogonal, shape-adaptive discrete wavelet transform (SA-DWT) which performs non-expansive multiresolution decompositions of arbitrary image regions. The proposed SA-DWT is defined for even-length, symmetric wavelet filters such as the 6-10 filters. The processing at region boundaries is carried out via reflection, followed by an optimization stage which requires only a few operations per boundary pixel. The computationally inexpensive optimization results in an additional performance gain of up to 0.5 dB compared to the plain reflection-based scheme.
Joint position estimation for object-based analysis-synthesis coding
An object-based analysis-synthesis coder for coding moving images at low data rates is investigated. The coder is based on the source model of articulated 3D objects. This model describes the real objects by means of model objects defined by shape, motion, and color parameters. The model objects may be articulated, i.e. they consist of several rigid object components linked to each other by joints. In this contribution a new algorithm for joint position estimation is presented. First, the motion parameters of the connected object components are estimated at different times; for motion estimation a maximum-likelihood estimator is applied. Then the position of the joints is determined by evaluating the estimated motion parameters in the equations representing the constraints imposed by the joints on the relative motion of the connected object components. The algorithm was applied to a synthetic test sequence (CIF, 10 Hz). At a camera noise figure of 40 dB, an average joint position estimation error of 0.94 pel is achieved.
Stereo/Multiview Imaging
Stereo matching method with deformable window and its application to 3D measurement of the human face
Weiguo Wu, Atsushi Yokoyama, Teruyuki Ushiro, et al.
In this paper, we present a new stereo matching method with deformable windows to measure depth information more precisely. In order to correct the projective distortion between the left and right images, a window on the left image is deformed into multiple windows, called deformable windows, using linear interpolation; each deformable window keeps the same texture pattern at a different size. Stereo matching is performed simply by computing the normalized sum of squared differences (SSD) between the left and right images with the deformable windows. The correspondence is determined by evaluating the matching score, taking the minimum of the normalized SSD. The algorithm has been tested with real stereo images and applied to measure the 3D shape of the human face. The results demonstrate the effectiveness of the stereo matching method with deformable windows.
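The matching step described above, minimizing a normalized SSD over candidate windows, can be sketched for a single scanline as follows; window deformation is omitted and all names are illustrative assumptions, not the paper's implementation.

```python
def best_disparity(left_row, right_row, x, half, max_d):
    """Find the disparity minimizing a normalized SSD between a window
    centered at x in the left row and shifted windows in the right row."""
    def window(row, cx):
        return row[cx - half: cx + half + 1]

    def nssd(a, b):
        # Normalize each window by its energy before comparing.
        ea = sum(v * v for v in a) ** 0.5 or 1.0
        eb = sum(v * v for v in b) ** 0.5 or 1.0
        return sum((p / ea - q / eb) ** 2 for p, q in zip(a, b))

    ref = window(left_row, x)
    return min(range(0, max_d + 1),
               key=lambda d: nssd(ref, window(right_row, x - d)))
```

For a feature at column 4 in the left row and column 2 in the right row, the minimum falls at disparity 2.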
Quality evaluation model of coded stereoscopic color image
Yuukou Horita, Yoshinao Kawai, Yohko Minami, et al.
To consider the quality of service for stereoscopic images transmitted through a network, it is necessary to develop a quality evaluation method for coded stereoscopic images. We propose a quality evaluation model for coded stereoscopic color images. This model considers not only the distortions of the edge and smooth regions but also the texture features of the left image. In addition, it takes into account the disparity information between the left and right images. Instead of a disparity-compensated coded image, we employ JPEG coded images for the subjective assessment test. The results show that the evaluation model is useful for coded stereoscopic images.
Probabilistic diffusion for MAP-based correspondence estimation and image pairs coding in multiresolution
Sang Hwa Lee, Jong-Il Park, ChoongWoong Lee
This paper focuses on correspondence field estimation and utilizes the estimation performance to compress stereoscopic images. It proposes dense correspondence estimation with a new probabilistic diffusion algorithm based on maximum a posteriori (MAP) estimation. The MAP-based correspondence field estimation, including occlusion and line fields, is derived by reflecting the probabilistic distribution of the neighborhoods, and is applied to the compression of stereoscopic images. The proposed probabilistic diffusion algorithm considers the neighborhoods in a Markov random field with their joint probability density, which is the main difference from previous MAP-based algorithms. The joint probability density of the neighborhood system is implemented using a probabilistic plane configuration model. The paper also derives the upper and lower bounds of the probabilistic diffusion to analyze the algorithm, which is applied to quadtree-decomposed blocks.
Definition and construction of a 3D compact representation of image sequences
Yannick Nicolas, Philippe Robert
This paper tackles the 3D model-based representation of video sequences of real scenes and focuses on sequences derived from a camera moving in a static scene. 3D models can be used advantageously for compression purposes. Moreover, they are well adapted to interactive viewpoint synthesis. The objective here is to build a compact 3D-based representation of a given video sequence. It first requires establishing correspondences between images, from which viewpoint parameters and depth maps are estimated. The representation is then built by selecting from the image sequence the data that are necessary and sufficient to reconstruct the sequence at a given quality level. This paper presents this second part of the system. Our representation is a structured view-dependent 3D model composed of an ordered set of rectangular patches describing 2.5D regions (flag + texture + depth for each pixel) with attached viewpoint parameter sets.
High-quality stereo panorama generation using a three-camera system
Kunio Yamada, Tadashi Ichikawa, Takeshi Naemura, et al.
A method for high quality stereo panorama mosaicing is presented. The surrounding scene is captured by our original 3-camera system as stereo moving picture sequences, and the images are stitched after improvement of the texture quality. The multi-purpose 3-camera system features accurate frame synchronization between the 3 channels and can be used outdoors through battery operation. For registration in panorama stitching, affine parameters are estimated over the overlapped areas by the steepest-descent algorithm under a newly investigated distortion-free condition. The texture improvement has two steps: vertical resolution recovery by field integration, and image enhancement by a 2D quadratic Volterra filter which satisfies the Weber-Fechner law. The presented method enables high quality stereo mosaicing with accurate mutual disparities between the channels and without visible distortion of textures.
Wireless/Internet Video
Multiple hierarchical image transmission over Rayleigh fading channels
Dong-Feng Yuan, Bing Han, Zhigang Cao
In this paper, we present a multiple hierarchical protection scheme for Rayleigh fading channels using a multilevel coding (MLC) system in which both the capacity rule and block partitioning have been taken into account. As a result, the non-uniform set-partitioning design trades the performance of the less important data for better performance of the most important data, and thus improves the multiple hierarchical protection ability of MLC schemes.
Implementations of error-resilient transcoders for MPEG-2 video over HIPERLAN
Greg J. Cain, David W. Redmill, David R. Bull
Digital video broadcasts can be redistributed over a local area network using the packet-based HIPERLAN (High Performance Radio Local Area Network) protocol. An example of the application of this technology is wireless communication within the home environment. This paper addresses the problem of transmitting MPEG-2 video over HIPERLAN and describes error-resilient video transcoding as a means of handling channel errors. The implementations of two transcoding techniques on both PC and DSP platforms are discussed.
Error-resilient video coding using long-term memory motion-compensated prediction over feedback channel
Han Seung Jung, Rin-Chul Kim, Sang Uk Lee
MC-DCT based video coding is widely used for its efficiency and easy implementation, but it is very vulnerable in error-prone environments. In this paper, we present an error-resilient video coding scheme using multiple reference frames, based on long-term memory motion-compensated prediction (LTMP), and an error concealment technique associated with the proposed scheme. The rate-distortion optimization of the LTMP is extended to yield improved error-resilience and error concealment capabilities. The proposed algorithm also effectively confines temporal error propagation using negative acknowledgements on a feedback channel: the area corrupted by channel errors, as well as the area to which those errors have propagated, is estimated and removed from the search region for motion compensation. Thus, the proposed algorithm yields performance similar to the forced intra update (FIU) method in terms of PSNR, but avoids the abrupt increase in bitrate, resulting in more efficient network utilization than the FIU. Through computer simulations, we demonstrate that the proposed technique provides acceptable performance both subjectively and objectively in error-prone environments, as compared with H.263 and LTMP, with or without feedback messages.
Transactional interactive multimedia banner
Zon-Yin Shae, Xiping Wang, Juerg von Kaenel
Advertising in TV broadcasting has shown that multimedia is a very effective means to present merchandise and attract shoppers. This has been applied to the Web by including animated multimedia banner ads on web pages. However, the issues of coupling interactive browsing, shopping, and secure transactions, e.g. from inside a multimedia banner, have only recently started to be explored. Currently there is an explosively growing number of back-end services available on the Internet (e.g., business-to-business (B2B) commerce, business-to-consumer (B2C) commerce, and infomercial services). These services are mostly accessible through static HTML web pages at a few specific web portals. In this paper, we investigate the feasibility of using interactive multimedia banners as pervasive access points for B2C, B2B, and infomercial services. We present a system architecture that involves a layer of middleware agents functioning as the bridge between the interactive multimedia banners and back-end services.
User- and content-aware object-based video streaming over the Internet
Huai-Rong Shao, Wenwu Zhu, Ya-Qin Zhang
This paper presents a new scheme for efficient MPEG-4 video transmission over the Internet that provides differentiated services (diff-serv). The scheme has three new elements: (1) a smart packetization scheme that distinguishes different types of MPEG-4 data streams; (2) object-based bit rate control and adaptation corresponding to users' interactions and selections; and (3) selective dropping of packets that takes advantage of the differentiated classes in diff-serv. Experiments were conducted on Microsoft's IPv6 platform, where an adaptive MPEG-4 video streaming system was developed to verify our proposed approach. Experimental results show that our proposed scheme provides much improved QoS and capability for object-based user interactivity.
Image Coding I
Domain indexing for fractal image compression
Hsueh-Ting Chu, Chaur-Chin Chen
This paper presents a novel algorithm to accelerate the encoding procedure of fractal image compression. We develop an indexing technique to access candidate domain blocks. The location of the maximal gradient is adopted as the key for indexing: only those blocks whose positions of maximal gradient match that of a given range block are tested. In our experiments, the new algorithm shows good performance, taking only a few seconds to encode a 512 by 512 image on a Pentium II 450 PC with a slight loss of decoded image fidelity.
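A minimal sketch of the indexing idea, under our own assumption that the gradient is taken horizontally: domain blocks are bucketed by the position of their largest gradient, so a range block searches only the bucket matching its own key instead of the whole domain pool.

```python
def max_gradient_pos(block):
    """Index key: position of the largest horizontal gradient in a block
    (given as a list of rows of pixel values)."""
    best, pos = -1.0, (0, 0)
    for i, row in enumerate(block):
        for j in range(len(row) - 1):
            g = abs(row[j + 1] - row[j])
            if g > best:
                best, pos = g, (i, j)
    return pos

def build_index(domain_blocks):
    """Bucket candidate domain blocks by their max-gradient position so a
    range block only searches the bucket with the matching key."""
    index = {}
    for blk in domain_blocks:
        index.setdefault(max_gradient_pos(blk), []).append(blk)
    return index
```

At encoding time, a range block computes its own key once and retrieves `index.get(key, [])` as its reduced candidate set.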
Improved fractal image coding using subblock luminance shifting
This paper presents an improved fractal block coding scheme for still images. The proposed scheme employs a new technique which we call "sub-block luminance level shifting." In fractal block coding, an input image is first partitioned into range blocks, and each range block is encoded by a set of contractive affine transformations of its corresponding domain block. One of the coded data items for each range block is its average pixel value, which is used for luminance level shifting between the range block and the contracted domain block. In our proposed method, a range block is in some cases further partitioned into sub-blocks, and the average value of each sub-block, instead of that of the whole range block, is used for luminance level shifting. We propose an improved fractal block coding scheme that applies this sub-block luminance level shifting adaptively on a block-by-block basis and also combines it with adaptive range-block-size fractal coding. Computer simulation results show that the proposed fractal coding scheme gives higher SNR (signal-to-noise ratio) values and better image quality than the conventional fractal block coding scheme.
Image deblocking using spatially adaptive wavelet thresholding
Yuttapong Rangsanseri, Punya Thitimajshima, Siriporn Dachasilaruk
Low bit rate image coding is essential for many visual communication applications. However, it yields visually annoying artifacts that highly degrade the perceptual quality of the image data. In this paper we propose a novel method for reducing the blocking artifact that occurs when using DCT image coding at low bit rates. The blocking artifact has varied visibility in different regions; that is, it is more visible in smooth regions than in detailed regions such as textures and edges. Hence, the proposed method is based on image segmentation followed by thresholding the coefficients of different regions with adaptive thresholds. The proposed method gives consistent improvement over previous deblocking methods in terms of peak signal-to-noise ratio, edge variance, and visual quality.
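The region-adaptive thresholding step might look like the following toy sketch; the soft-thresholding rule and the threshold values are illustrative assumptions of ours, not the paper's tuned parameters.

```python
def soft_threshold(coeffs, t):
    """Soft-threshold wavelet coefficients: shrink each toward zero by t,
    zeroing those with magnitude at most t."""
    return [0.0 if abs(c) <= t else (c - t if c > 0 else c + t)
            for c in coeffs]

def adaptive_threshold(coeffs, smooth_region, t_smooth=8.0, t_detail=2.0):
    """Use a stronger threshold in smooth regions, where blocking is most
    visible, and a weaker one in textured/edge regions."""
    return soft_threshold(coeffs, t_smooth if smooth_region else t_detail)
```

The segmentation stage decides `smooth_region` per block, so edges and textures keep more of their high-frequency coefficients than flat areas.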
Adaptive interpolator with context modeling in lifting scheme for lossless coding
WenThong Chang, WenJen Ho
Adaptive interpolation with a double interpolator is used for image coding. "Double interpolator" means that two adaptive prediction stages are used: the outer loop is an adaptive FIR predictor, and the inner loop is a texture-based bias estimation, where the bias is a content-dependent estimate of the prediction error. By assuming the signals consist of polynomials of various degrees, the predictor of the outer loop is constructed by linearly combining a set of maximally flat filters; the maximally flat filter is the filtering implementation of Lagrange interpolation. The prediction is done block by block. Within a block, similar to the lifting scheme, a hierarchical multiresolution prediction is used, starting from the lowest-resolution 2 by 2 sub-block. The least-squares prediction error criterion is used to derive the weighting coefficients of the predictor. To further reduce the prediction error, an inner loop to estimate the prediction error is included: the pixel to be predicted is classified into groups according to the neighborhood condition, and the accumulated mean of the prediction errors of all pixels within the same group is taken as the bias of the prediction error. This bias is then subtracted from the actual prediction error to reduce entropy.
Special Session: Face Segmentation and Its Applications
Adaptive skin segmentation for head and shoulder video sequences
Nada Bojic, Khok Khee Pang
In recent years, there has been much interest in object-oriented video coding schemes as an alternative to conventional video coding schemes, particularly for very low bit rate applications such as the coding of head-and-shoulder sequences for video conferencing. Automatic face detection is desirable in object-oriented video coding schemes that seek to code head-and-shoulder video sequences. However, automatic face detection is not a trivial problem when complex backgrounds are present. We propose a novel method of adequately characterizing skin color by exploiting the luminance and chrominance information found in the first frame of a sequence. The skin color characterization is then used to facilitate skin segmentation in subsequent frames, and thereby automatic face detection. To aid the adaptive skin characterization process, the head is constrained to a frontal, upright position in the first frame of the sequence; no constraints are placed on the position of the head in subsequent frames. The proposed scheme was tested on a number of common H.263 and MPEG-4 head-and-shoulder video sequences. Some experimental results are presented.
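One way to characterize skin color from the first frame, sketched here under our own assumptions (the paper does not specify its exact statistic): take mean plus or minus k standard deviations in the Cb and Cr chrominance channels of sampled skin pixels, then classify pixels in later frames against those ranges.

```python
def skin_model(cb_samples, cr_samples, k=2.5):
    """Characterize skin from first-frame samples as mean +/- k*std
    ranges in the Cb and Cr chrominance channels."""
    def stats(xs):
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, var ** 0.5

    (mcb, scb), (mcr, scr) = stats(cb_samples), stats(cr_samples)
    return ((mcb - k * scb, mcb + k * scb),
            (mcr - k * scr, mcr + k * scr))

def is_skin(cb, cr, model):
    """Classify a pixel's chrominance pair against the learned ranges."""
    (lo_b, hi_b), (lo_r, hi_r) = model
    return lo_b <= cb <= hi_b and lo_r <= cr <= hi_r
```

Applying `is_skin` to every pixel of a subsequent frame yields a binary skin mask whose connected components are face candidates.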
Finding faces in wavelet domain for content-based coding of color images: two approaches
Jayashree Karlekar, U. B. Desai
Human face images form an important database in banks, security kiosks, and police departments, and they are also found in abundance in day-to-day life. In these databases the important content, of course, is the face region. In this paper, we present two highly efficient methods for compressing face images. The first method, which is lossy, detects the human faces in the wavelet domain for discriminative quantization to achieve high perceptual quality content-based image compression. This method gives superior subjective performance over the JPEG standard without sacrificing rate-distortion performance. The second method, which is both lossy and lossless, also detects the human faces in the wavelet domain so that the lossy/lossless mode can be selected dynamically for compression. This method is well suited to applications which cannot tolerate losses in face regions: it improves the overall coding efficiency by adopting lossy mode for non-face regions instead of coding the entire image in lossless mode.
Detecting humans: analysis and synthesis as a pattern recognition problem
Pankaj Kumar, Kuntal Sengupta, Surenda Ranganath
In this paper, we address a few important image analysis problems which are fundamental to the design of perceptual user interfaces. We use an inexpensive stationary desktop camera to collect the video streams and use them as input to the system. We present an algorithm for segmenting a moving foreground object of interest from a complex, but stationary, background. This algorithm can cope with illumination changes due to shadows, automatic exposure correction, and long-term illumination changes in the environment. The segmentation is done in real time and works well for both indoor and outdoor scenes. The detected foreground is recognized as a human being based on its head-shoulder profile.
Segmentation and tracking of facial regions in color image sequences
Bernd Menser, Mathias Wien
In this paper a new algorithm for joint detection and segmentation of human faces in color image sequences is presented. A skin probability image is generated using a model for skin color. Instead of a binary segmentation to detect skin regions, connected operators are used to analyze the skin probability image at different threshold levels. A hierarchical scheme of operators using shape and texture simplifies the skin probability image. For the remaining connected components, the likelihood of being a face is estimated using principal component analysis. To track a detected face region through the sequence, the connected component that represents the face in the previous frame is projected into the current frame. Using the projected segment as a marker, connected operators extract the actual face region from the skin probability image.
Special Session: Testing and Quality Metrics for Digital Video Services
Double-ended system for objective video quality assessment: brief description of GUI and algorithm
Mark Lutsker
Because of the complexity, variability, and cost of subjective video quality testing, there is a strong requirement for an instrument or system that can make objective quality estimations as close as possible to subjective ones. This paper describes such a Video Quality Analyzing System. It works off-line on Sun's Ultra-10 platform, and we believe its distinctions from other well-known systems, such as the Tektronix PQA200 and Rohde & Schwarz DVQ, are significant, both functionally and algorithmically. This is the result of three years' intensive experience, and we hope it will be clear that the word "analyzing" is not accidental here: therein lies the main distinction.
Perceptual blocking distortion measure for digital video
Zhenghua Yu, Henry R. Wu, Tao Chen
In this paper, a perceptual blocking distortion metric for block-based transform coded digital video is proposed. The metric is based on a spatio-temporal multichannel vision model used to calculate the Just Noticeable Distortion (JND) map. The blocking-dominant regions are segmented after the spatio-temporal decomposition, and the JNDs in these regions are summed to form an objective measure of the blocking artifact. Subjective and objective tests have been conducted, and the results show a strong correlation between the objective blocking rating and the mean opinion score.
Quality meter and digital television applications
Pierre Bretillon, Nathalie Montard, Jamal Baina, et al.
In today's competitive television environment, quality of service is critical. For network providers, the delivered quality of service is usually a matter of technical signal quality. However, with the current development of digital broadcasting, the quality of service provided to the end-user is one of the most important performance criteria. Thus, perceived image and sound quality are important and must be monitored. This paper addresses the issue of video quality monitoring in digital television networks. Many approaches to objective video quality assessment have been proposed and are being evaluated by standardization bodies. In order to build and apply in-service measurement, we propose a global model describing the whole approach and the different technical solutions. Our technical solution is a double-ended method with a reduced reference. Video quality assessment of digital television signals in a broadcasting network sets strong technical constraints. The reduced reference, a comparative approach, is well adapted since it has moderate technical complexity and because it reports any distortion in the network. The proposed method has been implemented and tested in a range of situations on simulated and real DVB networks. Three main applications are presented: network monitoring, DVB-T coverage area determination, and laboratory tests. This allows us to conclude that in-service measurements are the only valid measurements for end-to-end transmissions.
Impairment metrics for digital video and their role in objective quality assessment
Jorge Caviedes, Antoine Drouot, Arnaud Gesnot, et al.
In this paper we discuss work on the quantification of video impairments resulting from MPEG compression, their role, and their scope of application for objective quality assessment. Three important metrics, blocking artifact level, ringing artifact level, and corner outlier artifact level, have been used to create a combined impairment metric. The relevance of this metric for developing an objective quality assessment has been investigated, as well as the issues facing the creation of a no-reference quality metric. The main issues are overall metric completeness and the performance of the individual metric components. The impairment metrics that we have studied appear to be key components for future no-reference objective quality metrics. Impairment metrics are also of great importance because they allow closing the detect-measure-correct loop that is necessary to improve image quality in real time. Applications of single-ended quality metrics include multimedia home terminals, STBs, digital TV, and low bit-rate video applications such as IP videotelephony and video streaming over IP.
Objective picture quality scale for video images (PQSvideo): definition of distortion factors
Yamashita Tetsuji, Masashi Kameda, Makoto M. Miyahara
We propose PQSvideo (Picture Quality Scale for moving images), a method of objective quality assessment for coded moving images. We expect the proposed PQSvideo to approximate subjective assessment well. In PQSvideo, we define essential distortion factors considering not only global distortions (such as random noise) but also distortions of local features. We then describe each distortion metrically, taking human visual perception into account. The PQSvideo is given by a linear combination of the essential distortion factors, utilizing principal component analysis and multiple regression analysis between the quantity of each essential distortion factor and the MOS (Mean Opinion Score) obtained by assessment tests. We have confirmed that the PQSvideo approximates the MOS successfully.
Image Coding II
Enhanced-MMSE inverse halftoning using table-lookup vector quantization
Pao-Chi Chang, Tien-Hsu Lee, Che-Sheng Yu
The objective of this work is to reconstruct high quality gray-level images from bi-level halftone images. We develop optimal inverse halftoning methods for several commonly used halftone techniques, including dispersed-dot ordered dither, clustered-dot ordered dither, and error diffusion. First, the least-mean-square (LMS) adaptive filtering algorithm is applied in the training of inverse halftone filters, and the optimal mask shapes are computed for the various halftone techniques. Next, we further reduce the computational complexity by using lookup tables designed by the minimum mean square error (MMSE) method; the optimal masks obtained from the LMS method are used as the default filter masks. Finally, we propose the enhanced MMSE inverse halftoning algorithm. It normally uses the MMSE table lookup method for its fast speed; when an empty cell is referenced, the LMS method is used to reconstruct the gray-level value. Consequently, the proposed method has the advantages of both excellent reconstruction quality and fast speed. In the experiments, error diffusion yields the best reconstruction quality among the three halftone techniques.
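The MMSE table-lookup stage with a fallback for empty cells can be caricatured as follows; the pattern key (a small binary neighborhood flattened to a tuple) and the fallback interface are illustrative choices of ours, and the paper's actual fallback is an LMS filter rather than an arbitrary function.

```python
def train_lut(halftone_patterns, gray_values):
    """MMSE table lookup: for every binary neighborhood pattern seen in
    training, store the mean of the co-located gray-level pixels."""
    sums, counts = {}, {}
    for pattern, gray in zip(halftone_patterns, gray_values):
        key = tuple(pattern)
        sums[key] = sums.get(key, 0.0) + gray
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def lookup(lut, pattern, fallback):
    """Use the table when the pattern was seen in training; otherwise
    fall back (the paper falls back to an LMS filter for empty cells)."""
    key = tuple(pattern)
    return lut[key] if key in lut else fallback(pattern)
```

The mean over all training pixels sharing a pattern is exactly the value minimizing the mean square error for that table cell.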
Subregion search algorithm for efficient VQ encoding of images
Man-Yee Lee Anson, Jian Feng
In this paper, a new sub-region search algorithm is proposed for fast vector quantization encoding of images. In the proposed scheme, only a portion of the codebook is searched to locate the best-match vector for each input vector, according to the state of its neighboring blocks. A block transition criterion is developed for selecting the subset of the codebook to search. Simulation results show that the encoding time required by the proposed algorithm is only about 10 - 20% of that required by full search, with almost the same output image quality.
Predictive absolute-moment block truncation coding for image compression
Sriram Subramanian, Anamitra Makur
Block Truncation Coding is one of the oldest image compression algorithms, its main attractions being its simple underlying concepts and ease of implementation. In this paper we present a new Predictive Absolute Moment Block Truncation Coding scheme which improves on existing Block Truncation Coding schemes. The proposed scheme selectively predicts the reconstruction values of a block from the corresponding values in neighboring blocks, and predicts the bitplane from the bitplanes of the corresponding blocks in the other color components.
Variable-block-size double predictor DPCM image coding
Jia-Chyi Wu, Hong-Bing Chen, Ren-Jean Liu
This study improves the double predictor differential pulse code modulation (DP-DPCM) algorithm for image data compression. A variable block-size double predictor DPCM (VBDP-DPCM) image coding system operates on an image that has been preprocessed into square blocks of variable size, and each block is separately encoded by a DP-DPCM system. A quadtree segmentation algorithm divides a given real-world image into variable-size image blocks: detail regions are segmented into smaller blocks, while background regions are assigned larger blocks. After quadtree segmentation, the differential values between nearby pixels within an image block are reduced. We can therefore narrow the distribution of the prediction error and reduce both the number of quantization levels and the bit rate. We then adopt the double predictor DPCM image coding system to reduce the effect of the fed-back quantization error without increasing system complexity. The proposed variable block-size DP-DPCM encoder/decoder achieves a coding gain of about 5 dB (or greater) in signal-to-noise ratio over a conventional DPCM system.
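The quadtree segmentation step can be sketched as a recursive variance test: split a block while it is busy, keep it once it is flat or minimal. The image, block sizes and variance threshold below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def quadtree(img, y, x, size, min_size, var_thresh, blocks):
    """Recursively split a square block until its intensity variance falls
    below var_thresh (background) or the minimum block size is reached
    (detail region)."""
    block = img[y:y+size, x:x+size]
    if size <= min_size or block.var() <= var_thresh:
        blocks.append((y, x, size))
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            quadtree(img, y + dy, x + dx, half, min_size, var_thresh, blocks)

# Hypothetical 8x8 image: flat background with one busy quadrant.
img = np.zeros((8, 8))
img[:4, 4:] = np.arange(16).reshape(4, 4)  # detail region
blocks = []
quadtree(img, 0, 0, 8, 2, 1.0, blocks)
# blocks now holds large blocks for the background and small ones for detail
```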
Multispectral satellite image compression based on multimode linear prediction
Wen-Nung Lie, Chun-Hung Chen, Chi-Fa Chen
In this paper, we propose a multi-mode linear prediction (MM_LP) scheme for the compression of multi-spectral satellite images. This scheme, extending our prior work on block-based single-mode linear prediction, discards the prediction residuals and transforms the traditional residual-encoding problem into a mode-map encoding problem. The extra storage for the additional coefficients is nearly negligible, and the compression of the mode map can be expected to achieve higher efficiency than the residuals can. We also propose an alternative scheme that hides the mode information in the LSB (least significant bit) of the residual data, which are then encoded to give a nearly lossless compression with PSNR larger than 51 dB (error variance σ² = 0.5 per pixel). Comprehensive experiments confirm the performance of our MM_LP schemes and suggest that MM_LP (k >= 2) is suitable for PSNR less than 41.5 dB, single-mode LP (k = 1) for PSNR between 41.5 dB and 50 dB, and the 2-mode mode-embedding approach for PSNR > 50 dB.
Segmentation and Tracking I
icon_mobile_dropdown
Image visualization based on MPEG-7 color descriptors
Thomas Meiers, H. Czernoch-Peters, L. Ihlenburg, et al.
In this paper we address user navigation through large volumes of image data. A similarity measure based on MPEG-7 color histograms is introduced, and Multidimensional Scaling (MDS) concepts are employed to display images in two dimensions according to their mutual similarities. With such a view the user can easily see relations and color similarity between images and understand the structure of the database. To cope with large volumes of images, a modified k-means clustering technique is introduced which identifies representative image samples for each cluster. Representative images (up to 100) are then displayed in two dimensions using MDS structuring. The proposed modified clustering technique produces a hierarchical structure of clusters, similar to street maps at various levels of detail. The user can zoom into the various cluster levels to obtain more or less detail as required. The results obtained verify the attractiveness of the approach for navigation and retrieval applications.
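The two-dimensional display can be sketched with classical (metric) MDS computed from a dissimilarity matrix. The 4x4 matrix below stands in for MPEG-7 color-histogram distances between four images and is purely illustrative:

```python
import numpy as np

def classical_mds(D, dims=2):
    """Embed n items in `dims` dimensions from an n x n matrix of pairwise
    dissimilarities D, preserving distances as well as possible."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dims]          # keep the largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Hypothetical histogram distances: images 0/1 are similar, 2/3 are similar.
D = np.array([[0.0, 0.1, 1.0, 1.0],
              [0.1, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.1],
              [1.0, 1.0, 0.1, 0.0]])
coords = classical_mds(D)  # 2D screen positions for the four images
```

Similar images land close together on screen, which is the navigation property the paper exploits.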
Integrate-and-fire models for image segmentation
Gregory A. Crebbin, Meutia Fajria
This paper describes an approach to image segmentation that is based on an integrate-and-fire operation. An analog cell is described that will fire current pulses when the input light intensity to the cell exceeds a given threshold. The firing of one cell can induce the firing of neighboring cells, so contiguous object regions are formed using both individual and neighborhood information. The proposed cell is relatively simple in structure and is capable of incorporating a range of advanced autonomous functions by adapting threshold levels to various local and global conditions.
Fast block-based image segmentation for natural and texture images
The block-based image segmentation method is known to alleviate the over-segmentation problem of morphological segmentation methods. In this paper, we improve previous block-based MAP segmentations. First, to reduce the execution time, we reduce the number of undecided blocks: as the block size is reduced, we define new monotone regions from the undecided blocks, which decreases their number and overcomes the under-segmentation problem. Second, to improve the segmentation accuracy, we adopt two different block sizes: a large block size for the texture block clustering process, and a small block size for monotone and edge block classification, where it is more efficient. The proposed segmentation method is applied to natural images with monotone and texture regions. Experimental results show that the proposed method yields large segments for texture regions while also picking up some detailed monotone regions, overcoming the under-segmentation problem.
Region-based motion estimation for content-based video coding and indexing
Bertrand Chupeau, Edouard Francois
For the past decade, the region-based approach, which combines object segmentation and optical flow estimation, has emerged as the only one likely to provide automatically, at a reasonable computational cost, higher-quality descriptions of 2D apparent motion in video sequences than conventional pixel-based motion estimation. Within this framework, a hybrid algorithm embedding classical dense motion field estimation and color-based spatial segmentation is presented. For each arbitrarily shaped, color-homogeneous region, a polynomial motion-parameter set is robustly estimated from pixel displacement vectors. Following a graph-based approach and starting from the initial color partition, neighboring regions are iteratively merged according to their mutual motion similarity. The resulting motion-homogeneous regions are then tracked temporally along the sequence. The region-based motion estimation algorithm is described in detail and its computational complexity is evaluated through processing-time statistics on a workstation. The partition maps and modeled motion fields obtained on three well-known test sequences--`Table Tennis', `Mobile and Calendar' and `Flower Garden'--are displayed. Alternative approaches in the literature are then assessed and their results compared with ours. Finally, the application of such an automatic `mid-level' image analysis tool to object-based representation, manipulation, coding and indexing of video is outlined.
Singular value features of images
Jingxin Zhang, Jim Schroeder, Tristrom Cooke, et al.
This paper presents the preliminary results of our investigation into using the singular value decomposition to extract effective image features. It proposes using the singular values of sub-images to discriminate different classes of imagery. The proposed features have proven very effective when applied to target detection and background discrimination in low-resolution SAR imagery.
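The feature extraction can be sketched in a few lines: compute the singular values of each non-overlapping sub-image and concatenate them into a feature vector. The block size here is an illustrative assumption:

```python
import numpy as np

def svd_features(image, block=4):
    """Concatenate the singular values of non-overlapping sub-images.
    Singular values are stable under small perturbations of the pixels,
    which is what makes them attractive as discriminating features."""
    h, w = image.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            # compute_uv=False returns only the singular values, descending.
            s = np.linalg.svd(image[y:y+block, x:x+block], compute_uv=False)
            feats.extend(s)
    return np.array(feats)
```

A flat (rank-1) sub-image contributes one dominant singular value and zeros, while a textured sub-image spreads energy across several, which is the discriminating behavior the paper relies on.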
VLSI I
icon_mobile_dropdown
Digital implementation of shunting-inhibitory cellular neural network
Shunting inhibition is a model of early visual processing which can provide contrast and edge enhancement and dynamic range compression. An architecture for a digital Shunting Inhibitory Cellular Neural Network for real-time image processing is presented. The proposed architecture is intended to be used in a complete vision system for edge detection and image enhancement. The hardware architecture is modeled and simulated in VHDL, and simulation results show the functional validity of the proposed architecture.
Fast search block-matching motion estimation algorithm using FPGA
Vera Ying Y. Chung, Man To Wong, Neil W. Bergmann
Many fast-search block-matching motion estimation (BMME) algorithms have been developed to minimize the number of search positions and speed up computation, but they do not consider how they can be effectively implemented in hardware. In this paper, we propose a new, regular fast-search block-matching motion estimation algorithm named Two Step Search (2SS). The 2SS BMME is then implemented with eight Xilinx XC6216 fine-grain, sea-of-gates FPGA chips. The experimental and simulation results show that it achieves good algorithmic performance and can be implemented on FPGA chips very cost-effectively for video compression applications; real-time 2SS BMME video compression at 30 frames per second is obtained using the eight FPGAs.
Array address translation for SDRAM-based video processing applications
Hansoo Kim, In-Cheol Park
To increase the memory bandwidth of Synchronous DRAM (SDRAM), which is commonly employed as external memory in video applications, a memory address translation method is proposed that minimizes the number of overhead cycles needed for row activations and precharges. The features of SDRAM and the memory access patterns of video processing applications are considered to find a suitable address translation. Experimental results show that the proposed method increases the memory bandwidth by 42% over that of the conventional linear translation.
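The idea can be sketched by contrasting a linear translation with a bank-interleaved one. The geometry below (4 banks, 512 columns, 4 rows per bank) is a deliberately tiny illustrative assumption, not a real SDRAM configuration or the paper's mapping:

```python
def linear_map(y, x, width, rows=4, cols=512, banks=4):
    """Conventional linear translation (toy sizes): consecutive addresses
    fill row after row within the same bank, so crossing a row boundary
    forces a precharge + activation in that bank."""
    addr = y * width + x
    col = addr % cols
    row = (addr // cols) % rows
    bank = (addr // (cols * rows)) % banks
    return bank, row, col

def interleaved_map(y, x, width, cols=512, banks=4):
    """Bank-interleaved translation (toy sizes): successive memory rows
    alternate between banks, so a row activation in one bank can overlap
    the precharge of another, hiding the overhead cycles."""
    addr = y * width + x
    bank = (addr // cols) % banks
    row = addr // (cols * banks)
    col = addr % cols
    return bank, row, col
```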
Novel video signal processor with VLIW-controlled SIMD architecture
Yong Zhang, Kai-Kuang Ma, Qingdong Yao
In this paper, we present a novel video signal processor (VSP) architecture, named VS-VSP, which combines the very long instruction word (VLIW) control with the single instruction multiple data (SIMD) processing technology. The SIMD architecture provides high performance for the computation-intensive tasks, and the VLIW control introduces satisfactory flexibility to the whole system. In addition, a hierarchical memory organization is employed in VS-VSP to tackle the high data-bandwidth requirements in the video signal processing application. The proposed VS-VSP architecture can be exploited for implementing a variety of video coding algorithms efficiently.
Content-Based Coding
icon_mobile_dropdown
Shape representation for content-based image retrieval
Ali Khenchaf, Marinette Bouet
To retrieve a specific image efficiently from a voluminous image database, users need appropriate tools. This is why, over recent years, content-based image retrieval systems have been developed. In these systems, users formulate their queries from both visual and textual descriptions. Consequently, these features must be captured in a well-suited representation in order to satisfy the criteria of efficiency and relevant retrieval. In what follows, we dwell only on one of the most important visual features, namely shape. This feature is essential as it corresponds to regions of interest in images. As far as shape is concerned, an interesting representation cannot be achieved without efficient synergy between image analysis techniques, mathematics, and database technology. In this paper, we present our theoretical model of shape representation for content-based retrieval, as well as numerical results computed for different simulations. These results show the usefulness of the considered Fourier descriptors (first-order and second-order interpolation). We analyzed and validated the proposed method under MATLAB. To conclude, the relation between moment theory (another interesting shape representation) and Fourier analysis is discussed.
Efficient image description method for image retrieval using circular scanning pattern
Ho-Keun Song, Eung-Kwan Kang
Although global color histograms and moments have proven very useful for image indexing, they do not take color-based spatial information into account; thus, when the image collection becomes very large, many false hits occur. Recent work, so-called spatial color indexing, attempts to characterize finer details of the color distribution, but the conventional methods must undergo multi-step processes of partitioning and indexing the image blocks. This paper proposes an efficient image description method for image retrieval using a circular scanning pattern.
Wavelet-transform-based video content extraction for 3D wavelet video coding
Wen-Kuo Lin, Alireza Moini, Neil Burgess
In recent years 3D wavelet video coding has become a popular research area due to its low coding complexity. As the emerging MPEG-4 video compression standard places a strong emphasis on content-based manipulation and compression, there is a need to implement content coding within 3D video coding algorithms. In this paper we propose a very simple algorithm to separate foreground and background pixels for content-based video coding. The algorithm exploits characteristics of the 3D wavelet transform and adds only a small amount of overhead to existing 3D video compression algorithms.
Mesh-based scalable video coding with rate-distortion optimization
In this paper, we present a mesh-based motion estimation scheme for image sequences. Nodal motion vector optimization is performed using a multi-resolution differential method. Because our final aim is mesh tracking throughout a video sequence with optimized reconstruction, neither backward tracking nor forward tracking alone is well suited. One motivation of our work is to take advantage of both forward tracking (which enables tracking) and backward tracking (for its efficiency) in a `backward in forward' method. For the optimization of the nodal motion vectors, we also propose a novel approach with multiple resolutions and several hierarchy levels, which in addition makes scalable representation possible. This is achieved with a progressive representation defined according to a rate-distortion criterion. Results are presented to illustrate the proposed methods.
Scene change detection for video retrieval on MPEG streams
Eung-Kwan Kang, Sung-Joo Kim, SurngGabb Jahng, et al.
In this paper, we propose a new scene change detection (SCD) algorithm and a novel video-indexing scheme for fast content-based browsing and retrieval in video databases. We detect scene changes in the MPEG video sequence and extract key frames to represent the contents of a shot. We then index the video by applying a rosette pattern to the extracted key frames, and retrieve them accordingly. Our SCD method outperforms conventional ones in terms of SCD performance. Moreover, by applying the rosette pattern for indexing, we remarkably reduce the number of pixels required for indexing while retaining excellent retrieval of video scenes.
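A common baseline for the detection step is a normalized histogram difference between successive frames; the sketch below uses that baseline rather than the paper's own MPEG-domain detector, and the bin count and threshold are illustrative assumptions:

```python
import numpy as np

def detect_scene_changes(frames, bins=16, thresh=0.5):
    """Flag frame i as a scene change when the normalized gray-level
    histogram difference to frame i-1 exceeds thresh (0 = identical
    distributions, 1 = disjoint distributions)."""
    changes = []
    prev = None
    for i, f in enumerate(frames):
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        h = h / h.sum()
        if prev is not None and 0.5 * np.abs(h - prev).sum() > thresh:
            changes.append(i)
        prev = h
    return changes
```

The frames flagged here are where a shot boundary is declared and a key frame would be extracted for indexing.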
Segmentation and Tracking II
icon_mobile_dropdown
Compressed video indexing based on object motion
Nevine H. AbouGhazaleh, Yousry Saber El Gamal
Processing compressed video for content-based retrieval saves the time of expensive decoding. In this paper, we process compressed MPEG video data for motion analysis of its contents. Two motion components are differentiated: first, the object's motion, i.e. the change in the object's coordinates across consecutive frames; second, the camera motion resulting from camera effects such as zooming in and out and panning right and left. A trajectory is constructed for each object and given a spatio-temporal representation. Video objects are indexed by the actual motion of the objects, independent of the camera motion.
Edge detection in multisurface objects in a homogeneous background using the phase-shift fringe projection method
This paper presents a technique for detecting the edges of a multi-surface object placed in a background having the same grayscale intensity as the object surface, i.e. a homogeneous background. The various surfaces of the object also have the same intensities. For such objects an intensity gradient does not exist across the image, and common intensity-based edge detectors such as Sobel, Prewitt and Canny are unable to detect the edges successfully. In the proposed technique, sinusoidally coded structured light is projected obliquely onto the object surface, which is viewed from a normal position. The different illumination and viewing directions cause the projected light pattern (fringes) to break at the edges of the object, due to the height difference between the various surfaces of the object and its background. By locating the intensity break points in the image it is possible to locate the various edges of the object. The main drawback of this technique is that the accuracy of the edge detection is greatly affected by noise. However, by using the phase-shift method and processing the resulting phase image to extract edges, the accuracy becomes almost unaffected by the presence of noise. The algorithms for the proposed edge detection technique were developed on simulated images and later applied to real images with suitable modifications.
Joint tracking of region-based and mesh models of 2D VOPs in video sequences
The problem of tracking triangulated meshes of Video Object Planes (VOPs) in video sequences is considered. Triangulated meshes are constructed on the basis of hierarchical region-based models of a VOP. The mesh is articulated: each polygonal region in a VOP is Delaunay-triangulated separately, and all partial meshes are connected in the global triangulation. Nodal optical flow is computed from the motion parameters of the associated regions according to an affine motion model, using a hierarchy of region-based models. The tracking of such a VOP along a video sequence is based on full region-based tracking of its hierarchical region-based model. A forward tracking scheme and backward motion compensation for hybrid coding are proposed. Experimental results for a complex articulated VOP are presented.
Multiple-object tracking under occlusion conditions
Young-Kee Jung, Yo-Sung Ho
This paper describes an algorithm for multiple-object tracking that takes a new occlusion reasoning approach. In order to track individual objects under occlusion conditions, we design a 2D token-based tracking system using Kalman filtering. The proposed tracking system consists of two parts: object detection and tracking, and occlusion reasoning using feature matching. The object detection and tracking part separates moving objects from the background; for object detection, we develop an adaptive background update technique. By tracking individual objects with segmentation information, we generate motion trajectories. Computer simulation of the proposed scheme demonstrates its robustness to various occlusion conditions for several test sequences.
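The 2D token tracking can be sketched with a constant-velocity Kalman filter; the noise parameters below are illustrative assumptions. During an occlusion, a track can coast on predict() alone until feature matching re-associates a measurement:

```python
import numpy as np

class Track2D:
    """Constant-velocity Kalman filter for one object's image position.
    State: [x, y, vx, vy]; measurement: [x, y]. Noise levels hypothetical."""
    def __init__(self, x, y, q=1e-2, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                 # initial uncertainty
        self.F = np.array([[1, 0, 1, 0],          # x += vx
                           [0, 1, 0, 1],          # y += vy
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)
        self.Q = np.eye(4) * q                    # process noise
        self.R = np.eye(2) * r                    # measurement noise

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]      # predicted position, used for token matching

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.s        # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```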
Special Session: Image Analysis and Understanding
icon_mobile_dropdown
Supervised texture segmentation using DT-CWT and a modified k-NN classifier
Brian W. Ng, Abdesselam Bouzerdoum
Texture segmentation has been an important problem in image processing. Filtering approaches have been popular, and recent studies have indicated a need for efficient, low- complexity algorithms. In this paper, we present a texture segmentation scheme based on the Dual-Tree Complex Wavelet Transform (DT-CWT). The advantage of the DT-CWT over other approaches is that it offers a partially redundant representation with strong directionality. The texture segmentation scheme presented here consists of three steps: feature extraction, conditioning, and classification. A number of feature smoothing windows have been tested. Classification is performed using a modified K-NN clustering algorithm. The proposed scheme consistently achieves error rates of less than 10%.
Review of 3D object representation techniques for automatic object recognition
George Jure Mamic, Mohammed Bennamoun
We present a critical review of the representations available for model-based object recognition. The choice of representation has a direct impact on the ability of a system to match objects, particularly in environments where noise or occlusion are present. This review provides a set of criteria for object representation in recognition systems, which forms the basis of a qualitative critical review. Well-established techniques for object representation are reviewed first, in particular object-centered and viewer-centered representations. The primary issue afflicting these representations is the domain of objects they can effectively represent. Free-form representations, which overcome this problem, are also examined: parametric representations including splines, implicit algebraic surfaces and finite element techniques, as well as geometric models based on differential geometry. Ultimately, only representations that possess an elegant blend of efficiency, accuracy and a large representational domain succeed in providing a recognition system with the information required to achieve the level of scene understanding demanded by modern computer vision.
Aircraft recognition and pose estimation
This work presents a geometry based vision system for aircraft recognition and pose estimation using single images. Pose estimation improves the tracking performance of guided weapons with imaging seekers, and is useful in estimating target manoeuvres and aim-point selection required in the terminal phase of missile engagements. After edge detection and straight-line extraction, a hierarchy of geometric reasoning algorithms is applied to form line clusters (or groupings) for image interpretation. Assuming a scaled orthographic projection and coplanar wings, lateral symmetry inherent in the airframe provides additional constraints to further reject spurious line clusters. Clusters that accidentally pass all previous tests are checked against the original image and are discarded. Valid line clusters are then used to deduce aircraft viewing angles. By observing that the leading edges of wings of a number of aircraft of interest are within 45 to 65 degrees from the symmetry axis, a bounded range of aircraft viewing angles can be found. This generic property offers the advantage of not requiring the storage of complete aircraft models viewed from all aspects, and can handle aircraft with flexible wings (e.g. F111). Several aircraft images associated with various spectral bands (i.e. visible and infra-red) are finally used to evaluate the system's performance.
Motion Estimation I
icon_mobile_dropdown
New block-matching algorithm for motion estimation based on predicted direction information
Jae Yeal Nam, Jae Soo Seo, Jin Suk Kwak, et al.
This paper proposes a new and efficient algorithm for block-matching motion estimation that reduces the number of search points and improves estimation performance. Instead of using a fixed initial search point, as in previous search algorithms, the proposed method finds a more accurate initial search point by shifting the search area according to the temporal correlation of motion vectors. The proposed method can therefore reduce the number of search points and improve the accuracy of motion estimation. The main idea is based on the consistent directivity and center-biased distribution of motion vectors. Simulation results show that PSNR values are improved by up to 3.4 dB depending on the image sequence, and by about 1.5 dB on average. The comparison shows that the proposed algorithm performs better than other fast-search algorithms whether the image sequence contains fast or slow motion, and performs similarly to the Full Search algorithm. Simulation results also show that the proposed scheme gives better subjective picture quality than the other fast-search algorithms.
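The core idea, searching around a temporally predicted motion vector rather than around (0, 0), can be sketched as follows. An exhaustive search in a small window stands in for the paper's actual search pattern, and the block/window sizes are illustrative:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return np.abs(a.astype(int) - b.astype(int)).sum()

def predicted_search(cur, ref, by, bx, bs, pred_mv, radius=2):
    """Match the block at (by, bx) of size bs against the reference frame,
    searching a small window centered on the predicted motion vector
    pred_mv (e.g. the motion vector of the co-located block in the
    previous frame) instead of on (0, 0)."""
    block = cur[by:by+bs, bx:bx+bs]
    best, best_mv = None, (0, 0)
    for dy in range(pred_mv[0] - radius, pred_mv[0] + radius + 1):
        for dx in range(pred_mv[1] - radius, pred_mv[1] + radius + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and 0 <= x and y + bs <= ref.shape[0] and x + bs <= ref.shape[1]:
                cost = sad(block, ref[y:y+bs, x:x+bs])
                if best is None or cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv, best
```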
Overlapped multiresolution motion compensation technique for wavelet video compression
Yufei Yuan, Choong Wah Chan
A new block-based motion estimation and compensation technique named overlapped multi-resolution motion compensation (OMRMC) is proposed. The algorithm applies overlapped block motion compensation in the wavelet domain to reduce the blocky artifacts in frames predicted with the multi-resolution motion estimation (MRME) technique. Simulation results show that OMRMC reduces the displaced frame difference energy by up to 26% compared with MRME.
Fast motion compensation algorithm for video sequences with local brightness variations
Sang Hyun Kim, Rae-Hong Park
In this paper, a fast motion compensation algorithm is proposed that improves coding efficiency for video sequences with brightness variations. We also propose a cross entropy measure between histograms of two frames to detect brightness variations. The framewise brightness variation parameters, a multiplier and an offset field for image intensity, are estimated and compensated. Simulation results show that the proposed method yields a higher peak signal to noise ratio compared with the conventional method, with a greatly reduced computational load, when the video scene contains illumination changes.
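Estimating the framewise multiplier and offset can be sketched as a least-squares fit of the current frame against the previous one; treating the offset as a single global value is a simplification of the paper's offset field:

```python
import numpy as np

def estimate_brightness_params(prev, cur):
    """Least-squares fit of cur ≈ a * prev + b over the whole frame:
    a is the brightness multiplier, b a global offset (the paper
    estimates an offset field; a scalar is used here for brevity)."""
    x = prev.astype(float).ravel()
    y = cur.astype(float).ravel()
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

def compensate(prev, a, b):
    """Apply the estimated brightness change to the reference frame
    before motion search, so matching is not biased by the illumination."""
    return a * prev.astype(float) + b
```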
Comparative study of motion estimation for low-bit-rate video coding
Cheng Du, Yun He
Typical motion estimation for block-based video coding consists of two parts: one at integer-pixel accuracy and the other at half-pixel accuracy. In this paper, integer-pixel motion estimation algorithms are first discussed in terms of three technical categories: search step, search pattern and choice of the initial motion vector. As integer-pixel motion estimation has become more efficient, the computational load of the widely used full half-pixel search has become relatively higher. To further speed up the half-pixel search, this paper proposes a fast half-pixel search algorithm based on paraboloid prediction. Experimental results show that a variable search step, a search pattern with fewer points and a predicted initial motion vector improve the performance of fast integer-pixel search, and that the proposed fast half-pixel search increases search speed with almost no effect on image quality.
Lost motion vector recovery for digital video communication
Zhenghua Yu, Henry R. Wu, Songyu Yu
For MPEG-II and other hybrid MC/DPCM/DCT-based video coding standards, it is very important to reconstruct the predicted frames based on the block motion information. In the case of transmission over unreliable channels, error concealment methods are introduced to recover lost or erroneous motion vectors. In this paper, a novel side motion estimation method is proposed to recover lost motion vectors by selecting from a candidate motion vector set. The outer boundary of the lost block is used to perform motion estimation, and the recovered motion vector is the one that minimizes the squared error of the block boundary pixels between two consecutive frames. The method takes advantage of the fact that most blocks and their boundaries share the same motion direction. It relaxes the boundary-pixel gray-level continuity assumption of traditional boundary-match/side-match approaches so that a better estimation result can be achieved. Overlapped block motion compensation is also incorporated in the proposed method to reduce blocking artifacts. By reducing the number of motion vectors in the candidate set, the performance of the proposed algorithm can be further improved.
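The candidate selection can be sketched as follows: for each candidate motion vector, measure the squared error between the known one-pixel outer boundary of the lost block in the current frame and the corresponding motion-compensated pixels in the previous frame. The block geometry is illustrative:

```python
import numpy as np

def recover_mv(prev, cur, by, bx, bs, candidates):
    """Pick from the candidate set the motion vector (dy, dx) whose
    motion-compensated position in the previous frame best matches the
    one-pixel outer boundary ring of the lost block in the current frame."""
    best, best_mv = None, None
    for dy, dx in candidates:
        err = 0.0
        for x in range(bx, bx + bs):          # rows just above and below
            err += (float(cur[by-1, x]) - prev[by-1+dy, x+dx]) ** 2
            err += (float(cur[by+bs, x]) - prev[by+bs+dy, x+dx]) ** 2
        for y in range(by, by + bs):          # columns just left and right
            err += (float(cur[y, bx-1]) - prev[y+dy, bx-1+dx]) ** 2
            err += (float(cur[y, bx+bs]) - prev[y+dy, bx+bs+dx]) ** 2
        if best is None or err < best:
            best, best_mv = err, (dy, dx)
    return best_mv
```

The winning vector is then used to copy the motion-compensated block from the previous frame in place of the lost one.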
Wavelets I and II
icon_mobile_dropdown
Impulse noise reduction from corrupted images using lifting wavelet filters
Shigeru Takano, Koichi Kuzyme, Koichi Niijima
A new method is presented for removing impulse noise from images using lifting wavelet filters that contain free parameters. High-frequency components obtained by wavelet decomposition are large around impulse noise. The lifting wavelet filters are designed by tuning their free parameters so that the high-frequency components vanish at the locations of impulse noise; the designed filters thus capture the features of impulse noise. Impulse noise can then be detected by applying the learnt filters to corrupted images, and removed using the wavelet reconstruction formula.
Optimum dithering for scalar quantization of image subband
Mohsen Ashourian, Zulkalnain Mohd. Yusof
A joint optimization method for the design of filter banks and the associated dithered quantizers, based on a time-domain formulation of filter banks, is provided. Mathematical analysis and experiments show that in image subband coding, by appropriate choice of the synthesis filter bank and dithering, it is possible to replace signal-dependent artifacts with signal-independent white noise. This improves the perceptual quality of the reconstructed image, especially at low bit rates, and makes it more practical for a noise-removal algorithm to remove the output noise. Experimental results illustrate the effectiveness of dithering for the lower frequency bands and of modifying the synthesis filter bank for the higher frequency bands.
Special Session: Error-Resilient Image and Video
icon_mobile_dropdown
Coding scheme for wireless video transport with reduced frame skipping
We investigate the scenario of using the Automatic Repeat reQuest (ARQ) retransmission scheme for two-way low bit-rate video communications over wireless Rayleigh fading channels. We show that during the retransmission of error packets, due to the reduced channel throughput, the video encoder buffer may fill up quickly and cause the TMN8 rate-control algorithm to significantly reduce the bits allocated to each video frame. This results in Peak Signal-to-Noise Ratio (PSNR) degradation and many skipped frames. To reduce the number of skipped frames, we propose a coding scheme that takes into consideration the effects of video buffer fill-up, an a priori channel model, the channel feedback information, and hybrid ARQ/FEC. The simulation results indicate that our proposed scheme encodes the video sequences with far fewer skipped frames and higher PSNR compared to H.263 TMN8.
High-level syntax for H.26L: first results
Stephan Wenger
This paper introduces some preliminary results of the standardization process of ITU-T's H.26L project. This forthcoming video coding standard will not only significantly improve coding efficiency, but will also introduce new concepts, such as network friendliness, that are not common in current video coding approaches. The paper focuses on the high-level syntax, which resides hierarchically above the macroblock layer. Data partitioning techniques are used to separate data of different types from each other, and the partitions are arranged in packets of data of different importance to the reproduced picture quality. Along with unequal error protection, which can either be a function of the underlying network or be implemented at the application layer, the error resilience, and thus the reproduced picture quality in error-prone environments, is greatly improved. To verify the findings, simulation results for an Internet/RTP environment, based on real-world observations of the current Internet and assuming no network-based quality of service, are included.
Structured design of standard-compatible error-resilient video coding with application to H.263
David W. Lin, YenLin Chen, Chi-Tien Lee
Many methods for video transmission error control have been proposed recently, especially in relation to transmission over bursty-error channels. However, a thorough, structured taxonomical framework for the analysis and design of such methods seems to be lacking. Such a framework helps clarify thinking and inspires new error control methods or combinations thereof. We present a framework for classifying the various transmission error control techniques and then consider error control for H.263. Several techniques are presented from the viewpoint of the proposed framework, illustrating how different techniques can be integrated coherently to achieve enhanced error resilience in the overall system. In particular, we employ slotted multiplexing at the multiplex level to reduce synchronization errors in variable-length coded data and to randomize the locations of the remaining error-corrupted image areas. At the source level, we introduce two standard-compatible schemes, called length-based intra refresh and motion vector pairing, which further limit spatial-temporal error propagation.
Performance analysis of unequal error protection codes for image transmission
Minh H. Le, Ranjith Liyanapathirana
The performance of unequal error protection (UEP) codes for image transmission over noisy additive white Gaussian noise (AWGN) channels is analyzed in this paper. The International Standards Organization has proposed the JPEG standard for still image compression. The JPEG standard not only provides basic compression through its baseline algorithm but also provides a framework for reconstructing images at different picture qualities and sizes; these features are referred to as SNR and spatial scalability, respectively. The ability to calculate the bit sensitivity of the compressed image data enables highly efficient unequal error protection for image transmission over noisy channels. A channel code is proposed to protect progressively compressed and packetized image data transmitted across noisy channels. A group of unequal error protection codes using Trellis Coded Modulation (TCM) for JPEG-compressed images is considered. Computer simulation results for two-level and four-level UEP with TCM are presented.
Application of soft-decision trellis decoding of block codes in narrow-banded image transmission system over Rayleigh fading channels
Dong-Feng Yuan, Chun-Yan Gao, Li-Jun Zhang
A new trellis decoding method for block codes based on the GAC structure is applied to Rayleigh fading channels in this paper. The performance of this decoding technique is studied with both hard-decision and soft-decision decoding. The introduction of this maximum-likelihood decoding method into narrowband mobile image transmission systems is also investigated.
Motion Estimation II
icon_mobile_dropdown
Motion estimation using adaptive matching and multiscale methods
Stephanus Suryadarma, Teddy Surya Gunawan, Chong Man Nang
Past approaches to motion estimation use iterative algorithms to produce dense motion fields modeled by energy functions. Optimization strategies such as simulated annealing or iterated conditional modes reorganize the motion fields slowly. This paper introduces adaptive block matching and multiscale smoothing to provide initial motion fields for Bayesian motion estimation. The adaptive block matching is a local intensity matching procedure that gives unique matching results, which are then smoothed by a multiscale smoothing algorithm. This algorithm is based on the Kalman filter, with the time domain of the filter replaced by the scale domain. The results show that this strategy yields more global motion fields than single-resolution Bayesian motion estimation. The multiscale smoothing algorithm also offers numerous possibilities for improving speed as well as strategies for producing better motion fields.
Data adapting motion estimation and subsampling
Motion estimation represents the main computational burden of every hybrid video encoder. Various solutions have been proposed to reduce the number of operations needed for this task while preserving the quality of the estimation and of the resulting encoded video. In this paper we propose an algorithm that, exploiting the statistical properties of the motion field, searches a number of points dynamically related to the evolution of the sequence. A subsampling pattern for the macroblock is also proposed to further reduce the overall impact of motion estimation in an MPEG encoder.
New predictive diamond search algorithm for block-based motion estimation
Alexis Michael Tourapis, Guobin Shen, Ming Lei Liou, et al.
In this paper a new fast motion estimation algorithm is presented. The algorithm, named Predictive Diamond Search, is based on the Diamond Search (DS) algorithm, which was recently adopted in the MPEG-4 standard. The DS algorithm, even though faster than most known algorithms, was found not to be very robust in terms of quality for several sequences. By introducing a new predictive criterion and some additional steps into DS, the proposed algorithm achieves complexity similar to that of DS while delivering superior and more robust quality, comparable to that of the Full Search algorithm.
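For reference, the baseline DS that the paper builds on can be sketched as follows. This is the plain diamond search (large diamond pattern repeated until the center wins, then one small-diamond refinement), not the predictive variant proposed here; the block size, synthetic frames, and SAD cost are illustrative assumptions:

```python
LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2),
        (1, 1), (1, -1), (-1, 1), (-1, -1)]       # large diamond pattern
SDSP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # small diamond pattern

def sad(cur, ref, bx, by, u, v, B):
    # Sum of absolute differences between the current block at (bx, by)
    # and the reference block displaced by candidate vector (u, v).
    total = 0
    for y in range(B):
        for x in range(B):
            ry, rx = by + v + y, bx + u + x
            if not (0 <= ry < len(ref) and 0 <= rx < len(ref[0])):
                return float("inf")   # candidate leaves the frame
            total += abs(cur[by + y][bx + x] - ref[ry][rx])
    return total

def diamond_search(cur, ref, bx, by, B=4):
    cu = cv = 0
    while True:   # repeat the large diamond until the center is best
        cost, bu, bv = min((sad(cur, ref, bx, by, cu + du, cv + dv, B),
                            cu + du, cv + dv) for du, dv in LDSP)
        if (bu, bv) == (cu, cv):
            break
        cu, cv = bu, bv
    # one final small-diamond refinement around the winning center
    cost, cu, cv = min((sad(cur, ref, bx, by, cu + du, cv + dv, B),
                        cu + du, cv + dv) for du, dv in SDSP)
    return (cu, cv), cost

# Synthetic frames: `cur` is `ref` translated by exactly (2, 1).
ref = [[x * x + 3 * y * y for x in range(20)] for y in range(20)]
cur = [[(x + 2) ** 2 + 3 * (y + 1) ** 2 for x in range(20)]
       for y in range(20)]
mv, cost = diamond_search(cur, ref, 6, 6)
```

On this smooth, purely translated pair the search converges to the true vector (2, 1) with zero residual; on real sequences the unguided pattern can stall in local minima, which is the weakness the predictive criterion targets.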
Block/object-based algorithm for estimating true motion fields
Demin Wang, Daniel Lauzon
A hybrid algorithm for estimating true motion fields is proposed in this paper. The algorithm consists of three steps: block-based initial motion estimation, image segmentation, and object-based correction of wrong motion vectors. A hierarchical block-matching algorithm is improved for the initial motion estimation; the improved algorithm uses an adaptive technique to propagate motion vectors between hierarchical levels and produces accurate motion fields everywhere except in areas of motion occlusion. To correct wrong motion vectors in these areas, the current image is segmented into objects and an object-based method is proposed to process the estimated motion fields. With this method, wrong motion vectors are detected by approximating the estimated motion field within each object by a motion model, and are corrected using an object-adaptive interpolator. The object-adaptive interpolator is also used to increase the density of the motion field. Experimental results show that the improved hierarchical block-matching algorithm outperforms conventional hierarchical block-matching algorithms. The proposed algorithm yields dense motion fields that are smooth within every object, discontinuous between objects with different motion, and very close to the true motion fields.
Wavelets I and II
icon_mobile_dropdown
Adaptive scalable video coding using wavelet packets
Mathias Wien, Bernd Menser
In this paper a hierarchical, spatially scalable wavelet video coder is presented. The scheme employs backward motion compensation, so no motion vectors have to be transmitted. The coarser levels of the wavelet decomposition of the current frame are used for motion estimation and motion compensation of the lowpass band of the next finer level. The lowpass band of the coarsest level has to be coded separately, e.g. using DPCM.
Image Coding I
icon_mobile_dropdown
Proposal for a combination of compression and encryption
Lutz Vorwerk, Thomas Engel, Christoph Meinel
This paper describes how to integrate encryption into an algorithm that uses wavelets to compress data. Wavelets are more useful in image compression than other methods because the definition of which data may be discarded is more flexible, and this adaptation of wavelets to the specific features of images leads to acceptable results. The use of computers in image transfer will continue to increase, making it necessary to transfer image data quickly; in addition, protection of images against unauthorized access will become necessary. Image compression reduces the time needed to transfer an image. In the area of telemedicine, there is a particular demand for image protection, driven by the need to prevent the relationship between patient data and an image from being determined. The approach therefore combines image compression using wavelets with encryption integrated into the wavelet-transform and compression procedure.
Wavelets I and II
icon_mobile_dropdown
New method for reducing boundary artifacts in block-based wavelet image compression
Jianxin Wei, Mark R. Pickering, Michael R. Frater, et al.
The wavelet transform is usually performed on a whole image. However, when the amount of memory available for the transformation is limited, the input image is partitioned into non-overlapping blocks and each block is processed independently. Quantization, which typically follows the transformation in a compression system, inevitably introduces distortion, which becomes especially pronounced along the boundaries of the blocks. In this paper we show that a significant reduction in these block boundary artifacts can be achieved by choosing odd block sizes of 2^N + 1 rather than the conventional even block sizes of 2^N. We show that, for the same coefficient entropy, an image compressed using 17 x 17 blocks of wavelet coefficients has at least 1 dB higher PSNR than an image compressed using 16 x 16 blocks of wavelet coefficients.
VLSI II
icon_mobile_dropdown
Synchronization of video in distributed computing systems
A distributed multimedia computing system consists of sources, processing units, and presentation devices, each of which operates autonomously. This independence can be exploited to optimize the performance of individual components, and flexibility in performance improvement is enhanced by using independent clock domains. This paper presents a video I/O model for such a multimedia system. The model provides an asynchronous communication interface between independent clock domains, with the ability to synchronize a video display to one of the video sources. The communication interface has been used in the design of I/O modules for a multimedia system, which are briefly outlined.
VLSI architecture for motion estimation on a single-chip video camera
Alexander Roach, Alireza Moini
This paper presents a flexible architecture for motion estimation and compensation using a 1D pipelined systolic array. It has been specifically designed to implement the four-step search algorithm but can easily be adapted to a wide range of other reduced-complexity search algorithms. The intention is for the architecture to be incorporated into the digital compression unit of a single-chip video camera, the target application of which is as a device enabling people to communicate using sign-language over a standard phone line. The complete architecture has been implemented as register transfer level VHDL code and its functionality has been verified by simulation. The final VLSI layout will be a combination of synthesized and custom- designed cells.
Coprocessor architecture for MPEG-4 video object rendering
Christoph Heer, Carolina Miro, Anne Lafage, et al.
The most crucial back-end algorithm of the new MPEG-4 standard is the computationally expensive rendering of arbitrarily shaped video objects into the final video scene. This paper presents a co-processor architecture for scene rendering in the CCIR 601 video format with an arbitrary number of video objects. To sustain the very high data bandwidth, a hierarchical memory concept has been implemented. The total size of all rendered objects for one scene may reach twice the size of the CCIR 601 format. Running at a 100 MHz clock frequency, the co-processor achieves a peak performance of about two billion multiply-accumulate operations per second. The co-processor has been designed for a 0.35 micrometer CMOS technology. About 60% of the overall area of 52 mm2 is used for on-chip static memory, and the power consumption of the co-processor is estimated at 1 W.
Computation complexity analysis and VLSI architectures of shape coding for MPEG-4
Danian Gong, Yun He
Shape coding algorithms in MPEG-4 involve a huge number of bit-level operations, which make it difficult to implement real-time MPEG-4 coding on a general-purpose processor without hardware support. This paper first analyzes the computational complexity of the MPEG-4 shape coding algorithms. A dedicated VLSI architecture to accelerate shape coding, called the Shape Engine, is then proposed. The Shape Engine is composed of three dedicated but flexible processing elements (PEs), namely the PB-PE, Filtering-PE, and CAE-PE. The combination of a RISC core and the Shape Engine leads to a great speed-up over a pure software implementation of MPEG-4 shape coding.
CMOS circuit for high-speed flexible read-out of CMOS imagers
Amine Bermak, Abdesselam Bouzerdoum, Kamran Eshraghian, et al.
In this paper, a CMOS circuit for flexible read-out of imagers is proposed, allowing random, sequential, and window-based access to the pixels. The circuit has been implemented within a CMOS imager using a 0.7 micrometer technology. It is shown that this versatile read-out technique requires only an 8% increase in silicon area compared to the commonly used sequential read-out technique. The read-out circuit is fully digital, which makes it more robust against sizing mismatch, and it operates at a maximum frequency of 50 MHz, making it very attractive for real-time applications.
Synthetic Image/Video Coding
icon_mobile_dropdown
Modeling and training emotional talking faces of virtual actors in synthetic movies
Savant Karunaratne, Hong Yan
This paper presents an overview of a virtual actor system composed of several subsystems designed to automate animation tasks, with emphasis on the facial animation of virtual actors. The paper specifically details the situations processor component of the framework, a major building block of the automatic virtual actor system. An expert system based on fuzzy knowledge-based control is used to realize the automated system. Fuzzy linguistic rules train virtual actors to use the appropriate emotions and gestures in different situations of a synthetic movie, whose higher-level parameters are provided by human directors. Theories of emotion, personality, dialogue, and acting, as well as empirical evidence, are incorporated into our framework and knowledge bases to produce promising results.
Multiresolution feature-based image registration
Chiou-Ting Hsu, Rob A. Beuker
Image registration is the fundamental task of matching two or more partially overlapping images and stitching them into one panoramic image comprising the whole scene. To register two images, the coordinate transformation between them must be found. In this paper, a multiresolution feature-based method is developed to efficiently estimate an eight-parameter projective transformation model between pairs of images. The proposed method has been tested and works well on many images of static scenes taken with a hand-held digital still camera.
Artificial object trajectory modifications for 2D object-based video compositing
Franck Denoual, Henri Nicolas
With the emergence of MPEG-4, the new standard for multimedia applications, mixing natural and synthetic material becomes possible and will lead to rapid development of virtual- and augmented-reality applications such as video special effects and post-processing. Nevertheless, with existing techniques, object-based editing and compositing of real video sequences require substantial manual processing. The proposed interactive, semi-automatic object-based video editing approach has been designed to reduce this human work as much as possible. It is based on key-framing and makes it possible to add new moving objects and to remove or modify the trajectories of existing ones. The method has been validated on real test sequences.
Video reframing relying on panoramic estimation based on a 3D representation of the scene
Agnes de Simon, Jean Figue, Henri Nicolas
This paper describes a new method for creating mosaic images from an original video and for computing a new sequence with modified camera parameters such as image size, scale factor, and view angle. A mosaic image is a representation of the full scene observed by a moving camera during its displacement; it provides a wide-angle view of the scene from a sequence of images shot with a narrow-angle camera. This paper proposes a method to create a virtual sequence from a calibrated original video and a rough 3D model of the scene. A 3D relationship between original and virtual images gives the pixel correspondences across images for the same 3D point in the scene model. To texture the model with natural textures obtained from the original sequence, a criterion based on the temporal variations of the background and on 3D geometric considerations is used. Finally, the textured 3D model is used to recompute a new image sequence, possibly with a different point of view and camera aperture angle. The algorithm has been tested on virtual sequences, and the results obtained so far are encouraging.
Construction of omnidirectional images for image-based virtual studio
Yuko Yamanouchi, Hideki Mitsumine, Seiki Inoue, et al.
To generate highly realistic scenes in a virtual studio, we are developing a new virtual studio technology based on image components taken from real videos instead of CG; we call this system an image-based virtual studio. Two types of image components are being developed for the system: an environmental image component for distant views and a 3D image component for close views.
Application Systems
icon_mobile_dropdown
Adaptive flow control for mobile video terminals
Yasuyoshi Sakai, Jun Matsuda
We propose a flow control method that controls the resource usage and the playback quality of received image streams in mobile video terminals in a wireless telecommunications environment. For playing video, e.g. in real-time video conferencing, our adaptive flow control (AFC) dynamically selects and executes the most suitable playing method among several methods that differ in the playback quality they achieve under frame jitter (i.e., the data arrival jitter for one frame), in the amount of resources they use, and in their thread structures.
Summary description schemes for efficient video navigation and browsing
JaeGon Kim, Hyun Sung Chang, Munchurl Kim, et al.
The Summary Description Scheme (DS) in MPEG-7 aims at providing a summary of an audio-visual program that offers an effective mechanism for efficient access to the program by abstracting its contents. In this paper, we present in detail the Summary DS proposed to MPEG-7, which allows efficient navigation and browsing of the content of interest as well as an overview of the overall content in an integrated way. This efficiency is achieved by a unified description framework that combines a static summary, based on key frames and key sounds, with a dynamic summary, based on a series of highlights. The proposed DS also allows efficient description of event-based summarization by specifying summary criteria. We also show the usefulness of the Summary DS in real applications, largely based on the results of the Validation and Core Experiments we performed in the MPEG-7 activities, and describe a methodology for the automated generation of a dynamic summary.
Visual sensing system for detecting accidents in lavatories using fiber grating vision sensor
Hirooki Aoki, Masato Nakajima
We have developed a visual sensing system for detecting sudden medical emergencies of elderly or sick people in the lavatory. The system utilizes our fiber grating vision sensor to obtain 3D information about the person in the lavatory. A visual sensor installed on the ceiling watches the person without disturbing his or her privacy, and a warning is issued to the family or hospital staff when the system determines that the person has fallen unconscious. This paper describes the configuration of the system and the method for classifying the states of the person in the lavatory. We installed a trial system in a model lavatory and performed experiments assuming various situations; satisfactory decision accuracy was obtained for issuing external warnings.
Fast image retrieval based on K-means clustering and multiresolution data structure for large image databases
Byung Cheol Song, Myung Jun Kim, Jong Beom Ra
This paper presents a fast search algorithm based on a multiresolution data structure for efficient image retrieval in large image databases. The proposed algorithm consists of two stages: a database-building stage and a searching stage. In the database-building stage, we partition the image data set into a pre-defined number of clusters using the MacQueen K-means clustering algorithm. The searching stage consists of two steps: choosing the proper clusters, and finding the best match among all the images in the chosen clusters. To reduce the heavy computational cost of the searching stage, we propose two fast exhaustive search algorithms based on the multiresolution feature space, which guarantee a perfect retrieval accuracy of 100%. By applying these two algorithms in the searching stage, we can find the best match with very high search speed and accuracy. In addition, we consider a retrieval scheme producing multiple output images including the best match. Intensive simulation results show that the proposed schemes provide promising search performance.
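The two-stage idea — cluster the database offline, then search only the nearest clusters at query time — can be sketched as below. This toy version uses plain Lloyd iterations and hypothetical 2D feature vectors for brevity; the paper uses MacQueen's online K-means and a multiresolution feature space:

```python
def dist2(p, q):
    # Squared Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=10):
    # Plain Lloyd iterations (the paper uses MacQueen's online variant;
    # either produces the cluster partition needed for two-stage search).
    cents = [list(p) for p in points[:k]]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: dist2(p, cents[c]))].append(p)
        for c, g in enumerate(groups):
            if g:
                cents[c] = [sum(v) / len(g) for v in zip(*g)]
    return cents, groups

def retrieve(query, cents, groups, n_clusters=1):
    # Stage 1: rank clusters by centroid distance.
    # Stage 2: exhaustive search inside the n_clusters closest clusters.
    order = sorted(range(len(cents)), key=lambda c: dist2(query, cents[c]))
    cands = [p for c in order[:n_clusters] for p in groups[c]]
    return min(cands, key=lambda p: dist2(query, p))

# Hypothetical feature vectors for six database images (two clear clusters).
feats = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
cents, groups = kmeans(feats, k=2)
best = retrieve((9.5, 10.2), cents, groups)
```

Only the cluster nearest the query is scanned, so most of the database is never touched; raising `n_clusters` trades speed back for a guarantee of finding the global best match.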
Poster Session I: Image and Video Coding
icon_mobile_dropdown
Block truncation coding with adaptive decimation and interpolation
Ye-Kui Wang, Guo-Fang Tu
A modification of block truncation coding (BTC) is proposed in this paper. The modification introduces a novel adaptive decimation and interpolation method together with predictive entropy coding of the quantization data. The decimation algorithm is designed based on the directional gradients of image blocks so as (1) to preserve edge information, (2) to enable the interpolator to utilize the already reconstructed data in the decoder, and (3) to better predict the mean value of a block. To further reduce the bit-rate, the quantization data are coded not directly as the two reconstruction levels but as the block mean and the difference between the block mean and the lower reconstruction level. Compared to other interpolative methods, the new decimation and interpolation method substantially improves image quality while removing the annoying blocky artifacts. The proposed scheme, which has low computational complexity, is shown to perform comparably to or better than BTC methods combined with vector quantization or the discrete cosine transform, as well as the most recent BTC modifications that exploit inter-block correlation.
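For background, baseline BTC (here in its common absolute-moment form, not the paper's adaptive decimation scheme) keeps two per-block reconstruction levels plus a one-bit-per-pixel bitmap:

```python
def btc_encode(block):
    # Absolute-moment BTC: threshold at the block mean, keep the mean of
    # each half (two reconstruction levels) plus a 1-bit-per-pixel bitmap.
    flat = [p for row in block for p in row]
    m = sum(flat) / len(flat)
    lo = [p for p in flat if p < m]
    hi = [p for p in flat if p >= m]
    a = sum(lo) / len(lo) if lo else m   # lower reconstruction level
    b = sum(hi) / len(hi) if hi else m   # upper reconstruction level
    bitmap = [[1 if p >= m else 0 for p in row] for row in block]
    # In the spirit of the paper, (a, b) could instead be coded as the
    # block mean plus the difference (mean - lower level).
    return a, b, bitmap

def btc_decode(a, b, bitmap):
    # Each pixel is replaced by one of the two reconstruction levels.
    return [[b if bit else a for bit in row] for row in bitmap]

block = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]
a, b, bitmap = btc_encode(block)
recon = btc_decode(a, b, bitmap)
```

A sharp two-level block like this one is reproduced exactly, which is why BTC preserves edges well; the paper's contribution lies in how the levels and bitmap are decimated, predicted, and entropy coded.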
Rate-distortion-model-based rate control algorithm for real-time VBR video encoding
Junfeng Bai, Qingmin Liao, Xinggang Lin
Rate control is an important component of a video encoder for data storage or real-time visual communication. In this paper, we discuss rate control in an MPEG encoder for real-time video communication over a Variable Bit Rate (VBR) channel. In interactive video communication, the video transmission is subject to both channel rate constraints and end-to-end delay constraints. Our goal is to modify the rate control in an MPEG-2 encoder to satisfy the rate constraints, and to study how to improve video quality in the VBR transmission scenario. We employ a leaky bucket to describe the traffic parameters and to monitor the encoder's output. Based on rate-distortion models we have developed, we present a rate control algorithm that achieves almost uniform distortion both within a frame and between frames in a scene. With adaptive rate-distortion models and an additional scene-detection function, our method robustly handles scenes with different statistical characteristics. Compared to MPEG-2 TM5 in real-time video communication, our method keeps the buffer delay constant while maintaining stable decoded image quality; furthermore, its bit allocation is more reasonable and controllable. Our method therefore realizes the advantages offered by VBR video communication, such as small end-to-end delay, consistent image quality, and high channel efficiency.
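The leaky-bucket monitor mentioned above can be sketched minimally. This is a generic traffic-contract check of my own (assuming a fixed per-frame drain), not the paper's algorithm; a rate controller would clamp frame budgets so the check never fails:

```python
def leaky_bucket_ok(frame_bits, drain_per_frame, capacity):
    # The bucket fills with each frame's coded bits and drains at the
    # channel rate; exceeding the capacity violates the VBR traffic
    # contract negotiated with the network.
    fullness, ok = 0.0, True
    for bits in frame_bits:
        fullness = max(0.0, fullness + bits - drain_per_frame)
        if fullness > capacity:
            ok = False
    return ok

smooth = [1000] * 8                              # conforming output
bursty = [1000] * 4 + [2500, 2500] + [1000] * 2  # a two-frame burst
ok_smooth = leaky_bucket_ok(smooth, drain_per_frame=1200, capacity=1500)
ok_bursty = leaky_bucket_ok(bursty, drain_per_frame=1200, capacity=1500)
```

The smooth stream conforms; the burst overflows the bucket, which is exactly the condition the encoder's rate control must anticipate and prevent.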
Evaluation of DWT and DCT for irregular mesh-based motion compensation in predictive video coding
Martina Eckert, Damian Ruiz, Jose Ignacio Ronda, et al.
While traditional video coding standards employ block-based processing (for motion estimation, motion compensation, and the DCT), which produces undesired artifacts, the most recent developments (H.263, MPEG) include first improvements in motion compensation, such as overlapped block motion compensation, which constitute a first step toward avoiding these limitations. Mesh-based motion compensation represents further progress in this area, as it treats the motion information continuously over the whole frame. We have investigated these methods and found that triangle meshes over irregularly spread node points in particular lead to very good results. Having previously presented first results for the compensation step, here we show results for the complete coding scheme on rectangular frames. Since inserting the transform/coding part raises the question of which transformation should be employed to code the error image efficiently without sacrificing the gains of the compensation step, we compare the DCT and the DWT (discrete wavelet transform) with different filter types to find out which is most useful in combination with mesh-based motion compensation.
Fast calculation of IFS parameters for fractal image coding
Masaki Harada, Tadahiko Kimoto, Toshiaki Fujii, et al.
Fractal image coding based on Iterated Function Systems (IFS) has been attracting much interest because of the possibility of drastic data compression; it achieves compression by exploiting the self-similarity in an image. A major weakness of IFS coding, however, is its huge computation time; in particular, the calculation of the scaling parameter and the RMSE is very expensive. In this paper, we propose two schemes that reduce the calculation time while preserving image quality. The first reduces the calculation time of the parameters, the affine transform, and the RMSE by using the maximum amplitude ratio, i.e., the ratio between the maximum amplitude range of the range block and that of the domain block; domain blocks unlikely to be chosen are excluded before the parameters are calculated. The second reduces the calculation time of the scaling parameter by using the ratio between the variance of the range block and that of the domain block; this variance ratio is used instead of the scaling parameter. We performed fractal compression experiments based on the proposed schemes to verify their effectiveness. Computational experiments show that about 50% of the calculation time is saved when both schemes are used.
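The expensive step being shortcut is the per-pair least-squares fit of the contrast scaling. The sketch below shows the exact fit and a variance-ratio surrogate in the spirit of the paper's second scheme; the notation and test blocks are mine, not the paper's:

```python
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def scaling_least_squares(domain, rng):
    # Exact contrast scaling s and offset o minimizing
    # sum((s * d_i + o - r_i)^2) over corresponding pixels.
    md, mr = mean(domain), mean(rng)
    cov = sum((d - md) * (r - mr) for d, r in zip(domain, rng)) / len(domain)
    s = cov / var(domain) if var(domain) else 0.0
    return s, mr - s * md

def scaling_variance_ratio(domain, rng):
    # Cheaper surrogate: the range/domain standard-deviation ratio
    # replaces the full least-squares fit.
    return (var(rng) / var(domain)) ** 0.5 if var(domain) else 0.0

domain = [1, 2, 3, 4, 5, 6, 7, 8]          # hypothetical domain block pixels
rng = [0.5 * d + 3 for d in domain]        # range block = scaled + shifted
s, o = scaling_least_squares(domain, rng)
s_approx = scaling_variance_ratio(domain, rng)
```

When the range block really is an affine copy of the domain block, both estimates agree; the surrogate avoids the covariance computation for every candidate pairing, which is where the bulk of the encoder's time goes.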
Scalable and lossless transmission of multilevel images using minimized average error method
Shigeo Kato, Madoka Hasegawa
Image information service systems have been actively developed. However, transmitting a multi-level image over a narrowband channel takes a very long time because of the large amount of data involved. A variety of progressive or scalable transmission schemes, which enable receivers to recognize the image contents at an early transmission stage, have been proposed to reduce this disadvantage. Among them are schemes that use bit-plane coding techniques; these perform well because the resolution or the number of levels can be improved independently. However, the halftone representation at the earlier transmission stages is poor because the natural binary planes are used. In this paper, we propose a new scalable and lossless transmission method for multi-level images using the minimized average error method, which gives good halftone reproduction quality. Simulation results show that our method gives good image quality at the first stage of transmission.
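The minimized average error method referred to here is the classic Jarvis–Judice–Ninke error-diffusion halftoner. A minimal sketch (my own simplification, showing only the binarization-and-diffusion core, not the paper's scalable bit-plane scheme):

```python
# Jarvis-Judice-Ninke "minimized average error" diffusion kernel:
# (dx, dy, weight) triples, weights summing to 48.
KERNEL = [(1, 0, 7), (2, 0, 5),
          (-2, 1, 3), (-1, 1, 5), (0, 1, 7), (1, 1, 5), (2, 1, 3),
          (-2, 2, 1), (-1, 2, 3), (0, 2, 5), (1, 2, 3), (2, 2, 1)]

def halftone(img, threshold=128):
    # Binarize each pixel in raster order and diffuse the quantization
    # error to not-yet-visited neighbours according to the kernel.
    h, w = len(img), len(img[0])
    work = [[float(p) for p in row] for row in img]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = 255 if work[y][x] >= threshold else 0
            err = work[y][x] - out[y][x]
            for dx, dy, wt in KERNEL:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    work[ny][nx] += err * wt / 48
    return out

gray = [[128] * 16 for _ in range(16)]   # uniform mid-gray test patch
dots = halftone(gray)
white_fraction = sum(map(sum, dots)) / (255 * 16 * 16)
```

On a uniform mid-gray patch roughly half the output pixels come out white, preserving the average intensity — the property that makes early, coarse transmission stages look like a reasonable halftone rather than a hard-thresholded silhouette.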
Rate control scheme for low-delay MPEG-2 video transcoder
Hiroyuki Kasai, Maki Sugiura, Tsuyoshi Hanamura, et al.
In this paper, we focus on the video transcoder as a bit-rate reducer in a network node and propose a rate control scheme for a low-delay MPEG-2 video transcoder. First, we summarize the requirements of a rate control algorithm for low-delay transcoding. Next, based on these requirements, we describe the proposed rate control scheme in detail. Then, we analyze the input and output buffer delay and calculate the total delay time of the proposed transcoder. Finally, we evaluate the proposed scheme through simulation experiments on picture quality, transcoding delay time, and the GOP (Group Of Pictures) structure information (N/M value) of the input bit stream. The results show that the proposed scheme provides the same picture quality as a traditional scheme and is independent of the GOP structure.
Rate control scheme for MPEG transcoder considering drift-error propagation
Isao Nagayoshi, Hiroyuki Kasai, Hideyoshi Tominaga
Video transcoders, which transform a video bit stream into a required bit-stream format, have been adopted in various applications. In this paper, we focus on an MPEG video transcoder that reduces the bit rate by re-quantization in the DCT domain, and we propose a rate control method for it that takes into consideration the estimation of drift-error propagation. First, we present the transcoder architecture, including the calculation of a 'Drift-Error Propagation Ratio'. Next, we explain the proposed rate control method, which minimizes the sum of the re-quantization error and the drift error. Finally, through simulation experiments, we compare the proposed rate control method to a traditional one in terms of complexity, required buffer size, and picture quality.
Lossless compression of 3D medical images using reversible integer wavelet transforms
HyoJoon Kim, JongHyo Kim, ChoongWoong Lee
This paper presents a lossless compression method for 3D medical images such as CT, MRI by using reversible integer wavelet transforms. This method is named 3D block-based zerotree with block partitioning (BZBP) and is based on zerotree coding in a 3D hierarchical tree and octonary block partitioning. Block-based zerotree codes zero regions and finds significant blocks in which there are significant coefficients. Significant blocks are zoomed in by block partitioning to detect significant coefficients in the block. The results show that the lossless compression ratio of 3D BZBP is higher than that of 3D SPIHT and 3D BZBP outperforms 3D SPIHT for reconstructed image quality at low bit-rate and in case of low wavelet decomposition level.
Robust image compression using reversible variable-length coding
Andrew Perkis, Oscar Solano Jimenez
This paper demonstrates a robust compression/decompression system for still image coding. Error resilience is obtained by substituting a regular variable length coding (VLC) scheme with a reversible variable length coding (RVLC) scheme. The results show that this substitution increases the coder's robustness significantly. Results are obtained by comparing the performance of RVLC to an early implementation of JPEG2000 (VM3.0B). Reversible variable length codes can be decoded independently from both the beginning and the end of a sequence. This increases robustness to errors, since more codewords can be decoded than with a regular VLC, which can only be decoded from the beginning of the sequence. The gain of our coders in the region of interest, at bit error rates ranging from 10^-4 to 10^-2, is on the order of 2 dB over VM3.0B. Visually, the differences are significant.
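The bidirectional decodability of an RVLC can be illustrated with a toy symmetric codebook (purely illustrative, not the codebook used in the paper): because every codeword is a palindrome and the set is both prefix-free and suffix-free, the bit stream parses from either end.

```python
CODE = {'a': '0', 'b': '101', 'c': '111'}   # palindromic, prefix- and suffix-free
INV = {v: k for k, v in CODE.items()}

def encode(symbols):
    return ''.join(CODE[s] for s in symbols)

def decode_forward(bits):
    """Greedy parse from the start; stop on an unmatchable prefix."""
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in INV:
            out.append(INV[buf]); buf = ''
        elif len(buf) > max(map(len, INV)):
            break                            # parse failure (e.g. bit error)
    return out

def decode_backward(bits):
    """Parse from the end; valid because the code is also suffix-free."""
    out, buf = [], ''
    for b in reversed(bits):
        buf = b + buf                        # rebuild the tail codeword in order
        if buf in INV:
            out.append(INV[buf]); buf = ''
        elif len(buf) > max(map(len, INV)):
            break
    return list(reversed(out))
```

After a bit error, a decoder can run both parses and keep the symbols recovered before each one fails, which is the source of the extra robustness over plain VLC.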
Coding of surveillance imagery for interpretability using local dimension estimates
Robert Prandolini
This paper introduces a novel image coding principle: coding an image to maximize its interpretability versus bit-rate performance. For large surveillance images it would be more appropriate if the encoded wavelet coefficients were prioritized in their order of importance for interpretability, and this paper presents one method for such a system. The importance values are derived from estimates of the local dimension in image regions, which is a measure of the local image dynamics. The scale of the area used for the estimates is dyadic and maps to the image scale-space. The wavelet coefficients from a Mallat decomposition are prioritized according to their importance, based on the local information dimension estimates. Subjective evaluations have shown that this importance prioritization scheme is preferred over the traditional progressive PSNR-optimal approach. The paper discusses the implementation of an importance prioritization scheme for the EBCOT image coder, the algorithm used in JPEG2000. The concept of importance prioritization for interpretability may benefit future low bit-rate image and video coding.
Content-based rate control for low-bit-rate video applications
Kewei Shi, Anni Cai, Jingao Sun
In low bit-rate or very low bit-rate video applications, such as videophone and video conferencing, traditional methods find it difficult to achieve satisfactory image quality. They often suffer from two shortcomings: artifacts caused by the limited bandwidth, and motion discontinuity caused by frame skipping when bit streams are transmitted over a fixed-rate channel in low-delay applications. To address these two problems, we propose a content-based rate control method for videophone and video conferencing applications. Our work focuses on three aspects: analysis of bit streams in low bit-rate video applications, content-based bit allocation between facial areas and background, and low-delay rate control at the object layer. Experiments show that the proposed method achieves better subjective quality than the H.263 test model TMN5, and, compared with TMN8 of H.263+, slightly worse PSNR but significantly better subjective quality in the facial areas.
Optimal real-time control of low-bit-rate coders
Jose Ignacio Ronda, Alvaro Bescos, Martina Eckert, et al.
This paper addresses the design of a model-based optimal control system for the real-time operation of low bit-rate coders. This control problem differs from its classical version in that the control system must decide, for each video frame, whether it is to be coded or skipped, in addition to choosing the coding parameters for each coded frame. The control policy, which specifies this decision process, is obtained as the solution of a stochastic dynamic programming problem. This problem is a formal specification of the rate control task, built from a previously obtained stochastic model of the video coder's behavior and a cost function whose minimization is the target of the regulation. The approach is tested on a standard implementation of an H.263 coder.
Design and implementation of the second-generation HDTV prototype video encoder of China
Jun Sun, Zhenghua Yu, Wei Ye, et al.
On October 1, 1999, the second-generation HDTV prototype system of China was successfully used for the first experimental live terrestrial HDTV broadcast in China. Shanghai Jiao Tong University developed the HDTV video encoder and the system multiplexer of this system. This paper focuses on the design and implementation of the HDTV video encoder. First, the background of HDTV development in China is introduced briefly. Second, the overall second-generation HDTV prototype system is reviewed. Third, the design and implementation of the HDTV video encoder are discussed in detail. Finally, the development prospects of HDTV in China are discussed.
HDTV down-conversion using a drift reduction scheme
Dennis Jia Yu Chan, Sheng Mei Shen, Takafumi Ueno
In the near future of digital television (DTV) broadcasting, we expect that a standard-definition television (SDTV) decoder should be able to receive a high-definition television (HDTV) signal, so that a user with an SDTV display can receive an HDTV program and view it on that monitor. To do so, the HDTV bitstream must be converted down to an SDTV signal. In this paper, we propose a fast and memory-efficient scheme for this down-conversion. The proposed scheme uses adaptive scaling to truncate the HDTV motion vectors for half-pel motion compensation (MC) at low resolution, so that the loss in motion vector accuracy does not accumulate. It saves a large amount of computation and requires less memory by performing MC after the down-conversion. Simulation results show that the proposed scheme provides acceptable visual quality while maintaining reasonable complexity.
Analysis and coding technique based on computational intelligence methods and image-understanding architecture
Human vision involves higher-level knowledge and top-down processes for resolving ambiguity and uncertainty in real images. Even very advanced low-level image processing offers no advantage without a highly effective knowledge-representation and reasoning system, which is the crux of the image understanding problem. Methods of image analysis and coding are directly based on methods of knowledge representation and processing. This article proposes such models and mechanisms in the form of a Spatial Turing Machine that, in place of symbols and tapes, works with hierarchical networks represented dually as discrete and continuous structures. Such networks are able to perform both graph and diagrammatic operations, which are the basis of intelligence. Computational intelligence methods transform continuous image information into discrete structures, making it available for analysis. The article shows that symbols naturally emerge in such networks, giving the opportunity to use symbolic operations. This framework naturally combines methods of machine learning, classification, and analogy with induction, deduction, and other methods of higher-level reasoning. An image understanding system based on these principles handles ambiguity and uncertainty in real images more flexibly and does not require supercomputers. This opens the way to new technologies in computer vision and image databases.
Pyramid image coder using block-template-matching algorithm
Farhad Keissarian, Mohammad Farhang Daemi
In this paper, a new image coding technique is introduced first; its inclusion in a pyramidal representation is presented afterwards. In the proposed stand-alone coding algorithm, referred to as Block Template Matching, an image is block coded according to the type of each individual block. A novel classifier, designed on the basis of histogram analysis of the blocks, classifies the image blocks according to their level of visual activity. Each block is then represented by a set of parameters associated with the pattern appearing inside the block. The use of these parameters at the receiver reduces the cost of reconstruction significantly and underpins the efficiency of the proposed technique. The coding efficiency of the proposed technique, along with the low computational complexity and simple parallel implementation of the pyramid approach, allows for a high compression ratio as well as good image quality. Satisfactory coded images have been obtained at bit rates in the range of 0.30-0.35 bits per pixel.
L-infinity constrained micronoise filtering for high-fidelity image compression
Imaging devices inevitably impose undesirable noise on acquired images during the imaging process. Usually this noise is too faint to cause unpleasant visual effects; however, it degrades image fidelity and significantly lowers the compression ratio of lossless coding. More baffling, in this case traditional noise filtering methods have little room to work. This paper introduces our efforts to weaken the effect of such micro noise during near-lossless compression. Experimental results on ISO test images with micro Gaussian noise demonstrate that, by filtering micro noise, an improved near-lossless coder can not only achieve a clearly higher compression ratio but also provide better image fidelity (measured by mean squared error) than lossless coding.
Compression of palettized images with progressive coding of the color information
Uwe Rauschenbach
This paper introduces a new compression method for palettized images that supports progressive refinement of the color information, in contrast to the resolution refinement used in standard methods such as interlaced GIF. Thus, fine image details can be recognized after decoding only a small part of the compressed image data. The achieved compression ratios are comparable to those of interlaced GIF or PNG. The method combines color map sorting with bitplane-by-bitplane prediction and Golomb coding of the pixel field.
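For context, Golomb coding in its Golomb-Rice special case (divisor a power of two) can be sketched as follows; the parameter choice is illustrative and not tied to the paper's configuration:

```python
def rice_encode(n, k):
    """Golomb-Rice code of nonnegative n with parameter k (divisor 2**k):
    unary-coded quotient, a '0' terminator, then a k-bit binary remainder."""
    q = n >> k
    r = n & ((1 << k) - 1)
    rem = format(r, '0%db' % k) if k else ''
    return '1' * q + '0' + rem

def rice_decode(bits, k):
    """Decode a concatenation of Rice codewords back into integers."""
    out, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == '1':               # count the unary quotient
            q += 1; i += 1
        i += 1                              # skip the '0' terminator
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        out.append((q << k) | r)
    return out
```

Golomb codes are efficient for geometrically distributed values, which is why they pair well with prediction residuals such as the bitplane predictions described above.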
Nonredundant representation of images allowing object-based and multiresolution scalable coding
Isabelle Amonou, Pierre Duhamel
In this work, we investigate a new class of scalable image coders. We target decompositions that are simultaneously multiresolution (for spatial scalability), critical (for compression efficiency), and based on hierarchical segmentation (for object-based scalability). Hierarchical segmentation gives access to the description of a scene in terms of regions or objects at several resolution levels, and thus allows the objects to be encoded and transmitted selectively. From a coding viewpoint, it is clearly attractive to couple the multi-level segmentation with a critically decimated decomposition of the image, to avoid redundancy of representation. However, to our knowledge, the combination of object representation with critically sampled multiresolution decomposition has not been studied. In this paper, we propose new methods to perform hierarchical segmentation of an image using critically decimated nonlinear filter banks; the resulting decomposition embeds a hierarchical segmentation map and is therefore particularly well suited for region-based coding and progressive transmission. Since the segmentation map is recovered by reconstruction inside the decomposition, it need not be transmitted separately, which reduces the bit rate. Simulations show that a prototype coder of this type suffers some degradation in rate-distortion tradeoff compared with a conventional wavelet-based image coder, but in addition offers new perspectives for object-based manipulation, coding, and transmission.
Toward a robust solution for image coding with easy content access
In this paper we address the issue of content accessibility in compressed images. By content accessibility we mean the efficiency of regaining the features of content elements that are important for content-based applications in large-scale image databases. It is realistic to expect that current and future databases will preferably store compressed images in order to make optimal use of the available storage space. At the same time, widely used image compression methods, including the present standard (JPEG) and the forthcoming one (JPEG 2000), are not explicitly optimized for content accessibility. Consequently, the high computational load of reaching features in compressed images, combined with the large number of images stored in a database, can negatively affect interaction with that database. To make this interaction more efficient, it is necessary to develop compression methods that, besides the three classical optimization criteria (bit rate, complexity, and distortion minimization), also explicitly take into account the accessibility of image content. We approach this challenge and propose a novel image compression method in which a good synergy among all four optimization criteria is reached.
Poster Session II: Segmentation, Tracking, and Feature Extraction
Using a novel multiresolution hybrid matching method to improve stereo matching accuracy of satellite images
Yanwen Ji, Anthony Tung Shuen Ho, Tao Yu
A novel multi-resolution hybrid matching method based on wavelets to improve the stereo matching accuracy of satellite images is presented. It is a feature-based system: wavelets are used to perform multi-resolution edge extraction and multi-resolution matching, and edge pixels are matched using adaptive matching windows whose shapes vary according to the directions of the edges. Unlike conventional matching methods, an adaptive search range is applied, meaning that each edge point's search range may differ. The matched results at low resolution levels are used to interpolate mismatched pixels at high resolution.
Fast computation of Gaussian mixture parameters and optimal segmentation
We present a fast parameter estimation method for image segmentation using the maximum likelihood function. The segmentation is based on a parametric model in which the probability density function of the gray levels in the image is assumed to be a mixture of two Gaussian density functions. For more accurate parameter estimation and segmentation, the algorithm is formulated as a compact iterative scheme. To reduce computation time and speed up convergence, histogram information is incorporated into the algorithm. In the iterative computation, the performance of the algorithm depends greatly on the initial values, and properly selected initial estimates make convergence fast. A reasonable approach to computing the initial parameters is also proposed.
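The histogram-driven estimation idea can be sketched with a plain EM iteration for a two-Gaussian mixture; the paper's exact iterative scheme and initialization may differ, and the crude mean-split initialization below is an assumption for illustration.

```python
import math

def em_two_gaussians(hist, iters=50):
    """Fit a two-Gaussian mixture to a gray-level histogram by EM.
    hist[g] = pixel count at gray level g.  Working on the histogram
    instead of individual pixels is what makes each iteration cheap:
    the inner loops run over gray levels, not pixels."""
    levels = [g for g, c in enumerate(hist) if c]
    n = sum(hist)
    mean = sum(g * c for g, c in enumerate(hist)) / n
    params = [                      # crude initialization around the mean
        {'w': 0.5, 'mu': mean / 2, 'sd': mean / 4 + 1},
        {'w': 0.5, 'mu': (mean + len(hist)) / 2, 'sd': mean / 4 + 1},
    ]
    for _ in range(iters):
        # E-step: responsibility of each component at each gray level
        resp = []
        for g in levels:
            p = [c['w'] * math.exp(-0.5 * ((g - c['mu']) / c['sd']) ** 2) / c['sd']
                 for c in params]
            s = sum(p) or 1e-12
            resp.append([pi / s for pi in p])
        # M-step: update weights, means, std devs, weighted by counts
        for j, c in enumerate(params):
            wc = [resp[i][j] * hist[g] for i, g in enumerate(levels)]
            tot = sum(wc) or 1e-12
            c['w'] = tot / n
            c['mu'] = sum(w * g for w, g in zip(wc, levels)) / tot
            var = sum(w * (g - c['mu']) ** 2 for w, g in zip(wc, levels)) / tot
            c['sd'] = math.sqrt(var) + 1e-6
    return params
```

Segmentation then thresholds each pixel by which fitted Gaussian explains its gray level better.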
Robust centroid target tracker based on novel distance features in cluttered image sequences
Jae-Soo Cho, Dae-Joung Kim, Dong-Jo Park
A real-time adaptive segmentation method based on new distance features is proposed for the centroid tracker. These features are the distances from the center of the target position predicted by the tracking filter to each pixel, used in extracting the moving target. Compared with other, more complex feature-based methods, the proposed method keeps clutter with target-like intensity from entering the tracking window at low computational cost, making it suitable for real-time applications. Comparative experiments show that the proposed method is superior to segmentation methods based on the intensity feature alone in target detection and tracking.
New discrete representation method for passenger counting system
Yue Feng, Alan L. Harvey
This paper presents a new method for accurate real-time counting of passengers using discrete representation processing techniques. To minimize computational cost and the problems caused by moving objects overlapping each other in the image, a specially designed discrete representation method is developed, based on analyzing the objects in the discrete representation using each object center's line over the sequence of object trigger points. It is planned to implement the algorithm on a personal computer to produce a real-time working system.
Novel approach of combining temporal segmentation results to the region-binding process for separating moving objects from still background
Tianming Liu, Feihu Qi, Yiqiang Zhan
To automatically segment moving objects in video sequences, FUB and ETRI have proposed several approaches that combine results provided by temporal segmentation methods (by FUB and UH) and spatial segmentation methods (by ETRI). In this paper, the authors present a novel approach that fuses temporal and spatial information during the segmentation process itself, rather than combining temporal and spatial segmentation results afterwards. The proposed approach is based on a region binding process, during which the temporal segmentation results are integrated. The fact that regions are represented and characterized in a distributed manner distinguishes region binding from region merging and region growing. By fusing both temporal and spatial information, primitively segmented regions are bound to form Binding-Cores (BCs), whose role is similar to that of seeds in region growing. The remaining regions are then bound to their neighboring BCs under strong or weak rules. The approach consists of four stages. Experimental results demonstrate the performance of the approach.
Detection of facial features based on the relaxation algorithm
Ho-Jin Lee, Dong-Gyu Sim, Rae-Hong Park
This paper proposes a relaxation-based algorithm for the detection of facial features such as the face outline, eyes, and mouth. First, a number of candidates for each facial feature are detected. To select the correct set of facial features from the candidates, the probabilities and geometric relationships of the candidate locations are considered; a relaxation algorithm is used for the implementation. Simulation results with various test images are presented.
Image segmentation through a multithresholding based on gray-level co-occurrence
Pornphan Dulyakarn, Punya Thitimajshima, Yuttapong Rangsanseri
This paper presents an unsupervised segmentation method applicable to both gray-level and multispectral images. For a gray-level image, the segmentation is achieved by multithresholding on a histogram derived from the gray-level co-occurrence of the image. The threshold selection is performed by applying the Otsu algorithm to this histogram. The method is extended to multispectral images by converting the image into a monochrome version using the Karhunen-Loeve transform. Results on a synthetic image are illustrated by comparison with the direct application of the Otsu algorithm. The method was also applied to many real images, and those results are given as well.
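The Otsu threshold selection step works on any gray-level histogram, including one derived from co-occurrence; a minimal single-threshold sketch:

```python
def otsu_threshold(hist):
    """Otsu's method: pick the threshold t maximizing the between-class
    variance w0*w1*(mu0-mu1)^2, computed in one pass over the histogram."""
    total = sum(hist)
    total_sum = sum(g * c for g, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0          # class-0 (<= t) pixel count so far
    sum0 = 0.0      # class-0 gray-level sum so far
    for t, c in enumerate(hist):
        w0 += c
        sum0 += t * c
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Multithresholding repeats this selection, e.g. by applying it recursively to each side of the first threshold.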
Efficient video segmentation algorithm for real-time MPEG-4 camera system
Video segmentation is one of the fundamental technologies for a content-based real-time MPEG-4 camera system. To meet the real-time requirement, the segmentation algorithm should be fast and accurate. Almost all existing algorithms are computationally intensive and unsuitable for real-time applications. In this paper, a new segmentation algorithm based on change detection and background registration is proposed. The algorithm has two modes: a baseline mode that can deal with general sequences, and a shadow cancellation mode, based on gradient operations and block-based change detection, that can deal with sequences affected by shadow and lighting changes. Experimental results show that this algorithm can be applied to different kinds of sequences and quickly produces accurate segmentation results.
Video segmentation based on adaptive combination of multiple features according to MPEG-4
Datchakorn Tancharoen, Somchai Jitapunkul, Panachit Kittipanya-ngam, et al.
Video segmentation for object-based video coding according to MPEG-4 should be able to segment the objects of interest in a video sequence cleanly. This paper presents an object segmentation algorithm in which image features are combined in the segmentation process according to the characteristics of the video signal, since combining many features of a video sequence can achieve high-quality object segmentation. In addition, the algorithm is adaptive: many parameters can be adjusted to produce a clean segmentation. The significant features used in the segmentation process include color, motion vectors, and change information. A fast shortest spanning tree algorithm is adapted for fast segmentation of image boundaries. Motion vectors are estimated with thresholding hierarchical block matching, which requires quite low computation and yields a small number of motion vector groups. The change information is used to detect moving objects, separating them from the static background. Each feature is then considered in the segmentation decision process to segment the objects of interest, and post-processing refines the final segmentation. The results on many test sequences are of good quality and show object boundaries clearly.
Tracking of video objects using a backward projection technique
In this paper, we present a technique for tracking video objects through a sequence. The proposed technique is based on backward projection. Since the classical backward technique can be disturbed by occlusions and by potential errors in spatial segmentation or motion estimation, we propose an extension of the backward projection technique to cope with these problems. The results obtained show the relevance of the proposed approach for various kinds of tracked objects, whether rigid or non-rigid.
Multiscale region segmentation of images using nonlinear methods
Balaji Iyer, Malcolm David Macleod
Region segmentation is the process of identifying the regions within an image, where a region is a group of connected pixels with similar properties. In this paper, we present a method of region segmentation using the scale-tree obtained from a datasieve, a recursive nonlinear morphological filter. An initial segmentation of the image is produced using a scale-tree-based region growing method. We show that the properties of the datasieve scale-tree, in conjunction with this region growing method, can be used to obtain the features of the objects in the image. The initial segmentation is then followed by post-processing to yield the final segmentation. The results presented for several color images show the methods to be promising.
Automatic facial feature detection for model-based coding
Liyanage C. De Silva, Kyine Kyine Win
This paper presents an automatic facial feature detection system for 3D model-based coding applications. The proposed system is based on simple image processing techniques that can easily be implemented as parallel algorithms on parallel processing hardware. Model-based face coding can be used to enhance the quality of remote teaching, reducing the barrier between teacher and student: only a selected set of control points of the face is transmitted to the remote terminal instead of the video signal. To extract this set of control points, a predefined 3D generic wire-frame model is used. This paper discusses the automatic extraction of the facial feature points needed for fitting the 3D model. The proposed detection methods for all facial features use filtering, thresholding, edge detection, and edge counting, without any manual adjustment or initialization. The head top, chin points, eye centers, mouth center, and nose center were detected using the vertical integral projection method. The centroid method was used successfully for eyebrow center detection. Four mouth feature points were detected with both the Canny edge detection method and the amplitude projection method; the former had limited success, while the latter gave very satisfactory results. On the whole, the results obtained are encouraging and could be used for automatic registration of 2D facial images onto 3D face models. Subsequent tracking of some of these feature points leads to automatic facial expression recognition using optical flow.
Surface feature extraction of objects using the surface equation
Dae Hwan Hyeon, Sun Ho Lee, Dae-Hyun Kim, et al.
This paper presents a new method of 3D surface feature extraction that uses center, corner, and subsidiary points rather than conventional surface normal vectors and curvatures. We assume that each region of an object has a uniform surface curvature distribution. From a range image, we obtain an edge map through the scan-line technique. Using this edge map, we label the 3D object and extract center and corner points from each segmented region. We then determine whether each segmented region is a planar or a curved surface, and from the quadric surface equation we calculate the coefficients of the planar or curved surface. In this article, we use synthetic and real (Odetics) range images containing polyhedral and curved objects.
Three-dimensional hybrid edge detection
Mohammed Bennamoun, Pi-Chi Chou, Espen Norheim, et al.
Volume image processing is a very important research area with many medical applications, and edge detection is one of the key topics in medical image analysis. This paper describes a new approach to 3D edge detection based on the 2D hybrid edge detector. The structure of the 3D hybrid edge detector is equivalent to that of the 2D case: it is the combination of first- and second-order differential edge detectors. The combination of the two differential detectors gives accurate edge localization while maintaining immunity to noise in the image. Results on synthetic and medical images (computed tomography and magnetic resonance imaging) using the hybrid edge detectors are presented and compared with the results of the gradient-of-Gaussian and Laplacian-of-Gaussian detectors.
Recognition of 3D objects with curved surfaces based on the cross entropy between shape histograms
Dong-O Kim, Rae-Hong Park
In the real world, many objects consist of curved surfaces, so the recognition of 3D objects with curved surfaces needs to be investigated. In this paper, we present a shape-histogram-based algorithm for recognizing and grouping such objects, in which the cross entropy between shape histograms is employed. Computer simulations with various synthetic and real images are presented to show the effectiveness of the proposed algorithm.
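The cross-entropy matching idea can be sketched as follows (a minimal sketch; the shape histograms themselves, e.g. of surface curvature values, are assumed to be given):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i) between two
    histograms, normalized to probability distributions first.
    By Gibbs' inequality H(p, q) >= H(p, p), with equality iff q = p,
    so smaller values mean a better match."""
    sp, sq = sum(p), sum(q)
    return -sum((pi / sp) * math.log(qi / sq + eps) for pi, qi in zip(p, q))

def best_match(query, models):
    """Index of the model histogram minimizing cross entropy with the query."""
    return min(range(len(models)), key=lambda i: cross_entropy(query, models[i]))
```

The small `eps` guards against empty model bins, where the logarithm would otherwise diverge.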
Poster Session III: Image Processing
Source and channel coding approach to data hiding
Nidhal Abdulaziz, Khok Khee Pang
In this work, a robust data embedding scheme that uses a source and channel coding framework for data hiding is implemented. The data to be embedded, referred to as the signature data, comprises two different types: text messages and images. For text, the message is converted into bits, the bits are coded with Reed-Solomon codes, and the resulting code is hidden in the wavelet transform coefficients of the host image. For image signature data, an image as large as 128 x 128 is hidden in a host image of size 256 x 256. The perturbations are controlled by the maximum allowable visible distortion that can be introduced into the host, using a model of human visual perception. The method can be used both for digital watermarking applications and for data hiding.
Block boundary detection method from JPEG images for embedded watermark detection
Madoka Hasegawa, Shigeo Kato
Large amounts of digital content can easily be duplicated with personal computers, and illegal copies have become a serious problem in recent years. Digital watermark technology is one solution, and various kinds of digital watermark techniques have been proposed. In general, there are two types of digital image watermarking schemes: those based on the spatial domain and those based on the frequency domain. In the latter, a wavelet transform or the DCT is used and the watermark is embedded in the transform coefficients. This spreads the watermark over the whole image and therefore gives robustness against frequency-domain attacks such as the JPEG algorithm. However, it is difficult to extract the watermark from a partial image trimmed from the original, because the starting position of the DCT blocks and the cut position are usually not the same. The starting position of the DCT blocks must be detected in order to extract the embedded watermark from the partial image. In this paper, we propose a method for detecting the DCT block boundary of the original image from a trimmed partial image. With the proposed method, a watermark embedded in the DCT coefficients can be extracted by detecting the DCT block boundary of the original image.
Automatic text extraction from color image
WenPing Liu, Hui Su, Chang Y. Chi
An effective and fast method to extract text from a color background is proposed in this paper. Under the assumption that text strings with high contrast are usually the important ones, a modified Sobel operator is used to transform the color image into a binary edge image, and edge-based smearing is used to greatly reduce the color image processing time. By combining feature-based identification of text candidates with moment-preserving region segmentation, we can effectively extract text from scanned true-color images that vary in text font, size, and color complexity. The extracted text is finally passed to an optical character recognition system. Experimental results demonstrate the high speed and feasibility of this method.
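The edge-map construction step can be sketched with the plain Sobel operator (the paper's modification of the operator is not specified here, and the threshold is illustrative):

```python
def sobel_edges(img, thresh):
    """Binary edge map from a grayscale image (list of lists) using the
    3x3 Sobel kernels; gradient magnitude approximated by |gx| + |gy|."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            out[y][x] = 1 if abs(gx) + abs(gy) >= thresh else 0
    return out
```

High-contrast text strokes produce strong responses in this map, which the subsequent smearing and candidate-identification stages then group into text regions.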
Efficient edge line average interpolation algorithm for deinterlacing
Tao Chen, Hong Ren Wu, Zhenghua Yu
In this paper, an efficient interpolation approach is proposed for deinterlacing within a single frame. On the basis of the edge-based line average (ELA) algorithm, two useful measurements are introduced within the analysis window in order to alleviate misleading decisions in determining the direction in which the interpolation is to be made. By efficiently estimating the directional spatial correlations of neighboring pixels, increased interpolation accuracy has been achieved. Additionally, the new method possesses a simple computation structure and is therefore easy to implement. Extensive simulations conducted for different images and video sequences have shown the efficacy of the proposed interpolator with significant improvement over previous ELA based algorithms in terms of both quantitative and perceived image quality.
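Basic ELA interpolation, the starting point of the proposed method, can be sketched as below; the two additional measurements introduced in the paper are not reproduced here.

```python
def ela_interpolate(above, below):
    """Edge-based line average: reconstruct one missing line of an
    interlaced field.  For each pixel, examine three directions
    (left diagonal, vertical, right diagonal) and average along the
    direction with the smallest luminance difference."""
    w = len(above)
    out = []
    for x in range(w):
        candidates = []
        for d in (-1, 0, 1):
            xa, xb = x + d, x - d
            if 0 <= xa < w and 0 <= xb < w:
                diff = abs(above[xa] - below[xb])
                candidates.append((diff, (above[xa] + below[xb]) // 2))
        out.append(min(candidates)[1])
    return out
```

Interpolating along the direction of least difference follows edges instead of cutting across them, which is why ELA outperforms plain line averaging on diagonal detail.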
Reduction of blocking artifacts by adaptive postfiltering of transform coefficients
Tao Chen, Hong Ren Wu, Bin Qiu
This paper proposes a novel postprocessing technique for reducing blocking artifacts in low bit rate transform coding. Since quantization is applied to transform coefficients during encoding, the proposed approach works in the transform domain. The masking effect of the human visual system is considered, and an adaptive weighting mechanism is integrated into the postfiltering. In low-activity areas, where the blocking artifacts are perceptually more detectable, a large window is used to efficiently smooth out the artifacts. To preserve image details, a small neighborhood and a large central weight are employed for high-activity blocks, where the blocking artifacts are less noticeable due to the masking of the local background. The quantization constraint is finally applied to the postfiltered coefficients. Experimental results show that the proposed technique provides superior performance to other postprocessing methods in both objective and subjective image quality.
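A minimal sketch of the adaptive-window idea (with assumed weights and an assumed activity threshold, not the paper's tuned values): coefficients of low-activity blocks are smoothed heavily across the 3x3 block neighborhood, while high-activity blocks keep a large central weight:

```python
import numpy as np

def postfilter_dct(coefs, act_thresh=100.0):
    """coefs: array of shape (Bi, Bj, 8, 8) holding per-block DCT
    coefficients. Low-activity blocks (small AC energy) get heavy
    smoothing with co-located coefficients of neighboring blocks;
    high-activity blocks keep a large center weight. Sketch only;
    the paper also applies a quantization constraint afterwards."""
    coefs = np.asarray(coefs, float)
    Bi, Bj = coefs.shape[:2]
    out = coefs.copy()
    for i in range(Bi):
        for j in range(Bj):
            # AC energy = total coefficient energy minus DC energy
            ac = np.sum(coefs[i, j] ** 2) - coefs[i, j, 0, 0] ** 2
            w_center = 1.0 if ac < act_thresh else 8.0   # assumed weights
            acc = w_center * coefs[i, j]
            wsum = w_center
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) == (0, 0):
                        continue
                    ni, nj = i + di, j + dj
                    if 0 <= ni < Bi and 0 <= nj < Bj:
                        acc += coefs[ni, nj]
                        wsum += 1.0
            out[i, j] = acc / wsum
    return out
```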
Three-dimensional range data interpolation using B-spline surface fitting
Songtao Li, Dongming Zhao
Many optical range sensors use equal angle increment (EAI) sampling. Sensors of this type use rotating mirrors driven at a constant angular velocity together with radar and triangulation techniques, sending and receiving modulated coherent light through the mirror. The EAI model generates surface geometry data that, in many applications, must be converted into data meeting the desired equal distance increment (EDI) orthographic projection model. For accurate analysis of 3D images, a 3D interpolation scheme is needed to resample the range data into spatially equally spaced samples that emulate the Cartesian orthographic projection model. In this paper, a resampling approach using B-spline surface fitting is proposed. The first step is to select a new scale for the X, Y, and Z directions based on the 3D Cartesian coordinates of the range data obtained from the sensor parameters. The size of the new range image and the new coordinates of each point are then computed according to the actual (X, Y, Z) reference coordinates and the new scale. The new range data are interpolated using a B-spline surface fitted to the new Cartesian coordinates. Experiments show that this 3D interpolation approach provides a geometrically accurate solution for many industrial applications that deploy EAI sampling sensors.
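With SciPy, the core resampling step can be sketched as follows, assuming the non-uniform coordinates have already been derived from the sensor's angular parameters (the scale and reference selection described above is omitted):

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def resample_equal_distance(x_nonuni, y_nonuni, z, nx, ny):
    """Fit a bicubic B-spline surface to range data sampled on a
    non-uniform (equal-angle-increment) grid, then evaluate it on an
    equally spaced grid approximating the orthographic EDI model.
    Illustrative sketch of the interpolation step only."""
    spline = RectBivariateSpline(x_nonuni, y_nonuni, z, kx=3, ky=3)
    xu = np.linspace(x_nonuni[0], x_nonuni[-1], nx)   # uniform X grid
    yu = np.linspace(y_nonuni[0], y_nonuni[-1], ny)   # uniform Y grid
    return spline(xu, yu), xu, yu
```

Equal angle increments translate into tangent-spaced Cartesian samples, which is exactly the non-uniform grid the spline absorbs.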
Efficient DCT-domain prefiltering inside a video encoder
Sung Deuk Kim, Jong Beom Ra
Efficient implementation of pre-filtering has been an important issue in video sequence coding, since it can improve coding efficiency dramatically by alleviating camera noise. Based on approximated generalized Wiener filtering and a factorization of the 2D discrete cosine transform (DCT), this paper introduces a novel pre-filtering scheme that is performed inside a video encoder. The proposed pre-filtering scales the DCT coefficients of original image blocks for intra-block coding and those of motion-compensated error blocks for inter-block coding. Even though the pre-filtering operation is embedded in the video encoder, its additional computational complexity is marginal compared to the encoding process, and the overall architecture of a conventional video encoder is maintained. In spite of its simplicity, the proposed pre-filtering scheme gives good filtering and coding performance for noisy video sequences.
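The coefficient-scaling idea can be sketched per block as below. The gain here is a simple per-coefficient Wiener shrinkage with an assumed noise variance, standing in for the paper's approximated generalized Wiener filter:

```python
import numpy as np
from scipy.fft import dctn, idctn

def denoise_block_dct(block, noise_var=25.0):
    """Wiener-like coefficient scaling in the DCT domain: each AC
    coefficient is shrunk by an estimated SNR factor s^2/(s^2 + noise_var).
    Sketch of the 'scale the DCT coefficients' step; the paper derives
    its actual scale factors differently."""
    X = dctn(block.astype(float), norm='ortho')
    signal_var = np.maximum(X ** 2 - noise_var, 0.0)  # crude per-coef estimate
    gain = signal_var / (signal_var + noise_var)
    gain[0, 0] = 1.0                                  # keep DC (block mean) intact
    return idctn(X * gain, norm='ortho')
```

In the encoder this scaling would be applied to the already-computed intra or motion-compensated DCT coefficients, which is why the extra cost is marginal.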
Adaptive image sequence resolution enhancement using multiscale-decomposition-based image fusion
Jeong-Ho Shin, Junghoon Jung, Joon-Ki Paik, et al.
This paper presents a regularized image sequence interpolation algorithm that restores high-frequency details by fusing low-resolution frames. Image fusion makes it feasible to use several data sets corresponding to the same scene to obtain better resolution and more information about the scene than any single data set provides. Based on a mathematical model of image degradation, we obtain an interpolated image that minimizes the residual between the high-resolution and interpolated images subject to a prior constraint. In addition, by using spatially adaptive regularization parameters, directional high-frequency components are preserved while noise is efficiently suppressed. We provide experimental results for both non-fusion and fusion algorithms. Based on these results, the proposed algorithm produces a better interpolated image than conventional interpolation algorithms by both subjective and objective criteria; in particular, it preserves high-frequency components while suppressing undesirable artifacts such as noise.
Adaptive FIR filter design and implementation empowered by reconfigurable FPGAs
Anwar Dawood, Neil W. Bergmann, Zulfi Asdani, et al.
This paper explores the design and implementation of an adaptive Finite Impulse Response (FIR) filter on Reconfigurable Computing Technology (RCT). RCT deploys Field Programmable Gate Array (FPGA) technology as a flexible platform for implementing and improving digital system designs.
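A behavioral software model of the kind of adaptive FIR filter typically mapped onto such hardware is the textbook LMS filter (the paper concerns the FPGA implementation; the baseline below is not necessarily the exact variant implemented):

```python
import numpy as np

def lms_fir(x, d, num_taps=8, mu=0.01):
    """Least-mean-squares adaptive FIR filter (behavioral reference model).
    x: input signal, d: desired signal, mu: adaptation step size."""
    w = np.zeros(num_taps)          # filter coefficients
    y = np.zeros(len(x))            # filter output
    e = np.zeros(len(x))            # error signal
    buf = np.zeros(num_taps)        # tapped delay line
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y[n] = w @ buf              # FIR output
        e[n] = d[n] - y[n]          # error vs. desired signal
        w += mu * e[n] * buf        # LMS coefficient update
    return y, e, w
```

On an FPGA, the multiply-accumulate loop and the coefficient update become parallel hardware datapaths; reconfigurability lets the tap count and word lengths be revised after deployment.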
Optimal down-conversion in compressed DCT domain with minimal operations
Myoung-Cheol Shin, In-Cheol Park
A new down-conversion scheme in the DCT domain is presented, which can be used in decoders for DCT-compressed images and video. Down-conversion in the transform domain generally requires lower computational complexity than spatial-domain down-conversion. The proposed method has computational complexity comparable to other DCT-domain techniques, yet it is optimal in the MSE sense, whereas the others are not. We minimize the number of arithmetic operations without discarding any DCT coefficient data: we first combine the dequantization (DQ), IDCT, and spatial-domain averaging, and then concentrate most of the multiplications in the first stage so that they can be performed at once. As a result, the proposed scheme shows better PSNR characteristics than other DCT-domain methods while requiring almost the same number of operations and the same memory.
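The underlying DCT-truncation style of down-conversion (before any operation-minimizing reorganization such as the paper's) can be sketched as:

```python
import numpy as np
from scipy.fft import dctn, idctn

def downconvert_block(block8):
    """2:1 down-conversion of an 8x8 block directly in the DCT domain:
    keep the 4x4 low-frequency DCT coefficients, rescale, and apply a
    4x4 inverse DCT. A common baseline; note this one DOES discard
    high-frequency data, unlike the paper's optimal scheme."""
    X = dctn(block8.astype(float), norm='ortho')
    X4 = X[:4, :4] * 0.5            # scale between orthonormal DCT sizes
    return idctn(X4, norm='ortho')  # 4x4 spatial block
```

Merging this with dequantization and averaging, as the abstract describes, is what lets most multiplications be folded into a single first stage.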
Interframe robust image processing for restoration of heavily corrupted old movies
Takahiro Saito, Takashi Komatsu, Toshiaki Ohuchi, et al.
Restoring old films requires dedicated image processing technology. We focus on interframe image processing algorithms for the correction of film misalignment, the correction of flicker, and the removal of blotches. In digital restoration, the film is first read with a scanner, but the scanned sequence often suffers from irregular spatial vibration due to inaccurate frame alignment. Hence, we develop a robust correction method that estimates interframe misalignment separately from camera motion and compensates for it. After the correction of interframe misalignment, we perform flicker correction. Flicker is defined as undesirable brightness fluctuation. We present a hierarchy of flicker-correction models for old films, and develop a flicker correction method that first estimates the correction-model parameters from the input sequence and then corrects the flicker according to the estimated model. Furthermore, we present a method for blotch removal: blotch distortions are repaired with a blending-type filter, and for blotch detection we employ our previously presented spatiotemporal continuity analysis. Simulations on genuinely corrupted sequences have demonstrated that our interframe image processing techniques can reduce film misalignment and can remove flicker and blotches almost perfectly.
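The simplest member of such a flicker-correction hierarchy is a global gain-plus-offset brightness model; a sketch (illustrative only, with frame statistics matched to a reference):

```python
import numpy as np

def deflicker(frame, ref_mean, ref_std):
    """Correct global flicker with a linear (gain + offset) brightness
    model by matching the frame's mean and standard deviation to
    reference statistics. Simplest model of the family; the paper
    estimates richer, hierarchical correction models."""
    f = frame.astype(float)
    gain = ref_std / max(f.std(), 1e-6)   # guard against flat frames
    return gain * (f - f.mean()) + ref_mean
```

In a sequence, the reference statistics would typically be smoothed estimates over neighboring frames rather than a single fixed frame.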
Inverse filters for generation of arbitrarily focused images
This paper describes a novel filtering method to reconstruct an arbitrarily focused image from two differently focused images. Based on the assumption that the scene has two layers--foreground and background--the method uses a linear imaging model relating the two acquired images and the desired image, whose blurring effect can be manipulated independently in each layer. The linear equation that holds between these images, derived from the imaging model, can be formulated as an image restoration problem. This paper shows that the solution to this problem exists in closed form as an inverse filter, so the desired image can be reconstructed by linear filtering alone. As a result, reconstruction with high accuracy and fast processing is achieved. Experiments using real images are presented.
Digital autofocusing of multiple objects based on image restoration
Chungnam Cho, SangKyu Kang, Joonshik Yoon, et al.
We propose a new degradation model for out-of-focus blur among multiple objects with different blur parameters, together with a segmentation-based, spatially adaptive regularized iterative restoration algorithm. In the proposed model, the boundary effect of out-of-focus objects is analyzed mathematically. Experimental results show that the proposed restoration algorithm can efficiently remove space-variant out-of-focus blur from an image containing multiple blurred objects.
Regularized constrained restoration of wavelet-compressed image
Junghoon Jung, Younhui Jang, Tae Yong Kim, et al.
Wavelet-compressed images suffer from coding artifacts, such as ringing and blurring, resulting from the quantization of transform coefficients. In this paper we propose a new algorithm that reduces such coding artifacts in wavelet-compressed images by using regularized iterative image restoration. We first propose an appropriate degradation model that represents the wavelet-based image compression system. This model is then used to formulate the regularized iterative restoration algorithm. The proposed algorithm adopts a pair of constraints, and adaptivity is imposed on the general regularization process in both the spatial and frequency domains. Experimental results show that the iteration converges to an image in which both ringing and blurring are significantly reduced.
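The general shape of regularized iterative restoration can be sketched with a Landweber-type update. Here H is a symmetric circular blur purely for illustration; the paper's degradation model instead represents wavelet compression and adds adaptive constraints:

```python
import numpy as np

def regularized_restore(y, psf, n_iter=200, alpha=0.01, beta=1.0):
    """Regularized iterative (Landweber-type) restoration:
        x <- x + beta * (H^T (y - H x) - alpha * x)
    with H a circular convolution by a centered, symmetric PSF
    (so H^T = H). Textbook sketch of the iteration class only."""
    k = np.zeros_like(y, dtype=float)
    kh, kw = psf.shape
    k[:kh, :kw] = psf
    k = np.roll(k, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center PSF at (0,0)
    K = np.fft.fft2(k)
    H = lambda img: np.real(np.fft.ifft2(np.fft.fft2(img) * K))
    x = y.astype(float).copy()
    for _ in range(n_iter):
        x = x + beta * (H(y - H(x)) - alpha * x)   # gradient step + regularizer
    return x
```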
Out-of-focus blur estimation using isotropic step responses and its application to image restoration
Joonshik Yoon, Chungnam Cho, Inkyung Hwang, et al.
In this paper, we propose an out-of-focus blur estimation method using isotropic step responses and its application to image restoration. The proposed algorithm can produce an in-focus image using digital image processing alone; it requires neither infrared or ultrasonic range sensing nor a focusing lens assembly driven by an electrically powered mechanism.
Hiding data in halftone image using modified data hiding error diffusion
Ming Sun Fu, Oscar Chi Lim Au
With the ease of distribution of digital images, there is a growing concern for copyright control and authentication. While many watermarking and data hiding methods exist for natural images, almost none can be applied to halftone images. In this paper, we propose a novel data hiding method for halftone images, Modified Data Hiding Error Diffusion (MDHED). MDHED is an effective method for hiding a reasonable amount of data while yielding halftone images with good visual quality. Moreover, the amount of hidden data is easy to control, and the security depends on the key rather than on secrecy of the algorithm.
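A generic (and much cruder than MDHED) way to hide bits during error diffusion is to force the halftone output at key-selected positions and let the diffusion absorb the resulting error:

```python
import numpy as np

def hide_bits_error_diffusion(gray, bit_positions, bits):
    """Floyd-Steinberg error diffusion that forces the halftone output at
    chosen (secret-key) pixel positions to carry message bits, diffusing
    the resulting error as usual. Generic illustration only -- MDHED
    modifies the diffusion more carefully to preserve visual quality."""
    f = gray.astype(float).copy()
    H, W = f.shape
    out = np.zeros((H, W), np.uint8)
    forced = dict(zip(bit_positions, bits))
    for i in range(H):
        for j in range(W):
            if (i, j) in forced:
                o = 255 * forced[(i, j)]          # output carries the hidden bit
            else:
                o = 255 if f[i, j] >= 128 else 0  # normal thresholding
            out[i, j] = o
            err = f[i, j] - o
            # Floyd-Steinberg error weights: 7/16, 3/16, 5/16, 1/16
            if j + 1 < W:
                f[i, j + 1] += err * 7 / 16
            if i + 1 < H:
                if j > 0:
                    f[i + 1, j - 1] += err * 3 / 16
                f[i + 1, j] += err * 5 / 16
                if j + 1 < W:
                    f[i + 1, j + 1] += err * 1 / 16
    return out
```

Decoding simply reads the halftone values at the key-selected positions, which is why the security rests on the key.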
Special Session: Testing and Quality Metrics for Digital Video Services
Video Quality Experts Group: current results and future directions
Ann Marie Rohaly, Philip J. Corriveau, John M. Libert, et al.
The Video Quality Experts Group (VQEG) was formed in October 1997 to address video quality issues. The group is composed of experts from various backgrounds and affiliations, including participants from several internationally recognized organizations working in the field of video quality assessment. The first task undertaken by VQEG was to validate objective video quality measurement methods, leading to recommendations in both the telecommunication and radiocommunication sectors of the International Telecommunication Union. To this end, VQEG designed and executed a test program comparing subjective video quality evaluations to the predictions of a number of proposed objective measurement methods for video at bit rates from 768 kb/s to 50 Mb/s. The results of this test show that no objective measurement system is currently able to replace subjective testing. Depending on the metric used for evaluation, the performance of eight or nine models was found to be statistically equivalent, leading to the conclusion that no single model outperforms the others in all cases. The greatest achievement of this first validation effort is the unique data set assembled to help future development of objective models.