Obtaining an upper bound in MPEG coding performance from jointly optimizing coding mode decisions and rate control
Author(s):
Wilson Kwok;
Huifang Sun;
John Ju
This paper presents a general procedure for determining the optimal MPEG coding strategy in terms of the selection of macroblock coding modes and quantizer scales. The two processes of coding mode decision and rate control are intimately related to each other and should be determined jointly in order to achieve optimal coding performance. We formulate the constrained optimization problem and present solutions based upon rate-distortion characteristics, or R(D) curves, for all the macroblocks that compose the picture being coded. Distortion of the entire picture is assumed to be decomposable and expressible as a function of individual macroblock distortions, with this being the objective function to minimize. The determination of the optimal solution is complicated by the MPEG differential encoding of motion vectors and dc coefficients, which introduces dependencies that carry over from macroblock to macroblock for a duration equal to the slice length. Once the upper bound in performance is calculated, it can be used to assess how well practical sub-optimum methods perform.
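To make the constrained formulation concrete, the sketch below (not the authors' algorithm) applies the familiar Lagrangian technique to independent per-macroblock R(D) operating points: for each macroblock it picks the mode/quantizer option minimizing D + λR and bisects on λ to meet a rate budget. The macroblock-to-macroblock dependencies introduced by differential coding, which the paper explicitly addresses, are deliberately ignored here, and the synthetic R(D) data and all names are illustrative.

```python
import numpy as np

def lagrangian_select(options, lam):
    """For each macroblock, pick the (rate, distortion) option minimizing D + lam * R.

    options: list over macroblocks; each entry is a list of (rate, distortion) pairs,
             one pair per candidate (coding mode, quantizer scale) combination.
    """
    total_r, total_d, choices = 0.0, 0.0, []
    for opts in options:
        costs = [d + lam * r for (r, d) in opts]
        k = int(np.argmin(costs))
        choices.append(k)
        total_r += opts[k][0]
        total_d += opts[k][1]
    return total_r, total_d, choices

def rd_optimal_under_budget(options, rate_budget, iters=50):
    """Bisect on the Lagrange multiplier until the selected total rate meets the budget."""
    lo, hi = 0.0, 1e6
    best = lagrangian_select(options, hi)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        r, d, choices = lagrangian_select(options, lam)
        if r > rate_budget:
            lo = lam          # too many bits: penalize rate more
        else:
            hi = lam          # under budget: try spending more bits
            best = (r, d, choices)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic R(D) points: 4 quantizer choices per macroblock, 20 macroblocks.
    options = [[(float(r), float(1000.0 / (r + 1))) for r in sorted(rng.integers(10, 200, 4))]
               for _ in range(20)]
    r, d, _ = rd_optimal_under_budget(options, rate_budget=1500)
    print(f"rate={r:.0f} bits, distortion={d:.1f}")
```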
Sequence image coding based on fractal approximation using dynamic residual pools
Author(s):
In Kwon Kim;
Rae-Hong Park
We propose a two-layer sequence image coding algorithm based on residual block matching using fractal approximation. First, the motion compensation (MC) error signal is encoded by the discrete cosine transform (DCT). The motion vector and DCT coefficients are transmitted as the first layer, and the residual signal of MC/DCT is encoded by fractal approximation and transmitted as the second layer. The second layer is encoded using matching blocks selected from a dynamic residual pool. The reconstructed MC error image is used as a dynamic residual signal, which plays the role of the domain pool in conventional fractal coding. Computer simulations comparing the proposed method with DCT-based methods show that the performance improvement achieved by the proposed method is significant.
Recovery of coded video sequences from channel errors
Author(s):
Ki-Won Kang;
Sang Hoon Lee;
Taejeong Kim
In this paper, we propose a method to recover good quality pictures from channel errors in the transmission of coded video sequences. This work is basically an extension of previously presented work to video sequence coding. Transmitted information in most video coding standards is mainly composed of motion vectors (MV) and motion-compensated prediction errors (MCPE). The compressed data are generally transmitted in binary form through a noisy channel. Channel errors in this bit stream result in objectionable degradations in consecutive reconstructed frames. Up to now, there have been many studies on concealing the effects of channel errors on the reconstructed images, but they did not consider recovery of the actual binary data and instead utilized replacement and/or interpolation techniques to make errors less visible to an observer. In order to have a simple and powerful method to recover the video sequences, separately for MV and MCPE, from errors, it is necessary to take full advantage of both the source and channel characteristics. The proposed method takes advantage of single-bit-error dominance in a received bit string by using a parity bit for error detection. It also takes advantage of the high pixel correlation in usual images by using a side-match criterion to select the best fit among candidate replacements.
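A minimal sketch of the two ingredients the abstract names, under assumed formats that the abstract does not specify: a parity bit detects a corrupted word, every single-bit-flip candidate is decoded (single-bit-error dominance), and the candidate whose block boundary best matches its already-decoded neighbors (side-match) is kept. The 8-bit word layout, the toy decoder, and the side-match window are all illustrative.

```python
import numpy as np

def parity_ok(word_bits):
    """Even-parity check over a list of bits (last bit assumed to be the parity bit)."""
    return sum(word_bits) % 2 == 0

def side_match_error(block, left_col, top_row):
    """Sum of squared differences between the candidate block's outer pixels
    and the adjacent column/row of already-decoded neighbor blocks."""
    err = np.sum((block[:, 0] - left_col) ** 2)
    err += np.sum((block[0, :] - top_row) ** 2)
    return float(err)

def recover_word(word_bits, decode, left_col, top_row):
    """If the parity check fails, try all single-bit flips and keep the
    decodable candidate with the smallest side-match error."""
    if parity_ok(word_bits):
        return decode(word_bits)
    best_block, best_err = None, np.inf
    for i in range(len(word_bits)):
        cand = list(word_bits)
        cand[i] ^= 1
        if not parity_ok(cand):
            continue
        block = decode(cand)
        err = side_match_error(block, left_col, top_row)
        if err < best_err:
            best_block, best_err = block, err
    return best_block

if __name__ == "__main__":
    # Toy decoder: 7 data bits + 1 parity bit encode a flat 4x4 block of that gray level.
    decode = lambda bits: np.full((4, 4), int("".join(map(str, bits[:7])), 2), dtype=float)
    word = [1, 0, 1, 0, 0, 1, 0]
    word.append(sum(word) % 2)          # append even-parity bit
    word[2] ^= 1                         # simulate a single channel bit error
    recovered = recover_word(word, decode, left_col=np.full(4, 80.0), top_row=np.full(4, 80.0))
    print(recovered[0, 0])               # the side-match criterion recovers the flipped bit
```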
Fast DCT block smoothing algorithm
Author(s):
Rensheng Horng;
Albert J. Ahumada Jr.
Image compression based on quantizing the image in the discrete cosine transform (DCT) domain can generate blocky artifacts in the output image. It is possible to reduce these artifacts and RMS error by adjusting measures of block edginess and image roughness, while restricting the DCT coefficient values to values that would have been quantized to those of the compressed image. This paper presents a fast algorithm to replace our gradient search method for RMS error reduction and image smoothing after adjustment of DCT coefficient amplitude.
Description and evaluation of a non-DCT-based codec
Author(s):
Albert A. Deknuydt;
Stefaan Desmet;
Luc Van Eycken;
Andre J. Oosterlinck
At this time, almost all (de-facto) video coding standards are DCT based. It would be wrong though to think that DCT is the only practical way to reach a reasonable compression ratio for a reasonable codec complexity. In this paper, a description of a complete hybrid codec based on OLA (Optimal Level Allocation) and HVS (Human Visual System) based classification is given. Then a performance comparison is made between this codec and an MPEG-2 like codec.
Rate control for MPEG video coding
Author(s):
Limin Wang
ISO/IEC MPEG-2 Test Model 5 (TM5) describes a rate control method which consists of three steps: bit allocation, rate control and modulation. There are basically two problems with the TM5 rate control. First, the quantization parameter for a macroblock is fully dependent upon the channel buffer fullness. Hence, macroblocks in a picture may not all be treated equally because of variations in buffer fullness, which may result in nonuniform picture quality. Secondly, the TM5 rate control does not handle scene changes properly because the target bit rate for a picture is determined based only on the information obtained from encoding of the previous pictures. This paper presents a rate control approach which addresses these two problems associated with the TM5 rate control. A single quantization parameter is used for each picture, which guarantees that all the macroblocks in a picture are treated equally. To address the impact of scene changes on picture quality, we propose to code the first scheduled P picture after a scene change as an I picture and the I picture in the following group of pictures as a P picture. The simulation results demonstrate that a significant improvement is obtained using the proposed rate control.
Extraction of a dedicated fast playback MPEG bit stream
Author(s):
Emmanuel D. Frimout;
Jan Biemond;
Reginald L. Lagendijk
The extraction of a dedicated fast playback stream from a normal-play MPEG encoded stream is important for recording applications in which fast playback must be supported by the device. The most important issue is the selection of the codewords (coefficients) to retain for the fast playback stream. In this paper, several codeword extraction methods of varying complexity, ranging from optimal extraction methods to a zonal extraction method, are evaluated. The range of possible solutions makes it possible to trade off performance against complexity. The newly developed selection method, based on a Lagrangian cost minimization per block in combination with a feedback rate control, yields an attractive performance-complexity combination.
Frontal-view face detection
Author(s):
Antonio J. Colmenarez;
Thomas S. Huang
This paper presents a symmetry measurement based on the correlation coefficient. The symmetry measurement is used to locate the center line of a face and, afterward, to decide whether the face view is frontal or not. A 483-face image database obtained from the U.S. Army was used to test the algorithm. Though the performance of the algorithm is limited to 87%, this is due to the wide range of variations present in the database used to test our algorithm. Under more constrained conditions, such as uniform illumination, this technique can be a powerful tool in facial feature extraction. Regarding its computational requirements, though this algorithm is very expensive, three independent optimizations are presented, two of which are successfully implemented and tested.
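The core measurement can be sketched in a few lines: for each candidate center column, correlate the strip to its left with the mirrored strip to its right and keep the column with the highest correlation coefficient. This is a hedged illustration of the general idea, not the authors' exact procedure; the half-width parameter and grayscale input are assumptions.

```python
import numpy as np

def symmetry_score(img, col, half_width):
    """Pearson correlation between the strip left of `col` and the mirrored strip right of it."""
    left = img[:, col - half_width:col]
    right = img[:, col + 1:col + 1 + half_width][:, ::-1]  # mirror the right strip
    l, r = left.ravel().astype(float), right.ravel().astype(float)
    l -= l.mean()
    r -= r.mean()
    denom = np.sqrt((l * l).sum() * (r * r).sum())
    return float((l * r).sum() / denom) if denom > 0 else 0.0

def find_center_line(img, half_width=20):
    """Return the column index maximizing left/right mirror symmetry."""
    cols = range(half_width, img.shape[1] - half_width - 1)
    return max(cols, key=lambda c: symmetry_score(img, c, half_width))

if __name__ == "__main__":
    # Synthetic test: a random image mirrored about column 60.
    rng = np.random.default_rng(1)
    left = rng.random((80, 61))
    img = np.concatenate([left, left[:, :-1][:, ::-1]], axis=1)
    print(find_center_line(img))   # expected: 60
```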
Obtaining 3D shape of potted plants and modeling
Author(s):
Takeshi Agui;
Kenji Komine;
Hiroshi Nagahashi;
Takanori Nagae
An approach for obtaining three-dimensional object shapes from a series of silhouette images is presented. A two-dimensional image sequence of an object placed on a turning table is taken by a video camera at regular time intervals to obtain silhouette images from multiple viewpoints. A pillar is obtained by sweeping each silhouette along a line parallel to the viewing direction. The intersection of the pillars from all the viewpoints surrounding the object is sampled to give a set of volume data. The segmentation and modeling of potted plants are also studied as an application of computer graphics. The set of volume data is segmented into the three parts of a potted plant, i.e., the pot, the stems, and the leaves. Then, the pot and the stems are modeled by frustums and each of the leaves is approximated by two Bezier surface patches.
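The silhouette-intersection step can be illustrated with a small voxel sketch: each voxel is kept only if it projects inside the silhouette of every view. An orthographic turntable model is assumed here; the projection model, grid size, and test silhouettes are illustrative, not taken from the paper.

```python
import numpy as np

def carve_visual_hull(silhouettes, angles, grid=64, radius=1.0):
    """Intersect the back-projected silhouette pillars of a turntable sequence.

    silhouettes: list of binary images (H x W), one per turntable angle.
    angles:      rotation angle (radians) of the table for each silhouette.
    Returns a boolean voxel grid; True marks voxels inside every pillar.
    """
    n = grid
    axis = np.linspace(-radius, radius, n)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    occupied = np.ones((n, n, n), dtype=bool)
    for sil, theta in zip(silhouettes, angles):
        h, w = sil.shape
        # Rotate voxel centers into the camera frame (orthographic side view).
        u = xs * np.cos(theta) + ys * np.sin(theta)        # horizontal image axis
        v = zs                                             # vertical image axis
        cols = np.clip(((u + radius) / (2 * radius) * (w - 1)).round().astype(int), 0, w - 1)
        rows = np.clip(((v + radius) / (2 * radius) * (h - 1)).round().astype(int), 0, h - 1)
        occupied &= sil[rows, cols].astype(bool)           # keep voxels inside this silhouette
    return occupied

if __name__ == "__main__":
    # Two orthogonal views of a centered disc carve an approximate cylinder intersection.
    h = w = 64
    rr, cc = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    disc = ((rr - 31.5) ** 2 + (cc - 31.5) ** 2 < 20 ** 2).astype(np.uint8)
    hull = carve_visual_hull([disc, disc], [0.0, np.pi / 2])
    print(hull.sum(), "voxels occupied")
```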
Three-dimensional object shape recognition using cross sections
Author(s):
Mehmet Celenk
This paper describes a computationally efficient 3D object surface matching algorithm. In the proposed method, object and model surfaces are scaled to fit within a unit cube in 3D space. They are then sliced along the magnitude axis, and the resultant object and model surface cross sections are represented in binary image format. The cross-section centroids of an unknown object and of the models of different shapes are computed in their respective binary images. The resultant cross sections are translated to the origin of the spatial plane using the centroids. Major and minor axes of the plane cross sections are aligned with the coordinate axes of the spatial plane. Matching of the aligned cross sections is done in the direction of the gradient of the cross-section boundary by computing the shape deformation as the Euclidean distance between the object boundary points and the corresponding points on the model cross-section boundary. The shape deformation distances measured in different cross sections are averaged, and the minimum average shape deformation distance is used to identify the model best matching the object of unknown classification.
Spectral approach to classification based on generalized unconditional tests
Author(s):
Karen O. Egiazarian;
Jaakko T. Astola;
Sos S. Agaian
A spectral approach to distribution-free classification is presented. The discriminant function is based on generalized unconditional tests. The main steps of the algorithms are: (1) finding a set of deadlock generalized tests, (2) computing a local discriminant function for each such test, and (3) performing the actual classification of the observed pattern into a class. The spectral algorithms involve computation of the Walsh and Reed-Muller (conjunctive) spectra using fast algorithms.
Fitting three-dimensional models to stereo images using a genetic algorithm
Author(s):
Tomoharu Nagao;
Takeshi Agui;
Hiroshi Nagahashi
A method to determine the positions and rotation angles of 3D objects in stereo images is proposed in this paper. First, range data of edge points of the left image are calculated by a stereo matching method. Next, a three-dimensional model is rotated, translated and projected onto a 2D plane, and the edges of the projected image are compared with those of the left image. The space transformation parameters which give the maximum matching ratio are searched for by a genetic algorithm (GA). In the searching process of the proposed method, a set of space transformation parameters is regarded as the chromosome of an individual, and a randomly generated population is evolved according to GA rules. The principle of the method and several experimental results are described.
Two-dimensional invariant pattern recognition using a back-propagation network improved by distributed associative memory
Author(s):
Jau-Ling Shih;
Pau-Choo Chung
A system combining the BackPropagation Network (BPN) and the Distributed Associative Memory (DAM) for 2D pattern recognition is proposed. In the system, a sequence of image processing steps and transformations, including a complex transform, the Laplacian, and the Fourier transform, is used for invariant feature extraction. Two modified neural networks are proposed for pattern recognition: (1) the DAM combined with the BPN, and (2) the BPN improved by the DAM. In the DAM combined with the BPN, fine training is provided by the BPN to take the pattern variations within each class into the training procedure. Experimental results indicate that this improved DAM has higher recognition rates than a traditional DAM. In the BPN improved by the DAM, the weights of the first layer use the memory matrix of the DAM as initial values. This network is compared with the BPN. Experimental results show that this network not only has a slightly higher recognition rate, but also requires less training time than a BPN. Finally, the system is also tested with noisy patterns. According to the experimental results, the system retains a high recognition rate even on noisy images.
Three-dimensional object recognition using hidden Markov models
Author(s):
Young Kug Ham;
Kil Moo Lee;
Rae-Hong Park
We propose an effective segmentation and recognition algorithm for range images. The proposed recognition system based on the hidden Markov model (HMM) and back-propagation (BP) algorithm consists of three parts: segmentation, feature extraction, and object recognition. For object classification using the BP algorithm we use 3D moments, and for surface matching using the HMM we employ 3D features such as surface area, surface type and line lengths. Computer simulation results show that the proposed system can be successfully applied to segmentation and recognition of range images.
Pyramid multiresolution classifier for online large vocabulary Chinese character recognition
Author(s):
Quen-Zong Wu;
I-Chang Jou;
Yann Le Cunn
A pyramid classifier is proposed for large-vocabulary Chinese characters, which first uses low-resolution features to roughly classify the input character, and then uses higher-resolution features to make finer classifications stage by stage. In addition to the rule-based preclassification, there are three stages of recognition. The number of candidate categories is reduced step by step. We use one thousand categories of Chinese characters for the experiments. Simulation results show that this classifier can recognize the input character with 93.1% and 90% accuracy on the training set and the test set, respectively.
Temporal and spatial projection onto the convex set (POCS) based error concealment algorithm for the MPEG-encoded video sequence
Author(s):
Max Chien;
Huifang Sun;
Wilson Kwok
The paper presents an algorithm that combines both intra-frame and inter-frame information to reconstruct macroblocks lost due to imperfect communication channels when decoding an MPEG bitstream. The algorithm is a POCS-based (Projection Onto Convex Sets) iterative restoration algorithm incorporating both the temporal and spatial constraints derived from a set of analyses performed on the picture sequence. Often the use of temporal information in the restoration process is complicated by scene changes or large random motion activity. To reliably utilize the temporal information, we formulate a series of tests to determine the usefulness of the temporal information. In addition, the tests yield a temporal constraint if the temporal information is deemed good. Along with the spatial constraints described in [1], the temporal constraint is used in the proposed iterative restoration algorithm.
Simple multiresolution approach for representing multiple regions of interest (ROIs)
Author(s):
Andrew T. Duchowski;
Bruce Howard McCormick
A simple spatial-domain multiresolution scheme is presented for preserving multiple regions of interest (ROIs) in images. User-selected ROIs are maintained at high (original) resolution while peripheral areas are degraded. The presented method is based on the well-known MIP texture mapping algorithm used extensively in computer graphics. Most ROI schemes concentrate on preserving a single foveal region, usually attempting to match the visual acuity of the human visual system (HVS). The multiple ROI scheme presented here offers three variants of peripheral degradation, including linear and nonlinear resolution mapping, as well as a mapping matching HVS acuity. Degradation of image pixels is carried out relative to each ROI. A simple criterion is used to determine screen pixel membership in given image ROIs. Results suggest that the proposed multiple ROI representation scheme may be suitable for gaze-contingent displays as well as for encoding sparse images while optimizing compression and visual fidelity.
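A hedged sketch of one way such peripheral degradation could work: each pixel is blurred to a MIP-map-like pyramid level chosen from its distance to the nearest ROI, with a linear resolution mapping. The level count, the box-filter pyramid, and the distance-to-level mapping are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def box_pyramid(img, levels):
    """List of progressively box-filtered versions of `img` (level 0 = original).
    Assumes dimensions divisible by 2**(levels-1)."""
    out = [img.astype(float)]
    for lvl in range(1, levels):
        k = 2 ** lvl
        h, w = img.shape[0] // k * k, img.shape[1] // k * k
        small = img[:h, :w].astype(float).reshape(h // k, k, w // k, k).mean(axis=(1, 3))
        out.append(np.kron(small, np.ones((k, k)))[:img.shape[0], :img.shape[1]])
    return out

def multi_roi_degrade(img, rois, levels=4, max_dist=64):
    """Linear mapping: pixels far from every ROI center use coarser pyramid levels."""
    pyr = box_pyramid(img, levels)
    yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    dist = np.full(img.shape, np.inf)
    for (cy, cx) in rois:                       # distance to the nearest ROI center
        dist = np.minimum(dist, np.hypot(yy - cy, xx - cx))
    lvl = np.clip((dist / max_dist * (levels - 1)).astype(int), 0, levels - 1)
    out = np.empty_like(pyr[0])
    for l in range(levels):
        mask = lvl == l
        out[mask] = pyr[l][mask]
    return out

if __name__ == "__main__":
    img = np.random.default_rng(0).integers(0, 256, (128, 128)).astype(float)
    degraded = multi_roi_degrade(img, rois=[(32, 32), (96, 96)])
    print(degraded.shape)
```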
Adaptive filter for noise removal using wavelet transform
Author(s):
Chun Hong Yang;
G. Su
In this paper, an adaptive noise filter implemented in the wavelet transform (WT) domain is proposed. This filter smoothes noise while preserving edges as much as possible by taking advantage of the different characteristics of signal and noise in the WT domain: (1) the shape of signal histograms in the WT domain approaches a Gaussian distribution with zero mean, and the variance increases as the scale increases; (2) white noise in the spatial domain remains white in the WT domain, with variance decreasing proportionally to the scale; (3) signal and noise that are uncorrelated in the spatial domain remain uncorrelated in the WT domain; (4) the signal-to-noise ratio (SNR) increases as the scale increases in the WT domain. Based on these analyses, we derive a simple form of the 2D minimum mean square error (MMSE) estimation algorithm in the WT domain that is applicable to nonstationary image models. All the nonstationary image statistical parameters needed for the filter can be estimated from the noisy image, and no a priori information about the original image is required. A comparison demonstrates that the method in the WT domain provides better improvement of SNR and better subjective impression than the same method in the spatial domain.
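For illustration, the sketch below applies a local MMSE (Wiener-type) gain to the detail coefficients of a single-level 2D Haar transform: each coefficient is scaled by max(σ̂² − σ_n², 0) / σ̂², where σ̂² is a local second moment estimated from the noisy coefficients. The Haar basis, single decomposition level, and local window are assumptions made to keep the example short; the paper's multi-scale, nonstationary formulation is more elaborate.

```python
import numpy as np

def haar2_forward(x):
    """Single-level 2D Haar (averaging) transform; x must have even dimensions."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0
    d = (x[0::2, :] - x[1::2, :]) / 2.0
    ll, hl = (a[:, 0::2] + a[:, 1::2]) / 2.0, (a[:, 0::2] - a[:, 1::2]) / 2.0
    lh, hh = (d[:, 0::2] + d[:, 1::2]) / 2.0, (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def haar2_inverse(ll, bands):
    lh, hl, hh = bands
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + hl, ll - hl
    d[:, 0::2], d[:, 1::2] = lh + hh, lh - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

def local_mmse(coef, noise_var, win=3):
    """Wiener-type gain from a local second-moment estimate of the noisy coefficients."""
    pad = win // 2
    p = np.pad(coef, pad, mode="reflect")
    var = np.zeros_like(coef)
    for i in range(coef.shape[0]):          # small images, so a plain loop is fine
        for j in range(coef.shape[1]):
            var[i, j] = np.mean(p[i:i + win, j:j + win] ** 2)
    gain = np.maximum(var - noise_var, 0.0) / np.maximum(var, 1e-12)
    return gain * coef

def denoise(noisy, noise_sigma):
    ll, bands = haar2_forward(noisy)
    # For the averaging Haar above, the detail-band noise variance is noise_sigma**2 / 4.
    nv = noise_sigma ** 2 / 4.0
    return haar2_inverse(ll, tuple(local_mmse(b, nv) for b in bands))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.kron(rng.integers(0, 255, (8, 8)).astype(float), np.ones((8, 8)))  # blocky test image
    noisy = clean + rng.normal(0, 10, clean.shape)
    out = denoise(noisy, 10)
    print(f"noisy RMSE {np.sqrt(np.mean((noisy-clean)**2)):.2f} -> "
          f"denoised RMSE {np.sqrt(np.mean((out-clean)**2)):.2f}")
```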
Blocking artifacts reduction in image coding based on minimum block boundary discontinuity
Author(s):
Byeungwoo Jeon;
Jechang Jeong;
Jae Moon Jo
This paper proposes a novel blocking artifact reduction method based on the notion that blocking artifacts are present in images due to heavy accuracy loss of transform coefficients in the quantization process. We define the block boundary discontinuity measure as the sum of the squared differences of pixel values along the block boundaries. The proposed method corrects selected transform coefficients so that the resultant image has minimum block boundary discontinuity. It does not specify a transform domain in which the correction should take place; therefore, an appropriate transform domain can be selected at the user's discretion. In the experiments, the scheme is applied to DCT-based compressed images to show its performance.
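The discontinuity measure itself is simple to state in code: sum the squared differences between the pixels on either side of every horizontal and vertical block boundary. The 8x8 block size below is the usual DCT choice and an assumption here; the coefficient-correction step that minimizes this measure is not shown.

```python
import numpy as np

def block_boundary_discontinuity(img, block=8):
    """Sum of squared pixel differences across all horizontal and vertical block boundaries."""
    img = img.astype(float)
    d = 0.0
    for c in range(block, img.shape[1], block):        # vertical boundaries
        d += np.sum((img[:, c] - img[:, c - 1]) ** 2)
    for r in range(block, img.shape[0], block):        # horizontal boundaries
        d += np.sum((img[r, :] - img[r - 1, :]) ** 2)
    return d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    smooth = np.tile(np.linspace(0, 255, 64), (64, 1))               # smooth ramp: low discontinuity
    blocky = np.kron(rng.integers(0, 255, (8, 8)), np.ones((8, 8)))  # flat 8x8 blocks: high discontinuity
    print(block_boundary_discontinuity(smooth), block_boundary_discontinuity(blocky))
```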
Novel filter algorithm for removing impulse noise in digital images
Author(s):
Nelson Hon Ching Yung;
Andrew H. S. Lai
In this paper, we present a novel filter algorithm that is more capable of removing impulse noise than some of the common noise removal filters. The philosophy of the new algorithm is based on a pixel identification concept. Rather than processing every pixel in a digital image, the new algorithm interrogates a subimage region to determine which pixels within the subimage are 'corrupted'. With this knowledge, only the 'corrupted' pixels are filtered, whereas the 'uncorrupted' pixels are left untouched. Extensive testing of the algorithm over a hundred noisy images shows that the new algorithm exhibits three major characteristics. First, its ability to remove impulse noise is better visually, and it has the smallest mean-square error compared with the median filter, averaging filter and sigma filter. Second, the effect of smoothing is minimal; as a result, edge and line sharpness is retained. Third, the new algorithm is consistently faster than the median filter in all our test cases. In its current form, the new filter algorithm performs well with impulse noise.
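A hedged sketch of the general 'identify, then filter only the corrupted pixels' idea (the paper's actual identification rule is not given in the abstract): a pixel is flagged as an impulse when it deviates from its local median by more than a threshold, and only flagged pixels are replaced by that median. The threshold and window size are illustrative.

```python
import numpy as np

def selective_impulse_filter(img, win=3, thresh=40.0):
    """Replace only pixels that deviate strongly from their local median (assumed impulses)."""
    pad = win // 2
    p = np.pad(img.astype(float), pad, mode="reflect")
    out = img.astype(float).copy()
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            med = np.median(p[i:i + win, j:j + win])
            if abs(out[i, j] - med) > thresh:   # pixel identified as corrupted
                out[i, j] = med                 # uncorrupted pixels are left untouched
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.tile(np.linspace(0, 255, 64), (64, 1))
    noisy = clean.copy()
    salt = rng.random(clean.shape) < 0.05        # 5% salt-and-pepper impulses
    noisy[salt] = rng.choice([0.0, 255.0], size=salt.sum())
    restored = selective_impulse_filter(noisy)
    print(f"MSE noisy {np.mean((noisy-clean)**2):.1f} -> restored {np.mean((restored-clean)**2):.1f}")
```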
Spectral methods for threshold Boolean filtering
Author(s):
Jaakko T. Astola;
Karen O. Egiazarian;
David Zaven Gevorkian
An efficient spectral algorithm for representing any Boolean function as a linear combination of positive Boolean functions (PBFs) is proposed. A processor architecture realizing this algorithm with varying levels of parallelism is suggested. The algorithm finds not only the truth tables of the PBFs, but also their minimal disjunctive normal form representations. This allows previously proposed efficient stack filtering designs to be incorporated into the construction of architectures for threshold Boolean filters.
Image size reduction and enlargement based on circular apertures
Author(s):
Jia-Guu Leu
In this paper we describe a new method for image size reduction and magnification. The method is based on the concept of circular apertures. In image reduction, a circular region in the original image is mapped to a single pixel in the reduced image. The average intensity of the pixels in the circle is assigned as the intensity of the resulting pixel. In image magnification, a single pixel in the original image is projected to a circular region in the enlarged image. Weighted averaging is used to determine the intensity values of pixels in the resulting image at places where circles overlap. We address four basic reduction/magnification scales in the paper. For each scale we study two aperture sizes. Higher scales can be obtained by repeatedly applying the basic ones. We have compared the results produced by the proposed method with the results produced by the commonly used resampling/zero-order-hold method and found that the proposed method gives results that are superior in both visual comparison and quantitative analysis.
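The reduction step can be sketched directly from the description: each output pixel is the mean of the input pixels that fall inside a circular aperture centered on the corresponding source location. The aperture radius and the scale handling below are illustrative assumptions; the paper's specific scales and aperture sizes are not reproduced.

```python
import numpy as np

def circular_reduce(img, scale=2, radius=None):
    """Reduce `img` by `scale`; each output pixel averages input pixels inside a circular aperture."""
    if radius is None:
        radius = scale / 2.0
    img = img.astype(float)
    out_h, out_w = img.shape[0] // scale, img.shape[1] // scale
    yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            cy, cx = (i + 0.5) * scale - 0.5, (j + 0.5) * scale - 0.5   # aperture center in input coords
            mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
            out[i, j] = img[mask].mean()
    return out

if __name__ == "__main__":
    img = np.tile(np.linspace(0, 255, 32), (32, 1))
    small = circular_reduce(img, scale=2)
    print(small.shape, float(small[0, 0]))
```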
Cumulant-based blur identification approach to image restoration
Author(s):
You Xu;
Gregory A. Crebbin
In this paper, we provide a novel method based on higher-order-statistics cumulants to identify nonminimum-phase point spread functions; the method enlarges the possible distribution types of the image formation field. In our method, we cast the blur identification problem as an ARMA model parameter identification problem, but we consider the image model as a realization of a colored signal instead of a zero-mean white Gaussian signal, which enlarges the range of image types. For the colored-input ARMA model, the contributions of the bicepstrum of the ARMA model lie only along the axes and the 45-degree lines, so we extract the linear parts of the cumulants of the blurred image for analysis, in which higher-order-statistics techniques are used to estimate the ARMA parameters. Experimental results are presented in this paper.
Two- versus three-dimensional object-based coding
Author(s):
A. Murat Tekalp;
Yucel Altunbasak;
Gozde Bozdagi
We propose a new, efficient 2D object-based coding method for very low bit rate video compression based on affine motion compensation with triangular patches under connectivity constraints. We then compare this approach with a 3D object-based method using a flexible wireframe model of a head-and-shoulders scene. The two approaches are compared in terms of the resulting bitrates, peak signal-to-noise ratio (PSNR), visual image quality, and execution time. We show that 2D object-based approaches with affine transformations and triangular mesh models can simulate all capabilities of 3D object-based approaches using wireframe models under orthographic projection, at a fraction of the computational cost. Moreover, 2D object-based methods provide greater flexibility in modeling arbitrary input scenes in comparison to 3D object-based methods.
Interframe finite-state vector subband coding (VSC) scheme for very low bit rate (VLBR) video coding
Author(s):
Ya-Qin Zhang;
Weiping Li;
John P. Wus
Vector subband coding (VSC) has been shown to be a promising technique for very low bit rate (VLBR) video coding. The good performance of VSC is achieved because the vector filter bank used in VSC preserves intra-vector correlation while reducing inter-band and inter-vector correlations. Application of VSC to VLBR video coding with intra-frame coding only has been reported previously. In this paper, we describe an inter-frame VSC scheme for VLBR video coding. The most popular technique for reducing inter-frame redundancy is motion compensated prediction. This technique is effective in pixel-based coding schemes such as the ones used in various video coding standards. However, it is not very efficient when used in vector-based coding schemes. Therefore, instead of predicting pixel values and coding the prediction difference, the proposed new technique uses the vectors in the vector subbands of the previous frame to predict the vectors in the corresponding vector subbands of the current frame. It is shown that such an inter-frame coding scheme promises good performance for VLBR video coding.
Motion modeling and estimation for very low bit rate video coding
Author(s):
Janusz Konrad;
Abdol-Reza Mansouri;
Eric Dubois;
Viet-Nam Dang;
Jean-Bernard Chartier
In video coding at high compression rates, e.g., in very low bit rate coding, every transmitted bit carries a significant amount of information that is related either to motion parameters or to the intensity residual. As demonstrated in the SIM-3 coding scheme, a more precise motion model leads to improved quality of coded images when compared with the H.261 coding standard. In this paper, we present some of our recent results on the modeling and estimation of motion for the compression and post-processing of typical videophone ('head-and-shoulders') image sequences. We describe a block-based motion estimation that attempts to optimize the overall bit budget for intensity residual, motion and overhead information. We compare simulation results for this scheme with full-search block matching in the context of H.261 coding. Then, we discuss a region-based motion estimation that exploits segmentation maps obtained from an MDL-based (minimum description length) algorithm. We compare experimentally several algorithms for the compression of such maps. Finally, we describe motion-compensated interpolation that takes into account pixel acceleration. We show experimentally a major performance improvement of the constant-acceleration model over the usual constant-velocity models. This is a very promising technique for post-processing in the receiver to improve reconstruction of frames dropped in the transmitter.
Speech recognition for acoustic-assisted video coding and animation
Author(s):
Homer H. Chen;
Wu Chou;
Barry G. Haskell;
Tsuhan Chen
In this paper, we discuss issues related to the analysis and synthesis of facial images using speech information. An approach to speaker-independent acoustic-assisted image coding and animation is studied. A perceptually based sliding-window encoder is proposed. It utilizes the high-rate (or oversampled) acoustic viseme sequence from the audio domain for image-domain viseme interpolation and smoothing. The image-domain visemes in our approach are dynamically constructed from a set of basic visemes. The look-ahead and look-back moving interpolations in the proposed approach provide an effective way to compensate for the mismatch between auditory and visual perception.
Video coding algorithm based on recovery techniques using mean field annealing
Author(s):
Taner Ozcelik;
James C. Brailean;
Aggelos K. Katsaggelos
Most of the existing video coding algorithms produce highly visible artifacts in the reconstructed images as the bit rate is lowered. These artifacts are due to the information loss caused by the quantization process. Since these algorithms treat decoding as simply the inverse process of encoding, these artifacts are inevitable. In this paper, we propose an encoder/decoder paradigm in which both the encoder and decoder solve an estimation problem based on the available bitstream and prior knowledge about the source image and video. The proposed technique makes use of a priori information about the original image through a nonstationary Gauss-Markov model. Utilizing this model, a maximum a posteriori (MAP) estimate is obtained iteratively using mean field annealing. The fidelity to the data is preserved by projecting the image onto a constraint set defined by the quantizer at each iteration. The performance of the proposed algorithm is demonstrated on an H.261-type video codec. It is shown to be effective in improving the reconstructed image quality considerably while reducing the bit rate.
Design of a linguistic feature space for quantitative color harmony judgment
Author(s):
Yu-Chuan Shen;
Yung-Sheng Chen;
Wen Hsing Hsu
Successful judgement of color harmony primarily depends on features related to human pleasure. In this paper, a new color feature, the color linguistic distribution (CLD), is proposed based on a designed 1D image scale of 'CHEERFUL-SILENT'. This linguistic feature space is designed to be consistent with the color differences of practical color vision. The CLD is described by a distance-based color linguistic quantization (DCLQ) algorithm, and is capable of indicating fashion trends in Taiwan. Also, the grade of harmony can be measured based on the similarity of CLDs. Experiments on quantitative color harmony judgement demonstrate that the results based on the CIE1976-LUV and CIE1976-LAB color spaces achieve better consistency with those of questionnaire-based harmony judgement than the hue-dominated method.
Space-associated segmentation for multiforeground/background images
Author(s):
Yu-Chuan Shen;
Yung-Sheng Chen;
Wen Hsing Hsu
Multi-foreground/background (MFsBs) images involve histogram-interlaced and spatially neighboring foregrounds and backgrounds. In this paper, the concept of a 'dummy background' is proposed to represent the 'perceived' background, instead of multiple backgrounds. The dummy background also corresponds to scaling the morphological distance between foregrounds and backgrounds, which improves the space-association capability of traditional segmentation algorithms. Experimental results demonstrate that better segmentation is accomplished with less time consumption.
Shift-, rotation-, and limited-scale-invariant pattern recognition using synthetic discriminant functions
Author(s):
OuYang Yueh;
Pen-Wen Chen;
Hon-Fai Yau
We propose here a simple way to synthesize a shift-, rotation- and limited-scale-invariant correlation filter, making use of the idea of synthetic discriminant functions (SDFs). The SDF is synthesized by superimposing four 2nd-order circular harmonics of a training reference pattern in four different sizes. Computer simulation experiments have shown that the filter is indeed shift invariant, fully rotation invariant, and scale invariant over a size range from 1 to 1.75. The invariant range can be increased if more training patterns are used.
Study of Thai character recognition method
Author(s):
Yasuhiro Nakamura;
Charia Promin;
Kineo Matsui
This paper presents an auto-recognition scheme for printed Thai characters. The structure of Thai characters is very different from that of English characters. Thus, we summarize some properties of Thai characters relevant to auto-recognition, and propose an algorithm to segment each character from the printed character image. Our experimental system achieves a recognition rate of more than ninety percent.
Optimum exposure factors of soft x-ray radiographs for the best object recognition
Author(s):
Hitoshi Kanamori;
Yoshiaki Ozaki;
Hideaki Kubota;
Masao Matsumoto
The method for obtaining optimum exposure factors of tube voltage and mAs value (the product of the tube current and the exposure time) is illustrated. For this purpose, attenuation curves of the energy absorbed in emulsion layers and characteristic curves of film, in which the absorbed energy is used as the input instead of exposure, are derived. In addition, the knowledge of the minimum perceptible contrast and the optimum film density is required.
Comparison of the performance of three types of correlation filters for rotational-invariant pattern recognition
Author(s):
Yih-Shyang Cheng;
Tsair-Chun Liang;
Ray-Cheng Chang
In this paper, a comparison of the performance of three types of correlation filters, the linear high-pass circular harmonic filter (LHPCHF), the ideal high-pass circular harmonic filter (IHPCHF), and the wavelet transform circular harmonic filter (WTCHF), is presented. These filters, which combine a 2D symmetrical edge detection filter with a circular harmonic filter (CHF), are used to perform shift- and rotation-invariant optical pattern recognition.
Fuzzy reasoning processor for camera image autofocus
Author(s):
Oscal Tzyh-Chian Chen;
Yao-Chou Lu;
Hwai-Tsu Chang
A fuzzy reasoning processor for the autofocusing operation of a camera has been developed. The automatic focus is performed by evaluating the object distance and luminance. The object distance is measured by the beams of infrared light. The adequate contrast of object contours is evaluated by the image luminance. The proposed fuzzy reasoning processor can efficiently determine a good focusing point and its computation power can reach 3.75 million fuzzy logic inferences per second at a system clock of 30 MHz.
Design of a dataway processor for a parallel image signal processing system
Author(s):
Mitsuru Nomura;
Tetsuro Fujii;
Sadayasu Ono
Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called the 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates 8 bits in parallel in full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel, so sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
New architecture for running threshold Boolean filtering
Author(s):
Jaakko T. Astola;
David Akopian;
Karen O. Egiazarian
In this paper we use bit-serial 'local sorting' to perform threshold Boolean filtering, based on running processing without ordering the input data. The proposed architecture is simple and suitable for realization. It is shown that the introduced homogeneous generalized threshold Boolean filters can be represented as threshold Boolean filters on an appended input signal window, and can be computed with the same architectures. Homogeneous generalized threshold Boolean filters are also represented as linear combinations of homogeneous generalized stack filters.
Very large scale integration (VLSI) implementation of block-based predictive Rice codec
Author(s):
Chien-Min Huang;
Alan W. Shaw;
Richard W. Harris
This paper presents a VLSI implementation of the lossless block-based predictive Rice codec (BPRC). The BPRC uses an adaptive predictive coding algorithm to remove the redundancy in the image and codes the residue using an entropy coder. The algorithm adapts well to local image statistics. The codec chip will encode 4- to 16-bit pixels at a 10 Mpixels/sec input rate, and decode at a 10 Mpixels/sec output rate. For images of normal size it requires little support circuitry, only input data formatting and output data deformatting. Large images can be supported with external FIFOs.
Nonuniform image sampling and interpolation over deformed meshes and its hierarchical extension
Author(s):
Ouseb Lee;
Yao Wang
To improve the reconstructed image quality for a given number of sampling points, nonuniform sampling is desired, which adapts the sampling density according to the local bandwidth of the signal. Optimal sampling positions are determined, and interpolation from nonuniform samples is performed, through the use of a coordinate mapping which converts nonuniform samples into points on a regular sampling lattice. We then introduce a nonuniform sampling scheme which embeds the samples in a generally deformed mesh structure that can easily be mapped to a regular sampling lattice. The optimal samples, or the mesh, are generated by minimizing the interpolation error. The numerical difficulty associated with dealing with nonuniform samples is circumvented by mapping all the operations to a master domain where the samples are uniformly distributed. With this scheme, in order to maintain the mesh topology, unnecessary nodes are usually allocated in large but smooth regions. For improved sampling efficiency, a hierarchical nonuniform sampling scheme, which embeds the samples in a generalized quadtree structure, is also developed. Compared to its nonhierarchical counterpart, this scheme can reduce the number of samples significantly under the same visual quality constraint.
Parallel image processing on a PC network
Author(s):
Yu-Fai Fung
Image processing operations are computationally intensive and are usually solved by parallel algorithms implemented either on dedicated parallel computing devices or on a network of workstations. With the increasing computing power of the personal computer (PC) and better networking facilities, it is now feasible to perform distributed computing on a PC network. As PCs offer a favorable cost/performance ratio, this provides a cost-effective solution for solving image processing problems. In this paper, we describe how to implement distributed image processing algorithms on a peer-to-peer PC network under the Windows NT operating system; the efficiency of such algorithms is also discussed.
Multispectral imagery, hyperspectral radiometry, and unmanned underwater vehicles: tools for the assessment of natural resources in coastal waters
Author(s):
David K. Costello;
Kendall L. Carder;
Robert F. Chen;
Thomas G. Peacock;
N. Sandy Nettles
In many coastal oceans of the world, the flora and fauna are under stress. In some areas, seagrasses, coral reefs, fish stocks, and marine mammals are disappearing at a rate great enough to capture the attention of, and in some cases, provoke action by local, national, and international governing bodies. The governmental concern and consequent action is most generally rooted in the economic consequences of the collapse of coastal ecosystems. In the United States, for example, some experts believe that the rapid decline of coral reef communities within coastal waters is irreversible. If correct, the economic impact on the local fisheries and tourism industries would be significant. Most scientists and government policy makers agree that remedial action is in order. The ability to make effective management decisions is hampered, however, by the convolution of the potential causes of the decline and by the lack of historical or even contemporary data quantifying the standing stock of the natural resource of concern. Without resource assessment, neither policy decisions intended to respond to ecological crises nor those intended to provide long-term management of coastal resources can be prudently made. This contribution presents a methodology designed to assess the standing stock of immobile coastal resources (e.g., seagrasses and corals) at high spatial resolution, utilizing a suite of optical instrumentation operating from unmanned underwater vehicles (UUVs) which exploits the multi-spectral albedo and fluorescence signatures of the flora and fauna.
Perceptual texture segmentation and characterization
Author(s):
Rafael Santos;
Takeshi Ohashi;
Takaichi Yoshida;
Toshiaki Ejima
A very common task in image processing is the segmentation of the image into areas that are uniform in the sense of their features. Various applications can benefit even from partial segmentation, which is performed without the need for physical or semantic knowledge. Several segmentation methods exist, but none is applicable to all tasks. We use color and perceptual texture information to segment color images. Perceptual texture features are features that can be qualified in simple descriptions by humans. Color information is represented in a perceptual way, using hue, value and saturation. These feature values are represented by histograms that integrate texture information over a small area. Segmentation and classification are obtained by comparing the histograms of classes with the histograms of the area around the pixel being classified. We built a small application that uses remote sensing imagery and allows a user to interactively segment a Landsat TM-5 image using color and texture information. The steps and intermediate results of the classification are shown. The results are visually good, and the segmentation using color and texture information is more coherent than that using color alone.
Matching algorithm for radical-based online Chinese character recognition
Author(s):
Quen-Zong Wu;
I-Chang Jou;
Bor-Shenn Jeng;
Chao-Hao Lee;
Miin-Luen Day;
Nai-Jen Cheng
In this paper, we propose a matching algorithm for radical-based on-line Chinese character recognition. The major effort of this paper is to demonstrate recognition procedures for subcharacters, such as radicals and residual subcharacters, and for nonradical characters. Since a Chinese character may have a front radical, a rear radical, or neither of them, the matching algorithm should be able to take care of all these conditions. Furthermore, instead of picking up the front/rear radical strokes from the input character before the matching process takes place, our matching algorithm determines how many strokes the front/rear radical should have during the matching process; it thus enjoys the property of flexibility. After the radical type and the number of strokes of the radical are figured out, the residual subcharacter can be picked up and submitted for matching again. By sequentially recognizing the types of front/rear radicals and the types of residual subcharacters, we can determine what the input characters are.
Minimum mean square error linear predictor with rounding
Author(s):
Fu Yu Tsai;
Huei Peng
Many digital signal processing and image coding systems implement the linear predictor with rounding. Usually, the linear predictors are obtained by solving the Yule-Walker equations or doing something equivalent. The predictors obtained in this way will not necessarily be the true minimum mean square error predictors once the effect of rounding is considered. In this paper, we address the issue of finding the true minimum mean square error rounded linear predictor. Experimental results show that when the prediction results are rounded, this true MMSE linear predictor can significantly outperform the conventional one, which is designed without considering the effect of rounding, for data with low prediction errors.
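As background for the comparison the abstract describes, the sketch below builds the conventional predictor by solving the Yule-Walker equations and then measures its mean square error when each prediction is rounded to an integer. Only this conventional baseline is shown; the paper's rounding-aware optimization is not reproduced, and the AR test signal is an illustrative assumption.

```python
import numpy as np

def yule_walker(x, order):
    """Conventional predictor: solve the Yule-Walker equations from sample autocorrelations."""
    x = x - x.mean()
    r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])  # Toeplitz matrix
    return np.linalg.solve(R, r[1:])

def rounded_prediction_mse(x, coeffs):
    """MSE of the predictor when each prediction is rounded to the nearest integer."""
    order = len(coeffs)
    preds = np.array([round(float(np.dot(coeffs, x[n - order:n][::-1])))
                      for n in range(order, len(x))])
    return float(np.mean((x[order:] - preds) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Integer-valued AR(2)-like test signal (stands in for an image scan line).
    x = np.zeros(5000)
    for n in range(2, len(x)):
        x[n] = np.round(1.5 * x[n - 1] - 0.6 * x[n - 2] + rng.normal(0, 1))
    a = yule_walker(x, order=2)
    print("coefficients:", a, " rounded-prediction MSE:", rounded_prediction_mse(x, a))
```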
New algorithms for processing images in the transform-compressed domain
Author(s):
Shih-Fu Chang
Future multimedia applications involving images and video will require technologies enabling users to manipulate image and video data as flexibly as traditional text and numerical data. However, vast amounts of image and video data mandate the use of image compression, which makes direct manipulation and editing of image data difficult. To explore the maximum synergistic relationships between image manipulation and compression, we extend our prior study of transform-domain image manipulation techniques to more complicated image operations such as rotation, shearing, and line-wise special effects. We propose to extract the individual image rows (columns) first and then apply the previously proposed transform-domain filtering and scaling techniques. The transform-domain rotation and line-wise operations can be accomplished by calculating the summation of products of nonzero transform coefficients and some precalculated special matrices. The overall computational complexity depends on the compression rate of the input images. For highly-compressed images, the transform-domain technique provides great potential for improving the computation speed.
Three-dimensional modeling using surface regions from industrial sketches
Author(s):
Takanori Aoi;
Hiroshi Nagahashi;
Takeshi Agui
In computer vision, estimating the 3D shapes of objects from a 2D view is an important subject. One means of solving this problem is to use knowledge about the objects for understanding a 2D view. This article presents a method to acquire shape data of objects from industrial sketches using knowledge about the objects and to construct 3D models. Knowledge including uncertainty is used in the image processing. The processes are performed using the concept of likelihoods, since reasoning with ambiguous knowledge is also ambiguous. Three-dimensional data of objects are acquired from industrial sketches, and then object models are constructed using bicubic Bezier surfaces.
Transmission capability of asymmetric digital subscriber lines in Taiwan
Author(s):
Shyue-Win Wei;
Shyue-Tzong Leu;
Che-Ho Wei
The transmission capacity of the discrete multitone (DMT) modulation system over Taiwan's subscriber loops is evaluated in this study. Based on the characteristics of Taiwan's local loops, the transmission capacity is estimated for rates of 1.544 Mb/s and 6 Mb/s in Taiwan. Simulation results also show what percentage of users in Taiwan may receive 1.544 Mb/s or 6 Mb/s asymmetric digital subscriber line (ADSL) services. Self far-end crosstalk (FEXT) and additive white Gaussian noise (AWGN) are considered to be the dominant noise sources in this work.
Coding of partially computer-rendered image sequences
Author(s):
Thomas B. Riegel
A new coding scheme for partially computer-rendered image sequences will be presented. It is specifically suited for heterogeneous data sets containing symbolic and pixel-based image descriptions, which are used by an electronic set system at the receiver site for the synthesis and mixture of transmitted image sequences. The different types of data sets and their particular properties regarding data compression are explained. Finally, results are given comparing the new coding scheme with traditional MPEG2 coding based on typical test sequences.
Design of a high-performance pyramid architecture
Author(s):
M. Fikret Ercan;
Yu-Fai Fung
In this paper, we introduce a pyramid architecture that we are currently constructing for computer vision applications. Among the architectural features of the system are its linear array interconnections, its reconfigurable architecture and its design without a top-down control mechanism. The system is targeted at real-time image processing applications, so high-performance processors are used in its construction. It has three processing layers for three different stages of image processing. Each layer has direct access to the image source. The architectural properties of the system, its control mechanism and its expected performance are outlined in the following sections.
New VLSI architecture for full-search vector quantization
Author(s):
Chin-Liang Wang;
Ker-Min Chen
This paper presents a new systolic architecture to realize the encoder of full-search vector quantization (VQ) for high-speed applications. The architecture possesses the features of regularity and modularity, and is thus very suitable for VLSI implementation. For a codebook of size N and dimension k, the VQ encoder has an area complexity of O(N), a time complexity of O(k), and an I/O bandwidth of O(k). It reaches a compromise between hardware cost and speed requirements compared to existing systolic/regular VQ encoders. At the current state of VLSI technology, the proposed system can easily be realized in a single chip for most practical applications. In addition, it provides flexibility in changing the codebook contents and extending the codebook size, where the latter is achieved simply by cascading identical basic chips. With 0.8-micrometer CMOS technology used to implement the proposed VQ encoder for N = 256 and k = 16, the required die size is about 5 X 8.5 mm2 and the processing speed is up to 100 M samples per second. These features show that the proposed architecture is attractive for use in high-speed image/video applications.
Variable-rate predictive residual vector quantization
Author(s):
Syed A. Rizvi;
Nasser M. Nasrabadi;
Lin-Cheng Wang
A major problem with a VQ-based image compression scheme is its codebook search complexity. Recently, a Predictive Residual Vector Quantizer (PRVQ) was proposed in Ref. 8. This scheme has a very low search complexity, and its performance is very close to that of the Predictive Vector Quantizer (PVQ). This paper presents a new VQ scheme called the Variable-Rate PRVQ (VR-PRVQ), which is designed by imposing a constraint on the output entropy of the PRVQ. The proposed VR-PRVQ is found to give excellent rate-distortion performance and clearly outperforms the state-of-the-art image compression algorithm developed by the Joint Photographic Experts Group (JPEG).
Finite-state residual vector quantization
Author(s):
Syed A. Rizvi;
Lin-Cheng Wang;
Nasser M. Nasrabadi
This paper presents a new FSVQ scheme called Finite-State Residual Vector Quantization (FSRVQ) in which each state uses a Residual Vector Quantizer (RVQ) to encode the input vector. Furthermore, a novel tree-structured competitive neural network is proposed to jointly design the next-state and the state-RVQ codebooks for the proposed FSRVQ. Joint optimization of the next-state function and the state-RVQ codebooks eliminates a large number of redundant states in the conventional FSVQ design; consequently, the memory requirements are substantially reduced in the proposed FSRVQ scheme. The proposed FSRVQ can be designed for high bit rates due to its very low memory requirements and low search complexity of the state-RVQs. Simulation results show that the proposed FSRVQ scheme outperforms the conventional FSVQ schemes both in terms of memory requirements and perceptual quality of the reconstructed image. The proposed FSRVQ scheme also outperforms JPEG (current standard for still image compression) at low bit rates.
Multiresolution interpolative DPCM for data compression
Author(s):
WenJen Ho;
WenThong Chang
A family of multirate representations of a given signal is defined for data compression. This family of multirate signals is constructed by polynomial interpolation of directly decimated versions of the given signal. The signal interpolated from a decimated signal is used to predict the higher-resolution signal. The prediction error is the difference between the signal interpolated from the lower resolution and the higher-resolution signal. This kind of signal representation can be called interpolation-compensated signal prediction. A multiresolution interpolative DPCM is then proposed to represent the prediction errors with a hierarchical multirate structure. This structure possesses the advantages of both the pyramid structure and the DPCM structure.
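The prediction structure can be illustrated with a short pyramid sketch: the signal is decimated, the decimated version is interpolated back to the finer grid, and the residual between the interpolation and the finer level is what would be coded at each resolution. Linear interpolation and a 1D signal are illustrative simplifications; the quantization/DPCM loop itself is omitted.

```python
import numpy as np

def interpolative_pyramid(x, levels=3):
    """Decompose x into a coarse base plus per-level interpolation-prediction residuals."""
    residuals = []
    cur = x.astype(float)
    for _ in range(levels):
        coarse = cur[::2]                                   # direct decimation by 2
        # interpolate the coarse signal back to the finer grid (linear interpolation assumed)
        pred = np.interp(np.arange(len(cur)), np.arange(0, len(cur), 2), coarse)
        residuals.append(cur - pred)                        # prediction error at this resolution
        cur = coarse
    return cur, residuals[::-1]                             # base signal, residuals coarse->fine

def reconstruct(base, residuals):
    cur = base.astype(float)
    for res in residuals:
        pred = np.interp(np.arange(len(res)), np.arange(0, len(res), 2), cur)
        cur = pred + res
    return cur

if __name__ == "__main__":
    x = np.sin(np.linspace(0, 6 * np.pi, 257)) * 100
    base, residuals = interpolative_pyramid(x)
    print("max reconstruction error:", np.max(np.abs(reconstruct(base, residuals) - x)))
    print("residual energies (coarse->fine):", [float(np.sum(r ** 2)) for r in residuals])
```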
Fractal block coding using a simplified finite-state algorithm
Author(s):
Hsuan-Ting Chang;
Chung Jung Kuo
The exhaustive search process leads to a heavy computational burden and therefore increases the complexity of a fractal image coding system. This is the main drawback of employing fractals in practical image compression applications. In this paper, an image compression scheme based on fractal block coding and a simplified finite-state algorithm is proposed. For the finite-state algorithm, which has been successfully employed in the vector quantization (VQ) technique, the state codebook (equivalent to the domain pool in fractal image coding) is determined by a specific next-state function. In this research, we use the position of the range block to decide its domain pool. Therefore, a confined domain pool is limited to the neighboring region of the range block, and the search process is thus simplified and faster. In the computer simulations, we consider two partition types: the single-level (8 X 8 blocks) and two-level (8 X 8 and 4 X 4 blocks) conditions. The simulation results show that the proposed scheme greatly reduces the computational complexity and improves the system performance.
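A hedged sketch of the confined-search idea: for each range block, only domain blocks within a small window around the range block's position are tested, each domain block being subsampled to range size and fitted with a contrast/brightness pair in the least-squares sense. The window size, step, and the absence of isometries are simplifications; this is not the paper's exact algorithm.

```python
import numpy as np

def best_domain_in_neighborhood(img, ry, rx, rsize=8, window=16, step=4):
    """Search a confined domain pool around range block (ry, rx) for the best affine match."""
    r = img[ry:ry + rsize, rx:rx + rsize].astype(float).ravel()
    best = None
    y0, y1 = max(0, ry - window), min(img.shape[0] - 2 * rsize, ry + window)
    x0, x1 = max(0, rx - window), min(img.shape[1] - 2 * rsize, rx + window)
    for dy in range(y0, y1 + 1, step):
        for dx in range(x0, x1 + 1, step):
            dom = img[dy:dy + 2 * rsize, dx:dx + 2 * rsize].astype(float)
            dom = dom.reshape(rsize, 2, rsize, 2).mean(axis=(1, 3)).ravel()  # 2x2 average to range size
            # least-squares contrast s and brightness o for r ~ s * dom + o
            A = np.vstack([dom, np.ones_like(dom)]).T
            (s, o), res, _, _ = np.linalg.lstsq(A, r, rcond=None)
            err = float(res[0]) if res.size else float(np.sum((A @ np.array([s, o]) - r) ** 2))
            if best is None or err < best[0]:
                best = (err, dy, dx, float(s), float(o))
    return best   # (error, domain y, domain x, contrast, brightness)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, (64, 64)).astype(float)
    print(best_domain_in_neighborhood(img, ry=24, rx=24))
```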
Image data compression with selective preservation of wavelet coefficients
Author(s):
Eiji Atsumi;
Fuminobu Ogawa;
Naoto Tanabe;
Fumitaka Ono
The wavelet transform has recently been attracting notable attention for its applicability to a variety of signal processing and image coding tasks [1-3], since wavelets are expected to provide a unified interpretation of transform coding, hierarchical coding, and subband coding, all of which have previously been studied separately. The wavelet transform is also expected to be more advantageous than other image coding schemes because the wavelet coefficients represent the features of an image localized in both the spatial and frequency domains [4,5]. In transform coding or subband coding, the efficiency is generally maximized by designing the bit allocation for each decomposed band signal in proportion to the relative importance of the information in it. This technique is known as the optimum bit allocation algorithm (OBA). However, OBA should not be applied directly to wavelet coding, because OBA does not exploit well the spatial local information represented by each wavelet coefficient. The purpose of this work is to develop a quantization scheme which maintains the significant spatial information locally represented by the wavelet coefficients. Preserving only the selected coefficients which represent visually significant features and discarding the others is expected to keep image quality high, since the significant features will be retained even at a low bit rate. In this respect, we propose two image data compression techniques employing selective preservation of wavelet coefficients. Section 2 gives a brief description of the wavelet transform, which includes the construction of wavelet basis functions, feature extraction with wavelets, and the multi-resolution property of wavelets. In Section 3, the first technique is proposed, where resolution-dependent thresholding is introduced to classify wavelet coefficients into significant or insignificant ones. In Section 4, the second technique is proposed, where better performance is achieved by further classifying significant coefficients with the multiresolution property of wavelets. Finally, summary and conclusions are provided in Section 5.
Minimization of the mosaic effect in hierarchical multirate vector quantization for image coding
Author(s):
Xavier Jove-Boix;
M. E. Santamaria;
F. Tarres;
J. Trabado
In this paper, we propose an image coding method which basically consists of the application of vector quantization and appropriate post-processing in order to minimize the mosaic effect. We have to store a monographic image database. We use Hierarchical Multirate Vector Quantization (HMVQ) to code the database at a low bit rate with a tolerable SNR. HMVQ is a suitable algorithm for separating blocks of different contrast in each image. This division allows selective filtering at reconstruction to minimize the mosaic effect. Each block is filtered according to its characteristics. The gap between contiguous blocks is minimized in low-contrast subimages, while edge blocks retain the image details.
Psychovisual image coding via an exact discrete Radon transform
Author(s):
Jeanpierre V. Guedon;
Dominique Barba;
Nicole Burger
The goal of this paper is to describe a new fully reversible image transform specifically designed for efficient (pseudo-critical) coding while preserving a psychovisual Fourier-domain description. There is now strong evidence for the presence of directional and angular sensitivity in the cells of the human visual cortex, and the representation proposed here has as its main objective to respect this human-like filter bank. The decomposition is performed using a discrete Radon transform for the angular patches and by splitting each projection with a 1D spline wavelet for the radial part. Consequently, the whole algorithm is performed in the spatial domain. Finally, we show that the transform is well suited for both psychovisual quantization and channel-adapted coding.
Compression behavior of the JPEG baseline image coder
Author(s):
Leu-Shing Lan
Show Abstract
In this paper, we investigate the compression behavior of each processing step of the JPEG baseline image coder. The two main objectives of this research are to provide a better understanding of the JPEG system and to provide a guideline for improving the performance of any JPEG-like image coder. For performance evaluation, we have chosen the estimated entropy as the performance measure. The key results of this paper are: (1) the psychovisually weighted quantization generally plays the dominant role in overall system performance; (2) the compression gain provided by the entropy coding procedure is also significant, and since there is a gap between the estimated entropies and the actual coding rates, a more efficient entropy coding procedure which reflects the signal statistics should improve the system performance; (3) the common concept of the optimal transform is variance-based, which requires a zonal selection of the transform coefficients; since JPEG adopts thresholding quantization, the ordinary discussion of an optimal transform is not appropriate, and a truly optimal transform should take the transform and its subsequent operations into account. In consequence, to improve the overall system performance it would be effective to focus on the quantization and entropy coding procedures.
Fast methods for fractal image encoding
Author(s):
Gregory Caso;
Pere Obrador;
C.-C. Jay Kuo
Show Abstract
Fractal image compression is a relatively new and very promising technique for still image compression. However, it is not widely applied due to its very time consuming encoding procedure. In this research, we focus on speeding up this procedure by introducing three schemes: dimensionality reduction, energy-based classification, and tree search. We have developed an algorithm that combines these three schemes together and achieves a speed-up factor of 175 at the expense of only 0.6 dB degradation in PSNR relative to the unmodified exhaustive search for a typical image encoded with 0.44 bpp.
Subjective rating of picture coding algorithms
Author(s):
Wen Xu;
Gert Hauske;
Pavel Filip;
Michael J. Ruf
Show Abstract
This paper reports on a comprehensive subjective evaluation of different waveform coding algorithms for monochrome still pictures. In order to obtain reliable and relevant results about the coding efficiency in the sense of a rate-distortion criterion, i.e. mean opinion scores (MOS) of observers vs. bit rate, the coding algorithms are optimized as far as possible by maintaining the subjective quality of coded pictures. Test pictures with various bit rates are generated from several source pictures. The psychophysical picture quality experiments are carried out for the whole set of test pictures. Based on the experimental results, different coding algorithms are quantitatively compared with each other. The coding methods investigated include the following stand-alone methods: DPCM, vector quantization (VQ), discrete cosine transform (DCT) coding, recursive block coding (RBC), subband coding (SBC), wavelet transform coding (WTC), Laplacian pyramid coding, Cortex transform coding, and combined methods: ISO standard JPEG, wavelet transform with run-length coding and variable length coding, DCT with pyramid vector quantization (PVQ) and subband transform with PVQ.
Adaptive transform coding of images based on removing just noticeable distortion
Author(s):
Chun-Hsien Chou
Show Abstract
The removal of perceptual redundancy from image signals has been considered a promising approach to maintaining high image quality at low bit rates, and has recently become an important area of research. In this paper, a perceptually tuned discrete cosine transform (DCT) coder for gray-scale images is presented, where a just-noticeable distortion (JND) profile is measured as the perceptual redundancy inherent in an image. The JND profile provides each signal being coded with a visibility threshold of distortion, below which reconstruction errors are rendered imperceptible. By exploiting basic characteristics of human visual perception, the JND profile is derived from analyzing local properties of the image signal. According to the sensitivity of human visual perception to spatial frequency, a distortion allocation algorithm is applied to each block to screen out perceptually unimportant coefficients (PUCs) and, simultaneously, to determine the quantizer stepsizes of perceptually important coefficients (PICs). Simulation results show that high visual quality can be obtained at low bit rates and that, for a given bit rate, the visual quality of the images compressed by the proposed coder is more acceptable than that of the images compressed by the ISO-JPEG coder.
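The screening step described above can be sketched as follows; the block size, the JND profile, and the stepsize rule (proportional to twice the JND threshold) are illustrative assumptions rather than the coder's actual parameters.

    import numpy as np
    from scipy.fft import dctn, idctn

    def jnd_screen_block(block, jnd):
        """block: 8x8 pixel block; jnd: 8x8 array of positive visibility
        thresholds in the DCT domain (an assumed per-coefficient profile)."""
        coefs = dctn(block.astype(float), norm='ortho')
        pic = np.abs(coefs) > jnd            # perceptually important coefficients
        step = 2.0 * jnd                     # assumed stepsize rule tied to the JND
        q = np.round(coefs / step) * step    # uniform quantization of the PICs
        q[~pic] = 0.0                        # PUCs are screened out entirely
        return idctn(q, norm='ortho')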
Fast search algorithm for vector quantization using means and variances of code words
Author(s):
Chang-Hsing Lee;
Ling-Hwei Chen
Show Abstract
Vector quantization has been applied to low-bit-rate speech and image compression. One of the most serious problems for vector quantization is the high computational complexity of searching for the closest codeword in the codebook design and encoding processes. To overcome this problem, a fast algorithm, under the assumption that the distortion is measured by the squared Euclidean distance, is proposed to search for the closest codeword to an input vector. Using the means and variances of the codewords, the algorithm can reject many codewords that cannot be the closest codeword to the input vector and hence saves a great deal of computation time. Experimental results confirm the effectiveness of the proposed method.
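A minimal sketch of the mean-based rejection test implied above: since for k-dimensional vectors ||x - c||^2 >= k (mean(x) - mean(c))^2, any codeword whose mean is far enough from the input mean can be rejected without computing the full distance. The variance-based bound used in the paper is omitted here, and the codebook layout is an assumption.

    import numpy as np

    def fast_nearest_codeword(x, codebook):
        """x: input vector of length k; codebook: (N, k) array of codewords."""
        k = x.size
        means = codebook.mean(axis=1)
        mx = x.mean()
        best, best_d = 0, np.sum((x - codebook[0]) ** 2)
        for i in range(1, len(codebook)):
            # Rejection test: ||x - c||^2 >= k * (mean(x) - mean(c))^2, so if the
            # right-hand side already exceeds the best distance found so far,
            # the full distance need not be evaluated.
            if k * (mx - means[i]) ** 2 >= best_d:
                continue
            d = np.sum((x - codebook[i]) ** 2)
            if d < best_d:
                best, best_d = i, d
        return best

Sorting the codewords by mean and searching outward from mean(x) would typically allow even earlier termination; only the plain rejection test is shown.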
Lossy coding scheme of binary character patterns
Author(s):
Tadahiko Kimoto;
Masayuki Tanimoto
Show Abstract
To transmit facsimile images through a very low bit-rate channel such as the half-rate mobile channel, a very efficient coding scheme for data compression is required. Lossy coding is expected to achieve more data reduction than conventional lossless coding schemes. This paper discusses approximate representation of scanned character patterns for data reduction. First, the quality of character patterns is considered in terms of the size of the patterns. Based on this consideration, the attributes of scanned character patterns and the quality associated with them are assumed. To preserve quality under approximation, a character pattern is described by a set of strokes in a tree data structure.
Hybrid adaptive vector quantizer for image compression via the gold-washing mechanism
Author(s):
Wen-Shiung Chen;
Zhen Zhang;
En-Hui Yang
Show Abstract
A new image compression algorithm based on adaptive vector quantization is presented. A novel, efficient on-line codebook refining mechanism, called the 'Gold-Washing' (GW) mechanism, including the GW algorithm which works on a dynamic codebook called the GW codebook, is presented and implemented. This mechanism is universal, so that it is suitable for any type of input data source, and adaptive, so that no transmission of source statistics is needed. The asymptotic optimality of the GW mechanism has been proven not only for memoryless (i.i.d.) sources but also for stationary, ergodic sources. The efficiency and time complexity of the GW mechanism are analyzed. Based on this mechanism, an efficient hybrid adaptive vector quantizer is designed for image coding applications; it incorporates other coding techniques such as a basic VQ with a large auxiliary codebook, called the universal-mother (UM) codebook, as a new codeword generator, quadtree-based hierarchical decomposition, and classification. The experimental results show that the performance of our image compression algorithm is competitive with, and even better than, that of JPEG and other coding algorithms, especially in low bit rate applications. Coded results with bit rates of 0.120-0.150 bits per pixel and acceptable image quality can be achieved.
New ADPCM image coder using frequency weighted directional filters
Author(s):
Chen-Chang Lien;
Chang-Lin Huang;
I-Chang Chang
Show Abstract
This paper proposes a new ADPCM method for image coding, called directional ADPCM, which can remove more redundancy from the image signals than the conventional ADPCM. The conventional ADPCM calculates the two-dimensional prediction coefficients from the correlation functions by solving the Yule-Walker equations. In practice, the correlation functions must be approximated from the samples within each block. However, the block size is limited by the error accumulation effect during packet transmission, and using small blocks may yield unreliable prediction coefficients. Therefore, we develop the directional ADPCM system to overcome this problem and obtain a better prediction result. Our directional ADPCM uses fan-shaped filters to obtain the energy distribution in four directions and then determines the four directional prediction coefficients. All the fan-shaped filters are designed using the singular value decomposition (SVD) method, the two-dimensional Hilbert transform technique, and the frequency weighting concept. In the experiments, we show that the M.S.E. of the directional ADPCM is less than that of the conventional ADPCM.
Two-pass side-match finite-state vector quantization
Author(s):
Ruey-Feng Chang;
Wen-Jia Kuo
Show Abstract
Among image coding techniques, vector quantization (VQ) has been considered an effective method for coding images at low bit rates. The side-match finite-state vector quantizer (SMVQ) exploits the correlations between neighboring blocks (vectors) to avoid large gray-level transitions across block boundaries. In this paper, an improved SMVQ technique named two-pass side-match finite-state vector quantization (TPSMVQ) is proposed. In TPSMVQ, the size of the state codebook in the first pass is decided by the variances of the neighboring blocks. In the second pass, we re-encode the blocks from the first pass whose variances are greater than a threshold. Moreover, not only the left and upper blocks but also the lower and right blocks are used for constructing the state codebook. In our experimental results, the improvement of the second pass is up to 1.5 dB in PSNR over the first pass. In comparison with ordinary SMVQ, the improvement is up to 1.54 dB at nearly the same bit rate.
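A minimal sketch of the side-match criterion that underlies the state codebook construction: candidate codewords are ranked by how well their border pixels continue the already-decoded neighbors, and the best-ranked ones form the state codebook. Using all four neighbors (left, upper, right, lower), as in the second pass, is shown; the block size and the absence of any weighting are assumptions for this sketch.

    import numpy as np

    def side_match_distortion(cand, left=None, up=None, right=None, down=None):
        """cand: (B, B) candidate codeword; neighbors are decoded (B, B) blocks or None."""
        d = 0.0
        if left is not None:
            d += np.sum((cand[:, 0] - left[:, -1]) ** 2)   # left border continuity
        if up is not None:
            d += np.sum((cand[0, :] - up[-1, :]) ** 2)     # upper border continuity
        if right is not None:
            d += np.sum((cand[:, -1] - right[:, 0]) ** 2)  # right border (second pass)
        if down is not None:
            d += np.sum((cand[-1, :] - down[0, :]) ** 2)   # lower border (second pass)
        return d

    def build_state_codebook(master, size, **neighbors):
        """Keep the `size` master codewords with the smallest side-match distortion."""
        scores = [side_match_distortion(c, **neighbors) for c in master]
        order = np.argsort(scores)[:size]
        return master[order]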
Improved minimum distortion encoding algorithm for image vector quantization
Author(s):
Kwok-Tung Lo;
Jian Feng
Show Abstract
An improved minimum distortion encoding method called the predictive mean search (PMS) algorithm is proposed for image vector quantization in this paper. With the proposed method, the minimum distortion codeword can be found by searching only a portion of the codebook, and its address relative to the origin of the search is sent instead of its absolute address. Two schemes are proposed to transmit the relative address of the minimum distortion codeword. Simulation results show that a significant reduction in both computation and bit rate is achieved by using the proposed methods.
Reversible subband coding of images
Author(s):
Kunitoshi Komatsu;
Kaoru Sezaki
Show Abstract
In this paper, reversible subband coding of images is proposed. The high-band signals of conventional one-dimensional reversible filter banks are extrapolative or interpolative prediction error signals whose number of levels is four times that of the input signals. We therefore design reversible filter banks that generate interpolative prediction error signals whose number of levels is only twice that of the input signals. We also design nonseparable filter banks. It is shown that the proposed methods perform better than conventional reversible subband coding and that the separable method with a low-pass filter produces little aliasing in the reduced images.
Versatile visual pattern image coding system
Author(s):
K. W. Chan;
Kwok-Leung Chan
Show Abstract
A full-fledged Visual Pattern Image Coding system is developed. For a high compression ratio, Uniform Patterns are merged by quadtree merging. For high visual quality, a new set of 2 X 2 Edge Patterns is designed for near-perfect reconstruction. Through a classification scheme of 11 groups or a threshold on gradient magnitude, the performance profile can be adapted to a wide variety of applications.
Two-dimensional split and merge algorithm for image coding
Author(s):
Wai-Fong Lee;
Calvin C.K. Chan
Show Abstract
In this paper, a new 2D split-and-merge algorithm (2DSM) for image coding is devised. An image is modelled as a 2.5-dimensional surface and approximated by a surface formed by triangular patches. The algorithm iteratively improves the approximated image by splitting and merging the triangles in order to drive the error under a specified bound. In addition, a new optimal triangulation for image data approximation is proposed. The algorithm is successfully applied to coding monochrome images using the Interpolative Vector Quantization (IVQ) technique. Simulation results show that the proposed method can achieve a 2.8 dB improvement on the approximated image and a 0.68 dB improvement on the decoded image at a bit rate lower than that of current schemes. Besides, excellent visual quality of the reconstruction is observed.
Target region extraction and image coding based on motion information
Author(s):
Jong-Bae Lee;
Seong-Dae Kim
Show Abstract
This paper describes a method of coding image sequences based on global/local motion information. The suggested method initially estimates global motion parameters and local motion vectors. Then segmentation is performed with a hierarchical clustering scheme and a quadtree algorithm in order to divide the processed image into background and target regions. Finally, image coding is done by assigning more bits to the target region and fewer bits to the background, so that the target region may be reconstructed with high quality. Simulations show that the suggested algorithm performs well, especially in circumstances where the background changes and the target region is small compared with the background.
Intra- and interframe image coding via vector quantization
Author(s):
Hazem Ahmad Munawer;
Otar G. Zumburidze
Show Abstract
In this paper, we present two algorithms for intra- and inter-frame image coding via vector quantization (VQ). (1) Intraframe image coding using wavelet VQ (WVQ): in this case, we carried out several experiments based on a combination of wavelet pyramid coding and VQ. (2) Interframe image coding: here two modifications are proposed. In the first, we identify the moving vectors (blocks) in each frame using a block matching technique. For each moving vector, a prediction is estimated by searching for the direction of minimum distortion in the previously reconstructed frame. The difference vectors are then quantized via a simple modified VQ (thresholded VQ) to achieve high compression. The second modification uses a moving-mask technique to determine the moving object over several successive frames of an image sequence. Two codebooks are then developed: one for the background (non-moving) vectors and the other for the moving-object vectors.
Very large scale integration (VLSI) architectures for video signal processing
Author(s):
Peter Pirsch;
Winfried Gehrke
Show Abstract
The paper presents an overview of architectures for VLSI implementations of video compression schemes as specified by the standardization committees of the ITU and ISO, focusing on programmable architectures. Programmable video signal processors are classified and specified as homogeneous and heterogeneous processor architectures. Architectures are presented for design examples reported in the literature. Heterogeneous processors outperform homogeneous processors because of their adaptation to the requirements of special subtasks by dedicated modules. The majority of heterogeneous processors incorporate dedicated modules for high-performance subtasks of high regularity, such as DCT and block matching. By normalization to a fictive 1.0 micron CMOS process, typical linear relationships between silicon area and throughput rate have been determined for the different architectural styles. This relationship indicates a figure of merit for silicon efficiency.
Macro motion vector quantization
Author(s):
Yoon Yung Lee;
John W. Woods
Show Abstract
A new algorithm is developed for reducing the bit rate required for motion vectors. The algorithm is a generalization of block matching motion estimation in which the search region is represented as a codebook of motion vectors. The new algorithm, called macro motion vector quantization (MMVQ), generalizes our earlier MVQ by coding a group of motion vectors. The codebook is a set of macro motion vectors which represent the block locations of the small neighboring blocks in the previous frame. We develop an iterative design algorithm for the codebook. Our experiments show that the variances of the displaced frame differences (DFDs) are reduced significantly compared to the block matching algorithm (BMA) with the same macroblock size.
Joint motion estimation and segmentation for very low bit rate video coding
Author(s):
Touradj Ebrahimi;
Homer H. Chen;
Barry G. Haskell
Show Abstract
Motion estimation is a key issue in video coding. In very low bitrate applications, the side information for the motion field represents an important portion of the total bitrate. This paper presents a joint motion estimation, segmentation, and coding technique which tries to reduce the segmentation and motion side information while providing a similar or smaller prediction error compared to more classical motion estimation techniques. The main application in mind is a region-based coding approach in which consecutive frames of the video are divided into regions having similar motion vectors and simple shapes that are easy to encode.
Reliability metric of motion vectors and its application to motion estimation
Author(s):
Toshiyuki Yoshida;
Atsushi Miyamoto;
Yoshinori Sakai
Show Abstract
This paper proposes a reliability metric for motion vectors, and applies it to the block matching method using hierarchical images to reduce estimation errors. First, the proposed reliability metric for motion vectors is derived and its properties are discussed. In order to evaluate the usefulness of the proposed reliability metric, a calculation example for an actual image is shown. Then, the derived reliability metric is applied to the block matching method using hierarchical images as a weighting function. Finally, several experimental results by the proposed estimation method are shown for a verification of the proposed method.
Fast motion vector estimation by using spatiotemporal correlation of motion field
Author(s):
Sungook Kim;
Junavit Chalidabhongse;
C.-C. Jay Kuo
Show Abstract
Motion vector (MV) estimation plays an important role in motion compensated video coding. In this research, we first examine a stochastic MV model which enables us to exploit the strong correlation of MVs in both spatial and temporal domains in a given image sequence. Then, a new fast stochastic block matching algorithm (SBMA) is proposed. The basic idea is to select a set of good MV candidates and choose from them the one which satisfies a certain spatio-temporal correlation rule. The proposed algorithm reduces matching operations to about 2% of that of the full block matching algorithm (FBMA) with only 2% increase of the sum of absolute difference (SAD) in motion compensated residuals. The excellent performance of the new algorithm is supported by extensive experimental results.
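The candidate-selection idea can be sketched as follows: instead of an exhaustive search, only the motion vectors of a few spatial neighbors in the current frame and the co-located block in the previous frame are evaluated, and the best of these by SAD is kept. The exact candidate set and the spatio-temporal correlation rule used in the paper are not reproduced here; this is an assumed, simplified variant.

    import numpy as np

    def sad(cur, ref, bx, by, mv, B):
        dy, dx = mv
        h, w = ref.shape
        y0, x0 = by + dy, bx + dx
        if y0 < 0 or x0 < 0 or y0 + B > h or x0 + B > w:
            return np.inf                      # candidate points outside the frame
        return np.abs(cur[by:by+B, bx:bx+B].astype(int)
                      - ref[y0:y0+B, x0:x0+B].astype(int)).sum()

    def predict_mv(cur, ref, bx, by, B, spatial_mvs, temporal_mv):
        # Candidate set: spatial neighbors already estimated in this frame,
        # the co-located vector from the previous frame, and the zero vector.
        candidates = list(spatial_mvs) + [temporal_mv, (0, 0)]
        costs = [sad(cur, ref, bx, by, mv, B) for mv in candidates]
        return candidates[int(np.argmin(costs))]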
Rate-distortion optimization between the hierarchical variable block size motion estimation and motion sequence coding
Author(s):
JongWon Kim;
Sang Uk Lee
Show Abstract
Recently, a variable block size (VBS) motion estimation technique has been employed to improve the performance of motion compensated transform coding (MCTC). This technique allows larger blocks to be used where smaller blocks provide little gain, saving bit rate, especially for areas containing more complex motion. However, the use of the VBS motion estimation technique raises a new optimization issue for motion compensated coding (MCC), since an increased bit rate must be allocated to the VBS motion vectors. That is, the rate allocation between motion vector encoding and displaced frame difference (DFD) coding is an important issue. Hence, in this paper, a rate-distortion (R-D) optimization between the hierarchical VBS motion estimation and DFD coding is described. First, to make the R-D search feasible, the hierarchical VBS motion structures are grouped into two-level model structures and an efficient R-D search method is proposed. Next, a solution for the control of the VBS motion information, based on the Lagrange multiplier method, is introduced. Intensive computer simulation employing the MCTC technique shows that an overall improvement of up to 1.0 dB, compared to fixed block size motion estimation, is obtained.
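A minimal sketch of the Lagrangian trade-off referred to above: for each block the coder picks, between one large block and its four sub-blocks, the option with the smaller cost J = D + lambda * R, where R includes both motion-vector and DFD bits. The cost accounting and the value of lambda are placeholders, not the paper's two-level model.

    def choose_block_mode(D_large, R_large, sub_costs, lam):
        """Compare Lagrangian costs J = D + lam * R for one large block versus
        its four sub-blocks.  sub_costs is a list of four (D, R) pairs; rates
        are assumed to include both the motion-vector bits and the DFD bits."""
        J_large = D_large + lam * R_large
        J_split = sum(D + lam * R for D, R in sub_costs)
        return 'large' if J_large <= J_split else 'split'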
Contour tracking and synthesis in image sequences
Author(s):
Takeshi Agui;
Tomofumi Ishihara;
Hiroshi Nagahashi;
Takanori Nagae
Show Abstract
A method for generating a sequence of synthetic images from two different image sequences is described. The method is composed of two major processes, i.e., object tracking and image synthesis. The object tracking process consists of two phases: the first phase determines an initial contour which roughly approximates the shape of the object, and the second phase extracts an accurate contour from the initial one using an active contour model. The initial contour of an object in the first frame is specified manually, and those in the following frames are determined by referring to the extracted contour in the previous frame. The contour extracted in the current frame is deformed by detecting nonrigid object movements, and the resultant shape is used as the initial contour of the active contour model in the next frame. The motion parameters representing camera motions are also estimated by calculating optical flows between two successive frames.
Reconstruction-model-based snake applied to optimal motion contour positioning
Author(s):
Henri Sanson
Show Abstract
This paper addresses the problem of optimally positioning a contour separating two moving regions using snake concepts. After a brief recall of classical snake methodology, an alternative approach is proposed, based on a reconstruction criterion for the regions delimited by the curve and on parametric modeling of both the region textures and boundaries. A generic adaptive-step gradient algorithm is formulated for solving the curve evolution problem, independently of the models used. The method is then applied more specifically to motion boundary localization, where the texture of mobile regions is reconstructed by motion compensation using polynomial motion models. The generic optimization algorithm is applied to motion frontiers defined by B-spline curves. The detailed implementation of the method in this particular case is described, and considerations about its behavior are given. Some experimental results are finally reported, attesting to the interest of the proposed approach.
Motion analysis on an image sequence using fractal dimension
Author(s):
Kwok-Leung Chan
Show Abstract
In this investigation, motion analysis is carried out on image sequence using region-based feature matching. A Two-Pass search algorithm is devised for motion estimation. Each image is partitioned into a number of blocks. The movement of each block is determined by looking for the corresponding block in the previous frame of the image sequence. In the first pass, fractal dimension is estimated in the neighborhoods of each block. The coarse position of the corresponding block in the previous frame is identified based on the similarity of that parameter. This position is then regarded as the center of the search space in the second pass, which employs grey level Exhaustive search to determine the fine position of the block. The algorithm has been tested on a sequence of 16 images. The performance of the Two-Pass search is found to be much better than the grey level Three-Step search and comparable to the grey level Exhaustive search.
Investigation of texture periodicity extraction
Author(s):
Valery V. Starovoitov;
Sang-Yong Jeong;
Rae-Hong Park
Show Abstract
The structure extraction task is analyzed. Co-occurrence matrices (CMs) are a popular basis for this goal. We show that binarization of an arbitrary texture preserves its structure; this transformation decreases the computation time of the analysis and the required memory by dozens of times. A number of features for detecting displacement vectors on binarized images are compared. We suggest using the CM elements jointly as a unified feature for this goal, and we show that it is a stable detector for noisy images and simpler than the well-known χ² and κ statistics.
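A small sketch of the kind of co-occurrence measurement involved: for a binary image and a displacement (dy, dx), count how often pixel pairs at that displacement agree or disagree; a periodic texture produces strong agreement at displacements matching its period. The normalization and the way the four CM elements would be combined into a single feature are assumptions.

    import numpy as np

    def binary_cooccurrence(img, dy, dx):
        """img: 2D binary (0/1) array; dy, dx: non-negative displacement.
        Returns the 2x2 co-occurrence matrix, normalized to sum to 1."""
        a = img[:img.shape[0] - dy, :img.shape[1] - dx]
        b = img[dy:, dx:]
        cm = np.zeros((2, 2))
        for i in (0, 1):
            for j in (0, 1):
                cm[i, j] = np.sum((a == i) & (b == j))
        return cm / cm.sum()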
Quasi-moments under the projected rotation group
Author(s):
Masaru Tanaka
Show Abstract
In this paper, we consider the projected rotation group, which consists of projections and rotations in 3D, and give some invariant feature extractors. Based on the theory of Lie algebras, the representation of the projected rotation group is obtained, and it is shown that the basis of the representation can be an orthonormal basis. With this orthonormal basis, we can construct quasi-moments, which are a kind of weighted moment. It is also shown that the quasi-moments are closed within their orders under the projected rotation group. Some experiments on 3D motion analysis with the quasi-moments are given through computer simulations.
Accurate set-up of Gabor filters for texture classification
Author(s):
Devesh Patel;
T. John Stonham
Show Abstract
Gabor filters are of particular interest to the computer vision community because the profiles of two-dimensional Gabor functions have been shown to closely approximate the receptive field profiles of particular simple cells in the visual cortex of certain mammals. However, even a few values for each parameter of the Gabor function generate a large number of filters, which makes practical implementation impossible. Moreover, the process of adjusting the parameters of these functions to obtain the 'best' set is not straightforward. In this paper we describe a new, reliable, and systematic method for setting up Gabor filters for texture classification. Texture is an intrinsic property of images and is thus an important feature for computer vision. Gabor filters are used to extract features from local neighborhoods of the texture images and have been tuned for the classification of, initially, naturally occurring textures from Brodatz's album and, subsequently, different grades of ceramic filters used in molten metal filtration.
Component-based representation of complex line drawings
Author(s):
Sergey V. Ablameyko;
Carlo Arcelli;
Gabriella Sanniti di Baja
Show Abstract
In this paper, an approach for the hierarchical representation of graphic image objects in vector form is proposed, based on a decomposition of the object skeleton into a number of components consisting of concatenations of branches and loops. To build this representation, an image vectorization method is proposed which is based on object skeletonization and its iterative tracing. At each iteration, skeleton components are extracted and concatenated with the already extracted components. Concatenations are built by taking into account the spatial relevance of the regions represented by the various skeleton subsets. A data structure is built which allows information about the object components to be stored compactly. The method can be applied to the recognition of graphic images, particularly cartographic, engineering drawing, and flow-chart images.
Difference-of-exponential detector for extracting edges
Author(s):
Bingcheng Li;
Dongming Zhao
Show Abstract
In this paper, a generalized zero crossing (GZC) theorem is proposed. The GZC theorem places far fewer constraints on filters, so that filter design can be flexible. It is then shown that ramp models can be effectively approximated by step models. Based on the GZC theorem, a difference-of-exponential (DoE) operator is proposed. It is shown both theoretically and experimentally that the new operator is computationally efficient and that its edge detection performance is higher than that of the Laplacian-of-Gaussian (LoG) operator.
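A rough illustration of the difference-of-exponential idea, assuming the operator is formed by subtracting two 2D exponential-decay kernels of different decay rates and marking zero crossings of the filtered output; the decay constants, kernel size, and zero-crossing test below are illustrative assumptions, not the paper's design.

    import numpy as np
    from scipy.ndimage import convolve

    def exp_kernel(alpha, radius=7):
        y, x = np.mgrid[-radius:radius+1, -radius:radius+1]
        k = np.exp(-alpha * np.sqrt(x**2 + y**2))
        return k / k.sum()

    def doe_edges(img, a1=0.4, a2=0.8):
        doe = exp_kernel(a1) - exp_kernel(a2)        # difference-of-exponential operator
        r = convolve(img.astype(float), doe)
        # Mark pixels where the response changes sign against a right or lower neighbor.
        zc = (r[:-1, :-1] * r[:-1, 1:] < 0) | (r[:-1, :-1] * r[1:, :-1] < 0)
        edges = np.zeros(img.shape, dtype=bool)
        edges[:-1, :-1] = zc
        return edges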
Mathematical morphology-based shape feature analysis for Chinese character recognition systems
Author(s):
Tun-Wen Pai;
Keh-Hwa Shyu;
Ling-Fan Chen;
Gwo-Chin Tai
Show Abstract
This paper proposes an efficient technique for shape feature extraction based on mathematical morphology theory, together with a new shape complexity index for the preclassification of machine-printed Chinese Character Recognition (CCR). For characters represented in different fonts and sizes or in a low-resolution environment, a stable local feature such as shape structure is preferred for character recognition. Morphological valley extraction filters are applied to extract the protrusive strokes from the four sides of an input Chinese character, and the number of extracted local strokes reflects the shape complexity of each side. These shape features are encoded as corresponding shape complexity indices. Based on the shape complexity index, the database can be classified into 16 groups prior to the recognition procedure. Associating the recognition with this shape feature analysis reclaims several characters from misrecognized character sets and results in an average improvement of 3.3% in recognition rate over an existing recognition system. In addition to enhancing recognition performance, each extracted stroke can be further analyzed and its stroke type classified. The combination of strokes extracted from each side therefore provides a means for clustering the database based on radical or subword components. It is one of the best solutions for recognizing high-complexity character sets such as Chinese, which is divided into more than 200 different categories and consists of more than 13,000 characters.
New color image processor for the video camera
Author(s):
Shin-Shu Wang;
Ye-Quang Chen
Show Abstract
This paper presents a newly developed digital color image processor (CIP) for the video camera. The CIP accepts digitized image signals from a one-chip color CCD sensor, performs luminance and chrominance signal processing, and outputs NTSC Y/C and digital CCIR601 signals. In addition, the algorithmically developed auto-focus mechanism to be integrated with the CIP is described.
Elastic mapping technique for intersubject tomographic image registration
Author(s):
Kang-Ping Lin;
Sung-Cheng Huang
Show Abstract
This paper presents a two step self-organizing method for a transformation that can elastically map one subject's MR image, called the input image, to a standard reference MR image. Linear scaling and transformation are first introduced to grossly match the input image to the reference image. Then the input image is linearly scaled and divided into several smaller cubes of equal volume. A local correspondence is used to estimate the best matching position by moving individual cubes of the input image to the reference image within a search neighborhood. Based on local correspondence, coarse displacement vectors for each cube are determined by the position difference between the original and the new cube centers. The estimated vectors provide a complete transformation that matches the entire input image to the reference image. As the process is repeated, a better transformation is obtained that improves the matching. This algorithm has been tested on simulations of 3D deformed images and been found to be successful for 3D inter-subject registration of MR images.
Image analysis for skeletal evaluation of carpal bones
Author(s):
Chien-Chuan Ko;
Chi-Wu Mao;
Chi-Jen Lin;
Yung-Nien Sun
Show Abstract
The assessment of bone age is an important topic in pediatric radiology. It provides very important information for the treatment and prediction of skeletal growth in a developing child. So far, various computerized algorithms for automatically assessing skeletal growth have been reported, most of which attempt to analyze phalangeal growth. The most fundamental step in these automatic measurement methods is the image segmentation that separates bones from soft tissue and background. These automatic segmentation methods for hand radiographs can roughly be categorized into two main approaches: edge-based and region-based methods. This paper presents a region-based carpal-bone segmentation approach. It is organized into four stages: contrast enhancement, moment-preserving thresholding, morphological processing, and region-growing labeling.
Occlusion and nonstationary displacement field estimation in quantum-limited image sequences
Author(s):
Cheuk L. Chan;
James C. Brailean;
Aggelos K. Katsaggelos
Show Abstract
In this paper, we develop an algorithm for obtaining the maximum a posteriori (MAP) estimate of the displacement vector field (DVF) from two consecutive image frames of an image sequence acquired under quantum-limited conditions. The estimation of the DVF has applications in temporal filtering, object tracking and frame registration in low- light level image sequences as well as low-dose clinical x-ray image sequences. The quantum-limited effect is modeled as an undesirable, Poisson-distributed, signal-dependent noise artifact. The specification of priors for the DVF allows a smoothness constraint for the vector field. In addition, discontinuities and areas corresponding to occlusions which are present in the field are taken into account through the introduction of both a line process and an occlusion process for neighboring vectors. A Bayesian formulation is used in this paper to estimate the DVF and a block component algorithm is employed in obtaining a solution. Several experiments involving a phantom sequence show the effectiveness of this estimator in obtaining the DVF under severe quantum noise conditions.
Segmentation-based lossless coding of medical images
Author(s):
Liang Shen;
Rangaraj M. Rangayyan
Show Abstract
Lossless compression techniques are essential in the archival and communication of medical images. However, there has been limited recent progress in lossless image coding; available algorithms are either too complicated for fast implementation or suitable only for certain specific types of images. In this paper, a new Segmentation-based Lossless Image Coding (SLIC) scheme is proposed, which is based on a simple but efficient region growing procedure. This embedded procedure produces an adaptive scanning pattern for an image with the help of a discontinuity index map that needs very few bits. Along with this scanning pattern, the high correlation among image pixels is exploited by the method, and thereby an error image data part with a very small dynamic range is generated. Both the error image data and the discontinuity index map are then JBIG encoded. In comparison with direct coding by JBIG, JPEG, adaptive Lempel-Ziv, and 2D Burg prediction plus Huffman error coding methods, the SLIC method performed better by at least 7% on ten 8-bit and 10-bit high-resolution digitized medical images.
Keywords: segmentation, region growing, lossless compression, image coding, medical image
Medical image compression with structure-preserving adaptive quantization
Author(s):
Chang Wen Chen;
Ya-Qin Zhang;
Jiebo Luo;
Kevin J. Parker
Show Abstract
We present in this paper a study of medical image compression based on an adaptive quantization scheme capable of preserving the clinically useful structures that appear in the given images. We believe that how accurately a compression algorithm can preserve these structures is a good measure of image quality after compression, since many image-based diagnoses rely on the position and appearance of certain structures. With wavelet decomposition, we are able to investigate the image features at different scale levels that correspond to certain characteristics of the biomedical structures contained in the medical images. An adaptive quantization algorithm based on clustering with spatial constraints is then applied to the high-frequency subbands. The adaptive quantization enables us to selectively preserve image features at various scales so that desired details of clinically useful structures are preserved during compression, even at a low bit rate. Preliminary results based on real medical images suggest that this clustering-based adaptive quantization, combined with wavelet decomposition, is very promising for medical image compression with structure-preserving capability.
Extracting multidimensional signal features for content-based visual query
Author(s):
Shih-Fu Chang;
John R. Smith
Show Abstract
Future large visual information systems (such as image databases and video servers) require effective and efficient methods for indexing, accessing, and manipulating images based on visual content. This paper focuses on the automatic extraction of low-level visual features such as texture, color, and shape. Continuing our prior work on compressed video manipulation, we also propose to explore the possibility of deriving visual features directly from the compressed domain, such as the DCT and wavelet transform domains. By focusing on low-level features, we hope to achieve generic techniques applicable to general applications; by exploring content extractability in the compressed domain, we hope to reduce the computational complexity. We also propose a quadtree-based data structure to bind various signal features, and integrated feature maps are proposed to improve the overall effectiveness of the feature-based image query system. Current technical progress and system prototypes are also described. Part of the prototype work has been integrated into the Multimedia/VOD testbed in the Advanced Image Lab of Columbia University.
Visual information offering system using N-ISDN
Author(s):
Takaaki Akimoto;
Shizuo Nakano;
Mineo Shoman;
Hisashi Ibaraki;
Koji Jinzenji
Show Abstract
A system for offering visual information to N-ISDN video phones is proposed. The key components of this system, namely the N-ISDN-LAN gateways, the interactive service control, and the H.261 coded video editor, are described in detail. The features of this system are that it is flexible and that the same system can also be applied to a LAN.
Very low bit rate video coding standards
Author(s):
Ya-Qin Zhang
Show Abstract
Very low bit rate video coding has received considerable attention in academia and industry in terms of both coding algorithms and standards activities. In addition to the earlier ITU-T efforts on H.320 standardization for video conferencing from 64 kbps to 1.544 Mbps in the ISDN environment, the ITU-T/SG15 has formed an expert group on low bitrate coding (LBC) for visual telephony below 64 kbps. The ITU-T/SG15/LBC work consists of two phases: near-term and long-term. The near-term standard H.32P/N, based on existing compression technologies, mainly addresses the issues related to visual telephony below 28.8 kbps, the V.34 modem rate used in the existing Public Switched Telephone Network (PSTN). H.32P/N will be technically frozen in January '95. The long-term standard H.32P/L, relying on fundamentally new compression technologies with much improved performance, will address video telephony in both PSTN and mobile environments. The ISO/SC29/WG11, after its highly visible and successful MPEG-1/2 work, is starting to focus on the next-generation audiovisual multimedia coding standard MPEG-4. With the recent change of direction, MPEG-4 intends to provide an audiovisual coding standard allowing for interactivity, high compression, and/or universal accessibility, with a high degree of flexibility and extensibility. This paper briefly summarizes these ongoing standards activities undertaken by ITU-T/LBC and ISO/MPEG-4 as of December 1994.
Mobile video communications techniques and services
Author(s):
Hisashi Ibaraki;
Tsuyoshi Fujimoto;
Shizuo Nakano
Show Abstract
This paper describes picture information transmission for portable multimedia terminals. The radio links used in portable multimedia terminals have narrower channel capacity and higher transmission error rates than wired links such as those used in ISDN. To transmit multimedia information of satisfactory quality over radio links, robustness against radio link errors must be improved, because picture deterioration is much more apparent than audio deterioration. First, the effects of transmission errors on picture quality are analyzed using the H.261 coding system used for ISDN picture communication. Second, the relationship among bit error rate, terminal velocity, and picture quality is analyzed and the deterioration mechanisms of picture quality are discussed. Three techniques for improving picture quality against radio link errors are proposed.
Mobile multimedia communications in a universal telecommunications network
Author(s):
Klaus Illgner;
Dirk Lappe
Show Abstract
The paper gives an overview of trends in mobile communications, currently one of the fastest emerging markets. The demand for new and extended services is a driving force for the development of future mobile networks, and the provision of video communications in particular is a great challenge. The current status of MPEG and ITU-T/SG15 concerning mobile communications is described and related to current projects. In the second part, some aspects of the design of video codecs for mobile video telephony are outlined. Based on the coming H.26P standard, an advanced video codec is described and its suitability for mobile networks is demonstrated by simulation results.
Digital television on ATM networks: how to optimize coding and transmission?
Author(s):
Jean-Pierre Leduc;
Claude Labit
Show Abstract
This paper presents the major steps towards an optimum control for video transmission on ATM networks. The paper puts forward the gain in statistical multiplexing to demonstrate that transmitting at variable rates on asynchronous multiplexing links is more efficient than exploiting constant rates on synchronous links. Optimum coding and transmission require characterizing the video sources of information as entropy generators and developing entropy rate-distortion functions for the coder and the transmission channel. Quantizers and VLCs in coding, and traffic and queues in transmission multiplexing, each lead to performance functions expressing quality in terms of entropy rate: respectively, the PSNR as a function of the output data rate and the cell losses as a function of the network load. The main advantage of transmitting on variable bit rate channels is to favor the generation of image sequences at constant subjective quality on the coding side and to save transmission bandwidth through a gain in statistical multiplexing on the network side. Mirror control algorithms can be implemented at the coding end and in the multiplexing nodes to optimally manage the rate-distortion functions.
Multimedia processing and transport for the wireless personal terminal scenario
Author(s):
D. Raychaudhuri;
Daniel J. Reininger;
Maximilian Ott;
Girish Welling
Show Abstract
This paper presents an exploratory view of multimedia processing and transport issues for the wireless personal terminal scenario, in which portable multimedia computing devices are used to access video/voice/data information and communication services over a broadband wireless network infrastructure. System architecture considerations are discussed, leading to the identification of a specific approach based on a unified wired + wireless ATM network, a general-purpose CPU based portable terminal, and a new software architecture for efficient media handling and quality-of-service (QoS) support. The recently proposed 'wireless ATM' network concept is outlined, and the associated transport interface at the terminal is characterized in terms of available service types (ABR, VBR, CBR) and QoS. A specific MPEG video delivery application with VBR ATM transport and software decoding is then examined in further detail. Recognizing that software video decoding at the personal terminal represents a major performance bottleneck in this system, the concept of MPEG encoder quantizer control with joint VBR bit-rate and decoder computation constraints is introduced. Experimental results are given for software decoding of VBR MPEG video with both VBR usage parameter control (UPC) and decoder CPU constraints at the encoder, demonstrating improvements in delivered video quality relative to the conventional case without such controls.
Real-time software-based end-to-end wireless visual communications simulation platform
Author(s):
Ting-Chung Chen;
Li-Fung Chang;
Andria H. Wong;
Ming-Ting Sun;
T. Russell Hsing
Show Abstract
Wireless channel impairments pose many challenges to real-time visual communications. In this paper, we describe a real-time software-based wireless visual communications simulation platform which can be used for performance evaluation in real time. The simulation platform consists of two personal computers serving as hosts; the major components of each PC host are a real-time programmable video codec, a wireless channel simulator, and a network interface for data transport between the two hosts. The three major components are interfaced in real time to show the interaction of various wireless channels and video coding algorithms. The programmable features of these components allow users to evaluate the performance of user-controlled wireless channel effects without physically carrying out experiments which are limited in scope, time-consuming, and costly. Using this simulation platform as a testbed, we have experimented with several wireless channel effects including Rayleigh fading, antenna diversity, channel filtering, symbol timing, modulation, and packet loss.
Iterative restoration of fast moving objects in dynamic image sequences
Author(s):
Damon L. Tull;
Aggelos K. Katsaggelos
Show Abstract
If a point on an object passes over two or more photoreceptors during image acquisition, a blur will occur. Under these conditions, an object or scene is said to move fast relative to the camera's ability to capture the motion. In this work, we consider the iterative restoration of images blurred by distinct, fast moving objects in the frames of a (video) image sequence. Even in the simplest case of fast object motion, the degradation is spatially variant with respect to the image scene. Rather than segmenting the image into regions where the degradation can be considered space invariant, we allow the blur to vary at each pixel and perform iterative restoration. Our approach requires complete knowledge of the blur point spread function (PSF) to restore the scene, but the blur of a fast moving object in a single frame is underspecified. With appropriate assumptions, an estimate of the blur PSF can be specified to within a constant scaling factor using motion information provided by a displacement vector field (DVF). A robust iterative restoration approach is followed which allows prior knowledge of the scene structure to be incorporated into the algorithm to facilitate the restoration of difficult scenes. A bilinear approximation to the continuous PSF derived from the motion estimate is proposed to obtain results for real and synthetic sequences. We found this approach suitable for restoring motion degradations in a wide range of digital video applications. The results of this work reinforce the well-known flexibility of the iterative approach to restoration and its applicability as an off-line image sequence restoration method.
Quasi-median gray image filtering via distance transformation
Author(s):
Valery V. Starovoitov
Show Abstract
The construction of filters or filter operators is an important problem in image processing. In this paper we describe how one can construct a filter operator based on the distance transformation for grey-scale images. This operator is similar to max/min filters for grey-scale images; we call it the quasi-median filter. From the morphological point of view it is not a filter like the median filter, but it possesses filter (mean) properties.
Fast motion compensated temporal interpolation for video
Author(s):
Chi-Kong Wong;
Oscar Chi Lim Au
Show Abstract
Recently, the MPEG-4 effort has been launched to study very-low-bit-rate (VLBR) video coding for applications in videotelephony. In this paper, we propose a possible postprocessing technique for VLBR coding. In videophone applications, temporal subsampling is a simple technique which can be combined with other compression schemes to achieve a very large compression ratio, so as to satisfy the VLBR requirement. As a result, object motions tend to be jerky and disturbing to the human eye. To smooth out object motions, we propose a postprocessing technique, motion compensated temporal interpolation (MCTI), to increase the instantaneous decoder frame rate. In MCTI, block-based exhaustive motion search is used to establish temporal association between two reconstructed frames. Both forward and backward searches are used to account properly for uncovered and newly covered areas. With MCTI, we show that one or more frames can be interpolated with acceptable visual quality. After showing the feasibility of MCTI, we propose a fast algorithm, FMCTI, with reduced computation requirements and negligible performance degradation.
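A simplified sketch of the interpolation step: given two decoded frames and a block motion field between them, a middle frame can be formed by averaging the two motion-compensated predictions for each block, with the displacement split in half toward each side. The motion search itself, the forward/backward consistency handling for covered and uncovered areas, and the block size are not reproduced here and are assumptions.

    import numpy as np

    def interpolate_midframe(f0, f1, motion, B):
        """f0, f1: two decoded frames (H, W), H and W assumed multiples of B.
        motion: array of shape (H//B, W//B, 2); motion[i, j] = (dy, dx) assigned
        to block (i, j) of the interpolated frame, pointing from f0 towards f1."""
        H, W = f0.shape
        mid = np.zeros((H, W), dtype=float)
        for by in range(0, H, B):
            for bx in range(0, W, B):
                dy, dx = motion[by // B, bx // B]
                # Split the displacement in half towards each reference frame.
                y0 = int(np.clip(by - dy // 2, 0, H - B))
                x0 = int(np.clip(bx - dx // 2, 0, W - B))
                y1 = int(np.clip(by + dy - dy // 2, 0, H - B))
                x1 = int(np.clip(bx + dx - dx // 2, 0, W - B))
                mid[by:by+B, bx:bx+B] = 0.5 * (f0[y0:y0+B, x0:x0+B].astype(float)
                                               + f1[y1:y1+B, x1:x1+B].astype(float))
        return mid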
Study of statistical properties of random signals in multirate filter banks
Author(s):
Leu-Shing Lan
Show Abstract
Previous works on subband-related signal processing were mainly dedicated to the applications of subband systems and to the formulation of multirate filter banks. Only very limited results can be found that treat statistical properties of random signals inside a multirate filter bank. In this paper, such a theoretical study is performed from the statistical viewpoint. Our main interest lies in how a multirate structure interacts with a random signal. The key statistical properties examined are stationarity, autocorrelation, cross-correlation, power spectral density, and spectral flatness measure. Exact explicit expressions are obtained. These results have their counterparts in a fullband system; however, inside a multirate structure or a subband system, the aliasing effect caused by decimation should be taken into account. In a multirate system, stationarity is not preserved when an upsampling (or expanding) operation is encountered. Furthermore the equivalent filtering operation is nonlinear. A test example of an AR-1 process is included for demonstration. From this example, an interesting phenomenon is observed. When the correlation coefficient of the AR-1 process is close to 1, the lowpassed signal is not, in any sense, a rough replica of the source. This example justifies the significance and necessity of a theoretical analysis of subband systems from a statistical viewpoint. We believe that stochastic signal processing applications of a subband structure such as estimation, detection, recognition, etc. will benefit from study of this nature.
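The AR-1 example can be reproduced numerically in a few lines: the sketch below generates an AR-1 process, applies a simple two-tap lowpass followed by 2:1 decimation, and compares sample autocorrelations of the source and the lowband signal; the two-tap filter is a placeholder, not the filter bank analyzed in the paper.

    import numpy as np

    def ar1(n, rho, seed=0):
        rng = np.random.default_rng(seed)
        x = np.zeros(n)
        for k in range(1, n):
            x[k] = rho * x[k - 1] + rng.standard_normal()
        return x

    def sample_autocorr(x, lag):
        x = x - x.mean()
        return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

    x = ar1(50_000, rho=0.95)
    low = (x[0::2] + x[1::2]) / 2          # two-tap lowpass + 2:1 decimation
    print("source  lag-1 autocorrelation:", sample_autocorr(x, 1))
    print("lowband lag-1 autocorrelation:", sample_autocorr(low, 1))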
Interpolation for 3D object reconstruction using wavelet transforms
Author(s):
Wen-Huei Lin;
Chin-Hsing Chen;
Jiann-Shu Lee
Show Abstract
Three-dimensional (3D) object reconstruction from a series of cross-sectional images has found many applications, such as computer vision and medical imaging. In this paper, we propose a wavelet-based interpolation for 3D reconstruction. In this scheme, the contour signal of the object of interest is decomposed using multiresolution wavelet bases. The length of an interpolated contour is first estimated from the lengths at the coarsest scale of the two adjacent slices and then refined by the lengths at the finer scales. The inter-slice contour estimate is obtained by the inverse wavelet transform. A series of CT liver images is used to test the performance of our method. Experiments show that our method can obtain a satisfactory reconstructed surface. The advantages of our method are (i) no need for feature matching, which is a time-consuming process that often produces false matches, and (ii) fast algorithms for wavelet transforms can be used. Thus, our method is not only reliable for practical images but also computationally efficient.
Estimation of image motion through mathematical modeling
Author(s):
Cheng-Ho Hsin
Show Abstract
A computational model for the estimation of image motion from a sequence of images obtained by an imaging sensor is developed through mathematical modeling. The paper is divided into three parts which describe the evolution of the model of image motion. An ideal model of image motion is developed in the first part. This model is unsolvable because some parameters and boundary conditions are unknown. Hence, a reformulated model of image motion is derived in the second part. Although this model is solvable, it is ill-posed, mainly due to the differentiation of the noise-contaminated image irradiance function, and consequently the solution estimated by this model is unstable. Thus, the reformulated model is remedied and transformed into a realistic model of image motion, which is discussed in the third part. The results from simulations demonstrate that this realistic model of image motion gives correct and reliable estimates.
Mouth shape detection and tracking using an active mesh
Author(s):
Yao Wang;
Ru-Shang Wang;
Ouseb Lee;
Tsuhan Chen;
Homer H. Chen;
Barry G. Haskell
Show Abstract
In this paper, we describe an approach to detecting and tracking certain feature points in the mouth region of a talking head sequence. These feature points are interconnected in a polygonal mesh so that their detection and tracking is based on information not only at the points themselves but also in the surrounding elements. The detection of the nodes in an initial frame is accomplished by a feature detection algorithm. The tracking of these nodes in successive frames is obtained by deforming the mesh so that, when one mesh is warped to the other, the image patterns over corresponding elements in the two meshes match each other. This is accomplished by a modified Newton algorithm which iteratively minimizes the error between the two images after mesh-based warping. The numerical calculation involved in the optimization is simplified by using the concept of master elements and shape functions from the finite element method. The algorithm has been applied to a SIF-resolution sequence which contains fairly rapid mouth movement. Our simulation results show that the algorithm can locate and track the feature points in the mouth region quite accurately.
Method of analyzing 3D motions of human head and hand for man-machine interface
Author(s):
Masayuki Tanimoto;
Toshio Huwa;
Tadahiko Kimoto
Show Abstract
Recently, new types of man-machine interfaces that lighten the burden on users have been studied vigorously. One important example is a system in which users give instructions in a noncontact way. In such systems, users give instructions to a virtual environment by gestures observed through stereo cameras. However, previously proposed systems place several restrictions on the environment in which they are used. Here we note two of them, the arrangement of the stereo cameras and the background of the user, which are very important in practical applications. We therefore propose a new system for giving instructions to a virtual environment in a noncontact way through stereo cameras while relaxing these two restrictions. First, we describe the outline of the proposed system, and then we describe a new algorithm to estimate the 3D motion of the user's hand and the position of the head. Experiments estimating the user's hand motion are presented and the results are shown.
Detection and tracking of facial features
Author(s):
Liyanage C. De Silva;
Kiyoharu Aizawa;
Mitsutoshi Hatori
Show Abstract
Detection and tracking of facial features without using any head-mounted devices may become required in various future visual communication applications, such as teleconferencing and virtual reality. In this paper we propose an automatic method of face feature detection using a method called edge pixel counting. Instead of utilizing the color or gray-scale information of the facial image, the proposed edge pixel counting method utilizes edge information to estimate the face feature positions, such as eyes, nose, and mouth, in the first frame of a moving facial image sequence, using a variable-size face feature template. For the remaining frames, feature tracking is carried out alternately using a method called deformable template matching and edge pixel counting. One main advantage of using edge pixel counting in feature tracking is that it does not require a high inter-frame correlation around the feature areas, as is required in template matching. Some experimental results are shown to demonstrate the effectiveness of the proposed method.
Algorithms for extracting the medial axis transform of 2D images
Author(s):
Ching-Shoei Chiang
Show Abstract
We describe five algorithms for finding the MAT of 2D regions in this paper: Danielson's algorithm, Rosenfeld and Pfaltz's algorithm, the interpolation/extrapolation algorithm, the Newton-and-march algorithm, and the grid edge interpolation algorithm. The Rosenfeld and Pfaltz, Danielson, and interpolation/extrapolation methods are based on the maximal disc criterion. Whether the grid point (i,j) with distance amplitudes (a,b) to the boundary of the region is an MA point is decided by its grid neighbors: if the discrete circle associated with the grid point (i,j) is not contained in one of the 8 discrete circles associated with its neighbors, then it is an MA point. The Newton-and-march and the grid edge interpolation methods are based on the equal-distance criterion. Given the boundary of a region, we compute the distance transform for the discretized region as a preprocessing step. With every grid point we associate the index of a nearest edge or a concave vertex, and the direction and distance to that edge or concave vertex. The main purpose of these steps is to solve the proximity problem. A system of equations is generated and Newton's method is used to trace the MAT. If we add one more equation, such as the equation for a grid line, then instead of marching along the MAT step by step we can find the MA point square by square under some assumptions; this is the idea of the grid edge interpolation method.
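As a rough illustration of the maximal disc criterion described above, the following sketch marks a grid point as a medial-axis candidate when the disc given by its distance value is not contained in the disc of any of its 8 neighbours. The Euclidean distance transform and the containment test are generic assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def medial_axis_candidates(binary_region):
    """Mark grid points whose maximal disc is not contained in a neighbour's disc.

    binary_region: 2D bool array, True inside the region.
    Returns a bool array of medial-axis candidate points (maximal disc criterion).
    """
    d = distance_transform_edt(binary_region)          # distance to the region boundary
    h, w = d.shape
    candidates = np.zeros_like(binary_region, dtype=bool)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if d[i, j] == 0:
                continue
            contained = False
            for di, dj in offsets:
                # the disc at (i,j) lies inside a neighbour's disc if the
                # neighbour's radius exceeds ours by at least the centre distance
                if d[i + di, j + dj] >= d[i, j] + np.hypot(di, dj):
                    contained = True
                    break
            if not contained:
                candidates[i, j] = True
    return candidates
```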
Study on the implementation of a moving-object tracking system
Author(s):
Young Ho Kim;
Kyu-Won Lee;
Chae Wook Lim;
Kyu Tae Park
Show Abstract
An automatic moving-object tracking system for unmanned visual surveillance is designed and implemented. To detect movement of an object, differential operation in the time domain is performed, followed by binary morphological processing to extract the moving object of interest. The extreme and focusing points of the detected object are computed to predict the object's moving direction and displacement, and camera driving is performed using pan/tilt drives and electrically controlled zooming mechanisms for motion tracking. Since the amount of data to be processed is enormous and these operations require real-time processing, the proposed scheme is implemented with extensive use of high-density Programmable Logic Devices (PLDs) together with off-the-shelf peripheral devices.
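A minimal sketch of the detection stage described above (temporal differencing followed by binary morphology and extraction of the extreme points of each detected region); the threshold and structuring-element sizes are illustrative assumptions, and the actual system performs these steps in hardware.

```python
import numpy as np
from scipy.ndimage import binary_opening, binary_closing, label

def detect_moving_object(prev_frame, curr_frame, diff_thresh=20, min_area=50):
    """Temporal differencing followed by binary morphology; returns bounding
    boxes (extreme points) of the detected moving regions.  Parameters are
    illustrative, not taken from the paper."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > diff_thresh                       # binarise the difference image
    mask = binary_opening(mask, np.ones((3, 3)))    # remove isolated noise pixels
    mask = binary_closing(mask, np.ones((5, 5)))    # fill small gaps in the object
    labels, n = label(mask)
    boxes = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        if ys.size >= min_area:                     # keep only sizeable regions
            boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes
```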
New VLSI architecture for velocity computation of multiple moving objects in images
Author(s):
Chang-Yu Chen;
Chin-Liang Wang
Show Abstract
This paper presents a new systolic realization of the Fourier-based method for motion detection and velocity computation of multiple moving objects in images. In the architecture, the 2D discrete Fourier transform is computed via the 2D discrete Hartley transform (DHT) that involves only real valued arithmetic. The 2D DHT is realized based on the row-column decomposition without matrix transposition problems. The systolic system possesses the desirable features of regularity, modularity, and concurrency for VLSI implementation. It has a utilization efficiency of Min(N, M)/Max(N,M) X 100 percent and a throughput rate of one velocity estimation per T X Max(N,M) cycles, where N and M are the number of pixels of an image in the x- and y- directions, respectively, and T is the number of frames in the image sequence.
Use of second-order derivative-based smoothness measure for error concealment in transform-based codecs
Author(s):
Wenwu Zhu;
Yao Wang
Show Abstract
In this paper, we study the recovery of lost or erroneous transform coefficients in image communication systems employing block transform codecs. Previously, Wang et al. developed a technique that exploits the smoothness property of image signals and recovers the damaged blocks by maximizing a smoothness measure. There, the first-order derivative was used as the smoothness measure, which can lead to the blurring of sharp edges. In order to alleviate the edge blurring problem of this method, in this paper we study the use of second-order derivatives as the smoothness measure, including the quadratic variation and the Laplacian operators. Our simulation results show that a weighted combination of the quadratic variation and the Laplacian operator can significantly reduce the blurring across edges while enforcing smoothness along the edges.
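The following is a minimal pixel-domain sketch of the smoothness-maximization idea: a damaged block is filled in by gradient descent on a Laplacian-based (second-order) smoothness measure while the surrounding pixels stay fixed. The paper recovers transform coefficients and weights the quadratic variation against the Laplacian; the initialisation, step size and iteration count here are assumptions.

```python
import numpy as np

def conceal_block(image, top, left, size=8, iters=500, step=0.01):
    """Fill a damaged block by minimising a second-order (Laplacian) smoothness
    measure over the image, updating only the lost block."""
    img = image.astype(np.float64).copy()
    sl = (slice(top, top + size), slice(left, left + size))
    img[sl] = img.mean()                      # crude initialisation of the lost block
    for _ in range(iters):
        # discrete Laplacian of the image
        lap = (-4 * img
               + np.roll(img, 1, 0) + np.roll(img, -1, 0)
               + np.roll(img, 1, 1) + np.roll(img, -1, 1))
        # gradient of sum(lap**2) w.r.t. the pixels is proportional to Laplacian(lap)
        grad = (-4 * lap
                + np.roll(lap, 1, 0) + np.roll(lap, -1, 0)
                + np.roll(lap, 1, 1) + np.roll(lap, -1, 1))
        img[sl] -= step * grad[sl]            # update only the damaged block
    return img
```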
Analysis of the effect of motion-estimation error on the motion-adaptive spatial filter (MASF) and its application
Author(s):
Sang-Yeon Kim;
Seong-Dae Kim
Show Abstract
MASF (Motion Adaptive Spatial Filter) is a kind of temporal filter proposed for noise reduction and temporal band limitation. MASF uses motion vectors to extract temporal information in the spatial domain. Therefore, inaccurate motion information causes distortions in the MASF operation. In order to decrease these distortions, motion-correcting techniques are required. In this paper, we analyze the effect of motion estimation error on MASF and propose a motion estimation scheme, including motion correction and quantization, for MASF. Experimental results show that a considerable amount of distortion is eliminated by using the proposed scheme.
Estimation of the point spread function of a motion-blurred object from autocorrelation
Author(s):
Tsang-Long Pao;
Ming-Dar Kuo
Show Abstract
The ultimate goal of image restoration is to recover a degraded image using digital image processing techniques. Image degradations may be in the form of sensor noise, blur due to camera misfocus, relative object-camera motion, and so on. The quality of the restored result is closely related to the accuracy in estimating the degradation function, which is termed the point spread function (PSF). In this paper, a procedure that can be used to determine the PSF of a motion-degraded image is presented. The method is to extract a block containing a blurred edge from the degraded image and estimate the PSF from the autocorrelation of that area. The results of applying the proposed procedure to some motion-blurred images are presented and the performance is discussed.
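One common way to realise the autocorrelation idea for a horizontal uniform motion blur is sketched below: the autocorrelation of the derivative of a blurred-edge block, taken along the motion direction, shows a pronounced minimum at a lag equal to the blur extent. This is an illustrative stand-in under that assumption, not the paper's exact procedure.

```python
import numpy as np

def estimate_blur_length(blurred_block, max_lag=40):
    """Estimate the extent of a horizontal motion blur from the autocorrelation
    of the row-wise derivative of a blurred-edge block (heuristic sketch)."""
    rows = blurred_block.astype(np.float64)
    deriv = np.diff(rows, axis=1)                 # derivative along the motion axis
    acc = np.zeros(max_lag + 1)
    for lag in range(max_lag + 1):
        a = deriv[:, :deriv.shape[1] - lag]
        b = deriv[:, lag:]
        acc[lag] = np.mean(a * b)                 # average autocorrelation over rows
    return int(np.argmin(acc[1:]) + 1)            # lag of the deepest minimum
```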
W-matrices and arbitrary-length nonorthogonal multiresolution analysis (MRA)
Author(s):
Man Kam Kwong;
Ping Tak Peter Tang
Show Abstract
We present an interpretation of multiresolution analysis of signals of arbitrary finite length in terms of matrix theory. In particular, we present a new nonorthogonal MRA associated with 4 coefficients that has vanishing moments up to order 2. A more general theory of this class of transforms is also presented.
Decoupled local energy and phase representation of a wavelet transform
Author(s):
Zhi-Yan Xie;
J. Michael Brady
Show Abstract
The wavelet transform is increasingly popular for mathematical scale-space analysis in various aspects of signal processing. The squared power and full-wave rectification of the wavelet transform coefficients are the features most frequently used for further processing. However, it is shown in this paper that, in general, these features are coupled with the local phase component, which depends not only on the analyzed signal but also on the analyzing wavelet at that scale. This dependency causes two problems: 'spurious' spatial variations of features at each scale, and the difficulty of associating features meaningfully across scales. To overcome these problems, we present a decoupled local energy and local phase representation of a real-valued wavelet transform obtained by applying the Hilbert transform at each scale. We show that although local energy is equivalent to the power of the wavelet transform coefficients in terms of energy conservation, they differ in scale-space. The local energy representation not only provides a phase-independent local feature at each scale, but also facilitates the analysis of similarity in scale-space. Applications of this decoupled representation to signal segmentation and the analysis of fractal signals are presented. Examples are given throughout, using both real infra-red line-scan signals and simulated fractional Brownian motion data.
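A minimal sketch of the decoupling step: at each scale the analytic signal is formed with the Hilbert transform, its squared magnitude giving the phase-independent local energy and its angle the local phase. The per-scale wavelet coefficients are assumed to be supplied by whatever real-valued wavelet transform is in use.

```python
import numpy as np
from scipy.signal import hilbert

def local_energy_phase(wavelet_coeffs_per_scale):
    """Decouple local energy and local phase of a real-valued wavelet transform.

    wavelet_coeffs_per_scale: iterable of 1D coefficient arrays, one per scale.
    Returns two lists (energy, phase), one entry per scale.
    """
    energy, phase = [], []
    for w in wavelet_coeffs_per_scale:
        analytic = hilbert(np.asarray(w, dtype=np.float64))  # analytic signal at this scale
        energy.append(np.abs(analytic) ** 2)                 # phase-independent local energy
        phase.append(np.angle(analytic))                     # local phase
    return energy, phase
```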
Fast block matching method for image data compression based on fractal models
Author(s):
Hideo Kuroda;
Dan C. Popescu;
Hong Yan
Show Abstract
This paper presents a fast block matching technique for image data compression based on fractal models. In fractal coding, domain blocks in an image are searched and the one most similar to a range block is selected as the best matching domain block. We propose a fast search method to improve the encoding time and the data compression rate. In our method, a candidate domain block consists of the inner pixels of the range block together with only one or two pixels outside the range block. The method has been tested on real image data with good results.
Three-dimensional tree shape reconstruction from multiviewpoint images based on fractal geometry
Author(s):
Noriaki Kuwahara;
Shinichi Shiwa;
Fumio Kishino
Show Abstract
A great deal of research has been devoted to generating 3D tree shapes by simulating tree growth to obtain realistic yet imaginary tree shapes. In order to display an actual scene in 3D computer graphics, 3D shapes of actual trees are necessary. Currently, despite considerable research on 3D shape reconstruction of objects from their silhouettes, the existing methods have so far been unable to handle objects with complicated shapes, like trees, whose silhouettes have a lot of occlusions. In this paper, we propose an algorithm for reconstructing 3D tree shapes based on fractal geometry, and show some experimental results.
Efficient codebook search algorithm for vector quantization
Author(s):
Chih-chiang Lai;
Shen-Chuan Tai
Show Abstract
In this paper, we present an efficient codebook search algorithm for a VQ-based system. The proposed fast search algorithm utilizes the compactness property of signal energy in the transform domain and the geometrical relations among the input vector and codevectors to eliminate those codevectors which cannot be the closest codeword to the input vector. It does not need to examine each entry in the codebook of a vector quantization encoder and can achieve full-search-equivalent performance. In comparison with other existing fast algorithms, the proposed algorithm requires the least number of multiplications and the least total number of distortion measurements.
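The abstract does not spell out the elimination tests, so the sketch below uses two generic full-search-equivalent rejection rules (a mean-based lower bound and partial-distortion elimination) to illustrate how codevectors can be discarded without computing their full distortion; it is not the authors' exact criteria, and in practice the codevector means would be precomputed.

```python
import numpy as np

def fast_vq_search(x, codebook):
    """Nearest-codeword search with two common elimination tests that never
    change the final answer: a lower bound from the vector means and
    partial-distortion elimination."""
    x = np.asarray(x, dtype=np.float64)
    codebook = np.asarray(codebook, dtype=np.float64)
    best_idx, best_dist = -1, np.inf
    k = x.size
    x_mean = x.mean()
    for i, c in enumerate(codebook):
        # mean-based lower bound: ||x - c||^2 >= k * (mean(x) - mean(c))^2
        lb = k * (x_mean - c.mean()) ** 2
        if lb >= best_dist:
            continue
        # partial-distortion elimination: stop accumulating once we exceed the best
        d = 0.0
        for xi, ci in zip(x, c):
            d += (xi - ci) ** 2
            if d >= best_dist:
                break
        if d < best_dist:
            best_dist, best_idx = d, i
    return best_idx, best_dist
```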
Optical image processing for structural determination of biological objects by dye contrasting
Author(s):
Alexander V. Smirnov;
Vitaly M. Podgaetsky;
Sergei A. Tereshchenko;
Larisa G. Tomilova;
Nikolai S. Vorobiev
Show Abstract
Time profiles of picosecond Nd:YAG-laser radiation transmitted through homogeneous and inhomogeneous scattering models of biological objects contrasted by new dyphtalocyanine dyes have been investigated. These dyes have been specially synthesized as contrasting substances for biological tissues in order to increase the signal-to-noise ratio in optical image processing. Using this dye-contrasting technique, the ratio improvement has been demonstrated in time-resolved experiments on the determination of inner details of some model biological objects.
Some results of International Telecommunication Union-Telecommunication Sector (ITU-T) test model TMN1
Author(s):
King N. Ngan;
Andrew Millin
Show Abstract
Recently, ITU-T established a test model (TMN1) to be used as a possible standard for very low bit rate video coding at less than 64 kbits per second. It is based on the ITU-T H.261 standard for videoconferencing. This paper provides an analysis of the performance of the TMN1 coder with the aim of establishing its operational parameters and limitations when operating at very low bit rates. Its performance is then optimized by selecting the desired set of operational parameters.
Motion-oriented video sequence interpolation using digital image warping
Author(s):
Jun-Ye Lee;
Yawgeng A. Chau;
I-Chang Jou;
Rong-Hauh Ju
Show Abstract
We present a warping prediction scheme for motion-compensated interpolation of image sequences using the motion vectors transmitted to the decoder. In order to achieve a very low bit rate, a method based on the Hermite form of the cubic polynomial curve segment is developed for the reconstruction of skipped frames, so that the number of transmitted frames can be reduced. The principal idea is to reconstruct all picture elements at a certain time instant along their motion trajectories. Experimental results are presented to illustrate the subjective quality of the reconstructed video frames.
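A minimal sketch of the Hermite reconstruction step: a pixel position on a skipped frame is obtained by evaluating the cubic Hermite segment defined by its positions in the two transmitted frames and tangents derived from the decoded motion vectors. The tangent choice is the paper's; here it is left as an input.

```python
import numpy as np

def hermite_point(p0, p1, m0, m1, t):
    """Evaluate a cubic Hermite curve segment at parameter t in [0, 1].

    p0, p1: pixel positions at the two transmitted frames.
    m0, m1: tangents at those frames (assumed to come from the motion vectors).
    Returns the interpolated position used to warp the pixel onto the skipped frame.
    """
    p0, p1, m0, m1 = map(np.asarray, (p0, p1, m0, m1))
    h00 = 2 * t**3 - 3 * t**2 + 1        # Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1
```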
Very low bit rate video coding system based on optical flow and region segmentation algorithms
Author(s):
Chung-Wei Ku;
Liang-Gee Chen;
You-Ming Chiu
Show Abstract
Digitized video and audio systems have become the trend in multimedia because they provide great performance in quality and flexibility of processing. However, since a huge amount of information is needed while the bandwidth is limited, data compression plays an important role in such systems. For example, for a 176 x 144 monochrome sequence at a frame rate of 10 frames/sec, the bandwidth is about 2 Mbps. This wastes much channel resource and limits the applications. MPEG (Moving Picture Experts Group) standardizes the video codec scheme, achieving a high compression ratio while providing good quality. MPEG-1 is used for frame sizes of about 352 x 240 at 30 frames per second, and MPEG-2 provides scalability and can be applied to scenes with higher definition, such as HDTV (high-definition television). On the other hand, some applications concern very low bit rates, such as videophone and videoconferencing. Because the channel bandwidth is very limited in the telephone network, a very high compression ratio is required. For a channel bandwidth as low as 28.8 kbps (V.42), full duplex is necessary and 4 kbps is reserved for one-way speech, so the effective bandwidth available is only about 10 kbps. As a result, the digital video signal should be compressed by a factor of about 200. For a conventional codec scheme, such as H.261, the performance is poor when the bit rate is about 64 kbps. Therefore, MPEG-4 is being developed to satisfy this demand. To encode the video signal, motion-compensated (MC) methods are widely used in standard systems, in which fixed-block-based algorithms are applied. However, there are some disadvantages, such as blocking effects, that degrade the performance. To fit these applications, there are many approaches, for instance model-based coding [2] and analysis-synthesis coding [4]. Model-based coding is a very popular research topic in very low bit rate systems. Although the encoded pictures are constrained mostly to a slowly moving human face, the transmitted information is very small because only some motion parameters of the models must be sent. However, the technique is still in its infancy, because analyzing and extracting the parameters from a moving sequence is very difficult and complex. Besides, the moving sequence is restricted to specific patterns; that is, if an unexpected pattern appears in the picture, say a raised hand, the system may cost even more in both complexity and bit rate. As a commercial consideration, most customers may also not accept a 'synthesized' countenance of a friend or relative on the screen. The other branch is object-oriented analysis-synthesis coding. In this approach, the contents of the picture are classified into background, model-compliance objects, and model-failure objects. A model-compliance object is coded by its motion and shape, while a model-failure object is coded by its color (including luminance and chrominance) and shape. Coding of the background is unnecessary. Head and shoulders usually belong to the model-compliance objects, and details in the face such as eyes and mouth are model-failure objects. This method is immune to the variety of picture patterns compared with the model-based approach; however, the performance is not good enough for practical demands, and the complex analysis is also a major burden. There has been much recent research on this topic, and many modifications and adaptive schemes have been proposed. As mentioned above, object-oriented coding is free of blocking effects.
Besides, it is an efficient approach for coding the scene in videoconferencing applications, because a simple head-and-shoulders view occupies the major part of the picture. From the viewpoint of image analysis, the object-oriented approach contains much more direct information than conventional waveform coding and thus can satisfy the requirements. In this paper, a pseudo object-oriented coding system is proposed. It is based on pel-wise motion estimation and an arbitrarily shaped transform. Instead of a block matching scheme, the motion estimation is realized by a modified optical flow algorithm (MOFA). Because of the quasi-homogeneous property of the motion field, the objects can be extracted by simpler segmentation. These objects are coded with an arbitrarily shaped transform (AST). Briefly speaking, MOFA reduces temporal redundancy while AST reduces spatial redundancy. Since AST is applied to the motion field rather than the color field, the system is quite different from conventional coding systems.
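Since the abstract only names the modified optical flow algorithm (MOFA), the sketch below shows the classic Horn-Schunck iteration as a stand-in for the pel-wise motion estimation step; the smoothness weight and iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(frame1, frame2, alpha=10.0, iters=100):
    """Classic Horn-Schunck optical flow between two grayscale frames.
    Returns the per-pixel flow components (u, v)."""
    I1 = frame1.astype(np.float64)
    I2 = frame2.astype(np.float64)
    # simple spatio-temporal derivative estimates
    kx = np.array([[-1, 1], [-1, 1]]) * 0.25
    ky = np.array([[-1, -1], [1, 1]]) * 0.25
    Ix = convolve(I1, kx) + convolve(I2, kx)
    Iy = convolve(I1, ky) + convolve(I2, ky)
    It = convolve(I2 - I1, np.ones((2, 2)) * 0.25)
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], dtype=np.float64) / 12.0
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(iters):
        u_bar = convolve(u, avg)           # neighbourhood averages of the flow field
        v_bar = convolve(v, avg)
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha**2 + Ix**2 + Iy**2
        u = u_bar - Ix * num / den         # Horn-Schunck update equations
        v = v_bar - Iy * num / den
    return u, v
```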
Fast motion estimation algorithm with adjustable search area
Author(s):
Chun-Hung Lin;
Ja-Ling Wu
Show Abstract
The full-search block-matching algorithm (FBMA) has been shown to produce the best motion-compensated images among various motion estimation algorithms. However, its huge computational load inhibits its applicability in real applications. Many different methods with lower complexity have been proposed to speed up the process of motion compensation, but the resultant image quality is not as good as that of FBMA. A new motion estimation algorithm, with less computational complexity and image quality similar to that of FBMA, is presented in this paper. By considering the relation between neighboring blocks, the search area in the algorithm is adjustable. Due to the adaptation of the search area, the computational complexity can be largely reduced while the actual motion vectors can still be found. On a Sun SPARC-II workstation, the proposed algorithm can be up to 61 times faster than FBMA.
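A minimal sketch of the adjustable-search-area idea: the search window for a block is derived from the motion vectors of already-processed neighbouring blocks plus a small margin, instead of a fixed full-search range. The margin and window construction are assumptions; the paper's exact adaptation rule may differ.

```python
import numpy as np

def adaptive_block_match(ref, cur, bx, by, bs, neighbour_mvs, margin=2):
    """Block matching with a search area adjusted from neighbouring blocks'
    motion vectors.

    ref, cur: reference and current frames; (bx, by): top-left of the block;
    bs: block size; neighbour_mvs: list of (dx, dy) from already-coded blocks.
    """
    block = cur[by:by + bs, bx:bx + bs].astype(np.float64)
    dxs = [mv[0] for mv in neighbour_mvs] or [0]
    dys = [mv[1] for mv in neighbour_mvs] or [0]
    # small search window spanning the neighbours' vectors plus a margin
    x_range = range(min(dxs) - margin, max(dxs) + margin + 1)
    y_range = range(min(dys) - margin, max(dys) + margin + 1)
    best, best_sad = (0, 0), np.inf
    for dy in y_range:
        for dx in x_range:
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + bs > ref.shape[0] or x0 + bs > ref.shape[1]:
                continue
            sad = np.abs(ref[y0:y0 + bs, x0:x0 + bs] - block).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad
```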
Motion estimation using the Radon transform in dynamic scenes
Author(s):
Seyed Alireza Seyedin;
Chris J.E. Phillips
Show Abstract
This article describes a Radon transform based method for estimating the displacement in a monocular image sequence from dynamic scenes which contain rigid moving objects. The Radon transform is used to take projections along parallel directions from two consecutive frames in a monocular image sequence while the camera is stationary. The method consists of three stages: (a) motion detection, (b) identification of moving object areas in rectangular regions, and (c) motion estimation for each isolated moving object. Moving object detection and displacement measurement using this method are insensitive to sudden or gradual overall illumination changes in the sequence. The method can be implemented in real time using a pipeline architecture, since the motion estimation operations based on the Radon transform are computationally inexpensive. Results using synthetic and real scene images are included.
Simple motion-compensated up-conversion method for TV/HDTV compatible video coding
Author(s):
Soon-kak Kwon;
Jae-Kyoon Kim
Show Abstract
In this paper, we propose a method of motion-compensated up-conversion from TV resolution to HDTV resolution for compatible coding. The proposed method obtains the motion information for up-conversion from the motion vectors included in the lower layer. The algorithm has the advantage of obtaining motion vectors for up-conversion without extra transmission. Simulation results show that the spatial prediction by the proposed up-conversion method gives better performance than the conventional one, and also that the coding efficiency of the higher layer is improved.
Analysis of error concealment schemes for MPEG-2 video transmission over ATM-based networks
Author(s):
Wenjun Luo;
Magda El Zarki
Show Abstract
When cell loss occurs, error concealment can play a critical role in recovering the viewing quality of impaired video. In this paper, we first present a Slice Interleaving (SI) algorithm which has been demonstrated to be able to effectively prevent vertical adjacent slice loss, thereby enhancing the performance of error concealment techniques that rely heavily on interpolation. Experimentation on approximately 100 video clips using a time-variant Markov Chain (MC) cell loss model that emulates a fluctuating bursty cell loss environment, allowed us to conduct a comprehensive and comparative study of error concealment mechanisms in different coding domains. We then designed a hybrid concealment algorithm, such that, for an arbitrary video sequence, several concealment mechanisms can be adaptively merged together to achieve the best performance. Simulation results show that our hybrid algorithm can effectively detect frames or macroblocks with scene change and/or excessive irregular motion, and adaptively switch to an appropriate concealment module to achieve the optimal video quality.
Design and implementation of a digital TV transport system
Author(s):
JaeGon Kim;
Yo-Sung Ho;
Do-Nyon Kim
Show Abstract
We, at ETRI, have developed a real-time digital TV (DTV) system for direct broadcasting satellite (DBS) service. In this paper, we describe the hardware design and implementation of a DTV transport system, which is based on the MPEG-2 system specification. We also explain the system architecture and the design considerations behind the development of the hardware system with field programmable gate arrays (FPGAs).
Adaptive error concealment in SNR scalable system
Author(s):
Cheul-hee Hahm;
Jae-Kyoon Kim
Show Abstract
In this paper, we propose an adaptive error concealment method for the enhancement layer, which is quantized with a fixed step size, in an SNR scalable system. The proposed method consists of two steps. First, we partition the impaired image area into Not-Coded macroblocks and Coded macroblocks according to the quantization step sizes of both layers. Since the step sizes of the base layer are transmitted through a guaranteed channel and the residual data of the enhancement layer are quantized with a fixed step size, the decoder can always receive the quantization step sizes of both layers. Second, we choose the best concealment method, between spatial-domain and temporal-domain error concealment, for each Not-Coded macroblock. For this selection, neighborhood matching is used. Experimental results show that the proposed concealment method gives about an 8% improvement in PSNR over the conventional one using the base layer.
Intelligent packetising of MPEG video data
Author(s):
Iain E. Garden Richardson;
Martyn J. Riley
Show Abstract
In this paper we describe an 'intelligent' packetizing scheme in which MPEG coded video data is split into separate streams of ATM cells according to the picture coding type of each frame. We investigate the tolerance of each stream to cell losses and analyze the traffic characteristics of the separate streams. We conclude that this technique may provide a flexible means of splitting MPEG data into separate streams, each of which can accept a different Quality Of Service during transmission through an ATM network.
Integrated packet video/voice/data protocol for CDMA wireless LANs
Author(s):
Songchar J. Jiang;
Y. Huang;
W. Weng;
L. Lin;
C. Chen;
W. Huang
Show Abstract
This paper proposes an integrated protocol for CDMA wireless LANs. Voice, data, and images can be exchanged over the wireless LAN based on this frame-based protocol. The protocol uses a handshake procedure for voice calls first, followed by a short polling/contention period for data/video transfer requests. In the formal transmission period, voice users are allocated the higher priority, while data/video transfers do not start until after some random delay. The startup time of a data/video transfer depends on the number of voice calls initiated and the system threshold. This protocol fully utilizes the characteristic of the CDMA technique that allows simultaneous transmissions, while using the movable boundary concept of TDMA to assure quality of service. An approximate performance analysis assuming fixed packet lengths for both voice packets and data/video packets is presented, and numerical results on voice call blocking probability, packet loss rate, and network throughput are given.
Stable buffer control strategy with learning capability
Author(s):
Chieh-Feng Chang;
Jia-Shung Wang
Show Abstract
In this paper, we present a stable buffer control strategy for video coding and transmission over ATM. This strategy assigns quantizer scales by considering the smooth quality, the transmission bandwidth and the buffer occupancy for each frame simultaneously. Besides, it has the learning capability of fitting various image sequences. This strategy was implemented on MPEG-1 coding scheme. The simulation results show that the quality is almost identical to that produced by the pure MPEG-1 while the output rate is maintained at a constant level.
New two-layer coding scheme utilizing background information
Author(s):
Jin-Rong Chen;
Shih-Yu Huang;
Chun Wei Hsieh;
Jia-Shung Wang
Show Abstract
Cell loss in ATM transport is a critical factor affecting the quality of an image sequence. Two-layer coding has been adopted and performs acceptably in overcoming this problem. However, an inefficient coding scheme leaves redundant information in both layers. In this paper, we reduce the temporal redundancy using the background information of the image sequence. Layer separation is decided according to the information from the background classifier. The performance of the new two-layer codec is evaluated on real image sequences. Experimental results show that a very high percentage of base layer data can be reduced in I-frames.
Video traffic analysis in a multimedia communication network
Author(s):
Raymond Wai-Leung Cheung;
Peter C. K. Liu;
M. Suleyman Demokan
Show Abstract
The various types of video sources on a multimedia network are simulated using continuous-state autoregressive stochastic process Markov models. These models are normalized for comparison. Due to the different characteristics of the various video sources, statistical multiplexing of these sources produces bandwidth gain during video packet transmission. The numerical results indicate that the FIFO buffer size has a significant effect on cell loss improvement when the number of video sources is increased.
Region-based time-varying image coding at low bit rate with a high visual quality
Author(s):
Ling Wu;
Jenny Benois-Pineau;
Dominique Barba
Show Abstract
In this paper a method for very low bit rate coding of video sequences is described. The method is based on spatio-temporal segmentation of the sequence into semantically significant regions with polygonal shapes. The motion parameters, the structural description of the regions and the motion compensation error must be encoded to reconstruct high quality images at the decoder. In the coding scheme, the temporal coherence of the region structure and of the motion compensation error is taken into account to achieve a very low bit rate.
Video coding control methods based on frame and block classifications
Author(s):
Eri Murata;
Takashi Mochizuki
Show Abstract
This paper proposes video coding control methods which greatly improve image quality for videoconference or videophone systems. The proposed methods utilize frame and block classifications to incorporate subjective characteristics. Frame classification makes it possible to tune quantization step sizes according to which is more important for each processed frame, motion smoothness or image clarity. Block classification is used to adjust encoding parameters according to whether a block belongs to a background region or not. By combining the two proposed methods, quantization step sizes are controlled to achieve high image quality according to scene and block contents. Moreover, without unnecessary bit consumption, coding distortion is reduced in uncovered background regions, which appear after moving objects pass.
Scene-adaptive bit rate control strategy
Author(s):
Sang Gyu Lee;
Jang Hyuk Cho;
Sang Gyu Park;
Sung-Woong Ra
Show Abstract
An effective bit rate control scheme is one of the most important issues in video coding algorithms for the transmission of high quality images through a band-limited channel. The MPEG-2 three-step rate control algorithm is well suited for high compression of normal pictures while maintaining good quality. But it is well known that the MPEG-2 rate control algorithm cannot handle scene changes efficiently. In this paper, we present a simple rate control algorithm which is capable of handling scene-changed pictures efficiently. The algorithm assigns more bits to scene-changed pictures and fewer bits to the preceding pictures while maintaining the total bit rate. We also propose a method that can estimate the complexity of a picture using the activity of the picture. The estimated complexity is used as a rate control parameter in coding scene-changed pictures. Simulation results show that the proposed algorithm enhances the visual quality after scene changes.
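One simple way to realise the activity-based complexity estimate mentioned above is to average the per-block variance of the picture, as sketched below; how the paper maps activity to complexity, and then to a bit budget, may differ.

```python
import numpy as np

def frame_activity(frame, block_size=16):
    """Estimate picture complexity as the mean block (macroblock-sized) variance.
    A higher value suggests assigning more bits, e.g. at a scene change."""
    f = frame.astype(np.float64)
    h, w = f.shape
    activities = []
    for y in range(0, h - block_size + 1, block_size):
        for x in range(0, w - block_size + 1, block_size):
            activities.append(f[y:y + block_size, x:x + block_size].var())
    return float(np.mean(activities))
```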
Subband-based scalable coding schemes with motion-compensated prediction
Author(s):
Katsutoshi Sawada;
Tsuyoshi Kinoshita
Show Abstract
This paper describes scalable coding schemes which use subband picture decomposition and motion-compensated interframe prediction. In scalable coding, a lower resolution picture can be obtained by decoding only a subset of the total bitstream, while a full resolution picture is obtained by decoding the total bitstream. Two types of scalable coding schemes are studied. In the first type (schemes A), an input picture is first decomposed into subband pictures, then MC prediction coding is carried out in the subband picture domain. In the second type (scheme B), MC prediction is first carried out in the full-band picture domain and then subband decomposition is performed on the prediction difference picture. The coding performance of these two types of schemes was estimated by computer simulation experiments. A performance comparison between scalable and non-scalable coding schemes was also carried out. The experimental results demonstrate that scheme B is superior to schemes A.
Temporal and spatial interleaving of H.261 compression for lossy transmission
Author(s):
Gong-San Yu;
Max Ming-Kang Liu
Show Abstract
This paper proposes a modified H.261 codec that uses both temporal and spatial domain interleaving for transmission over a lossy channel. In this scheme, a video input sequence is split into two temporally interleaved subsequences for separate H.261 encoding. After that, spatially interleaved macroblocks are transmitted separately in different packets. With these temporal and spatial interleavings, lost data can be substituted from the closest matching blocks in the adjacent frames. As a result, the error propagation effect is minimized and very little distortion is observed. Since temporal interleaving introduces only one frame of delay, the scheme maintains a good compression ratio and is attractive for real-time video communications.
Improvements of embedded zerotree wavelet (EZW) coding
Author(s):
Jin Li;
Po-Yuen Cheng;
C.-C. Jay Kuo
Show Abstract
In this research, we investigate several improvements of embedded zerotree wavelet (EZW) coding. Several topics addressed include: the choice of wavelet transforms and boundary conditions, the use of arithmetic coder and arithmetic context and the design of encoding order for effective embedding. The superior performance of our improvements is demonstrated with extensive experimental results.
Gradient-based buffer control technique for MPEG
Author(s):
Liang-Jin Lin;
Antonio Ortega;
C.-C. Jay Kuo
Show Abstract
This paper proposes a new buffer-control technique for MPEG video. The goal is to minimize the distortion while keeping the change in distortion between consecutive frames small. We formulate this goal as a constrained optimization problem and find the optimal solution using an iterative gradient search method. A preliminary experiment on a sub-GOP (4 frames, IBBP) shows that the solution is very close to the global optimum. Further experiments on short video sequences show that, under the same bit rate and buffer constraints, the technique generates output sequences with smaller and more stable mean square error than other techniques, and a strictly constant bit rate for every group of pictures. This technique should be particularly useful for off-line encoding or for one-encoder/many-decoders situations, where the higher computational cost required at the encoder pays off by allowing higher quality sequences or smaller decoder buffers.
Segmentation-based subband video coder
Author(s):
Chun-Hung Lin;
Ja-Ling Wu
Show Abstract
In this paper, a segmentation-based subband video coding algorithm is proposed. In this algorithm, each encoded frame is first compared with the associated prediction frame (output from a motion-estimation-based predictor) and the variant blocks (those blocks with variation larger than a given threshold) in the frame are located. Successive QMF decompositions and entropy coding are then applied to encode each variant block. Simulations show that, by using the proposed coding scheme, the picture quality of the reconstructed frames is better than that of two other well-known subband coding schemes while the resultant data rates are nearly the same. The computational complexity of the proposed approach is also shown to be less than that of the others. Moreover, the segmented nature of the proposed algorithm makes it more suitable for parallel implementation.
Digital coding of NTSC sequences with a subband-VQ scheme
Author(s):
Yo-Sung Ho
Show Abstract
In order to compress digital NTSC video sequences for transmission or storage, we have developed a Subband-VQ coding scheme where 2D frequency subbands of each picture frame are encoded using VQ coding techniques. In the Subband-VQ coding scheme, we take a multistage VQ approach to obtain good picture quality at about 6 Mb/s while avoiding a potential hardware complexity problem of VQ. In conjunction with the SBC and VQ coding techniques, motion estimation and motion compensation algorithms are implemented to fully exploit the temporal correlation existing in video sequences. The SBC-VQ scheme has a very simple decoder structure which mainly consists of table look-up operations.
Progressive image transmission using variable block coding with classified vector quantization
Author(s):
Young Huh;
Krit Panusopone;
K. R. Rao
Show Abstract
Progressive image transmission allows an approximate image to be built up quickly and details to be transmitted progressively through several passes over the image. This paper describes a progressive image transmission scheme using a variable block coding technique in conjunction with a variety of quantization schemes in the transform domain. The proposed scheme uses region growing to partition the images so that regions of different sizes can be addressed using a small amount of side information. This segmentation divides the image into five different regions that vary in size based on the details within the image. High-detail blocks are classified into four different categories using the energy distribution, followed by classified vector quantization (CVQ), and low-detail blocks are encoded with scalar quantization. Simulation results show that the reconstructed images preserve fine and pleasing quality based on both subjective and mean square error criteria. Also, the receiver reconstructs more details at each stage so that the observer can recognize the image quickly.
Block matching algorithm using a genetic algorithm
Author(s):
In Kwon Kim;
Rae-Hong Park
Show Abstract
In this paper, we propose a block matching algorithm (BMA) using a genetic algorithm. The genetic algorithm was inspired by an information processing scheme used by nature. To use the genetic algorithm in 2D block matching, we encode the phenotype representing a motion vector based on a quad-tree structure, i.e., the genotype is represented by four symbol strings. The probability of mutation is set differently for each position in a symbol string. Computer simulation results show that, by varying the number of search points, the peak signal-to-noise ratio (PSNR) of the proposed genetic-based BMA can be made comparable to that of the three-step search (TSS) or the full search (FS).
Chromatic boundary detection by integrating adaptive smoothing and vector gradient approach
Author(s):
Soo-Chang Pei;
Ching Min Cheng
Show Abstract
Chromatic boundaries are edges caused by material changes. To detect these boundaries in color images, chromatic information must be used. In this paper, we present an integration algorithm that combines an adaptive smoothing technique and a vector gradient approach. Simulation results show that the integration algorithm gains the benefits of both adaptive smoothing and the vector gradient approach. It can detect chromatic boundaries effectively and smooth out the variations within chromatic boundaries. Whether or not the input color images are contaminated by noise, the integration algorithm performs better than both adaptive smoothing of the intensity image and the vector gradient approach.
Rectangle-shaped object detection in aerial images
Author(s):
Dmitry M. Lagunovsky;
Sergey V. Ablameyko
Show Abstract
A fast algorithm to detect rectangles in aerial images is proposed. An initial gray-scale image is used as input to the algorithm. A contour image is obtained from it by a modified edge detection scheme. Primitive lines are extracted from the contour image and joined into line segments by a cluster analysis method. A line merging algorithm is developed, and an algorithm to detect rectangles from the extracted straight lines is suggested. The developed algorithm yields a good trade-off between computational time and quality of the result.
Detection of low-contrast objects in textured images
Author(s):
Devesh Patel;
E. R. Davies
Show Abstract
In this paper we present a method for the detection of objects that are not clearly delimited by an edge within the underlying texture. The application of this method is to detect impurities or contaminants within food products. The authors have previously proposed a system that has an extremely high detection rate for a wide range of contaminants, but which needs to be further developed for the detection of low-contrast contaminants. The method presented in this paper uses convolution to extract texture features from the food images and generate texture energy images. The convolution mask coefficients are the principal components obtained from images that do not contain any foreign objects. The grey levels of the resulting texture energy images are modified to eliminate the underlying noise in a consistent way across all these images. A distance map image is created using the Mahalanobis distance measure to indicate the presence of any contaminants within the food products. This paper shows that the proposed method can cope with the subtle variations between the contaminants and the food background and successfully detect the low-contrast contaminants.
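A minimal sketch of the distance-map construction: per-pixel texture-energy features are compared against the distribution of features from contaminant-free images using the Mahalanobis distance. The feature layout and the use of a plain sample covariance are assumptions, not the authors' exact procedure.

```python
import numpy as np

def mahalanobis_map(feature_images, clean_features):
    """Build a distance-map image from per-pixel texture-energy features.

    feature_images: list of 2D arrays (one per convolution mask).
    clean_features: (n_samples, n_features) samples from contaminant-free images.
    Pixels far from the clean-texture distribution indicate possible contaminants.
    """
    feats = np.stack([f.astype(np.float64) for f in feature_images], axis=-1)
    mu = clean_features.mean(axis=0)
    cov = np.cov(clean_features, rowvar=False)
    cov_inv = np.linalg.inv(cov)
    diff = feats - mu                                  # shape (H, W, n_features)
    # per-pixel quadratic form (x - mu)^T Sigma^-1 (x - mu)
    d2 = np.einsum('hwi,ij,hwj->hw', diff, cov_inv, diff)
    return np.sqrt(d2)
```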
Performance evaluation of unsupervised stochastic model-based image segmentation technique
Author(s):
Tianhu Lei;
Wilfred Sewchand
Show Abstract
This paper provides a new approach for performance evaluation of unsupervised stochastic model-based image segmentation techniques. Performance evaluation is conducted on three aspects: (1) ability to detect the number of image regions, (2) accuracy in estimating the model parameters, and (3) error in classifying pixels into image regions. For detection performance, probabilities of over-detection and under-detection of the number of image regions are defined, and the corresponding formulae in terms of model parameters and image quality are derived. For estimation performance, this paper shows that both the Classification-Maximization (CM) and Expectation-Maximization (EM) algorithms produce asymptotically unbiased ML estimates of the model parameters in the case of no overlap. Cramer-Rao bounds on the variances of these estimates are derived. For classification performance, the misclassification probability, based on the parameter estimates and classified data, is derived to evaluate segmentation errors. The results of applying this performance evaluation method to simulated images demonstrate that, for images of moderate quality, the detection procedure is robust, the parameter estimates are accurate, and the segmentation errors are small.
Statistical split and polynomial merge algorithm for image representation
Author(s):
Seoung-Jun Oh;
Keun-Heum Park
Show Abstract
Since polynomials fit the geometrical forms of images well and represent slowly varying surfaces in images, there have been many split-and-merge algorithms which use a polynomial function to represent each homogeneous region. Even though very low bit rates can be achieved using these algorithms, they take too much time for both the split and merge processes. Furthermore, the split result is not well matched to the HVS either. In this paper, a new split-and-merge algorithm is designed. In this algorithm, the split process uses a statistical hypothesis test called the ShortCut method as a measure of region homogeneity, and the merge process uses a polynomial function. The computation time of the split process can be significantly reduced using the new algorithm, and the new scheme reflects the HVS better than other schemes. To justify the algorithm proposed here, it is compared with other algorithms, including Kunt's algorithm.
Comparison of the segmentation of ultrasonic image utilizing different fractal parameters
Author(s):
Kwok-Leung Chan
Show Abstract
Statistical texture analysis has been commonly applied to the quantitative characterization of ultrasonic images. Recently, another approach has begun to emerge which uses the fractal dimension. In this investigation, fractal dimension is estimated from histologically confirmed ultrasonic images using three different approaches: intensity-based, spectrum-based and reticular cell counting. The parameters are then used in image segmentation. The fractal model has the advantage that the parameter generated is stable over transformations of scale and linear transforms of intensity. From the results obtained, the performance of the fractal dimension obtained by the reticular cell counting method is better than the other two approaches and comparable to the spatial grey level co-occurrence matrix statistics.
Image segmentation for stone-size inspection
Author(s):
Jui-Pin Hsu;
Chiou-Shann Fuh
Show Abstract
Object size inspection is an important task with various applications in computer vision, for example, automatically controlled stone-breaking machines. In this paper, an algorithm is proposed for image segmentation for size inspection of almost round stones with strong textures or almost no textures. We use one camera and multiple light sources at different positions, and take one image when each of the light sources is on. Then we compute the image differences and threshold them to extract edges. We explain, step by step, picture taking, edge extraction, noise removal, and edge gap filling. Experimental results are presented. Through various experiments, we find our algorithm robust on various stones and under noise.
Range image segmentation via edges and critical points
Author(s):
Xintong Zhang;
Dongming Zhao
Show Abstract
A novel method for range image segmentation is presented in this paper. It is based on an integration of edge and region information. The algorithm consists of three steps: edge and critical point detection, triangulation, and region grouping. The edge detection method presented in this paper is based on morphological operations. In general, segmentation may not be effective when only edge operators are applied to range images, especially noisy images; further processing is important for the final segmentation when the edge operators are not sufficient. In this paper, critical points are extracted from planar edge curves. These edge curves and critical points constitute an initial set of segments. A constrained Delaunay triangulation is employed on the initial set to obtain triangle-like connection graphs. By projecting the critical points and their connectivity relationships in parallel onto the 3D surface, a 3D surface structure graph (SSG) is obtained. Segmentation is then completed by grouping these triangle-like facets. The grouping scheme presented in this paper is based on the normals of adjacent facets. Because edge curves are usually not straight lines but rather sets of curve segments, we introduce an extensive triangulation for building 3D triangle-like surface structure graphs (SSGs). This method significantly reduces the computational complexity compared to polyhedral approximations using the original Delaunay triangulation. Experimental results show that the method is efficient for range image segmentation, especially for polyhedra.
Range image segmentation using pseudoreflectance images
Author(s):
Ho-Keun Song;
Kug-Chan Cha;
Jong Soo Choi
Show Abstract
In this paper, a range image segmentation method using Pseudo Reflectance Images (PRIs) is proposed. The PRI is generated from the range image by assuming a light source position and applying a shading algorithm. It is assumed that surfaces within a PRI correspond to physical surfaces in a scene. Planar surface points within the PRI can be readily extracted from the condition of homogeneous reflectance value, in addition to the invariant characteristic of the reflectance gradient. The extraction of quadratic surface points within the PRI is based on the fact that there is a direct relationship between elliptical, hyperbolic and parabolic points and the rotational change of the image reflectance gradient produced by varying the light source position. Moreover, the major part of the edges in the range image are formed by jumps in reflectance values within the PRI, and these types of edges can be detected by a common edge operator.
Interactive system for classifying multispectral images using a Hilbert curve
Author(s):
Michiharu Niimi;
Sei-ichiro Kamata;
Eiji Kawaguchi
Show Abstract
There are several techniques for analyzing multispectral images. In general, these are based on linear transformation methods. In this paper, we present a new interactive method for classifying multispectral images using a Hilbert curve, which is a one-to-one mapping from N-dimensional space to one-dimensional space that preserves neighborhoods as much as possible. The merit of our system is that the user can extract clusters without computing any distance in N-dimensional space, and can analyze multidimensional data hierarchically from a gross data distribution to a fine data distribution. In experiments using LANDSAT TM data, it is confirmed that the user gets a real-time response from the system once the data tables have been built, and can understand the distribution of data corresponding to categories in feature space.
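For illustration of the neighbourhood-preserving index, the sketch below computes the Hilbert-curve position of a 2D grid point using the standard bit-manipulation formulation; the paper's system applies the analogous N-dimensional mapping to multispectral feature vectors.

```python
def hilbert_index(n, x, y):
    """Map the 2D grid point (x, y) on an n-by-n grid (n a power of two) to its
    position along the Hilbert curve.  Nearby points on the curve stay nearby
    in the grid, which is the property the classification system relies on."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the next level is in canonical orientation
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Example: on a 2x2 grid the curve visits (0,0), (0,1), (1,1), (1,0) in order.
assert [hilbert_index(2, x, y) for (x, y) in [(0, 0), (0, 1), (1, 1), (1, 0)]] == [0, 1, 2, 3]
```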
Character segmentation algorithm for off-line handwritten script recognition
Author(s):
Nelson Hon Ching Yung;
Andrew H. S. Lai;
Perry Z.P. Chua
Show Abstract
In this paper, a new character segmentation algorithm for off-line handwritten script recognition is presented. The X-axis projection, Y-axis projection and geometric class techniques used by the algorithm prove to be successful in segmenting normal handwriting, with a success rate of 93.5%. As a result of this development, a detailed understanding of the geometric classes of English characters and of the difficult cases in segmentation was gained. Although the algorithm works quite well with a randomly chosen sample, the results of a detailed analysis may shed new light on the tuning of the algorithm, especially for segmenting the identified difficult cases.
Radiance transformation of multitemporal LANDSAT image for land cover classification
Author(s):
Fumihiro Tanizaki;
Michiharu Niimi;
Sei-ichiro Kamata;
Eiji Kawaguchi
Show Abstract
Recently, classification of remote sensing images using neural network approaches has been studied. However, multitemporal LANDSAT image data are not usually used for classification. A problem in the classification of remote sensing images is that we cannot acquire images of a fixed wide area, such as a whole prefecture, at one time because the coverage of the satellite's sensors is limited. For example, the full area of Fukuoka prefecture is observed in two separate images. For multitemporal images, several factors affect the spectrum at different observation dates. In this paper, we concentrate on the sunbeam factor, whose intensity we can estimate. Using the sun elevation angle from the data, we estimate the sunbeam intensity. We transform the multitemporal images using a radiance transformation which is based on a path radiance model. Several experiments confirm that the radiance transformation is effective for the classification of multitemporal images.
Object-based coding of stereo image sequences using joint 3D motion/disparity segmentation
Author(s):
Dimitrios Tzovaras;
Nikos Grammalidis;
Michael G. Strintzis
Show Abstract
An object-based coding scheme is proposed for the coding of the right channel of a stereoscopic image sequence, using motion and disparity information. A multiresolution block-based motion estimation approach is used for initialization, while disparity estimation is performed using a pixel-based hierarchical dynamic programming algorithm. A split and merge segmentation procedure based on both 3D motion and disparity is then used to determine regions with similar motion and depth parameters. This is combined with an efficient depth modeling method that offers full depth information at the decoder site. In order to reduce the computational load of the merge phase of the algorithm, a fast algorithm is implemented which speeds up the merge procedure considerably. The segmentation part of the algorithm is interleaved with the estimation part in order to optimize the coding performance of the segmentation procedure. Motion and depth model parameters are then quantized and transmitted to the decoder along with the segmentation information. An object-based motion compensating scheme is then used to reconstruct the original image, based on the objects created by the segmentation approach.
Lip synchronization in talking head video utilizing speech information
Author(s):
Tsuhan Chen;
H. P. Graf;
Homer H. Chen;
Wu Chou;
Barry G. Haskell;
Eric D. Petajan;
Yao Wang
Show Abstract
We utilize speech information to improve the quality of audio-visual communications such as video telephony and videoconferencing. We show that the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion, and demonstrate speech-assisted coding of talking head video. Demonstration sequences are presented. Extensions and other applications are outlined.
Entirely psychovisual-based subband image coding scheme
Author(s):
Abdelhakim Saadane;
Hakim Senane;
Dominique Barba
Show Abstract
A new subband coding scheme is proposed in this paper. The two main functions in such schemes, decomposition and quantization, are entirely based on psychovisual aspects. The visual subbands have been estimated by using the variation of the masking function. These masking effects, in the case of sinusoidal gratings, show that the peripheral part of the visual system may be modelled by a set of sixteen filters and a low-frequency residue. The quantizers associated with such a decomposition have been designed by a methodology developed for this purpose. The main finding of the conducted experiments is that the decision thresholds and the reconstruction levels follow a linear law, with a quantization interval varying with frequency and orientation. This result, highly dependent on the way the signals have been characterized, justifies the choice of the local band-limited contrast. The results obtained with a coding scheme which includes these basic features of the visual system show that at low signal-to-noise ratios the visual quality of the reconstructed image remains much better than for 'classical' schemes. Another particularity of the approach lies in the structure of the reconstructed-image error: the latter is found to be highly correlated with the structure of the original image.
Improvement of image-compression quality via block classification and coefficient diffusion
Author(s):
Kuo-Chin Fan;
Juo-Chien Chang;
Kou-Sou Kan
Show Abstract
Image compression techniques have been widely used in an abundance of applications. Considering both image quality and compression ratio, an efficient compression technique is needed. Transform coding is known to be generally superior. In this paper, we present a novel method based on adaptive classification and coefficient diffusion techniques to improve image compression quality while maintaining the compression ratio. Experiments are conducted on a wide variety of images. Experimental results reveal that both image quality and compression ratio are retained by applying the proposed method.
Object-based coding method for visual telephony using discrete wavelet transform
Author(s):
Keith Hung-Kei Chow;
King-Ip Chan;
Ming Lei Liou
Show Abstract
For applications such as videophone or videoconferencing, it is generally agreed that the foreground object is more important than the background information. Nevertheless, in most conventional video codecs, such as H.261, a large amount of the available bandwidth is used to code residue blocks in the background, due to variation of the background lighting, noise or other unpredictable factors. This makes the coding inefficient when we consider the subjective visual quality of the foreground object. In this paper we propose an object-based coding method. It tries to identify the foreground object and then devote most of the resources to coding it. We suggest a block-based DWT as the coding kernel because of its simple structure and outstanding performance. Experimental results show that we can compress a video sequence down to below 20 kb/s while retaining a reasonable visual quality.
Two-dimensional object recognition using chamfer distance transform on morphological skeleton
Author(s):
Jun-Sik Kwon;
Jong-Ho Choi;
Jong Soo Choi
Show Abstract
In this paper, we propose a new method to represent 2D shapes and to recognize objects. In previous approaches, boundary information is generally employed to recognize the 2D object: the points extracted from a polygonal approximation of the boundary are considered as the features and point pattern matching is used. These methods, however, have difficulty in recognizing objects of variable shape. We therefore obtain the morphological skeleton and the distance transform and use them for 2D object recognition. The matching method has invariant features (rotation, translation, and scaling) and is a class of stochastic approach. The proposed method is then employed effectively for the recognition of objects of variable shape, e.g., tools such as a wrench and a stripper, or marks of different shapes and sizes.
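A generic two-pass 3-4 chamfer distance transform, of the kind used by the matching method, is sketched below; the weights (3, 4) and the binary-shape convention are standard choices, not necessarily the authors' exact settings.

```python
import numpy as np

def chamfer_distance(binary_shape, a=3, b=4):
    """Two-pass 3-4 chamfer distance transform on a binary shape (True = object).
    Distances grow with separation from the nearest object pixel; divide by 3
    for an approximation to Euclidean distance."""
    INF = 10**9
    h, w = binary_shape.shape
    d = np.where(binary_shape, 0, INF).astype(np.int64)
    # forward pass (top-left to bottom-right)
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y, x] = min(d[y, x], d[y - 1, x] + a)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y - 1, x - 1] + b)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y - 1, x + 1] + b)
            if x > 0:
                d[y, x] = min(d[y, x], d[y, x - 1] + a)
    # backward pass (bottom-right to top-left)
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y + 1, x] + a)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y + 1, x - 1] + b)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y + 1, x + 1] + b)
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x + 1] + a)
    return d
```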
Texture analysis: pretopological approach versus morphology approach
Author(s):
Stephane Bonnevay;
Michel P. Lamure;
Nicolas Nicoloyannis
Show Abstract
This paper deals with the problem of texture classification: a new image coding has been developed. The first contribution of this article is the presentation of a new approach to texture analysis based on pretopology; pretopology is a mathematical field which, in our case, generalizes mathematical morphology. Instead of using only one structuring element to make transformations on an image, we can use a basis of structuring elements. We can recreate the mathematical-morphology transformations such as dilation and erosion, but we can also build new transformations, finer or coarser than dilation or erosion. Moreover, we can use all the mathematical knowledge of pretopology. The second part of this article presents the image coding based on the pretopological approach: it uses a basis of four structuring elements. One gray-scale image is coded by two images of 15 colors. We attempt to determine the discriminating capacity of the coded images.
Morphological approach for thresholding noisy images
Author(s):
C. K. Lee;
Siu Pang Wong
Show Abstract
Image segmentation is an important preprocessing step before object recognition. Here, we assume that an image consists of three main primitives: the noise, the object, and a varying background. First we show a means of characterizing the sizes of these primitives based on the morphological opening. Second, we investigate how an image can be effectively enhanced by searching for blocks inscribed under the image surface and then removing both the top of the noisy background and the bottom of the foreground (small speck noise), as constructed from the surfaces of the inscribed blocks. With these findings, a morphological segmentation algorithm is formulated. Experimental results are included to illustrate its superiority over two other segmentation algorithms.
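The sketch below illustrates the general flavor with standard gray-scale openings and closings: a granulometry-style curve to characterize primitive sizes, followed by speck removal and a top-hat style threshold. It stands in for, and is not, the authors' inscribed-block construction; the sizes and the threshold rule are assumptions.

import numpy as np
from scipy import ndimage

def granulometry(image, max_size=15):
    # Residual volume after openings of increasing size; a sharp drop in the
    # curve hints at the characteristic scale of the bright primitives.
    vols = []
    for s in range(1, max_size + 1):
        vols.append(ndimage.grey_opening(image, size=(s, s)).sum())
    return np.array(vols)

def enhance_and_threshold(image, noise_size=3, object_size=11):
    # Remove small bright specks with an opening and small dark specks with a
    # closing, estimate the varying background with a large opening, and
    # threshold the residue standing above it.
    img = image.astype(float)
    no_specks = ndimage.grey_closing(
        ndimage.grey_opening(img, size=(noise_size,) * 2), size=(noise_size,) * 2)
    background = ndimage.grey_opening(no_specks, size=(object_size,) * 2)
    residue = no_specks - background
    return residue > 0.5 * residue.max()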
Stereo matching using the morphological filtering and the fingerprints on the scale space
Author(s):
Chang-Bum Lee;
Woo Young Choi;
Rae-Hong Park
Show Abstract
In this paper, we propose an efficient stereo matching algorithm using morphological filtering and fingerprints on the scale space. We propose a morphological filter using a Gaussian structuring element, which has lower computational complexity than conventional Gaussian filtering while achieving similar performance. For stereo matching, we propose a coarse-to-fine feature-based method that minimizes the effects of mismatching and noise across scale changes. In the proposed algorithm, the loci of zero-crossing points in the left and right images serve as robust matching features, and dynamic programming is used for feature correspondence. Computer simulation results with several test images show the effectiveness of the proposed feature-based stereo matching algorithm using the fingerprints on the scale space.
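As an illustrative sketch of the fingerprint idea only, the fragment below extracts zero-crossing loci of a Laplacian-of-Gaussian response along each scanline over a set of scales; ordinary Gaussian filtering stands in for the authors' morphological filter with a Gaussian structuring element, and the scale set is an assumption.

import numpy as np
from scipy import ndimage

def zero_crossings(row):
    # Indices where a 1D signal changes sign.
    s = np.sign(row)
    return np.where(s[:-1] * s[1:] < 0)[0]

def fingerprints(image, sigmas=(8.0, 4.0, 2.0, 1.0)):
    # Loci of zero-crossings along each scanline, tracked from coarse to fine scales.
    prints = {}
    for sigma in sigmas:
        log = ndimage.gaussian_laplace(image.astype(float), sigma=sigma)
        prints[sigma] = [zero_crossings(log[r]) for r in range(log.shape[0])]
    return prints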
Compression of medical images with regions of interest (ROIs)
Author(s):
Man-Bae Kim;
Yong-Duk Cho;
Dong-Kook Kim;
Nam-Kyu Ha
Show Abstract
In most medical images, regions of interest (ROIs) that may contain clinically important information exist and occupy only a small portion of the image. Based on this observation, we present compression methods that can effectively compress medical images with ROIs: the ROIs are reversibly compressed and the non-ROI (the region outside the ROIs) is irreversibly compressed. In this paper, we present and analyze three different compression schemes: a DCT-based compression, a DCT/HINT compression, and a HINT-based compression. Our study shows that the compression ratio decreases exponentially as the ROI ratio (the portion of the image occupied by ROIs) increases, and that the RMSE (root-mean-squared error) depends only weakly on the ROI ratio. To verify this, we tested seven heart X-ray images, twelve head MR images, ten abdomen CT images, and ten chest CT images. The experimental results show that the DCT-based compression is the best of the three proposed methods in terms of compression ratio, algorithm complexity, and reconstructed image quality.
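A minimal sketch of the reversible-ROI/irreversible-non-ROI split, with zlib standing in for the lossless coder and a coarsely quantized whole-image DCT standing in for the lossy coder; this matches none of the paper's DCT, DCT/HINT, or HINT schemes, and the quantization step is an assumption.

import zlib
import numpy as np
from scipy.fft import dctn, idctn

def compress_with_roi(image, roi_mask, q_step=16):
    # Losslessly pack the ROI pixels and coarsely quantize a whole-image DCT.
    roi_bytes = zlib.compress(image[roi_mask].tobytes())        # reversible ROI
    coeffs = dctn(image.astype(float), norm='ortho')
    quantized = np.round(coeffs / q_step).astype(np.int32)      # irreversible non-ROI
    return roi_bytes, quantized

def decompress_with_roi(roi_bytes, quantized, roi_mask, dtype=np.uint8, q_step=16):
    recon = idctn(quantized.astype(float) * q_step, norm='ortho')
    recon = np.clip(np.round(recon), 0, 255).astype(dtype)
    roi_pixels = np.frombuffer(zlib.decompress(roi_bytes), dtype=dtype)
    recon[roi_mask] = roi_pixels                                 # ROI restored exactly
    return recon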
Stereoscopic video compression using temporal scalability
Author(s):
Atul Puri;
Richard V. Kollarits;
Barry G. Haskell
Show Abstract
Despite the fact that our ability to perceive a high degree of realism is directly related to our ability to perceive depth accurately in a scene, most of the commonly used imaging and display technologies are able to provide only a 2D rendering of the 3D real world. Many current as well as emerging applications in the areas of entertainment, remote operations, industry, and medicine can benefit from the depth perception offered by stereoscopic video systems, which employ two views of a scene imaged under the constraints imposed by the human visual system. Among the many challenges to be overcome for practical realization and widespread use of 3D/stereoscopic systems are efficient techniques for digital compression of enormous amounts of data while maintaining compatibility with normal video decoding and display systems. After a brief discussion of the relationship of digital stereoscopic 3DTV to digital TV and HDTV, we present an overview of the tools in the MPEG-2 video standard that are relevant to the compression of stereoscopic video, which is the main topic of this paper. Next, we determine ways in which temporal scalability concepts can be applied to exploit the redundancies inherent between the two views of a scene comprising stereoscopic video. Due consideration is given to the masking properties of stereoscopic vision in determining the bandwidth partitioning between the two views, so as to realize an efficient coding scheme while providing sufficient quality. Simulations are performed on stereoscopic video of normal TV resolution to compare the performance of the two temporal scalability configurations with each other and with the simulcast solution. Preliminary results are quite promising and indicate that the configuration that exploits motion and disparity compensation significantly outperforms the one that exploits disparity compensation alone. Compression of both views of stereo video of normal TV resolution appears feasible in a total of 8 or 9 Mbit/s. Finally, the implications of our results are discussed and potential directions for future research are identified.
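As a very small sketch of the disparity-compensation building block mentioned above (not the MPEG-2 temporal scalability configurations themselves), the fragment below predicts one view from the other by horizontal block matching; the block size, search range, and search direction are assumptions that depend on the camera geometry.

import numpy as np

def disparity_compensate(right, left, block=16, max_disp=32):
    # Predict the right view from the left with horizontal block matching (SAD);
    # the residual is what a disparity-compensated enhancement layer would code.
    h, w = right.shape
    pred = np.zeros_like(right, dtype=float)
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            target = right[r:r+block, c:c+block].astype(float)
            best, best_sad = 0, np.inf
            for d in range(0, max_disp + 1):
                if c + d + block > w:
                    break
                cand = left[r:r+block, c+d:c+d+block].astype(float)
                sad = np.abs(target - cand).sum()
                if sad < best_sad:
                    best_sad, best = sad, d
            pred[r:r+block, c:c+block] = left[r:r+block, c+best:c+best+block]
    return pred, right.astype(float) - pred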
Image coding based on energy-sorted wavelet packets
Author(s):
Lin-Wen Kong;
Kuen-Tsair Lay
Show Abstract
The discrete wavelet transform performs multiresolution analysis, which effectively decomposes a digital image into components with different degrees of detail. In practice, it is usually implemented in the form of filter banks. If the filter banks are cascaded and both the low-pass and the high-pass components are further decomposed, a wavelet packet is obtained. The coefficients of the wavelet packet effectively represent subimages at different resolution levels. In the energy-sorted wavelet-packet decomposition, all subimages in the packet are sorted according to their energies, and the most important subimages, as measured by their energy, are preserved and coded. By investigating the histogram of each subimage, we find that the pixel values are well modelled by the Laplacian distribution; a Laplacian quantizer is therefore applied to quantize the subimages. Experimental results show that the image coding scheme based on wavelet packets achieves a high compression ratio while preserving satisfactory image quality.
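A minimal sketch of the energy-sorting step, assuming PyWavelets for the 2D wavelet packet; the uniform step sized from each subimage's standard deviation is only a crude stand-in for a quantizer matched to a Laplacian distribution, and the wavelet, depth, and bit budget are assumptions.

import numpy as np
import pywt

def energy_sorted_wp_code(image, wavelet='db2', level=2, keep=8, n_bits=6):
    # Decompose, rank the subimages by energy, keep the strongest ones,
    # and quantize them with a step proportional to their standard deviation.
    wp = pywt.WaveletPacket2D(data=image.astype(float), wavelet=wavelet,
                              mode='symmetric', maxlevel=level)
    nodes = sorted(wp.get_level(level),
                   key=lambda n: np.sum(n.data ** 2), reverse=True)
    for i, node in enumerate(nodes):
        if i < keep:
            # Step spans roughly +/-4 standard deviations over 2**n_bits levels.
            step = max(node.data.std(), 1e-6) * 8.0 / (2 ** n_bits)
            node.data = np.round(node.data / step) * step
        else:
            node.data = np.zeros_like(node.data)  # discard low-energy subimages
    return wp.reconstruct(update=False)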
Coding of arbitrarily shaped regions
Author(s):
John G. Apostolopoulos;
Jae S. Lim
Show Abstract
Region-based coding schemes may yield considerable improvements in performance as compared to block-based schemes. A fundamental problem in region-based coding is to efficiently encode the interior of each region. This paper proposes two approaches for coding the interiors of arbitrarily-shaped regions. The first is an adaptive iterative scheme and the second is a matching pursuits-type scheme. A geometric interpretation of the problem is given to provide insight into these approaches and to compare their different properties and performances. A number of examples illustrate the performance of the previous and proposed approaches.
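As a hedged sketch of a matching-pursuits-type scheme restricted to an arbitrarily shaped region (not the authors' dictionary or iteration rule), the fragment below greedily approximates the pixels selected by a boolean mask using unit-norm atoms defined over that region.

import numpy as np

def matching_pursuit_region(image, mask, dictionary, n_iters=20):
    # `dictionary` is an (n_atoms, n_pixels) array of atoms already restricted
    # to, and normalized over, the region selected by the boolean `mask`.
    residual = image[mask].astype(float)
    coeffs = []
    for _ in range(n_iters):
        corr = dictionary @ residual            # inner product with each atom
        k = int(np.argmax(np.abs(corr)))
        coeffs.append((k, corr[k]))
        residual = residual - corr[k] * dictionary[k]
    return coeffs, residual

# Illustrative (hypothetical) dictionary: region-restricted, unit-norm random atoms.
# atoms = np.random.randn(64, mask.sum())
# atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)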
Vector quantization by hierarchical packing of embedded-truncated lattices
Author(s):
Vincent Ricordel;
Claude Labit
Show Abstract
The purpose of this paper is to introduce a new vector quantizer (VQ) intended for a temporally adaptive coding scheme for the compression of digital image sequences. Our approach, which must allow fast codebook construction, unifies two efficient coding methods: fast lattice encoding and unbalanced tree-structured codebook design driven by a distortion-versus-rate tradeoff. Moreover, this tree-structured lattice vector quantizer (TSLVQ) has a convenient property: because of its lattice structure, no reproduction vectors need to be transmitted. Briefly, the TSLVQ technique is based on the hierarchical packing of embedded truncated lattices. We investigate its design by explaining, first, how to determine the support lattice and, second, how to obtain the hierarchical set of truncated lattice structures that can be optimally embedded with respect to the hierarchical packing. We then use a simple quantization procedure and describe the corresponding tree-structured codebook. Finally, we present two unbalanced tree-structured codebook design algorithms based on the BFOS distortion-versus-rate criterion.
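The fast lattice encoding component can be illustrated with the classic Conway-Sloane nearest-point rule for the D_n lattice; the sketch below covers only that single ingredient, not the hierarchical packing, truncation, or tree-structured codebook design of the TSLVQ, and the scale factor is an assumption.

import numpy as np

def nearest_Dn(x):
    # Nearest point of the D_n lattice (integer points with even coordinate sum).
    f = np.round(x)
    if int(f.sum()) % 2 == 0:
        return f
    # Parity is odd: re-round the coordinate with the largest rounding error
    # to its second-nearest integer.
    k = int(np.argmax(np.abs(x - f)))
    f[k] += 1.0 if x[k] > f[k] else -1.0
    return f

def lattice_quantize(vector, scale=0.25):
    # Fast lattice encoding: scale, snap to the nearest D_n point, rescale.
    return nearest_Dn(np.asarray(vector, dtype=float) / scale) * scale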