Applications of Digital Image Processing XXVIII

Volume Details

Date Published: 31 August 2005

Contents: 8 Sessions, 74 Papers, 0 Presentations

Conference: Optics and Photonics 2005

Volume Number: 5909

Image Models and Processing
Next-Generation Multimedia Compression, Transmission, and Architectures
Imaging Security
MPEG-7: The Power of Standard Multimedia Metadata and Content Representation
H.264/AVC Applications
Image Coding and Structures
Content Management and Representation
Posters - Wednesday

Image Models and Processing

Automatic inspection of pavement cracking distress

B. Xu, Y. Huang D.D.S.

This paper presents an image-processing algorithm customized for high-speed, real-time inspection of pavement cracking. In the algorithm, a pavement image is divided into grid cells of 8x8 pixels, and each cell is classified as a non-crack or crack cell using the grayscale information of its border pixels. Whether a crack cell can be regarded as a basic element (or seed) depends on its contrast to the neighboring cells. A number of crack seeds can be called a crack cluster if they fall on a linear string. A crack cluster corresponds to a dark strip in the original image that may or may not be a section of a real crack. Additional conditions to verify a crack cluster include requirements on the contrast, width, and length of the strip. If verified crack clusters are oriented in similar directions, they are joined to become one crack. Because many operations are performed on crack seeds rather than on the original image, crack detection can be executed while the frame grabber is forming a new image, permitting real-time, online pavement survey. Trial test results show good repeatability and accuracy when multiple surveys were conducted under different driving conditions.
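
The cell-classification step can be illustrated with a minimal sketch. The abstract does not give the actual decision rule, so the rule below (a cell is a crack candidate when its darkest border pixel is much darker than the border mean) and the names `classify_cells` and `contrast_threshold` are assumptions made only for illustration.

```python
CELL = 8  # cell size in pixels, as stated in the abstract


def border_pixels(image, top, left, size=CELL):
    """Return the grayscale values on the border of one size x size cell."""
    values = []
    for r in range(top, top + size):
        for c in range(left, left + size):
            on_border = r in (top, top + size - 1) or c in (left, left + size - 1)
            if on_border:
                values.append(image[r][c])
    return values


def classify_cells(image, contrast_threshold=40):
    """Label each 8x8 cell True (crack candidate) or False (non-crack).

    Assumed rule, not taken from the paper: a crack crossing a cell must cut
    its border, so a cell is a candidate when its darkest border pixel is
    much darker than the mean of all border pixels.
    """
    rows, cols = len(image) // CELL, len(image[0]) // CELL
    labels = []
    for i in range(rows):
        row_labels = []
        for j in range(cols):
            border = border_pixels(image, i * CELL, j * CELL)
            mean = sum(border) / len(border)
            row_labels.append(mean - min(border) > contrast_threshold)
        labels.append(row_labels)
    return labels


# Synthetic 16x16 image: bright pavement (200) with a dark vertical crack (50)
# in column 3, so only the two left-hand cells are crossed by the crack.
image = [[50 if c == 3 else 200 for c in range(16)] for _ in range(16)]
labels = classify_cells(image)
```

Only the 28 border pixels of each cell are read, which is what makes a per-cell test of this kind cheap enough to run while the next frame is being acquired.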

Real-time automatic target recognition and identification of ground vehicles for airborne optronic systems

Olivier Ruch, Jean-Yves Dufour

This paper describes ATR/I algorithms developed by Thales Optronique for the real-time automatic target recognition and identification of ground vehicles in an air-to-ground, non-cooperative context. The algorithm is based on an exhaustive comparison between the input image and the elements of a "Model Data Set". Its main principles are:
• To avoid the variability of the target's gray levels, the comparison is not performed directly on the input gray-level image but on an edge image obtained with a segmentation algorithm derived from the classical Canny-Deriche edge detector.
• The selected architecture is chosen to repeat a single instruction many times rather than to execute many different instructions once each. Therefore, the comparisons between the input image and the elements of the "Model Data Set" are performed in 2D with a correlative technique.
• Real-time computation is achieved through a coarse-to-fine analysis with different levels of comparison: in a first stage, a simple comparison measure is used that enables quick selection of a preliminary list of potential hypotheses. This measure is discriminating enough to select a small number of hypotheses and robust enough to select the true hypothesis associated with the target. The selected hypotheses are then analyzed in a second stage of processing using a more refined measure, which is more time consuming than the previous one but is applied to a significantly reduced number of hypotheses.

Hausdorff probabilistic feature analysis in SAR image recognition

John A. Saghri, Chessa Guilas

An automatic target recognition algorithm for synthetic aperture radar (SAR) imagery data is developed. The algorithm classifies an unknown target as one of the known reference targets based on a maximum likelihood estimation procedure. The algorithm helps assess and optimize the favorable effects of multiple image features on recognition accuracy. This study addresses four procedures: (1) feature extraction, (2) training set creation, (3) classification of unknown images, and (4) optimization of recognition accuracy. A three-feature probabilistic method based on extracted edges, corners, and peaks is used to classify the targets. Once the three features are extracted from the target image, binary images are created from each. Training sets, which are used to classify an unknown target, are then created using average Hausdorff distance values for each of the known members of the eight target image types (ZSU-23-4, ZIL131, D7, 2S1, SLICY, BRDM2, BTR60, and T62) included in the publicly available MSTAR test data. The average Hausdorff distance values are acquired from unknown target feature images and are compared to each training set. Each comparison provides the likelihood of the unknown target belonging to one of the eight possible known targets. For each unknown target, eight likelihoods (one for each of the eight possible known targets) are determined based on the Hausdorff distances and the pre-assigned feature weights. The unknown target is then classified into the target type that has the maximum likelihood estimation value.
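
A minimal sketch of the distance measure may help here. It uses one common definition of the average Hausdorff distance between two point sets (the mean of the two directed average nearest-neighbour distances). For brevity it replaces the weighted maximum likelihood step with a plain nearest-reference rule, which is a simplification and not the authors' classifier. The labels `type_a` and `type_b` and all coordinates are hypothetical.

```python
import math


def directed_average_distance(a_points, b_points):
    """Mean distance from each point in a_points to its nearest point in b_points."""
    total = 0.0
    for ax, ay in a_points:
        total += min(math.hypot(ax - bx, ay - by) for bx, by in b_points)
    return total / len(a_points)


def average_hausdorff(a_points, b_points):
    """Symmetric average Hausdorff distance (mean of the two directed averages)."""
    return 0.5 * (directed_average_distance(a_points, b_points)
                  + directed_average_distance(b_points, a_points))


def classify(unknown, references):
    """Return the label of the reference point set closest to `unknown`.

    Simplification: nearest reference by distance, with no likelihoods or
    feature weights.
    """
    return min(references,
               key=lambda label: average_hausdorff(unknown, references[label]))


# Hypothetical feature points (e.g. pixel coordinates set in a binary feature image).
references = {
    "type_a": [(0, 0), (1, 0)],
    "type_b": [(5, 5), (6, 5)],
}
unknown = [(0, 1), (1, 1)]
result = classify(unknown, references)
```

In the setting described by the abstract, one such distance would be computed per feature image (edges, corners, peaks) and the three values combined using the feature weights.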

Adaptive noise filtering of white-light confocal microscope images using Karhunen-Loève expansion

M. Balasubramanian, S. S. Iyengar, P. Wolenski, et al.

We present a noise filtering technique using the Karhunen-Loève expansion by the method of snapshots (KLS) with a small ensemble of three images. The KLS provides a set of basis functions that comprise the optimal linear basis for the description of an ensemble of empirical observations. The KLS basis is computed using the eigenvectors of the covariance matrix R of the ensemble of images. The significance of each basis function is determined by the magnitude of the corresponding eigenvalue of R, the largest being the most significant. Since all three images in the ensemble represent the same scene and are registered, the KLS basis functions constructed using the eigenvectors of R with the smallest eigenvalues typically represent the non-significant and uncommon features in the ensemble. We show that most of the noise in the scene can be removed by reconstructing the image using the KLS basis function constructed from the eigenvector of R with the largest eigenvalue. R is a 3x3 symmetric positive definite matrix and hence has a full set of orthogonal eigenvectors. The KLS filtering scheme described here is fast and does not require prior knowledge about the image noise. We show the performance of the proposed method on images of random cotton fibers acquired using a white-light confocal microscope (WLCM) and compare the performance with a median filter. We also show that a simple inverse-filter deconvolution algorithm provides an impressive image restoration when the images are pre-filtered using the proposed KLS filtering technique.
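
The reconstruction from the leading basis function can be sketched in a few lines. This is a minimal illustration under stated assumptions: R is taken as the 3x3 matrix of inner products between the flattened snapshots (no mean subtraction), its dominant eigenvector is found by power iteration, and the test data are synthetic. None of these choices are confirmed by the abstract.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def dominant_eigenvector(matrix, iterations=200):
    """Power iteration for the eigenvector belonging to the largest eigenvalue."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        vec = [x / norm for x in nxt]
    return vec


def kls_filter(images):
    """Project each flattened image onto the leading snapshot basis function."""
    m = len(images)
    # M x M matrix of inner products between snapshots (M = 3 in the abstract).
    r = [[dot(images[i], images[j]) / m for j in range(m)] for i in range(m)]
    weights = dominant_eigenvector(r)
    # Basis function: weighted sum of the snapshots, normalised to unit length.
    basis = [sum(w * img[p] for w, img in zip(weights, images))
             for p in range(len(images[0]))]
    norm = math.sqrt(dot(basis, basis))
    basis = [b / norm for b in basis]
    return [[dot(img, basis) * b for b in basis] for img in images]


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic example: one 4-pixel "scene" observed three times with small noise.
scene = [10.0, 20.0, 30.0, 40.0]
noise = [[1, -1, 0, 0], [0, 1, -1, 0], [-1, 0, 1, 0]]
noisy = [[s + n for s, n in zip(scene, row)] for row in noise]
filtered = kls_filter(noisy)
```

Because the three snapshots share the same scene, the leading basis function is close to the normalised scene itself, so projecting onto it discards most of what differs between snapshots.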

A detection method of mura on a coated layer using interference light

Kazutaka Taniguchi, Kunio Ueta, Shoji Tatsumi

Here, we describe a method to detect mura on a uniformly coated thin photoresist layer during the display device manufacturing process. A mura is an irregular variation of lightness on a uniformly manufactured surface. Display devices are manufactured through a photolithographic process, and every imaging process requires a uniformly coated photoresist layer. Mura detection on the layer is necessary to keep the device quality high. Mura has been inspected visually by tilting the glass under a sodium lamp in order to create the largest amount of interference. However, tilting the glass is difficult due to the growing device size, currently four square meters in the latest line. Rather than tilting the glass, we have developed an apparatus that illuminates the glass with a set of narrow-bandpass-filtered lights of different wavelengths. The apparatus observes the interference of light reflected from the surface and from the bottom of the thin layer in order to inspect the mura on the glass. The reflection intensity as a function of the layer thickness shows a sinusoidal periodic characteristic, which means that the mura detection sensitivity depends strongly on the optical path length in the layer. We have developed a method to compensate for the periodic sensitivity fluctuation by employing a reflection ratio, which enables us to detect local thickness fluctuations with an accuracy of one nanometer.
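
The periodic sensitivity mentioned above can be made explicit with the standard two-beam thin-film interference formula. This is a sketch under assumed simplifications (normal incidence, only two reflected beams); the symbols are introduced here for illustration and do not come from the paper.

```latex
% Two-beam approximation at normal incidence (an assumed simplification).
% d: layer thickness, n: refractive index of the layer, \lambda: wavelength,
% I_1, I_2: intensities reflected from the top surface and the bottom interface,
% \varphi_0: constant phase offset introduced by the reflections.
\begin{align}
  I(d) &= I_1 + I_2 + 2\sqrt{I_1 I_2}\,
          \cos\!\left(\frac{4\pi n d}{\lambda} + \varphi_0\right), \\
  \frac{\partial I}{\partial d} &= -\frac{8\pi n}{\lambda}\sqrt{I_1 I_2}\,
          \sin\!\left(\frac{4\pi n d}{\lambda} + \varphi_0\right).
\end{align}
```

Under these assumptions the sensitivity to a thickness change vanishes wherever the phase is a multiple of \pi, and the thicknesses at which this happens depend on the wavelength. That is consistent with the use of several wavelengths described in the abstract.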

Quadratic correlation filter-based target tracking in FLIR image sequences

A. Bal, M. S. Alam

Target tracking in forward-looking infrared (FLIR) video sequences is a challenging problem due to limitations such as low signal-to-noise ratio, image blurring, partial occlusion, and low texture information, which often lead to missed targets or tracking of non-target objects. To alleviate these problems, we propose the application of quadratic correlation filters using a subframe approach in FLIR imagery. The proposed filtering technique avoids the disadvantages of pixel-based image preprocessing techniques. The filter coefficients are obtained for the desired target class from the training images. For real-time applications, the input scene is first segmented into subframes according to the target location information from the previous frame. The subframe of interest is then correlated with the correlation filters associated with the target class. The resulting correlation output contains a high value that indicates the target location in the region of interest. Simulation results for target tracking in real-life FLIR imagery are reported to verify the effectiveness of the proposed technique.

Minimax distance transform correlation filter-based target detection in FLIR imagery

J. F. Khan, M. S. Alam, R. R. Adhami, et al.

This paper proposes a method to detect objects of arbitrary poses and sizes in a complex forward-looking infrared (FLIR) image scene by combining an image correlation technique with preprocessing of the scene using a class of morphological operators. The presented automatic target recognition (ATR) algorithm consists of two steps. In the first step, the image is preprocessed with morphological reconstruction operators to remove the background as well as clutter and to intensify the presence of both low- and high-contrast targets. This step also involves finding the possible candidate target regions, or regions of interest (ROIs), and passing those ROIs to the second step for classification. The second step uses a template-matching technique, the minimax distance transform correlation filter (MDTCF), to distinguish the true target from the false alarms in the pre-selected ROIs. The MDTCF minimizes the average squared distance from the filtered true-class training images to a filtered reference image while maximizing the mean squared distance of the filtered false-class training images to this filtered reference image. This approach increases the separation between the false-class correlation outputs and the true-class correlation outputs. Classification is performed using the squared distance of a filtered test image to the chosen filtered reference image. The proposed technique has been tested with real-life FLIR image sequences supplied by the Army Missile Command (AMCOM). Experimental results obtained with these real FLIR image sequences, which exhibit a wide variety of target and clutter variability, demonstrate the effectiveness and robustness of the proposed method.

A new two-level efficient technique for 2D image patches registration via optimized cross correlation, wavelet transform, and moments of inertia with application to ladar imaging

Carlos Bejar, Dalila B. Megherbi

A new, fast, feature-based approach for efficient and accurate automated image registration, with applications to multiple-view or multi-sensor LADAR imaging, is presented. Highly accurate and efficient image registration is much needed in ground or airborne LADAR imaging. The proposed approach has two phases. First, sub-image patches of the overlapping images are compared directly using the normalized cross-correlation technique. A pre-specified window (sub-image patch) size is used to speed up the matching process. In particular, a 65x65 window is defined in the right 50% of the left image (reference image); then a matching window is searched for in the left 50% of the right image (unregistered image). The advantage of this approach is that the original images are reduced to small and similar sub-image patches of size 65x65, which, as we show, greatly reduces the computation time of the search for matching point pairs described in the next phase. Second, the wavelet transform is applied to the small and similar sub-image patches to extract a number of matching feature points. Each feature point is an edge point whose edge response is the maximum within a neighborhood. The normalized cross-correlation technique is applied again, this time to find the matching pairs between the feature points. From the matching pairs, moments of inertia are used to estimate the rigid transformation parameters between the overlapping images. In general, the overlapping images can have an arbitrarily large orientation difference. Therefore, this angle must be found first to correct the unregistered image. To estimate the rotation angle, we show how a so-called "angle histogram" is derived and calculated. The rotation angle selected is the one that corresponds to the maximum peak in the angle histogram.
We show that the proposed approach is an order of magnitude faster than existing methods on a single-processor computer. We also show that the proposed approach is automatic, robust, and works with any partially overlapping images rotated with respect to each other. Experimental results using rotated and non-rotated images are presented.
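
The first phase relies on normalized cross-correlation (NCC) between patches. The sketch below shows the textbook zero-mean NCC and an exhaustive window search. It uses a 3x3 window on a toy image instead of the 65x65 window from the abstract, and the helper names `crop` and `best_match` are hypothetical.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized 2D patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treated as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def crop(image, top, left, size):
    """Cut a size x size window out of a 2D list."""
    return [row[left:left + size] for row in image[top:top + size]]


def best_match(template, image, size):
    """Slide a size x size window over image; return (score, top, left) of the best NCC."""
    best = (-2.0, 0, 0)
    for top in range(len(image) - size + 1):
        for left in range(len(image[0]) - size + 1):
            score = ncc(template, crop(image, top, left, size))
            if score > best[0]:
                best = (score, top, left)
    return best


# Toy 6x6 "unregistered" image containing a distinctive 3x3 pattern at row 2, column 3.
template = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
image = [[0] * 6 for _ in range(6)]
for i in range(3):
    for j in range(3):
        image[2 + i][3 + j] = template[i][j]
score, top, left = best_match(template, image, 3)
```

Subtracting the patch means and dividing by the patch energies makes the score insensitive to offset and gain differences between the two images, which is why NCC is a common choice for matching views from different sensors.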

Image fusion using DFT based digital filter banks

Murat Sezgin, Isin Erer, Okan K. Ersoy

In this paper, we present a new fusion algorithm based on a multidecomposition approach with the DFT-based symmetric, zero-phase, nonoverlapping digital filter bank representation. The DFT of the signal is separated into two parts, leading to the low-pass and high-pass components, which are then decimated by two to obtain subband signals. The original signal may be recovered by interpolating the subband signals, computing their inverse DFT, and summing the results. In the proposed image fusion algorithm, two or more source images are decomposed into subbands by DFT-based digital filters. The detail and approximation subband coefficients are modified according to their magnitudes and mean values, respectively. Then, the modified subbands are combined in the subband domain. Finally, the fused image is obtained by the inverse transform.
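
The band split and recombination can be illustrated on a 1D signal. This is a simplified sketch and not the authors' filter bank: decimation and interpolation are omitted, the band edge at a quarter of the sampling rate is an assumption, and the fusion rule (average the low bands, keep the larger-magnitude high-band sample) is a common choice assumed here for illustration.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def split_bands(x):
    """Split x into low- and high-band signals with nonoverlapping DFT masks."""
    n = len(x)
    X = dft(x)
    # Bin k and bin n-k are kept together so each band stays real-valued.
    is_low = [min(k, n - k) < n / 4 for k in range(n)]
    low = idft([X[k] if is_low[k] else 0 for k in range(n)])
    high = idft([0 if is_low[k] else X[k] for k in range(n)])
    return [v.real for v in low], [v.real for v in high]


def fuse(x, y):
    """Assumed fusion rule: average the low bands, keep the stronger high-band sample."""
    low_x, high_x = split_bands(x)
    low_y, high_y = split_bands(y)
    low = [(a + b) / 2 for a, b in zip(low_x, low_y)]
    high = [a if abs(a) >= abs(b) else b for a, b in zip(high_x, high_y)]
    return [l + h for l, h in zip(low, high)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 1.0]
low, high = split_bands(signal)
reconstructed = [l + h for l, h in zip(low, high)]
```

Because the two masks do not overlap and together cover every DFT bin, summing the two band signals returns the original signal up to rounding error.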

Next-Generation Multimedia Compression, Transmission, and Architectures

ISMuS: interactive, scalable, multimedia streaming platform

Jihun Cha, Hyun-Cheol Kim, Seyoon Jeong, et al.

Technical evolution in the field of information technology has changed many aspects of industry and of human life. Internet and broadcasting technologies act as core ingredients of this revolution. Various new services that were never possible before are now available to the general public through these technologies. Multimedia service over IP networks has become one of the most easily accessible services. Technical advances in Internet services, constantly increasing network bandwidth, and the evolution of multimedia technologies have made the demand for multimedia streaming services increase explosively. With this increasing demand, the Internet is becoming deluged with multimedia traffic. Although multimedia streaming services have become indispensable, the quality of a multimedia service over the Internet cannot be technically guaranteed. Recently, users have come to demand multimedia services whose quality is competitive with traditional TV broadcasting and that offer additional functionalities. Such additional functionalities include interactivity, scalability, and adaptability. Multimedia that comprises these ancillary functionalities is often called richmedia. To satisfy the aforementioned requirements, the Interactive Scalable Multimedia Streaming (ISMuS) platform was designed and developed. In this paper, the architecture, implementation, and additional functionalities of the ISMuS platform are presented. The presented platform is capable of providing user interactions based on MPEG-4 Systems technology [1] and of supporting efficient multimedia distribution through an overlay network technology. Loaded with feature-rich technologies, the platform can serve both on-demand and broadcast-like richmedia services.

Joint wavelet packets for groups of frames in MCTF

Maria Trocan, Christophe Tillier, Beatrice Pesquet-Popescu

Wavelet packets provide a flexible representation of data, which has proved very useful in many applications in signal, image, and video processing. In particular, in image and video coding, their ability to best capture the features of the input content can be exploited by designing appropriate optimization criteria. In this paper, we introduce joint wavelet packets for groups of frames, which provide a single best-basis representation for several frames rather than one basis per frame, as is classically done. Two main advantages are expected from this joint representation. On the one hand, bitrate is spared, since a single tree description is sent instead of 31 per group of frames (GOP) when a GOP contains, for example, 32 frames. On the other hand, this common description can characterize the spatio-temporal features of the given video GOP and can thus be exploited as a valuable feature for video classification and video database searching. A second contribution of the paper is to provide insight into the modifications necessary in the best basis algorithm (BBA) in order to cope with biorthogonal decompositions. A computationally efficient algorithm is deduced for an entropy-based criterion.

MPEG-21 DIA-based video adaptation framework and its application to rate adaptation

Jung Won Kang, Jae-Gon Kim, Dong-San Jun, et al.

In a universal media access (UMA) environment, because of the heterogeneous networks and terminals, flexible video adaptation, performed according to network conditions and terminal capabilities as well as user preferences, is required to maximize consumer experience and ensure Quality of Service (QoS). MPEG-21 Digital Item Adaptation (DIA) supports an interoperable framework for effective and efficient video adaptation. Among the MPEG-21 DIA tools, the utility function, which describes the relations among the feasible adaptation operations, resource constraints, and utility, plays the most important role in the adaptation process because the optimal adaptation operation is chosen among the feasible adaptation operations under the given constraints. Therefore, this paper presents the overall concept of the MPEG-21 DIA-based adaptation framework and the formulation of the utility function. In addition, the feasibility of the adaptation framework is demonstrated by applying it to a few use cases for generating utility functions and to specific adaptation scenarios involving nonscalable and scalable video.

An effective MP4 streaming method using the schedule information of image objects

Seyoon Jeong, Jihun Cha, Hyun-Cheol Kim, et al.

In this paper, we present an effective streaming method for MPEG-4 content using the schedule information of image objects and progressive JPEG. The proposed method is designed for the Interactive Scalable Multimedia Streaming (ISMuS) system. In rich interactive content, the amount of image object data is not negligible for a streaming service with QoS. If a streaming system does not manage the image data, it can create a bottleneck in the system. The proposed method considers the schedule information of image objects to be displayed within a specific time frame, generally within a few seconds. Since the proposed method uses progressive JPEG instead of baseline JPEG, it treats each image object as a scalable object. The streaming server always sends the DC data of each image object, and sends the AC data of an image object only when there is enough room for it in the network bandwidth. The priorities of the audio and video elementary streams are also considered, along with those of the image objects, according to the varying network status.
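
The "DC always, AC only if there is room" policy can be sketched as a two-pass allocation over the image objects scheduled in the current time window. This is a hypothetical illustration: the field names, the earliest-deadline ordering, and the byte budget are assumptions, and audio/video stream priorities are left out.

```python
def schedule_image_data(objects, budget_bytes):
    """Plan which progressive-JPEG parts to send within one scheduling window.

    objects: list of dicts with hypothetical keys
             'name', 'deadline' (seconds), 'dc_bytes', 'ac_bytes'.
    Returns a list of (name, part, size) tuples in sending order.
    """
    plan = []
    remaining = budget_bytes
    ordered = sorted(objects, key=lambda o: o["deadline"])
    # Pass 1: the DC data of every scheduled object is always sent.
    for obj in ordered:
        plan.append((obj["name"], "DC", obj["dc_bytes"]))
        remaining -= obj["dc_bytes"]
    # Pass 2: AC data is added only while the remaining budget allows it.
    for obj in ordered:
        if obj["ac_bytes"] <= remaining:
            plan.append((obj["name"], "AC", obj["ac_bytes"]))
            remaining -= obj["ac_bytes"]
    return plan


objects = [
    {"name": "a", "deadline": 1.0, "dc_bytes": 100, "ac_bytes": 900},
    {"name": "b", "deadline": 0.5, "dc_bytes": 100, "ac_bytes": 400},
]
plan = schedule_image_data(objects, budget_bytes=700)
```

With a 700-byte budget, both objects get their DC data, object "b" also gets its AC data, and the AC data of "a" is held back, so every image can at least be shown at coarse quality by its display time.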

Performance of scalable video coding with respect to differentiated service requirements

Doug Young Suh, Gwang Hoon Park, Seung Gyun Han, et al.

Multimedia services can be categorized as conversational, streaming, and download services, which differ in temporal and semantic quality requirements. The performance of scalable video coding (SVC) is analyzed in the three-dimensional (spatio-temporal) frequency domain. Based on the analysis, an SVC scheme can be modified to satisfy various requirements while remaining compliant with the international standards. Decoding without enhancement-layer data results in drift. Several drift-free techniques are analyzed.

Network support mechanisms for scalable media streaming

Charalampos Z. Patrikakis, Pantelis Karamolegkos, Ignatios Mihailaris, et al.

In this paper, the problem of integrating scalable media encoding mechanisms with emerging solutions and protocols for media streaming is addressed. We consider scalable media encoding technologies and try to combine them with emerging protocols and mechanisms for transmission rate adaptation. An architecture comprising a scalable media encoder supported by the DCCP transport protocol is proposed. The proposal introduces a buffer management mechanism that takes advantage of the rate adaptation features of scalable media encoding to complement DCCP's inherent congestion control.

Imaging Security

Study and validation of tools interoperability in the JPSEC framework

V. Conan, Y. Sadourny, K. Jean-Marie, et al.

Digital imagery is important in many applications today, and the security of digital imagery is important today and is likely to gain in importance in the near future. The emerging international standard ISO/IEC JPEG-2000 Security (JPSEC) is designed to provide security for digital imagery, and in particular digital imagery coded with the JPEG-2000 image coding standard. One of the primary goals of a standard is to ensure interoperability between creators and consumers produced by different manufacturers. The JPSEC standard, similar to the popular JPEG and MPEG family of standards, specifies only the bitstream syntax and the receiver's processing, and not how the bitstream is created or the details of how it is consumed. This paper examines the interoperability for the JPSEC standard, and presents an example JPSEC consumption process which can provide insights in the design of JPSEC consumers. Initial interoperability tests between different groups with independently created implementations of JPSEC creators and consumers have been successful in providing the JPSEC security services of confidentiality (via encryption) and authentication (via message authentication codes, or MACs). Further interoperability work is on-going.

Image replica detection based on support vector classifier

Y. Maret, F. Dufaux, T. Ebrahimi

In this paper, we propose a technique for image replica detection. By replica, we mean equivalent versions of a given reference image, e.g. after it has undergone operations such as compression, filtering, or resizing. Applications of this technique include discovery of copyright infringement and detection of illicit content. The technique is based on the extraction of multiple features from an image, namely texture, color, and spatial distribution of colors. Similar features are then grouped together, and the similarity between two images is given by several partial distances. The decision function that determines whether a test image is a replica of a given reference image is finally derived using a Support Vector Classifier (SVC). We show that this technique achieves good results on a large database of images. For instance, for a false negative rate of 5%, the system yields a false positive rate of only 6 × 10^-5.

Supporting secure transcoding in JPSEC

John G. Apostolopoulos, Susie J. Wee

The capture, processing, delivery, and display or printing of digital images is important today and will become even more important in the future. Important additional future challenges include the remote browsing of images, image adaptation to support diverse clients, and providing security services such as confidentiality and authentication. In this context, an important functionality is secure transcoding: providing end-to-end security between the content creator and the content consumer, while enabling a potentially untrusted mid-network node or proxy to adapt the content to be best matched for delivery to the consumer - where the adaptation is performed without requiring the node or proxy to unprotect (i.e., decrypt) the content. In prior work, secure transcoding was shown to be possible through a framework referred to as Secure Scalable Streaming, which was originally designed for video streaming applications. Secure transcoding was identified as being possible within the ISO/IEC JPEG-2000 Security (JPSEC) standardization effort, and this paper describes how the JPSEC standard was designed to support rate-distortion (R-D) optimized secure transcoding.

Scrambling for anonymous visual communications

Frederic Dufaux, Touradj Ebrahimi

In this paper, we present a system for anonymous visual communications. The target application is anonymous video chat. The system identifies faces in the video sequence by means of face detection or skin detection. The corresponding regions are subsequently scrambled. We investigate several approaches for scrambling, either in the image domain or in the transform domain. Experimental results show the effectiveness of the proposed system.
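
The abstract does not say which scrambling operations were studied. As one possible transform-domain illustration, the sketch below flips the signs of a region's transform coefficients with a key-seeded pseudo-random generator. The choice of sign flipping, the function name `scramble`, and the sample coefficients are all assumptions.

```python
import random


def scramble(coefficients, key):
    """Pseudo-randomly flip coefficient signs; applying it twice with the same key undoes it."""
    rng = random.Random(key)  # the key seeds the flip pattern
    return [-c if rng.random() < 0.5 else c for c in coefficients]


# Hypothetical quantized transform coefficients of one block inside a face region.
coeffs = [12, -7, 5, 0, -3, 2, 1, -1]
scrambled = scramble(coeffs, key=2005)
restored = scramble(scrambled, key=2005)
```

A scheme of this kind leaves coefficient magnitudes unchanged and is reversible for anyone holding the key, so an authorized receiver could recover the original region.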

MPEG-7: The Power of Standard Multimedia Metadata and Content Representation

MPEG-7: standard metadata for multimedia content

Wo Chang

The eXtensible Markup Language (XML) metadata technology for describing media content has emerged as a dominant way of making media searchable for both human and machine consumption. To realize this premise, many online Web applications are pushing this concept to its fullest potential. However, a good metadata model requires a robust standardization effort so that the metadata content and its structure can reach maximum usage across applications. An effective media content description technology should also use standard metadata structures, especially when dealing with various kinds of multimedia content. A new metadata technology called MPEG-7 content description has emerged from the ISO MPEG standards body, with the charter of defining standard metadata to describe audiovisual content. This paper gives an overview of MPEG-7 technology and the impact it can have on the next generation of multimedia indexing and retrieval applications.

Easy and effective indexing and browsing schemes for digital photo album application

Sang-Kyun Kim, Seungji Yang, Kyongsok Seo, et al.

In this paper, we propose a method to cluster digital home photos according to user-centric functionalities: event/situation-based photo clustering, category-based photo clustering, and person-identity-based photo clustering and indexing. The main idea of the user-centric photo album is to enable users to organize and browse their photos along semantically meaningful axes, namely situation, category, and person identity. Experimental results show that the proposed method is useful for organizing a photo album based on human perception.

Mobile multimedia library: an MPEG-7 application with camera-equipped mobile phones

Ryoma Oami, Eiji Kasutani, Akio Yamada

This paper proposes a Mobile Multimedia Library (MML), a new application of information retrieval using camera-equipped mobile phones. MML allows users to get information about an unknown object anywhere and anytime. A user simply takes a picture of the object with a mobile phone and sends it directly to an MML server, on which the picture is analyzed to identify the object. The MML server returns the identification result to the mobile phone, and the user can browse information about the object on the phone's display. This application employs the k-Nearest Neighbor approach using eight MPEG-7 visual features to identify objects. A prototype system for animal identification has been developed to demonstrate the effectiveness of the MML framework. Identification takes about ten seconds on average; 63% and 79% of queries were correctly identified within the first four and first ten candidates, respectively, out of 229 categories.
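
The k-Nearest Neighbor step can be sketched as follows. The two-dimensional feature vectors stand in for the MPEG-7 visual descriptors, and the database contents, Euclidean distance, and majority-vote ranking are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_rank(query, database, k=3):
    """Rank category labels by their number of votes among the k nearest entries."""
    nearest = sorted(database, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return [label for label, _ in votes.most_common()]


# Hypothetical database of (feature vector, category) pairs.
database = [
    ((0.0, 0.0), "cat"),
    ((0.1, 0.2), "cat"),
    ((1.0, 1.0), "dog"),
    ((0.9, 1.1), "dog"),
    ((5.0, 5.0), "bird"),
]
candidates = knn_rank((0.2, 0.1), database, k=3)
```

Returning a ranked candidate list rather than a single label matches the way the abstract reports accuracy, as the fraction of queries whose correct category appears within the first four or first ten candidates.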

Characterizing region of interest in image using MPEG-7 visual descriptors

Min-Sung Ryu, Soo-Jun Park, Chee Sun Won

In this paper, we propose a region-based image retrieval system using the EHD (Edge Histogram Descriptor) and CLD (Color Layout Descriptor) of the MPEG-7 descriptors. The combined descriptor can efficiently describe edge and color features in terms of sub-image regions. That is, the basic unit for the selection of the region of interest (ROI) in the image is the sub-image block of the EHD, which corresponds to 16 (i.e., 4x4) non-overlapping image blocks in the image space. This implies that, to have a one-to-one region correspondence between EHD and CLD, we need to take an 8x8 inverse DCT (IDCT) for the CLD. Experimental results show that the proposed scheme can be used for ROI-based retrieval of MPEG-7 indexed images.

Overall framework on digital image searching

Mun-Kew Leong, Joo-Hwee Lim

We propose a framework for digital image searching which seeks to include the best practices learnt from document information retrieval systems. In particular, we motivate the importance of the user in the search process, and show how the user's task can significantly alter the evaluation of results from the search system. The framework makes the roles of the user explicit to avoid the academic omniscient truth approach which characterizes current content-based image retrieval systems. While primary (low-level) features are a necessary part of the image search process, it is the higher level semantics which have created significant results from image analysis and search. We show that the user is a necessary component in this process.

Embedding MPEG-7 metadata within a media file format

Wo Chang

Embedding metadata within a media file format is becoming ever more popular for digital media. Traditional digital media files such as MP3 songs and JPEG photos did not carry any metadata structures to describe the media content until these file formats were extended with ID3 and EXIF. Recently, ID3 and EXIF advanced to version 2.4 and version 2.2, respectively, with many new description tags. Currently, most MP3 players and digital cameras support the latest revisions of these metadata structures as de facto standard formats. Metadata that describes the media content is critical to consumers for viewing and searching media content. However, ID3 and EXIF were designed with very different approaches in terms of syntax, semantics, and data structures. Therefore, these two metadata formats are not compatible and cannot be used together in common applications such as a slideshow that plays MP3 music in the background while shuffling through images in the foreground. This paper presents the idea of embedding the international standard ISO/IEC MPEG-7 metadata descriptions inside the rich ISO/IEC MPEG-4 file format container so that a general metadata framework can be used for image, audio, and video applications.

H.264/AVC Applications

Some applications of H.264/AVC fidelity-range extension profiles

Pankaj Topiwala

H.264/MPEG-4 AVC Fidelity Range Extensions (FRExt), Amendment 1 to the new video standard, was created to allow for higher quality, higher bit depth coding with richer color space support. A key impact of this extension has been to also create an important new profile (High Profile) which supports the traditional 8-bit, 4:2:0 color space, but with improved performance. Finished in July, 2004, this new standard is being aggressively adopted by industry, especially for future high-definition TV and DVD applications. In this paper, we provide a high-level snapshot of what is new in the FRExt Amendment relative to the base standard, and indicate the many applications that now rely on it.

Rate-adaptive H.264 video for information for global reach (IFGR)

Praveen Kota, Karthik Kannan, Zixiang Xiong, et al.

Rate control for video transmission becomes extremely important in "bandwidth-precious" scenarios, and added real-time constraints such as joint source channel coding make it even more vital. Hence, there has always been a demand for simple and efficient rate control algorithms. The approximate linear relationship between coding rate (R) and percentage of zeros among the quantized spatial transform coefficients (ρ) is exploited in the present work to cater to such low-bandwidth, low-delay applications. The current rate control algorithm for H.264 is used as the benchmark for comparison. Extensive experimental results show that the ρ-domain model outperforms the existing algorithm with more robust rate control, besides yielding a similar or improved peak signal-to-noise ratio (PSNR) and being faster.
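
The linear relationship between R and ρ mentioned in this abstract can be illustrated with a minimal sketch. It assumes the model form R ≈ θ(1 − ρ), which is common in the ρ-domain literature but is not stated in the abstract, and a plain round-to-nearest uniform quantizer. The function names and the candidate step list are hypothetical and are not taken from the paper.

```python
def zero_fraction(coeffs, step):
    """Fraction of coefficients that quantize to zero (rho).

    Assumption: uniform round-to-nearest quantizer, so a coefficient
    becomes zero when its magnitude is below half the step size.
    """
    zeros = sum(1 for c in coeffs if abs(c) < step / 2)
    return zeros / len(coeffs)


def estimate_theta(bits_used, rho):
    """Slope of the assumed linear model R = theta * (1 - rho)."""
    return bits_used / (1.0 - rho)


def pick_step(coeffs, target_bits, theta, candidate_steps):
    """Pick the candidate step whose predicted rate is closest to the target."""
    def predicted_bits(step):
        return theta * (1.0 - zero_fraction(coeffs, step))

    return min(candidate_steps, key=lambda s: abs(predicted_bits(s) - target_bits))


# Toy transform coefficients (made up for illustration).
coeffs = [0, 1, -2, 5, -9, 14, 0, 3]
rho = zero_fraction(coeffs, 4)          # measured after coding with step 4
theta = estimate_theta(100, rho)        # pretend that pass cost 100 bits
step = pick_step(coeffs, 45, theta, [4, 8, 16, 32])
```

In this sketch the slope θ is calibrated from one coded unit and then reused to predict the rate of other step sizes without re-encoding, which is the appeal of a linear model for low-delay use.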

Comparative Study of JPEG2000 and H.264/AVC FRExt I-Frame Coding on High-Definition Video Sequences

Pankaj Topiwala

This paper reports the rate-distortion performance comparison of JPEG2000 with H.264/AVC Fidelity Range Extensions (FRExt) High Profile I-frame coding for high definition (HD) video sequences. This work can be considered as an extension of a similar earlier study involving H.264/AVC Main Profile [1]. Coding simulations are performed on a set of 720p and 1080p HD video sequences, which have been commonly used for H.264/AVC standardization work. As expected, our experimental results show that H.264/AVC FRExt I-frame coding offers consistent R-D performance gains (around 0.2 to 1 dB in peak signal-to-noise ratio) over JPEG2000 color image coding. However, similar to [1, 2], we have not considered scalability, computational complexity as well as other JPEG2000 features in this study.

Origins of the performance of H.264/AVC: an account of the development of H.263 and H.264 coding tools

Cliff Reader

The development of the H.264 standard is described, with a focus on increases in coding performance and reductions in implementation complexity. The lineage of the standard is described, with a historical overview including its relationship to MPEG-4. The discussion is centered on the development of eight major coding tools, each of which typically followed a two-phase development comprising first the basic algorithm and then optimization for implementation. Each tool development includes an overview of the technology, a summary of the development process and timeline, and citation of the original inventions and any major interim inventions. The incremental performance gains and complexity reductions are documented. The work is abstracted from a detailed report on the technology of H.264¹.

AVC/H.264 patent portfolio license

Lawrence A. Horn

MPEG LA, LLC offers a joint patent license for the AVC (a/k/a H.264) Standard (ISO/IEC IS 14496-10:2004). Like MPEG LA's other licenses, the AVC Patent Portfolio License is offered for the convenience of the marketplace as an alternative enabling users to access essential intellectual property owned by many patent holders under a single license rather than negotiating licenses with each of them individually. The AVC Patent Portfolio License includes essential patents owned by Electronics and Telecommunications Research Institute (ETRI); France Telecom, societe anonyme; Fujitsu Limited; Koninklijke Philips Electronics N.V.; LG Electronics Inc.; Matsushita Electric Industrial Co., Ltd.; Microsoft Corporation; Mitsubishi Electric Corporation; Robert Bosch GmbH; Samsung Electronics Co., Ltd.; Sedna Patent Services, LLC; Sharp Kabushiki Kaisha; Siemens AG; Sony Corporation; The Trustees of Columbia University in the City of New York; Toshiba Corporation; and Victor Company of Japan, Limited. MPEG LA's objective is to provide worldwide access to as much AVC essential intellectual property as possible for the benefit of AVC users. Therefore, any party that believes it has essential patents is welcome to submit them for evaluation of their essentiality and inclusion in the License if found essential.

Efficient temporal search range prediction for motion estimation in H.264

Changsung Kim, C.-C. Jay Kuo

An efficient search range prediction method is proposed to reduce the complexity of motion search in the H.264 video coding standard in this work. The main idea is to predict the temporal search range by modeling the relationship between the rate-distortion (RD) coding gain and the required computational complexity. The proposed algorithm first predicts the temporal search range to maximize the ratio of the expected RD coding gain and the normalized computational cost. Then, fast motion search is performed within the predicted search range with some early termination rule. Experimental results show that the proposed algorithm can save approximately 63-75% of the encoding complexity in motion estimation of H.264 (JM9.3) with negligible degradation in the RD performance.

An operational rate control scheme for H.264 with two-stage encoding

Do-Kyoung Kwon, Mei-Yin Shen, C.-C. Jay Kuo

An operational rate control (RC) scheme based on two-stage encoding is studied in this research, where frame-layer rate control with a constant bit rate to achieve constant video quality is examined. In the first encoding stage, the R-D optimized mode decision and its associated motion estimation (RDO) as well as DCT/Q, IQ/IDCT and entropy coding are performed for all macroblocks (MBs) of a target frame using an initial quantization parameter (QP), which is the QP of its previous frame. In the second encoding stage, the residual signal from the first stage is encoded using several QP values around the initial QP. Given the target bits and distortion for the current frame, the residual signal is finally encoded using the QP determined by comparing target bits and distortion with actual bits and distortion. To reduce the additional coding complexity of the two-stage encoding, upper and lower bounds around the target bits and distortion are employed to reduce the number of encodings required in the second stage. Experimental results are given to show the superior performance of the two proposed rate control algorithms, one of which targets a constant bit rate and the other constant quality.
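
The second-stage QP search with early termination described in this abstract might look like the following minimal sketch for the constant-bit-rate case. The callback `encode_residual`, the tolerance band, the search radius and the toy rate curve are all assumptions made for illustration, not the paper's algorithm; the only external fact used is that H.264 QP values for 8-bit video lie in 0..51.

```python
def select_qp(encode_residual, initial_qp, target_bits,
              tolerance=0.1, search_radius=3):
    """Try QPs around initial_qp and return (chosen_qp, number_of_trials).

    encode_residual(qp) is a hypothetical callback that re-encodes the
    first-stage residual at the given QP and returns the bits produced.
    The search stops as soon as the bits fall inside the band around the
    target, which limits the number of second-stage encodings.
    """
    lower = target_bits * (1 - tolerance)
    upper = target_bits * (1 + tolerance)

    # Visit candidates nearest to the initial QP first: qp, qp-1, qp+1, ...
    candidates = sorted(
        (q for q in range(initial_qp - search_radius,
                          initial_qp + search_radius + 1) if 0 <= q <= 51),
        key=lambda q: (abs(q - initial_qp), q),
    )

    best_qp, best_gap, trials = None, float("inf"), 0
    for qp in candidates:
        bits = encode_residual(qp)
        trials += 1
        gap = abs(bits - target_bits)
        if gap < best_gap:
            best_qp, best_gap = qp, gap
        if lower <= bits <= upper:
            break  # close enough: skip the remaining encodings
    return best_qp, trials


def toy_encoder(qp):
    """Made-up rate curve for the demo: bits halve every 6 QP steps."""
    return 1000 * 2 ** ((30 - qp) / 6)


chosen_qp, trials = select_qp(toy_encoder, initial_qp=30, target_bits=800)
```

A constant-quality variant would compare distortion against a target in the same loop; that part is omitted here because the abstract does not give enough detail to sketch it faithfully.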

Efficient bit stream switching of H.264 coded video

Xiaosong Zhou, C.-C. Jay Kuo

Stream switching among compressed video streams coded at different quality levels and bit rates is examined, and two enhanced stream switching schemes for H.264 video are proposed in this work. The flexibility of the original H.264 stream switching scheme is achieved at the cost of coding efficiency by introducing primary and secondary SP/SI pictures. In contrast, no modification is made to the original compressed streams in our enhanced schemes so that coding efficiency is maintained at switching points. When switching occurs, the two new schemes use a new picture type, called the difference picture (DIFF), in different ways to compensate the mismatch of reference frames. The difference picture can be coded efficiently, and it is easy to pick a good quantization step size to meet the quality requirement. It is shown by experimental results that the two new schemes outperform the original H.264 switching scheme in the sense that they can achieve prompt stream switching without noticeable quality degradation, and a small amount of bit rate overhead is demanded only when switching occurs.

Image Coding and Structures

Progressive geometry coding of 3D meshes using hierarchical vertex set split

Jingliang Peng, C.-C. Jay Kuo

A progressive lossless 3D geometry encoder using a hierarchical vertex set split method is presented in this work. Compared with prior art, the proposed coder has significantly better rate-distortion (R-D) performance at low bit-rates and provides visually pleasant intermediate meshes at all bit-rates. Given a 3D mesh, all its 3D vertices form an initial vertex set, which is split into several child vertex sets using the well-known Generalized Lloyd Algorithm (GLA). Each newly generated vertex set that contains more than one vertex is iteratively split so as to form a hierarchical structure. During the process of hierarchical vertex set split, a representative is calculated for each newly generated vertex set. Then, the representatives of all existing vertex sets form an approximation to the original 3D geometry. For each vertex set split, the number of child vertex sets is arithmetic encoded, and the offsets of the child representatives from their parent representative are sorted, quantized and arithmetic encoded. If a finer resolution is required for a vertex set containing only one vertex, the rectangloid cell containing that vertex can be further subdivided and coded iteratively. Experimental results are provided to demonstrate the superior performance of the proposed geometry coder.
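
The hierarchical vertex set split described above can be sketched roughly as follows. This is a simplification under stated assumptions: every split produces two children (the abstract says "several"), the representative is taken to be the centroid, and the initial cell centers are chosen by a simple farthest-point rule. Quantization and arithmetic coding of the offsets are omitted, and all function names are hypothetical.

```python
def centroid(points):
    """Mean position of a list of 3D points (assumed representative)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_split(points, iterations=10):
    """Two-cell Lloyd iteration; returns [child0, child1] or None."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))  # farthest point from c0
    centers = [c0, c1]
    split = None
    for _ in range(iterations):
        children = [[], []]
        for p in points:
            nearest = 0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            children[nearest].append(p)
        if not children[0] or not children[1]:
            break  # degenerate split (e.g. duplicate vertices)
        split = children
        centers = [centroid(children[0]), centroid(children[1])]
    return split


def build_hierarchy(points):
    """Recursively split until every vertex set holds a single vertex."""
    node = {"representative": centroid(points), "children": []}
    if len(points) > 1:
        parts = lloyd_split(points)
        if parts:
            node["children"] = [build_hierarchy(part) for part in parts]
    return node


def representatives_at_depth(node, depth):
    """Approximate geometry formed by the representatives at a given depth."""
    if depth == 0 or not node["children"]:
        return [node["representative"]]
    reps = []
    for child in node["children"]:
        reps.extend(representatives_at_depth(child, depth - 1))
    return reps


vertices = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
tree = build_hierarchy(vertices)
coarse = representatives_at_depth(tree, 1)
fine = representatives_at_depth(tree, 2)
```

Reading the representatives level by level gives progressively finer approximations of the vertex positions, which is what makes the representation progressive.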

A stereo video coding scheme based on H.264

Jong Dae Oh, C.-C. Jay Kuo

A stereo video coding algorithm based on H.264 is presented in this work. The main difference between the multi-view and the single-view video coding algorithm is whether a disparity vector (DV) is used or not. For stereo video, we use binocular sequences that consist of left- and right-view sequences. The left-view sequence is coded by the conventional H.264 algorithm and used as a reference image for the prediction of the right-view sequence. The right-view sequence is coded with motion vectors (MVs) and DV. To code a forward prediction (P) frame or a bi-directional prediction (B) frame of the right-view sequence, we employ 2 or 3 prediction vectors. They are forward MV, backward MV and DV. For the coding of stereo sequences with DV and forward/backward MVs, we propose new macroblock (MB) type tables and a temporal prediction structure. Experimental results show that the proposed algorithm offers a higher peak signal to noise ratio (PSNR) than the H.264 simulcast method.

Adaptive cubic convolution interpolation and sequential filtering for color demosaicing of Bayer pattern image sensors

Wonpil Yu

A still or video camera based on a Bayer-type image sensor is inherently an under-sampled system in terms of color pixel reconstruction. Accurate reconstruction of green channel information and minimization of color artifacts are two primary goals in color demosaicing methods. Unsuccessful demosaicing methods usually produce large color artifacts, particularly in image areas with fine details. In the proposed method, we first estimate green values at each chrominance pixel position by utilizing cubic convolution interpolation along the direction of the smallest gradient magnitude. We have defined a diamond-shaped interpolation kernel and four different gradient directions to facilitate accurate reconstruction of the green channel. Reconstruction of the chrominance channels comprises spectral-correlation-based averaging of neighboring chrominance pixels and a proposed sequential filtering on the reconstructed chrominance channels. Due to the introduction of the sequential filtering stage, conventional quantitative image quality measures such as PSNR or PESNR are not high, but we found that the visual quality as perceived by the human visual system is more natural and that a comfortably vivid reconstruction can be obtained. Moreover, the proposed demosaicing method comprises additions and subtractions for the most part, which makes its implementation more tractable.

A point-based hybrid cloud modeling technique

Bei Wang, C.-C. Jay Kuo

A hybrid approach to cloud modeling and shading based on volumetric procedural modeling, particle modeling and point primitives is proposed in this work. The volumetric procedural modeling allows great flexibility in controlling an object's level of detail, the particle modeling enables clouds to be modeled piece by piece to facilitate interactive fly-in animation, and the point primitive facilitates the multi-resolution representation and animation of complex objects. A preliminary experimental result is given to demonstrate the superior performance of the proposed method.

Edge directed image interpolation with Bamberger pyramids

Jose Gerardo Rosiles

Image interpolation is a standard feature in digital image editing software, digital camera systems and printers. Classical methods for resizing produce blurred images with unacceptable quality. Bamberger Pyramids and filter banks have been successfully used for texture and image analysis. They provide excellent multiresolution and directional selectivity. In this paper we present an edge-directed image interpolation algorithm which takes advantage of the simultaneous spatial-directional edge localization at the subband level. The proposed algorithm outperforms classical schemes such as bilinear and bicubic interpolation from both the visual and numerical points of view.

Applications based on restored satellite images

D. Arbel, S. Levin, M. Nir, et al.

Satellites orbit the earth and obtain imagery of the ground below. The quality of satellite images is affected by the properties of the atmospheric imaging path, which degrade the image by blurring it and reducing its contrast. Applications involving satellite images are many and varied. Imaging systems also differ technologically and in their physical and optical characteristics, such as sensor type, resolution, field of view (FOV), spectral range of the acquiring channels (from the visible to the thermal IR (TIR)), platforms (mobilization facilities: aircraft and/or spacecraft), altitude above the ground surface, etc. It is important to obtain good-quality satellite images because of the variety of applications based on them. The higher the quality of the recorded image, the more information it yields. The restoration process depends on gathering a large amount of data about the atmospheric medium and its characterization. In return, restoration benefits the applications based on it, e.g., satellite communication, warfare against long-distance missiles, geographical aspects, agricultural aspects, economical aspects, intelligence, security, military, etc. Several ways to use restored Landsat 7 Enhanced Thematic Mapper Plus (ETM+) satellite images are suggested and presented here, in particular the use of the restoration results for a few potential geographical applications such as color classification and mapping (road and street localization).

Thermogram analysis and classification in breast cancer diagnostics

M. Zavisek, A. Drastich, K. Dvorak M.D.

The question of the effectiveness of infrared thermal imaging for population screening and early breast cancer detection has become topical again in the last few years. The reason is that there are now new ways to replace the subjective classification, performed by a trained physician and based on ill-defined thermo-pathological features, with semi-automated classification performed on digital thermograms by a sophisticated computer program. Our purpose is to design a pattern classifier that would work as the core of such a program, and also to try to answer the question of effectiveness. We describe the regions of interest (whole breasts in a frontal picture) by about 40 features quantifying all fundamental properties of the ROI, from average temperature to texture descriptors. Feature selection procedures helped us to define the essential quantitative thermo-pathological features. We used the final feature space to design several types of classifiers with supervised learning.

Multifocus fusion with multisize windows

R. Redondo, F. Sroubek, S. Fischer, et al.

In general, fusion means an approach for combining the important information from several sources (channels) simultaneously. In image fusion, multiscale transforms (MST) are commonly used as the analysis tool. An MST transforms the sources into a space-frequency domain, which can be understood as a measure of saliency (activity level). The fusion criterion is to preserve the most salient data from the sources. To reduce sensitivity to noise, the saliency is often averaged over a certain neighborhood (window). However, averaging makes the decisions fuzzier. Traditionally, the size of the neighborhood is fixed according to the level of noise present in the sources, which has to be estimated in advance. This paper proposes a novel technique that combines a set of decreasing averaging windows in order to exploit the advantages of each one. We call it multisize windows-based fusion. Apart from improving fusion results, this technique avoids selecting the neighborhood size in advance (and therefore estimating the level of noise), since it only needs a simple set of windows defined according to image size. We compared it with another technique developed by us, called oriented windows, which, although it considers a fixed neighborhood, adapts the averaging shape to the spatial orientation of the saliency. The specific case of multifocus image fusion is considered for the experiments. The multisize windows technique delivers the best percentage of correct decisions compared with any single fixed window in all the experiments carried out, which added different noise sources (Gaussian, speckle and salt-and-pepper) at different levels. Although it does not perform better than the oriented window scheme, one has to bear in mind that oriented windows are tuned in each case to the best size.

Relaxed color reproduction under different illuminant conditions

Li Liu, Yongyi Yang, Henry Stark

In many applications, the illuminant condition of color reproductions is different from that of the targets. To achieve the same color perceptions, reproduced colors should be adjusted according to the changes of reference whites determined by the illuminants. In this paper, we consider this color reproduction problem when it is also subject to material constraints. By material constraints we mean any constraints that are applied to the amount of inks, lights, voltages, and currents that are used in the generation of color. Color reproduction that is subject to material constraints is called relaxed color reproduction, because the reproduced colors may not match the targets exactly. An algorithm that is suitable for this task is the method of vector space projections (VSP). In our work, the VSP method is directly applied to the control signals of devices. The effects of illuminant variances are also studied. In order to use VSP for constrained color reproduction, we use a novel approach to convert the non-linear constraints in the CIE-Lab space into simpler linear forms. Experimental results demonstrate the feasibility of this method.

Content Management and Representation

Dynamic fluid animation using scalable grids

Youngmin Kwak, Chang-Su Kim, C.-C. Jay Kuo

A new grid system for dynamic fluid animation that controls the number of particles adaptively according to the viewing distance is proposed in this work. The proposed scalable grid system is developed in association with the semi-Lagrangian method to demonstrate the dynamic fluid behavior solved from the system of Navier-Stokes equations. It contains actual particles and virtual particles. To save computations, only actual particles are used to render images viewed from a standard viewpoint. When we zoom in, virtual particles are added to maintain the resolution of the fluid simulation and provide higher quality rendering. We implement a scalable computing procedure for the diffusion step in fluid simulation. The proposed scalable grid system can be incorporated into any semi-Lagrangian method that uses grid or voxel primitives.

Video rate color region segmentation for mobile robotic applications

Aymeric de Cabrol, Patrick J. Bonnin, Vincent Hugel, et al.

A color region may be an interesting image feature to extract for visual tasks in robotics, such as navigation and obstacle avoidance. However, whereas numerous methods are used for vision systems embedded on robots, only a few use this segmentation, mainly because of the processing time. In this paper, we propose a new real-time (i.e., video rate) color region segmentation followed by a robust color classification and a merging of regions, dedicated to various applications such as the RoboCup four-legged league or an industrial conveyor wheeled robot. The performance of this algorithm and a comparison with other methods, in terms of result quality and processing time, are provided. For better quality results, the obtained speedup is between 2 and 4. For same quality results, it is up to 10. We also present the outlines of the Dynamic Vision System of the CLEOPATRE Project, for which this segmentation has been developed, and the Clear Box Methodology, which allowed us to create the new color region segmentation from the evaluation and the knowledge of other well-known segmentations.

Temporally optimized edge segmentation for mobile robotics applications

Aymeric de Cabrol, Patrick J. Bonnin, Vincent Hugel, et al.

Because an edge may be an interesting image feature to extract for robotic visual tasks such as 3D modeling of a robot's environment by stereovision, we propose in this paper a methodology to implement edge segmentation, and then apply it to design a Temporally Optimized Edge Segmentation for Mobile Robotics Applications. Using our methodology, we show how it is possible to reduce the duration of an edge detection operator from 100.62 ms in the slowest case to 10.8 ms in the fastest. This represents a gain of nearly 90 ms in processing time, that is, a speedup factor of nearly 10.

Separating patterns and finding the independent components of mixed signals based on non-Gaussian distribution properties

Robert S. Rand, Hao Chen, Pramod K Varshney

The effect of assuming and using non-Gaussian attributes of underlying source signals for separating/encoding patterns is investigated, for application to terrain categorization (TERCAT) problems. Our analysis provides transformed data, denoted as "Independent Components," which can be used and interpreted in different ways. The basis vectors of the resulting transformed data are statistically independent and tend to align themselves with source signals. In this effort, we investigate the basic formulation designed to transform signals for subsequent processing or analysis, as well as a more sophisticated model designed specifically for unsupervised classification. Mixes of single band images are used, as well as simulated color infrared and Landsat. A number of experiments are performed. We first validate the basic formulation using a straightforward application of the method to unmix signal data in image space. We next show the advantage of using this transformed data compared to the original data for visually detecting TERCAT targets of interest. Subsequently, we test two methods of performing unsupervised classification on a scene that contains a diverse range of terrain features, showing the benefit of these methods against a control method for TERCAT applications.

Model based recognition using 3D line sets and multidimensional Hausdorff distance

M. T. Rahman, M. S. Alam

In this paper, we propose a three-dimensional (3D) line-based matching algorithm using the multidimensional Hausdorff distance. Classical line-based recognition techniques using the Hausdorff distance deal with two-dimensional (2D) models and 2D images. In our proposed 3D line-based matching technique, two sets of lines are extracted from a 3D model and a 3D image (constructed by stereo imaging). To match these line sets, we use a multidimensional Hausdorff distance minimization technique that only requires finding the translation between the image and the model, whereas most model-based recognition techniques require finding the rotation, scale and translation variations between the image and the models. A line-based approach for model-based recognition using the four-dimensional (4D) Hausdorff distance has already been proposed in Ref. [1]. However, our method requires a 4D Hausdorff distance calculation followed by a 3D Hausdorff distance calculation. In the proposed method, because the matching is performed using 3D line sets, it is more reliable and accurate.
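
For readers unfamiliar with the distance used here, the following is a minimal sketch of the standard Hausdorff distance between two sets of feature vectors, together with a brute-force search over candidate translations. The way each 3D line is encoded (midpoint x, y, z plus length, giving a 4-vector) and the candidate list are assumptions made purely for illustration; the abstract does not specify the paper's parametrization or its minimization procedure.

```python
import math


def directed_hausdorff(set_a, set_b):
    """Largest distance from a point in set_a to its nearest point in set_b."""
    return max(min(math.dist(a, b) for b in set_b) for a in set_a)


def hausdorff(set_a, set_b):
    """Symmetric Hausdorff distance; works for vectors of any dimension."""
    return max(directed_hausdorff(set_a, set_b),
               directed_hausdorff(set_b, set_a))


def translate(lines, offset):
    """Shift line midpoints by offset; the length component is unchanged.

    Hypothetical encoding: each line is (mid_x, mid_y, mid_z, length).
    """
    return [(x + offset[0], y + offset[1], z + offset[2], length)
            for x, y, z, length in lines]


def best_translation(model, image, candidates):
    """Brute-force choice of the translation that minimizes the distance."""
    return min(candidates,
               key=lambda t: hausdorff(translate(model, t), image))


model_lines = [(0, 0, 0, 2), (1, 0, 0, 3)]
image_lines = [(5, 1, 0, 2), (6, 1, 0, 3)]
shift = best_translation(model_lines, image_lines,
                         [(0, 0, 0), (5, 0, 0), (5, 1, 0)])
```

Because only the translation is searched, the cost grows with the number of candidate offsets rather than with a joint rotation, scale and translation space.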

Data image fusion using combinatorial maps

Kamel Bouchefra, Patrick Bonnin, Aymeric De Cabrol

Processing images involves a large amount of both rich and complex information. Sets of localized pixels identify objects; however, the same pixels, when contained in a larger set (the whole image, for example), may also represent other types of information. They may carry semantics, represent a context, and so on. Dealing with one type of information raises problems particular to one level of granularity. At the low level are, for example, filtering problems. At the mid-level, one may consider segmentation techniques, and at the high level are interpretation problems. Independently of the algorithmic questions, a structure that captures part or all of this granularity is of great interest. With this in mind, a structure based on the formalism of combinatorial maps is proposed here. A combinatorial map is a topological representation, in terms of darts, built to represent one object within the image. Permutations are then defined that operate on the darts. Their combinations allow exhaustive and easy traversal of the objects' edges. The combinations also allow relations among different objects to be represented, a feature one may use for modeling complex (3D) objects. Furthermore, different information (texture, geometry...) may be attached to the maps. The proposed structure is demonstrated here at the mid-level, within a fusion scheme that combines edge and region segmentations. The first is accurate in edge detection, while the second detects regions whose edges are less accurate. Combinatorial maps are then used to highlight the features mentioned above, and also to enhance the representation of region edges.

Towards parameter-free classification of sound effects in movies

Selina Chu, Shrikanth Narayanan, C.-C. Jay Kuo

The problem of identifying intense events via multimedia data mining in films is investigated in this work. Movies are mainly characterized by dialog, music, and sound effects. We begin our investigation with detecting interesting events through sound effects. Sound effects are neither speech nor music, but are closely associated with interesting events such as car chases and gun shots. In this work, we utilize low-level audio features including MFCC and energy to identify sound effects. It was shown in previous work that the Hidden Markov model (HMM) works well for speech/audio signals. However, this technique requires a careful choice in designing the model and choosing correct parameters. In this work, we introduce a framework that will avoid such necessity and works well with semi- and non-parametric learning algorithms.

Object detection and segmentation in camouflaged environments

S. Makrogiannis, M. Trujillo San-Martin

The detection and classification of objects in complicated backgrounds is a difficult image analysis problem. Previous methods have employed additional information from dynamic scene processing to extract the object of interest from its environment and have produced efficient results. However, object detection based solely on the information provided by still images has not been comprehensively studied. In this work, a different approach is proposed for cases in which dynamic information is not available for detection. The presented scheme consists of two main stages. The first is a still image segmentation approach that makes use of multi-scale information and graph-based grouping to partition the image scene into meaningful regions. This is followed by a texture-based classification algorithm, in which correspondence analysis is used for feature selection and optimisation purposes. Representative results are provided at each stage of the study to indicate the efficiency and potential of this approach for classification/detection in the difficult task of object detection in camouflaged environments.

A feature-based image registration algorithm using the multi-resolution approach combined with moments of inertia, with application to ladar imaging

Carlos Bejar, D. B. Megherbi

In this paper, a new feature-based approach for efficient automated image registration, with applications to multiple-view or multi-sensor LADAR imaging, is presented. Highly accurate and efficient image registration is much needed in ground or airborne LADAR imaging. The proposed approach combines the wavelet transform with moments of inertia to estimate the rigid transformation parameters between two overlapping images. The wavelet transform is applied here to extract a number of feature points. Each feature point is an edge point whose edge response is the maximum within a neighborhood of the edge point. The matching points are found using the normalized cross correlation technique. We show how the computational complexity of the image comparison process is improved by applying the cross correlation technique to 75% of the image size (the right 75% of the left image with the left 75% of the right image), where the overlapping area is supposed to be. From the matching points, the moments of inertia are applied to estimate the rigid transformation parameters. As is well known, the cross correlation technique for similarity measure is in general very sensitive to rotation. We show here how a modified cross correlation technique, which includes the rotation angle between points under measurement, is used to estimate the orientation difference between the overlapping images. In particular, for each feature point the orientation with respect to the horizontal Cartesian axis is calculated first; then the orientation difference between points under measurement is calculated. From the pool of orientation differences, the rotation value that is repeated the most is selected as the orientation difference between the two images. We show the robustness and accuracy of the proposed method in comparison to the existing state-of-the-art methods. It is automatic and can work with any partially overlapping images, independently of the size of the rotation angle between the two images considered. Finally, experimental results including significantly rotated and nonrotated images are presented to show the potential of the method.
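
Two of the building blocks named in this abstract, normalized cross correlation and the selection of the most frequent orientation difference, can be sketched as follows. This is only an illustration under assumptions: patches are flattened intensity lists, orientations are given in degrees, and differences are voted in 1-degree bins. The function names and the binning are hypothetical, and the paper's modified correlation that includes the rotation angle is not reproduced.

```python
import math
from collections import Counter


def ncc(patch_a, patch_b):
    """Normalized cross correlation of two equal-length intensity lists."""
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    dev_a = [v - mean_a for v in patch_a]
    dev_b = [v - mean_b for v in patch_b]
    denom = math.sqrt(sum(v * v for v in dev_a) * sum(v * v for v in dev_b))
    if denom == 0:
        return 0.0  # flat patch: correlation is undefined, report 0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def dominant_rotation(orientations_left, orientations_right):
    """Most frequent orientation difference (degrees) among matched points.

    Differences are wrapped to [0, 360) and voted in 1-degree bins
    (the bin width is an assumption of this sketch).
    """
    votes = Counter()
    for a, b in zip(orientations_left, orientations_right):
        votes[round((b - a) % 360.0) % 360] += 1
    return votes.most_common(1)[0][0]


similar = ncc([1, 2, 3, 4], [2, 4, 6, 8])
opposite = ncc([1, 2, 3, 4], [4, 3, 2, 1])
rotation = dominant_rotation([10, 20, 350, 45], [40, 50, 20, 90])
```

Voting for the most frequent difference makes the estimate tolerant of a minority of wrong matches, since an outlier only adds one vote to some other bin.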

Automatic image-driven segmentation of cardiac ventricles in cine anatomical MRI

Chris A. Cocosco, Wiro J. Niessen, Thomas Netsch, et al.

The automatic segmentation of the heart's two ventricles from dynamic ("cine") cardiac anatomical images, such as 3D+time short-axis MRI, is of significant clinical importance. Previously published automated methods have various disadvantages for routine clinical use. This work reports a novel automatic segmentation method that is very fast and robust against anatomical variability and image contrast variations. The method is mostly image-driven: it fully exploits the information provided by modern 4D (3D+time) balanced Fast Field Echo (bFFE) cardiac anatomical MRI, and makes only a few plausible assumptions about the images and the imaged heart. Specifically, the method needs neither geometrical shape models nor complex gray-level appearance models. The method simply uses the two ventricles' contraction-expansion cycle, as well as the ventricles' spatial coherence along the time dimension. The performance of the cardiac ventricle segmentation method was demonstrated through a qualitative visual validation on 32 clinical exams: no gross failures were found for the left ventricle (right ventricle) on 32 (30) of the exams. Also, a clinical validation of the resulting quantitative cardiac functional parameters was performed against a manual quantification of 18 exams; the automatically computed Ejection Fraction (EF) correlated well with the manually computed one: linear regression with RMS=3.7% (RMS expressed in EF units).

Posters - Wednesday

A fuzzy approach to supervised segmentation parameter selection for object-based classification

Travis L. Maxwell, Yun Zhang

Today's very high spatial resolution satellite sensors, such as QuickBird and IKONOS, pose additional problems for the land cover classification task as a consequence of the data's high spectral variability. To address this challenge, the object-based approach to classification shows considerable promise. However, the success of the object-oriented approach remains highly dependent on the successful segmentation of the image. Image segmentation using the Fractal Net Evolution approach has been very successful, producing visually convincing results at a variety of scales. However, this segmentation approach relies heavily on user experience in combination with trial and error to determine the parameters needed for a successful segmentation. This paper proposes a fuzzy approach to supervised segmentation parameter selection. Fuzzy logic is a powerful tool given its ability to manage vague input and produce a definite output. This property, combined with its flexible and empirical nature, makes this control methodology ideally suited to this task. This paper introduces the techniques of image segmentation using Fractal Net Evolution as background for the development of the proposed fuzzy methodology. The proposed system optimizes the selection of parameters by producing the most advantageous segmentation in a very time-efficient manner. Results are presented and evaluated in the context of efficiency and visual conformity to the training objects. Testing shows that this approach holds significant promise for improving the object-based classification workflow, and recommendations for future research are provided.

Investigation of crop nitrogen content based on image processing technologies

Yane Zhang, Minzan Li, Zenghui Xu, et al.

A special image sampler was developed to non-destructively take leaf images of cucumber plants in a greenhouse. The plants were grown under different nutrient conditions in order to induce nitrogen stress in the crop. The correlation between the nitrogen content of the cucumber leaf and the image properties of the leaf was then analyzed. The sampler is composed of eight lamps, a half-sphere shell, a platform, and a window used for fixing the camera. The lamps were arranged around the platform on which leaves were placed for imaging. The half-sphere shell was placed over the platform to reflect the light of the lamps. Since the reflected light from the shell was diffuse and symmetrical, the reflection noise of the leaf could be reduced and a high-quality image could be obtained. The correlation analysis between leaf images and nitrogen contents of leaves was conducted based on RGB mode and HSI mode. In RGB mode the G weight of the image showed a higher linear correlation with the nitrogen content of the cucumber leaf than the R weight and B weight, while in HSI mode the hue showed the same high linear correlation as the G weight. A new index derived from the G weight of RGB mode and the hue of HSI mode was suggested to estimate the nitrogen content of cucumber leaves. The result shows that the new index is practical.

Estimating soil moisture based on image processing technologies

Lihua Zheng, Minzan Li, Jianying Sun, et al.

Soil moisture is a critical factor in crop growth. Because of drought and low rainfall in northern China, it is necessary to introduce water-controlled irrigation. Therefore, estimating the soil moisture distribution rapidly and accurately is very important for decision making in water-saving irrigation. This study took a farmland in Beijing as the experimental field. The aerial image at each experimental spot was taken from a balloon at a height of 100 m above the land surface, the hyperspectral data of each test site were measured at the same time by a handheld spectroradiometer, and the soil moisture of each sample was obtained in the laboratory. From the aerial images of the experimental field, the characteristics of each image were calculated by image processing technologies, and the correlation between soil moisture and each image characteristic was then analyzed. First, the coefficients of correlation between soil moisture and RGB values, as well as between soil moisture and HSV values, were calculated, and corresponding estimation models were established with R² of 0.887 and 0.706, respectively. Second, using the combination of RGB and HSV values, another estimation model was established, and its R² reached 0.900. Finally, using the combination of the RGB values, HSV values and spectral reflectance data at 835 nm, a multiple linear regression model was also explored, whose R² reached 0.905. The result showed that the estimation of soil moisture content using aerial images and hyperspectral data was rapid and accurate. Furthermore, high-resolution images can now be obtained conveniently, so forecasting soil moisture accurately and in a timely manner by image processing technologies should become more practicable.

Use of a local regularity analysis by a wavelet analysis for glitch detection

C. Ordénovic, C. Surace, B. Torrésani, et al.

We present a method based on local regularity analysis to detect glitch signatures in an interferometric signal. The regularity is given by the local value of the Hölder exponent. This exponent can be derived using a Hölderian analysis, with the modulus of the wavelet coefficients calculated along wavelet transform modulus maxima lines (so-called WTMML) in suitably selected regions of the time-scale half-plane. Glitches, which are considered as discontinuities in the signal, show a Hölder exponent lower than a fixed threshold defined for a continuous signal (around -1). The method has been tested using computed histograms simulations derived from "HERSCHEL / SPIRE" theoretical signals. Statistics show that the optimization of the detection parameters should take into account variables such as sampling rate and signal-to-noise ratio, but is almost independent of the glitch amplitude.

Estimating growth status of winter wheat based on aerial images and hyperspectral data

Yunxia Han, Minzan Li, Liangliang Jia, et al.

The aim of this paper is to estimate the growth status and yield of winter wheat using aerial images and hyperspectral data obtained by unmanned aircraft, and then to perform precision management of the crop. The test farm was divided into 48 cells. Twenty-four cells were selected as the variable rate fertilization area, and the other 24 cells were used as a contrast area with low fertilization in the growth season. In 2004, aerial images of the winter wheat canopy were taken from an unmanned aircraft. The SPAD value of the crop leaf was acquired using a SPAD-502 chlorophyll meter, and then the hyperspectral reflectance of the crop canopy was measured by a handheld spectroradiometer. The vegetation indices, NDVI and DVI, were calculated from the hyperspectral data. The characteristics of the aerial images were used to evaluate the growth status. The RGB values of all cells were calculated from aerial images. The result showed that total nitrogen had better correlation with SPAD, NDVI, DVI, and RGB. NDVI and DVI had high correlation with the growth condition, and R/(R+G+B) and G/(R+G+B) had good correlation with the growth status and yield. The variable rate fertilization based on aerial images and NDVI was executed in the experimental cells. The yield map showed that the spatial variation of the yield was reduced and the total yield was increased. In the contrast cells, the spatial variation of the yield was greater than in the experimental cells because of the spatial variation of the field nutrition. Therefore, it is practical to use aerial images and hyperspectral data of the crop canopy to estimate the crop growth status.
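
The indices named in this abstract can be written out as a short sketch. It uses the standard definitions NDVI = (NIR - Red) / (NIR + Red) and DVI = NIR - Red; the abstract does not state which wavelengths were used for the near-infrared and red bands, so the reflectance inputs and all function names here are assumptions for illustration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total

def dvi(nir, red):
    """Difference vegetation index from NIR and red reflectance."""
    return nir - red

def chromatic_coordinates(r, g, b):
    """Return R/(R+G+B), G/(R+G+B), B/(R+G+B) for mean cell RGB values."""
    total = r + g + b
    if total == 0:
        raise ValueError("R + G + B must be non-zero")
    return r / total, g / total, b / total

# Illustrative values only, not data from the paper.
print(ndvi(0.5, 0.1), dvi(0.5, 0.1))
print(chromatic_coordinates(50, 100, 50))
```

Normalizing by R+G+B reduces the influence of overall brightness, which is one plausible reason ratios rather than raw RGB values are reported.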

Impulsive noise detection and removal with the use of iterative filter based on gradient adaptive neighborhoods

Mikhail Mozerov, Vitaly Kober

A new effective algorithm for impulse noise suppression is proposed. First, an iterative procedure composes a helper function associated with the filtered image; this helper function then enables efficient impulse detection for the conventional two-step filtering scheme. In our previous papers we suggested a simple adaptive algorithm for impulse noise detection in monochrome images that takes into account the size of signal gradient neighborhoods and image statistics. In this work the detection scheme is noticeably improved. Further investigations showed that the proposed modification of the gradient-based impulse detector greatly improves the filtering results in terms of both subjective and objective criteria.

Recognition of partially occluded objects using correlation filters with training

J. Angel González-Fraga, Vitaly Kober, Josue Álvarez-Borrego

One of the main problems in visual signal processing is incomplete information owing to the occlusion of objects by other objects. It is well known that correlation filters mainly use contour information of objects to carry out pattern recognition. However, in real applications object contours often disappear. In these cases conventional correlation filters without training perform poorly. In this paper two novel methods based on correlation filters with training are proposed for the recognition of partially occluded objects. The methods significantly improve the discrimination capability of conventional correlation filters. The first method trains a correlation filter with both a target and objects to be rejected. In the second proposal two different correlation filters are designed. They deal independently with contour and texture information to improve recognition of partially occluded objects. Computer simulation results for various test images are provided and discussed.

A fast predictive block matching motion estimation algorithm based on spatio temporal correlation information

Humaira Nisar, Tae-Sun Choi

Motion estimation is an important and computationally intensive task in video coding applications. Fast block matching algorithms reduce the computational complexity of motion estimation at the expense of accuracy. Fast motion estimation algorithms often assume a monotonic error surface in order to speed up the computations. The argument against this assumption is that the search might be trapped in a local minimum, resulting in inaccurate motion estimates. This paper investigates the state-of-the-art techniques for block-based motion estimation and presents an approach to improve the performance of block-based motion estimation algorithms. Specifically, this paper suggests a simple scheme that includes spatiotemporal neighborhood information for obtaining better estimates of the motion vectors. The predictive motion vector is then chosen as the initial search center. This predictive search center is found to be closer to the global minimum and thus decreases the effects of the monotonic error surface assumption and its impact on the motion field estimates. Based on the prediction, the algorithm also chooses between a center-biased and a uniform approach for slow or fast moving sequences. The experiments presented in this paper demonstrate the efficiency of the proposed approach.
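
The idea of a predicted search center can be sketched as follows. The abstract does not say how the spatiotemporal neighbors are combined or how the search pattern is chosen, so the component-wise median, the magnitude threshold, and every name below are assumptions used purely for illustration, not the authors' method.

```python
from statistics import median

def predict_search_center(spatial_mvs, temporal_mvs):
    """Component-wise median of neighboring motion vectors (assumed predictor).

    spatial_mvs: (dx, dy) vectors of already-coded neighbors in the current frame.
    temporal_mvs: (dx, dy) vectors of co-located blocks in the previous frame.
    """
    candidates = list(spatial_mvs) + list(temporal_mvs)
    if not candidates:
        return (0, 0)  # no neighbors available: fall back to the zero vector
    return (median(dx for dx, _ in candidates),
            median(dy for _, dy in candidates))

def choose_search_pattern(predicted_mv, threshold=2):
    """Pick a pattern from the predicted motion magnitude (hypothetical rule)."""
    dx, dy = predicted_mv
    return "center_biased" if max(abs(dx), abs(dy)) <= threshold else "uniform"

center = predict_search_center([(2, 0), (3, 1), (2, 1)], [(8, -4), (2, 0)])
print(center, choose_search_pattern(center))
```

A median is robust to a single outlying neighbor such as `(8, -4)` above, which is why it is a common choice of predictor in block matching.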

A reduced color approach to high quality cartoon coding

Yi-Chen Tsai, Ming-Sui Lee, Meiyin Shen, et al.

An algorithm that integrates table indexing and quad-tree decomposition is proposed for cartoon image compression in this work. The proposed method includes three steps. First, colors in the color palette are selected based on the input image and a set of training images. Second, the input image is divided into blocks of size 16 by 16. The number of colors inside each block is checked. If the block has one uniform color or exactly two colors, no further processing is required. Otherwise, quad-tree decomposition is performed for this block. The subdivision continues until all subblocks have either one or two colors. Then, the code for each subblock is output in depth-first order. If the subblock size reaches 2 x 2 and the number of colors in that block is still more than 2, no further subdivision is performed and a code that indicates the colors of the 4 pixels is output. Finally, to further reduce the size, the data part of the output stream is losslessly compressed by the LZW method. Experimental results are given to demonstrate the superior performance of the proposed method for cartoon image compression.
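
The block subdivision rule in this abstract can be sketched as a small recursive routine. The tuple layout of the emitted codes is a hypothetical choice, the image is assumed to hold palette indices already, and the per-pixel bitmap that a two-color block would also need, as well as the LZW stage, are omitted.

```python
def encode_block(img, x, y, size, out):
    """Quad-tree decomposition of one square block, in depth-first order.

    img: 2D list of palette indices (palette selection is assumed done).
    Emits ("leaf", x, y, size, colors) for blocks with one or two colors and
    ("raw", x, y, size, pixels) for 2 x 2 blocks that still have more than two.
    """
    colors = sorted({img[r][c]
                     for r in range(y, y + size)
                     for c in range(x, x + size)})
    if len(colors) <= 2:
        out.append(("leaf", x, y, size, tuple(colors)))
    elif size <= 2:
        pixels = tuple(img[r][c]
                       for r in range(y, y + size)
                       for c in range(x, x + size))
        out.append(("raw", x, y, size, pixels))
    else:
        half = size // 2
        for dy in (0, half):          # visit the four quadrants row by row
            for dx in (0, half):
                encode_block(img, x + dx, y + dy, half, out)
    return out

# A 4 x 4 toy block: the top-left quadrant has four colors, the rest is uniform.
toy = [[0, 1, 5, 5],
       [2, 3, 5, 5],
       [5, 5, 5, 5],
       [5, 5, 5, 5]]
for code in encode_block(toy, 0, 0, 4, []):
    print(code)
```

The same routine applies unchanged to the 16 by 16 blocks mentioned in the abstract, since 16 halves cleanly down to 2.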

Tempo and tension analysis for MTV-style home video authoring

Shen-Zheng Wang, Shih-Hung Lee, C.-C. Jay Kuo

Automatic authoring of MTV-style home video using music tempo and visual tempo analysis is investigated in this research. The music tempo is extracted using onset analysis. The frame-level visual tempo is detected based on the degree of motion between consecutive frames. The object-level visual tempo is determined based on the tension analysis of facial expression. Finally, the authoring methodology is presented, which consists of matching music and visual tempo to produce MTV-style video. Experimental results using baby home video are given to demonstrate the performance of the proposed algorithm.

Real-time pattern recognition with adaptative correlation filters

Victor H. Diaz-Ramirez, Vitaly Kober, Josue Álvarez-Borrego

One of the most important performance measures for pattern recognition systems is the discrimination capability, or how well a system can recognize objects of interest as well as reject wrong objects and the background. In real-time opto-digital pattern recognition systems the light efficiency (how much of the input light energy passes through the system) and the parameters of the optical setup should also be taken into account. In this work we propose new adaptive composite correlation filters based on synthetic discriminant functions, which are able to significantly improve the discrimination capability and which yield a high light efficiency. An iterative design procedure is used for the digital design of a phase-only filter. A desired value of the discrimination capability is obtained by exploiting information about the target signal, background, and objects to be rejected. Next, the designed filter with a high light efficiency is implemented in an optical setup. We use real scenes to test the proposed filter in a real-time opto-digital system. Experimental results obtained in the system are compared with those obtained by computer simulation.

Estimation of the point response function for undersampled data for the infrared camera of the Spitzer Space Telescope

David Makovoz, Patrick Lowrance

Point Response Function (PRF) is an important characteristic of the combination of an optical system and the detector array. It has various applications, such as accurate photometry and astrometry, image interpolation and deconvolution. We present a technique of PRF estimation for undersampled detectors. We present the results of application of this technique to the data taken by the Infrared Array Camera (IRAC) of the Spitzer Space Telescope. The technique capitalizes on the numerous observations of point sources that cover the whole detector array as well as the area of an individual pixel. Data fitting is used to center the point source images on the sub-pixel level and to normalize them by the point source flux. They are subsequently resampled and shifted to a common grid using bicubic interpolation. Great redundancy of the data allows for effective outlier rejection. The quality of PRF estimation is verified using simulated images and real images taken by the Spitzer Space Telescope.

Detecting and measuring human bodies over wide work fields by using stereo cameras

Jian Lu, Kyoko Hamajima, Subarna Lata Tuladhar

In factory automation, in order to avoid collisions between human bodies and autonomous mobile machines, a stereo-camera-based method was proposed to implement intelligent sensing of human bodies. An experimental system for this purpose was built, and an evaluation experiment was performed. The experiment shows that accurate results were obtained when no object other than a human body entered or exited the monitored area.

Target detection and tracking in airborne video imagery using statistical snake and mean shift

Shuqun Zhang, Mo Chen

Target detection and tracking in real-time videos are very important and yet difficult for many applications. Numerous detection and tracking techniques have been proposed, typically by imposing some constraints on the motion and image to simplify the problem depending on the application and environment. This paper focuses on target detection and tracking in airborne videos, in which not much simplification can be made. We have recently proposed a combined/switching detection and tracking method which is based on the combination of a spatio-temporal segmentation and statistical snake model. This paper improves the statistical snake model by incorporating both edge and region information and enhancing the snake contour deformation. A more complex motion model is used to improve the accuracy of object detection and size classification. Mean-shift is integrated into the proposed combined method to track small point objects and deal with the problem of object disappearance-reappearance. Testing results using real UAV videos are provided.

Preprocessing and compression of digital holographic images

Shuqun Zhang, Mo Chen

Digital hologram compression has recently received increasing attention due to easy acquisition and new applications in three-dimensional information processing. Standard compression algorithms perform poorly on complex-valued holographic data. This paper studies quantization techniques for lossy compression of digital holographic images, where three commonly used quantizers are compared. Our observations show that the real and imaginary components of holograms and their corresponding Fourier transform coefficients exhibit a Laplacian and Gaussian distribution, respectively. It is therefore possible to design an optimal quantizer for holographic data compression. To further increase the compression ratio, preprocessing techniques to extract the region of interest are presented. These include Fourier plane filtering and statistical snake image segmentation.

3D watermarking scheme in stereo vision system

Dong-Choon Hwang, Kyung-hoon Bae, Jung-Hwan Ko, et al.

In this paper, a 3D watermarking scheme for a stereo vision system is proposed. Watermark data is embedded into the right image of a stereo image pair by using the DWT algorithm, and disparity data is extracted from the left and watermarked right images. The disparity data and the left image are then transmitted to the recipient through the communication channel. At the receiver, the watermarked right image is reconstructed from the received left image and disparity data by employing the FMA. From the difference between the watermarked and original right images, the embedded watermark image can finally be extracted. From experiments using the stereo image pair 'Friends' and the watermark data '3DRC', it is found that the PSNR of the watermark image extracted from the reconstructed right image through the FMA and DWT algorithms can be increased up to 2.87 dB, 2.58 dB on average, compared with those of the FMA and DCT algorithm when the quantizer scale (Q.S) is kept at 16 and 20, respectively.

A bi-directional stereo matching algorithm based on adaptive matching window

Kyung-Hoon Bae, Dong-Sik Yi, Seung Cheol Kim, et al.

In this paper, a bi-directional stereo matching algorithm based on an adaptive matching window is proposed. By adaptively predicting the mutual correlation between a stereo image pair using the proposed algorithm, the bandwidth of the stereo input image pair can be compressed to the level of a conventional 2D image, and a predicted image can also be effectively reconstructed using a reference image and disparity vectors. In the proposed algorithm, feature values are first extracted from the input stereo image pair. Then, a matching window for stereo matching is adaptively selected depending on the magnitude of these feature values. That is, for a region having larger feature values, a smaller matching window is selected, while for the opposite case a larger matching window is selected, by comparison with predetermined threshold values. This approach is able not only to reduce the mismatching of disparity vectors that occurs in conventional dense disparity estimation with a small matching window, but also to reduce the blocking effects that occur in coarse disparity estimation with a large matching window. In addition, experiments using the stereo sequences 'Man' and 'Fichier' show that the proposed algorithm improves the PSNR of a reconstructed image by about 6.78 dB on average at ± 30 search ranges compared with conventional algorithms. It is also found that there is almost no difference between an original image and an image reconstructed through the proposed algorithm, in contrast to conventional algorithms.
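
The window selection rule in this abstract (larger feature values select a smaller matching window) can be sketched in a few lines. The threshold values, the window sizes, and the function name are hypothetical; the abstract states only that predetermined thresholds are compared against the feature magnitude.

```python
def select_matching_window(feature_value, thresholds=(10.0, 30.0), sizes=(16, 8, 4)):
    """Return a matching window size that shrinks as the feature value grows.

    thresholds: ascending feature-value thresholds (assumed values).
    sizes: window sizes, one more entry than thresholds, largest first.
    """
    if len(sizes) != len(thresholds) + 1:
        raise ValueError("sizes must have exactly one more entry than thresholds")
    for threshold, size in zip(thresholds, sizes):
        if feature_value < threshold:
            return size
    return sizes[-1]  # feature value at or above every threshold

# Flat region, moderately textured region, strongly textured region.
print([select_matching_window(v) for v in (5.0, 20.0, 50.0)])
```

Small windows in textured regions preserve disparity detail, while large windows in flat regions give the matcher enough context to avoid mismatches.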

An adaptive image enhancement algorithm and real-time implementation for an infrared imaging system

Ajay Kumar, S. Sarkar, R. P. Agarwal

Present-day thermal imaging systems are designed around highly sensitive infrared focal plane arrays (IRFPAs), in which most of the preprocessing is done on the focal plane itself. In spite of many advances in the design of IRFPAs, they have inherent non-uniformities and instabilities, which limit their sensitivity, dynamic range and other advantages. Whenever there is little or no thermal variation in the scene, the thermal imager is unable to separate the target of interest from its background. Thus, most infrared imagery suffers from poor contrast and high noise, so that objects are not clearly visible. The problem is complicated by dynamic backgrounds and the non-availability of background clutter characteristics. In this paper, we present an adaptive approach for image contrast enhancement that expands the range of the digital numbers in a self-adaptive manner. The algorithm has been tested on field-recorded data, and it is observed that the technique offers excellent results for thermal imagers operating in both the 3-5 μm and 8-12 μm wavelength regions.

An adaptive wavelet-based deblocking algorithm for MPEG-4 codec

Trieu-Kien Truong, Shi-Huang Chen, Rong-Yi Jhang

This paper proposes an adaptive wavelet-based deblocking algorithm for the MPEG-4 video coding standard. The novelty of this method is that the deblocking filter uses a wavelet-based threshold to detect and analyze artifacts on coded block boundaries. This threshold value is based on the difference between the wavelet transform coefficients of image blocks and the coefficients of the entire image. Therefore, the threshold value adapts to different images and characteristics of blocking artifacts. One can then attenuate those artifacts by applying a filter selected according to this threshold value. It is shown in this paper that the proposed method is robust, fast, and works remarkably well for the MPEG-4 codec at low bit rates. Another advantage of the new method is that it retains sharp features in the decoded frames since it only removes artifacts. Experimental results show that the proposed method can achieve significantly improved visual quality and increase the PSNR of the decoded video frames.

A novel face recognition system using the binary phase-only filter via optimal correlation thresholding

D. B. Megherbi, S. Rastogi

With the increasing amount of research in the area of face recognition and the influx of new theories, the development of a reliable and accurate face recognition system is a vital requisite. In this paper, a novel face recognition system via optical correlation thresholding using the Binary Phase-Only Filter is presented. The system is developed using the database created by AT&T Laboratories Cambridge. In the course of the paper we also study the effect of an illumination correction algorithm on the recognition results.

Effect of illumination correction on face recognition rates for the binary phase-only filter (BPOF) and the extended normalized correlation methods

Dalila Megherbi, Soumitra Rastogi

In this paper, we study the effect of illumination on face recognition. The challenging database of images created by AT&T Laboratories Cambridge has been used for all our tests. Since images of the same individual in the database have been taken under varying light sources, illumination across the images is not uniform. We pre-processed all the database images for non-uniform illumination correction. Correlation tests were performed on these pre-processed images using two methods - first, by simulating a simple correlator employing the Binary Phase Only Filter (BPOF) and second, using the extended correlation method suggested by Kyatkin and Chirikjian. Results are presented to show the effect of illumination correction on both methods.

A generic and high performance FPGA based signal processing architecture for staring thermal imaging system

Ajay Kumar, S. Sarkar, R. P. Agarwal

A fully reconfigurable architecture for realizing a thermal imaging system is proposed. This architecture provides a generic solution to signal processing issues related to thermal imaging, offers the advantage of a low-power design, and accommodates sensor upgrades. The proposed architecture is implemented using a Xilinx Virtex II 2000K gate FPGA and a 320 x 256 element InSb IRFPA. The system can store up to six gain and offset tables, which can be optimized for different environmental conditions, and also allows the offset coefficients to be updated dynamically. Image frames are presented that show the successful implementation of the architecture.
