Proceedings Volume 7881

Multimedia on Mobile Devices 2011; and Multimedia Content Access: Algorithms and Systems V


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 8 February 2011
Contents: 9 Sessions, 37 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2011
Volume Number: 7881

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Emerging Mobile Applications I
  • Data Processing and Evaluation for Mobile Applications
  • 3D Media for Mobile Devices
  • Emerging Mobile Applications II
  • Interactive Paper Session for Multimedia on Mobile Devices 2011
  • Multimedia Tagging
  • Multimedia Systems
  • Bay Area Multimedia
  • Interactive Paper Session for Multimedia Content Access: Algorithms and Systems V
Emerging Mobile Applications I
Towards a multimedia remote viewer for mobile thin clients
B. Joveski, P. Simoens, L. Gardenghi, et al.
Consider a traditional mobile user who wants to connect to a remote multimedia server. To allow them to enjoy remotely the same user experience (play, interact, edit, store, and share capabilities) as in a traditional fixed LAN environment, several deadlocks must be dealt with: (1) heavy and heterogeneous content must be sent through a bandwidth-constrained network; (2) the displayed content should be of good quality; (3) user interaction should be processed in real time; and (4) the complexity of the practical solution must not exceed the capabilities of the mobile client in terms of CPU, memory, and battery. The present paper takes up this challenge and presents a fully operational MPEG-4 BiFS solution.
Multimodal sensing-based camera applications
The increased sensing and computing capabilities of mobile devices can provide an enhanced mobile user experience. Integrating the data from different sensors offers a way to improve application performance in camera-based applications. A key advantage of using cameras as an input modality is that they enable recognition of context; computer vision has therefore traditionally been utilized in user interfaces to observe and automatically detect user actions. Imaging applications can also make use of various sensors to improve the interactivity and robustness of the system. In this context, two applications fusing sensor data with the results of video analysis have been implemented on a Nokia Nseries mobile device. The first is a real-time user interface for browsing large images, in which the display is controlled by the motion of the user's hand, using the built-in sensors as complementary information. The second is a real-time panorama builder that uses the device's accelerometers to improve overall quality, also providing instructions during capture. The experiments show that fusing sensor data improves camera-based applications, especially when conditions are not optimal for approaches using camera data alone.
Mobile text messaging solutions for obesity prevention
Cellular telephony has become a prime example of the co-evolution of human society and information technology. This trend is also reflected in health care and health promotion projects that include cell phones in the data collection and communication chain. While many successful projects have been realized, a review of phone-based data collection techniques reveals that existing technologies do not completely address the needs of health promotion research. The paper presents approaches that close this gap by extending existing versatile platforms. The messaging systems are designed for health-promotion research aimed at preventing obesity and obesity-related health disparities among low-income Latino adolescent girls. Messaging and polling mechanisms are used to communicate with the target constituency and automatically process response data. Preliminary survey data provide insight into phone availability and technology perception for the study group.
Data Processing and Evaluation for Mobile Applications
Quality and noise measurements in mobile phone video capture
Doina Petrescu, John Pincenti
The quality of videos captured with mobile phones has become increasingly important particularly since resolutions and formats have reached a level that rivals the capabilities available in the digital camcorder market, and since many mobile phones now allow direct playback on large HDTVs. The video quality is determined by the combined quality of the individual parts of the imaging system including the image sensor, the digital color processing, and the video compression, each of which has been studied independently. In this work, we study the combined effect of these elements on the overall video quality. We do this by evaluating the capture under various lighting, color processing, and video compression conditions. First, we measure full reference quality metrics between encoder input and the reconstructed sequence, where the encoder input changes with light and color processing modifications. Second, we introduce a system model which includes all elements that affect video quality, including a low light additive noise model, ISP color processing, as well as the video encoder. Our experiments show that in low light conditions and for certain choices of color processing the system level visual quality may not improve when the encoder becomes more capable or the compression ratio is reduced.
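As a concrete illustration of the full-reference comparison between encoder input and reconstructed sequence that the abstract describes, a basic PSNR computation can be sketched as follows. PSNR is a standard full-reference metric; the paper's actual choice of metrics is not specified here, so this is illustrative only:

```python
import math

def psnr(reference, reconstructed, max_value=255):
    """Full-reference PSNR (dB) between two equally sized 8-bit frames,
    given as flat lists of pixel values."""
    if len(reference) != len(reconstructed):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)

# Small compression errors still yield a high PSNR on this toy 4-pixel frame.
ref = [10, 20, 30, 40]
rec = [11, 19, 30, 41]
score = psnr(ref, rec)   # roughly 49.4 dB
```

In a system-level study such as this one, the reference frame itself changes with lighting and color processing, which is why the metric is computed between encoder input and output rather than against the original scene.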
3D scene reconstruction based on multi-view distributed video coding in the Zernike domain for mobile applications
V. Palma, M. Carli, A. Neri
In this paper a multi-view distributed video coding scheme for mobile applications is presented. Specifically, a new fusion technique between temporal and spatial side information in the Zernike moments domain is proposed. Distributed video coding introduces a flexible architecture that enables the design of video encoders of much lower complexity than their traditional counterparts. The main goal of our work is to generate at the decoder the side information that optimally blends temporal and inter-view data. Multi-view distributed coding performance strongly depends on the quality of the side information built at the decoder; to improve that quality, a spatial view compensation/prediction in the Zernike moments domain is applied. Spatial and temporal motion activity are fused together to obtain the overall side information. The proposed method has been evaluated by rate-distortion performance for different inter-view and temporal estimation quality conditions.
Psycho-physiological effects of head-mounted displays in ubiquitous use
Takashi Kawai, Jukka Häkkinen, Keisuke Oshima, et al.
In this study, two experiments were conducted to evaluate the psycho-physiological effects of practical use of a monocular head-mounted display (HMD) in a real-world environment, assuming consumer-level applications such as viewing video content and receiving navigation information while walking. In Experiment 1, the workload was examined for different types of stimulus presentation using an HMD (monocular or binocular, see-through or non-see-through). Experiment 2 focused on the relationship between the real-world environment and the visual information presented on a monocular HMD. The workload was compared between a case where participants walked while viewing video content unrelated to the real-world environment, and a case where participants walked while viewing visual information, such as navigation, that augments the real-world environment.
Progressive imagery with scalable vector graphics
Georg Fuchs, Heidrun Schumann, René Rosenbaum
Vector graphics can be scaled without loss of quality, making them suitable for mobile image communication, where a given graphic must typically be represented in high quality across a wide range of screen resolutions. One problem is that file size increases rapidly as content becomes more detailed, which can degrade response times and efficiency in mobile settings. Analogous issues for large raster imagery have been overcome using progressive refinement schemes. Similar ideas have already been applied to vector graphics, but an implementation compliant with a major, widely adopted standard is still missing. In this publication we show how to provide progressive refinement schemes based on the extensible Scalable Vector Graphics (SVG) standard. We propose two strategies: decomposition of the original SVG and incremental transmission using (1) several linked files and (2) element-wise streaming of a single file. The publication discusses how both strategies are employed in mobile image communication scenarios where the user can interactively define RoIs for prioritized image communication, and reports initial results obtained from a prototype client/server setup.
3D Media for Mobile Devices
Mobile 3D quality of experience evaluation: a hybrid data collection and analysis approach
Timo Utriainen, Jyrki Häyrynen, Satu Jumisko-Pyykkö, et al.
The paper presents a hybrid approach to studying the user-experienced quality of 3D visual content on mobile autostereoscopic displays. It combines extensive subjective tests with the collection and objective analysis of eye-tracking data. 3D cues significant for mobile devices are simulated in the generated 3D test content. The methodology for conducting subjective quality evaluation includes hybrid data collection of quantitative quality preferences, qualitative impressions, and binocular eye-tracking. We present early results of the subjective tests along with eye movement reaction times, areas of interest, and heatmaps obtained from raw eye-tracking data after statistical analysis. The study contributes to the questions of what is important to visualize on portable autostereoscopic displays and how to maintain and visually enhance the quality of 3D content for such displays.
Overcome the shortcoming in mobile stereoscopy
Mobile devices lack the space to configure cameras for either ortho- or hyperstereoscopic conditions and offer only a small display. Mobile stereoscopy therefore cannot provide the observer a strong sense of presence and depth. To address this problem, we focus on a depth-sense control method based on a switchable stereo camera alignment. In the converging configuration, the fusible stereo area becomes wider than in a parallel configuration using the same focal length; the stereo fusible area formed by the converging configuration is thus equivalent to that of a parallel configuration with a shorter focal length, producing a kind of zoom-out effect in the reconstructed depth sense. In the diverging configuration, the fusible stereo area becomes narrower than in the parallel case; by the same reasoning, the diverging configuration behaves like a parallel configuration with an increased focal length, producing a zoom-in effect. The stereoscopic zoom-in depth effect changes rapidly with increasing angle, whereas the zoom-out effect changes relatively slowly.
Comparative study of autostereoscopic displays for mobile devices
We perform a comparative analysis of the visual quality of multiple 3D displays: seven portable ones, and a large 3D television set. We discuss two groups of parameters that influence the perceived quality of mobile 3D displays. The first group is related to the optical parameters of the displays, such as crosstalk or the size of sweet spots. The second group includes content-related parameters, such as the objective and subjective comfort disparity range suitable for a given display. We identify eight important parameters to be measured; for each parameter we present the measurement methodology and give comparative results for each display. Finally, we discuss the ability of each display to visualize downscaled stereoscopic HD content with sufficient visual quality.
Subjective evaluation of mobile 3D video content: depth range versus compression artifacts
Satu Jumisko-Pyykkö, Tomi Haustola, Atanas Boev, et al.
Mobile 3D television is a new form of media experience, which combines the freedom of mobility with the greater realism of presenting visual scenes in 3D. Achieving this combination is challenging, as a greater viewing experience has to be delivered with the limited resources of the mobile delivery channel, such as limited bandwidth and a power-constrained handheld player. This challenge creates a need for tight optimization of the overall mobile 3DTV system. Depth and compression artifacts in the played 3D video are two major factors that influence the viewer's subjective quality of experience and satisfaction. The primary goal of this study has been to examine the influence of varying depth and compression artifacts on the subjective quality of experience for mobile 3D video content. In addition, the influence of the studied variables on simulator sickness symptoms has been examined, and a vocabulary-based descriptive quality-of-experience study has been conducted for a subset of variables in order to understand the perceptual characteristics in detail. In the experiment, 30 participants evaluated the overall quality of different 3D video contents with varying depth ranges, compressed with varying quantization parameters. The test video content was presented on a portable autostereoscopic LCD display with a horizontal double-density pixel arrangement. The results of the psychometric study indicate that compression artifacts are the dominant factor determining the quality of experience, compared to varying depth range. More specifically, contents with strong compression were rejected by the viewers and deemed unacceptable. The results of the descriptive study confirm the dominance of visible spatial artifacts, along with the added value of depth for artifact-free content. The level of visual discomfort was determined not to be objectionable.
Development of 3D mobile receiver for stereoscopic video and data service in T-DMB
Gwangsoon Lee, Hyun Lee, Kugjin Yun, et al.
In this paper, we present the development of a 3D T-DMB (three-dimensional digital multimedia broadcasting) receiver providing 3D video and data services. First, for the 3D video service, the developed receiver is capable of decoding and playing 3D AV content that is encoded by the simulcast encoding method and transmitted via the T-DMB network. Second, the developed receiver can render stereoscopic multimedia objects delivered using the MPEG-4 BIFS technology that is also employed in T-DMB. In particular, this paper introduces the hardware and software architecture of the 3D T-DMB receiver and its implementation. The developed 3D T-DMB receiver is capable of generating stereoscopic viewing on a glasses-free 3D mobile display; we therefore propose parameters for designing the 3D display, together with an evaluation of the viewing angle and distance through both computer simulation and actual measurement. Finally, the availability of the 3D video and data services is verified using an experimental system comprising the implemented receiver and a variety of service examples.
A right scaled depth sense formed by using a distorted objective space based on CG stereoscopy
Kwang-Hoon Lee, Dong-Wook Kim, Gi-Mun Um, et al.
In this paper, we suggest a new way to overcome the stereoscopic depth distortion common in stereoscopy based on computer graphics (CG). The idea is to transform the objective space into a distorted space so that a correctly perceived depth sense results, as if one were seeing a scaled object volume well adjusted to the user's stereoscopic circumstances. All parameters related to the distortion, such as focal length, inter-camera distance, inner angle between the camera axes, display size, viewing distance, and eye distance, can be altered by the amount of inverse distortion in the transformed objective space, owing to the linear relationship between the reconstructed image space and the objective space. The depth distortion is thus removed after the image reconstruction process with a distorted objective space. We prepared a stereo image having a correctly scaled depth from -200 mm to +200 mm at 100 mm intervals about the display plane in a standard stereoscopic viewing environment and showed it to 5 subjects. All subjects recognized and indicated the designed depths.
Emerging Mobile Applications II
Smart travel guide: from internet image database to intelligent system
To help tourists discover a city, a region, or a park, many options are provided by public tourism travel centers, free online guides, and dedicated guidebooks. Nonetheless, these guides provide only mainstream information that does not conform to a particular tourist's behavior. On the other hand, several online image databases allow users to upload their images and localize each image on a map. These websites are representative of tourism practices and constitute a proxy for analyzing tourism flows. This work therefore intends to answer the question: knowing what I have visited and what other people have visited, where should I go now? This process requires profiling users, sites, and photos. Our paper presents the acquired data and the relationships between photographers, sites, and photos, and introduces the model designed to correctly estimate the interest of each tourism site. The third part shows an application of our schema: a smart travel guide on geolocated mobile devices. This Android application is a travel guide truly matching the user's wishes.
Revised benchmarking of contact-less fingerprint scanners for forensic fingerprint detection: challenges and results for chromatic white light scanners (CWL)
Stefan Kiltz, Marcus Leich, Jana Dittmann, et al.
Mobile contact-less fingerprint scanners can be very important tools for the forensic investigation of crime scenes. To be admissible in court, data and the collection process must adhere to rules w.r.t. the technology and procedures of acquisition, processing, and the conclusions drawn from that evidence. Currently, no generally accepted benchmarking methodology supports some of the rules regarding localisation, acquisition, and pre-processing using contact-less fingerprint scanners. Benchmarking is seen as essential to rate those devices according to their usefulness for investigating crime scenes. Our main contribution is a revised version of our extensible framework for the methodological benchmarking of contact-less fingerprint scanners using a collection of extensible categories and items. The suggested main categories describing a contact-less fingerprint scanner are: country-specific forensic legal requirements, technical properties, application-related aspects, input sensory technology, pre-processing algorithm, and tested object and materials. Using these, it is possible to benchmark fingerprint scanners and describe the setup and the resulting data. Additionally, benchmarking profiles for different usage scenarios are defined. First results for all suggested benchmarking properties, which will be presented in detail in the final paper, were gained using an industrial device (FRT MicroProf200) and conducting 18 tests on 10 different materials.
Interactive Paper Session for Multimedia on Mobile Devices 2011
Optimizing bandwidth and storage requirements for mobile images using perceptual-based JPEG recompression
Tamar Shoham, Dror Gill, Sharon Carmel
The increasing quality and resolution of cellular phone cameras is creating a significant burden on mobile device storage and network bandwidth requirements. In this paper we propose a novel method for recompressing digital photos which significantly reduces their file size without affecting their spatial resolution or perceptual quality. By operating within the scope of baseline JPEG, we ensure that the resulting image files can be viewed and edited with any software, browser, or consumer device. The proposed method iteratively recompresses the input image to varying degrees while computing the value of a novel, robust, perceptual image quality measure. When the image quality measure falls within a pre-determined perceptual quality range, the iterative compression process ends and the resulting image is output. This process ensures that nearly the maximum amount of compression that still yields a perceptually identical image is applied to each input image. Subjective testing of the obtained results has shown that using our proposed method, the file size of photos may be reduced by a factor of up to 4 (a 75% reduction) without affecting their visual quality. The feasibility of the proposed method for mobile applications has been established by an implementation on the iPhone 3GS.
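The iterative search described in the abstract can be sketched generically. The codec call and the perceptual metric below are caller-supplied stand-ins, since the paper's quality measure is its own contribution and is not specified here; the quality-setting schedule and target band are assumed illustrative values:

```python
def recompress(image, compress, quality_of, target=(0.90, 0.94),
               qualities=range(95, 4, -5)):
    """Iteratively recompress `image` at decreasing quality settings and stop
    once the perceptual score first falls inside the `target` band.
    `compress(image, q)` and `quality_of(original, candidate)` are
    caller-supplied stand-ins for the JPEG codec and the perceptual metric."""
    lo, hi = target
    best = compress(image, qualities[0])
    for q in qualities:
        candidate = compress(image, q)
        score = quality_of(image, candidate)
        if score < lo:       # quality dropped below the band: keep previous result
            return best
        best = candidate
        if score <= hi:      # inside the band: near-maximal acceptable compression
            return best
    return best

# Toy demo: the "codec" just records the quality setting, and the "metric"
# maps that setting linearly to a perceptual score in [0, 1].
result = recompress("photo.jpg",
                    compress=lambda img, q: (img, q),
                    quality_of=lambda img, cand: cand[1] / 100)
```

In the toy demo the search stops at quality 90, the first setting whose score of 0.90 lands inside the target band.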
MPEG-4 AVC stream watermarking by m-QIM techniques
M. Hasnaoui, M. Belhaj, M. Mitrea, et al.
The present paper is devoted to the protection of MPEG-4 AVC (a.k.a. H.264) video streams by means of watermarking techniques. The embedding process is carried out in the quantized index domain and relies on the m-QIM (m-ary Quantisation Index Modulation) principle. In order to cope with the MPEG-4 AVC peculiarities, Watson's perceptual model is reconsidered and discussed. The experimental results correspond to the MEDIEVALS (a French national project) corpus of 4 video sequences of about 15 minutes each, encoded at 512 kbps. Transparency is assessed by both subjective and objective measures. The transcoding (down to 64 kbps) and geometric (StirMark) attacks result in BERs of 6.75% and 11.25%, respectively. In order to improve robustness, an MPEG-4 AVC syntax-driven counterattack is considered: this way, the two above-mentioned attacks lead to BERs of 2% and 10%, respectively. Finally, the overall theoretical relevance of these results is discussed by estimating the related channel capacities.
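The m-ary QIM principle the abstract relies on can be illustrated on a single scalar coefficient. This is a textbook sketch of m-QIM, not the paper's MPEG-4 AVC-specific implementation; the step size `delta` and alphabet size `m` are assumed illustrative values:

```python
def qim_embed(value, symbol, m, delta):
    """Embed one m-ary symbol by quantizing the host value onto the
    lattice of step `delta` shifted by symbol * delta / m."""
    offset = symbol * delta / m
    return round((value - offset) / delta) * delta + offset

def qim_extract(value, m, delta):
    """Recover the symbol whose shifted lattice lies closest to the
    (possibly perturbed) received value."""
    return min(range(m),
               key=lambda s: abs(value - qim_embed(value, s, m, delta)))

watermarked = qim_embed(10.3, symbol=2, m=4, delta=8)      # -> 12.0
assert qim_extract(watermarked + 0.9, m=4, delta=8) == 2   # survives small noise
```

Decoding stays correct as long as the perturbation is smaller than half the distance between adjacent shifted lattices (delta / 2m), which is the trade-off between robustness and transparency the paper tunes.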
Generalized phi number system and its applications for image decomposition and enhancement
Sarkis Agaian, Yicong Zhou
Technologies and applications of field-programmable gate arrays (FPGAs) and digital signal processing (DSP) require both new customizable number systems and new data formats. This paper introduces a new class of parameterized number systems, namely the generalized Phi number system (GPNS). By selecting appropriate parameters, the new system reduces to the traditional Phi number system, the binary number system, the beta encoder, and other commonly used number systems. GPNS also creates new opportunities for developing customized number systems, multimedia security systems, and image decomposition and enhancement systems. A new image enhancement algorithm is also developed by integrating GPNS-based bit-plane decomposition with Parameterized Logarithmic Image Processing (PLIP) models. Simulation results are given to demonstrate the performance of GPNS.
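The GPNS itself is not specified in the abstract, but the flavor of a non-binary positional decomposition can be shown with the closely related Fibonacci (Zeckendorf) representation, the integer counterpart of the golden-ratio (Phi) number system. This sketch is illustrative only and is not the paper's construction:

```python
def zeckendorf(n):
    """Greedy Zeckendorf decomposition: write a positive integer as a sum of
    non-consecutive Fibonacci numbers, returning the digit vector
    (most-significant first) over the basis [..., 8, 5, 3, 2, 1]."""
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    digits, remainder = [], n
    for f in reversed(fibs):
        if f <= remainder:
            digits.append(1)
            remainder -= f
        else:
            digits.append(0)
    return digits

# 10 = 8 + 2 over the basis [13, 8, 5, 3, 2, 1]
assert zeckendorf(10) == [0, 1, 0, 0, 1, 0]
```

Decomposing pixel values over such a basis yields bit planes with different statistics than binary planes, which is the kind of property a GPNS-based decomposition exploits for enhancement.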
Local polynomial approximation-local binary pattern (LPA-LBP) based face classification
Rakesh Mehta, Jirui Yuan, Karen Egiazarian
In the face recognition literature, many methods have been proposed that extract features at multiple scales for robust classification. In this paper, we propose a novel method that utilizes Local Polynomial Approximation (LPA) techniques to capture the directional information of the face image at different scales. LPA-based filters are used to obtain directional faces from the normalized face images at multiple scales. Since face images vary spatially and classification works better with local descriptors, we incorporate the Local Binary Pattern (LBP) operator to obtain LPA-LBP maps. Blockwise processing is applied to the LPA-LBP maps to capture the local regional relations among pixels. Finally, a Support Vector Machine (SVM) classifier is learned in the LPA-LBP feature space for face classification. The final descriptor contains information extracted from different levels and thus yields high classification accuracy. Experiments on the Yale and ORL datasets demonstrate that the proposed method has higher classification accuracy than previously proposed methods.
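The basic LBP operator the abstract builds on thresholds the eight neighbours of a pixel against its centre and packs the results into one byte. A minimal sketch of that operator follows; the paper's LPA filtering stage and the neighbour ordering convention are omitted or assumed:

```python
def lbp_code(patch):
    """8-bit Local Binary Pattern code for a 3x3 patch (list of row lists):
    each neighbour is compared to the centre pixel and the resulting bits
    are read clockwise starting at the top-left corner."""
    centre = patch[1][1]
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for r, c in ring:
        code = (code << 1) | (1 if patch[r][c] >= centre else 0)
    return code

patch = [[9, 5, 1],
         [5, 5, 1],
         [9, 9, 1]]
assert lbp_code(patch) == 0b11000111   # = 199
```

Histograms of such codes over image blocks give the local descriptors that the blockwise processing in the paper aggregates.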
Anisotropic multi-scale Lucas-Kanade pyramid
Jirui Yuan, Karen Egiazarian
The Lucas-Kanade (LK) algorithm provides a smart iterative parameter-update rule for efficient image alignment, and it has become one of the most widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion, mosaic construction, and face coding. In this paper, we propose a novel Anisotropic Multi-Scale Lucas-Kanade Pyramid (AMSLKP) method. By extracting image pyramids from the original images and iteratively applying the LK algorithm at each level, the Lucas-Kanade Pyramid (LKP) gains better robustness and accuracy. Moreover, instead of calculating gradients in a single direction with fixed scale sizes, this paper introduces anisotropic local polynomial approximation (LPA) and the intersection of confidence intervals (ICI) method into the LKP. The proposed AMSLKP method first calculates the directional estimates and gradients at multiple scales; then, for each direction, it adaptively selects the optimum scale for each pixel in the image using the ICI rule; finally, the estimate and gradients of the distorted image are computed by fusing the directional results together. The proposed method is evaluated under different noise conditions with various distortion levels. Experimental results show that the AMSLKP method improves accuracy by more than forty percent compared to the LKP method.
iPhone forensics based on Macintosh open source and freeware tools
Thomas Höne, Reiner Creutzburg
The aim of this article is to show the usefulness of Mac OS X based open source tools for the forensic investigation of modern iPhones. It demonstrates how important data stored in the iPhone can be investigated. Two different investigation scenarios are presented that are well-suited for forensics lab work at a university. This work shows how to analyze an Apple iPhone using open source and freeware tools, and presents the important data, possibly stored on such a mobile device, that are used in a forensic investigation. The structure and functions of the iPhone are also explained.
Forensic investigation of certain types of mobile devices
Silas Luttenberger, Reiner Creutzburg
The aim of this paper is to show the usefulness of Windows-based open source tools and demo versions of software tools for the forensic investigation of modern mobile devices. It is demonstrated how important data stored in a mobile device can be investigated. Different investigation scenarios are presented that are well-suited for inexpensive forensics lab work at a university. In particular, the forensic investigation of three different cell phones is described: Motorola V3m, Motorola V3i, and BlackBerry 8700g.
SENSC algorithm for object and scene categorization
Bao-di Liu, Yu-Jin Zhang
To date, the most popular methods for object and scene categorization (such as Vector Quantization (VQ) and Sparse Coding (SC)) transform low-level descriptors (usually SIFT descriptors) into mid-level representations with more meaningful information. These methods have two key steps: (1) the dictionary-building step, which provides a mechanism to map low-level descriptors into mid-level representations; and (2) the coding step, which implements the map from low-level descriptors to the mid-level representation for each image using the dictionary. In this paper, we propose to use a stable and efficient nonnegative sparse coding (SENSC) algorithm for building the dictionary and coding each image with it, developing an extension of the Spatial Pyramid Matching (SPM) method. We also compare SENSC with the SC (state-of-the-art performance) and VQ methods, analyze the drawbacks of SC and VQ for dictionary building, and show the SENSC algorithm's performance. In experiments on three benchmarks (Caltech101, scenes, and events), the proposed method shows better performance than the SC and VQ methods.
Practical vision based degraded text recognition system
Rapid growth and progress in the medical, industrial, security, and technology fields mean more and more consideration for the use of camera-based optical character recognition (OCR). Applying OCR to scanned documents is quite mature, and there are many commercial and research products available on this topic. These products achieve acceptable recognition accuracy and reasonable processing times, especially with trained software and constrained text characteristics. Even though the application space for OCR is huge, it is quite challenging to design a single system capable of performing automatic OCR for text embedded in an image irrespective of the application. Challenges for OCR systems include images taken under natural real-world conditions: surface curvature, text orientation, font, size, lighting conditions, and noise. These and many other conditions make it extremely difficult to achieve reasonable character recognition, and the performance of conventional OCR systems drops dramatically as the degradation of text image quality increases. In this paper, a new recognition method is proposed to recognize solid or dotted-line degraded characters. The degraded text string is localized and segmented using a new algorithm. The new method was implemented and tested using a development framework system capable of performing OCR on camera-captured images. The framework allows parameter tuning of the image-processing algorithm based on a training set of camera-captured text images. Novel methods were used for enhancement, text localization, and segmentation, which enables building a custom system capable of performing automatic OCR for different applications.
The developed framework system includes new image enhancement, filtering, and segmentation techniques, which enabled higher recognition accuracies, faster processing times, and lower energy consumption compared with the best published state-of-the-art techniques. The system produced impressive OCR accuracies (90% to 93%) using customized systems generated by our development framework in two industrial OCR applications: water bottle label text recognition and concrete slab plate text recognition. The system was also trained for the Arabic alphabet and demonstrated extremely high recognition accuracy (99%) for Arabic license plate text recognition with processing times of 10 seconds. The accuracy and run times of the system were compared to conventional and many state-of-the-art methods; the proposed system shows excellent results.
Pixel- and region-based image fusion using the parameterized logarithmic stationary wavelet transform
Image fusion is the effective combination of multiple images into a single fused image. The goal of fusion is to use the similar and complementary information of the source images in order to obtain an informative depiction of the scene for further processing. Many multi-scale image fusion algorithms have been formulated on the basis that the human visual system is sensitive to edge information. However, these algorithms make use of standard mathematical operators, which do not reflect human visual system characteristics over a large range of background luminance intensities. Accordingly, this paper proposes new image fusion algorithms using a new Parameterized Logarithmic Stationary Wavelet Transform (PL-SWT), which combines the advantages of the Stationary Wavelet Transform (SWT) and the Parameterized Logarithmic Image Processing (PLIP) model, a parameterized framework for processing images. An analysis of the PLIP model shows that it is capable of providing a balance between logarithmic and standard mathematical operators based on image-dependent characteristics. Consequently, the use of the parameterized model is extended to both pixel- and region-based fusion approaches. Experimental results via computer simulation illustrate the improved performance of the proposed image fusion algorithms by both qualitative and quantitative means.
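The PLIP arithmetic the abstract refers to replaces ordinary addition and scalar multiplication with operators that saturate like the human visual system. A minimal sketch of the two standard PLIP operators follows; the parameter value `GAMMA = 256.0` is an assumed illustrative choice, not the paper's tuned setting:

```python
GAMMA = 256.0  # PLIP model parameter; an assumed illustrative value

def plip_add(g1, g2, gamma=GAMMA):
    """PLIP addition of two gray-tone values: g1 (+) g2 = g1 + g2 - g1*g2/gamma.
    Unlike ordinary addition, the result saturates smoothly below gamma."""
    return g1 + g2 - g1 * g2 / gamma

def plip_scale(c, g, gamma=GAMMA):
    """PLIP scalar multiplication: c (x) g = gamma - gamma * (1 - g/gamma)**c."""
    return gamma - gamma * (1.0 - g / gamma) ** c

# Ordinary addition of two bright pixels would overflow an 8-bit range;
# PLIP addition stays below gamma.
assert plip_add(200, 200) < 256                             # 243.75, not 400
assert abs(plip_scale(2, 200) - plip_add(200, 200)) < 1e-9  # 2 (x) g == g (+) g
```

Replacing the standard operators inside a wavelet transform with these is the essence of the PL-SWT construction the paper proposes.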
Practical automatic Arabic license plate recognition system
Since the 1970s, the need for automatic license plate recognition (ALPR) systems has been increasing. A license plate recognition system automatically recognizes a license plate number extracted from image sensors. ALPR systems are used in conjunction with various transportation systems in application areas such as law enforcement (e.g., speed limit enforcement) and commercial uses such as parking enforcement, automatic toll payment, private and public entrances, border control, and theft and vandalism control. Vehicle license plate recognition has been intensively studied in many countries; because of the different types of license plates in use, the requirements for an automatic license plate recognition system differ from country to country. Generally, an automatic license plate localization and recognition system is made up of three modules: license plate localization, character segmentation, and optical character recognition (OCR). This paper presents an Arabic license plate recognition system that is insensitive to character size, font, shape, and orientation, with an extremely high accuracy rate. The proposed system is based on a combination of enhancement, license plate localization, morphological processing, and feature vector extraction using the Haar transform. The system is fast because the classification of alphabet characters and numerals exploits the organization of the license plate. Experimental results for license plates of two different Arab countries show an average of 99% successful license plate localization and recognition over more than 20 images captured from a complex outdoor environment, with shorter run times than conventional and state-of-the-art methods.
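As a rough illustration of the kind of Haar-transform feature extraction the abstract mentions, the sketch below computes a feature vector from a segmented character image by repeated single-level 2-D Haar decomposition. The square power-of-two image size, the two-level depth, and the flattening into a feature vector are assumptions for illustration, not details from the paper.

```python
import numpy as np

def haar_level(block):
    # One level of the 2-D Haar transform: average and detail
    # coefficients along rows, then along columns.
    a = (block[:, 0::2] + block[:, 1::2]) / 2.0   # row averages
    d = (block[:, 0::2] - block[:, 1::2]) / 2.0   # row details
    rows = np.hstack([a, d])
    a2 = (rows[0::2, :] + rows[1::2, :]) / 2.0    # column averages
    d2 = (rows[0::2, :] - rows[1::2, :]) / 2.0    # column details
    return np.vstack([a2, d2])

def haar_features(char_img, levels=2):
    # Feature vector for a (2^k x 2^k) segmented character image:
    # flatten the coefficients after `levels` Haar decompositions
    # applied recursively to the low-pass (top-left) band.
    out = char_img.astype(float)
    n = out.shape[0]
    for _ in range(levels):
        out[:n, :n] = haar_level(out[:n, :n])
        n //= 2
    return out.ravel()
```

A flat (constant) character patch yields energy only in the single low-pass coefficient, while strokes and edges populate the detail coefficients, which is what makes such features usable for character classification.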
Semi-supervised classification of emotional pictures based on feature combination
Shuo Li, Yu-Jin Zhang
Can the abundant emotions reflected in pictures be classified automatically by computer? Previous research considered only the visual features extracted from images, which have limited capability to reveal the variety of emotions. In addition, the training database used by previous methods is a subset of the International Affective Picture System (IAPS), whose relatively small scale degrades the discriminative power of emotion classifiers. To solve these problems, this paper proposes a novel and practical emotional picture classification approach, using a semi-supervised learning scheme with both visual features and keyword tag information. Besides the IAPS images, which carry both emotion labels and keyword tags, nearly 2000 pictures with only keyword tags, downloaded from Flickr, form an auxiliary training dataset. The visual feature of the latent emotional semantic factors is extracted by a probabilistic Latent Semantic Analysis (pLSA) model, while the text feature is described by binary vectors over the tag vocabulary. A first Linear Programming Boost (LPBoost) classifier, trained on the IAPS samples, combines the two features and labels the remaining training samples from the internet. A second SVM classifier, trained on all training images using only the visual feature, then classifies the test images. In the experiments, the categorization performance of our approach exceeds that of recent methods.
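The two-stage semi-supervised scheme (pseudo-label the web-crawled pictures with a classifier trained on the labeled set, then train a final classifier on everything) can be sketched as follows. Simple nearest-centroid classifiers stand in for the paper's LPBoost and SVM classifiers, and the feature vectors are assumed to be precomputed.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # Toy classifier: one centroid per class.
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(X, classes, centroids):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def semi_supervised_train(X_lab, y_lab, X_unlab):
    # Stage 1: a classifier trained on the labeled (IAPS-like) set
    # pseudo-labels the unlabeled (web-crawled) pictures.
    cls, cen = nearest_centroid_fit(X_lab, y_lab)
    y_pseudo = nearest_centroid_predict(X_unlab, cls, cen)
    # Stage 2: the final classifier is trained on labeled plus
    # pseudo-labeled data and is applied to test images.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return nearest_centroid_fit(X_all, y_all)
```

The value of the scheme is that the cheap unlabeled data enlarges the effective training set of the final classifier, which is exactly the small-database problem the abstract identifies.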
Comparative review of studies on aging effects in context of biometric authentication
Tobias Scheidat, Juliane Heinze, Claus Vielhauer, et al.
The performance of a biometric system, from the point of view of authentication, enrollment, and usability, depends not only on the algorithms, hardware, and software used, but also on aging effects of the human body. Thus, examining the influence of age-dependent physiological and mental variances in potential user groups is an important part of biometric research. This paper presents a survey of studies examining the effects of biological aging on enrollment and authentication performance, as well as on the usability of biometric systems based on the fingerprint, face, and iris modalities. To compare the findings of the studies and overcome the problem that nearly every one of them uses its own database, with varying numbers of users and different sensors, measurements, and/or aging levels, we developed a novel graphical representation of the results. It provides an overview of changes appearing with increasing age and their possible influences on performance or usability. The outcomes of a large number of evaluations are compared for each of the three biometric modalities in the context of aging and are finally summarized in the novel graphical representation.
Multimedia Tagging
Material classification and automatic content enrichment of images using supervised learning and knowledge bases
Sri Abhishikth Mallepudi, Ricardo A. Calix, Gerald M. Knapp
In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieval of images from these databases is a significant current research area. In particular, there is growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying the materials present in an image. Such capabilities provide additional information that affects conditional probabilities about images (e.g., objects made of steel are more likely to be buildings), and are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T (image-to-text) methodologies enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single-material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture and color histogram data for RGB components. All classification accuracy scores using the SVM-based methods were above 85%. The second model gathered more information from the images, since it could assign multiple material classes to a single image. A framework for the I2T methodology is presented.
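The difference between the two classification models, a single multi-class decision versus one binary detector per material, can be sketched as below. Centroid-plus-radius detectors stand in for the paper's binary SVMs; the point being illustrated is that per-material detectors can assign several materials to one image, which a single multi-class classifier cannot.

```python
import numpy as np

def train_one_vs_rest(X, y):
    # One binary "detector" per material: here each detector is a
    # class centroid plus a distance threshold learned from its
    # positives (a toy stand-in for the per-material binary SVMs).
    detectors = {}
    for c in np.unique(y):
        pos = X[y == c]
        centroid = pos.mean(axis=0)
        radius = np.linalg.norm(pos - centroid, axis=1).max() * 1.1
        detectors[c] = (centroid, radius)
    return detectors

def detect_materials(x, detectors):
    # Unlike a single multi-class classifier, this can report several
    # materials for one feature vector, as the second model does.
    return sorted(c for c, (cen, r) in detectors.items()
                  if np.linalg.norm(x - cen) <= r)
```

In practice the feature vector `x` would be the Gabor-plus-color-histogram descriptor the abstract describes; here it is an arbitrary vector.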
Personal photo album summarization for global and local photo annotation
M. Broilo, Francesco G. B. De Natale
Although content-based media retrieval tools are continuously improving, personalized image annotation is still one of the most reliable ways to index large image archives. Unfortunately, it is also a time-consuming and repetitive operation. Using content to assist the user in media annotation can reduce effort and yield more accurate results. In this paper we propose a content-based interactive tool that supports a user in annotating personal photo albums. The system provides two main functionalities: summarizing a photo collection into salient moments, and annotating pictures in a semi-supervised way based on their global and local content. The summarization is based on bottom-up unsupervised hierarchical clustering that exploits two different matrices of visual distances, while local tagging uses an object retrieval method based on local image features. Experiments on personal photo collections show that the proposed technique produces good results in terms of organization of and access to the data.
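A bottom-up hierarchical clustering over a precomputed visual distance matrix, of the kind used for the summarization step, can be sketched as follows. Single linkage and a fixed target number of clusters ("salient moments") are simplifying assumptions; the paper combines two distance matrices and does not specify this exact procedure.

```python
def agglomerate(dist, n_clusters):
    # Bottom-up single-linkage clustering over a precomputed distance
    # matrix (dist[i][j] = visual distance between photos i and j).
    # Each surviving cluster is one candidate "salient moment".
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]
```

The quadratic pair search is fine for personal-album sizes; a production system would use a priority queue or an off-the-shelf linkage routine.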
Multimedia Systems
Spatially organized visualization of image query results
In this work we present a system which visualizes the results obtained from image search engines in such a way that users can conveniently browse the retrieved images. The way in which search results are presented allows the user to grasp the composition of the set of images "at a glance". To do so, images are grouped and positioned according to their distribution in a prosemantic feature space which encodes information about their content at an abstraction level that can be placed between visual and semantic information. The compactness of the feature space allows a fast analysis of the image distribution so that all the computation can be performed in real time.
Image retrieval considering people co-occurrence relations using relevance feedback
Kazuya Shimizu, Naoko Nitta, Noboru Babaguchi
The recent popularity of digital cameras allows us to take large numbers of images, creating an increasing need to efficiently and accurately retrieve images containing a specific person from such collections. While many query-by-example retrieval methods use only the visual features of the queried person, we exploit the fact that some people, such as family members or friends, are more likely than others to appear in the same images, and use the visual features not only of the queried person but also of people who have strong co-occurrence relations with that person to improve retrieval performance. Relevance feedback is used to learn who co-occurs with the queried person in the same images, their faces, and the strength of their co-occurrence relations. For 116 images collected from 6 persons, after five feedback iterations, a recall rate of 53% was obtained by considering the co-occurrence relations among people, compared to 34% when using only features of the queried person.
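As a toy illustration of the idea, an image can be scored by the presence of the queried person plus a bonus for recognized companions, with co-occurrence weights learned from relevance feedback. The face identities and the weight values are assumed inputs here, not the paper's actual scoring function.

```python
def score_image(face_ids, query_id, cooc_weight):
    # Rank an image by the presence of the queried person plus a
    # bonus for people who frequently co-occur with that person
    # (weights learned from relevance feedback iterations).
    score = 1.0 if query_id in face_ids else 0.0
    score += sum(cooc_weight.get(p, 0.0) for p in face_ids if p != query_id)
    return score
```

With such a score, an image showing the queried person together with a frequent companion outranks one showing the queried person alone, and even an image containing only the companion receives a nonzero score, which is how co-occurrence can recover images missed by appearance matching alone.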
Bay Area Multimedia
Know your data: understanding implicit usage versus explicit action in video content classification
Jude Yew, David A. Shamma
In this paper, we present a method for video category classification using only social metadata from websites like YouTube. In place of content analysis, we utilize the communicative and social contexts surrounding videos as a means to determine a categorical genre, e.g., Comedy or Music. We hypothesize that video clips belonging to different genre categories have distinct signatures and patterns that are reflected in their collected metadata. In particular, we define and describe social metadata as implicit usage or explicit action to aid in classification. We trained a Naive Bayes classifier to predict categories from a sample of 1,740 YouTube videos representing the top five genre categories. Using just a small number of the available metadata features, we compare the classifications produced by our Naive Bayes classifier with those provided by the uploader of each video. Compared to random predictions on the YouTube data (21% accurate), our classifier attained a mediocre 33% accuracy in predicting video genres. However, the accuracy improves significantly with nominal factoring of the explicit data features: by factoring the ratings of the videos in the dataset, the classifier was able to accurately predict the genres of 75% of the videos. We argue that the patterns of social activity found in the metadata are not just meaningful in their own right, but are indicative of the meaning of the shared video content. The results presented here represent a first step in investigating the potential meaning and significance of social metadata and its relation to the media experience.
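A Naive Bayes classifier over nominal metadata features (e.g., a factored rating level and a binned view count) can be sketched as follows, with Laplace smoothing for unseen feature values. The feature names and smoothing details are illustrative assumptions, not the paper's exact setup.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(rows, labels):
    # rows: lists of nominal metadata features per video, e.g.
    # [rating_bin, view_count_bin]; labels: genre categories.
    prior = Counter(labels)
    cond = defaultdict(Counter)  # (feature index, label) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return prior, cond, len(rows)

def predict_nb(row, prior, cond, n):
    # Pick the genre maximizing log P(genre) + sum_i log P(v_i | genre),
    # with add-one smoothing so unseen values get nonzero probability.
    def score(y):
        s = log(prior[y] / n)
        for i, v in enumerate(row):
            c = cond[(i, y)]
            s += log((c[v] + 1) / (sum(c.values()) + len(c) + 1))
        return s
    return max(prior, key=score)
```

Nominal factoring of an otherwise numeric feature (turning a raw rating into a small set of levels) is exactly what makes it usable as a count-based Naive Bayes feature, which matches the accuracy jump the abstract reports.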
Multimedia information retrieval at FX Palo Alto Laboratory
Matthew L. Cooper, John Adcock, Andreas Girgensohn, et al.
This paper describes research activities at FX Palo Alto Laboratory (FXPAL) in the area of multimedia browsing, search, and retrieval. We first consider interfaces for organization and management of personal photo collections. We then survey our work on interactive video search and retrieval. Throughout we discuss the evolution of both the research challenges in these areas and our proposed solutions.
Interactive Paper Session for Multimedia Content Access: Algorithms and Systems V
No-reference blur estimation based on the average cone ratio in the wavelet domain
Ljiljana Platisa, Aleksandra Pizurica, Ewout Vansteenkiste, et al.
With today's extensive technological advancements in electronic imaging, high image quality is an imperative for modern imaging systems. An important part of quality assurance is techniques for measuring the level of image distortion. Recently, we proposed a wavelet-based metric of blurriness in digital images named CogACR. The metric is highly robust to noise and able to distinguish among a wide range of blur levels. Moreover, it can be used both when the degradation-free reference image is available and when it is unknown. However, the metric is content sensitive, so in the no-reference scenario it was not fully automated. In this paper, we investigate this problem further. First, we propose a method to classify images based on edge content similarity. Next, we use this method to automate the CogACR estimation of blur in a no-reference scenario. Our results indicate high accuracy of the method for a range of natural scene images distorted with out-of-focus blur. Within the considered range of blur radii of 0 to 10 pixels, varied in steps of 0.25 pixels, the proposed method estimates the blur radius with an absolute error of at most 1 pixel in 80 to 90% of the images.
Texture based Markovian modelling for image retrieval
D. Benboudjema, F. Precioso
Texture, along with color, shape, and edges, is one of the main features by which humans perceive images. It can be viewed as a set of pixels within an image whose local statistics or local properties (e.g., periodicity, frequency) are constant or slowly varying. In this paper we address, from a statistical standpoint, the image indexing problem for image retrieval. Two new Markov-model-based approaches for texture feature extraction are proposed and compared with texture features based on Gabor filters. The three methods have been tested on an image retrieval task using an SVM classifier with a Gaussian kernel, on the texture-oriented Brodatz database and on another texture database. The experimental results show promising performance for the proposed scheme.
Non-supervised macro segmentation of large-scale TV videos
Hongliang Bai, Chengyu Dong, Lezi Wang, et al.
In this paper, a novel non-supervised macro segmentation algorithm is presented that detects duplicate sequences in large-scale TV videos. Motivated by the fact that "inter-programs" are repeatedly inserted into TV streams, the macro structure of the videos can be effectively and automatically generated by identifying these special sequences. The algorithm has four stages: keyframe extraction, discrete cosine transform-based feature generation (a fixed-size 64-D signature), Locality-Sensitive Hashing (LSH)-based frame retrieval, and macro segmentation through duplicate sequence detection and dynamic programming. The main contributions are: (1) an effective and efficient algorithm for macro segmentation of large-scale TV videos; (2) fast retrieval of similar frames via LSH; and (3) duplicate sequence models, learned without supervision, that recover missed duplicate sequences through dynamic programming. The algorithm has been tested on 15 days of TV streams of different types. The F-measure of the system is greater than 96%. The experiments show that it is efficient and effective for macro segmentation.
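The LSH-based frame retrieval stage can be sketched with random-hyperplane hashing over the 64-D DCT signatures: near-duplicate frames tend to fall into the same bucket, so candidate duplicates are found without a linear scan over all frames. The hyperplane scheme and the 16-bit key length are assumptions for illustration; the paper does not specify its LSH family.

```python
import numpy as np

def lsh_index(signatures, n_bits=16, seed=0):
    # Random-hyperplane LSH: each 64-D DCT signature hashes to an
    # n_bits bucket key (one sign bit per hyperplane); near-duplicate
    # frames collide in the same bucket with high probability.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, signatures.shape[1]))
    keys = (signatures @ planes.T > 0) @ (1 << np.arange(n_bits))
    buckets = {}
    for i, k in enumerate(keys):
        buckets.setdefault(int(k), []).append(i)
    return planes, buckets

def lsh_query(sig, planes, buckets):
    # Return indices of indexed frames in the query's bucket; these
    # candidates would then be verified by exact signature comparison.
    key = int((sig @ planes.T > 0) @ (1 << np.arange(planes.shape[0])))
    return buckets.get(key, [])
```

Querying every keyframe of the stream against this index yields the repeated (inter-program) frame pairs that the dynamic-programming stage then assembles into duplicate sequences.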