18 - 22 August 2024
San Diego, California, US
Post-deadline submissions will be considered for the poster session, or oral session if space becomes available

Submissions to the conference should include abstract text of 1,000 words or less.

The field of digital image processing has experienced continuous and significant expansion in recent years. The usefulness of this technology is apparent in many different disciplines, ranging from entertainment to remote sensing. Advances in, and the wide availability of, image processing hardware, together with sophisticated algorithms, have further enhanced the usefulness of image processing. The Applications of Digital Image Processing conference welcomes contributions of new results and novel techniques in this important technology.

Papers are solicited in the broad areas of digital image processing applications, including:

  • Application areas
  • New imaging modalities and their processing
  • Immersive imaging
  • Image and video processing and analysis
  • New standards in image and video applications
  • Security in imaging
  • Imaging requirements and features
  • Imaging systems
  • Compression
  • Human visual system and perceptual imaging
  • Artificial intelligence in imaging
  • Novel and emerging methods in imaging
Conference 13137

Applications of Digital Image Processing XLVII

19 - 22 August 2024 | Conv. Ctr. Room 17B (Mon-Wed); Room 11B (Thu)
  • 1: Medical Imaging
  • 2: Imaging Security I
  • 3: Imaging Security II
  • 4: Imaging Systems I
  • Poster Session
  • 5: Imaging Systems II
  • 6: AI-Based Imaging
  • Signal, Image, and Data Processing Keynote
  • Optical Engineering Plenary
  • 7: Imaging Performance Assessment I
  • 8: Imaging Performance Assessment II
  • 9: Video ASICs for Data Center
  • Featured Nobel Plenary
  • 10: Imaging Technology
Information

Want to participate in this program?
Post-deadline abstract submissions accepted through 20 June. See "Additional Information" tab for instructions.

Session 1: Medical Imaging
19 August 2024 • 9:30 AM - 11:10 AM PDT | Conv. Ctr. Room 17B
Session Chair: Andrew G. Tescher, AGT Associates (United States)
13137-1
Author(s): Gloria Bueno, Jesus Ruiz-Santaquiteria, Jesus Salido, Univ. de Castilla-La Mancha (Spain); Gabriel Cristobal, Instituto de Óptica "Daza de Valdés", Consejo Superior de Investigaciones Científicas (Spain); Oscar Deniz, Univ. de Castilla-La Mancha (Spain)
19 August 2024 • 9:30 AM - 9:50 AM PDT | Conv. Ctr. Room 17B
Market-available automated microscopy systems are often unaffordable for research institutions, especially those in economically disadvantaged nations, limiting their access to advanced technologies. This work addresses this challenge by developing a cost-effective virtual microscopy and telemicroscopy system, aiming to create a remote-controlled microscopy setup for analyzing digital samples with comparable performance to high-end equipment but at a reduced cost. The system includes a web platform for telemicroscopy, enabling remote control of the robotic stage and real-time viewing of the microscope camera. Additionally, a decision support system has been implemented, integrating AI-based models for identifying objects of interest in two use cases: i) analyzing water quality in biological samples and ii) identifying cancerous tissue in digital pathology samples. These models enhance diagnostic capabilities, leading to increased productivity for experts and reducing manual workload. Sample virtualization and automatic processing simplify tasks for professionals, allowing remote participation in concurrent work sessions and streamlining processes for digital samples.
13137-2
Author(s): Hanieh Ajami, Al Mahmud, Mahdi Kargar Nigjeh, Md Sami Ul Hoque, Scott E. Umbaugh, Southern Illinois Univ. Edwardsville (United States)
19 August 2024 • 9:50 AM - 10:10 AM PDT | Conv. Ctr. Room 17B
This research compares two deep-learning models, BetaVAEClassifier and PCAEClassifier, for identifying white matter lesions in the brains of multiple sclerosis patients. Both models use convolutional encoder-decoder architectures with different approaches for feature representation. The dataset, comprising various MRI modalities, undergoes data enhancement, compression, and augmentation. Evaluation metrics show promising results, highlighting the potential for accurate diagnosis and assessment in multiple sclerosis research.
13137-3
Author(s): Hanieh Ajami, Heena Chakradhar, Mahdi Kargar Nigjeh, Al Mahmud, Md Sami Ul Hoque, Scott E. Umbaugh, Southern Illinois Univ. Edwardsville (United States)
19 August 2024 • 10:10 AM - 10:30 AM PDT | Conv. Ctr. Room 17B
This research extends previous studies on white matter lesion identification in multiple sclerosis. While the initial study with CVIPtools achieved a 90.63% success rate and the second study using deep learning architecture reached 93%, our current investigation focuses on compressed MRI datasets. The results indicated a significant 50% decrease in lesion identification accuracy using established methods, highlighting a limitation with the CVIPtools approach. However, the deep learning model maintained a remarkable 98.53% accuracy despite compression challenges, demonstrating its resilience and effectiveness in accurately classifying lesion and non-lesion classes.
13137-4
Author(s): Al Mahmud, Hanieh Ajami, Md Sami Ul Hoque, Roshan Silwal, Mahdi Kargar Nigjeh, Scott E. Umbaugh, Southern Illinois Univ. Edwardsville (United States)
19 August 2024 • 10:30 AM - 10:50 AM PDT | Conv. Ctr. Room 17B
This paper introduces an automated system comparing VGG16 and ResNet50 for dermatoscopic image processing and classification. Swift and accurate diagnosis of skin lesions enables skin cancer detection at an early stage. The method utilizes transfer learning to fine-tune VGG16 and ResNet50 on the HAM10000 dataset. Random resampling balanced the dataset, optimizing the models for accurate results with limited resources. We preprocessed images, performed data augmentation, modified the pre-existing models, and tuned the hyperparameters to increase the overall accuracy of both models. Results demonstrate VGG16 and ResNet50 achieving 92.10% and 91.8% accuracy, respectively, showcasing the effectiveness of the proposed system in advancing early skin cancer intervention with deep learning techniques.
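The fine-tuning setup described above can be sketched in miniature: freeze a pretrained backbone and train only a new classification head. The snippet below is a sketch of that idea only; the random "backbone features", class rule, and hyperparameters are illustrative stand-ins, not the authors' actual VGG16/ResNet50 configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen backbone output: feature vectors for 200 images,
# with two classes separated along the first feature dimension.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(int)

# New classification head: a single logistic-regression layer, the only
# part whose weights are updated during this "fine-tuning" sketch.
w = np.zeros(64)
b = 0.0
lr = 0.1

for _ in range(300):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))       # sigmoid activation
    grad_w = X.T @ (p - y) / len(y)    # cross-entropy gradient w.r.t. weights
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((X @ w + b > 0) == y)
print(f"training accuracy of the new head: {accuracy:.2f}")
```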
13137-5
Author(s): Mahdi Kargar Nigjeh, Hanieh Ajami, Al Mahmud, Md Sami Ul Hoque, Scott E. Umbaugh, Southern Illinois Univ. Edwardsville (United States)
19 August 2024 • 10:50 AM - 11:10 AM PDT | Conv. Ctr. Room 17B
This study introduces a new method for automating the classification of brain tumors in MRI images using three deep learning models: VGG16, ResNet18, and DenseNet. The research uses a dataset that includes 7023 brain MRI images categorized into glioma, meningioma, no tumor, and pituitary classes. Data augmentation techniques are used to improve the learning process of the models, and an advanced image enhancement algorithm enhances tumor visibility. The study compares the models and identifies a methodology that achieves up to 95% accuracy. This research is a significant advancement in automated brain tumor classification, providing insights into deep learning models for medical imaging and guiding future research for more precise diagnostic devices.
Break
Coffee Break 11:10 AM - 11:40 AM
Session 2: Imaging Security I
19 August 2024 • 11:40 AM - 12:40 PM PDT | Conv. Ctr. Room 17B
Session Chair: Frederik Temmermans, Vrije Univ. Brussel (Belgium)
13137-6
Author(s): Frederik Temmermans, Vrije Univ. Brussel (Belgium), imec (Belgium); Sabrina Caldwell, The Australian National Univ. (Australia)
19 August 2024 • 11:40 AM - 12:00 PM PDT | Conv. Ctr. Room 17B
While distributed version control systems offer a solid foundation for monitoring revision history, their effectiveness is hindered when dealing with digital media assets, which are often treated as opaque binary data. This makes it challenging to precisely track modifications and compromises storage efficiency. Despite this, a significant portion of embedded metadata within these files is actually textual in nature, though it remains unrecognized due to its integration into the binary structure. Moreover, alterations to the metadata and the underlying structure of metadata container formats, such as the JPEG Universal Metadata Box Format (JUMBF), go unnoticed during media rendering, further complicating the identification process. To address these issues, this paper proposes a solution that defines a standardized asset decomposition and structured serialization scheme. This framework enables the individual tracking of subcomponents within media assets, facilitating more accurate version control and metadata management.
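The core idea of decomposing an asset so that version control can track subcomponents individually can be illustrated with plain content hashing. The component names (pixels, xmp, jumbf) and byte contents below are assumptions for illustration; the paper's standardized decomposition and serialization scheme is more elaborate.

```python
import hashlib

def component_hashes(asset: dict) -> dict:
    """Hash each decomposed subcomponent of a media asset individually,
    so a version-control system can detect which part changed."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in asset.items()}

v1 = {"pixels": b"\x00\x01\x02\x03", "xmp": b"<x:xmpmeta/>", "jumbf": b"box-v1"}
v2 = dict(v1, jumbf=b"box-v2")  # only the JUMBF metadata box changed

h1, h2 = component_hashes(v1), component_hashes(v2)
changed = [k for k in h1 if h1[k] != h2[k]]
print(changed)  # only the modified subcomponent is flagged
```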
13137-7
Author(s): James Rainey, Newcastle Univ. (United Kingdom); Jacob Hobson, Lamyaa Aljuaid, Newcastle Univ (United Kingdom); Deepayan Bhowmik, Newcastle Univ. (United Kingdom)
19 August 2024 • 12:00 PM - 12:20 PM PDT | Conv. Ctr. Room 17B
AI-driven image manipulation techniques offer unprecedented capabilities for creativity and visual enhancement, but they also pose significant challenges in terms of authenticity, integrity, and misinformation. Current state-of-the-art techniques for image manipulation detection often struggle to discern subtle alterations made by AI algorithms and, as such, report poor detection results, necessitating the development of advanced detection methods capable of discerning AI manipulations. This paper presents a dataset of images containing AI-generated modifications and a new method for the detection of image manipulations that excels in AI-generated manipulations.
13137-8
Author(s): Roberto Herrera-Charles, Opeyemi M. Afolabi, José C. Núñez-Pérez, Instituto Politécnico Nacional (Mexico)
19 August 2024 • 12:20 PM - 12:40 PM PDT | Conv. Ctr. Room 17B
The massive development of IoT, Big Data, and other technologies has led to security concerns with respect to data protection. It has become imperative to develop solutions that protect our data, such as images, texts, and audio, from unauthorized access. This work presents an encrypted image transmission scheme based on a chaotic dynamic configuration of two synchronized three-dimensional spherical chaotic attractors in a master-slave topology. We synchronized the future evolution of the chaotic systems, starting from different initial conditions, using the Hamiltonian observer-based approach, and then utilized the resulting phase-space points as pseudo-random numbers for securing images transmitted through the communication channel. The scheme is realized and implemented on the Multiprocessor System-on-Chip (MPSoC) platform by harnessing the easy and synthesizable programming features of Python with MPSoC. The image is transmitted through the state variables x1, x2, and x3, and analyzed using two statistical techniques, namely information entropy and correlation analysis; the results show full recovery of the transmitted image.
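The keystream idea behind this scheme can be sketched with a much simpler chaotic map. The snippet below substitutes a logistic map for the paper's synchronized 3-D spherical attractors and Hamiltonian observer, and uses the resulting bytes to XOR-encrypt and recover a toy image; all parameters and the image itself are illustrative.

```python
import numpy as np

def chaotic_keystream(n, x0=0.4321, r=3.99):
    """Generate n pseudo-random bytes by iterating the logistic map
    (a simple stand-in for a synchronized chaotic attractor)."""
    x = x0
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256
    return out

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)

key = chaotic_keystream(image.size).reshape(image.shape)
cipher = image ^ key          # transmitter: XOR with the chaotic keystream
recovered = cipher ^ key      # synchronized receiver regenerates the same key

print("image fully recovered:", np.array_equal(recovered, image))
```

Synchronization is what makes this work in practice: the receiver can only regenerate `key` if its chaotic system evolves in lockstep with the transmitter's.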
Break
Lunch Break 12:40 PM - 2:10 PM
Session 3: Imaging Security II
19 August 2024 • 2:10 PM - 3:10 PM PDT | Conv. Ctr. Room 17B
Session Chair: Frederik Temmermans, Vrije Univ. Brussel (Belgium)
13137-9
Author(s): Yuhang Lu, Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
19 August 2024 • 2:10 PM - 2:30 PM PDT | Conv. Ctr. Room 17B
The past few years have witnessed remarkable advancement in the domain of face recognition thanks to the development of deep learning. However, the robustness of deep face recognition techniques in varying real-world conditions is a pressing challenge. This paper proposes to incorporate both pose-invariant and cross-resolution strategies into one face recognition framework and to learn a unified feature representation. Firstly, a knowledge distillation paradigm is employed as the learning framework. The face recognition model learns to extract pose- and resolution-robust features from varying faces in the wild under the guidance of the feature representation from frontal and high-resolution faces. Secondly, two sub-networks attached to the feature extractor are devised, which learn to bridge the discrepancy between face images in different poses or resolutions in deep feature space. Extensive experiments on different in-the-wild face recognition benchmarks demonstrate the superiority of the proposed method over the state-of-the-art.
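The knowledge-distillation objective such a framework builds on can be written down compactly: the student is trained to match the teacher's temperature-softened output distribution. The snippet below is a generic sketch with made-up logits and an arbitrary temperature, not the authors' exact loss.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between the temperature-softened teacher and student
    distributions -- the core of a knowledge-distillation objective."""
    p = softmax(teacher_logits, T)   # teacher: e.g. frontal, high-res faces
    q = softmax(student_logits, T)   # student: e.g. in-the-wild faces
    return float(np.sum(p * np.log(p / q)))

teacher = [2.0, 0.5, -1.0]
aligned = distillation_loss([2.0, 0.5, -1.0], teacher)
drifted = distillation_loss([0.0, 2.0, 1.0], teacher)
print(aligned, drifted)  # zero when distributions match, larger otherwise
```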
13137-10
Author(s): Frederik Temmermans, Vrije Univ. Brussel (Belgium), imec (Belgium); Sabrina Caldwell, The Australian National Univ. (Australia); Deepayan Bhowmik, Newcastle Univ. (United Kingdom); Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
19 August 2024 • 2:30 PM - 2:50 PM PDT | Conv. Ctr. Room 17B
JPEG Trust is a novel international standard that responds to the pressing need to assess trust in digital media assets. JPEG Trust provides a comprehensive framework addressing key elements such as provenance, authenticity, integrity, and copyright. Built on top of established JPEG and industry standards, the framework ensures compatibility across digital media ecosystems. This paper provides an overview of the JPEG Trust framework and illustrates its application in several usage scenarios.
13137-11
Author(s): Sabrina Caldwell, The Australian National Univ. (Australia); Frederik Temmermans, Vrije Univ. Brussel (Belgium)
19 August 2024 • 2:50 PM - 3:10 PM PDT | Conv. Ctr. Room 17B
There is insufficient information in the literature about the impacts of image manipulation in society. While some anecdotes about qualitative factors exist, such factors are thinly covered in the literature, and estimates of quantitative, especially monetary, costs are even less available. That these costs are substantial is perhaps indicated by a 2019 study jointly issued by the University of Baltimore and Israel-based cybersecurity firm CHEQ claiming that, on the whole, fake news costs the global economy $78 billion annually; however, the bases for such a figure are difficult to find. Quantifying the impacts of misinformative images is an important first step in addressing and mitigating these impacts. Furthermore, identified quantitative factors have the potential to inform models providing justification for implementation of control measures for fake images, and may also assist in informing relevant policy and regulation. This paper identifies factors that may contribute to quantitative assessment of fake image impacts, with an exploration of approaches to usefully modelling such impacts.
Break
Coffee Break 3:10 PM - 3:40 PM
Session 4: Imaging Systems I
19 August 2024 • 3:40 PM - 5:00 PM PDT | Conv. Ctr. Room 17B
Session Chair: Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
13137-12
Author(s): Danika Gupta, The Harker School (United States); Awani Gadre, Santa Clara Univ. (United States)
19 August 2024 • 3:40 PM - 4:00 PM PDT | Conv. Ctr. Room 17B
Mosquito-borne diseases annually impact 3 billion people and cause over 500,000 deaths. Traditional identification methods, requiring specialized skills and equipment, limit monitoring scalability and are challenged by climate-induced habitat changes. Our study introduces a scalable solution through citizen science, leveraging smartphone imagery for mosquito identification despite challenges like varied backgrounds. We utilize object detection for precise mosquito identification from diverse images, converting a classification dataset into one with annotated bounding boxes for two primary species: Aedes albopictus and Culex quinquefasciatus. Training on 10,000 images from Mosquito Alert and testing with a Malaysian dataset, our model demonstrates high accuracy (mAP50 of 90% and 99%, respectively), showing promise for global mosquito monitoring and enhancing public health efforts.
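The mAP50 figure reported above rests on the intersection-over-union overlap criterion between predicted and ground-truth boxes. The snippet below computes IoU for axis-aligned boxes; the coordinates are invented for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2);
    the standard overlap criterion behind detection metrics like mAP50."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A predicted box vs. ground truth: at the mAP50 threshold, a detection
# counts as a hit when IoU >= 0.5 (coordinates here are made up).
pred, truth = (10, 10, 50, 50), (15, 15, 55, 55)
print(f"IoU = {iou(pred, truth):.3f}")
```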
13137-13
Author(s): Roberto Herrera-Charles, Teodor Alvarez-Sanchez, Jesus Antonio Álvarez-Cedillo, Instituto Politécnico Nacional (Mexico)
19 August 2024 • 4:00 PM - 4:20 PM PDT | Conv. Ctr. Room 17B
A vegetation index (VI) is a parameter calculated from the reflectance values of vegetation at different wavelengths and is particularly sensitive to vegetation cover. The problem of detecting vegetation indices using UAVs has been addressed in multiple articles in the literature, in which specialized hardware and thermal or infrared cameras are adapted to improve detection. This article seeks to identify the vegetation index from its biophysical parameters, aided by artificial intelligence and machine learning algorithms. A semi-physical model was designed to estimate the ecosystem and establish the vegetation index correctly. The results will be validated by remote sensing. Finally, an ecological model will be developed to simulate the environmental impact on vegetation patterns and geographic plains. The proposed model successfully imitated the urban effect. Given these results, it was possible to better predict the impact of changing seasons in a defined geographic area.
13137-14
Author(s): Teymoor Ali, George Paul, Newcastle Univ. (United Kingdom); Robert Nicol, STMicroelectronics (R&D) Ltd. (United Kingdom); Deepayan Bhowmik, Newcastle Univ. (United Kingdom)
19 August 2024 • 4:20 PM - 4:40 PM PDT | Conv. Ctr. Room 17B
CNN algorithms have become ubiquitous within the vision domain, encompassing a wide array of tasks, including object detection, segmentation, and classification. However, executing complex CNN algorithms on real-time vision systems demands better energy efficiency, runtime, and accuracy. This has led to innovative computing architectures, leveraging heterogeneity that combines CPUs, GPUs, FPGAs, and other accelerators into a single processing fabric. However, scheduling and partitioning remain arduous tasks, particularly when distributing operations among accelerators that have different computing paradigms. This paper proposes a scheduler targeting heterogeneous vision systems that performs careful fine-grained partitioning and mapping of the layers and sub-operations of state-of-the-art convolutional neural networks and image processing algorithms. Our experiments reveal that the scheduled, partitioned algorithms achieve better energy and runtime efficiency than their best-performing homogeneous counterparts executing the complete algorithm.
13137-15
Author(s): Hanieh Ajami, Mahsa Kargar Nigjeh, Mahdi Kargar Nigjeh, Scott E. Umbaugh, Southern Illinois Univ. Edwardsville (United States)
19 August 2024 • 4:40 PM - 5:00 PM PDT | Conv. Ctr. Room 17B
This study explores two algorithms for removing rain streaks in car images. The first method utilizes CVIPtools software, and the second combines CVIPtools and Python. While both methods address rain streak removal, the first loses more information; the second, though requiring more programming expertise, is more robust and accurate in preserving significant details.
Poster Session
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
Conference attendees are invited to attend the poster session on Monday evening. Come view the posters, enjoy light refreshments, ask questions, and network with colleagues in your field. Authors of poster papers will be present to answer questions concerning their papers. Attendees are required to wear their conference registration badges to the poster sessions.

Poster Setup: Monday 10:00 AM - 4:30 PM
Poster authors, view poster presentation guidelines and set-up instructions at https://spie.org/OP/poster-presentation-guidelines
13137-44
Author(s): Ana Karen Peraza Munoz, Univ. Autónoma de Baja California (Mexico)
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
This project shows the latest scientific development in digital image processing applied to wine grape characterization. The literature review highlights that existing studies on both external and internal grape characteristics often employ expensive equipment or lack essential features for robust prediction results. Consequently, experts in the field advocate for new studies considering a broader range of characteristics and economic viability for end-users. An analysis of the Scopus database, using keywords like "grape image processing", identified 285 papers covering 2012 to 2023; additionally, advanced searches related to maturation, color, chemical analysis, phenolic composition, sugar content, prediction models, and correlation of physical and chemical attributes indicate an area of opportunity due to the decrease of works found on these specific topics. Bibliometric results reveal the evolving research landscape in these areas over the past decade, with notable authors such as Whitty, M. and Liu, S. Leading institutions and countries include China, India, the United States, and Spain. The VOSviewer software was employed to confirm influential studies and trends in the field.
13137-45
Author(s): Lingbo Cai, Hongyang Dong, Xiaohan Chang, Leijian Wang, Shandong Univ. (China); Guang Deng, TomoWave Suzhou Medical Imaging Co., Ltd. (China); Jing Han, China Univ. of Mining and Technology (China); Chun Wang, Shandong Univ. (China)
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
We conducted research on photoacoustic computed tomography (PACT) in small animal models with prostate tumors using the LOIS-3D photoacoustic tomography (PAT) system, and achieved good imaging results without the use of exogenous contrast agents. An excitation light source with a wavelength of 755 nm was employed to image the vascular structure of mice, achieving comprehensive visualization of the overall vascular network distribution. The irregularly shaped vascular structure in tumor tissue exhibited obvious differences compared to normal tissue, which can provide a valuable reference for the diagnosis of tumors.
13137-46
Author(s): Artyom S. Makovetskii, Sergei Voronin, Chelyabinsk State Univ. (Russian Federation); Vitaly Kober, CICESE (Mexico); Aleksei Voronin, Chelyabinsk State Univ. (Russian Federation)
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
Simultaneous Localization and Mapping (SLAM) is the task of reconstructing a model of the environment traversed using on-board sensors while simultaneously maintaining an estimate of the mobile sensor's location within the model. One of the known approaches to the SLAM problem is the Kalman filter. The Kalman filter's efficiency is based on the fact that it maintains a fully correlated posterior over feature maps and mobile sensor poses. An important element of the SLAM problem is the reconstruction of the environmental 3D scene. In this paper, we propose an algorithm to restore the 3D scene using a consistency condition and a modified version of the Kalman filter. The reconstruction algorithm is non-iterative. Computer simulation results are provided to illustrate the performance of the proposed method.
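The Kalman filter at the heart of this approach can be illustrated in its simplest, scalar form: predict, compute the gain, and correct with the measurement residual. This is a generic textbook sketch, not the paper's modified SLAM filter; the noise parameters are illustrative.

```python
import numpy as np

def kalman_1d(measurements, q=1e-3, r=0.25):
    """Scalar Kalman filter tracking a constant state from noisy
    measurements (process noise q, measurement noise r)."""
    x, p = 0.0, 1.0              # state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                   # predict: variance grows by process noise
        k = p / (p + r)          # Kalman gain
        x += k * (z - x)         # update with the measurement residual
        p *= (1 - k)
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(0)
true_value = 5.0
noisy = true_value + rng.normal(0, 0.5, size=200)
est = kalman_1d(noisy)
print(f"final estimate: {est[-1]:.2f} (true value {true_value})")
```

In SLAM the state is a full vector of poses and map features with a correlated covariance matrix, but the predict/gain/update cycle is the same.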
13137-47
Author(s): Sergei Voronin, Chelyabinsk State Univ. (Russian Federation); Vitaly Kober, CICESE (Mexico); Artyom S. Makovetskii, Aleksei Voronin, Dmitrii Zhernov, Chelyabinsk State Univ. (Russian Federation)
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
3D point cloud registration is of great importance in robotics and computer vision, where the goal is to find a rigid body transformation that aligns a pair of point clouds with unknown point correspondences. In recent years, deep learning models have come to dominate the field of computer vision. An important part of registration is the estimation of correspondences between point clouds. The main idea of studying correspondences between point clouds is to establish them through the multidimensional features of each point. In this paper, we propose a simple neural network algorithm to register incongruent point clouds. The proposed algorithm utilizes virtual points and is partially based on the PointNet++ neural network. Computer simulation results are provided to illustrate the performance of the proposed method.
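Once correspondences between the clouds are established, the rigid transformation itself has a closed-form least-squares solution (the Kabsch/SVD method). The sketch below assumes known correspondences on synthetic data; learning the correspondences, which is the paper's focus, is the hard part that precedes this step.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) aligning src to dst with
    known point correspondences, via the Kabsch/SVD method."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # avoid reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

rng = np.random.default_rng(0)
cloud = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
t_true = np.array([1.0, -2.0, 0.5])
moved = cloud @ R_true.T + t_true

R, t = rigid_align(cloud, moved)
err = np.abs(cloud @ R.T + t - moved).max()
print(f"max alignment error: {err:.2e}")
```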
13137-48
Author(s): Vladislav Pryadka, Chelyabinsk State Univ. (Russian Federation); Vitaly Kober, CICESE (Mexico); Andrei Krendal, Chelyabinsk State Univ. (Russian Federation)
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
To effectively use deep learning feature extraction, we train a large model on standard detection and segmentation tasks to find abnormalities in mammogram screening. This model is then used for distillation or transfer learning to train a smaller network, which is much easier and faster to use without much loss of quality. We create a pipeline in which the smaller distilled model extracts deep features from mammogram screenings and builds dictionaries of these features for unsupervised anomaly detection. If segment features do not match or are too different from any known data in the network, previously learned clusters are used to create new groups in our dictionary, which helps us find and group any similar pathology.
13137-49
Author(s): Sergei Voronin, Artyom S. Makovetskii, Chelyabinsk State Univ. (Russian Federation); Vitaly Kober, CICESE (Mexico); Aleksei Voronin, Chelyabinsk State Univ. (Russian Federation)
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
Mammography screening leads to a high rate of false-positive results. These may cause unnecessary worry, inconvenient follow-up care, additional imaging studies, and sometimes the need for tissue or blood draws (often a needle biopsy). Convolutional neural networks (CNNs) are among the most important architectures in the field of deep learning. The feature vectors formed by neural networks often contain weak features. There are known methods for eliminating weak features based on mutual information. In this paper, we propose a convolutional neural network designed to recognize local geometrical features. Computer simulation results are provided to illustrate the performance of the proposed method.
13137-50
Author(s): Luis Rodríguez, CICESE (Mexico); José A. González, Univ. Autónoma de Baja California (Mexico); Vitaly Kober, CICESE (Mexico)
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
This work proposes a multi-class model for breast pathology classification using a combination of machine and deep learning methods, aimed at improving classification rates and minimizing false positives. The proposed method encompasses the following steps: preprocessing of image datasets, training of base classification models, and construction of a meta-classifier. The model enhances the performance of single classifiers and is benchmarked against various machine learning models. Finally, the method is evaluated using the MIAS and CBIS-DDSM mammography datasets.
13137-51
Author(s): Nadezhda D. Tolstoba, Ruslan Nasretdinov, Kirill Bodrov, ITMO Univ. (Russian Federation)
19 August 2024 • 5:30 PM - 7:00 PM PDT | Conv. Ctr. Exhibit Hall A
In recent years, 3D printing has gained prominence in manufacturing. To enhance productivity and quality in this field, real-time management of equipment and printing processes is crucial. Technical vision systems utilizing video signals from cameras can aid in analyzing and optimizing the printing process. Challenges include developing algorithms for high-resolution video processing in real time. These systems help monitor product quality, detect printing errors, and automate processes. Research is needed to integrate advanced computer vision, machine learning, and image processing methods into 3D printing control systems. Focusing on the FFF method, we developed a methodology using neural networks for real-time error detection and correction in 3D printing. Preliminary work with a dataset shows promise for enhancing printing parameter predictions using neural networks.
Session 5: Imaging Systems II
20 August 2024 • 9:15 AM - 10:15 AM PDT | Conv. Ctr. Room 17B
Session Chair: Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
13137-16
Author(s): James Rainey, Ines R. Blach, Newcastle Univ. (United Kingdom); Douglas MacLachlan, John Wannan, Falcon Foodservice Equipment (United Kingdom); Deepayan Bhowmik, Newcastle Univ. (United Kingdom)
20 August 2024 • 9:15 AM - 9:35 AM PDT | Conv. Ctr. Room 17B
Real-world food recognition is a challenging task, as the contents of a plate of food can be complex intermixed objects, making it difficult to define their individual structures. Deep learning methods have shown better accuracy and ability to identify ingredients and types of food compared to traditional approaches for image classification. However, many deep learning methods rely on powerful computational resources, which have limitations in terms of cost, energy consumption, and size. Our method utilises deep-learning methods for detection and segmentation that are optimised for resource-constrained embedded platforms. The resulting system provides a fast, accurate way to recognise foods without requiring expensive, energy-intensive hardware.
13137-17
Author(s): Patrick Maier, Univ. of Stirling (United Kingdom); James Rainey, Newcastle Univ. (United Kingdom); Elena Gheorghiu, Univ. of Stirling (United Kingdom); Kofi Appiah, Univ. of York (United Kingdom); Deepayan Bhowmik, Newcastle Univ. (United Kingdom)
20 August 2024 • 9:35 AM - 9:55 AM PDT | Conv. Ctr. Room 17B
Despite tremendous advancement in computer vision, especially with deep learning, understanding scenes in the wild remains challenging. Even modern image classification models often misclassify when presented with out-of-distribution inputs despite having been trained on tens of millions of images or more. Moreover, training modern deep-learning classifiers requires a lot of energy due to the need to iterate many times over the training set, constantly updating billions of model parameters. Owing to problems with generalisability and robustness as well as efficiency, there is growing interest in computer vision to mimic biological vision (e.g. human vision) in the hope that doing so will require fewer resources for training, both in terms of energy and data, while increasing robustness and generalisability. This paper proposes a biologically plausible neuromorphic vision system that is based on a spiking neural network and is evaluated on the classification of hand-written digits from the MNIST dataset.
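The building block of such a spiking network is the leaky integrate-and-fire neuron: the membrane potential leaks each step, integrates input current, and emits a spike (then resets) when it crosses a threshold. The sketch below uses illustrative threshold and leak values; the paper's actual network architecture and input encoding are not specified here.

```python
def lif_spikes(current, threshold=1.0, leak=0.9):
    """Simulate a leaky integrate-and-fire neuron over a sequence of
    input currents, returning a 0/1 spike train."""
    v, spikes = 0.0, []
    for i in current:
        v = leak * v + i         # leak, then integrate the input
        if v >= threshold:
            spikes.append(1)
            v = 0.0              # reset after firing
        else:
            spikes.append(0)
    return spikes

strong = lif_spikes([0.5] * 20)   # strong drive: regular spiking
weak = lif_spikes([0.05] * 20)    # weak drive: potential never reaches threshold
print(sum(strong), sum(weak))
```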
13137-18
Author(s): Tengfei Long, Weili Jiao, Guojin He, Aerospace Information Research Institute (China)
20 August 2024 • 9:55 AM - 10:15 AM PDT | Conv. Ctr. Room 17B
While traditional on-orbit geometric calibration relies on comprehensive imaging parameters, such models are often unavailable for widely distributed remote sensing products. This limitation hinders the geometric accuracy of these images, impacting their usability for various applications. To address this challenge, we propose a novel approach that leverages rational polynomial coefficients (RPCs) to refine the geometric fidelity of remote sensing images. By employing RPCs, our method bypasses the need for a rigorous sensor model, making it applicable to a broader range of remote sensing data. This paper details the methodology and demonstrates its effectiveness in improving geometric accuracy.
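One common sensor-model-free refinement of the kind this abstract describes is to fit an affine correction in image space from a handful of ground control points, removing the systematic bias of RPC-projected coordinates. The snippet below sketches that idea on synthetic coordinates with a made-up bias; it is not the authors' specific method.

```python
import numpy as np

def affine_bias_correction(rpc_xy, gcp_xy):
    """Fit a least-squares affine map taking RPC-projected image
    coordinates to ground-control-point coordinates."""
    n = len(rpc_xy)
    A = np.hstack([rpc_xy, np.ones((n, 1))])     # rows of [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(A, gcp_xy, rcond=None)
    return coeffs                                 # 3x2 affine parameters

rng = np.random.default_rng(0)
gcp = rng.uniform(0, 1000, size=(6, 2))          # "true" image positions
bias = np.array([[1.001, 0.002], [-0.001, 0.999]])
rpc = gcp @ bias + np.array([12.0, -7.5])        # biased RPC projections

C = affine_bias_correction(rpc, gcp)
corrected = np.hstack([rpc, np.ones((6, 1))]) @ C
err = np.abs(corrected - gcp).max()
print(f"residual after correction: {err:.2e} pixels")
```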
Break
Coffee Break 10:15 AM - 10:45 AM
Session 6: AI-Based Imaging
20 August 2024 • 10:45 AM - 12:05 PM PDT | Conv. Ctr. Room 17B
Session Chair: Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
13137-19
Author(s): Mahdi Kargar Nigjeh, Mahsa Kargar Nigjeh, Scott E. Umbaugh, Southern Illinois Univ. Edwardsville (United States)
20 August 2024 • 10:45 AM - 11:05 AM PDT | Conv. Ctr. Room 17B
Optical coherence tomography (OCT) is a crucial tool in ophthalmology. It aids in diagnosing and managing various ocular conditions by visualizing intricate retinal structures. Despite widespread adoption, the manual analysis of OCT images remains time-consuming and labor-intensive. This study presents a novel approach to streamlining this process by integrating artificial intelligence (AI) techniques.
13137-21
Author(s): Md Sami Ul Hoque, Al Mahmud, Roshan Silwal, Hanieh Ajami, Mahdi Kargar Nigjeh, Scott E. Umbaugh, Southern Illinois Univ. Edwardsville (United States)
20 August 2024 • 11:05 AM - 11:25 AM PDT | Conv. Ctr. Room 17B
The advent of high-resolution satellite imagery has revolutionized remote sensing, providing unparalleled access and detail worldwide. This project utilizes the OpenEarthMap dataset, a comprehensive collection of high-resolution earth observation images, to enhance land cover classification accuracy through deep learning. By optimizing a U-Net convolutional neural network architecture, analyzing different learning rates, and applying image variation preprocessing, we significantly improve semantic segmentation performance. Our methodology includes thorough dataset preparation, preprocessing, network parameter experimentation, and model evaluation. Using 3500 OpenEarthMap images and the Adam optimizer, our model achieved an f-score of 0.75, consistent with visual interpretation. Future work will explore additional images, a 4x image split implementation, and alternative models to further enhance classification accuracy.
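As background for the reported f-score of 0.75, a pixel-wise F1 (Dice) score over binary segmentation masks can be sketched as follows. This is a generic illustration, not the authors' evaluation code.

```python
def f_score(pred, truth):
    """Pixel-wise F1 (Dice) score for a binary segmentation mask, the
    harmonic mean of precision and recall over predicted pixels."""
    tp = sum(p and t for p, t in zip(pred, truth))          # true positives
    fp = sum(p and not t for p, t in zip(pred, truth))      # false positives
    fn = sum(not p and t for p, t in zip(pred, truth))      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Flattened binary masks (1 = pixel belongs to the target land-cover class):
score = f_score([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

For multi-class land-cover segmentation, the score is typically computed per class and averaged.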
13137-22
Author(s): Shunsuke Akamatsu, Michela Testolina, Evgeniy Upenik, Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
20 August 2024 • 11:25 AM - 11:45 AM PDT | Conv. Ctr. Room 17B
Recently, the image compression field has seen a shift in paradigm thanks to the rise of neural network-based models, such as the future JPEG AI standard. While most research to date has focused on image coding for humans, JPEG AI is planning to address machine vision by presenting a number of non-normative decoders addressing multiple image processing and computer vision tasks. While the impact of conventional image compression on classification tasks has already been addressed, no study has been conducted to assess the impact of learning-based image compression on such tasks. In this study, the impact of learning-based image compression, including JPEG AI, on the classification task is reviewed and discussed. The study reviews the impact of JPEG AI compression on a variety of image classification models and shows the superiority of JPEG AI over other learning-based compression models.
13137-24
Author(s): Hassane Guermoud, InterDigital, Inc. (France); Philippe Bordes, Franck Galpin, Thierry Dumas, Edouard François, Gagan Rath, InterDigital (France)
20 August 2024 • 11:45 AM - 12:05 PM PDT | Conv. Ctr. Room 17B
The Versatile Video Coding (VVC) standard specifies a tool named Reference Picture Resampling (RPR), designed for dynamic adaptive resolution change. This tool is also included in the Enhanced Coding Model (ECM) currently developed as exploratory work by JVET. RPR supports changing the frame resolution without inserting an intra refresh picture. Video streaming and low-delay scenarios can take advantage of RPR to ensure smooth frame-based bit-rate adaptation, compared to traditional techniques that can generate bitrate leaps. Substantial coding gains may be obtained from this feature by properly deciding, at encoding time, the optimal picture resolution to use per video segment. In this paper, a neural network regressor that predicts the picture resolution change decision is presented, and an adaptation of the downscaling factor is proposed to improve VVC coding efficiency in the random access and all intra configurations.
Break
Lunch/Exhibition Break 12:05 PM - 2:30 PM
Signal, Image, and Data Processing Keynote
20 August 2024 • 2:30 PM - 3:15 PM PDT | Conv. Ctr. Room 17B
Session Chair: Khan Iftekharuddin, Old Dominion Univ. (United States)

2:30 PM - 2:35 PM:
Welcome and Opening Remarks
13136-501
Author(s): Zhi-Pei Liang, Univ. of Illinois (United States)
20 August 2024 • 2:35 PM - 3:15 PM PDT | Conv. Ctr. Room 17B
The ongoing paradigm shift in healthcare towards personalized and precision medicine is posing a critical need for noninvasive imaging technology that can provide quantitative tissue and molecular information. Magnetic resonance signals from biological systems contain information from multiple molecules and multiple physical/biological processes (e.g., T1 relaxation, T2 relaxation, diffusion, perfusion, etc.). Magnetic resonance imaging (MRI) is therefore inherently a high-dimensional imaging technology that can acquire structural, functional, and molecular information simultaneously. In practice, due to the curse of dimensionality, MRI experiments are often done in a low-dimensional setting to acquire biomarkers one at a time. Such a “divide-and-conquer” approach not only reduces data acquisition efficiency but also makes it difficult to obtain molecular information at high resolution. By synergistically integrating machine learning with sparse sampling, constrained image reconstruction, and quantum simulation, we have successfully demonstrated ultrafast high-dimensional imaging of the brain. This talk will give an overview of this unprecedented omni imaging technology and show some exciting experimental results on brain function and diseases.
Break
Coffee Break 3:15 PM - 3:30 PM
Optical Engineering Plenary
20 August 2024 • 3:30 PM - 5:35 PM PDT | Conv. Ctr. Room 6A
3:30 PM - 3:35 PM:
Welcome and Opening Remarks
13138-501
Author(s): Manuel Gonzalez-Rivero, Maxar Technologies (United States)
20 August 2024 • 3:35 PM - 4:15 PM PDT | Conv. Ctr. Room 6A
With 140+ petabytes of historical data holdings, 3.8 million square kilometers of daily multi-spectral collection, integration of Synthetic Aperture Radar, and new assets launching every quarter, the opportunities to develop insight from sense-making technologies at Maxar are ever growing. During this discussion, we will cover the challenges of collecting, organizing, and exploiting multi-source electro-optical remote sensing systems at scale, using modern machine learning architectures and techniques to derive actionable insights.
13131-501
Author(s): Nelson E. Claytor, Fresnel Technologies Inc. (United States)
20 August 2024 • 4:15 PM - 4:55 PM PDT | Conv. Ctr. Room 6A
13145-501
Author(s): Jeremy S. Perkins, NASA Goddard Space Flight Ctr. (United States)
20 August 2024 • 4:55 PM - 5:35 PM PDT | Conv. Ctr. Room 6A
Session 7: Imaging Performance Assessment I
21 August 2024 • 9:00 AM - 10:20 AM PDT | Conv. Ctr. Room 17B
Session Chair: Thomas Richter, Fraunhofer-Institut für Integrierte Schaltungen IIS (Germany)
13137-25
Author(s): Thomas Richter, Fraunhofer-Institut für Integrierte Schaltungen IIS (Germany)
21 August 2024 • 9:00 AM - 9:20 AM PDT | Conv. Ctr. Room 17B
JPEG XS is a lightweight, low-latency image coding standard for transmission of video streams over IP. To transmit video over error-prone networks such as WANs or wireless networks, error correction needs to be considered. If low latency is additionally a design requirement, forward error correction according to SMPTE ST 2022-5 is a favourable choice. In this work, we study JPEG XS transmission in lossy networks with and without forward error correction, report on the outcome of an experiment in which we measured image quality as a function of error rate, and provide estimates of the additional latency due to the error correction layer.
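SMPTE ST 2022-5 forward error correction is built on XOR parity packets computed over groups of media packets; a minimal sketch of the row-parity idea follows. This is illustrative only and omits the RTP framing and row/column arrangement the standard actually defines.

```python
def xor_parity(packets):
    """XOR parity over equal-length payloads, as in row/column FEC schemes
    such as SMPTE ST 2022-5 (RTP framing omitted for brevity)."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover_lost(survivors, parity):
    """Recover a single lost packet in a row: XOR the parity with all
    surviving packets; every packet present twice cancels out."""
    return xor_parity(list(survivors) + [parity])

row = [b"pkt1", b"pkt2", b"pkt3", b"pkt4"]
parity = xor_parity(row)
recovered = recover_lost([row[0], row[1], row[3]], parity)   # row[2] was lost
```

The added latency studied in the paper comes from having to buffer a full row (or matrix) of packets before the parity can be computed or a loss repaired.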
13137-26
Author(s): Luis David Duarte Moreno, Manuel G. Forero, Miguel A. Rivera, Univ. de Ibagué (Colombia); Christian S. Gonzalez, Andrés L. Mogollón, Adriana P. Noguera, Univ. Nacional Abierta y a Distancia (Colombia)
21 August 2024 • 9:20 AM - 9:40 AM PDT | Conv. Ctr. Room 17B
Traditionally, rice quality relied on manually estimating whole and broken grains, a slow and subjective process. This study explores leveraging image processing and machine learning for a more efficient approach, but achieving clear images is crucial. The study details a meticulously designed protocol considering background, grain shape, translucency, and lighting to capture high-quality images. This aims to pave the way for automated analysis, ultimately improving accuracy, efficiency, and industry standards for better quality control and consistent rice products.
13137-27
Author(s): Luis David D. Duarte Moreno, Miguel A. Rivera, Manuel G. Forero, Univ. de Ibagué (Colombia); Adriana P Noguera, Univ. Nacional Abierta y a Distancia (Colombia)
21 August 2024 • 9:40 AM - 10:00 AM PDT | Conv. Ctr. Room 17B
Rice quality, crucial for the agri-food industry, relies on tedious and subjective manual grading of whole vs. broken grains. Seeking a solution, this study proposes a fully automated technique using digital image processing. While existing methods struggle with accuracy due to manual adjustments, this approach utilizes new algorithms to analyze images and overcome oversegmentation issues. By leveraging circularity information, it significantly reduces errors compared to manual grading, offering both efficiency gains and improved accuracy, paving the way for more automated and reliable rice quality assessment.
13137-28
Author(s): Michela Testolina, Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
21 August 2024 • 10:00 AM - 10:20 AM PDT | Conv. Ctr. Room 17B
The JPEG Committee has recently initiated an activity known as JPEG AIC (Assessment of Image Coding) in response to recent advancements in image compression technology. This initiative addresses the challenge posed by the range from high quality to nearly visually lossless, where traditional subjective visual quality assessment protocols, such as those outlined in ITU-R Rec. BT.500, have proved ineffective. The committee has issued a Call for Contributions on Subjective Image Quality Assessment and is currently working on a Call for Proposals on Objective Image Quality Assessment. This paper provides an overview of the future JPEG AIC-3 standards, highlighting recent advancements in this domain and outlining the roadmap for future work.
Break
Coffee Break 10:20 AM - 10:50 AM
Session 8: Imaging Performance Assessment II
21 August 2024 • 10:50 AM - 12:10 PM PDT | Conv. Ctr. Room 17B
Session Chair: Thomas Richter, Fraunhofer-Institut für Integrierte Schaltungen IIS (Germany)
13137-29
Author(s): Rafael Rodrigues, Antonio M. G. Pinheiro, Touradj Ebrahimi, RayShaper SA (Switzerland)
21 August 2024 • 10:50 AM - 11:10 AM PDT | Conv. Ctr. Room 17B
With the constant increase in video resolution and frame rate for immersive content applications, there is a need for efficient coding strategies that can deliver very high visual quality with very low latency over 5G networks. JPEG XS is a low-complexity codec that can be implemented with very low latency, designed to provide visually lossless quality at high compression ratios, making it suitable for immersive video applications. This paper reports a quality evaluation of omnidirectional videos from the JVET 360º test sequence dataset coded with JPEG XS. A subjective quality experiment used an alternating double-stimulus method in a VR environment, in which subjects freely switch between reference and distorted videos. Test sequences were encoded with JPEG XS at five bitrates, ranging from 0.25 to 3 bpp, which are suitable for real-time high-resolution video transmission over 5G networks. It was concluded that JPEG XS provides an effective low-latency solution suitable for high-quality immersive applications over 5G networks.
13137-30
Author(s): Evgeniy Upenik, Davi Nachtigall Lazzarotto, Ecole Polytechnique Fédérale de Lausanne (Switzerland); Robin Mange, Javier Bello, Kepa Iturrioz, Anjo Martinez, Imverse SA (Switzerland); Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
21 August 2024 • 11:10 AM - 11:30 AM PDT | Conv. Ctr. Room 17B
This paper introduces a methodology for evaluating quality of experience in mixed immersive communication systems. It focuses on assessing the impact of advanced 3D capture techniques, immersive eye-sensing light field displays, and efficient compression mechanisms in mixed setups where terminals offer different visual modalities. This research methodically investigates the influence of the above technologies on user experience in various communication scenarios. By employing a combination of qualitative and quantitative assessments, the study aims to develop methods for comprehensive evaluation of how such immersive technologies affect perceived visual quality, presence, engagement, and overall satisfaction and preference compared to traditional video communication methods. The experimental design incorporates a series of tests where participants interact through a state-of-the-art immersive communication setup, followed by detailed feedback sessions to gauge their experiences. Through this approach, the study seeks to uncover the nuances of user satisfaction in immersive environments and identify the key factors that enhance the overall quality of peer-to-peer communication.
13137-31
Author(s): Touradj Ebrahimi, Michela Testolina, Evgeniy Upenik, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
21 August 2024 • 11:30 AM - 11:50 AM PDT | Conv. Ctr. Room 17B
Several image coding formats have been proposed recently by standardization committees, industry consortia, and private companies. Examples include JPEG XL, AVIF, HEIF, JPEGLI, WebP, HD Photo, etc. Often, the originators of a format claim superiority over the state of the art. However, such claims are usually supported only by the originators of the technologies themselves, who, besides obvious bias, may not have the necessary insight or time to spend optimizing the state of the art they compare against. In this paper, we start with an overview of the different performance metrics that can be, and are, used to assess the quality, efficiency, and effectiveness of image coding for current and emerging standards. We then compare the most recent state of the art in image coding and provide a detailed assessment of their performance as measured by those metrics.
13137-32
Author(s): Touradj Ebrahimi, Antonio M. G. Pinheiro, Rafael Rodrigues, RayShaper SA (Switzerland); Andrew Perkis, Oyvind Klungre, Norwegian Univ. of Science and Technology (Norway); Andrea Castelli, Brainstorm Multimedia (Spain)
21 August 2024 • 11:50 AM - 12:10 PM PDT | Conv. Ctr. Room 17B
Emerging 5G technologies bring various new opportunities for the media sector. In particular, they allow ultra-high-resolution video formats and immersive AR/VR/XR content to be incorporated into streaming applications while providing a reliable, high-quality user experience. In this paper, we focus on streaming immersive content in 8K and 360º formats within two scenarios, and validate the feasibility of efficient, cost-effective solutions, measuring the added value with various key performance indicators in the framework of the European innovation project 5GMediaHUB.
Break
Lunch/Exhibition Break 12:10 PM - 1:40 PM
Session 9: Video ASICs for Data Center
21 August 2024 • 1:40 PM - 4:50 PM PDT | Conv. Ctr. Room 17B
Session Chair: Ryan Zhijun Lei, Meta (United States)
13137-33
Author(s): Pavel Novotny, Advanced Micro Devices, Inc. (Canada); Avinash Ramachandran, Advanced Micro Devices, Inc. (United States)
21 August 2024 • 1:40 PM - 2:00 PM PDT | Conv. Ctr. Room 17B
The current scale of online video streaming requires hardware-accelerated video transcoding solutions. Historically, hardware solutions have been excellent at offloading computationally intensive tasks from CPUs, but often came with the penalty of being inflexible and not quickly adaptable to emerging market trends. We present an architecture which maintains all the benefits of hardware acceleration but also adds an unparalleled level of programmability and flexibility. This architecture supports a wide spectrum of markets, ranging from ultra-low-latency encoding all the way to high-quality Video On Demand, with only firmware changes. These capabilities are achieved by a strategic combination of built-in hardware acceleration components and many embedded CPUs that have full control over the video encoding pipeline. This architecture not only provides the deterministic timing critical for ultra-low-latency transcoding, but also offers the flexibility and programmability to support robust product roadmaps through simple firmware updates.
13137-34
Author(s): In Suk Chong, Google (United States)
21 August 2024 • 2:00 PM - 2:20 PM PDT | Conv. Ctr. Room 17B
YouTube is actively driving advancements in the AV1 and AV2 video codecs to enhance streaming quality and efficiency for diverse user-generated content (UGC). Efforts include customizing the AV1 codec for UGC, optimizing quality/bitrate/compute tradeoffs, and developing hardware encoding/decoding support within YouTube's data centers to support AV1 at scale. To accelerate adoption, YouTube works to increase AV1 transcoding coverage, expand device compatibility, and contribute to the Alliance for Open Media (AOM) for the ongoing improvement of AV1 and AV2. Research focuses on novel quality metrics, hardware-software analysis, and potential modifications to the codecs to support emerging use cases such as AR and VR. To ensure the practical feasibility of AV2, we assess hardware complexity and propose methods to reduce it. YouTube also prioritizes reducing AV1/AV2 encoder complexity through approaches such as machine-learning-based partition search pruning. Furthermore, YouTube has led collective efforts to modify existing tools by chairing the Hardware Subgroup within AOM.
13137-35
Author(s): Ryan Zhijun Lei, Nick Wu, Hassene Tmar, Cosmin Stejerean, Ioannis Katsavounidis, Meta (United States)
21 August 2024 • 2:20 PM - 2:40 PM PDT | Conv. Ctr. Room 17B
To extend our previous work benchmarking the coding efficiency and performance of open-source software encoders, including x264, libvpx, libaom, and SVT-AV1, we also include hardware encoders in a similar study. In this work, we include a few commercially available hardware AV1 encoder implementations from external vendors, along with Meta's MSVP VP9 encoder. A wider variety of test content is included in the study. To ensure a fair comparison between software and hardware encoders, we normalize encoding performance to the power used, in watt-hours. In this paper, we provide a detailed description of the test methodology and the process for measuring compression efficiency and power usage. We also discuss the limitations of the methodology and future opportunities to improve it.
13137-36
Author(s): Haibo Zhang, Zhijun Lei, Ping-Hao Wu, Gaurang Chaudhari, Wai Lun Tam, Meta (United States)
21 August 2024 • 2:40 PM - 3:00 PM PDT | Conv. Ctr. Room 17B
In video streaming services for the VOD use case, one important workflow transcodes user-uploaded videos into multiple encoded bitstreams at different bitrates and resolutions, which allows client players to use an ABR (adaptive bitrate) algorithm to select bitstream segments based on available bandwidth. In this workflow, the key decision is determining the optimal encoding resolution and bitrate for every video at each quality or bitrate target in an ABR ladder. To tackle this challenge, an efficient two-stage convex-hull-based dynamic optimization framework was recently proposed. In this two-stage system, two different encoders, or encoder presets, can be used to construct the convex hull to improve computational efficiency. In this work, we study the cross-codec encoding parameter prediction problem in the two-stage system to improve compression efficiency. We first describe how we formulate the prediction as an optimization problem. We then propose two methods for this optimization, with validation results, and discuss potential directions that could further improve the results.
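The convex hull underlying such a framework is the Pareto frontier of (bitrate, quality) operating points gathered across candidate resolutions; a generic monotone-chain sketch of its construction (not the authors' implementation, with made-up sample points):

```python
def rd_convex_hull(points):
    """Upper convex hull of (bitrate, quality) operating points: the
    Pareto-optimal points from which ABR ladder rungs are chosen."""
    pts = sorted(points)                        # ascending bitrate
    hull = []
    for p in pts:
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            # Cross product of (o->a) and (o->p): >= 0 means a lies on or
            # below the segment o->p, so a is not on the upper hull.
            cross = (a[0] - o[0]) * (p[1] - o[1]) - (a[1] - o[1]) * (p[0] - o[0])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Hypothetical (bitrate kbps, quality) points from several encodes:
points = [(500, 30.0), (1000, 33.0), (1500, 34.0), (2000, 37.0), (3000, 39.0)]
hull = rd_convex_hull(points)                   # (1500, 34.0) is dominated
```

Each ladder rung is then served by the resolution whose curve contributes the hull point at the rung's target bitrate or quality.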
Coffee Break 3:00 PM - 3:30 PM
13137-37
Author(s): Ryan Zhijun Lei, Chenguang Zhou, Haoteng Chen, Cosmin Stejerean, Denise Noyes, Abhishek Gera, Meta (United States)
21 August 2024 • 3:30 PM - 3:50 PM PDT | Conv. Ctr. Room 17B
In this paper, we introduce how we implemented a client-side ABR algorithm to enable delivery of mixed-codec manifests. We also share results and discuss potential opportunities for further optimization.
13137-38
Author(s): Jongju Kim, Sijung Kim, Syehoon Oh, Minyong Jeon, BLUEDOT, Inc. (Korea, Republic of)
21 August 2024 • 3:50 PM - 4:10 PM PDT | Conv. Ctr. Room 17B
This paper presents a method to simplify content-aware encoding for streaming services using AI, aiming to enhance user experience and efficiency in bandwidth-limited environments. Unlike traditional encoding, which uses fixed bitrates, this AI-driven approach optimizes the bitrate based on the content's complexity, significantly reducing the necessary computational steps and bitrates. It achieves this by predicting an optimized Adaptive Bitrate (ABR) ladder through minimal encoding steps and lightweight analysis, resulting in substantial bitrate savings and streamlined workflow. The approach also fits well with the trend of integrating video processing ASICs in data centers, further enhancing cost-effectiveness and scalability.
13137-20
Author(s): Ungwon Lee, Sijung Kim, Jungtae Kim, Dong-gyu Kim, Minyong Jeon, BLUEDOT, Inc. (Korea, Republic of)
21 August 2024 • 4:10 PM - 4:30 PM PDT | Conv. Ctr. Room 17B
Advanced AI and new compression standards are required to improve the viewing experience and reduce service costs, but the explosion in computational complexity is a major barrier to adoption. In this paper, we describe how the development of dedicated hardware accelerators for super-resolution and preprocessing to improve encoder compression performance can significantly improve video quality and compression efficiency while reducing cost and development time. These advances represent an important step towards balancing high quality streaming services with operational efficiency.
13137-23
Author(s): John Plasterer, Lin Xu, NETINT Technologies Inc. (Canada)
21 August 2024 • 4:30 PM - 4:50 PM PDT | Conv. Ctr. Room 17B
Rapid development and deployment of GPU based computation has led to an improvement in diffusion generation of video and images. Further, a rapid reduction in the effective cost of compression using NNC techniques provides opportunities to compress images and videos in new ways. The overall structure of diffusion based generative video and images is leveraged to take advantage of the compressed latents and lower overall compression costs and latency. This paper presents an architecture to compress latents for transmission and reduce overall latency and cost as compared to alternatives using traditional Codecs or NNC on the raw image. It further presents computational cost, quantitative and perceptual quality, and latency for this architecture as compared to the alternatives.
Featured Nobel Plenary
21 August 2024 • 5:00 PM - 5:45 PM PDT | Conv. Ctr. Room 6A
Session Chair: Jennifer Barton, The Univ. of Arizona (United States)

5:00 PM - 5:05 PM:
Welcome and Opening Remarks
13115-501
The route to attosecond pulses (Plenary Presentation)
Author(s): Anne L'Huillier, Lund Univ. (Sweden)
21 August 2024 • 5:05 PM - 5:45 PM PDT | Conv. Ctr. Room 6A
When an intense laser interacts with a gas of atoms, high-order harmonics are generated. In the time domain, this radiation forms a train of extremely short light pulses, of the order of 100 attoseconds. Attosecond pulses allow the study of the dynamics of electrons in atoms and molecules, using pump-probe techniques. This presentation will highlight some of the key steps of the field of attosecond science.
Session 10: Imaging Technology
22 August 2024 • 9:00 AM - 12:10 PM PDT | Conv. Ctr. Room 11B
Session Chairs: Yuriy A. Reznik, Brightcove, Inc. (United States), Shan Liu, Tencent America, LLC (United States)
13137-39
Author(s): Davi Nachtigall Lazzarotto, Michela Testolina, Touradj Ebrahimi, Ecole Polytechnique Fédérale de Lausanne (Switzerland)
22 August 2024 • 9:00 AM - 9:20 AM PDT | Conv. Ctr. Room 11B
The use of DNA molecules as a storage medium has been recently proposed as a solution to the exponentially increasing demand for data storage, achieving lower energy consumption and higher information density. The nucleotides composing the molecules can be regarded as quaternary symbols, but constraints are generally imposed to avoid sequences prone to errors during sequencing, storage, and synthesis. While the majority of previous works in the field have proposed methods for translating general binary data into nucleotides, others have presented algorithms tailored for specific data types such as images as well as joining source and channel coding into a single process. This paper proposes and evaluates a method that integrates DNA Fountain codes with state-of-the-art compression coding techniques, targeting the storage of images and three-dimensional point clouds. Results demonstrate that the proposed method outperforms previous techniques for coding images directly into DNA, putting forward a first benchmark for the coding of point clouds.
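For illustration, nucleotides can be treated as quaternary symbols (two bits each), with candidate sequences screened for error-prone patterns such as long homopolymer runs; a toy sketch of that idea follows. The paper's actual DNA Fountain pipeline is considerably more involved (Luby-transform droplets with constraint screening), so this is background only.

```python
NUC = "ACGT"

def bits_to_dna(bits):
    """Pack a bit string into nucleotides, two bits per symbol (00→A … 11→T)."""
    assert len(bits) % 2 == 0
    return "".join(NUC[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))

def max_homopolymer(seq):
    """Longest run of identical nucleotides; long runs are error-prone during
    synthesis and sequencing, so constrained codes screen them out."""
    best = run = 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

seq = bits_to_dna("0001101100")   # pairs 00,01,10,11,00 -> 'ACGTA'
```

A fountain-code encoder keeps generating candidate sequences and retains only those passing such constraint checks (homopolymer length, GC content, etc.).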
13137-40
Author(s): Yuriy A. Reznik, Brightcove, Inc. (United States)
22 August 2024 • 9:20 AM - 9:40 AM PDT | Conv. Ctr. Room 11B
Karhuen-Loeve Transform (KLT) is a valuable tool in many applications, but its computation is not exactly trivial. Generally, it requires finding the solution of an eigenvector problem, and with general types of inputs, the typical path forward is to use iterative numerical methods. Such methods are usually complex. In some cases, KLTs allow approximations by sinusoidal transforms – DCT-II likely the best-known example, but the number of such cases is limited, and usually constrained to very simple (1-st order) processes. However, as we will show in this paper, for some short sizes, KLTs can still be computed analytically, with only mild assumptions about the structure of the covariance matrix. For example, we show analytic solutions for arbitrary real symmetric 3x3 covariance matrixes. With symmetric 3-diagonal and some special cases of 5-diagonal matrices the solutions can also be found. In the end, we discuss a few possible applications of such transforms for image and video coding.
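Numerically, the KLT basis for a given covariance is its eigenvector matrix; the following sketch computes it for an AR(1)-style 3x3 covariance and verifies that it decorrelates the process. This is a generic numeric illustration (with an assumed correlation value) of what the paper derives analytically.

```python
import numpy as np

# KLT basis = eigenvectors of the covariance matrix. For an AR(1) process
# with correlation rho, C[i][j] = rho**|i-j|; the generic route is a
# numerical eigendecomposition, which the paper shows can instead be done
# analytically for small symmetric matrices like this one.
rho = 0.9   # illustrative correlation coefficient
C = np.array([[rho ** abs(i - j) for j in range(3)] for i in range(3)])

eigvals, klt = np.linalg.eigh(C)   # columns are eigenvectors, ascending order
klt = klt[:, ::-1]                 # reorder basis by decreasing variance

# The transform decorrelates the process: transformed covariance is diagonal.
D = klt.T @ C @ klt
```

For first-order processes like this one, the resulting basis is close to a DCT-II, which is why the DCT serves as a practical KLT approximation in codecs.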
13137-41
Author(s): Yuriy A. Reznik, Brightcove, Inc. (United States)
22 August 2024 • 9:40 AM - 10:00 AM PDT | Conv. Ctr. Room 11B
We review the history of the development of one of the most iconic tools in image and video coding – the zigzag scan. Despite its apparent obviousness, we will show that its development was a non-trivial process that took several years, multiple iterations, and multiple ideas that eventually led to the formation of its final "zigzag" shape. Remarkably, we also discover that early variants of the zigzag scan appeared before the invention of the DCT, intra-predictors, and many other techniques in image and video coding algorithms. It is one of the oldest and most fundamental techniques in this context. The paper also traces the evolution of image and video codec architectures over the last six decades and brings examples of uses of the zigzag scan in modern-era image and video coding standards.
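For reference, the zigzag order itself can be generated by walking the anti-diagonals of the block in alternating directions, so low-frequency coefficients come first; a generic sketch matching the familiar JPEG-style scan:

```python
def zigzag_order(n=8):
    """Zigzag scan order for an n x n transform block: coefficients are
    visited along anti-diagonals (constant i + j) in alternating
    directions, placing low-frequency coefficients first."""
    order = []
    for s in range(2 * n - 1):                  # s = i + j indexes a diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()                      # alternate traversal direction
        order.extend(diag)
    return order

scan = zigzag_order(4)
# First entries follow the familiar pattern:
# (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
```

Applied after the transform, this ordering groups the typically zero high-frequency coefficients at the end of the scan, which run-length and entropy coders exploit.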
13137-42
Author(s): Yuriy A. Reznik, Brightcove, Inc. (United States)
22 August 2024 • 10:00 AM - 10:20 AM PDT | Conv. Ctr. Room 11B
We review the history of the development of transform-based image and video codecs and reconstruct the logical chains that led to the inventions of the DCT, the zigzag scan, adaptive coding, and the hybrid DPCM + transform architecture. We also review the subsequent evolution of this architecture and explain the reasoning behind the multiple transform choices in modern video codecs (HEVC, VVC, etc.), in-loop filters, and more. Finally, we describe the role of fast transform algorithms in image and video codec evolution and give an outlook on current developments, including the increasing use of CNNs, learning methods, and performance/energy-usage tradeoffs that may shape future architectures.
Coffee Break 10:20 AM - 10:50 AM
13137-43
Author(s): Madhu Krishnan, Shan Liu, Tencent America, LLC (United States)
22 August 2024 • 10:50 AM - 11:10 AM PDT | Conv. Ctr. Room 11B
This paper presents the state-of-the-art design and application of secondary transforms in the context of the ongoing next-generation video coding standardization beyond AV1 by AOM (Alliance for Open Media). Methods to apply flexible secondary transform sets and kernels to intra- and inter-coded blocks are discussed in detail. With the proposed methods, the encoder is empowered to optimize the transform sets for each intra prediction residual block and to extend the application of the secondary transform to inter prediction residuals. Experimental results on the reference software demonstrate that the proposed approach improves overall coding efficiency, measured in weighted YUV PSNR BD-rates, by 3.40% for All Intra (AI), 1.70% for Random Access (RA), and 1.10% for Low Delay (LD) configurations of AOM CTC (Common Test Conditions) version 7.
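The BD-rate figures quoted here follow the Bjøntegaard methodology: fit log-bitrate as a function of quality for both codecs and integrate the gap over the overlapping quality range. A compact sketch of the classic cubic-fit variant follows (modern test conditions typically use a piecewise-cubic interpolation refinement instead; the sample RD points are made up).

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta rate: average bitrate difference (%) between two
    RD curves at equal quality. Fits a cubic of log-rate vs PSNR and
    integrates over the overlapping PSNR interval."""
    p1 = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p2 = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int1 = np.polyval(np.polyint(p1), hi) - np.polyval(np.polyint(p1), lo)
    int2 = np.polyval(np.polyint(p2), hi) - np.polyval(np.polyint(p2), lo)
    avg_log_diff = (int2 - int1) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100    # negative = bitrate savings

# Hypothetical curves: the test codec needs ~10% less rate at equal PSNR.
anchor_r = [1000, 2000, 4000, 8000]
anchor_q = [32.0, 35.0, 38.0, 41.0]
test_r = [r * 0.9 for r in anchor_r]
bd = bd_rate(anchor_r, anchor_q, test_r, anchor_q)   # ≈ -10%
```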
13137-52
Author(s): Srivatsa Venkatachari Prativadibhayankaram, Thomas Richter, Siegfried Fößel, Fraunhofer-Institut für Integrierte Schaltungen IIS (Germany); André Kaup, Friedrich-Alexander-Univ. Erlangen-Nürnberg (Germany)
22 August 2024 • 11:10 AM - 11:30 AM PDT | Conv. Ctr. Room 11B
With the advent of learned image compression, numerous models are being developed. Most learned image codecs are built on the variational autoencoder framework, with an analysis transform that maps an image into a latent representation and a synthesis transform that reconstructs the image from the latent space, along with quantization and entropy coding blocks. In this work, we analyse the energy distribution of the latent channels and compare it with that of the KLT. Additionally, we study how close traditional codec designs are to learned image codecs.
13137-53
Author(s): Abhijith Jagannath, Yuriy Reznik, Brightcove INC (United States)
22 August 2024 • 11:30 AM - 11:50 AM PDT | Conv. Ctr. Room 11B
The Karhunen-Loève Transform (KLT), when applied to an AR(1) process with known block boundaries, resembles the Discrete Sine Transform of type VII (DST-VII). Similarly, when both boundaries are known, the KLT resembles the DST-I. In this paper, we use the same methodology to suggest new shapes and forms of temporal prediction structures for video coding. Specifically, we treat the Group of Pictures (GOP) as samples from an AR(1) process and interpret factorizations of the resulting DST-VII and DST-I transforms as sequences of temporal predictions with specific weight factors applied. We then identify a subset of GOP lengths producing simple structures of short-length DST-VII and DST-I factorizations as candidates for practical implementations of temporal prediction algorithms. We also analyze the coding gains achievable by traditional vs. transform-based predictions, considering single- and dual-sided boundaries. These results may be of interest for future evolutions of video coding algorithms and architectures.
13137-54
Author(s): Lev Hnativ, V.M. Glushkov Institute of Cybernetics NASU (Ukraine)
22 August 2024 • 11:50 AM - 12:10 PM PDT | Conv. Ctr. Room 11B
The paper proposes a new method for constructing integer approximations of the discrete sine (DST) and discrete cosine (DCT) transforms of type IV. Compared to known fast integer algorithms for computing size-16 DST-IV and DCT-IV transforms, the proposed methods achieve a practically appreciable reduction in computational complexity. Fast 2D transforms based on the same process are also proposed and evaluated in the context of image and video coding. The proposed methods may be of interest for future image and video coding applications.
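To make the integer-approximation idea concrete, here is a naive baseline sketch (not the paper's algorithm, which relies on fast factorised structures to reduce multiplications): the size-16 DCT-IV basis is scaled and rounded to integers, and the deviation from orthogonality is measured.

```python
import numpy as np

# Naive integer approximation of the size-16 DCT-IV:
# scale the floating-point basis and round to integers, then check
# how far the result is from an orthogonal transform.
N, scale = 16, 64

n = np.arange(N)
dct4 = np.sqrt(2.0 / N) * np.cos(
    np.pi * np.outer(2 * n + 1, 2 * n + 1) / (4 * N))

idct4 = np.round(scale * dct4).astype(int)  # integer transform matrix

# Gram matrix of the rescaled integer transform; ideally the identity
gram = (idct4 @ idct4.T) / scale**2
err = np.max(np.abs(gram - np.eye(N)))
print(f"max orthogonality deviation: {err:.4f}")
```

Purpose-built integer designs aim for the same near-orthogonality while replacing full matrix multiplication with a small number of adds, shifts, and integer multiplies.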
Conference Chair
AGT Associates (United States)
Conference Chair
Ecole Polytechnique Fédérale de Lausanne (Switzerland)
Program Committee
Qualcomm Inc. (United States)
Program Committee
Univ. Catholique de Louvain (Belgium)
Program Committee
Comcast Corp. (Israel)
Program Committee
Ben-Gurion Univ. of the Negev (Israel)
Program Committee
Meta (United States)
Program Committee
The Univ. of Southern California (United States)
Program Committee
Tencent America, LLC (United States)
Program Committee
KU Leuven Association (Belgium)
Program Committee
Instituto de Telecomunicações (Portugal)
Program Committee
Brightcove, Inc. (United States)
Program Committee
Fraunhofer-Institut für Integrierte Schaltungen IIS (Germany)
Program Committee
California Polytechnic State Univ., San Luis Obispo (United States)
Program Committee
Dolby Labs., Inc. (United States)
Program Committee
The Univ. of New South Wales (Australia)
Program Committee
FastVDO Inc. (United States)
Additional Information
POST-DEADLINE ABSTRACTS ACCEPTED UNTIL 20 June
New submissions considered for poster session, or oral session if space becomes available
Contact author will be notified of acceptance by 8 July
View Submission Guidelines and Agreement
View the Call for Papers PDF

Submit Post-Deadline Abstract

What you will need to submit

  • Presentation title
  • Author(s) information
  • Speaker biography (1000-character max including spaces)
  • Abstract for technical review (200-300 words; text only)
  • Summary of abstract for display in the program (50-150 words; text only)
  • Keywords used in search for your paper (optional)
Note: Only original material should be submitted. Commercial papers, papers with no new research/development content, and papers with proprietary restrictions will not be accepted for presentation.