13 - 17 April 2025
Orlando, Florida, US
This conference addresses the real-time aspects of image processing and of deep learning solutions in various imaging and vision applications. These aspects include algorithmic computational complexity, hardware implementation, and software optimization aimed at enabling an image processing or recognition system to operate in real time for an application of interest. The SPIE Conference on Real-Time Image Processing and Deep Learning is the continuation of the SPIE Conference on Real-Time Image and Video Processing, which was held for many years and has now been expanded to include real-time deep learning for solving image recognition and identification problems. Like the previous real-time image processing conferences, this conference is intended to serve as a catalyst, bringing together scientists and researchers from industry and academia working in real-time image processing and deep learning to present recent research results on real-time solutions to image processing and recognition applications.

Papers of interest include, but are not limited to, the following general topics addressing real-time aspects of image processing and deep learning:
Conference 13034

Real-Time Image Processing and Deep Learning 2024

22 - 23 April 2024 | National Harbor 3
  • Opening Remarks
  • 1: Real-Time Deep Learning I
  • 2: Real-Time Methods and Algorithms I
  • 3: Real-Time Deep Learning II
  • 4: Real-Time Methods and Algorithms II
  • Symposium Plenary
  • Symposium Panel on Microelectronics Commercial Crossover
  • 5: Real-Time Methods and Algorithms III
  • Poster Session
  • Digital Posters
Opening Remarks
22 April 2024 • 9:10 AM - 9:20 AM EDT | National Harbor 3
Session Chair: Nasser Kehtarnavaz, The Univ. of Texas at Dallas (United States)
The chair of Real-Time Image Processing and Deep Learning 2024 will introduce the sessions for this conference.
Session 1: Real-Time Deep Learning I
22 April 2024 • 9:20 AM - 10:00 AM EDT | National Harbor 3
Session Chair: Nasser Kehtarnavaz, The Univ. of Texas at Dallas (United States)
13034-3
Author(s): Blake Richey, The Univ. of Texas at Tyler (United States); Christos Grecos, Arkansas State Univ. (United States); Mukul V. Shirvaikar, The Univ. of Texas at Tyler (United States)
On demand | Presented live 22 April 2024
Due to evolving chip architectures that support artificial intelligence (AI) on the edge, producing models that achieve top performance on small devices with limited resources has become a priority. Past research has primarily been focused on improving models by optimizing only a subset of the latency, memory, and power consumption aspects. We propose that possible solutions can be found using genetic algorithms by posing the above variables as part of a multi-reward optimization problem. Further, few methods have incorporated the device itself in the training process efficiently. In this paper, we construct an initial population of viable networks with varying parameters. The ultra low-power MAX 78000 platform is utilized as a model for edge deployment and results for the CIFAR10 deep learning dataset are presented.
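A minimal sketch of how latency, memory, and power could be folded into a genetic search over candidate networks, as the abstract describes; the weighting, the genetic operators, and the measure() profiler are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a genetic search where each
# candidate network is scored on latency, memory, and power simultaneously.
import random

def fitness(candidate, measure):
    # measure() is assumed to return (latency_ms, memory_kb, power_mw) for
    # the candidate, e.g. by profiling it on the target edge device.
    latency, memory, power = measure(candidate)
    # Weighted scalarization of the multi-reward objective; the weights are
    # illustrative and would be tuned for the application of interest.
    return -(1.0 * latency + 0.5 * memory + 0.5 * power)

def evolve(population, measure, mutate, crossover, generations=20):
    for _ in range(generations):
        scored = sorted(population, key=lambda c: fitness(c, measure), reverse=True)
        parents = scored[: len(scored) // 2]          # keep the fitter half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(len(population) - len(parents))]
        population = parents + children
    return max(population, key=lambda c: fitness(c, measure))
```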
13034-4
Author(s): Seungeon Lee, Donyung Kim, Sungho Kim, Yeungnam Univ. (Korea, Republic of)
On demand | Presented live 22 April 2024
In this paper, we propose a template matching technique that uses deep learning to match pairs of wide field-of-view and narrow field-of-view infrared images. The deep learning network has a structure similar to the Atrous Spatial Pyramid Pooling (ASPP) module, and both the wide and narrow field-of-view images are fed to the same network, so the network weights are shared. Our experiments used the Galaxy S20 (Qualcomm Snapdragon 865) platform and show that the trained network has higher matching accuracy than other template matching techniques and is fast enough to be used in real time.
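A minimal sketch, under assumed layer sizes and dilation rates, of a shared-weight ASPP-style feature extractor applied to both views, with cross-correlation standing in for the matching step; it is not the authors' network.

```python
# Sketch only: parallel atrous convolutions (ASPP-style) with weights shared
# between the wide- and narrow-FOV inputs. Channel counts and dilation rates
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPPFeatures(nn.Module):
    def __init__(self, in_ch=1, out_ch=32):
        super().__init__()
        # Parallel dilated convolutions capture context at several scales.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in (1, 2, 4)
        )
        self.project = nn.Conv2d(3 * out_ch, out_ch, 1)

    def forward(self, x):
        feats = torch.cat([F.relu(b(x)) for b in self.branches], dim=1)
        return self.project(feats)

# The same module (shared weights) embeds both views; matching is then a
# cross-correlation of the narrow-FOV embedding over the wide-FOV embedding.
net = ASPPFeatures()
wide, narrow = torch.randn(1, 1, 256, 256), torch.randn(1, 1, 64, 64)
score_map = F.conv2d(net(wide), net(narrow))   # peak location = best match
```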
Break
Coffee Break 10:00 AM - 10:30 AM
Session 2: Real-Time Methods and Algorithms I
22 April 2024 • 10:30 AM - 11:50 AM EDT | National Harbor 3
Session Chairs: Nasser Kehtarnavaz, The Univ. of Texas at Dallas (United States), Mukul V. Shirvaikar, The Univ. of Texas at Tyler (United States)
13034-5
Author(s): Indranil Sinharoy, Madhukar Budagavi, SAMSUNG Research America (United States); Esmaeil Faramarzi, Apple Inc. (United States); Saifeng Ni, Abhishek Sehgal, SAMSUNG Research America (United States)
On demand | Presented live 22 April 2024
Recent advancements in volumetric displays have opened doors to immersive, glass-free holographic experiences in our everyday environments. This paper introduces Holoportal, a real-time, low-latency system that captures, processes, and displays 3D video of two physically separated individuals as if they are conversing face-to-face in the same location. The evolution of work in multi-view immersive video communication from a Space-Time-Flow (STF) media technology to real time Holoportal communication is also discussed. Multiple cameras at each location capture subjects from various angles, with wireless synchronization for precise video-frame alignment. Through this technology we envision a future where any living space can transform into a Holoportal with a wireless network of cameras placed on various objects, including TVs, speakers, and refrigerators.
13034-6
Author(s): Bogdan Smolka, Damian Kusnik, Silesian Univ. of Technology (Poland); Milena Smolka, AGH Univ. of Science and Technology (Poland); Michal Kawulok, Silesian Univ. of Technology (Poland); Boguslaw Cyganek, AGH Univ. of Science and Technology (Poland)
On demand | Presented live 22 April 2024
This paper centers on addressing the reduction of mixed Gaussian and impulsive noise in color images. Our proposed method comprises two essential steps. Firstly, we detect impulsive noise through a novel approach grounded in the concept of digital path exploration within the local pixel neighborhood. Each pixel is assigned a path cost, computed from the boundary of a local processing window to its center. When the central pixel exhibits corruption, it can be unequivocally identified as an impulse. To achieve this, we employ a threshold value for pinpointing corrupted pixels. Analyzing the distribution of path costs, we employ the k-means technique to classify pixels into three distinct categories: those nearly unaffected, those tainted by Gaussian noise, and those corrupted by impulsive noise. Subsequently, we employ Laplace interpolation techniques to restore the impulsive pixels — a fast and effective method yielding satisfactory denoising results. In the second step, we address the residual Gaussian noise using the Non-local Means method, which selectively considers pixels from the local window that have not been flagged as impulsive.
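A minimal sketch of the pixel-classification step described above: clustering per-pixel digital-path costs into three groups ordered by mean cost. The path-cost computation itself is assumed to be provided, and the subsequent Laplace interpolation and Non-Local Means stages are omitted.

```python
# Sketch of classifying pixels from their path costs (clean / Gaussian /
# impulsive) with k-means, as in the two-step method described above.
import numpy as np
from sklearn.cluster import KMeans

def classify_pixels(costs):
    """costs: HxW array of digital-path costs, one value per pixel."""
    flat = costs.reshape(-1, 1)
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(flat)
    # Rank clusters by mean cost: 0 = nearly clean, 1 = Gaussian, 2 = impulsive.
    means = np.array([flat[labels == k].mean() for k in range(3)])
    rank = np.empty(3, dtype=int)
    rank[np.argsort(means)] = np.arange(3)
    return rank[labels].reshape(costs.shape)
```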
13034-7
Author(s): Benedykt Pawlus, Bogdan Smolka, Jolanta Kawulok, Michal Kawulok, Silesian Univ. of Technology (Poland)
On demand | Presented live 22 April 2024
Assessing smile genuineness from video sequences is a vital topic concerned with recognizing facial expressions and linking them with the underlying emotional states. A number of techniques have been proposed that are underpinned by handcrafted features, as well as techniques that rely on deep learning to learn useful features. As both of these approaches have certain benefits and limitations, in this work we propose to combine the features learned by a long short-term memory network with handcrafted features that capture the dynamics of facial action units. The results of our experiments indicate that the proposed solution is more effective than the baseline techniques and allows for assessing smile genuineness from video sequences in real time.
13034-8
Author(s): Ethan Frakes, Umar Khalid, Chen Chen, Univ. of Central Florida (United States)
On demand | Presented live 22 April 2024
Recent zero-shot text-to-video models have allowed for the elimination of fine-tuning diffusion models and can generate novel videos from a text prompt alone. While the zero-shot generation method greatly reduces generation time, many models rely on inefficient cross-frame attention processors. We address this issue by introducing recent attention processors to a video diffusion model. Specifically, we use FlashAttention-2, an attention processor that is highly optimized for efficiency and hardware parallelization. We then apply FlashAttention-2 to a video generator and test with both older diffusion models and newer, high-quality models. Our results show that using new attention processors alone can reduce generation time by around 25%. Combined with the use of higher quality models, this use of efficient attention processors in zero-shot generation presents a substantial efficiency and quality increase, greatly expanding the video diffusion model’s application to real-time video generation.
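A hedged sketch of the kind of attention-processor swap the abstract describes, shown on a standard Stable Diffusion UNet via Hugging Face diffusers; the paper applies the analogous change to the cross-frame attention of a zero-shot text-to-video generator, and the checkpoint and prompt here are only common public examples.

```python
# Swapping the UNet's attention processors to AttnProcessor2_0 routes
# attention through torch.scaled_dot_product_attention, which can dispatch
# to FlashAttention-2 kernels on supported GPUs.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor2_0

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.unet.set_attn_processor(AttnProcessor2_0())   # efficient attention path
image = pipe("a panda surfing a wave").images[0]
```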
Break
Lunch Break 11:50 AM - 1:30 PM
Session 3: Real-Time Deep Learning II
22 April 2024 • 1:30 PM - 2:50 PM EDT | National Harbor 3
Session Chair: Mukul V. Shirvaikar, The Univ. of Texas at Tyler (United States)
13034-9
Author(s): Lamia Alam, Nasser Kehtarnavaz, The Univ. of Texas at Dallas (United States)
On demand | Presented live 22 April 2024
Modern wafer inspection systems in Integrated Circuit (IC) manufacturing utilize deep neural networks. The training of such networks requires the availability of a very large number of defective or faulty die patterns on a wafer called wafer maps. The number of defective wafer maps on a production line is often limited. In order to have a very large number of defective wafer maps for the training of deep neural networks, generative models can be utilized to generate realistic synthesized defective wafer maps. This paper compares the following three generative models that are commonly used for generating synthesized images: Generative Adversarial Network (GAN), Variational Auto-Encoder (VAE), and CycleGAN which is a variant of GAN. The comparison is carried out based on the public domain wafer map dataset WM‐811K. The quality aspect of the generated wafer map images is evaluated by computing the five metrics of PSNR, structural similarity index measure (SSIM), inception score (IS), Fréchet inception distance (FID), and kernel inception distance (KID). Furthermore, the computational efficiency of these generative networks is examined in terms of their dep
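For reference, the five listed image-quality metrics can be computed with torchmetrics; in this sketch, random tensors stand in for real and synthesized wafer maps, and the feature and subset sizes are illustrative, not the paper's settings.

```python
# Sketch of computing PSNR, SSIM, IS, FID, and KID with torchmetrics.
import torch
from torchmetrics.image import PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchmetrics.image.inception import InceptionScore

real = torch.randint(0, 255, (16, 3, 64, 64), dtype=torch.uint8)   # real maps
fake = torch.randint(0, 255, (16, 3, 64, 64), dtype=torch.uint8)   # generated maps

psnr = PeakSignalNoiseRatio()(fake.float(), real.float())
ssim = StructuralSimilarityIndexMeasure()(fake.float(), real.float())

fid = FrechetInceptionDistance(feature=64)
fid.update(real, real=True); fid.update(fake, real=False)

kid = KernelInceptionDistance(subset_size=8)
kid.update(real, real=True); kid.update(fake, real=False)

inception = InceptionScore()
inception.update(fake)

print(psnr, ssim, fid.compute(), kid.compute(), inception.compute())
```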
13034-10
Author(s): Zephaniah Spencer, The Univ. of North Carolina at Charlotte (United States); Gunar Schirner, Northeastern Univ. (United States); Hamed Tabkhi, The Univ. of North Carolina at Charlotte (United States)
On demand | Presented live 22 April 2024
With the advent of deep learning, there has been an ever-growing list of applications to which Deep Convolutional Neural Networks (DCNNs) can be applied. The field of Multi-Task Learning (MTL) attempts to provide optimizations to many-task systems, improving performance by optimization algorithms and structural changes to these networks. However, we have found that current MTL optimization algorithms often impose burdensome computation overheads, require meticulously labeled datasets, and do not adapt to tasks with significantly different loss distributions. We propose a new MTL optimization algorithm: Batch Swapping with Multiple Optimizers (BSMO). We utilize single-task labeled data to train on a multi-task hard parameter sharing (HPS) network through swapping tasks at the batch level. This dramatically increases the flexibility and scalability of training on an HPS network by allowing for per-task datasets and augmentation pipelines. We demonstrate the efficacy of BSMO versus current SOTA algorithms by benchmarking across contemporary benchmarks & networks.
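A minimal sketch of batch-level task swapping on a hard-parameter-sharing network, under the assumption of one dataloader, head, loss, and optimizer per task; this is an illustration of the idea, not the authors' BSMO implementation.

```python
# Sketch: alternate single-task batches while sharing one backbone; each
# task has its own optimizer (covering the backbone plus that task's head).
import itertools
import torch

def train_batch_swapping(backbone, heads, loaders, losses, optimizers, steps=1000):
    task_cycle = itertools.cycle(range(len(heads)))        # swap task per batch
    iters = [iter(itertools.cycle(dl)) for dl in loaders]  # per-task datasets
    for _ in range(steps):
        t = next(task_cycle)
        x, y = next(iters[t])
        optimizers[t].zero_grad()
        loss = losses[t](heads[t](backbone(x)), y)
        loss.backward()                                    # updates shared backbone + head t
        optimizers[t].step()
```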
13034-11
Author(s): Gavin Halford, Arthur C. Depoian, Colleen P. Bailey, Univ. of North Texas (United States)
On demand | Presented live 22 April 2024
Lower resolutions and a lack of distinguishing features in large satellite imagery datasets make identification tasks challenging for traditional image classification models. Vision Transformers (ViTs) address these issues by creating deeper spatial relationships between image features. Self-attention mechanisms are applied to better understand not only which features correspond to which classification profile, but also how the features relate to each other within each category. These models, integral to computer vision machine learning systems, depend on extensive datasets and rigorous training to develop highly accurate yet computationally demanding systems. Deploying such models in the field can present significant challenges on resource-constrained devices. This paper introduces a novel approach to address these constraints by optimizing an efficient Vision Transformer (TinEVit) for real-time satellite image classification that is compatible with the STMicroelectronics AI integration tool, X-CUBE-AI.
13034-12
Author(s): Yan Zhang, Lei Pan, Univ. of Maryland, College Park (United States); Phillip Berkowitz, Mun Wai Lee, Intelligent Automation, Inc. (United States); Benjamin Riggan, Univ. of Nebraska-Lincoln (United States); Shuvra S. Bhattacharyya, Univ. of Maryland, College Park (United States)
On demand | Presented live 22 April 2024
Deployment of deep neural networks (DNNs) for information extraction in commercial and defense sensing systems involves large design spaces and complex trade-offs among several relevant metrics. This work aims to assist designers of sensing systems in deriving efficient DNN configurations for specific deployment scenarios. To this end, we present a design space exploration framework in which DNNs are represented as dataflow graphs, and schedules (strategies for managing processing resources across different DNN tasks) are modeled in a formal, abstract form using dataflow methods as well. Integration with a multiobjective particle swarm optimization (PSO) strategy enables efficient evaluation of implementation trade-offs and derivation of Pareto fronts involving alternative deployment configurations. Experimental results using different DNN architectures demonstrate the effectiveness of our proposed framework in efficiently exploring complex design spaces for DNN deployment.
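A simplified, single-objective particle swarm sketch of the kind of configuration search described above; the actual framework is multiobjective and dataflow-based, so the scalarized cost function here is only an illustrative assumption.

```python
# Sketch of PSO over a continuous encoding of deployment configurations.
import numpy as np

def pso(cost, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, (n_particles, dim))     # configuration vectors in [0,1]
    v = np.zeros_like(x)
    pbest, pbest_cost = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[pbest_cost.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, 0, 1)
        costs = np.array([cost(p) for p in x])
        better = costs < pbest_cost
        pbest[better], pbest_cost[better] = x[better], costs[better]
        gbest = pbest[pbest_cost.argmin()]
    return gbest

# cost() would map a configuration vector to, e.g., a weighted combination of
# measured latency and accuracy loss for a candidate DNN schedule.
```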
Break
Coffee Break 2:50 PM - 3:20 PM
Session 4: Real-Time Methods and Algorithms II
22 April 2024 • 3:20 PM - 4:40 PM EDT | National Harbor 3
Session Chair: Bogdan Smolka, Silesian Univ. of Technology (Poland)
13034-13
Author(s): Pablo Rangel, Scott Tardif, Mehrube Mehrubeoglu, Edward St. John, Preston Whaley, Matthew Salas, Daniel Armstrong, Marcial Torres, Texas A&M Univ. Corpus Christi (United States)
On demand | Presented live 22 April 2024
This paper showcases the integration of several technologies to develop an Unmanned Traffic Management System. This system contributes to improved safety and reliability in various applications, from civilian to military contexts. Furthermore, the exploration of dynamic vision-based drone detection methods adds valuable insights into the field of real-time image processing and deep learning. From that perspective, a more in-depth computer vision development is presented. The system's core components include an unmanned ground vehicle (UGV) guided through a wireless mesh network and an unmanned aerial vehicle (UAV) controlled by an IoT cloud platform. One of the primary objectives of this research is the development of a dynamic vision-based drone detection system for sense-and-avoid actions. Two different methods are explored for drone detection: the Viola-Jones and You Only Look Once (YOLO) algorithms. The performance of these methods is evaluated, providing insights into the effectiveness of each approach in real-time drone detection.
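A hedged sketch of real-time YOLO inference on a camera stream, standing in for the YOLO-based detection stage described above; the weights file "drone_yolo.pt" is a hypothetical fine-tuned checkpoint, not the authors' artifact.

```python
# Per-frame drone detection with Ultralytics YOLO and OpenCV.
import cv2
from ultralytics import YOLO

model = YOLO("drone_yolo.pt")          # hypothetical drone-detection weights
cap = cv2.VideoCapture(0)              # onboard or ground-station camera

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]    # detections for this frame
    for box in results.boxes.xyxy.tolist():     # [x1, y1, x2, y2]
        x1, y1, x2, y2 = map(int, box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("drone detection", frame)
    if cv2.waitKey(1) == 27:                    # Esc to quit
        break
cap.release()
```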
13034-14
Author(s): Yuchen Tian, Tufts Univ. (United States); Sidike Paheding, Fairfield Univ. (United States); Ehsan Azimi, Eung-Joo Lee, The Univ. of Arizona (United States)
On demand | Presented live 22 April 2024
Surgical image and video applications, particularly using endoscopic datasets, are utilized to advance surgical assistant systems during operations. While advanced deep neural networks have significantly improved various tasks, research on surgical action recognition remains limited and still faces challenges, especially in dynamic imaging and real-time processing. In this study, we present a framework employing video masked autoencoders (VideoMAE) V2 for action recognition in endoscopic datasets. Specifically, our approach utilizes EndoVIT, a specialized Vision Transformer pretrained on endoscopic datasets, thereby demonstrating the effectiveness of masked autoencoders in surgical action recognition tasks.
13034-15
Author(s): Prabha Sundaravadivel, The Univ. of Texas at Tyler (United States); Preetha J. Roselyn, Dept of EEE, SRM Institute of Science and Technology (India); Vedachalam Narayanaswamy, National Institute of Ocean Technology (India); Vincent I. Jeyaraj, Aishree Ramesh, Dept. of AI and Data Science, Saveetha Engineering College (India); Aaditya Khanal, Dept of Chemical Engineering, The Univ. of Texas at Tyler (United States)
On demand | Presented live 22 April 2024
Real-time underwater surveillance frameworks are limited by the capacity of the low-power devices used for monitoring underway. Image-based large language models (LLMs) understand images and generate text; integrating them into underwater data analysis improves environmental assessment and autonomous vehicle responses. Deploying these models on low-power microcontrollers, however, is challenging. In this research, we propose image-based LLMs that can help interpret underwater environments by generating multi-modal tokens for collected images. These tokens are sent to AUVs or ROVs, where the LLMs run. We evaluate this approach with our optimized transformer architectures on an NVIDIA Jetson Nano-based AUV to classify between boat and fish images. This integrated framework reduces response time and data traffic in underwater monitoring.
13034-16
Author(s): Omer Sevinc, Mehrube Mehrubeoglu, Kirk Cammarata, Chi Huang, Texas A&M Univ. Corpus Christi (United States); Lifford McLauchlan, Texas A&M University-Kingsville (United States)
On demand | Presented live 22 April 2024
This study proposes a novel approach for clustering of seagrass images into three distinct age categories: young, medium, and old, using deep learning and unsupervised machine learning techniques. VGG-16 convolutional neural networks are employed for feature extraction followed by K-means clustering to categorize the seagrass samples into the specified age groups. Images are first pre-processed to ensure consistent size and quality. To enable real-time capabilities, an optimized VGG-16 CNN is fine-tuned on the annotated dataset to learn discriminative features that capture age-related characteristics within the constraints of real-time image processing. After feature extraction, the K-means clustering algorithm is applied to group the images into young, medium, and old categories based on the learned features. The clustering results are evaluated using quantitative metrics such as the silhouette score and Davies-Bouldin index, demonstrating the effectiveness of the proposed method in capturing age-related patterns in seagrass imagery. This research contributes to the field of seagrass monitoring through an automated and real-time approach to age-based classification of seagrasses.
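A minimal sketch of the described pipeline using torchvision and scikit-learn: VGG-16 convolutional features, global average pooling, K-means into three age groups, and the two clustering metrics. The input tensor shape and pooling choice are assumptions.

```python
# VGG-16 feature extraction + K-means clustering with quality metrics.
import torch
from torchvision.models import vgg16, VGG16_Weights
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

extractor = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()

@torch.no_grad()
def embed(images):                               # images: (N, 3, 224, 224) float
    feats = extractor(images)                    # (N, 512, 7, 7)
    return feats.mean(dim=(2, 3)).numpy()        # global average pooling

def cluster_ages(images, k=3):
    X = embed(images)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    return labels, silhouette_score(X, labels), davies_bouldin_score(X, labels)
```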
Symposium Plenary
22 April 2024 • 5:00 PM - 6:30 PM EDT | Potomac A
Session Chairs: Tien Pham, The MITRE Corp. (United States), Douglas R. Droege, L3Harris Technologies, Inc. (United States)

View Full Details: spie.org/dcs/symposium-plenary

Chair welcome and introduction
22 April 2024 • 5:00 PM - 5:05 PM EDT

DoD's microelectronics for the defense and commercial sensing ecosystem (Plenary Presentation)
Presenter(s): Dev Shenoy, Principal Director for Microelectronics, Office of the Under Secretary of Defense for Research and Engineering (United States)
22 April 2024 • 5:05 PM - 5:45 PM EDT

NATO DIANA: a case study for reimagining defence innovation (Plenary Presentation)
Presenter(s): Deeph Chana, Managing Director, NATO Defence Innovation Accelerator for the North Atlantic (DIANA) (United Kingdom)
22 April 2024 • 5:50 PM - 6:30 PM EDT

Symposium Panel on Microelectronics Commercial Crossover
23 April 2024 • 8:30 AM - 10:00 AM EDT | Potomac A

View Full Details: spie.org/dcs/symposium-panel

The CHIPS Act Microelectronics Commons network is accelerating the pace of microelectronics technology development in the U.S. This panel discussion will explore opportunities for crossover from commercial technology into DoD systems and applications, discussing what emerging commercial microelectronics technologies could be most impactful on photonics and sensors and how the DoD might best leverage commercial innovations in microelectronics.

Moderator:
John Pellegrino, Electro-Optical Systems Lab., Georgia Tech Research Institute (retired) (United States)

Panelists:
Shamik Das, The MITRE Corporation (United States)
Erin Gawron-Hyla, OUSD (R&E) (United States)
Carl McCants, Defense Advanced Research Projects Agency (United States)
Kyle Squires, Ira A. Fulton Schools of Engineering, Arizona State Univ. (United States)
Anil Rao, Intel Corporation (United States)

Break
Coffee Break 10:00 AM - 10:30 AM
Session 5: Real-Time Methods and Algorithms III
23 April 2024 • 10:30 AM - 11:30 AM EDT | National Harbor 3
Session Chair: Eung-Joo Lee, The Univ. of Arizona (United States)
13034-17
Author(s): Vincent Nwaneri, Daniel Uyeh, Patience Mba, Daniel Morris, Michigan State Univ. (United States)
23 April 2024 • 10:30 AM - 10:50 AM EDT | National Harbor 3
In the global agricultural landscape, dairy cattle are of paramount economic importance because they produce essential products like milk, butter, and cheese. Ensuring their well-being and sustaining production necessitate effective feed management. Traditional methods for assessing feed quality are labor-intensive and destructive, posing risks of resource wastage and production interruptions. This study addresses this challenge by introducing a novel approach to classify feed materials and Total Mixed Rations (TMR) for dairy cattle. Utilizing RGB images and a dual-branch neural network based on the VGG16 architecture, the model achieved 86.72% accuracy in feed categorization. This automates real-time feed analysis, offering high precision, and lays the foundation for further advancements in precision animal production through deep learning in practical agricultural contexts.
13034-18
Author(s): Dzmitry Kasinets, Amir K. Saeed, Benjamin A. Johnson, Benjamin M. Rodriguez, Johns Hopkins Univ. (United States)
On demand | Presented live 23 April 2024
In the context of the advancing digital landscape, there is a discernible demand for robust and defensible methodologies in addressing the challenges in multi-class image classification. The evolution of intelligent systems mandates swift evaluations of environmental variables to facilitate decision-making within an authorized workflow. Recognizing the imperative role of ensemble models, this paper undertakes an exploration into the efficacy of layered Convolutional Neural Network (CNN) architectures for the nuanced task of multi-class image classification, specifically applied to traffic signage recognition in the dynamic context of a moving vehicle. The research methodology employs a YOLO (You Only Look Once) model to establish a comprehensive training and testing dataset. Subsequently, a stratified approach is adopted, leveraging layered CNN architectures to categorize clusters of objects and, ultimately, extrapolate the pertinent speed limit values. Our endeavor aims to elucidate the procedural framework for integrating CNN models, providing insights into their accuracy within the application domain.
13034-20
Author(s): Euan McLeod, Wyant College of Optical Sciences (United States)
23 April 2024 • 11:10 AM - 11:30 AM EDT | National Harbor 3
In lensfree microscopy, the sample is placed close to the image sensor without any imaging lenses in between. This configuration provides the benefits of low-cost and compact hardware assemblies as well as an ultra-large field of view and a high space-bandwidth product. Image focusing and reconstruction are performed computationally, relying on algorithms such as pixel superresolution and the angular spectrum method of propagation. We present recent progress on improving the resolution to characterize nanoscale materials, protein sensing, ultrafine air pollution monitoring, and high-resolution incoherent (fluorescent) imaging.
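For reference, a NumPy sketch of the angular spectrum method of propagation mentioned above; the pixel pitch, wavelength, and propagation distance in the commented usage line are illustrative values, not those of the presented systems.

```python
# Angular spectrum propagation of a complex field U0 sampled on an N x N
# grid with pixel pitch dx, over distance z at wavelength wl (consistent units).
import numpy as np

def angular_spectrum_propagate(U0, dx, wl, z):
    N = U0.shape[0]
    fx = np.fft.fftfreq(N, d=dx)                 # spatial frequencies
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wl * FX) ** 2 - (wl * FY) ** 2  # propagating components only
    kz = 2 * np.pi / wl * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)          # transfer function (evanescent cut)
    return np.fft.ifft2(np.fft.fft2(U0) * H)

# Example: numerically refocus a lensfree hologram by back-propagating it to
# the sample plane (negative z) and taking the amplitude:
# recon = np.abs(angular_spectrum_propagate(hologram, 1.12e-6, 530e-9, -1.0e-3))
```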
Poster Session
23 April 2024 • 6:00 PM - 7:30 PM EDT | Potomac C
Conference attendees are invited to attend the symposium-wide poster session on Tuesday evening. Come view the SPIE DCS posters, enjoy light refreshments, ask questions, and network with colleagues in your field. Authors of poster papers will be present to answer questions concerning their papers. Attendees are required to wear their conference registration badges to the poster session.

Poster Setup: Tuesday 12:00 PM - 5:30 PM
Poster authors, view poster presentation guidelines and set-up instructions at http://spie.org/DCSPosterGuidelines.
13034-21
Author(s): Indranil Sinharoy, Aditya Dave, Lianjun Li, Hao Chen, SAMSUNG Research America (United States); Doyoon Kim, SAMSUNG Electronics Co., Ltd. (Korea, Republic of); Abhishek Sehgal, SAMSUNG Research America (United States)
On demand | Presented live 23 April 2024
The capacity to track eyeball movements beneath closed eyelids holds significant promise across commercial, security, and medical domains. Our work presents a simple, effective, non-invasive method for closed-eye eyeball motion detection using videos. This method relies on detecting the temporal variations in eyelid shadows cast by the eyeball bulge in the subject's video following face alignment and video registration. The key points used for face alignment and video registration are the detected facial landmarks. The eye movement signals derived using the presented technique closely correlate with simultaneously captured electrooculography (EOG) signals. We showcase the potential of fusing the eyeball movement signals obtained in this way with data acquired from ultra-wideband (UWB) or millimeter-wave (mmWave) Doppler sensors. This fusion, supported by machine learning-based algorithms, enables the classification of sleep stages in a smart sleep chair that is designed to enhance and extend good-quality sleep.
13034-22
Author(s): Aly Sultan, Bruno Morais, Raksha Ramkumar, Mehrshad Zandigohar, Gunar Schirner, Northeastern Univ. (United States)
On demand | Presented live 23 April 2024
Neural networks remain susceptible to adversarial attacks with defenses largely individual or ensemble-based. Many ensemble strategies ensure robustness at the cost of large sizes, complicating edge device deployment. This paper presents the Categorized Ensemble Networks (CAEN), which bolsters defense using fewer ensemble members. Based on the tendency of models to misclassify similar classes under attack and the enhancement of defense through soft label assignments, CAEN streamlines ensemble training and achieves superior accuracy improvements. With only two networks in its ensemble, CAEN reduces runtime FLOPs by 16%, making it apt for edge device deployment.
13034-24
Author(s): Benjamin Hand, Colleen P. Bailey, Univ. of North Texas (United States)
On demand | Presented live 23 April 2024
The TSA is responsible for air travel safety but faces a significant challenge. Recent studies show an alarming 80% failure rate in threat detection at most screening locations, primarily due to heavy reliance on human judgment. With more than 50,000 TSA officers screening over 2 million passengers daily, it is essential to address this issue promptly, as evidenced by a 42% increase in complaints related to TSA screening over the past year. These complaints underscore the pressing need for improved threat detection procedures in airport security. In response to these critical concerns, we present a novel and efficient neural network as a potential solution, specifically designed to mitigate the identified shortcomings in the TSA's threat detection capabilities. By reducing the overall complexity of larger models, through the application of advanced layers and an artfully configured structure, we achieve a solution that maximizes efficiency without compromising accuracy.
13034-25
Author(s): Cemre Omer Ayna, Ali Cafer Gurbuz, Mississippi State Univ. (United States)
On demand | Presented live 23 April 2024
Color Filter Arrays (CFAs) are essential for capturing color information in digital cameras. Virtually all modern CFAs are hand-designed patterns shaped by physical and application-specific considerations, although a group of machine learning (ML)-based methods has recently emerged for learning an optimal CFA, particularly for demosaicing. However, these methods either work on multispectral color configurations or learn a unique channel per mask pixel, which fails to consider the physical limitations and algorithmic complications of commercial digital cameras. This study proposes a learnable discrete CFA layer as part of a neural network for demosaicing. The joint nature of the learning pipeline allows the proposed CFA layer to be adapted to different problems and applications for specific devices, while the discrete band selection allows the learned CFA to be applied physically to commercial cameras. The analysis shows that the proposed filter provides higher-quality images than hand-designed CFAs.
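An illustrative sketch, not the paper's layer: one common way to make band selection discrete yet learnable is a per-cell Gumbel-softmax over candidate channels, so the learned pattern remains physically realizable as a filter mosaic. All layer details here are assumptions.

```python
# A learnable discrete CFA layer that selects one band per mosaic-cell
# position via straight-through Gumbel-softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableCFA(nn.Module):
    def __init__(self, cell=2, channels=3, tau=1.0):
        super().__init__()
        # One categorical distribution over bands per position in the cell.
        self.logits = nn.Parameter(torch.zeros(cell, cell, channels))
        self.cell, self.tau = cell, tau

    def forward(self, x):                          # x: (N, C, H, W)
        N, C, H, W = x.shape
        sel = F.gumbel_softmax(self.logits, tau=self.tau, hard=True, dim=-1)
        # Tile the cell-level one-hot selection over the full image.
        mask = sel.permute(2, 0, 1).repeat(1, H // self.cell, W // self.cell)
        return (x * mask.unsqueeze(0)).sum(dim=1, keepdim=True)   # mosaiced image
```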
13034-26
Author(s): Sayed Asaduzzaman, Daniel Newman, Coenen Casey, Univ. of North Dakota (United States)
23 April 2024 • 6:00 PM - 7:30 PM EDT | Potomac C
This study introduces a novel deep learning method using modified VGG16-InceptionNet models for accurately identifying burn severity from skin images. It addresses the limitations of traditional visual examination methods in burn assessment, especially in resource-limited settings. By employing advanced transfer learning techniques, with added dense and classification layers specifically for burn detection, the model shows remarkable precision (96.69%) in classifying burn degrees. The approach is thoroughly tested on diverse datasets, including augmented images, to ensure robustness against varying image conditions. Image augmentation and careful hyperparameter tuning are crucial to the model's success, with a focus on preventing overfitting. This research not only surpasses existing methods in accuracy and reliability but also holds significant potential for immediate diagnostic application in emergency and remote medical situations. The deep transfer learning framework promises to enhance patient care in burn evaluation by offering a rapid and precise diagnostic tool, potentially reducing treatment time and improving outcomes in burn injuries.
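A hedged sketch of the transfer-learning recipe described above, shown for a VGG16 backbone only (the paper's modified VGG16-InceptionNet hybrid is not reproduced here): pretrained convolutional layers are frozen and new dense and classification layers are added. Head widths and the number of burn classes are assumptions.

```python
# Transfer learning: frozen VGG16 features + new dense/classification head.
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

def build_burn_classifier(num_classes=3):
    model = vgg16(weights=VGG16_Weights.DEFAULT)
    for p in model.features.parameters():
        p.requires_grad = False                  # freeze pretrained features
    model.classifier = nn.Sequential(            # new dense + output layers
        nn.Linear(512 * 7 * 7, 256), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(256, num_classes),
    )
    return model
```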
Digital Posters
The posters listed below are available exclusively for online viewing during the week of SPIE Defense + Commercial Sensing 2024.
Conference Chair
Nasser Kehtarnavaz, The Univ. of Texas at Dallas (United States)
Conference Chair
Mukul V. Shirvaikar, The Univ. of Texas at Tyler (United States)
Program Committee
Univ. of North Carolina at Chapel Hill School of Medicine (United States)
Program Committee
Univ. of Virginia (United States)
Program Committee
Univ. of Central Florida (United States)
Program Committee
Arkansas State Univ. (United States)
Program Committee
Kui Liu
Zebra (United States)
Program Committee
Texas A&M Univ. Corpus Christi (United States)
Program Committee
Instituto Politécnico Nacional (Mexico)
Program Committee
SAMSUNG Research America (United States)
Program Committee
Silesian Univ. of Technology (Poland)
Program Committee
The MITRE Corp. (United States)
Additional Information

View call for papers

 

What you will need to submit:

  • Presentation title
  • Author(s) information
  • Speaker biography (1000-character max including spaces)
  • Abstract for technical review (200-300 words; text only)
  • Summary of abstract for display in the program (50-150 words; text only)
  • Keywords used in search for your paper (optional)
  • Check the individual conference call for papers for additional requirements (e.g., extended abstract PDF upload for review or instructions for award competitions)
Note: Only original material should be submitted. Commercial papers, papers with no new research/development content, and papers with proprietary restrictions will not be accepted for presentation.