Paper 13051-36
Task-agnostic feature extractors for online learning at the edge
On demand | Presented live 24 April 2024
Abstract
Machine learning (ML) at the edge typically involves pushing deep neural network (DNN) models ever closer to the sensor. In practice, a DNN deployed to a dynamic environment will quickly become obsolete if it cannot be updated to accommodate new or modified classes. Especially for low-Size, Weight, and Power (SWaP) hardware, the data, hardware, and time requirements for retraining a DNN remain cost-prohibitive. Technical challenges include 1) catastrophic forgetting, where retraining only on new data overwrites prior knowledge; 2) class imbalance, where only a handful of novel-class samples exist compared to the thousands of training samples for the original classes; and 3) the high energy cost of running backpropagation retraining for hours on high-end GPUs.
In this work, we evaluated the Constrained Few-Shot Class Incremental Learning (C-FSCIL) framework for sequentially learning the CIFAR100 dataset. The C-FSCIL framework modularizes the layers of an arbitrary DNN into 1) a frozen, pre-trained feature extractor, 2) a retrainable fully connected layer, and 3) an explicit prototype vector memory matrix. We investigated the effects of different task-agnostic feature extractors trained via fully supervised, weakly supervised, and self-supervised training. Using a CLIP-trained ConvNeXt-L for the frozen feature extractor, our C-FSCIL implementation sequentially learned 40 additional classes over the base session of 60 classes, with a final accuracy of 79.9% over the 100 classes, sacrificing only 7.2 percentage points of accuracy from the base session.
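The prototype-based, class-incremental scheme the abstract describes can be illustrated with a minimal toy sketch: a frozen feature extractor maps inputs to embeddings, each class stores a prototype (the mean of its normalized support embeddings), and prediction picks the nearest prototype by cosine similarity. Everything below is an assumption for illustration only — the random projection stands in for the pre-trained backbone (e.g. a CLIP-trained ConvNeXt-L), the synthetic data replaces CIFAR100, and the names and dimensions are invented; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, EMBED_DIM = 32, 512

# Fixed random projection standing in for a frozen, pre-trained backbone
# (the real system would use a network such as a CLIP image encoder).
W = rng.standard_normal((IN_DIM, EMBED_DIM))

def frozen_extractor(x):
    """Map raw inputs to embeddings; the weights W are never updated."""
    return x @ W

prototypes = {}  # class id -> prototype embedding

def learn_class(cid, support_x):
    """Few-shot update: store the mean of L2-normalized support embeddings."""
    z = frozen_extractor(support_x)
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    prototypes[cid] = z.mean(axis=0)

def predict(x):
    """Classify by cosine similarity to the stored class prototypes."""
    z = frozen_extractor(x)
    z /= np.linalg.norm(z)
    sims = {c: float(p @ z) / np.linalg.norm(p) for c, p in prototypes.items()}
    return max(sims, key=sims.get)

# "Base session": learn classes 0 and 1 from 5-shot support sets drawn
# around distinct directions in input space (synthetic toy data).
for cid in (0, 1):
    learn_class(cid, rng.standard_normal((5, IN_DIM)) + 10.0 * np.eye(IN_DIM)[cid])

# Incremental session: add class 2 without revisiting old data or
# touching any previously stored prototype (no catastrophic forgetting).
learn_class(2, rng.standard_normal((5, IN_DIM)) + 10.0 * np.eye(IN_DIM)[2])
```

Because old prototypes are untouched when a new class arrives, the sketch sidesteps catastrophic forgetting by construction, and the few-shot mean handles the class-imbalance concern: no backpropagation runs at update time.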
Presenter
David Wise
Air Force Research Lab. (United States)