
Proceedings Paper

Low-complexity object detection with deep convolutional neural network for embedded systems
Author(s): Subarna Tripathi; Byeongkeun Kang; Gokce Dane; Truong Nguyen

Paper Abstract

We investigate low-complexity convolutional neural networks (CNNs) for object detection in embedded vision applications. It is well known that deploying CNN-based object detection on an embedded system is more challenging than tasks such as image classification because of its computation and memory requirements. To meet these requirements, we design and develop an end-to-end, TensorFlow (TF)-based, fully-convolutional deep neural network for generic object detection, inspired by YOLO,1 one of the fastest detection frameworks. As in YOLO, the proposed network predicts the localization of every object by regressing the coordinates of the corresponding bounding box, so it can detect objects without any limitation on object size. Unlike YOLO, however, all layers in the proposed network are fully convolutional, so the network can take input images of any size. We pick face detection as a use case and evaluate the proposed model on the FDDB and WIDER FACE datasets. As a second use case, generic object detection, we evaluate its performance on the PASCAL VOC dataset. The experimental results demonstrate that the proposed network can detect object instances of different sizes and poses in a single frame. Moreover, the results show that the proposed method achieves accuracy comparable with state-of-the-art CNN-based object detection methods while reducing the model size by 3× and memory bandwidth by 3-4× compared with YOLO, one of the best real-time CNN-based object detectors. Our 8-bit fixed-point TF model provides an additional 4× memory reduction while keeping accuracy nearly as good as that of the floating-point model, and it achieves 20× faster inference than the floating-point model. Thus, the proposed method is promising for embedded implementations.
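The abstract does not spell out the paper's exact quantization scheme, but the claimed 4× memory reduction follows directly from storing each weight in 8 bits instead of 32. A minimal symmetric fixed-point sketch in NumPy (the function names and the per-tensor scaling are our assumptions, not the authors' method) illustrates the idea:

```python
import numpy as np

def quantize_8bit(weights):
    """Symmetric per-tensor 8-bit quantization (hypothetical sketch).
    Maps float weights onto int8 codes in [-127, 127] with one scale."""
    scale = np.max(np.abs(weights)) / 127.0  # assumes a nonzero tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

# A conv-layer-shaped weight tensor: float32 (4 bytes/weight) vs int8 (1 byte),
# i.e. the 4x storage reduction the abstract describes.
w = np.random.randn(3, 3, 64, 64).astype(np.float32)
q, s = quantize_8bit(w)
print(w.nbytes // q.nbytes)  # 4
```

The rounding error of this scheme is bounded by half the scale per weight, which is why accuracy can stay close to the floating-point model when the weight distribution is well behaved; the 20× inference speedup additionally depends on the target hardware having fast 8-bit integer arithmetic.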

Paper Details

Date Published: 19 September 2017
PDF: 15 pages
Proc. SPIE 10396, Applications of Digital Image Processing XL, 103961M (19 September 2017); doi: 10.1117/12.2275512
Author Affiliations:
Subarna Tripathi, Univ. of California, San Diego (United States)
Byeongkeun Kang, Univ. of California, San Diego (United States)
Gokce Dane, Qualcomm Inc. (United States)
Truong Nguyen, Univ. of California, San Diego (United States)

Published in SPIE Proceedings Vol. 10396:
Applications of Digital Image Processing XL
Andrew G. Tescher, Editor(s)

© SPIE.