Proceedings Paper

ProNet: an accurate and light-weight CNN model for retail products recognition

Paper Abstract

Retail product recognition today relies mostly on traditional two-stage computer vision methods, which first extract hand-crafted features and then apply a classification algorithm to distinguish the products. Because deep learning methods have achieved state-of-the-art results on many tasks and offer a unified pipeline, applying deep models to product recognition is promising. In this paper, we propose a new lightweight CNN architecture for this task, named ProNet. The 27-layer ProNet combines the advantages of ResNet and MobileNet. Its two main operations are depth-wise separable convolution and residual connections: depth-wise separable convolution cuts the computation cost, and residual connections help the network learn better feature representations and converge to a better point during training. Compared with other commonly used CNN architectures, ProNet is computationally efficient while still performing well on several public datasets. We first test the ProNet architecture on the ImageNet dataset, where it reaches a top-1 average accuracy of 70.8%. We then test ProNet, using transfer learning, on another public dataset, ALOI, and on our own task-specific retail products dataset, GroOpt. With this base model we obtain an average accuracy of 98% on ALOI and 96% on GroOpt, both much higher than traditional SIFT-based methods. These results show that ProNet is an accurate model. To make ProNet transferable to other environments, we apply two strategies: (1) a white balance augmentation algorithm that randomly changes the RGB ratio of every image, and (2) an additional linear classifier on the top feature maps to help distinguish very similar samples. Using the augmented training set and the modified model, we train ProNetV2. This improved version reaches an accuracy of 99% on both ALOI and GroOpt.
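The abstract does not spell out the white balance augmentation beyond "randomly change the RGB ratio of every image". One plausible reading, offered here only as a minimal sketch and not as the authors' algorithm, is to scale each colour channel by an independent random gain. The function name random_white_balance, the max_shift parameter and the gain range are assumptions made for illustration.

```python
import numpy as np


def random_white_balance(image, max_shift=0.2, rng=None):
    """Scale each RGB channel by an independent random gain.

    image: uint8 array of shape (H, W, 3).
    max_shift: hypothetical parameter (not from the paper); gains are
        drawn uniformly from [1 - max_shift, 1 + max_shift].
    """
    rng = np.random.default_rng() if rng is None else rng
    gains = rng.uniform(1.0 - max_shift, 1.0 + max_shift, size=3)
    # Broadcasting applies one gain per channel across all pixels.
    shifted = image.astype(np.float32) * gains
    # Clip back to the valid 8-bit range before converting.
    return np.clip(shifted, 0, 255).astype(np.uint8)


# Usage on a flat grey test image, seeded for reproducibility.
rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 128, dtype=np.uint8)
augmented = random_white_balance(image, max_shift=0.2, rng=rng)
```

Applying a fresh random gain to every training image changes the ratio between the R, G and B channels, which roughly imitates shooting the same product under light sources of different colour.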
We have also embedded the ProNetV2 model in a smartphone with 2 GB of RAM and tested it under different conditions, including different illumination and backgrounds. It reaches an average accuracy of 96% with a processing time of 0.1 s per image. These results demonstrate the effectiveness and usefulness of the proposed networks.
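The computational saving that the abstract attributes to depth-wise separable convolution can be illustrated with the standard multiply-accumulate count for a single layer. The sketch below uses the usual cost formulas for this operation. The layer dimensions are hypothetical and are not taken from ProNet.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates for a k x k standard convolution
    producing an h x w output map."""
    return k * k * c_in * c_out * h * w


def separable_conv_macs(k, c_in, c_out, h, w):
    """Depth-wise k x k convolution followed by a 1 x 1
    point-wise convolution, for the same output size."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 256 -> 256 channels, 14 x 14 output.
std = standard_conv_macs(3, 256, 256, 14, 14)
sep = separable_conv_macs(3, 256, 256, 14, 14)
ratio = sep / std  # algebraically equal to 1/c_out + 1/k**2
```

For this example layer the separable form needs about 8.7 times fewer multiply-accumulates, and with 3 x 3 kernels the saving approaches a factor of 9 as the number of output channels grows.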

Paper Details

Date Published: 24 July 2018
PDF: 9 pages
Proc. SPIE 10827, Sixth International Conference on Optical and Photonic Engineering (icOPEN 2018), 1082715 (24 July 2018); doi: 10.1117/12.2326939
Wei Yi, Zhejiang Univ. (China)
Yaoran Sun, Zhejiang Univ. (China)
Sailing He, Zhejiang Univ. (China)

Published in SPIE Proceedings Vol. 10827:
Sixth International Conference on Optical and Photonic Engineering (icOPEN 2018)
Yingjie Yu; Chao Zuo; Kemao Qian, Editor(s)
