
Proceedings Paper
Reducing weight precision of convolutional neural networks towards large-scale on-chip image recognition
Paper Abstract
In this paper, we develop a server-client quantization scheme to reduce the bit resolution of a deep learning architecture, namely Convolutional Neural Networks, for image recognition tasks. Low bit resolution is an important factor in bringing deep learning neural networks to hardware implementation, as it directly determines cost and power consumption. We aim to reduce the bit resolution of the network without sacrificing its performance. To this end, we design a new quantization algorithm, called supervised iterative quantization, to reduce the bit resolution of the learned network weights. In the training stage, supervised iterative quantization is conducted in two steps on the server: apply k-means-based adaptive quantization to the learned network weights, and retrain the network based on the quantized weights. These two steps alternate until a convergence criterion is met. In the testing stage, the network configuration and low-bit weights are loaded onto the client hardware device to recognize incoming input in real time, where optimized but expensive quantization becomes infeasible. Considering this, we adopt uniform quantization for the inputs and internal network responses (called feature maps) to keep on-chip cost low. The Convolutional Neural Network with reduced weight and input/response precision is demonstrated on two types of images: hand-written digit images and real-life images of office scenarios. Both results show that the new network achieves the performance of the full-bit-resolution neural network, even though the bit resolution of both weights and inputs is significantly reduced, e.g., from 64 bits to 4-5 bits.
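The two quantizers described in the abstract can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: `kmeans_quantize` stands in for one quantization step of the server-side supervised iterative quantization (the retraining step between quantization rounds is omitted), and `uniform_quantize` illustrates the cheap client-side quantization of inputs and feature maps. All function and parameter names are hypothetical.

```python
import numpy as np

def kmeans_quantize(weights, n_levels=16, n_iters=20):
    """Sketch of k-means-based adaptive weight quantization:
    cluster the flattened weights into n_levels centroids and snap
    each weight to its nearest centroid (n_levels=16 -> 4-bit codes).
    In the paper's scheme this alternates with retraining."""
    w = weights.ravel()
    # initialize centroids uniformly over the observed weight range
    centroids = np.linspace(w.min(), w.max(), n_levels)
    for _ in range(n_iters):
        # assign each weight to its nearest centroid
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # move each centroid to the mean of its assigned weights
        for k in range(n_levels):
            if np.any(assign == k):
                centroids[k] = w[assign == k].mean()
    return centroids[assign].reshape(weights.shape), centroids

def uniform_quantize(x, n_bits=4):
    """Uniform quantization for inputs / feature maps: equally
    spaced levels over the data range, cheap enough for on-chip use."""
    levels = 2 ** n_bits - 1
    lo, hi = x.min(), x.max()
    step = (hi - lo) / levels
    if step == 0:
        return x.copy()  # constant input, nothing to quantize
    return lo + np.round((x - lo) / step) * step
```

With 4-bit settings, the quantized weights take at most 16 distinct values and the uniformly quantized activations at most 16 levels, matching the 4-5 bit regime the abstract reports.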
Paper Details
Date Published: 20 May 2015
PDF: 9 pages
Proc. SPIE 9496, Independent Component Analyses, Compressive Sampling, Large Data Analyses (LDA), Neural Networks, Biosystems, and Nanoengineering XIII, 94960A (20 May 2015); doi: 10.1117/12.2176598
Published in SPIE Proceedings Vol. 9496:
Independent Component Analyses, Compressive Sampling, Large Data Analyses (LDA), Neural Networks, Biosystems, and Nanoengineering XIII
Harold H. Szu; Liyi Dai; Yufeng Zheng, Editor(s)
Author Affiliations
Zhengping Ji, Samsung Semiconductor, Inc. (United States)
Ilia Ovsiannikov, Samsung Semiconductor, Inc. (United States)
Yibing Wang, Samsung Semiconductor, Inc. (United States)
Lilong Shi, Samsung Semiconductor, Inc. (United States)
Qiang Zhang, Samsung Semiconductor, Inc. (United States)
© SPIE. Terms of Use
