
Proceedings Paper

From image captioning to video summary using deep recurrent networks and unsupervised segmentation
Author(s): Bogdan-Andrei Morosanu; Camelia Lemnaru

Paper Abstract

Automatic captioning systems based on recurrent neural networks have been highly successful at producing realistic natural language captions for complex and varied image data. We explore methods for adapting existing models trained on large image caption data sets to a related problem: summarising videos through natural language descriptions and frame selection. These architectures build internal high-level representations of the input image that can be used to define probability distributions, as well as distance measures between those distributions. Specifically, we interpret each hidden unit inside a layer of the caption model as representing the un-normalised log probability of some unknown image feature relevant to the caption generation process. We can then apply well-understood statistical divergence measures to quantify the difference between images and create an unsupervised segmentation of video frames, assigning consecutive images with low divergence to the same context and those with high divergence to different contexts. As the final summary of the video, we provide a group of selected frames together with an accompanying text description, allowing a user to quickly explore large unlabelled video databases.
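The segmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which hidden layer is used, which divergence measure is applied, how the boundary threshold is chosen, or how frames are selected from each segment, so everything below is an assumption for illustration only: a softmax turns one frame's hidden-unit activations (read as un-normalised log probabilities) into a distribution, the Jensen-Shannon divergence compares consecutive frames, a fixed hypothetical `threshold` marks context boundaries, and the medoid of each segment is taken as its representative frame. The function names (`segment_frames`, `representative_frames`, and so on) are hypothetical, the activations are made-up numbers rather than output of a trained captioning model, and the caption-generation step is not shown.

```python
import math
from typing import List, Sequence, Tuple

Distribution = List[float]


def softmax(logits: Sequence[float]) -> Distribution:
    """Normalise hidden-unit activations, read as un-normalised log probabilities."""
    peak = max(logits)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


def segment_frames(
    activations: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[List[int]], List[Distribution]]:
    """Split frames into contexts wherever consecutive frames diverge strongly."""
    dists = [softmax(a) for a in activations]
    if not dists:
        return [], []
    segments = [[0]]
    for i in range(1, len(dists)):
        if js_divergence(dists[i - 1], dists[i]) > threshold:
            segments.append([i])  # high divergence: a new context begins
        else:
            segments[-1].append(i)  # low divergence: same context continues
    return segments, dists


def representative_frames(
    segments: List[List[int]], dists: List[Distribution]
) -> List[int]:
    """Pick each segment's medoid: the frame with least total divergence to the others."""
    return [
        min(seg, key=lambda i: sum(js_divergence(dists[i], dists[j]) for j in seg))
        for seg in segments
    ]


if __name__ == "__main__":
    # Hypothetical activations of a 3-unit hidden layer for 5 consecutive frames.
    frames = [
        [2.0, 0.1, -1.0],
        [2.1, 0.0, -1.1],
        [-1.0, 0.2, 2.5],
        [-1.1, 0.1, 2.4],
        [-0.9, 0.3, 2.6],
    ]
    segments, dists = segment_frames(frames, threshold=0.1)
    print(segments)  # [[0, 1], [2, 3, 4]]
    print(representative_frames(segments, dists))
```

The Jensen-Shannon divergence is used here only because it is symmetric and bounded by ln 2, which makes a single fixed threshold easier to reason about than raw KL divergence; any other divergence the authors may have used would slot into `segment_frames` the same way, and in practice the threshold would need tuning on real activations.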

Paper Details

Date Published: 13 April 2018
PDF: 8 pages
Proc. SPIE 10696, Tenth International Conference on Machine Vision (ICMV 2017), 106960P (13 April 2018); doi: 10.1117/12.2310071
Author Affiliations
Bogdan-Andrei Morosanu, Technical Univ. of Cluj-Napoca (Romania)
Camelia Lemnaru, Technical Univ. of Cluj-Napoca (Romania)


Published in SPIE Proceedings Vol. 10696:
Tenth International Conference on Machine Vision (ICMV 2017)
Antanas Verikas; Petia Radeva; Dmitry Nikolaev; Jianhong Zhou, Editor(s)

© SPIE.