
Proceedings Paper

Fusing video and text data by integrating appearance and behavior similarity
Author(s): Georgiy Levchuk; Charlotte Shabarekh

Paper Abstract

In this paper, we describe an algorithm for multi-modal entity co-reference resolution and present experimental results using text and motion imagery data sources. Our model generates probabilistic associations between entities mentioned in text and detected in video data by jointly optimizing a measure of appearance and behavior similarity. Appearance similarity is calculated as a match between proposition-derived entity attributes mentioned in text and the object appearance classification from video sources. Behavior similarity is calculated from semantic information about entity movements, actions, and interactions with other entities mentioned in text and detected in video sources. Our model achieved a 79% F-score for text-to-video entity co-reference resolution; we show that entity interactions provide unique features for resolving the variability present in text data and the ambiguity of entities' visual appearance.
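To make the scoring idea in the abstract concrete, the following is a minimal sketch in Python of how an appearance score and a behavior score could be combined into a single text-to-video association score, with a simple greedy assignment standing in for the paper's joint optimization. All function names, field names, and weights here are illustrative assumptions, not the authors' implementation.

from itertools import product

def appearance_similarity(text_attrs, video_labels):
    # Fraction of text-derived attributes (e.g. "red", "truck") that also
    # appear among the appearance labels assigned to the video object.
    if not text_attrs:
        return 0.0
    return len(set(text_attrs) & set(video_labels)) / len(text_attrs)

def behavior_similarity(text_events, video_events):
    # Jaccard overlap between event/interaction descriptors mentioned in text
    # and those detected in video (e.g. "enters_building", "meets(person_2)").
    if not text_events and not video_events:
        return 0.0
    return len(set(text_events) & set(video_events)) / len(set(text_events) | set(video_events))

def associate(text_entities, video_objects, w_appearance=0.5, w_behavior=0.5):
    # Score every (text entity, video object) pair, then pick pairs greedily
    # from the highest score down; the weights and the greedy step are
    # placeholders for the joint optimization described in the paper.
    scores = {}
    for t, v in product(text_entities, video_objects):
        scores[(t["id"], v["id"])] = (
            w_appearance * appearance_similarity(t["attributes"], v["labels"])
            + w_behavior * behavior_similarity(t["events"], v["events"])
        )
    assignment, used_t, used_v = [], set(), set()
    for (t_id, v_id), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if t_id not in used_t and v_id not in used_v:
            assignment.append((t_id, v_id, s))
            used_t.add(t_id)
            used_v.add(v_id)
    return assignment

For example, a text entity {"id": "e1", "attributes": ["red", "truck"], "events": ["enters_building"]} would be matched against each video track's labels and detected events, and the pair with the highest combined score would be reported as co-referent.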

Paper Details

Date Published: 28 May 2013
PDF: 10 pages
Proc. SPIE 8751, Machine Intelligence and Bio-inspired Computation: Theory and Applications VII, 875107 (28 May 2013); doi: 10.1117/12.2014878
Author Affiliations:
Georgiy Levchuk, Aptima Inc. (United States)
Charlotte Shabarekh, Aptima Inc. (United States)


Published in SPIE Proceedings Vol. 8751:
Machine Intelligence and Bio-inspired Computation: Theory and Applications VII
Misty Blowers; Olga Mendoza-Schrock, Editor(s)

© SPIE