Proceedings Paper

Linguistically informed digital fingerprints for text
Author(s): Özlem Uzuner

Paper Abstract

Digital fingerprinting, watermarking, and tracking technologies have gained importance in recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, the use of natural language processing for these purposes has so far been limited. Measuring the similarity of literary works for automatic copyright infringement detection requires identifying and comparing the creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, that is, syntactic and semantic elements of expression. Our experiments indicate that these elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in the text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works: they represent features that are essential to the character of works and that remain fairly consistent even when the works are paraphrased. These features can be extracted directly from the contents of the works on demand, and they can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.
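
To make the general idea concrete, the following is a minimal sketch of content-derived fingerprinting, not the paper's method. The abstract does not list the actual syntactic and semantic features, so this example assumes a hypothetical stand-in feature set (relative frequencies of a few English function words plus mean sentence length) and compares two texts by cosine similarity. The names `FUNCTION_WORDS`, `expression_fingerprint`, and `cosine_similarity` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of English function words. The abstract
# does not specify the paper's features; these are stand-ins for illustration.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "was", "he", "she", "not")


def expression_fingerprint(text):
    """Return a small numeric vector summarizing how a text is written."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return [0.0] * (len(FUNCTION_WORDS) + 1)
    counts = Counter(words)
    # Relative frequency of each function word in the text.
    vector = [counts[w] / len(words) for w in FUNCTION_WORDS]
    # Mean sentence length in words, scaled down to a similar magnitude.
    vector.append(len(words) / len(sentences) / 100.0)
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because the fingerprint is computed from the text itself, it can be regenerated on demand and needs no pre-existing label. Surface features this crude would likely not survive paraphrasing as well as the linguistically informed features the abstract describes.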

Paper Details

Date Published: 15 February 2006
PDF: 12 pages
Proc. SPIE 6072, Security, Steganography, and Watermarking of Multimedia Contents VIII, 607208 (15 February 2006); doi: 10.1117/12.650278
Özlem Uzuner, Univ. at Albany, State Univ. of New York (United States)

Published in SPIE Proceedings Vol. 6072:
Security, Steganography, and Watermarking of Multimedia Contents VIII
Edward J. Delp III; Ping Wah Wong, Editor(s)

© SPIE