
Proceedings Paper

Plagiarism-detection framework for digital libraries
Author(s): Antonio Si; Rynson W.H. Lau; Hong Va Leong

Paper Abstract

In digital libraries, documents are stored in digital form and are therefore especially vulnerable to copying. Existing copy detection methods exhaustively compare every sentence of two documents to determine their degree of overlap. This approach does not scale, because a document typically contains a large number of sentences. In this paper, we propose a copy detection mechanism that eliminates unnecessary comparisons. This is achieved by pre-parsing the documents to quantify their semantic meanings; comparisons between documents describing different topics can be eliminated, as it would serve no purpose to copy from a document on an unrelated topic. This process is applied recursively to sections, subsections, subsubsections, and so on, until we find two paragraphs that are highly related semantically. The paragraphs are then compared in detail, i.e., on a per-sentence basis, to determine whether they overlap in a substantive way. The parsing process is based on document retrieval techniques, with some heuristics that extract keywords from the documents to index the semantics of each document, section, subsection, and so forth. Weights based on the relative occurrences of the keywords are assigned to individual keywords to form a keyword vector. The semantic relationship between different documents, sections, subsections, or paragraphs can then be represented by the dot product of their corresponding keyword vectors, as in document retrieval systems.
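The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the L2 normalization of keyword weights, the `Node` class, the `candidate_paragraphs` function, and the threshold of 0.3 are all assumptions introduced here for illustration. The abstract only states that keyword weights are based on relative occurrences and that relatedness is measured by a dot product.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; the paper's keyword-extraction heuristics are not given.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "with"}


def keyword_vector(text):
    """Map each keyword to a weight based on its relative occurrence.

    The weights are L2-normalized (an assumption), so the dot product of
    two vectors lies between 0 and 1.
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {w: c / norm for w, c in counts.items()}


def similarity(u, v):
    """Dot product of two sparse keyword vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


class Node:
    """A document, section, subsection, or (when it has no children) a paragraph."""

    def __init__(self, text="", children=None):
        self.children = children or []
        # An inner node is indexed by the combined text of its children.
        self.text = text if not self.children else " ".join(c.text for c in self.children)
        self.vector = keyword_vector(self.text)


def candidate_paragraphs(a, b, threshold=0.3):
    """Yield paragraph pairs that remain related at every level of the hierarchy.

    Any pair of units whose similarity falls below the (assumed) threshold is
    pruned, together with all comparisons between their descendants.
    """
    if similarity(a.vector, b.vector) < threshold:
        return
    if not a.children and not b.children:
        yield a, b
        return
    for child_a in (a.children or [a]):
        for child_b in (b.children or [b]):
            yield from candidate_paragraphs(child_a, child_b, threshold)
```

Calling `list(candidate_paragraphs(doc_a, doc_b))` on two `Node` trees returns only the paragraph pairs that would go on to the detailed per-sentence comparison, which this sketch does not cover. One consequence of this design is that a closely related pair of paragraphs can be missed if the enclosing sections differ enough to fall below the threshold, so the threshold choice trades recall for fewer comparisons.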

Paper Details

Date Published: 30 September 1996
PDF: 10 pages
Proc. SPIE 2898, Electronic Imaging and Multimedia Systems, (30 September 1996); doi: 10.1117/12.253374
Antonio Si, Hong Kong Polytechnic Univ. (Hong Kong)
Rynson W.H. Lau, Hong Kong Polytechnic Univ. (Hong Kong)
Hong Va Leong, Hong Kong Polytechnic Univ. (Hong Kong)


Published in SPIE Proceedings Vol. 2898:
Electronic Imaging and Multimedia Systems
Chung-Sheng Li; Robert L. Stevenson; LiWei Zhou, Editor(s)
