Proceedings Paper

Data mining of text as a tool in authorship attribution
Author(s): Ari J. E. Visa; Jarmo Toivonen; Sami Autio; Jarno Maekinen; Barbro Back; Hannu Vanharanta

Paper Abstract

Text documents are commonly characterized and classified by the keywords that their authors assign to them. Visa et al. have developed a new methodology based on prototype matching. The prototype is a document of interest, or an extracted part of a text of interest. This prototype is matched against the document database of the monitored document flow. The new methodology is capable of extracting the meaning of a document to a certain degree. Our claim is that the new methodology is also capable of authenticating authorship. To verify this claim, two tests were designed. The test hypothesis was that the words and the word order in the sentences could authenticate the author. In the first test, three authors were selected: William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts from each author were examined. Each text was used in turn as a prototype, and the two nearest matches to the prototype were noted. The second test used the Reuters-21578 financial news database, from which a group of 25 short financial news reports by five different authors was examined. Our new methodology and the results of the two tests are reported in this paper. In the first test, all cases were successful for Shakespeare and for Poe; for Shaw, one text was confused with Poe. In the second test, the Reuters-21578 financial news reports were attributed to their authors relatively well. The conclusion is that our text mining methodology seems to be capable of authorship attribution.
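
The evaluation protocol described in the abstract (each text used in turn as a prototype, with its two nearest matches noted) can be illustrated with a minimal sketch. This is not the authors' methodology, whose encoding is not given here. As a stated assumption, it substitutes cosine similarity over word-bigram counts as a crude proxy for "the words and the word order". The function names, the toy corpus, and the author labels `A` and `B` are all hypothetical.

```python
import math
import re
from collections import Counter


def bigram_profile(text):
    """Count adjacent word pairs (assumed stand-in for words plus word order)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_matches(prototype_id, corpus, k=2):
    """Return the ids of the k texts most similar to the prototype."""
    proto = bigram_profile(corpus[prototype_id][1])
    scored = [
        (cosine_similarity(proto, bigram_profile(text)), doc_id)
        for doc_id, (_, text) in corpus.items()
        if doc_id != prototype_id
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


def evaluate(corpus, k=2):
    """Use every text in turn as the prototype and record, for each of its
    k nearest matches, whether that match has the same author."""
    results = {}
    for doc_id, (author, _) in corpus.items():
        matches = nearest_matches(doc_id, corpus, k)
        results[doc_id] = [corpus[m][0] == author for m in matches]
    return results


# Hypothetical toy corpus: id -> (author label, text). Invented for illustration.
corpus = {
    "a1": ("A", "the cat sat on the mat and the cat slept"),
    "a2": ("A", "the cat sat on the rug and the cat purred"),
    "a3": ("A", "on the mat the cat sat and the cat slept"),
    "b1": ("B", "stock prices rose sharply as markets rallied today"),
    "b2": ("B", "stock prices fell sharply as markets tumbled today"),
    "b3": ("B", "markets rallied today as stock prices rose sharply"),
}

results = evaluate(corpus)
for doc_id, hits in results.items():
    print(doc_id, hits)
```

In this sketch a prototype counts as successful when both of its nearest matches share its author, which mirrors how the first test is summarized above. Real texts would need a far richer representation than raw bigram counts.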

Paper Details

Date Published: 27 March 2001
PDF: 8 pages
Proc. SPIE 4384, Data Mining and Knowledge Discovery: Theory, Tools, and Technology III, (27 March 2001); doi: 10.1117/12.421068
Author Affiliations
Ari J. E. Visa, Tampere Univ. of Technology (Finland)
Jarmo Toivonen, Tampere Univ. of Technology (Finland)
Sami Autio, Tampere Univ. of Technology (Finland)
Jarno Maekinen, Tampere Univ. of Technology (Finland)
Barbro Back, Abo Akademi Univ. (Finland)
Hannu Vanharanta, Pori School of Technology and Economics (Finland)

Published in SPIE Proceedings Vol. 4384:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology III
Belur V. Dasarathy, Editor(s)

© SPIE.