
Proceedings Paper

Statistically constrained shallow text marking: techniques, evaluation paradigm, and results
Author(s): Brian Murphy; Carl Vogel

Paper Abstract

We present three natural language marking strategies based on fast and reliable shallow parsing techniques and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of structural and semantic fit, using both lexical resources and the web as a corpus. A representative sample of marks is given to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between corpus-based felicity measures and perceived quality, and makes qualified predictions. Grammatical acceptability correlates strongly with our automatic measure (Pearson's r = 0.795, p = 0.001), allowing us to account for about two thirds of the variability in human judgements. A moderate but statistically insignificant correlation (Pearson's r = 0.422, p = 0.356) is found with judgements of meaning preservation, indicating that the contextual window of five content words used for our automatic measure may need to be extended.
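One of the marking strategies named above, the adjective conjunction swap, can be illustrated with a minimal sketch: reordering a pair of conjoined adjectives embeds a single watermark bit per conjunction. The regex, function name, and bit convention below are hypothetical illustrations, not the authors' implementation; a real system would rely on the shallow parsing and corpus-based felicity checks described in the abstract rather than a bare pattern match.

```python
import re

# Hypothetical pattern for a conjoined word pair ("X and Y").
# A real implementation would use a shallow parser to confirm
# that both conjuncts are adjectives before swapping.
ADJ_CONJ = re.compile(r"\b(\w+) and (\w+)\b")

def embed_bit(sentence: str, bit: int) -> str:
    """Encode one bit: bit 1 swaps the first conjoined pair
    ("X and Y" -> "Y and X"); bit 0 leaves the sentence unchanged."""
    if bit == 1:
        return ADJ_CONJ.sub(lambda m: f"{m.group(2)} and {m.group(1)}",
                            sentence, count=1)
    return sentence

print(embed_bit("The house was old and cold.", 1))
```

Decoding would compare the observed ordering against a canonical one (e.g. alphabetical or corpus-frequency order) to recover the bit, which is why the paper's goodness-of-fit checks matter: only swaps that remain grammatically and semantically natural are usable as marks.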

Paper Details

Date Published: 27 February 2007
PDF: 9 pages
Proc. SPIE 6505, Security, Steganography, and Watermarking of Multimedia Contents IX, 65050Z (27 February 2007); doi: 10.1117/12.713355
Author Affiliations:
Brian Murphy, Trinity College Dublin (Ireland)
Carl Vogel, Trinity College Dublin (Ireland)

Published in SPIE Proceedings Vol. 6505:
Security, Steganography, and Watermarking of Multimedia Contents IX
Edward J. Delp III; Ping Wah Wong, Editors

© SPIE.