Proceedings Paper

Observers' ability to judge the similarity of clustered calcifications on mammograms
Author(s): Robert M. Nishikawa; Yongyi Yang; Dezheng Huo; Miles Wernick; Charlene A. Sennett; John Papaioannou; Liyang Wei

Paper Abstract

We compared two methods for obtaining radiologists' subjective impression of similarity, for application in distinguishing benign from malignant lesions. Thirty pairs of mammographic clustered calcifications were used in this study. Each of the 30 pairs was rated for similarity on a 5-point scale, where 1 was nearly identical and 5 was not at all similar. Next, all possible combinations of two pairs (n=435) were shown to the reader, who selected the pair that was more similar. The observers repeated the experiment with at least a week between reading sessions. Using analysis of variance, intra-class correlation coefficients (ICC) were calculated for both the absolute scoring method and the paired comparison method. In addition, for the paired comparison method, the coefficient of consistency within each reader was calculated. The average coefficient of consistency for the 4 readers was 0.88 (range 0.49-0.97). These results were significantly different from guessing (p << 0.0001). The ICC for intra-reader agreement was 0.51 (95% CI 0.37-0.66) for the absolute method and 0.82 (95% CI 0.73-0.91) for the paired comparison method. This difference was statistically significant (p=0.001). The ICC for inter-reader agreement was 0.39 (95% CI 0.21-0.57) for the absolute method and 0.37 (95% CI 0.18-0.56) for the paired comparison method. We conclude that humans are able to judge the similarity of clustered calcifications in a meaningful way. Further, radiologists had greater intra-reader agreement when using the paired comparison method than when using an absolute rating scale. Differences in the criteria that observers use to judge similarity, and differences in interpreting which calcifications comprise the cluster, can lead to low inter-reader ICC values for both methods.
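As an illustration of the paired comparison analysis, the following is a minimal sketch of how a within-reader coefficient of consistency could be computed. The abstract does not name the statistic, so this assumes Kendall's coefficient of consistency based on circular triads; the function names and the `prefs` data layout are hypothetical and not taken from the paper.

```python
from itertools import combinations
from math import comb


def circular_triads(wins, n):
    """Count circular triads (A>B, B>C, C>A) from per-item win counts.

    Uses Kendall's identity d = n(n-1)(2n-1)/12 - sum(a_i^2)/2,
    written so the division happens once at the end.
    """
    return (n * (n - 1) * (2 * n - 1) - 6 * sum(a * a for a in wins)) / 12


def coefficient_of_consistency(prefs, n):
    """Kendall's zeta for one reader (assumed statistic, see lead-in).

    prefs maps each unordered pair (i, j) with i < j to the item chosen
    as "more similar"; all n*(n-1)/2 comparisons must be present.
    """
    if len(prefs) != comb(n, 2):
        raise ValueError("every pair of items must be compared exactly once")
    wins = [0] * n
    for winner in prefs.values():
        wins[winner] += 1
    d = circular_triads(wins, n)
    # Maximum possible number of circular triads depends on the parity of n.
    max_d = (n**3 - n) / 24 if n % 2 else (n**3 - 4 * n) / 24
    return 1 - d / max_d


# Hypothetical reader whose choices follow a strict ranking of 30 items:
# the lower-indexed item is always chosen, so no circular triads occur.
n_items = 30
transitive = {(i, j): i for i, j in combinations(range(n_items), 2)}
zeta = coefficient_of_consistency(transitive, n_items)
```

With 30 items there are `comb(30, 2)` comparisons, which matches the n=435 reported in the abstract. A value of 1 means the reader's choices are fully transitive, while values near 0 indicate many circular triads.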

Paper Details

Date Published: 4 May 2004
PDF: 7 pages
Proc. SPIE 5372, Medical Imaging 2004: Image Perception, Observer Performance, and Technology Assessment (4 May 2004); doi: 10.1117/12.536571
Author Affiliations:
Robert M. Nishikawa, Univ. of Chicago (United States)
Yongyi Yang, Illinois Institute of Technology (United States)
Dezheng Huo, Univ. of Chicago (United States)
Miles Wernick, Illinois Institute of Technology (United States)
Charlene A. Sennett, Univ. of Chicago (United States)
John Papaioannou, Univ. of Chicago (United States)
Liyang Wei, Illinois Institute of Technology (United States)

Published in SPIE Proceedings Vol. 5372:
Medical Imaging 2004: Image Perception, Observer Performance, and Technology Assessment
Dev P. Chakraborty; Miguel P. Eckstein, Editors
