Proceedings Paper

Gold standards and expert panels: a pulmonary nodule case study with challenges and solutions
Author(s): Dave P. Miller; Kathryn F. O’Shaughnessy; Susan A. Wood; Ronald A. Castellino M.D.

Paper Abstract

Comparative evaluations of reader performance using different modalities, e.g., CT with computer-aided detection (CAD) vs. CT without CAD, generally require a “truth” definition based on a gold standard. There are many situations in which a true invariant gold standard is impractical or impossible to obtain. For instance, small pulmonary nodules are generally not assessed by biopsy or resection. In such cases, it is common to use a unanimous consensus or majority agreement from an expert panel as a reference standard for actionability in lieu of the unknown gold standard for disease. Nonetheless, there are three major concerns about expert panel reference standards: (1) actionability is not synonymous with disease, (2) different rules for defining truth (e.g., majority vs. unanimous consensus) may lead to different conclusions about which modality is better, and (3) the variability associated with the panelists is not formally captured in the p-values or confidence intervals that are generally produced for estimating the extent to which one modality is superior to the other. A multi-reader-multi-case (MRMC) receiver operating characteristic (ROC) study was performed using 90 cases, 15 readers, and a reference truth based on 3 experienced panelists. The primary analyses were conducted using a reference truth of unanimous consensus regarding actionability (3 out of 3 panelists). To assess the three concerns noted above: (1) additional data from the original radiology reports were compared to the panel, (2) the complete analysis was repeated using different definitions of truth, and (3) bootstrap analyses were conducted in which new truth panels were constructed by picking 1, 2, or 3 panelists at random. The definition of the reference truth affected the results for each modality (CT with CAD and CT without CAD) considered by itself, but the effects were similar, so the primary analysis comparing the modalities was robust to the choice of the reference truth.
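
The truth definitions and panel resampling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy votes, and the choice to draw panelists with replacement are all assumptions made for illustration, and the abstract does not specify how the random panels were drawn or scored.

```python
import random

def reference_truth(votes, rule="unanimous"):
    """Collapse one finding's panelist votes (booleans) into a reference truth."""
    n_yes = sum(votes)
    if rule == "unanimous":
        return n_yes == len(votes)       # every panelist calls it actionable
    if rule == "majority":
        return 2 * n_yes > len(votes)    # strictly more than half
    raise ValueError(f"unknown rule: {rule}")

def resample_truth(panel_votes, size, rule, rng):
    """Build a new truth panel by drawing `size` panelists at random.

    Assumption: panelists are drawn with replacement, as in a
    conventional bootstrap; the source does not state this.
    """
    names = list(panel_votes)
    drawn = [rng.choice(names) for _ in range(size)]
    n_findings = len(panel_votes[names[0]])
    return [
        reference_truth([panel_votes[p][i] for p in drawn], rule)
        for i in range(n_findings)
    ]

# Hypothetical votes from three panelists on four candidate findings.
panel_votes = {
    "A": [True, True, False, True],
    "B": [True, False, False, True],
    "C": [True, True, False, False],
}

def full_panel_truth(rule):
    """Reference truth for every finding using all three panelists."""
    return [
        reference_truth([panel_votes[p][i] for p in panel_votes], rule)
        for i in range(4)
    ]

unanimous = full_panel_truth("unanimous")
majority = full_panel_truth("majority")
one_panelist = resample_truth(panel_votes, 1, "unanimous", random.Random(0))
```

In this toy data only the first finding is actionable under unanimous consensus, while three findings are actionable under the majority rule. In a full study, each resampled truth would be fed back into the reader-performance analysis so that the spread across resamples reflects panelist variability.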

Paper Details

Date Published: 4 May 2004
PDF: 12 pages
Proc. SPIE 5372, Medical Imaging 2004: Image Perception, Observer Performance, and Technology Assessment, (4 May 2004); doi: 10.1117/12.544716
Dave P. Miller, Ovation Research Group (United States)
Kathryn F. O’Shaughnessy, R2 Technology, Inc. (United States)
Susan A. Wood, R2 Technology, Inc. (United States)
Ronald A. Castellino M.D., R2 Technology, Inc. (United States)

Published in SPIE Proceedings Vol. 5372:
Medical Imaging 2004: Image Perception, Observer Performance, and Technology Assessment
Dev P. Chakraborty; Miguel P. Eckstein, Editor(s)

© SPIE. Terms of Use