Figure 1. (a) A receiver operating characteristic (ROC) curve. (b) The alternative ROC curve of (a). FAR: False alarm rate. FRR: False rejection rate.
Table 1. Contingency table for binary assessment
The first limitation is that cost is not reflected in the ROC curve. The design of a biometric system intended for a highly secure environment will be very different from one used for personal computer log-ins. In the high-security scenario, even one falsely accepted terrorist or criminal can cause substantial damage to the facility. For that reason, minimizing the FAR is the top priority. In contrast, for personal home computer log-ins, convenience is an important consideration, and the FRR counts more.
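The FAR and FRR follow directly from the contingency counts in Table 1. A minimal sketch, with made-up counts purely for illustration:

```python
# Hypothetical contingency counts for a verification experiment:
# TP = genuine users accepted, FN = genuine users rejected,
# FP = impostors accepted,    TN = impostors rejected.
tp, fn, fp, tn = 940, 60, 20, 980

far = fp / (fp + tn)  # false alarm rate: fraction of impostors accepted
frr = fn / (fn + tp)  # false rejection rate: fraction of genuines rejected

print(f"FAR = {far:.3f}, FRR = {frr:.3f}")
```

Sweeping the decision threshold trades one rate against the other, which is exactly the curve an ROC plot traces.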
Unfortunately, the ROC curve cannot reflect the cost of classification, that is, errors. The equal error rate (EER) in an ROC curve—the point where the FAR equals the FRR—can be misleading. Figure 2 shows two crossed ROC curves with identical EERs. For situations where security is paramount, the FAR is weighted more and system 1 (the solid curve) would be preferable. For consumer electronics applications, on the other hand, system 2 (the dashed curve) has the advantage.
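The point can be made concrete with a weighted-cost comparison. The operating points below are invented for illustration only: both systems share an EER of 0.05, yet application-specific cost weights rank them differently.

```python
# Illustrative operating points for two systems with the same EER (0.05)
# but different trade-offs away from it. Each entry: (threshold, FAR, FRR).
system1 = [(0.3, 0.10, 0.02), (0.5, 0.05, 0.05), (0.7, 0.01, 0.12)]
system2 = [(0.3, 0.12, 0.01), (0.5, 0.05, 0.05), (0.7, 0.03, 0.08)]

def min_cost(points, c_far, c_frr):
    """Smallest weighted error cost achievable over the listed points."""
    return min(c_far * far + c_frr * frr for _, far, frr in points)

# High-security weighting: a false accept is 10x as costly.
print(min_cost(system1, 10.0, 1.0), min_cost(system2, 10.0, 1.0))
# Convenience weighting: a false reject is 10x as costly.
print(min_cost(system1, 1.0, 10.0), min_cost(system2, 1.0, 10.0))
```

Under the security weighting system 1 wins; under the convenience weighting system 2 wins, even though their EERs are identical.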
An example of two crossed ROC curves with the same equal error rate (EER).3
The second limitation is that the ROC curve gives no indication of the optimal threshold. It is not sensitive to the bias of a system to misclassify one way or the other and, more important, the ROC curve cannot predict the optimum threshold for a system or the threshold's accuracy.
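One way to see what is lost: each ROC point is produced by some threshold, but the curve records only the (FAR, FRR) pair and discards the threshold itself. A sketch, using hypothetical match scores, of recovering an operating threshold that the ROC plot alone cannot supply:

```python
# Hypothetical match scores in [0, 1]; higher means a closer match.
genuine  = [0.92, 0.85, 0.78, 0.70, 0.66, 0.60]  # same-person comparisons
impostor = [0.40, 0.35, 0.52, 0.30, 0.48, 0.25]  # different-person comparisons

def error_rates(t):
    """FAR and FRR when accepting any score >= threshold t."""
    far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < t for s in genuine) / len(genuine)     # genuines rejected
    return far, frr

# Scan candidate thresholds for the one minimizing total error.
# An ROC curve shows only the resulting (FAR, FRR) points, not t.
thresholds = [i / 100 for i in range(101)]
best_t = min(thresholds, key=lambda t: sum(error_rates(t)))
print(best_t, error_rates(best_t))
```

Choosing a deployment threshold therefore requires the (threshold, FAR, FRR) relationship, not the ROC projection alone.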
The third limitation is that the ROC curve ignores the amount of data. Database size is a critical parameter affecting biometric accuracy: for the same system, the FRR and FAR will increase along with the size of the database. Yet ROC curves say nothing about how large a data set is, so it is impossible to meaningfully compare ROC curves of biometric systems tested on different databases.
Finally, variable data affect the ROC curve's predictive power. The condition of the data can affect the performance of a biometric system. Low quality dramatically decreases accuracy, yet ROC curves do not reflect the state of the data used in recognition. Consequently, when quality is variable, it is difficult to predict a biometric system's performance based purely on its ROC curve.
There are additional concerns too. Other factors can affect the accuracy and performance of biometric systems that ROC curves cannot measure. Examples include recognition time, testing and evaluation protocol, template size (the amount of computer memory taken up by the biometric data), failure-to-enroll rate (the proportion of the population of end-users who fail to complete enrollment), comfort, convenience, and acceptability.

Possible solutions
We propose a 3D combinational accuracy curve, shown in Figure 3, as one way of obtaining a balanced assessment of FAR, FRR, threshold T, and cost. Six 2D curves can be derived from the 3D combinational accuracy curve: the conventional 2D ROC curve; the 2D curve of (FRR, T); the 2D curve of (FAR, T); the 2D curve of (FRR, cost); the 2D curve of (FAR, cost); and the 2D curve of (T, cost). A 3D combinational performance curve, which weighs security, convenience, T, and cost, can also be derived from the 3D combinational accuracy curve. Overall, these curves provide more comprehensive information about system accuracy and performance than the ROC curve alone.
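The idea can be sketched as a table of (T, FAR, FRR, cost) points from which each of the six 2D curves falls out as a projection. The numbers below are invented purely for illustration:

```python
# Sketch of a 3D combinational accuracy curve as sampled points.
# Columns: (threshold T, FAR, FRR, cost); values are hypothetical.
curve3d = [
    (0.2, 0.200, 0.010, 2.01),
    (0.4, 0.080, 0.030, 0.83),
    (0.6, 0.030, 0.070, 0.37),
    (0.8, 0.005, 0.150, 0.20),
]

# Each 2D curve is a projection of the 3D curve onto two of its axes:
roc         = [(far, frr) for _, far, frr, _ in curve3d]  # conventional ROC
frr_vs_t    = [(t, frr) for t, _, frr, _ in curve3d]      # (FRR, T)
far_vs_cost = [(far, c) for _, far, _, c in curve3d]      # (FAR, cost)
# ...and similarly for (FAR, T), (FRR, cost), and (T, cost).
print(roc)
```

The conventional ROC curve is thus one of six views of the same data; the other projections retain the threshold and cost information that the ROC view discards.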
In addition, different systems should be tested and evaluated using identical database(s) and testing and evaluation protocols. The National Institute of Standards and Technology has taken the lead in evaluating and testing biometric systems and algorithms, for example through the Iris Evaluation Challenge, the Face Recognition Grand Challenge, and the Face Recognition Vendor Test.5
Appropriate metrics for data quality should be included when evaluating system performance and accuracy. For example, the feature information-based method objectively assesses the quality of an iris image, which helps in comparing system accuracy when using different data sets.6 Because performance involves a number of factors, measures should be adopted to facilitate comparisons between systems. Examples include, but are not limited to, the 3D combinational accuracy and performance curves, quality assessment, ease of use, and user acceptance.
Department of Electrical and Computer Engineering
Indiana University-Purdue University Indianapolis
Yingzi Du is an assistant professor with the Department of Electrical and Computer Engineering. Her research interests include biometrics, image processing, and pattern recognition. She is a member of SPIE, IEEE, Phi Kappa Phi, and Tau Beta Pi. She received an Office of Naval Research Young Investigator Program award in 2007.
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County
Chein-I Chang, now a professor, received his PhD in electrical engineering from the University of Maryland, College Park. He has authored a book titled Hyperspectral Imaging, and has published 90 journal articles. He is a SPIE Fellow and associate editor of IEEE Transactions on Geoscience and Remote Sensing.