Summary: A new study reports that facial recognition experts perform better when assisted by deep convolutional neural networks than when working with other humans.
Source: NIST.
Experts at recognizing faces often play a crucial role in criminal cases. A photo from a security camera can mean prison or freedom for a defendant, and testimony from highly trained forensic face examiners informs the jury whether that image actually depicts the accused. Just how good are facial recognition experts? Would artificial intelligence help?
A study appearing today in the Proceedings of the National Academy of Sciences has brought answers. In work that combines forensic science with psychology and computer vision research, a team of scientists from the National Institute of Standards and Technology (NIST) and three universities has tested the accuracy of professional face identifiers, providing at least one revelation that surprised even the researchers: Trained human beings perform best with a computer as a partner, not another person.
“This is the first study to measure face identification accuracy for professional forensic facial examiners, working under circumstances that apply in real-world casework,” said NIST electronic engineer P. Jonathon Phillips. “Our deeper goal was to find better ways to increase the accuracy of forensic facial comparisons.”
The team’s effort began in response to a 2009 report by the National Research Council, “Strengthening Forensic Science in the United States: A Path Forward”, which underscored the need to measure the accuracy of forensic examiner decisions.
The NIST study is the most comprehensive examination to date of face identification performance across a large, varied group of people. The study also examines the best available technology, comparing the accuracy of state-of-the-art face recognition algorithms to that of human experts.
The result of this classic confrontation of human versus machine? Neither gets the best results alone. Maximum accuracy was achieved with a collaboration between the two.
“Societies rely on the expertise and training of professional forensic facial examiners, because their judgments are thought to be best,” said co-author Alice O’Toole, a professor of cognitive science at the University of Texas at Dallas. “However, we learned that to get the most highly accurate face identification, we should combine the strengths of humans and machines.”
The results arrive at a timely moment in the development of facial recognition technology, which has been advancing for decades, but has only very recently attained competence approaching that of top-performing humans.
“If we had done this study three years ago, the best computer algorithm’s performance would have been comparable to an average untrained student,” Phillips said. “Nowadays, state-of-the-art algorithms perform as well as a highly trained professional.”
The study itself involved a total of 184 participants, a large number for an experiment of this type. Eighty-seven were trained professional facial examiners, while 13 were “super recognizers,” a term implying exceptional natural ability. The remaining 84, who made up the control groups, included 53 fingerprint examiners and 31 undergraduate students, none of whom had training in facial comparisons.
For the test, the participants received 20 pairs of face images and rated the likelihood of each pair being the same person on a seven-point scale. The research team intentionally selected extremely challenging pairs, using images taken with limited control of illumination, expression and appearance. They then tested four of the latest computerized facial recognition algorithms, all developed between 2015 and 2017, using the same image pairs.
Three of the algorithms were developed by Rama Chellappa, a professor of electrical and computer engineering at the University of Maryland, and his team, who contributed to the study. The algorithms were trained to work in general face recognition situations and were applied without modification to the image sets.
One of the findings was unsurprising but significant to the justice system: The trained professionals did significantly better than the untrained control groups. This result established the superior ability of the trained examiners, thus providing for the first time a scientific basis for their testimony in court.
The algorithms also acquitted themselves well, as might be expected from the steady improvement in algorithm performance over the past few years.
What raised the team’s collective eyebrows was the performance of multiple examiners working together. The team discovered that combining the opinions of multiple forensic face examiners did not bring the most accurate results.
“Our data show that the best results come from a single facial examiner working with a single top-performing algorithm,” Phillips said. “While combining two human examiners does improve accuracy, it’s not as good as combining one examiner and the best algorithm.”

Combining examiners and AI is not currently used in real-world forensic casework. While this study did not explicitly test this fusion of examiners and AI in such an operational forensic environment, the results provide a roadmap for improving the accuracy of face identification in future systems.
While the three-year project has revealed that humans and algorithms use different approaches to compare faces, it poses a tantalizing question to other scientists: Just what is the underlying distinction between the human and the algorithmic approach?
“If combining decisions from two sources increases accuracy, then this method demonstrates the existence of different strategies,” Phillips said. “But it does not explain how the strategies are different.”
The research team also included psychologist David White from Australia’s University of New South Wales.
Source: Chad Boutin – NIST
Publisher: Organized by NeuroscienceNews.com.
Image Source: NeuroscienceNews.com image is credited to J. Stoughton/NIST.
Original Research: Open access research for “Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms” by P. Jonathon Phillips, Amy N. Yates, Ying Hu, Carina A. Hahn, Eilidh Noyes, Kelsey Jackson, Jacqueline G. Cavazos, Géraldine Jeckeln, Rajeev Ranjan, Swami Sankaranarayanan, Jun-Cheng Chen, Carlos D. Castillo, Rama Chellappa, David White, and Alice J. O’Toole in PNAS. Published April 30 2018.
doi:10.1073/pnas.1721355115
MLA: NIST “Face Recognition Experts Perform Better with AI as Partners.” NeuroscienceNews. NeuroscienceNews, 29 May 2018. <https://neurosciencenews.com/ai-face-recognition-9159/>.
APA: NIST (2018, May 29). Face Recognition Experts Perform Better with AI as Partners. NeuroscienceNews. Retrieved May 29, 2018 from https://neurosciencenews.com/ai-face-recognition-9159/
Chicago: NIST “Face Recognition Experts Perform Better with AI as Partners.” https://neurosciencenews.com/ai-face-recognition-9159/ (accessed May 29, 2018).
Abstract
Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms
Achieving the upper limits of face identification accuracy in forensic applications can minimize errors that have profound social and personal consequences. Although forensic examiners identify faces in these applications, systematic tests of their accuracy are rare. How can we achieve the most accurate face identification: using people and/or machines working alone or in collaboration? In a comprehensive comparison of face identification by humans and computers, we found that forensic facial examiners, facial reviewers, and superrecognizers were more accurate than fingerprint examiners and students on a challenging face identification test. Individual performance on the test varied widely. On the same test, four deep convolutional neural networks (DCNNs), developed between 2015 and 2017, identified faces within the range of human accuracy. Accuracy of the algorithms increased steadily over time, with the most recent DCNN scoring above the median of the forensic facial examiners. Using crowd-sourcing methods, we fused the judgments of multiple forensic facial examiners by averaging their rating-based identity judgments. Accuracy was substantially better for fused judgments than for individuals working alone. Fusion also served to stabilize performance, boosting the scores of lower-performing individuals and decreasing variability. Single forensic facial examiners fused with the best algorithm were more accurate than the combination of two examiners. Therefore, collaboration among humans and between humans and machines offers tangible benefits to face identification accuracy in important applications. These results offer an evidence-based roadmap for achieving the most accurate face identification possible.
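The abstract describes fusion as averaging rating-based identity judgments. The short Python sketch below illustrates that idea on made-up data; it is not the authors' code and does not reproduce any result from the study. Several details are assumptions introduced for illustration: the seven-point scale is coded from -3 (sure the faces are different people) to +3 (sure they are the same person), the algorithm's scores are taken to be already rescaled to that range, and accuracy is scored as the area under the ROC curve (AUC), one common way to score rating data. The names `fuse_ratings` and `auc` are hypothetical.

```python
def fuse_ratings(*raters):
    """Average the ratings that several raters gave to the same image pairs."""
    if len({len(r) for r in raters}) != 1:
        raise ValueError("every rater must rate the same number of pairs")
    return [sum(pair) / len(pair) for pair in zip(*raters)]


def auc(ratings, same_identity):
    """Share of (same, different) pair combinations ranked correctly.

    A same-identity pair should receive a higher rating than a
    different-identity pair; ties count as half a correct ranking.
    """
    same = [r for r, s in zip(ratings, same_identity) if s]
    diff = [r for r, s in zip(ratings, same_identity) if not s]
    correct = sum((s > d) + 0.5 * (s == d) for s in same for d in diff)
    return correct / (len(same) * len(diff))


# Made-up ground truth and ratings for six image pairs (illustration only).
truth = [True, True, True, False, False, False]
examiner = [3, 1, -1, 0, -2, -3]    # assumed -3..+3 coding of the seven-point scale
algorithm = [2, -1, 2, -2, 1, -3]   # assumed to be rescaled to the same range

fused = fuse_ratings(examiner, algorithm)

print("examiner AUC: ", round(auc(examiner, truth), 3))
print("algorithm AUC:", round(auc(algorithm, truth), 3))
print("fused AUC:    ", round(auc(fused, truth), 3))
```

In this toy example the examiner and the algorithm each make one ranking error, but on different image pairs, so the averaged ratings rank every pair correctly. That is the mechanism Phillips points to when he says a gain from fusion demonstrates the existence of different strategies; the numbers themselves say nothing about the size of the effect measured in the study.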