Tools That Assess Bias in Standardized Tests Are Flawed
Overturning more than 40 years of accepted practice, new research shows that the tools used to check tests of “general mental ability” for bias are themselves flawed. This key finding challenges the reliance on such exams to make objective decisions about employment or academic admissions, a reliance that has persisted despite well-documented gaps between the mean scores of white and minority populations.
The study, published in the July issue of the Journal of Applied Psychology, investigated an amalgam of scores representing a vast sample of commonly used tests, including civil service or other pre-employment exams and university entrance exams.
“Test bias” means that two people who have the same test score but differ in ethnicity or gender, for example, are predicted to have different “scores” on the outcome (e.g., job performance); a biased test might therefore benefit certain groups over others. Decades of earlier research consistently found no evidence of test bias against ethnic minorities, but the current study challenges this established belief.
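The definition above can be made concrete with a small sketch. It assumes, for illustration only, that each group's outcome is predicted from the test score by its own straight line; the function names and the toy numbers are hypothetical and do not come from the study.

```python
import numpy as np

def fit_line(scores, outcomes):
    """Least-squares intercept and slope for outcome = intercept + slope * score."""
    slope, intercept = np.polyfit(scores, outcomes, 1)
    return intercept, slope

def predicted_gap(scores_a, outcomes_a, scores_b, outcomes_b, test_score):
    """Difference in predicted outcome between groups A and B at the same test score."""
    a0, a1 = fit_line(scores_a, outcomes_a)
    b0, b1 = fit_line(scores_b, outcomes_b)
    return (a0 + a1 * test_score) - (b0 + b1 * test_score)

# Toy data (hypothetical): same slope in both groups, different intercepts.
scores = np.array([40.0, 50.0, 60.0, 70.0, 80.0])
outcomes_a = 1.0 + 0.05 * scores
outcomes_b = 0.5 + 0.05 * scores

gap = predicted_gap(scores, outcomes_a, scores, outcomes_b, test_score=60.0)
print(f"Predicted outcome gap at a score of 60: {gap:.2f}")
```

A gap of zero at every score would mean the test predicts the outcome identically for both groups; a nonzero gap is what the definition above calls bias.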
“For generations, important decisions have been made about life-changing opportunities in employment and education based on results of these tests -- but we can no longer say with certainty they are unbiased,” said Herman Aguinis, professor of organizational behavior and human resources at Indiana University’s Kelley School of Business and director of the school’s new Institute for Global Organizational Effectiveness.
He led the study, which was co-authored by Steven A. Culpepper at the University of Colorado Denver and Charles A. Pierce at the University of Memphis.
“Our findings are significant because we proved that bias can be present but not be detected by even the top experts in the field, which could result in inaccurate prediction of outcomes such as job and academic performance for hundreds of thousands, if not millions, of individuals,” Aguinis said.
To reach these conclusions, Aguinis and his co-authors created the largest simulation of its kind -- using nearly 16 million individual samples to yield more than eight trillion pairs of individual test/outcome scores. They built bias into most samples to resemble real-world results and used newly available supercomputing power to check tens of billions of scores. They found that the procedures in use today overwhelmingly and repeatedly missed the bias inserted in the data.
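The general idea of building bias into simulated data and counting how often a check catches it can be sketched at toy scale. This is not the authors' simulation: the slopes, noise level, sample sizes, and the simple normal-approximation test for a slope difference are all assumptions chosen for illustration.

```python
import numpy as np

def slope_difference_detected(rng, n_per_group, slope_a, slope_b, noise_sd, z_crit=1.96):
    """Simulate one sample with a built-in slope difference and test for it.

    Returns True if the estimated slopes differ significantly
    (two-sided, normal approximation).
    """
    estimates = []
    for true_slope in (slope_a, slope_b):
        x = rng.normal(size=n_per_group)                       # test scores
        y = true_slope * x + rng.normal(scale=noise_sd, size=n_per_group)  # outcomes
        slope, intercept = np.polyfit(x, y, 1)
        resid = y - (intercept + slope * x)
        # Squared standard error of the fitted slope.
        se2 = resid.var(ddof=2) / ((x - x.mean()) ** 2).sum()
        estimates.append((slope, se2))
    (s_a, v_a), (s_b, v_b) = estimates
    z = (s_a - s_b) / np.sqrt(v_a + v_b)
    return abs(z) > z_crit

def detection_rate(n_per_group, n_sims=1000, seed=0):
    """Share of simulated samples in which the built-in bias is flagged."""
    rng = np.random.default_rng(seed)
    hits = sum(
        slope_difference_detected(rng, n_per_group, 0.5, 0.3, 1.0)
        for _ in range(n_sims)
    )
    return hits / n_sims

for n in (30, 1000):
    print(f"n per group = {n}: bias detected in {detection_rate(n):.0%} of samples")
```

In this sketch every sample contains bias by construction, so any detection rate below 100 percent counts as a miss. Small samples miss it far more often than large ones, which is one way a check can report "no bias" when bias is present.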
Few topics in human resource management have generated more public attention than bias in pre-employment and academic-admissions exams.
“The belief in the fairness of the tests and the accuracy of the gauges to check them has been so deeply ingrained that to challenge them would be akin to questioning the sun as center of the solar system,” said Aguinis, a nationally recognized expert who was also a co-author of an amicus brief in the landmark Ricci v. DeStefano Supreme Court case regarding employment testing.
“The irony is that for 40 years we have been trying to assess potential test bias with a biased procedure and we now see that countless people may have been denied or given opportunities unfairly,” he added. “From an ethical standpoint it may be argued that even if only one individual is affected this way, that is one too many. The problem is obviously magnified when we are dealing with hundreds of thousands, if not millions, of individuals taking standardized tests every year.”
Prelude to a New Era?
Given the weight placed on such testing and the polarizing nature of the underlying racial/ethnic achievement gap, the authors expect the study will spur considerable controversy among the public and the academic, legal and policy communities, all of which will question the long-held belief that tests are unbiased.
They also anticipate a significant impact on the multibillion-dollar testing industry but make clear that they are not saying that any organization is deliberately using biased tests. However, as a preliminary step while more research is conducted, it is likely that many organizations will examine their existing tests and perhaps create new ones.
“While the academic community has demonstrated repeatedly that different racial or ethnic groups’ cultural frames of reference and identity may play a role in affecting test scores, we have not used that knowledge to sufficiently advance testing processes,” Aguinis said. “We sincerely hope that this research opens doors to thoughtful and important analysis that will allow us to legitimately assign scores that predict a job well done.”