Impact of Differential Item Functioning on Subsequent Statistical Conclusions Based on Observed Test Score Data (Seccion METODOLOGICA)
Psicologica 2009, July-Dec, 30, 2
Publisher Description
Differential item functioning (DIF) has been widely studied in educational and psychological measurement; for recent reviews, see Camilli (2006) and Zumbo (2007). Previous research has focused primarily on defining DIF and on methods for detecting it. It is well accepted that the presence of DIF can degrade the validity of a test. Relatively little is known, however, about the impact of DIF on later statistical decisions when the observed test scores are used in data analyses and the corresponding statistical hypothesis tests. For example, imagine a researcher investigating whether there are gender differences on a language proficiency test. What is the impact of gender-based DIF on the eventual statistical decision about whether the group means (male versus female) of the observed scores are equal? There is remarkably little research that directly answers this question. DIF may be present in a test because (a) DIF analyses were not part of the item analyses; (b) DIF detection is itself a statistical decision method, so true DIF items may be missed without the researcher's knowledge; or (c) items flagged as DIF were deliberately left in the test. Irrespective of how the DIF items got there, it is still unknown how they affect subsequent statistical results and conclusions, in particular the Type I error rate and effect size of hypothesis tests based on observed test score data.
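The question posed in the description can be made concrete with a small Monte Carlo sketch. The description does not give the article's simulation design, so everything below is an illustrative assumption: a Rasch model, 20 items, uniform DIF added as a difficulty shift on some items for the focal group, identical ability distributions in both groups, and a large-sample z approximation to the two-sample test of means. The function names (`simulate_scores`, `rejects_equal_means`, `rejection_rate`) are hypothetical. Because the two groups have the same ability distribution, every rejection of equal observed-score means counts as a Type I error with respect to the underlying ability.

```python
import math
import random
import statistics


def simulate_scores(n, difficulties, dif_shift, rng):
    """Observed sum scores for n examinees under a Rasch model (assumed).

    dif_shift[j] is added to the difficulty of item j for this group;
    a non-zero value represents uniform DIF on that item.
    """
    scores = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)  # same ability distribution in every group
        total = 0
        for b, shift in zip(difficulties, dif_shift):
            p = 1.0 / (1.0 + math.exp(-(theta - (b + shift))))
            total += rng.random() < p  # 1 if the item is answered correctly
        scores.append(total)
    return scores


def rejects_equal_means(x, y, z_crit=1.96):
    """Two-sided test of equal means at roughly the 5% level.

    Uses a normal approximation to the Welch statistic, which is
    reasonable only for large groups (an assumption of this sketch).
    """
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    z = (statistics.mean(x) - statistics.mean(y)) / se
    return abs(z) > z_crit


def rejection_rate(n_dif_items, shift, n_per_group=200, n_items=20,
                   reps=150, seed=1):
    """Proportion of replications in which equal means are rejected."""
    rng = random.Random(seed)
    # Item difficulties spread evenly from -1.5 to 1.5 logits (assumed).
    difficulties = [-1.5 + 3.0 * j / (n_items - 1) for j in range(n_items)]
    no_shift = [0.0] * n_items
    # The first n_dif_items items are harder for the focal group.
    focal_shift = [shift if j < n_dif_items else 0.0 for j in range(n_items)]
    rejections = 0
    for _ in range(reps):
        reference = simulate_scores(n_per_group, difficulties, no_shift, rng)
        focal = simulate_scores(n_per_group, difficulties, focal_shift, rng)
        rejections += rejects_equal_means(reference, focal)
    return rejections / reps


if __name__ == "__main__":
    print("no DIF items:  ", rejection_rate(0, 0.0))
    print("5 DIF items:   ", rejection_rate(5, 0.8))
```

Under these assumptions, the rejection rate without DIF should sit near the nominal level, while the rate with DIF items should be noticeably higher. The size of the gap depends on the number of DIF items, the size of the shift, and the group sizes chosen here, none of which come from the article.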