Abstract
|
Conventional differential item functioning (DIF) assessment methods tend to yield inflated type I error rates and deflated power when a test contains many DIF items that favor the same group. To control type I error rates in DIF assessment under such conditions, the DIF-free-then-DIF (DFTD) strategy was proposed. The DFTD strategy consists of two steps: (1) selecting a set of items that are most likely to be DIF-free, and (2) assessing DIF for the other items using the selected items as anchors. To explore the variables that influence the performance of the DFTD strategy in assessing DIF, this study conducted a series of simulation studies. Three multiple indicators, multiple causes (MIMIC) methods, namely the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the iterative MIMIC method (M-IT), were used to select four items as an anchor set before implementing the DFTD strategy. The analysis of variance showed significant differences among M-IT, M-SP, and M-ST in identifying DIF-free items, with M-IT performing better than M-SP, and M-SP performing better than M-ST. The analysis also found that the main effects of DIF pattern, DIF percentage, sample size, and item response theory (IRT) model, as well as their interactions, significantly affected the accuracy of identifying DIF-free items. Based on these results, the M-SP and M-IT methods are recommended for identifying DIF-free items, especially when a test contains many DIF items. The same set of variables significantly influenced the power of these methods in assessing DIF. However, the type I error rates in the DIF assessments were significantly influenced only by DIF pattern, DIF percentage, and sample size.
Based on the results of this study, it is recommended that the DFTD strategy with the MIMIC method be applied with sample sizes of R500/F500 (500 examinees each in the reference and focal groups) and with data that fit the two-parameter logistic model (2PLM).
|