Title

Exploring the Variables That Influence the Performance of the DIF-free-then-DIF Strategy in Assessing Differential Item Functioning

Parallel Title

探討影響「先定錨後檢核」策略於檢核差異試題功能之表現的變數

DOI

10.6129/CJP.201812_60(4).0003

Authors

Chi-Chen Chen (陳繼成); Yeh-Tai Chou (周業太); Ching-Lin Shih (施慶麟)

Keywords

MIMIC; DIF-free-then-DIF strategy; scale purification; item response theory; confirmatory factor analysis

Journal

Chinese Journal of Psychology (中華心理學刊)

Volume/Issue and Publication Date

Vol. 60, No. 4 (2018/12/01)

Pages

251 - 265

Language

English

English Abstract

Conventional differential item functioning (DIF) assessment methods tend to yield inflated type I error rates and deflated power when a test contains many DIF items that favor the same group. To control type I error rates under such conditions, the DIF-free-then-DIF (DFTD) strategy has been proposed. The DFTD strategy consists of two steps: (1) selecting a set of items that are most likely to be DIF-free, and (2) assessing the other items for DIF using the selected items as anchors. To explore the variables that influence the performance of the DFTD strategy in assessing DIF, this study conducted a series of simulation studies. Three multiple indicators, multiple causes (MIMIC) methods, namely the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), and the iterative MIMIC method (M-IT), were used to select four items as an anchor set before implementing the DFTD strategy. The analysis of variance showed significant differences among M-IT, M-SP, and M-ST in identifying DIF-free items, with M-IT performing better than M-SP, and M-SP performing better than M-ST. The main effects of DIF pattern, DIF percentage, sample size, and item response theory (IRT) model, as well as their interactions, were significant for the accuracy of identifying DIF-free items. Based on these results, the M-SP and M-IT methods are recommended for identifying DIF-free items, especially when a test contains many DIF items. The same four variables also significantly affected the power of these methods in assessing DIF, whereas the type I error rates were significantly influenced by DIF pattern, DIF percentage, and sample size. Based on the results of this study, it is recommended that sample sizes of R500/F500 (500 examinees each in the reference and focal groups) be used, and that the data fit the two-parameter logistic model (2PLM), when applying the DFTD strategy with the MIMIC method to assess DIF.

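Chinese Abstract (English translation)

When assessing differential item functioning (DIF), type I error rates become inflated if a test contains many DIF items that favor the same group. The DIF-free-then-DIF (DFTD) strategy (Wang, Shih, & Sun, 2012) has been proposed to control type I error in DIF assessment. The DFTD strategy can be applied to different DIF assessment methods; however, every current method for finding anchor items may select anchors that themselves exhibit DIF, which in turn degrades DIF assessment. This study therefore had two purposes: (1) to explore the factors that influence the performance of anchor-selection methods within the DFTD strategy, and (2) to explore the factors that influence the effectiveness of DIF assessment methods within the DFTD strategy, and on that basis to suggest the conditions under which the strategy should be used. The standard multiple indicators, multiple causes method (M-ST), the MIMIC method with scale purification (M-SP), and the iterative MIMIC method (M-IT) were used to select four anchor items for implementing the DFTD strategy, and the influence of the anchor items on the strategy was examined. The analysis of variance showed that M-IT identified anchor items more accurately than M-SP, and M-SP more accurately than M-ST; M-IT or M-SP is therefore recommended for selecting anchor items in the DFTD strategy. In addition, within the DFTD strategy, DIF percentage, sample size, DIF pattern, and item response theory (IRT) model were key factors that clearly affected both the accuracy of anchor selection and the power to detect DIF items, whereas type I error was affected by three variables: DIF pattern, DIF percentage, and sample size. Because sample size can be controlled by the researcher, a sample size of R500/F500 and data that fit the two-parameter logistic model are advisable when using the MIMIC method with the DFTD strategy.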

Subject Classification

Social Sciences > Psychology

References
  1. Ackerman, T. A.(1992).A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective.Journal of Educational Measurement,29,67-91.
  2. Bollen, K. A.(Ed.),Long, J. S.(Ed.)(1993).Testing structural equation models.Newbury Park, CA:Sage.
  3. Camilli, G.(1992).A conceptual analysis of differential item functioning in terms of a multidimensional item response model.Applied Psychological Measurement,16,129-147.
  4. Chen, J.-H.,Chen, C.-T.,Shih, C.-L.(2014).Improving the control of type I error rate in assessing differential item functioning for hierarchical generalized linear model when impact is presented.Applied Psychological Measurement,38,18-36.
  5. Cohen, A. S.,Kim, S.-H.,Wollack, J. A.(1996).An investigation of the likelihood ratio test for detection of differential item functioning.Applied Psychological Measurement,20,15-26.
  6. Cohen, J.(1988).Statistical power analysis for the behavioral sciences.Hillsdale, NJ:Lawrence Erlbaum.
  7. Finch, H.(2005).The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio.Applied Psychological Measurement,29,278-295.
  8. Fleishman, J. A.,Spector, W. D.,Altman, B. M.(2002).Impact of differential item functioning on age and gender differences in functional disability.The Journal of Gerontology: Series B,57,S275-S284.
  9. French, B. F.,Maller, S. J.(2007).Iterative purification and effect size use with logistic regression for differential item functioning detection.Educational and Psychological Measurement,67,373-393.
  10. Gallo, J. J.,Anthony, J. C.,Muthén, B. O.(1994).Age differences in the symptoms of depression: A latent trait analysis.Journal of Gerontology,49,251-264.
  11. Glöckner-Rist, A.,Hoijtink, H.(2003).The best of both worlds: Factor analysis of dichotomous data using item response theory and structural equation modeling.Structural Equation Modeling: A Multidisciplinary Journal,10,544-565.
  12. Hidalgo-Montesinos, M. D.,Gómez-Benito, J.(2003).Test purification and the evaluation of differential item functioning with multinomial logistic regression.European Journal of Psychological Assessment,19,1-11.
  13. Holland, P. W.,Thayer, D. T.(1988).Differential item performance and the Mantel-Haenszel procedure.Test validity,Hillsdale, NJ:
  14. Kopf, J.,Zeileis, A.,Strobl, C.(2015).Anchor selection strategies for DIF analysis: Review, assessment, and new approaches.Educational and Psychological Measurement,75,22-56.
  15. Levine, D. W.,Kaplan, R. M.,Kripke, D. F.,Bowen, D. J.,Naughton, M. J.,Shumaker, S. A.(2003).Factor structure and measurement invariance of the Women's Health Initiative Insomnia Rating Scale.Psychological Assessment,15,123-136.
  16. Lord, F. M.(1980).Applications of item response theory to practical testing problems.Hillsdale, NJ:Lawrence Erlbaum.
  17. Lord, F. M.(Ed.),Novick, M. R.(Ed.)(1968).Statistical theories of mental test scores.Reading, MA:Addison-Wesley.
  18. Muthén, B. O.(1985).A method for studying the homogeneity of test items with respect to other relevant variables.Journal of Educational Statistics,10,121-132.
  19. Muthén, B. O.,Kao, C.-F.,Burstein, L.(1991).Instructionally sensitive psychometrics: Application of a new IRT-based detection technique to mathematics achievement test items.Journal of Educational Measurement,28,1-22.
  20. Muthén, L. K.,Muthén, B. O.(2004).Mplus user's guide.Los Angeles, CA:Muthén & Muthén.
  21. Navas-Ara, M. J.,Gómez-Benito, J.(2002).Effects of ability scale purification on identification of DIF.European Journal of Psychological Assessment,18,9-15.
  22. Oort, F. J.(1998).Simulation study of item bias detection with restricted factor analysis.Structural Equation Modeling: A Multidisciplinary Journal,5,107-124.
  23. Shealy, R.,Stout, W.(1993).A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF.Psychometrika,58,159-194.
  24. Shih, C.-L.,Liu, T.-H.,Wang, W.-C.(2014).Controlling type I error rates in assessing DIF for logistic regression method combined with SIBTEST regression correction procedure and DIF-free-then-DIF strategy.Educational and Psychological Measurement,74,1018-1048.
  25. Shih, C.-L.,Wang, W.-C.(2009).Differential item functioning detection using the multiple indicators, multiple causes method with a pure short anchor.Applied Psychological Measurement,33,184-199.
  26. Swaminathan, H.,Rogers, H. J.(1990).Detecting differential item functioning using logistic regression procedures.Journal of Educational Measurement,27,361-370.
  27. Thissen, D.,Steinberg, L.,Wainer, H.(1988).Use of item response theory in the study of group differences in trace lines.Test validity,Hillsdale, NJ:
  28. Wang, W.-C.(2004).Effects of anchor item methods on the detection of differential item functioning within the family of Rasch models.Journal of Experimental Education,72,221-261.
  29. Wang, W.-C.,Shih, C.-L.,Sun, G.-W.(2012).The DIF-free-then-DIF strategy for the assessment of differential item functioning.Educational and Psychological Measurement,72,687-708.
  30. Wang, W.-C.,Shih, C.-L.,Yang, C.-C.(2009).The MIMIC method with scale purification for detecting differential item functioning.Educational and Psychological Measurement,69,713-731.
  31. Wang, W.-C.,Su, Y.-H.(2004).Effects of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method.Applied Measurement in Education,17,113-144.
  32. Wang, W.-C.,Yeh, Y.-L.(2003).Effects of anchor item methods on differential item functioning detection with the likelihood ratio test.Applied Psychological Measurement,27,479-498.
  33. Woods, C. M.(2009).Empirical selection of anchors for tests of differential item functioning.Applied Psychological Measurement,33,42-57.