Title

A Comparison of Three Polytomous DIF Detection Methods

Parallel Title

三種多元化計分題之試題差異性診斷法的比較

DOI

10.6773/JRMS.201012.0001

Authors

吳莉安(Li-An Wu);蔡蓉青(Rung-Ching Tsai)

Keywords

試題差異性 ; 羅吉斯迴歸檢定 ; 差異試題及測驗功能檢定 ; differential item functioning (DIF) ; logistic regression procedure ; differential functioning of items and tests procedure

Journal

測驗統計年刊

Issue / Publication Date

No. 18, Part 2 (2010/12/01)

Pages

1 - 21

Language

English

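Chinese Abstract (translated)

This paper uses a simulation study to compare the performance of three differential item functioning (DIF) detection methods under the graded response model: the logistic regression procedure, the likelihood ratio test, and the differential functioning of items and tests procedure. The manipulated factors were sample size (two levels), the ability distribution of the populations (two levels), and the percentage of DIF items in the test (four levels). For each of the sixteen combinations, 100 replications were run. The results show that the type I error rates of all three methods generally conformed to the nominal 0.05 level. In terms of power, the likelihood ratio test performed best, the differential functioning of items and tests procedure came second, and the logistic regression procedure performed worst. On average, the power of the logistic regression procedure was below 0.4, and it was sensitive only to items with pronounced DIF.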

English Abstract

The performance of three procedures, the logistic regression procedure (LogR), the likelihood ratio test (LRT), and the differential functioning of items and tests procedure (DFIT), in detecting differential item functioning (DIF) under the graded response model was compared in a simulation study. Factors manipulated included sample size (two levels), differences in the ability distributions between the focal and the reference groups (two levels), and the percentage of DIF items contained in a test (four levels). For each of the sixteen combinations, 100 replications of DIF detection were simulated. All three DIF procedures adhered to the nominal type I error rate of 0.05 under most conditions. LRT was the most powerful of the three in all conditions. DFIT was less powerful than LRT but was still useful for DIF detection, especially with groups of different ability distributions and a relatively large percentage of DIF items. LogR, with mean power below 0.4 in all conditions, appeared to be sensitive only to items with large DIF.
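
As a rough illustration of the kind of data such a simulation works with, the sketch below generates responses to a single polytomous item under the graded response model for a reference and a focal group. It is not the authors' code: the function names, sample size, item parameters, and the 0.5 threshold shift used to induce DIF are all assumptions chosen for illustration.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities under the graded response model.

    theta : abilities, shape (n,)
    a     : item discrimination (scalar)
    b     : ordered category thresholds, shape (K-1,)
    Returns an (n, K) array whose rows sum to 1.
    """
    theta = np.asarray(theta, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    # Cumulative probabilities P(X >= k) for k = 1..K-1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    n = theta.shape[0]
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then take differences
    upper = np.hstack([np.ones((n, 1)), p_star])
    lower = np.hstack([p_star, np.zeros((n, 1))])
    return upper - lower

def simulate_item(theta, a, b, rng):
    """Draw one response (category 0..K-1) per examinee."""
    probs = grm_category_probs(theta, a, b)
    cum = np.cumsum(probs, axis=1)
    u = rng.random(len(theta))[:, None]
    # Count how many cumulative boundaries the uniform draw exceeds
    return (u > cum[:, :-1]).sum(axis=1)

# Hypothetical settings (not taken from the paper)
rng = np.random.default_rng(0)
theta_ref = rng.normal(0.0, 1.0, size=500)   # reference group abilities
theta_foc = rng.normal(0.0, 1.0, size=500)   # focal group abilities
a = 1.5
b_ref = np.array([-1.0, 0.0, 1.0])           # four response categories
b_foc = b_ref + 0.5                          # shifted thresholds: item is harder for the focal group

x_ref = simulate_item(theta_ref, a, b_ref, rng)
x_foc = simulate_item(theta_foc, a, b_foc, rng)
```

In a full study, a DIF detection procedure would be applied to many such items in each replication, and the proportion of flagged DIF-free items (type I error) and flagged DIF items (power) would be tallied across replications.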

Subject Classification

Basic and Applied Sciences > Statistics
Social Sciences > Education

References

  1. Baker, F. B.(1993).EQUATE2: Computer program for equating two metrics in item response theory [Computer program].Madison:University of Wisconsin, Laboratory of Experimental Design.
  2. Ankenmann, R. D.,Witt, E. A.,Dunbar, S. B.(1999).An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning.Journal of Educational Measurement,36,277-300.
  3. Bock, R. D.,Aitkin, M.(1981).Maximum likelihood estimation of item parameters: an application of the EM algorithm.Psychometrika,46,443-459.
  4. Bolt, D. M.(2002).A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods.Applied Measurement in Education,15,113-141.
  5. Camilli, G.,Shepard, L. A.(1994).Methods for Identifying Biased Test Items.Thousand Oaks:Sage.
  6. Chang, H. H.,Mazzeo, J.(1994).The unique correspondence of the item response function and item category response functions in polytomously scored item response models.Psychometrika,59,391-404.
  7. Chang, H. H.,Mazzeo, J.,Roussos, L.(1996).Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure.Journal of Educational Measurement,32,79-96.
  8. Cohen, A. S.,Kim, S. H.,Baker, F. B.(1993).Detection of differential item functioning in the graded response model.Applied Psychological Measurement,17(4),335-350.
  9. Crane, P. K.,Belle, G. V.,Larson, E. B.(2004).Test bias in a cognitive test: differential item functioning in the CASI.Statistics in Medicine,23,241-256.
  10. du Toit, M.(Ed.)(2003).IRT from SSI.Lincolnwood, IL:Scientific Software International, Inc.
  11. Flowers, C. P.,Oshima, T. C.,Raju, N. S.(1999).A description and demonstration of the polytomous-DFIT framework.Applied Psychological Measurement,23,309-326.
  12. French, A. W.,Miller, T. R.(1996).Logistic regression and its use in detecting differential item functioning in polytomous items.Journal of Educational Measurement,33,315-332.
  13. Jodoin, M. G.,Gierl, M. J.(2001).Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection.Applied Measurement in Education,14,329-349.
  14. Kim, S. H.,Cohen, A. S.(1991).A comparison of two area measures for detecting differential item functioning.Applied Psychological Measurement,15(3),269-278.
  15. Kim, S. H.,Cohen, A. S.(1998).Detection of differential item functioning under the graded response model with the likelihood ratio test.Applied Psychological Measurement,22,345-355.
  16. Lord, F. M.(1980).Applications of item response theory to practical testing problems.Hillsdale NJ:Erlbaum.
  17. Maldonado, G.,Greenland, S.(1993).Simulation study of confounder-selection strategies.American Journal of Epidemiology,138,923-936.
  18. Mantel, N.,Haenszel, W. M.(1959).Statistical aspects of the analysis of data from retrospective studies of disease.Journal of the National Cancer Institute,22,719-748.
  19. Mapuranga, R.,Dorans, N. J.,Middleton, K.(2008).A review of recent developments in differential item functioning.Annual meeting of the National Council on Measurement in Education (NCME),New York.
  20. Mellenbergh, G. J.(1995).Conceptual notes on models for discrete polytomous item responses.Applied Psychological Measurement,19,91-100.
  21. Miller, T. R.,Spray, J. A.(1993).Logistic discriminant function analysis for DIF identification of polytomously scored items.Journal of Educational Measurement,30,107-122.
  22. Millsap, R. E.,Everson, H. T.(1993).Methodology review: statistical approaches for assessing measurement bias.Applied Psychological Measurement,17,297-334.
  23. Muraki, E.(1992).A generalized partial credit model: Application of an EM algorithm.Applied Psychological Measurement,16,159-176.
  24. Muthén, B. O.(2002).Beyond SEM: General latent variable modeling.Behaviormetrika,29,81-117.
  25. Narayanan, P.,Swaminathan, H.(1996).Identification of items that show nonuniform DIF.Applied Psychological Measurement,20,257-274.
  26. Narayanan, P.,Swaminathan, H.(1994).Performance of the Mantel-Haenszel and simultaneous item bias procedures for detecting differential item functioning.Applied Psychological Measurement,18,315-338.
  27. Oshima, T. C.,McGinty, D.,Flowers, C. P.(1994).Differential item functioning for a test with a cutoff score: use of limited closed-interval measures.Applied Measurement in Education,7(3),195-209.
  28. Oshima, T. C.,Raju, N. S.,Flowers, C. P.(1997).Development and demonstration of multidimensional IRT-based internal measures of differential functioning of items and tests.Journal of Educational Measurement,34,253-272.
  29. Penfield, R. D.,Lam, T. C. M.(2000).Assessing differential item functioning in performance assessment: Review and recommendations.Educational Measurement: Issues and Practice,19,5-15.
  30. Potenza, M. T.,Dorans, N. J.(1995).DIF assessment for polytomously scored items: a framework for classification and evaluation.Applied Psychological Measurement,19,23-37.
  31. Raju, N. S.,van der Linden, W. J.,Fleer, P. F.(1995).IRT-based internal measures of differential functioning of items and tests.Applied Psychological Measurement,19,353-368.
  32. Reise, S. P.,Widaman, K. F.,Pugh, R. H.(1993).Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance.Psychological Bulletin,114,552-566.
  33. Rogers, H. J.,Swaminathan, H.(1993).A comparison of the logistic regression and Mantel-Haenszel procedures for detecting differential item functioning.Applied Psychological Measurement,17,105-116.
  34. Samejima, F.(1969).Estimation of latent ability using a response pattern of graded scores.Psychometrika Monograph Supplement,No. 17.
  35. Shealy, R.,Stout, W.(1993).A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF.Psychometrika,58,159-194.
  36. Stocking, M. L.,Lord, F. M.(1983).Developing a common metric in item response theory.Applied Psychological Measurement,7,201-210.
  37. Stroud, A. H.,Sechrest, D.(1966).Gaussian quadrature formulas.New York:Prentice Hall.
  38. Swaminathan, H.,Rogers, H. J.(1990).Detecting differential item functioning using logistic regression procedures.Journal of Educational Measurement,27,361-370.
  39. Teresi, J. A.,Fleishman, J. A.(2007).Differential item functioning and health assessment.Quality of Life Research,16(Supplement 1),33-42.
  40. Thissen, D.,Steinberg, L.,Gerard, M.(1986).Beyond mean group difference: The concept of item bias.Psychological Bulletin,99,118-128.
  41. Wainer, H.(Ed.),Braun, H. I.(Ed.)(1988).Test validity.Hillsdale, NJ:Lawrence Erlbaum.
  42. Zumbo, B. D.(1999).A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modelling as a Unitary Framework for Binary and Likert-type (Ordinal) Item Scores.Ottawa, Ont:Directorate of Human Resources Research and Evaluation, Department of National Defense.
  43. Zwick, R.,Donoghue, J. R.,Grima, A.(1993).Assessment of differential item functioning for performance tasks.Journal of Educational Measurement,30,233-251.