Title

Differential Item Functioning Analyses in Large-Scale Educational Surveys: Key Concepts and Modeling Approaches for Secondary Analysts

Parallel Title

大型教育調查研究中的差別試題功能:次級分析中的核心概念及建模方法

Authors

Xiao-Shu Zhu (朱小姝); André A. Rupp (安德魯・儒普); Jing Gao (高靜)

Keywords

complex booklet design; differential item functioning (DIF); hierarchical generalized linear models (HGLMs); multi-stage sampling design

Journal

Journal of Research in Education Sciences (教育科學研究期刊)

Volume and Issue / Publication Date

Vol. 56, No. 1 (2011/03/01)

Pages

91-127

Language

English

Chinese Abstract

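Large-scale educational assessments often employ a multi-stage sampling design, in which examinees are drawn by stratifying the sampling units of the population. They also use a complex booklet design to assemble items into multiple test booklets. Under these conditions, the key to ensuring that the abilities of different examinee groups are measured fairly is being able to detect effectively whether the items used exhibit differential item functioning (DIF). This paper introduces modeling approaches that can be used to detect DIF under the complex designs of large-scale educational assessments, applies six hierarchical generalized linear models (HGLMs) that can be used for DIF detection, and compares their effectiveness in detecting DIF through computer simulation. The models are then applied to empirical data from the Trends in International Mathematics and Science Study (TIMSS) to examine whether uniform gender DIF is present.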

English Abstract

Many educational surveys employ a multi-stage sampling design for students, which relies on stratification and/or clustering of population units, together with a complex booklet design for items from an item pool. In these surveys, the reliable detection of item bias or differential item functioning (DIF) across student groups is a key component of ensuring that different student groups are represented fairly. In this paper, we describe several modeling approaches that can be useful for detecting DIF in educational surveys. We illustrate the key ideas by investigating the performance of six hierarchical generalized linear models (HGLMs) in a small simulation study and by applying them to real data from the Trends in International Mathematics and Science Study (TIMSS), where we use them to investigate potential uniform gender DIF.

Subject Classification: Social Sciences > Education