Title

Omitted Variable Bias in Differential Item Functioning Assessment

Parallel Title

探討差異試題功能檢核中的遺漏變數偏誤

DOI

10.6129/CJP.201812_60(4).0002

Authors

趙秀怡(Hsiu-Yi Chao);陳繼成(Chi-Chen Chen);鄭中平(Chung-Ping Cheng);陳俊宏(Jyun-Hong Chen)

Keywords

differential item functioning ; scale purification ; test fairness ; omitted variable bias ; 差異試題功能 ; 量尺淨化 ; 測驗公平性 ; 遺漏變數偏誤

Journal

Chinese Journal of Psychology (中華心理學刊)

Volume, Issue / Publication Date

Vol. 60, No. 4 (2018/12/01)

Pages

233-250

Language

English

English Abstract

Differential item functioning (DIF) assessment has been applied for decades in routine item analysis to ensure test fairness. However, few studies have investigated, or even noticed, omitted variable bias (OVB) in DIF assessment. When OVB is present, estimates of DIF effects may be biased, resulting in inflated type I error rates and/or reduced power. In practice, test practitioners may therefore wrongly conclude that an item is unfair with respect to a grouping variable and revise the flagged DIF items based on misleading information. To overcome these problems, this study addressed two issues. The first is the robustness of the original method (i.e., assessing DIF without considering confounding variables) to OVB, which was examined by evaluating the impact of ignoring OVB in DIF assessment. The second is that the controlled method (i.e., including all grouping variables) faces the so-called trade-off between bias and inefficiency when assessing DIF. To address this issue, the backward scale purification (BSP) procedure was applied to the controlled method to improve the performance of DIF assessment. Accordingly, three interrelated studies were conducted. Study 1 investigated type I error rates for the original and controlled methods. The results indicated that the controlled method controlled type I error rates well under all conditions. In contrast, the original method lost control of type I error rates when confounding variables exhibited DIF and the correlation among grouping variables was high (i.e., greater than or equal to .2). Study 2 investigated type II error rates of the controlled method. Compared with the true model, the type II error rates of the controlled method increased as the number of confounding variables decreased and the correlation among grouping variables increased. This result illustrates the trade-off between bias and inefficiency when additional variables are added to the model. In Study 3, BSP was applied to the controlled method to reduce the type II error rates. The results indicated that BSP can effectively control type I error rates while maintaining acceptable power. In summary, the controlled method with BSP appears promising for helping test practitioners deal with OVB in DIF assessment, thereby ensuring fairness and validity in testing practices.
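The contrast between the original and controlled methods can be illustrated with a small simulation. The sketch below is not the authors' simulation design: it assumes a logistic-regression DIF model, treats the latent ability `theta` as a known matching variable, and uses hypothetical settings (`agree`, `dif_z`, the item difficulty of 0.5, and all function names). The studied item has DIF only on the confounding variable `z`, yet omitting `z` makes the correlated grouping variable `g` appear to have a DIF effect.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(X, y, iters=12):
    """Newton-Raphson maximum-likelihood fit; rows of X include the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += w * row[a] * row[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def simulate(n=6000, agree=0.8, dif_z=1.0, seed=1):
    """Hypothetical data: the item favors z = 1; g has no true DIF effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0          # confounding variable
        g = z if rng.random() < agree else 1 - z    # grouping variable, correlated with z
        theta = rng.gauss(0.0, 1.0)                 # ability, assumed known here
        p = 1.0 / (1.0 + math.exp(-(theta - 0.5 + dif_z * z)))
        rows.append((theta, g, z, 1 if rng.random() < p else 0))
    return rows

data = simulate()
y = [r[3] for r in data]
# Original method: the confounding variable z is omitted.
g_original = fit_logistic([[1.0, t, g] for t, g, z, _ in data], y)[2]
# Controlled method: z is included alongside g.
g_controlled = fit_logistic([[1.0, t, g, z] for t, g, z, _ in data], y)[2]
print(f"DIF estimate for g, original method:   {g_original:.3f}")
print(f"DIF estimate for g, controlled method: {g_controlled:.3f}")
```

Under these assumptions the original-method estimate for `g` should sit well away from zero even though `g` has no true DIF effect, while the controlled-method estimate should stay close to zero. The controlled estimate is also noisier, because `g` and `z` are correlated, which is the inefficiency side of the trade-off the abstract describes.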

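Chinese Abstract (English translation)

Differential item functioning (DIF) assessment has been widely used in item analysis to ensure test fairness; however, few studies have considered the impact of omitted variable bias (OVB) on DIF assessment. Ignoring OVB may compromise the unbiasedness of parameter estimates and lead to inflated type I error rates and reduced power in DIF assessment. Test practitioners may consequently misjudge that a grouping variable shows unfairness and revise DIF items based on this misleading information. To address this problem, this study investigated two issues. First, it evaluated the robustness of the original DIF assessment method (i.e., assessing DIF without considering confounding variables) to OVB, that is, the impact of ignoring OVB on DIF assessment. Second, it added the backward scale purification (BSP) procedure to improve the performance of the controlled method (i.e., including all confounding variables in DIF assessment), resolving that method's dilemma between estimation bias and assessment inefficiency. Accordingly, three simulation studies were conducted. Study 1 evaluated type I error rates in DIF assessment. The results showed that the controlled method effectively controlled type I error rates, whereas the original method lost control of type I error rates when confounding variables exhibited DIF and the grouping variables were highly correlated. Study 2 evaluated type II error rates in DIF assessment. The results showed that the type II error rates of the controlled method became inflated as the number of confounding variables decreased and the correlation among grouping variables increased. Study 3 added BSP to the controlled method to reduce type II error rates. The results showed that BSP effectively controlled type I error rates while maintaining good power. In sum, BSP should help test practitioners deal with OVB in DIF assessment and ensure test fairness and validity.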

Subject Classification

Social Sciences > Psychology
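Times Cited
  1. 鄧鈞文, 陳俊瑋, 林仁傑 (2019). 數學成就測驗的性別差異試題功能(DIF)現象:以臺灣學生學習成就評量資料為例 [Gender differential item functioning (DIF) in a mathematics achievement test: The case of Taiwan student learning achievement assessment data]. 教育科學期刊, 18(1), 71-91.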