Title

DIF成因之初探:試題特徵與差異試題功能之關聯

Parallel Title

Investigating Sources of Differential Item Functioning: the Relationship Between Item Property and Differential Item Functioning

DOI

10.6251/BEP.201812_50(2).0001

Authors

孫國瑋(Guo-Wei Sun);陳承德(Cheng-Te Chen);施慶麟(Ching-Lin Shih)

Keywords

DIF source; differential item functioning; differential facet functioning; linear logistic test model; random effects linear logistic test model

Journal

教育心理學報 (Bulletin of Educational Psychology)

Volume/Issue and Publication Date

Vol. 50, No. 2 (2018/12/01)

Pages

167 - 188

Language

Traditional Chinese

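Chinese Abstract

In recent years, research on differential item functioning (DIF) has shifted from "detecting" DIF to "explaining" it. Explanations of DIF items have traditionally relied on qualitative review by experts. Quantitative evidence that supplements expert review, however, can help in judging the sources of DIF. This study analyzed the properties of DIF items to identify the relationship between item properties and DIF, as a reference for judging DIF sources in subsequent expert review. To this end, the study used the linear logistic test model (LLTM) and the random effects linear logistic test model (LLTM-R) to test each item property in a test for differential facet functioning (DFF), thereby describing the relationship between item properties and DIF. The simulation results showed that an item's degree of DIF was affected by the DFF effects of its item properties. In addition, when the Q-matrix density of the test was high (e.g., 60%), a high proportion of items could be flagged as DIF because of inflated Type I error. The study also used empirical data to illustrate how to conduct a DFF analysis on items in order to identify the item properties related to DIF and to guide subsequent item revision. Based on the results, the study recommends using LLTM-R for DFF detection, which can help clarify the relationship between item properties and DIF.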
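
For orientation, the models named in the abstract can be written in their usual textbook form. The notation below is an assumption made for illustration and is not taken from the article: $\theta_p$ is the ability of person $p$, $\beta_i$ the difficulty of item $i$, $q_{ik}$ the Q-matrix entry linking item $i$ to property $k$, $\eta_k$ the difficulty contribution of property $k$, $g_p \in \{0, 1\}$ a focal-group indicator, and $\gamma_k$ a hypothetical DFF effect of property $k$.

```latex
% Rasch-type response function (sign convention assumed)
\operatorname{logit} P(X_{pi} = 1) = \theta_p - \beta_i

% LLTM: item difficulty is fully explained by item properties
\beta_i = \sum_{k=1}^{K} q_{ik}\,\eta_k

% LLTM-R: a random item residual absorbs what the properties do not explain
\beta_i = \sum_{k=1}^{K} q_{ik}\,\eta_k + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)

% DFF: property effects may differ between groups
\beta_i^{(g_p)} = \sum_{k=1}^{K} q_{ik}\,(\eta_k + g_p\,\gamma_k) + \varepsilon_i
\quad\Longrightarrow\quad
\mathrm{DIF}_i = \beta_i^{(1)} - \beta_i^{(0)} = \sum_{k=1}^{K} q_{ik}\,\gamma_k
```

Under this sketch, an item shows DIF whenever it carries at least one property with a non-zero $\gamma_k$ (and the contributions do not cancel), which is one way to read the statement that an item's DIF is driven by the DFF effects of its properties.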

English Abstract

Because methods for assessing differential item functioning (DIF) have been developed and thoroughly investigated, the focus of DIF research has shifted to explaining DIF. Experts in the field are typically recruited to identify possible sources of DIF, and quantitative analysis results can help them locate the sources of DIF in flagged items. This study aimed to demonstrate the use of the differential facet functioning (DFF) procedure, implemented with the linear logistic test model (LLTM) and the random effects linear logistic test model (LLTM-R), to explain possible DIF sources. The efficiency of LLTM and LLTM-R in detecting DFF under various conditions was also evaluated. The simulation results indicated that an item's DIF effect was significantly influenced by the DFF effects of its item properties. Moreover, when the design matrices had a high density (e.g., 60%), Type I error rates of the DIF assessment were seriously inflated. We also demonstrated the DFF analysis procedure with an empirical data set. The results showed that most DIF items were related to two item properties, which could be presented as possible DIF sources in the item-review meeting. We recommend implementing DFF assessment with LLTM-R to help explain DIF sources.
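
To make the simulation idea concrete, the following is a minimal Python sketch of how item-level DIF can arise from property-level DFF when item difficulty is a weighted sum of item properties. It is not the authors' simulation design: the function names (`make_q_matrix`, `implied_item_dif`, `simulate_responses`), the random Q-matrix generator, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_q_matrix(n_items, n_props, density, rng):
    """Random binary Q matrix; each entry is 1 with probability `density` (assumed design)."""
    return (rng.random((n_items, n_props)) < density).astype(int)


def implied_item_dif(q, dff):
    """Item-level DIF implied by property-level DFF effects: DIF_i = sum_k q_ik * dff_k."""
    return q @ dff


def simulate_responses(theta, group, q, eta, dff, rng):
    """Simulate 0/1 responses where difficulty is a sum of property effects.

    theta : person abilities, shape (n_persons,)
    group : 0 = reference group, 1 = focal group, shape (n_persons,)
    eta   : property difficulties in the reference group, shape (n_props,)
    dff   : extra property difficulty in the focal group, shape (n_props,)
    """
    beta_ref = q @ eta                    # item difficulty, reference group
    dif = implied_item_dif(q, dff)        # difficulty shift for the focal group
    beta = beta_ref[None, :] + group[:, None] * dif[None, :]
    prob = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta)))
    return (rng.random(prob.shape) < prob).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(2018)
    dff = np.array([0.5, 0.0, 0.0, 0.0, 0.0])   # only property 1 shows DFF (hypothetical)
    for density in (0.2, 0.6):
        q = make_q_matrix(n_items=40, n_props=5, density=density, rng=rng)
        share = np.mean(implied_item_dif(q, dff) != 0)
        print(f"density={density:.1f}: share of items with implied DIF = {share:.2f}")
```

In this sketch a denser Q matrix means more items carry the property with a DFF effect, so more items have true DIF by construction. That is separate from the Type I error inflation reported in the abstract, which concerns items flagged as DIF when they should not be.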

Subject Classification

Social Sciences > Psychology
Social Sciences > Education
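Cited By

  1. 曾鈺琪 (2019). Development and reliability and validity analysis of a connectedness-to-nature scale for junior high school adolescents in Taiwan. 科學教育學刊, 27(4), 323-345.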