


Applying a Multidimensional IRT-Based Plausible Value Method to Estimate Group Means in Large-Scale Assessment: Using TASA 2006 Mathematics as an Example




王敏嫻(Min-Shian Wang);曾筱倩(Hsiao-Chien Tseng);郭伯臣(Bor-Chen Kuo);吳慧珉(Huey-Min Wu)


Large-scale assessment ; Taiwan Assessment of Student Achievement ; plausible values ; Multidimensional item response theory




Issue 18, Part 1 (2010/06/01)


47 - 67




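In large-scale assessments such as NAEP, TIMSS, and PISA, student achievement data are released to secondary analysts in the form of plausible values, which are used to describe statistical characteristics. The technical reports published so far by NAEP, TIMSS, and PISA rest mainly on unidimensional item response theory and use the plausible value method to estimate group ability; the method has not yet been used to estimate group ability on separate dimensions. This study uses the item responses and questionnaire data collected from the 2006 TASA fourth-grade mathematics assessment. Abilities were estimated from students' item responses with a multidimensional item response model, and biserial correlations were then computed between each questionnaire background variable and the ability estimates. The absolute values of the correlation coefficients were ranked from high to low, and six background variables were drawn across that range to examine two questions: whether the correlation between background variables and ability leads to differences in group ability estimates, and how including background variables affects those differences. The results show that, under multidimensional item response theory, the plausible value method gives a consistent high-to-low ordering of category mean abilities on each dimension across models with and without background variables. In addition, on all five dimensions, the gap between the group with the highest mathematics achievement and the group with the lowest is larger when background variables are included than when they are ignored, so including background variables affects the estimation of group ability parameters. Finally, in the comparison of background variables with different degrees of correlation, the difference between group mean abilities on each dimension decreases as the correlation between the included background variable and ability decreases.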
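The variable-selection step described above (correlate each background variable with ability, take absolute values, rank from high to low) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it uses the point-biserial correlation (a Pearson correlation with a 0/1 variable) as a stand-in for the biserial correlation named in the abstract, and the data, the function name `rank_background_variables`, and the variable names `bv_*` are all hypothetical.

```python
import numpy as np

def rank_background_variables(ability, background, names):
    """Return (name, r) pairs sorted by |r|, largest first.

    ability:    (n,) array of ability estimates on one dimension
    background: (n, k) array of 0/1 background variables
    names:      k labels, one per column of `background`
    """
    results = []
    for j, name in enumerate(names):
        # Point-biserial correlation = Pearson r with a binary variable
        # (assumption: used here in place of the biserial correlation).
        r = np.corrcoef(background[:, j], ability)[0, 1]
        results.append((name, float(r)))
    return sorted(results, key=lambda pair: abs(pair[1]), reverse=True)

# Synthetic data for illustration only (not TASA data).
rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)
bv_strong = (ability + rng.normal(scale=0.5, size=n) > 0).astype(float)
bv_weak = (ability + rng.normal(scale=3.0, size=n) > 0).astype(float)
bv_noise = rng.integers(0, 2, size=n).astype(float)

background = np.column_stack([bv_noise, bv_weak, bv_strong])
ranking = rank_background_variables(
    ability, background, ["bv_noise", "bv_weak", "bv_strong"]
)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

With a multidimensional model, the same ranking would be repeated for each ability dimension.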


In international large-scale assessment programs such as NAEP, TIMSS, and PISA, plausible value methods based on unidimensional item response theory are used to estimate population characteristics. These programs have not used multidimensional plausible value methods to estimate population characteristics on separate dimensions. In this paper, a plausible value method based on multidimensional item response theory was used to estimate group means from the empirical data of TASA 2006 mathematics. Examinees' background variables (BVs) were included in the plausible value method to improve the precision of ability estimation. The effects of five models that differ in how background variables are included in the plausible value method were explored. The results showed that the estimated group means were affected by how the background variables were included. As the correlation between abilities and background variables decreased, the difference between group means also decreased.
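For readers unfamiliar with how a group mean is obtained from plausible values, the following sketch shows one common way to combine them on a single ability dimension. It is an assumption-laden illustration rather than the paper's procedure: it treats the sample as a simple random sample (operational surveys typically use sampling weights and replication-based variance estimates), it combines variances with the standard multiple-imputation rule, and the data and the function name `group_mean_from_pvs` are hypothetical.

```python
import numpy as np

def group_mean_from_pvs(pvs, group, label):
    """Estimate one group's mean and standard error from plausible values.

    pvs:   (n_students, M) array, M plausible values per student
    group: (n_students,) array of group labels
    label: the group to summarise
    """
    sub = pvs[group == label]          # rows belonging to this group
    n, m = sub.shape
    means = sub.mean(axis=0)           # one group mean per plausible-value set
    estimate = means.mean()            # final estimate: average over the M sets
    # Sampling variance of the mean, averaged over the M sets
    # (assumption: simple random sampling, no weights).
    within = (sub.var(axis=0, ddof=1) / n).mean()
    # Variance between the M set-level means (imputation variance).
    between = means.var(ddof=1)
    se = np.sqrt(within + (1 + 1 / m) * between)
    return float(estimate), float(se)

# Synthetic data for illustration only (not TASA data).
rng = np.random.default_rng(1)
n = 300
theta = np.concatenate([rng.normal(0.5, 1.0, n), rng.normal(-0.5, 1.0, n)])
pvs = theta[:, None] + rng.normal(scale=0.3, size=(2 * n, 5))
group = np.array(["high"] * n + ["low"] * n)

mean_high, se_high = group_mean_from_pvs(pvs, group, "high")
mean_low, se_low = group_mean_from_pvs(pvs, group, "low")
print(f"high: {mean_high:.3f} (SE {se_high:.3f})")
print(f"low:  {mean_low:.3f} (SE {se_low:.3f})")
print(f"difference between group means: {mean_high - mean_low:.3f}")
```

The difference printed on the last line corresponds to the kind of between-group gap the abstract compares across models; in a multidimensional analysis the calculation would be repeated per dimension.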

Subject classification: Basic and Applied Sciences > Statistics
Social Sciences > Education