Title

次級量尺分數估計法於大型教育測驗之模擬研究 (A Simulation Study of Subscale Score Estimation Methods for Large-Scale Educational Assessments)

English Title

The Subscale Scores Estimation for Large-Scale Assessments

DOI

10.7108/PT.201006.0209

Authors

郭伯臣(Bor-Chen Kuo);王暄博(Hsuan-Po Wang);吳慧珉(Huey-Min Wu);張宛婷(Wan-Tin Chang)

Keywords

large-scale assessments; subscale scores; test equating

Journal

測驗學刊 (Psychological Testing)

Volume/Issue (Publication Date)

Vol. 57, No. 2 (2010/06/01)

Pages

209 - 238

Language

Traditional Chinese

Chinese Abstract

In recent years, the estimation and application of subscale scores have drawn growing attention; for example, the score reports of large-scale assessments both internationally and in Taiwan (TIMSS, PISA, NAEP, TASA) all present subscale scores for different ability dimensions. Although researchers abroad have investigated subscale score estimation, no related research has yet been conducted in Taiwan, and no study has compared these methods under an equating test design. This study therefore uses simulation experiments to examine the estimation performance of different subscale score estimation methods under various testing conditions, for both a single test design and an equating test design. In addition, this study proposes new subscale score estimation methods and compares them with existing ones. The results show that the methods proposed in this study achieve better estimation accuracy across the testing conditions examined.

English Abstract

The purpose of this paper is to explore subscale score estimation under two testing designs: a single testing design and an equating testing design. In addition, two new methods for estimating subscale scores are presented. Using simulated data, this study investigates the estimation accuracy of the different subscale score estimation methods. In the single testing design, the factors considered are the correlation between subscales, sample size, the ratio of constructed-response (CR) to multiple-choice (MC) items, the number of subscales, and test length. In the equating testing design, the factors considered are the correlation between subscales, sample size, the placement of anchor items, and the equating method. The results show that: 1. the new estimation methods outperform the other methods; 2. estimation error decreases as the correlation between subscales increases, whereas sample size has no effect on estimation error; 3. in the single testing design, estimation error decreases as the ratio of CR to MC items increases and as test length increases; 4. in the equating testing design, the placement of anchor items does not affect estimation error, and concurrent calibration based on item response theory yields higher accuracy than equating based on classical test theory.
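The regressed-score idea from Kelley (1927), the first entry in the reference list below, underlies several of the subscale-score estimators compared in studies of this kind: an unreliable observed subscale score is shrunk toward the group mean in proportion to its unreliability. A minimal sketch (function name and the numeric values are illustrative, not taken from the paper):

```python
def kelley_estimate(observed: float, group_mean: float, reliability: float) -> float:
    """Kelley's regressed score estimate: tau_hat = rho * x + (1 - rho) * mu.

    The lower the subscale reliability rho, the more the observed score x
    is pulled toward the group mean mu.
    """
    return reliability * observed + (1.0 - reliability) * group_mean

# Example: a short (hence unreliable, rho = 0.6) subscale. A raw score of 30
# in a group with mean 20 is shrunk to 0.6*30 + 0.4*20 = 26.
print(kelley_estimate(30.0, 20.0, 0.6))
```

Short subscales are exactly where such shrinkage (and its "augmented" extensions that borrow information from the other subscales, e.g. Wainer et al., 2000) matters most, which motivates the accuracy comparisons in the abstract above.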

Subject Classification
Social Sciences > Psychology
Social Sciences > Education
References
  1. Kelley, T. L. (1927). The interpretation of educational measurements. New York: World Book.
  2. Kelley, T. L. (1947). Fundamentals of statistics. Cambridge, MA: Harvard University Press.
  3. Baxter, G. P., Ahmed, S., Sikali, E., Waits, T., Sloan, M., & Salvucci, S. (2007). Technical report of the NAEP Mathematics Assessment in Puerto Rico: Focus on statistical issues (NCES 2007-462rev). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.
  4. Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal of Educational Measurement, 34(3), 197-211.
  5. Brennan, R. L. (Ed.) (2007). Educational measurement. New York: Macmillan.
  6. Gessaroli, M. E. (2004). Using hierarchical multidimensional item response theory to estimate augmented subscores. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
  7. Green, B. F., Bock, R. D., Humphreys, L. G., Linn, R. L., & Reckase, M. D. (1984). Technical guidelines for assessing computerized adaptive tests. Journal of Educational Measurement, 21(4), 347-360.
  8. Gummerman, K. (1972). A response-contingent measure of proportion correct. The Journal of the Acoustical Society of America, 52, 1645-1647.
  9. Johnson, R. A., & Wichern, D. W. (2007). Applied multivariate statistical analysis. Upper Saddle River, NJ: Pearson.
  10. Kahraman, N., & Kamata, A. (2004). Increasing the precision of subscale scores by using out-of-scale information. Applied Psychological Measurement, 28(6), 407-426.
  11. Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48, 233-245.
  12. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
  13. Martin, M. O., Mullis, I. V. S., & Chrostowski, S. J. (Eds.) (2004). TIMSS 2003 technical report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
  14. Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159-176.
  15. Muraki, E., & Bock, R. D. (1996). PARSCALE: IRT based test scoring and item analysis for graded open-ended exercises and performance tasks (Version 3). Chicago, IL: Scientific Software International.
  16. Allen, N. L., Donoghue, J. R., & Schoeps, T. L. (2001). The NAEP 1998 technical report. Washington, DC: National Center for Education Statistics, Educational Testing Service.
  17. Novick, M. R., & Jackson, P. H. (1974). Statistical methods for educational and psychological research. New York, NY: McGraw-Hill.
  18. Organisation for Economic Co-operation and Development [OECD] (2005). PISA 2003 technical report. Paris: OECD.
  19. Pommerich, M., Nicewander, W. A., & Hanson, B. (1999). Estimating average domain scores. Journal of Educational Measurement, 36, 199-216.
  20. Shin, C. D. (2006). A comparison of methods of estimating subscale scores for mixed-format tests. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.
  21. Shin, C. D., Ansley, T., Tsai, T., & Mao, X. (2005). A comparison of methods of estimating objective scores. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Quebec, Canada.
  22. Tate, R. L. (2004). Implications of multidimensionality for total score and subscale performance. Applied Measurement in Education, 17(2), 89-112.
  23. Wainer, H., Vevea, J. L., Camacho, F., Reeve III, B. B., Rosa, K., Nelson, L., Swygert, K. A., & Thissen, D. (2000). Test scoring. Hillsdale, NJ: Lawrence Erlbaum Associates.
  24. Yen, W. M. (1987). A Bayesian/IRT index of objective performance. Paper presented at the annual meeting of the Psychometric Society, Montreal, Quebec, Canada.
  25. Yen, W. M., Sykes, R. C., Ito, K., & Julian, M. (1997). A Bayesian/IRT index of objective performance for tests with mixed-item types. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
  26. Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG. Chicago, IL: Scientific Software International.
  27. 洪碧霞, 林素微, & 林娟如 (2006). The explanatory power of a cognitive complexity analysis framework for the item difficulty of the TASA-MAT sixth-grade online test. 教育研究與發展期刊 (Journal of Educational Research and Development), 2(4), 69-86.
  28. 楊孟麗, 譚康榮, & 黃敏雄 (2003). Taiwan Education Panel Survey psychometric report: TEPS 2001 analytical ability test (1st ed.). Taipei: Center for Survey Research, Academia Sinica.
Cited By
  1. 王榮照 (2014). Suitable experimental designs for simulation studies. 運動教練科學, 33, 67-77.