Title

大型測驗等化群體不變性之探究:以2007年臺灣學生學習成就評量資料庫國中二年級數學科為例

Parallel Title

Exploring the Population Invariance of Equating in the Large-Scale Assessments: Using the Taiwan Assessment of Student Achievement as an Example

Authors

王暄博(Hsuan-Po Wang);郭伯臣(Bor-Chen Kuo);呂玉如(Yu-Ju Lu)

Keywords

IRT真實分數等化 ; IRT觀察分數等化 ; 群體不變性 ; 量尺轉換方法 ; IRT observed score equating ; IRT true score equating ; population invariance ; scale transformation method

Journal

測驗學刊

Volume and Issue / Publication Date

Vol. 60, No. 3 (2013/09/01)

Pages

489 - 518

Language

Traditional Chinese

Chinese Abstract

本研究以2007年「臺灣學生學習成就評量資料庫」(TASA)國中二年級數學科的測驗資料為例,檢驗TASA測驗進行量尺程序後,其測驗分數是否有符合等化群體不變性之性質。本研究以性別進行分群,探討不同等化方法於性別受試者群體中是否保留群體不變性,包含:平均數與標準差法、平均數法、試題特徵曲線,以及測驗特徵曲線等不同量尺轉換方法,並搭配試題反應理論(IRT)真實分數與IRT觀察分數等化方法,共計八種等化方法。此外,採用Dorans與Holland(2000)提出之均方根誤差(RMSD)與均方根平均期望誤差(REMSD),以及Yang(2004)提出之均方根期望誤差(RESD)等三種方法來評估經過次群體等化後的群體不變性,並以SDTM為評估準則。研究結果顯示,TASA 2007年的數學科資料除了題本七有某些分數點超出SDTM標準值之外,其餘題本皆符合等化群體不變性。

English Abstract

This study uses test data from the Taiwan Assessment of Student Achievement (TASA) database to examine whether TASA test scores, after scaling, satisfy population invariance of equating. Using the 2007 TASA eighth-grade mathematics data, with examinees grouped by gender, the study examined whether eight equating methods preserved population invariance across the gender subgroups. The eight methods combine four scale transformation methods (mean/mean, mean/sigma, Haebara, and Stocking-Lord) with two equating methods: item response theory (IRT) true score equating and IRT observed score equating. Dorans and Holland's (2000) RMSD and REMSD indices, together with Yang's (2004) RESD index, were used to evaluate population invariance after subpopulation equating, with SDTM as the evaluation criterion. The results showed that the 2007 TASA mathematics data satisfied population invariance of equating, except for Booklet 7, where some score points exceeded the SDTM criterion.

Subject Classification

Social Sciences > Psychology
Social Sciences > Education
References
  1. Hanson, B. A., Zeng, L., & Chien, Y. (2004). PIE: IRT true and observed scoring equating for dichotomously scored tests [Computer software]. Retrieved March 10, 2011, from http://www.education.uiowa.edu/casma
  2. Hanson, B. A., Zeng, L., & Chien, Y. (2004). ST: A computer program for IRT scale transformation [Computer software]. Retrieved March 10, 2011, from http://www.education.uiowa.edu/casma
  3. 教育部統計處(2010)。99學年度國中學生、教職員統計。2011年5月23日,取自http://www.edu.tw/statistics/
  4. 臺灣學生學習成就評量資料庫(2011)。臺灣學生學習成就評量資料庫。2011年4月20日,取自http://tasa.naer.edu.tw/brief.htm
  5. Brennan, R. L.,Kolen, M. J.(1987).Some practical issues in equating.Applied Psychological Measurement,11,279-290.
  6. Cook, L. L.,Petersen, N. S.(1987).Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances.Applied Psychological Measurement,11,225-244.
  7. Crocker, L.,Algina, J.(1986).Introduction to classical and modern test theory.New York, NY:Holt, Rinehart and Winston.
  8. Dorans, N. J.,Holland, P. W.(2000).Population invariance and equatability of tests: Basic theory and the linear case.Journal of Educational Measurement,37,281-306.
  9. Dorans, N. J.,Holland, P. W.,Thayer, D. T.,Tateneni, K.(2002).Invariance of score linking across gender groups for three Advanced Placement Program exams.Annual meeting of the National Council on Measurement in Education,New Orleans, LA.
  10. Dorans, N. J.,Liu, J.,Hammond, S.(2008).Anchor test type and population invariance: An exploration across subpopulations and test administrations.Applied Psychological Measurement,32,81-97.
  11. Gulliksen, H.(1950).Theory of mental tests.New York, NY:John Wiley & Sons.
  12. Haebara, T.(1980).Equating logistic ability scales by a weighted least squares method.Japanese Psychological Research,22,144-149.
  13. Hambleton, R. K.,Swaminathan, H.(1985).Item response theory: Principles and applications.Boston, MA:Kluwer.
  14. Hanson, B. A.,Béguin, A. A.(2002).Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design.Applied Psychological Measurement,26,3-24.
  15. Harris, D. J.(1993).Practical issues in equating.Annual meeting of the American Educational Research Association,Atlanta, GA.
  16. Harris, D. J.,Crouse, J. D.(1993).A study of criteria used in equating.Applied Measurement in Education,6,195-240.
  17. Holland, P. W.(Ed.),Rubin, D. B.(Ed.)(1982).Test equating.New York, NY:Academic Press.
  18. Kolen, M. J.,Brennan, R. L.(2004).Test equating, scaling, and linking: Methods and practices.New York, NY:Springer-Verlag.
  19. Liu, M.,Holland, P. W.(2008).Exploring population sensitivity of linking functions across three law school admission test administrations.Applied Psychological Measurement,32,27-44.
  20. Lord, F. M.(1980).Application of item response theory to practical testing problems.Hillsdale, NJ:Lawrence Erlbaum Associates.
  21. Lord, F. M.,Wingersky, M. S.(1984).Comparing IRT true-score and equipercentile observed score "equatings".Applied Psychological Measurement,8,452-461.
  22. Loyd, B. H.,Hoover, H. D.(1980).Vertical equating using the Rasch model.Journal of Educational Measurement,17,179-193.
  23. Marco, G. L.(1977).Item characteristic curve solutions to three intractable testing problems.Journal of Educational Measurement,14,139-160.
  24. Marco, G.,Petersen, N.,Stewart, E.(1979).A test of the adequacy of curvilinear score equating models.Computerized Adaptive Testing Conference,Minneapolis, MN.
  25. Petersen, N. S.,Cook, L. L.,Stocking, M. L.(1983).IRT versus conventional equating methods: A comparative study of scale stability.Journal of Educational Statistics,8(2),135-156.
  26. Skaggs, G.(1990).Assessing the utility of item response theory models for testing equating.Annual meeting of the National Council on Measurement in Education,Boston, MA.
  27. Skaggs, G.,Lissitz, R. W.(1986).IRT test equating: Relevant issues and a review of recent research.Review of Educational Research,56(4),495-529.
  28. Stocking, M. L.,Lord, F. M.(1983).Developing a common metric in item response theory.Applied Psychological Measurement,7(2),201-211.
  29. von Davier, A. A.,Wilson, C.(2008).Investigating the population sensitivity assumption of item response theory true-score equating across two subgroups of examinees and two test formats.Applied Psychological Measurement,32,11-26.
  30. Yang, W.-L.(2004).Sensitivity of linkings between AP multiple-choice scores and composite scores to geographical region: An illustration of checking for population invariance.Journal of Educational Measurement,41,33-41.
  31. Yang, W.-L.,Dorans, N. J.,Tateneni, K.(2002).Sample selection effect on AP multiple-choice score to composite score scaling.Annual meeting of the National Council on Measurement in Education,New Orleans, LA.
  32. Yang, W.-L.,Gao, R.(2008).Invariance of score linkings across gender groups for forms of a testlet-based college-level examination program examination.Applied Psychological Measurement,32,45-61.
  33. Yi, Q.,Harris, D. J.,Gao, X.(2008).Invariance of equating functions across different subgroups of examinees taking a Science Achievement Test.Applied Psychological Measurement,32,62-80.
  34. Zimowski, M. F.,Muraki, E.,Mislevy, R. J.,Bock, R. D.(2003).BILOG-MG.Chicago, IL:Scientific Software International.
  35. 郭伯臣、王暄博(2008)。大型測驗中同時進行垂直與水平等化效果之探討。教育研究與發展期刊,4(4),89-90。
Cited By
  1. 謝名娟(2020)。從多層面Rasch模式來檢視不同的評分者等化連結設計對參數估計的影響。教育心理學報,52(2),415-436。
  2. 楊心怡,陳柏熹,吳昭容,吳宜玲(2021)。三至九年級學生數學運算能力等化測量與多向度分析。清華教育學報,38(2),111-150。