Title

以可能值方法為基礎之多向度能力值垂直等化探究

Parallel Title

The Research in Estimating Multidimensional Traits under Vertical Equating Based on Plausible Value Method

Authors

吳慧珉(Huey-Min Wu);郭伯臣(Bor-Chen Kuo);許天維(Tian-Wei Sheu);陳婉寧(Wan-Ning Chen)

Keywords

大型測驗 ; 可能值方法 ; 多向度試題反應理論 ; 垂直等化 ; 能力估計 ; Large-scale assessments ; MIRT ; plausible value method ; trait estimation ; vertical equating

Journal

測驗學刊

Volume and Issue / Publication Date

Vol. 62, No. 2 (2015/06/01)

Pages

95 - 126

Language

Traditional Chinese

Chinese Abstract

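Several well-known international large-scale assessments now use the plausible value method to report population parameters, because the method recovers population parameters very well and population parameters are exactly what large-scale assessments focus on. Large-scale assessments are usually built for long-term evaluation of educational outcomes, so examining whether students differ in certain abilities across grade levels is an issue that deserves attention. With vertical equating, examinees in different grades each take items suited to their ability range, and the results are then placed on a common scale so that ability levels can be compared. Based on multidimensional item response theory and a vertical equating design, this study investigates how different numbers of items and different numbers of dimensions affect ability parameter estimation, and compares the plausible value method with other estimation methods. The results show that the plausible value method estimates the population standard deviation with very high precision, whereas its estimates of the population ability mean are about the same as those of the other methods. Under the multidimensional vertical equating design, estimation improves when more items correspond to each dimension.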

English Abstract

Large-scale assessments are designed to monitor group progress, so group-level statistics are their main focus. Because the plausible value method recovers population statistics accurately, several major large-scale assessment programs use it to report student achievement. Vertical equating is the approach test publishers use to evaluate achievement longitudinally across grade levels. Based on multidimensional item response theory (MIRT) with a vertical equating design, this study examines whether (1) the ability estimation method and (2) the number of items per dimension affect the recovery of group-level ability parameters. The results indicate that the plausible value method recovers the population standard deviation very well but is not outstanding in recovering the population mean. Under the MIRT vertical equating design, parameters are estimated better when each dimension has more items.
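The contrast between plausible values and point estimates described in the abstract can be illustrated with a much simpler setting than the one studied in the article. The sketch below is not the authors' MIRT or vertical equating design: it assumes a single ability with a N(0, 1) population, one noisy normal observation per examinee, and hypothetical names (`simulate`, `error_sd`). Under those assumptions, the unshrunken estimate overstates the population standard deviation, the posterior mean (EAP) understates it, and a single plausible value drawn from the posterior recovers it, while all three give similar means.

```python
import random
import statistics


def simulate(n=20000, error_sd=1.0, seed=0):
    """Compare group statistics from three kinds of ability estimates.

    Assumed toy model (not the article's design):
      theta ~ N(0, 1), observed score y = theta + N(0, error_sd^2).
    """
    rng = random.Random(seed)
    # Shrinkage factor: share of observed-score variance due to true ability.
    rho = 1.0 / (1.0 + error_sd ** 2)
    post_sd = (1.0 - rho) ** 0.5  # posterior SD of theta given y

    mle, eap, pv = [], [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        y = theta + rng.gauss(0.0, error_sd)
        post_mean = rho * y
        mle.append(y)                             # unshrunken point estimate
        eap.append(post_mean)                     # posterior mean
        pv.append(rng.gauss(post_mean, post_sd))  # one plausible value

    def summary(values):
        return (sum(values) / len(values), statistics.pstdev(values))

    return {"mle": summary(mle), "eap": summary(eap), "pv": summary(pv)}


if __name__ == "__main__":
    for name, (mean, sd) in simulate().items():
        print(f"{name}: mean={mean:.3f}, sd={sd:.3f}")
```

In this toy model the true population standard deviation is 1, so the printed standard deviations show directly which estimate recovers it. Operational programs typically draw several plausible values per examinee and condition on background variables, which this sketch omits.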

Subject Classification Social Sciences > Psychology
Social Sciences > Education
References
  1. 許天維、郭伯臣、吳慧珉、葉昶成(2013)。單向度試題反應理論之可能值方法於等化設計下之模擬實驗探究。測驗統計年刊,21(下),1-24。
  2. 陳柏熹(2006)。能力估計方法對多項度電腦化適性測驗評量精準度的影響。教育心理學報,38(2),195-211。
  3. Wu, M., Adams, R. J., Wilson, M. R., & Haldane, A. H. (2007). ACER ConQuest 2.0 [computer program]. Hawthorn, Australia: ACER.
  4. Adams, R. J.,Wilson, M.,Wang, W.(1997).The multidimensional random coefficients multinomial logit model.Applied Psychological Measurement,21(1),1-23.
  5. Adams, R. J.,Wilson, M.,Wu, M.(1997).Multilevel item response models: An approach to errors in variables regression.Journal of Educational and Behavioral Statistics,22,47-76.
  6. Allen, N. L.,Donoghue, J. R.,Schoeps, T. L.(2001).The NAEP 1998 technical report.Washington, DC:National Center for Education Statistics.
  7. de la Torre, J.,Song, H.(2009).Improving the quality of ability estimates through multidimensional scoring and incorporation of ancillary variables.Applied Psychological Measurement,33,465-485.
  8. Glas, C. A. W.,Geerlings, H.(2009).LSAC ResearchLSAC Research,Law School Admission Council.
  9. Hattie, J.(1981).Decision criteria for determining unidimensional and multidimensional normal ogive models of latent trait theory.Armidale, Australia:The University of New England, Center for Behavioral Studies.
  10. Ito, K.,Sykes, R. C.,Yao, L.(2008).Concurrent and separate grade-groups linking procedures for vertical scaling.Applied Measurement in Education,21,187-206.
  11. Kim, S.,Cohen, A. S.(1998).A comparison of linking and concurrent calibration under item response theory.Applied Psychological Measurement,22,131-143.
  12. Kolen, M. J.,Brennan, R. L.(1995).Test equating: Methods and practices.New York, NY:Springer-Verlag.
  13. Lord, F. M.(1983).Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability.Psychometrika,48,233-245.
  14. McKinley, R. L.,Reckase, M. D.(1983).MAXLOG: A computer program for the estimation of the parameters of a multidimensional logistic model.Behavior Research Methods and Instrumentation,15,389-390.
  15. Mislevy, R. J.(1984).Estimating latent distributions.Psychometrika,49,359-381.
  16. Mislevy, R. J.(1991).Randomization-based inference about latent variable from complex samples.Psychometrika,56,177-196.
  17. Mislevy, R. J.,Beaton, A. E.,Kaplan, B.,Sheehan, K. M.(1992).Estimating population characteristics from sparse matrix samples of item response.Journal of Educational Measurement,29,133-161.
  18. Mislevy, R. J.,Johnson, E. G.,Muraki, E.(1992).Scaling procedures in NAEP.Journal of Educational Statistics,17,131-154.
  19. Mislevy, R. J.,Sheehan, K. M.(1989).Information matrices in latent-variable models.Journal of Educational Statistics,14,335-350.
  20. Nemhauser, G. L.,Wolsey, L. A.(1999).Integer and combinatorial optimization.New York, NY:John Wiley & Sons.
  21. Olson, J. F.(Ed.),Martin, M. O.(Ed.),Mullis, I. V. S.(Ed.)(2008).TIMSS 2007 technical report.Boston, MA:TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
  22. Organisation for Economic Co-operation and Development=OECD(2009).PISA 2006 technical report.Paris, France:OECD.
  23. Reckase, M. D.(2009).Multidimensional item response theory.New York, NY:Springer.
  24. Reckase, M. D.,McKinley, R. L.(1991).The discriminating power of items that measure more than one dimension.Applied Psychological Measurement,15,361-373.
  25. Sympson, J. B.(1978).A model for testing with the multidimensional items.Proceedings of the 1977 Computerized Adaptive Testing Conference,Minneapolis, MN:
  26. van der Linden, W. J.,Veldkamp, B. P.,Carlson, J. E.(2004).Optimizing balanced incomplete block designs for educational assessments.Applied Psychological Measurement,28,317-331.
  27. von Davier, M.,Gonzalez, E.,Mislevy, R. J.(2009).What are plausible values and why are they useful?.IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments
  28. Warm, T. A.(1989).Weighted likelihood estimation of ability in item response theory.Psychometrika,54,427-450.
  29. Wu, M.(2005).The role of plausible values in large-scale surveys.Studies in Educational Evaluation,31(2-3),114-128.
  30. 余民寧(2009)。試題反應理論(IRT)及其應用。臺北市:心理。
  31. 郭伯臣編、曾建銘編、吳慧珉編(2012)。大型標準化測驗建置流程應用於TASA 之研究。新北市:國家教育研究院。
  32. 郭伯臣、王暄博(2008)。大型測驗中同時進行垂直與水平等化效果之探討。教育研究與發展期刊,4(4),87-120。
  33. 黃珮璇(2007)。碩士論文(碩士論文)。臺中市,國立臺中教育大學。
Cited By
  1. 謝名娟(2020)。從多層面Rasch模式來檢視不同的評分者等化連結設計對參數估計的影響。教育心理學報,52(2),415-436。
  2. 楊心怡,陳柏熹,吳昭容,吳宜玲(2021)。三至九年級學生數學運算能力等化測量與多向度分析。清華教育學報,38(2),111-150。