Title

以常態混組模型討論書籤標準設定法對英語聽讀基本能力標準設定有效性之輻合證據

Parallel Title

Normal Mixture Model as Convergent Validity Evidence to Bookmark Standard Setting of English Reading and Listening Ability

DOI

10.6251/BEP.20081015

Authors

吳毓瑩(Yuh-Yin Wu);陳彥名(Yan-Ming Chen);張郁雯(Yuwen Chang);陳淑惠(Shu-hui Eileen Chen);何東憲(Tung-Hsien He);林俊吉(Jyun-Ji Lin)

Keywords

英語聽讀能力 ; 效度之輻合證據 ; 常態混組模型 ; 書籤標定法 ; 標準設定 ; bookmark standard setting method ; convergent evidence of validity ; English reading and listening ability ; normal mixture model ; standard setting

Journal

教育心理學報 (Bulletin of Educational Psychology)

Volume/Issue (Publication Date)

Vol. 41, No. 1 (2009/09/01)

Pages

69 - 89

Language

Traditional Chinese

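Chinese Abstract

This study examines the judgment process of applying the bookmark standard setting method (hereafter the bookmark method) to set the basic-proficiency standard for English listening and reading in the English Learning Achievement Assessment of the 2005 Taiwan Assessment of Student Achievement (TASA), and the validity of the convergent evidence for the resulting judgments. The sample comprised 10101 sixth graders drawn by stratified sampling from across Taiwan. Testing followed a balanced incomplete block design, in which 70 listening and reading items were assembled into six 40-item booklets. The bookmark standard setting meeting consisted of thirteen experts (seven subject-matter professors, three measurement professors, and three expert English teachers), who reached consensus on a single cut point (θ = -0.57), with the passing group making up 72.9% of all students. The study used the results of a normal mixture model as convergent evidence for the validity of the bookmark method: the classification based on its estimated cut point (θ = -0.40) reached a Kappa agreement of .87 with the classification set by the experts. The paper closes with issues for discussion on practical use and on theory.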
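
The agreement figure reported above compares two pass/fail classifications of the same students, one per cut point. A minimal sketch of that comparison with Cohen's kappa follows. The ability values are synthetic draws from a standard normal distribution (an assumption made only for illustration, not the TASA data), and the function name `cohen_kappa` is hypothetical, so the resulting value should not be read as a reproduction of the study's result.

```python
import random

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of binary (pass/fail) labels."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    pa = sum(a) / n                                  # pass rate under rule A
    pb = sum(b) / n                                  # pass rate under rule B
    p_exp = pa * pb + (1 - pa) * (1 - pb)            # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

# Synthetic abilities: an assumption for illustration, not the study's data.
rng = random.Random(0)
thetas = [rng.gauss(0.0, 1.0) for _ in range(10101)]

# The two cut points quoted in the abstract.
pass_bookmark = [t >= -0.57 for t in thetas]
pass_mixture = [t >= -0.40 for t in thetas]

kappa = cohen_kappa(pass_bookmark, pass_mixture)
print(f"kappa on synthetic data: {kappa:.2f}")
```

Because the two rules differ only in where the threshold sits, every disagreement comes from students whose ability falls between the two cut points, so kappa drops as more of the distribution lies in that interval.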

English Abstract

This study investigated the convergent validity of the bookmark standard setting method as applied to English reading and listening ability. The data set was obtained from the 2005 Taiwan Assessment of Student Achievement (TASA) data bank. A total of 10101 sixth graders from different areas of Taiwan were cluster sampled and tested with a 40-item scale. The scale was developed from a pool of 70 items through a balanced incomplete block design. Thirteen experts took part in the bookmark standard setting meeting: 7 were university professors in the English-as-a-Foreign-Language (EFL) field, 3 were professors in measurement, and 3 were elementary school English master teachers. They reached consensus on a cut score of θ = -.57, with 72.7% of students classified as passing. The cut score from the normal mixture model (θ = -.40) was consistent with that from the bookmark standard setting method, with a classification consistency of Kappa = .87, indicating convergent validity evidence. In line with this finding, issues in implementing the bookmark standard setting approach were further explored and discussed.
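
One common way to obtain a cut score from a normal mixture model is to fit two normal components to the ability estimates and place the cut where a student is equally likely to belong to either component. The sketch below illustrates that idea with a plain EM algorithm. It is not the authors' procedure: this record does not give their model specification or software, the data are synthetic, and every function name and parameter value here is a hypothetical choice made for illustration.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def upper_posterior(x, w, mu, sigma):
    """Posterior probability that x belongs to the higher-ability component."""
    lo = w[0] * normal_pdf(x, mu[0], sigma[0])
    hi = w[1] * normal_pdf(x, mu[1], sigma[1])
    return hi / (lo + hi)

def fit_two_normals(xs, iters=150):
    """EM for a two-component normal mixture; returns (weights, means, sds)."""
    n = len(xs)
    ordered = sorted(xs)
    mu = [ordered[n // 4], ordered[3 * n // 4]]      # crude start: quartiles
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    sigma = [sd, sd]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of the upper component for each observation
        r = [upper_posterior(x, w, mu, sigma) for x in xs]
        n_hi = sum(r)
        n_lo = n - n_hi
        # M-step: weighted means, standard deviations, and mixing weights
        mu = [sum((1 - ri) * x for ri, x in zip(r, xs)) / n_lo,
              sum(ri * x for ri, x in zip(r, xs)) / n_hi]
        sigma = [math.sqrt(sum((1 - ri) * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / n_lo),
                 math.sqrt(sum(ri * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / n_hi)]
        w = [n_lo / n, n_hi / n]
    return w, mu, sigma

def cut_point(w, mu, sigma, tol=1e-8):
    """Bisection between the two means for the point where the posterior is 0.5."""
    lo, hi = mu[0], mu[1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upper_posterior(mid, w, mu, sigma) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic abilities from two groups: illustrative values, not the TASA data.
rng = random.Random(1)
thetas = [rng.gauss(-1.5, 0.6) if rng.random() < 0.27 else rng.gauss(0.5, 0.8)
          for _ in range(3000)]

w, mu, sigma = fit_two_normals(thetas)
cut = cut_point(w, mu, sigma)
print(f"weights={w}, means={mu}, cut={cut:.2f}")
```

The bisection assumes the lower component dominates at its own mean and the upper component at its own mean, which holds when the two groups are reasonably well separated. With heavily overlapping components the posterior may not cross 0.5 between the means, and the sketch would need a different search.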

Subject Classification

Social Sciences > Psychology
Social Sciences > Education

References
  1. Akaike, H.(1987).Factor analysis and AIC.Psychometrika,52,317-332.
  2. American Educational Research Association, American Psychological Association, and National Council on Measurement in Education(1999).Standards for educational and psychological testing.Washington, DC:American Educational Research Association.
  3. Angoff, W. H.,R. L. Thorndike (Ed.)(1971).Educational measurement.Washington, DC:American Council on Education.
  4. Basford, K. E.,McLachlan, G. J.(1985).Likelihood estimation with normal mixture models.Applied Statistics,34,282-289.
  5. Buckendahl, C. W.,Smith, R. W.,Impara, J. C.,Plake, B. S.(2002).A comparison of Angoff and bookmark standard setting methods.Journal of Educational Measurement,39(3),253-263.
  6. Campbell, D. T.,Fiske, D. W.(1959).Convergent and discriminant validation by the multitrait-multimethod matrix.Psychological Bulletin,56,81-105.
  7. Cizek, G. J.,G. J. Cizek (Ed.)(2001).Setting performance standards: Concepts, methods, and perspectives.Mahwah, NJ:Lawrence Erlbaum Associates.
  8. Crocker, L.,Algina, J.(1986).Introduction to classical and modern test theory.NY:Holt, Rinehart and Winston.
  9. Ebel, R. L.(1972).Essentials of educational measurement.NJ:Prentice-Hall.
  10. Eckhout, T. J.,Plake, B. S.,Smith, D. L.,Larsen, A.(2007).Aligning a state's alternative standards to regular core content standards in reading and mathematics: A case study.Applied Measurement in Education,20(1),79-100.
  11. Everitt, B. S.,Hand, D. J.(1981).Finite mixture distributions.London:Chapman and Hall.
  12. Flanagan, J. C.,E. F. Lindquist (Ed.)(1951).Educational measurement.Washington, DC:American Council on Education.
  13. Green, D. R.,Trimble, C. S.,Lewis, D. M.(2003).Interpreting the results of three different standard setting procedures.Educational Measurement: Issues and Practice,22(1),22-32.
  14. Hambleton, R. K.,G. J. Cizek (Ed.)(2001).Setting performance standards: Concepts, methods, and perspectives.Mahwah, NJ:Lawrence Erlbaum Associates.
  15. Hambleton, R. K.,Swaminathan, H.(1985).Item response theory: Principles and applications.Massachusetts:Kluwer Academic.
  16. Huynh, H.(2006).A clarification on the response probability criterion RP67 for standard settings based on bookmark and item mapping.Educational Measurement: Issues and Practice,25(2),19-20.
  17. Huynh, H.(1998).On score locations of binary and partial credit items and their applications to item mapping and criterion referenced interpretation.Journal of Educational and Behavioral Statistics,23,35-56.
  18. Jaeger, R. M.(1982).An iterative structured judgment process for establishing standards on competency tests: Theory and application.Educational Evaluation and Policy Analysis,4,461-476.
  19. Jaeger, R. M.,R. L. Linn.(Ed.)(1989).Educational measurement.NY:American Council on Education/Macmillan.
  20. Kaplan, D.(1995).The impact of BIB spiraling-induced missing data patterns on goodness-of-fit tests in factor analysis.Journal of Educational and Behavioral Statistics,20(1),69-82.
  21. Karantonis, A.,Sireci, S. G.(2006).The bookmark standard-setting method: A literature review.Educational Measurement: Issues and Practice,25(1),4-12.
  22. Koffler, S. L.(1980).A comparison of approaches for setting proficiency standards.Journal of Educational Measurement,17,167-178.
  23. Koski, W. S.,Weis, H. A.(2004).What educational resources do students need to meet California's educational content standards? A textual analysis of California's educational content standards and their implications for basic educational conditions and resources.Teachers College Record,106(10),1907-1935.
  24. Lewis, D. M.,Mitzel, H. C.,Green, D. R.(1996).Standard setting: A bookmark approach. Symposium presented at the Council of Chief State School Officers National Conference on Large Scale Assessment,Phoenix, AZ:
  25. Lewis, D. M.,Mitzel, H. C.,Green, D. R.,Patz, R. J.(1999).The bookmark standard setting procedure.Monterey, CA:McGraw-Hill.
  26. Lindsay, B. G.(1995).Mixture models: Theory, geometry, and applications.Hayward, CA:Institute of Mathematical Statistics.
  27. Linn, R. L.(2000).Assessments and accountability.Educational Researcher,29(2),4-16.
  28. Linn, R. L.(2003).The bookmark standard setting procedure: strength and weakness. Canada.Language Learning,52(3),537-564.
  29. The NAEP writing achievement levels
  30. Perie, M.(2005).Angoff and bookmark methods.Workshop presented at the annual meeting of the National Council on Measurement in Education,Montreal, Canada:
  31. Reckase, M. D.(2006).A conceptual framework for a psychometric theory for standard setting with examples of its use for evaluating the functioning of two standard setting methods.Educational Measurement: Issues and Practice,25(2),4-18.
  32. Shrout, P. E.(1998).Measurement reliability and agreement in psychiatry.Statistical Methods in Medical Research,7,301-317.
  33. Sim, J.,Wright, C. C.(2005).The Kappa statistic in reliability studies: Use, interpretation, and sample size requirements.Physical Therapy,85(3),257-268.
  34. Skaggs, G.,Tessema, A.(2001).Item disordinality with the bookmark standard setting procedure.Paper presented at the 2001 annual meeting of the National Council on Measurement in Education,Seattle, WA.:
  35. Swaminathan, H.,Hambleton, R. K.,Algina, J.(1974).Reliability of criterion referenced tests: A decision theoretic formulation.Journal of Educational Measurement,11,263-268.
  36. U. S. Department of Education(1996).Goals 2000: A progress report.
  37. Vermunt, J. K.,Magidson, J.(2005).Technical guide for Latent GOLD 4.0: Basic and advanced.Belmont, MA:Statistical Innovations.
  38. Overseeing the nation's report card: The creation and evolution of the national assessment governing board (NAGB)
  39. Yin, P.,Schulz, E. M.(2005).A comparison of cut scores and cut score variability from Angoff-based and Bookmark-based procedures in standard setting.Paper presented at the annual meeting of the National Council on Measurement in Education,Montreal, Canada:
  40. 王文中合著、呂金燮合著、吳毓瑩合著、張郁雯合著、張淑慧合著(2004)。教育測驗與評量。台北:五南。
  41. 余民寧(2002)。教育測驗與評量-成就測驗與教學評量。台北:心理。
  42. 教育部(2000)。國民小學九年一貫課程暫行綱要。台北:教育部。
  43. 教育部(2003)。國民小學九年一貫課程綱要。台北:教育部。
  44. 教育部(2004)。英語文學習領域能力指標解讀與示例手冊。台北:教育部。
  45. 陳淑惠、吳毓瑩、何東憲、張郁雯、陳錦芬(2005)。台灣學生學習成就評量資料庫2005年台灣學生英語學習成就之趨勢調查研究期中報告。
  46. 陳淑惠、吳毓瑩、張郁雯、何東憲(2006)。台灣學生學習成就評量資料庫2005年台灣學生英語學習成就之趨勢調查研究技術報告。
Cited by
  1. 謝進昌(2021)。以「補充性表現水平描述輔助自陳式測量構念」之延伸Angoff標準設定研究。教育心理學報,53(2),307-334。
  2. 謝進昌,謝佩蓉,林世華(2012)。表現標準設定之擴大參與:教學現場效度證據。教育研究與發展期刊,8(4),1-18。
  3. 謝進昌、謝名娟、林世華(2013)。不同方法設定英文科決斷分數之實務性研究。測驗學刊,60(3),519-544。
  4. 謝進昌、謝佩蓉、謝名娟、陳清溪、林陳涌、林世華(2011)。大型資料庫國小四年級自然科學習成就評量標準設定結果之效度評估。教育科學研究期刊,56(1),1-32。
  5. 謝名娟(2013)。以多層面Rasch分析的角度來評估標準設定之變異性。教育心理學報,44(4),793-811。
  6. (2011)。TEPS資料庫中學業成就與相關影響因素之縱貫性研究。教育政策論壇,14(3),119-154。
  7. (2013)。臺灣學生學習成就評量英語科標準設定之效度評估研究。教育與心理研究,36(2),87-112。