Title

科學探究能力評量之標準設定與其效度檢核

English Title

Validating the Standard Setting on Multimedia-based Assessment of Scientific Inquiry Abilities

DOI

10.6251/BEP.201903_50(3).0005

Authors

林小慧(Hsiao-Hui Lin);吳心楷(Hsin-Kai Wu)

Keywords

Scientific inquiry abilities ; Validation ; Standard setting ; Bookmark standard-setting method ; Bookmark

Journal

教育心理學報

Volume/Issue (Publication Date)

Vol. 50, No. 3 (2019/03/01)

Pages

473 - 502

Language

Traditional Chinese

Chinese Abstract

This study drew on empirical data collected from 605 Grade 11 students in the Greater Taipei area of Taiwan who took an assessment of scientific inquiry abilities, with two research purposes. The first was to set standards for the assessment according to standard performance descriptions for three levels: below basic, basic, and proficient; the second was to examine the appropriateness and validity of using the Bookmark method to set these standards, drawing on multiple sources of validity evidence, namely internal, procedural, and external evidence. The results showed that the standard setting of scientific inquiry abilities was supported by procedural validity evidence. Regarding internal evidence, the standard errors of each performance level across the 14 panelists' judgments from round one to round two were within the acceptable range (SE < 0.12), indicating that the within-panelist results were reliable. Consistency within the standard-setting method was also evaluated with the standard error of the sample mean of the round-two cut-score medians; the standard errors of all performance levels were within the acceptable range (SE < 0.12), indicating highly consistent results within the method. Furthermore, between-panelist consistency was tested with independent-samples t tests, and the cut scores set by different groups of panelists did not differ significantly. In addition, monitoring of extreme values during the standard setting found only a few outliers, which had little influence on the overall cut scores. The standard setting of scientific inquiry abilities was therefore supported by internal validity evidence. Finally, a cluster-analysis-based standard setting was used to examine the convergent validity of the cut scores obtained with the Bookmark method; the correlation between the two methods' assignments of students to the three performance levels was statistically significant, indicating a considerable degree of agreement in judging performance levels. Discriminant analysis was also used to check the consistency of the standard setting: the overall classification consistency of the Bookmark method for "observing and questioning," "planning and experimenting," "analyzing and concluding," and "reasoning and arguing" was 79.50%, 86.00%, 100.00%, and 89.90%, respectively, showing that the cut scores obtained with the Bookmark method discriminated well among the performance-level categories and were supported by external validity evidence. Taken together, the evidence indicates that the scientific inquiry ability standards set with the Bookmark method are appropriate and valid.
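The internal-evidence checks described above, the standard-error criterion (SE < 0.12) and the independent-samples t test between panelist groups, can be illustrated with a minimal sketch. The code below is not the study's analysis code: the round-two cut scores for 14 panelists and their split into two table groups are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of panelists.

```python
# A minimal sketch of two internal-evidence checks reported in the abstract.
# All numbers are hypothetical placeholders, not the study's data.
import numpy as np
from scipy import stats

# Hypothetical round-two cut scores (theta scale) for one performance-level
# boundary, set by 14 panelists split into two table groups of 7.
group_a = np.array([-0.52, -0.48, -0.55, -0.50, -0.47, -0.53, -0.49])
group_b = np.array([-0.51, -0.46, -0.54, -0.50, -0.48, -0.52, -0.47])
all_panelists = np.concatenate([group_a, group_b])

# Consistency within the method: standard error of the mean cut score
# (sample SD / sqrt(n)); the study treats SE < 0.12 as acceptable.
se = all_panelists.std(ddof=1) / np.sqrt(len(all_panelists))
print(f"SE of the cut score = {se:.3f} (acceptable if < 0.12)")

# Consistency between panelist groups: independent-samples t test on the
# cut scores set by the two groups; a non-significant difference is desired.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```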

English Abstract

This study developed a standard setting for the Grade 11 Multimedia-based Assessment of Scientific Inquiry Abilities (MASIA) based on standard performance descriptions for three levels: below basic, basic, and proficient. The Bookmark method was used to identify the cut-off scores, and the appropriateness and validity of the resulting standards were examined against multiple sources of validity evidence, namely procedural, internal, and external evidence. First, the procedural evaluation showed that the standard setting of scientific inquiry abilities adopted in this study is supported by procedural evidence for validity. Second, the internal evaluation showed that the standard errors of each performance level across the 14 participants' judgments in rounds one and two were within an acceptable range (standard error [SE] < 0.12), indicating good intra-rater consistency. Consistency within the standard-setting method was evaluated using the standard error of the sample mean of the round-two cut-off-score medians; the standard error of every performance level was within an acceptable range (SE < 0.12), denoting highly consistent results within the method. Third, the inter-rater consistency of the standard setting was examined using independent-samples t tests, and none of the cut-off scores set by different groups of participants differed significantly. The standard setting of scientific inquiry abilities is therefore supported by internal evidence for validity. Finally, this study treated the performance levels derived from a cluster analysis as convergent evidence for external validity. The correlation between the three performance levels assigned to students by the two standard-setting methods reached statistical significance, indicating a considerable degree of consistency in judging students' performance levels. Moreover, a discriminant analysis was conducted to examine the classification consistency of the standard setting; the overall classification consistency for "observing and questioning," "planning and experimenting," "analyzing and concluding," and "reasoning and arguing" was 79.50%, 86.00%, 100.00%, and 89.90%, respectively. The cut-off scores obtained with the Bookmark method thus showed high discrimination among the performance-level categories and are supported by external evidence for validity. Together, these results suggest that the standards of scientific inquiry abilities set with the Bookmark method are appropriate and valid.
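The external-evidence checks named in the abstract, agreement between Bookmark-based and cluster-based performance levels and the classification consistency from a discriminant analysis, can be sketched as follows. The ability estimates, cut scores, three-cluster solution, Spearman correlation, and linear discriminant analysis below are illustrative assumptions rather than the study's actual data or procedures.

```python
# A minimal sketch of the external-evidence checks, using simulated data.
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
theta = rng.normal(size=605)               # hypothetical ability estimates
cuts = [-0.5, 0.6]                         # hypothetical Bookmark cut scores
bookmark_level = np.digitize(theta, cuts)  # 0 = below basic, 1 = basic, 2 = proficient

# Convergent evidence: a three-cluster solution on the same scores, with
# clusters relabeled by their mean ability so they align with the levels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(theta.reshape(-1, 1))
order = np.argsort([theta[clusters == k].mean() for k in range(3)])
cluster_level = np.argsort(order)[clusters]
rho, p = stats.spearmanr(bookmark_level, cluster_level)
print(f"Spearman correlation between level assignments: rho = {rho:.2f}, p = {p:.3g}")

# Classification consistency: proportion of students that a discriminant
# analysis, trained on the scores, assigns to the same level as the cut scores.
lda = LinearDiscriminantAnalysis().fit(theta.reshape(-1, 1), bookmark_level)
consistency = (lda.predict(theta.reshape(-1, 1)) == bookmark_level).mean()
print(f"Overall classification consistency: {consistency:.1%}")
```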

Subject Classification: Social Sciences > Psychology
Social Sciences > Education
References
  1. 吳宜芳, Y. F.,鄒慧英, H.,林娟如, J. R.(2010)。標準設定效度驗證之探究:以大型數學學習成就評量為例。測驗學刊,57(1),1-27。
  2. 林小慧, H. H.,林世華, S. H.,吳心楷, H. K.(2018)。科學能力的建構反應評量之發展與信效度分析:以自然科光學為例。教育科學研究期刊,63(1),173-205。
  3. 陳慧娟, H. J.(2015)。「師生共同增能」與「學生增能」教學實驗方案促進偏遠地區國中學生知識信念,自我調整策略與科學學習成就之比較研究。教育科學研究期刊,60(4),21-53。
  4. 謝名娟, M. C.,謝進昌, J. C.,林世華, S. H(2013)。不同方法設定英文科決斷分數之實務性研究。測驗學刊,60(3),519-544。
  5. Angoff, W. H.(1984).Scales, norms, and equivalent scores.Princeton, NJ:Educational Testing Service.
  6. Berk, R. A.(1986).A consumer's guide to setting performance standards on criterion-referenced tests.Review of Educational Research,56(1),137-172.
  7. Cizek, G. J.(Ed.)(2001).Standard setting: Concepts, methods, and perspectives.Mahwah, NJ:Erlbaum.
  8. Cizek, G. J.,Bunch, M. B.(2007).Standard setting: A guide to establishing and evaluating performance standards on tests.Thousand Oaks, CA:Sage.
  9. Downing, S. M.(Ed.),Haladyna, T. M.(Ed.)(2006).Handbook of test development.Mahwah, NJ:Lawrence Erlbaum Associates.
  10. Ebel, R. L.,Frisbie, D. A.(1986).Essentials of educational measurement.Englewood Cliffs, NJ:Prentice-Hall.
  11. Giraud, G.,Impara, J. C.,Plake, B. S.(2005).Teachers' conceptions of the target examinee in Angoff standard setting.Applied Measurement in Education,18(3),223-232.
  12. Green, D. R.,Trimble, C. S.,Lewis, D. M.(2003).Interpreting the results of three different standard-setting procedures.Educational Measurement: Issues and Practice,22(1),22-32.
  13. Hambleton, R. K.(2001).Setting performance standards on educational assessments and criteria for evaluating the process.Setting performance standards: Concepts, methods, and perspectives,Mahwah, NJ:
  14. Hambleton, R. K.,Jaeger, R. M.,Plake, B. S.,Mills, C.(2000).Setting performance standards on complex educational assessments.Applied Psychological Measurement,24(4),355-366.
  15. Hsu, Y. S.,Chang, H. Y.,Fang, S. C.,Wu, H. K.(2015).Developing technology-infused inquiry learning modules to promote science learning in Taiwan.Science education in East Asia: Pedagogical innovations and research-informed practices,Dordrecht:
  16. Hsu, Y. S.,Wu, H. K.,Hwang, F. K.(2008).Fostering high school students’ conceptual understandings about seasons: The design of a technology-enhanced learning environment.Research in Science Education,38(2),127-147.
  17. Huynh, H.(2006).A clarification on the response probability criterion RP67 for standard settings based on bookmark and item mapping.Educational Measurement: Issues and Practice,25(2),19-20.
  18. Impara, J. C.,Plake, B. S.(1997).Standard setting: An alternative approach.Journal of Educational Measurement,34(4),353-366.
  19. Jaeger, R. M.(1982).An iterative structured judgment process for establishing standards on competency tests: Theory and application.Educational Evaluation and Policy Analysis,4,461-475.
  20. Kane, M.(1994).Validating the performance standards associated with passing scores.Review of Educational Research,64(3),425-461.
  21. Karantonis, A.,Sireci, S. G.(2006).The bookmark standard‐setting method: A literature review.Educational Measurement: Issues and Practice,25(1),4-12.
  22. Kline, R. B.(2015).Principles and practice of structural equation modeling (4th ed.).New York, NY:Guilford Press.
  23. Lewis, D. M.,Mitzel, H. C.,Green, D. R.(1996).Standard setting: A bookmark approach.Council of Chief State School Officers National Conference on Large Scale Assessment,Boulder, CO.
  24. Linn, R. L.,Herman, J. L.(1997).A policymaker's guide to standards-led assessment.Denver, CO:Education Commission of the States.
  25. Livingston, S. A.,Zieky, M. J.(1989).A comparative study of standard-setting methods.Applied Measurement in Education,2(2),121-141.
  26. Loomis, S. C.(2000).Feedback in the NAEP achievement levels setting process.Meeting of the National Council on Measurement in Education,New Orleans.
  27. Mitzel, H. C.,Lewis, D. M.,Patz, R. J.,Green, D. R.(2001).The bookmark procedure: Psychological perspectives.Setting performance standards: Concepts, methods, and perspectives,Mahwah, NJ:
  28. Nedelsky, L.(1954).Absolute grading standards for objective tests.Educational and Psychological Measurement,14,3-19.
  29. Peterson, C. H.,Schulz, E. M.,Engelhard, G.(2011).Reliability and validity of bookmark-based methods for standard setting: Comparisons to Angoff-based methods in the national assessment of educational progress.Educational Measurement: Issues and Practice,30(2),3-14.
  30. Reckase, M. D.(2000).The evolution of the NAEP achievement levels setting process: A summary of the research and development efforts conducted by ACT.Iowa City, IA:American College Testing.
  31. Reckase, M. D.(2001).Innovative methods for helping standard-setting participants to perform their task: The role of feedback regarding consistency, accuracy, and impact.Setting performance standards: Concepts, methods, and perspectives,Mahwah, NJ:
  32. Reckase, M. D.(2006).A conceptual framework for a psychometric theory for standard setting with examples of its use for evaluating the functioning of two standard setting methods.Educational Measurement: Issues and Practice,25(2),4-18.
  33. Sireci, S. G.,Hauger, J. B.,Wells, C. S.,Shea, C.,Zenisky, A. L.(2009).Evaluation of the standard setting on the 2005 Grade 12 National Assessment of Educational Progress mathematics test.Applied Measurement in Education,22(4),339-358.
  34. Sturmberg, J. P.,Hinchy, J.(2010).Borderline competence-from a complexity perspective: Conceptualization and implementation for certifying examinations.Journal of Evaluation in Clinical Practice,16(4),867-872.
  35. Thorndike, R. L.(Ed.)(1971).Educational measurement.Washington, DC:American Council on Education.
  36. Timm, N. H.(2002).Applied multivariate analysis.New York, NY:Springer-Verlag.
  37. Violato, C.,Marini, A.,Lee, C.(2003).A validity study of expert judgment procedures for setting cutoff scores on high-stakes credentialing examinations using cluster analysis.Evaluation & the Health Professions,26(1),59-72.
  38. Wu, H. K.(2010).Modelling a complex system: Using novice-expert analysis for developing an effective technology-enhanced learning environment.International Journal of Science Education,32(2),195-219.
  39. Wu, H. K.,Hsieh, C. E.(2006).Developing sixth graders’ inquiry skills to construct scientific explanations in inquiry-based learning environments.International Journal of Science Education,28(11),1289-1313.
  40. Wu, H. K.,Kuo, C. Y.,Jen, T. H.,Hsu, Y. S.(2015).What makes an item more difficult? Effects of modality and type of visual information in a computer-based assessment of scientific inquiry abilities.Computers & Education,85,35-48.
  41. Yin, P.,Schulz, E. M.(2005).A comparison of cut scores and cut score variability from Angoff-based and Bookmark-based procedures in standard setting.Annual meeting of the National Council on Measurement in Education,Montreal, Canada.
  42. 吳清山, C. S.(2014)。標準參照測驗。教育資料與研究,113,205-206。
  43. 曾建銘, C. M.,王暄博, H. P.(2012)。臺灣學生學習成就評量資料庫標準設定探究:以 2009 年國小六年級社會科為例。教育與心理研究,35(3),115-149。
  44. 曾建銘, C. M.,王暄博, H. P.(2012)。標準設定之效度評估:以 TASA 國語科為例。教育學刊,39,77-118。
Cited by
  1. 林小慧(Hsiao-Hui Lin);郭哲宇(Che-Yu Kuo);吳心楷(Hsin-Kai Wu)(2021)。學生學習投入、好奇心、教師集體層級變項與科學探究能力的關係:跨層級調節式中介效果之探討。教育科學研究期刊。66(2)。75-110。