Title

Reforming a Valid Classroom Test: Application of Item Analysis with Expert Opinions and Examinee Feedbacks in Medical Education

DOI

10.6145/jme201302

Authors

Shih-Chieh Liao; Pei-Ying Pai; Walter Chen

Keywords

classical test theory; classroom test; item analysis; item response theory; validity

Journal

Journal of Medical Education

Volume/Issue (Publication Date)

Vol. 17, No. 1 (2013/03/01)

Pages

12-20

Language

English

English Abstract

Background: Item analysis is used to help ensure the validity of a test. Classical Test Theory (CTT) and Item Response Theory (IRT) are the two main theoretical frameworks for item analysis.

Objective: This study compared the advantages and disadvantages of CTT and IRT in screening out potentially problematic test items. Expert opinion and student feedback were also considered before truly problematic items were removed. The study aimed to develop an item analysis procedure that ensures the validity of classroom tests.

Method: Eighty-six sixth-year medical students took a newly developed authentic medical test composed of 48 multiple-choice questions. CTT and IRT methods were used for the quantitative item analysis, and expert opinion and student feedback were used for the qualitative analysis. Cronbach's alpha served as the coefficient of internal consistency for the whole test.

Results: Cronbach's alpha for the responses to all 48 items was 0.55. Deleting 4 items on the basis of IRT raised alpha to 0.57. Deleting 24 items on the basis of CTT raised alpha to 0.70. Deleting 21 items on the basis of IRT, CTT, and expert opinion raised alpha to 0.71.

Conclusions: Both CTT and IRT help increase test reliability, and CTT is more effective than IRT at doing so. Moreover, expert opinion and student feedback offer valuable suggestions for item selection. An item selection procedure based on CTT, expert opinion, and student feedback is worth considering.
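The abstract reports Cronbach's alpha before and after item deletion but does not show the computation or the CTT screening criteria. As a rough illustration only, the sketch below computes alpha as k/(k-1) * (1 - sum of item variances / total-score variance) for a 0/1 scored response matrix, together with three common CTT item statistics: difficulty (proportion correct), discrimination (here assumed to be the corrected item-total correlation), and alpha-if-item-deleted. The function names, the flagging thresholds (difficulty outside 0.2 to 0.9, discrimination below 0.2) and the toy data are hypothetical assumptions, not the authors' criteria. The IRT analysis and the expert and student review, which in the study preceded any actual removal, are not modeled.

```python
from math import sqrt
from statistics import mean, pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of examinee rows, one score per item."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Population variances throughout; an n/(n-1) factor would cancel in the ratio.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def _alpha_or_nan(responses):
    """Alpha, or NaN when it is undefined (fewer than 2 items, constant totals)."""
    try:
        return cronbach_alpha(responses)
    except ValueError:
        return float("nan")


def pearson(x, y):
    """Pearson correlation; NaN when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def ctt_item_stats(responses):
    """Per-item CTT statistics for 0/1 scored responses."""
    k = len(responses[0])
    stats = []
    for j in range(k):
        item = [row[j] for row in responses]
        # Rest score = total minus this item (corrected item-total correlation).
        rest = [sum(row) - row[j] for row in responses]
        reduced = [row[:j] + row[j + 1:] for row in responses]
        stats.append({
            "item": j,
            "difficulty": mean(item),  # proportion of examinees answering correctly
            "discrimination": pearson(item, rest),
            "alpha_if_deleted": _alpha_or_nan(reduced),
        })
    return stats


def flag_items(stats, alpha, p_range=(0.2, 0.9), min_disc=0.2):
    """Return indices of items to send on for expert review.

    The default thresholds are hypothetical, not the study's criteria.
    """
    flagged = []
    for s in stats:
        off_range = not (p_range[0] <= s["difficulty"] <= p_range[1])
        weak = not (s["discrimination"] >= min_disc)  # also catches NaN
        deletion_raises_alpha = s["alpha_if_deleted"] > alpha
        if off_range or weak or deletion_raises_alpha:
            flagged.append(s["item"])
    return flagged


if __name__ == "__main__":
    # Hypothetical toy data: 4 examinees x 3 items, 1 = correct.
    responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    alpha = cronbach_alpha(responses)
    print(round(alpha, 3))                               # 0.75
    print(flag_items(ctt_item_stats(responses), alpha))  # []
```

On this toy matrix alpha is 0.75 and no item is flagged. Adding a fourth item that every examinee answers correctly would flag that item, since its difficulty is 1.0, its discrimination is undefined, and deleting it raises alpha. Flagging is deliberately kept separate from deletion, mirroring the abstract's point that statistical screening identifies only potentially problematic items.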

Subject Classification

Medicine and Health > General Medicine
Social Sciences > Education

Cited By
  1. Pai, Yi-Fong; Hsieh, Ming-Chen (2013). Introspection: the Challenge of the Method on Course Research in Medical Education. 醫學教育 (Journal of Medical Education), 17(4), 133-136.