


A Peer Evaluation System for Open-Ended Questions Based on Pairwise Comparisons






Education Technology; Online Learning Platform; E-Learning; Peer Assessment; Auto-grading; Open-ended Question; Pairwise Comparison












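In recent years, the rise of MOOCs has set off a new revolution in learning. All kinds of online learning platforms keep emerging and developing rapidly: platforms that let students preview material before class, platforms that increase student participation during class, and platforms that boost motivation and help students review after class. We found that these platforms share a common problem: how should open-ended questions be graded?

Some systems use complex mechanisms such as peer assessment, review, self-assessment, and arbitration; some rely on large numbers of students grading one another anonymously; and some simply offer only questions with standard answers. However, we could not find a system that is both supported by theory and fast and convenient, one that lets teachers in traditional classrooms get started quickly and join the e-learning era.

With speed, convenience, and effectiveness as its core principles, this study builds an online education platform, 師暢 (PK-Grader), which aims to bring a different kind of change to traditional education. Unlike most education platforms, which support only questions with a "standard answer", we propose an algorithm that automatically grades open-ended questions through peer assessment and pairwise comparison. Through peer assessment, we hope to raise the cognitive level at which students work and give them more opportunities to practice critical thinking. The auto-grading function also reduces teachers' workload, letting them focus on designing lesson plans and questions instead of spending effort on grading. While grading, the system also computes each student's "evaluation ability" and reports it to the teacher. This helps the teacher understand students' learning status: whether they have integrated the knowledge and mastered evaluation, the highest-level learning objective in the cognitive domain.

Because other platforms lack strong theoretical support, we draw on theoretical foundations from different fields to implement the PK-Grader algorithm. We combine pairwise comparison algorithms commonly used in recommender systems and data mining with active learning methods from machine learning to improve the algorithm's accuracy. Through collaboration with school teachers, we show that the system's automatic scores are highly correlated with the teacher's scores, with a correlation coefficient of about 0.9. In this experiment, PK-Grader also helped the teacher catch some grading errors and discover inconsistencies in the grading criteria, which led to corrections of previously assigned scores.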
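The abstract combines pairwise comparison with active learning but does not spell out the model in this section. The following is a minimal sketch, assuming the Bradley-Terry model [48] fitted with Hunter's MM algorithm [51]: peer "answer A beats answer B" judgments are turned into answer strengths, and an uncertainty-sampling rule picks which pair to show a grader next. The function names (`fit_bradley_terry`, `win_probability`, `next_pair`) and the choice of uncertainty sampling as the active-learning rule are hypothetical illustrations, not PK-Grader's actual implementation.

```python
from itertools import combinations


def fit_bradley_terry(n_items, comparisons, iters=200, tol=1e-9):
    """Estimate Bradley-Terry strengths with Hunter's MM (minorize-maximize) update.

    comparisons: list of (winner, loser) index pairs collected from peer judgments.
    Assumes the directed "winner beats loser" graph is strongly connected, so a
    finite maximum-likelihood estimate exists (every item wins and loses at least once).
    """
    wins = [0] * n_items
    # n[i][j] counts how many times items i and j were compared (symmetric)
    n = [[0] * n_items for _ in range(n_items)]
    for w, l in comparisons:
        wins[w] += 1
        n[w][l] += 1
        n[l][w] += 1

    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(n[i][j] / (p[i] + p[j]) for j in range(n_items) if n[i][j])
            new_p.append(wins[i] / denom if denom else p[i])
        # Strengths are identifiable only up to scale, so renormalize to mean 1
        total = sum(new_p)
        new_p = [x * n_items / total for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


def win_probability(p, i, j):
    """Bradley-Terry probability that item i is preferred over item j."""
    return p[i] / (p[i] + p[j])


def next_pair(p, asked):
    """Uncertainty sampling: return the not-yet-asked pair (i, j) with i < j whose
    predicted outcome is closest to a coin flip, or None if every pair was asked."""
    candidates = [pair for pair in combinations(range(len(p)), 2) if pair not in asked]
    if not candidates:
        return None
    return min(candidates, key=lambda ij: abs(win_probability(p, *ij) - 0.5))


# Hypothetical peer judgments on four answers (indices 0-3), as (winner, loser)
judgments = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3), (3, 0), (0, 3), (2, 1)]
strengths = fit_bradley_terry(4, judgments)
print(sorted(range(4), key=lambda i: -strengths[i]))  # → [0, 1, 2, 3]
print(next_pair(strengths, asked=set()))               # → (1, 2)
```

In this toy data, answers 1 and 2 split their two head-to-head judgments and otherwise have identical records, so their fitted strengths tie and the uncertainty rule asks about that pair next. A production system might also weight each judgment by the grader's reliability; that is outside this sketch.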


In recent years, MOOCs and online education have changed education considerably. Many people regard MOOCs as a revolution because anyone with an Internet connection can learn. However, online courses face a new problem: how can open-ended questions be graded automatically and accurately? Many online education platforms try to address this problem. Some use peer assessment, which often leads to inaccurate grades and low-quality feedback [1]. Some build complex systems that combine peer review, self-evaluation, multiple rounds of evaluation, anonymous feedback, and so on. Others support only objective questions. There currently seems to be no education platform that handles the problem in a simple yet effective way. In this thesis, several theories from different fields are introduced, and an online education platform named PK-Grader is developed. The name reflects that answers are graded by the outcomes of head-to-head ("PK") comparisons. PK-Grader grades open-ended questions from ordinal peer evaluations and produces not only a score for each answer but also the evaluation ability of each student. This helps teachers better understand their students and judge whether they have truly grasped the concepts and reached a higher category of Bloom's cognitive domain: evaluation. Our algorithm combines pairwise comparison algorithms, active learning methods, and probability models. We validated the auto-grading algorithm in an experiment with high school students and found a high correlation between the scores from our system and the scores from the teacher (correlation coefficient of about 0.9). PK-Grader also enabled the teacher to discover that some original scores were wrong, helping the teacher grade the assignment more accurately.
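The abstract says PK-Grader reports each student's evaluation ability and that its scores correlate with the teacher's at about 0.9, but this section gives no formulas for either. A minimal sketch, assuming (1) a student's evaluation ability is the fraction of their pairwise judgments that agree with the consensus answer strengths, and (2) the reported correlation is Pearson's r [56]. Both definitions, the function names `evaluation_ability` and `pearson`, and all sample numbers are hypothetical illustrations, not the thesis's published method or data.

```python
from math import sqrt


def evaluation_ability(judgments, strengths):
    """Hypothetical metric: fraction of a student's (winner, loser) judgments
    that agree with the consensus strengths. Ties count as disagreement.
    Returns None for a student who made no judgments."""
    if not judgments:
        return None
    agree = sum(1 for w, l in judgments if strengths[w] > strengths[l])
    return agree / len(judgments)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Hypothetical consensus strengths for answers 0-3 and one student's judgments
strengths = [1.9, 0.9, 0.8, 0.4]
student = [(0, 1), (1, 3), (2, 0)]
print(evaluation_ability(student, strengths))  # → 0.6666666666666666

# Hypothetical system scores vs. teacher scores for the same four answers
system = [90, 72, 68, 55]
teacher = [88, 75, 70, 50]
print(round(pearson(system, teacher), 3))
```

Pearson's r measures linear agreement between the two sets of scores; a rank correlation such as Kendall's tau [53] would instead compare only the orderings, which may suit ordinal peer grading better.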

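Subject classification: College of Electrical Engineering and Computer Science > Department of Electrical Engineering
Engineering > Electrical Engineering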
[1] D. Gamage, M. E. Whiting, T. Rajapakshe, H. Thilakarathne, I. Perera, and S. Fernando, "Improving Assessment on MOOCs Through Peer Identification and Aligned Incentives," in Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, ACM, 2017, pp. 315-318.
[2] L. Pappano, "The Year of the MOOC," The New York Times, 2012.
[3] "Professionals against machine scoring of student essays in high-stakes assessment," http://humanreaders.org/petition/.
[4] R. Mcdaniel, "Getting to Know Coursera: Assessments," Commentary, https://cft.vanderbilt.edu/2012/11/getting-to-know-coursersa-assessments/, November 26, 2012.
[5] 沈慶珩 and 黃信義, "網路同儕互評在 Moodle 系統上的應用" [Applications of online peer assessment on the Moodle system], 教育資料與圖書館學 (Journal of Educational Media & Library Sciences), vol. 43, no. 3, pp. 267-284, 2006.
[6] "中華教育開放平台" [Chinese open education platform], https://courses.openedu.tw/.
[7] 陳豐祥, "新修定布魯姆認知領域目標的理論內涵及其在歷史教學上的應用" [The theoretical basis of the revised Bloom's cognitive domain objectives and their application in history teaching], 歷史教育 (History Education), 2009.
[8] B. S. Bloom and Committee of College and University Examiners, Taxonomy of Educational Objectives. New York: Longmans, Green, 1964.
[9] N. B. Shah, J. K. Bradley, A. Parekh, M. Wainwright, and K. Ramchandran, "A case for ordinal peer-evaluation in MOOCs," in NIPS Workshop on Data Driven Education, 2013.
[10] W. Barnett, "The modern theory of consumer behavior: Ordinal or cardinal?," Quarterly Journal of Austrian Economics, vol. 6, no. 1, pp. 41-65, 2003.
[11] 教育部 (Ministry of Education), "十二年國民基本教育課程綱要總綱" [General guidelines of the curriculum for 12-year basic education], https://www.naer.edu.tw/files/15-1000-7944,c639-1.php, November 28, 2014.
[12] K. Topping, "Peer assessment between students in colleges and universities," Review of Educational Research, vol. 68, no. 3, pp. 249-276, Fall 1998.
[13] K. J. Topping, "Peer assessment," Theory Into Practice, vol. 48, no. 1, pp. 20-27, 2009.
[14] M. Freeman and J. McKenzie, "SPARK, a confidential web-based template for self and peer assessment of student teamwork: benefits of evaluating across different subjects," British Journal of Educational Technology, vol. 33, no. 5, pp. 551-569, 2002.
[15] S. Fallows and B. Chandramohan, "Multiple approaches to assessment: Reflections on use of tutor, peer and self-assessment," Teaching in Higher Education, vol. 6, no. 2, pp. 229-246, 2001.
[16] J. J. Ammer, "Peer evaluation model for enhancing writing performance of students with learning disabilities," Reading & Writing Quarterly, vol. 14, no. 3, pp. 263-282, 1998.
[17] D. J. Nicol and D. Macfarlane-Dick, "Formative assessment and self-regulated learning: A model and seven principles of good feedback practice," Studies in Higher Education, vol. 31, no. 2, pp. 199-218, 2006.
[18] C. Kulkarni et al., "Peer and self assessment in massive online classes," ACM Transactions on Computer-Human Interaction (TOCHI), vol. 20, no. 6, p. 33, 2013.
[19] 于富雲, 鄭守杰, 杜明璋, and 陳德懷, "網路同儕互評與評量標準來源對批判思考能力之影響" [Effects of online peer assessment and the source of assessment criteria on critical thinking ability], 南師學報: 教育類 (Journal of National Tainan Teachers College: Education), vol. 37, no. 2, pp. 1-21, 2003.
[20] C. J. Lee, C. R. Sugimoto, G. Zhang, and B. Cronin, "Bias in peer review," Journal of the Association for Information Science and Technology, vol. 64, no. 1, pp. 2-17, 2013.
[21] "MOOC Completion Rates," http://www.katyjordan.com/MOOCproject.html, June 12, 2015.
[22] K. Cho, C. D. Schunn, and R. W. Wilson, "Validity and reliability of scaffolded peer assessment of writing from instructor and student perspectives," Journal of Educational Psychology, vol. 98, no. 4, p. 891, 2006.
[23] R. Mcdaniel, "Getting to Know Coursera: Peer Assessments," Commentary, https://cft.vanderbilt.edu/2013/01/getting-to-know-coursera-peer-assessments/, January 7, 2013.
[24] A. Watters, "The Problems with Peer Grading in Coursera," https://www.insidehighered.com/blogs/hack-higher-education/problems-peer-grading-coursera, August 27, 2012.
[25] J. Rees, "Peer Grading Can't Work," https://www.insidehighered.com/views/2013/03/05/essays-flaws-peer-grading-moocs, March 5, 2013.
[26] K. Raman and T. Joachims, "Methods for ordinal peer grading," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2014, pp. 1037-1046.
[27] F. Wauthier, M. Jordan, and N. Jojic, "Efficient ranking from pairwise comparisons," in International Conference on Machine Learning, 2013, pp. 109-117.
[28] D. Park, J. Neeman, J. Zhang, S. Sanghavi, and I. Dhillon, "Preference completion: Large-scale collaborative ranking from pairwise comparisons," in International Conference on Machine Learning, 2015, pp. 1907-1916.
[29] M. Dougiamas and P. Taylor, "Moodle: Using learning communities to create an open source course management system," 2003.
[30] ADI, "Social education game PaGamO wins education Oscar," http://adigaskell.org/2015/01/28/social-education-game-pagamo-wins-education-oscar, January 28, 2015.
[31] 洪杰志, "結合 [翻轉教室] 與 [Q & A 教學] 策略對國中學生數學科學習成就與學習動機之影響-以七年級 [二元一次方程式] 課程為例" [Effects of combining flipped-classroom and Q&A teaching strategies on junior high school students' mathematics achievement and learning motivation: the case of the seventh-grade unit on linear equations in two variables], thesis, Degree Program of Technology and E-Learning, College of Science, National Chiao Tung University, pp. 1-79, 2016.
[32] C.-H. Wang, Y.-C. Hsu, P.-C. Yeh, C.-Y. Lin, and I.-W. Lai, "Edventure: Gamification for collaborative problem design and solving," in 2016 15th International Conference on Information Technology Based Higher Education and Training (ITHET), IEEE, 2016, pp. 1-5.
[33] N. Law et al., "Using Web 2.0 technology to support learning, teaching and assessment in the NSS Liberal Studies subject," Hong Kong Teachers' Centre Journal, 2009.
[34] S.-C. Tseng and C.-C. Tsai, "On-line peer assessment and the role of the peer feedback: A study of high school computer course," Computers & Education, vol. 49, no. 4, pp. 1161-1174, 2007.
[35] S. S. J. Lin, E. Z. F. Liu, and S. M. Yuan, "Web-based peer assessment: feedback for students with various thinking-styles," Journal of Computer Assisted Learning, vol. 17, no. 4, pp. 420-432, Dec 2001.
[36] P. G. Zimbardo, "The human choice: Individuation, reason, and order versus deindividuation, impulse, and chaos," in Nebraska Symposium on Motivation. University of Nebraska Press, 1969.
[37] L. M. Jessup, T. Connolly, and D. A. Tansik, "Toward a Theory of Automated Group Work: The Deindividuating Effects of Anonymity," Small Group Research, vol. 21, no. 3, pp. 333-348, 1990.
[38] R. Lu and L. Bol, "A comparison of anonymous versus identifiable e-peer review on college student writing performance and the extent of critical feedback," Journal of Interactive Online Learning, vol. 6, no. 2, 2007.
[39] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
[40] Summernote Team, "Summernote - Super Simple WYSIWYG Editor on Bootstrap," http://summernote.org/.
[41] 陳薇婷 et al. (eds.), 健康與護理 [Health and Nursing], 4th ed. New Taipei City: 泰宇出版股份有限公司, 2016.
[42] B. Settles, "Active learning literature survey," Computer Sciences Technical Report, University of Wisconsin-Madison, 2010.
[43] I. Fette, "The WebSocket Protocol," IETF RFC 6455, 2011.
[44] J. Salvia, J. Ysseldyke, and S. Witmer, Assessment: In Special and Inclusive Education. Cengage Learning, 2012.
[45] 林雨蒼, "如何在Ubuntu 13.04 Server上部署Ruby on Rails app" [How to deploy a Ruby on Rails app on Ubuntu 13.04 Server], http://billy3321.blogspot.tw/2013/09/ubuntu-1304-serverruby-on-rails-app.html, September 19, 2013.
[46] 麥克阿忠, "Ruby on Rails 實務─熟悉 MVC" [Ruby on Rails in practice: getting familiar with MVC], November 29, 2011.
[47] M. Hartl, "Ruby on Rails Tutorial (Rails 5)," https://www.railstutorial.org/book/toy_app.
[48] R. A. Bradley and M. E. Terry, "Rank analysis of incomplete block designs: I. The method of paired comparisons," Biometrika, vol. 39, no. 3/4, pp. 324-345, 1952.
[49] R. D. Luce, Individual Choice Behavior: A Theoretical Analysis. Courier Corporation, 2005.
[50] J. Aldrich, "R. A. Fisher and the making of maximum likelihood 1912-1922," Statistical Science, vol. 12, no. 3, pp. 162-176, 1997.
[51] D. R. Hunter, "MM algorithms for generalized Bradley-Terry models," Annals of Statistics, pp. 384-406, 2004.
[52] O. Dykstra, "A Note on the Rank Analysis of Incomplete Block Designs: Applications beyond the Scope of Existing Tables," Biometrics, vol. 12, no. 3, pp. 301-306, 1956.
[53] M. G. Kendall, "A new measure of rank correlation," Biometrika, vol. 30, no. 1/2, pp. 81-93, 1938.
[54] F. M. Harper and J. A. Konstan, "The MovieLens datasets: History and context," ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, p. 19, 2016.
[55] "MySQL String Functions," https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_field.
[56] J. Benesty, J. Chen, Y. Huang, and I. Cohen, "Pearson correlation coefficient," in Noise Reduction in Speech Processing. Springer, 2009, pp. 1-4.
[57] D. J. Leinweber, "Stupid data miner tricks: overfitting the S&P 500," The Journal of Investing, vol. 16, no. 1, pp. 15-22, 2007.
[58] "ajpeace," http://www.ajpeace.com.tw/Shop/.