Title

抄襲偵測之原始碼分析

Parallel title

Source Code Analysis for Plagiarism Detection

Authors

Jia-He Lin (林家禾); Yi-Hung Wu (吳宜鴻)

Keywords

plagiarism detection (抄襲偵測) ; program similarity (程式碼相似度) ; hybrid clone detection (混合式抄襲偵測)

Journal

先進工程學刊

Volume and issue / publication date

Vol. 13, No. 2 (2018/07/01)

Pages

83 - 91

Language of content

Traditional Chinese

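Chinese abstract (translated)

Methods for detecting program plagiarism fall broadly into two types: textual analysis and structural analysis. Most textual analysis methods use a single algorithm to extract partial strings, compute the similarity between each pair of programs, and judge from that similarity whether plagiarism occurred. Structural analysis methods mainly record the syntactic structure of the code as a tree and assess program similarity by mining the similar parts of two trees. Every algorithm has its strengths and weaknesses, and judging plagiarism with a single method is not comprehensive enough. This study therefore proposes combining the two types of analysis so that plagiarism can be detected from several perspectives at once. To verify feasibility, the experiments use source code from real student assignments and evaluate accuracy against a manually confirmed list of actual plagiarism cases. Compared with other methods, the proposed approach performs better on every metric.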

English abstract

Methods for detecting code plagiarism can be roughly divided into two categories: textual analysis and structural analysis. Most textual analysis methods use a single algorithm to extract substrings from source code, compute the similarity between every pair of programs, and assess the likelihood of plagiarism from that similarity. Structural analysis methods mainly record the syntactic structure of a program as a tree, find the similar parts of every pair of trees, and estimate program similarity accordingly. Every algorithm has its own pros and cons, so detecting code plagiarism with a single algorithm is not comprehensive. This study therefore proposes an approach that integrates methods from both categories in order to detect code plagiarism from different aspects. To verify its effectiveness, our experiments use source code from actual student assignments and evaluate the accuracy of the results against a manually confirmed plagiarism list. Compared with existing tools, our approach performs better on every accuracy measure.
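The idea of combining a textual score with a structural score can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the algorithms used, so the character k-gram Jaccard measure, the comparison of AST node-type counts, the equal weighting, and every function name below are assumptions chosen only for illustration. The sketch also assumes the submissions are Python programs so that the standard `ast` module can parse them.

```python
import ast
from collections import Counter


def kgram_set(source: str, k: int = 5) -> set:
    """Textual view: the set of character k-grams after removing whitespace."""
    text = "".join(source.split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def textual_similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two k-gram sets (an assumed textual measure)."""
    ga, gb = kgram_set(a, k), kgram_set(b, k)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def node_type_counts(source: str) -> Counter:
    """Structural view: multiset of AST node types; identifier names are ignored."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def structural_similarity(a: str, b: str) -> float:
    """Multiset overlap of AST node types (an assumed structural measure)."""
    ca, cb = node_type_counts(a), node_type_counts(b)
    overlap = sum((ca & cb).values())
    total = sum((ca | cb).values())
    return overlap / total if total else 1.0


def hybrid_similarity(a: str, b: str, weight: float = 0.5) -> float:
    """Weighted average of both views; the weight is a hypothetical parameter."""
    return weight * textual_similarity(a, b) + (1 - weight) * structural_similarity(a, b)


if __name__ == "__main__":
    original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
    # Same program with every identifier renamed, a common disguise.
    renamed = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
    print("textual:   ", textual_similarity(original, renamed))
    print("structural:", structural_similarity(original, renamed))
    print("hybrid:    ", hybrid_similarity(original, renamed))
```

Renaming identifiers lowers the textual score but leaves the node-type counts unchanged, which is one reason a combined score can be more robust than either view alone.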

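Subject classification

Engineering > Multidisciplinary Engineering
Engineering > General Engineering
Engineering > Civil and Architectural Engineering
Engineering > Mechanical Engineering
Engineering > Chemical Industry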
References
  1. Baxter, I. D., Yahin, A., Moura, L., Sant’Anna, M., Bier, L. (1998). Clone Detection Using Abstract Syntax Trees. Proceedings of the 14th International Conference on Software Maintenance, Bethesda, Maryland.
  2. Bellon, S., Koschke, R., Antoniol, G., Krinke, J., Merlo, E. (2007). Comparison and Evaluation of Clone Detection Tools. IEEE Transactions on Software Engineering, 33(9), 577-591.
  3. Canfora, G., Cimitile, A., De Carlini, U., De Lucia, A. (1998). An Extensible System for Source Code Analysis. IEEE Transactions on Software Engineering, 721-740.
  4. Higo, Y., Kusumoto, S. (2011). Code Clone Detection on Specialized PDGs with Heuristics. European Conference on Software Maintenance and Reengineering, Oldenburg.
  5. Horwitz, S. (1990). Identifying the Semantic and Textual Differences Between Two Versions of a Program. Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation.
  6. Jiang, L., Misherghi, G., Su, Z., Glondu, S. (2007). DECKARD: Scalable and Accurate Tree-based Detection of Code Clones. Proceedings of the 29th International Conference on Software Engineering, Minneapolis.
  7. Johnson, J. H. (1994). Substring Matching for Clone Detection and Change Tracking. Proceedings of the 10th International Conference on Software Maintenance, Victoria, British Columbia, Canada.
  8. Kamiya, T., Kusumoto, S., Inoue, K. (2002). CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code. IEEE Transactions on Software Engineering, 654-670.
  9. Karp, R. M., Rabin, M. O. (1987). Efficient Randomized Pattern-Matching Algorithms. IBM Journal of Research and Development, 249-260.
  10. Koschke, R. (2007). Survey of Research on Software Clones. Duplication, Redundancy, and Similarity in Software, Dagstuhl Seminar Proceedings.
  11. Kuo, J. Y., Chu, L. (2005). Intelligent Code Analyzer for Online Course Management System. Proceedings of the 3rd ACIS International Conference on Software Engineering Research, Management & Applications.
  12. Kuo, J. Y., Huang, F. C. (2010). Code Analyzer for an Online Course Management System. Journal of Systems and Software, 2478-2486.
  13. Levenshtein, V. (1966). Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10(8), 707.
  14. Liu, C., Chen, C., Han, J., Yu, P. (2006). GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  15. Mendes-Moreira, J. A., Soares, C., Jorge, A. M., Sousa, J. F. D. (2012). Ensemble Approaches for Regression: A Survey. ACM Computing Surveys, 45(1), 10.
  16. Prechelt, L., Malpohl, G., Philippsen, M. (2002). Finding Plagiarisms among a Set of Programs with JPlag. Journal of Universal Computer Science, 8(11), 1016-1038.
  17. Rattan, D., Bhatia, R., Singh, M. (2013). Software Clone Detection: A Systematic Review. Information and Software Technology, 1165-1199.
  18. Schleimer, S., Wilkerson, D. S., Aiken, A. (2003). Winnowing: Local Algorithms for Document Fingerprinting. Proceedings of ACM SIGMOD Conference.
  19. Tairas, R., Gray, J. (2006). Phoenix-Based Clone Detection Using Suffix Trees. Proceedings of the 44th Annual Southeast Regional Conference, Melbourne, Florida.
  20. Wahler, V., Seipel, D., Gudenberg, J. W., Fischer, G. (2004). Clone Detection in Source Code by Frequent Itemset Techniques. Proceedings of the 4th IEEE International Workshop on Source Code Analysis and Manipulation, Chicago.
  21. Wise, M. (1993). String Similarity via Greedy String Tiling and Running Karp-Rabin Matching. Unpublished Basser Department of Computer Science Report.
  22. Wise, M. J. (1996). YAP3: Improved Detection of Similarities in Computer Program and Other Texts. Proceedings of the Twenty-seventh SIGCSE Technical Symposium on Computer Science Education.
  23. Yang, W. (1991). Identifying Syntactic Differences Between Two Programs. Software Practice and Experience, 21(7), 739-755.