Title

繁體中文文章之剽竊偵測工具實作 (Implementation of a Plagiarism Detection Tool for Traditional Chinese Articles)

Parallel title

A Prototype for Plagiarism Detection in Chinese Contexts

DOI

10.6382/JIM.201101.0025

Authors

費彥霖(Yen-Lin Fei);唐日新(Jih-Hsin Tang);廖淯任(Yu-Ren Liao)

Keywords

剽竊偵測 ; 中文文章剽竊 ; 智慧財產 ; plagiarism detection ; text plagiarism ; property rights

Journal

資訊管理學報 (Journal of Information Management)

Volume/Issue (Publication date)

Vol. 18, No. 1 (2011/01/01)

Pages

25 - 52

Language

Traditional Chinese

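Chinese abstract (translated)

The spread of the World Wide Web has put data of every kind into wide circulation online, and searching the web for information has become very convenient. That convenience has also brought new problems and challenges; in fact, plagiarism of online information is already a very serious problem. Many scholars have published work on how to prevent or detect plagiarism (Chen et al. 2004), and software for detecting program plagiarism, such as MOSS, JPLAG, YAP, and SID, has been developed one after another. Commercial plagiarism detection sites such as Turnitin.com have also been operating for several years, so research on and application of the related techniques is mature and widespread. Most of these studies and tools, however, target similarity comparison of computer programs; few tools actually detect plagiarism in articles, and for Chinese articles in particular there is currently no related research. This study therefore proposes an improved comparison algorithm for Chinese articles based on the LZW algorithm (Ziv & Lempel 1977) and, to examine the algorithm's feasibility, implements a prototype plagiarism detection system for Chinese articles as an empirical test. The experimental evaluation shows that, given any two Chinese articles, the system can pick out all identical Chinese phrases and sentences in the two texts. The results thus indicate that the proposed algorithm is feasible both technically and in practical application. The algorithm still leaves considerable room for improvement; with further development, it is hoped that it can serve as a foundation for future research on article plagiarism in the Chinese-speaking world and be applied successfully in education, research, and other intellectual-property-related settings.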

English abstract

The popularity of the World Wide Web makes all forms of data, information, and knowledge accessible to the public; however, plagiarism has also become a major concern. To address the problem, several scholars have proposed program plagiarism detection algorithms (Chen et al. 2004), and several well-known tools, such as MOSS, JPLAG, YAP, and SID, have been developed and widely used. However, these tools and algorithms are not applicable to traditional Chinese text. To fill this gap, we propose an LZW-based algorithm and develop a prototype to examine its feasibility and usability. Initial results confirm that the prototype is applicable to plagiarism detection in Chinese articles. Further discussion and limitations are also provided.
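
As a rough illustration of the dictionary-based idea the abstract describes, the following Python sketch builds an LZW-style phrase dictionary from one text and then greedily matches a second text against it. It is not the authors' algorithm, whose details are not given in this record; the function names `build_lzw_dictionary` and `shared_phrases` and the `min_len` threshold are assumptions introduced only for illustration.

```python
def build_lzw_dictionary(text):
    """Collect the phrases an LZW-style parse of `text` would learn.

    Hypothetical helper: the dictionary is seeded with every single
    character, then extended by one character each time the current
    phrase stops matching, as in a standard LZW encoder.
    """
    dictionary = set(text)      # single characters are always known
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate           # keep extending a known phrase
        else:
            dictionary.add(candidate)     # learn the new phrase
            current = ch                  # restart from the last character
    return dictionary


def shared_phrases(source, suspect, min_len=2):
    """Greedily find phrases of `suspect` that are in `source`'s dictionary.

    Assumption: phrases shorter than `min_len` characters are ignored,
    since single shared characters say little about copying.
    """
    dictionary = build_lzw_dictionary(source)
    found = []
    i = 0
    while i < len(suspect):
        j = i + 1
        # Extend the window while it is still a learned phrase.
        while j <= len(suspect) and suspect[i:j] in dictionary:
            j += 1
        match = suspect[i:j - 1]          # longest known phrase starting at i
        if len(match) >= min_len:
            found.append(match)
            i += len(match)
        else:
            i += 1
    return found


if __name__ == "__main__":
    print(shared_phrases("ababab", "xxabay"))  # ['aba']
    print(shared_phrases("網路資訊的剽竊已是非常嚴重的問題", "剽竊是嚴重的問題"))
```

Because Python strings are sequences of Unicode code points, the same code handles Chinese characters without a separate word-segmentation step. A plain LZW parse only learns long phrases when they repeat within the source text, so in prose with little repetition the matches tend to be short; that is one reason a modified scheme, as the abstract proposes, would be needed in practice.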

Subject classification: Basic and Applied Sciences > Information Science
Social Sciences > Management
References
  1. Chang, C.,Tsai, W.(1991).A Data Compression Scheme for Chinese and English Characters.Computer Processing of Chinese and Oriental Languages,5(2),154-182.
  2. Chen, X.,Francia, B.,Li, M.,McKinnon, B.,Seker, A.(2004).Shared Information and Program Plagiarism Detection.IEEE Transactions on Information Theory,1545-1550.
  3. Huffman, D.(1952).A Method for the Construction of Minimum Redundancy Codes.Proceedings of IRE,40(9),1098-1101.
  4. Lancaster, T.,Culwin, F.(2004).A Comparison of Source Code Plagiarism Detection Engines.Computer Science Education,14(2),101-117.
  5. McLafferty, C.,Foust, K.(2004).Electronic Plagiarism as a College Instructor's Nightmare Prevention and Detection.Journal of Education for Business,79(3),186-190.
  6. Ottenstein, K.(1976).An algorithmic approach to the detection and prevention of plagiarism.ACM SIGCSE Bulletin,8(4),30-41.
  7. Parker, A.,Hamblen, J.(1989).Computer algorithms for plagiarism detection.IEEE Transactions on Education,32(2),94-99.
  8. Verco, K. L.,Wise, M. J.(1996).Plagiarism à la Mode: A Comparison of Automated Systems for Detecting Suspected Plagiarism.The Computer Journal,39(9),741-750.
  9. Wise, M.(1994).Technical Report 463.Sydney, Australia:Department of Computer Science, Sydney University.
  10. Ziv, J.,Lempel, A.(1977).A Universal Algorithm for Sequential Data Compression.IEEE Transactions on Information Theory,23(3),337-343.
  11. 張真誠、蔡文輝(1991)。植基於藍波―立夫編碼法的中文資料壓縮技術。電腦學刊,3(2),13-19。