Title

A Study of Applying Term Weighting Techniques to Automatic Document Summarization (運用詞彙權重技術於自動文件摘要之研究)

Parallel Title

Automatic Text Summarization based on Weights of Words

Authors

Jen-Peng Huang (黃仁鵬); Chen-Ying Chang (張貞瑩)

Keywords

automatic text summarization ; text mining ; Web mining ; information retrieval ; TF-IDF algorithm

Journal

Journal of Information Management (資訊管理學報)

Volume, Issue / Publication Date

Vol. 21, No. 4 (2014/10/01)

Pages

391 - 415

Language of Content

Traditional Chinese

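Chinese Abstract

The web page summaries currently produced by search engines mostly fail to give users enough information to judge a page's content, and may even mislead them. This study aims for search engines to return, along with the query results, not just fragmentary snippets but a more helpful summary, so that users can grasp the gist of the full text from the automatic summary and then decide whether to read the whole page. The study applies weighting techniques to mine the text of web pages. It segments words with the Chinese word segmentation system (CKIP) developed by Academia Sinica, builds summaries separately with TF-ISF and similarity weighting techniques, and takes the union and the intersection of the two results to produce a "Rough Summary" and an "Accurate Summary", respectively, in order to improve the quality of the automatic summaries. The experimental results confirm that the proposed method effectively improves the accuracy of automatic document summarization.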
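The pipeline described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give their formulas, so the TF-ISF score (mean of term frequency times log inverse sentence frequency), the Jaccard-based similarity score, the top-k selection, and all function names are assumptions made for illustration. Input sentences are assumed to be already segmented into word lists (the step CKIP performs in the paper).

```python
import math
from collections import Counter


def tf_isf_scores(sentences):
    """Score each tokenized sentence by the mean TF-ISF of its tokens (assumed formula)."""
    n = len(sentences)
    # Sentence frequency: how many sentences contain each term.
    sf = Counter(term for sent in sentences for term in set(sent))
    scores = []
    for sent in sentences:
        if not sent:
            scores.append(0.0)
            continue
        tf = Counter(sent)
        total = sum(tf[t] * math.log(n / sf[t]) for t in tf)
        scores.append(total / len(sent))
    return scores


def similarity_scores(sentences):
    """Score each sentence by its summed Jaccard overlap with every other sentence (assumed measure)."""
    sets = [set(s) for s in sentences]
    scores = []
    for i, a in enumerate(sets):
        total = 0.0
        for j, b in enumerate(sets):
            if i != j and (a | b):
                total += len(a & b) / len(a | b)
        scores.append(total)
    return scores


def top_k(scores, k):
    """Return the indices of the k highest-scoring sentences as a set."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def rough_and_accurate(sentences, k):
    """Union of the two selections gives the rough summary, intersection the accurate one."""
    by_tf_isf = top_k(tf_isf_scores(sentences), k)
    by_similarity = top_k(similarity_scores(sentences), k)
    return sorted(by_tf_isf | by_similarity), sorted(by_tf_isf & by_similarity)
```

Calling `rough_and_accurate(sentences, k)` returns two lists of sentence indices. By construction the accurate summary is always a subset of the rough summary, which matches the union and intersection relationship stated in the abstract.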

English Abstract

Purpose: The objective of text document summarization is to extract the essential sentences that cover most of the concepts of a document, so that users can grasp the ideas the document addresses simply by reading the corresponding summary. This study aims to develop an automatic text summarization technique that produces summaries of web pages by extracting the sentences that cover most of their concepts. Design/methodology/approach: The research framework was developed from the CKIP (Chinese Knowledge Information Processing) system and automatic text summarization techniques. Two studies were designed to elicit and evaluate the accuracy and applicability of five automatic text summarization techniques with 10 samples from 184 web articles. Findings: Our results show that TF-ISF (Term Frequency-Inverse Sentence Frequency) outperforms the other techniques on F-measure. Further, the "Rough Summary" and the "Accurate Summary" achieve the best performance on recall and precision, respectively. Research limitations/implications: This paper focuses on Chinese web articles. Future research is recommended to develop an automatic text summarization system based on an ontology-based architecture. Practical implications: This paper provides several automatic text summarization techniques that produce summaries of web pages by extracting the sentences that cover most of their concepts. The experimental results indicate that the proposed approach significantly improves the accuracy of automatic text summarization. Originality/value: This paper is the first to apply union and intersection operations, which yield the "Rough Summary" and the "Accurate Summary", to improve the quality of automatic text summarization.
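The three evaluation measures named in the findings can be sketched as follows. This is a generic illustration that assumes a summary is evaluated as a set of selected sentence indices against a reference set chosen by human judges. The record does not describe the paper's exact evaluation protocol, and the function name is hypothetical.

```python
def precision_recall_f(selected, reference):
    """Return (precision, recall, F-measure) for a selected set against a reference set."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    # Precision: share of selected sentences that are in the reference.
    precision = hits / len(selected) if selected else 0.0
    # Recall: share of reference sentences that were selected.
    recall = hits / len(reference) if reference else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Under this set-based view, a union of two selections can never have lower recall than either selection alone, which is consistent with the "Rough Summary" scoring best on recall. An intersection keeps only sentences both methods agree on, which tends to raise precision, although that is not guaranteed.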

Subject Classification

Basic and Applied Sciences > Information Science
Social Sciences > Management

Cited By

  1. 黃淇瀅, 郭伯臣, 李政軒 (2021). 利用Google BERT提升中文寫作自動評分之準確率 [Using Google BERT to improve the accuracy of automated scoring of Chinese writing]. 測驗學刊, 68(1), 53-74.