Title

利用LSA與GA之線上新聞查詢建議

Parallel title

Online News Query Suggestion Using LSA and GA

DOI

10.29767/ECS.201109.0003

Authors

Lin-Chih Chen (陳林志); Fu-Hsiang Hsu (許富翔); Ying-Ho Liu (劉英和); Da-Ren Chen (陳大仁)

Keywords

latent semantic analysis (潛在語意分析) ; genetic algorithm (基因演算法) ; query suggestion (查詢建議)

Journal

Electronic Commerce Studies

Volume and issue / Publication date

Vol. 9, No. 3 (2011/09/30)

Pages

295 - 321

Language

Traditional Chinese

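Chinese abstract

Portal sites now offer aggregated news services, but because the news is updated frequently and compiled from many major news sites, users face information overload and find it hard to judge the importance of news articles. News articles have two characteristics: (1) they describe news events; (2) they may span different points in time. This study therefore considers two features of news articles: (1) the time at which the article was produced; (2) the importance of different events. First, we use latent semantic analysis (LSA), which filters noise out of the articles through dimensionality reduction, brings out the latent semantics, and avoids the problems of synonymy and polysemy. Next, we use a genetic algorithm (GA), which considers multiple points in the search space at once rather than a single point and can therefore reach the optimal solution over the overall region more quickly. The final goal is to generate query term suggestions for users' news searches. The experimental results show that mixing the importance and time features into the original LSA matrix does yield better performance than the original LSA matrix alone. Adding GA, however, did not give better results, because it searches a broad space randomly and turns up some less relevant terms.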

English abstract

Many portal sites provide integrated news content. However, users suffer from information overload because the news articles are updated frequently and aggregated from different news sources. News articles have two interesting properties: (a) they describe news events; (b) they may refer to different times of those events. In this study, we consider two features of news articles: (i) the time at which they were generated; (ii) the importance of different terms. We first use Latent Semantic Analysis (LSA) to reduce the noise in news articles and to expose the latent semantics of terms and articles, which addresses the problems of synonymy and polysemy. We then use a Genetic Algorithm (GA), which explores many candidate solutions simultaneously, in order to find a locally optimal solution quickly. The experimental results show that the LSA matrix augmented with the time and importance features performs better than the original LSA matrix. However, adding GA did not yield better results, because its randomized search over a wide space can surface some unrelated terms.
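The population-based search that the abstract attributes to GA can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate-term scores, the fitness function, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and every parameter value below are assumptions chosen only for illustration.

```python
import random

def run_ga(fitness, n_bits, pop_size=20, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize `fitness` over bit strings with a simple generational GA."""
    rng = random.Random(seed)
    # The population holds many candidate points of the search space at once.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: the random step that widens the search.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best

# Hypothetical relevance scores for six candidate suggestion terms.
scores = [0.9, 0.1, 0.7, 0.3, 0.8, 0.2]

def fitness(bits):
    """Sum the scores of the selected terms; penalize more than three terms."""
    chosen = [s for s, b in zip(scores, bits) if b]
    return sum(chosen) - max(0, len(chosen) - 3)

best = run_ga(fitness, n_bits=len(scores))
print(best, fitness(best))
```

Each bit marks whether a candidate term is included in the suggestion. Because selection, crossover, and mutation are all randomized, a run can also visit selections that include low-scoring terms, which is in line with the abstract's remark that a wide random search may lead to some unrelated terms.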

Subject classification: Basic and Applied Sciences > Information Science
Social Sciences > Economics
References
  1. CNN (2010). CNN.com International - Breaking, World, Business, Sports, Entertainment and Video News, http://www.cnn.com/.
  2. iThome (ROC year 100 / 2011). 研究:年青人不讀報紙,愛上新聞網站 [Study: young people do not read newspapers and prefer news websites], http://www.ithome.com.tw/itadm/article.php?c=47950
  3. ABCNews (2010). ABCNews.com - Breaking news, politics, online news, world news, feature stories, celebrity interviews and more - ABC News, http://abcnews.go.com/.
  4. Porter, M. and R. Boulton (2007). Snowball: A Language for Stemming Algorithms, http://snowball.tartarus.org/.
  5. Wikipedia (2011). News - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/News.
  6. BBC (2010). BBC - Homepage, http://www.bbc.co.uk/.
  7. Google (2010). Google News, http://news.google.com/news?ned=us.
  8. Baeza-Yates, R.,Ribeiro-Neto, B.(1999).Modern Information Retrieval.Addison Wesley Press.
  9. Chen, L.C.(2011).Building a Web-Snippet Clustering System Based on a Mixed Clustering Method.Online Information Review,35(4)
  10. Chen, L.C.,Luh, C.J.,Jou, C.(2005).Generating Page Clippings from Web Search Results Using a Dynamically Terminated Genetic Algorithm.Information Systems,30(4),299-316.
  11. Cilibrasi, R.L.,Vitányi, P.M.B.(2007).The Google Similarity Distance.IEEE Transactions on Knowledge and Data Engineering,19(3),370-383.
  12. Deerwester, S.(1990).Indexing by Latent Semantic Analysis.Journal of the American Society for Information Science,41(6),391-407.
  13. Ellouze, M.,Karray, H.,Alimi, A.M.(2007).Genetic Algorithm for Summarizing News Stories.Proceedings of the Second International Conference on Computer Vision Theory and Applications
  14. Fox, C.(1989).A Stop List for General Text.ACM SIGIR Forum,24(1-2),19-35.
  15. Holland, J.H.(1992).Adaptation in Natural and Artificial Systems.MIT Press.
  16. Jaoua, M.,Hamadou, A.B.(2003).Automatic Text Summarization of Scientific Articles Based on Classification of Extract's Population.Lecture Notes in Computer Science,2588(1),363-377.
  17. Kanejiya, D.,Kumar, A.,Prasad, S.(2003).Automatic Evaluation of Students' Answers using Syntactically Enhanced LSA.Proceedings of the HLT-NAACL 03 Workshop on Building Educational Applications using Natural Language Processing
  18. Koza, J.R.(1992).Genetic Programming: On the Programming of Computers by Means of Natural Selection.MIT Press.
  19. Landauer, T.K.,Dumais, S.T.(1997).A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.Psychological Review,104(2),211-240.
  20. Landauer, T.K.,Foltz, P.W.,Laham, D.(1998).An Introduction to Latent Semantic Analysis.Discourse Processes,25,259-284.
  21. Lin, F.T.,Kao, C.Y.,Hsu, C.C.(1993).Applying the Genetic Approach to Simulated Annealing in Solving Some NP-Hard Problems.IEEE Transactions on Systems, Man and Cybernetics,23(6),1752-1767.
  22. Madsen, R.E.,Kauchak, D.,Elkan, C.(2005).Modeling Word Burstiness Using the Dirichlet Distribution.Proceedings of the 22nd International Conference on Machine Learning
  23. Manning, C.,Schütze, H.(1999).Foundations of Statistical Natural Language Processing.MIT Press.
  24. Mitchell, M.(1998).An Introduction to Genetic Algorithms.MIT Press.
  25. Perold, A.F.,Sharpe, W.F.(1995).Dynamic Strategies for Asset Allocation.Financial Analysts Journal,Jan/Feb,16-27.
  26. Sathya, S.S.,Simon, P.(2009).Review on Applicability of Genetic Algorithm to Web Search.International Journal of Computer Theory and Engineering,1(4),450-455.
  27. Tibshirani, R.,Walther, G.,Hastie, T.(2001).Estimating the Number of Clusters in a Data Set via the Gap Statistic.Journal of the Royal Statistical Society,63(2),411-423.
  28. Wright, A.H.(1991).Genetic Algorithms for Real Parameter Optimization.Morgan Kaufmann.
  29. 黃世杰、陳友相 (2006). 2005 NCS National Computer Symposium (2005 年NCS全國計算機會議).
Cited by
  1. 劉宜芳、柯華葳 (2017). 線上閱讀研究之回顧與展望 [Review and prospects of online reading research]. 教育科學研究期刊 (Journal of Research in Education Sciences), 62(2), 61-87.