Title

基於時間參數提昇谷歌部落格搜尋引擎效能

Parallel title

Improving the Performance of Google Blog Search Based on the Time Parameter

Authors

陳林志(Lin-Chih Chen);葉國暉(Kuo-Hui Yeh);陳大仁(Da-Ren Chen);陳冠瑜(Guan-Yu Chen)

Keywords

latent semantic analysis ; probabilistic latent semantic analysis ; latent Dirichlet allocation ; relational topic model ; Google blog search

Journal

Journal of Information Management (資訊管理學報)

Volume and issue / Publication date

Vol. 24, No. 2 (2017/04/01)

Pages

155-183

Language

Traditional Chinese

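Chinese abstract (translated)

A blog search engine is similar to a web search engine such as Google: it automatically gathers large quantities of information from the web and offers a free interface through which the public can search its database. The difference between the two is that a blog search engine mainly indexes blogs and filters out ordinary web pages, which gives it some special and unique characteristics. First, every blog post has a publication date, and a blog search engine can display that date, whereas a general search engine can only display the last-updated date, which is sometimes unreliable. Second, a blog search engine can capture the publication date of blog posts, whereas general search engines, although they offer advanced search options that can display dates, are limited to the last-modified date of a page. In this paper, we use four semantic models to analyze the Google blog search engine: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and the Relational Topic Model (RTM). In addition, we propose a variant of RTM that is improved with a time parameter. According to the experimental results, the improved RTM model combined with the time parameter can improve the performance of the Google blog search engine.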

English abstract

Purpose: Blog search engines are similar to web search engines like Google in that they automatically gather large quantities of information from the web and provide a free interface that allows the public to search their databases. Design/methodology/approach: In this paper, we use four semantic models to analyze the Google blog search engine: Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and the Relational Topic Model (RTM). Findings: According to the experimental results, our modified RTM model, combined with the time parameter, can improve the performance of the Google blog search engine. Research limitations/implications: The main difference between the two is that blog search engines mainly index blogs and ignore the rest of the web. The special features of blogs give blog search engines some specific and unique attributes. Practical implications: First, since each blog posting is dated, blog search engines can report the date on which the posting was created. For normal web pages, search engines can only report the last-updated date, and this is often not very reliable. Second, many blog search engines have a date-specific search capability. Some general search engines also offer this as an advanced search option, but only for the last-modified date of pages. Originality/value: In this paper, we propose a variant of RTM that mainly focuses on the time parameter.

Subject classification: Basic and Applied Sciences > Information Science
Social Sciences > Management