Title

Improving Ranking Consistency in Web Search with an Unsupervised Approach Using a Knowledge Base and Search Results

Parallel Title

An Unsupervised Ranking Consistency Approach based on Knowledge Base and Search Results

DOI

10.6342/NTU201600837

Author

江建德

Keywords

Web Search ; Ranking Consistency ; Query Intent ; Unsupervised Approach ; Knowledge Base ; Topical Cluster ; Query Intent Template

Journal Title

National Taiwan University, Department of Computer Science and Information Engineering, Degree Theses

Volume and Issue / Publication Date

2016

Degree

Master's

Advisor

鄭卜壬

Language of Content

English

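Abstract (translated from Chinese)

For web search systems such as the well-known search engines Google, Yahoo!, and Bing, relevance ranking is one of the most important problems. Conventional approaches to relevance ranking improve performance by optimizing for each query separately. A previous paper proposed a two-stage supervised learning approach based on the similarity of query intents, which improves relevance ranking by enhancing ranking consistency. However, two problems in that paper need to be addressed. First, its learning-to-rank method requires a large volume of query logs, which only mature search engines possess; search systems that are newly launched or still developing must rely on unsupervised methods to improve relevance ranking. Second, that paper uses entities in a knowledge base to represent query intents. Because queries usually carry some specific information, entities alone cannot fully express query intent. For example, the query "Kobe Bryant family" expresses an intent to learn about Kobe Bryant's family rather than about Kobe Bryant himself. In this thesis, we propose a two-phase unsupervised approach that uses search results and a knowledge base to improve ranking consistency and relevance ranking, addressing the problem that immature search systems have no query logs. The first phase extracts ranking-consistency scores from search results, and the second phase re-ranks the search results by weighing uniqueness against consistency. In addition, we add query templates to query intents, which lets us parse query intents more clearly. To the best of our knowledge, this thesis is the first to use an unsupervised ranking-consistency method to improve relevance ranking. Finally, we use Freebase and Yahoo! search results as our experimental data to validate our approach; the results show that our unsupervised method improves both ranking consistency and relevance ranking.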

English Abstract

Relevance ranking is the most important problem in web search systems such as Google, Yahoo!, and Bing. Most conventional approaches focus on optimizing the ranking model for each query separately. One previous work proposed a two-stage supervised approach that improves relevance ranking by enhancing ranking consistency across queries with similar search intents. However, that work has two crucial problems. First, it uses pairwise learning to rank to learn consistency, a method that relies on large-scale query logs, which only a few mature web search systems have. Most developing search engines need to improve their performance without query logs. Second, it represents query intents by entities in a knowledge base. Entities alone, however, cannot completely represent query intents, because queries often ask for specific information; for example, the query "Kobe Bryant family" expresses an intent about the family. In this work, we propose a two-phase unsupervised approach that improves ranking consistency using a knowledge base and search results. The first phase extracts consistency from search results, and the second phase re-ranks search results by leveraging consistency and uniqueness. Furthermore, we add query templates to help clarify query intents more completely. To the best of our knowledge, our work is the first unsupervised method that uses ranking consistency to improve relevance ranking. We conducted extensive experiments using Freebase and search results from the Yahoo! search engine, and the results demonstrate that our approach improves ranking consistency and relevance ranking significantly.

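Subject Classification

Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Department of Computer Science and Information Engineering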