


Browsing and Visualizing Wikipedia through Link Mining and Semantic Relatedness Analysis


吳怡瑾(I-Chin Wu);張鈞甯(Chun-Ning Chang)


Link mining ; Normalized Google distance ; Semantic relatedness analysis ; Topic-based WikiMap




Vol. 5, No. 2 (2011 / 06 / 01)


pp. 101 - 142




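With the continual advance of the Internet and Web 2.0 technologies, new kinds of social media service websites built on user contributions have emerged. Because websites are easy to develop and web pages are easy to access, online information has grown rapidly, and the Web has become a primary source of information for users; Wikipedia in particular is an important service for quickly obtaining definitions, explanations, and similar information. The continual growth of online information leads to information overload, so users often spend considerable time searching for and filtering the information they need. This study takes Wikipedia as its subject and uses link mining and semantic relatedness analysis as its theoretical basis to construct knowledge network maps for specific topics. We first propose a link strength measure, based on the type and frequency of links between Wikipedia pages, to build an initial network. We then apply the search-result-based Normalized Google Distance (NGD) algorithm to compute the semantic relationships between nodes and construct the topic network. Finally, we use social network analysis indicators to analyze the relationships between topics and present the results visually. We evaluate the proposed method and the resulting topic-based WikiMap interface through a set of user search tasks; the results show that the interface helps users browse Wikipedia quickly and complete more complex search tasks.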


With the ubiquity of the Internet and the emergence of Web 2.0 technologies, social web sites (e.g., social networking websites and micro-blogging services) are providing unprecedented opportunities for creating user-generated content, as well as for promoting communication, collaboration, and information sharing among users. Wikipedia, one of the most famous collaborative projects on the Web, has become an extremely popular reference database for people seeking information or knowledge. However, since the number of articles and the variety of topics in Wikipedia are constantly expanding, it is difficult for users to find information efficiently via the hypertext links, i.e., the network of linked documents. To address the problem, we propose a hybrid approach that is based on the theories and techniques of link-based analysis and semantic relatedness analysis. Specifically, we employ a link strength measure to establish a preliminary topic network by analyzing the relationships between articles. We also refine the "Normalized Google Distance" to quantify the strength of the relationship between two articles via key terms. Then, we apply social network analysis indicators to determine the relationships between topics and visualize the analysis results in order to help users browse Wikipedia efficiently. Finally, a topic-based WikiMap is generated based on the proposed hybrid approach. We conducted a user-task-oriented evaluation study to confirm that the derived topic-based WikiMap can help users browse topics and execute complicated tasks easily and efficiently.
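
For orientation, the following is a minimal sketch of the standard Normalized Google Distance as defined by Cilibrasi and Vitányi, computed from search hit counts. It is not the refined variant proposed in this study, whose details the abstract does not give. The function name, parameter names, and the sample counts are hypothetical and chosen only for illustration.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from hit counts (illustrative, not the paper's refined version).

    fx, fy : number of pages containing term x, term y (hypothetical inputs)
    fxy    : number of pages containing both terms
    n      : total number of indexed pages (assumed known or estimated)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must occur on at least one page")
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller of the two hit counts")
    if fxy <= 0:
        # Terms never co-occur: the distance is unbounded.
        return math.inf

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


if __name__ == "__main__":
    # Made-up counts: two terms that always co-occur have distance 0.
    print(normalized_google_distance(1000, 1000, 1000, 10**6))
    # Made-up counts for two loosely related terms.
    print(normalized_google_distance(1000, 100, 10, 10**6))
```

A smaller value indicates that two terms tend to appear on the same pages, so a topic network built this way would link nodes whose distance falls below some chosen threshold; the threshold itself is an assumption here, not a value taken from the study.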

Subject classification: Humanities > Library and Information Science
Social Sciences > Communication