Title

Using Blog Content Depth and Breadth to Access and Classify Blogs

DOI

10.6702/ijbi.2010.5.1.2

Authors

Meichieh Chen; Toshizumi Ohta

Keywords

Blog; Wikipedia; semantic analysis; topic-based model

Journal

International Journal of Business and Information

Volume/Issue and Publication Date

Vol. 5, No. 1 (2010/06/01)

Pages

26 - 45

Language

English

English Abstract

Blogs attract readers and researchers because they express a wide range of opinions and critiques on many topics. The rapid growth of this online content creates strong demand for blog-specific filtering systems that can identify and monitor knowledge flow in the blogosphere. Traditional measures that rely solely on link-analysis algorithms are inadequate for blogs because blogs are sparsely linked. In this paper, we propose a filtering system that assesses the content of blog entries by measuring topic concentration and topic variety, expressed as content depth and breadth, using five scalars: informativeness, completeness, topic count, inter-topic distance, and topic mergence. These measures proved appropriate for helping users judge which blogs they prefer among search results. Our system differs from existing blog search engines in that it aims to improve the relevance and precision of search.

Subject Classification: Basic and Applied Sciences > Information Science
Social Sciences > Economics
Social Sciences > Management