Title

Effects of Diacritics on Web Search Engines' Performance for Retrieval of Yoruba Documents

Parallel title

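變音符號對搜尋引擎檢索約魯巴語文獻表現之成效 (Chinese; in English: Effects of Diacritics on Search Engines' Performance in Retrieving Yoruba Documents)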

DOI

10.6182/jlis.2014.12(1).001

Author

Toluwase Victor Asubiaro

Keywords

Information Retrieval ; Information Retrieval Evaluation ; Diacritics ; Search Engines ; Yoruba Language

Journal

圖書資訊學刊 (Journal of Library and Information Studies)

Volume (Issue) / Publication date

Vol. 12, No. 1 (2014/06/01)

Pages

1 - 19

Language

English

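Chinese abstract (English translation)

This study examines whether the use or nonuse of diacritics affects the effectiveness of search engines (AOL, Bing, Google, Yahoo!) in retrieving Yoruba documents. From Google search logs, the study compiled the keywords most frequently searched in Nigeria and formulated 30 Yoruba queries, each in a diacritized and an undiacritized form, as its query set. The results show that the undiacritized queries obtained more results on every search engine. In terms of precision values, a significant difference between diacritized and undiacritized queries was found for AOL and Yahoo!. The findings indicate that the use or nonuse of diacritics does affect search engines' effectiveness in retrieving Yoruba documents. The study recommends that search engines normalize Yoruba queries and indexes in advance.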
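Precision, the measure compared across query forms above, is the fraction of retrieved results judged relevant. The record does not state the result cutoff or how relevance was judged, so the sketch below assumes precision over the first 10 results with binary (relevant / not relevant) judgments. The functions `precision_at_k` and `mean_precision` and the sample judgments are hypothetical illustrations, not the study's data or procedure.

```python
def precision_at_k(judgments, k=10):
    """Fraction of the first k results judged relevant.

    judgments: booleans in rank order, True for a result judged relevant
    (binary relevance is an assumption of this sketch). Ranks the engine
    did not fill count as non-relevant, so the denominator is always k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for relevant in judgments[:k] if relevant) / k


def mean_precision(runs, k=10):
    """Mean of precision_at_k over several queries (one list per query)."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(precision_at_k(run, k) for run in runs) / len(runs)


# Hypothetical judgments for one query submitted in both forms.
undiacritized = [True, True, False, True, False, True, True, False, False, True]
diacritized = [True, False, False, False]  # engine returned only 4 results

print(precision_at_k(undiacritized))  # 0.6
print(precision_at_k(diacritized))    # 0.1
```

Dividing by k rather than by the number of results actually returned penalizes an engine that returns few results for a query; dividing by the number returned is the other common convention, and the choice changes how a sparse diacritized result list is scored.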

English abstract

This paper examines how the use or nonuse of diacritics in Yoruba search queries affects the performance of four major search engines, AOL, Bing, Google and Yahoo!, in retrieving documents. Thirty Yoruba queries, created from the keywords most searched from Nigeria in Google search logs, were submitted to the search engines, first without diacritics and then with diacritics. All of the search engines retrieved more sites for the queries without diacritics, returned results with higher precision for them, and answered more of them. No two of the four search engines differed significantly from each other in precision, for either diacritized or undiacritized queries. However, for AOL and Yahoo!, effectiveness differed significantly between diacritized and undiacritized queries. The findings indicate that the search engines do not relate diacritized Yoruba words to their undiacritized versions. Search engines therefore need to add normalization steps that pre-process Yoruba queries and indexes. This study addresses a search engine problem that has not been investigated before.
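The recommended normalization step can be sketched as folding every Yoruba token to a diacritic-free key at both indexing time and query time, so that diacritized and undiacritized spellings of a word meet in the same index entry. This is a minimal sketch under that assumption, using only Python's standard `unicodedata` module (NFD decomposition, then dropping combining marks). The names `fold_yoruba`, `build_index` and `search` and the two-document collection are hypothetical; this is not the procedure of the paper or of any search engine it studied.

```python
import unicodedata
from collections import defaultdict


def fold_yoruba(text):
    """Lowercase text and strip all combining marks.

    NFD decomposition splits letters such as 'ọ́' or 'ṣ' into a base letter
    plus combining marks (Unicode category 'Mn'); dropping those removes
    the tone marks and the underdot alike.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_index(documents):
    """Inverted index: folded token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[fold_yoruba(token)].add(doc_id)
    return index


def search(index, query):
    """Ids of documents that contain every folded query token."""
    tokens = [fold_yoruba(t) for t in query.split()]
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))  # copy, so the index is not mutated
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical two-document collection: one spelled with diacritics, one without.
docs = {1: "Ìròyìn Yorùbá", 2: "iroyin ere idaraya"}
index = build_index(docs)
print(sorted(search(index, "iroyin")))  # [1, 2]
print(sorted(search(index, "Ìròyìn")))  # [1, 2]
```

Folding discards the distinctions the diacritics encode, so words that differ only in tone marks or underdots collapse onto one key. A system could keep the original spelling alongside the folded key and rank exact-diacritic matches higher.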

Subject classification: Humanities > Library and Information Science