Title

網頁地理資訊檢索與探勘-以民宿主題為例

Parallel Title

Geographic Information Retrieval on Web Pages-Taking Homestay as an Example

DOI

10.6382/JIM.201007.0019

Authors

鄒明城 (Ming-Cheng Tsou); 韓慧林 (Hui-Lin Hai); 邱景星 (Ching-Hsing Chiu)

Keywords

geographic information retrieval; text mining; web mining; regular expression

Journal

Journal of Information Management (資訊管理學報)

Volume/Issue (Publication Date)

Vol. 17, No. 3 (2010/07/01)

Pages

19-44

Language

Traditional Chinese

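Chinese Abstract (English translation)

The Internet hosts a vast number of web pages on every kind of topic, and these pages hold a great deal of implicit knowledge. Most of this content, however, is semi-structured or even unstructured. Managing it efficiently and extracting information and knowledge from it has long been a focus of research and development, which is why so many web search engines, data mining tools, and online marketing techniques have been built. Current web search technology, however, concentrates mainly on keyword retrieval, and its analysis of page content and topics is still unsatisfactory. It also cannot effectively retrieve and analyze the geographic information contained in web pages, so much of that embedded geographic information is lost.

This study takes homestay (民宿) web pages as an example. It uses the Google Search Web Service as the basis for web search and combines the Chinese word segmentation system developed by the CKIP group at Academia Sinica with text mining techniques to mine, retrieve, and rank the spatial and semantic content of the pages Google returns, identifying the pages most relevant to the queried topic in both content and geographic information. Geographic information retrieval and regular expressions are then used to extract useful geographic information from these filtered pages, and Google Map API address matching (geocoding) combines the extracted geographic information with the text and displays both on a Google Map. Results obtained this way consist of maps and text that carry geographic information and match users' needs more closely. The approach can be applied to querying and analyzing content on spatial topics, collecting geographic data, and managing spatial knowledge.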
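The abstract does not give the regular expressions used to pull geographic information out of the filtered pages. As a minimal sketch, assuming the target is Taiwanese postal addresses written as county/city, township/district, optional village, road, optional section/lane/alley, and house number, a pattern might look like the one below. ADDRESS_RE and extract_addresses are hypothetical names, and the pattern is an illustration, not the authors' own expression. Full-width digits, common on Chinese-language pages, are folded to ASCII with NFKC normalization first.

```python
import re
import unicodedata

# Hypothetical address pattern (an assumption, not the paper's regex).
# Components in order: county/city (縣/市), township/district (鄉/鎮/市/區),
# optional village (村/里), road or street (路/街/大道), optional section (段),
# lane (巷), alley (弄), and a house number ending in 號 (e.g. 12之3號).
ADDRESS_RE = re.compile(
    r"(?P<county>[\u4e00-\u9fff]{2}[縣市])"
    r"(?P<town>[\u4e00-\u9fff]{1,3}?[鄉鎮市區])"
    r"(?P<village>[\u4e00-\u9fff]{1,3}[村里])?"
    r"(?P<road>[\u4e00-\u9fff0-9]{1,6}?(?:路|街|大道))"
    r"(?P<section>[一二三四五六七八九十0-9]+段)?"
    r"(?P<lane>[0-9]+巷)?"
    r"(?P<alley>[0-9]+弄)?"
    r"(?P<number>[0-9]+(?:之[0-9]+)?號)"
)


def extract_addresses(text: str) -> list[str]:
    """Return every address-like substring found in a page's text."""
    # NFKC turns full-width digits such as "１５" into ASCII "15".
    normalized = unicodedata.normalize("NFKC", text)
    return [m.group(0) for m in ADDRESS_RE.finditer(normalized)]


if __name__ == "__main__":
    page = "本民宿位於宜蘭縣冬山鄉大進村大進路123號，另有分館在花蓮縣壽豐鄉豐山村中山路二段１５巷７號。"
    print(extract_addresses(page))
```

The extracted strings would then be sent to a geocoder, such as the Google Map API's address matching, to obtain coordinates for the map display; that network call is not shown here. Real pages contain many address variants, so a production pattern would need more alternatives than this sketch covers.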

English Abstract

The World Wide Web (WWW) offers an enormous spread of information and data and holds a tremendous amount of knowledge. Much of this knowledge, however, consists of unstructured or semi-structured data. To make more efficient use of these unexploited or underexploited resources, information management and data gathering have become essential directions for research and development. At present, however, the ability of conventional search engines to access and use this data is still far from perfect, because they are limited to retrieving basic keywords rather than analyzing the subject matter and content of the web page itself. They also have limited capabilities for retrieving and analyzing the implicit geographic information contained in web pages. This paper takes the search for homestays (hostels) as an example, using the Google Search Web Service as the base search engine. The search results were mined, retrieved, and ranked by location and semantic content by combining the Chinese Word Segmentation System with text mining technology, in order to find the geographic information that can be derived from each web page. Because the results combine maps with the associated textual geographic information, this search method brings users closer to the answers they seek, with greater accuracy. In the future, the method may be applicable to various types of queries, analyses, and geographic data collection, and to managing spatial knowledge related to different keywords within a document.
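Neither abstract specifies how the returned pages are scored for topical and geographic relevance. As a sketch only, assuming each page has already been segmented into words (for example by the CKIP system) and that candidate place names come from a gazetteer, one simple option is a TF-IDF score over the topic terms blended with a TF-IDF score over the place names. The weighting scheme, the alpha parameter, and the names tfidf_scores and rank_pages are assumptions for illustration, not the paper's method.

```python
import math
from collections import Counter


def tfidf_scores(docs: list[list[str]], terms: list[str]) -> list[float]:
    """Sum of length-normalized TF x IDF weights of `terms` in each tokenized doc."""
    n = len(docs)
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count a term once per doc
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero on empty pages
        score = 0.0
        for term in terms:
            if df[term]:
                idf = math.log(1 + n / df[term])  # always positive
                score += (tf[term] / length) * idf
        scores.append(score)
    return scores


def rank_pages(docs: list[list[str]], topic_terms: list[str],
               place_names: list[str], alpha: float = 0.5) -> list[int]:
    """Return page indices, best first, by a weighted mix of topic and place scores."""
    topic = tfidf_scores(docs, topic_terms)
    place = tfidf_scores(docs, place_names)
    combined = [alpha * t + (1 - alpha) * p for t, p in zip(topic, place)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    # Hand-segmented toy pages standing in for word segmenter output.
    pages = [
        ["宜蘭", "民宿", "住宿", "民宿", "溫泉"],
        ["台北", "餐廳", "美食"],
        ["花蓮", "民宿", "海景"],
    ]
    print(rank_pages(pages, ["民宿"], ["宜蘭", "花蓮", "台北"]))  # [2, 0, 1]
```

A linear blend keeps the two signals separately tunable: raising alpha favors pages that talk about homestays, lowering it favors pages rich in place names. In the toy data the Taipei restaurant page still earns a place-name score but ranks last because it never mentions the topic.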

Subject Classification

Basic and Applied Sciences > Information Science
Social Sciences > Management
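
Cited By
  1. 謝育慈 (2016). 醫學博碩士論文關鍵詞與MeSH詞彙之對應研究-以臺北醫學大學為例 [A study of the correspondence between keywords in medical theses and dissertations and MeSH terms: the case of Taipei Medical University]. Master's thesis, In-Service Master's Program in Digital Publishing, Archiving and Digital Learning, Tamkang University, pp. 1-81.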