


Geographic Information Retrieval on Web Pages-Taking Homestay as an Example




鄒明城(Ming-Cheng Tsou);韓慧林(Hui-Lin Hai);邱景星(Ching-Hsing Chiu)


geographic information retrieval ; text mining ; web mining ; regular expression




Vol. 17, No. 3 (2010/07/01)


19 - 44




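The Internet hosts a vast number of web pages on every kind of topic, and these pages hold a great deal of implicit knowledge. Most of this content, however, is semi-structured or even unstructured. Managing it efficiently and extracting information and knowledge from it has long been a focus of research and development, which has produced a wide range of web search engines, data mining techniques, and online marketing technologies. Yet most current web search techniques concentrate on keyword retrieval, and their analysis of page content and topic remains unsatisfactory. Nor can they effectively retrieve and analyze the geographic information in page content, so much of that embedded geographic information is lost. Taking homestay web pages as an example, this study uses the Google Search Web Service as the basis for web search and combines the Chinese word segmentation system developed by the Chinese Knowledge and Information Processing (CKIP) group at Academia Sinica with text mining techniques to mine, retrieve, and rank the spatial and semantic content of the pages that Google returns, identifying the pages most relevant to the query topic in both content and geographic information. Geographic information retrieval and regular expressions are then used to extract useful geographic information from these filtered pages, and the address-matching (geocoding) capability of the Google Map API is used to display the extracted geographic information, together with the associated text, on Google Map. Results retrieved in this way combine maps and text that carry geographic information and match the user's needs more closely. The approach can be applied to querying and analyzing content on any spatially related topic, to geographic data collection, and to spatial knowledge management.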


The World Wide Web (WWW) offers an enormous spread of information and data and holds a tremendous amount of knowledge. Much of this knowledge, however, consists of unstructured or semi-structured data. To make more efficient use of these unexploited or underexploited resources, the management of information and the gathering of data have become essential directions for research and development. At present, however, the ability of general-purpose search engines to access and use this data is still far from perfect, since it is limited to keyword retrieval rather than analysis of the subject matter and content of the web page itself. In addition, these engines have limited capability for effective retrieval and analysis of the implicit geographic information contained in a web page. This paper takes the task of searching for a hostel or homestay as an example and uses the Google Search Web Service as the base search engine. Location and semantic data in the search results were mined, retrieved, and ranked by combining the Chinese Word Segmentation System with text mining technology, in order to find the geographic information that can be derived from each web page. The results obtained with this search method brought users closer to the answers they sought and with greater accuracy, since they included graphics and the associated textual geographic information. In the future, this method may be applicable to various types of queries, analyses, and geographic data collection, and to managing spatial knowledge related to different keywords within a document.
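
The keywords list regular expressions among the techniques used to pull geographic information out of web page text, but the abstract does not give the patterns. As a minimal sketch, assuming the target is a Taiwan-style street address written in Chinese (county or city, township or district, road, optional section, lane and alley, then house number), such an extraction step might look like the following in Python. The pattern, the function name `extract_addresses`, and the sample text are illustrative assumptions, not the expressions used in the paper, and the pattern covers only this one common address layout.

```python
import re

# Hypothetical pattern for one common Taiwan address layout (an assumption,
# not the paper's expression): county/city + township/district + road
# + optional section/lane/alley + house number.
CJK = r"[\u4e00-\u9fff]"
ADDRESS_RE = re.compile(
    rf"(?P<county>{CJK}{{2}}[縣市])"             # e.g. 花蓮縣
    rf"(?P<town>{CJK}{{1,3}}[鄉鎮市區])"          # e.g. 吉安鄉
    rf"(?P<road>{CJK}{{1,6}}(?:路|街|大道))"      # e.g. 中山路
    r"(?P<section>[一二三四五六七八九十\d]+段)?"  # e.g. 三段 (optional)
    r"(?P<lane>\d+巷)?"                           # e.g. 25巷 (optional)
    r"(?P<alley>\d+弄)?"                          # e.g. 3弄 (optional)
    r"(?P<number>\d+(?:[-之]\d+)?號)"             # e.g. 100號 or 8之1號
)


def extract_addresses(text: str) -> list[str]:
    """Return every substring of `text` that matches the address pattern."""
    return [m.group(0) for m in ADDRESS_RE.finditer(text)]


if __name__ == "__main__":
    # Made-up sample page text for illustration only.
    sample = (
        "歡迎入住,地址:花蓮縣吉安鄉中山路三段100號,電話請洽櫃台。"
        "另一間位於台東縣卑南鄉太平路25巷8號。"
    )
    for address in extract_addresses(sample):
        print(address)
```

Each extracted string could then be passed to a geocoding service to obtain coordinates for display on a map. That step is left out here because it needs network access and an API key.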

Subject classification: Basic and Applied Sciences > Information Science
Social Sciences > Management
