Title |
以資料挖礦法則預測網頁更新規則之研究 |
Parallel title |
Discovering Web Page Update Patterns with Data Mining |
DOI |
10.6188/JEB.2003.5(2).02 |
Authors |
許秉瑜(Ping-Yu Hsu);張維捷(Wei-Chieh Chang) |
Keywords |
web page update ; data mining ; pattern ; association rules ; web mining ; pattern discovery ; WWW |
Journal |
電子商務學報 |
Volume and issue / publication date |
Vol. 5, No. 2 (2003/09/01) |
Pages |
11 - 36 |
Language |
Traditional Chinese |
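Chinese abstract (translated) |
In the e-commerce era, many kinds of agent software search the Internet for information with which to build all sorts of websites. Because the volume of data is usually very large, deciding when such software should refresh the information it has collected becomes an important decision for system administrators. The usual practice today is to refresh at a fixed interval, that is, an interval set by the user. If the interval is poorly chosen, however, the fetched pages may be identical to the previous copies (interval too short), or the pages may already have been updated several times (interval too long), which wastes network resources or leaves the data stale. This paper therefore uses the data mining method for generating sequential association rules to find the pattern in a web page's update times (its update pattern), and verifies the pattern by actually fetching web pages according to it. Because a page's update pattern may change over time, a fixed prediction pattern gradually loses accuracy. This study therefore also proposes an incremental method for updating the prediction rules, so that the rules reflect current conditions in a timely way without consuming excessive computing resources. |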
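The incremental idea in the closing sentence of this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class name `IncrementalRuleCounts`, the fixed `window` length, and the use of plain counters are assumptions made for illustration. The point it shows is that each new fetch result can adjust the rule statistics in constant work per slot, without re-scanning the whole history.

```python
from collections import Counter, deque


class IncrementalRuleCounts:
    """Hypothetical helper: keeps precondition counts current, one slot at a time."""

    def __init__(self, window=3):
        self.window = window                # assumed fixed precondition length
        self.recent = deque(maxlen=window)  # the last `window` observations
        self.seen = Counter()               # precondition -> times observed
        self.followed = Counter()           # precondition -> times followed by an update

    def observe(self, updated):
        """Record one fetch slot (True if the page had changed)."""
        if len(self.recent) == self.window:
            pre = tuple(self.recent)
            self.seen[pre] += 1
            if updated:
                self.followed[pre] += 1
        self.recent.append(1 if updated else 0)

    def confidence(self, pre):
        """Fraction of times `pre` was followed by an update (0.0 if never seen)."""
        pre = tuple(pre)
        return self.followed[pre] / self.seen[pre] if self.seen[pre] else 0.0
```

A caller would invoke `observe` after every fetch and read `confidence` for the current precondition before deciding on the next visit. The sketch only accumulates counts; how older observations should be discounted when a page's behaviour drifts is left open here.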
English abstract |
In the e-commerce era, many agents roam the Internet to find the best prices, cluster related product information, and so on. Agents have to visit targeted web pages periodically to update their information. If agents visit pages too frequently, they end up reloading information they already have. On the other hand, if agents visit web pages too infrequently, the collected data may be out of date. To minimize out-of-date errors, agents tend to visit a site as soon as possible. However, to minimize network traffic and database update cost, system administrators tend to reduce visits as much as possible. To the best of our knowledge, no research has been directed at finding a scientific approach to this dilemma. In this paper, we propose visiting web pages according to their past update patterns. That is, a page should be visited as soon as it is expected to have changed, but not at any other time. To discover the update patterns, we propose using sequential association rules from data mining. Association rules can find patterns implicitly associated with temporal update patterns. In this paper, each web page is associated with a sequence of binary digits denoting whether the page was updated in the last agent fetching slot. We designed an algorithm to mine patterns from this sequence of binary digits. The patterns are composed of large item sequences and related association rules. A rule states that, under some preconditions, the web page will change in the next time slot. If a precondition matches the current situation, an agent is sent to fetch the page. Besides computing patterns for existing pages, the system also updates its database dynamically to account for newly inserted and deleted pages. |
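To make the rule idea in the abstract concrete, here is a small Python sketch that mines rules of the form "precondition, then the page changes in the next slot" from a binary update history. It is a simplification under stated assumptions, not the paper's algorithm: preconditions are fixed-length windows rather than large item sequences, and the function names and the `window`, `min_support`, and `min_confidence` parameters are hypothetical.

```python
from collections import Counter


def mine_update_rules(history, window=3, min_support=2, min_confidence=0.7):
    """Mine 'precondition -> update in next slot' rules from a 0/1 history.

    history: string of '0'/'1', one digit per fetch slot ('1' = page was updated).
    Returns {precondition: confidence} for rules passing both thresholds.
    """
    seen = Counter()      # times each precondition occurred with a following slot
    followed = Counter()  # times that following slot was an update
    for i in range(len(history) - window):
        pre = history[i:i + window]
        seen[pre] += 1
        if history[i + window] == "1":
            followed[pre] += 1
    rules = {}
    for pre, n in seen.items():
        conf = followed[pre] / n
        # Support here is the raw count of confirmed occurrences (an assumption).
        if followed[pre] >= min_support and conf >= min_confidence:
            rules[pre] = conf
    return rules


def should_fetch(history, rules, window=3):
    """Send the agent only if the most recent slots match a rule precondition."""
    return history[-window:] in rules
```

For a page that changes every third slot, `mine_update_rules("001001001001")` yields the single rule `"100"` with confidence 1.0, so `should_fetch` returns true only when the latest three slots read `100`.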
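Subject classification |
Humanities > General Humanities ; Basic and Applied Sciences > Information Science ; Basic and Applied Sciences > Statistics ; Social Sciences > General Social Sciences |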