Title
格網運算環境於序列型樣探勘之設計與實作
Parallel Title
The Design and Implementation of a Grid-Computing Environment for Mining Sequential Patterns
DOI
10.6382/JIM.200904.0129
Authors
Yu-Chieh Lo (羅宇傑); Chih-Hung Wu (吳志宏); Chih-Chin Lai (賴智錦)
Keywords
Data Mining; Mining Sequential Patterns; Distributed Processing; Loosely Coupled Parallelism; Grid Computing
Journal
Journal of Information Management (資訊管理學報)
Volume/Issue (Publication Date)
Vol. 16, No. 2 (April 1, 2009)
Pages
129-150
Language
Traditional Chinese
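Chinese Abstract
This paper presents the design and implementation of a grid-computing environment for mining sequential patterns. The study implements an Apriori-like sequential pattern mining algorithm in the grid-computing environment and verifies and analyzes its mining performance and results. Compared with other sequential pattern mining algorithms, an Apriori-like algorithm must perform a large amount of repetitive and recursive data processing and computation during mining, so its execution efficiency is comparatively low. However, with only minor modifications to its mining procedure, an Apriori-like algorithm can be adapted to loosely coupled distributed processing, and its tasks can be distributed across a grid-computing environment. The proposed environment defines two types of grid nodes: computing grid nodes and data grid nodes. All grid nodes are implemented with the Globus Toolkit, and each node has the distributed mining program developed in this study installed and configured. Grid service procedures are triggered by a user or by a remote grid node and return their mining results to the grid master, so that the nodes cooperatively complete the mining task. The environment is distributed mainly across two different university campus networks, with 16 grid nodes installed and configured; each node is an independent computer with different hardware components, reflecting a realistic grid-computing deployment. The experimental results and performance evaluation show that the grid-computing environment provides a highly flexible, high-performance computing platform suitable for mining sequential patterns from large databases.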
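The abstract attributes the low efficiency of Apriori-like algorithms to repeated, level-wise processing of the whole database. A minimal GSP-style sketch in Python illustrates that loop: each level generates candidate patterns from the previous level's frequent patterns, prunes them with the Apriori property, and rescans the database to count support. It assumes that every sequence element is a single item (the general problem allows itemsets per element) and that support is an absolute count of sequences; the function names (`contains`, `generate_candidates`, `count_support`, `mine`) and the toy data are hypothetical, and this is not the authors' implementation.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical simplification: each sequence element is a single item, so a
# sequence is a list of items and a pattern is a tuple of items.
Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Pattern) -> bool:
    """True if pattern occurs in sequence as a (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` advances the shared iterator


def generate_candidates(frequent: List[Pattern]) -> List[Pattern]:
    """GSP-style join of frequent (k-1)-patterns into k-candidates, then Apriori pruning."""
    freq_set = set(frequent)
    candidates = []
    for p, q in product(frequent, repeat=2):
        if p[1:] == q[:-1]:  # p minus its first item equals q minus its last item
            cand = p + (q[-1],)
            # Apriori property: every (k-1)-subsequence of a frequent pattern is frequent.
            if all(cand[:i] + cand[i + 1:] in freq_set for i in range(len(cand))):
                candidates.append(cand)
    return candidates


def count_support(db: List[Sequence[str]], candidates: List[Pattern]) -> Dict[Pattern, int]:
    """One full scan of the database: the step repeated at every level."""
    return {c: sum(contains(s, c) for s in db) for c in candidates}


def mine(db: List[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Level-wise mining; min_support is an absolute number of sequences."""
    candidates = [(item,) for item in sorted({x for s in db for x in s})]
    result: Dict[Pattern, int] = {}
    while candidates:
        counts = count_support(db, candidates)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        candidates = generate_candidates(sorted(frequent))
    return result


# Toy sequence database (made-up data for illustration only).
db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "d"], ["b", "d"]]
patterns = mine(db, min_support=2)
print(patterns[("a", "b", "c")])  # 2
```

Here `("a", "b", "d")` is generated by the join but pruned without a database scan, because its subsequence `("a", "d")` occurs in only one sequence.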
English Abstract
This paper presents the design and implementation of a grid-computing environment for mining sequential patterns. An Apriori-like algorithm for mining sequential patterns is deployed in the proposed environment. Although an Apriori-like algorithm does not perform as well as other sequential pattern mining algorithms, its loosely coupled nature makes it easier to realize as distributed processing in a grid-computing environment. The proposed environment defines two types of grids: the computing grid and the data grid. Every grid node is installed with a full implementation of the Apriori-like algorithm, wrapped with the Globus Toolkit. Grid services are invoked by users or by other grid nodes and return their results to the invoking side, so that the nodes cooperatively complete the mining task. Sixteen computers, each with different hardware components, serve as grid nodes and are distributed across two WANs. The experimental results show that the proposed grid-computing environment provides a flexible and efficient platform for mining sequential patterns from large datasets.
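One common way to realize the loosely coupled processing described in the abstract is count distribution: the master sends the same candidate patterns to every node, each node counts support over its own partition of the sequence database, and the master sums the partial counts. The Python sketch below shows one such level under stated assumptions: threads stand in for grid nodes, the round-robin partitioning and the names `local_count`, `partition_db`, and `distributed_support` are hypothetical, and the Globus Toolkit services used in the paper are not modeled. Whether the authors partition their data this way is not stated in the abstract.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Pattern) -> bool:
    """True if pattern occurs in sequence as a (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def local_count(partition: List[Sequence[str]], candidates: List[Pattern]) -> Counter:
    """Work done by one simulated grid node: count support over its own partition only."""
    counts: Counter = Counter()
    for seq in partition:
        for cand in candidates:
            if contains(seq, cand):
                counts[cand] += 1
    return counts


def partition_db(db: List[Sequence[str]], n_nodes: int) -> List[List[Sequence[str]]]:
    """Hypothetical round-robin split of the sequence database across nodes."""
    return [db[i::n_nodes] for i in range(n_nodes)]


def distributed_support(db: List[Sequence[str]], candidates: List[Pattern], n_nodes: int) -> Counter:
    """Master side: send the same candidates to every node, then sum the partial counts."""
    parts = partition_db(db, n_nodes)
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        for partial in pool.map(local_count, parts, [candidates] * n_nodes):
            total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


# Made-up data: one Apriori level with four candidate 2-patterns, simulated on 3 nodes.
db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "d"], ["b", "d"]]
candidates = [("a", "b"), ("a", "c"), ("b", "d"), ("a", "d")]
totals = distributed_support(db, candidates, n_nodes=3)
frequent = {c: n for c, n in totals.items() if n >= 2}
print(frequent)
```

Because support counts over disjoint partitions add up exactly, the merged totals equal a single-node count, which is why only the counting step needs to change; a full run would repeat this exchange once per Apriori level.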
Subject Classification
Basic and Applied Sciences > Information Science
Social Sciences > Management