Title

格網運算環境於序列型樣探勘之設計與實作

Parallel Title

The Design and Implementation of a Grid-Computing Environment for Mining Sequential Patterns

DOI

10.6382/JIM.200904.0129

Authors

羅宇傑 (Yu-Chieh Lo); 吳志宏 (Chih-Hung Wu); 賴智錦 (Chih-Chin Lai)

Keywords

資料探勘 ; 序列型樣探勘 ; 分散式處理 ; 鬆散耦合處理 ; 格網運算 ; Data Mining ; Mining Sequential Patterns ; Distributed Processing ; Loosely Coupled Parallelism ; Grid Computing

Journal

資訊管理學報 (Journal of Information Management)

Volume/Issue (Publication Date)

Vol. 16, No. 2 (2009/04/01)

Pages

129-150

Language

Traditional Chinese

Chinese Abstract

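This paper presents the design and implementation of a grid-computing environment for mining sequential patterns. We implement an Apriori-like algorithm for sequential pattern mining in the grid-computing environment and verify and analyze its mining performance and results. Compared with other sequential pattern mining algorithms, the Apriori-like algorithm requires a large amount of repetitive and recursive data processing and computation during mining, and therefore lacks high execution efficiency. However, with only minor modifications to its mining procedure, the Apriori-like algorithm becomes suitable for loosely coupled distributed processing, so that its tasks can be distributed across the grid-computing environment. The proposed environment defines two types of grid nodes, computing grids and data grids. All grid nodes are implemented with the Globus Toolkit, and each has the distributed mining program developed in this study installed and configured. Grid services are triggered by a user or by a remote grid node and return their mining results to the grid master, so that the nodes cooperatively complete the mining task. The grid-computing environment is distributed mainly across the networks of two different university campuses and comprises 16 grid nodes; each node is an independent computer with a different hardware configuration, in order to reflect a realistic grid-computing deployment. Finally, the experimental results and performance evaluation show that the grid-computing environment provides a highly flexible, high-performance computing platform suitable for mining sequential patterns from large databases.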
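The level-wise, repeated scanning attributed above to the Apriori-like algorithm can be illustrated with a minimal Python sketch. It assumes a simplified data model in which every sequence element is a single item (the sequential patterns of Agrawal and Srikant allow itemsets), uses a GSP-style join-and-prune step for candidate generation, and counts support one partition at a time so that the partial counts can simply be summed. The functions `contains`, `count_partition`, `generate_candidates`, and `mine` are hypothetical names for illustration; this is not the authors' program.

```python
from collections import Counter

Seq = tuple[str, ...]


def contains(sequence: list[str], pattern: Seq) -> bool:
    """True if `pattern` appears in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so later pattern items must match later positions.
    return all(item in it for item in pattern)


def count_partition(partition: list[list[str]], candidates: set[Seq]) -> Counter:
    """Support of each candidate within one data partition (one hypothetical node)."""
    return Counter(c for seq in partition for c in candidates if contains(seq, c))


def generate_candidates(frequent: set[Seq]) -> set[Seq]:
    """Join s1 and s2 when s1 minus its first item equals s2 minus its last item,
    then prune candidates that have an infrequent one-item-shorter subsequence."""
    joined = {s1 + s2[-1:] for s1 in frequent for s2 in frequent if s1[1:] == s2[:-1]}
    return {
        c for c in joined
        if all(c[:i] + c[i + 1:] in frequent for i in range(len(c)))
    }


def mine(partitions: list[list[list[str]]], min_support: int) -> dict[Seq, int]:
    """Level-wise mining: each level rescans every partition, which is the repeated
    work that makes Apriori-like methods slow but easy to distribute."""
    candidates: set[Seq] = {(item,) for part in partitions for seq in part for item in seq}
    result: dict[Seq, int] = {}
    while candidates:
        # Partial counts are independent, so each partition could be counted on a different node.
        total = sum((count_partition(p, candidates) for p in partitions), Counter())
        frequent = {c: n for c, n in total.items() if n >= min_support}
        result.update(frequent)
        candidates = generate_candidates(set(frequent))
    return result
```

For example, with `partitions = [[["a", "b", "c"], ["a", "c"]], [["b", "c"], ["a", "b"], ["a", "c", "b"]]]` and `min_support=3`, `mine` returns each of `("a",)`, `("b",)`, `("c",)` with support 4 and `("a", "b")`, `("a", "c")` with support 3; no length-3 candidate survives the join. The only coupling between partitions is the summation of counts, which is what makes the method amenable to loosely coupled processing.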

English Abstract

This paper presents the design and implementation of a grid-computing environment for mining sequential patterns. An Apriori-like algorithm for mining sequential patterns is deployed in the proposed environment. Although the Apriori-like algorithm does not perform as well as other sequential pattern mining algorithms, its loosely coupled nature makes it easier to realize as distributed processing in a grid-computing environment. Two types of grids, the computing grid and the data grid, are designed in the proposed environment. Every grid node is installed with the full implementation of the Apriori-like algorithm, wrapped with the Globus Toolkit. Grid services are invoked by users or by other grids and respond to the invoking side, so that the nodes cooperatively complete the mining task. Sixteen computers, each equipped with different hardware components, serve as grid nodes and are distributed across two WANs. The experimental results show that the proposed grid-computing environment provides a flexible and efficient platform for mining sequential patterns from large datasets.
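The loosely coupled step described above, in which each node counts candidate support on its own data and returns partial results to the invoking side, can be sketched as follows. This is a hedged illustration under stated assumptions: a thread pool (`concurrent.futures.ThreadPoolExecutor`) stands in for the Globus-wrapped grid services, each partition plays the role of one node's local data, sequence elements are single items, and `local_counts` and `distributed_support` are hypothetical names rather than the authors' API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

Seq = tuple[str, ...]


def contains(sequence: list[str], pattern: Seq) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def local_counts(partition: list[list[str]], candidates: list[Seq]) -> Counter:
    """Work on one hypothetical node: count the sequences in its partition that
    contain each candidate. No communication with other nodes is needed here."""
    counts: Counter = Counter()
    for seq in partition:
        for cand in candidates:
            if contains(seq, cand):
                counts[cand] += 1
    return counts


def distributed_support(partitions: list[list[list[str]]],
                        candidates: list[Seq],
                        min_support: int) -> dict[Seq, int]:
    """Master side: send the same candidate list to every node, merge the
    partial counts, and keep candidates that meet the global threshold."""
    # Assumption: local threads stand in for invoking remote grid services.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(local_counts, partitions, [candidates] * len(partitions))
        total = sum(partials, Counter())
    return {c: n for c, n in total.items() if n >= min_support}
```

With the two partitions `[[["a", "b", "c"], ["a", "c"]], [["b", "c"], ["a", "b"], ["a", "c", "b"]]]`, candidates `[("a", "b"), ("a", "c"), ("b", "c")]`, and `min_support=3`, the merged result keeps `("a", "b")` and `("a", "c")` (support 3 each) and drops `("b", "c")` (support 2). Only candidate lists and integer counts cross node boundaries, not the sequences themselves, at the cost of one round of communication per pattern length.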

Subject Classification

Basic and Applied Sciences > Information Science
Social Sciences > Management
References
  1. JSR-000101: Java APIs for XML-Based RPC (Proposed Final Draft).
  2. Globus Alliance.
  3. Agrawal, R., Shafer, J. C. (1996). Parallel Mining of Association Rules. IEEE Transactions on Knowledge and Data Engineering, 8(6), 962-969.
  4. Agrawal, R., Srikant, R. (1995). Mining Sequential Patterns. Proceedings of the 11th International Conference on Data Engineering.
  5. Ali, A., Anjum, A., Azim, T., Bunn, J. J., Mehmood, A., McClatchey, R., Newman, H. B., Rehman, W., Steenberg, C., Thomas, M., Lingen, F., Willers, I., Zafar, M. A. (2005). Resource Management Services for a Grid Analysis Environment. Proceedings of the 34th International Conference on Parallel Processing Workshops.
  6. Alpdemir, M. N., Mukherjee, A., Paton, N. W., Watson, P., Fernandes, A. A. A., Gounaris, A., Smith, J. (2003). Service-Based Distributed Querying on the Grid. Proceedings of the 1st International Conference on Service-Oriented Computing.
  7. Cannataro, M., Comito, C. (2003). A Data Mining Ontology for Grid Programming. 1st International Workshop on Semantics in Peer-to-Peer and Grid Computing.
  8. Cannataro, M., Talia, D. (2003). Knowledge Grid: An Architecture for Distributed Knowledge Discovery. Communications of the ACM, 46(1), 89-93.
  9. Cao, J., Jarvis, S. A., Saini, S. (2002). ARMS: An Agent-Based Resource Management System for Grid Computing. Scientific Programming, 10(2), 135-148.
  10. Chen, M. S., Han, J., Yu, P. S. (1996). Data Mining: An Overview from a Database Perspective. IEEE Transactions on Knowledge and Data Engineering, 8, 866-883.
  11. Chervenak, A., Foster, I., Kesselman, C., Salisbury, C., Tuecke, S. (2000). The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. Journal of Network and Computer Applications, 23(3), 187-200.
  12. Erwin, D. W., Snelling, D. F. (2001). UNICORE: A Grid Computing Environment. Lecture Notes in Computer Science, 2150, 825-834.
  13. Ferreira, L., Berstis, V., Armstrong, J., Kendzierski, M., Neukoetter, A., Takagi, M., Bing-Wo, R., Amir, A., Murakawa, R., Hernandez, O., Magowan, J., Bieberstein, N. (2003). Introduction to Grid Computing with Globus. IBM International Technical Support Organization.
  14. Globus Project.
  15. Foster, I., Kesselman, C. (1997). Globus: A Metacomputing Infrastructure Toolkit. The International Journal of Supercomputer Applications and High Performance Computing, 11(2), 115-128.
  16. Frawley, W. J., Piatetsky-Shapiro, G., Matheus, C. J. (1992). Knowledge Discovery in Databases: An Overview. AI Magazine, 13, 57-70.
  17. Grimshaw, A. S., Wulf, W. A., French, J. C., Weaver, A. C., Reynolds, P. F. (1994). University of Virginia, Technical Report (unpublished).
  18. Guralnik, V., Karypis, G. (2004). Parallel Tree-Projection-Based Sequence Mining Algorithms. Parallel Computing, 30, 443-472.
  19. Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M. C. (2000). FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining. Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  20. Han, J., Pei, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M. C. (2004). Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach. IEEE Transactions on Knowledge and Data Engineering, 16(11), 1424-1440.
  21. Quest Synthetic Data Generation Code.
  22. Masseglia, F., Cathala, F., Poncelet, P. (1998). The PSP Approach for Mining Sequential Patterns. Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery.
  23. Natarajan, R., Sion, R., Apte, C., Narang, I. S. (2004). A Grid-Based Approach for Enterprise-Scale Data Mining. Workshop on Data Mining and the Grid at the 4th IEEE International Conference on Data Mining.
  24. Rahman, R. M., Barker, K., Alhajj, R. (2005). Replica Selection in Grid Environment: A Data-Mining Approach. Proceedings of the 2005 ACM Symposium on Applied Computing.
  25. Roughan, M., Zhang, Y. (2006). Secure Distributed Data-Mining and Its Application to Large-Scale Network Measurements. ACM SIGCOMM Computer Communication Review, 36(1), 7-14.
  26. Silberschatz, A., Galvin, P. (2003). Operating System Concepts. John Wiley & Sons.
  27. Srikant, R., Agrawal, R. (1996). Mining Sequential Patterns: Generalizations and Performance Improvements. Proceedings of the 5th International Conference on Extending Database Technology (Lecture Notes in Computer Science 1057).
  28. Swany, M., Wolski, R. (2004). Building Performance Topologies for Computational Grids. International Journal of High Performance Computing Applications, 18(2), 255-265.
  29. Zaki, M. (1999). Parallel and Distributed Association Mining: A Survey. IEEE Concurrency, 7(4), 14-25.
  30. 張昭憲, 周定賢 (2005). 以動態任務分配為基礎之分散式循序樣本探勘系統 [A distributed sequential pattern mining system based on dynamic task allocation]. 第十六屆國際資訊管理學術研討會 [The 16th International Conference on Information Management].
Cited By
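  1. 胡宜中, 林震岩, 林雅惠 (2011). 運用關聯規則和序列型樣探討投資地區之關聯性與遷移─以印刷電路板產業為例 [Using association rules and sequential patterns to explore the correlation and migration of investment regions: the case of the printed circuit board industry]. 明新學報 [Minghsin Journal], 37(1), 217-230.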