Title

漸進式區塊深度優先關聯法則探勘之研究 (A Study of Incremental Block Depth-First Association Rule Mining)

Parallel Title

A Study of Incremental Block Depth First Search Technique on Association Rule Mining

DOI

10.6382/JIM.200810.0099

Authors

Kun-Ming Yu (游坤明); Tzu-Chien Wang (王子健); Kuan-Chieh Wang (王冠傑)

Keywords

Data Mining; Association Rules; Incremental Mining

Journal

Journal of Information Management (資訊管理學報)

Volume/Issue (Publication Date)

Vol. 15, No. 4 (2008/10/01)

Pages

99-122

Language

Traditional Chinese

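Chinese Abstract (English translation)

Traditional association rule mining methods take a great deal of time to complete. Researchers have previously proposed incremental mining frameworks, but these still cannot avoid rescanning the original database. This paper therefore proposes a mining strategy based on an item data structure and block depth-first search. The transaction database is scanned only once to build the data structures used by the mining procedure, which avoids repeated database scans, and when association rules are generated, only the necessary items need to be compared. In addition, for dynamic databases that receive incremental data, the incremental mining mechanism proposed with this algorithm reuses information recorded during previous mining runs, so mining can be completed without rescanning the existing data. The paper also runs experiments on real data to compare the mining performance of the proposed algorithm with that of traditional algorithms, and analyzes the results. The experimental results show that the proposed algorithm saves a substantial amount of mining time.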
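The incremental idea summarized above (scan the transactions once, record what mining needs, and fold in new data without rescanning the old) can be illustrated with a small sketch. It assumes a vertical layout in which each item maps to the set of IDs of the transactions that contain it; the class `VerticalIndex` and its methods are hypothetical, and this record does not describe the paper's actual item structure or block layout. The sketch handles only insertions of new transactions.

```python
from collections import defaultdict


class VerticalIndex:
    """Hypothetical incremental item index (not the paper's structure).

    Maps each item to the set of transaction IDs (tids) that contain it,
    so support counts can be read without rescanning transactions.
    """

    def __init__(self):
        self.tidsets = defaultdict(set)  # item -> set of tids
        self.num_transactions = 0

    def add_transactions(self, transactions):
        """Index a batch of new transactions in a single pass.

        Transactions indexed in earlier calls are never read again;
        new tids simply continue from the current count.
        """
        for txn in transactions:
            tid = self.num_transactions
            for item in txn:
                self.tidsets[item].add(tid)
            self.num_transactions += 1

    def frequent_items(self, min_support):
        """Return items whose support (fraction of all indexed
        transactions) is at least min_support, in sorted order."""
        threshold = min_support * self.num_transactions
        return sorted(
            item for item, tids in self.tidsets.items() if len(tids) >= threshold
        )
```

After `add_transactions([{"a", "b"}, {"a", "c"}, {"b", "c"}])`, `frequent_items(0.6)` returns `['a', 'b', 'c']`. A later `add_transactions([{"a"}, {"a", "b"}])` touches only the tid-sets of `a` and `b`, after which the same call returns `['a', 'b']`, because `c` now appears in only 2 of 5 transactions.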

English Abstract

Data mining techniques and applications have received a great deal of attention over the past decade, and finding association rules among data is one of the most active topics in data mining. By applying data mining techniques, valuable information can be extracted efficiently from large volumes of raw data. However, as computer technology evolves, data grow constantly over time, and the time spent finding valuable information grows sharply. Designing an efficient data mining scheme is therefore extremely important. This paper focuses on this issue and proposes the I-BDFS (Incremental Block Depth First Search) algorithm to address it. In I-BDFS, the raw data need to be scanned only once, instead of the repeated database scans required by previous algorithms. The proposed algorithm can also quickly generate large itemsets by intersecting only the necessary items, which saves considerable execution time during mining. Moreover, with the help of the structure designed in I-BDFS, the algorithm needs to mine only the necessary specific patterns, saving scanning and comparison time. Finally, we conduct several experiments on real data to evaluate the performance of I-BDFS and of several traditional algorithms. The experimental results show that I-BDFS performs better than those traditional algorithms.
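The mining step described above, in which large itemsets are produced by intersecting only the necessary items after a single scan of the raw data, resembles depth-first tid-set intersection in the style of Eclat. The sketch uses that technique as a stand-in; it is not the I-BDFS algorithm itself, whose block organization this record does not describe. `build_tidsets` and `mine_depth_first` are hypothetical names, and `min_count` is assumed to be an absolute support threshold.

```python
def build_tidsets(transactions):
    """Single scan: map each item to the set of tids containing it."""
    tidsets = {}
    for tid, txn in enumerate(transactions):
        for item in txn:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def mine_depth_first(transactions, min_count):
    """Eclat-style depth-first mining (a stand-in, not I-BDFS itself).

    Returns a dict mapping each frequent itemset (a sorted tuple) to its
    support count. The raw transactions are read only once.
    """
    tidsets = build_tidsets(transactions)
    # Frequent single items in a fixed (sorted) order.
    frequent = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    result = {}

    def extend(prefix, candidates):
        # Each candidate pairs an item with the tid-set of prefix + item.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            # Intersect only with items that come later in the order,
            # so every itemset is generated exactly once.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    next_candidates.append((other, common))
            if next_candidates:
                extend(itemset, next_candidates)

    extend((), frequent)
    return result
```

For example, with the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c}` and `min_count=3`, every single item and every pair is frequent, while `{a,b,c}` (support 2) is pruned as soon as the tid-sets of `{a,b}` and `{a,c}` are intersected. Because each branch intersects only with items later in the order, no itemset is counted twice and the transactions are never re-read after `build_tidsets`.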

Subject Classification

Basic and Applied Sciences > Information Science
Social Sciences > Management