题名

幾個快速挖掘關聯規則的資料探勘方法

并列篇名

Several Improved Data Mining Algorithms for Finding Association Rules

DOI

10.6188/JEB.2003.5(2).01

作者

陳彥良(Yen-Liang Chen);趙書榮(Shu-Rong Zhao);陳禹辰(Yu-Chen Chen)

关键词

資料挖掘 ; 關聯規則 ; 交易資料庫 ; Data mining ; Association rule ; Transaction database

期刊名称

電子商務學報

卷期/出版年月

5卷2期(2003 / 09 / 01)

页次

1 - 10

内容语文

繁體中文

中文摘要

關聯規則的挖掘,是目前最重要的資料挖掘問題之一,它的目的是要從銷售的交易資料庫中,發現商品項目間的關聯。在過去已經有相當多挖掘關聯規則演算法被提出來,當中FP-tree演算法可說是最主要的演算法之一,並以高執行效率著稱。它的主要概念是不產生candidate itemsets,而將資料庫壓縮在FP-tree的結構中,以避免多次的高成本資料庫掃瞄。在本文中,我們針對原本的FP tree演算法,更進一步改進其所用的資料結構以提高挖掘效率。 在本文中共建立了三種資料結構:(一)FP-tree_tail演算法,也就是在head table中增加一個tail欄位,(二)FP-tree_hash演算法,乃是以hash function計算出每個node所在位置方式建立FP-tree,(三)FP-tree_hash+tail演算法,為結合(一)、(二)之優點,所完成之演算法。作者將以上三個演算法與傳統FP_tree演算法一起比較,以找出各演算法之優缺點。經由各種實驗數據發現,傳統FP_tree演算法所需花費之時間,為三個改良FP-tree演算法的數十倍。

英文摘要

Mining association rules is one of the most important problems in data mining. Its aim is to discover the associations between items in a large database of sales transactions. In the past, a large number of algorithms for mining association rules have been proposed, and the FP-tree algorithm is one of the most famous ones, known for its efficiency. Unlike the traditional approach that requires many phases of candidate itemsets generation and database scan, the FP-tree algorithm compresses and stores the entire database into a sophisticated tree structure, called FP-tree, by which all the associations can be found by two database scans. In this paper, we attempt to further improve the standard structure of the FP-tree such that the mining performance can be improved. To this end, three variants of the improved FP-tree algorithm are proposed. The first variant is called FP-tree tail, which adds a tail pointer into the head table of the original FP-tree structure. The second is named as FP-tree hash, which adds a hash table into every node of the FP-tree. Finally, we call the last FP-tree_hash+tail, which is a combination of the first two improvements. Finally, a performance evaluation is done to compare their performances. The result indicates that the three proposed algorithms are about 20-50 times faster than the original FP-tree algorithm.

主题分类 人文學 > 人文學綜合
基礎與應用科學 > 資訊科學
基礎與應用科學 > 統計
社會科學 > 社會科學綜合
参考文献
  1. Agarwal, R.,Aggarwal, C.,Prasad, V. V. V.(2000).A tree projection algorithm for generation of frequent item sets.Journal of Parallel and Distributed Computing
  2. Agrawal, R.,Srikant, R.(1994).Proc. Int'l Conf..
  3. Bayardo, Jr. R. J.(1998).Proceedings of the 1998 ACM-SIGMOD International Conference on Management of Data.
  4. Brin, S.,Ullman, J. D.,Motwani, R.,Tsur, S.(1997).In Proc. of the 1997 ACM-SIGMOD Conf..
  5. Gunopulos, G.,Mannila, H.,Saluja, S.(1997).In Proc. Of the 6th Int'l Conf..
  6. Han, J.,Pei, J.(2000).Mining Frequent Patterns by Pattern-Growth: Methodology and Implications.ACM SIGKDD Explorations,2(2)
  7. Han, J.,Pei, J.,Yin, Y.(2000).ACM-SIGMOD Int. Conf..
  8. Park, J. S.,Chen, Ming-Syan,Yu, P. S.(1995).ACM-SIGMOD Int'l Conf..
  9. Pasquier, N.,Bastide, Y.,Taouil, R.,Lakhal, L.(1999).Efficient Mining of Association Rules using Closed Itemset Lattices.Information Systems,24(1)
  10. Pei, J.,Han, J.(2000).Int. Conf. on Knowledge Discovery and Data Mining.
  11. Pei, J.,Pinto, H.,Chen, Q.,Dayal, U.,Han, J.,Hsu, M. C.(2001).Int. Conf. on Data Engineering.
  12. Savasere, E. O.,Navathe, S.(1995).Proc. Int'l Conf..
  13. Yen, S. J.,Chen, A. L. P.(1996).Proceeding of the IEEE/ACM International Conference on Parallel and Distributed Information Systems.
  14. Zaki, M. J.,Li, W.,Parthasarathy, S.,Ogihara, M.(1996).Evaluation of Sampling for Data Mining of Association Rules.U. Rochester:Computer Science Dept..