题名 |
幾個快速挖掘關聯規則的資料探勘方法 |
并列篇名 |
Several Improved Data Mining Algorithms for Finding Association Rules |
DOI |
10.6188/JEB.2003.5(2).01 |
作者 |
陳彥良(Yen-Liang Chen);趙書榮(Shu-Rong Zhao);陳禹辰(Yu-Chen Chen) |
关键词 |
資料挖掘 ; 關聯規則 ; 交易資料庫 ; Data mining ; Association rule ; Transaction database |
期刊名称 |
電子商務學報 |
卷期/出版年月 |
5卷2期(2003 / 09 / 01) |
页次 |
1 - 10 |
内容语文 |
繁體中文 |
中文摘要 |
關聯規則的挖掘,是目前最重要的資料挖掘問題之一,它的目的是要從銷售的交易資料庫中,發現商品項目間的關聯。在過去已經有相當多挖掘關聯規則演算法被提出來,當中FP-tree演算法可說是最主要的演算法之一,並以高執行效率著稱。它的主要概念是不產生candidate itemsets,而將資料庫壓縮在FP-tree的結構中,以避免多次的高成本資料庫掃瞄。在本文中,我們針對原本的FP tree演算法,更進一步改進其所用的資料結構以提高挖掘效率。 在本文中共建立了三種資料結構:(一)FP-tree_tail演算法,也就是在head table中增加一個tail欄位,(二)FP-tree_hash演算法,乃是以hash function計算出每個node所在位置方式建立FP-tree,(三)FP-tree_hash+tail演算法,為結合(一)、(二)之優點,所完成之演算法。作者將以上三個演算法與傳統FP_tree演算法一起比較,以找出各演算法之優缺點。經由各種實驗數據發現,傳統FP_tree演算法所需花費之時間,為三個改良FP-tree演算法的數十倍。 |
英文摘要 |
Mining association rules is one of the most important problems in data mining. Its aim is to discover the associations between items in a large database of sales transactions. In the past, a large number of algorithms for mining association rules have been proposed, and the FP-tree algorithm is one of the most famous ones, known for its efficiency. Unlike the traditional approach that requires many phases of candidate itemsets generation and database scan, the FP-tree algorithm compresses and stores the entire database into a sophisticated tree structure, called FP-tree, by which all the associations can be found by two database scans. In this paper, we attempt to further improve the standard structure of the FP-tree such that the mining performance can be improved. To this end, three variants of the improved FP-tree algorithm are proposed. The first variant is called FP-tree tail, which adds a tail pointer into the head table of the original FP-tree structure. The second is named as FP-tree hash, which adds a hash table into every node of the FP-tree. Finally, we call the last FP-tree_hash+tail, which is a combination of the first two improvements. Finally, a performance evaluation is done to compare their performances. The result indicates that the three proposed algorithms are about 20-50 times faster than the original FP-tree algorithm. |
主题分类 |
人文學 >
人文學綜合 基礎與應用科學 > 資訊科學 基礎與應用科學 > 統計 社會科學 > 社會科學綜合 |
参考文献 |
|