题名

高效率探勘關聯規則之演算法—GRA

并列篇名

An Efficient AIgorithm for Mining Association Rules-GRA

DOI

10.6188/JEB.2006.8(4).02

作者

黃仁鵬(Jen-Peng Huang);藍國誠(Guo-Cheng Lan)

关键词

資料探勘 ; 關聯法則 ; 階段縮減機制 ; data mining ; association rules ; the gradation reduction mechanisms

期刊名称

電子商務學報

卷期/出版年月

8卷4期(2006 / 12 / 01)

页次

469 - 498

内容语文

繁體中文

中文摘要

資料探勘的技術變得日益重要,目前已廣泛應用在商業上的預測及決策的支援。其中,關聯法則在資料探勘領域中,扮演著相當重要的角色,因此,許多探勘關聯法則演算法不斷被提出及改進,主要目的在於增加執行效能或提升記憶體使用率。本研究也朝著這個目標,試著改進關聯規則演算法為主要方向。本研究主要是針對探勘關聯規則GDA演算法的特性及缺點來加以改進,雖然GDA 演算法已是很有效率的演算法之一,不過,它還是有兩個最主要的問題。第一,GDA無法探勘交易長度較長的交易資料庫。第二,GDA演算法在某階段須儲存大量非必要的項目集時,將導致記憶體使用率會呈現不佳狀態;因此,GDA演算法的實用性大打折扣。基本於上述理由,本研究提出一個改良於GDA演算法的階段拆解核心的新演算法GRA(Gradation Reduction Approaches)。GRA演算法主要的特色就是加入階段縮減交易長度機制,因為該縮減機制可大量減少非頻繁項目集的數量,將可更適用於探勘交易長度較長的資料庫。在執行過程中,GRA演算法不需要產生任何候選項目集,即可快速找出關聯規則。除此之外,GRA演算法透過階段縮減機制僅會產生可能成為頻繁的項目集,所以,不需耗費龐大記憶體空間來儲存項目集,可有效提昇記憶體的利用率。在現實生活中的資料庫容量通常都是大於記憶體容量,為了解決此問題,本研究也以GRA演算法為主題提出另一修正版GAR-M演算法,GAR-M演算法將選擇採用資料庫分割方式繼續執行探勘任務,每個子資料庫僅需進行四次實體I/O動作,不隨著頻繁項目集的長度增長而增加實體I/O 的次數,以避免耗費過多的I/O時間,也可有效提高執行效率與實用性。

英文摘要

The technology of data mining becomes important in recent years, and it is generally applied to commercial forecast and decision supports. Association rules mining algorithms play the important role in the field of data mining. Many of association rules mining algorithms were proposed to improve the efficiency of data mining or save the utility rate of memory. Our major study also tries to improve the efficiency of association rules mining algorithms.In this paper, our major study is to improve the defects of the GDA algorithm. Although GDA algorithm was one of the most efficient algorithms, but it still has two serious problems; in the first place, GDA algorithm can't mine the transactions of databases whose record length is very long; in the second place, GDA algorithm isn't very efficient at utility rate of the memory when it must store lots of unnecessary itemsets at one phase. Therefore, the GDA algorithm is not very practical.Based on above the reasons we propose a new algorithm-GRA (Gradation Reduction Approaches) that is improved from GDA algorithm. One of the characters of the GRA algorithm is the gradation reduction mechanisms because it can reduce lots of infrequent itemsets; the GRA algorithm is very suitable to mine the transactions of databases whose record length is very long. In the mining procedure, the GRA algorithm doesn't generate any candidate itemset to find association rules quickly. Besides, the GRA algorithm through gradation reduction mechanisms only generate those itemsets which are the most possible to be the frequent itemsets. So, the GRA algorithm can't waste lots of memory spaces to store infrequent itemsets; it can efficiently increase the utility rate of memory. The size of the databases in the real world is always greater than the size of the memory. In order to solve this problem, we propose a modifying algorithm-GRA-M (Gradation Reduction Approaches-Modifying); it divides a large database into many sub-databases and mines association rules from those sub-databases. The GRA-M algorithm only scans database four times and will not be affect by the length of frequent itemsets. The GRA-M algorithm avoids wasting a lot of I/O time and increases the efficiency and the practicability in application.

主题分类 人文學 > 人文學綜合
基礎與應用科學 > 資訊科學
基礎與應用科學 > 統計
社會科學 > 社會科學綜合
参考文献
  1. Agrawal, R.,R. Srikant(1994).Fast Algorithm for Mining Association Rules in Large Databases.Proc. 1994 Int'l Conf VLDB,Santiago, Chile:
  2. Brin. S.,R. Motwani,J. D. Ullman,S. Tsur(1997).Dynamic Itemset Counting and Implication Rules for Market Basket Data.ACM SIGMOD Conference on Management of Data
  3. Han, J.,J. Pei,Y. Yin(2000).Mining Frequent Patterns without Candidate Generation.Proc. ACM SIGMOD Int. Conf. on Management of Data
  4. Quest Synthetic Data Generation Code
  5. Lee, C. H.,C. R. Lin,M. S. Chen(2001).Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining.Proc. of the ACM 10th Intern'l Conf. on Information and Knowledge Management (CIKM-01)
  6. Park, J. S.,M. S. Chen,P. S. Yu(1995).An Effective Hash Based Algorithm for Mining Association Rules.Proc. Of the ACM SIGMOD Conference on Management Data
  7. Pei, J.,J. Han,L. V. S. Lakshmanan(2001).Mining Frequent Itemsets with Convertible Constraints.Proc. of 2001 Int. Conf. on Data Engineering
  8. Tseng, F. C.,C. C. Hsu(2001).Creating Frequent Patterns with the Frequent Pattern List.Proc. Of the Asia Pacific Conference of Data Mining and Knowledge Discovery,Hong Kong:
  9. 陳秀如(2004)。碩士論文(碩士論文)。南台科技大學資訊管理研究所。
  10. 黃仁鵬、熊浩志、郭煌政(2004)。直覺拆解之關聯按則演算按-IDA。第十屆資訊管理暨實務研討會
  11. 黃仁鵬、錢依佩、吳聲弘(2003)。高效率之關聯規則探勘演算法-ICI。第十四屆國際資訊管理學術研討會
  12. 黃南傑(2004)。碩士論文(碩士論文)。南台科技大學資訊管理所。
  13. 微軟公司(2000)。Microsoft Analysis services的範本倉儲資料庫FoodMart。
  14. 熊浩志(2005)。碩士論文(碩士論文)。南台科技大學資訊管理研究所。
被引用次数
  1. 黃仁鵬(2016)。GSPT:使用前序表的高效關聯規則演算法。Electronic Commerce Studies,14(2),257-277。
  2. 黃仁鵬、柯柏瑄(2009)。GSSA:以階段分組排序搜尋機制探勘關聯規則之演算法。電子商務學報,11(3),551-568。