题名

GSPT:使用前序表的高效關聯規則演算法

并列篇名

GSPT: An Efficient Mining Frequent Itemsets Algorithm Using Prefix Table

作者

黃仁鵬(Jen-Peng Huang)

关键词

資料探勘 ; 關聯規則 ; 前序表 ; 排序 ; 階段搜尋 ; data mining ; association rules ; prefix table ; sorting ; gradation scanning

期刊名称

Electronic Commerce Studies

卷期/出版年月

14卷2期(2016 / 06 / 30)

页次

257 - 277

内容语文

繁體中文

中文摘要

隨著資訊科技的快速進步,交易、文件、日常處理的資料被電子化大量的累積,資料探勘的技術變得日益重要。其中,又以關聯規則在資料探勘領域中扮演著相當重要的角色,因此,許多探勘關聯規則演算法被不斷提出及改進,其主要目的在於增加執行效能或提升記憶體使用率。本研究提出GSPT 演算法 (Gradation Scanning using Prefix Table)。GSPT 演算法主要的特色就是前序表的排序概念與階段縮減過濾機制。建立前序表可將候選項目集加以集中,可減少計算支持度時掃瞄資料庫需花費的時間,進而改善傳統關聯規則演算法資料庫的掃瞄方式,加上配合前序表所提出的排序機制可有效的提升探勘效能;而階段縮減過濾機制可大量減少非頻繁項目集的數量,將可更適用於探勘交易長度較長的資料庫並且有效提昇記憶體的使用率。本研究實驗顯示本演算法在效能上優於Apriori 與 FP-growth 演算法。

英文摘要

Due to the science and technology make a great progress, transactions, documents and data are transformed into electronic types, the large number of data has been accumulated. Therefore, data mining technology becomes more important than before in recent years. In data mining territory, mining association rules plays a quite important position. Many of association rules mining algorithms were proposed to improve the efficiency of data mining or save the utility rate of memory. In this paper we propose a new algorithm - GSPT (Gradation Scanning using Prefix Table). The characters of the GSPT algorithm uses prefix table and gradation reduction mechanisms. GSPT uses prefix table to reduce the time of scanning database and gradation reduction mechanisms to reduce infrequent itemsets. Comprehensive experiments have been conducted to assess the performance of the proposed algorithm. The experimental results show that the GSPT algorithm outperforms Apriori and FP-growth.

主题分类 基礎與應用科學 > 資訊科學
社會科學 > 經濟學
参考文献
  1. Huang, J.-P.,Chien, I.-P.,Kuo, H.-C.(2006).An efficient incremental mining algorithm-ICI.Journal of e-Business,8(3),393-413.
    連結:
  2. Huang, J.-P.,Lan, G.-C.(2006).An efficient algorithm for mining association rules -GRA.Journal of e-Business,8(4),469-498.
    連結:
  3. Huang, J.-P.,Tsai, C.-L.(2007).GSA : A gradational scanning algorithm for mining association rules.Journal of e-Business,9(4),823-846.
    連結:
  4. 黃仁鵬、柯柏瑄(2009)。GSSA:以階段分組排序搜尋機制探勘關聯規則之演算法。電子商務學報,11(3),551-568。
    連結:
  5. Agrawal, R.,Srikant, R.(1994).Fast algorithms for mining association rules.Proceedings of 1994 International Conference on Very Large Data Bases
  6. Brin, S.,Motwani, R.,Ullman, J. D.,Tsur, S.(1997).Dynamic Itemset Counting and Implication Rules for Market Basket Data.ACM SIGMOD Conference on Management of Data
  7. Calders, T.,Dexters, N.,Gillis, J. J. M.,Goethals, B.(2014).Mining frequent itemsets in a stream.Information Systems,39,233-255.
  8. Chen, X.,Li, L.,Ma, Z.,Bai, S.,Guo, F.(2006).F-Miner: A New Frequent Itemsets Mining Algorithm.IEEE International Conference on e-Business Engineering
  9. Cuzzocrea, A.,Jiang, F.,Lee, W.,Leung, C.(2014).Efficient Frequent Itemset Mining from Dense Data Streams.Web Technologies and Applications
  10. Dai, C.,Chen, L.(2012).An algorithm for mining frequent closed itemsets in data stream.Physics Procedia,24,1722-1728.
  11. Han, J.,Pei, J.,Yin, Y.(2000).Mining Frequent Patterns without Candidate Generation.Proc. ACM SIGMOD Int. Conf. on Management of Data
  12. Huang, J.-P.,Chen, S.-J.,Kuo, H.-C.(2007).An efficient incremental mining algorithm-QSD.Intelligent Data Analysis,11(3),265-278.
  13. Huang, J.-P.,Hsiung, H.-C.,Kuo, H.-C.(2004).Intuitional Decomposition Association Rule Algorithm-IDA.Proceedings of th The 10th conference on Information management Practice 2004 (IMP2004)
  14. Khan, Z.,Haseen, F.,Rizvi, S.,ShabbirAlam, M.(2015).Enhanced BitApriori Algorithm: An Intelligent Approach for Mining Frequent Itemset.Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014
  15. Liu, X.,Wu, J.,Gu, F.,Wang, J.,He, Z.(2015).Discriminative pattern mining and its applications in bioinformatics.Briefings in Bioinformatics,16(5),884-900.
  16. Naulaerts, S.,Meysman, P.,Bittremieux, W.,Vu, T. N.,Vanden Berghe, W.,Goethals, B.,Laukens, K.(2015).A primer to frequent itemset mining for bioinformatics.Briefings in Bioinformatics,16(2),216-231.
  17. Park, J. S.,Chen, M. S.,Yu, P. S.(1995).An Effective Hash-based Algorithm for Mining Association Rules.Proc. Of the ACM SIGMOD Conference on Management of Data
  18. Sucahyo, Y. G.,Gopalan, R.(2004).CT-PRO: A Bottom-Up Non Recursive Frequent Itemset Mining Algorithm Using Compressed FP-Tree Data Structure.Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI)
  19. Tseng, F. C.,Hsu, C. C.(2001).Creating frequent patterns with the frequent pattern list.Proc. Of the Asia Pacific Conference of Data Mining and Knowledge Discovery,Hong Kong:
  20. Zhang, C.,Masseglia, F.,Lechevallier, Y.(2014).The anti-bouncing data stream model for web usage streams with intralinkings.Information Sciences,278,757-772.