Title

以索引值導向為基礎具高效率的新網格群集演算法

Parallel title

An Index Value Oriented Scheme on Efficient Grid-based Clustering Algorithm

Author

陳而設

Keywords

網格式分群 ; 資料探勘 ; 資料分群 ; data clustering ; data mining ; grid-based clustering

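Journal title

Theses of the Department of Management Information Systems, National Pingtung University of Science and Technology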

Volume and issue / publication date

2016

Degree

Master's

Advisor

蔡正發

Language of content

Traditional Chinese

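Chinese abstract (English translation)

With the rapid development of information technology, the volume of data is growing at an ever-increasing rate. Faced with such large amounts of data, extracting important rules and information from them is a significant problem, and data mining is one of the key techniques for uncovering the useful information contained in a dataset. An algorithm that remains applicable to large databases would therefore be a valuable technique. The new algorithm proposed in this thesis, IVOS, is built on a grid-based framework. To avoid the repeated grid searches of traditional grid-based algorithms, it uses merging and spreading methods that differ from theirs and introduces the concept of index values to improve clustering efficiency. The improved procedure can be divided into four parts: (1) the upper grid is an invalid grid, (2) the upper grid is a valid grid, (3) index values are directed back to boundary values, and (4) multiple clusters are merged. The experimental results show that IVOS is at least 1.5 times faster than the other methods in time cost, while its clustering accuracy and noise filtering rate are both above 99%.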

English abstract

Data mining is one of the most significant techniques for extracting useful information from datasets, and improving its efficiency and performance has become a challenging research issue. An algorithm that can be applied to big data is therefore a valuable technique. This thesis proposes the Index Value Oriented Scheme (IVOS), a grid-based clustering algorithm. The algorithm applies merging and spreading methods that differ from those of traditional grid algorithms, together with a search approach that reduces repeated searching, in order to improve clustering efficiency. The main improvements cover four cases: (1) the top grids are invalid; (2) the top grids are valid; (3) the index values are directed back to boundary values; (4) multiple clusters are merged. According to the simulation results, the proposed IVOS is faster than the other algorithms compared, namely CLIQUE, ANGEL, GCCR and TING. Moreover, the proposed algorithm achieves a clustering correctness rate and a noise filtering rate of at least 99%.

Subject classification

College of Management > Department of Management Information Systems
Social Sciences > Management

References
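  1. [2] 林英盛, "An effective and efficient grid-based clustering algorithm," Master's thesis, Department of Management Information Systems, National Pingtung University of Science and Technology, 2012.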
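  2. [3] 張志豪, "An efficient grid-based clustering algorithm using a spatial intersection agglomeration technique," Master's thesis, Department of Management Information Systems, National Pingtung University of Science and Technology, 2012.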
  3. [5] Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., “Automatic subspace clustering of high dimensional data for data mining applications,” Proc. ACM SIGMOD Int. Conf. Management of Data, pp. 94-105, 1998.
  4. [8] Karypis, G., Han, E.H., Kumar, V., “Chameleon: Hierarchical clustering using dynamic modeling,” IEEE Computer, vol. 32, no. 8, pp. 68-75, 1999.
  5. [11] Tsai, C.F., Yen, C.C., “ANGEL: A new effective and efficient hybrid clustering technique for large databases,” in Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 817-824, Springer, Heidelberg, 2007.
  6. [12] Wang, W., Yang, J., Muntz, R., “STING: A statistical information grid approach to spatial data mining,” VLDB, pp. 186-195, 1997.
  7. [13] Zhang, T., Ramakrishnan, R., Livny, M., “BIRCH: An efficient data clustering method for very large databases,” Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 103-114. ACM Press, Montreal, Canada, 1996.
  8. [14] Beauchemin, M., “A density-based similarity matrix construction for spectral clustering,” Neurocomputing, vol. 151, no. 2, pp. 835-844, 2015.
  9. [15] Bouveyron, C., and Brunet, C., “Model-based clustering of high-dimensional data: A review,” Computational Statistics & Data Analysis, vol. 71, pp. 92-106, 2014.
  10. [16] Chen, X., “A new clustering algorithm based on near neighbor influence,” Expert Systems with Applications, vol. 42, pp. 7746-7758, 2015.
  11. [17] Gan, G., and Ng, M. K.-P., “Subspace clustering using affinity propagation,” Pattern Recognition, vol. 48, no. 4, pp. 1451-1460, 2015.
  12. [18] Hou, C., Nie, F., Yi, D., Tao, D., “Discriminative embedded clustering: a framework for grouping high-dimensional data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, pp. 1287-1299, 2014.
  13. [19] İnkaya, T., Kayalıgil, S., Özdemirel, N.E., “Ant colony optimization based clustering methodology,” Applied Soft Computing, vol. 28, pp. 301-311, 2015.
  14. [20] Kim, Y., Shim, K., Kim, M., and Lee, J. S., “DBCURE-MR: An efficient density-based clustering algorithm for large data using MapReduce,” Information Systems, vol. 42, no. 3, pp. 15-35, 2014.
  15. [21] Tsai, C.F., Huang, S.C., “An effective and efficient grid-based data clustering algorithm using intuitive neighbor relationship for data mining,” Machine Learning and Cybernetics (ICMLC), vol. 2, pp. 478-483, 2015.
  Chinese-language references
  16. [1] 葉恆甫, "An efficient density-based clustering algorithm using a space partitioning technique," Master's thesis, National Pingtung University of Science and Technology, 2010.
  17. [4] 胡永慶, "A new efficient grid-based clustering algorithm using interleaved neighboring-grid search," Master's thesis, Department of Management Information Systems, National Pingtung University of Science and Technology, 2013.
  English-language references
  18. [6] Ester, M., Kriegel, H.P., Sander, J., Xu, X., “A density-based algorithm for discovering clusters in large spatial databases with noise,” Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining (KDD'96), pp. 226-231, 1996.
  19. [7] Guha, S., Rastogi, R., and Shim, K., “CURE: An efficient clustering algorithm for large databases,” Proc. ACM SIGMOD Int. Conf. Management of Data, pp. 73-84, 1998.
  20. [9] MacQueen, J.B., “Some methods for classification and analysis of multivariate observations,” Proc. 5th Berkeley Symp., vol. 1, pp. 281-297, 1967.
  21. [10] Tsai, C.F., Liu, C.W., “KIDBSCAN: A new efficient data clustering algorithm for data mining in large databases,” in Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2006. LNCS (LNAI), vol. 4029, pp. 702-711, Springer, Heidelberg, 2006.