题名

基於核方法聚類過程之相似圖建構及核生成和標準化的研究

Parallel Title

A Study on End-to-End Kernel-Based Clustering Processes of Similarity Graph Constructions, Kernel Generations and Normalizations

DOI

10.6342/NTU201703989

Author

陳均豪

Keywords

cluster analysis ; kernel method ; matrix normalization ; kernel similarity matrix construction ; graph sparsification

Publication

Theses and Dissertations, Department of Computer Science and Information Engineering, National Taiwan University

Volume / Publication Date

2017

Degree

Master's

Advisor

林守德 (Shou-De Lin)

Language

English

Chinese Abstract

Clustering methods such as spectral clustering, kernel k-means, and symmetric nonnegative matrix factorization (NMF) play an important role in data mining. This thesis focuses on answering a critical research question about kernel-based clustering methods: how do the generation, sparsification, and normalization of the similarity matrix affect the clustering results? Can we identify combinations of these steps that yield high-quality clustering results? In addition, this thesis uses the graph Laplacian to explain why k-nearest-neighbor (k-NN) graph sparsification improves clustering performance.
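
The abstract's claim that k-NN graph sparsification can be explained through the graph Laplacian can be illustrated numerically. The sketch below is not from the thesis; all function names are hypothetical. It compares the spectrum of the symmetric normalized Laplacian L = I - D^{-1/2} S D^{-1/2} for a dense kernel matrix versus its k-NN-sparsified version: for c well-separated clusters, L has c eigenvalues near zero, and sparsification sharpens that gap by removing weak cross-cluster edges.

```python
# Illustrative sketch (not the thesis code): the effect of k-NN sparsification
# on the spectrum of the symmetric normalized graph Laplacian.
import numpy as np

def rbf_kernel(X, gamma):
    # Gaussian (RBF) similarity with zero diagonal (no self-loops).
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    np.fill_diagonal(K, 0.0)
    return K

def knn_sparsify(K, k):
    # Keep each row's k largest similarities, then symmetrize (OR rule).
    S = np.zeros_like(K)
    for i in range(len(K)):
        keep = np.argsort(K[i])[::-1][:k]
        S[i, keep] = K[i, keep]
    return np.maximum(S, S.T)

def laplacian_spectrum(S, eps=1e-12):
    # Eigenvalues (ascending) of L = I - D^{-1/2} S D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(S.sum(axis=1), eps))
    L = np.eye(len(S)) - S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)
```

On two tight blobs with a broad kernel (small gamma), the dense graph's second-smallest Laplacian eigenvalue stays well above zero, while after 5-NN sparsification the cross-cluster edges disappear, the graph splits into two components, and the second eigenvalue drops to (numerically) zero: the two-cluster structure becomes exact in the spectrum.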

English Abstract

Kernel-based clustering methods such as spectral clustering, kernel k-means, and symmetric NMF play an important role in data mining. This work aims at answering a critical research question regarding kernel-based clustering: how do similarity matrix generation, sparsification, and normalization influence the clustering results? Can we identify a set of combinations among them that achieves high-quality clustering results? Furthermore, this thesis provides an interpretation of graph sparsification, especially the k-NN graph, in terms of properties of the graph Laplacian, to explain why it works.
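
The pipeline the abstract studies (kernel generation, then k-NN sparsification, then normalization, then clustering) can be sketched as follows. This is a minimal illustration under assumptions, not the thesis's implementation: all function names are hypothetical, the kernel is Gaussian, the normalization is symmetric, and the final step is a Ng–Jordan–Weiss-style row-normalized spectral embedding followed by a tiny deterministic 2-means for the two-cluster case.

```python
# Minimal sketch (not the thesis code) of a kernel-based clustering pipeline:
# kernel generation -> k-NN sparsification -> normalization -> spectral
# embedding -> 2-means. All names are illustrative.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gaussian (RBF) kernel similarity matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def knn_sparsify(K, k=5):
    # Keep each row's k largest off-diagonal similarities, then symmetrize.
    S = np.zeros_like(K)
    for i in range(len(K)):
        order = np.argsort(K[i])[::-1]
        keep = order[order != i][:k]
        S[i, keep] = K[i, keep]
    return np.maximum(S, S.T)  # OR-symmetrization of the k-NN graph

def sym_normalize(S, eps=1e-12):
    # Symmetric normalization D^{-1/2} S D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(S.sum(axis=1), eps))
    return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def spectral_embed(X, gamma=1.0, k=5, dim=2):
    # Row-normalized top eigenvectors of the normalized similarity graph.
    S = sym_normalize(knn_sparsify(rbf_kernel(X, gamma), k))
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    U = vecs[:, -dim:]                   # top-`dim` eigenvectors
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def two_means(U, iters=20):
    # Tiny deterministic 2-means: farthest-point init, Lloyd updates.
    c0 = U[0]
    c1 = U[np.argmax(np.linalg.norm(U - c0, axis=1))]
    labels = np.zeros(len(U), dtype=int)
    for _ in range(iters):
        labels = (np.linalg.norm(U - c0, axis=1)
                  > np.linalg.norm(U - c1, axis=1)).astype(int)
        if labels.all() or not labels.any():
            break
        c0 = U[labels == 0].mean(axis=0)
        c1 = U[labels == 1].mean(axis=0)
    return labels
```

Swapping out any one stage (a different kernel, epsilon-graph instead of k-NN, row-stochastic instead of symmetric normalization) is exactly the kind of variation whose effect on clustering quality the thesis measures.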

Subject Classification: Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Department of Computer Science and Information Engineering
References
  [1] M. Beauchemin. On affinity matrix normalization for graph cuts and spectral clustering. Pattern Recognition Letters, 68:90–96, 2015.
  [2] M. Belkin. Problems of Learning on Manifolds. PhD thesis, The University of Chicago, 2003.
  [3] P. Berkhin. A survey of clustering data mining techniques. In Grouping Multidimensional Data, pages 25–71. Springer, 2006.
  [4] M. Brito, E. Chavez, A. Quiroz, and J. Yukich. Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Statistics & Probability Letters, 35(1):33–42, 1997.
  [5] D. Cai, X. He, J. Han, and T. S. Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1548–1560, 2011.
  [6] F. R. Chung. Spectral Graph Theory, volume 92. American Mathematical Society, 1997.
  [7] I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 551–556. ACM, 2004.
  [8] I. S. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral clustering and graph cuts. Technical report, Computer Science Department, University of Texas at Austin, 2004.
  [9] C. Ding, X. He, and H. D. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 606–610. SIAM, 2005.
  [10] C. H. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM), pages 107–114. IEEE, 2001.
  [11] E. Elhamifar and R. Vidal. Sparse manifold clustering and embedding. In Advances in Neural Information Processing Systems, pages 55–63, 2011.
  [12] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras. A survey of clustering algorithms for big data: Taxonomy and empirical analysis. IEEE Transactions on Emerging Topics in Computing, 2(3):267–279, 2014.
  [13] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9(Aug):1871–1874, 2008.
  [14] M. Filippone, F. Camastra, F. Masulli, and S. Rovetta. A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1):176–190, 2008.
  [15] L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11(9):1074–1085, 1992.
  [16] G. Hinton and S. Roweis. Stochastic neighbor embedding. In NIPS, volume 15, pages 833–840, 2002.
  [17] L. Hubert and P. Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, 1985.
  [18] H. Jia, S. Ding, X. Xu, and R. Nie. The latest research progress on spectral clustering. Neural Computing and Applications, 24(7-8):1477–1486, 2014.
  [19] H.-P. Kriegel, E. Schubert, and A. Zimek. The (black) art of runtime evaluation: Are we comparing algorithms or implementations? Knowledge and Information Systems, pages 1–38, 2016.
  [20] D. Kuang, S. Yun, and H. Park. SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering. Journal of Global Optimization, 62(3):545–574, 2015.
  [21] H. W. Kuhn. The Hungarian method for the assignment problem. In 50 Years of Integer Programming 1958–2008, pages 29–47, 2010.
  [22] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788, 1999.
  [23] T. Li and C. Ding. The relationships among various nonnegative matrix factorization methods for clustering. In Sixth International Conference on Data Mining (ICDM'06), pages 362–371. IEEE, 2006.
  [24] J. MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297. Oakland, CA, USA, 1967.
  [25] M. Maier, M. Hein, and U. von Luxburg. Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters. Theoretical Computer Science, 410(19):1749–1764, 2009.
  [26] M. Maier, U. von Luxburg, and M. Hein. How the result of graph clustering methods depends on the construction of the graph. ESAIM: Probability and Statistics, 17:370–418, 2013.
  [27] A. Y. Ng, M. I. Jordan, Y. Weiss, et al. On spectral clustering: Analysis and an algorithm. In NIPS, volume 14, pages 849–856, 2001.
  [28] F. Nie, X. Wang, M. I. Jordan, and H. Huang. The constrained Laplacian rank algorithm for graph-based clustering. In AAAI, pages 1969–1976, 2016.
  [29] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
  [30] R. Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. The Annals of Mathematical Statistics, 35(2):876–879, 1964.
  [31] A. Strehl and J. Ghosh. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3(Dec):583–617, 2002.
  [32] M. Vladymyrov and M. Á. Carreira-Perpiñán. Entropic affinities: Properties and efficient numerical computation. In ICML (3), pages 477–485, 2013.
  [33] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
  [34] F. Wang, P. Li, A. C. König, and M. Wan. Improving clustering by learning a bistochastic data similarity matrix. Knowledge and Information Systems, 32(2):351–382, 2012.
  [35] R. Xu and D. Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678, 2005.
  [36] W. Xu, X. Liu, and Y. Gong. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 267–273. ACM, 2003.
  [37] R. Zass and A. Shashua. Doubly stochastic normalization for spectral clustering. In NIPS, pages 1569–1576, 2006.
  [38] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In NIPS, volume 17, page 16, 2004.