题名

A Nonparametric Multi-Seed Data Clustering Technique

并列篇名

非參數式資料群集法

DOI

10.29977/JCIIE.200801.0001

作者

李增坪(Tseng-Pin Lee);耿伯文(Victor B. Kreng)

关键词

群集 ; 最小展開樹 ; 基因演算法 ; clustering ; minimal spanning tree ; genetic algorithms

期刊名称

工業工程學刊

卷期/出版年月

25卷1期(2008 / 01 / 01)

页次

1 - 10

内容语文

英文

中文摘要

單一群集中心點無法處理細長形狀的資料分佈;所以,當資料分佈形成複雜形狀,需要將之分割成數個小群集,並將這些小群集合併為一群,因而需要多個小群集的中心點,作為最終單一群集的起始參考點。本研究提出一非參數式的資料群集法,藉由分割與合併的程序來處理複雜形狀的資料分佈;在分割程序中,應用基因演算法將資料區分為數個小群集,並找出最適宜的群集中心點;而後,應用本研究所發展一種嶄新的判斷演算法-採用最小展開樹與統計方法,判斷任何鄰近的小群集是否合併為單一群集。最終,本文藉由數種資料分佈與實際資料,驗證本群集法的有效性。

英文摘要

Clustering of data around one seed does not work well if the shape of the cluster is elongated or non-convex. A complex shaped cluster requires several seeds. This study developed a nonparametric multi-seed data clustering approach which splits and merges procedures to handle the complex shapes of clusters. The splitting process utilizes a genetic algorithm to search for the appropriate cluster centers, which split all data into a considered amount of groups. To assign several seeds into one cluster, an innovative clustering process using a minimal spanning tree and statistics concept was proposed to judge whether a pair of clusters should be merged or separated. Experimental results illustrate the difficulties of one-seed-per-cluster, and also the effectiveness of the proposed clustering scheme.

主题分类 工程學 > 工程學總論
参考文献
  1. Bandyopadhyay, S.,U. Maulik(2001).Nonparametric genetic clustering : comparison of validity indices.IEEE Transactions on System, Man, Cybernetics, Part C,31,120-125.
  2. Bandyopadhyay, S.,U. Maulik(2002).Genetic clustering for automatic evolution of clusters and application to image classification.Pattern Recognition,35,1197-1208.
  3. Bezdek, J. C.,N. R. Pal(1998).Some new indexes of cluster validity.IEEE Transactions on System, Man, Cybern. Part B,28,301-315.
  4. Chaudhuri, D.,B. B. Chaudhuri(1997).A novel multiseed nonhierarchical data clustering technique.IEEE Transactions on System, Man, Cybernetics, Part B,27,871-877.
  5. Chiou, Y. C.,L. W. Lan(2001).Genetic clustering algorithms.European Journal of Operational Research,135,413-427.
  6. Davies, D. L.,D. W. Bouldin(1979).A cluster separation measure.IEEE Transactions on Pattern Analysis Machine Intelligence,1,224-227.
  7. Jain, A. K.,M. N. Murty,P. J. Flyn(1999).Data clustering: a review.ACM Computing Surveys,31,264-323.
  8. Krishna, K.,M. N. Murty(1999).Genetic K-means algorithm.IEEE Transactions on System, Man, Cybernetics, Part B,29,433-439.
  9. Malay, K. P.,S. Bandyopadhyay,U. Maulik(2004).Validity index for crisp and fuzzy clusters.Pattern Recognition,37,487-501.
  10. Maulik, U.,S. Bandyopadhyay(2000).Genetic algorithm-based clustering technique.Pattern Recognition,33,1455-1465.
  11. Michalewicz, Z.(1999).Genetic Algorithms + Data structures.Berlin:Springer.
  12. Tou, J. T.,R. C. Gonzalez(1974).Pattern Recognition Principles.MA:Addison-Wesley Reading.
  13. Tseng, L.Y.,S. B. Yang(2001).A genetic clustering algorithm for data with Non-spherical-shape clusters.Pattern Recognition,34,415-424.
  14. Tseng, L.Y.,S. B. Yang(2000).A genetic approach to the automatic clustering problem.Pattern Recognition,33,1251-1259.
  15. Wu, S.,A. W. C. Liew,H. Yan,M. Yang(2004).Cluster analysis of gene expression data based on self-splitting and merging competitive learning.IEEE Transactions on Information Technology in Biomedicine,8,5-15.
  16. Zhang, Y. J.,Z. Q. Liu(2002).Self-Splitting competitive learning: A new on-line clustering paradigm.IEEE Transactions on Neural Networks,13,369-380.