Title

Evidence from an IC Packaging Foundry by Using a Two-Phase Clustering Methodology

Parallel Title

Applying a Two-Phase Clustering Method to an IC Packaging Foundry

DOI

10.29977/JCIIE.200807.0003

Authors

楊旭豪(Hsu-Hao Yang);劉自強(Tzu-Chiang Liu);蘇旭東(Hsu-Dong Su)

Keywords

clustering ; self-organizing maps ; minimum spanning tree ; IC packaging

Journal

Journal of the Chinese Institute of Industrial Engineers (工業工程學刊)

Volume/Issue and Publication Date

Vol. 25, No. 4 (2008/07/01)

Pages

287 - 297

Language

English

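Chinese Abstract

Clustering groups objects so that objects within the same cluster are as homogeneous as possible, while objects in different clusters differ as clearly as possible. This study applies a two-phase clustering method. The first phase is the self-organizing map (SOM); the second phase comprises the k-means algorithm and a minimum spanning tree-based clustering method. The minimum spanning tree-based method is computationally efficient and is less affected by the distribution of the data. Because the values in the practical data used in this study differ widely in magnitude, two data transformations are considered: min-max normalization and z-score normalization. The comparison criteria are the Davies-Bouldin (DB) value and the Wilks' lambda value. Based on tests with wire bonding machine data from an IC packaging foundry in Taiwan, we find that, when the DB value and the Wilks' lambda value are considered together, applying the k-means algorithm in the second phase to min-max-normalized data performs better. Although the minimum spanning tree-based method is not superior to the k-means algorithm, we find that it is slightly better than k-means at detecting outliers, especially after the data have been normalized.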

English Abstract

Clustering groups objects so that those within the same cluster are as homogeneous as possible while those in different clusters are as distinct as possible. This paper uses a two-phase clustering methodology that integrates the self-organizing maps (SOM) algorithm in the first phase with the k-means algorithm and minimum spanning tree-based (MST-based) clustering in the second phase. MST-based clustering is used because it is computationally efficient and tends to be less sensitive to the geometric shape of the data. Two data transformations, min-max normalization and z-score normalization, are employed to handle real-life data whose magnitudes differ sharply. We compare clustering results in terms of the Davies-Bouldin (DB) value and the Wilks' lambda value. Based on data from wire bonding machines at a Taiwanese IC packaging foundry, we find that applying the k-means algorithm in the second phase to min-max-normalized data performs better when the DB value and the Wilks' lambda value are considered jointly. Although MST-based clustering in the second phase does not outperform the k-means algorithm, we find that the former is better than the latter at detecting outliers, especially when normalized data are used.
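The two data transformations and the DB criterion named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of Euclidean distance, the population standard deviation in the z-score, and the toy clusters in the usage note are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def min_max_normalize(column):
    """Rescale one feature column to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map everything to 0 (assumed convention)
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def z_score_normalize(column):
    """Center one feature column on its mean and scale by its standard deviation."""
    mu, sigma = mean(column), pstdev(column)  # population std is an assumption
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def normalize_rows(rows, column_fn):
    """Apply a per-column transformation to a list of equal-length records."""
    columns = [column_fn(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def centroid(points):
    """Coordinate-wise mean of a cluster."""
    return [mean(dim) for dim in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin value for a partition given as a list of clusters.

    For each cluster, take the worst ratio (S_i + S_j) / d(c_i, c_j) over all
    other clusters, where S is the mean distance of members to their centroid;
    the index is the average of these worst ratios. Lower is better.
    """
    centroids = [centroid(c) for c in clusters]
    scatter = [mean(math.dist(p, m) for p in c) for c, m in zip(clusters, centroids)]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return mean(worst)
```

For example, `davies_bouldin([[(0, 0), (0, 2)], [(10, 0), (10, 2)]])` scores two tight, well-separated toy clusters; moving the clusters closer together or spreading their members raises the value, which is why a lower DB value indicates a better partition.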

Subject Classification: Engineering > General Engineering
References
  1. Ahmad, K.,B. L. Vrusias,A. Ledford(2001).Choosing feature sets for training and testing self-organizing maps: a case study.Neural Computing & Applications,10,56-66.
  2. Balakrishnan, P. V.,M. C. Cooper,V. S. Jacob,P. A. Lewis(1996).Comparative performance of the FSCL neural net and k-means algorithm for market segmentation.European Journal of Operational Research,93,346-357.
  3. Canetta, L.,N. Cheikhrouhou,R. Glardon(2005).Applying two-stage SOM-based clustering approaches to industrial data analysis.Production Planning & Control,16,774-784.
  4. Davies, D. L.,D. W. Bouldin(1979).A cluster separation measure.IEEE Transactions on Pattern Analysis and Machine Intelligence,1,224-227.
  5. Forina, M.,C. C. Oliveros,C. Casolino,M. Casale(2004).Minimum spanning trees: ordering edges to identify clustering structure.Analytica Chimica Acta,515,43-53.
  6. Grabmeier, J.,A. Rudolph(2002).Techniques of cluster algorithms in data mining.Data Mining and Knowledge Discovery,6,303-360.
  7. Guha, S.,R. Rastogi,K. Shim(2001).CURE: an efficient clustering algorithm for large databases.Information Systems,26,35-58.
  8. Guha, S.,R. Rastogi,K. Shim(2000).ROCK: a robust clustering algorithm for categorical attributes.Information Systems,25,345-366.
  9. Jain, A. K.,M. N. Murty,P. J. Flynn(1999).Data clustering: a review.ACM Computing Surveys,31,264-323.
  10. Jain, A. K.,R. C. Dubes(1988).Algorithms for Clustering Data.Upper Saddle River, NJ:Prentice Hall.
  11. Jiang, M. F.,S. S. Tseng,C. M. Su(2001).Two-phase clustering process for outliers detection.Pattern Recognition Letters,22,691-700.
  12. Karypis, G.,E. H. Han,V. Kumar(1999).CHAMELEON: a hierarchical clustering algorithm using dynamic modeling.IEEE Computer,32,68-75.
  13. Kaufman, L.,P. J. Rousseeuw(1990).Finding Groups in Data: an Introduction to Cluster Analysis.New York, NY:John Wiley & Sons.
  14. Kohonen, T.(1985).The self-organization map.Proceedings of IEEE,73,1551-1558.
  15. Kohonen, T.(1995).Self-Organizing Maps.Berlin, Germany:Springer-Verlag.
  16. Kuo, R. J.,L. M. Ho,C. M. Hu(2002).Integration of self-organizing feature map and k-means algorithm for market segmentation.Computers & Operations Research,29,1475-1493.
  17. Laszlo, M.,S. Mukherjee(2005).Minimum spanning tree partitioning algorithm for microaggregation.IEEE Transactions on Knowledge and Data Engineering,17,902-911.
  18. Luo, F.,L. Khan,F. B. Bastani,I. L. Yen,J. Zhou(2004).A dynamically growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles.Bioinformatics,20,2605-2617.
  19. MacQueen, J.(1967).Some methods for classification and analysis of multivariate observations.Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability,Berkeley, CA:
  20. Ng, R.,J. Han(1994).Efficient and effective clustering method for spatial data mining.Proceedings of International Conference on Very Large Data Base,Santiago, Chile:
  21. Pölzlbauer, G.,M. Dittenbach,A. Rauber(2006).Advanced visualization of self-organizing maps with vector fields.Neural Networks,19,911-922.
  22. Vesanto, J.,E. Alhoniemi(2000).Clustering of the self-organizing map.IEEE Transactions on Neural Networks,11,586-600.
  23. Xu, Y.,V. Olman,D. Xu(2002).Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning tree.Bioinformatics,18,536-545.
  24. Xu, Y.,V. Olman,D. Xu(2001).Minimum spanning tree for gene expression data clustering.Genome Informatics,12,24-33.
  25. Zahn, C. T.(1971).Graph-theoretical methods for detecting and describing gestalt clusters.IEEE Transactions on Computers,20,68-86.
  26. Zhang, T.,R. Ramakrishnan,M. Livny(1996).BIRCH: an efficient data clustering method for very large databases.Proceedings of International Conference on Management of Data,Montreal, Canada: