Title

Application of C-Means and MC-Means Clustering Algorithms to Soybean Dataset

Authors

Faraj A. El-Mouadib;Halima S. Talhi

Keywords

C-means ; Cluster analysis ; Data Mining (DM) ; Knowledge Discovery in Database (KDD) ; MC-means

Journal

International Journal of Electronic Commerce Studies

Volume and Issue / Publication Date

Vol. 1, No. 2 (2010/12/01)

Pages

61 - 76

Language

English

English Abstract

Massive amounts of data are now being collected. The availability of such data creates an urgent need to transform the data into knowledge; this is the function of the field of Knowledge Discovery in Database (KDD). The most essential step in KDD is the Data Mining (DM) step, which is the search engine that finds the knowledge embedded in the data. The tasks of DM can be classified into two types, predictive or descriptive, according to the functionality sought. One of the older and well-studied functionalities in data mining is cluster analysis (clustering). Clustering methods can be either hierarchical or partitioning. One of the best-known clustering algorithms is C-means. In this paper, we focus on cluster analysis in general and on the C-means partitioning method in particular. We modify the way the C-means algorithm calculates the means of the clusters: we take the mean of a cluster to be one of its objects rather than an imaginary point in the cluster. Our modified C-means (MC-means) algorithm is implemented in a system developed in the Visual Basic .NET programming language. The well-known Soybean dataset is used in an experiment to evaluate our modification to the C-means algorithm. The paper concludes with an analysis and discussion of the experimental results on the basis of several criteria.
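The difference between the two centre updates described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Visual Basic .NET system. The abstract does not say how the representative object is chosen, so the sketch assumes it is the cluster member closest to the arithmetic mean. Euclidean distance, the function names, and the toy points are likewise assumptions made for illustration; the Soybean dataset is not used here.

```python
from math import dist  # Euclidean distance, Python 3.8+


def arithmetic_mean(cluster):
    """Classic C-means centre: coordinate-wise average (may not be a real object)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def object_mean(cluster):
    """Assumed MC-means-style centre: the cluster member closest to the average."""
    centre = arithmetic_mean(cluster)
    return min(cluster, key=lambda obj: dist(obj, centre))


def assign(objects, centres):
    """Put each object in the cluster of its nearest centre."""
    clusters = [[] for _ in centres]
    for obj in objects:
        nearest = min(range(len(centres)), key=lambda i: dist(obj, centres[i]))
        clusters[nearest].append(obj)
    return clusters


def partition(objects, centres, update, max_iter=100):
    """Alternate assignment and centre update until the centres stop changing."""
    centres = list(centres)
    clusters = assign(objects, centres)
    for _ in range(max_iter):
        # Keep the old centre if a cluster ends up empty.
        new_centres = [update(c) if c else old for c, old in zip(clusters, centres)]
        if new_centres == centres:
            break
        centres = new_centres
        clusters = assign(objects, centres)
    return centres, clusters


# Hypothetical toy data, not the Soybean dataset.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 10.0)]
initial = [points[0], points[3]]

c_centres, c_clusters = partition(points, initial, arithmetic_mean)
mc_centres, mc_clusters = partition(points, initial, object_mean)
```

On this toy input both variants produce the same two clusters, but every centre returned by the `object_mean` update is one of the input points, whereas the `arithmetic_mean` centres are computed points that need not coincide with any object.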

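Subject Classification

Basic and Applied Sciences > Information Science
Social Sciences > Economics
Social Sciences > Finance and Accounting
Social Sciences > Management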