Title

A Pattern Search in Data Analysis

Authors

Chun-Hung Tzeng; Fu-Shing Sun

Keywords

Pattern-recognition; Similarity; Representative; Heuristic-information

Journal

International Journal of Electronic Commerce Studies

Volume and Issue / Publication Date

Vol. 1, No. 2 (2010/12/01)

Pages

117 - 137

Language

English

English Abstract

This paper introduces a probabilistic model of two-class pattern recognition. The measurable sets are defined by a similarity, which is a reflexive and symmetric binary relation. The heuristic information model is formulated through a type of data clustering called representative clustering. The heuristic information about a data record is a data subset containing that record, computed by comparing the record with all representative records. For the corresponding classifiers, both the Bayes and Neyman-Pearson theorems are proved. In application, the knowledge discovery process searches for a similarity and a representative clustering in a training data set, and the evaluation is then extended to records in a testing data set. The experiment shows the trade-off between the number of representatives and classifier performance.
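
The idea of comparing a record with all representatives can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the numeric threshold similarity, the greedy choice of representatives, the reading of "heuristic information" as the set of training records sharing the same set of similar representatives, and the majority-vote rule are all assumptions made for illustration, as are the names `similar`, `pick_representatives`, `signature`, `fit`, and `predict`.

```python
from collections import Counter, defaultdict


def similar(a, b, eps=1.0):
    """Assumed similarity: reflexive and symmetric, but not transitive."""
    return abs(a - b) <= eps


def pick_representatives(records, eps=1.0):
    """Greedy cover (assumption): each record is similar to some representative."""
    reps = []
    for r in records:
        if not any(similar(r, p, eps) for p in reps):
            reps.append(r)
    return reps


def signature(record, reps, eps=1.0):
    """Indices of the representatives that the record is similar to."""
    return frozenset(i for i, p in enumerate(reps) if similar(record, p, eps))


def fit(records, labels, eps=1.0):
    """Count class labels among training records sharing each signature."""
    reps = pick_representatives(records, eps)
    counts = defaultdict(Counter)
    for r, y in zip(records, labels):
        counts[signature(r, reps, eps)][y] += 1
    return reps, counts


def predict(record, reps, counts, eps=1.0, default=0):
    """Majority class within the record's subset; `default` if the subset is unseen."""
    c = counts.get(signature(record, reps, eps))
    return c.most_common(1)[0][0] if c else default


# Hypothetical toy data: one numeric feature, two classes.
train_x = [0.0, 0.5, 1.2, 3.0, 3.4, 4.5]
train_y = [0, 0, 0, 1, 1, 1]
reps, counts = fit(train_x, train_y)
```

In this sketch, a smaller `eps` yields more representatives and finer subsets, which is one way to picture the trade-off between the number of representatives and classifier performance that the abstract mentions.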

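Subject Classification
Basic and Applied Sciences > Information Science
Social Sciences > Economics
Social Sciences > Finance and Accounting
Social Sciences > Management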
References
  1. ACM KDD-Cup 1999, Computer Network Intrusion Detection, http://kdd.ics.uci.edu/databases/kddcup99/.
  2. DARPA Intrusion Detection Evaluation, http://www.ll.mit.edu/mission/communications/ist/CST/index.html.
  3. Devroye, L.,Györfi, L.,Lugosi, G.(1996).A Probabilistic Theory of Pattern Recognition.Springer.
  4. Good, I. J.(1983).The Foundations of Probability and Its Applications.Minneapolis:University of Minnesota Press.
  5. Good, I. J.(1965).The Estimation of Probabilities: An Essay on Modern Bayesian Methods.Cambridge, Mass.:MIT Press.
  6. Greco, S.,Matarazzo, B.,Slowinski, R.(2001).Rough set theory for multicriteria decision analysis.European Journal of Operational Research,129(1),1-47.
  7. Han, J.,Kamber, M.(2006).Data Mining Concepts and Techniques.Morgan Kaufmann.
  8. Hastie, T.,Tibshirani, R.,Friedman, J.(2001).The Elements of Statistical Learning.Springer.
  9. Maak, W.(1967).Fastperiodische Funktionen.Springer-Verlag.
  10. Maloof, M. A.(Ed.)(2006).Machine Learning and Data Mining for Computer Security.Springer-Verlag.
  11. Polkowski, L.(2002).Rough Sets, Mathematical Foundations.Heidelberg:Physica-Verlag.
  12. Stepaniuk, J.(2000).Knowledge discovery by application of rough set models.Rough Set Methods and Applications : new developments in knowledge discovery in information systems,Heidelberg, Germany:
  13. Tzeng, C.-H.(2008).Similarity and pattern recognition.Proc. Intern. Conf. on Data Mining and Appl. ICDMA’08
  14. Tzeng, C.-H.(1988).A Theory of Heuristic Information in Game-Tree Search.Springer-Verlag.
  15. Tzeng, C.-H.(2009).A Probabilistic Model of Pattern Recognition on Abstract Data.Proc. of The 2009 Intern. Conf. on Data Mining
  16. Tzeng, C.-H.,Sun, F.-S.(2003).Data clustering in tolerance space.Advances in Intelligent Data Analysis V
  17. Tzeng, C.-H.,Tzeng, C.-S. O.(1978).Tolerance spaces and almost periodic functions.Bull. Inst. Math. Acad. Sinica,6,159-173.
  18. Zeeman, E. C.(1962).The topology of the brain and visual perception.Topology of 3-Manifolds and related Topics, Proc. The Univ. of Georgia Institute