Title

PERFORMANCE MEASURES IN CLASSIFICATION PROBLEMS WITH CLASS-IMBALANCED DATA

Parallel title

不平衡資料下的分類表現測度探討 (A Study of Classification Performance Measures under Imbalanced Data)

Authors

Bo-Shiang Ke (柯博祥); Yuan-chin Ivan Chang (張源俊)

Keywords

AC_1 statistic ; lift index ; cumulative gains chart ; cumulative lift chart ; imbalanced data

Journal

中國統計學報 (Journal of the Chinese Statistical Association)

Volume and issue / Publication date

Vol. 55, No. 1 (2017/03/01)

Pages

2 - 24

Language

English

English abstract

A large number of classification models and many accompanying performance measures have been proposed in the literature. Because each problem is unique, confusion often arises when choosing an appropriate measure for a new problem. The situation is more complicated when class sizes are imbalanced, which has been viewed as one of 10 challenges in decision-making-related research. In this paper, we review many popular classification performance criteria and focus on their properties when class sizes are imbalanced in binary classification.

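Chinese abstract (English translation)

Many classification models, and performance measures for evaluating them, have been proposed in the literature. Because each classification problem has its own characteristics, choosing a suitable measure is often confusing. The situation is more complicated with imbalanced data, a problem that decision-related research has listed as one of its ten major challenges. This article reviews many common performance criteria and discusses their properties for imbalanced binary classification data.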

Subject classification: Basic and Applied Sciences > Statistics
References
  1. Altman, D. G.,Bland, J. M.(1994).Statistics notes: Diagnostic tests 2: predictive values.British Medical Journal,309(6947),102.
  2. Blattberg, R.,Kim, B.,Neslin, S.(2010).Database Marketing: Analyzing and Managing Customers.Springer.
  3. Brodersen, K. H.,Ong, C. S.,Stephan, K. E.,Buhmann, J. M.(2010).The balanced accuracy and its posterior distribution.Proceedings of the 2010 20th International Conference on Pattern Recognition
  4. Bult, J. R.,Wansbeek, T.(1995).Optimal selection for direct mail.Marketing Science,14(4),378-394.
  5. Burez, J.,Van den Poel, D.(2009).Handling class imbalance in customer churn prediction.Expert Systems with Applications,36(3),4626-4636.
  6. Cohen, J.(1960).A coefficient of agreement for nominal scales.Educational and Psychological Measurement,20(1),37-46.
  7. Cover, T. M.,Thomas, J. A.(2012).Elements of Information Theory.John Wiley & Sons.
  8. Egan, J.(1975).Signal Detection Theory and ROC Analysis.Academic Press.
  9. Fawcett, T.(2006).An introduction to ROC analysis.Pattern Recognition Letters,27(8),861-874.
  10. Gensch, D. H.(1984).Targeting the switchable industrial customer.Marketing Science,3(1),41-54.
  11. Gensch, D. H.,Aversa, N.,Moore, S. P.(1990).A choice-modeling market information system that enabled ABB electric to expand its market share.Interfaces,20(1),6-25.
  12. Gwet, K. L.(2014).Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring The Extent of Agreement Among Raters.Advanced Analytics, LLC.
  13. Gwet, K. L.(2002).Kappa statistic is not satisfactory for assessing the extent of agreement between raters.Statistical Methods for Inter-Rater Reliability Assessment Series,1(6),1-6.
  14. Hand, D. J.(2012).Assessing the performance of classification methods.International Statistical Review,80(3),400-414.
  15. Hoehler, F. K.(2000).Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity.Journal of Clinical Epidemiology,53(5),499-503.
  16. Hong, C. S.(2009).Optimal threshold from ROC and CAP curves.Communications in Statistics - Simulation and Computation,38(10),2060-2072.
  17. James, G.,Witten, D.,Hastie, T.,Tibshirani, R.(2013).An Introduction to Statistical Learning.Springer.
  18. Krawczyk, B.(2016).Learning from imbalanced data: open challenges and future directions.Progress in Artificial Intelligence,1-12.
  19. Nash, E.(Ed.)(1992).The Direct Marketing Handbook.McGraw-Hill.
  20. Pepe, M.(2003).The Statistical Evaluation of Medical Tests for Classification and Prediction.Oxford University Press.
  21. Powers, D. M.(2015).Unpublished.
  22. Velez, D. R.,White, B. C.,Motsinger, A. A.,Bush, W. S.,Ritchie, M. D.,Williams, S. M.,Moore, J. H.(2007).A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction.Genetic Epidemiology,31(4),306-315.
  23. Wang, Y.,Hu, B. G.(2009).Derivations of normalized mutual information in binary classifications.Proceeding of the 6th International Conference on Fuzzy Systems and Knowledge Discovery
  24. Webb, A.,Copsey, K.(2011).Statistical Pattern Recognition.Wiley.
  25. Yang, Q.,Wu, X.(2006).10 challenging problems in data mining research.International Journal of Information Technology & Decision Making,5(4),597-604.
  26. Youden, W. J.(1950).Index for rating diagnostic tests.Cancer,3(1),32-35.