Title

決策樹形式知識整合之研究

Parallel Title

The Research on Decision-Tree-Based Knowledge Integration

DOI

10.6382/JIM.200507.0247

Authors

馬芳資(Fang-Tz Ma);林我聰(Woo-Tsong Lin)

Keywords

Decision Tree; Decision Tree Merging; Knowledge Integration

Journal

資訊管理學報 (Journal of Information Management)

Volume/Issue (Publication Date)

Vol. 12, No. 3 (2005/07/01)

Pages

247 - 279

Language

Traditional Chinese

Chinese Abstract (translated)

With the advent of the knowledge economy era, mastering knowledge helps organizations raise their competitiveness, so the creation, storage, application, and integration of knowledge have become hotly discussed topics; this research addresses the topic of knowledge integration. Among knowledge representations, decision-tree-based knowledge takes a tree structure that can be presented graphically; its structure is simple and easy to understand, so this research examines knowledge integration for decision-tree-based knowledge. We propose a merging method, MODT (Merging Optional Decision Tree), whose main idea is to add an option link to the original decision tree structure to combine two subtrees that share the same ancestor. Trees are merged pairwise: the nodes of the two decision trees are compared top-down in preorder, and the grafting technique is used to combine the knowledge of the two trees. Furthermore, the strong pattern rule concept is used to improve the predictive power of the merged decision tree. The method was validated with real credit data of credit card customers. Five randomly drawn independent test sets were used to test twenty primitive decision trees and ten merged decision trees and to compare their predictive accuracy, for a total of fifty comparisons; the merged decision tree's accuracy was simultaneously greater than or equal to that of both primitive decision trees in 79.5% of them. A statistical test on the accuracies further showed that the merged decision trees were significantly more accurate than the primitive ones. That is, the proposed MODT method achieves the goals of knowledge integration and accumulation.

English Abstract

With the advent of the knowledge economy era, mastering knowledge can help organizations improve their competitiveness. Knowledge creation, retention, application, and integration have therefore become much-discussed topics, and this research focuses on knowledge integration. Among methods of knowledge representation, the decision tree is one of the most common: it presents knowledge as a tree-shaped graphic, and its structure is simple and easily understood. We therefore focus on decision-tree-based knowledge in connection with the theme of knowledge integration. This research proposes a method called MODT (Merging Optional Decision Tree), which merges two knowledge trees at a time and adds an option link to combine nodes that share the same ancestor. In MODT, the corresponding nodes of the two trees are compared in a top-down traversal to determine whether they are the same. When the nodes are the same, the sample counts are pooled and the degree of purity is recalculated; when they are not, the node of the second tree and its descendants are added to the first tree using the grafting technique. This yields a completely merged decision tree. The strong pattern rule is then used to strengthen forecast accuracy when the merged decision tree is applied. We used credit data of real credit card customers to carry out the experiment: five groups of test samples were extracted randomly to test twenty primitive and ten merged trees, for fifty comparison tests in total. The merged tree was at least as accurate as both primitive trees in 79.5% of the comparisons. This result supports our proposition that the merged-decision-tree method can achieve knowledge integration and accumulation.
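The pairwise merge described in the abstract can be sketched in code. The following is a minimal illustration under assumed data structures, not the authors' implementation: the `Node` class, its field names, and the exact handling of option links and pooled counts are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    test: str                       # attribute test at this node (or a leaf marker)
    label: Optional[str] = None     # class label if this node is a leaf
    count: int = 0                  # number of training samples reaching the node
    children: Dict[str, "Node"] = field(default_factory=dict)  # outcome -> subtree
    options: List["Node"] = field(default_factory=list)        # option links

def merge(a: Node, b: Node) -> Node:
    """Pairwise, top-down (preorder) merge of two decision trees into `a`."""
    if a.test == b.test:
        # Same test: pool the sample counts and recurse on matching branches.
        a.count += b.count
        for outcome, b_child in b.children.items():
            if outcome in a.children:
                a.children[outcome] = merge(a.children[outcome], b_child)
            else:
                # Branch absent from the first tree: graft it from the second.
                a.children[outcome] = b_child
        return a
    # Different tests under the same ancestor: keep the second tree's
    # knowledge by attaching its subtree via an option link.
    a.options.append(b)
    return a
```

For example, merging two trees that share the root test `"age<30"` pools their counts and grafts any branch only one of them has, while a subtree rooted at a different test is attached as an option link rather than discarded.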

Subject Classification

Basic and Applied Sciences > Information Science
Social Sciences > Management
References
  1. Bauer, E.,Kohavi, R.(1999).An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants.Machine Learning,36,105-139.
  2. Breiman, L.(1996).Bagging predictors.Machine Learning,24,123-140.
  3. Buntine, W.(1992).Learning classification trees.Statistics and Computing,2(2),63-73.
  4. Chan, P. K.,Stolfo, S. J.(1995).A comparative evaluation of voting and meta-learning on partitioned data.Proceedings of the 12th International Conference on Machine Learning (ICML-95).
  5. Chan, P. K.,Stolfo, S. J.(1995).Learning arbiter and combiner trees from partitioned data for scaling machine learning.Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95).
  6. Chen, K.,Wang, L.,Chi, H.(1997).Methods of Combining Multiple Classifiers with Different Features and Their Applications to Text-Independent Speaker Identification.International Journal of Pattern Recognition and Artificial Intelligence,11(3),417-445.
  7. Cormen, T. H.,Leiserson, C. E.,Rivest, R. L.,Stein, C.(2001).Introduction to Algorithms.
  8. Freund, Y.,Schapire, R. E.(1997).A decision-theoretic generalization of on-line learning and an application to boosting.Journal of Computer and System Sciences,55,119-139.
  9. Hall, L. O.,Chawla, N.,Bowyer, K. W.(1998).Combining decision trees learned in parallel.Working Notes of the KDD-97 Workshop on Distributed Data Mining.
  10. Holmes, G.,Kirkby, R.,Pfahringer, B.(2004).Mining data streams using option trees.Working paper, Department of Computer Science, University of Waikato.
  11. Kohavi, R.,Kunz, C.(1997).Option Decision Trees with Majority Votes.Machine Learning: Proceedings of the Fourteenth International Conference.
  12. Kohavi, R.,Quinlan, J. R.(2002).Decision-tree discovery.Handbook of Data Mining and Knowledge Discovery.
  13. Prodromidis, A. L.,Stolfo, S. J.(1998).Mining databases with different schemas: Integrating incompatible classifiers.Proc. KDD-98.
  14. Quinlan, J. R.(1996).Improved use of continuous attributes in C4.5.Journal of Artificial Intelligence Research,4,77-90.
  15. Quinlan, J. R.(1998).Mini-Boosting Decision Trees.AI Access Foundation and Morgan Kaufmann Publishers.
  16. Quinlan, J. R.(1996).Bagging, Boosting, and C4.5.Proceedings Thirteenth National Conference on Artificial Intelligence,725-730.
  17. Quinlan, J. R.(1993).C4.5: Programs for Machine Learning.
  18. Smyth, P.,Gray, A.,Fayyad, U.(1995).Retrofitting decision tree classifiers using kernel density estimation.Proceedings of the Twelfth International Conference on Machine Learning.
  19. Ting, K. M.,Low, B. T.(1997).Model combination in the multiple-data-batched scenario.Proc European Conference on Machine Learning, Prague, Czech Republic, LNAI-1224,250-265.
  20. Ting, K. M.,Low, B. T.(1996).Working Paper 96/19, Department of Computer Science, University of Waikato.
  21. Ting, K. M.,Witten, I. H.(1997).Stacking bagged and dagged models.Proc International Conference on Machine Learning, Tennessee.
  22. Todorovski, L.,Dzeroski, S.(1999).Experiments in meta-level learning with ILP.Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery.
  23. Todorovski, L.,Dzeroski, S.(2000).Combining Classifiers with Meta Decision Trees.Machine Learning Journal.
  24. Todorovski, L.,Dzeroski, S.(2000).Combining Multiple Models with Meta Decision Trees.Proceedings of the Fourth European Conference on Principles of Data Mining and Knowledge Discovery.
  25. Webb, G. I.(1996).Further experimental evidence against the utility of Occam's razor.Journal of Artificial Intelligence Research,4,397-417.
  26. Webb, G. I.(1999).Decision Tree Grafting From The All Tests But One Partition.Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI 99).
  27. Williams, G.(1996).Induction and Combining Multiple Decision Trees.
  28. Witten, I. H.,Frank, E.(2000).Data Mining: Practical Machine Learning Tools and Techniques with JAVA Implementations.
  29. Wolpert, D. H.(1992).Stacked generalization.Neural Networks,5,241-259.
  30. Zheng, Z.,Webb, G. I.(1998).Multiple Boosting: A Combination of Boosting and Bagging.Proceedings of the 1998 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 98).
  31. Zhou, Z. H.,Chen, Z. Q.(2002).Hybrid Decision Tree.Knowledge-Based Systems,15(8),515-528.
  32. Fang, W.(1994).Algorithms and Data Structures [in Chinese].Taipei:維科出版社.
  33. Ma, F.-T.(1995).A study of a learning-from-examples early-warning system for credit card credit risk [in Chinese].Proceedings of the 10th National Conference on Technological and Vocational Education,Business Category I,427-436.