Title

由醫療資料庫發掘有意義之模糊關聯規則

Parallel title

Finding Relevant Fuzzy Association Rules from Medical Databases

DOI

10.6382/JIM.200504.0025

Author

Nan-Chen Hsieh (謝楠楨)

Keywords

Data mining; cluster partitioning; self-organizing map (SOM); fuzzy association rule; fuzzy resemblance relation; truth value

Journal

Journal of Information Management (資訊管理學報)

Volume and issue / publication date

Vol. 12, No. 2 (2005/04/01)

Pages

25-51

Language

Traditional Chinese

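Chinese abstract (English translation)

This study proposes a four-stage procedure for mining medical databases. It addresses problems common in existing association rule mining research: discovered rules with unclear semantics, redundant rules, and meaningful rules that are lost because of the limitations of the traditional support/confidence mechanism. To make the discovered rules semantically clear, we first use a cluster partitioning technique to automatically convert the quantitative attributes of the data table into fuzzy sets expressed as linguistic terms. Next, self-organizing map (SOM) cluster analysis, guided by the relatively important attributes identified through sensitivity analysis and by the characteristics of the data itself, divides the data into several clusters whose members share similar features, and association rule analysis is performed on each cluster. An algorithm based on the concept of the fuzzy resemblance relation then merges redundant rules with similar semantics. Merging effectively reduces the number of discovered rules, and the retained rules are more informative and easier to interpret and apply in the medical domain. To judge the credibility of the rules, we apply the truth value evaluation method from fuzzy databases and retain the rules with higher truth degrees. Finally, we validate the proposed approach on a real disease medical database.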

English abstract

In data mining applications, association rules can be used to support decision making. However, association rule algorithms usually yield a large number of rules, many of which contain redundant or irrelevant information or describe trivial knowledge. In this paper we present a four-stage data mining process for finding relevant fuzzy association rules in medical databases. Fuzzy association rules are especially suitable for medical data mining, since they are simple, linguistically interpretable rules and do not have the drawbacks of symbolic or crisp association rules. In the first stage, a cluster partitioning technique was used to automatically transform quantitative values into fuzzy linguistic terms. The linguistic terms were modeled as fuzzy sets defined on the appropriate attribute domains. Next, a Kohonen self-organizing map (SOM) was used to identify clusters based on shared feature attribute values. The resulting clusters were then classified by feature attributes determined using an Apriori association rule algorithm. Because the association rule algorithm tended to generate large numbers of rules, we present interactive strategies for pruning redundant association rules on the basis of a fuzzy resemblance relation to improve their readability, and we evaluate the truth degree of the discovered fuzzy association rules with a truth evaluation mechanism. Finally, we demonstrate our approach on a real disease medical database.
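
The idea of turning quantitative values into fuzzy linguistic terms and then measuring a fuzzy rule can be illustrated with a minimal sketch. It is not the paper's algorithm: the triangular membership functions, the term boundaries, the attribute names (`age`, `bp`), the three patient records, and the use of `min` as the conjunction are all assumptions chosen for illustration, and the paper derives its fuzzy sets by cluster partitioning rather than by hand.

```python
from typing import Dict, List, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(attr: str, value: float, terms: Dict[str, Triangle]) -> Dict[str, float]:
    """Map one quantitative value to degrees for items such as 'age=old'."""
    return {f"{attr}={name}": triangular(value, *abc) for name, abc in terms.items()}


def fuzzy_support(records: List[Dict[str, float]], items: List[str]) -> float:
    """Mean over records of the min membership across the given items."""
    if not records:
        return 0.0
    return sum(min(r.get(i, 0.0) for i in items) for r in records) / len(records)


def fuzzy_confidence(records: List[Dict[str, float]],
                     antecedent: List[str], consequent: List[str]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = fuzzy_support(records, antecedent)
    if base == 0.0:
        return 0.0
    return fuzzy_support(records, antecedent + consequent) / base


# Hypothetical linguistic terms; in the paper these come from cluster partitioning.
AGE_TERMS = {"young": (0, 20, 40), "middle": (30, 50, 70), "old": (60, 80, 100)}
BP_TERMS = {"normal": (80, 110, 140), "high": (120, 160, 200)}

# Hypothetical patients as (age, systolic blood pressure).
patients = [(70, 130), (80, 160), (50, 110)]
records = [{**fuzzify("age", age, AGE_TERMS), **fuzzify("bp", bp, BP_TERMS)}
           for age, bp in patients]

print(fuzzy_support(records, ["age=old"]))
print(fuzzy_confidence(records, ["age=old"], ["bp=high"]))
```

Under these assumptions, a rule such as "age is old implies blood pressure is high" receives graded support and confidence instead of the all-or-nothing counts of crisp rules, which is what makes the linguistic form readable to medical users.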

Subject classification

Basic and Applied Sciences > Information Science
Social Sciences > Management
Cited by
  1. Lin, Jan-Yan, Lin, Angel, Hu, Yi-Chung (2013). Analyzing Investment Regions in Mainland China for Taiwanese Firms by Association Rule Mining. Asia Pacific Management Review, 18(2), 143-160.
  2. 胡宜中, 林震岩, 林雅惠 (2011). Using association rules and sequential patterns to examine the relationships among and migration between investment regions: The case of the printed circuit board industry. 明新學報, 37(1), 217-230.