Title

以資料探勘技術進行糖尿病與乳癌關聯性分析之研究

Parallel title

Constructing a hybrid data mining scheme for analyzing the relationship between diabetes and breast cancer

DOI

10.6338/JDA.201610_11(5).0006

Authors

李天行(Tian-Shyug Lee);黃婷萱(Ting-Xuan Huang);呂奇傑(Chi-Jie Lu)

Keywords

資料探勘 (data mining); 糖尿病 (diabetes mellitus); 疾病危險因子 (disease risk factor); 乳癌 (breast cancer)

Journal

Journal of Data Analysis

Volume/Issue (publication date)

Vol. 11, No. 5 (2016/10/01)

Pages

77 - 96

Language

Traditional Chinese

Abstract (translated from the Chinese)

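Diabetes is a chronic disease that remains difficult to cure, and the number of deaths caused by its complications rises year by year. In recent years the association between diabetes and cancer has been widely discussed in academia. Because breast cancer has the highest incidence of any cancer among women in Taiwan, this study uses data mining techniques to construct a disease risk factor analysis scheme for analyzing the association between diabetes and breast cancer, with the aim of identifying the diabetes complications most associated with breast cancer in female diabetic patients. The study uses a retrospective cohort design. The subjects are female diabetic patients in the National Health Insurance database from 2005 to 2012, and the analysis targets the disease risk factors for their developing breast cancer within the following two years. In the analysis scheme, under sampling based on clustering (SBC) is first used to handle the class imbalance present in the health insurance database; diabetes complications are then taken as the predictor variables; finally, classification and regression trees (CART) are used to build a model predicting breast cancer in female diabetic patients and thereby identify important disease risk factors. The results show that female diabetic patients with "polyneuropathy in diabetes" or "diabetes with peripheral circulatory disorders" have a significantly higher odds ratio for developing breast cancer, indicating that these two diabetes complications are important disease risk factors more closely associated with breast cancer. The proposed scheme draws on the strengths of data mining to identify important disease risk factors associated with breast cancer in female diabetic patients, providing useful information for medical reference.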
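The odds-ratio comparison reported in this abstract can be illustrated with a minimal sketch. The record gives no patient counts, so the 2x2 table below is purely hypothetical, and the function name `odds_ratio` and the Wald-type 95% interval are assumptions made for illustration, not the authors' procedure.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Return (odds ratio, lower, upper) with a 95% Wald interval for a 2x2 table.

    "Exposed" here stands for patients with a given diabetes complication and
    "cases" for patients who later develop breast cancer (illustrative labels).
    """
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts, NOT taken from the study.
or_value, ci_low, ci_high = odds_ratio(30, 970, 20, 1980)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that lies entirely above 1 would be read as a significantly higher odds of breast cancer for patients with the complication.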

English abstract

Diabetes is a chronic disease that current medical technology cannot cure, and deaths caused by its complications increase year by year. Breast cancer incurs large medical expenses and burdens the National Health Insurance. The relationship between diabetes and breast cancer has attracted considerable attention in recent years. Among all cancers, breast cancer has the highest incidence in Taiwanese women. The purpose of this study is therefore to apply data mining techniques to propose a disease risk factor analysis scheme for analyzing the relationship between diabetes and breast cancer. The proposed scheme combines under sampling based on clustering (SBC), which is used to deal with the class imbalance problem, and classification and regression trees (CART), which are used to build the classification model and select important risk factors. The data, on diabetic patients initially without breast cancer and their breast cancer status over the following two years, were collected from the National Health Insurance Research Database of Taiwan. Experimental results showed that the proposed scheme identifies "diabetic neuropathy" and "diabetes mellitus with peripheral circulatory disorder" as important risk factors. Female diabetic patients with these risk factors have a higher incidence of breast cancer than those without them. The results of this paper provide an effective and appropriate disease prediction model for finding important disease risk factors and recognizing female diabetic patients who may develop breast cancer.
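The under-sampling step described in the abstract can be sketched as follows. This is a minimal illustration that assumes the records have already been clustered. It also assumes one common formulation of SBC, in which each cluster contributes majority-class samples in proportion to its majority-to-minority ratio; the abstract does not state the exact allocation rule. The names `sbc_allocation` and `undersample`, the parameter `m` (target majority-to-minority ratio), and all counts are hypothetical.

```python
import random


def sbc_allocation(cluster_sizes, m=1.0):
    """Number of majority samples to keep per cluster.

    cluster_sizes: list of (n_majority, n_minority) tuples, one per cluster.
    m: desired ratio of majority to minority samples after under-sampling.
    """
    total_minority = sum(mi for _, mi in cluster_sizes)
    # Majority-to-minority ratio per cluster; a cluster with no minority
    # samples is treated as having one (an assumption of this sketch).
    ratios = [ma / max(mi, 1) for ma, mi in cluster_sizes]
    total_ratio = sum(ratios)
    if total_ratio == 0:
        return [0 for _ in cluster_sizes]
    target = m * total_minority
    # Never request more samples than the cluster actually holds.
    return [min(ma, round(target * r / total_ratio))
            for (ma, _), r in zip(cluster_sizes, ratios)]


def undersample(majority_by_cluster, cluster_sizes, m=1.0, seed=0):
    """Randomly draw the allocated number of majority samples from each cluster."""
    rng = random.Random(seed)
    kept = []
    for members, k in zip(majority_by_cluster, sbc_allocation(cluster_sizes, m)):
        kept.extend(rng.sample(members, k))
    return kept


# Hypothetical example: three clusters of patient record IDs (not study data).
sizes = [(500, 10), (300, 50), (200, 40)]
majority_ids = [list(range(0, 500)), list(range(500, 800)), list(range(800, 1000))]
allocation = sbc_allocation(sizes)
kept_ids = undersample(majority_ids, sizes)
print(allocation, len(kept_ids))
```

In a full pipeline the retained majority records would be combined with all minority records and passed to a CART-style decision tree learner; that step is left out here because it depends on the tree library chosen.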

Subject classification: Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics
Social Sciences > Management