Title

利用資料探勘技術建立疾病危險因子分析模式-以糖尿病腎病變透析治療為例

Parallel Title

Applying data mining techniques for constructing disease risk factor analysis model-Case study for dialysis treatment of diabetic nephropathy

DOI

10.6338/JDA.201610_11(5).0005

Authors

李天行(Tian-Shyug Lee);劉程凱(Chen-Kai Liu);劉文勝(Wen-Sheng Liu);呂奇傑(Chi-Jie Lu)

Keywords

data mining ; class imbalance ; disease risk factor ; diabetic nephropathy ; dialysis treatment

Journal

Journal of Data Analysis

Volume and Issue / Publication Date

Vol. 11, No. 5 (2016/10/01)

Pages

53 - 76

Language

Traditional Chinese

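Abstract (translated from Chinese)

Dialysis treatment has become a heavy burden on National Health Insurance, and nephropathy is the main factor determining whether diabetic patients enter the dialysis stage. This study applies data mining techniques to the National Health Insurance database to investigate the disease risk factors for diabetic patients without nephropathy developing nephropathy and entering the dialysis stage within the next three years (that is, dialysis treatment of diabetic nephropathy). The study is a retrospective cohort study based on the National Health Insurance database, and it builds a disease risk factor analysis model using under-sampling based on clustering (SBC), classification and regression tree (CART), and support vector machine (SVM). The results show that patients with the risk factors selected by the model, namely "diabetes duration of five years or more", "proliferative retinopathy", and "vitreous hemorrhage", have a significantly higher incidence and odds ratio of entering the dialysis stage within three years. The proposed model therefore makes effective use of data mining techniques, reduces the effect of class imbalance in the data, and identifies effective disease risk factors, so that the relevant agencies can strengthen health management for groups at high risk of dialysis treatment and reduce the burden on health insurance.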
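The abstract compares groups by incidence and odds ratio. As a minimal sketch of how those two quantities are usually computed from a 2x2 table, the Python below uses the standard odds ratio with a Wald confidence interval. The function names and all counts are hypothetical and are not taken from the study, and the abstract does not say which interval or test the authors used.

```python
from math import exp, log, sqrt


def incidence(events, total):
    """Proportion of patients in a group who started dialysis."""
    return events / total


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: factor present, dialysis      b: factor present, no dialysis
    c: factor absent, dialysis       d: factor absent, no dialysis
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR); assumes every cell count is non-zero.
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts for illustration only, NOT results from the study.
a, b, c, d = 30, 970, 50, 8950
print(incidence(a, a + b), incidence(c, c + d))
print(odds_ratio(a, b, c, d))
```

An interval whose lower bound is above 1 is the usual informal reading of a "significantly higher" odds ratio at the chosen confidence level.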

English Abstract

Dialysis treatment has become a heavy burden on national health insurance. Nephropathy is the major factor determining whether diabetic patients need dialysis treatment. This study applies data mining techniques to the national health insurance database to identify disease risk factors for diabetic patients without nephropathy who develop nephropathy and start dialysis treatment within the next three years. The proposed disease risk factor analysis model combines three data mining techniques: under-sampling based on clustering (SBC), classification and regression tree (CART), and support vector machine (SVM). Experimental results show that the proposed model selects three important risk factors: "diabetes of over 5 years' duration", "proliferative diabetic retinopathy", and "vitreous hemorrhage". Diabetic patients with these three risk factors have a higher incidence of dialysis than those without them. The proposed model also handles the class imbalance problem and can be used to accurately identify important disease risk factors and the corresponding high-risk groups.
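The abstract names under-sampling based on clustering (SBC) as the step that handles class imbalance but gives no details. The sketch below shows one common formulation of cluster-based under-sampling, assumed here for illustration: after all samples have been clustered, each cluster contributes majority-class samples in proportion to its majority-to-minority ratio. The function name `sbc_allocation`, the `ratio` parameter, and the toy data are hypothetical, and the clustering step itself is assumed to have been done already.

```python
from collections import Counter


def sbc_allocation(cluster_ids, labels, ratio=1.0, minority=1):
    """Return {cluster: number of majority samples to keep}.

    cluster_ids: cluster assignment of each sample (from any clustering step)
    labels:      class label of each sample
    ratio:       desired majority-to-minority ratio after under-sampling
    """
    majority_count = Counter()
    minority_count = Counter()
    for cluster, label in zip(cluster_ids, labels):
        if label == minority:
            minority_count[cluster] += 1
        else:
            majority_count[cluster] += 1

    # Majority-to-minority ratio per cluster; a cluster with no minority
    # samples is treated as having one, to avoid division by zero.
    weights = {c: n / max(minority_count[c], 1) for c, n in majority_count.items()}
    total_weight = sum(weights.values())
    target = ratio * sum(minority_count.values())

    # Never ask for more majority samples than a cluster actually holds.
    return {
        c: min(majority_count[c], round(target * w / total_weight))
        for c, w in weights.items()
    }


# Toy data (hypothetical): three clusters, label 1 is the minority class.
cluster_ids = [0] * 10 + [1] * 7 + [2] * 10
labels = [0] * 8 + [1] * 2 + [0] * 6 + [1] * 1 + [0] * 10
print(sbc_allocation(cluster_ids, labels, ratio=2))
```

The returned counts would then drive random selection of majority samples within each cluster, and the reduced set would be passed to the classifiers (CART and SVM in the abstract).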

Subject Classification Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics
Social Sciences > Management