Title

應用整合式多階段分類模式於肝臟疾病的患者預測之研究

Parallel Title

APPLICATION OF INTEGRATED MULTI-STAGE CLASSIFICATION MODEL TO THE PREDICTION OF LIVER DISEASE PATIENTS

DOI

10.6338/JDA.202210_17(3).0001

Authors

李岳樺(Yueh-Hua Lee);沈湘莉(Hsiang-Li Shen);呂奇傑(Chi-Jie Lu);周茂振(Mao-Jhen Jhou)

Keywords

liver disease ; feature selection ; data imbalance ; integrated prediction model

Journal

Journal of Data Analysis

Volume/Issue and Publication Date

Vol. 17, No. 3 (2022/10/01)

Pages

1 - 16

Language

Traditional Chinese

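Abstract (translated from Chinese)

Liver disease is a complex disease with many causes. Because its symptoms are subtle, it is difficult to diagnose at an early stage. Fat accumulation in liver cells is associated with major chronic diseases such as metabolic syndrome, cardiovascular disease, and type 2 diabetes. In the extensive literature on liver disease prediction, machine learning techniques have been widely used to build prediction models. However, many risk factors affect liver disease, and the data exhibit a class imbalance problem. To build an effective prediction model, this study uses the Indian liver disease dataset as empirical data to construct an integrated prediction framework. The framework combines four machine learning classifiers: logistic regression (LR), support vector machine (SVM), multivariate adaptive regression splines (MARS), and eXtreme Gradient Boosting (XGBoost); two feature selection techniques: an embedded method (Lasso) and a filter method (Filter); and two techniques for handling data imbalance: oversampling (Over) and synthetic data generation (SDG). The results of the proposed integrated models are compared with those of the plain models. The empirical results show that, regardless of the data split ratio, the proposed integrated prediction models outperform the plain prediction models. The best model further indicates that applying feature selection after the data imbalance technique effectively improves prediction performance, and that the proposed approach can effectively build a prediction model for liver disease.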

English Abstract

Liver disease is a complex disease with many causes. Because the symptoms of liver patients are subtle, it is difficult to diagnose at an early stage. The accumulation of fat in liver cells may be related to important chronic diseases such as metabolic syndrome, cardiovascular disease, and type 2 diabetes. In the literature on liver disease prediction, machine learning techniques have been widely used to construct liver disease prediction models. However, many risk factors affect liver disease, and the data have a class imbalance problem. To construct an effective prediction model, this study uses the Indian liver disease data as empirical data to build an integrated prediction framework. The framework applies four machine learning classification techniques: logistic regression (LR), support vector machine (SVM), multivariate adaptive regression splines (MARS), and eXtreme Gradient Boosting (XGBoost); two feature selection techniques: the embedded method (Lasso) and the filter method (Filter); and two techniques for handling data imbalance: the oversampling method (Over) and the synthetic data generation method (SDG). The results of the proposed integrated model are compared with those of the simple model. The empirical results show that, regardless of the data split ratio, the proposed integrated prediction model predicts better than the simple prediction model. The best model also shows that performing feature selection after the data imbalance technique can effectively improve prediction performance, and that the proposed model can effectively construct a prediction model for liver disease.
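The ordering highlighted in the abstract, handling class imbalance first and selecting features second, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify how Over, SDG, Lasso, or Filter were configured, so the sketch substitutes plain random oversampling and a simple mean-gap filter score, and the toy data, function names, and parameters are hypothetical rather than taken from the study.

```python
import random
from statistics import mean, pstdev


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def filter_select(X, y, k):
    """Illustrative filter score: gap between class means (labels 0 and 1) scaled by the feature's spread."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        spread = pstdev(col) or 1.0  # avoid dividing by zero for constant features
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: four majority rows (label 0) and two minority rows (label 1).
X = [[1.0, 5.0], [1.2, 4.0], [0.9, 6.0], [1.1, 5.0], [3.0, 5.0], [3.2, 4.5]]
y = [0, 0, 0, 0, 1, 1]

# Step 1: balance the classes. Step 2: select features on the balanced data.
X_bal, y_bal = random_oversample(X, y)
selected = filter_select(X_bal, y_bal, k=1)
X_reduced = [[row[j] for j in selected] for row in X_bal]
```

The reduced, balanced data would then be passed to a classifier such as LR, SVM, MARS, or XGBoost. In a real evaluation, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the test set.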

Subject Classification Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics
Social Sciences > Management