Title

應用資料探勘技術建立組合型乳癌預測模式

Parallel Title

Application of data mining techniques to establish a combined predictive model of breast cancer

DOI

10.6202/THJ.202212_(18).0002

Authors

林榮禾(Rong-Ho Lin);林敬順(Ching-Shun Lin);張雅婷(Ya-Ting Chang)

Keywords

乳癌診斷 ; 資料探勘 ; 診斷模型 ; 組合模型 ; breast cancer ; data mining ; diagnosis model ; combined model

Journal

慈惠學報

Issue / Publication Date

Issue 18 (2022/12/01)

Pages

17 - 31

Language

Traditional Chinese; English

Chinese Abstract

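According to the International Agency for Research on Cancer, breast cancer is a cancer with a high incidence among women, so determining a breast cancer diagnosis is an important research topic. Cancer staging classifies the condition of a malignant tumor (cancer), and the type and stage of the disease strongly affect a patient's future survival rate. This study aims to apply data mining methods, combined with computer processing, to build a majority-vote prediction model for breast cancer that improves the accuracy of diagnostic decisions within the time available to the physician. Data obtained from the different diagnostic stages of breast cancer and from treatment guidelines are analyzed, and the model supplies information that physicians can use when judging stage and treatment in clinical practice, as a reference for early diagnosis and treatment. The data come from the Wisconsin Diagnostic Breast Cancer (WDBC) case collection. The study applies majority classification using the Back Propagation Network (BPN), the C5.0 algorithm in Decision Trees (DT), Support Vector Machines (SVM), Logistic Regression (LR), and Case Based Reasoning (CBR), combines their predictive diagnosis models by voting, and evaluates the most suitable auxiliary system for a majority-vote breast cancer model as a reference for building prediction models.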

English Abstract

Breast cancer is a prevalent form of cancer among women, and its accurate diagnosis remains a challenge in clinical settings. This study aims to improve the accuracy of a breast cancer pathologic prediction model to provide better guidance for doctors in determining the stages and treatments of the disease. The Wisconsin Diagnostic Breast Cancer (WDBC) data used in this study contain features computed from a digitized image of a fine needle aspirate of a breast mass, which describe the characteristics of the cell nuclei present in the image. Several data mining methods were used for majority-vote classification: the Back Propagation Network (BPN), the C5.0 algorithm in Decision Trees (DT), Support Vector Machines (SVM), Logistic Regression (LR), and Case Based Reasoning (CBR). This study also implemented a Combined Voting system as an auxiliary evaluation tool to determine the most appropriate breast cancer pathologic diagnosis model.
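
The combined voting step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting scheme, so this assumes simple unweighted (hard) majority voting over the five base classifiers. The function names `majority_vote` and `combine` are hypothetical, and the per-model predictions are made-up placeholders, not results from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers for one case.

    Ties go to the tied label that appears first in `predictions`.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def combine(per_model_predictions):
    """Combine prediction lists (one list per model) case by case."""
    return [majority_vote(case) for case in zip(*per_model_predictions)]


# Hypothetical outputs of five base classifiers on four cases
# (M = malignant, B = benign); illustrative values only.
model_outputs = {
    "BPN":  ["M", "B", "B", "M"],
    "C5.0": ["M", "B", "M", "B"],
    "SVM":  ["M", "M", "B", "B"],
    "LR":   ["B", "B", "B", "M"],
    "CBR":  ["M", "B", "M", "M"],
}

combined = combine(list(model_outputs.values()))
print(combined)
```

With an odd number of voters and two classes a tie cannot occur, which is one practical reason to combine five models rather than four.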

Subject Classification

Medicine and Health > Preventive Health Care and Hygiene
Medicine and Health > Social Medicine