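Title

Comparing the External Generalization Performance of Three Data Mining Algorithms for Predicting Five-Year Survival of Cervical Cancer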

Parallel Title

Predicting Cervical Cancer Survivability: A Comparison of Three Data Mining Methods

DOI

10.7023/TJFM.200712.0222

Authors

張語恬(Yu-Tieng Chang);朱基銘(Chi-Ming Chu);簡戊鑑(Wu-Chien Chien);周雨青(Yu-Ching Chou);楊燦(Tsang Yang);盧瑜芬(Yu-Fen Lu);白健佑(Chian-You Pai);白璐(Lu Pai);Thomas Wetter;孫建安(Chien-An Sun);羅慶徽(Ching-Hui Loh)

Keywords

cervical cancer survivability ; logistic regression ; artificial neural network ; decision tree ; AUC (area under the ROC curve)

Journal

台灣家庭醫學雜誌 (Taiwan Journal of Family Medicine)

Volume and Issue / Publication Date

Vol. 17, No. 4 (2007 / 12 / 01)

Pages

222 - 238

Language

Traditional Chinese

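Chinese Abstract

This study compares three data mining algorithms (artificial neural network, logistic regression, and decision tree), trained on samples from different diagnosis years, on their performance in predicting five-year survival of cervical cancer, and validates their external generalization. The study uses the Cancer Incidence Public-use Database (CIPUD), the cancer registry database within the Surveillance, Epidemiology, and End Results (SEER) data provided by the U.S. National Cancer Institute (NCI). From the years 1973 to 2000, 156,502 records with 72 variables were selected. After data cleaning, the 18 variables most relevant to predicting five-year cervical cancer survival were retained, together with 2,022 records with a diagnosis year of 1988-1996. The sample was divided by diagnosis year into 8 different sets of training and test samples, which were fed to the three algorithms, artificial neural network, decision tree, and logistic regression, to build models. AUC (area under the ROC curve) and accuracy were used to evaluate predictive ability and to identify model designs that yield good predictions. Results: in internal validation, the model with the best predictive power was artificial neural network model 1, with an AUC of 0.9392 and an accuracy of 0.9474. In external validation, artificial neural network model 7 had the best AUC, at 0.6455. In internal validation, the artificial neural network and the decision tree both outperformed logistic regression on AUC and accuracy. In external validation, the artificial neural network and logistic regression both outperformed the decision tree on AUC. Models built with the artificial neural network and logistic regression had better external generalization, while models built with the artificial neural network and the decision tree had better accuracy. To obtain better external validation results, the training sample can draw on data from the preceding 2-3 years or more.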

English Abstract

The purpose of the study was to compare the performance of an artificial neural network (ANN), a decision tree (C5), and logistic regression (LR) in predicting the 5-year survivability of cervical cancer, and to validate their external generalization. The data were drawn from SEER (Surveillance, Epidemiology, and End Results) of the NCI (National Cancer Institute) in the United States for the years 1973-2000, comprising 156,502 cases with 72 variables. After data cleaning, 2,022 cases diagnosed in 1988-1996 and 18 variables remained. The dataset was divided into 8 groups of training and test sets according to the year in which the patients were diagnosed. Each of the 8 training sets was applied to three algorithms, 1) ANN, 2) C5, and 3) LR, to build the models. Model performance in predicting the 5-year survivability of cervical cancer patients was measured by accuracy and AUC (area under the ROC curve). ANN had the best internal validation results (AUC, 0.9392; accuracy, 0.9474) on model 1 and the best external validation AUC (0.6455) on model 7. ANN and C5 outperformed LR in internal validation. ANN and LR both performed better than C5 on external validation AUC. Overall, ANN and LR generalized better externally, while ANN and C5 classified more accurately.
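The evaluation design described in the abstract, splitting cases by diagnosis year and scoring held-out years by AUC and accuracy, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Case` layout, the toy values, the 0.5 decision threshold, and the choice of training and test years are all hypothetical, and model fitting is omitted (`score` stands in for the predicted survival probability from any fitted model).

```python
from collections import namedtuple

# Hypothetical record layout: diagnosis year, a model's predicted survival
# probability, and the observed 5-year survival outcome (1 = survived).
Case = namedtuple("Case", ["year", "score", "survived"])


def temporal_split(cases, train_years, test_years):
    """Split cases by diagnosis year rather than at random."""
    train = [c for c in cases if c.year in train_years]
    test = [c for c in cases if c.year in test_years]
    return train, test


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the outcome."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is scored
    above a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
toy_cases = [
    Case(1988, 0.8, 1), Case(1988, 0.3, 0),
    Case(1989, 0.7, 1), Case(1989, 0.6, 0),
    Case(1990, 0.9, 1), Case(1990, 0.4, 1),
    Case(1990, 0.6, 0), Case(1990, 0.2, 0),
]

# Train on earlier diagnosis years, evaluate on a later year the model never saw.
train, test = temporal_split(toy_cases, {1988, 1989}, {1990})
test_labels = [c.survived for c in test]
test_scores = [c.score for c in test]
external_auc = auc(test_labels, test_scores)
external_accuracy = accuracy(test_labels, test_scores)
```

Scoring the training years themselves would correspond to internal validation, while scoring a later, unseen diagnosis year corresponds to the external validation the abstract reports.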

Subject Classification: Medicine and Health > Social Medicine
References
  1. 何子銘, 盧瑜芬, 許家瑋 (2006). A comparison of three data mining methods for predicting cervical cancer survival. 台灣家醫誌, 16, 192-203.
  2. World Health Organization
  3. Anonymity
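  4. Department of Health, Executive Yuan, National Health Statistics Information Network: cause-of-death statistics for the Taiwan area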
  5. Baxt WG(1995).Application of artificial neural networks to clinical medicine.Lancet,346,135-138.
  6. Baxt WG(1994).Complexity, chaos and human physiology: the justification for non-linear neural computational analysis.Cancer Lett,77,85-93.
  7. Delen D,Walker G,Kadam A(2005).Predicting breast cancer survivability: a comparison of three data mining methods.Artif Intell Med,34,113-127.
  8. Dreiseitl S,Ohno-Machado L,Kittler H,Vinterbo S,Billhardt H,Binder M(2001).A comparison of machine learning methods for the diagnosis of pigmented skin lesions.J Biomed Inform,34,28-36.
  9. Eftekhar B,Mohammad K,Ardebili HE,Ghodsi M,Ketabchi E(2005).Comparison of artificial neural network and logistic regression models for prediction of mortality in head trauma based on initial clinical data.BMC Med Inform Decis Mak,5,3.
  10. Matheny ME,Ohno-Machado L,Resnic FS(2005).Discrimination and calibration of mortality risk prediction models in interventional cardiology.J Biomed Inform,38,367-375.
  11. Snow PB,Kerr DJ,Brandt JM,Rodvold DM(2001).Neural network and regression predictions of 5-year survival after colon carcinoma treatment.Cancer,91,1673-1678.
  12. Terrin N,Schmid CH,Griffith JL,D'Agostino RB,Selker HP(2003).External validity of predictive models: a comparison of logistic regression, classification trees, and neural networks.J Clin Epidemiol,56,721-729.
  13. Tu JV(1996).Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes.J Clin Epidemiol,49,1225-1231.
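Cited by
  1. 蘇珉一, 蕭嘉瑩, 鄭婉如, 馬瑞菊, 林俊男, 林佩璇, 李佳欣 (2020). Applying a decision tree algorithm to mortality prediction in critically ill patients with liver cirrhosis. Journal of Data Analysis, 15(4), 1-14.
  2. 鄭博文, 游雅雯, 梁玉芬, 林宏茂 (2012). Using data mining techniques to predict risk factors for colorectal polyps in health examinations. 醫務管理期刊, 13(3), 162-178.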