Title

資料採礦應用於乳癌患者之遺傳基因及生活因素探討

Parallel Title

Application of Data Mining on Hereditary Genes and Behavioral Factors of Breast Cancer Patients

DOI

10.6338/JDA.200906_4(3).0008

Authors

侯藹玲(Ai-Ling Hour);朱國豪(Kuo-Hao Chu);蘇志雄(Chih-Hsiung Su)

Keywords

Gene ; Microarray ; Breast Cancer ; Family Heredity ; T-test ; Discriminant Analysis

Journal

Journal of Data Analysis

Volume and Issue / Publication Date

Vol. 4, No. 3 (2009 / 06 / 01)

Pages

131 - 158

Language

Traditional Chinese

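Chinese Abstract (translated)

Modern medical techniques can usually diagnose and treat a disease only after its onset. As a result, by the time most patients are diagnosed, their condition is already severe and difficult to treat, and the chance of a cure is correspondingly low. Biochips combined with molecular medical imaging can reveal the physiological pathways of cells and the causes of disease, and allow gene expression in normal or tumor cells to be visualized through molecular imaging. In addition, DNA chip technology (microarray) makes it possible to screen large numbers of candidate genes quickly and thus to diagnose disease-related changes at the molecular level, offering an entirely different kind of medical care. According to the latest cancer statistics, breast cancer has become the most common of the ten leading cancers among women in the country. Studies have found that risk factors for breast cancer include family history, age, smoking, and alcohol consumption. Among these, family history is the most significant factor: people with a family history have roughly 3 times the relative risk of breast cancer of those without. The aim of this study is to use the characteristic data of breast cancer patients, retrieving the expression levels of 54,675 genes per patient from the NCBI database, and to run t-tests comparing differences by breast cancer, family heredity, smoking, alcohol consumption, and metastasis. This identifies 59 candidate genes most significantly associated with breast cancer arising from heredity. These genes are used to build 31 discriminant models for the presence or absence of a family history. The overall predictive ability of the 31 discriminant analysis models lies between about 50% and 60%, and when the test data are entered into the models, the classification matrix shows an accuracy of about 60%. In future, at the molecular stage when the disease is just emerging, the expression levels of a patient's 59 main genes affecting hereditary breast cancer can be entered into the model to determine whether the patient's breast cancer is attributable to family history. Treatment can then begin early, giving breast cancer patients more complete medical care.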

English Abstract

With modern medical technology, diagnosis and treatment are usually possible only after the onset of disease. Consequently, most diagnosed cancers require further tests to determine their stage, and by the time the stage is known the disease is usually advanced and the cure rate is lower. Biochips combined with molecular medical imaging can reveal not only the physiological pathways of cells and the causes of disease, but also the gene expression of normal or tumor cells. Moreover, DNA chip technology (microarray) can screen a large number of candidate genes quickly and then diagnose lesions at the molecular level, providing a completely different kind of medical care. According to the latest statistics, breast cancer is the most common cancer among women. Risk factors for breast cancer include family history, age, smoking, and drinking, of which family history is the most significant. The relative risk of breast cancer for people with a family history is about 3 times that for people without one. The purpose of this study is to use the characteristics of breast cancer patients recorded in the NCBI database. Using the expression levels of 54,675 genes per patient, t-tests are performed to compare differences by breast cancer, family heredity, smoking, drinking, and metastasis. The tests identify 59 candidate genes that are significantly related to hereditary breast cancer. These genes are used to build 31 discriminant models for the presence or absence of a family history. The overall predictive ability of the 31 models is about 50-60%. When the testing data are entered into the discriminant models, the classification matrix shows a correct rate of about 60%. At the earliest molecular stage of the disease, the expression levels of a patient's 59 genes affecting breast cancer can be entered into the model to determine whether the patient developed breast cancer because of family history, so that an appropriate treatment plan can be developed early.
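
The workflow summarized in the abstract (rank genes by a two-sample t-test, keep the top candidates, fit a discriminant rule, and read accuracy from a classification matrix on test data) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. It assumes synthetic expression values instead of the NCBI data, uses Welch's t statistic for ranking, and swaps in a simplified diagonal linear discriminant rule (equal priors, pooled per-gene variances) for the discriminant analysis used in the paper. All function names, sample sizes, and gene counts here are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def select_genes(X, y, k):
    """Rank genes by |t| between the two classes; return the top-k column indices."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), g))
    scores.sort(reverse=True)
    return sorted(g for _, g in scores[:k])


def fit_diagonal_lda(X, y, genes):
    """Estimate per-class means and pooled per-gene variances (diagonal covariance)."""
    model = {"genes": genes, "means": {}, "var": []}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        model["means"][c] = [mean(r[g] for r in rows) for g in genes]
    for j, g in enumerate(genes):
        ss = sum((row[g] - model["means"][label][j]) ** 2 for row, label in zip(X, y))
        model["var"].append(ss / (len(X) - 2))
    return model


def predict(model, row):
    """Assign the class whose mean is nearest in variance-scaled squared distance."""
    def dist(c):
        return sum((row[g] - m) ** 2 / v
                   for g, m, v in zip(model["genes"], model["means"][c], model["var"]))
    return min((0, 1), key=dist)


def classification_matrix(model, X, y):
    """Return 2x2 counts indexed as matrix[true_label][predicted_label]."""
    matrix = [[0, 0], [0, 0]]
    for row, label in zip(X, y):
        matrix[label][predict(model, row)] += 1
    return matrix


def make_data(n, n_genes, informative, shift, rng):
    """Synthetic expression values; genes in `informative` are shifted for class 1."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels: 1 = family history, 0 = none (hypothetical)
        X.append([rng.gauss(shift if (label and g in informative) else 0.0, 1.0)
                  for g in range(n_genes)])
        y.append(label)
    return X, y


rng = random.Random(0)
informative = {0, 1, 2}
X_train, y_train = make_data(60, 50, informative, 2.0, rng)
X_test, y_test = make_data(40, 50, informative, 2.0, rng)

genes = select_genes(X_train, y_train, k=3)
model = fit_diagonal_lda(X_train, y_train, genes)
matrix = classification_matrix(model, X_test, y_test)
accuracy = (matrix[0][0] + matrix[1][1]) / len(y_test)
print("selected genes:", genes)
print("classification matrix:", matrix)
print("test accuracy:", accuracy)
```

With tens of thousands of genes tested at once, as in the study, a multiple-testing correction such as the Benjamini-Hochberg procedure listed in the references would normally be applied to the p-values before selecting candidates; the sketch ranks by |t| only to stay short.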

Subject Classification: Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics
Social Sciences > Management
References
  1. Agrawal, R., Srikant, R. (1994). Fast Algorithm for Mining Association Rules. Proceedings of the 20th International Conference on Very Large Databases.
  2. Benjamini, Y., Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, 57(1), 289-300.
  3. Calle, E. E., Martin, L. M., Thun, M. J., Miracle, H. L., Heath, C. W. (1993). Family history, age, and risk of fatal breast cancer. American Journal of Epidemiology, 138(9), 675-681.
  4. Compagni, A., Christofori, G. (2000). Recent advances in research on multistage tumorigenesis. British Journal of Cancer, 83, 1-5.
  5. Dupont, W. D., Page, D. L. (1991). Menopausal estrogen replacement therapy and breast cancer. Archives of Internal Medicine, 151(1), 62-72.
  6. Henderson, I. C. (1993). Risk factors breast cancer development. Cancer supplement, 71, 2127-2140.
  7. Parazzini, F., Vecchia, C. L., Francesch, S., Bocciolone, L. (1992). Menstrual and reproductive factors and breast cancer in women with family history of the disease. International Journal of Cancer, 51, 677-681.
  8. Rossing, M. A., Stanford, J. L., Weiss, N. S., Habel, L. A. (1996). Oral contraceptive use and risk of breast cancer in middle-aged women. American Journal of Epidemiology, 144(2), 161-164.
  9. Sattin, R. W., Rubin, G. L., Webster, L. A., Huezo, C. M., Wingo, P. A., Ory, H. W., Layde, P. M. (1985). Family history and the risk of breast cancer. JAMA, 253, 1908-1913.
  10. White, E., Malon, K. E., Weiss, N. S., Daling, J. R. (1994). Breast cancer among young U.S. women in relation to oral contraceptive use. Journal of National Cancer Institute, 86(6), 505-514.
  11. Zheng, T., Holford, T. R., Mayne, S. T., Owens, P. H., Zhang, Y., Zhang, B., Boyle, P., Zahm, S. H. (2001). Lactation and breast cancer risk: a case-control study in Connecticut. British Journal of Cancer, 84(11), 1472-1476.
  12. 吳漢銘(2008)。Microarray Analysis。中央研究院生命科學圖書館。
  13. 林真真(2007)。統計分析與使用手冊(使用R軟體)。文魁資訊。
  14. 耿直、鄔宏潘、謝邦昌、趙雅婷、蘇志雄(2002)。生物醫學統計學。鼎茂圖書。
  15. 張堯庭、朱世武、謝邦昌(2001)。資料採礦入門及應用─從統計技術看資料採礦。中國統計出版社。
  16. 陳順宇(2005)。多變量分析。華泰文化。
  17. 戴政、江淑瓊(2000)。生物醫學統計概論。瀚盧圖書。
  18. 謝邦昌(1999)。STATISTICA基本使用手冊。曉園出版社。
  19. 謝邦昌、鄭宇庭、蘇志雄、郭良芬(2007)。Excel在資料採礦上之應用。中華資料採礦協會。