Title

Big Data Analytics for the Association between the Ambient Air Pollution and Incidence of Cardiovascular Diseases Hospitalization

DOI

10.29428/9789860544169.201801.0174

Authors

Chien-Lung Chan;Jyun-Yun Lu;Chiung-Yi Wu;Ren-Hao Pan

Keywords
Journal Name

NCS 2017 National Computer Symposium (全國計算機會議)

Volume/Issue and Publication Date

2017 (2018/01/01)

Pages

930 - 935

Content Language

English

Abstract

Big data analytics was used to investigate whether ambient air pollution is associated with an increased risk of cardiovascular disease hospitalization. The data sources were the National Health Insurance Research Database (NHIRD), the Environmental Protection Department's "Air Quality Monitoring Data", and the Taiwan Typhoon and Flood Research Institute's "Atmospheric Research Database". We constructed predictive models of cardiovascular disease hospitalization using four data mining methods on the Hadoop distributed data processing platform. A time-stratified case-crossover design was used to assess the association between hospitalization and the level of air pollutant exposure preceding each acute cardiovascular disease hospitalization event. PM₁₀, O₃, and CO turned out to be the most significant predictive factors for cardiovascular disease hospitalization. Furthermore, we compared the four prediction models: Random Forest, Support Vector Machine, Decision Tree, and Logistic Regression. Random Forest had the best AUC on monthly-adjusted data. Its accuracy reached 88%, which was 1.7 times that of traditional Logistic Regression, 11% higher than Decision Tree, and 4% higher than Support Vector Machine.
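
The abstract names a time-stratified case-crossover design but gives no detail on how control periods were chosen. As a rough illustration only, the sketch below uses the common convention of taking referent (control) days on the same weekday within the same calendar month and year as the hospitalization day. The function name `referent_days` and this stratum definition are assumptions for illustration, not details taken from the paper.

```python
from datetime import date, timedelta


def referent_days(event_day: date) -> list[date]:
    """Return control days for one hospitalization event.

    Assumed stratum (not confirmed by the source): same weekday,
    same calendar month and year as the event day, excluding the
    event day itself.
    """
    # Step back one week at a time to the first matching weekday in the month.
    first = event_day
    while (first - timedelta(days=7)).month == event_day.month:
        first -= timedelta(days=7)

    # Step forward one week at a time, collecting every matching weekday
    # in the month except the event day.
    days = []
    current = first
    while current.month == event_day.month:
        if current != event_day:
            days.append(current)
        current += timedelta(days=7)
    return days


if __name__ == "__main__":
    # Hypothetical event date, chosen only to demonstrate the call.
    for day in referent_days(date(2017, 3, 15)):
        print(day.isoformat())
```

Under this convention, pollutant levels on the event day would be compared with levels on the referent days for the same patient, so each case serves as its own control and slowly varying personal factors cancel out.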

Subject Classification: Basic and Applied Sciences > Information Science