Title |
Big Data Analytics for the Association between the Ambient Air Pollution and Incidence of Cardiovascular Diseases Hospitalization |
DOI |
10.29428/9789860544169.201801.0174 |
Authors |
Chien-Lung Chan;Jyun-Yun Lu;Chiung-Yi Wu;Ren-Hao Pan |
Keywords | |
Journal name |
NCS 2017 全國計算機會議 (National Computer Symposium) |
Volume and issue / Publication date |
2017(2018 / 01 / 01) |
Pages |
930 - 935 |
Content language |
English |
Abstract |
Big data analytics was used to investigate whether ambient air pollution is associated with an increased risk of cardiovascular disease hospitalization. The data sources were the National Health Insurance Research Database (NHIRD), the Environmental Protection Department's "Air Quality Monitoring Data", and the Taiwan Typhoon and Flood Research Institute's "Atmospheric Research Database". We constructed predictive models of cardiovascular disease hospitalization using four data mining methods on the Hadoop distributed data processing platform. A time-stratified case-crossover design was used to assess the association between hospitalization and the level of air pollutant exposure preceding each acute cardiovascular disease hospitalization event. PM₁₀, O₃, and CO turned out to be the most significant predictors of cardiovascular disease hospitalization. Furthermore, we constructed and compared four prediction models: Random Forest, Support Vector Machine, Decision Tree, and Logistic Regression. Random Forest had the best AUC on monthly-adjusted data. Its accuracy reached 88%, which was 1.7 times that of traditional Logistic Regression, 11% higher than Decision Tree, and 4% higher than Support Vector Machine. |
Subject classification |
Basic and Applied Sciences >
Information Science |
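
The time-stratified case-crossover design named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the common form of the design, in which each hospitalization day is compared with the other days in the same calendar month that fall on the same weekday. The function names `referent_days` and `exposure_contrast`, and the daily PM₁₀ values, are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from datetime import date, timedelta


def referent_days(event_day: date) -> list[date]:
    """Return the control (referent) days for one event day.

    Assumed stratum: same year, same month, same weekday as the event,
    with the event day itself excluded.
    """
    # Step back one week at a time to the first matching weekday in the month.
    d = event_day
    while (d - timedelta(days=7)).month == event_day.month:
        d -= timedelta(days=7)

    # Walk forward one week at a time until the month ends.
    referents = []
    while d.month == event_day.month:
        if d != event_day:
            referents.append(d)
        d += timedelta(days=7)
    return referents


def exposure_contrast(event_day: date, daily_level: dict[date, float]) -> float:
    """Exposure on the event day minus the mean exposure on its referent days."""
    controls = [daily_level[d] for d in referent_days(event_day)]
    return daily_level[event_day] - sum(controls) / len(controls)


# Hypothetical daily PM10 readings (not data from the study).
pm10 = {
    date(2017, 3, 1): 40.0,
    date(2017, 3, 8): 50.0,
    date(2017, 3, 15): 80.0,  # hypothetical hospitalization day
    date(2017, 3, 22): 45.0,
    date(2017, 3, 29): 45.0,
}

controls = referent_days(date(2017, 3, 15))
contrast = exposure_contrast(date(2017, 3, 15), pm10)
```

Because each case serves as its own control within a short window, fixed personal characteristics and slow seasonal trends cancel out of the comparison. A full analysis would feed such case and referent exposures into a conditional logistic regression rather than a simple difference.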