Abstract
Background – The global aging trend, combined with societal changes, is creating population health problems and driving up health care spending. In response, local policy makers have been promoting electronic medical data to help achieve five major goals of the health care system: 1) improving health care quality, safety, and performance, 2) responding to patients' health needs, 3) improving health care coordination, 4) improving population health, and 5) ensuring privacy and security. However, making these medical data achieve "Meaningful Use", expanding their use, and generating more benefit from them requires overcoming many research difficulties, and this will not be an easy task. Medical data are currently scattered across different industries, data collection is difficult, and integrated analysis across sources is rare. Furthermore, medical records accumulated over many years have grown into big data. This not only significantly affects original plans and research, but also opens up opportunities for new, innovative applications.
Objectives – Because the big data analysis infrastructure of the biomedical field still lags seriously behind current trends, researchers must spend considerable time building and organizing their data, interpreting what the data mean, and identifying problems in them. To transform biomedical big data analysis, this study proposes a set of methods covering everything from data storage to data analysis. Based on these methods, two novel big data applications were verified: 1) rapid testing of reported medical incidents, such as reported adverse drug reactions, and 2) timely monitoring and tracking of temporal medical events, such as monitoring of newly marketed drugs. To achieve these objectives, the methods must provide: 1) timeliness, returning processing results quickly, 2) cost-effectiveness, keeping costs low, 3) scalability, allowing horizontal expansion of computing power and storage capacity, 4) ease of calculation, making it convenient to test and compute tracking indicators, and 5) applicability.
Methods – Unlike in epidemiological research, the questions to be studied when tracking and analyzing temporal medical events cannot be specified in advance. This study therefore proposes a new model that provides an operating mechanism for timely tracking and monitoring of medical events and for uncovering relevant information. The model has four parts: 1) data sources, namely existing electronic medical data, 2) data management, including the big data storage model PDMdoc, the temporal medical event model TMEdoc, and sharded-cluster strategies and management, 3) processing and computation, including sharded-cluster operating procedures, MapReduce-based cloud computing methods for big data processing, and integrated tracking analysis of temporal events, and 4) tracking indicators, consisting mainly of a set of indicators together with the patient's index value recorded at every occurrence. Of these parts, the indicators belong to the practical application level; whether the model can deliver timely monitoring and tracking depends chiefly on data management and on the efficiency of the processing and computation methods.
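The processing step above combines a sharded cluster with MapReduce. The abstract does not give the PDMdoc or TMEdoc schemas or the actual map and reduce functions, so the following is only a minimal, single-process Python sketch under stated assumptions: each temporal medical event (TME) is assumed to be a hypothetical (patient_id, drug_code, event_date) tuple, records are hash-partitioned by patient_id across three in-memory shards (mirroring the three-shard test system), and one map/reduce pass counts events per drug per day as one plausible kind of tracking indicator. It does not use MongoDB's own sharding or MapReduce interfaces.

```python
from collections import defaultdict
from hashlib import md5
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical TME record layout (an assumption, not the thesis's TMEdoc schema):
# (patient_id, drug_code, event_date)
TME = Tuple[str, str, str]
Key = Tuple[str, str]  # (drug_code, event_date)


def shard_for(patient_id: str, n_shards: int) -> int:
    """Pick a shard by hashing the assumed shard key (patient_id)."""
    return int(md5(patient_id.encode("utf-8")).hexdigest(), 16) % n_shards


def partition(records: Iterable[TME], n_shards: int) -> List[List[TME]]:
    """Distribute records across n_shards in-memory 'shards'."""
    shards: List[List[TME]] = [[] for _ in range(n_shards)]
    for record in records:
        shards[shard_for(record[0], n_shards)].append(record)
    return shards


def map_phase(shard: Iterable[TME]) -> Iterator[Tuple[Key, int]]:
    """Map: emit ((drug_code, event_date), 1) for every event on one shard."""
    for _patient_id, drug_code, event_date in shard:
        yield (drug_code, event_date), 1


def reduce_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce: sum the emitted counts for each (drug_code, event_date) key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_events(records: Iterable[TME], n_shards: int = 3) -> Dict[Key, int]:
    """Partition, map each shard independently, then reduce the combined output."""
    shards = partition(records, n_shards)
    # A real cluster would shuffle map output over the network; here the
    # per-shard outputs are simply chained together in one process.
    mapped = (pair for shard in shards for pair in map_phase(shard))
    return reduce_phase(mapped)


if __name__ == "__main__":
    # Made-up example events, for illustration only
    events = [
        ("P001", "DRUG_A", "2009-09-01"),
        ("P002", "DRUG_A", "2009-09-01"),
        ("P003", "DRUG_B", "2009-09-01"),
        ("P001", "DRUG_A", "2009-09-02"),
    ]
    for (drug, day), n in sorted(count_events(events).items()):
        print(drug, day, n)
```

Because each shard's map phase needs only its own records, adding a shard adds map capacity without changing the logic, which loosely illustrates the horizontal-scaling behavior reported in the Results; the result is the same for any number of shards.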
Results – The complexity of the proposed methods is as follows: 1) horizontal scaling and parallelism of the sharded cluster grow in units of one shard; that is, each shard added to the cluster increases both computing power and storage capacity by one unit, regardless of the number of cluster nodes, 2) network I/O depends only on the amount of data in the search results and is independent of the number of cluster nodes, and 3) for search and disk I/O, the average seek times for PDMdoc and TMEdoc are $O(1)$ and $O(\log_d(S_{TMEdoc}/B))$, respectively, and the average disk I/O costs for (seek time, rotational delay, transfer time) are $(O(1), O(1), O(E_{PDMdoc}))$ and $(O(\log_d(S_{TMEdoc}/B)), O(1), O(E_{TMEdoc} \times L_{TMEdoc}))$, respectively. The experiments were set up and produced results as follows: 1) data: the Taiwan NHIRD LHID2010 dataset, containing health care data for a total of one million people from 1996 to 2010, 2) test system: a sharded cluster with 3 shard nodes, built on MongoDB and five PCs, and 3) results: a) benchmarks: the time required to retrieve patients in 8 disease groups ranged from 0.607 to 63.248 seconds on a single-server system and from 0.336 to 29.484 seconds on the sharded cluster, giving a performance ratio of 1:2.024 between the two systems, b) reported adverse drug reaction incidents: taking the Januvia drug safety information published by the FDA in September 2009 as an example, the test yielded an odds ratio of 1.626, indicating that this type of incident also occurred at a significant level in Taiwan, and c) monitoring of newly marketed drugs: the system can process more than 140,000 TMEs per second, so the number of drugs that can be monitored daily is estimated at tens of thousands or more.
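The abstract reports an odds ratio of 1.626 for the Januvia example but does not give the underlying 2×2 counts or the confidence interval used to judge significance. As a minimal sketch, assuming a standard 2×2 table of exposure (for example, use of the drug) against the reported event, the odds ratio and a Woolf (log-based) 95% confidence interval can be computed as below. The thesis does not state which interval method it used, and the counts in the example are invented for illustration, not taken from NHIRD.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale Wald) confidence interval for a 2x2 table.

                      event   no event
        exposed         a        b
        unexposed       c        d

    z = 1.96 gives an approximate 95% interval. All cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    or_value = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_value)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return or_value, lower, upper


if __name__ == "__main__":
    # Invented counts, for illustration only
    or_value, lower, upper = odds_ratio_ci(a=30, b=70, c=15, d=85)
    print(f"OR = {or_value:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
    print("interval excludes 1:", lower > 1 or upper < 1)
```

An odds ratio above 1 does not by itself establish significance; the usual check is whether the whole interval lies above (or below) 1.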