Title

結合品質量測模式的數據品質改善程序以降低大數據應用風險

Parallel title

Data Quality Improvement Procedure with Data Quality Measurement Model for Reducing Big Data Application Risks

Author

賴森堂

Keywords

大數據 ; 數據品質 ; 量測模式 ; 前置處理 ; 改善程序 ; big data ; data quality ; quality measurement ; preprocessing ; improvement procedure

Journal

電腦稽核

Issue / Publication date

Issue 35 (2017/01/20)

Pages

22 - 35

Language

Traditional Chinese

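Chinese abstract (translated)

Diverse network applications have become fully integrated into people's daily lives, and all kinds of activities can be transacted over the network. Collecting and analyzing the data left behind or generated by network activities, and extracting its business value, can strengthen the market competitiveness of enterprises and organizations; analyzing daily-activity data also gives government units a basis for improving citizens' quality of life. The large amounts of data collected from the network environment are characterized by huge size (Volume), diverse formats (Variety), continuous and rapid generation (Velocity), and authenticity that is hard to verify (Veracity), and these characteristics are also the challenges that big data applications must face. In addition, data collected from the network environment contains many errors and quality problems. Data quality defects directly affect the efficiency and results of big data analysis and can even lead to wrong decisions, inaccurate predictions, and improper planning, making them a major risk for big data analysis and applications. Ensuring the data quality of big data applications has therefore become an issue that enterprises and organizations must take seriously. To this end, this paper takes data preprocessing as its basis and proposes a Data Quality Measurement (DQM) model which, combined with a Data Quality Improvement Procedure (DQIP), identifies data quality defects in a timely manner and traces them back to imperfect preprocessing tasks. Concretely improving data quality before data analysis begins can effectively reduce the risks of big data analysis and applications.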

English abstract

Diverse network applications are now fully integrated with people's daily activities and lives. All network activities generate and record large amounts of data that carry business value for enterprises and organizations. By collecting, analyzing, and visualizing big data, intelligent information can be extracted efficiently. Big data applications can help enterprises strengthen their market competitiveness and assist government units in improving people's quality of daily life. However, big data collected from network and IoT (Internet of Things) environments contains many quality defects and problems that remain to be resolved. When processing huge amounts of data, data quality is a major factor in the efficiency of data analysis. In addition, low-quality data directly affects data analysis results and may cause wrong decisions and inaccurate predictions. A data quality improvement mechanism is therefore an important procedure in big data applications. Once data preprocessing is complete, how to ensure data quality becomes a key concern for big data applications. Based on data preprocessing, this paper proposes the Data Quality Measurement (DQM) model to identify data quality defects. Combined with the Data Quality Improvement Procedure (DQIP), data quality defects can be traced back to imperfect preprocessing tasks. Redoing the imperfect preprocessing tasks improves data quality in a timely manner and increases the efficiency and quality of big data analysis.

Subject classification: Basic and Applied Sciences > Information Science
References
  1. J. Tee, Handling the four 'V's of big data: volume, velocity, variety, and veracity, TheServerSide.com, 2013.
  2. 嚴思祺 (台灣特約記者),「台灣三星電子『寫手門』案裁罰千萬」BBC中文網,2013年 10月 24日。
  3. Dave Wagner, “The importance of big data analytics in business”, October 2014, World of tech, http://www.techradar.com/news/world-of-tech/the-importance-of-big-data-analytics-in-business-1267606/2
  4. 蔡惠如,「聘工讀生上網護航 三星挨轟」(蘋果日報/頭條要聞/╱台北報導),台灣蘋果日報2013年04月06日
  5. Bala Deshpande, “5 situations which drive data pre-processing before data mining,” 2013, http://www.simafore.com/blog/bid/116618/5-situations-which-drive-data-preprocessing-before-data-mining
  6. Cai, L.,Zhu, Y.(2015).The Challenges of Data Quality and Data Quality Assessment in the Big Data Era.Data Science Journal,14(2),1-10.
  7. Chen, C.L. Philip,Zhang, C.-Y.(2014).Data-intensive applications, challenges, techniques and technologies: A survey on Big Data.Information Sciences,275,314-347.
  8. Conte, S. D.,Dunsmore, H. E.,Shen, V. Y.(1986).Software Engineering Metrics and Models.Menlo Park:Benjamin/Cummings.
  9. Davenport, Thomas H.,Patil, D.J.(2012).Data Scientist:The Sexiest Job of the 21st Century.Harvard Business Review,October,70-76.
  10. Dong, X. L.,Srivastava, D.(2013).Big data integration.IEEE 29th International Conference on Data Engineering (ICDE)
  11. Elgendy, Nada,Elragal, Ahmed(2014).Big Data Analytics: A Literature Review Paper.Lecture Notes in Computer Science,214-227.
  12. Fan, W.(2012).Data Quality: Theory and Practice.Web-Age Information Management
  13. Fan, W.,Geerts, F.(2012).Foundations of Data Quality Management.Morgan & Claypool.
  14. Fenton, N. E.(1991).Software Metrics - A Rigorous Approach.Chapman & Hall.
  15. Galin, Daniel(2004).Software Quality Assurance.Addison-Wesley.
  16. Liu, J.,Li, J.,Li, W.,Wu, J.(2016).Rethinking big data: A review on the data quality and usage issues.ISPRS Journal of Photogrammetry and Remote Sensing,115,134-142.
  17. Lukoianova, Tatiana,Rubin, Victoria L.(2014).Veracity Roadmap: Is Big Data Objective, Truthful and Credible?.24th ASIS SIG/CR Classification Research Workshop
  18. Redman, T.(1998).The impact of poor data quality on the typical enterprise.CACM,41(2),79-82.
  19. Saha, B.,Srivastava, D.(2014).Data quality: The other face of Big Data.2014 IEEE 30th International Conference on Data Engineering (ICDE)
  20. Sarsfield, S.(2011).The Butterfly Effect of Data Quality.The Fifth MIT Information Quality Industry Symposium
  21. Taleb, Ikbal,Dssouli, Rachida,Serhani, Mohamed Adel(2015).Big Data Pre-processing: A Quality Framework.2015 IEEE International Congress on
  22. Zikopoulos, P.,Eaton, C.(2011).Understanding big data: analytics for enterprise class Hadoop and streaming data.McGraw-Hill Osborne Media.