Title

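雲端開放資料分析運算平台之研究-以資訊安全紀錄檔分析為例 (A Study of a Cloud Computing Platform for Open Data Analysis: The Case of Information Security Log File Analysis)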

Parallel Title

Cloud Computing Platform for Open Data Analysis

Authors

陳嘉玫(Chia-Mei Chen);賴谷鑫(Gu-Hsin Lai);張育涵(Yu-Han Chang)

Keywords

Big data (巨量資料) ; cloud environment (雲端環境) ; parallel computing (平行運算) ; cloud computing

Journal

Electronic Commerce Studies

Volume and Issue / Publication Date

Vol. 15, No. 3 (2017/09/30)

Pages

313 - 334

Language

Traditional Chinese

Chinese Abstract (translated)

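With the rapid growth of electronic commerce, consumer purchasing patterns have changed substantially in recent years, and online transactions have become an important channel for consumers. As a result, e-commerce websites and enterprises have become targets of targeted attacks and advanced persistent threats (APT). Attackers use advanced, customized techniques against specific targets to intrude into their networks and hosts, then remain hidden inside the enterprise to steal important information. To defend against such attacks, enterprises deploy many information security devices, such as firewalls, antivirus software, and intrusion detection systems. Because targeted attacks usually stay hidden in the enterprise network for a long time, a detection system must correlate large volumes of heterogeneous data. Traditional detection methods can no longer handle data that is both massive and heterogeneous, so cloud computing has become one of the important platforms for intrusion detection and analysis. The main purpose of a cloud computing platform is distributed storage and parallel computation, which improve computing performance. The performance of a cloud computing system depends on (1) the performance of the infrastructure, (2) the planning of the virtual hosts, and (3) the quality of the analysis algorithm. This study takes the perspective of cloud host planning and examines how the planning of virtual hosts and storage space affects computing performance. The performance of the proposed cloud computing platform is evaluated with real data from an enterprise. The study applies a support vector machine (SVM) to correlation analysis of the data to identify possible attack behavior. It also provides recommended virtual machine configuration parameters and builds a detection model. Based on these recommended parameters, enterprises can build a cloud-based information security analysis platform in the most economical way.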

English Abstract

The convenience of emerging electronic commerce and mobile commerce has changed customer behavior, and online purchasing now plays an important role in consumer shopping. Meanwhile, high-profit businesses have become primary targets for attackers in so-called targeted attacks or advanced persistent threat (APT) attacks. Attackers apply sophisticated technical skills to attack high-value organizations, such as electronic commerce services, high-tech companies, and governments. To protect their premises, businesses have deployed various defense mechanisms, such as firewalls, anti-virus software, spam filters, and intrusion detection systems. To detect targeted attacks, an intrusion detection system must analyze and correlate a vast amount of log data collected over a long time span from various defense systems. The traditional computation model, a single powerful machine, cannot process such a large amount of data in a timely manner. Distributed cloud computing can improve data processing performance. Three aspects influence the performance of a cloud computing platform: (1) the infrastructure, (2) virtual machine planning, and (3) the data analysis model. Using real business data, this study proposes a cloud computing platform for analyzing security data. The study gives a list of recommendations on virtual machine resource allocation and the minimum infrastructure specification for businesses that plan to adopt a cloud platform economically.

Subject Classification: Basic and Applied Sciences > Information Science
Social Sciences > Economics
References
  1. Carasso, D. (2012). Exploring splunk. Retrieved June 2, 2016, from: http://www.nhhs.net/ourpages/auto/2011/10/7/51955419/Exploring_Splunka.pdf
  2. EMC Corporation (2011, June 28). EMC news: World's data more than doubling every Two Years-Driving big data opportunity, new IT roles. Retrieved December 1, 2016, from http://www.emc.com/about/news/press/2011/20110628-01.htm
  3. Strauch, C., Sites, U. L. S., & Kriha, W. (2011). NoSQL Databases. Lecture Notes, Stuttgart Media University.
  4. INFOVISION Inc. (n.d.). Big Data Hadoop Architecture. Retrieved December 1, 2016, from http://www.infovision.com/services/technology-solutions/big-data-analytics/service-offerings/
  5. Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A., Buyya, R. (2015). Big data computing and clouds: Trends and future directions. Journal of Parallel and Distributed Computing, 79-80, 3-15.
  6. Aydin, G., Hallac, I. R., Karakus, B. (2015). Architecture and implementation of a scalable sensor data storage and analysis system using cloud computing and big data technologies. Journal of Sensors, 2015, 834217.
  7. Ayyalasomayajula, H. (2015). University of Houston.
  8. Bhuvaneshwar, K., Sulakhe, D., Gauba, R., Rodriguez, A., Madduri, R., Dave, U., Lacinski, L., Foster, I., Gusev, Y., Madhavan, S. (2014). A case study for cloud based high throughput analysis of NGS data using the globus genomics system. Computational and Structural Biotechnology Journal, 13, 64-74.
  9. Dede, E., Govindaraju, M., Gunter, D., Canon, R. S., Ramakrishnan, L. (2013). Performance evaluation of a mongodb and hadoop platform for scientific data analysis. Proceedings of the 4th ACM workshop on Scientific cloud computing.
  10. Feller, E., Ramakrishnan, L., Morin, C. (2015). Performance and energy efficiency of big data applications in cloud environments: A Hadoop case study. Journal of Parallel and Distributed Computing, 79-80, 80-89.
  11. He, P., Zhu, J., He, S., Li, J., Lyu, M. R. (2016). An evaluation study on log parsing and its use in log mining. Dependable Systems and Networks (DSN), 2016 46th Annual IEEE/IFIP International Conference.
  12. Jia, B. (2010). Norway: University of Stavanger.
  13. Kambatla, K., Kollias, G., Kumar, V., Grama, A. (2014). Trends in big data analytics. Journal of Parallel and Distributed Computing, 74(7), 2561-2573.
  14. Lin, X., Wang, P., Wu, B. (2013). Log analysis in cloud computing environment with Hadoop and Spark. Broadband Network & Multimedia Technology (IC-BNMT), 2013 5th IEEE International Conference on.
  15. Nair, T. G., Vaidehi, M. (2011). Efficient resource arbitration and allocation strategies in cloud computing through virtualization. 2011 IEEE International Conference on Cloud Computing and Intelligence Systems.
  16. Odersky, M., Altherr, P., Cremet, V., Emir, B., Maneth, S., Micheloud, S., Zenger, M. (2004). LAMP-REPORT. Switzerland: Ecole Polytechnique Federale de Lausanne.
  17. Stroeh, K., Madeira, E. R. M., Goldenstein, S. K. (2013). An approach to the correlation of security events based on machine learning techniques. Journal of Internet Services and Applications, 4, 7.
  18. White, T. (2012). Hadoop: The Definitive Guide. USA: O'Reilly Media, Inc.
  19. Xiao, Z., Chen, H., Zang, B. (2011). A hierarchical approach to maximizing mapreduce efficiency. Proceedings of the PACT.
  20. Xu, X. (2006). Adaptive intrusion detection based on machine learning: Feature extraction, classifier construction and sequential pattern prediction. International Journal of Web Services Practices, 2(1-2), 49-58.
  21. Zaharia, M. (2016). An Architecture for Fast and General Data Processing on Large Clusters. Berkeley: University of California at Berkeley, Electrical Engineering and Computer Sciences.
  22. Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., Stoica, I. (2010). Spark: cluster computing with working sets. HotCloud, 10(10-10), 95.
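  23. 林大貴 (2016). Hadoop+Spark 大數據巨量分析與機器學習整合開發實戰 [Hadoop+Spark: Integrated development in practice for big data analytics and machine learning]. Taipei: 博碩文化股份有限公司.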