Title

個人化分散式大資料開發平台之研發與工程應用

Parallel Title

DEVELOPMENT AND APPLICATION OF PERSONAL HADOOP-BASED BIG DATA PLATFORM

DOI

10.6652/JoCICHE.201806_30(2).0003

Authors

吳建衡(Gary Wu);林聖峯(Franco Lin);張文鎰(Wen-Yi Chang);蔡惠峰(Whey-Fone Tsai);林錫慶(Shi-Ching Lin);楊朝棟(Chao-Tung Yang)

Keywords

大資料 ; 大資料運算叢集 ; 在地計算 ; 分散式編程模型 ; 工程應用 ; big data ; Hadoop ; in-place computation ; MapReduce ; engineering application

Journal Title

中國土木水利工程學刊 (Journal of the Chinese Institute of Civil and Hydraulic Engineering)

Volume/Issue and Publication Date

Vol. 30, No. 2 (2018/06/01)

Pages

107-120

Language

Traditional Chinese

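Chinese Abstract (translated)

In recent years, big data and data mining have become popular research fields. Many multinational corporations, such as Intel, Google, and Alibaba, invest substantial resources each year in mining and analyzing big data in order to adjust their business strategies. Demand for big data analytics in environmental monitoring and computation has grown rapidly, yet developers lack a platform for developing and testing big data programs, and building a distributed big data computing cluster on one's own is not easy. This study therefore uses virtual environment software to develop a set of virtual machines that form a personal big data development platform, which can quickly establish a distributed cluster system and its programming model development environment on a single host such as a personal computer. For system performance testing, this study uses the standard word count case to carry out a range of performance analyses. The results show that, during program development and testing, a distributed configuration of one plus three virtual machines can serve as an excellent testing and training platform for engineers who are new to the field. Finally, two application test cases, river environment monitoring and model computation, illustrate the big data analysis techniques of the platform. The river flow field image recognition case illustrates the platform's distributed storage characteristics and the principles of distributed management and parallel computing. The two-dimensional hydraulic model case illustrates how streaming allows Fortran programs, which are widely used in civil and hydraulic engineering, to be converted directly for distributed management and parallel computing. The proposed personal big data development platform and the two application test cases can therefore serve as powerful tools for domestic big data research and applications, helping to accelerate the solution of civil and hydraulic engineering problems.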

English Abstract

Big data and data mining technologies have become increasingly popular in recent years. Many world-class corporations, such as Intel, Google, and Alibaba, invest substantial financial and human resources in big data analysis and data mining to support decision making and business strategy. The demand for big data analytics in environmental monitoring and model simulation has grown rapidly. However, many developers lack a big data platform for programming and testing because a distributed Hadoop cluster is not easy to build. The present study therefore uses virtual environment technology to establish a personal Hadoop-based big data platform, which replicates virtual machines on a single host and provides an environment for data management and for developing computing programs. For the performance benchmark, the standard WordCount case was used to analyze performance. The results show that a distributed configuration of 1 + 3 virtual machines could be an ideal platform for coding and testing for beginners with a civil engineering background. Finally, two application cases illustrate the big data analytics techniques of the developed platform. The first is flow image recognition for river velocity measurement, which illustrates the storage characteristics of the specially designed file system and the concepts of distributed data management and parallel computing. The second is a two-dimensional hydraulic model simulation, which shows how native Fortran code can be used for distributed data management and computation through the streaming technique. The proposed big data platform with virtual machine capability, together with the two application cases, could thus be a powerful tool for solving big-data-related problems in civil and hydraulic engineering more quickly.

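Subject Classification Engineering > Civil and Architectural Engineering
Engineering > Hydraulic Engineering
Engineering > Municipal and Environmental Engineering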
References
  1. https://www.VirtualBox.org/
  2. https://cloud.google.com/hadoop/
  3. https://azure.microsoft.com/en-us/services/hdinsight/
  4. NCHC Education and Training website, https://edu.nchc.org.tw/course/index.asp
  5. https://aws.amazon.com/tw/emr/
  6. https://wiki.apache.org/hadoop/PoweredBy
  7. https://www.most.gov.tw/
  8. http://hicloud.hinet.net/hicloud_caas_about.html
  9. https://eclipse.org/
  10. NCHC Big Data Platform, http://www.nchc.org.tw/tw/inner.php?CONTENT_ID=744
  11. Liferay Portal,https://www.liferay.com
  12. http://hadoop.apache.org/
  13. https://hortonworks.com/products/sandbox/
  14. FTP Site (IP = 140.110.20.15 ; Port = 21; Anonymous)
  15. Alham, N. K.,Li, M.,Liu, Y.,Hammoud, S.(2011).A MapReduce-based distributed SVM algorithm for automatic image annotation.Computers and Mathematics with Applications,62,2801-2811.
  16. Al-Hamodi, Arkan a. G.,Lu, Songfeng,Alsalhi, Yahya E. A.(2016).An enhanced frequent pattern growth based on mapreduce for mining association rules.International Journal of Data Mining & Knowledge Management Process (IJDKP),6(2),19-28.
  17. Ayma, V. A.,Ferreira, R. S.,Happ, P.,Oliveira, D.,Feitosa, R.,Costa, G.,Plaza, A.,Gamba, P.(2015).Classification algorithms for big data analysis, a map reduce approach.The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences,XL-3/W2,17-21.
  18. Chang, W. Y.,Lin, F.,Tsai, W. F.,Liao, T. S.,Lai, J. S.,Loh, C. H.(2016).Quick LPPIV measurement using Android devices.12th International Conference on Hydroinformatics (HIC 2016),Inchon, Korea:
  19. Guo, J.,Rao, R.(2011).Large-scale text analysis based on UIMA and cloud computing.Energy Procedia,13,6696-6703.
  20. He, Q.,Wang, Q.,Zhuang, F.,Tan, Q.,Shi, Z.(2011).Parallel CLARANS clustering based on mapreduce.Energy Procedia,13,3269-3279.
  21. Lin, F.,Chang, W. Y.,Lee, L. C.,Hsiao, H. D.,Tsai, W. F.,Lai, J. S.(2013).Applications of image recognition for real-time water level and surface velocity.2013 IEEE International Symposium on Multimedia (ISM 2013),Anaheim, California, USA:
  22. Markonis, Dimitrios,Schaer, Roger,Eggel, Ivan,Müller, Henning,Depeursinge, Adrien(2015).,Unpublished
  23. Muste, M.,Fujita, I.,Hauet, A.(2008).Large-scale particle image velocimetry for measurements in riverine environments.Water Resources Research,44,W00D19.
  24. Muste, M.,Ho, H. C.,Kim, D.(2011).Considerations on direct stream flow measurements using video imagery: Outlook and research needs.Journal of Hydro-environment Research,5,289-300.
  25. Nanaware, Ashwini,Barapatre, Harish(2016).Video conversion in different format using mapreduce on hadoop.International Journal of Application or Innovation in Engineering & Management (IJAIEM),5(12),78-81.
  26. Rajak, Roshan,Raveendran, Deepu,Bh, Maruthi Chandrasekhar,Medasani, Shanti Swarup(2015).High resolution satellite image processing using hadoop framework.2015 IEEE International Conference on Cloud Computing in Emerging Markets
  27. Tsai, W. F.,Chen, B.,Chang, J. Y.,Lin, F. P.,Chang, C. H.,Sun, C. Y.,Su, W. R.,Chen, M. F.,Shih, D. S.,Chen, C. H.,Lin, S. C.,Yu, S. J.(2013).Application of near real-time and multiscale three dimensional earth observation platforms in disaster prevention.International Journal of Automation and Smart Technology,1(2),35-50.
  28. Wu, T.Y.,Chen, C. Y.,Kuo, L. S.,Lee, W. T.,Chao, H. C.(2012).Cloud-based image processing system with priority-based data distribution mechanism.Computer Communications,35,1809-1818.
  29. Zhang, J.,Li, T.,Ruan, D.,Gao, Z.,Zhao, C.(2012).A parallel method for computing rough set approximations.Information Sciences,194,209-223.
  30. 李正國、蔡惠峰、林錫慶、孫嘉陽、張宏生、蘇文瑞、傅金城、張智昇、張駿暉、林聖琪,「災害管理資訊平台的發展雛型與個案應用」,台灣災害管理學會第十一期電子報災害管理科技與知識專欄 (2013)。
  31. 林聖峯,張文鎰,蔡惠峰,廖泰杉,賴進松,羅俊雄。雷射定位質點影像流速與水位整合量測系統。中國土木水利工程學刊
  32. 城田真琴(2013).大數據的獲利模式.經濟新潮社出版.
  33. 胡世忠(2013).雲端時代的殺手應用─Big Data 海量資料分析.天下雜誌.
  34. 張文鎰,賴進松,王聖川,洪國展,游騰一,李隆正,郭文達,蔡惠峰(2013)。二維水理多模式平台之研發與應用。中國土木水利工程學刊,25(1),21-32。
  35. 陸嘉恆(2014).Hadoop 實戰技術手冊.佳魁資訊.
  36. 蔡惠峰,李隆正,林聖峰,張文鎰,林錫慶(2013)。環境與災防海量資料應用系統之架構、範例與挑戰。國土資訊系統通訊期刊,88,32-45。
  37. 蔡惠峰,張哲豪,張文鎰,連和政,黃良雄,李光敦(2013)。開放式洪水預報與決策資訊整合平台之應用。中國土木水利工程學刊,25(3),211-221。