Title

資料映射技術在台灣地區產業創新調查之應用

Parallel Title

Data Mapping Technique in Taiwan Innovation Survey

DOI

10.6338/JDA.201002_5(1).0010

Author

江志民(Chih-Ming Chiang)

Keywords

Data Mining ; Mapping technique ; Chi-square test ; Logistic regression

Journal

Journal of Data Analysis

Volume and Issue / Publication Date

Vol. 5, No. 1 (2010/02/01)

Pages

199 - 211

Language

Traditional Chinese

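Abstract (translated from the Chinese)

With information technology advancing rapidly, building databases has become a major part of business operations. When an enterprise builds a database, gaps can arise because relevant or potentially useful information was not collected at the outset, or because items were omitted during data entry; a good method is therefore needed to improve the completeness of existing databases. This study uses data mapping techniques to identify the associations between an auxiliary database and a target database and, through those associations, maps the valuable fields of the auxiliary database onto the target database, thereby improving the completeness of the target database. Taking the Taiwan Industrial Innovation Activity Survey as an example, the study illustrates how the mapping technique is carried out. The procedure is divided into variable transformation, target field selection, and the mapping model, and the model is built using the chi-square test and binary logistic regression. The mapping technique yields 8,964 previously unknown records; in this study the overall average accuracy is 80.94%, indicating that mapping techniques can improve database completeness.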

English Abstract

With the rapid development of information technology, databases play an important role in business. When a database is first built, omissions in data collection or data entry leave gaps in it, so a sound method for improving the completeness of an existing database is important. This research uses a data mapping technique to find the associations between an auxiliary database and a target database, and then uses those associations to map valuable fields from the auxiliary database onto the target database, improving the completeness of the target database. We take the Taiwan Industrial Innovation Survey as an example to illustrate how the mapping technique operates. The process is divided into variable transformation, target field selection, and the mapping model, and the model is built with the chi-square test and binary logistic regression. The mapping technique yields 8,964 previously unknown records with an overall average accuracy of 80.94%, which indicates that the mapping technique can improve the completeness of a database.
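
The workflow named in the abstract (screen candidate fields with a chi-square test, then fit a binary logistic regression that maps a field from the auxiliary database onto the target database) might be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: the field names `field_a`, `field_b`, and `target`, the synthetic records, the 0.05 significance level, and the gradient-descent settings are all hypothetical and do not come from the survey data.

```python
import math

def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for two binary lists."""
    n = len(xs)
    counts = [[0, 0], [0, 0]]
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (counts[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))

def linear_score(w, row):
    """Intercept plus weighted sum of the features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression fitted by batch gradient descent (hypothetical settings)."""
    w = [0.0] * (len(X[0]) + 1)  # intercept followed by one weight per feature
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = 1 / (1 + math.exp(-linear_score(w, row))) - label
            grad[0] += err
            for k, xi in enumerate(row):
                grad[k + 1] += err * xi
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def predict(w, row):
    """Predict 1 when the fitted probability is at least 0.5."""
    return 1 if linear_score(w, row) >= 0 else 0

# Synthetic auxiliary database: it holds the shared fields and the field to be mapped.
aux = []
for a, b, t, count in [(1, 0, 1, 45), (1, 1, 1, 45), (1, 0, 0, 5), (1, 1, 0, 5),
                       (0, 0, 0, 45), (0, 1, 0, 45), (0, 0, 1, 5), (0, 1, 1, 5)]:
    aux += [{"field_a": a, "field_b": b, "target": t}] * count

# Step 1: keep only the shared fields that are significantly associated with the target.
shared_fields = ["field_a", "field_b"]
labels = [r["target"] for r in aux]
selected = [f for f in shared_fields
            if chi_square_2x2([r[f] for r in aux], labels)[1] < 0.05]

# Step 2: fit the mapping model on the auxiliary database.
X = [[r[f] for f in selected] for r in aux]
w = fit_logistic(X, labels)
accuracy = sum(predict(w, row) == label for row, label in zip(X, labels)) / len(labels)

# Step 3: map the missing field onto target-database records that lack it.
target_db = [{"field_a": 1, "field_b": 0}, {"field_a": 0, "field_b": 1},
             {"field_a": 1, "field_b": 1}]
mapped = [dict(r, target=predict(w, [r[f] for f in selected])) for r in target_db]
```

The p-value shortcut used here holds only for 2x2 tables; fields with more categories would need a general chi-square distribution function, for example from a statistics library.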

Subject Classification

Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics
Social Sciences > Management