Title

家庭收入遺失值之插補研究

Parallel Title

A Study of Imputating Missing Data for Household Income

DOI

10.6338/JDA.200608_1(4).0006

Authors

梁德馨(Te-Hsin Liang);王靖怡(Chi-Yi Wang);楊雅惠(Ya-Hui Yang)

Keywords

Missing Data ; Hot Deck ; Mode Imputation Method ; Multinomial Logistic Regression ; Multiple Imputation Method

Journal

Journal of Data Analysis

Volume and Issue / Publication Date

Vol. 1, No. 4 (2006/08/01)

Pages

75 - 101

Language

Traditional Chinese

Chinese Abstract (translated)

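"Average monthly household income" is an important explanatory or observed variable in many studies, but it is highly prone to item nonresponse. This study uses the 2006 "Survey of Internet Broadband Usage in Taiwan" data set as empirical data and imputes missing values of average monthly household income with the hot deck method, the mode imputation method, the probability distribution imputation method, multinomial logistic regression, and the multiple imputation method. The 2005 "Survey of Internet Broadband Usage in Taiwan" data set is then used to verify the consistency of the evaluation results. The results show that, for average monthly household income, personal education level and urban/rural area of residence are the better auxiliary variables for imputation. Overall, multinomial logistic regression with personal education level and urban/rural area of residence as explanatory variables is the most suitable imputation model; if, however, the structure of the data must remain unchanged after imputation, the hot deck method stratified by personal education level and urban/rural area of residence is the most suitable imputation model.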

English Abstract

'Household income' is one of the main factors that significantly affect many social issues. Because of privacy concerns, many people are unwilling to report their household income, which leads to item nonresponse. This research presents a suitable imputation model for household income. Based on data from the '2006 Survey of Internet Broadband Usage in Taiwan', the imputation performance of the Hot Deck, the Mode Imputation method, the Multinomial Logistic Regression, and the Multiple Imputation method was compared; 'personal education degree' and 'town of residence' were found to be the best auxiliary variables for imputing missing household income data. Overall, the Multinomial Logistic Regression with 'personal education degree' and 'town of residence' gives the best imputation assessment. If, however, the goodness of fit of the imputed data structure is considered, the Hot Deck using 'personal education degree' and 'town of residence' as auxiliary variables is the better imputation model. To test the generality of this conclusion, data from the '2005 Survey of Internet Broadband Usage in Taiwan' were used, and the results were consistent.
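
The abstract names a hot deck stratified by education and area of residence but does not describe its implementation. The following is a minimal sketch, assuming a donor is drawn at random from respondents in the same stratum; the paper's actual donor-selection rule is not stated in this record. The function name `hot_deck_impute`, the field names, and the toy income brackets are hypothetical and used only for illustration.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, target, strata_keys, seed=0):
    """Fill missing `target` values by copying the value of a randomly
    chosen donor from the same stratum (same values on `strata_keys`).

    Assumption: random within-stratum donor selection; other hot deck
    variants (sequential, nearest neighbour) choose donors differently.
    """
    rng = random.Random(seed)

    # Collect the observed (non-missing) target values for each stratum.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[k] for k in strata_keys)].append(rec[target])

    imputed = []
    for rec in records:
        new_rec = dict(rec)  # copy, so the input records are not modified
        if new_rec[target] is None:
            pool = donors.get(tuple(rec[k] for k in strata_keys))
            if pool:  # leave the value missing if the stratum has no donor
                new_rec[target] = rng.choice(pool)
        imputed.append(new_rec)
    return imputed


# Hypothetical toy data: income is a bracket label, None marks item nonresponse.
survey = [
    {"education": "college", "area": "urban", "income": "60-80k"},
    {"education": "college", "area": "urban", "income": "80-100k"},
    {"education": "college", "area": "urban", "income": None},
    {"education": "high school", "area": "rural", "income": "20-40k"},
    {"education": "high school", "area": "rural", "income": None},
]
completed = hot_deck_impute(survey, "income", ["education", "area"])
```

Because each imputed value is copied from an observed respondent in the same stratum, the within-stratum distribution of income brackets tends to be preserved, which is consistent with the abstract's remark that the hot deck is preferred when the data structure should remain unchanged after imputation.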

Subject Classification

Basic and Applied Sciences > Information Science
Basic and Applied Sciences > Statistics
Social Sciences > Management

References
  1. Andritsos, P. (2002). Data Clustering Techniques. Available from: http://www.cs.toronto.edu/~periklis/pubs/depth.pdf
  2. Little, R. J. A., Rubin, D. B. (1987). Statistical Analysis with Missing Data. John Wiley & Sons.
  3. Little, R. J. A., Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd edition. John Wiley & Sons.
  4. Little, R. J. A. (1988). Missing-data adjustments in large surveys. Journal of Business and Economic Statistics, 6, 287-289.
  5. Pyle, D. (1999). Data Preparation for Data Mining. Morgan Kaufmann Publishers.
  6. Rubin, D. B. (1986). Statistical matching using file concatenation with adjusted weights and multiple imputations. Journal of Business and Economic Statistics, 4, 87-94.
  7. Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.
  8. Sentas, P., Angelis, L. (2006). Categorical missing data imputation for software cost estimation by multinomial logistic regression. Journal of Systems and Software, 79(3), 404-414.
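  9. 林慧玲, 陳正倉 (2004). Basic Statistics [基礎統計學]. Taipei: 雙葉書廊有限公司.
  10. 林曉芳 (2002). Doctoral dissertation. Department of Education (Educational Psychology and Counseling Division), National Chengchi University.
  11. 曹志弘 (1999). Master's thesis. Graduate Institute of Statistics, National Central University.
  12. 許禎元 (1997). Processing and statistical analysis of questionnaire survey data: an example using SPSS for Windows 7.0. 復興岡學報, 61, 76-91.
  13. 陳信木, 林佳瑩 (1997). Handling missing values in survey data: the hot deck imputation method as an example. 調查研究, 3, 75-106.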
Cited By
  1. 謝邦彥, 紀宏, 宋龍華 (2009). Applying data mining to an intelligent statistical database system. 數據分析 (Journal of Data Analysis), 4(5), 197-212.