Title |
家庭收入遺失值之插補研究 |
Parallel title |
A Study of Imputating Missing Data for Household Income |
DOI |
10.6338/JDA.200608_1(4).0006 |
Authors |
梁德馨(Te-Hsin Liang);王靖怡(Chi-Yi Wang);楊雅惠(Ya-Hui Yang) |
Keywords |
遺失值 ; 熱卡法 ; 眾數插補法 ; 多元羅吉斯迴歸法 ; 整合插補法 ; Missing Data ; Hot Deck ; Mode Imputation Method ; Multinomial Logistic Regression ; Multiple Imputation method |
Journal |
Journal of Data Analysis |
Volume and issue / publication date |
Vol. 1, No. 4 (2006/08/01) |
Pages |
75 - 101 |
Language |
Traditional Chinese |
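Chinese abstract (translated) |
"Average monthly household income" is an important explanatory or observed variable in many studies, but it is highly prone to item nonresponse. This study uses the dataset of the 2006 "Survey of Internet Broadband Usage in Taiwan" as empirical data and imputes the missing values of average monthly household income with the Hot Deck method, the Mode Imputation method, the probability distribution imputation method, Multinomial Logistic Regression, and the Multiple Imputation method. The 2005 "Survey of Internet Broadband Usage in Taiwan" dataset is then used to verify the consistency of the evaluation results. The results show that, for average monthly household income, personal education level and urban/rural residence are the better auxiliary variables for imputation. Overall, Multinomial Logistic Regression with personal education level and urban/rural residence as explanatory variables is the most suitable imputation model; however, if the structure of the data must remain unchanged after imputation, the Hot Deck method stratified by personal education level and urban/rural residence is the most suitable imputation model. |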
English abstract |
'Household income' is a major factor in many social issues. Because of privacy concerns, many respondents are unwilling to report their household income, which leads to item nonresponse. This research presents a suitable imputation model for household income. Using data from the '2006 Survey of Internet Broadband Usage in Taiwan', the imputation performance of the Hot Deck method, the Mode Imputation method, Multinomial Logistic Regression, and the Multiple Imputation method was compared. 'Personal education degree' and 'town of residence' were found to be the best auxiliary variables for imputing missing household income. Overall, Multinomial Logistic Regression with 'personal education degree' and 'town of residence' gives the best imputation assessment. However, when the goodness of fit of the imputed data structure is taken into account, the Hot Deck method using 'personal education degree' and 'town of residence' as auxiliary variables is the better imputation model. To check the generality of this conclusion, data from the '2005 Survey of Internet Broadband Usage in Taiwan' were also analyzed, and the results were consistent. |
Subject classification |
Basic and Applied Sciences > Information Science; Basic and Applied Sciences > Statistics; Social Sciences > Management |