Title |
二分类变量的缺失数据插补研究 (Research on Missing Data Imputation for Binary Variables)
Parallel Title |
Research on Imputation Methods of Binary Variable's Missing Data |
DOI |
10.6338/JDA.201310_8(5).0005 |
Authors |
胡丹丹(Dan-Dan Hu);金勇进(Yong-Jin Jin);戴明锋(Ming-Feng Dai);张喆(Zhe Zhang) |
Keywords |
Binary variable; Missing data; Logistic regression model; Approximate Bayesian bootstrap method
Journal |
Journal of Data Analysis |
Volume, Issue / Publication Date |
Vol. 8, No. 5 (2013/10/01)
Pages |
85-95
Language |
Simplified Chinese
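Chinese Abstract (English translation) |
Most standard statistical methods assume that the data being analyzed are complete, but data are often missing in practice, which makes missing data a pervasive and unavoidable problem in data analysis. Research in sociology, economics, demography and related disciplines routinely relies on binary variables for measurement, so studying missing data in binary variables matters for advancing research on social behavior, consumer behavior and similar topics. Assuming data are missing completely at random (MCAR), this paper selects informative covariates and studies the imputation of missing binary data with single regression imputation and with multiple imputation. Using the statistical software SAS, it runs simulations on empirical data to compare the strengths and weaknesses of the two methods. It also analyzes empirically the number of imputations M in multiple random imputation, which cannot be derived analytically, and offers reference values of M for applying multiple imputation in practice.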
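The record does not give the paper's exact SAS setup, so the following is only a minimal sketch of single regression imputation for a binary variable, assuming one continuous covariate, a logistic model fitted by Newton-Raphson on the complete cases (reasonable under the MCAR assumption the abstract states), and a deterministic 0.5 cut-off for the imputed class. The function names `fit_logistic` and `regression_impute` are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x: List[float], y: List[int], iters: int = 25) -> Tuple[float, float]:
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson.

    Assumes the classes overlap in x (no complete separation).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # entries of the negative Hessian
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: beta += H^{-1} g, with H the 2x2 negative Hessian
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def regression_impute(x: List[float], y: List[Optional[int]]) -> List[int]:
    """Single regression imputation: fill each missing y (None) with the
    class predicted by a logistic model fitted to the observed cases."""
    xs = [xi for xi, yi in zip(x, y) if yi is not None]
    ys = [yi for yi in y if yi is not None]
    b0, b1 = fit_logistic(xs, ys)
    # Assumed rule: impute 1 when the predicted probability is at least 0.5
    return [yi if yi is not None else int(sigmoid(b0 + b1 * xi) >= 0.5)
            for xi, yi in zip(x, y)]
```

Because every missing value receives its single most likely class, this approach ignores the uncertainty of the imputation; that limitation is what multiple imputation is meant to address. Drawing the imputed value from Bernoulli(p) instead of thresholding is a common alternative.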
English Abstract |
Most statistical methods assume that the data are complete. In practice, however, missing data are inevitable, so missing data are a common problem in data analysis that must be addressed. Research in sociology, economics, demography and other fields relies on binary variables for measurement, and analyzing missing data in binary variables is important for improving research on human behavior, such as social behavior and consumer behavior. Assuming that data are missing completely at random (MCAR), this paper studies single regression imputation and multiple imputation of missing binary data using covariates; runs simulations on empirical data with the statistical software SAS and compares the advantages and disadvantages of the two methods; and examines the number of imputations M, which cannot be derived analytically in multiple imputation, providing reference values of M for applying multiple imputation in practice.
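The keywords list the approximate Bayesian bootstrap (ABB), but the record does not describe how the paper applies it. The sketch below assumes the standard ABB formulation applied within cells of a categorical covariate, a population proportion as the target quantity, and Rubin's combining rules for pooling the M estimates; `abb_impute_once`, `mi_proportion` and `relative_efficiency` are hypothetical names. `relative_efficiency` is Rubin's commonly cited approximation (1 + γ/M)^(-1), where γ is the unknown fraction of missing information. It is included only as context for why M has to be chosen empirically, not as the paper's result.

```python
import random
from statistics import mean, variance
from typing import Dict, Hashable, List, Optional, Tuple


def abb_impute_once(cells: List[Hashable], y: List[Optional[int]],
                    rng: random.Random) -> List[int]:
    """One ABB draw within covariate cells.

    Assumes every cell has at least one observed value (else KeyError).
    """
    donors: Dict[Hashable, List[int]] = {}
    for c, yi in zip(cells, y):
        if yi is not None:
            donors.setdefault(c, []).append(yi)
    # Step 1: resample each cell's observed values with replacement (same size)
    boot = {c: rng.choices(vals, k=len(vals)) for c, vals in donors.items()}
    # Step 2: fill each missing value with a draw from its cell's bootstrap sample
    return [yi if yi is not None else rng.choice(boot[c])
            for c, yi in zip(cells, y)]


def mi_proportion(cells: List[Hashable], y: List[Optional[int]],
                  m: int, seed: int = 0) -> Tuple[float, float]:
    """Estimate P(y=1) from m ABB imputations, pooled with Rubin's rules.

    Returns (pooled estimate, total variance T = W + (1 + 1/m) * B).
    """
    if m < 2:
        raise ValueError("need m >= 2 to estimate the between-imputation variance")
    rng = random.Random(seed)
    n = len(y)
    estimates, within = [], []
    for _ in range(m):
        completed = abb_impute_once(cells, y, rng)
        p_hat = sum(completed) / n
        estimates.append(p_hat)
        within.append(p_hat * (1 - p_hat) / n)  # assumes simple random sampling
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(within)          # average within-imputation variance
    b = variance(estimates)   # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b


def relative_efficiency(gamma: float, m: int) -> float:
    """Rubin's approximate efficiency of m imputations relative to m = infinity."""
    return 1.0 / (1.0 + gamma / m)
```

For example, with γ = 0.5 the approximation gives about 0.91 for M = 5 and about 0.95 for M = 10. Because γ is not known in advance, empirical guidance on M such as the abstract describes remains useful.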
Subject Classification |
Basic and Applied Sciences > Information Science; Basic and Applied Sciences > Statistics; Social Sciences > Management