Title

Comparative Analysis and Application of Imputed Estimators for Population Mean under Stratified Unequal Probability Sampling

Parallel Title

在分層不等機率抽樣下母體均數插補估計量之比較分析及應用

DOI

10.7014/SRMA.2021040001

Author

許玉雪(Esher Hsu)

Keywords

nonresponse ; ratio imputation ; imputed estimator ; bias-adjusted estimator ; unequal probability sampling ; 無回應 ; 比率插補 ; 插補估計量 ; 偏誤調整估計量 ; 不等機率抽樣

Journal

調查研究-方法與應用

Issue / Publication Date

Issue 46 (2021/04/01)

Pages

7 - 54

Language

English

English Abstract

As demand for accurate data continues to grow, survey sampling designs have become increasingly complex, and unequal probability sampling methods are used more and more often in sample surveys. Item nonresponse is inevitable in survey practice, so obtaining unbiased estimates from imputed data in a complex survey is an important research issue. Previous studies have presented imputed estimators for equal probability sampling with uniform response. It is worthwhile to explore how imputed estimators perform in more complex settings, such as unequal probability sampling or different missing data mechanisms. This study presents imputed estimators of the population mean for survey data imputed with an auxiliary variable under a stratified unequal probability sampling design, and compares their performance across missing data mechanisms and levels of correlation between the auxiliary variable and the variable of interest. Taking nonresponse and imputation into account, the study derives three imputed estimators (weighted, unweighted, and bias-adjusted) and their corresponding variance estimators under stratified unequal probability sampling, where missing data are imputed by ratio imputation. Six cases under different conditions (missing data mechanism, population distribution, and sample allocation) are selected for a simulation study that compares the proposed imputed estimators in terms of relative bias and coefficient of variation. The relative bias of the variance estimators is also examined. A practical application shows how to apply the derived imputed estimators to real survey data.
As expected, the simulation results show that the performance of the estimators depends on the missing data mechanism, the population distribution, and the method of sample allocation. For all three imputed estimators, estimation precision increases as the correlation between the auxiliary variable and the variable of interest increases. The imputed estimators are more stable when data are missing completely at random (MCAR) than when they are missing at random (MAR). When the auxiliary variable and the variable of interest are highly correlated, the proposed bias-adjusted estimator works well under stratified unequal probability sampling: it reduces both the estimation bias and the underestimation of mean square error (MSE) caused by unweighted imputation. Moreover, the variance estimator of the bias-adjusted estimator has the smallest relative bias for estimating MSE among the three. The unadjusted imputed estimator with unweighted imputation may be biased, and its corresponding variance estimator may underestimate the MSE of the estimator. However, the simulation results do not show the bias-adjusted estimator outperforming the imputed estimator with weighted imputation, except at a high level of correlation between the auxiliary variable and the variable of interest. In practice, an auxiliary variable that is highly correlated with the variable of interest is commonly used to impute missing values in order to increase estimation precision. If the survey weights are unavailable and unweighted ratio imputation is used to impute missing values, the proposed bias-adjusted estimator and its corresponding variance estimator are suggested for obtaining better estimates.
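The contrast between weighted and unweighted ratio imputation described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's derivation: the abstract does not give the estimator formulas, so the weighted (Hajek-type) stratum mean, the dictionary layout, and the function names `ratio_impute`, `stratified_imputed_mean`, `relative_bias`, and `coefficient_of_variation` are all assumptions made for illustration. The bias-adjusted estimator is not reproduced, and the definitions of relative bias and coefficient of variation are common textbook forms that may differ from those used in the study.

```python
from statistics import mean, pstdev


def ratio_impute(y, x, w=None):
    """Fill missing y values (None) with ratio * x.

    The ratio is sum(w*y) / sum(w*x) over respondents. Passing w=None gives
    unweighted imputation (all weights equal to 1).
    """
    if w is None:
        w = [1.0] * len(y)
    resp = [i for i, yi in enumerate(y) if yi is not None]
    ratio = sum(w[i] * y[i] for i in resp) / sum(w[i] * x[i] for i in resp)
    return [yi if yi is not None else ratio * x[i] for i, yi in enumerate(y)]


def stratified_imputed_mean(strata, weighted=True):
    """Combine stratum means of the imputed data using stratum shares W.

    Assumption: each stratum mean is a survey-weighted mean of the completed
    data, with w taken to be the inverse inclusion probabilities.
    """
    total = 0.0
    for s in strata:
        w = s["w"]
        filled = ratio_impute(s["y"], s["x"], w if weighted else None)
        stratum_mean = sum(wi * yi for wi, yi in zip(w, filled)) / sum(w)
        total += s["W"] * stratum_mean
    return total


def relative_bias(estimates, truth):
    """Assumed definition: (average estimate - true value) / true value."""
    return (mean(estimates) - truth) / truth


def coefficient_of_variation(estimates):
    """Assumed definition: standard deviation of estimates / their mean."""
    return pstdev(estimates) / mean(estimates)


# Hypothetical two-stratum sample; None marks item nonresponse on y.
strata = [
    {"W": 0.6, "x": [2, 4, 6, 8], "y": [3, None, 12, None], "w": [4, 2, 1, 1]},
    {"W": 0.4, "x": [1, 3, 5], "y": [2, 6, None], "w": [3, 2, 1]},
]
est_weighted = stratified_imputed_mean(strata, weighted=True)
est_unweighted = stratified_imputed_mean(strata, weighted=False)
print(est_weighted, est_unweighted)
```

In the second stratum y is exactly proportional to x among respondents, so the two imputation ratios coincide there; the two estimates differ only because of the first stratum, where the respondents' weights are unequal and the ratio y/x varies.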

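Chinese Abstract (English translation)

Missing values are unavoidable in survey practice, so obtaining unbiased estimators by combining imputation of missing values with a complex sampling design has become an important research topic. This paper examines how imputed estimators that use an auxiliary variable to impute missing values under stratified unequal probability sampling perform under different missing data mechanisms (MCAR, MAR) and different levels of correlation between the auxiliary variable and the variable of interest. Using ratio imputation under stratified unequal probability sampling, the paper derives three imputed estimators of the population mean (weighted, unweighted, and bias-adjusted) and their variance estimators. The imputed estimators are compared using their relative bias and coefficient of variation together with the relative bias of their variance estimators, and an example illustrates how these imputed estimators can be applied to real survey data. Simulation results show that the estimation precision of all three estimators increases as the correlation between the auxiliary variable and the variable of interest increases, and that the imputed estimators perform more stably under the MCAR mechanism. When the auxiliary variable and the variable of interest are highly correlated, the proposed bias-adjusted imputed estimator does reduce the estimation bias arising from unweighted imputation and lessens the underestimation of the mean square error. In practice, if no weight data are available and unweighted ratio imputation is used, the proposed bias-adjusted imputed estimator can be used to obtain better estimates.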

Subject Classification: Social Sciences > General Social Sciences