Title

缺失資料在因素分析上的處理方法之研究

Parallel Title

Missing Data Techniques for Factor Analysis

Authors

王鴻龍(Hong-Long Wang);楊孟麗(Meng-Li Yang);陳俊如(Chun-Ju Chen);林定香(Ting-Hsiang Lin)

Keywords

Taiwan Education Panel Survey (TEPS) ; missing data ; exploratory factor analysis ; Markov chain Monte Carlo (MCMC) imputation ; logistic regression imputation

Journal

教育科學研究期刊

Volume/Issue (Publication Date)

Vol. 57, No. 1 (2012/03/01)

Pages

29 - 50

Language

Traditional Chinese

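Chinese Abstract (translated)

Factor analysis is widely used to study questionnaires and scales. When too much data is missing, or when the missingness mechanism is not completely at random, the number of common factors or the factor loadings obtained from the analysis are often biased. This study uses the Taiwan Education Panel Survey (TEPS), treating its complete cases as the baseline data and, following the original missingness structure, constructing data sets with one to five times the original missing rate in order to examine how sensitive factor analysis is to the handling of missing data. Four treatments were compared: the available case method, the complete case method, logistic regression imputation, and Markov chain Monte Carlo (MCMC) imputation. The results show that the higher the missing rate, the more the estimated covariance matrix differs from that of the baseline data. At higher missing rates, the available case method extracts more common factors than the baseline. For factor loadings, the available case method has the largest error; the complete case method has error close to that of the two imputation methods, but its deviation from the baseline grows as the missing rate increases. The authors suggest that when the missing rate reaches 20% to 30% or more, applying logistic regression imputation or MCMC imputation before factor analysis yields smaller errors.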

English Abstract

Factor analysis is frequently employed to analyze scales and questionnaires. However, when the proportion of missing data is high or the data are not missing completely at random, the number of extracted factors and the factor loadings can be biased. We used the Taiwan Education Panel Survey (TEPS) and constructed 5 data sets with different missing proportions to assess how sensitive factor analysis is to the treatment of missing data. The completely observed data were used as the baseline for comparison. We compared 4 treatments: the available case method (AC), the complete case method (CC), MCMC single imputation (MCMC), and step-wise logistic regression single imputation (LR). The results show that the higher the missing proportion, the greater the discrepancy between the covariance matrix of the constructed data set and that of the baseline. For the AC method, the higher the proportion of missing data, the more the number of extracted factors exceeds that of the baseline. The AC method also had the largest bias in factor loadings. The bias in the factor loadings of the CC method increased as the missing proportion increased. Thus, we recommend not applying the list-wise deletion method for factor analysis when the missing proportion is 20% or more.
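The difference between the complete case and available case treatments named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: the one-factor data, the item names `q1` to `q4`, the completely-at-random deletion in `punch_holes`, and the Frobenius-norm distance are all assumptions made for illustration, whereas the study built its data sets from the missingness structure already present in TEPS.

```python
import numpy as np
import pandas as pd


def simulate_items(n=500, seed=0):
    """Hypothetical questionnaire: four items driven by one latent factor."""
    rng = np.random.default_rng(seed)
    factor = rng.normal(size=(n, 1))
    noise = rng.normal(size=(n, 4))
    return pd.DataFrame(0.8 * factor + 0.6 * noise,
                        columns=["q1", "q2", "q3", "q4"])


def punch_holes(df, rate, seed=1):
    """Blank each cell independently with probability `rate`.

    Assumption: missing completely at random, unlike the study's design.
    """
    rng = np.random.default_rng(seed)
    return df.mask(rng.random(df.shape) < rate)


def cov_complete_case(df):
    """Complete case (list-wise deletion): drop rows with any missing item."""
    return df.dropna().cov()


def cov_available_case(df):
    """Available case (pairwise deletion): DataFrame.cov() uses, for each
    pair of items, every row in which both items are observed."""
    return df.cov()


def frobenius_gap(a, b):
    """Frobenius norm of the difference between two covariance matrices."""
    return float(np.linalg.norm(a.to_numpy() - b.to_numpy()))


baseline = simulate_items()
base_cov = baseline.cov()
for rate in (0.1, 0.2, 0.3):
    holed = punch_holes(baseline, rate)
    print(rate,
          len(holed.dropna()),  # rows that survive list-wise deletion
          frobenius_gap(cov_complete_case(holed), base_cov),
          frobenius_gap(cov_available_case(holed), base_cov))
```

Each printed row gives the missing rate, the number of rows left after list-wise deletion, and the distance of each estimate from the baseline covariance matrix. A factor analysis would then be fitted to each estimated matrix and compared with the baseline solution.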

Subject Classification: Social Sciences > Education