Title

測驗向度數評估方法的比較

Parallel Title

A Performance Comparison of Test Dimensionality Assessment Methods

Authors

楊彥文(Yen-wen Yang);凃柏原(Bor-yaun Twu)

Keywords

dimensionality ; DETECT ; NOHARM ; Parallel Analysis ; HULL

Journal

教育學誌

Volume/Issue and Publication Date

No. 40 (2018 / 11 / 01)

Pages

121 - 180

Language

Traditional Chinese

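Chinese Abstract (English translation)

This study simulated multidimensional data with simple structure and with partially complex structure. The manipulated conditions were the number of dimensions (1, 2, 3), the correlation between the abilities on each dimension (0, 0.3, 0.6), the number of items per dimension (10, 20), and the sample size (250, 500, 1000, 2000), with 100 replications for each combination of conditions. Four test dimensionality assessment methods were compared: DETECT (Kim, 1994), NOHARM (McDonald, 1996), parallel analysis (PA; Horn, 1965), and the HULL method (Ceulemans & Kiers, 2006). The criterion for comparison was the percentage of replications in which each method correctly identified the number of dimensions. The main findings were as follows. (1) In the simulated data, when the data were unidimensional, DETECT could not reliably arrive at a unidimensional solution. When the data were two-dimensional, the data structure was not fully consistent with the theory underlying DETECT, so DETECT performed poorly and the other three methods all outperformed it. When the data were three-dimensional with simple structure, DETECT performed better than in the two-dimensional case. When the correlation between dimensions was less than 0.3, PA performed best; when the correlation was 0.6, DETECT and NOHARM performed better than PA and HULL. For 3d1 data (data generated with a three-dimensional MIRT model in which all item vectors point in the same direction, so that the data are effectively unidimensional), the methods performed as they did in the unidimensional case. (2) When the correlation between the dimension abilities was below 0.6, PA and HULL outperformed DETECT and NOHARM, but when the correlation became larger, DETECT and NOHARM outperformed PA and HULL. (3) As the number of items increased, the rate of correct model identification rose for all four methods. (4) NOHARM, PA, and HULL appeared to be relatively unaffected by changes in sample size, whereas for DETECT the rate of correct model identification increased as the sample size increased.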
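The simulation design summarized in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The factor levels come from the abstract, but the full crossing of all four factors, the item parameter ranges, the logistic response function, and the function names `design_cells`, `simulate_simple_structure`, and `percent_correct` are assumptions made for illustration only. Only the simple-structure case is covered.

```python
import itertools
import math
import random

# Factor levels as stated in the abstract.
N_DIMS = (1, 2, 3)
CORRELATIONS = (0.0, 0.3, 0.6)
ITEMS_PER_DIM = (10, 20)
SAMPLE_SIZES = (250, 500, 1000, 2000)
N_REPLICATIONS = 100


def design_cells():
    """Fully crossed grid of the four factors (the full crossing is an assumption)."""
    return list(itertools.product(N_DIMS, CORRELATIONS, ITEMS_PER_DIM, SAMPLE_SIZES))


def simulate_simple_structure(n_dims, rho, items_per_dim, n_persons, rng):
    """Dichotomous responses where each item measures exactly one dimension.

    Abilities are equicorrelated at rho through a shared normal component.
    Item parameters (a in [0.8, 1.6], b ~ N(0, 1)) are illustrative assumptions.
    """
    items = [(d, rng.uniform(0.8, 1.6), rng.gauss(0.0, 1.0))
             for d in range(n_dims) for _ in range(items_per_dim)]
    data = []
    for _ in range(n_persons):
        shared = rng.gauss(0.0, 1.0)
        theta = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                 for _ in range(n_dims)]
        row = []
        for dim, a, b in items:
            # Logistic response probability for an item loading on one dimension.
            p = 1.0 / (1.0 + math.exp(-a * (theta[dim] - b)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def percent_correct(estimated_dims, true_dims):
    """Percentage of replications whose estimated dimension count equals the truth."""
    hits = sum(1 for d in estimated_dims if d == true_dims)
    return 100.0 * hits / len(estimated_dims)
```

In a study of this kind, each dimensionality method would be applied to `N_REPLICATIONS` data sets per cell, and `percent_correct` would then summarize how often the method recovered the generating number of dimensions.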

English Abstract

The purpose of this study was to investigate the performance of four dimensionality assessment procedures, namely DETECT (Kim, 1994), NOHARM (McDonald, 1996), Parallel Analysis (PA; Horn, 1965), and the HULL method (Ceulemans & Kiers, 2006), in terms of their accuracy in identifying the number of dimensions underlying different multidimensional data sets. The number of dimensions, the correlation among the dimensions, the number of items per dimension, and the sample size were manipulated, and simulated responses were generated under each condition with 100 replications per condition. The main findings were as follows. First, when the data were unidimensional, DETECT was not able to obtain the proper one-factor solution. For the two-dimensional case, DETECT did not perform as well as expected because the structure of some of the item response data was not fully consistent with the theory underlying DETECT. DETECT performed better in the three-dimensional case than in the two-dimensional case because the item response data were generated to be between-item multidimensional. Parallel analysis performed better than the other methods when the correlation between domain abilities was less than .3, and DETECT and NOHARM outperformed parallel analysis and HULL when the correlation was .6. For the so-called 3d1 data, in which the item responses were generated using an M2PL model but all items point in the same direction in the latent space, all methods gave results similar to those in the unidimensional case. Second, PA and HULL outperformed DETECT and NOHARM when the correlation among dimensions was 0.3 or lower, and DETECT and NOHARM outperformed PA and HULL when the correlation was 0.6 or higher. Third, as the number of items increased, the accuracy of identifying the number of dimensions also increased for all procedures. Finally, sample size did not seem to affect NOHARM, PA, and HULL, but the performance of the DETECT procedure improved as sample size increased.
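As an illustration of one of the four procedures compared, the following is a minimal sketch of Horn-style parallel analysis. It is an assumption-laden simplification, not the implementation used in the study: it works on Pearson correlations, draws the comparison data from a standard normal distribution, and uses a percentile threshold. The function name `parallel_analysis` and its parameters are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_random=100, percentile=95, seed=0):
    """Horn-style parallel analysis on Pearson correlations (illustrative sketch).

    Returns the number of leading observed eigenvalues that exceed the chosen
    percentile of eigenvalues obtained from random normal data of the same shape.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_persons, n_items = data.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Reference eigenvalues from uncorrelated random data of the same size.
    random_eigs = np.empty((n_random, n_items))
    for r in range(n_random):
        noise = rng.standard_normal((n_persons, n_items))
        eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        random_eigs[r] = np.sort(eigs)[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)

    # Count leading eigenvalues above the threshold, stopping at the first failure.
    n_retained = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_retained += 1
    return n_retained
```

For dichotomous item responses, variants based on tetrachoric or polychoric correlations are often preferred (see Timmerman & Lorenzo-Seva, 2011, in the reference list); the Pearson version here is chosen only to keep the sketch short.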

Subject Classification: Social Sciences > Education
References
  1. Ackerman, T. A.(1992).A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective.Journal of Educational Measurement,29(1),67-91.
  2. Ackerman, T. A.,Gierl, M. J.,Walker, C. M.(2003).Using multidimensional item response theory to evaluate educational and psychological tests.Educational Measurement: Issues and Practice,22(3),37-53.
  3. Adams, R. J.,Wilson, M.,Wang, W. C.(1997).The multidimensional random coefficients multinomial logit model.Applied Psychological Measurement,21(1),1-23.
  4. Akaike, H.(1973).Information theory and an extension of the maximum likelihood principle.2nd International Symposium on Information Theory,Budapest:
  5. Ceulemans, E.,Kiers, H. A. L.(2006).Selecting among three-mode principal component models of different types and complexities: A numerical convex HULL based method.British Journal of Mathematical and Statistical Psychology,59,133-150.
  6. Crawford, A. V.,Green, S. B.,Levy, R.,Lo, W. J.,Scott, L.,Svetina, D.,Thompson, M. S.(2010).Evaluation of parallel analysis methods for determining the number of factors.Educational and Psychological Measurement,70,885-901.
  7. De Champlain, A. F.,Gessaroli, M. E.(1998).Assessing the dimensionality of item response matrices with small sample sizes and short test lengths.Applied Measurement in Education,11,231-253.
  8. De Champlain, A. F.,Tang, K. L.(1993).The effect of nonnormal ability distributions on the assessment of dimensionality.the meeting of the National Council on Measurement in Education,Atlanta, GA:
  9. Fabrigar, L. R.,Wegener, D. T.,MacCallum, R. C.,Strahan, E. J.(1999).Evaluating the use of exploratory factor analysis in psychological research.Psychological Methods,4(3),272-299.
  10. Finch, H.,Habing, B.(2005).Comparison of NOHARM and DETECT in item cluster recovery: Counting dimensions and allocating items.Journal of Educational Measurement,42,149-169.
  11. Fraser, C.(1988).NOHARM II: A Fortran program for fitting unidimensional and multidimensional normal ogive models of latent trait theory.Armidale, N.S.W.:University of New England, Centre for Behavioral Studies.
  12. Gessaroli, M. E.,De Champlain, A. F.(1996).Using an approximate chi-square statistic to test the number of dimensions underlying the response to a set of items.Journal of Educational Measurement,33,157-179.
  13. Gierl, M. J.,Leighton, J. P.,Tan, X.(2006).Evaluating DETECT classification accuracy and consistency when data display complex structure.Journal of Educational Measurement,43,265-289.
  14. Hattie, J.(1985).Methodology review: assessing unidimensionality of tests and items.Applied Psychological Measurement,9,139-164.
  15. Hendriks, A. A. J.,Hofstee, W. K. B.,De Raad, B.(1999).The Five-Factor Personality Inventory (FFPI).Personality and Individual Differences,27,307-325.
  16. Horn, J. L.(1965).A rationale and test for the number of factors in factor analysis.Psychometrika,30(2),179-185.
  17. Jang, E. E.,Roussos, L. A.(2007).An investigation into the dimensionality of TOEFL using conditional covariance-based nonparametric approach.Journal of Educational Measurement,44(1),1-21.
  18. Kim, H.(1994).Urbana-Champaign, IL.,University of Illinois.
  19. Knol, D. L.,Berger, M. P. F.(1991).Empirical comparison between factor analysis and item response models.Multivariate Behavioral Research,26,457-477.
  20. Lorenzo-Seva, U.,Ferrando, P. J.(2013).FACTOR 9.2: A comprehensive program for fitting exploratory and semiconfirmatory factor analysis and IRT models.Applied Psychological Measurement,37(6),497-498.
  21. Lorenzo-Seva, U.,Rodríguez-Fornells, A.(2006).Acquiescent responding in balanced multidimensional scales and exploratory factor analysis.Psychometrika,71,769-777.
  22. Lorenzo-Seva, U.,Timmerman, M. E.,Kiers, H. A. L.(2011).The HULL method for selecting the number of common factors.Multivariate Behavioral Research,46,340-364.
  23. McDonald, R. P.(1981).The dimensionality of tests and items.British Journal of Mathematical and Statistical Psychology,34,100-117.
  24. McDonald, R. P.,Mok, M. C.(1995).Goodness of fit in item response models.Multivariate Behavioral Research,54,483-495.
  25. Muthén, B.(1983).Latent variable structural equation modeling with categorical data.Journal of Econometrics,22,48-65.
  26. O'Connor, B. P.(2000).SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test.Behavior Research Methods, Instruments, & Computers,32(3),396-402.
  27. Reckase, M. D.(1985).The difficulty of test items that measure more than one ability.Applied Psychological Measurement,9(4),401-412.
  28. Reckase, M. D.(2009).Multidimensional item response theory.New York, NY:Springer.
  29. Roussos, L. A.,Ozbek, Y. O.(2006).Formulation of the DETECT population parameter and evaluation of DETECT estimator bias.Journal of Educational Measurement,43(3),215-243.
  30. Schwarz, G.(1978).Estimating the dimension of a model.The Annals of Statistics,6(2),461-464.
  31. Stone, C. A.,Yeh, C.-C.(2006).Assessing the dimensionality and factor structure of multiple-choice exams: An empirical comparison of methods using the Multistate Bar Examination.Educational and Psychological Measurement,66(2),193-214.
  32. Stout, W.(1990).A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation.Psychometrika,55,293-325.
  33. Stout, W. F.(1987).A nonparametric approach for assessing latent trait dimensionality.Psychometrika,52,589-617.
  34. Stout, W.,Habing, B.,Douglas, J.,Kim, H. R.,Roussos, L.,Zhang, J.(1996).Conditional covariance-based nonparametric multidimensionality assessment.Applied Psychological Measurement,20,331-354.
  35. Stout, W.,Douglas, B.,Junker, B.,Roussos, L.(1999).DIMTEST [Computer software].Champaign, IL:The William Stout Institute for Measurement.
  36. Svetina, D.(2011).Arizona State University.
  37. Svetina, D.,Levy, R.(2012).An overview of software for conducting dimensionality assessment in multidimensionality models.Applied Psychological Measurement,36(8),659-669.
  38. Svetina, D.,Levy, R.(2014).A framework for dimensionality assessment for multidimensional item response models.Educational Assessment,19,35-57.
  39. Takane, Y.,de Leeuw, J.(1987).On the relationship between item response theory and factor analysis of discretized variables.Psychometrika,52(3),393-408.
  40. Tanaka, J. S.,Huba, G. J.(1989).A general coefficient of determination for covariance structure models under arbitrary GLS estimation.British Journal of Mathematical & Statistical Psychology,42(2),233-239.
  41. Tate, R.(2003).A comparison of selected empirical methods for assessing the structure of responses to test items.Applied Psychological Measurement,27,159-203.
  42. Timmerman, M. E.,Lorenzo-Seva, U.(2011).Dimensionality assessment of ordered polytomous items with parallel analysis.Psychological Methods,16(2),209-220.
  43. van der Linden, W. J.(Ed.),Hambleton, R. K.(Ed.)(1996).Handbook of modern item response theory.New York, NY:Springer.
  44. Wang, M.(1985).,Iowa City, IA:University of Iowa.
  45. Weng, L. J.,Cheng, C. P.(2005).Parallel analysis with unidimensional binary data.Educational and Psychological Measurement,65(5),697-716.
  46. Yu, C. H.,Popp, S. O.,DiGangi, S.,Jannasch-Pennell, A.(2007).Assessing unidimensionality: A comparison of Rasch modeling, parallel analysis, and TETRAD.Practical Assessment, Research & Evaluation,12(14)
  47. Zhang, J. M.,Stout, W.(1999).The theoretical DETECT index of dimensionality and its application to approximate simple structure.Psychometrika,64(2),213-249.
  48. Zhang, J. M.,Stout, W.(1999).Conditional covariances structure of generalized compensatory multidimensional items.Psychometrika,64(2),129-152.
  49. Zwick, W. R.,Velicer, W. F.(1986).Comparison of five rules for determining the number of components to retain.Psychological Bulletin,99,432-442.
  50. 陳榮華、吳明雄、陳心怡(2010)。新編多元性向測驗。台北:中國行為科學社。