English Abstract
The purpose of this paper is to explore subscale score estimation under two testing designs: a single testing design and an equating testing design. In addition, two new methods for estimating subscale scores are proposed.
Using simulated data, this study investigates the accuracy of several subscale score estimation methods. In the single testing design, the factors considered are the correlation between subscales, sample size, the ratio of constructed-response (CR) to multiple-choice (MC) items, the number of subscales, and test length. In the equating testing design, the factors considered are the correlation between subscales, sample size, the placement of anchor items, and the equating method.
The results show that:
1. The new methods of estimating subscale scores outperform the other methods.
2. Estimation error decreases as the correlation between subscales increases; sample size, however, has no effect on estimation error.
3. In the single testing design, estimation error decreases as the ratio of CR to MC items increases and as test length increases.
4. In the equating testing design, the placement of anchor items does not affect estimation error, and concurrent calibration based on item response theory is more accurate than equating based on classical test theory.
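As background for the regression-based estimators compared in this study, the sketch below illustrates Kelley's (1927) classical regressed-score estimate, which shrinks an observed subscale score toward the group mean in proportion to its reliability. All numbers (sample size, score scale, the 0.7 reliability) are illustrative assumptions, not the study's simulation conditions:

```python
import numpy as np

rng = np.random.default_rng(0)

n_examinees = 5000
reliability = 0.7  # assumed subscale reliability (hypothetical value)

# True subscale abilities, and observed scores contaminated by measurement
# error whose variance is implied by the assumed reliability.
true_scores = rng.normal(50.0, 10.0, n_examinees)
error_sd = 10.0 * np.sqrt((1.0 - reliability) / reliability)
observed = true_scores + rng.normal(0.0, error_sd, n_examinees)

# Kelley's regressed estimate: shrink each observed score toward the mean.
kelley = reliability * observed + (1.0 - reliability) * observed.mean()

# Root mean squared error against the (known) true scores.
rmse_raw = np.sqrt(np.mean((observed - true_scores) ** 2))
rmse_kelley = np.sqrt(np.mean((kelley - true_scores) ** 2))
```

With the reliability correctly specified, the shrunken estimate has smaller RMSE than the raw observed score, which is the baseline behavior the augmented and IRT-based methods in this study attempt to improve upon by borrowing information across subscales.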