In recent years, assessment frameworks for estimating students' higher-level abilities have gradually shifted toward large-scale standardized assessment. A suitable model not only provides the desired ability estimates but also yields better estimation results. Through an empirical study, this paper compares mathematical ability estimates obtained from HIRT (hierarchical item response theory), MIRT (multidimensional item response theory), and UIRT (unidimensional item response theory) models, and examines how the choice of model affects those estimates, as a reference for selecting a mathematical assessment model. A decimal division assessment was designed for sixth-grade students based on the NAEP mathematics assessment framework. The reliability of the assessment is 0.79. The results were analyzed and compared using the HIRT, MIRT, and UIRT models. The model fit indices (AIC, BIC, and DIC) indicate that the HIRT model is suitable for a large-scale standardized assessment framework. In the HIRT model, the regression coefficients relating decimal division to conceptual understanding, procedural knowledge, and problem solving are all higher than 0.7, and conceptual understanding in particular influences decimal division. The empirical results therefore confirm that the HIRT model provides more information and better estimation.