American Educational Research Association,American Psychological Association,National Council on Measurement in Education(1999).Standards for educational and psychological testing.Washington, DC:American Educational Research Association.
Ang-Aw, H. T.,Goh, C. C. M.(2011).Understanding discrepancies in rater judgment on national-level oral examination tasks.RELC Journal,42(1),31-51.
Brandon, P. R.(2004).Conclusions about frequently studied modified Angoff standard-setting topics.Applied Measurement in Education,17(1),59-88.
Brennan, R. L.(Ed.)(2006).Educational measurement.Westport, CT:American Council on Education.
Brown, A.(1995).The effect of rater variables in the development of an occupation-specific language performance test.Language Testing,12(3),1-15.
Cizek, G. J.(Ed.)(2001).Setting performance standards: Concepts, methods, and perspectives.Mahwah, NJ:Lawrence Erlbaum Associates.
Cizek, G. J.(Ed.)(2001).Standard setting: Concepts, methods, and perspectives.Mahwah, NJ:Lawrence Erlbaum Associates.
Clauser, J. C.(2013).Amherst, MA,University of Massachusetts.
Cuesta-Albertos, J. A.,Gordaliza, A.,Matrán, C.(1997).Trimmed k-means: An attempt to robustify quantizers.The Annals of Statistics,25(2),553-576.
Ferdous, A. A.,Plake, B. S.(2005).Understanding the factors that influence decisions of panelists in a standard-setting study.Applied Measurement in Education,18(3),257-267.
Hein, S. F.,Skaggs, G. E.(2009).A qualitative investigation of panelists' experiences of standard setting using two variations of the bookmark method.Applied Measurement in Education,22(3),207-228.
Huang, Z.(1997).A fast clustering algorithm to cluster very large categorical data sets in data mining.Data Mining and Knowledge Discovery,2(3),1-8.
Huang, Z.(1997).Clustering large data sets with mixed numeric and categorical values.Proceedings of the 1st Pacific-Asia Conference on Knowledge Discovery and Data Mining
Impara, J. C.,Plake, B. S.(1998).Teachers' ability to estimate item difficulty: A test of the assumption in the Angoff standard setting method.Journal of Educational Measurement,35(1),69-81.
Kaftandjieva, F.(2010).Methods for setting cut scores in criterion-referenced achievement tests: A comparative analysis of six recent methods with an application to tests of reading in EFL.Cito, Arnhem, The Netherlands:European Association for Language Testing and Assessment.
Kane, M. T.(1994).Validating the performance standards associated with passing scores.Review of Educational Research,64(3),425-461.
Kane, M. T.(1987).On the use of IRT models with judgmental standard setting procedures.Journal of Educational Measurement,24(4),333-345.
Khalid, M. N.(2011).Cluster analysis: A standard setting technique in measurement and testing.Journal of Applied Quantitative Methods,6(2),46-58.
Lin, Y.-H.,Tseng, F.-L.,Sung, Y.-T.(2013).The development and application of the rater-effects-monitored Yes/No Angoff standard-setting method: Some preliminary results.Paper presented at the annual meeting of the International Conference on Standard-Based Assessment,Taipei, Taiwan.
Linacre, J. M.(2002).What do infit and outfit, mean-square and standardized mean?Rasch Measurement Transactions,16(2),878.
Linacre, J. M.(1989).Many-facet Rasch measurement.Chicago, IL:MESA Press.
Lumley, T.(1998).Perceptions of language-trained raters and occupational experts in a test of occupational English language proficiency.English for Specific Purposes,17(4),347-367.
Lunz, M. E.,Stahl, J. A.(1990).Judge consistency and severity across grading periods.Evaluation & the Health Professions,13(4),425-444.
MacCann, R. G.,Stanley, G.(2006).The use of Rasch modeling to improve standard setting.Practical Assessment, Research & Evaluation,11(2),1-17.
Masters, G. N.(1982).A Rasch model for partial credit scoring.Psychometrika,47,149-174.
Orr, M.(2002).The FCE speaking test: Using rater reports to help interpret test scores.System,30(2),143-154.
Pitoniak, M. J.(2003).Amherst, MA,University of Massachusetts.
Plake, B. S.,Melican, G. J.,Mills, C. N.(1991).Factors influencing intrajudge consistency during standard-setting.Educational Measurement: Issues and Practice,10(2),15-16.
Scullen, S. E.,Mount, M. K.,Goff, M.(2000).Understanding the latent structure of job performance ratings.Journal of Applied Psychology,85,956-970.
Sireci, S. G.,Hauger, J. B.,Wells, C. S.,Shea, C.,Zenisky, A. L.(2009).Evaluation of the standard setting on the 2005 Grade 12 National Assessment of Educational Progress Mathematics Test.Applied Measurement in Education,22,339-358.
Smith, E. V.(Ed.),Smith, R. M.(Ed.)(2004).Introduction to Rasch measurement: Theory, models and applications.Maple Grove, MN:JAM Press.
Thorndike, R. L.(Ed.)(1971).Educational measurement.Washington, DC:American Council on Education.
Timm, N. H.(2002).Applied multivariate analysis.New York, NY:Springer-Verlag.
Trochim, W.,Donnelly, J. P.,Arora, K.(2015).Research methods: The essential knowledge base.Belmont, CA:Wadsworth.
U.S. Department of Education(2009).Evaluation of the National Assessment of Educational Progress: Study report.Washington, DC:Author.
Violato, C.,Marini, A.,Lee, C.(2003).A validity study of expert judgment procedures for setting cutoff scores on high-stakes credentialing examinations using cluster analysis.Evaluation & the Health Professions,26(1),59-72.
Wang, N.,Wiser, R. F.,Newman, L. S.(2001).Use of the Rasch IRT model in standard setting: An item mapping method.Paper presented at the Annual Meeting of the National Council on Measurement in Education,Seattle, WA.