Comparison of Passing Scores Determined by The Angoff Method in Different Item Samples

Main Article Content



In this study, the efficiency of various random sampling methods for reducing the number of items rated by judges in an Angoff standard-setting study was examined, and the methods were compared with each other. First, a full-length test was formed by combining the Placement Test 2012 and 2013 mathematics subsets. Then, simple random sampling (SRS), content-stratified (C-SRS), item-difficulty-stratified (D-SRS), and content-by-difficulty (CD-SRS) random sampling methods were used to draw subsets of different lengths (30%, 40%, 50%, and 70% of the items) from the full test. In total, 16 study conditions (4 methods × 4 subset lengths) were investigated. In the data analysis, ANOVA was conducted to examine whether the minimum passing scores (MPSs) for the subsets differed significantly from the MPS of the full-length test. As a follow-up analysis, RMSE (root mean square error) and SEE (standard error of estimation) values were calculated for each study condition. Results indicated that the estimated Angoff MPSs differed significantly from the full-test Angoff MPS (45.12) only in the 30% C-SRS, 40% C-SRS, 30% D-SRS, and 30% CD-SRS conditions. According to the RMSE values, the C-SRS method had the smallest error and the SRS method the largest. Moreover, the SEE examination revealed that rating 50% of the items selected with the C-SRS method is sufficient to obtain estimates similar to the full-test Angoff MPS (within one SEE). Overall, the C-SRS method was more effective than the other methods in reducing the number of items rated by judges in MPS-setting studies conducted with the Angoff method.
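The sampling-and-comparison idea summarized above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the percent-scale MPS, the three content areas, and the randomly generated judge ratings are all assumptions made for illustration. The sketch draws a content-stratified sample of items, computes an Angoff MPS from the judges' ratings on that sample, and measures the RMSE of repeated subset estimates against the full-test MPS.

```python
import random
from math import sqrt


def angoff_mps(ratings, items):
    """Angoff MPS on an assumed 0-100 scale.

    ratings[j][i] is judge j's estimated probability that a minimally
    competent examinee answers item i correctly. Each judge's cut score is
    the mean rating over `items`; the MPS is the mean across judges.
    """
    per_judge = [100 * sum(judge[i] for i in items) / len(items) for judge in ratings]
    return sum(per_judge) / len(per_judge)


def stratified_sample(strata, fraction, rng):
    """Draw roughly `fraction` of the items from every stratum (at least one each)."""
    chosen = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


def rmse(estimates, target):
    """Root mean square error of subset estimates around the full-test value."""
    return sqrt(sum((e - target) ** 2 for e in estimates) / len(estimates))


if __name__ == "__main__":
    rng = random.Random(2020)
    n_items, n_judges = 40, 8

    # Hypothetical data: random ratings and three made-up content areas.
    ratings = [[rng.uniform(0.2, 0.7) for _ in range(n_items)] for _ in range(n_judges)]
    strata = {
        "numbers": list(range(0, 16)),
        "algebra": list(range(16, 30)),
        "geometry": list(range(30, 40)),
    }

    full_mps = angoff_mps(ratings, range(n_items))
    for fraction in (0.3, 0.4, 0.5, 0.7):
        estimates = [
            angoff_mps(ratings, stratified_sample(strata, fraction, rng))
            for _ in range(100)
        ]
        print(f"{int(fraction * 100)}% subset: RMSE vs. full test = {rmse(estimates, full_mps):.2f}")
```

Replacing the content strata with difficulty bands, or with content-by-difficulty cells, would give analogues of the D-SRS and CD-SRS conditions under the same assumptions; a single stratum containing all items reduces to simple random sampling.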

Article Details

How to Cite
KARA, H., & ÇETİN, S. (2020). Comparison of Passing Scores Determined by The Angoff Method in Different Item Samples. International Journal of Assessment Tools in Education, 7(1), 80-97. Retrieved from

