Analyzing the Maximum Likelihood Score Estimation Method with Fences in ca-MST


Melek Gülşah Şahin, Nagihan Boztunç Öztürk


New statistical methods enter the literature continually as the field develops. This study investigates one of them, the Maximum Likelihood Score Estimation with Fences (MLEF) method, in computer adaptive multistage testing (ca-MST). Because no previous study has examined the applicability of MLEF in ca-MST, the results contribute to both the national and the international literature. To this end, 48 conditions were simulated: 4 module lengths (5, 10, 15, 20) x 2 panel designs (1-3; 1-3-3) x 2 ability distributions (normal, uniform) x 3 ability estimation methods (MLEF, MLE, EAP). The simulated data were evaluated with correlation, RMSE and AAD as indicators of measurement precision, and with conditional bias to show how the estimates change at each ability level. This is a post-hoc simulation study using data from the TIMSS 2015 8th-grade mathematics assessment. The “xxIRT” R package and the MSTGen simulation software were used in the study. The results indicate that MLEF, as a new ability estimation method, is superior to MLE in all conditions. EAP gives the best measurement precision in terms of correlation, RMSE and AAD, although the MLEF results are very close to those of EAP. MLE proves to be less biased than EAP in ability estimation, especially at extreme ability levels.
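The design and evaluation criteria described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code (the study used the xxIRT R package and MSTGen); the function names are hypothetical, and the formulas are the conventional definitions of RMSE, AAD (average absolute difference), bias and Pearson correlation, which are assumed here to match the study's usage. Conditional bias is taken to mean the average estimation error among simulees sharing the same true ability level.

```python
import itertools
import math
from collections import defaultdict

# Factor levels as listed in the abstract.
MODULE_LENGTHS = (5, 10, 15, 20)
PANEL_DESIGNS = ("1-3", "1-3-3")
ABILITY_DISTRIBUTIONS = ("normal", "uniform")
ESTIMATION_METHODS = ("MLEF", "MLE", "EAP")


def design_conditions():
    """Fully crossed design: 4 x 2 x 2 x 3 = 48 conditions."""
    return list(itertools.product(
        MODULE_LENGTHS, PANEL_DESIGNS, ABILITY_DISTRIBUTIONS, ESTIMATION_METHODS
    ))


def rmse(true_theta, est_theta):
    """Root mean squared error between true and estimated abilities."""
    n = len(true_theta)
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true_theta, est_theta)) / n)


def aad(true_theta, est_theta):
    """Average absolute difference between true and estimated abilities."""
    n = len(true_theta)
    return sum(abs(e - t) for t, e in zip(true_theta, est_theta)) / n


def correlation(true_theta, est_theta):
    """Pearson correlation between true and estimated abilities."""
    n = len(true_theta)
    mean_t = sum(true_theta) / n
    mean_e = sum(est_theta) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mean_t) ** 2 for t in true_theta)
    var_e = sum((e - mean_e) ** 2 for e in est_theta)
    return cov / math.sqrt(var_t * var_e)


def conditional_bias(true_theta, est_theta):
    """Mean signed error (estimate minus true value) at each true ability level."""
    errors = defaultdict(list)
    for t, e in zip(true_theta, est_theta):
        errors[t].append(e - t)
    return {level: sum(errs) / len(errs) for level, errs in errors.items()}
```

In a simulation of this kind, each of the 48 conditions would yield one set of true and estimated abilities, and the four functions would be applied to each set. A positive conditional bias at a low ability level means the method overestimates weak examinees; a negative value at a high level means it underestimates strong ones.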


How to Cite
Şahin, M. G., & Boztunç Öztürk, N. (2019). Analyzing the Maximum Likelihood Score Estimation Method with Fences in ca-MST. International Journal of Assessment Tools in Education, 6(4), 555-567. Retrieved from

Author Biography

Melek Gülşah Şahin, Gazi University

Gazi University, Gazi Education Faculty, Department of Educational Sciences, Turkey
