New statistical methods enter the literature every day as a result of scientific developments. This study investigates one of them, the Maximum Likelihood Score Estimation with Fences (MLEF) method, in computer adaptive multistage testing (ca-MST). Because no previous study has examined the applicability of MLEF in ca-MST, the results contribute to both the national and international literature. In line with this aim, 48 conditions were simulated: 4 module lengths (5, 10, 15, 20) x 2 panel designs (1-3; 1-3-3) x 2 ability distributions (normal, uniform) x 3 ability estimation methods (MLEF, MLE, EAP). The simulated data were interpreted using correlation, RMSE, and AAD as indicators of measurement precision, and using conditional bias to show the changes at each ability level. This is a post-hoc simulation study using TIMSS 2015 eighth-grade mathematics data. The "xxIRT" R package and the MSTGen simulation software were used in the study. The results show that MLEF, as a new ability estimation method, is superior to MLE in all conditions. EAP gives the best results in terms of measurement precision based on correlation, RMSE, and AAD values, while the results from MLEF are very close to those from EAP. MLE proves to be less biased in ability estimation than EAP, especially at extreme ability levels.
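The design and the evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the study's code (the study used the xxIRT R package and MSTGen); the function names are hypothetical, and grouping conditional bias by true ability rounded to one decimal is an assumption made only for illustration.

```python
from itertools import product
from math import sqrt
from statistics import mean

# Design factors as listed in the abstract.
MODULE_LENGTHS = (5, 10, 15, 20)
PANEL_DESIGNS = ("1-3", "1-3-3")
ABILITY_DISTRIBUTIONS = ("normal", "uniform")
ESTIMATION_METHODS = ("MLEF", "MLE", "EAP")


def design_conditions():
    """Fully crossed design: every combination of the four factors."""
    return list(product(MODULE_LENGTHS, PANEL_DESIGNS,
                        ABILITY_DISTRIBUTIONS, ESTIMATION_METHODS))


def rmse(true_theta, est_theta):
    """Root mean squared error between true and estimated abilities."""
    return sqrt(mean((e - t) ** 2 for t, e in zip(true_theta, est_theta)))


def aad(true_theta, est_theta):
    """Average absolute difference between true and estimated abilities."""
    return mean(abs(e - t) for t, e in zip(true_theta, est_theta))


def pearson(true_theta, est_theta):
    """Pearson correlation between true and estimated abilities."""
    mt, me = mean(true_theta), mean(est_theta)
    cov = sum((t - mt) * (e - me) for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mt) ** 2 for t in true_theta)
    var_e = sum((e - me) ** 2 for e in est_theta)
    return cov / sqrt(var_t * var_e)


def conditional_bias(true_theta, est_theta):
    """Mean signed error (estimate minus truth) at each true ability level.

    Assumption: levels are formed by rounding true ability to one decimal.
    """
    groups = {}
    for t, e in zip(true_theta, est_theta):
        groups.setdefault(round(t, 1), []).append(e - t)
    return {level: mean(errors) for level, errors in sorted(groups.items())}


if __name__ == "__main__":
    # Made-up values for demonstration only, not data from the study.
    true_theta = [-2.0, -2.0, 0.0, 0.0, 2.0, 2.0]
    est_theta = [-1.5, -1.5, 0.0, 0.0, 1.5, 1.5]
    print(len(design_conditions()), "conditions")
    print("RMSE:", rmse(true_theta, est_theta))
    print("AAD:", aad(true_theta, est_theta))
    print("r:", pearson(true_theta, est_theta))
    print("Conditional bias:", conditional_bias(true_theta, est_theta))
```

In the made-up example, the estimates are pulled toward zero at both extremes, so the conditional bias is positive at the low end and negative at the high end even though the correlation is perfect. This is why a conditional bias analysis is reported alongside the overall precision indices.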
International Journal of Assessment Tools in Education