Equality of Admission Tests Using Kernel Equating Under the Non-Equivalent Groups with Covariates Design



Kernel equating, Non-equivalent groups design, NEC design, Background variables, Admission tests


Educational assessment tests are designed to measure the same psychological constructs over extended periods of time. This matters because test results are often used to select applicants for admission to university programs. However, test forms that measure the same construct often differ in difficulty, because new items tend to be used for each test administration. To ensure fair assessment, especially for those whose results weigh heavily in selection decisions, it is necessary to collect evidence that the assessments are not biased and to confirm that scores obtained from different test forms are statistically equivalent. Test equating serves this purpose: it prevents bias arising from differences in the difficulty of test forms, allows scores from different forms to be reported on the same scale, and ensures that the reported scores carry the same meaning. In this study, these functions were evaluated using real college admission test data from different test administrations. The kernel equating method under the non-equivalent groups with covariates (NEC) design was applied to determine whether scores obtained in different time periods, but measuring the same psychological constructs, were statistically equivalent. The NEC design was used because the groups taking the admission test are non-equivalent and the test forms share no anchor items. The analyses showed that the test forms had different score distributions and that the relationship between them was non-linear. The equating procedure was therefore used to adjust for these differences, allowing the test forms to be used interchangeably.
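The core idea of kernel equating can be illustrated with a minimal sketch. It is not the procedure used in the study: it assumes the score probabilities of both forms are already known for a common population, and it leaves out the presmoothing step and the covariate adjustment that the non-equivalent groups with covariates design requires. The function names, the example score probabilities, and the bandwidth of 0.6 are all hypothetical choices made for illustration. Each discrete score distribution is continuized with a Gaussian kernel that preserves its mean and variance, and a score on form X is then mapped to the form Y score with the same cumulative probability.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    The shrinkage factor `a` keeps the mean and variance of the
    continuized distribution equal to those of the discrete one.
    """
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * s - (1.0 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    )


def equate(x, scores_x, probs_x, scores_y, probs_y, h_x=0.6, h_y=0.6):
    """Map a form X score to the form Y scale (equipercentile on the
    continuized distributions). The inverse CDF is found by bisection."""
    target = continuized_cdf(x, scores_x, probs_x, h_x)
    lo, hi = min(scores_y) - 10.0, max(scores_y) + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if continuized_cdf(mid, scores_y, probs_y, h_y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical score probabilities for a six-point test form.
scores_x = [0, 1, 2, 3, 4, 5]
probs_x = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]

# Hypothetical form Y: same shape, but every score is one point higher,
# which mimics an easier form.
scores_y = [s + 1 for s in scores_x]
probs_y = list(probs_x)

equated = equate(3.0, scores_x, probs_x, scores_y, probs_y)
```

In this constructed example the two forms differ only by a constant shift, so the equated value of a form X score is that score plus one. With real data the score probabilities would be estimated, and under the design used in the study they would also be adjusted for the covariates before continuization.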



How to Cite

Altıntaş, Ö., & Wallin, G. (2021). Equality of Admission Tests Using Kernel Equating Under the Non-Equivalent Groups with Covariates Design. International Journal of Assessment Tools in Education, 8(4), 729-743. Retrieved from https://ijate.net/index.php/ijate/article/view/44