An Investigation of Item Bias of English Test: The Case of 2016 Year Undergraduate Placement Exam in Turkey


Rabia Akcan, Kübra Atalay Kabasakal


The purpose of this study is to determine whether the English test items of the 2016 Undergraduate Placement Exam (UPE) show differential item functioning (DIF) and differential bundle functioning (DBF) with respect to gender and school type, and to examine possible sources of bias in the DIF items. The Mantel-Haenszel (MH), Simultaneous Item Bias Test (SIBTEST), and Multiple Indicators Multiple Causes (MIMIC) methods were used for the DIF analyses. DBF analyses were conducted with the MIMIC and SIBTEST methods. Expert opinions were consulted to determine the sources of bias. The data set consisted of the responses of 59,818 students to the 2016 UPE English test. The analyses of the 60 items showed that one item in the translation subtest contained DIF favoring male students. The school type analyses identified nine DIF items in the vocabulary and grammar knowledge subtest, six in the reading comprehension subtest, and four in the translation subtest. Experts judged the one item showing DIF by gender to be unbiased, and found evidence of bias in thirteen of the nineteen items showing DIF by school type. The DBF analyses showed that some item bundles contained DBF with respect to gender and school type. The three methods differed in the number of DIF items they identified and in the level of DIF attributed to the items; however, they were consistent in detecting uniform DIF.
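As a rough illustration of the MH procedure named in the abstract, the sketch below computes the Mantel-Haenszel common odds ratio and the ETS delta value (-2.35 times the natural log of the odds ratio) for one dichotomous item, matching examinees on total score. It is not the authors' code: the function names `build_tables` and `mantel_haenszel`, the group labels "reference" and "focal", and the toy data are assumptions made only for this example.

```python
import math

def build_tables(responses, groups, item):
    """Build one 2x2 table per total-score stratum for the studied item.

    responses: list of 0/1 score vectors, one per examinee (hypothetical layout)
    groups:    list of "reference" / "focal" labels, one per examinee
    Returns a list of (A, B, C, D) tuples:
      A = reference correct, B = reference incorrect,
      C = focal correct,     D = focal incorrect.
    """
    strata = {}
    for row, group in zip(responses, groups):
        score = sum(row)  # matching criterion: total test score
        cell = strata.setdefault(score, [0, 0, 0, 0])
        correct = row[item] == 1
        if group == "reference":
            cell[0 if correct else 1] += 1
        else:
            cell[2 if correct else 3] += 1
    return [tuple(cell) for cell in strata.values()]

def mantel_haenszel(tables):
    """Return (alpha_MH, delta_MH) from a list of (A, B, C, D) tables."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these tables")
    alpha = num / den                # common odds ratio across strata
    delta = -2.35 * math.log(alpha)  # ETS delta scale; negative favors the reference group
    return alpha, delta

# Toy data: two score strata in which the reference group does better on the item.
alpha, delta = mantel_haenszel([(30, 10, 20, 20), (15, 5, 10, 10)])
print(round(alpha, 3), round(delta, 3))
```

An odds ratio near 1 (delta near 0) suggests no uniform DIF for the item. A full analysis would also need the MH chi-square significance test and an effect size classification, which this sketch omits.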


How to Cite
Akcan, R., & Atalay Kabasakal, K. (2019). An Investigation of Item Bias of English Test: The Case of 2016 Year Undergraduate Placement Exam in Turkey. International Journal of Assessment Tools in Education, 6(1), 48-62. Retrieved from

