How Many Response Categories are Enough for Likert Type Scales?

An Empirical Study Based on Item Response Theory


  • Eren Can Aybek Pamukkale University
  • Çetin Toraman Çanakkale Onsekiz Mart University


Likert-type scale, response categories, item response theory


The current study investigates the optimal number of response categories for Likert-type scales within the framework of item response theory (IRT). The data were collected from university students, mainly attending faculties of medicine and education. The “Social Gender Parity Scale” developed by Gözütok et al. (2017) was prepared in forms with 3-, 5-, and 7-point response categories. The graded response model (GRM) was used for item calibration. The results showed that, in scale development studies, a 5-point response format offers advantages over a 3-point format in terms of reliability and test information. In addition, it poses no major disadvantage compared with a 7-point format. Researchers are therefore recommended to use a 5-point response format, also considering the ease of responding.
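As a rough illustration of the quantities the abstract refers to, the following Python sketch computes category probabilities and item information under Samejima's graded response model for items with 3, 5, and 7 response categories. It is not the authors' analysis: the discrimination value `a = 1.5`, the evenly spaced thresholds, and the function names `grm_category_probs` and `grm_item_information` are all hypothetical choices made for this example only.

```python
import numpy as np


def _cumulative_probs(theta, a, b):
    """Return P(X >= k) for k = 0..K, padded with 1 (k = 0) and 0 (k = K)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Logistic boundary curves, one column per threshold b_k
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones = np.ones((theta.size, 1))
    zeros = np.zeros((theta.size, 1))
    return np.hstack([ones, p_star, zeros])


def grm_category_probs(theta, a, b):
    """Category probabilities P(X = k | theta) for a single GRM item."""
    cum = _cumulative_probs(theta, a, b)
    # Each category probability is the difference of adjacent boundary curves
    return cum[:, :-1] - cum[:, 1:]


def grm_item_information(theta, a, b):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative_probs(theta, a, b)
    probs = cum[:, :-1] - cum[:, 1:]
    # Derivative of each boundary curve is a * P* * (1 - P*); zero at the pads
    slope = a * cum * (1.0 - cum)
    dprobs = slope[:, :-1] - slope[:, 1:]
    return np.sum(dprobs ** 2 / probs, axis=1)


theta_grid = np.linspace(-3.0, 3.0, 61)
a = 1.5  # hypothetical discrimination, not an estimate from the study

for n_categories in (3, 5, 7):
    # Hypothetical thresholds, evenly spaced between -2 and 2
    thresholds = np.linspace(-2.0, 2.0, n_categories - 1)
    info = grm_item_information(theta_grid, a, thresholds)
    print(n_categories, "categories: mean information over grid =", info.mean())
```

With real data, the discrimination and threshold parameters would be estimated from the responses rather than fixed in advance, so the values printed here say nothing about the study's actual findings.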


Adelson, J. L., & McCoach, D. B. (2010). Measuring the mathematical attitudes of elementary students: The effects of a 4-point or 5-point Likert-type scale. Educational and Psychological Measurement, 70(5), 796-807.

Aiken, L. R. (1983). Number of response categories and statistics on a teacher rating scale. Educational and Psychological Measurement, 43, 397-401.

Anastasi, A., & Urbina, S. (1997). Psychological testing. The USA: Prentice-Hall International, Inc.

Bora, B. (2013). Pazarlama araştırmalarında kullanılan likert türü ölçeklerin uygulanabilirliğinin incelenmesi [An examination of the applicability of Likert-type scales used in marketing research]. Doctoral dissertation. Sakarya University, Sakarya.

Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. doi:10.18637/jss.v048.i06

Champney, H., & Marshall, H. (1939). Optimal refinement of the rating scale. Journal of Applied Psychology, 23, 323-331.

Chang, L. (1994). A psychometric evaluation of 4-point and 6-point Likert-type scales in relation to reliability and validity. Applied Psychological Measurement, 18(3), 205-215.

Dawes, J. (2008). Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. International Journal of Market Research, 50(1), 61-104.

DeVellis, R. F. (2003). Scale development, theory and applications. California: SAGE Publications.

Dunn-Rankin, P., Knezek, G. A., Wallace, S., & Zhang, S. (2004). Scaling methods. Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Gözütok, F. D., Toraman, Ç., & Acar Erdol, T. (2017). Toplumsal cinsiyet eşitliği ölçeğinin (TCEÖ) geliştirilmesi (Development of gender equality scale). İlköğretim Online Dergisi (Elementary Education Online), 16(3), 1036-1048.

Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38, 1217-1218.

Joshi, A., Kale, S., Chandel, S., & Pal, D. K. (2015). Likert scale: Explored and explained. British Journal of Applied Science & Technology (BJAST), 7(4), 396-403.

Leung, S. O. (2011). A comparison of psychometric properties and normality in 4-, 5-, 6-, and 11-point Likert Scales. Journal of Social Service Research, 37, 412-421.

Lord, F. M. (1954). Chapter II: Scaling. Review of Educational Research, 24(5), 375-392.

Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15, 625-632.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill, Inc.

Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1-15.

Price, L. R. (2017). Psychometric methods, theory into practice. New York: The Guilford Press.

R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL

Revelle, W. (2021). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston, Illinois, USA. Version 2.1.6.

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

Thomas, H. (1982). IQ interval scales, and normal distributions. Psychological Bulletin, 91, 198-202.

Toraman, C., & Ozen, F. (2019). An investigation of the effectiveness of the gender equality course with a specific focus on faculties of education. Educational Policy Analysis and Strategic Research, 14(2), 6-28.

Torgerson, W. S. (1958). Theory and methods of scaling. New York: John Wiley & Sons, Inc.

Wong, C.-S., Chuen, K.-C., & Fung, M.-Y. (1993). Differences between odd and even number of response scales: Some empirical evidence. Chinese Journal of Psychology, 35, 75-86.

Wu, H., & Leung, S. O. (2017). Can Likert scales be treated as interval scales? A simulation study. Journal of Social Service Research, 43(4), 527-532.



How to Cite

Aybek, E. C., & Toraman, Ç. (2022). How Many Response Categories are Enough for Likert Type Scales? An Empirical Study Based on Item Response Theory. International Journal of Assessment Tools in Education, 9(2), 534-547. Retrieved from