How Many Response Categories are Enough for Likert Type Scales?

An Empirical Study Based on Item Response Theory

Authors

  • Eren Can Aybek, Pamukkale University
  • Çetin Toraman, Çanakkale Onsekiz Mart University

Keywords:

Likert-type scale, response categories, item response theory

Abstract

The current study investigates the optimal number of response categories for Likert-type scales within the framework of item response theory (IRT). Data were collected from university students enrolled mainly in the faculties of medicine and education. Versions of the “Social Gender Parity Scale” developed by Gözütok et al. (2017) were prepared with 3-, 5-, and 7-point response categories. The graded response model (GRM) was used for item calibration. The results showed that, in scale development studies, a 5-point response format offers advantages over a 3-point format in terms of reliability and test information. In addition, it has no major disadvantage compared with a 7-point format. Researchers are therefore advised to use a 5-point response format, which is also easier for respondents to answer.
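Because the comparison above rests on the graded response model (GRM) and on test information, a small numerical sketch may help make it concrete. The Python code below is an illustration only, not the authors' analysis (the reference list cites the R packages mirt and psych for that). It assumes Samejima's GRM in the logistic metric (no D = 1.7 scaling factor), a single item with a hypothetical discrimination A = 1.5, and hypothetical threshold values chosen so that the 3- and 5-point versions are exact collapses of the 7-point version. Item information is computed as the sum over categories of (dP_k/dθ)² / P_k.

```python
import numpy as np


def _boundaries(theta, a, b):
    """Boundary curves P*_k = P(X >= k | theta), padded with P*_0 = 1 and P*_m = 0."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    n = theta.size
    return np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])


def grm_category_probs(theta, a, b):
    """Category probabilities P_k = P*_k - P*_{k+1}; shape (len(theta), m)."""
    p_star = _boundaries(theta, a, b)
    return p_star[:, :-1] - p_star[:, 1:]


def grm_item_information(theta, a, b):
    """Fisher information of one GRM item: sum over k of (dP_k/dtheta)^2 / P_k."""
    p_star = _boundaries(theta, a, b)
    p = p_star[:, :-1] - p_star[:, 1:]
    # dP*/dtheta = a * P* * (1 - P*); this is zero for the padded ends (1 and 0)
    w = a * p_star * (1.0 - p_star)
    dp = w[:, :-1] - w[:, 1:]
    return np.sum(dp ** 2 / p, axis=1)


# Hypothetical parameters for illustration only (NOT estimates from the study).
A = 1.5
THRESHOLDS = {
    3: [-1.5, 1.5],
    5: [-1.5, -0.5, 0.5, 1.5],
    7: [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5],
}
theta_grid = np.linspace(-4, 4, 81)
info = {k: grm_item_information(theta_grid, A, b) for k, b in THRESHOLDS.items()}
for k, curve in info.items():
    print(f"{k}-point item: peak information = {curve.max():.3f}")
```

Because merging adjacent categories can never increase Fisher information (a Cauchy-Schwarz argument), the 5- and 7-point curves in this idealized setup lie on or above the 3-point curve at every θ. With real data, each form's thresholds are estimated separately and respondents may use the categories differently, so the size of any gain is an empirical question of the kind the study addresses.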

References

Adelson, J. L., & McCoach, D. B. (2010). Measuring the mathematical attitudes of elementary students: The effects of a 4-point or 5-point Likert-type scale. Educational and Psychological Measurement, 70(5), 796-807. https://doi.org/10.1177/0013164410366694

Aiken, L. R. (1983). Number of response categories and statistics on a teacher rating scale. Educational and Psychological Measurement, 43, 397-401.

Anastasi, A., & Urbina, S. (1997). Psychological testing. USA: Prentice-Hall International, Inc.

Bora, B. (2013). Pazarlama araştırmalarında kullanılan likert türü ölçeklerin uygulanabilirliğinin incelenmesi (An investigation of the applicability of Likert-type scales used in marketing research). Doctoral dissertation, Sakarya Üniversitesi, Sakarya.

Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06

Champney, H., & Marshall, H. (1939). Optimal refinement of the rating scale. Journal of Applied Psychology, 23, 323-331.

Chang, L. (1994). A psychometric evaluation of 4-point and 6-point Likert-type scales in relation to reliability and validity. Applied Psychological Measurement, 18(3), 205-215. https://doi.org/10.1177/014662169401800302

Dawes, J. (2008). Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. International Journal of Market Research, 50(1), 61-104. https://doi.org/10.1177/147078530805000106

DeVellis, R. F. (2003). Scale development: Theory and applications. California: SAGE Publications.

Dunn-Rankin, P., Knezek, G. A., Wallace, S., & Zhang, S. (2004). Scaling methods. Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.

Gözütok, F. D., Toraman, Ç., & Acar Erdol, T. (2017). Toplumsal cinsiyet eşitliği ölçeğinin (TCEÖ) geliştirilmesi (Development of gender equality scale). İlköğretim Online Dergisi (Elementary Education Online), 16(3), 1036-1048. https://doi.org/10.17051/ilkonline.2017.330240

Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38, 1217-1218.

Joshi, A., Kale, S., Chandel, S., & Pal, D. K. (2015). Likert scale: Explored and explained. British Journal of Applied Science & Technology (BJAST), 7(4), 396-403. https://doi.org/10.9734/BJAST/2015/14975

Leung, S. O. (2011). A comparison of psychometric properties and normality in 4-, 5-, 6-, and 11-point Likert scales. Journal of Social Service Research, 37, 412-421. https://doi.org/10.1080/01488376.2011.580697

Lord, F. M. (1954). Chapter II: Scaling. Review of Educational Research, 24(5), 375-392. https://doi.org/10.3102/00346543024005375

Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15, 625-632. https://doi.org/10.1007/s10459-010-9222-y

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill, Inc.

Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1-15. https://doi.org/10.1016/s0001-6918(99)00050-5

Price, L. R. (2017). Psychometric methods: Theory into practice. New York: The Guilford Press.

R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Revelle, W. (2021). psych: Procedures for Personality and Psychological Research (R package version 2.1.6). Northwestern University, Evanston, Illinois, USA. https://CRAN.R-project.org/package=psych

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

Thomas, H. (1982). IQ, interval scales, and normal distributions. Psychological Bulletin, 91, 198-202.

Toraman, C., & Ozen, F. (2019). An investigation of the effectiveness of the gender equality course with a specific focus on faculties of education. Educational Policy Analysis and Strategic Research, 14(2), 6-28. https://doi.org/10.29329/epasr.2019.201.1

Torgerson, W. S. (1958). Theory and methods of scaling. New York: John Wiley & Sons, Inc.

Wong, C.-S., Chuen, K.-C., & Fung, M.-Y. (1993). Differences between odd and even number of response scales: Some empirical evidence. Chinese Journal of Psychology, 35, 75-86.

Wu, H., & Leung, S. O. (2017). Can Likert scales be treated as interval scales? A simulation study. Journal of Social Service Research, 43(4), 527-532. https://doi.org/10.1080/01488376.2017.1329775

Published

2022-06-19

How to Cite

Aybek, E. C., & Toraman, Ç. (2022). How Many Response Categories are Enough for Likert Type Scales? An Empirical Study Based on Item Response Theory. International Journal of Assessment Tools in Education, 9(2), 534-547. Retrieved from https://ijate.net/index.php/ijate/article/view/81
