How Many Response Categories are Enough for Likert Type Scales?
An Empirical Study Based on Item Response Theory
Keywords:
Likert-type scale, response categories, item response theory
Abstract
This study investigates the optimal number of response categories for Likert-type scales within the framework of item response theory (IRT). Data were collected from university students, mainly from faculties of medicine and education. Versions of the “Social Gender Parity Scale” developed by Gözütok et al. (2017) were prepared with 3-, 5-, and 7-point response categories. The graded response model (GRM) was used for item calibration. The results showed that, in scale development studies, a 5-point response format offers advantages over a 3-point format in terms of reliability and test information, and it poses no major disadvantage compared with a 7-point format. Researchers are therefore recommended to use a 5-point response format, also considering the ease of responding.
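As a rough illustration of how the graded response model links the number of response categories to item information, the sketch below computes category probabilities and Fisher information for a single item. It is not the authors' analysis: the reference list cites the mirt package for R, which suggests the calibration was done there, while this example uses plain Python. The discrimination value and thresholds are hypothetical, chosen only so that the 5-point thresholds refine the 3-point ones.

```python
import math


def _cumulative(theta, a, thresholds):
    """Cumulative probabilities P(X >= k), padded with 1.0 and 0.0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    thresholds must be strictly increasing; K - 1 thresholds give K categories.
    """
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        # Derivative of each cumulative curve is a * P* * (1 - P*).
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        info += dp * dp / p
    return info


# Hypothetical item parameters (assumptions, not estimates from the study).
a = 1.5
thresholds_3pt = [-1.0, 1.0]
thresholds_5pt = [-2.0, -1.0, 1.0, 2.0]

probs_5pt = grm_category_probs(0.0, a, thresholds_5pt)
info_3pt = grm_item_information(0.0, a, thresholds_3pt)
info_5pt = grm_item_information(0.0, a, thresholds_5pt)
print(probs_5pt, info_3pt, info_5pt)
```

Evaluating the two information functions over a range of theta values gives a simple way to see where extra categories add information and where they add little, under these assumed parameters.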
References
Adelson, J. L., & McCoach, D. B. (2010). Measuring the mathematical attitudes of elementary students: The effects of a 4-point or 5-point Likert-Type scale. Educational and Psychological Measurement, 70(5), 796-807. https://doi.org/10.1177/0013164410366694
Aiken, L. R. (1983). Number of response categories and statistics on a teacher rating scale. Educational and Psychological Measurement, 43, 397-401.
Anastasi, A., & Urbina, S. (1997). Psychological testing. The USA: Prentice-Hall International, Inc.
Bora, B. (2013). Pazarlama araştırmalarında kullanılan likert türü ölçeklerin uygulanabilirliğinin incelenmesi [Examination of the applicability of Likert-type scales used in marketing research]. Doctoral dissertation. Sakarya Üniversitesi, Sakarya.
Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06
Champney, H., & Marshall, H. (1939). Optimal refinement of the rating scale. Journal of Applied Psychology, 23, 323-331.
Chang, L. (1994). A psychometric evaluation of 4-point and 6-point Likert-type scales in relation to reliability and validity. Applied Psychological Measurement, 18(3), 205-215. https://doi.org/10.1177/014662169401800302
Dawes, J. (2008). Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. International Journal of Market Research, 50(1), 61-104. https://doi.org/10.1177/147078530805000106
DeVellis, R. F. (2003). Scale development, theory and applications. California: SAGE Publications.
Dunn-Rankin, P., Knezek, G. A., Wallace, S., & Zhang, S. (2004). Scaling methods. Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc.
Gözütok, F. D., Toraman, Ç., & Acar Erdol, T. (2017). Toplumsal cinsiyet eşitliği ölçeğinin (TCEÖ) geliştirilmesi (Development of gender equality scale). İlköğretim Online Dergisi (Elementary Education Online), 16(3), 1036-1048. http://dx.doi.org/10.17051/ilkonline.2017.330240
Jamieson, S. (2004). Likert Scales: How to (Ab)use them. Medical Education, 38, 1217-1218.
Joshi, A., Kale, S., Chandel, S., & Pal, D. K. (2015). Likert scale: Explored and explained. British Journal of Applied Science & Technology (BJAST), 7(4), 396-403. https://doi.org/10.9734/BJAST/2015/14975
Leung, S. O. (2011). A comparison of psychometric properties and normality in 4-, 5-, 6-, and 11-point Likert Scales. Journal of Social Service Research, 37, 412-421. https://doi.org/10.1080/01488376.2011.580697
Lord, F. M. (1954). Chapter II: Scaling. Review of Educational Research, 24(5), 375-392. https://doi.org/10.3102/00346543024005375
Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15, 625-632. https://doi.org/10.1007/s10459-010-9222-y
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill, Inc.
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1-15. https://doi.org/10.1016/s0001-6918(99)00050-5
Price, L. R. (2017). Psychometric methods, theory into practice. New York: The Guilford Press.
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Revelle, W. (2021). psych: Procedures for Personality and Psychological Research, Northwestern University, Evanston, Illinois, USA, https://CRAN.R-project.org/package=psych version=2.1.6.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Thomas, H. (1982). IQ interval scales, and normal distributions. Psychological Bulletin, 91, 198-202.
Toraman, C. & Ozen, F. (2019). An investigation of the effectiveness of the gender equality course with a specific focus on faculties of education. Educational Policy Analysis and Strategic Research, 14(2), 6-28. https://doi.org/10.29329/epasr.2019.201.1
Torgerson, W. S. (1958). Theory and methods of scaling. New York: John Wiley & Sons, Inc.
Wong, C.-S., Chuen, K.-C., & Fung, M.-Y. (1993). Differences between odd and even number of response scales: Some empirical evidence. Chinese Journal of Psychology, 35, 75-86.
Wu, H., & Leung, S. O. (2017). Can Likert Scales be treated as interval scales? A simulation study. Journal of Social Service Research, 43(4), 527-532. https://doi.org/10.1080/01488376.2017.1329775
Copyright (c) 2022 International Journal of Assessment Tools in Education

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.