An Investigation of Data Mining Classification Methods in Classifying Students According to 2018 PISA Reading Scores

Data Mining Classification Methods in Classifying Students


  • Emrah Büyükatak
  • Duygu Anıl, Hacettepe University


The purpose of this research was twofold. The first aim was to determine how accurately four data mining classification methods (Artificial Neural Networks, Decision Trees, K-Nearest Neighbor, and Naive Bayes) classify students' reading success in the PISA 2018 data on the basis of the factors affecting it. The second aim was to examine the general characteristics of the success groups. The data were the responses of 6890 students to the PISA 2018 student questionnaire. First, missing data were examined and imputed. Second, 24 index variables thought to affect students' reading success were selected after reviewing the related literature, the PISA 2018 Technical Report, and the PISA 2018 data. Third, to frame the task as a classification problem, students were divided into two categories, “Successful” and “Unsuccessful”, according to their scores on the PISA 2018 reading achievement test. The analyses were conducted in SPSS Modeler. The C5.0 decision tree algorithm achieved the highest classification rate (89.6%), and the QUEST algorithm achieved the lowest (75%). Two-Step Cluster analysis was used to examine the general characteristics of the groups by achievement, and it yielded four clusters of similar size. The silhouette coefficient of the cluster analysis was 0.1. Because this value is greater than 0, the data sets can be considered suitable for clustering. All of the models classified students more accurately than would be expected by chance. It can therefore be concluded that every data mining method examined can be used to classify students according to their achievement scores.
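As a minimal illustration of the K-Nearest Neighbor method named above, the Python sketch below assigns a student to the class held by the majority of the k closest training students in Euclidean distance. The function names, the choice k = 3, and the two-feature toy data are hypothetical. The sketch assumes the index variables have already been standardized, and it is not the SPSS Modeler implementation used in the study.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training cases."""
    # Rank training cases by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tied vote, most_common() lists the label met first, i.e. the one with the closest case.
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two standardized index variables per student.
train_X = [[-1.0, -0.8], [-0.9, -1.1], [-1.2, -0.7], [0.9, 1.0], [1.1, 0.8], [1.0, 1.2]]
train_y = ["Unsuccessful"] * 3 + ["Successful"] * 3

print(knn_predict(train_X, train_y, [0.8, 0.9]))    # Successful
print(knn_predict(train_X, train_y, [-0.7, -0.9]))  # Unsuccessful
```

An odd k avoids tied votes when there are only two classes.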
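Naive Bayes treats the index variables as conditionally independent given the class. The sketch below assumes Gaussian class-conditional distributions for continuous indices. This is one common variant and not necessarily the one used in the study. The function names, the small variance floor, and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature Gaussian mean and variance."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Hypothetical variance floor so that no variance is exactly zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Add the log of the Gaussian density N(v; m, var) for this feature.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical toy data: two standardized index variables per student.
X = [[-1.0, -0.8], [-0.9, -1.1], [-1.2, -0.7], [0.9, 1.0], [1.1, 0.8], [1.0, 1.2]]
y = ["Unsuccessful"] * 3 + ["Successful"] * 3

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [0.8, 0.9]))    # Successful
print(predict_gaussian_nb(model, [-0.7, -0.9]))  # Unsuccessful
```

Working in log space keeps the product of many small per-feature densities from underflowing when all 24 indices are used.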
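C5.0 builds on C4.5. C4.5 ranks candidate splits by the information gain ratio, which is the reduction in class entropy produced by a split divided by the entropy of the branch sizes themselves. The sketch below computes information gain and gain ratio for a single categorical attribute. The attribute values and labels are hypothetical. The rest of the algorithm, including continuous thresholds, pruning, and boosting, is not shown and may differ in the SPSS Modeler C5.0 node.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_and_gain_ratio(attribute_values, labels):
    """Information gain and gain ratio of splitting `labels` on a categorical attribute."""
    n = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    # Expected entropy after the split, weighting each branch by its share of cases.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the branch sizes; it penalizes many small branches.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    ratio = info_gain / split_info if split_info > 0 else 0.0
    return info_gain, ratio

labels = ["S", "S", "U", "U"]
# Hypothetical attributes: one separates the classes perfectly, the other not at all.
informative = ["high", "high", "low", "low"]
uninformative = ["high", "low", "high", "low"]
print(gain_and_gain_ratio(informative, labels))    # (1.0, 1.0)
print(gain_and_gain_ratio(uninformative, labels))  # (0.0, 0.0)
```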
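The abstract reports classification rates and concludes that the models classify students beyond chance. One standard way to quantify agreement beyond chance is Cohen's kappa. Kappa compares the observed accuracy with the accuracy expected from the marginal class proportions alone. The abstract does not state which criterion the study used, so treat kappa here as an assumption. The confusion-matrix counts in the example are hypothetical and are not results from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts cases whose true class is i and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Agreement expected by chance: product of true (row) and predicted (column) marginal shares.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(k)
    )
    return observed, (observed - expected) / (1 - expected)

# Hypothetical counts (rows: true Successful/Unsuccessful; columns: predicted), not study results.
acc, kappa = accuracy_and_kappa([[40, 10], [5, 45]])
print(round(acc, 3), round(kappa, 3))  # 0.85 0.7
```

A kappa above 0 means the classifier agrees with the true groups more often than chance alone would produce. A kappa of 0 means it does no better than chance.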
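The suitability argument in the abstract rests on the silhouette coefficient being positive. Each case gets a score s(i) = (b - a) / max(a, b). Here a is the mean distance from the case to the other members of its own cluster, and b is the smallest mean distance from the case to the members of any other cluster. A positive mean score therefore indicates that cases are, on average, closer to their own cluster than to the nearest alternative. The sketch below uses Euclidean distance on hypothetical points. The distance measure behind the Two-Step silhouette reported by SPSS Modeler may differ.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette width over all points; requires at least two clusters.

    For point i, a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    and s(i) = (b - a) / max(a, b).
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention: a singleton cluster gets s(i) = 0
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D points: two tight groups far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(mean_silhouette(points, [0, 0, 1, 1]), 3))  # 0.9 (well separated)
print(round(mean_silhouette(points, [0, 1, 0, 1]), 3))  # -0.448 (poor partition)
```

Values range from -1 to 1, so a positive but small mean such as 0.1 signals clusters that are separated but overlap considerably.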




How to Cite

Büyükatak, E., & Anıl, D. (2023). An Investigation of Data Mining Classification Methods in Classifying Students According to 2018 PISA Reading Scores: Data Mining Classification Methods in Classifying Students. International Journal of Assessment Tools in Education, 9(4), 867-882. Retrieved from