A Comparison of Machine Learning Algorithms in Predicting Students' Academic Performance
DOI:
https://doi.org/10.57152/predatecs.v3i2.1861

Keywords:
Academic Performance, Decision Tree, Machine Learning, Random Forest, Support Vector Machine

Abstract
Predicting students’ academic performance enables early interventions and data-driven planning in education. We compare five machine-learning algorithms (Decision Tree, K-Nearest Neighbor, Naive Bayes, Random Forest, and Support Vector Machine) on a publicly available dataset of 1,001 students, evaluated with Accuracy, Precision, Recall, and F1-Score. The Decision Tree achieved the highest performance, with perfect scores on this dataset, while SVM (approximately 82% F1-Score) and Random Forest (approximately 81% F1-Score) were competitive. These results suggest that simple, interpretable models can be highly effective when features are clean and predictive; however, the Decision Tree’s perfect scores also indicate potential overfitting and warrant further validation on larger, more diverse samples. The study underscores that model choice should reflect dataset characteristics and practical deployment goals in educational settings, informing early-warning systems and targeted support programs.
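As a rough illustration of the comparison workflow described above, the sketch below trains the five classifiers with scikit-learn and reports the four metrics. It is not the authors' code: the file name (StudentsPerformance.csv), the pass/fail label derived from the mean of the three exam scores, the one-hot encoding, the 80/20 split, and the default hyperparameters are all assumptions made for illustration.

# Minimal sketch of the five-model comparison, assuming the Kaggle
# "Students Performance in Exams" CSV layout and a hypothetical pass/fail
# label (mean of the three exam scores >= 60). Not the authors' pipeline.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

df = pd.read_csv("StudentsPerformance.csv")  # hypothetical local copy of the Kaggle file

# Hypothetical binary target: "pass" when the average exam score is at least 60.
scores = df[["math score", "reading score", "writing score"]]
y = (scores.mean(axis=1) >= 60).astype(int)
X = pd.get_dummies(df.drop(columns=scores.columns))  # one-hot encode categorical features

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Support Vector Machine": SVC(kernel="rbf"),
}

# Fit each model and report Accuracy, Precision, Recall, and F1-Score on the test split.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name:>24}  "
          f"acc={accuracy_score(y_test, pred):.3f}  "
          f"prec={precision_score(y_test, pred):.3f}  "
          f"rec={recall_score(y_test, pred):.3f}  "
          f"f1={f1_score(y_test, pred):.3f}")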
Copyright (c) 2026 Juanda Alra Baye, Gemma Tahmid Alfaridzi, Hilmy Abdurrahim, Abid Aziz Adinda, Muhammad Rakha Athallah, Muhammad Zahid Ramadhan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.