Implementation of Machine Learning Algorithms for Predicting Student Academic Performance
DOI: https://doi.org/10.57152/ijatis.v3i1.1871

Keywords: Academic Performance, K-Nearest Neighbors, Naïve Bayes, Random Forest, Support Vector Machine

Abstract
This study examines the effectiveness of five data mining algorithms, K-Nearest Neighbor (K-NN), Naive Bayes, Decision Tree, Random Forest, and Support Vector Machine (SVM), in predicting and comparing students' academic performance. The dataset comprises average grades, learning motivation, study hours per week, and parental support. The data underwent preprocessing steps including normalization, outlier removal, and splitting into training and test sets, and model performance was evaluated using accuracy, precision, and recall. The results indicate that Random Forest performed best, followed by the Decision Tree, which also demonstrated strong performance. SVM and Naive Bayes yielded good results, while K-NN performed poorly due to class overlap in the data. The study concludes that Random Forest is the most effective algorithm for predicting students' academic performance and can contribute meaningfully to data-driven analysis aimed at improving the quality of education.
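The evaluation pipeline described in the abstract (normalization, train/test split, then comparing the five classifiers on accuracy, precision, and recall) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the dataset here is synthetic, and the hyperparameters (e.g. `n_neighbors=5`, RBF kernel) are assumptions, since the paper does not specify them in the abstract.

```python
# Hypothetical sketch of the comparison pipeline from the abstract.
# Synthetic data stands in for the real features (average grades,
# learning motivation, study hours per week, parental support),
# which are not reproduced here.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Synthetic stand-in dataset: 4 features, binary performance label.
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Normalization step: fit the scaler on the training set only,
# then apply the same transform to the test set.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# The five classifiers compared in the study (settings are assumed).
models = {
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(kernel="rbf", random_state=42),
}

# Fit each model and collect the three metrics used in the study.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (accuracy_score(y_test, pred),
                     precision_score(y_test, pred),
                     recall_score(y_test, pred))

for name, (acc, prec, rec) in results.items():
    print(f"{name:13s} acc={acc:.3f} prec={prec:.3f} rec={rec:.3f}")
```

Fitting the scaler on the training set alone mirrors standard practice: letting test-set statistics leak into the normalization step would inflate the reported metrics.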