Predicting Students’ Mathematics Scores from Reading Scores Using Supervised Learning

Authors

  • Nofita Fitriyani, Telkom University Purwokerto

DOI:

https://doi.org/10.57152/malcom.v6i2.2555

Keywords:

Mathematics, Reading Scores, Score Prediction, Students, Supervised Learning

Abstract

This study predicts students’ mathematics scores from their reading scores using a supervised learning approach. The dataset is Students Performance in Exams (Kaggle), comprising 1,000 student records, and was analyzed using Microsoft Excel and Google Colaboratory. The data were split into training and test sets at an 80:20 ratio. The research stages comprised descriptive statistical analysis, data visualization, Pearson correlation testing, linear regression model development, and model performance evaluation using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). Prior to modeling, the regression assumptions of linearity, normality of residuals, and homoscedasticity were examined to support model validity. The results showed a strong positive relationship between reading and mathematics scores, with a Pearson correlation coefficient of 0.818. The linear regression model produced an MAE of 7.281, an RMSE of 8.818, and an R² of 0.680, meaning that reading scores explained about 68% of the variance in mathematics scores. A Decision Tree Regressor was selected as the comparison model because it represents a non-linear, non-parametric supervised learning approach commonly used in educational data mining. This study contributes to the educational data mining literature by showing that an interpretable regression model explains a substantial share of the variance in mathematics achievement and performs comparably to a non-linear alternative.
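
The workflow summarized in the abstract (Pearson correlation, an 80:20 split, a one-predictor linear regression, and evaluation with MAE, RMSE, and R²) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the author's notebook: the scores are synthetic stand-ins generated with arbitrary parameters rather than the Kaggle records, the helper names (pearson_r, fit_line, evaluate) and the random seed are hypothetical, and the Decision Tree comparison is omitted. Its output will therefore not reproduce the figures reported above.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def evaluate(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for a list of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return mae, rmse, 1.0 - ss_res / ss_tot


# ASSUMPTION: synthetic stand-in data, NOT the Kaggle dataset.
# The means, spreads, and slope below are arbitrary illustration values.
rng = random.Random(42)
reading = [min(100.0, max(0.0, rng.gauss(69, 14))) for _ in range(1000)]
maths = [min(100.0, max(0.0, 0.85 * r + 7 + rng.gauss(0, 9))) for r in reading]

# Correlation on the full sample.
r = pearson_r(reading, maths)

# 80:20 train/test split on shuffled indices.
indices = list(range(len(reading)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

# Fit on the training portion only, then score the held-out portion.
slope, intercept = fit_line(
    [reading[i] for i in train_idx], [maths[i] for i in train_idx]
)
y_test = [maths[i] for i in test_idx]
y_pred = [slope * reading[i] + intercept for i in test_idx]
mae, rmse, r2 = evaluate(y_test, y_pred)

print(f"Pearson r = {r:.3f}")
print(f"math = {slope:.3f} * reading + {intercept:.3f}")
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}, R^2 = {r2:.3f}")
```

Fitting on the training portion and scoring only the held-out 20% keeps the error metrics from being flattered by data the model has already seen. To apply the sketch to real data, the two synthetic lists would be replaced by the reading and mathematics columns loaded from the dataset file.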

References

N. Fauziah, W. Hadi, and Y. Sari, “The Relationship between Reading Comprehension Ability and the Ability to Solve Mathematics Story Problems for Class V Elementary School,” Jurnal Gentala Pendidikan Dasar, vol. 9, no. 1, pp. 55–58, Jun. 2024, doi: 10.22437/GENTALA.V9I1.32978.

I. R. Boctot, D. M. Enriquez, and C. P. Yurango, “Reading Comprehension as a Predictor of Mathematical Word Problem-solving Ability among Grade 7 Students,” Asian Journal of Education and Social Studies, vol. 51, no. 7, pp. 1115–1121, Jul. 2025, doi: 10.9734/AJESS/2025/V51I72196.

F. A. Kusumastuti, N. Wulandari, M. K. Lutfi, and A. Rohmawati, “Kemampuan Membaca Teks Matematika Sebagai Prediktor Literasi Matematis Siswa Sekolah Menengah Pertama,” JIPMat, vol. 10, no. 2, pp. 198–211, Oct. 2025, doi: 10.26877/jipmat.v10i2.2663.

R. P. Mentari, R. Tuanaya, and M. Albrecht, “Correlation Of Reading Comprehension Skill And Ability To Solve Mathematics Story Questions Of Students In Indonesia: A Meta-Analysis,” Matematika Dan Pembelajaran, vol. 11, no. 2, pp. 154–168, Nov. 2023, doi: 10.33477/mp.v11i2.5514.

I. Simbolon, P. Aditya, and E. Br Purba, “Prediksi Performa Akademik Siswa Berdasarkan Kehadiran dan Aktivitas E-Learning Menggunakan Algoritma Decision Tree,” RIGGS: Journal of Artificial Intelligence and Digital Business, vol. 4, no. 2, pp. 4899–4910, Jul. 2025, doi: 10.31004/riggs.v4i2.1352.

R. Rismaya, D. Yuniarto, and D. Setiadi, “Penerapan Algoritma Machine Learning dalam Prediksi Prestasi Akademik Mahasiswa,” Router: Jurnal Teknik Informatika dan Terapan, vol. 3, no. 1, pp. 15–23, Feb. 2025, doi: 10.62951/router.v3i1.389.

K. V. Patil, K. D. Yesugade, and K. B. Naikwadi, “A Study on Regression Based Machine Learning Models to Predict the Student Performance,” Journal of Engineering Education Transformations, vol. 38, no. 2, pp. 177–186, Oct. 2024, doi: 10.16920/jeet/2024/v38i2/24200.

Y. Zhang and M. Cutumisu, “Predicting the Mathematics Literacy of Resilient Students from High-performing Economies: A Machine Learning Approach,” Studies in Educational Evaluation, vol. 83, p. 101412, Dec. 2024, doi: 10.1016/J.STUEDUC.2024.101412.

S. Bhutto, I. F. Siddiqui, Q. A. Arain, and M. Anwar, “Predicting Students’ Academic Performance Through Supervised Machine Learning,” ICISCT 2020 - 2nd International Conference on Information Science and Communication Technology, Feb. 2020, doi: 10.1109/ICISCT49550.2020.9080033.

R. Guevara-Reyes, I. Ortiz-Garcés, R. Andrade, F. Cox-Riquetti, and W. Villegas-Ch, “Machine learning models for academic performance prediction: interpretability and application in educational decision-making,” Front. Educ. (Lausanne), vol. 10, p. 1632315, Aug. 2025, doi: 10.3389/FEDUC.2025.1632315.

M. K. Kassy, “Predicting Student Performance using Linear Regression,” Data Science Insights, vol. 3, no. 2, pp. 66–74, Aug. 2025, doi: 10.63017/jdsi.v3i2.104.

R. I. Athallah, G. Al Godzali, and E. Rivalni, “Academic Performance Prediction from Study Habits and Lifestyle using Linear Regression,” Journal of Artificial Intelligence and Engineering Applications (JAIEA), vol. 5, no. 1, pp. 337–343, Oct. 2025, doi: 10.59934/jaiea.v5i1.1313.

D. Kristiani and N. Tupulu, “Pengaruh Kemampuan Membaca dan Motivasi Belajar Terhadap Kemampuan Pemecahan Masalah pada Soal Cerita Matematika,” Jurnal Pendidikan Matematika, vol. 4, pp. 789–797, Aug. 2025, doi: 10.56916/jp.v4i3.2205.

H. ? Volia, U. Citra, B. Roswita, L. Nahak, U. Citra Bangsa, and C. A. Naitili, “Pengaruh Literasi Digital dan Minat Baca Terhadap Motivasi Belajar Siswa SD GMIT Kuanino 3 Kupang,” Jurnal Jendela Pendidikan, vol. 5, Nov. 2025, doi: 10.57008/jjp.v5i04.1793.

“Students Performance in Exams.” Accessed: Dec. 17, 2025. [Online]. Available: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams

B. Mahendra, D. Pratama, A. Faqih, and R. Kurniawan, “Evaluasi Pengaruh Kualitas Data Terhadap Performa Model Machine Learning Menggunakan Pendekatan Data-Centric AI,” Jurnal Sistem Informasi dan Teknologi (SINTEK), doi: 10.56995/sintek.v6i1.211.

P. Mahajan, “Machine Learning-Based Data Preprocessing as well as Visualization Techniques for Predicting Students’ Tasks,” in Demystifying Emerging Trends in Machine Learning, Bentham Science Publishers, 2025. doi: 10.2174/97898153053951250201.

V. Çetin and O. Yıldız, “A comprehensive review on data preprocessing techniques in data analysis,” Pamukkale University Journal of Engineering Sciences, vol. 28, no. 2, pp. 299–312, Apr. 2022, doi: 10.5505/pajes.2021.62687.

A. M. Sharifnia, D. E. Kpormegbey, D. K. Thapa, and M. Cleary, “A Primer of Data Cleaning in Quantitative Research: Handling Missing Values and Outliers,” J. Adv. Nurs., vol. 82, no. 1, pp. 970–975, Jan. 2026, doi: 10.1111/jan.16908.

V. R. Joseph, “Optimal ratio for data splitting,” Stat. Anal. Data Min., vol. 15, no. 4, pp. 531–538, Aug. 2022, doi: 10.1002/sam.11583.

J. J. Salazar, L. Garland, J. Ochoa, and M. J. Pyrcz, “Fair train-test split in machine learning: Mitigating spatial autocorrelation for improved prediction accuracy,” J. Pet. Sci. Eng., vol. 209, p. 109885, Feb. 2022, doi: 10.1016/j.petrol.2021.109885.

D. Chicco, M. J. Warrens, and G. Jurman, “The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation,” PeerJ Comput. Sci., vol. 7, pp. 1–24, Jul. 2021, doi: 10.7717/PEERJ-CS.623.

I. Shatz, “Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics,” Behavior Research Methods, vol. 56, no. 2, pp. 826–845, Mar. 2023, doi: 10.3758/s13428-023-02072-x.

S. Midway and J. W. White, “Testing for normality in regression models: mistakes abound (but may not matter),” R. Soc. Open Sci., vol. 12, no. 4, p. 241904, Apr. 2025, doi: 10.1098/rsos.241904.

Published

2026-04-18

How to Cite

Fitriyani, N. (2026). Predicting Students’ Mathematics Scores from Reading Scores Using Supervised Learning. MALCOM: Indonesian Journal of Machine Learning and Computer Science, 6(2), 494-503. https://doi.org/10.57152/malcom.v6i2.2555