A Simulation of Student Study Group Formation Design Using K-Means Clustering

Authors

  • Yudistira Ardi Nugraha Setyawan Putra, Universitas Airlangga
  • Hendro Margono, Universitas Airlangga

DOI:

https://doi.org/10.57152/malcom.v5i2.1795

Keywords:

Academic Performance, K-Means Clustering, Machine Learning, Ridge Regression, Student Group Formation

Abstract

This research develops a simulation model for forming student study groups using an enhanced K-Means algorithm, addressing the challenge of optimizing group dynamics to improve learning outcomes. The effectiveness of the formed groups is assessed through the Root Mean Square Error (RMSE) obtained after dimensionality reduction, using a range of regression models (Linear Regression, Ridge Regression, Lasso Regression, Elastic Net, Random Forest Regressor, Gradient Boosting Regressor, and XGBoost Regressor), with the aim of giving educators a robust tool for evaluating group configurations. The study identifies four distinct clusters and shows that “Previous_Score” and “Attendance” are the critical variables, achieving the highest Silhouette Score of 0.64 with five selected features. The Ridge Regression model yielded a low RMSE of 0.045 and explained 72.39% of the variance in “Exam_Score”. The findings suggest that targeted interventions tailored to each cluster (yellow, purple, blue, and green) can enhance academic outcomes by addressing specific student needs. This data-driven approach optimizes group dynamics and fosters a more inclusive learning environment, improving academic performance while cultivating essential social skills. The study underscores the potential of machine learning techniques in education and points to future research on alternative clustering methods and their long-term impact on student engagement and success.
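As a concrete illustration of the pipeline the abstract describes, the sketch below clusters students with K-Means, scores the clustering with the Silhouette Score, and checks how well a Ridge Regression model explains exam scores via RMSE and explained variance. This is a minimal sketch only, assuming the Kaggle "Student Performance Factors" dataset cited in the references; the file name, the five illustrative feature columns, and all hyperparameters are assumptions, not the authors' exact configuration.

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error, r2_score, silhouette_score
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Hypothetical file name; columns follow the Kaggle "Student Performance
    # Factors" dataset (e.g. Previous_Scores, Attendance, Exam_Score).
    df = pd.read_csv("StudentPerformanceFactors.csv")

    # Five illustrative features (an assumption; the paper reports using five
    # selected features but this exact set is not fixed here).
    features = ["Previous_Scores", "Attendance", "Hours_Studied",
                "Tutoring_Sessions", "Sleep_Hours"]
    X = StandardScaler().fit_transform(df[features])

    # Partition students into four study groups and assess cluster quality.
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
    labels = kmeans.fit_predict(X)
    print(f"Silhouette Score: {silhouette_score(X, labels):.2f}")

    # Evaluate how well the selected features explain Exam_Score with Ridge.
    y = df["Exam_Score"].to_numpy()
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=42)
    ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)
    pred = ridge.predict(X_te)
    print(f"RMSE: {np.sqrt(mean_squared_error(y_te, pred)):.3f}")
    print(f"Explained variance (R^2): {r2_score(y_te, pred):.4f}")

The printed Silhouette Score and RMSE correspond to the evaluation quantities reported in the abstract (0.64 and 0.045, respectively), though the exact values depend on preprocessing and feature-selection choices not shown here.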


References

X. Zhou, Q. Li, D. Xu, A. Holton, and B. Sato, “The promise of using study-together groups to promote engagement and performance in online courses: Experimental evidence on academic and non-cognitive outcomes,” Internet and Higher Education, Sep. 2023, doi: 10.1016/j.iheduc.2023.100922.

E. Vrieling-Teunter, N. de Vries, P. Sins, and M. Vermeulen, “Student motivation in teacher learning groups,” European Journal of Teacher Education, Jun. 2022, doi: 10.1080/02619768.2022.2086119.

N. Davidovitch and R. Yavich, “Study group size, motivation and engagement in the digital era,” Problems of Education in the 21st Century, Jun. 2023, doi: 10.33225/pec/23.81.361.

“Inclusive Study Group Formation At Scale,” Feb. 2022, doi: 10.48550/arxiv.2202.07439.

V. Abou-Khalil and H. Ogata, “Homogeneous Student Engagement: A Strategy for Group Formation During Online Learning,” Aug. 2021, doi: 10.1007/978-3-030-85071-5_6.

N. Sarode and J. W. Bakal, “Toward Effectual Group Formation Method for Collaborative Learning Environment,” Jan. 2021, doi: 10.1007/978-981-15-8677-4_29.

K. Lee, J. Ko, C. Jwa, and J. Cho, “Development of Grouping Tool for Effective Collaborative Learning,” Journal of Digital Convergence, Jan. 2018, doi: 10.14400/JDC.2018.16.7.243.

W. D. Linn, K. C. Lord, C. Y. Whong, and E. G. Phillips, “Developing effective study groups in the quest for the ‘Holy Grail’: critical thinking,” American Journal of Pharmaceutical Education, Oct. 2013, doi: 10.5688/AJPE778180.

V. H.-I. Chi and P. Kadandale, “All Groups Are Not Created Equal: Class-Based Learning Communities Enhance Exam Performance and Reduce Gaps,” CBE-Life Sciences Education, Sep. 2022, doi: 10.1187/cbe.21-09-0240.

A. Mujkanovic and A. Bollin, “Improving learning outcomes through systematic group reformation: the role of skills and personality in software engineering education,” International Conference on Software Engineering, May 2016, doi: 10.1145/2897586.2897615.

P. I. Ciptayani, K. C. Dewi, and I. W. B. Sentana, “Student grouping using adaptive genetic algorithm,” in International Electronics Symposium, Sep. 2016. doi: 10.1109/ELECSYM.2016.7861034.

Y. Y. L. Yuyun, C. R. T. Sinaga, M. Nugroho, and M. Ridha, “K-Means Algorithm for Clustering Students Based on Areas of Expertise (A Case Study),” Jun. 2024, doi: 10.62123/aqila.v1i1.23.

J.-P. Huang, P.-C. Wang, and R. M. F. Lubis, “The Process of Grouping Elementary School Students Receiving PIP Assistance uses the K-Means Algorithm,” Bulletin of Informatics and Data Science, Nov. 2023, doi: 10.61944/bids.v2i2.78.

I. W. Pramudjianto, A. K. Ningsih, and A. Komarudin, “Grouping Education Students at Pusdikjas Institutions of The TNI-AD’s Disjasad Using the K-Means Clustering Method,” Oct. 2023, doi: 10.55324/enrichment.v1i7.64.

lainguyn123, “Student Performance Factors,” Kaggle, 2024. [Online]. Available: https://www.kaggle.com/datasets/lainguyn123/student-performance-factors

A. Laakel Hemdanou, M. Lamarti Sefian, Y. Achtoun, and I. Tahiri, “Comparative analysis of feature selection and extraction methods for student performance prediction across different machine learning models,” Computers and Education: Artificial Intelligence, vol. 7, p. 100301, Dec. 2024, doi: 10.1016/j.caeai.2024.100301.

R. Tibshirani, “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996, Available: https://www.jstor.org/stable/2346178

A. E. Hoerl and R. W. Kennard, “Ridge Regression: Biased Estimation for Nonorthogonal Problems,” Technometrics, vol. 42, no. 1, p. 80, Feb. 2000, doi: 10.2307/1271436.

L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324. Available: https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf

T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), pp. 785–794, 2016, doi: 10.1145/2939672.2939785.

V. J. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies,” Artificial Intelligence Review, vol. 22, no. 2, pp. 85–126, Oct. 2004, doi: 10.1007/s10462-004-4304-y.

F. Husson and J. Josse, “Handling missing values in multiple factor analysis,” Food Quality and Preference, Dec. 2013, doi: 10.1016/J.FOODQUAL.2013.04.013.

S. Sharma, Y. Zhang, J. Aliaga, D. Bouneffouf, V. Muthusamy, and K. R. Varshney, “Data Augmentation for Discrimination Prevention and Bias Disambiguation,” in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), Feb. 2020. doi: 10.1145/3375627.3375865.

D. Panda, “Does data cleaning disproportionately affect autistics,” Autism, Feb. 2018, doi: 10.1177/1362361316673566.

P. Cerda and G. Varoquaux, “Encoding high-cardinality string categorical variables,” arXiv preprint, Jul. 2019; published in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2020.2992529.

B. Wang et al., “A Normalized Numerical Scaling Method for the Unbalanced Multi-Granular Linguistic Sets,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Apr. 2015, doi: 10.1142/S0218488515500099.

A. Matuszak, “Dimensional Analysis can Improve Equations of the Model,” Procedia Engineering, Jan. 2015, doi: 10.1016/J.PROENG.2015.06.174.

G. C. Cawley, “Over-fitting in model selection and its avoidance,” 2012. doi: 10.1007/978-3-642-34156-4_1.

Y. Sun, J. Yao, and S. Goodison, “Feature selection for nonlinear regression and its application to cancer research,” in SIAM International Conference on Data Mining, Jan. 2015. doi: 10.1137/1.9781611974010.9.

M. Musso, E. Kyndt, E. Cascallar, and F. Dochy, “Predicting general academic performance and identifying the differential contribution of participating variables using artificial neural networks,” Frontline Learning Research, Aug. 2013, doi: 10.14786/FLR.V1I1.13.

L. Lovmar, A. Ahlford, M. Jonsson, and A.-C. Syvänen, “Silhouette scores for assessment of SNP genotype clusters,” BMC Genomics, Mar. 2005, doi: 10.1186/1471-2164-6-35.

S. Paul and P. Drineas, “Feature selection for ridge regression with provable guarantees,” Neural Computation, Apr. 2016, doi: 10.1162/NECO_A_00816.

W. S. Dong, C. H. Tian, Y. Wang, J. Yan, and C. Zhang, “Method and apparatus for evaluating predictive model,” Patent, Jun. 25, 2014.

J. Peters, D. Janzing, and B. Schölkopf, “Identifying Cause and Effect on Discrete Data using Additive Noise Models,” in International Conference on Artificial Intelligence and Statistics, Mar. 2010.

S. Huang, F. Wei, L. Cui, X. Zhang, and M. Zhou, “Unsupervised Fine-tuning for Text Clustering,” in International Conference on Computational Linguistics, Dec. 2020. doi: 10.18653/V1/2020.COLING-MAIN.482.

T. Yoshida, I. Takeuchi, and M. Karasuyama, “Learning Interpretable Metric between Graphs: Convex Formulation and Computation with Graph Mining,” in Knowledge Discovery and Data Mining, Jul. 2019. doi: 10.1145/3292500.3330845.

K. Al Hazaa et al., “The effects of attendance and high school GPA on student performance in first-year undergraduate courses,” Cogent Education, vol. 8, no. 1, p. 1956857, Jan. 2021, doi: 10.1080/2331186x.2021.1956857.

S. White, L. Groom-Thomas, and S. Loeb, “Undertaking complex but effective instructional supports for students: A systematic review of research on high-impact tutoring planning and implementation,” EdWorkingPapers, Annenberg Institute at Brown University, doi: 10.26300/wztf-wj14.

D. L. DuBois, B. E. Holloway, J. C. Valentine, and H. Cooper, “Effectiveness of Mentoring Programs for Youth: A Meta-Analytic Review,” American Journal of Community Psychology, vol. 30, no. 2, pp. 157–197, Apr. 2002, doi: 10.1023/a:1014628810714.

M. Colasante, J. Bevacqua, and S. Muir, “Flexible hybrid format in university curricula to offer students in-subject choice of study mode: An educational design research project,” Journal of University Teaching and Learning Practice, vol. 17, no. 3, pp. 119–136, Jul. 2020, doi: 10.53761/1.17.3.9.

K. E. Cortes, K. Kortecamp, S. Loeb, and C. D. Robinson, “A scalable approach to high-impact tutoring for young readers,” Learning and Instruction, vol. 95, Art. no. 102021, Sep. 2024, doi: 10.1016/j.learninstruc.2024.102021.


Published

2025-03-21

How to Cite

Putra, Y. A. N. S., & Margono, H. (2025). A Simulation of Student Study Group Formation Design Using K-Means Clustering. MALCOM: Indonesian Journal of Machine Learning and Computer Science, 5(2), 598-608. https://doi.org/10.57152/malcom.v5i2.1795