Classification-Based Supervised Learning Algorithms for Accurate Prediction of Customer Churn in Banking
DOI: https://doi.org/10.57152/ijatis.v3i1.2499

Keywords: Churn Banking, Classification, Multi-Layer Perceptron, Random Forest, Support Vector Machine, XGBoost

Abstract
The banking industry has become increasingly dynamic with the emergence of financial technology (fintech) companies that have significantly changed customer behavior and expectations. As competition intensifies, customer churn has become a critical issue because it directly affects a bank’s revenue, reputation, and long-term sustainability. Therefore, banks require effective analytical approaches to identify customers likely to leave and to develop appropriate retention strategies. This study aims to analyze and predict customer churn likelihood using a bank customer dataset by applying supervised machine learning classification techniques. Five algorithms were evaluated, namely Decision Tree, Random Forest, Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost). The models were trained and evaluated using a hold-out validation approach, and performance was assessed using accuracy as the primary evaluation metric. The experimental results show that Random Forest achieved the highest accuracy of 86%, outperforming the other algorithms, while the MLP model produced the lowest accuracy of 82%. These findings indicate that ensemble-based methods provide better performance for predicting bank customer churn. The results of this study can assist banks in identifying potential churn customers and in developing effective customer retention strategies. Future research may explore additional algorithms, advanced data preprocessing techniques, and larger datasets to further improve prediction performance.
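The evaluation pipeline described above (five classifiers, hold-out validation, accuracy as the metric) can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: the paper's bank-customer dataset is not included here, so a synthetic binary-classification dataset stands in for it, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, which would require the separate `xgboost` package.

```python
# Hedged sketch of the study's hold-out evaluation protocol.
# The synthetic dataset and all hyperparameters are placeholders,
# not values taken from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the bank-churn data: 1 = churned, 0 = retained.
X, y = make_classification(n_samples=800, n_features=10,
                           n_informative=6, random_state=42)

# Hold-out validation: a single stratified train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# MLP and SVM are scale-sensitive, so they get a standardizing pipeline.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=500, random_state=42)),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=42)),
    "Gradient Boosting (XGBoost stand-in)":
        GradientBoostingClassifier(random_state=42),
}

# Train each model and score it on the held-out set with accuracy,
# mirroring the paper's primary evaluation metric.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

On the paper's actual data this loop would simply read the customer table in place of `make_classification`; the ranking it produces (Random Forest best, MLP worst in the study) will of course depend on the dataset.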