Classification of a Credit Card Fraud Detection Model Using XGBoost with SMOTE and GridSearchCV Optimization
Keywords:
Classification, Credit Card Fraud, GridSearchCV, SMOTE, XGBoost
Abstract
The growth of digital technology has driven a rapid rise in online transactions, and with it a higher risk of credit card fraud, particularly in card-not-present transactions. This study aims to improve a fraud detection system by combining the Extreme Gradient Boosting (XGBoost) method with the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance and with GridSearchCV to tune the model's hyperparameters. The dataset, which consists of anonymized credit card transactions, is severely imbalanced: fraudulent cases account for only 0.172% of the data. The study proceeds in several stages: preprocessing the data, balancing the class distribution, training the model, and evaluating its performance using F1-score, precision, recall, accuracy, and AUC-ROC. SMOTE proved effective in enhancing the representation of rare fraud cases without introducing overfitting, while GridSearchCV identified the best-performing parameter configuration among those searched. The resulting model achieved an accuracy that rounds to 100%, 0.81 precision, 0.85 recall, an F1-score of 0.83, and an AUC-ROC of 0.979, indicating strong capability in distinguishing fraudulent from legitimate transactions. The novelty of this study lies in the systematic integration of SMOTE, XGBoost, and GridSearchCV into a unified pipeline designed to address extreme class imbalance in real-world credit card transactions. Unlike previous studies that focused solely on algorithm comparison or hyperparameter tuning, this research emphasizes reducing false negatives, which pose the greatest financial and reputational risks. The findings not only demonstrate strong performance metrics but also offer practical contributions for financial institutions, regulators, and e-commerce platforms developing scalable, reliable, and adaptive fraud detection systems.
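The reported metrics can be cross-checked with simple arithmetic. The sketch below is not the study's code: it recomputes the F1-score from the stated precision and recall, then shows why accuracy is uninformative at a 0.172% fraud rate. The 100,000-transaction test set and the confusion counts derived from it are hypothetical, chosen only to be consistent with the reported precision and recall.

```python
# Arithmetic sanity check on the reported metrics (no model involved).
# ASSUMPTION: the 100,000-transaction test set below is hypothetical;
# only the 0.172% fraud rate, precision 0.81, and recall 0.85 come from
# the abstract.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all transactions that were labelled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# F1 implied by the reported precision and recall.
reported_f1 = f1_score(0.81, 0.85)

# Hypothetical test set at the stated fraud rate.
n_total = 100_000
n_fraud = round(n_total * 0.00172)
n_legit = n_total - n_fraud

# Trivial baseline: label every transaction as legitimate.
baseline_acc = accuracy(tp=0, fp=0, fn=n_fraud, tn=n_legit)

# Confusion counts roughly consistent with recall 0.85 and precision 0.81.
tp = round(0.85 * n_fraud)      # frauds caught
fn = n_fraud - tp               # frauds missed
fp = round(tp / 0.81 - tp)      # legitimate transactions flagged
tn = n_legit - fp
model_acc = accuracy(tp, fp, fn, tn)

print(f"F1 from P=0.81, R=0.85: {reported_f1:.4f}")
print(f"All-legitimate baseline accuracy: {baseline_acc:.5f}")
print(f"Accuracy at reported P/R (hypothetical counts): {model_acc:.5f}")
```

Under these assumptions the F1-score agrees with the reported 0.83, and both the trivial baseline and the model land above 99.8% accuracy. That is why precision, recall, F1-score, and AUC-ROC are the more informative figures for this dataset.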