Performance Comparison of Random Forest, Support Vector Machine and Neural Network in Health Classification of Stroke Patients

Windy Junita Sari; Nasya Amirah Melyani; Fadlan Arrazak; Muhammad Asyraf Bin Anahar; Ezza Addini; Zaid Husham Al-Sawaff; Selvakumar Manickam

doi:10.57152/predatecs.v2i1.1119

Authors

Windy Junita Sari Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
Nasya Amirah Melyani Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
Fadlan Arrazak Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
Muhammad Asyraf Bin Anahar International Islamic University of Malaysia, Malaysia
Ezza Addini Ankara Yıldırım Beyazıt Üniversitesi, Turkey
Zaid Husham Al-Sawaff Center of Technical Research, Northern Technical University, Mosul, Iraq
Selvakumar Manickam Universiti Sains Malaysia, Malaysia

Keywords:

Classification, Neural Network, Random Forest, Stroke, Support Vector Machine

Abstract

Stroke is the second most common cause of death globally, accounting for about 11% of all deaths each year. The condition, caused by non-traumatic cerebral circulatory disorders, ranges from mild to severe and can cause permanent or temporary damage. This research began with data understanding through the acquisition of a stroke patient health dataset from Kaggle consisting of 5110 records. The pre-processing stage involved transforming the data to optimize processing, converting numeric attributes to nominal ones, and preparing training and test data. Stroke classification was then performed using the Random Forest, Support Vector Machine (SVM), and Neural Network algorithms. On the Kaggle dataset, all three models performed well: Random Forest achieved 98.58% accuracy, SVM 94.11%, and the Neural Network 95.72%. SVM achieved the highest recall (99.41%), while Random Forest and the Neural Network had high but slightly lower recall rates of 98.58% and 95.72%, respectively. Model selection therefore depends on the needs of the application, whether it prioritizes precision, recall, or a balance of both. This research contributes to a better understanding of stroke diagnosis and shows the potential of these algorithms for classifying the disease.
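The abstract names three steps: converting numeric attributes to nominal ones, preparing training and test data, and comparing models on accuracy, precision and recall. What follows is a minimal standard-library Python sketch of those steps under stated assumptions. The column names `age` and `avg_glucose_level` are assumed to match the public Kaggle stroke dataset. Everything else is hypothetical and is not the authors' method: the bin edges, the helper names (`to_nominal`, `discretize`, `stratified_split`, `metrics`) and the 80/20 stratified split. The abstract also does not say how the nominal conversion was done, or which class or averaging its recall figures use, so binning is only one possible reading and the positive class is left as a parameter. The classifiers themselves are not reimplemented. Predictions from any Random Forest, SVM or neural network implementation can be passed to `metrics`.

```python
import random

# HYPOTHETICAL cut points: the abstract does not state how numeric attributes
# were made nominal, so these intervals are for illustration only.
BINS = {
    "age": [(0, 18, "child"), (18, 45, "adult"),
            (45, 65, "middle_aged"), (65, float("inf"), "senior")],
    "avg_glucose_level": [(0, 100, "normal"), (100, 126, "elevated"),
                          (126, float("inf"), "high")],
}


def to_nominal(value, bins):
    """Return the label of the half-open interval [lo, hi) containing value."""
    for lo, hi, label in bins:
        if lo <= value < hi:
            return label
    return "unknown"


def discretize(record):
    """Copy a record, replacing each binned numeric column with its label."""
    out = dict(record)
    for column, bins in BINS.items():
        try:
            out[column] = to_nominal(float(out.get(column)), bins)
        except (TypeError, ValueError):
            out[column] = "unknown"  # missing or non-numeric, e.g. "N/A"
    return out


def stratified_split(records, label="stroke", test_ratio=0.2, seed=42):
    """Split records into train/test sets, keeping each class's proportion."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[label], []).append(r)
    train, test = [], []
    for group in by_class.values():
        group = group[:]                      # do not shuffle the caller's list
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(discretize({"age": "67", "avg_glucose_level": "N/A", "stroke": 1}))
    # Two made-up prediction vectors on 20 cases (4 positive); these are
    # NOT results from the paper, only an illustration of the trade-off.
    y_true = [1] * 4 + [0] * 16
    model_a = [1, 1, 0, 0] + [0] * 16       # misses 2 positives, no false alarms
    model_b = [1] * 8 + [0] * 12            # catches all positives, 4 false alarms
    print(metrics(y_true, model_a))  # accuracy 0.9, precision 1.0, recall 0.5
    print(metrics(y_true, model_b))  # accuracy 0.8, precision 0.5, recall 1.0
```

The two invented models show why the abstract ties model choice to the application. Model B has lower accuracy but perfect recall, much as SVM pairs the lowest accuracy with the highest recall in the reported results. That trade-off matters when a missed stroke case costs more than a false alarm.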
