Applying A Supervised Model for Diabetes Type 2 Risk Level Classification
DOI:
https://doi.org/10.57152/predatecs.v2i2.1105
Keywords:
Classification, Diabetes, K-Nearest Neighbor, Naive Bayes, Random Forest
Abstract
Diabetes can lead to heart attacks, kidney failure, blindness, and an increased risk of death. This research aims to classify a diabetes risk dataset containing specific indicators related to diabetes risk. To that end, it compares the performance of three supervised learning algorithms on the dataset: K-Nearest Neighbor (K-NN), Naive Bayes, and Random Forest, and evaluates the accuracy of each. The results show that Random Forest performs very well in detecting diabetes, prediabetes, and non-diabetes, with high precision, recall, and F1-score. Naive Bayes and K-NN score below Random Forest but still perform well, especially on prediabetes cases. Overall, Random Forest achieves the highest accuracy at 99%, followed by K-Nearest Neighbor at 85%, while Naive Bayes has the lowest accuracy at 74%. These results indicate that Random Forest classifies this dataset better than the other two algorithms.
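The abstract reports per-class precision, recall, and F1-score alongside overall accuracy. As a minimal sketch of how these metrics are usually computed for a three-class problem, the Python snippet below derives them from true and predicted labels. The function name `per_class_metrics`, the class-name strings, and the toy labels are assumptions made for illustration; they are not the study's code or data.

```python
from collections import Counter

# Hypothetical class names, mirroring the three categories named in the abstract.
CLASSES = ["non-diabetes", "prediabetes", "diabetes"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return ({class: (precision, recall, f1)}, overall accuracy)."""
    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1)
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)
    return report, accuracy


# Made-up labels for illustration only; not data from the study.
y_true = ["diabetes", "diabetes", "prediabetes",
          "non-diabetes", "non-diabetes", "prediabetes"]
y_pred = ["diabetes", "prediabetes", "prediabetes",
          "non-diabetes", "non-diabetes", "prediabetes"]

report, accuracy = per_class_metrics(y_true, y_pred)
for name, (p, r, f1) in report.items():
    print(f"{name:13s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Accuracy alone can hide weak performance on a minority class such as prediabetes, which is one reason to read the per-class figures together with the headline accuracy.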