Obesity Prediction Using Machine Learning Algorithms

Authors

  • Hanifatus Syahidah, Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
  • Novila Irsandi, Universitas Islam Negeri Sultan Syarif Kasim Riau, Indonesia
  • Adila Nur Ajizah, Dicle University, Turkey
  • Amelia Amelia, Dicle University, Turkey

DOI:

https://doi.org/10.57152/ijatis.v2i1.1869

Keywords:

Decision Tree, K-NN, NBC, Random Forest, SVM, Obesity Prediction

Abstract

This study develops prediction models for obesity levels using five machine learning algorithms: K-Nearest Neighbors (K-NN), Naïve Bayes Classifier (NBC), Decision Tree, Random Forest, and Support Vector Machine (SVM). The dataset, obtained from Kaggle, consists of 2,111 records with 17 attributes covering lifestyle and demographic factors. The research process involved data collection, pre-processing, partitioning the data with a holdout split (70% training, 30% testing), and applying the five algorithms. Performance was evaluated using accuracy, precision, recall, and F1-score. Random Forest achieved the best performance with an accuracy of 92.29%, followed by Decision Tree at 90.54%, K-NN at 83.44%, and NBC and SVM at 59.15% and 59.08%, respectively. Confusion-matrix analysis revealed that NBC and SVM had difficulty distinguishing certain obesity classes. Based on these findings, Random Forest is the most effective of the five algorithms for predicting obesity levels. The results are expected to contribute to the development of more accurate obesity prediction systems that can be implemented in the real world.
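The evaluation pipeline described in the abstract (70/30 holdout split, five classifiers, accuracy and F1 metrics) can be sketched roughly as below. This is a minimal illustration assuming scikit-learn; a synthetic multi-class dataset generated with `make_classification` stands in for the actual Kaggle obesity dataset (2,111 records, 17 attributes, 7 obesity classes), and default hyperparameters are used, so the scores will not match the paper's.

```python
# Sketch of a holdout-split comparison of five classifiers, in the spirit of
# the study. Synthetic data replaces the Kaggle obesity dataset (assumption).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

# Stand-in data: 2111 samples, 16 numeric features, 7 classes
# (mirroring the dataset's size and its seven obesity-level labels).
X, y = make_classification(n_samples=2111, n_features=16, n_informative=10,
                           n_classes=7, n_clusters_per_class=1,
                           random_state=42)

# Holdout split: 70% training, 30% testing, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

models = {
    "K-NN": KNeighborsClassifier(),
    "NBC": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Weighted F1 aggregates per-class scores by class frequency,
    # which suits the dataset's uneven obesity-level distribution.
    results[name] = (accuracy_score(y_test, pred),
                     f1_score(y_test, pred, average="weighted"))

for name, (acc, f1) in results.items():
    print(f"{name}: accuracy={acc:.4f}, weighted F1={f1:.4f}")
```

On real data, the pre-processing step the abstract mentions (encoding categorical lifestyle attributes, scaling features for K-NN and SVM) would precede the split; distance- and margin-based models are particularly sensitive to unscaled features, which is one plausible contributor to the low SVM score reported.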

Published

2025-03-03