Lung Disease Risk Prediction Using Machine Learning Algorithms
Keywords:
Classification, Decision Tree, Lung Diseases, Machine Learning, Prediction
Abstract
Lung diseases, including lung cancer, are among the leading causes of death worldwide. Early detection is essential to improve patients' chances of recovery and to reduce healthcare costs, and machine learning algorithms can support it. This study evaluates five machine learning algorithms, namely K-Nearest Neighbors (K-NN), Naïve Bayes Classifier (NBC), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), for lung disease prediction using a Kaggle dataset of 30,000 records with 11 attributes. The dataset was preprocessed and split into training and test sets at ratios of 70%:30% and 80%:20%. Algorithm performance was evaluated using precision, recall, F1-score, and accuracy. The results show that the RF, SVM, and DT algorithms performed best, with accuracy reaching 94.72% at the 70%:30% ratio. The DT algorithm, which previously showed low performance in heart disease classification, provided competitive results in lung disease prediction. This research underscores the importance of proper algorithm selection and data organization for improving the effectiveness of disease prediction. The findings contribute to the development of artificial intelligence technology for medical applications, particularly in supporting early diagnosis of lung diseases.
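The evaluation protocol described in the abstract (a 70%:30% or 80%:20% split, then precision, recall, F1-score, and accuracy) can be illustrated with a minimal sketch. It is not the authors' code: the helper names `split_rows` and `binary_metrics` are hypothetical, the labels are toy values, and a binary positive/negative target is assumed. Only the Python standard library is used.

```python
import random


def split_rows(rows, test_fraction, seed=0):
    """Shuffle rows reproducibly, then return (train, test) lists.

    Hypothetical helper: the abstract does not say how the split was drawn.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Split sizes for a dataset of 30,000 records at the two ratios in the abstract.
record_ids = range(30_000)
for test_fraction in (0.30, 0.20):
    train, test = split_rows(record_ids, test_fraction)
    print(f"test fraction {test_fraction:.2f}: "
          f"{len(train)} train / {len(test)} test")

# Toy labels (assumed, not from the study) to show the metric calculation.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(toy_true, toy_pred))
```

In practice each of the five classifiers would be fitted on the training portion and its predictions on the test portion passed to a function like `binary_metrics`, so that all models are compared on the same held-out records.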
License
Copyright (c) 2025 Ananda Putri Aulia, Qaula Adelia, Haykal Alya Mubarak, Mohd. Adzka Fatan, Sudarno Sudarno

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright © by Author; Published by Institut Riset dan Publikasi Indonesia (IRPI)
This Public Research Journal of Engineering, Data Technology and Computer Science is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.