Non-image lung cancer prediction utilizing KNN model promoting health consciousness
Abstract
Lung cancer is one of the most prevalent cancers worldwide and has a high mortality rate. It is commonly diagnosed with imaging tests. However, imaging has limited accuracy and can produce false negatives, which delay diagnosis and treatment and ultimately reduce patients' survival rates. Imaging is also expensive, so low-income communities may need a more economical way to predict lung cancer. This study uses a secondary dataset of non-image data, including demographics, lifestyle, and symptoms, to build a model that detects lung cancer. After the dataset is preprocessed and split into training and testing data, the performance of several machine learning models is compared: SVM, Decision Tree, ANN, Logistic Regression, Random Forest, XGBoost, AdaBoost, Gradient Boosting, Light Gradient Boosting, KNN, and Naive Bayes. In this dataset, lung cancer is more likely to be diagnosed in females and in people who report allergies, alcohol consumption, or difficulty swallowing. KNN is the best-performing model, with an accuracy of 96.39%, a precision of 100%, and an F1-score of 97.81%, despite having the lowest recall of the models compared. A successful prediction model lets low-income families estimate the likelihood of disease without paying for X-rays, easing their financial burden and promoting health consciousness.
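The abstract reports precision and F1-score for KNN but not the recall itself. As a minimal sketch, the recall implied by those two figures can be recovered from the definition F1 = 2PR / (P + R). The function names below are illustrative, and the confusion-matrix counts are hypothetical values chosen only because they reproduce the reported percentages; they are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def recall_from_precision_and_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


# Recall implied by the reported precision (100%) and F1-score (97.81%).
implied_recall = recall_from_precision_and_f1(1.0, 0.9781)

# Hypothetical counts (an assumption, not the study's data): no false
# positives gives 100% precision, while a few false negatives lower recall.
example = classification_metrics(tp=67, fp=0, fn=3, tn=13)

print(f"implied recall: {implied_recall:.4f}")
print({name: round(value, 4) for name, value in example.items()})
```

Under these assumptions the implied recall is roughly 95.7%: a model can make no false-positive predictions and still miss a small share of true cases, which is how perfect precision and a comparatively low recall can occur together.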
Authors

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.