Bootstrapped Aggregating Optimization in Random Forest for Hepatitis Risk
DOI:
https://doi.org/10.26623/transformatika.v22i1.9073
Keywords:
Hepatitis, Random Forest, Bootstrapped Aggregating, Risk Prediction, Ensemble Techniques
Abstract
This research optimizes a Random Forest model with Bootstrapped Aggregating (bagging) to predict hepatitis risk. Hepatitis is a significant global health problem because of its widespread impact. The study uses a Kaggle dataset of 596 records and 20 attributes, including age category and gender, and identifies limitations in predicting hepatitis risk. With hyperparameter optimization, such as adjusting the number and depth of trees, the Random Forest model with Bootstrapped Aggregating achieves an accuracy of 96%, compared with 88% for the standard model. Precision, recall, and F1 score also improve significantly, particularly through a reduction in false negatives. The conclusion highlights the model's practical potential for more accurate assessment of hepatitis risk. Although the size of the dataset is a limitation, the findings provide a foundation for developing predictive models of hepatitis risk and emphasize the importance of ensemble techniques for improving model performance.
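The idea behind Bootstrapped Aggregating, and the precision, recall, and F1 metrics named in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary labels (1 = at risk), uses one-split decision stumps instead of full trees, omits the per-split random feature selection of a true Random Forest, and runs on a made-up toy dataset. All function names and values are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Fit a one-split tree: pick the (feature, threshold, left label)
    with the fewest training errors. Each row is (features, label)."""
    best = None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            for left in (0, 1):
                errors = sum(
                    (left if xi[f] <= t else 1 - left) != yi for xi, yi in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left)
    return best[1:]


def predict_stump(stump, x):
    f, t, left = stump
    return left if x[f] <= t else 1 - left


def fit_bagged(rows, n_estimators, seed=0):
    """Train one stump per bootstrap replicate of the training rows."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_estimators)]


def predict_bagged(ensemble, x):
    """Aggregate by majority vote (use an odd ensemble size to avoid ties)."""
    votes = Counter(predict_stump(s, x) for s in ensemble)
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred):
    """Metrics for the positive class; false negatives lower recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: (age, lab marker) -> label. Not the Kaggle dataset.
toy_rows = [
    ((25, 1.0), 0), ((31, 1.4), 0), ((38, 2.1), 0), ((44, 2.6), 0),
    ((47, 4.8), 1), ((52, 5.5), 1), ((58, 6.1), 1), ((63, 7.0), 1),
]
ensemble = fit_bagged(toy_rows, n_estimators=11, seed=0)
predictions = [predict_bagged(ensemble, x) for x, _ in toy_rows]
metrics = precision_recall_f1([y for _, y in toy_rows], predictions)
```

In this sketch `n_estimators` plays the role of the number of trees mentioned in the abstract. Tree depth is fixed at one split, so tuning depth would require replacing the stump with a deeper tree learner.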
License
Copyright (c) 2024 Jurnal Transformatika

This work is licensed under a Creative Commons Attribution 4.0 International License.