Boosting Performance Classification KNN Customer Loyalty with Chi-Square and Information Gain

Authors

DOI:

https://doi.org/10.26623/6wgy1097

Keywords:

Chi-Square, Customer Loyalty, Feature Selection, Information Gain, kNN

Abstract

Understanding customer purchasing behavior is essential for predicting customer loyalty, which directly affects a company's long-term success. This research examines how chi-square and information gain feature selection affect customer loyalty classification performance compared with kNN without feature selection. The study uses a public customer purchasing behavior dataset from Kaggle containing 10,000 records and 12 attributes, with loyalty_status as the label (Gold, Regular, Silver). Performance is evaluated by accuracy, kappa, classification error, recall, precision, and RMSE. The highest accuracy, 91.99%, was obtained by kNN (k=3) with information gain, with kappa 0.844, precision 95.44%, and recall 86.30%; this configuration also had the lowest classification error (8.01%) and the second-lowest RMSE (0.245), behind kNN (k=3) with chi-square. The results show that feature selection improves classification, increasing accuracy and reducing error, and that kNN with k=3 combined with information gain achieved the highest accuracy in classifying customer loyalty.
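As a rough illustration of the idea summarized in the abstract, the sketch below scores features with information gain and the chi-square statistic, then compares kNN (k=3) with and without feature selection. It is not the authors' pipeline: the toy data, the `spend` and `noise` columns, the leave-one-out evaluation, and all function names are assumptions made for this example, and only the label values (Gold, Regular, Silver) come from the abstract.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for value in feature_totals:
        for label in label_totals:
            expected = feature_totals[value] * label_totals[label] / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k=3):
    """Fraction of points classified correctly when each is held out in turn."""
    hits = 0
    for i in range(len(X)):
        prediction = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        hits += prediction == y[i]
    return hits / len(X)

# Hypothetical toy data: `spend` determines the label, `noise` is unrelated
# to it but has a larger scale, so it dominates the distance computation.
spend = [0, 0, 1, 1, 2, 2] * 2
noise = [0, 10] * 6
labels = ["Regular", "Regular", "Silver", "Silver", "Gold", "Gold"] * 2
columns = {"spend": spend, "noise": noise}

ig = {name: information_gain(col, labels) for name, col in columns.items()}
chi = {name: chi_square(col, labels) for name, col in columns.items()}

# Keep only the top-ranked feature by information gain.
best = max(ig, key=ig.get)
X_all = list(zip(spend, noise))
X_sel = [(v,) for v in columns[best]]

acc_all = leave_one_out_accuracy(X_all, labels, k=3)
acc_sel = leave_one_out_accuracy(X_sel, labels, k=3)
print("information gain:", ig)
print("chi-square:", chi)
print("accuracy, all features:", acc_all)
print("accuracy, selected feature:", acc_sel)
```

On this toy data both scores rank `spend` above `noise`, and dropping the uninformative column raises the kNN accuracy. This shows why distance-based classifiers can benefit from feature selection, but the numbers say nothing about the accuracy figures reported in the abstract.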

Author Biography

  • Atika Mutiarachim, Universitas 17 Agustus 1945 Semarang
    Fakultas Ekonomika dan Bisnis, Program Studi Bisnis Digital

Published

2025-03-11

How to Cite

Mutiarachim, A., Fikriah, F. K., Ansor, B., & Ramdani, A. P. (2025). Boosting Performance Classification KNN Customer Loyalty with Chi-Square and Information Gain. Jurnal Transformatika, 22(2), 81-89. https://doi.org/10.26623/6wgy1097 (Original work published 2025)