GA-SVM Wrapper Feature Selection for Handling High-Dimensional Data

Ahmad Rifa'i, Joko Suntoro, Galet Guntoro Setiaji

Abstract


The volume of data has grown very significantly in recent years, driven by social media use and the shift to the digital era. Data mining is the set of techniques for turning such data into useful information. However, applying data mining, and classification methods in particular, is hampered by high-dimensional data, which tends to lower classification performance. High-dimensional data is defined as data with a large number of complex features. This complexity makes it difficult to select an optimal feature subset because some features are irrelevant. This study uses a wrapper technique that applies a metaheuristic, the genetic algorithm (GA), to select a more optimal feature subset, with the Support Vector Machine (SVM) as the classifier. The resulting method is called GA-SVM WFS. GA-SVM WFS achieved higher accuracy than SVM alone, with average accuracies of 0.902 and 0.874, respectively. A paired t-test showed a significant difference between GA-SVM WFS and SVM, with a p-value of 0.01 at a significance level α of 0.05.
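The following is a minimal, self-contained Python sketch of a GA wrapper with an SVM classifier in the spirit of the method summarized above. It is not the authors' implementation, and several choices are assumptions rather than details from the paper:

- A linear SVM trained with Pegasos-style subgradient steps stands in for whatever SVM implementation and kernel the study used.
- The toy data generator `make_data` is hypothetical (two informative features plus noise features).
- Fitness is holdout validation accuracy minus a hypothetical subset-size penalty `alpha`.
- The GA operators (binary tournament, one-point crossover, bit-flip mutation, two-member elitism) and all parameter values are illustrative.

The `paired_t` helper returns only the paired t statistic and its degrees of freedom. The p-value would then come from a Student t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`).

```python
import random
import statistics

def make_data(n, n_noise, seed):
    """Hypothetical toy data: label = sign(x0 + x1); the remaining features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    while len(X) < n:
        row = [rng.uniform(-1.0, 1.0) for _ in range(2 + n_noise)]
        s = row[0] + row[1]
        if abs(s) < 0.2:  # reject points near the boundary so the classes have a margin
            continue
        X.append(row)
        y.append(1 if s > 0 else -1)
    return X, y

def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Linear soft-margin SVM fitted with Pegasos-style subgradient steps; labels are -1/+1.
    A constant 1.0 feature is appended so the bias is learned as an ordinary weight."""
    rng = random.Random(seed)
    Xb = [row + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if margin < 1.0:  # hinge-loss subgradient applies only to margin violators
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def accuracy(w, X, y):
    """Fraction of rows whose predicted sign matches the -1/+1 label."""
    correct = 0
    for row, label in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, row + [1.0]))
        correct += (1 if score > 0 else -1) == label
    return correct / len(y)

def subset(X, mask):
    """Keep only the columns whose gene in the 0/1 mask is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]

def ga_svm_wfs(Xtr, ytr, Xva, yva, pop_size=20, generations=15, p_mut=0.1, alpha=0.05, seed=1):
    """GA wrapper: each chromosome is a 0/1 feature mask; fitness is the validation accuracy
    of an SVM trained on the selected features minus a small penalty on subset size."""
    rng = random.Random(seed)
    d = len(Xtr[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            if not any(mask):
                cache[key] = (0.0, 0.0)  # an empty subset gets the worst fitness
            else:
                w = train_linear_svm(subset(Xtr, mask), ytr)
                acc = accuracy(w, subset(Xva, mask), yva)
                cache[key] = (acc - alpha * sum(mask) / d, acc)
        return cache[key]

    def tournament():
        # binary tournament: the fitter of two randomly drawn masks becomes a parent
        return max(rng.sample(pop, 2), key=lambda m: fitness(m)[0])

    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m)[0], reverse=True)
        next_pop = [pop[0][:], pop[1][:]]  # elitism: carry over the two best masks
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, d)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda m: fitness(m)[0])
    return best, fitness(best)[1]

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched lists of scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / n ** 0.5), n - 1

X, y = make_data(200, n_noise=8, seed=42)
mask, acc = ga_svm_wfs(X[:100], y[:100], X[100:], y[100:])
print("selected features:", [j for j, g in enumerate(mask) if g], "validation accuracy:", acc)
```

The size penalty is one common way to make a wrapper prefer smaller subsets when accuracies tie. Setting `alpha=0` makes the fitness pure validation accuracy. To mirror the comparison in the abstract, `paired_t` would be applied to per-fold accuracies of the two methods measured on the same folds.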

Keywords


Classification; Wrapper Selection; Genetic Algorithm; Support Vector Machine; High-Dimensional Data

DOI: http://dx.doi.org/10.26623/transformatika.v21i2.8886

Jurnal Transformatika: Journal Information Technology by Department of Information Technology, Faculty of Information Technology and Communication, Semarang University is licensed under a Creative Commons Attribution 4.0 International License.