Sensitivity of an Indonesian-Language Article Search System Using the n-gram and Tanimoto Cosine Methods

Candra Supriadi, Hidriyanto Dwi Purnomo, Irwan Sembiring

Abstract


Growing demand for technology, together with the availability of adequate infrastructure, shows that technology has become a basic human need. As the number of journals and scientific papers increases, readers must be more selective in choosing and sorting articles, even though many online service providers and journal portals already exist. Search engines, plagiarism detection, and recommendation systems have been studied with a variety of methods intended to improve system performance. This paper calculates the similarity between one article and another by implementing n-gram and Tanimoto cosine. The test data consisted of forty-three article titles and abstracts, and the system was tested fifty times with randomly selected keywords. Each title and abstract was broken into sequences of n characters (n = 2 to 8), including spaces and punctuation, and its similarity to the query or keyword used for testing was then computed. The test was conducted using several threshold variations for n = 2 to 8. Over the fifty tests, a threshold of 0.15 gave the highest accuracy, 0.92, at n = 4; the highest precision, 0.42, at n = 3; and the highest recall, 0.44, at n = 2.
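The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that n-gram profiles are count vectors, that text is lowercased before splitting, that the Tanimoto cosine takes the common form T(A, B) = A·B / (|A|² + |B|² − A·B), and that an article is retrieved when its score is at least the threshold. The function names `char_ngrams`, `tanimoto`, `search`, and `evaluate` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams, keeping spaces and punctuation.

    Lowercasing is an assumption; the abstract does not mention case handling.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def tanimoto(a, b):
    """Tanimoto coefficient of two n-gram count vectors (assumed formula)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sum(count * count for count in a.values())
    norm_b = sum(count * count for count in b.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom else 0.0


def search(query, documents, n, threshold=0.15):
    """Return indices of documents scoring at or above the threshold."""
    q = char_ngrams(query, n)
    return [
        i for i, doc in enumerate(documents)
        if tanimoto(q, char_ngrams(doc, n)) >= threshold
    ]


def evaluate(retrieved, relevant, total):
    """Accuracy, precision, and recall from one query's confusion matrix."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # relevant and retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but missed
    tn = total - tp - fp - fn        # correctly left out
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

Under these assumptions, `search("search", ["search system", "zzzz"], n=2)` keeps only the first document, because the second shares no bigram with the query. Averaging the output of `evaluate` over many random queries for each n would produce figures of the kind the abstract reports.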


Keywords


Search system, Tanimoto cosine, n-gram

DOI: http://dx.doi.org/10.26623/transformatika.v18i1.2184

Jurnal Transformatika: Journal Information Technology by Department of Information Technology, Faculty of Information Technology and Communication, Semarang University is licensed under a Creative Commons Attribution 4.0 International License.