Content Classification of the Official Website of the Ministry of Foreign Affairs of the Republic of Indonesia (MoFA RI) using Vector Space Model (VSM)
DOI:
https://doi.org/10.57152/malcom.v4i4.1368
Keywords:
Classification, Cosine Similarity, MoFA RI, TF-IDF, Vector Space Model
Abstract
The official website of the Ministry of Foreign Affairs of the Republic of Indonesia (MoFA RI) is an important platform for disseminating information to a diverse audience. Categorizing the large volume of content on the website efficiently is essential for improving user experience and information retrieval. The categories also serve as identifiers that label the topic of each article based on its content. This study presents a systematic approach to classifying the content of the MoFA RI website using the Vector Space Model (VSM). The methodology involves preprocessing the text data, constructing a term-document matrix, and applying cosine similarity to measure the relevance of each document to predefined categories. The study demonstrates the effectiveness of VSM in accurately classifying content, giving users navigating the website more streamlined access to information. The findings also offer insights into improving the organization and accessibility of governmental online platforms, contributing to better user experience and information dissemination.
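The pipeline described in the abstract (preprocessing, term weighting, cosine similarity against predefined categories) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the preprocessing steps, the weighting formula, or the category set, so the tokenizer, the smoothed IDF formula, the category names, and the keyword profiles in `CATEGORY_PROFILES` are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Preprocess: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (term -> weight); unseen terms are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical categories and keyword profiles, for illustration only.
CATEGORY_PROFILES = {
    "consular": "passport visa consular services citizens abroad protection",
    "economy": "trade investment economic cooperation export market",
    "multilateral": "united nations asean summit multilateral resolution forum",
}


def classify(article, profiles=CATEGORY_PROFILES):
    """Return the best-matching category and the score for every category."""
    names = list(profiles)
    profile_tokens = [tokenize(profiles[name]) for name in names]
    idf = build_idf(profile_tokens)
    article_vec = tfidf_vector(tokenize(article), idf)
    scores = {
        name: cosine_similarity(article_vec, tfidf_vector(tokens, idf))
        for name, tokens in zip(names, profile_tokens)
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    label, scores = classify(
        "The embassy helps citizens renew a passport and apply for a visa"
    )
    print(label, scores)
```

In this sketch each category is represented by a single profile document, and an article is assigned to the category whose TF-IDF vector forms the smallest angle with the article's vector. A real deployment would more plausibly build the profiles from labeled articles rather than hand-written keyword lists.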