Spam Detection in YouTube Comments Using Deep Learning Models: A Comparative Study of MLP, CNN, LSTM, BiLSTM, GRU, and Attention Mechanisms

Gregorius Airlangga

doi:10.57152/malcom.v4i4.1671

Authors

Gregorius Airlangga, Atma Jaya Catholic University of Indonesia


Keywords:

Deep Learning, LSTM, Spam Detection, Text Classification, YouTube Comments

Abstract

This study explores the effectiveness of various deep learning models for detecting spam in YouTube comments. Six models were evaluated: a Multilayer Perceptron (MLP), a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), a Gated Recurrent Unit (GRU) network, and an Attention-based model. The dataset consists of 1,956 real comments extracted from popular YouTube videos, covering both spam and legitimate messages. Preprocessing consisted of tokenizing the comments and padding the resulting sequences to a uniform length for model input. The LSTM model achieved the highest test accuracy, 95.65%, outperforming the other models by capturing sequential dependencies and context within comments. The CNN model also achieved high accuracy, underscoring the importance of local pattern recognition in text classification. The BiLSTM and Attention models performed comparably to the LSTM but did not surpass it, which indicates that sequential modeling plays a crucial role in this task. The GRU model, despite its computational efficiency, achieved slightly lower accuracy than LSTM and BiLSTM. The MLP model, serving as a baseline, showed limited performance, emphasizing the need for more advanced architectures in spam detection. These findings suggest that combining sequential modeling with local feature extraction could lead to more robust spam detection systems.
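The abstract names tokenization and padding as the only preprocessing steps but does not state the tokenizer rules, vocabulary size, or sequence length. The following minimal sketch illustrates those two steps in plain Python under assumed settings: lowercase regex word splitting, a frequency-ordered vocabulary with id 0 reserved for padding and id 1 for unseen words, and left-padding/left-truncation to a fixed length (the default `padding="pre"`, `truncating="pre"` convention of Keras' `pad_sequences`). The function names, the `<oov>` token, and the example comments are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

PAD_INDEX = 0        # id 0 is reserved for padding (assumption)
OOV_TOKEN = "<oov>"  # hypothetical marker for words not seen in training


def tokenize(text):
    """Lowercase a comment and split it into word tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts, max_words=None):
    """Map tokens to integer ids, most frequent first; id 1 is the OOV id."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {OOV_TOKEN: 1}
    # most_common breaks ties by first occurrence, so ids are deterministic.
    for tok, _ in counts.most_common(max_words):
        vocab[tok] = len(vocab) + 1
    return vocab


def texts_to_sequences(texts, vocab):
    """Replace each token with its id, falling back to the OOV id."""
    return [[vocab.get(tok, vocab[OOV_TOKEN]) for tok in tokenize(text)]
            for text in texts]


def pad_sequences(seqs, max_len, value=PAD_INDEX):
    """Left-pad or left-truncate every sequence to exactly max_len ids."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    padded = []
    for seq in seqs:
        seq = seq[-max_len:]  # keep only the last max_len tokens
        padded.append([value] * (max_len - len(seq)) + seq)
    return padded


if __name__ == "__main__":
    train = [
        "Check out my channel!!! subscribe now",
        "Great song, love it",
        "subscribe to my channel for free gifts",
    ]
    vocab = build_vocab(train)
    seqs = texts_to_sequences(["subscribe to my new channel"], vocab)
    print(pad_sequences(seqs, max_len=8))  # -> [[0, 0, 0, 4, 12, 2, 1, 3]]
```

Padding to a fixed length gives every model, from the MLP baseline to the recurrent and attention variants, same-shaped integer inputs that an embedding layer can consume. One common reason to pad on the left is that it keeps the real tokens next to the final time step, which is the step a unidirectional LSTM or GRU reads last.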

