An Explainable Hybrid Framework for Detecting Harmful Short Comments on Vietnamese Social Media Using TF-IDF and Lightweight Transformers
DOI:
https://doi.org/10.54654/isj.v3i26.1162
Keywords:
Harmful content detection, Vietnamese NLP, TF-IDF, DistilBERT, SHAP, hate speech, explainable AI, text classification
Abstract
The proliferation of harmful content on Vietnamese social media platforms calls for detection systems that can handle short, slang-heavy texts. This study proposes the Explainable Hybrid Framework (EHF), built on TF-IDF and lightweight Transformers, to improve both detection and interpretability. On an expert-annotated dataset of 15,162 samples, EHF with an SVM classifier achieves an F1-score of 92.5% and an accuracy of 92.6%, outperforming strong baselines such as PhoBERT on our dataset; state-of-the-art models such as ViHateT5 are discussed with reference to external Vietnamese benchmarks. Notably, EHF reduces false positives by 30% and false negatives by 25% relative to traditional methods, with an inference time of 45 ms. Our primary contribution is an efficient and explainable solution to the challenge of automated moderation for short Vietnamese texts.
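To make the reported quantities concrete, the following is a minimal sketch of how accuracy, F1-score, and a relative reduction in error counts can be computed from binary confusion counts. It assumes a binary setup with "harmful" as the positive class and reads "reduces false positives by 30%" as a relative drop in the error count; the abstract does not state the averaging scheme or the exact definition used. The names `Confusion` and `relative_reduction` and all counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts, with 'harmful' assumed to be the positive class."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def relative_reduction(baseline: int, improved: int) -> float:
    """Fractional drop in an error count relative to a baseline count."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical counts for illustration only; NOT the paper's results.
baseline = Confusion(tp=400, fp=100, fn=80, tn=420)
improved = Confusion(tp=420, fp=70, fn=60, tn=450)

fp_drop = relative_reduction(baseline.fp, improved.fp)
fn_drop = relative_reduction(baseline.fn, improved.fn)

print(f"accuracy: {improved.accuracy:.3f}, F1: {improved.f1:.3f}")
print(f"false positives reduced by {fp_drop:.0%}, false negatives by {fn_drop:.0%}")
```

Reporting both error reductions alongside F1 is useful in a moderation setting because the two error types have different costs: false positives remove legitimate comments, while false negatives let harmful ones through.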
Open Access Policy
The Journal of Science and Technology on Information Security provides open access to its published articles to broaden opportunities for high-quality research findings to be available and widely disseminated free of charge, contributing to the greater exchange of knowledge.