An Explainable Hybrid Framework for Detecting Harmful Short Comments on Vietnamese Social Media Using TF-IDF and Lightweight Transformers

Authors

  • Ha Van Muon
  • Nguyen Thanh Hai
  • Nguyen Ngoc Nhan
  • Tran Quang Tuan

DOI:

https://doi.org/10.54654/isj.v3i26.1162

Keywords:

Harmful content detection, Vietnamese NLP, TF-IDF, DistilBERT, SHAP, hate speech, explainable AI, text classification

Abstract

The proliferation of harmful content on Vietnamese social media platforms calls for detection systems that can handle short, slang-heavy texts. This study proposes the Explainable Hybrid Framework (EHF), which combines TF-IDF with lightweight transformers to improve both detection and interpretability. On an expert-annotated dataset of 15,162 samples, EHF with an SVM classifier achieves an F1-score of 92.5% and an accuracy of 92.6%, outperforming strong baselines such as PhoBERT on the same dataset; state-of-the-art models such as ViHateT5 are discussed in the context of external Vietnamese benchmarks. Compared with traditional methods, EHF reduces false positives by 30% and false negatives by 25%, with an inference time of 45 ms. The primary contribution is an efficient, fast, and explainable solution to the challenge of automated moderation for short Vietnamese texts.
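The reported metrics can be made concrete with a minimal sketch of how accuracy, F1, and a relative error reduction are derived from confusion-matrix counts. The counts below are hypothetical and chosen only for illustration; they are not taken from the paper. The abstract also does not state whether the F1-score is binary or macro-averaged, so the binary form for the harmful class is an assumption here.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (harmful) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_reduction(baseline_errors, new_errors):
    """Fraction by which an error count drops relative to a baseline."""
    return (baseline_errors - new_errors) / baseline_errors


# Hypothetical counts for illustration only; NOT results from the paper.
baseline = {"tp": 80, "fp": 20, "fn": 20, "tn": 80}
improved = {"tp": 85, "fp": 14, "fn": 15, "tn": 86}

baseline_acc = accuracy(**baseline)
improved_acc = accuracy(**improved)
improved_f1 = f1_score(improved["tp"], improved["fp"], improved["fn"])
fp_reduction = relative_reduction(baseline["fp"], improved["fp"])
fn_reduction = relative_reduction(baseline["fn"], improved["fn"])

print(f"accuracy: {baseline_acc:.3f} -> {improved_acc:.3f}")
print(f"F1 (improved): {improved_f1:.3f}")
print(f"false positives reduced by {fp_reduction:.0%}")
print(f"false negatives reduced by {fn_reduction:.0%}")
```

With these made-up counts, going from 20 to 14 false positives is a 30% relative reduction and going from 20 to 15 false negatives is a 25% relative reduction, which is how percentages of this kind are usually read: relative to the baseline's error count, not as percentage points of the whole dataset.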

References

DataReportal, “Digital 2025: Vietnam” (2025), DataReportal. Access time: 12/10/2025, https://datareportal.com/reports/digital-2025-vietnam.

S. Vosoughi, D. Roy and S. Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, 2018.

D. N. Long, N. T. Hung, N. T. Dung, D. V. Khanh, N. A. Tu and P. T. B. Van, “A proposed ensemble approach for searching hacking news semantically,” Journal of Science and Technology on Information Security, vol. 2, no. 22, pp. 83–92, 2024. DOI: 10.54654/isj.v2i22.1033.

Google, Temasek and Bain & Company, e-Conomy SEA 2023: Reaching new heights – Navigating the path to profitable growth, Singapore, 2023. Access time: 12/10/2025, https://services.google.com/fh/files/misc/e_conomy_sea_2023_report.pdf.

S. Livingstone, M. Stoilova and A. Kelly, “Cyberbullying: Incidence, trends and consequences,” in Ending the Torment: Tackling Bullying from the Schoolyard to Cyberspace, United Nations, pp. 115–120, 2016.

V. V. Thang, D. V. Pantiukhin, B. T. T. Quyen and V. V. Vu, “A review of neural networks for rare intrusions detection in wireless networks,” Journal of Science and Technology on Information Security, vol. 3, no. 20, pp. 23–34, 2023. DOI: 10.54654/isj.v3i20.984.

M. Banko, B. MacKeen and L. Ray, “A unified taxonomy of harmful content,” in Proceedings of the 4th Workshop on Online Abuse and Harms, pp. 125–137, 2020. DOI: 10.18653/v1/2020.alw-1.16.

P. M. Thuan, B. T. Lam and P. D. Trung, “DSViT: An Enhanced Transformer Model for Deepfake Detection,” Journal of Science and Technology on Information Security, vol. 2, no. 22, pp. 17–28, 2024. DOI: 10.54654/isj.v2i22.1055.

D. Hickey, D. M. T. Fessler, K. Lerman and K. Burghardt, “X under Musk’s leadership: Substantial hate and no reduction in inauthentic activity,” PLoS ONE, vol. 20, no. 2, 2025. DOI: 10.1371/journal.pone.0313293.

J. Allen et al., “Quantifying the impact of misinformation and vaccine confidence,” Science, vol. 384, no. 6665, pp. 567–573, 2024.

P. G. Hoang, C. D. Luu, K. Q. Tran, K. V. Nguyen and N. L.-T. Nguyen, “ViHOS: Hate speech spans detection for Vietnamese,” in Proceedings of EACL, pp. 652–669, 2023.

T. N. Nguyen, T. P. Le and K. V. Nguyen, “ViLexNorm: A lexical normalization corpus for Vietnamese social media text,” arXiv preprint arXiv:2401.16403, 2024. Access time: 12/10/2025, https://arxiv.org/abs/2401.16403.

T. Vu, D. Q. Nguyen, D. Q. Nguyen, M. Dras and M. Johnson, “VnCoreNLP: A Vietnamese natural language processing toolkit,” in Proceedings of NAACL-HLT (Demonstrations), 2018.

N. V. Dinh, T. C. Dang, L. T. Nguyen and K. V. Nguyen, “Multi-dialect Vietnamese: Task, dataset, baseline models and challenges,” in Proceedings of EMNLP, 2024.

T. N. H. Nguyen and T. D. Nguyen, “Vietnamese hate speech detection on social media using TF-IDF and SVM,” in Proceedings of the 2020 RIVF International Conference on Computing and Communication Technologies, Ho Chi Minh City, Vietnam, pp. 1–5, 2020.

D. Q. Nguyen and A. T. Nguyen, “PhoBERT: Pre-trained language models for Vietnamese,” in Findings of the Association for Computational Linguistics: EMNLP, pp. 1037–1042, 2020.

T. V. Nguyen, “Word2Vec models for Vietnamese text representation,” Journal of Computer Science and Cybernetics, vol. 34, no. 4, pp. 295–307, 2018.

H. Q. Tran, T. P. Nguyen and T. T. Do, “ViHateT5: A T5-based model for Vietnamese hate speech detection,” in Proceedings of the 2024 International Conference on Asian Language Processing (IALP), 2024.

T. T. Prama, J. F. Amrin, M. M. Anwar and I. H. Sarker, “AI-enabled user-specific cyberbullying severity detection with explainability,” arXiv preprint arXiv:2503.10650, 2025. Access time: 12/10/2025, https://arxiv.org/abs/2503.10650.

G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, 1988.

T. H. Nguyen and K. Shirai, “Text classification of technical papers based on text segmentation,” in Proceedings of the 18th International Conference on Applications of Natural Language to Information Systems (NLDB), Lecture Notes in Computer Science, vol. 7934, Springer, pp. 278–284, 2013.

P. Mishra, M. Del Tredici, H. Yannakoudakis and E. Shutova, “Abusive language detection with graph convolutional networks,” in Proceedings of NAACL-HLT, pp. 2145–2150, 2019.

R. Kshirsagar, T. Cukuvac, K. McKeown and S. McGregor, “Predictive embeddings for hate speech detection on Twitter,” in Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), Brussels, pp. 26–32, 2018.

J. Devlin, M. Chang, K. Lee and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, pp. 4171–4186, 2019.

V. Sanh, L. Debut, J. Chaumond and T. Wolf, “DistilBERT: A distilled version of BERT: Smaller, faster, cheaper and lighter,” arXiv preprint arXiv:1910.01108, 2019. Access time: 12/10/2025, https://arxiv.org/abs/1910.01108.

Y. Liu et al., “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019. Access time: 12/10/2025, https://arxiv.org/abs/1907.11692.

S. T. Luu, K. V. Nguyen and N. L.-T. Nguyen, “A large-scale dataset for hate speech detection on Vietnamese social media texts,” arXiv preprint arXiv:2103.11528, 2021. Access time: 12/10/2025, https://arxiv.org/abs/2103.11528.

L. T. Nguyen, K. V. Nguyen and N. L.-T. Nguyen, “Constructive and toxic speech detection for open-domain social media comments in Vietnamese,” in Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices, pp. 572–583, 2021. DOI: 10.1007/978-3-030-79457-6_49.

C. N. Vo, K. B. Huynh, S. T. Luu and T.-H. Do, “ViTHSD: Exploiting hatred by targets for hate speech detection on Vietnamese social media texts,” Journal of Computational Social Science, vol. 8, Article 30, 2025. DOI: 10.1007/s42001-024-00348-6.

S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4768–4777, 2017.

R. Patel and V. Sharma, “SHAP-based efficient moderation system for online platforms,” in Proceedings of the International Conference on Responsible AI in Online Safety (RAIOS), pp. 22–30, 2024.

A. Nirmal, A. Bhattacharjee, P. Sheth and H. Liu, “Towards interpretable hate speech detection using large language model-extracted rationales,” in Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH), pp. 223–233, 2024. Access time: 15/12/2025, https://aclanthology.org/2024.woah-1.17/.

A. Gupta and R. Singh, “Concept bottleneck models for hate and counter speech detection,” in Proceedings of the Conference on Ethics in NLP, pp. 67–75, 2024.

N. T. Bui and M. L. Tran, “A survey on explainable AI for hate speech and misinformation detection,” in Proceedings of the 2024 International Conference on Asian Language Technology (ALTA), pp. 134–141, 2024.

R. Artstein and M. Poesio, “Inter-coder agreement for computational linguistics,” Computational Linguistics, vol. 34, no. 4, pp. 555–596, 2008.

M. Ha, T.-H. Nguyen, N.-N. Nguyen and Q.-T. Tran, “Vietnamese harmful short comments dataset (ViHSC-15K),” GitHub repository, 2025. Access time: 15/12/2025, https://github.com/mourinhan8/Vietnamese-Harmful-Short-Comments-Dataset-ViHSC-15K.

T. V. Huynh, D. V. Nguyen, K. V. Nguyen, N. L. T. Nguyen and A. G. T. Nguyen, “Hate speech detection on Vietnamese social media text using the Bi-GRU–LSTM–CNN model,” arXiv preprint arXiv:1911.03644, 2019. Access time: 15/12/2025, https://arxiv.org/abs/1911.03644.

Published

2026-06-24

How to Cite

Muon, H. V., Hai, N. T., Nhan, N. N., & Tuan, T. Q. (2026). An Explainable Hybrid Framework for Detecting Harmful Short Comments on Vietnamese Social Media Using TF-IDF and Lightweight Transformers. Journal of Science and Technology on Information Security, 1(27), 35-50. https://doi.org/10.54654/isj.v3i26.1162

Section

Papers