DSViT: An Enhanced Transformer Model for Deepfake Detection
DOI: https://doi.org/10.54654/isj.v2i22.1055
Keywords: deepfake detection, DSViT, deepfake, spatial deepfake
Abstract
The rapid development of artificial intelligence and deep learning has made it possible to create highly realistic fake images and videos, which pose significant threats to information security and safety. Accurately detecting such forged content is crucial to preventing the spread of misinformation and protecting the integrity of digital media. Although advanced approaches such as the Vision Transformer (ViT) and the Convolutional Vision Transformer (CViT) have been studied for this task, limitations remain that need to be addressed. In this paper, we introduce DSViT (Deepfake Detection with SC-based Convolutional Vision Transformer), a novel model that builds on CViT and is designed to improve deepfake detection. DSViT combines convolutional layers and an SCConv (spatial and channel reconstruction convolution) block with the ViT architecture. We conducted experiments on the Deepfake Detection Challenge (DFDC) dataset and compared the results with those of the CViT model to demonstrate the effectiveness of the proposed model.
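The abstract does not detail how the SCConv block is wired into DSViT. For orientation only, the sketch below implements a simplified spatial reconstruction unit (SRU), one of the two halves of SCConv as described by Li et al. (CVPR 2023), as we read that paper and its public reference code. Features are group-normalized. Each channel's GroupNorm scale factor γ becomes a relative weight γ_i / Σγ_j. A sigmoid gate with threshold 0.5 then separates informative from less informative responses, and the two parts are cross-added across channel halves. This is not the DSViT authors' code. The channel reconstruction unit, all learned convolutions, and the ViT encoder are omitted, and the names `group_norm`, `add_maps`, and `sru`, the group count, and the example values are hypothetical.

```python
import math
from typing import List

# Hypothetical layout: one sample's feature map indexed as fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def group_norm(x: FeatureMap, gamma: List[float], beta: List[float],
               groups: int, eps: float = 1e-5) -> FeatureMap:
    """Normalize each group of consecutive channels over all of its positions,
    then apply the per-channel affine parameters gamma and beta."""
    per_group = len(x) // groups
    out: FeatureMap = []
    for g in range(groups):
        chans = x[g * per_group:(g + 1) * per_group]
        vals = [v for ch in chans for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        denom = math.sqrt(var + eps)
        for k, ch in enumerate(chans):
            c = g * per_group + k
            out.append([[gamma[c] * (v - mean) / denom + beta[c] for v in row]
                        for row in ch])
    return out


def add_maps(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Element-wise sum of two [row][col] channel maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def sru(x: FeatureMap, gamma: List[float], beta: List[float],
        groups: int = 2, threshold: float = 0.5) -> FeatureMap:
    """Simplified SRU sketch (hypothetical): gate features with GroupNorm
    scale factors, then cross-reconstruct the two gated parts."""
    channels = len(x)
    if channels % groups or channels % 2:
        raise ValueError("channel count must be divisible by groups and by 2")
    gn_x = group_norm(x, gamma, beta, groups)
    total = sum(gamma)  # assumed nonzero (e.g. positive learned scales)
    w_gamma = [g / total for g in gamma]  # relative importance of each channel
    x1: FeatureMap = []  # informative responses
    x2: FeatureMap = []  # less informative responses
    for c in range(channels):
        ch1, ch2 = [], []
        for i, row in enumerate(gn_x[c]):
            r1, r2 = [], []
            for j, v in enumerate(row):
                w = 1.0 / (1.0 + math.exp(-v * w_gamma[c]))  # sigmoid reweighting
                # Above the threshold: keep fully in x1 and drop from x2.
                # Otherwise: both parts keep the soft weight w.
                w1, w2 = (1.0, 0.0) if w > threshold else (w, w)
                r1.append(w1 * x[c][i][j])
                r2.append(w2 * x[c][i][j])
            ch1.append(r1)
            ch2.append(r2)
        x1.append(ch1)
        x2.append(ch2)
    half = channels // 2
    # Cross-reconstruction over channel halves a, b: [x1_a + x2_b, x1_b + x2_a].
    first = [add_maps(x1[k], x2[half + k]) for k in range(half)]
    second = [add_maps(x1[half + k], x2[k]) for k in range(half)]
    return first + second


if __name__ == "__main__":
    fmap = [[[1.0, -2.0], [0.5, 3.0]],
            [[0.0, 1.5], [-1.0, 2.0]],
            [[2.0, 2.0], [-0.5, 1.0]],
            [[-3.0, 0.25], [1.0, 0.0]]]
    out = sru(fmap, gamma=[1.0] * 4, beta=[0.0] * 4, groups=2)
    print(len(out), len(out[0]), len(out[0][0]))  # prints "4 2 2"
```

The unit preserves the feature-map shape. With `threshold=0.0` every gate opens, so it returns its input unchanged, which is a convenient sanity check. In a trained model, γ and β would be learned GroupNorm parameters rather than fixed lists.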
References
F. Abbas and A. Taeihagh, “Unmasking deepfakes: A systematic review of deepfake detection and generation techniques using artificial intelligence,” Expert Systems with Applications, Art. no. 124260, 2024.
A. Naitali, M. Ridouani, F. Salahdine, and N. Kaabouch, “Deepfake attacks: Generation, detection, datasets, challenges, and research directions,” Computers, vol. 12, no. 10, Art. no. 216, Oct. 2023.
X. Li, H. Zhou, and M. Zhao, “Transformer-based cascade networks with spatial and channel reconstruction convolution for deepfake detection,” Mathematical Biosciences and Engineering, vol. 21, no. 3, pp. 4142-4164, 2024.
D. Wodajo and S. Atnafu, “Deepfake Video Detection Using Convolutional Vision Transformer,” arXiv preprint arXiv:2102.11126, 2021.
F. Chollet, “Xception: Deep Learning with Depthwise Separable Convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017.
H. H. Nguyen, N. T. Tieu, and I. Echizen, “Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2307-2311, May 2019.
D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, “MesoNet: A Compact Facial Video Forgery Detection Network,” arXiv preprint arXiv:1809.00888, Sep. 2018.
M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” in International Conference on Machine Learning (ICML), Jun. 2019.
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” in International Conference on Learning Representations (ICLR), May 2021.
A. Arnab, M. Dehghani, G. Heigold, C. Sun, M. Lučić, and C. Schmid, “ViViT: A Video Vision Transformer,” in IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6816-6826, 2021.
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” in International Conference on Computer Vision (ICCV), Oct. 2021.
W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, “Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Feb. 2022.
J. Li, Y. Wen, and L. He, “SCConv: Spatial and channel reconstruction convolution for feature redundancy,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
Kaggle, “Deepfake Detection Challenge,” 2020. [Online]. Available: https://www.kaggle.com/c/deepfake-detection-challenge/data (accessed Sep. 10, 2024).