Static Feature Selection for IoT Malware Detection
DOI: https://doi.org/10.54654/isj.v1i15.844
Keywords: feature selection, opcode, IoT malware, malware detection, machine learning
Abstract: IoT networks have grown explosively in recent years as one of the pillars of the Fourth Industrial Revolution. Malware targeting IoT devices has grown accordingly, both in volume and in sophistication. More efficient approaches to IoT malware detection are therefore needed, with machine learning models that can run in resource-constrained solutions. In this paper, we study and evaluate the effectiveness of using term frequency-inverse document frequency (TF-IDF) weights for feature selection, combined with an effective machine learning model, to detect IoT malware from opcode sequence features. We performed experiments on a MIPS ELF dataset comprising 4,511 malicious samples from four main classes and 4,393 benign programs. Experimental results show that the proposed method achieves 99.8% detection accuracy and 95.8% classification accuracy on this dataset, while the models use only the 20 opcodes with the highest weight values.
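The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact TF-IDF variant, how per-sample weights are aggregated into one weight per opcode, or which classifier is used, so everything below is an assumption: each sample is treated as a "document" whose "terms" are opcodes, TF is the relative opcode frequency, IDF is log(N / df), and the per-opcode weight is the mean TF-IDF over all samples. The function names and the toy MIPS opcode sequences are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def tfidf_weights(samples):
    """Return one weight per opcode: the mean TF-IDF over all samples.

    Assumption: mean aggregation and idf = log(N / df); the paper may
    use a different variant.
    """
    n = len(samples)
    # Document frequency: number of samples in which each opcode occurs.
    df = Counter()
    for seq in samples:
        df.update(set(seq))
    weights = Counter()
    for seq in samples:
        counts = Counter(seq)
        total = len(seq)
        for op, c in counts.items():
            tf = c / total                 # relative frequency in this sample
            idf = math.log(n / df[op])     # zero for opcodes present everywhere
            weights[op] += tf * idf / n
    return dict(weights)


def select_top_k(samples, k):
    """Return the k opcodes with the highest weight (ties broken by name)."""
    weights = tfidf_weights(samples)
    return sorted(weights, key=lambda op: (-weights[op], op))[:k]


def to_feature_vector(seq, selected):
    """Encode one sample as relative frequencies of the selected opcodes."""
    counts = Counter(seq)
    total = len(seq) or 1
    return [counts[op] / total for op in selected]


# Hypothetical toy opcode sequences, one list per disassembled sample.
samples = [
    ["lw", "addiu", "sw", "jal", "lw"],
    ["lw", "sw", "syscall", "syscall", "jal"],
    ["lw", "addiu", "addiu", "sw", "jr"],
]
selected = select_top_k(samples, 2)
vectors = [to_feature_vector(seq, selected) for seq in samples]
```

In the setting of the abstract, k would be 20 and the resulting fixed-length vectors would be passed to a classifier. Opcodes that occur in every sample (here `lw` and `sw`) receive zero weight under this IDF definition, which is one reason a small opcode subset can remain discriminative.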