Static Feature Selection for IoT Malware Detection

Authors

  • Nguyen Ngoc Toan
  • Luong The Dung
  • Dang Quang Thang

DOI:

https://doi.org/10.54654/isj.v1i15.844

Keywords:

feature selection, opcode, IoT malware, malware detection, machine learning

Abstract

Our world has recently witnessed the explosive growth of IoT networks as one of the pillars of the Fourth Industrial Revolution. Malware targeting IoT devices has grown accordingly, both in number and in the sophistication of its techniques. More efficient approaches to IoT malware detection are therefore needed, built on machine learning models that can be deployed in resource-constrained solutions. In this paper, we study and evaluate the efficiency of term frequency-inverse document frequency (TF-IDF) weighting as a feature selection method, combined with an effective machine learning model, for IoT malware detection based on opcode sequence features. We performed experiments on a MIPS ELF dataset of 4,511 malicious samples from four main classes and 4,393 benign programs. Experimental results show that the proposed method performs very well on this dataset, achieving detection and classification accuracies of 99.8% and 95.8%, respectively, while using only the 20 opcodes with the highest weight values.

The Fourth Industrial Revolution, driven in part by the development of IoT devices, has had a profound and wide-ranging impact on many areas of social life. Malware on IoT devices is increasing in number and uses sophisticated evasion techniques. This calls for more effective approaches to detecting IoT malware, using efficient machine learning models that can be applied in resource-constrained information security solutions. In this paper, we study and evaluate the effectiveness of information-retrieval term weighting in a feature selection method, combined with an efficient machine learning model, for detecting IoT malware based on opcode sequence features. We conducted experiments on a MIPS ELF dataset comprising 4,511 malicious samples of four main types and 4,393 benign programs. Experimental results show that the proposed method performs well on this dataset, with highest detection and four-class classification rates of 99.8% and 95.8%, respectively, while using only the 20 opcodes with the highest weight values.
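The TF-IDF feature selection idea described above can be illustrated with a minimal Python sketch. It assumes each binary has already been disassembled into a list of opcode mnemonics, treats each sample as a "document" and each opcode as a "term", and ranks opcodes by their TF-IDF weight summed over all samples. The summing aggregation, the unsmoothed IDF, and the function names `tfidf_opcode_weights`, `top_k_opcodes`, and `to_feature_vector` are illustrative assumptions, not the authors' published procedure.

```python
import math
from collections import Counter


def tfidf_opcode_weights(samples):
    """Sum each opcode's TF-IDF weight over a corpus of opcode sequences.

    samples: list of opcode sequences, one per binary (each a list of str).
    Returns a dict mapping opcode -> aggregated weight (assumed: plain sum).
    """
    n_docs = len(samples)
    # Document frequency: number of samples containing each opcode at least once.
    df = Counter()
    for seq in samples:
        df.update(set(seq))

    weights = Counter()
    for seq in samples:
        if not seq:
            continue  # an empty sample contributes no term frequencies
        counts = Counter(seq)
        total = len(seq)
        for op, c in counts.items():
            tf = c / total                   # length-normalized term frequency
            idf = math.log(n_docs / df[op])  # unsmoothed IDF: 0 if opcode is in every sample
            weights[op] += tf * idf
    return dict(weights)


def top_k_opcodes(samples, k=20):
    """Return the k opcodes with the highest aggregated TF-IDF weight."""
    weights = tfidf_opcode_weights(samples)
    # Descending weight; ties broken alphabetically so the result is reproducible.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [op for op, _ in ranked[:k]]


def to_feature_vector(seq, vocabulary):
    """Map one sample to normalized frequencies of the selected opcodes only."""
    counts = Counter(seq)
    total = len(seq) or 1  # avoid division by zero for an empty sample
    return [counts[op] / total for op in vocabulary]


if __name__ == "__main__":
    # Toy MIPS-like opcode sequences (hypothetical data, not from the paper's dataset).
    samples = [
        ["lw", "addiu", "jal", "lw", "syscall"],
        ["lw", "addiu", "sw", "sw", "jr"],
        ["lw", "jal", "syscall", "syscall", "move"],
    ]
    vocab = top_k_opcodes(samples, k=3)
    print(vocab)
    print(to_feature_vector(["lw", "sw", "sw", "jr"], vocab))
```

In this toy corpus `lw` appears in every sample, so its IDF and therefore its weight is 0, while rarer opcodes such as `sw` rank highest. The selected vocabulary (k = 20 in the paper) then defines a fixed-length feature vector per sample that can be fed to any classifier; a smoothed IDF or a mean rather than a sum are equally plausible variants.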

Published

2022-06-08

How to Cite

Toan, N. N., Dung, L. T., & Thang, D. Q. (2022). Static Feature Selection for IoT Malware Detection. Journal of Science and Technology on Information Security, 1(15), 74-84. https://doi.org/10.54654/isj.v1i15.844

Issue

Vol. 1, No. 15 (2022)

Section

Papers