Static Feature Selection for IoT Malware Detection

Authors

  • Nguyen Ngoc Toan
  • Luong The Dung
  • Dang Quang Thang

DOI:

https://doi.org/10.54654/isj.v1i15.844

Keywords:

feature selection, opcode, IoT malware, malware detection, machine learning

Abstract

IoT networks, one of the pillars of the Fourth Industrial Revolution, have grown explosively in recent years. Malware targeting IoT devices has grown with them, both in volume and in the sophistication of its evasion techniques. Detection therefore calls for more efficient approaches, built on machine learning models light enough to run in resource-constrained security solutions. In this paper, we study and evaluate term frequency-inverse document frequency (TF-IDF) weighting as a feature selection method, combined with a machine learning model, for detecting IoT malware from opcode sequence features. We ran experiments on a MIPS ELF dataset of 4,511 malicious samples from four main classes and 4,393 benign programs. On this dataset, the proposed method reaches a detection accuracy of 99.8% and a four-class classification accuracy of 95.8% while using only the 20 opcodes with the highest weights.
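
As a rough illustration of the selection step described in the abstract, the sketch below ranks opcodes by a TF-IDF weight and keeps the top k. It is a minimal sketch and not the authors' implementation: the function names, the averaging of per-sample weights into one score per opcode, the unsmoothed IDF, and the toy MIPS opcode sequences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_opcode_weights(samples):
    """Return one TF-IDF weight per opcode, averaged over all samples.

    samples: list of opcode sequences, each a list of mnemonic strings.
    Averaging per-sample weights is an assumption, not the paper's stated rule.
    """
    n_docs = len(samples)

    # Document frequency: in how many samples does each opcode occur?
    doc_freq = Counter()
    for seq in samples:
        doc_freq.update(set(seq))

    weights = defaultdict(float)
    for seq in samples:
        if not seq:
            continue
        for opcode, count in Counter(seq).items():
            tf = count / len(seq)                       # relative frequency in this sample
            idf = math.log(n_docs / doc_freq[opcode])   # unsmoothed IDF (assumption)
            weights[opcode] += tf * idf / n_docs
    return dict(weights)


def select_top_opcodes(samples, k=20):
    """Return the k opcodes with the highest weight (ties broken by name)."""
    weights = tfidf_opcode_weights(samples)
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [opcode for opcode, _ in ranked[:k]]


def to_feature_vector(seq, selected):
    """Map one opcode sequence to relative frequencies of the selected opcodes."""
    counts = Counter(seq)
    total = len(seq) or 1
    return [counts[opcode] / total for opcode in selected]


# Hypothetical toy data: three short MIPS opcode sequences.
samples = [
    ["addiu", "lw", "sw", "jal", "nop"],
    ["addiu", "lw", "syscall", "syscall", "nop"],
    ["addiu", "sw", "beq", "nop", "nop"],
]

selected = select_top_opcodes(samples, k=3)
print(selected)
print(to_feature_vector(["syscall", "lw", "syscall", "nop"], selected))
```

In this toy data, `addiu` and `nop` occur in every sample, so their IDF is zero and they rank last. The resulting fixed-length vectors are what a classifier would consume; the abstract reports results with k = 20.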

Published

2022-06-08

How to Cite

Toan, N. N., Dung, L. T., & Thang, D. Q. (2022). Static Feature Selection for IoT Malware Detection. Journal of Science and Technology on Information Security, 1(15), 74-84. https://doi.org/10.54654/isj.v1i15.844

Section

Papers