Static Feature Selection for IoT Malware Detection
DOI:
https://doi.org/10.54654/isj.v1i15.844
Keywords:
feature selection, opcode, IoT malware, malware detection, machine learning
Abstract
Abstract: IoT networks have grown explosively in recent years as
one of the pillars of the fourth industrial revolution. Malware
targeting IoT devices has grown accordingly, both in number and
in sophistication. More efficient approaches to IoT malware
detection are therefore needed, with machine learning models that
can run in resource-constrained solutions. In this paper, we study
and evaluate the effectiveness of using term frequency–inverse
document frequency (TF-IDF) weights for feature selection,
combined with a machine learning model, to detect IoT malware
based on opcode sequence features. We ran experiments on a MIPS
ELF dataset of 4,511 malicious samples in four main classes and
4,393 benign programs. The results show that the proposed method
achieves a detection accuracy of 99.8% and a classification
accuracy of 95.8% on this dataset, while the models use only the
20 opcodes with the highest weights.
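The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the plain TF-IDF formula (relative frequency times log(N/df)), the ranking of opcodes by their summed weight over all samples, the function names, and the toy MIPS opcode sequences are all assumptions made for illustration. The abstract keeps the 20 highest-weighted opcodes; the toy example keeps 3.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(samples):
    """Return one {opcode: tf-idf weight} dict per sample.

    Each sample is a list of opcode mnemonics. Assumed formula:
    tf = count / sequence length, idf = log(N / document frequency).
    """
    n_docs = len(samples)
    # Document frequency: number of samples containing each opcode.
    doc_freq = Counter(op for seq in samples for op in set(seq))
    weights = []
    for seq in samples:
        counts = Counter(seq)
        weights.append({
            op: (c / len(seq)) * math.log(n_docs / doc_freq[op])
            for op, c in counts.items()
        })
    return weights


def select_top_opcodes(samples, k):
    """Rank opcodes by summed TF-IDF weight (an assumed criterion); keep k."""
    score = defaultdict(float)
    for sample_weights in tfidf_weights(samples):
        for op, w in sample_weights.items():
            score[op] += w
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(score, key=lambda op: (-score[op], op))
    return ranked[:k]


def to_feature_vectors(samples, selected):
    """Project each sample onto the selected opcodes (0.0 if absent)."""
    return [[w.get(op, 0.0) for op in selected] for w in tfidf_weights(samples)]


# Hypothetical toy opcode sequences, not taken from the paper's dataset.
samples = [
    ["lw", "sw", "addiu", "jal", "lw"],
    ["lw", "sw", "syscall", "syscall", "jal"],
    ["lw", "addiu", "addiu", "sw", "jr"],
    ["lw", "sw", "syscall", "xor", "xor"],
]

top = select_top_opcodes(samples, k=3)
vectors = to_feature_vectors(samples, top)
print(top)
print(vectors)
```

The resulting fixed-length vectors could then be passed to any classifier. Note that under this formula an opcode that occurs in every sample (here `lw` and `sw`) gets a weight of zero and is never selected, which is the intended effect of the inverse document frequency term.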
Abstract (translated from Vietnamese): The fourth industrial
revolution, together with the growth of IoT devices, has had a
far-reaching impact on many areas of society. Malware on IoT
devices is increasing in number and uses sophisticated evasion
techniques. This calls for more effective approaches to detecting
malware on IoT devices, with effective machine learning models
that can be deployed in resource-constrained information security
solutions. In this paper, we study and evaluate the effectiveness
of the term weighting used in information retrieval as part of a
feature selection method, combined with an effective machine
learning model, for IoT malware detection based on opcode sequence
features. We ran experiments on a MIPS ELF dataset comprising
4,511 malicious samples in 4 main classes and 4,393 benign
programs. The experimental results show that the proposed method
performs well on this dataset: the highest rates for detection and
for classification into the 4 malware classes are 99.8% and 95.8%
respectively, using only the 20 opcodes with the highest weights.
Proposed Policy for Journals That Offer Open Access
Authors who publish with this journal agree to the following terms:
1. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).