Malware Analysis: A Perspective from Dynamic Symbolic Execution of Binary Code

Authors

  • Mizuhito Ogawa

DOI:

https://doi.org/10.54654/isj.v2i25.1093

Keywords:

Dynamic symbolic execution, binary code, x86, control flow graph, obfuscation

Abstract

A typical malware sample proceeds in three steps: obfuscation, infection, and malicious action. Many antivirus methods fail because obfuscation hides control structures. This paper provides an overview of dynamic symbolic execution (DSE) applied to binary code, especially x86. DSE is considered the most powerful technique for deobfuscation and can automatically recover control structures such as control-flow graphs (CFGs). Several DSE tools target x86 (e.g., angr, Mayhem, S2E, KLEE-MC, and BE-PUM); we examine their design choices and trade-offs. Finally, we evaluate the effectiveness of control-flow graph similarity for tasks such as packer identification and original entry point (OEP) detection.
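
The abstract evaluates CFG similarity for packer identification and OEP detection but does not state the similarity measure on this page. One common choice, listed in the references (Weisfeiler-Lehman graph kernels, Shervashidze et al.), can be sketched as follows. The CFG encoding (block id mapped to a label and a successor list), the mnemonic-like block labels, successor-only relabeling, and cosine normalization are all illustrative assumptions, not necessarily the paper's method.

```python
from collections import Counter
from math import sqrt

# Hypothetical CFG encoding: block id -> (block label, successor block ids).
# Every successor must itself be a key of the dict.
CFG = dict[str, tuple[str, list[str]]]


def wl_features(cfg: CFG, iterations: int, table: dict) -> Counter:
    """Count Weisfeiler-Lehman subtree labels of a directed CFG.

    Each round relabels a block by (its label, sorted successor labels).
    `table` compresses those signatures to short labels; sharing one table
    between two graphs keeps their labels comparable.
    """
    labels = {block: label for block, (label, _) in cfg.items()}
    features = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for block, (_, succs) in cfg.items():
            signature = (labels[block], tuple(sorted(labels[s] for s in succs)))
            new_labels[block] = table.setdefault(signature, f"wl{len(table)}")
        labels = new_labels
        features.update(labels.values())
    return features


def wl_similarity(g1: CFG, g2: CFG, iterations: int = 2) -> float:
    """Cosine-normalized WL subtree kernel; 1.0 when the WL features coincide."""
    table: dict = {}
    f1 = wl_features(g1, iterations, table)
    f2 = wl_features(g2, iterations, table)
    dot = sum(count * f2[label] for label, count in f1.items())
    norm = sqrt(sum(c * c for c in f1.values())) * sqrt(sum(c * c for c in f2.values()))
    return dot / norm if norm else 0.0


# Toy CFGs with mnemonic-like block labels (hypothetical data).
loop_a = {"entry": ("mov", ["body"]), "body": ("xor", ["test"]),
          "test": ("cmp", ["body", "exit"]), "exit": ("jmp", [])}
loop_b = {"b0": ("mov", ["b1"]), "b1": ("xor", ["b2"]),
          "b2": ("cmp", ["b1", "b3"]), "b3": ("jmp", [])}
straight = {"s0": ("mov", ["s1"]), "s1": ("xor", ["s2"]), "s2": ("jmp", [])}

print(round(wl_similarity(loop_a, loop_b), 3))    # 1.0 (same CFG, renamed blocks)
print(round(wl_similarity(loop_a, straight), 3))  # 0.577
```

In practice the CFGs would first be recovered from the binary (for example by one of the DSE tools named above), and the block labels, number of WL iterations, and normalization would be chosen empirically for the packer or OEP task at hand.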

References

M.Sikorski, A.Honig: Practical Malware Analysis, No Starch Press, 2012.

D.Andriesse: Practical Binary Analysis, No Starch Press, 2018.

P.Szor: The Art of Computer Virus Research and Defense, Addison Wesley, 2005

C.Anley, J.Heasman, F.Lindner, G.Richarte: The Shellcoder’s Handbook (2nd ed), Addison Wesley, 2007

C.Collberg, J.Nagra: Surreptitious software. Addison Wesley 2010

Nu1L Team: Handbook of CTFer, Springer, 2022.

G.Winskel: The Formal Semantics of Programming Languages, MIT Press, 1993.

F.Nielson, H.R.Nielson, C.Hankin: Principles of Program Analysis, Springer, 1999

E.Clarke: Programming language constructs for which it is impossible to obtain good Hoare axiom systems, JACM 26(1), 1979.

E.Clarke, S.M.German, J.Y.Halpern: Effective axiomatizations of Hoare logics, JACM 30(3), 1983.

K.A.Roundy, B.P.Miller: Binary-code obfuscations in prevalent packer tools. ACM Comput. Surv 46 4:1-4:32, 2013

S.Schrittwieser: Protecting Software through Obfuscation: Can It Keep Pace with Progress in Code Analysis? ACM Comp Surv 49(1), 2016.

B.Cheng, J.Ming, E.A.Leal, H.Zhang, J.Fu, G.Peng, J.-Y.Marion: Obfuscation-Resilient Executable Payload Extraction From Packed Malware, USENIX, 3456, 2021.

D. Brumley, C. Hartwig, Z. Liang, J. Newsome, D. X. Song, H. Yin: Automatically identifying trigger-based behavior in malware, Botnet Detection ADIS 36, 2008.

B.Weisfeiler, A.A.Lehman: A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia 2(9), 1968.

N.Shervashidze, P.Schweitzer, E.J.van Leeuwen, K.Mehlhorn, K.M.Borgwardt: Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2011.

N.M.Kriege, F.D.Johansson, C.Morris: A survey on graph kernels, Applied Network Science 5:6, 2020.

A.Narayanan, M.Chandramohan, R.Venkatesan, L.Chen, Y.Liu, S.Jaiswal: Graph2vec: Learning Distributed Representations of Graphs, https://arxiv.org/pdf/1707.05005, 2017.

J.C.King: Symbolic execution and program testing, Commun. ACM, 19(7), 1976.

P.Godefroid, N.Klarlund, K.Sen: DART: directed automated random testing, PLDI, 2005.

N. Shafiei, F. van Breugel: Automatic handling of native methods in java pathfinder, SPIN, 2014

C. Cadar, D. Dunbar, D. Engler: KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs, OSDI, 2008.

C. Păsăreanu, W. Visser, D. Bushnell, J. Geldenhuys, P. Mehlitz, N. Rungta: Symbolic pathfinder: Integrating symbolic execution with model checking for java bytecode analysis, ASE, 2013

M.Mues, F.Howar: JDart: Dynamic Symbolic Execution for Java Bytecode, TACAS, LNCS 12079, 2020

A.V.Thakur, J.Lim, A.Lal, A.Burton, E.Driscoll, M.Elder, T.Andersen, T.Reps: Directed Proof Generation for Machine Code, CAV, LNCS 6174, 2010

D.Brumley, I.Jager, T.Avgerinos, E.J.Schwartz: BAP: A Binary Analysis Platform, CAV, LNCS 6806, 2011

A.Djoudi, S.Bardin: BINSEC: Binary Code Analysis with Low Level Regions, TACAS, Springer LNCS 9035, 2015

V. Chipounov, V. Kuznetsov, G. Candea: S2E: A platform for in-vivo multi-path analysis of software systems, SIGARCH Comput. Archit. News 1, 2011

S. K. Cha, T. Avgerinos, A. Rebert, D. Brumley: Unleashing Mayhem on binary code, SP, 2012

A. Romano: Methods for binary symbolic execution, PhD Dissertation, Stanford University, 2014.

M.H.Nguyen, M.Ogawa, Q.T.Tho: Obfuscation code localization based on CFG generation of malware, FPS, LNCS 9482, 2015.

R. David, S. Bardin, T. D. Ta, L. Mounier, J. Feist, M. Potet, J. Marion: BINSEC/SE: A dynamic symbolic execution toolkit for binary-level analysis, SANER, 2016.

Y.Shoshitaishvili, R.Wang, C.Salls, N.Stephens, M.Polino, A.Dutcher, J.Grosen, S.Feng, C.Hauser, C.Kruegel, G.Vigna: (State of) The Art of War: Offensive Techniques in Binary Analysis, SP, 2016.

A. Vu, M. Ogawa: Formal semantics extraction from natural language specifications for ARM, FM, LNCS 11800, 2019.

Q.T.Trac, M.Ogawa: Formal Semantics Extraction from MIPS Instruction Manual, FTSCS, Springer CCIS 1165, 2019.

A. T. V. Nguyen, M. Ogawa: Automatic stub generation for dynamic symbolic execution of arm binary, SoICT, 2022.

P.Royal, M.Halpin, D.Dagon, R.Edmonds, W.Lee: Automating the Hidden-Code Extraction of Unpack-Executing Malware, ACSAC, 2006.

L.Martignoni, M.Christodorescu, S.Jha: OmniUnpack: Fast, Generic, and Safe Unpacking of Malware, ACSAC, 2007.

B.Cheng, J.Ming, J.Fu, G.Peng, T.Chen, X.Zhang, J.-Y.Marion: Towards Paving the Way for Large-Scale Windows Malware Analysis: Generic Binary Unpacking with Orders-of-Magnitude Performance Boost, CCS, 2018.

S.Choi, T.Chang, C.Kim, Y.Park: x64Unpack: Hybrid Emulation Unpacker for 64-bit Windows Environments and Detailed Analysis Results on VMProtect 3.4, IEEE Access, 2020.

N. M. Hai, M. Ogawa, Q. T. Tho: Packer Identification Based on Metadata Signature, ACM SSPREW-7, 2017.

P.T.Hung, M.Ogawa: Original Entry Point detection based on graph similarity. FPS, LNCS 14551, 2023.

F.Yamaguchi, N.Golde, D.Arp, K.Rieck: Modeling and Discovering Vulnerabilities with Code Property Graphs, SP 2014.

Le Vinh: Automatic stub generation from natural language description, Master thesis, JAIST, 2016 September.

H.L.Y.Nguyen: Automatic extraction of x86 formal semantics from its natural language description, Master thesis, JAIST, 2018 March.

Nguyen The Hung: Vulnerabilities detection in binary code, Master thesis, JAIST, 2024 September.

Z.Li, D.Zou, S.Xu, X.Ou, H.Jin, S.Wang, Z.Deng, Y.Zhong: VulDeePecker: A Deep Learning-Based System for Vulnerability Detection, NDSS 2018.

D.Guo, S.Ren, S.Lu, Z.Feng, D.Tang, S.Liu, L.Zhou, et al.: GraphCodeBERT: Pre-training code representations with data flow, ICLR, 2021.

V.A.Nguyen, D.Q.Nguyen, V.Nguyen, T.Le, Q.H.Tran, D.Phung: ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection, ICSE-Companion, 2022.

D.Hin, A.Kan, H.Chen, M.A.Babar: LineVD: Statement-level Vulnerability Detection using Graph Neural Networks, MSR, 2022.

W.Wang, T.N.Nguyen, S.Wang, Y.Li, J.Zhang, A.Yadavally: DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection, ICSE, 2023.

S.Nguyen, T.-T.Nguyen, T.T.Vu, T.-D.Do, K.-T.Ngo, H.D.Vo: Code-centric Learning-based Just-In-Time Vulnerability Detection, arXiv, 2023.

Y.Li, S.Wang, T.N.Nguyen: Vulnerability Detection with Fine-Grained Interpretations, FSE, 2021.

M.Fu, C.Tantithamthavorn: LineVul: A transformer-based line-level vulnerability prediction, MSR, 2022.

Y.Ding, S.Suneja, Y.Zheng, J.Laredo, A.Morari, G.Kaiser, B.Ray: Velvet: a novel ensemble learning approach to automatically locate vulnerable statements, SANER, 2022.

Pham Van Hau, To Trong Nghia, Phan The Duy: A method of generating mutated Windows malware to evade ensemble learning, Journal of Science and Technology on Information Security, vol 1, no 18, 2023, pp 47-60. DOI: https://doi.org/10.54654/isj.v1i18.906.

S.Bardin, R.David, J.-Y.Marion: Backward-Bounded DSE: Targeting Infeasibility Questions on Obfuscated Codes, SP, 2017.

Z.Wang, J.Ming, C.Jia, D.Gao: Linear obfuscation to combat symbolic execution, ESORICS, LNCS 6879, 2011.

B.Yadegari, S.Debray: Symbolic Execution of Obfuscated Code, CCS, 2015.

S.Banescu, C.Collberg, V.Ganesh, Z.Newsham, A.Pretschner: Code Obfuscation Against Symbolic Execution Attacks, ACSAC, 2016.

M.Ollivier, S.Bardin, R.Bonichon, J.-Y.Marion: How to Kill Symbolic Deobfuscation for Free (or Unleashing the Potential of Path-Oriented Protections), ACSAC, 2019.

M.I.Sharif, A.Lanzi, J.T.Giffin, W.Lee: Impeding malware analysis using conditional code obfuscation, NDSS, 2008.

M.Sharif, A.Lanzi, J.Giffin, W.Lee: Automatic Reverse Engineering of Malware Emulators, SP, 2009.

B.Yadegari, B.Johannesmeyer, B.Whitely, S.Debray: A generic approach to automatic deobfuscation of executable code, SP, 2015.

H.Li, Y.Zhan, W.Jianqiang, D.Gu: SymSem: Symbolic Execution with Time Stamps for Deobfuscation, INSCRYPT, LNCS 12020, 2019.

M.Liang, Z.Li, Q.Zeng, Z.Fang: Deobfuscation of Virtualization-Obfuscated Code Through Symbolic Execution and Compilation Optimization, ICICS, 2017.

T.Blazytko, M.Contag, C.Aschermann, T.Holz: Syntia: Synthesizing the Semantics of Obfuscated Code, USENIX, 2017.

J.Salwan, S.Bardin, M.-L.Potet: Symbolic deobfuscation: from virtualized code back to the original? (long version), DIMVA, LNCS 10885, 2018.

Published

2025-09-30

How to Cite

Ogawa, M. (2025). Malware Analysis: A Perspective from Dynamic Symbolic Execution of Binary Code. Journal of Science and Technology on Information Security, 2(25), 5-20. https://doi.org/10.54654/isj.v2i25.1093

Section

Papers