Malware Analysis: A Perspective from Dynamic Symbolic Execution of Binary Code
DOI: https://doi.org/10.54654/isj.v2i25.1093
Keywords: Dynamic symbolic execution, binary code, x86, control flow graph, obfuscation
Abstract
Malware typically proceeds in three steps: obfuscation, infection, and malicious action. Many antivirus methods fail because obfuscation hides the control structures of the code. This paper provides an overview of dynamic symbolic execution (DSE) applied to binary code, with a focus on x86. DSE is considered the most powerful technique for deobfuscation, because it can automatically recover control structures such as control-flow graphs (CFGs). Several DSE tools target x86, including angr, Mayhem, S2E, KLEE-MC, and BE-PUM; we examine their design choices and trade-offs. Finally, we evaluate how effective control-flow graph similarity is for tasks such as packer identification and original entry point (OEP) detection.
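The abstract does not state which graph-similarity measure is evaluated. As an illustrative assumption only, the sketch below compares two CFGs with the Weisfeiler-Lehman subtree kernel (Weisfeiler and Lehman; Shervashidze et al., both in the References). The block labels ("entry", "cond", "body", "exit") and the helpers `wl_features` and `wl_similarity` are hypothetical; a real tool would derive the CFG and the node labels from disassembly or from DSE traces.

```python
import math
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree features of a directed graph.

    adj    -- dict: node -> list of successor nodes (the CFG edges)
    labels -- dict: node -> initial label (hypothetical basic-block kinds)
    """
    current = dict(labels)
    features = Counter(current.values())  # iteration-0 features
    for _ in range(iterations):
        prev = current
        # Relabel each node with (own label, sorted successor labels).
        # Keeping the whole tuple makes relabeling injective, so no shared
        # label-compression table is needed between the two graphs.
        current = {
            node: (label, tuple(sorted(prev[s] for s in adj.get(node, ()))))
            for node, label in prev.items()
        }
        features.update(current.values())
    return features


def wl_similarity(g1, g2, iterations=2):
    """Normalized WL subtree kernel (cosine of the feature-count vectors)."""
    f1 = wl_features(*g1, iterations=iterations)
    f2 = wl_features(*g2, iterations=iterations)
    dot = sum(count * f2[feat] for feat, count in f1.items())
    norm = math.sqrt(sum(c * c for c in f1.values())
                     * sum(c * c for c in f2.values()))
    return dot / norm if norm else 0.0


# Hypothetical CFGs: a while-loop, the same loop with renamed blocks,
# and a straight-line function.
loop = ({"A": ["B"], "B": ["C", "D"], "C": ["B"], "D": []},
        {"A": "entry", "B": "cond", "C": "body", "D": "exit"})
renamed = ({"p": ["q"], "q": ["r", "s"], "r": ["q"], "s": []},
           {"p": "entry", "q": "cond", "r": "body", "s": "exit"})
straight = ({"X": ["Y"], "Y": ["Z"], "Z": []},
            {"X": "entry", "Y": "body", "Z": "exit"})

print(wl_similarity(loop, renamed))   # same structure, different block names
print(wl_similarity(loop, straight))  # shares only a few subtrees
```

This variant looks only at successors; a version that also folds in predecessor labels is equally plausible for directed CFGs. For packer identification, one could compare an unknown sample's CFG against CFGs of known packers and choose the most similar, though the paper's actual pipeline may differ.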
References
M.Sikorski, A.Honig: Practical Malware Analysis, No Starch Press, 2012.
D.Andriesse: Practical Binary Analysis, No Starch Press, 2018.
P.Szor: The Art of Computer Virus Research and Defense, Addison-Wesley, 2005.
C.Anley, J.Heasman, F.Lindner, G.Richarte: The Shellcoder's Handbook (2nd ed.), Addison-Wesley, 2007.
C.Collberg, J.Nagra: Surreptitious Software, Addison-Wesley, 2010.
Nu1L Team: Handbook of CTFer, Springer, 2022.
G.Winskel: The Formal Semantics of Programming Languages, MIT Press, 1993.
F.Nielson, H.R.Nielson, C.Hankin: Principles of Program Analysis, Springer, 1999.
E.Clarke: Programming language constructs for which it is impossible to obtain good Hoare axiom systems, JACM 26(1), 1979.
E.Clarke, S.M.German, J.Y.Halpern: Effective axiomatizations of Hoare logics, JACM 30(3), 1983.
K.A.Roundy, B.P.Miller: Binary-code obfuscations in prevalent packer tools, ACM Comput. Surv. 46, 4:1-4:32, 2013.
S.Schrittwieser: Protecting Software through Obfuscation: Can It Keep Pace with Progress in Code Analysis? ACM Comput. Surv. 49(1), 2016.
B.Cheng, J.Ming, E.A.Leal, H.Zhang, J.Fu, G.Peng, J.-Y.Marion: Obfuscation-Resilient Executable Payload Extraction From Packed Malware, USENIX, 3456, 2021.
D. Brumley, C. Hartwig, Z. Liang, J. Newsome, D. X. Song, H. Yin: Automatically identifying trigger-based behavior in malware, Botnet Detection ADIS 36, 2008.
B.Weisfeiler, A.A.Lehman: A reduction of a graph to a canonical form and an algebra arising during this reduction, Nauchno-Technicheskaya Informatsia 2(9), 1968.
N.Shervashidze, P.Schweitzer, E.J.van Leeuwen, K.Mehlhorn, K.M.Borgwardt: Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research 12, 2011.
N.M.Kriege, F.D.Johansson, C.Morris: A survey on graph kernels, Applied Network Science 5:6, 2020.
A.Narayanan, M.Chandramohan, R.Venkatesan, L.Chen, Y.Liu, S.Jaiswal: Graph2vec: Learning Distributed Representations of Graphs, https://arxiv.org/pdf/1707.05005, 2017.
J.C.King: Symbolic execution and program testing, Commun. ACM, 19(7), 1976.
P.Godefroid, N.Klarlund, K.Sen: DART: directed automated random testing, PLDI, 2005.
N. Shafiei, F. van Breugel: Automatic handling of native methods in Java PathFinder, SPIN, 2014.
C. Cadar, D. Dunbar, D. Engler: KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs, OSDI, 2008.
C. Păsăreanu, W. Visser, D. Bushnell, J. Geldenhuys, P. Mehlitz, N. Rungta: Symbolic PathFinder: Integrating symbolic execution with model checking for Java bytecode analysis, ASE, 2013.
M.Mues, F.Howar: JDart: Dynamic Symbolic Execution for Java Bytecode, TACAS, LNCS 12079, 2020.
A.V.Thakur, J.Lim, A.Lal, A.Burton, E.Driscoll, M.Elder, T.Andersen, T.Reps: Directed Proof Generation for Machine Code, CAV, LNCS 6174, 2010.
D.Brumley, I.Jager, T.Avgerinos, E.J.Schwartz: BAP: A Binary Analysis Platform, CAV, LNCS 6806, 2011.
A.Djoudi, S.Bardin: BINSEC: Binary Code Analysis with Low-Level Regions, TACAS, Springer LNCS 9035, 2015.
V. Chipounov, V. Kuznetsov, G. Candea: S2E: A platform for in-vivo multi-path analysis of software systems, SIGARCH Comput. Archit. News 1, 2011.
S. K. Cha, T. Avgerinos, A. Rebert, D. Brumley: Unleashing Mayhem on binary code, SP, 2012.
A. Romano: Methods for binary symbolic execution, PhD Dissertation, Stanford University, 2014.
M.H.Nguyen, M.Ogawa, Q.T.Tho: Obfuscation code localization based on CFG generation of malware, FPS, LNCS 9482, 2015.
R. David, S. Bardin, T. D. Ta, L. Mounier, J. Feist, M. Potet, J. Marion: BINSEC/SE: A dynamic symbolic execution toolkit for binary-level analysis, SANER, 2016.
Y.Shoshitaishvili, R.Wang, C.Salls, N.Stephens, M.Polino, A.Dutcher, J.Grosen, S.Feng, C.Hauser, C.Kruegel, G.Vigna: (State of) The Art of War: Offensive Techniques in Binary Analysis, SP, 2016.
A. Vu, M. Ogawa: Formal semantics extraction from natural language specifications for ARM, FM, LNCS 11800, 2019.
Q.T.Trac, M.Ogawa: Formal Semantics Extraction from MIPS Instruction Manual, FTSCS, Springer CCIS 1165, 2019.
A. T. V. Nguyen, M. Ogawa: Automatic stub generation for dynamic symbolic execution of arm binary, SoICT, 2022.
P.Royal, M.Halpin, D.Dagon, R.Edmonds, W.Lee: Automating the Hidden-Code Extraction of Unpack-Executing Malware, ACSAC, 2006.
L.Martignoni, M.Christodorescu, S.Jha: OmniUnpack: Fast, Generic, and Safe Unpacking of Malware, ACSAC, 2007.
B.Cheng, J.Ming, J.Fu, G.Peng, T.Chen, X.Zhang, J.-Y.Marion: Towards Paving the Way for Large-Scale Windows Malware Analysis: Generic Binary Unpacking with Orders-of-Magnitude Performance Boost, CCS, 2018.
S.Choi, T.Changi, C.Kim, Y.Park: x64Unpack: Hybrid Emulation Unpacker for 64-bit Windows Environments and Detailed Analysis Results on VMProtect 3.4, IEEE Access, 2020.
N. M. Hai, M. Ogawa, Q. T. Tho: Packer Identification Based on Metadata Signature, ACM SSPREW-7, 2017.
P.T.Hung, M.Ogawa: Original Entry Point detection based on graph similarity. FPS, LNCS 14551, 2023.
F.Yamaguchi, N.Golde, D.Arp, K.Rieck: Modeling and Discovering Vulnerabilities with Code Property Graphs, SP, 2014.
Le Vinh: Automatic stub generation from natural language description, Master's thesis, JAIST, September 2016.
H.L.Y.Nguyen: Automatic extraction of x86 formal semantics from its natural language description, Master's thesis, JAIST, March 2018.
Nguyen The Hung: Vulnerabilities detection in binary code, Master's thesis, JAIST, September 2024.
Z.Li, D.Zou, S.Xu, X.Ou, H.Jin, S.Wang, Z.Deng, Y.Zhong: VulDeePecker: A Deep Learning-Based System for Vulnerability Detection, NDSS, 2018.
D.Guo, S.Ren, S.Lu, Z.Feng, D.Tang, S.Liu, L.Zhou, et al.: GraphCodeBERT: Pre-training code representations with data flow, ICLR, 2021.
V.A.Nguyen, D.Q.Nguyen, V.Nguyen, T.Le, Q.H.Tran, D.Phung: ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection, ICSE-Companion, 2022.
D.Hin, A.Kan, H.Chen, M.A.Babar: LineVD: Statement-level Vulnerability Detection using Graph Neural Networks, MSR, 2022.
W.Wang, T.N.Nguyen, S.Wang, Y.Li, J.Zhang, A.Yadavally: DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection, ICSE, 2023.
S.Nguyen, T.-T.Nguyen, T.T.Vu, T.-D.Do, K.-T.Ngo, H.D.Vo: Code-centric Learning-based Just-In-Time Vulnerability Detection, arXiv, 2023.
Y.Li, S.Wang, T.N.Nguyen: Vulnerability Detection with Fine-Grained Interpretations, FSE, 2021.
M.Fu, C.Tantithamthavorn: LineVul: A transformer-based line-level vulnerability prediction, MSR, 2022.
Y.Ding, S.Suneja, Y.Zheng, J.Laredo, A.Morari, G.Kaiser, B.Ray: Velvet: a novel ensemble learning approach to automatically locate vulnerable statements, SANER, 2022.
Pham Van Hau, To Trong Nghia, Phan The Duy: A method of generating mutated Windows malware to evade ensemble learning, Journal of Science and Technology on Information Security, vol. 1, no. 18, 2023, pp. 47-60. DOI: https://doi.org/10.54654/isj.v1i18.906.
S.Bardin, R.David, J.-Y.Marion: Backward-Bounded DSE: Targeting Infeasibility Questions on Obfuscated Codes, SP, 2017.
Z.Wang, J.Ming, C.Jia, D.Gao: Linear obfuscation to combat symbolic execution, ESORICS, LNCS 6879, 2011.
B.Yadegari, S.Debray: Symbolic Execution of Obfuscated Code, CCS, 2015.
S.Banescu, C.Collberg, V.Ganesh, Z.Newsham, A.Pretschner: Code Obfuscation Against Symbolic Execution Attacks, ACSAC, 2016.
M.Ollivier, S.Bardin, R.Bonichon, J.-Y.Marion: How to Kill Symbolic Deobfuscation for Free (or Unleashing the Potential of Path-Oriented Protections), ACSAC, 2019.
M.I.Sharif, A.Lanzi, J.T.Giffin, W.Lee: Impeding malware analysis using conditional code obfuscation, NDSS, 2008.
M.Sharif, A.Lanzi, J.Giffin, W.Lee: Automatic Reverse Engineering of Malware Emulators, SP, 2009.
B.Yadegari, B.Johannesmeyer, B.Whitely, S.Debray: A generic approach to automatic deobfuscation of executable code, SP, 2015.
H.Li, Y.Zhan, W.Jianqiang, D.Gu: SymSem: Symbolic Execution with Time Stamps for Deobfuscation, INSCRYPT, LNCS 12020, 2019.
M.Liang, Z.Li, Q.Zeng, Z.Fang: Deobfuscation of Virtualization-Obfuscated Code Through Symbolic Execution and Compilation Optimization, ICICS, 2017.
T.Blazytko, M.Contag, C.Aschermann, T.Holz: Syntia: Synthesizing the Semantics of Obfuscated Code, USENIX, 2017.
J.Salwan, S.Bardin, M.-L.Potet: Symbolic deobfuscation: from virtualized code back to the original? (long version), DIMVA, LNCS 10885, 2018.