Hi everyone, my name is bit, and I have always been fascinated by computer security. After a lot of research, I fell in love with malware analysis, and I thought that what I had found might help someone else as well. To be honest, writing this essay was in the back of my mind for two years, but I never had the courage to do it, because I thought that what I had was not significant enough, or that it might not help anyone at all. I finally overcame that and started writing. I hope that the roadmap and resources that I’m sharing with you here will help you on your journey to becoming a better malware analyst.
So you have decided that you want to become a malware analyst, or you are at least considering it. I’m sorry to inform you that it’s not going to be easy. If you think about it, reaching any goal is hard: it takes dedication and hard work. To have a better chance of reaching your goal, I recommend reading Ms. Azeria’s mini series. It will help you understand how to set your mind in order to reach your goal. Even if you don’t intend to become a malware analyst and have other goals in mind, this mini series will help you immensely:
1- The Importance of Deep Work
3- The Process of Mastering a Skill
After figuring out your goal and getting into the right mindset, you need good foundations. Why? Because you need enough background knowledge that you won’t get stuck or confused when you read a book about malware analysis or analyze a particular piece of malware. It’s like trying to become the best runner without having the endurance or physique for running. I’m not going into the details of why this is so important, but Mr. Lost explains it very well:
Now shall we play a game? :)
After some thought and consideration, and based on my own experience with the resources that I’ve gathered, I organized them into two sections: Malware 101 and Malware 102. Maybe there are more foundations, but this is all I’ve found. Malware 101 is the necessary foundation for analyzing malware. If you are in a hurry, you can skip 102; just remember to go back to it when you have the time. If 101 is the foundation, then 102 fortifies it. You don’t need 102 right now, but you will eventually.
Malware 101
- Virtualization
You should never analyze malware on your own system, because you risk damaging your computer. Instead, you need an isolated environment. VMware Workstation Pro ($$$) or VirtualBox (free) will provide one. VMware Workstation Pro has two features that VirtualBox doesn’t: playback, and multiple snapshot branches. VirtualBox has no playback feature at all and only supports one snapshot branch, but it is a decent option for anyone who is starting out in this field. Both programs let you install an operating system inside your own OS from an ISO file (an operating system image).
So install one of them and follow an online tutorial to boot up an OS (the guest OS) within your own OS (the host OS). This is a little guide by the How-To Geek website on how to download a Windows ISO legally. For Linux ISOs it’s probably best to search online yourself, but for starters try installing Ubuntu on VirtualBox or VMware. If you think that your system can’t handle Ubuntu, then try Xubuntu. It is a member of the Ubuntu family, but uses fewer resources than standard Ubuntu.
One thing to take into consideration is that virtualization software uses your PC’s or laptop’s resources (RAM, CPU, storage, etc.) to run the virtualized (guest) OS, so you need a decent PC to be able to virtualize these operating systems. The SANS Institute’s Reverse-Engineering Malware course gives detailed requirements for what kind of system you should have in its “Laptop Required” section.
- C
First of all, you need to learn the C programming language, because the malware that you are going to analyze uses pointers a lot throughout its code, and being familiar with pointers will make your job of analyzing it a lot easier. Malware also uses C library functions a lot, and as Mr. Wosar mentions, “Large portions of all major operating systems these days are still based on C. Therefore, a lot of API documentation is very C-centric. You will have a way easier time reading documentation and manuals if you know C.” [1]. So you will gain a lot by learning C. Even if you know a high-level language like Java or C#, you still need to learn C, because high-level languages such as Java, C#, and Python hide some aspects of low-level programming, like pointers and memory allocation, and when you are analyzing malware you have to deal with exactly that part of low-level programming. Some people may already be familiar with C++, but I still advise you to learn C, because the way C handles some programming concepts is completely different from C++.
The CS50 course (one of the top courses on the edX website) is the best source for learning C in my opinion, and the book that accompanies the course is also the best: Programming in C by Stephen G. Kochan. That being said, CS50 is not an easy course. If you think that CS50 is not for you and you only want to become familiar with C, then read the “Programming in C” book, do its exercises, and only watch the CS50 videos.
The most important things to learn in C are pointers, structures, memory handling, and the C library functions and how they operate. You must master them, because you’re going to need them in malware analysis.
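To give a feeling for what “pointers, structures, and memory” mean in practice, here is a small sketch. It uses Python’s standard ctypes module rather than C itself, only so that it can be run without a compiler; the Header record, its fields, and the function names are made up for illustration and do not come from any real file format.

```python
import ctypes


class Header(ctypes.LittleEndianStructure):
    """Hypothetical record, laid out in memory like a C struct."""
    _fields_ = [
        ("magic", ctypes.c_uint16),  # 2 bytes
        ("flags", ctypes.c_uint16),  # 2 bytes
        ("size", ctypes.c_uint32),   # 4 bytes
    ]


def raw_bytes(record):
    """Read a structure's memory byte by byte through a pointer."""
    # Roughly the C idiom: unsigned char *p = (unsigned char *)&record;
    byte_ptr = ctypes.cast(ctypes.pointer(record),
                           ctypes.POINTER(ctypes.c_ubyte))
    return bytes(byte_ptr[i] for i in range(ctypes.sizeof(record)))


def set_size_via_pointer(record, new_size):
    """Change a field through a pointer, like `ptr->size = new_size` in C."""
    ptr = ctypes.pointer(record)
    # ptr.contents refers to the same memory as `record`, not a copy.
    ptr.contents.size = new_size
```

Creating `Header(0x5A4D, 1, 0x200)` and passing it to `raw_bytes` shows how the same fields look as plain bytes in memory, which is the view you get in a debugger or hex editor. In C you would do this directly with a struct and a cast, and that is the skill the paragraph above asks you to master.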
- Python
Maybe you need to automate some simple task in a short amount of time, manipulate malware to act as you want it to, or develop an extension for one of your malware analysis tools [2]. In those cases, Python is your language of choice. There are other scripting languages too, but Python’s integration into other applications (like IDA Pro) and the “vast amount of libraries aimed explicitly at reverse engineering” [1] make it the better option.
Programming for Everybody (part 1) and Python Data Structures (part 2), from one of the top courses on the Coursera website, are good resources for learning Python programming.
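As a small taste of the “automate some simple stuff” use case, here is a sketch that computes SHA-256 hashes for every file in a folder, which is a common way to identify samples. It only uses the standard library; the function names are my own, and the folder you point it at is up to you.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples do not fill up memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(directory):
    """Map each regular file name in `directory` to its SHA-256 digest."""
    return {
        entry.name: sha256_of_file(entry)
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file()
    }
```

Calling `hash_directory("samples")` on a (hypothetical) folder named `samples` gives you a dictionary that you can print, save, or compare against hashes published in analysis reports.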
- Assembly
After you have learned C, you need to become familiar with Assembly language, because you are going to spend a lot of your time dealing with it when you’re reverse engineering malware. A good knowledge of Assembly language and its instructions is the key to your success. I really recommend Assembly Language Step-by-Step by Jeff Duntemann. Mr. Duntemann really takes his time, but by doing so he will make you fall in love with Assembly language. You need a Linux system (maybe Xubuntu) and either a debugger with a GUI, like KDbg, or a terminal-based debugger like GDB (the GNU debugger) with GEF. For building your code you need nasm and ld, which you run by typing these commands in your system’s terminal:
On a 64-bit Linux system:
$> nasm -f elf -g -F dwarf hello.asm
$> ld -m elf_i386 -o hello hello.o
On a 32-bit Linux system:
$> nasm -f elf -g -F dwarf hello.asm
$> ld -o hello hello.o
For editing your code, a simple text editor will suffice: Atom, Visual Studio Code, Notepad++, Sublime Text, Vim, etc. If you’re not sure, then use Visual Studio Code and install an extension for highlighting x86 Assembly.
“I wanted to give you these hints because the book is a little outdated in some respects, such as the debugging software and how to build your code with debugging symbols, but when it comes to teaching you Assembly language it gets a perfect score.”
If you are already a little familiar with Assembly language, or don’t want a gentle introduction, then the OpenSecurityTraining website has a course on x86 Assembly language: Introductory Intel x86.
- Computer Architecture
You need to know how the CPU and your computer system work, because they are the building blocks of how an OS operates on a system. The Nand to Tetris Part 1 course on the Coursera website will teach you these concepts in a practical way. Having the Structured Computer Organization book by Andrew S. Tanenbaum and Todd Austin as a reference will also make your life a lot easier.
If you want even more hands-on experience, then Building an 8-bit Breadboard Computer by Ben Eater is for you, but if you can’t afford the parts, then Nand to Tetris will do the job.
- Operating System
Malware uses the same functions that the OS provides for communicating with the computer system, so knowing the theory of how an operating system does its job will help you understand how malware operates and interacts with the system. The Nand to Tetris Part 2 course will help you understand these fundamentals. The Modern Operating Systems book by Andrew S. Tanenbaum and Herbert Bos will also help you as a supplementary book.
If you are a more hands-on person and you are familiar with C++, then Write Your Own Operating System by Viktor Engelmann will teach you a lot about operating systems and how they work.
- Network
Malware communicates with the Internet in some way or form, and you have to deal with that when you are analyzing a particular sample, so computer networking knowledge is an essential part of your skill set. The Computer Networks book by Andrew S. Tanenbaum will help you gain that knowledge.
Malware 101 Diagram
Malware 102
When I first started to learn about malware analysis, I thought that by learning Malware 101 and just deepening my knowledge of C and the Microsoft operating system, I would have learned everything there was to know about malware analysis. But as I read more books and did more research on the subject, I started to realize that there is more. The Practical Reverse Engineering and Practical Malware Analysis books were a major part of this realization, and they shaped Malware 102.
- C
Now that you have learned the fundamentals of C, you need to go deeper. The Expert C Programming: Deep C Secrets book by Peter van der Linden will provide that depth.
- C++
C++ was created because C wasn’t an object-oriented programming (OOP) language. Some of the resources that I’m going to introduce here need a C++ background, so learning C++ is required. Accelerated C++: Practical Programming by Example by Andrew Koenig is the best choice, since you are already familiar with the fundamentals of programming and with C. But if you think that the pace of the book is too fast, or you need to start from the very basics, then I recommend Learn C++ by Codecademy, or the C++ Primer Plus book by Stephen Prata.
Just a side note regarding C++: my introduction above is short and sells C++ far short of what it’s really worth. It’s one of the top languages in the computer industry, and one of my favorites.
- Assembly
The Practical Malware Analysis book states, “What if you encounter an instruction you have never seen before? If you can’t find your answer with a Google search, you can download the complete x86 architecture manuals from Intel.” [3]
- Volume 1 Basic Architecture: This manual describes the architecture and programming environment. It is useful for helping you understand how memory works, including registers, memory layout, addressing, and the stack. This manual also contains details about general instruction groups. [3]
- Volume 2A Instruction Set Reference, A–M, and Volume 2B: Instruction Set Reference, N–Z: These are the most useful manuals for the malware analyst. They alphabetize the entire instruction set and discuss every aspect of each instruction, including the format of the instruction, opcode information, and how the instruction impacts the system. [3]
- Volume 3A System Programming Guide, Part 1, and Volume 3B System Programming Guide, Part 2: In addition to general-purpose registers, x86 has many special-purpose registers and instructions that impact execution and support the OS, including debugging, memory management, protection, task management, interrupt and exception handling, multiprocessor support, and more. If you encounter special-purpose registers, refer to the System Programming Guide to see how they impact execution. [3]
- Optimization Reference Manual: This manual describes code-optimization techniques for applications. It offers additional insight into the code generated by compilers and has many good examples of how instructions can be used in unconventional ways. [3]
So the next step in expanding your knowledge of Assembly language is the Intel manuals: the Intel 64 and IA-32 Architectures Software Developer’s Manual (combined volumes) and the Intel 64 and IA-32 Architectures Optimization Reference Manual.
- Compiler
When you are reverse engineering malware, you will eventually come upon a section of code that doesn’t make any sense to you. If you know how compilers work and how they translate your code to machine language, you can make sense of what’s happening. The Practical Reverse Engineering book recommends these two books:
Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
Linkers and Loaders by John R. Levine
And for advanced readers, Practical Reverse Engineering recommends
Advanced Compiler Design and Implementation by Steven Muchnick
- Network
You now know the basics of how networks operate. The next step is to be able to use that knowledge in your malware analysis. The Practical Packet Analysis book by Chris Sanders will make you efficient at using Wireshark to analyze malware network activity.
- Obfuscation
Malware uses obfuscation to make analysis harder, so some background knowledge on this matter is essential:
Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection by Christian Collberg and Jasvir Nagra will provide this background knowledge.
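To make the idea concrete, here is a sketch of one very simple obfuscation trick: hiding a string by XOR-ing every byte with a key. I picked XOR purely as an illustration, it is only one of many techniques, and the function names are my own.

```python
def xor_bytes(data, key):
    """XOR every byte of `data` with a repeating `key` (a bytes object)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def brute_force_single_byte(data, marker):
    """Try all 256 one-byte keys and keep results that contain `marker`."""
    hits = []
    for key in range(256):
        decoded = xor_bytes(data, bytes([key]))
        # A known plaintext fragment (e.g. b"http") signals a likely key.
        if marker in decoded:
            hits.append((key, decoded))
    return hits
```

Because XOR is its own inverse, `xor_bytes` both hides and recovers the data when called with the same key. `brute_force_single_byte` shows why a one-byte key is weak: an analyst can simply try every key and look for something readable.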
For the following sections, I think the Practical Malware Analysis book sums up really well why you need to expand your knowledge in these fields: “Most malware targets Windows platforms and interacts closely with the OS. A solid understanding of basic Windows coding concepts will allow you to identify host-based indicators of malware, follow malware as it uses the OS to execute code without a jump or call instruction, and determine the malware’s purpose.” [4]
- PE
The Portable Executable (PE) format is the file format that Windows operating systems use to run programs, so as a malware analyst you need to be familiar with it.
The PE Header section (4.3.2.1.1) on page 144 of The Art of Computer Virus Research and Defense by Peter Szor, and the “Headers” section on pages 97–102 of Reversing: Secrets of Reverse Engineering by Eldad Eilam, are excellent resources for learning about the PE format as a malware analyst. The picture of the PE structure on the Portable Executable page of Wikipedia is also a great reference if you want a visual understanding of it. And if you want to study further, then I recommend:
An In-Depth Look into the Win32 Portable Executable File Format, Part 1
An In-Depth Look into the Win32 Portable Executable File Format, Part 2
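If you want to poke at the format yourself while reading those resources, here is a minimal sketch that reads a few header fields from the raw bytes of a PE file using only Python’s struct module. It covers just the DOS signature, the PE signature, and the COFF file header; the function name and the returned dictionary keys are my own choices, and only two machine types are listed.

```python
import struct

# Machine values from the PE/COFF specification (only two are listed here).
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag: the file is a DLL


def parse_pe_header(data):
    """Return a few COFF header fields from the raw bytes of a PE file."""
    # The DOS header is 64 bytes long and starts with the "MZ" signature.
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew, at offset 0x3C, holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # 4 signature bytes plus 20 COFF header bytes must fit in the buffer.
    if (len(data) < pe_offset + 24
            or data[pe_offset:pe_offset + 4] != b"PE\x00\x00"):
        raise ValueError("not a PE file: missing PE signature")
    (machine, section_count, timestamp, _symtab, _symcount,
     optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": section_count,
        "timestamp": timestamp,
        "optional_header_size": optional_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

You would feed it the contents of a file read in binary mode. For real work a dedicated parser such as the third-party pefile library is the better tool, but walking the offsets by hand once makes the diagrams in the resources above much easier to follow.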
- Operating System
After learning about operating system concepts in Malware 101, the Windows Internals books (part 1 and part 2) by Mark E. Russinovich, David A. Solomon, and Alex Ionescu are the next best option for learning about the Windows OS. Part 2 of the 7th edition is not published yet, so stick with the 6th edition.
Also, What Makes It Page? by Enrico Martignetti teaches you how the virtual memory manager works behind the scenes in Windows [6], and in my opinion it is a good supplement to the Windows Internals books.
- Win32 Programming
When you want to learn about the Windows kernel, you need some knowledge of Win32 programming to understand how to use the Windows API. Windows System Programming by Johnson M. Hart and Windows via C/C++ by Jeffrey Richter and Christophe Nasarre are the best references for learning Win32 programming.
- Kernel
The kernel is the program that controls the communication between software and hardware in your computer system. Some malware takes advantage of the kernel in order to be more stealthy and persistent, so kernel knowledge is a must if you want to advance in your career as a malware analyst. Windows NT Device Driver Development by Peter G. Viscarola and W. Anthony Mason is a book on driver development, but the background chapters provide an excellent and concrete introduction to Windows, and it is also excellent supplementary material for the Windows kernel. [6]
Windows Kernel Programming by Pavel Yosifovich needs a mention here, because it is a modern take on Windows kernel programming.
Malware 102 Diagram
Finally!
At last we are here :). This is the juicy part, where the real fun starts. I’ll keep this part short and sweet, and I’m only going to show you the first steps. The rest is up to you, and there are some good resources and materials on how to study further in this field.
- Malware
1- Practical Malware Analysis by Michael Sikorski and Andrew Honig is the first book that you need to study about malware analysis, because it teaches you everything from the ground up and familiarizes you with the general techniques and tools that you need for analyzing malware.
2- Download some malware samples and try to analyze them. You could try malware.lu, but there are other websites as well. You just need to look for them (Google is your friend).
3- Malware Analyst’s Cookbook by Michael Ligh, Steven Adair, Blake Hartstein, and Matthew Richard is the second book that you need to study. Ms. MalwareUnicorn says, “This book is a great starter for understanding malware from the RE perspective and creating tools to help you RE.” [7]. In my opinion, “Practical Malware Analysis” is more beginner friendly than this book, even though both are introductory books about malware analysis.
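Step 2 leaves open how to start on a sample. One common first look, without running the file at all, is to pull the printable strings out of it. The sketch below is my own assumption about a reasonable starting point, not a procedure taken from the books; it only handles plain ASCII strings, and the function name and default length are arbitrary.

```python
import re


def extract_ascii_strings(data, min_length=4):
    """Return runs of printable ASCII characters found in `data` (bytes)."""
    # Bytes 0x20 to 0x7e are the printable ASCII range, space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Run it on the bytes of a sample (read in binary mode, inside your virtual machine) and look for things like DLL names, URLs, file paths, or registry keys. Windows programs often store text as UTF-16 as well, which this sketch does not cover.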
- Reverse Engineering (RE)
You will spend most of your time analyzing binaries (mostly in Assembly language), so you must have reverse engineering skills.
1- begin.re (created by Ms. Harpaz) is a great website for anyone who wants to get started with reverse engineering.
2- Reversing: Secrets of Reverse Engineering by Eldad Eilam is the next thing that you need to study to get better at RE. Windows XP is required to analyze the binaries that come with this book. The book’s materials weren’t available on the website it mentions, but you can download them from here.
3- Ms. MalwareUnicorn’s Reverse Engineering 101 and 102 workshops are great for practicing your newly found skills.
4- The Flare-On challenge by FireEye, which is held each year, is a great way to test your skills. You can also read the solutions for previous years on their website.
- General
The Art of Computer Virus Research and Defense by Peter Szor is a book about virus threats, defense techniques, and analysis tools. Mr. Wosar says it “is one of the very few books that looks specifically into how anti-viruses work. While it is a bit older and slightly outdated, the techniques explained in that book are still in use today.” [8]
Gray Hat Python by Justin Seitz is the book that teaches you the first steps of using Python in your malware analysis.
Other Resources
Great resources to advance your skills further in this field:
Collection of malware analysis resources by Fabian Wosar
Malware Analyst People to Follow
- fwosar
- PolarToffee
- malwareunicorn
- hasherezade
- pinkflawd
- MalwareTechBlog
- demonslay335
- VK_Intel
- struppigel
- NtSetDefault
- _hugsy_
- megabeets_
I just wanted to point some of them out to you, in case you want to follow them.
My only goal in writing this essay was to give you a better roadmap for becoming a malware analyst. I wish you the best on your journey, and good luck.
References
[1] F. Wosar, ‘Collection of malware analysis resources’, [Online]. Available: https://github.com/fwosar/malware-analysis-resources. [Accessed: 10-Jan-2020].
[2] M. Hutchins, ‘Best Languages to Learn for Malware Analysis’, [Online]. Available: https://www.malwaretech.com/2018/03/best-programming-languages-to-learn-for-malware-analysis.html. [Accessed: 10-Jan-2020].
[3] M. Sikorski, A. Honig, Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. San Francisco: No Starch Press, 2012, p. 85.
[4] M. Sikorski, A. Honig, Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. San Francisco: No Starch Press, 2012, p. 135.
[5] B. Dang, A. Gazet, E. Bachaalany, Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation. Indianapolis: Wiley, 2014, p. xxiv.
[6] B. Dang, A. Gazet, E. Bachaalany, Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation. Indianapolis: Wiley, 2014, p. xxv.
[7] A. Rousseau, ‘Resources’, [Online]. Available: https://malwareunicorn.org/#/resources. [Accessed: 22-Jan-2020].
[8] F. Wosar, ‘Collection of malware analysis resources’, [Online]. Available: https://github.com/fwosar/malware-analysis-resources. [Accessed: 24-Jan-2020].