
Have you ever wondered how programs work "from the inside"? Reverse engineering is the key to understanding their mechanics, a powerful skill that opens many doors. Start your journey into this fascinating field with our simple and clear guide!
Introduction
Reverse engineering is the process of analyzing software to understand its internal structure, algorithms, and operating logic. This discipline is widely used in cybersecurity to find vulnerabilities, analyze malware, check software for compliance, restore lost source code, and ensure compatibility between different systems. However, for beginners, reverse engineering can seem complex and intimidating. In this article, we will cover the basic concepts, tools, ethical issues, and first steps in binary analysis so that you can confidently begin exploring this field.
Ethics and Law in Reverse Engineering
Before diving into the technical details, it is essential to understand the ethical and legal framework for reverse engineering. Knowledge in this area comes with a responsibility to apply it lawfully and ethically.
- When it is acceptable:
- Analysis of your own program code.
- Research of open source software.
- Working with programs specifically designed for training and practice (for example, Crackmes, CTF tasks).
- Analysis of software whose owner has given you explicit written permission.
- Analysis to ensure compatibility (interoperability), which some jurisdictions allow subject to strict conditions.
- When to be careful or when it is prohibited:
- Violation of license agreements (EULA) that may prohibit disassembly.
- Copyright infringement (e.g. copying protected algorithms).
- Using the acquired knowledge to create malware, hack systems, or engage in other illegal activities.
Basic concepts
Before you begin practicing, it is important to understand the basic terms.
Binaries (executable files)
A binary file is a file that contains instructions that a computer can directly execute. Examples: PE files such as .exe (Windows), ELF files (Linux, usually with no file extension), and Mach-O files (macOS). These files contain machine code.
Assembler
Assembly is a low-level programming language that is a human-readable representation of machine code. There is a lot of training material on assembly from @Marylin. Each assembly instruction usually corresponds to one machine instruction. For example:
Code:
mov eax, [rbp-0x4] ; Load the value from memory (address rbp-0x4) into the eax register
add eax, [rbp-0x8] ; Add the value from memory (address rbp-0x8) to the eax register
Static analysis
Analysis of a program without actually running it. Main methods:
- Disassembly: The process of converting machine code back into assembly code. It is one of the first steps in analyzing binaries.
- Decompilation: A more complex process of attempting to reconstruct the source code in a high-level language (such as C) from machine or assembly code. The result is not always perfect, but is often very helpful in understanding the logic.
Dynamic analysis
Analysis of a program during its execution.
- Debugging: The process of stepping through a program to examine its behavior in real time. Debuggers allow you to set breakpoints, monitor processor register values, memory contents, and variables.
Reverse Engineering Tools
To perform a successful analysis, you will need specialized tools. Here are some popular options:
- Ghidra
- A free, powerful set of reverse engineering tools developed by the US National Security Agency (NSA).
- Supports disassembly and decompilation for many processor architectures. Has a graphical interface. A great choice for starting out.
- Analysis type: Static (mostly), with some dynamic capabilities through emulation and a built-in debugger.
- x64dbg (for Windows)
- A modern, free, open source debugger for 32-bit and 64-bit Windows applications with a user-friendly graphical interface.
- Analysis type: Dynamic.
- Radare2 (and Rizin)
- A free, very powerful command-line reverse engineering framework. Supports many architectures and file formats. Rizin is a fork of Radare2.
- Has a steep learning curve but is extremely flexible.
- Analysis type: Static and Dynamic.
- IDA Pro
- Considered an industry standard, it is a very powerful interactive disassembler and debugger.
- It is expensive, but there is a free version, IDA Free, with limited functionality (for example, fewer supported processors; decompiler availability depends on the release). It is usually a tool for a more advanced stage.
- Analysis type: Static and Dynamic.
- OllyDbg (for Windows)
- A classic free debugger for 32-bit Windows applications. Still popular, but for 64-bit applications you need x64dbg.
- Analysis type: Dynamic.
First Steps in Binary Analysis: A Practical Example
Let's look at a simple example. We'll create a small C program, compile it, and then analyze it using Ghidra (static analysis) and x64dbg (dynamic analysis).
Step 0: Create a test program
Create a file simple_sum.c with the following code:
C:
// simple_sum.c
#include <stdio.h>

int main(void) {
    int a = 5;
    int b = 3;
    int result = a + b;
    return result; // The result is returned via the eax/rax register and becomes the exit code
}
Now let's compile it. If you have Linux or Windows with MinGW/GCC installed:
Open a terminal (command line) and run:
gcc -o simple_sum simple_sum.c -O0 -m64 (or simple_sum.exe for Windows)
- -o simple_sum: Specifies the name of the output file.
- -O0: (zero) Disables compiler optimizations. This is important for learning purposes, as the code will be more straightforward and easier to analyze - we will see how variables are laid out in memory.
- -m64: Explicitly requests a 64-bit executable (for 32-bit use -m32; the registers will then be ebp, esp, eax, etc.). Our assembly example below is 64-bit oriented.
Step 1: Preparation
Make sure you have downloaded and installed Ghidra and x64dbg. We will analyze the file simple_sum (or simple_sum.exe), which we just created.
Step 2: Disassembly (static analysis with Ghidra)
- Launch Ghidra. Create a new project (File -> New Project), select "Non-Shared Project".
- Import your binary file (simple_sum or simple_sum.exe) into the project (File -> Import File). Leave the default analysis settings.
- After analysis, open the file in CodeBrowser (double-click on the file in the project).
- In the Symbol Tree window (usually on the left), find and expand Functions, then find and double-click the function main.
- In the central "Listing" window you will see the disassembled code of the function main. This is static analysis: we study the code without running it.
Code:
; --- Function main ---
; Function prologue (stack frame setup)
push rbp
mov rbp, rsp
sub rsp, 0x10 ; Allocate stack space for local variables
; int a = 5;
mov DWORD PTR [rbp-0xc], 0x5 ; Store 5 in memory (variable 'a')
; int b = 3;
mov DWORD PTR [rbp-0x8], 0x3 ; Store 3 in memory (variable 'b')
; int result = a + b;
mov edx, DWORD PTR [rbp-0xc] ; Load 'a' (5) into the edx register
mov eax, DWORD PTR [rbp-0x8] ; Load 'b' (3) into register eax
add eax, edx ; Add: eax = eax + edx (3 + 5 = 8)
mov DWORD PTR [rbp-0x4], eax ; Store the result (8) in memory (variable 'result')
; return result; (at -O0 the value is reloaded even though eax already holds it)
mov eax, DWORD PTR [rbp-0x4] ; Load 'result' (8) into eax for return
; Function epilogue (stack restoration)
leave ; mov rsp, rbp; pop rbp
ret ; Return from function
Note: eax is the lower 32 bits of the 64-bit register rax. The disassembly uses DWORD PTR to indicate that the instruction operates on a 32-bit value (the size of int here).
Step 3: Code Analysis (Ghidra)
In the disassembled code we see:
- Prologue: push rbp, mov rbp, rsp, and sub rsp, ... are the standard instructions for creating a function's stack frame.
- Initialization of variables: The values 5 and 3 are placed in memory at addresses relative to the stack base pointer rbp (for example, [rbp-0xc] and [rbp-0x8]). Ghidra lets you give these locations meaningful names (right-click the variable -> Rename Variable), and the listing then uses them.
- Arithmetic operation: The values are loaded from memory into registers (edx, eax) and added (add eax, edx); the result is saved back to memory and then loaded into eax to be returned from the function.
- Epilogue: leave and ret are the standard instructions for ending a function and returning control to the caller.
Step 4: Debugging (dynamic analysis with x64dbg)
Now let's see how the program runs in reality.
- Launch x64dbg.
- Open your file simple_sum (or simple_sum.exe) via File -> Open.
- The program will load and stop at the system breakpoint, before our code runs. We need to find our function main. Go to the "Symbols" tab and find main (it may be called simple_sum.main or similar). Double-clicking it will take you to the beginning of main's code on the "CPU" tab.
- Set a breakpoint at the first instruction of the function main (for example, on push rbp) by pressing F2.
- Run the program until the breakpoint by pressing F9.
- Now follow the program step by step using:
- F7 (Step Into): Execute one instruction; if it is a function call, step into it.
- F8 (Step Over): Execute one instruction; if it is a function call, run the whole call and stop after it.
- Watch the changes in the register window (top right) and the stack/memory windows (bottom). You will see the values 5 and 3 being loaded and the value of the eax register changing after the addition.
- For example, after mov DWORD PTR [rbp-0xc], 0x5 you can look in memory at the address calculated as rbp-0xc and see the value 5 there.
- After add eax, edx you will see that the EAX register contains 8.
Step 5: Conclusion from a practical example
We successfully:
- Statically analyzed the code using Ghidra, understanding its structure and logic without running it.
- Dynamically analyzed the program using x64dbg, tracking its execution step by step and watching the data change in registers and memory.
Safety during analysis
When working with unknown or potentially malicious files, always use an isolated environment, such as a virtual machine (VirtualBox, VMware). This will prevent your main system from being infected if the file being analyzed turns out to be malicious. For malware analysis, this is an absolute requirement.
Conclusion
Reverse engineering is a powerful tool that opens the door to a deep understanding of software. Starting with simple examples and your own small programs, you can gradually move on to more complex tasks, such as malware analysis, vulnerability research, or ensuring software compatibility. The main thing is not to be afraid, practice a lot, and always remember the ethical side of the issue and comply with the law.
Frequently asked questions
- Is reverse engineering difficult to learn?
It can be challenging and requires patience, attention to detail, and regular practice. However, it is a very exciting area. Starting with the basics, as in this article, and gradually increasing the complexity is the key to success.
- What programming languages do you need to know?
A solid understanding of C and/or C++ is essential, as many of the programs you will analyze are written in them. Knowledge of assembly for your target architecture (e.g. x86/x64, ARM) is fundamental. A basic understanding of how operating systems work will also help a lot. Python is often used to write helper scripts and automate tasks.
- Is it possible to reverse engineer on Linux/macOS?
Yes, of course. Many tools, such as Ghidra and Radare2/Rizin, are cross-platform. For debugging on Linux, GDB is popular (often with extensions such as GEF, PEDA, or Pwndbg); on macOS, LLDB.
- Where can I find binaries for practice?
- Create your own! Write simple C/C++ programs, compile them with different optimization options, and analyze them. This is the best way to understand how high-level language constructs are converted into machine code.
- Crackmes: Sites like crackmes.one offer programs specifically designed to train reverse engineering skills.
- CTF Competitions: Many Capture The Flag (CTF) competitions include reverse engineering challenges (categories "Reverse", "Pwn").
- Educational materials from the authors of tools or courses.