Reverse Engineering for Beginners: Where to Start Analyzing Binaries

Tr0jan_Horse

Have you ever wondered how programs work "from the inside"? Reverse engineering is the key to understanding their mechanics, a powerful skill that opens many doors. Start your journey into this fascinating field with our simple and clear guide!

Introduction​

Reverse engineering is the process of analyzing software to understand its internal structure, algorithms, and operating logic. This discipline is widely used in cybersecurity to find vulnerabilities, analyze malware, check software for compliance, and to restore lost source code or ensure compatibility between different systems. However, for beginners, reverse engineering can seem complex and intimidating. In this article, we will cover the basic concepts, tools, ethical issues, and first steps in binary analysis so that you can confidently begin exploring this field.

Ethics and Law in Reverse Engineering​

Before diving into the technical details, it is essential to understand the ethical and legal framework for reverse engineering. Knowledge in this area comes with a responsibility to apply it lawfully and carefully.

  • When is it acceptable:
    • Analysis of your own program code.
    • Research of open source software.
    • Working with programs specifically designed for training and practice (for example, Crackmes, CTF tasks).
    • Obtaining explicit written permission from the owner of the software.
    • To ensure compatibility (interoperability) in some jurisdictions, subject to strict conditions.
  • When to be careful or when it is prohibited:
    • Violation of license agreements (EULA) that may prohibit disassembly.
    • Copyright infringement (e.g. copying protected algorithms).
    • Using the acquired knowledge to create malware, hack systems, or engage in other illegal activities.
Always act responsibly and within the laws of your country. This article is for educational purposes only.

Basic concepts​

Before you begin practicing, it is important to understand the basic terms.

Binaries (executable files)​

A binary file is a file that contains instructions that a computer can directly execute. Examples: PE files such as .exe (Windows), ELF files (Linux, usually without an extension), and Mach-O files (macOS). These files contain machine code.
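Each of these formats starts with a recognizable signature ("magic bytes"), which is how tools guess what they are looking at. Below is a minimal sketch of that idea; the function name identify_format and the returned labels are made up for illustration, and only the most common signatures are checked (universal/fat Mach-O files, for instance, are not handled).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a short label for the container format, judged only by the
 * leading magic bytes. Name and labels are illustrative assumptions. */
const char *identify_format(const unsigned char *buf, size_t len)
{
    static const unsigned char elf_magic[4]     = {0x7F, 'E', 'L', 'F'};
    /* 0xFEEDFACF / 0xFEEDFACE as stored on a little-endian machine */
    static const unsigned char macho64_magic[4] = {0xCF, 0xFA, 0xED, 0xFE};
    static const unsigned char macho32_magic[4] = {0xCE, 0xFA, 0xED, 0xFE};

    assert(buf != NULL || len == 0);

    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return "ELF";
    if (len >= 4 && (memcmp(buf, macho64_magic, 4) == 0 ||
                     memcmp(buf, macho32_magic, 4) == 0))
        return "Mach-O";
    /* PE files begin with the legacy DOS header, whose first bytes are "MZ". */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "PE (MZ header)";
    return "unknown";
}
```

To try it, read the first few bytes of a file into a buffer and pass them in. A real loader goes further, for example following the offset in the DOS header to confirm the "PE" signature.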

Assembler​

Assembly is a low-level programming language that is a human-readable representation of machine code. @Marylin has published a lot of training material on assembly. Each assembly instruction usually corresponds to one machine instruction. For example:



Code:
mov eax, [rbp-0x4]  ; Load the value at memory address rbp-0x4 into the eax register
add eax, [rbp-0x8]  ; Add the value at memory address rbp-0x8 to the eax register



Static analysis​

Analysis of a program without actually running it. Main methods:
  • Disassembly: The process of converting machine code back into assembly code. It is one of the first steps in analyzing binaries.
  • Decompilation: A more complex process of attempting to reconstruct the source code in a high-level language (such as C) from machine or assembly code. The result is not always perfect, but is often very helpful in understanding the logic.
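At its core, disassembly is a table lookup from byte values to mnemonics. The sketch below recognizes only a handful of single-byte x86-64 opcodes, so it is a toy and not how Ghidra or any real disassembler is implemented; the names decode_one and count_known are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Maps a few single-byte x86-64 opcodes to mnemonics. Returns NULL for
 * anything else; real decoders also handle prefixes and operands. */
const char *decode_one(unsigned char opcode)
{
    switch (opcode) {
    case 0x55: return "push rbp";
    case 0x5D: return "pop rbp";
    case 0x90: return "nop";
    case 0xC3: return "ret";
    case 0xC9: return "leave";
    default:   return NULL;
    }
}

/* Counts how many bytes in the buffer this toy table can name. */
size_t count_known(const unsigned char *code, size_t len)
{
    size_t known = 0;

    assert(code != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (decode_one(code[i]) != NULL)
            known++;
    }
    return known;
}
```

Treating every byte as an opcode is exactly the simplification that breaks down on real code, where most instructions span several bytes. That is why a proper disassembler needs the full instruction encoding rules.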

Dynamic analysis​

Analysis of a program during its execution.

  • Debugging: The process of stepping through a program to examine its behavior in real time. Debuggers allow you to set breakpoints, monitor processor register values, memory contents, and variables.
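On x86, debuggers typically implement a software breakpoint by overwriting the first byte of the target instruction with 0xCC (the int3 instruction) and restoring the original byte later. The sketch below models only that bookkeeping on an ordinary byte buffer; the Breakpoint struct and the bp_set/bp_clear names are hypothetical, and a real debugger writes into another process's memory through operating system APIs.

```c
#include <assert.h>
#include <stddef.h>

#define INT3 0xCC  /* one-byte x86 breakpoint instruction */

/* Hypothetical record of one software breakpoint. */
typedef struct {
    size_t offset;        /* where the byte was patched */
    unsigned char saved;  /* original byte at that offset */
    int active;           /* non-zero while the patch is in place */
} Breakpoint;

/* Saves the original byte and replaces it with int3. */
void bp_set(Breakpoint *bp, unsigned char *code, size_t len, size_t offset)
{
    assert(bp != NULL && code != NULL && offset < len);
    bp->offset = offset;
    bp->saved = code[offset];
    code[offset] = INT3;
    bp->active = 1;
}

/* Restores the original byte so the instruction can run normally. */
void bp_clear(Breakpoint *bp, unsigned char *code)
{
    assert(bp != NULL && code != NULL);
    if (bp->active) {
        code[bp->offset] = bp->saved;
        bp->active = 0;
    }
}
```

This also explains why programs that checksum their own code can notice a debugger: the patched byte changes the checksum.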

Reverse Engineering Tools​

To perform a successful analysis, you will need specialized tools. Here are some popular options:

  1. Ghidra
    • A free, powerful set of reverse engineering tools developed by the US National Security Agency (NSA).
    • Supports disassembly and decompilation for many processor architectures. Has a graphical interface. A great choice for starting out.
    • Analysis type: Static (mostly), with some dynamic capabilities through emulation and, in recent versions, a built-in debugger.
  2. x64dbg (for Windows)
    • A modern, free, open source debugger for 32-bit and 64-bit Windows applications with a user-friendly graphical interface.
    • Analysis type: Dynamic.
  3. Radare2 (and Rizin)
    • A free, very powerful command-line reverse engineering framework. Supports many architectures and file formats. Rizin is a fork of Radare2.
    • Has a steep learning curve but is extremely flexible.
    • Analysis type: Static and Dynamic.
  4. IDA Pro
    • Considered an industry standard, it is a very powerful interactive disassembler and debugger.
    • It is expensive, but there is a free version, IDA Free, with limited functionality (for example, fewer supported processors and restricted decompiler support; the exact limits vary between releases). It is usually a tool for a more advanced stage.
    • Analysis type: Static and Dynamic.
  5. OllyDbg (for Windows)
    • Classic free debugger for 32-bit Windows applications. Still popular, but for 64-bit systems it is better to use x64dbg.
    • Analysis type: Dynamic.

First Steps in Binary Analysis: A Practical Example​

Let's look at a simple example. We'll create a small C program, compile it, and then analyze it using Ghidra (static analysis) and x64dbg (dynamic analysis).

Step 0: Create a test program​

Create a file simple_sum.c with the following code:

C:
// simple_sum.c
#include <stdio.h>

int main(void) {
    int a = 5;
    int b = 3;
    int result = a + b;
    return result; // The result is usually returned via the eax/rax register
}


Now let's compile it. If you have Linux or Windows with MinGW/GCC installed, open a terminal (command line) and run:
gcc -o simple_sum simple_sum.c -O0 -m64
On Windows, use -o simple_sum.exe instead.

  • -o simple_sum: Specifies the name of the output file.
  • -O0: (zero) Disables compiler optimizations. This is important for learning purposes, as the code will be more straightforward and easier to analyze - we will see how variables are laid out in memory.
  • -m64: Explicitly requests a 64-bit executable (for 32-bit use -m32; the registers will then be ebp, esp, eax, etc.). The assembly example below is 64-bit.
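To see why -O0 matters, compare what the two extremes look like when written back as C. This is only a sketch: volatile is used here to force the memory loads and stores that -O0 would emit anyway, and the function names are invented. With optimizations enabled, a compiler will typically fold 5 + 3 at compile time and reduce the original main to the equivalent of the second function.

```c
#include <assert.h>

/* Mirrors -O0 code: each variable lives in memory and is reloaded
 * before use. volatile keeps those accesses even when optimizing. */
int sum_via_memory(void)
{
    volatile int a = 5;
    volatile int b = 3;
    volatile int result = a + b;
    return result;
}

/* What an optimizing compiler is free to turn the original main into:
 * the addition has already happened at compile time. */
int sum_folded(void)
{
    return 8;
}

/* Optimization must not change observable behavior. */
void check_equivalence(void)
{
    assert(sum_via_memory() == sum_folded());
}
```

Compiling the same file with -O0 and then with -O2 and comparing the two listings in Ghidra is a good first exercise.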

Step 1: Preparation​

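Make sure you have downloaded and installed Ghidra and x64dbg. We will analyze the file simple_sum (or simple_sum.exe) that we just created. Note that x64dbg only debugs Windows executables, so the dynamic part of this walkthrough needs the Windows build; on Linux, GDB plays the same role.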

Step 2: Disassembly (static analysis with Ghidra)​

  1. Launch Ghidra. Create a new project (File -> New Project), select "Non-Shared Project".
  2. Import your binary file (simple_sum or simple_sum.exe) into the project (File -> Import File). Leave the default analysis settings.
  3. After analysis, open the file in CodeBrowser (double-click on the file in the project).
  4. In the Symbol Tree window (usually on the left), find and expand Functions, then find and double-click the function main.
  5. In the central "Listing" window you will see the disassembled code of the function main. This is static analysis: we study the code without running it.
Because optimization is disabled (-O0), the code includes instructions for setting up the stack (the function prologue) and for storing each variable in memory. You'll see something like this (the exact offsets and registers may vary slightly, and Ghidra's own listing syntax differs a little from the Intel-style notation used here):


Code:

; --- Function main ---
; Function prologue (stack frame setup)
push rbp
mov rbp, rsp
sub rsp, 0x10 ; Allocate stack space for local variables
; int a = 5;
mov DWORD PTR [rbp-0xc], 0x5 ; Store 5 in memory (variable 'a')
; int b = 3;
mov DWORD PTR [rbp-0x8], 0x3 ; Store 3 in memory (variable 'b')
; int result = a + b;
mov edx, DWORD PTR [rbp-0xc] ; Load 'a' (5) into the edx register
mov eax, DWORD PTR [rbp-0x8] ; Load 'b' (3) into the eax register
add eax, edx ; Add: eax = eax + edx (3 + 5 = 8)
mov DWORD PTR [rbp-0x4], eax ; Store the result (8) in memory (variable 'result')
; return result; (at -O0 the value is reloaded even though eax already holds it)
mov eax, DWORD PTR [rbp-0x4] ; Load 'result' (8) into eax as the return value
; Function epilogue (stack restoration)
leave ; Equivalent to: mov rsp, rbp; pop rbp
ret ; Return from function

Note: eax is the lower 32 bits of the 64-bit register rax. The DWORD PTR annotation in the listing indicates that the instruction operates on a 32-bit value (the size of int).
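The relationship between eax and rax has one detail that often surprises beginners: on x86-64, writing a 32-bit register clears the upper 32 bits of the full 64-bit register, while writing a 16-bit register such as ax leaves the rest untouched. A small model of that behavior follows; the Regs struct and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file holding only rax. */
typedef struct {
    uint64_t rax;
} Regs;

/* Reading eax yields the low 32 bits of rax. */
uint32_t get_eax(const Regs *r)
{
    assert(r != NULL);
    return (uint32_t)r->rax;
}

/* Writing eax zero-extends: the old upper half of rax is discarded. */
void set_eax(Regs *r, uint32_t value)
{
    assert(r != NULL);
    r->rax = (uint64_t)value;
}

/* Writing ax replaces only the low 16 bits and keeps the other 48. */
void set_ax(Regs *r, uint16_t value)
{
    assert(r != NULL);
    r->rax = (r->rax & ~UINT64_C(0xFFFF)) | value;
}
```

This is why a function returning int can simply leave its result in eax: the caller sees a clean 64-bit rax with the upper half zeroed.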

Step 3: Code Analysis (Ghidra)​

In the disassembled code we see:

  1. Prologue: push rbp, mov rbp, rsp, sub rsp, ... are the standard instructions for creating a function's stack frame.
  2. Initialization of variables: The values 5 and 3 are placed in memory at addresses relative to the stack base pointer rbp (for example, [rbp-0xc] and [rbp-0x8]). If you rename a variable in Ghidra (right-click on the variable -> Rename Variable), the new name is used throughout the listing.
  3. Arithmetic operation: The values are loaded from memory into registers (edx, eax) and added (add eax, edx). The result is saved back to memory and then loaded into eax to be returned from the function.
  4. Epilogue: leave and ret are the standard instructions for tearing down the frame and returning control to the caller.
To the right of the assembly listing, Ghidra also shows a decompilation window where it tries to recreate C-like code, which is a great help in understanding.
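The four stages above can be replayed in plain C by modelling the 16-byte stack frame as an array and the two registers as struct fields. This is a hand-written model of the example listing, not output from any tool; the Machine type and the helper names are invented, and offsets are given as positive distances below rbp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: the 16 bytes below rbp plus two registers. */
typedef struct {
    uint8_t frame[16];  /* frame[16 - n] corresponds to [rbp-n] */
    uint32_t eax;
    uint32_t edx;
} Machine;

/* mov DWORD PTR [rbp-off], value */
static void store32(Machine *m, int off, uint32_t value)
{
    assert(off >= 4 && off <= 16);
    memcpy(&m->frame[16 - off], &value, sizeof value);
}

/* mov reg, DWORD PTR [rbp-off] */
static uint32_t load32(const Machine *m, int off)
{
    uint32_t value;

    assert(off >= 4 && off <= 16);
    memcpy(&value, &m->frame[16 - off], sizeof value);
    return value;
}

/* Replays the body of main from the listing, one line per instruction. */
uint32_t simulate_main(Machine *m)
{
    store32(m, 0xc, 5);        /* int a = 5;                  */
    store32(m, 0x8, 3);        /* int b = 3;                  */
    m->edx = load32(m, 0xc);   /* mov edx, [rbp-0xc]          */
    m->eax = load32(m, 0x8);   /* mov eax, [rbp-0x8]          */
    m->eax += m->edx;          /* add eax, edx                */
    store32(m, 0x4, m->eax);   /* mov [rbp-0x4], eax          */
    m->eax = load32(m, 0x4);   /* mov eax, [rbp-0x4] (return) */
    return m->eax;
}
```

After the call, the frame holds 5, 3, and 8 at offsets 0xc, 0x8, and 0x4 below rbp, which is the same picture the debugger's memory view gives in the next step.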

Step 4: Debugging (dynamic analysis with x64dbg)​

Now let's see how the program runs in reality.

  1. Launch x64dbg.
  2. Open your file simple_sum.exe via File -> Open (x64dbg works with Windows executables only).
  3. The program will load and stop at the system entry point. We need to find our function main. Go to the "Symbols" tab and find main (it may be called simple_sum.main or similar). Double-clicking it will take you to the beginning of the code of main on the "CPU" tab.
  4. Set a breakpoint at the first instruction of the function main (for example, on push rbp) by pressing F2.
  5. Run the program until the breakpoint by pressing F9.
  6. Now follow the program step by step using:
    • F7 (Step Into): Execute one instruction; if it is a function call, step into it.
    • F8 (Step Over): Execute one instruction; if it is a function call, execute it entirely and stop after it returns.
  7. Watch the changes in the register window (top right) and the stack/memory window (bottom). You will see how the values 5 and 3 are loaded and how the value of the eax register changes after the addition.
    • For example, after mov DWORD PTR [rbp-0xc], 0x5 you can look in memory at the address rbp-0xc and see the value 5 there.
    • After add eax, edx you will see that the EAX register contains 8.

Step 5: Conclusion from a practical example​

We successfully:

  • Statically analyzed the code using Ghidra, understanding its structure and logic without running it.
  • Dynamically analyzed the program using x64dbg, tracing its execution step by step and watching data change in registers and memory.
This is a very simple example, but it demonstrates the basic workflow: combining static and dynamic analysis to understand the behavior of a program.


Safety during analysis​

When working with unknown or potentially malicious files, always use an isolated environment, such as a virtual machine (VirtualBox, VMware). This will prevent your main system from being infected if the file being analyzed turns out to be malicious. For malware analysis, this is an absolute requirement.

Conclusion​

Reverse engineering is a powerful tool that opens the door to a deep understanding of software. Starting with simple examples and your own small programs, you can gradually move on to more complex tasks, such as malware analysis, vulnerability research, or ensuring software compatibility. The main thing is not to be afraid, practice a lot, and always remember the ethical side of the issue and comply with the law.

Frequently asked questions​

  1. Is reverse engineering difficult to learn?
    It can be challenging and requires patience, attention to detail, and regular practice. However, it is a very exciting area. Starting with the basics, as in this article, and gradually increasing the complexity is the key to success.
  2. What programming languages do you need to know?
    A deep understanding of C and/or C++ is essential, as many of the programs you will analyze are written in them. Knowledge of assembly for your target architecture (e.g. x86/x64, ARM) is fundamental. A basic understanding of how operating systems work will also help a lot. Python is often used to write helper scripts and automate tasks.
  3. Is it possible to reverse engineer on Linux/macOS?
    Yes, of course. Many tools, such as Ghidra and Radare2/Rizin, are cross-platform. For debugging on Linux, GDB is the popular choice (often with extensions like GEF, PEDA, or Pwndbg); on macOS, it is LLDB.
  4. Where can I find binaries for practice?
    • Create your own! Write simple C/C++ programs, compile them with different optimization options, and analyze them. This is the best way to understand how high-level language constructs are converted into machine code.
    • Crackmes: Sites like crackmes.one offer programs specifically designed to train reverse engineering skills.
    • CTF Competitions: Many Capture The Flag (CTF) competitions include reverse engineering challenges (categories "Reverse", "Pwn").
    • Educational materials from the authors of tools or courses.
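As a starting point for the "create your own" route, here is a small practice target in the spirit of a crackme. It is purely a made-up exercise: the check_serial name, the length of 4, and the target sum are arbitrary choices for this sketch. It contains a length check, a loop, and a comparison, which is enough to see how branches and loops look in a disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SERIAL_LEN 4
#define SERIAL_SUM 266u  /* 'A' + 'B' + 'C' + 'D' = 65 + 66 + 67 + 68 */

/* Returns 1 if the serial has the expected length and its character
 * codes add up to SERIAL_SUM, otherwise 0. */
int check_serial(const char *serial)
{
    unsigned sum = 0;

    assert(serial != NULL);
    if (strlen(serial) != SERIAL_LEN)
        return 0;
    for (size_t i = 0; i < SERIAL_LEN; i++)
        sum += (unsigned char)serial[i];
    return sum == SERIAL_SUM;
}
```

Wrap it in a main that reads the serial from the command line, compile it once with -O0 and once with -O2, and try to recover the rule from the binary alone. Because only the sum is checked, several different serials are accepted, which is a nice thing to discover from the assembly.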
 
