Learning to Reverse Engineer with GDB

What is Reverse Engineering?

Reverse engineering means breaking an object down into simpler constituent parts in order to learn how it works internally.

What is GDB?

GDB (the GNU Debugger) is a tool for debugging programs written in C, C++, Go, Rust, and other languages. It works on the binary files produced by compiling the source code.

GDB can be used to understand how a program works at a low level at runtime. We can see the values or addresses stored in registers, the values stored on the stack, the next instructions, etc. We can also set breakpoints at specific lines or instructions and step through the program, viewing the register values after each instruction to analyse how the state changes.

Some important terms to know

Registers – Small, very fast storage locations built into the CPU, used to hold temporary values. The x86-64 registers that matter for this article are:

RSP – Stack Pointer (Points to the top of the stack)
RBP – Base Pointer (Points to the base of the current stack frame)
RAX, RBX, RCX, RDX – General purpose registers
- A function's return value is placed in the RAX register.
RDI – Scratch register (Used to pass the 1st argument to a function)
RSI – Scratch register (Used to pass the 2nd argument to a function)
RIP – Instruction pointer register (Points to the next instruction to execute)

More complete register references:

https://wiki.cdot.senecacollege.ca/wiki/X86_64_Register_and_Instruction_Quick_Start

https://www.cs.uaf.edu/2017/fall/cs301/lecture/09_11_registers.html

Stack – On x86-64, the stack grows downwards, toward lower memory addresses. Newer elements therefore sit at lower addresses than older ones.

Flag – The flags register (RFLAGS) holds status information about the result of the most recently executed instruction, for example whether the result was zero. Conditional jump instructions read these flags to decide whether to branch.

Writing a C program

Let’s write a simple C program and then debug it with GDB. We can also see how reverse engineering can allow an attacker to leak/steal sensitive information from the binary.

#include <stdio.h>
#include <string.h>

int main() {
    char inputstr[100];

    printf("Welcome to the game \n");
    printf("Enter the password to continue...\n");

    /* scanf returns the number of items successfully read */
    if (scanf("%s", inputstr) != 1) {
        fprintf(stderr, "Error reading input.\n");
        return 1;
    }

    /* strcmp returns 0 when both strings are equal */
    if (strcmp(inputstr, "mypassword") == 0) {
        printf("Your password is correct! Please move forward!");
        return 1;
    } else {
        printf("Incorrect password! Try Again.");
    }

    return 0;
}

Explanation of the Program

This C program prompts the user to enter a password (a string), compares the input with a hard-coded password, and prints whether the password is correct or wrong.

We can generate the binary file for the above code by compiling it using gcc.

gcc -o sample sample.c

(Here the source file is named sample.c and the resulting binary is named sample.)

Now, let’s say that we only have the binary of an application and not the source code. This is where we can use GDB to debug it and get to know about the workings of the application.

Compiling the C program and generating the binary for it.

Reverse Engineering the C Program

Run GDB on the binary with the command gdb sample

disass main – This command disassembles the main function and prints its assembly code. By default, GDB prints the assembly in AT&T syntax, which is not very easy to read.

We can make it more readable by switching to Intel syntax with the following command and then running disass main again.

set disassembly-flavor intel

Dump of assembler code for function main:                                                                                    
   0x0000000000001179 <+0>:     push   rbp
   0x000000000000117a <+1>:     mov    rbp,rsp
   0x000000000000117d <+4>:     sub    rsp,0x70
   0x0000000000001181 <+8>:     lea    rax,[rip+0xe80]        # 0x2008
   0x0000000000001188 <+15>:    mov    rdi,rax
   0x000000000000118b <+18>:    call   0x1030 <puts@plt>
   0x0000000000001190 <+23>:    lea    rax,[rip+0xe89]        # 0x2020
   0x0000000000001197 <+30>:    mov    rdi,rax
   0x000000000000119a <+33>:    call   0x1030 <puts@plt>
   0x000000000000119f <+38>:    lea    rax,[rbp-0x70]
   0x00000000000011a3 <+42>:    mov    rsi,rax
   0x00000000000011a6 <+45>:    lea    rax,[rip+0xe95]        # 0x2042
   0x00000000000011ad <+52>:    mov    rdi,rax
   0x00000000000011b0 <+55>:    mov    eax,0x0
   0x00000000000011b5 <+60>:    call   0x1060 <__isoc99_scanf@plt>
   0x00000000000011ba <+65>:    cmp    eax,0x1
   0x00000000000011bd <+68>:    je     0x11e9 <main+112>
   0x00000000000011bf <+70>:    mov    rax,QWORD PTR [rip+0x2e7a]        # 0x4040 <stderr@GLIBC_2.2.5>
   0x00000000000011c6 <+77>:    mov    rcx,rax
   0x00000000000011c9 <+80>:    mov    edx,0x15
   0x00000000000011ce <+85>:    mov    esi,0x1
   0x00000000000011d3 <+90>:    lea    rax,[rip+0xe6b]        # 0x2045
   0x00000000000011da <+97>:    mov    rdi,rax
   0x00000000000011dd <+100>:   call   0x1070 <fwrite@plt>
   0x00000000000011e2 <+105>:   mov    eax,0x1
   0x00000000000011e7 <+110>:   jmp    0x1237 <main+190>
   0x00000000000011e9 <+112>:   lea    rax,[rbp-0x70]
   0x00000000000011ed <+116>:   lea    rdx,[rip+0xe67]        # 0x205b
   0x00000000000011f4 <+123>:   mov    rsi,rdx
   0x00000000000011f7 <+126>:   mov    rdi,rax
   0x00000000000011fa <+129>:   call   0x1050 <strcmp@plt>
   0x00000000000011ff <+134>:   test   eax,eax
   0x0000000000001201 <+136>:   jne    0x121e <main+165>
   0x0000000000001203 <+138>:   lea    rax,[rip+0xe5e]        # 0x2068
   0x000000000000120a <+145>:   mov    rdi,rax
   0x000000000000120d <+148>:   mov    eax,0x0
   0x0000000000001212 <+153>:   call   0x1040 <printf@plt>
   0x0000000000001217 <+158>:   mov    eax,0x1
   0x000000000000121c <+163>:   jmp    0x1237 <main+190>
   0x000000000000121e <+165>:   lea    rax,[rip+0xe73]        # 0x2098
   0x0000000000001225 <+172>:   mov    rdi,rax
   0x0000000000001228 <+175>:   mov    eax,0x0
   0x000000000000122d <+180>:   call   0x1040 <printf@plt>
   0x0000000000001232 <+185>:   mov    eax,0x0
   0x0000000000001237 <+190>:   leave
   0x0000000000001238 <+191>:   ret
End of assembler dump.

Now, we can start analysing the assembly code and compare it with the original C source code to get the idea.

rbp is the base pointer register, and rsp is the stack pointer register. push rbp saves the caller's base pointer, and mov rbp,rsp copies the value of rsp into rbp. This gives the function a fixed reference point for its stack frame: the value of rsp can change while the function runs, but rbp stays the same until the function returns.

The sub rsp,0x70 instruction allocates space on the stack for the function's local variables.

In the program, we declared char inputstr[100]; which needs 100 bytes to store the string. The assembly allocates 0x70 bytes, i.e. 112 bytes, by moving the top of the stack (rsp) down by 112 bytes. The extra 12 bytes are most likely padding: the x86-64 calling convention expects the stack pointer to be 16-byte aligned at function calls, and 112 is the next multiple of 16 after 100. The stack pointer moves down because the stack grows downwards.

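The two printf statements in the source code appear as two calls to puts in the assembly. When printf is given a constant string that ends in a newline and has no format specifiers, the compiler can replace it with the simpler puts function, which appends the newline itself.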

For a function call, the first argument is passed in the rdi register, so we can see that rdi is set to the value of rax (using the mov instruction) before each call to puts.

rax is loaded with the address rip+0xe80. rip holds the address of the next instruction (0x1188), so the result is 0x1188 + 0xe80 = 0x2008, which GDB shows in the comment next to the instruction. This address holds the string to be printed by the puts function.

In the program, the scanf function is called to read user input. Here it takes two arguments: the format string and the memory location where the input should be stored.

The first argument of a function call is passed in rdi and the second in rsi.

The scanf function stores the user input at the address held in rsi (rbp-0x70, meaning 112 bytes below the base pointer, which is the start of inputstr), and it uses the format string at the address held in rdi. That address is rip+0xe95, which resolves to 0x2042.

The return value of a function is stored in the rax register, and eax is the name for the lower 32 bits (4 bytes) of rax. Since scanf returns an int, the compiler compares eax with 1, matching the comparison of the scanf result with 1 in the source code. If the value is not 1, the program prints the error message and exits.

In the assembly code, the je (jump if equal) instruction transfers control to main+112 if eax equals 1, skipping the error handling.

If the condition is false, then the control just passes to the next line which prepares to call the fprintf function.

In the assembly code, the fprintf call is implemented as a call to fwrite, which takes 4 arguments: fwrite(ptr, size, count, stream). Following the calling convention, these are passed in rdi, rsi, rdx and rcx. rdi holds the address of the string to be printed, esi holds the size of each element (1 byte), edx holds the number of elements to write (0x15, i.e. 21, the length of the error message in bytes), and rcx holds the stderr stream pointer (standard error stream).

The jmp 0x1237 <main+190> instruction then transfers control to the leave and ret instructions at the end of main, so the program exits with the return value 1 that was just placed in eax.

If scanf returns 1 (meaning the input was read successfully), the je 0x11e9 <main+112> instruction jumps past the error handling to main+112. There we find the second if condition, which checks the result of the strcmp function.

Here, the strcmp function is called to compare two strings: the user input and the hard-coded password. strcmp takes two arguments, so rdi (first argument) and rsi (second argument) must be set before the call. From the disassembly, rdi receives the address rbp-0x70 (the buffer holding the user input) and rsi receives the address 0x205b, which should therefore hold the hard-coded password.

We can set a breakpoint at the strcmp call (for example with break *main+129, using the offset from the disassembly) and check the values of the registers. Both hold the address of the first character of a string.

Now, if we type the start command again after setting the breakpoint, the program runs again from the beginning and stops at a temporary breakpoint that GDB places at main. We can continue by typing the c command. The program calls scanf and waits for the user input, after which it stops at the breakpoint we set at the strcmp call.

We can use the command x/5i $rip to get the next 5 instructions in line to be executed. We can see the strcmp function call is in the next instruction. So, the program must have set the argument list to call the strcmp function containing the user input and hard-coded password.

GDB's x/s command prints the string that starts at the given address and ends at the terminating null byte. x/s $rdi and x/s $rsi print the strings at the addresses held in the rdi and rsi registers, which are the user input and the hard-coded password respectively.

Hence, we could see the hard-coded password even if we didn’t have the source code.
