What is Reverse Engineering?
Reverse engineering means breaking an object down into simpler constituent parts in order to learn how it works internally.
What is GDB?
GDB, the GNU Debugger, is a tool for debugging programs written in C, C++, Go, Rust, etc. It works on the binary files produced by compiling the source code.
GDB lets us see how a program behaves at a low level while it runs. We can inspect the values and addresses held in registers, the values stored on the stack, the upcoming instructions, and more. We can also set breakpoints at chosen points in the program, which lets us examine the registers after each instruction and analyse how the state changes.
Some important terms to know
Registers – Small, very fast storage locations built into the CPU that hold temporary values. For more detail, see:
https://wiki.cdot.senecacollege.ca/wiki/X86_64_Register_and_Instruction_Quick_Start
https://www.cs.uaf.edu/2017/fall/cs301/lecture/09_11_registers.html
The x86-64 registers that matter for this post are:
- RSP – Stack Pointer (Points to the top of the stack)
- RBP – Base Pointer (Points to the base of the current stack frame)
- RAX, RBX, RCX, RDX – General purpose registers
- RAX also receives a function's return value.
- RDI – Scratch register (Used to pass 1st argument to a function)
- RSI – Scratch register (Used to pass 2nd argument to a function)
- RIP – Instruction pointer register (Points to the next instruction to execute)
Stack – On x86-64, the stack grows downwards, towards lower memory addresses. Newer elements therefore sit at lower addresses than older ones.
Flag – The flag register (RFLAGS on x86-64) holds status bits that describe the result of the instruction that has just executed. Conditional branch instructions read these bits to decide whether to jump.
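The downward growth of the stack can be observed from C. The sketch below is only an illustration and is not part of the program analysed in this post. The function names are hypothetical, and the noinline attribute is a GCC/Clang extension assumed here so that the two local variables really live in separate stack frames.

```c
#include <stdint.h>

/* Hypothetical helper: reports whether a local variable of the callee sits
 * at a lower address than a local variable of the caller.
 * noinline (GCC/Clang extension, an assumption of this sketch) keeps the
 * callee in its own stack frame. */
__attribute__((noinline))
static int callee_is_lower(const volatile char *caller_local) {
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 if the newer frame is at a lower address, otherwise 0. */
int stack_grows_down(void) {
    volatile char caller_local = 0;
    return callee_is_lower(&caller_local);
}
```

On a typical x86-64 Linux build we would expect stack_grows_down() to return 1, although the C standard itself makes no promise about how locals are laid out.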
Writing a C program
Let’s write a simple C program and then debug it with GDB. We can also see how reverse engineering can allow an attacker to leak/steal sensitive information from the binary.
#include <stdio.h>
#include <string.h>

int main() {
    char inputstr[100];

    printf("Welcome to the game \n");
    printf("Enter the password to continue...\n");

    /* Note: "%s" has no width limit, so very long input can overflow inputstr. */
    if (scanf("%s", inputstr) != 1) {
        fprintf(stderr, "Error reading input.\n");
        return 1;
    }

    /* Compare the input with the hard-coded password. */
    if (strcmp(inputstr, "mypassword") == 0) {
        printf("Your password is correct! Please move forward!");
        return 1;
    }
    else
    {
        printf("Incorrect password! Try Again.");
    }
    return 0;
}
Explanation of the Program
This C program prompts the user for a password (a string), compares the input with a hard-coded password, and prints whether the password is correct or incorrect.
We can generate the binary file for the above code by compiling it with gcc.
gcc -o sample sample.c
(This assumes the source file is named sample.c and the resulting binary is named sample.)
Now, let’s say that we only have the binary of an application and not the source code. This is where we can use GDB to debug it and get to know about the workings of the application.
Reverse Engineering the C Program
Start GDB on the binary with the command gdb sample
disass main
– This command disassembles the main function and prints its assembly code. By default GDB uses AT&T syntax, which many people find harder to read.
We can switch to Intel syntax with the command below and then run disass main again.
set disassembly-flavor intel
Dump of assembler code for function main:
0x0000000000001179 <+0>: push rbp
0x000000000000117a <+1>: mov rbp,rsp
0x000000000000117d <+4>: sub rsp,0x70
0x0000000000001181 <+8>: lea rax,[rip+0xe80] # 0x2008
0x0000000000001188 <+15>: mov rdi,rax
0x000000000000118b <+18>: call 0x1030 <puts@plt>
0x0000000000001190 <+23>: lea rax,[rip+0xe89] # 0x2020
0x0000000000001197 <+30>: mov rdi,rax
0x000000000000119a <+33>: call 0x1030 <puts@plt>
0x000000000000119f <+38>: lea rax,[rbp-0x70]
0x00000000000011a3 <+42>: mov rsi,rax
0x00000000000011a6 <+45>: lea rax,[rip+0xe95] # 0x2042
0x00000000000011ad <+52>: mov rdi,rax
0x00000000000011b0 <+55>: mov eax,0x0
0x00000000000011b5 <+60>: call 0x1060 <__isoc99_scanf@plt>
0x00000000000011ba <+65>: cmp eax,0x1
0x00000000000011bd <+68>: je 0x11e9 <main+112>
0x00000000000011bf <+70>: mov rax,QWORD PTR [rip+0x2e7a] # 0x4040 <stderr@GLIBC_2.2.5>
0x00000000000011c6 <+77>: mov rcx,rax
0x00000000000011c9 <+80>: mov edx,0x15
0x00000000000011ce <+85>: mov esi,0x1
0x00000000000011d3 <+90>: lea rax,[rip+0xe6b] # 0x2045
0x00000000000011da <+97>: mov rdi,rax
0x00000000000011dd <+100>: call 0x1070 <fwrite@plt>
0x00000000000011e2 <+105>: mov eax,0x1
0x00000000000011e7 <+110>: jmp 0x1237 <main+190>
0x00000000000011e9 <+112>: lea rax,[rbp-0x70]
0x00000000000011ed <+116>: lea rdx,[rip+0xe67] # 0x205b
0x00000000000011f4 <+123>: mov rsi,rdx
0x00000000000011f7 <+126>: mov rdi,rax
0x00000000000011fa <+129>: call 0x1050 <strcmp@plt>
0x00000000000011ff <+134>: test eax,eax
0x0000000000001201 <+136>: jne 0x121e <main+165>
0x0000000000001203 <+138>: lea rax,[rip+0xe5e] # 0x2068
0x000000000000120a <+145>: mov rdi,rax
0x000000000000120d <+148>: mov eax,0x0
0x0000000000001212 <+153>: call 0x1040 <printf@plt>
0x0000000000001217 <+158>: mov eax,0x1
0x000000000000121c <+163>: jmp 0x1237 <main+190>
0x000000000000121e <+165>: lea rax,[rip+0xe73] # 0x2098
0x0000000000001225 <+172>: mov rdi,rax
0x0000000000001228 <+175>: mov eax,0x0
0x000000000000122d <+180>: call 0x1040 <printf@plt>
0x0000000000001232 <+185>: mov eax,0x0
0x0000000000001237 <+190>: leave
0x0000000000001238 <+191>: ret
End of assembler dump.
Now we can start analysing the assembly code and compare it with the original C source code to see how the two correspond.
rbp is the base pointer register, and rsp is the stack pointer register. The first two instructions save the caller's rbp (push rbp) and copy the value of rsp into rbp (mov rbp,rsp). This gives the function a fixed reference point, because rsp keeps changing while the function runs but rbp does not.
The instruction sub rsp,0x70 allocates space on the stack for the function's local variables.
In the program, we declared char inputstr[100]; which reserves 100 bytes for the character array that stores the string. The assembly allocates 0x70 bytes, i.e. 112 bytes, by moving the top of the stack (rsp) down by 112. The extra 12 bytes are most likely padding, since the compiler keeps the stack aligned to a multiple of 16 bytes. The subtraction reflects the fact that the stack grows downwards.
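The step from 100 bytes to 0x70 (112) bytes can be reproduced with a little arithmetic. The sketch below assumes that the compiler rounds the frame up to a 16-byte boundary, which is what the x86-64 System V calling convention requires at call sites. The helper name round_up is hypothetical, and a real compiler may add further space for other reasons.

```c
#include <stddef.h>

/* Round size up to the next multiple of alignment.
 * alignment must be a power of two (16 in the case discussed here). */
size_t round_up(size_t size, size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, round_up(100, 16) gives 112, which is the 0x70 seen in sub rsp,0x70.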
The two printf statements in the source code appear as two calls to puts in the assembly. The compiler makes this substitution because both strings end in a newline and contain no format specifiers, so the simpler puts function produces the same output.
For a function call, the first argument is passed in the rdi register, so we can see that rdi is set to the value of rax (using the mov instruction) before each call to puts.
rax is loaded with an address computed relative to rip: the displacement 0xe80 is added to the address of the next instruction (0x1188), giving 0x2008, as GDB notes in the comment after the #. This address holds the string to be printed by puts.
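The address calculation for these lea instructions can be checked by hand. The sketch below uses a hypothetical helper name and relies on one fact about x86-64: rip-relative addressing adds the displacement to the address of the next instruction, not the current one.

```c
#include <stdint.h>

/* Compute the target of a rip-relative operand.
 * next_instruction is the address of the instruction that follows the one
 * containing the operand; displacement is the signed 32-bit offset. */
uint64_t rip_relative_target(uint64_t next_instruction, int32_t displacement) {
    return next_instruction + (uint64_t)(int64_t)displacement;
}
```

With the values from the dump, rip_relative_target(0x1188, 0xe80) gives 0x2008 and rip_relative_target(0x1197, 0xe89) gives 0x2020, matching the addresses GDB prints after the # on those lines.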
In the program, the scanf function is called to read user input. Here scanf takes two arguments: the format string and the memory location in which to store the input.
The first argument of a function call is passed in rdi and the second in rsi.
scanf stores the user input at the address held in rsi (rbp-0x70, meaning 112 bytes below the base pointer) and uses the format string whose address is held in rdi. That address is rip+0xe95, which resolves to 0x2042.
The return value of a function is stored in the rax register, and eax is the lower 32 bits (4 bytes) of rax. The value in eax is compared with 1 because the source code compares the return value of scanf with 1. If the value is not 1, the program prints the error message and exits.
In the assembly code, the je (jump if equal) instruction skips ahead to main+112 when eax equals 1, i.e. when the input was read successfully.
Otherwise, control simply falls through to the next instruction, which begins setting up the call that implements fprintf.
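The value that scanf leaves in eax can be explored in isolation. The sketch below is an illustration rather than the original program: it uses sscanf on an in-memory string instead of scanf on standard input, the helper name read_word is hypothetical, and it adds a width limit (%99s) that the original code does not have.

```c
#include <stdio.h>

/* Parse one whitespace-delimited word from input into out.
 * Returns the number of fields assigned (1 on success), or EOF if the
 * input ends before anything could be read. */
int read_word(const char *input, char out[100]) {
    return sscanf(input, "%99s", out);
}
```

Calling read_word("mypassword", buf) returns 1, which corresponds to the path where je is taken. An empty input string returns EOF, which corresponds to the path that prints the error message.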
In the assembly code, the fprintf call has been compiled into a call to fwrite, which takes 4 arguments: fwrite(ptr, size, count, stream). They are passed in rdi, rsi, rdx and rcx (the dump shows the 32-bit halves esi and edx for the two integer values). rdi holds the address of the string to print, esi holds the size of each element (1 byte), edx holds the number of elements to write (0x15, i.e. the 21 bytes of the message), and rcx holds the stderr stream pointer (the standard error stream).
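The fwrite call seen in the dump can be written out in C to make the argument order explicit. This is a sketch of what the compiler appears to have generated, not code taken from the original source, and the helper name write_message is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Write message to stream the way the compiled code does:
 * fwrite(ptr, size, count, stream) with size = 1 byte per element and
 * count = the length of the string. Returns the number of elements written. */
size_t write_message(FILE *stream, const char *message) {
    return fwrite(message, 1, strlen(message), stream);
}
```

For the message "Error reading input.\n" the string length is 21 bytes, which is the 0x15 loaded into edx before the call.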
The jmp 0x1237 <main+190> instruction then transfers control to the leave and ret instructions at the end of main, so the program exits with the return value 1 that was just placed in eax.
If scanf returned 1 (meaning the input was read successfully), the je instruction jumps to 0x11e9 <main+112>, which is the first instruction after the if block that handles the input error. What follows is the second if condition, which checks the result of the strcmp function.
Here, strcmp is called to compare two strings: the user input and the hard-coded password. strcmp takes two arguments, so rdi (first argument) and rsi (second argument) must be set before the call. In the dump, rdi receives rbp-0x70, the buffer that scanf filled, so rsi (loaded from rip+0xe67) must point to the hard-coded password. After the call, test eax,eax checks whether the return value is zero, and jne jumps to the "Incorrect password" branch if it is not.
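The strcmp check and the branch that follows it can be isolated in a small function. This is a sketch with a hypothetical helper name. It mirrors the comparison in the original source so that the zero and non-zero cases are easy to see.

```c
#include <string.h>

/* Returns 1 when input equals the hard-coded password, otherwise 0.
 * strcmp returns 0 for equal strings, which is the case where the
 * jne after test eax,eax is not taken. */
int password_matches(const char *input) {
    return strcmp(input, "mypassword") == 0;
}
```

A non-zero result from strcmp, for any other input, corresponds to the jne branch that prints the "Incorrect password" message.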
We can set a breakpoint at the strcmp call (for example, break *main+129) and inspect the registers. Both rdi and rsi will hold the address of the first character of a string.
If we type start again after setting the breakpoint, GDB restarts the program and stops at a temporary breakpoint at main. Typing c (continue) resumes execution. The program calls scanf and waits for user input, after which it stops at our breakpoint on the strcmp call.
The command x/5i $rip prints the next 5 instructions to be executed. We can see that the strcmp call is the next instruction, so by this point the program has already loaded the arguments for strcmp: the user input and the hard-coded password.
GDB provides the x/s command, which prints the characters of a string starting at the given address and stopping at the terminating null byte. The commands x/s $rdi and x/s $rsi therefore print the strings stored at the addresses held in the rdi and rsi registers.
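What x/s does can be imitated in C: start at an address and treat everything up to the next null byte as one string. The sketch below is a model, not a dump of the real binary. The array rodata_sample, the constant RODATA_BASE and the helper string_at are hypothetical names, and the layout assumes the three strings sit back to back starting at 0x2042, as the addresses in the disassembly suggest.

```c
#include <stddef.h>

/* Assumed address of the first byte of the modelled read-only data. */
#define RODATA_BASE 0x2042u

/* Modelled contents: "%s" at 0x2042, the error message at 0x2045 and the
 * password at 0x205b, each followed by a null terminator. */
static const char rodata_sample[] = "%s\0Error reading input.\n\0mypassword";

/* Return a pointer to the null-terminated string that starts at address,
 * or NULL if the address lies outside the modelled region. */
const char *string_at(size_t address) {
    if (address < RODATA_BASE || address - RODATA_BASE >= sizeof rodata_sample)
        return NULL;
    return rodata_sample + (address - RODATA_BASE);
}
```

Printing string_at(0x205b) with printf("%s") would show mypassword, because output stops at the null byte, just as x/s does.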
Hence, we could see the hard-coded password even if we didn’t have the source code.