lunes, 21 de junio de 2010

The so called Return Oriented Programming...

Dear Diary,
I always hate the name Returned Oriented Programming, not because it might be an accurate name but just cause it sounds like they are reinventing the wheel once again. Paraphrasing aeschylus sometimes i think the offensive security is just bread crumbs from the great banquet of the end of the 90's hackers.
Anyway, recently I have to teach a class in Norway, the group of students was very smart which always help you push further and further. The last day, as part of our advance stack overflow class, we teach them how to write a ROP shellcode and go the next step and write their automatically tool.
Obviously, one day is not much to write your own tool, but was enough to write their shellcode which I'm proud to said it was half the size of the public exploit I saw.
Since the term start getting over-hyped, I think in a way it make it look far and harder. But thinkings of way to teach it in a class, makes me realize how simple it is. You see some exploits going for the most complicated solutions while the simple ones are shorter and more accurate sometimes.
So let me give you some hints, which are part of the Advance Stack Overflow class at Immunity:

The first step before start writing a ROP shellcode is to plan the strategy ahead, else you will be improvising in the middle of your shellcode and the consequences are going to be just ugly.

Here are some bulletpoints of what you should be thinking ahead before starting your shellcode:

  1. Restriction Bypass
    As we know, the trick behind ROP is that you are basing all your return values on one or two dlls whose base is known by you, either by an infoleak or just lack of REBASE flag.
    With that in mind, you need to find out which API functions are being imported (statically or dinamically) in order to find out how you are going to bypass it.

    There are probably a bunch more, you just need to use your imagination. A recently paper from Brett Moore, make me realize you can use the same trick too.
    Let said you don't have access to any of those API functions, so as you see, the field to play with is very small, an interesting way to potentially bypass it will be by using GlobalAlloc, or any kind of heap wrapper.
    When we call any of those functions with normal size, it will returns us a memory chunk. With the address of the memory chunk we can easily obtain the Heap Segment (usually the LSB are zero). Once you get the Heap Segment, you can grab the address of the PHEAP from it and change the permission flag into EXECUTABLE HEAP. Then the next step will be to force a second allocation of a big size such as it's going to use VirtualAlloc and make it Executable. Voila!
  2. Which registers we control?
    a) Can we control the content to all the registers (In Immunity Debugger a good way to check it, is just to get a POP R32/RETN)
    b) Exchange / Move between registers: All kind of combinations and flavour, whether is just a “MOV RA, RB” , “OR RA, RB” or “XCHG RA, RB”. And so on... Swapping with ESP is always important.
    c) Register logic: Look for different types of register logic, this will allow you to later bypass bad characters restrictions. (NEG R32, etc)
  3. Memory Access Instruction
    You most likely will be doing a lot of READ / WRITE operations, thats the main point of the so called ROP.
    MOV [R32], R32
    MOV R32, [R32]
  4. CPU Context
    Very important to reduce the amount of dwords used, always check if there is more than just ESP pointing to the controlled buffer.

Exploiting 101

The main trick to make ROP very simple (if we can really called it a trick), is just generate a parallel stack for calling functions. The parallel stack if needs to be created on a known address (if possible, which most of the time it is, else we can use ESP itself), if you know the base address of your dll, you most of the times can calculate the address of the .data (RW) section. That's a good spot to start creating your parallel stack (There are static RW pages at the same address along all version of Windows, but that is an exercise for the readers).

In the exercise we were exploiting on class, we were able to respond to all of the questions in the strategy, we have VirtualProtect imported, all the registers could be written with whatever we want, we were able to xchg with the stack , all kind of combinations of memory READ and WRITE and finally EBP was pointing to our stack.

pop R32/ ret // All the registers
mov [ecx], eax / ret // Write to ECX
mov eax, [ecx] / ret // Read from ECX
mov [eax], ebp / ret // Write to EAX content of EBP
xchg eax, esp // Exchange eax with ESP
neg eax // To bypass character restrictions

With that combination, creating the parallel stack to call VirtualProtect on our stack was trivial:
We just need to get the address of VirtualProtect, copy into the parallel stack and start writing all the arguments of VirtualProtect in the parallel stack, for the address to “unprotect” we used the stack itself taken from EBP, the other arguments were just trivial to craft. At the end, you xchg ESP with your parallel stack that will execute VirtualProtect (with the same ret2libc trick you were using) and later jump back to the stack, this time to actually execute your shellcode.


MOV EAX, [ECX] // EAX now holds the address of VirtualProtect
MOV [ECX], EAX // Paralel Stack : 0: [ VirtualProtect ]
MOV [EAX], EBP // Paralel Stack: 8 [ Address of Stack ]
NEG EAX // EAX= 0x2000 Bypassing bad characters
DATA + 0xC // ECX = DATA + 0xC
MOV [ECX], EAX // Paralel Stack : C: [ Size: 0x2000 ]
-0x40 // EAX = -0x40
NEG EAX // EAX = 0x40
DATA + 0xC // ECX = DATA+0xC
MOV [ECX], EAX // Paralel Stack : 0x10: [ Flag: 0x40 ]
DATA + 0x10 // ecx = data+0x10
DATA + 0x60 // eax = data+0x60
MOV [ECX], EAX // Paralel Stack : 0x14: [ OldProtect: Writeable addres in data ]
DATA // eax = data

At this point in code, where we change switch to a parallel stack that it will look like:

[ VirtualProtect ]
[ XXXXXXXX      ]
[ Stack addr      ]
[ 0x2000            ]
[ 0x40                ]
[ DATA +0x60   ]

When the new parallel stack get executed, it will call VirtualProtect on our stack address and later return to XXXXX (I didn't set it, but that should be your stack :)


Once you get shellcode execution, you should go back to your very simple primitives and try to sequence of opcode that do what you one in less step, like a combination of POP or multiple memory access. Always keep in mind that the less return addresses you have, the easy to port and make universal (?).
The lesson learned is that a rop shellcode can easily be understood and written as a series of calls, read and writes instructions. 
At the end, this is nothing more than the old school return to libc, I recommend to read Pablo's 2008 presentation about DEPLIB  for ideas on how to write your own ROP tool.
Sorry for the lack of images or screenshot, but i'm actually should be spending all my free time getting my research done for Blackhat.

The picture on this post was taken by Igor Siwanowicz