Exec Shield for the Linux Kernel
DarkOx writes "There is a new patch from Ingo Molnar which can prevent overflow attacks. The scoop from KernelTrap is as follows: Ingo Molnar has announced a new kernel-based security feature for Linux/x86 called 'Exec Shield'. He describes the patch, which is against the 2.4.20-rc1 kernel, as: 'The exec-shield feature provides protection against stack, buffer or function pointer overflows, and against other types of exploits that rely on overwriting data structures and/or putting code into those structures. The patch also makes it harder to pass in and execute the so-called 'shell-code' of exploits. The patch works transparently, ie. no application recompilation is necessary.'"
http://people.redhat.com/mingo/exec-shield/ANNOUNC E-exec-shield
./cat-lowaddr /proc/self/maps /lib/ld-2.3.2.so /lib/ld-2.3.2.so /lib/libc-2.3.2.so
_____________________________________________
[Announcement] "Exec Shield", new Linux security feature
We are pleased to announce the first publically available source code
release of a new kernel-based security feature called the "Exec Shield",
for Linux/x86. The kernel patch (against 2.4.21-rc1, released under the
GPL/OSL) can be downloaded from:
http://redhat.com/~mingo/exec-shield/
The exec-shield feature provides protection against stack, buffer or
function pointer overflows, and against other types of exploits that rely
on overwriting data structures and/or putting code into those structures.
The patch also makes it harder to pass in and execute the so-called
'shell-code' of exploits. The patch works transparently, ie. no
application recompilation is necessary.
Background:
-----------
It is commonly known that x86 pagetables do not support the so-called
executable bit in the pagetable entries - PROT_EXEC and PROT_READ are
merged into a single 'read or execute' flag. This means that even if an
application marks a certain memory area non-executable (by not providing
the PROT_EXEC flag upon mapping it) under x86, that area is still
executable, if the area is PROT_READ.
Furthermore, the x86 ELF ABI marks the process stack executable, which
requires that the stack is marked executable even on CPUs that support an
executable bit in the pagetables.
This problem has been addressed in the past by various kernel patches,
such as Solar Designer's excellent "non-exec stack patch". These patches
mostly operate by using the x86 segmentation feature to set the code
segment 'limit' value to a certain fixed value that points right below the
stack frame. The exec-shield tries to cover as much virtual memory via the
code segment limit as possible - not just the stack.
Implementation:
---------------
The exec-shield feature works via the kernel transparently tracking
executable mappings an application specifies, and maintains a 'maximum
executable address' value. This is called the 'exec-limit'. The scheduler
uses the exec-limit to update the code segment descriptor upon each
context-switch. Since each process (or thread) in the system can have a
different exec-limit, the scheduler sets the user code segment dynamically
so that always the correct code-segment limit is used.
the kernel caches the user segment descriptor value, so the overhead in
the context-switch path is a very cheap, unconditional 6-byte write to the
GDT, costing 2-3 cycles at most.
Furthermore, the kernel also remaps all PROT_EXEC mappings to the
so-called ASCII-armor area, which on x86 is the addresses 0-16MB. These
addresses are special because they cannot be jumped to via ASCII-based
overflows. E.g. if a buggy application can be overflown via a long URL:
http://somehost/buggy.app?realyloooooooooooooooooo oong.123489719875
then only ASCII (ie. value 1-255) characters can be used by attackers. If
all executable addresses are in the ASCII-armor, then no attack URL can be
used to jump into the executable code - ie. the attack cannot be
successful. (because no URL string can contain the \0 character.) E.g. the
recent sendmail remote root attack was an ASCII-based overflow as well.
With the exec-shield activated, and the 'cat' binary relinked into the the
ASCII-armor, the following layout is created:
$
00101000-00116000 r-xp 00000000 03:01 319365
00116000-00117000 rw-p 00014000 03:01 319365
00117000-0024a000 r-xp 00000000 03:01 319439
0024a000-00
lf(1): it's like ls(1) but sorts filenames by extension, tersely
the patch was designed to be as efficient as possible. There's a very minimal (couple of cycles) tracking overhead for every PROT_MMAP system-call, plus there's the 2-3 cycles cost per context-switch.
I'm not familiar with grsec, but from a brief glance at the grsec site I don't see how they are alike. grsec appears to be an ACL based security model, whereas exec-shield just makes it a lot harder for the CPU to execute code that was never meant to be executed (from the owner of the machine's point of view...).
All I can say is try it. There's a lot more to it than just an ACL. In fact I don't even use the ACL, I use all the _other_ security enhancements it offers.
Someone on lkml asked:
Slightly off-topic, but does anybody know whether IA64 or x86-64 allow you to make the stack non-executable in the same way you can on SPARC?
and hpa replied to this:
x86-64 definitely does, and it's the default on Linux/x86-64.
Up to now, x86 chips have not been able to separate Read from Execute privileges for memory segments, which makes it hard to make stacks non-executable. This is excellent news for anyone looking forward to AMD's x86-64 chips...it keeps looking like they've done the Right Things.
The limitation is with most x86 processors. The new AMD x86-64 architecture does not suffer from this problem, and I believe linux on it defaults to nonexecutable stack (at least that's what was said on the LKML).
Quoting from the README: Non-executable user stack area.
Most buffer overflow exploits are based on overwriting a function's return address on the stack to point to some arbitrary code, which is also put onto the stack. If the stack area is non-executable, buffer overflow vulnerabilities become harder to exploit.
Another way to exploit a buffer overflow is to point the return address to a function in libc, usually system(). This patch also changes the default address that shared libraries are mmap()'ed at to make it always contain a zero byte. This makes it impossible to specify any more data (parameters to the function, or more copies of the return address when filling with a pattern), -- in many exploits that have to do with ASCIIZ strings.
However, note that this patch is by no means a complete solution, it just adds an extra layer of security. Many buffer overflow vulnerabilities will remain exploitable a more complicated way, and some will even remain unaffected by the patch. The reason for using such a patch is to protect against some of the buffer overflow vulnerabilities that are yet unknown.
It only says the code is mapped to the bottom 15MB. Each program has its own address space. It still exists in any part of physical memory it wants to, but the beginning virtuial address is put in the lower 16MB instead of the current default, which I believe relocates programs to 0x08048000 or something like that.
A solution to the problem with music today
Would a memcopy from the heap into executeable space be a fix?
Perhaps even better would be to have your own "private" heap in userland. That would protect the OS, and you get to use your trick still.
I'm not a coder by trade, so I can't really critisize what you're doing. I do understand operating systems and memory allocation therein though, so my admittedly uninformed opinion is that you're employing a somewhat dangerous hack here. The heap was not intended to hold executeable programs - you'd be broken on x86-64 as well. Using what amounts to a design flaw in a program isn't what I would be willing to call "good design". Might be better to come up with a different method - Ingo Molnar seems to think you do at any rate.
Soko
"Depression is merely anger without enthusiasm." - Anonymous
It's against the 2.4.21-rc1 kernel, not 2.4.20-rc1.
While it is true that in general even HUGE changes to the kernel rarely need an application recompile, and are transparent, sometimes this is not the case.
Consider the following:
Actually, even this patch is not entirely transparent. In order to best benefit from the ASCII-armor area, you will notice that in the readme text file they actually gave a patch to binutils to make executables try and use a lower address for their program text. Executables (unlike shared libraries) aren't relocatable and thus need to be re-linked in order to use a different (lower) address...
... assuming that the stack memory is marked as executable in the binary - which will have the net effect of turning off the Exec-shield.
In other words, ELF format has flags to indicate if the stack should be readable and/or executable. If your doing that sort of thing, make sure that the flag is set, and you'll have no problems [0].
It your doing those sort of tricks, your probably being very careful with what goes where, and buffer lengths and such. The problems come in when people don't realise that there could be a problem, and don't audit the buffer handling code properly.
So, don't worry, just use the ELF flags.
[0] Well, you could set the feature to ignore what the binary says, and implement the security anyway. But that's not a good idea, and very much not default.
Yes, grsecurity includes equilevant protection to this. It uses PAX kernel patch for this. It has existed for years, and Ingo really should have mentioned that in his announcement, assuming he even knew about it?
Difference is that PAX uses some weird x86 kludges to do this and it causes slight speed difference (max. 10% IIRC), but I think that's a very small penalty for very good protection against buffer overflows. It also reduces the maximum memory available to process by .. was it 1-2GB, but there's not many processes that really need that much. And Ingo's patch seems to limit at least number of executable pages even more so.
This is exec shield not executive shield, and the exec part most likely (I'm too lazy to look) implies execution rather than executive. Exec shield is a straight forward enough name I'm sure someone has used it for something somewhere before, but it's not a rip of executive shield by norton for christ sake.
The problem exists in all x86-32 processors, and always has. AMD's Opteron x86-64 does have the ability to mark such code as non-executable while readable, but only in x86-64 mode. 32-bit applications will still suffer.
While I applaud the effort made for this product, I still believe that the only way to completely shield ourselves from such attacks is to enforce it in hardware.
I've been screwing around with mprotect() and friends lately to write a exploit delivery system that can't be read by memory inspection tools on the target machine. While checking to see if similar techniques are possible on Windows, I found that the default addresses for PE/i386 executables' stack and text sections are all below 0x00ffffff. .text begins at 0x00400000, for example, and the stack is located below it.
:) I was wondering for the longest time why they chose those mappings by default until I saw this article today.
So it looks like Microsoft beat Ingo to it
-- thalakan
The problem is the way that people use x86's paging mechanism. Each page can, in fact, be distinctly assigned to being either code or data, but because each page requires a distinct entry in the page table, if each process had to have separate code and data pages, then it would stand to double the number of page table entries.
So why did they do it this way, if it was so simple to avoid? Back in the early 90's, when Linux was first written, it was highly desirable (with protected mode code in general) to minimize distinct page table entries in the x86 because page tables could take up so much space. So protected mode programs (and operating systems) were typically designed so that the code and data segment descriptors could end up mapping to the same pages in the x86's paging mechanism to reduce overall page count.
Indeed, if the x86 had not allowed a page to be both writeable and executable, then this would never have been an issue. Although it would have had the memory overhead in the page table entries that I described earlier, since we wouldn't have even had a choice in the matter, we would have just had to settle for what we got (although back then the design would have likely been criticized as being wasteful of memory).
File under 'M' for 'Manic ranting'
If and when different pages are used for code and data, since both already use different segment descriptors, the code and data have their own completely isolated address spaces, so trying to invoke a buffer overflow that would cause a "return" to some point in the user-supplied data would actually simply cause a return to the corresponding point in the code page, which is, of course, not actually modified by the application. At best, a buffer overflow could cause a branch to a particular section of the existing program, but would not permit the execution of arbitrary code since in the ideal case, even the kernel code resides in an address space that is invisible to any running application. The most probable upshot of trying to "return" in this fashion would be a segmentation fault, which should result in no more than the application simply terminating with a core dump.
File under 'M' for 'Manic ranting'
Excatly true, and in some parts of Australia, there are laws prohibiting the use of full role cages and racing harnesses because they make a vehicle safer at high speed, thus enticing the driver to drive at a higher speed.
it is only after a long journey that you know the strength of the horse.
This means that even if an application marks a certain memory area non-executable (by not providing the PROT_EXEC flag upon mapping it) under x86, that area is still executable, if the area is PROT_READ.
No. The x86 page table has 12 bits per page table entry for storing page information. It contains a bit for R/W (read/write) which you can force a page read-only; and it contains a bit U/S (user/supervisor) which you can force a page usable only by the kernel. There is nothing which says "this page must not be executed as code". So Linux kernel actually has an interface that only some hardware provides. I don't think now it still has spare bit to give for executable bit.
Furthermore, the x86 ELF ABI marks the process stack executable, which requires that the stack is marked executable even on CPUs that support an executable bit in the pagetables.
It is not about "do the right thing". The processor simply has no such bit, so there is no new "right thing" for it to do---it is already doing the right thing. The processor assumes that segmentation is used to enforce execute permission, so that each library code should be allocated a segment and inter-segment jumps and calls should be used to access them. In such way only read-only code segments are executable. Linux simply decided at the very beginning not to employ this facility.
ELF is not designed by Linus. And even if ELF is changed so that stack is not assumed executable by default (which probably break some programs that rely on executable stack), all computers from 386 to P4 will not benefit from it.I'm not sure if this answers your question, but it has been pointed out on the linux kernel mailing list that for a successful exploit you need jump into some function (which is in the ASCII armor) *and* you also have to pass parameters to that function.
So if you stop your exploit string early and use the zero byte at the end then you can jump into the ASCII area, but what you can do is quite limited, because you have no arguments set up for that function. At least this is how I understand it.
> The segmentation part of grsecurity still allows an exploit to return to a shared library addresses
;-) SEGMEXEC in PaX/grsec *completely* separates code from data, absolutely not true for Exec Shield. Studying the docs at http://pageexec.virtualave.net/docs/ or at least taking a look at /proc//maps would be useful...
;-). And they don't need to since SEGMEXEC does all and without the performance impact.
Wrong, address randomization prevents that (in a probabilistic sense). Nor does Exec Shield prevent it (hint: i386 is little endian).
> The Exec Shield also tries to limit the size of
> the code segment much more than the
> segmentation-based solution in grsecurity
> - resulting in a much smaller area of attack.
You don't have many clues what you're talking about, do you?
As for the paging based solution: it's not the default, so no wonder not many people are using it
> [Exec Shield] probably gets more [coverage]
> than the segmentation based PaX method
Completely false, see above.
sorry, but the number of context switches is not limited by the HZ setting. thats the maximum number that would occur if there were no blocked threads and no interrupts (except the system timer) causing threads to become runnable.
my system runs audio software that is generally powered by the audio interface interrupt, which occurs at about 1kHz. every single one of those interrupts generally leads to about 4 context switches in the typical case. thats about 4000 context switches per second.
So unless spawn_shell(void) exists, the armoring is quite effective.
The stack/executable bits are there, it's just not that obvious in the feature list. Here's a sample from the news page:
HJ HornbeckThis would be a good time for TASK_UNMAPPED_BASE to become an inherited parameter controlled by setrlimit/getrlimit. It would also be a good time to get a binary structure interface to /proc/self/maps, much like VirtualQuery in Win32.