Exec Shield for the Linux Kernel
DarkOx writes "There is a new patch from Ingo Molnar which can prevent overflow attacks. The scoop from KernelTrap is as follows: Ingo Molnar has announced a new kernel-based security feature for Linux/x86 called 'Exec Shield'. He describes the patch, which is against the 2.4.20-rc1 kernel, as: 'The exec-shield feature provides protection against stack, buffer or function pointer overflows, and against other types of exploits that rely on overwriting data structures and/or putting code into those structures. The patch also makes it harder to pass in and execute the so-called 'shell-code' of exploits. The patch works transparently, ie. no application recompilation is necessary.'"
the patch was designed to be as efficient as possible. There's a very minimal (couple of cycles) tracking overhead for every PROT_MMAP system-call, plus there's the 2-3 cycles cost per context-switch.
Someone on lkml asked:
Slightly off-topic, but does anybody know whether IA64 or x86-64 allow you to make the stack non-executable in the same way you can on SPARC?
and hpa replied to this:
x86-64 definitely does, and it's the default on Linux/x86-64.
Up to now, x86 chips have not been able to separate Read from Execute privileges for memory segments, which makes it hard to make stacks non-executable. This is excellent news for anyone looking forward to AMD's x86-64 chips...it keeps looking like they've done the Right Things.
The limitation is with most x86 processors. The new AMD x86-64 architecture does not suffer from this problem, and I believe linux on it defaults to nonexecutable stack (at least that's what was said on the LKML).
It only says the code is mapped to the bottom 16MB. Each program has its own address space. The code can still live in any part of physical memory it wants to, but its starting virtual address is put in the lower 16MB instead of the current default, which I believe relocates programs to 0x08048000 or something like that.
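To make the point about the lower 16MB concrete, here is a minimal C sketch of why such addresses are awkward for string-based overflows. It assumes the "ASCII-armor" region means 32-bit addresses whose top byte is 0x00; the names in_ascii_armor, store_le32 and bytes_copied_as_string are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the armored region is the low 16MB of a 32-bit address
 * space, i.e. every address in it has 0x00 as its most significant byte. */
#define ARMOR_LIMIT 0x01000000u

/* Returns 1 if addr lies inside the assumed armored region. */
int in_ascii_armor(uint32_t addr)
{
    return addr < ARMOR_LIMIT;
}

/* Writes addr in little-endian byte order, as x86 stores it in memory. */
void store_le32(uint32_t addr, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(addr & 0xffu);
    out[1] = (unsigned char)((addr >> 8) & 0xffu);
    out[2] = (unsigned char)((addr >> 16) & 0xffu);
    out[3] = (unsigned char)((addr >> 24) & 0xffu);
}

/* Builds a payload of <address><"AAAA"> and returns how many bytes a
 * C string routine such as strcpy() would copy before hitting a NUL. */
size_t bytes_copied_as_string(uint32_t addr)
{
    unsigned char payload[9];

    store_le32(addr, payload);
    memcpy(payload + 4, "AAAA", 4);
    payload[8] = '\0';
    return strlen((const char *)payload);
}
```

With an address like 0x00401234 the copy stops after three bytes, so nothing placed after the address gets through. With 0x08048410 all eight bytes are copied.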
A solution to the problem with music today
Would a memcpy from the heap into executable space be a fix?
Perhaps even better would be to have your own "private" heap in userland. That would protect the OS, and you get to use your trick still.
I'm not a coder by trade, so I can't really criticize what you're doing. I do understand operating systems and memory allocation therein though, so my admittedly uninformed opinion is that you're employing a somewhat dangerous hack here. The heap was not intended to hold executable programs - you'd be broken on x86-64 as well. Using what amounts to a design flaw in a program isn't what I would be willing to call "good design". Might be better to come up with a different method - Ingo Molnar seems to think you do at any rate.
Soko
"Depression is merely anger without enthusiasm." - Anonymous
It's against the 2.4.21-rc1 kernel, not 2.4.20-rc1.
While it is true that in general even HUGE changes to the kernel rarely need an application recompile, and are transparent, sometimes this is not the case.
Consider the following:
Actually, even this patch is not entirely transparent. In order to best benefit from the ASCII-armor area, you will notice that in the readme text file they actually gave a patch to binutils to make executables try and use a lower address for their program text. Executables (unlike shared libraries) aren't relocatable and thus need to be re-linked in order to use a different (lower) address...
... assuming that the stack memory is marked as executable in the binary - which will have the net effect of turning off the Exec-shield.
In other words, the ELF format has flags to indicate if the stack should be readable and/or executable. If you're doing that sort of thing, make sure that the flag is set, and you'll have no problems [0].
If you're doing those sorts of tricks, you're probably being very careful with what goes where, and buffer lengths and such. The problems come in when people don't realise that there could be a problem, and don't audit the buffer handling code properly.
So, don't worry, just use the ELF flags.
[0] Well, you could set the feature to ignore what the binary says, and implement the security anyway. But that's not a good idea, and very much not default.
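For anyone who wants to check what a binary asks for, here is a rough C sketch that reads the stack marking from an ELF file. It assumes the flag in question is the PT_GNU_STACK program header used by current GNU toolchains, which may not be exactly what the patch in this story looks at. It also assumes a 64-bit ELF file in the host's byte order and a Linux system with <elf.h>. The function name stack_marked_executable is made up for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns  1 if the file requests an executable stack,
 *          0 if it requests a non-executable stack,
 *         -1 if it has no PT_GNU_STACK header (or is not 64-bit ELF),
 *         -2 if the file could not be read.
 * Assumption: 64-bit ELF, same byte order as the host. */
int stack_marked_executable(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -2;

    Elf64_Ehdr eh;
    int result = -2;

    if (fread(&eh, sizeof eh, 1, f) == 1) {
        result = -1;
        if (memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
            eh.e_ident[EI_CLASS] == ELFCLASS64) {
            /* Walk the program headers looking for the stack marking. */
            for (unsigned i = 0; i < eh.e_phnum; i++) {
                Elf64_Phdr ph;
                long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

                if (fseek(f, off, SEEK_SET) != 0 ||
                    fread(&ph, sizeof ph, 1, f) != 1) {
                    result = -2;
                    break;
                }
                if (ph.p_type == PT_GNU_STACK) {
                    result = (ph.p_flags & PF_X) ? 1 : 0;
                    break;
                }
            }
        }
    }

    fclose(f);
    return result;
}
```

Calling it on "/proc/self/exe" reports the marking of the running program itself.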
Yes, grsecurity includes equivalent protection to this. It uses the PaX kernel patch for this. It has existed for years, and Ingo really should have mentioned that in his announcement, assuming he even knew about it?
Difference is that PaX uses some weird x86 kludges to do this and it causes a slight speed difference (max. 10% IIRC), but I think that's a very small penalty for very good protection against buffer overflows. It also reduces the maximum memory available to a process by .. was it 1-2GB, but there aren't many processes that really need that much. And Ingo's patch seems to limit at least the number of executable pages even more so.
I've been screwing around with mprotect() and friends lately to write an exploit delivery system that can't be read by memory inspection tools on the target machine. While checking to see if similar techniques are possible on Windows, I found that the default addresses for PE/i386 executables' stack and text sections are all below 0x00ffffff. .text begins at 0x00400000, for example, and the stack is located below it.
:) I was wondering for the longest time why they chose those mappings by default until I saw this article today.
So it looks like Microsoft beat Ingo to it
-- thalakan
The problem is the way that people use x86's paging mechanism. Each page can, in fact, be distinctly assigned to being either code or data, but because each page requires a distinct entry in the page table, if each process had to have separate code and data pages, then it would stand to double the number of page table entries.
So why did they do it this way, if it was so simple to avoid? Back in the early 90's, when Linux was first written, it was highly desirable (with protected mode code in general) to minimize distinct page table entries in the x86 because page tables could take up so much space. So protected mode programs (and operating systems) were typically designed so that the code and data segment descriptors could end up mapping to the same pages in the x86's paging mechanism to reduce overall page count.
Indeed, if the x86 had not allowed a page to be both writeable and executable, then this would never have been an issue. Although it would have had the memory overhead in the page table entries that I described earlier, since we wouldn't have even had a choice in the matter, we would have just had to settle for what we got (although back then the design would have likely been criticized as being wasteful of memory).
File under 'M' for 'Manic ranting'
If and when different pages are used for code and data, since both already use different segment descriptors, the code and data have their own completely isolated address spaces, so trying to invoke a buffer overflow that would cause a "return" to some point in the user-supplied data would actually simply cause a return to the corresponding point in the code page, which is, of course, not actually modified by the application. At best, a buffer overflow could cause a branch to a particular section of the existing program, but would not permit the execution of arbitrary code since in the ideal case, even the kernel code resides in an address space that is invisible to any running application. The most probable upshot of trying to "return" in this fashion would be a segmentation fault, which should result in no more than the application simply terminating with a core dump.
File under 'M' for 'Manic ranting'
This means that even if an application marks a certain memory area non-executable (by not providing the PROT_EXEC flag upon mapping it) under x86, that area is still executable, if the area is PROT_READ.
No. The x86 page table has 12 bits per page table entry for storing page information. It contains an R/W (read/write) bit with which you can force a page read-only, and a U/S (user/supervisor) bit with which you can make a page usable only by the kernel. There is nothing which says "this page must not be executed as code". So the Linux kernel actually has an interface that only some hardware implements. I don't think the entry still has a spare bit to give to an executable bit.
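As a rough illustration of the bits described above, here is a C sketch that decodes a classic 32-bit (non-PAE) x86 page table entry. The bit positions for present, R/W and U/S are the architectural ones; the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Low 12 bits of a 32-bit non-PAE page table entry hold the flags. */
#define PTE_PRESENT    0x001u  /* bit 0: page is mapped */
#define PTE_RW         0x002u  /* bit 1: 0 = read-only, 1 = writable */
#define PTE_USER       0x004u  /* bit 2: 0 = kernel only, 1 = user allowed */
#define PTE_FLAGS_MASK 0xFFFu
/* Note: none of these 12 bits means "do not execute this page". */

struct pte_info {
    uint32_t frame;    /* physical address of the 4KB page frame */
    int      present;
    int      writable;
    int      user;
};

/* Splits a raw entry into its frame address and the three flags above. */
struct pte_info decode_pte(uint32_t pte)
{
    struct pte_info info;

    info.frame    = pte & ~PTE_FLAGS_MASK;
    info.present  = (pte & PTE_PRESENT) != 0;
    info.writable = (pte & PTE_RW) != 0;
    info.user     = (pte & PTE_USER) != 0;

    /* Sanity check: a frame address is always 4KB aligned. */
    assert((info.frame & PTE_FLAGS_MASK) == 0);
    return info;
}
```

A readable user page and an executable user page decode to exactly the same flags here, which is the limitation being discussed.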
Furthermore, the x86 ELF ABI marks the process stack executable, which requires that the stack is marked executable even on CPUs that support an executable bit in the pagetables.
It is not about "do the right thing". The processor simply has no such bit, so there is no new "right thing" for it to do---it is already doing the right thing. The processor assumes that segmentation is used to enforce execute permission, so that each library's code should be allocated a segment, and inter-segment jumps and calls should be used to access it. That way only read-only code segments are executable. Linux simply decided at the very beginning not to employ this facility.
ELF was not designed by Linus. And even if ELF is changed so that the stack is not assumed executable by default (which would probably break some programs that rely on an executable stack), all computers from the 386 to the P4 will not benefit from it.
sorry, but the number of context switches is not limited by the HZ setting. that's the maximum number that would occur if there were no blocked threads and no interrupts (except the system timer) causing threads to become runnable.
my system runs audio software that is generally powered by the audio interface interrupt, which occurs at about 1kHz. every single one of those interrupts generally leads to about 4 context switches in the typical case. that's about 4000 context switches per second.
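For scale, here is a back-of-the-envelope C sketch combining that context-switch rate with the 2-3 cycles per context switch quoted from the announcement. The function names and the 1 GHz clock used in the note below are assumptions for illustration only, not measurements.

```c
#include <assert.h>

/* Cycles per second spent on the per-context-switch cost, given an
 * interrupt rate, switches per interrupt and cycles per switch. */
unsigned long shield_cycles_per_second(unsigned long irq_hz,
                                       unsigned long switches_per_irq,
                                       unsigned long cycles_per_switch)
{
    return irq_hz * switches_per_irq * cycles_per_switch;
}

/* Fraction of one CPU that those cycles represent.
 * cpu_hz is an assumed clock rate supplied by the caller. */
double shield_cpu_fraction(unsigned long cycles_per_second, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)cycles_per_second / cpu_hz;
}
```

With 1000 interrupts per second, 4 switches each and 3 cycles per switch, that is 12000 cycles per second, which on an assumed 1 GHz CPU is roughly 0.0012% of the machine.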