Heap Protection Mechanism
An anonymous reader writes "There's an article by Jason Miller on innovation in Unix that talks about OpenBSD's new heap protection mechanism as a major boon for security. Sounds like OpenBSD is going to be the first to support this new security method."
Let's hope it's not as broken as Microsoft's attempt in SP2.
But why did it take so long to implement?
Kudos to the OpenBSD folks for being at the cutting edge, in terms of implementations of these security features. Where they lead, surely others will follow and we'll be seeing this feature become commonplace. As their focus is security, its understandable that they lead more incentives in these areas than more mainstream Linux distributions.
Business Voyeur
Ok, the article is light on technical details, but it seems that they are using guard pages. Guard pages aren't exactly shiny new. Efence has been using them since a long long time.
Is it really true that the standard GNU/Linux heap implementation holds onto pages like this when it becomes fragmented? That sounds really primitive to me.
The upcoming GCC 4.1 release will include a stack protector. Basically it's a reimplementation of the old propolice patch.
Hopefully mainstream distros that have been wary of propolice will start using this new feature. And perhaps glibc malloc will borrow a few tricks from this new openbsd malloc too.
DEP from Microsoft is only enabled by default on some system binaries. Here is how to enable it for everything: http://www.microsoft.com/technet/security/prodtech /windowsxp/depcnfxp.mspx
My CPU doesn't support DEP in hardware, so I imagine the software-based method of doing this will create quite a speed hit. Anybody have any experience with turning on DEP for all programs?
OSes can put Intron between Exon (useful DNA -- useful stuff on the heap) to
detect badly behaving apps!
In other words, so-called "Junk DNA" may actually have a use...
HEAP PROTECTION
From the kerneltrap.org post:
He explains that for over a decade efforts have been made to find and fix buffer overflows, and more recently bugs have been found in which software is reading before the start of a buffer, or beyond the end of the buffer.
The solution that the kerneltrap.org refers to against buffer overflows is to:
My opinion is that #1 will slow software down, although it will make it indeed more secure. #2 will make it more difficult to exploit buffer overflows, since the space between two allocated heap blocks will be random (and thus the attacker may not know where to overwrite data).
Unless I haven't understood well, these solutions will not offer any real solution to the buffer overflow problem. For example, stack-based exploits can still be used for attacks. The solution shown does not mention usage of the NX bit (which is i86 specific). It is a purely software solution that can be applied to all BSD-supported architectures.
Since all the problems relating to buffers (overflow and underflow) that have costed billions of dollars to the IT industly is the result of using C, doesn't anyone think that it is time to stop using C? there are C-compatible languages that allow bit manipulation but don't allow buffer overflows; e.g. Cyclone.
Then again, while I'm not defending Microsoft, here's my take on their 'security':
Yes, plenty, and maybe even most of their promises about being a generally secure system are complete and utter rubbish. However, I'm willing to bet that each of their OSes are more secure than the last one. The problem is that they still leave plenty of holes open when they do things like (to point out the landmark example) weld the web browser to the kernel. I know that most people crack windows because it's easy, but while I may be wrong on this, I think people will continue to spend thier efforts on Windows even if their security was (competely hypothetically) top-notch only because of the bad reputation that precedes them.
I'm not saying that Windows is secure or that it ever will be. However, their security has improved, regardless of how poorly. The last reason on the list to crack Windows (in my opinion) and possibly the strongest reason is that they have a history of poor security. I think script-kiddies will pour ANY amount of effort into destroying any version of Windows just to keep that idea alive.
As much as I love Linux, I really doubt that any one distro (like RedHat for example) would be able to keep their system as secure as it is now if they were the entire world's information security scapegoat as MS is now. (PS, yes I know that MS does, mostly deserve the title they hold)
(I'm not a security expert, nor am I claiming to be, so if you think I'm wrong all I ask is that you not 'correct' me with a torch ^_^)
Perfecting Discordia
www.stevenvansickle.com
In the three cases we ran into when I was a consultant last year we placed the blame squarely on the shoulders of the vendor with the foobar product. They were coding in an unsafe manner to an undocumented API, it was 100% their fault. Now a naive home user with no real computer savy might blame MS, but when you look at the tradeoffs it seems like the security cleanup was good for the entire computer using public, even if a few people did have an app or two broken.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Yep. After all, the goal of the OpenBSD project is not simply to be the most secure operating system ever. The goal is to provide the best security, along with a number of other goals, such as running Unix software, achieving good performance, and providing a good (according tho the stated goals even "the best") development platform.
Please correct me if I got my facts wrong.
It may be a legitimate invention - it is cited as prior art in an ATT patent. This is also the first known example of a prior Open Source publication causing a patent filer to cite. ATT also removed a claim from the patent that my work invalidated. Just search for "Perens" in the U.S. patent database to find the patent.
We don't run it on production programs because of its overhead. To do this sort of protection exhaustively, it requires minimum two pages of the address space per allocation: one dead page between allocations and one page allocated as actual memory. This is a high overhead of page table entries, translation lookaside buffers, and completely destroys locality-of-reference in your application. Thus, expect slower programs and more disk-rattling as applications page heavily. If you are to allocate and release memory through mmap, you get a high system call overhead too, and probably a TLB flush with every allocation and free.
Yes, it makes it more difficult to inject a virus. Removing page-execute permission does most of that at a much lower cost - it will prevent injection of executable code but not interpreter scripts.
I don't think the BSD allocator will reveal more software bugs unless the programmers have not tested with Electric Fence.
Bruce
Bruce Perens.
I'd be interested to know what the performance impact of this is - exactly what counts as "too much of a slowdown".
Application heap allocation has "traditionally" been fairly inexpensive unless the heap has to be grown (update a couple of free block pointers/sizes) and the cost of growing the heap (which requires extending the virtual address space and therefore fiddling with page tables which would on a typical CPU require a mode change) is mitigated by allocating more virtual address space than is immediately needed.
If free space is always unmapped then each block allocation will require an alteration to the page tables, as will each unallocation. Not to mention that could cause the page-translation hardware to operate sub-optimally since the range of addresses comprising the working set will constantly change.
If most allocations are significantly less than a page size, then the performance impact may be minimal since whole pages will rarely become free, but if allocations typically exceed a page size, that would no longer be true. If the result is that some applications simply implement their own heap management to avoid the overhead, then you've simply increased the number of places that bugs can occur.
Just curious,
Do you also advocate four times the memory usage and double the speed? Do you blame the language or the speaker when they can't formulate a proper sentence construct? How about we just teach the programmers better instead of bitching about the tool they use.
If half the C coders out there knew the differences between stack, heap, and namespaces, we would not even be debating this issue. Don't blame the coders, blame the universities.
Enjoy,
It's just the normal noises in here.
Java's RTspec is not designed around those applications. In the Real time world a lot of people are still using ASM because in plenty of applications programmer time is much cheaper than CPU time. If your paying for 10,000,000 CPU's then and 6 programmers then paying for 50c CPU's vs. 1$ CPU's is worth a lot.
I work with real time systems and 0.0001 seconds (100 microseconds) is plenty fast for most Human to Real time systems applications. Granted JAVA is not what you want for fine-tuning your Engines performance but it's plenty fast for most applications. What makes Java so useful is you get to avoid most of the really time consuming bugs. Compare a fully functional java based multithreaded HTTP server with the C / C++ equivalent and it's going to be 1/3rd as much code. And will operate at vary close to the same speeds. In other words it's designed around applications where programmer time is worth more than machine time. We already have C so Java was built around the 95% of applications that don't need inline ASM.
I have killed BSD UNIX with buggy C networking code which is the only thing I have been unable to duplicate with good Java code. You can do bit twiddling in Java, but it's faster in C. You can have hundreds of threads doing their own thing in either but it's much easer to do that in Java than C/C++. The secret is to know enough about how Java works so that you avoid things like creating new threads that eat up a lot of time. Once you understand how things work you can use things like Thread Pooling that are extremely efficient. Instead of complaining that concatenating Strings takes so long try learning about what other tools are out there like StringBuffer.
PS: A quick look at some fast Java code. (It is a bit dated but gives you some idea what I am talking about.)
And anyone who's run a JVM knows about the price of this task -- yes GC takes time.
However, as I understood the article, the author was making a point that the way most C programmers manage memory tends to make the task more time consuming than is necessary. Therefore relying on a known optimized implementation rather than reinventing the wheel every time may be preferred. After all, it is just the VM implementor that needs to understand how to optimize the memory management, not the application developers. So yes, where the time is spent is shifted but also the amount of total execution time spent on memory management can be reduced -- because the task is managed differently.
As for the specific details of this paper, they're basically discussing how to determine which objects can be safely allocated from the stack, instead of heap, and therefore can be discarded without the usual book keeping required from a heap GC.
how many high performance memory intensive Java applications are there
Java is so widely used on the server side and middleware, it cannot be difficult to come up with examples -- Tomcat, J2EE app servers, etc. eBay for instance advertizes very clearly on their front page to be powered by Sun's Java technology. There are individual Java systems that manage millions of transactions daily, and there must be thousands of systems out there that do this every day with Java.
About 15 years ago I started using a technique to improve performance when doing lots and lots of very short term mallocs.
Essentially, I'd create a large ring buffer of malloced temporary buffers of some standard length. Any time a temporary buffer was needed, I'd grab the next one in the ring.
Before the buffer was provided to the function asking for it, the length would be checked. If the requested length was longer than the current length, the buffer would be freed and one of at least the proper length would be allocated. (I normally allocated by buffers in byte multiples of some fixed constant, usually 32.)
The idea was that by the time it was reused, what was already in the buffer was no longer needed. To achieve that, I'd estimate how many buffers might be needed in the worst case and then multiply that number by 10 for safety's sake.
My primary use of this was when doing enormous numbers of allocations of memory for formatting purposes. The function doing the formatting would request a buffer large enough to hold whatever it would need, write the formatted data into the buffer, and then return a pointer to the buffer. The calling function would simply use the buffer and never have to worry about freeing it.
The performance results were superb except in the very simplest cases where you allocated the buffers without ever using them.
I've never known anyone else who used this kind of approach although I've showed it to a large number of people.
And it can seriously increase the speed of a bit of code. I recently profiled some of my code and found that it was spending around 40% of its time in malloc and free. I created a very simple memory pool (no ring buffers, just a simple linked list with insertions and removals at the same end, nice and fast) for each frequently-allocated object type, and saw a huge speed increase.
I am TheRaven on Soylent News
None of the applications which were actually broken by XP SP2 are something granny would run. On the other hand some of the applications that were affect were, but those were generally minor problems.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
You don't "rearrange" the MMU, you just check/clear the dirty bit in the page table. This bit is set by the hardware regardless so setting it is zero overhead. Clearing these bits is still only 0.006% the time of scanning all of memory. So unless you write to 32k unique pages per GC you get a huge benefit from this.
You could probably also use the MMU to reduce pauses in the gc... you determine what objects are unreachable in the background and using the dirty bit you can tell which pages may have references into the set, so instead of starting over from scratch again you just weed out the now-reachable objects from the set, which seems like it should be a much more tractable problem.
If I wasn't working I would do this. Check out jnode or jxos for example.
Check out libumem -- much more powerful than this and has been around in Solaris for years (granted, the manpage should be better to reveal the powerful features -- but a quick look at the source at opensolaris.org reveals how to use the most advanced features).