Heap Protection Mechanism

Hope by Anonymous Coward · 2005-10-03 02:40 · Score: 4, Interesting

Let's hope it's not as broken as Microsoft's attempt in SP2.

But why did it take so long to implement?

OpenBSD at the cutting edge on security by Sv-Manowar · 2005-10-03 02:43 · Score: 3, Interesting

Kudos to the OpenBSD folks for being at the cutting edge in implementing these security features. Where they lead, surely others will follow, and we'll be seeing this feature become commonplace. As their focus is security, it's understandable that they have more incentive in these areas than more mainstream Linux distributions.

--
Business Voyeur

Re:OpenBSD at the cutting edge on security by Anonymous Coward · 2005-10-03 02:49 · Score: 1, Interesting

I'm quite surprised to see Theo doing this.

"... in other words, it doesn't matter WHERE you shift the buggy code."

--Theo de Raadt

Won't this crutch actually tempt people to write sloppy memory management because "the heap manager will catch it?"

Is the performance hit really worth it?

Hm... old technique? by archeopterix · 2005-10-03 02:53 · Score: 4, Interesting

OK, the article is light on technical details, but it seems that they are using guard pages. Guard pages aren't exactly shiny new. Efence has been using them for a long, long time.

Could this help Gnome? by MuMart · 2005-10-03 02:59 · Score: 3, Interesting

Sounds like a heap that returns unused pages to the system like this would help the problem described by John Moser in the Gnome Memory Reduction Project here.

Is it really true that the standard GNU/Linux heap implementation holds onto pages like this when it becomes fragmented? That sounds really primitive to me.

Re:Could this help Gnome? by Ulrich+Hobelmann · 2005-10-03 03:45 · Score: 2, Interesting

I don't know if GNU malloc uses mmap() or brk() for its allocation, but in both cases the small memory chunks that the user allocates are taken from bigger, contiguous blocks of memory.

Maybe that's primitive, but it ensures that small memory requests are usually fast, and don't have too much space overhead, either.

If the memory de/allocation patterns are really bad, though, you get fragmentation with the mentioned problems. It's a tradeoff. Note that OpenBSD didn't choose its way for reducing heap usage (maybe they use even more memory, due to overheads), but for security reasons.

There would be one solution, and that's using different arenas, or memory regions, for allocation. For instance, every window might have its own allocation region, so when you close the window/document, the whole memory BLOCK is freed. No fragmentation, almost no overhead. I really wish Java apps, Cocoa apps, and others (Mozilla) would do this, as they seem to suffer from this fragmentation problem, only increasing their memory usage, even after closing all documents/windows etc.
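A rough sketch of the arena idea, written in Python for brevity rather than C. The names `Arena`, `alloc`, and `release` are made up for illustration and are not taken from any real allocator; the point is only that each window gets one contiguous block, allocations bump an offset, and everything is released in one step.

```python
class Arena:
    """Bump allocator over one contiguous block; freed only as a whole."""

    def __init__(self, size):
        self.block = bytearray(size)  # one contiguous region per window/document
        self.offset = 0               # index of the next free byte

    def alloc(self, nbytes, align=8):
        # Round the current offset up to the requested alignment.
        start = (self.offset + align - 1) // align * align
        if start + nbytes > len(self.block):
            raise MemoryError("arena exhausted")
        self.offset = start + nbytes
        # A memoryview slice shares storage with the block (no copy is made).
        return memoryview(self.block)[start:start + nbytes]

    def release(self):
        # Drop the whole block at once: no per-object free, no holes left behind.
        self.block = bytearray(0)
        self.offset = 0


window = Arena(1024)        # hypothetical per-window arena
title = window.alloc(10)
flags = window.alloc(4)
window.release()            # closing the window frees everything in one step
```

In C the same shape would be one mmap() or malloc() per arena and a single free on close; the trade-off is that individual objects cannot be returned early.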

Anyway, I love the new malloc, and kudos to the whole OBSD team!

In related news, GCC 4.1 stack protector by joib · 2005-10-03 02:59 · Score: 3, Interesting

The upcoming GCC 4.1 release will include a stack protector. Basically it's a reimplementation of the old propolice patch.

Hopefully mainstream distros that have been wary of propolice will start using this new feature. And perhaps glibc malloc will borrow a few tricks from this new openbsd malloc too.

Already in Microsoft DEP by Henry+V+.009 · 2005-10-03 03:03 · Score: 2, Interesting

DEP from Microsoft is only enabled by default on some system binaries. Here is how to enable it for everything: http://www.microsoft.com/technet/security/prodtech/windowsxp/depcnfxp.mspx

My CPU doesn't support DEP in hardware, so I imagine the software-based method of doing this will create quite a speed hit. Anybody have any experience with turning on DEP for all programs?

Re:Already in Microsoft DEP by Henry+V+.009 · 2005-10-03 03:22 · Score: 2, Interesting

Well, I've turned it on now. No noticeable performance hit, and no mysterious application failures. Just compiled a Visual C++ program and it ran fine. Same for G++ on CYGWIN.

Also, I have not been hacked anytime in the last ~5 minutes, whatever that's worth (but would I know?).

As an aside, I read the paper on the Microsoft DEP flaw a few months ago, and wasn't that impressed. It looks very hard to exploit. And since DEP is an added protection mechanism, the existence of a small, hard-to-exploit flaw isn't that big of a deal. (In simple terms, with DEP on, a hacker would have to exploit both a DEP flaw and a normal overrun flaw to hack the system.)

Intron == heap protection by hey · 2005-10-03 03:03 · Score: 2, Interesting

I don't know if it was on purpose but your sig -- which mentions Intron (aka Junk DNA) is very apt.
OSes can put Intron between Exon (useful DNA -- useful stuff on the heap) to
detect badly behaving apps!

In other words, so-called "Junk DNA" may actually have a use...
HEAP PROTECTION ;-)

Wrong solution for solving heap problems. by master_p · 2005-10-03 03:06 · Score: 3, Interesting

From the kerneltrap.org post:

He explains that for over a decade efforts have been made to find and fix buffer overflows, and more recently bugs have been found in which software is reading before the start of a buffer, or beyond the end of the buffer.

The solution against buffer overflows that the kerneltrap.org post describes is to:

1. unmap memory as soon as it is freed, so as to cause a SIGSEGV when it is illegally accessed.
2. have some free 'guard' space between allocated blocks.

My opinion is that #1 will slow software down, although it will indeed make it more secure. #2 will make it more difficult to exploit buffer overflows, since the space between two allocated heap blocks will be random (and thus the attacker may not know where to overwrite data).

Unless I have misunderstood, these measures will not offer any real solution to the buffer overflow problem. For example, stack-based exploits can still be used for attacks. The solution shown does not mention usage of the NX bit (which is x86 specific). It is a purely software solution that can be applied to all BSD-supported architectures.

Since all the problems relating to buffers (overflow and underflow) that have cost the IT industry billions of dollars are the result of using C, doesn't anyone think that it is time to stop using C? There are C-compatible languages that allow bit manipulation but don't allow buffer overflows; e.g. Cyclone.

Re:Wrong solution for solving heap problems. by bluefoxlucid · 2005-10-03 05:32 · Score: 3, Interesting

C is actually the most secure language currently, AFAICT. Languages with higher level intrinsics (C++, Java, Basic, Mono, Objective-C, etc) have a more complex implementation that may allow different exploit vectors; while languages with real-time interpretation or runtime code generation (Java or Mono with JIT) will wind up disabling things like strong memory protection policies (strict Data/Code separation -- code is data when generated at runtime) and may not in their own code create a backup buffer overflow protection.

In the event of a screw-up on the part of the JIT or runtime programmer for any language, every program is instantly vulnerable, and all of this generic proactive security stuff is disabled because this "secure language" doesn't work in an "inherently secure" environment, only a much weakened one. C's runtime is rather basic (and it's still huge), as is its language; people still screw that up once in a while, but rarely.

While these "shiny new secure languages" may boast "immunity to buffer overflows," their runtimes are still designed around other concepts that may leave holes. Look at this memory allocator and think about a bug in the allocator that smashes up its own memory before it gets everything set up; because the new protections aren't yet set in place, it'd be totally vulnerable at that point (no practical exploit of course). A bug that forgets to add guard pages (generates 0 guard pages every time) might occur too in one update. Now add to that something like Java or Mono -- interpreted or not, you're running on a whole -platform- instead of just a runtime. C++ instruments loads of back-end object orientation.

So in short, C is a very basic language that has easily quantifiable attack vectors, and thus the system can be targeted around these for security. Several such enhancements exist, see PaX, GrSecurity, W^X, security in heap allocators, SELinux, Exec Shield, ProPolice. Higher level languages like C++ implement back-end instrumentation that ramps up complexity and may open new, unprotected attack vectors that are harder to quantify. Very high end languages on their own platform, like Java and Mono, not only implement massive complexity, but rely on a back-end that may lose its security due to bugs. Platform languages may also be interpreted or runtime generated, in which case they may require certain protections like PaX' strict NX policy to vanish; in some cases these models (as an implementation flaw) also don't work well with strict mandatory access control policies under systems like SELinux.

Face it. C is the best language all around for speed, security, portability, and maintainability. Assembly only brings excessive speed at the cost of all else; and higher level languages sacrifice both speed and real security (despite their handwaving claims of built-in security) at varying degrees for portability, speed of coding, and maintainability. Even script languages working inside a real tightly secured system would more easily fall victim to cross-site scripting, the injection of script into the interpretation line; under such a system, any similar attack is impossible in a C program.

On a side note, I'd love to see a RAD for C. Think Visual Basic 6.0, but open source, using C/GTK+. Glade comes close. . . .

--
Support my political activism on Patreon.

Re:cool by Iriel · 2005-10-03 03:18 · Score: 2, Interesting

Then again, while I'm not defending Microsoft, here's my take on their 'security':

Yes, plenty, and maybe even most of their promises about being a generally secure system are complete and utter rubbish. However, I'm willing to bet that each of their OSes is more secure than the last one. The problem is that they still leave plenty of holes open when they do things like (to point out the landmark example) weld the web browser to the kernel. I know that most people crack Windows because it's easy, but while I may be wrong on this, I think people will continue to spend their efforts on Windows even if their security was (completely hypothetically) top-notch, only because of the bad reputation that precedes them.

I'm not saying that Windows is secure or that it ever will be. However, their security has improved, regardless of how poorly. The last reason on the list to crack Windows (in my opinion) and possibly the strongest reason is that they have a history of poor security. I think script-kiddies will pour ANY amount of effort into destroying any version of Windows just to keep that idea alive.

As much as I love Linux, I really doubt that any one distro (like RedHat for example) would be able to keep their system as secure as it is now if they were the entire world's information security scapegoat as MS is now. (PS, yes I know that MS does mostly deserve the title they hold)

(I'm not a security expert, nor am I claiming to be, so if you think I'm wrong all I ask is that you not 'correct' me with a torch ^_^)

--
Perfecting Discordia
www.stevenvansickle.com

Re:new method? by afidel · 2005-10-03 03:19 · Score: 2, Interesting

In the three cases we ran into when I was a consultant last year we placed the blame squarely on the shoulders of the vendor with the foobar product. They were coding in an unsafe manner to an undocumented API; it was 100% their fault. Now a naive home user with no real computer savvy might blame MS, but when you look at the tradeoffs it seems like the security cleanup was good for the entire computer-using public, even if a few people did have an app or two broken.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

OpenBSD Goals by RAMMS+EIN · 2005-10-03 03:35 · Score: 2, Interesting

Yep. After all, the goal of the OpenBSD project is not simply to be the most secure operating system ever. The goal is to provide the best security, along with a number of other goals, such as running Unix software, achieving good performance, and providing a good (according to the stated goals even "the best") development platform.

--
Please correct me if I got my facts wrong.

This is how Electric Fence works. by Bruce+Perens · 2005-10-03 03:45 · Score: 4, Interesting

Electric Fence explicitly allocates dead pages at the end (or configurably, the beginning) of buffers. It can also protect the memory immediately as it is freed. I think it was first published in 1987.

It may be a legitimate invention - it is cited as prior art in an ATT patent. This is also the first known example of a prior Open Source publication causing a patent filer to cite. ATT also removed a claim from the patent that my work invalidated. Just search for "Perens" in the U.S. patent database to find the patent.

We don't run it on production programs because of its overhead. Doing this sort of protection exhaustively requires a minimum of two pages of address space per allocation: one dead page between allocations and one page allocated as actual memory. This means a high overhead in page table entries and translation lookaside buffers, and it completely destroys locality of reference in your application. Thus, expect slower programs and more disk-rattling as applications page heavily. If you allocate and release memory through mmap, you get a high system call overhead too, and probably a TLB flush with every allocation and free.
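To put a number on the two-pages-per-allocation cost, here is a back-of-the-envelope calculation in Python. The 4096-byte page size and the workload (100,000 allocations of 32 bytes each) are assumptions picked for illustration, not measurements of Electric Fence or of the OpenBSD allocator, and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; it varies by platform


def guarded_footprint(n_allocs, alloc_size, page_size=PAGE_SIZE):
    """Address space used when every allocation gets its own page(s) plus one dead page."""
    data_pages = max(-(-alloc_size // page_size), 1)  # ceiling division, at least one page
    pages_per_alloc = data_pages + 1                  # plus one guard (dead) page
    return n_allocs * pages_per_alloc * page_size


def overhead_ratio(n_allocs, alloc_size, page_size=PAGE_SIZE):
    """Bytes of address space consumed per byte the program actually asked for."""
    return guarded_footprint(n_allocs, alloc_size, page_size) / (n_allocs * alloc_size)


# Hypothetical workload: 100,000 small allocations of 32 bytes each.
print(guarded_footprint(100_000, 32))  # 819200000 bytes of address space
print(overhead_ratio(100_000, 32))     # 256.0 times the 3,200,000 bytes requested
```

Only half of that address space is backed by real memory (the guard pages are never touched), but every page still costs a page table entry, which is where the TLB pressure comes from.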

Yes, it makes it more difficult to inject a virus. Removing page-execute permission does most of that at a much lower cost - it will prevent injection of executable code but not interpreter scripts.

I don't think the BSD allocator will reveal more software bugs unless the programmers have not tested with Electric Fence.

Bruce

--
Bruce Perens.

Re:This is how Electric Fence works. by stab · 2005-10-03 07:47 · Score: 2, Interesting

The guard pages are only practical for larger allocations due to hardware limitations, as you say. Strings are protected by different means such as Propolice; in order to minimise the overhead of Propolice, it "detects" strings (as opposed to byte buffers) and specifically protects them with canaries to try and find overflows that would smash the stack (the local variable re-arrangement tries to put these buffers as close to the canary as it can at compile time). The string detection is a heuristic, as gcc doesn't maintain quite enough type information when it reaches the code generation parts which Propolice touches. Also, there is a simple bounds checker built into gcc which looks for incorrect use of statically allocated buffers with some standard functions such as strncpy or sscanf (you'd be amazed how many people specify the buffer size wrong to a bounded function).

None of these are perfect of course, but each of the techniques has found bugs (hundreds in the case of the two mentioned above) in our source and ports trees. It's also great to see projects like CCured being developed at Berkeley; although the overhead is just slightly too high to be used "out of the box" right now, it still works great with select applications such as Apache. The underlying tool, CIL can compile most of the OpenBSD source tree (including the kernel) now, and the result even boots when using a null source-to-source transform.

Re:Slowdown? by (1+-sqrt(5))*(2**-1) · 2005-10-03 03:52 · Score: 3, Interesting

Try programming in Java sometime outside of a website applet, kid, and maybe you will learn something.

I'm quite willing to concede the argument that programmer time is dearer than processor time; but why has the Real Time Specification for Java, for instance, only been able to achieve 100 microsecond interrupt response times, when 2.0 microsecond response times aren't unheard of in other domains?

Performance? by cardpuncher · 2005-10-03 03:52 · Score: 2, Interesting

I'd be interested to know what the performance impact of this is - exactly what counts as "too much of a slowdown".

Application heap allocation has "traditionally" been fairly inexpensive unless the heap has to be grown (update a couple of free block pointers/sizes) and the cost of growing the heap (which requires extending the virtual address space and therefore fiddling with page tables which would on a typical CPU require a mode change) is mitigated by allocating more virtual address space than is immediately needed.

If free space is always unmapped then each block allocation will require an alteration to the page tables, as will each unallocation. Not to mention that could cause the page-translation hardware to operate sub-optimally since the range of addresses comprising the working set will constantly change.

If most allocations are significantly less than a page size, then the performance impact may be minimal since whole pages will rarely become free, but if allocations typically exceed a page size, that would no longer be true. If the result is that some applications simply implement their own heap management to avoid the overhead, then you've simply increased the number of places that bugs can occur.

Re:For Real Security by NullProg · 2005-10-03 05:23 · Score: 2, Interesting

Just curious,

Do you also advocate four times the memory usage and double the speed? Do you blame the language or the speaker when they can't formulate a proper sentence construct? How about we just teach the programmers better instead of bitching about the tool they use.

If half the C coders out there knew the differences between stack, heap, and namespaces, we would not even be debating this issue. Don't blame the coders, blame the universities.

Enjoy,

--
It's just the normal noises in here.

Re:Slowdown? by Retric · 2005-10-03 05:23 · Score: 3, Interesting

Java's RTspec is not designed around those applications. In the real-time world a lot of people are still using ASM because in plenty of applications programmer time is much cheaper than CPU time. If you're paying for 10,000,000 CPUs and 6 programmers, then paying for 50c CPUs vs. $1 CPUs is worth a lot.

I work with real-time systems, and 0.0001 seconds (100 microseconds) is plenty fast for most human-to-real-time-system applications. Granted, Java is not what you want for fine-tuning your engine's performance, but it's plenty fast for most applications. What makes Java so useful is that you get to avoid most of the really time-consuming bugs. Compare a fully functional Java-based multithreaded HTTP server with the C / C++ equivalent and it's going to be 1/3rd as much code, and it will operate at very close to the same speed. In other words, it's designed around applications where programmer time is worth more than machine time. We already have C, so Java was built around the 95% of applications that don't need inline ASM.

I have killed BSD UNIX with buggy C networking code, which is the only thing I have been unable to duplicate with good Java code. You can do bit twiddling in Java, but it's faster in C. You can have hundreds of threads doing their own thing in either, but it's much easier to do that in Java than in C/C++. The secret is to know enough about how Java works so that you avoid things like creating new threads, which eat up a lot of time. Once you understand how things work you can use things like thread pooling that are extremely efficient. Instead of complaining that concatenating Strings takes so long, try learning about what other tools are out there, like StringBuffer.

PS: A quick look at some fast Java code. (It is a bit dated but gives you some idea what I am talking about.)

Re:Slowdown? by liloldme · 2005-10-03 06:24 · Score: 3, Interesting

It doesn't take a genius to realize that is a significant task in an application of a real-world size.

And anyone who's run a JVM knows about the price of this task -- yes GC takes time.

However, as I understood the article, the author was making a point that the way most C programmers manage memory tends to make the task more time consuming than is necessary. Therefore relying on a known optimized implementation rather than reinventing the wheel every time may be preferred. After all, it is just the VM implementor that needs to understand how to optimize the memory management, not the application developers. So yes, where the time is spent is shifted but also the amount of total execution time spent on memory management can be reduced -- because the task is managed differently.

As for the specific details of this paper, they're basically discussing how to determine which objects can be safely allocated from the stack, instead of heap, and therefore can be discarded without the usual book keeping required from a heap GC.

how many high performance memory intensive Java applications are there

Java is so widely used on the server side and in middleware that it cannot be difficult to come up with examples -- Tomcat, J2EE app servers, etc. eBay, for instance, advertises very clearly on their front page that they are powered by Sun's Java technology. There are individual Java systems that manage millions of transactions daily, and there must be thousands of systems out there that do this every day with Java.

Re:Slowdown? by eric76 · 2005-10-03 06:28 · Score: 3, Interesting

About 15 years ago I started using a technique to improve performance when doing lots and lots of very short term mallocs.

Essentially, I'd create a large ring buffer of malloced temporary buffers of some standard length. Any time a temporary buffer was needed, I'd grab the next one in the ring.

Before the buffer was provided to the function asking for it, the length would be checked. If the requested length was longer than the current length, the buffer would be freed and one of at least the proper length would be allocated. (I normally allocated my buffers in multiples of some fixed number of bytes, usually 32.)

The idea was that by the time it was reused, what was already in the buffer was no longer needed. To achieve that, I'd estimate how many buffers might be needed in the worst case and then multiply that number by 10 for safety's sake.

My primary use of this was when doing enormous numbers of allocations of memory for formatting purposes. The function doing the formatting would request a buffer large enough to hold whatever it would need, write the formatted data into the buffer, and then return a pointer to the buffer. The calling function would simply use the buffer and never have to worry about freeing it.
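The ring of reusable temporary buffers described above could be sketched roughly like this. It is a Python model rather than the original C, and `ScratchRing`, `get`, and `format_record` are hypothetical names invented for this sketch. The 32-byte granule follows the description, while the slot count is an arbitrary assumption that a real program would size from its worst case.

```python
class ScratchRing:
    """Ring of reusable temporary buffers; callers never free what they receive."""

    def __init__(self, slots, initial_len=32, granule=32):
        self.granule = granule
        self.buffers = [bytearray(initial_len) for _ in range(slots)]
        self.next = 0

    def get(self, nbytes):
        i = self.next
        self.next = (i + 1) % len(self.buffers)  # advance around the ring
        if len(self.buffers[i]) < nbytes:
            # Too small: replace it with one rounded up to a multiple of the granule.
            rounded = -(-nbytes // self.granule) * self.granule
            self.buffers[i] = bytearray(rounded)
        return self.buffers[i]


def format_record(ring, name, value):
    """Format into a ring buffer and hand it back; the caller just uses it."""
    text = f"{name}={value}".encode()
    buf = ring.get(len(text))
    buf[:len(text)] = text
    return buf, len(text)


ring = ScratchRing(slots=100)  # assumed: worst-case live buffers times a safety factor
buf, n = format_record(ring, "temperature", 21.5)
print(bytes(buf[:n]))
```

The scheme is only safe while no caller holds a buffer for longer than one full trip around the ring, which is why the slot count gets the generous safety margin.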

The performance results were superb except in the very simplest cases where you allocated the buffers without ever using them.

I've never known anyone else who used this kind of approach, although I've shown it to a large number of people.

Re:Slowdown? by TheRaven64 · 2005-10-03 07:12 · Score: 2, Interesting

And it can seriously increase the speed of a bit of code. I recently profiled some of my code and found that it was spending around 40% of its time in malloc and free. I created a very simple memory pool (no ring buffers, just a simple linked list with insertions and removals at the same end, nice and fast) for each frequently-allocated object type, and saw a huge speed increase.
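A minimal sketch of that kind of per-type pool, again in Python as a stand-in for C. `Pool`, `acquire`, and `release` are names invented here, and a Python list used as a stack takes the place of the singly linked free list, with insertions and removals at the same end.

```python
class Pool:
    """Per-type object pool: a free list with insertion and removal at the same end."""

    def __init__(self, factory):
        self.factory = factory  # called only when the free list is empty
        self.free = []          # used as a stack (LIFO), like a linked free list

    def acquire(self):
        # Reuse the most recently released object if there is one.
        return self.free.pop() if self.free else self.factory()

    def release(self, obj):
        # Push back onto the same end; the object is not reset here.
        self.free.append(obj)


nodes = Pool(dict)   # hypothetical frequently-allocated object type
n = nodes.acquire()
nodes.release(n)
assert nodes.acquire() is n  # the released object is handed out again
```

LIFO reuse also tends to hand back the object that was touched most recently, which is friendlier to the cache than cycling through the whole pool.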

--
I am TheRaven on Soylent News

Re:new method? by afidel · 2005-10-03 09:05 · Score: 2, Interesting

None of the applications which were actually broken by XP SP2 are something granny would run. On the other hand, some of the applications that were affected are, but those were generally minor problems.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

Re:Heap Protection vs. Managed Code by 0xABADC0DA · 2005-10-03 09:27 · Score: 2, Interesting

You don't "rearrange" the MMU, you just check/clear the dirty bit in the page table. This bit is set by the hardware regardless so setting it is zero overhead. Clearing these bits is still only 0.006% the time of scanning all of memory. So unless you write to 32k unique pages per GC you get a huge benefit from this.

You could probably also use the MMU to reduce pauses in the gc... you determine what objects are unreachable in the background and using the dirty bit you can tell which pages may have references into the set, so instead of starting over from scratch again you just weed out the now-reachable objects from the set, which seems like it should be a much more tractable problem.

If I wasn't working I would do this. Check out jnode or jxos for example.

Solaris has had this for YEARS by sethmeisterg · 2005-10-03 15:38 · Score: 2, Interesting

Check out libumem -- much more powerful than this and has been around in Solaris for years (granted, the manpage should be better to reveal the powerful features -- but a quick look at the source at opensolaris.org reveals how to use the most advanced features).
