Memory Management Technique Speeds Apps By 20%
Dotnaught writes "A paper (PDF) to be presented later this month at the IEEE International Parallel and Distributed Processing Symposium in Atlanta describes a new approach to memory management that allows software applications to run up to 20% faster on multicore processors. Yan Solihin, associate professor of electrical and computer engineering at NCSU and co-author of the paper, says that using the technique is just a matter of linking to a library in a program that makes heavy use of memory allocation. The technique could be especially valuable for programs that are difficult to parallelize, such as word processors and Web browsers." Informationweek has a few more details from an interview with Solihin.
Beware the key term there: "up to."
Are you adequate?
Nothing to see here...
Moving malloc() to a separate thread does not do a thing for the putative word processor.
They might get some speedup if they take a lousy old malloc() and have one thread hold onto the locks.
But of course the *right* way would be to write a new malloc() that can from the get-go run re-entrantly and not require a bevy of slow locks.
If most programs are spending 20% of their time on memory management, something is wrong.
Actually i've found the opposite. Java tends to be really good at transparent memory re-use. From experience if i have ~1,000,000 objects of the same type with the some constantly being deleted and replaced Java will run that program faster than a non-memory pooled C implementation (of course the memory pooled C implementation will be faster again).
In fact many of the benchmarks around that you see claiming Java is faster than C will use an example of a program that creates and destroys objects in a tight loop. The C program will be as written with tons of calls to malloc/free, the Java program will simply reuse the same parts of memory again and again without any system calls. These benchmarks are a bit misleading as the C program isn't optimised with memory re-use whereas the Java Garbage collector tends to do that implicitly.
Because we learnt to program for a single threaded core with it's single processing pipeline since way back, using high level languages that pre-date the multi-threaded era, and it involves re-thinking how things are done on a fundamental level if we're ever to make proper use of 32, 64, 128 cores. Oh and we all know how many programmers are 'get off my lawn' types, myself included.
If I still coded much anymore it would drive me to drink.
After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
That's already implemented in the high performance ssh patch, available here. Scroll down to the "Threaded CTR cipher mode" patch, if that's the only part you're interested in.
I've applied it to the openssh package on my Fedora 12 system. As is, it provides about 40% increased throughput on my quad-core. You may be able to get more by tweaking it to increase the thread count.
Can You Say Linux? I Knew That You Could.
Now digesting the real paper at http://www.ece.ncsu.edu/arpers/Papers/MMT_IPDPS10.pdf, they do do a trick of making free() asynchronous to avoid blocking there, but they also do a kind of client-server thing, with a nontrivial but fast and dumb malloc client in the main thread.
Not bad. They really tried a lot of different stuff, thought some stuff out carefully. This reviewer approves!
The type of system you're talking about is NUMA (Non-Uniform Memory Architecture), and yes, any OS worth its salt has supported it automagically for years. I think even Windows advertises support for NUMA now, whether it works is another question.
Game! - Where the stick is mightier than the sword!