Memory Management Technique Speeds Apps By 20%

Posted by kdawson on Monday April 5, 2010 @12:04PM from the rememberance-of-data-past dept.

Dotnaught writes "A paper (PDF) to be presented later this month at the IEEE International Parallel and Distributed Processing Symposium in Atlanta describes a new approach to memory management that allows software applications to run up to 20% faster on multicore processors. Yan Solihin, associate professor of electrical and computer engineering at NCSU and co-author of the paper, says that using the technique is just a matter of linking to a library in a program that makes heavy use of memory allocation. The technique could be especially valuable for programs that are difficult to parallelize, such as word processors and Web browsers." InformationWeek has a few more details from an interview with Solihin.

Beware the key term there: by Estanislao Martínez · 2010-04-05 12:11 · Score: 5, Insightful

Beware the key term there: "up to."

--
Are you adequate?
Nothing to see here.... by Ancient_Hacker · 2010-04-05 12:11 · Score: 5, Insightful

Nothing to see here...
Moving malloc() to a separate thread does not do a thing for the putative word processor.
They might get some speedup if they take a lousy old malloc() and have one thread hold onto the locks.
But of course the *right* way would be to write a new malloc() that can from the get-go run re-entrantly and not require a bevy of slow locks.
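
One common way to keep locks off the allocation fast path, as the comment above asks for, is a per-thread cache of freed blocks. The following is a minimal sketch under stated assumptions, not the paper's design or any real allocator's code: every name (`cached_alloc`, `cached_free`, `cached_count`, `BLOCK_SIZE`, `CACHE_MAX`) is hypothetical, and only one block size is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; not taken from the paper or a real allocator. */
#define BLOCK_SIZE 64   /* the single block size this cache serves (assumption) */
#define CACHE_MAX  32   /* most blocks one thread keeps for reuse (assumption) */

typedef union block {
    union block *next;               /* link used while the block sits in the cache */
    unsigned char bytes[BLOCK_SIZE]; /* payload handed to the caller */
} block_t;

/* One list per thread, so the fast path touches no shared state and takes no lock. */
static _Thread_local block_t *cache_head = NULL;
static _Thread_local size_t   cache_len  = 0;

void *cached_alloc(void)
{
    if (cache_head != NULL) {        /* fast path: reuse a block this thread freed */
        block_t *b = cache_head;
        cache_head = b->next;
        cache_len--;
        return b;
    }
    return malloc(sizeof(block_t));  /* slow path: fall back to the shared allocator */
}

void cached_free(void *p)
{
    if (p == NULL)
        return;
    if (cache_len < CACHE_MAX) {     /* keep the block for this thread's next request */
        block_t *b = p;
        b->next = cache_head;
        cache_head = b;
        cache_len++;
        return;
    }
    free(p);                         /* cache full: hand the block back to malloc's heap */
}

/* Number of blocks currently cached by the calling thread. */
size_t cached_count(void)
{
    return cache_len;
}
```

Only the slow path reaches the shared `malloc()`/`free()`, so whatever locking those do is paid on a cache miss or overflow rather than on every call.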
1. Re:Nothing to see here.... by Anonymous Coward · 2010-04-05 12:20 · Score: 5, Funny
  
  new malloc()
  I see what you did there.
2. Re:Nothing to see here.... by Anonymous Coward · 2010-04-05 12:28 · Score: 5, Interesting
  
  Exactly, and they are even comparing it to the old and relatively slow Doug Lea allocator.
  If you want to test a new memory allocator, the benchmarks these days are the Hoard allocator and the TCMalloc allocator from Google. These alone will give you more than the 20% speedup mentioned in the article.
  However, that isn't the end of the story. There are proprietary allocators, like the Lockless memory allocator, that are about twice as fast as the older allocators which aren't optimized for multi-core machines.
3. Re:Nothing to see here.... by kscguru · 2010-04-05 14:16 · Score: 5, Interesting
  
  And now I've read their paper. Quick summary: (1) they do indeed speculatively pre-allocate heap blocks, and cache pre-allocated blocks per client thread. (2) They run free() asynchronously, and batch up blocks of ~200 frees for bulk processing. (3) They busy-loop the malloc()-thread because pthread_semaphore wakeups are too slow for them to see a performance gain (section 2.E.2-3).
  In other words, it's a cute trick for making one thread go faster, at the expense of burning 100% of another core by busy-looping. If you are on a laptop, congrats, your battery life just went from 4 hours to 1 hour. On a server, your CPU utilization just went up by 1 core per process using this library. This trick absolutely cannot be used in real life - it's useful only when the operating system runs exactly one process, a scenario that occurs only in research papers. Idea (2) is interesting (though not innovative); idea (3) makes this whole proposal a non-starter for anything except an academic paper.
  
  --
  A witty [sig] proves nothing. --Voltaire
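
Point (2) of the summary above, batching frees for bulk processing, can be sketched in a few lines of C. This is an illustration under assumptions rather than the paper's code: the names `deferred_free`, `flush_frees`, `pending_count` and `released_count` are hypothetical, the batch size of 200 comes from the comment, and the flush runs inline here where the described design would hand the batch to the helper thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FREE_BATCH 200   /* batch size; the summary above cites roughly 200 */

static void  *pending[FREE_BATCH];  /* blocks queued for release */
static size_t pending_len    = 0;
static size_t released_total = 0;   /* kept only so the sketch can be checked */

/* Release everything queued so far. In the design described above this work
   would run on the helper thread; here it runs inline (assumption). */
void flush_frees(void)
{
    for (size_t i = 0; i < pending_len; i++)
        free(pending[i]);
    released_total += pending_len;
    pending_len = 0;
}

/* Queue a block instead of freeing it immediately; flush when the batch fills. */
void deferred_free(void *p)
{
    if (p == NULL)
        return;
    pending[pending_len++] = p;
    if (pending_len == FREE_BATCH)
        flush_frees();
}

size_t pending_count(void)  { return pending_len; }
size_t released_count(void) { return released_total; }
```

The caller pays one array store per `deferred_free()` and the real `free()` cost is amortized over the batch. A program using this would also call `flush_frees()` before exit so the last partial batch is released.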
20%?! by temojen · 2010-04-05 12:21 · Score: 5, Insightful

If most programs are spending 20% of their time on memory management, something is wrong.
1. Re:20%?! by guyminuslife · 2010-04-05 18:52 · Score: 5, Funny
  
  I was aware that malloc() had a price tag attached, but free()? That's misleading advertising.
  
  --
  I don't believe in time. It's a grand conspiracy designed to sell watches.
It's programmers that need parallelization by w0mprat · 2010-04-05 12:46 · Score: 5, Insightful

Because we learnt to program for a single threaded core with its single processing pipeline since way back, using high level languages that pre-date the multi-threaded era, and it involves re-thinking how things are done on a fundamental level if we're ever to make proper use of 32, 64, 128 cores. Oh and we all know how many programmers are 'get off my lawn' types, myself included.

If I still coded much anymore it would drive me to drink.

--
After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
Re:free() is probably more parallizable than mallo by JessGras · 2010-04-05 13:11 · Score: 5, Informative

Now digesting the real paper at http://www.ece.ncsu.edu/arpers/Papers/MMT_IPDPS10.pdf, they do do a trick of making free() asynchronous to avoid blocking there, but they also do a kind of client-server thing, with a nontrivial but fast and dumb malloc client in the main thread.
Not bad. They really tried a lot of different stuff, thought some stuff out carefully. This reviewer approves!
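
The client-server split described above can be pictured as a small single-producer, single-consumer ring of pre-allocated blocks. This is a rough sketch under assumptions, not the paper's implementation: `server_refill`, `client_malloc`, `STOCK_SLOTS` and `BLOCK_BYTES` are hypothetical names, only one request size is stocked, and the server step is a plain function that a helper thread would call in a loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define STOCK_SLOTS 8    /* ring capacity, a power of two (assumption) */
#define BLOCK_BYTES 64   /* only one request size is stocked (assumption) */

static void *stock[STOCK_SLOTS];
static atomic_size_t stock_head = 0;  /* next slot to read; written only by the client */
static atomic_size_t stock_tail = 0;  /* next slot to write; written only by the server */

/* Server side: top up the stock until the ring is full.
   Returns how many blocks were added on this call. */
size_t server_refill(void)
{
    size_t added = 0;
    size_t t = atomic_load_explicit(&stock_tail, memory_order_relaxed);
    while (t - atomic_load_explicit(&stock_head, memory_order_acquire) < STOCK_SLOTS) {
        void *p = malloc(BLOCK_BYTES);
        if (p == NULL)
            break;
        stock[t % STOCK_SLOTS] = p;
        t++;
        /* Publish the slot only after the pointer has been written. */
        atomic_store_explicit(&stock_tail, t, memory_order_release);
        added++;
    }
    return added;
}

/* Client side: the "fast and dumb" path in the main thread. */
void *client_malloc(void)
{
    size_t h = atomic_load_explicit(&stock_head, memory_order_relaxed);
    if (h == atomic_load_explicit(&stock_tail, memory_order_acquire))
        return malloc(BLOCK_BYTES);   /* stock empty: allocate directly */
    void *p = stock[h % STOCK_SLOTS];
    /* Hand the slot back to the server after reading it. */
    atomic_store_explicit(&stock_head, h + 1, memory_order_release);
    return p;
}
```

With one writer per index the ring needs no lock, which is what lets the client side stay cheap. How the server is woken (busy-looping versus blocking) is the trade-off argued over elsewhere in this thread and is left out of the sketch.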