Java Urban Performance Legends
An anonymous reader writes "Programmers agonize over whether to allocate on the stack or on the heap.
Some people think garbage collection will never be as efficient as direct memory management, and others feel it is easier to clean up a mess in one big batch than to pick up individual pieces of dust throughout the day. This article pokes some holes in the oft-repeated performance myth of slow allocation in JVMs."
This article actually debunks some people's explanations for why Java has poor performance. It does little to debunk my actual real-world experience that it *is* slow. I'm glad to see that performance has improved a lot, but I remember some all-Java (well, 90% or so) applications, like the original JBuilder, that made me want to claw my eyeballs out when using them. Those and other early apps are where Java's performance issues really took hold in many people's psyche.
Additionally, even if it takes more instructions per call, glibc malloc() will return recently used "hot" memory to you, while the JVM will return the coldest memory to you--the author of the article admits that this is a problem, but it is worse than they say.
First, you definitely have an L1 cache miss. On a P4, that is a 28 cycle penalty. You also likely have an L2 cache miss. Fast DDR2 memory has an access latency of about 76 ns--call it 220 processor clock cycles. If you miss the D-TLB, then add another 57 cycles. Total cost of your "fast" allocator--305 cycles of memory access latency. 305 is somewhat higher than 100.
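For anyone who wants to check the sum, here is a rough sketch in Java. The cycle counts are just the figures quoted above, not measurements, and the class and method names are made up for illustration. The 3 GHz clock used for the ns-to-cycles conversion is an assumption borrowed from the page-fault figure below.

```java
public class ColdAllocationCost {
    // Penalties as quoted in the comment (P4-era hardware); nothing here is measured.
    static final int L1_MISS_CYCLES = 28;
    static final int L2_MISS_CYCLES = 220;   // poster's rounding of ~76 ns DDR2 latency
    static final int DTLB_MISS_CYCLES = 57;

    // Worst case for touching cold memory: every level misses.
    static int worstCaseCycles() {
        return L1_MISS_CYCLES + L2_MISS_CYCLES + DTLB_MISS_CYCLES;
    }

    // Converts a latency in nanoseconds to clock cycles at the given rate in GHz.
    static double nsToCycles(double ns, double ghz) {
        return ns * ghz;
    }

    public static void main(String[] args) {
        System.out.println("Worst case: " + worstCaseCycles() + " cycles");
        // Assumed 3 GHz clock; gives 228, which the comment rounds down to 220.
        System.out.println("76 ns at 3 GHz: " + nsToCycles(76, 3.0) + " cycles");
    }
}
```

Note that the total is in cycles of latency while the 100 figure is an instruction count, so the comparison is order-of-magnitude at best.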
Even worse, if you are on a system with less memory, you may miss main memory entirely, causing a page fault. That'll cost you several milliseconds waiting for the disk. That really really hurts, since 8 milliseconds of latency from disk is twenty-four million cycles at 3 GHz. 24 million, 24,000,000, 2.4 * 10^7, no matter how you write it, that's a lot of cycles to allocate memory.
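The page-fault arithmetic can be checked the same way. This is only a sketch using the 8 ms and 3 GHz figures from the comment; the class and method names are hypothetical.

```java
public class PageFaultCost {
    // Clock cycles that elapse during a stall of the given length.
    // Integer arithmetic keeps the result exact for these inputs.
    static long cyclesDuring(long milliseconds, long clockHz) {
        return milliseconds * (clockHz / 1000);
    }

    public static void main(String[] args) {
        long clockHz = 3_000_000_000L;   // 3 GHz, as stated in the comment
        long diskLatencyMs = 8;          // disk latency figure from the comment
        System.out.println(cyclesDuring(diskLatencyMs, clockHz) + " cycles");
    }
}
```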
All of a sudden, 100 instructions to hit hot memory seems cheap.