Java Urban Performance Legends
An anonymous reader writes "Programmers agonize over whether to allocate on the stack or on the heap.
Some people think garbage collection will never be as efficient as direct memory management, and others feel it is easier to clean up a mess in one big batch than to pick up individual pieces of dust throughout the day. This article pokes some holes in the oft-repeated performance myth of slow allocation in JVMs."
These java urban performance legends are rubbish - java is highly performant in a rural or urban setting.
How much time have I spent with Electric Fence and valgrind finding memory leaks in my C programs? In Java, the auto garbage collection is as good as Perl's, without that tricky "unreadable code" problem ;). And I can always tune garbage collection performance by forcing a garbage collect when I know my app's got the time, like outside of a loop or before creating more objects in storage.
--
make install -not war
JVM memory allocation isn't "SLOW". It's just pleasantly unhurried.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
JVMs are surprisingly good at figuring out things that we used to assume only the developer could know.
:) ) they weren't. The performance of the language has greatly improved while the perception of language has remained the roughly same (at least amoung the general coding community).
Yes they are. Now. 10 years ago when Java applets were being embedded in webpages (to show rippling water below a logo
Just goes to show that even if you have a great technical product you'll still need the marketdriods. Unfortunately.
http://twitter.com/onion2k
This article is actually debunking some people's reasons why Java has poor performance. It does little to debunk my actual real world experience that it *is* slow. I'm glad to see that performance has increased alot, but I remember some all (well 90% or something) Java applications, like the original JBuilder, that made me want to claw my eyeballs out when using them. Those apps and other early apps are where Java's performance issues really took hold in many people's psyche.
First post here from my Java workstation. Take that!
Don't even get me started on C#.
.Net violates the EULA. And we
wouldn't want that!
I should hope not! Any form of benchmarking of Microsoft's
So when Microsoft declares their interpreted inverse-polyglotic language as "faster" than compiled pure C, just accept it. Best for everyone that way.
the FACT is that people's real-world experience, no matter how anecdotal, consistently demonstrates that Java is MASSIVELY slow than similar apps in C or C++.
Well, the linked article contained a number of what I will graciously call "assumptions" (rather than "outright lies") about allocation patterns in C/C++ that simply don't hold true in most cases.
For example, the parent mentions the old "stack or heap" question... Which no serious C coder would ask. Use the heap only when something won't fit on the stack. Why? The stack comes for "free" in C. If you need to store something too large, you need the heap. But then, you can allocate it once, and don't even consider freeing it until you finish with it (generational garbage collection? Survival time? Gimme a break - It survives until I tell it to go away!). As for recursion... You can blow the stack that way, but a good programmer will either flatten the recursion, or cap its depth.
And the article dares to justify its "assuptions" by comparing Java against a language interpreter such as Perl. Not exactly a fair comparison - Yes, Perl counts as "real-world", but in an interpreter, you can't know ahead of time anything about your memory requirements. At best, you can cap them and dump the interpreted program if it gets too greedy. Now, some might point out that Java gets interpreted as well - And I'll readily admit that, for doing so, it does a damn fine job with its garbage collection. But if you want to compare Java to Perl, then do so. Don't try to sneak in a comparison to C with a layer of indirection.
One last point - The article mentions that you no longer need to use object pooling. SURE you no longer need it - Java does it implicitly at startup. You can avoid all but a single malloc/free pair in C as well, if you just steal half the system's memory in main(). Sometimes that even counts as a good choice - I've used it myself, with the important caveat that I've done so when appropriate. Not always. I don't have malloc() as the second and free() as the second-to-last statements in every program I write. And that most definitely shows in the minimal memory footprints attainable between the two languages... Try writing a Java program that eats less than 32k.
Then use Delphi, or better yet, C#. (or even Python and a few other choices)
Faster productivity, less bugs, no ram guzzling 5 minute startup. Java isn't the only language that reduces development time, it's just the only one (besides VB) that makes you sacrifice big things to get it.
A *good* C++ programmer will probably write code that outperforms the equivalent in Java. A *good* C+ programmer will remember to deallocate all of his objects to prevent memory leaks. A *good* C++ programmer will copy his strings correctly to prevent buffer overflow exploits.
If you have been involved in developmnent for any reasonable amount of time or worked on projects of reasonable size, you know that *good* programmers are hard to come by. When you add the real world to the picture you find that simple things like garbage collection and a virtual machine can make a mediocre java programmer outperform a mediocre C++ programmer.
If schools actually learned to produce good programmers, and HR departments learned how to identify them, and job interviews verified them, we wouldn't be having this discussion.
----- If communism is a system where the government owns business, what do you call a system where business owns govern
Programmer cycles are expensive.
Indeed. It might be worth (pardon my pun) reiterating what those cycles really are, in regard to application performance.
In all languages I know of, you get some library functions ready-made, and you need to code some stuff yourself.
Most performance problems occur in the code you made yourself.
In my experience, you get most bang for buck when you are able to efficiently allocate your programmer time to a) program a functionally complete draft version, b) optimize those parts which need optimization and c) maintain the program, in a manner which is BALANCED, but biased towards maintenance.
De facto, you get better balance between those things, and most bang for buck, using languages such as Java, as opposed to languages such as C++, because (say) Java offers a pretty coherent conceptual framework (class libraries) for creating your draft in a maintainable way, provides default access to excellent non-invasive performance measurement tools such as YourKit and JProfiler which let you objectively find out where you need to do performance work.
This means you can do only the optimization work that is necessary, and create optimized packages which extend the default class library interfaces which means that generally maintenance programmers don't have to put nearly as much work into figuring out how the optimizations affect the draft work.
It's not perfect, but it gets you more bang for buck, which is what matters to you when you manage resources.
Not the default developer perspective, I know.
The article's main point is that Java's memory allocation is faster than malloc, and it's garbage collection is better than cumulative free's.
However, thats not the problem. All memory in a Java program has to be allocated dynamically. Other languages offer static memory alternatives. Static memory use will be more efficient in many cases.
The my language is faster than yours argument is inherently stupid. There is no "best" language. You need to use the right tool for the right job.
--Barry
It's not really apples to oranges to use a smaller profile JRE for creating smaller profile programs. The JRE includes libraries and features such as threading and such. Let's see you build a C program that uses pthreads and one or two major libraries and still have your memory use come in under 32K. Remember, apples to apples, right?
Except, you're wrong. You're assuming that the compiler is doing *no* optimization. So, how would a compiler treat the first "inefficient" code example? Let's take a look:
public double getDistanceFrom(Component other) {
Point otherLocation = other.getLocation();
int deltaX = otherLocation.x - location.x;
int deltaY = otherLocation.getY() - location.getY();
return Math.sqrt(deltaX*deltaX + deltaY*deltaY);
}
Look at the implementation of getLocation(). It's a simple non-recursive procedure. In virtually all compilers, a simple procedure like this is going to be inlined, producing:
public double getDistanceFrom(Component other) {
Point otherLocation = new Point(other.location) ;
int deltaX = otherLocation.getX() - location.getX();
int deltaY = otherLocation.getY() - location.getY();
return Math.sqrt(deltaX*deltaX + deltaY*deltaY);
}
(Note that to the compiler, all variables are effectively public, so this would not be an access violation.)
Next, the compiler can (easily) prove (by inspection) that Point(Point) is a copying constructor. As a result, it can replace the use of new Point(other.location) with other.location so long as it proves it will not modify otherLocation, and of course, it can prove this quite easily by inspection of getDistanceFrom(). This results in:
public double getDistanceFrom(Component other) {
Point otherLocation = other.location ;
int deltaX = otherLocation.getX() - location.getX();
int deltaY = otherLocation.getY() - location.getY();
return Math.sqrt(deltaX*deltaX + deltaY*deltaY);
}
Also, note that getX() and getY() are also simple non-recursive leaf procedures, so the compiler would have inlined them at the same time, so you would actually get code equivalent to this:
public double getDistanceFrom(Component other) {
Point otherLocation = other.location ;
int deltaX = otherLocation.x - location.y;
int deltaY = otherLocation.y - location.y;
return Math.sqrt(deltaX*deltaX + deltaY*deltaY);
}
Which is now faster that your hand-written "optimization." Note in fact, that you "over-optimized" by replacing otherLocation with other.location twice. If a compiler were actually dumb enough to implement it that way, literally, it would have involved an extra (and needless) pointer dereference to get the value of other.location twice, when it already had it sitting in a register. (Fortunally, most any compiler nowadays would have caught your "optimization" and fixed it.)
In fact, the compiler wouldn't stop here. It might even rearrange the order in which it fetches x and y to avoid cache pollution and misses. Do you consider that each time you fetch a variable? Most programmers don't, and shouldn't. The moral of the story is that most programmers fail to realize how smart compilers have become. Compiler writers design optimizations with "good coding practices" such as this in mind, and the programmer will be rewarded if he or she uses them.
Additionally, even if it takes more instructions per call, glibc malloc() will return freshly used "hot" memory to you, while the JVM will return the coldest memory to you--the author of the article admits that this is a problem, but it is worse than they say.
First, you definately have an L1 cache miss. On a P4, that is a 28 cycle penalty. You also likely have an L2 cache miss. Fast DDR2 memory has an access latency of about 76 ns--call it 220 processor clock cycles. If you miss the D-TLB, then add another 57 cycles. Total cost of your "fast" allocator--305 cycles of memory access latency. 305 is somewhat higher than 100.
Even worse, if you are on a system with a lesser amount of memory, you may miss main memory entirely, causing a page fault. That'll cost you several milliseconds waiting for the disk. That really really hurts, since 8 milliseconds latency from disk is twenty-four million cycles at 3 GHz. 24 million, 24,000,000, 2.4 * 10^6, no matter how you write it, that's a lot of cycles to allocate memory.
All of a sudden, 100 instructions to hit hot memory seems cheap.