Java Urban Performance Legends
An anonymous reader writes "Programmers agonize over whether to allocate on the stack or on the heap.
Some people think garbage collection will never be as efficient as direct memory management, and others feel it is easier to clean up a mess in one big batch than to pick up individual pieces of dust throughout the day. This article pokes some holes in the oft-repeated performance myth of slow allocation in JVMs."
How many JVMs can you afford to run on your system for different apps, and how can you make sure they are all the right size, the garbage collectors are in an appropriate mode that can keep up with generated garbage, etc. I can run lots of native apps, which in many cases have no need for a significant heap like Java brings into the picture in far less space than a single JVM. A JVM runs much heavier on the system, and when I run Netbeans, it is continuously on the verge of eating my 1.2 GB powerbook alive, in fact I have to frequently restart Netbeans to get memory back. It has a long way to go in real practical terms even if they have theoretical solutions to some of the problems. I am porting my server software away from Java for the same reasons. This is JDK 1.4, and I am about to try it on 1.5, but I don't think there is that much difference.
This article is actually debunking some people's reasons why Java has poor performance. It does little to debunk my actual real world experience that it *is* slow. I'm glad to see that performance has increased alot, but I remember some all (well 90% or something) Java applications, like the original JBuilder, that made me want to claw my eyeballs out when using them. Those apps and other early apps are where Java's performance issues really took hold in many people's psyche.
the people who keep telling me allocation in Java is slow (much slower than 10 instructions) are generally experienced Java programmers. I use Ocaml, so I'm quite aware of how fast a generational garbage collector can be (btw, on the x86 in Ocaml, allocations are only 5 simple instructions). But from all first hand reports I've heard, Java allocation is still slow. It may be faster than C++, but it's still slow.
(I'd like to see someone make a cutting edge game in Java).
There are lots of games done in Java, mainly for mobile phones through J2ME. A few of them might have cutting edge gameplay (though I've yet to see one), but it is unlikely to see cutting edge graphics. Still, there are some pretty impressive things you can do with JOGL.
Being bitter is drinking poison and hoping someone else will die
But all the examples in the article aren't tested in practise. Maybe the escape analysis the author describes works as advertised. But without actually testing and analysing real code produced and without actual benchmarks the article doesn't proof a thing
The author is giving you a high-level, greatly edited view of some of the major optimization techniques in use today. The original academic/technical papers on which that is based demonstrate/measure their techniques on various benchmarks.
One thing the author seems to forget is that you would at least need a fallback mechanism when a method is overridden in a subclass
This is well-known. Code-patching, on-stack replacement, pre-existence etc are used in production jits to recover from that kind of thing. This paper has a decent summary of current approaches:
You've clearly never worked a medium to large project in c (c++ is easier because you can make much more use of the stack). You forget sometimes, sometimes you don't know (other people's libraries) and most of the time you mess it up while debugging logic issues and totally forget to fix it cause your happy the logic part is working.
That said. I usually spend about 2% of my time, tops, looking for memory leaks. I spend like 15% of my time checking for memory violations though... But most of that time is waiting for a slow machine to run valgrind (if you think java feels slow).
I think the slow part of Java is actually Swing, something about that toolkit used to seem slow (Java 2), like slower than gtk+.
I've found that Java is about as quick as c for heavy processor intensive calculations (like calculating prime numbers); which is impressive since python was 30 times slower. My trouble with Java is its lack of easy connectivity with a more basic language; all the best languages have done that (c allows assembly).
But yea, Java is pretty stinkin' awesome these days.. And he's right, every c program knows that malloc is slow and its speed unpredictable. If you do a big malloc after tons of small frees, you'll be waiting for it to run the whole list and condense it!
On the graphics thing, I don't see why not Java. Next gen phones will have HW accelerated 3d graphics. Check out the TI OMAP 2420 embedded multimedia processor specifications for a look at what the future may bring.
The stack comes for "free" in C. If you need to store something too large, you need the heap. But then, you can allocate it once, and don't even consider freeing it until you finish with it
Or NOT consider freeing it when you forget about it... as the case may be.
Try writing a Java program that eats less than 32k.
No problem at all.
Being bitter is drinking poison and hoping someone else will die
Really? Can provide a few REAL world examples? Can you name one? Personally, I'm running Azureus and Netbeans right now, and they're not perceptably different from C++ desktop applications like KDevelop or OpenOffice.
I have programmed (in many different languages) since I was about 9 and regard myself as a good programmer - who doesn't? ;) Anyway, I now hire and manage programmers and have interviewed probably about a hundred people for programming positions. Most of them are total shit. They couldn't program their way out of a paper bag. Their resume looks great and they have all the acronyms and say the right things. Then I ask them to code a function to reverse the order of a string with allocating as little extra memory as possible. Less than 50% of people can do it right. A simple algorithm like this is easy to check in your head and they just don't get it.
One of the other more "nasty" questions I ask is for people to do a number to hex-string conversion algorithm. I've only have ONE person do it right. Most just stare at me. They don't understand binary! They mostly have CS degrees so they were taught, but their brains don't really get it. It's very interesting because most people don't understand what a number is at a base level - it's astounding. I could go on...
Actually, Java implementations are allowed to ignore System.gc() calls. It's just a suggestion to the runtime.
Sun is introducing shared memory in the new VM that should alleviate this somewhat.
The idea is that the majority of the "fatness" comes from the libaries. If you only have to load the core libraries once, and each VM can use them then additional VMs won't add so much overhead.
If this works, I'm wondering, will we get side-effects from the over-abuse of static member variables.
----- If communism is a system where the government owns business, what do you call a system where business owns govern
In some aspects, yes, it is much faster. For example, I can tell the difference between Java 1.4 and Java 5 (1.5) on the same hardware. My company's website was upgraded to Java 5 and performance increased significantly, mainly due to garbage-collection improvements. So I'd say that the same holds for the older JVMs. However, there may be other things added to the JVMs that make them less suitable for older hardware. I know that class libraries and app frameworks have gotten bigger as time goes on, so that would definately impact things in terms of memory usage.
Bjarne Stroustrup : why so many people use "C++" just as "C" ?
Dennis Ritchie : because you named the language "C++", not "++C"
The typical home machine these days is still sub-ghz, and Java performs so poorly as to be unusable on such machines.
A typical mobile phone is sub-ghz too, and there is plenty of J2ME software running on them...
Java rocks on limited devices AND as server software. It is only on the desktop it isn't a big hit. Yet.
Being bitter is drinking poison and hoping someone else will die
In C/C++ you can wrap malloc() so that if it fails, you can start dumping cache until it succeeds. The solution is elegant, dare I say beautiful.
In Java this is totally impossible.
In Java there are things called weak references which you can use for this exact purpose. A weak reference is a pointer to a pointer, but the pointed-to object is considered available for garbage collection (as long as it isn't referred to anywhere else). This means you can do this:
This, obviously, could be handled inside the cache for better encapuslation. But the point is that when garbage-collection runs, the items in the cache are fair game for collection. The obvious problem is that the GC will reap objects in random order, or it may even reap the most-recently-allocated objects first. But if that's a problem for you then you can probably make your cache smarter, such as keeping strong references to newer objects.
Here's a link in case you ever get the urge to write some Java code.
While I'm sure you can point to a Java program which takes 5 minutes to start, these are the exception in the Java world, not the rule. In fact, the JVM takes seconds (a small number of seconds, less than 5) on all of my systems, including P3's as well as older AMD's. It takes less than 2s on my Athlon 4800+. I can go from Tomcat not running to it being completely started up and serving requests in 4s on my PC, and 11s on my older G5. My IDE, which is extremely feature-rich, takes ~90s to get to where I can start typing code from a completely cold start, in a middling (30K LOC) project. That includes a ton of lib caching and such, so that it can do incremental compilation in real time. It takes 16s to restart once that caching is done.
.Net in memory for your C# program, and that's no small chunk of RAM. The first time you start up a .Net app of any size you're going to be waiting for it to draw for just as long. And it's no better for scripts, just how fast does Mono start up anyway?
As far as guzzling RAM goes, all programs these days guzzle RAM, and they're right to do so, because RAM is cheap. This is not unique to Java, the IDE I mentioned uses 56M once it's really going, surely not an unreasonable amount of memory for such a complex app. Java is still not the best choice for small scripts, Python would be better if your program is going to be mostly startup/shutdown time. C# is no better than Java though. Don't forget you need most of
However, your solution is interesting. Using the null termination for your temporary space is an interesting way of saving a byte. But you're right, I'd prefer an actual variable to be used.
Ever considered a career in programming?
It sounds like there is something wrong with your laptop. I have a 1.3 GHz desktop machine at home. I run Azereus and it's as fast as any other application.
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
> Delphi is Windows only and owned by a company with a very questionable future (forget about Kylix). C# is > basically Windows only as well.
The Borland implementation of Delphi is, but others aren't:
http://lazarus.freepascal.org/
If Java was as bad as you say it would NOT be so popular in embedded devices. There is a huge disconnect between the archaic concept of Java as a slow pig and the real world popularity of Java on small systems like printers and cell phones.
Doesn't that use J2ME though? And, I would assume, J2ME gives you less facilities than J2SE and J2EE, so are we comparing like with like?
Not saying, you're wrong, I just want to know if the comparison is valid.
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Additionally, even if it takes more instructions per call, glibc malloc() will return freshly used "hot" memory to you, while the JVM will return the coldest memory to you--the author of the article admits that this is a problem, but it is worse than they say.
First, you definately have an L1 cache miss. On a P4, that is a 28 cycle penalty. You also likely have an L2 cache miss. Fast DDR2 memory has an access latency of about 76 ns--call it 220 processor clock cycles. If you miss the D-TLB, then add another 57 cycles. Total cost of your "fast" allocator--305 cycles of memory access latency. 305 is somewhat higher than 100.
Even worse, if you are on a system with a lesser amount of memory, you may miss main memory entirely, causing a page fault. That'll cost you several milliseconds waiting for the disk. That really really hurts, since 8 milliseconds latency from disk is twenty-four million cycles at 3 GHz. 24 million, 24,000,000, 2.4 * 10^6, no matter how you write it, that's a lot of cycles to allocate memory.
All of a sudden, 100 instructions to hit hot memory seems cheap.
And I can always tune garbage collection performance by forcing a garbage collect when I know my app's got the time, like outside of a loop or before creating more objects in storage.
No, you can't. At most you can give the JVM a hint that it would be nice to collect garbage now. The JVM is free to ignore that hint, and many JVMs will do just that.
Blasphemy is a human right. Blasphemophobia kills.
Did you miss the part of the article where he explains how the new JVM performs memory allocation on the stack through escape analysis?
The allocators in the current JVMs are smart enough to optimize for cache misses
Allocation in any modern JVM is a pointer increment, nothing more
So which is it? Smart enough to optimize for cache misses or just a pointer increment? You can't have both.
The author of the article claims that, for 'new' objects, a modern VM uses a "stop-and-copy" style garbage collector with a few twists to cheapen the liveness detection (i.e. escape analysis). A disadvantage of such an allocator is that any new allocation to the heap will happen in some of the least-recently used memory. Also, every time you GC, you have to relocate all of your memory, forcing you to horribly pollute your cache, and turning all of your previously hot pages cold.
unlike malloc() which returns whatever the OS deems ok at the moment.
malloc() is a library call, not a system call. The OS is only involved if you need to extend the heap or if you've just allocated a very large object. So far as 'deems ok at the moment', the most recently used memory is what is generally returned (plus or minus size fitting to keep fragmentation under control). This maximizes locality, and minimizes your physical footprint. With a smaller resident set, more physical memory is available for things like other programs, disk buffers/caches, etc. which leads to even better performance.
A simple allocator like described in the article may be "cheap" in terms of cycles, but it is very expensive in its side-effects.
OK, I'll enlighten you. Well, not you, who stubbornly refuse to be enlightened, but somebody else who may be reading.
With a stack or with copying GC, you do allocation by doing a simple add (or subtract) and a check for overflow. On many architectures the check can be implemented using dynamic address translation (virtual memory) capability of the hardware.
With a stack you must pop the stack when you're done. Another simple arithmetic operation. With garbage collection, you do nothing at this time. Running score: stack 2 operations (push,pop); GC 1 operation (allocate).
But with GC you eventuallly run out of space so you copy all the in-use storage. Here's a formula:
That is, the total number of pieces of in-use storage is strictly less (and in practice substantially less) that the number of allocations. Running score: stack 2 operations * #allocations; GC 1 operation * #allocations + 1 operation * (substantially less than #allocations).That is, for each allocation a stack regimen does a push and a pop, and GC does an allocation and some fraction (substantially less than 1) of a copy.
While I'm at it I'll point out that there are many cases in which procedures may be implemented without using dynamic allocation - stack or GC-ed - at all . Your allusion to that mystical compiler that works some sort of stack magic "for free" is simply wrong.