Java Performance Urban Legends
An anonymous reader writes "Urban legends are kind of like mind viruses; even though we know they are probably not true, we often can't resist the urge to retell them (and thus infect other gullible "hosts") because they make for such good storytelling. Most urban legends have some basis in fact, which only makes them harder to stamp out. Unfortunately, many pointers and tips about Java performance tuning are a lot like urban legends -- someone, somewhere, passes on a "tip" that has (or had) some basis in fact, but through its continued retelling, has lost what truth it once contained. This article examines some of these urban performance legends and sets the record straight."
Urban legends are kind of like mind viruses
Are you new here? Usually people call this kind of thing a meme.
The best tip in the article, which really applies to any language (even to choice of languages), is IMHO
"Save optimizations for situations where performance improvements are actually needed, and employ optimizations that will make a measurable difference."
Also, Swing is a bloated pig.
SWT rules!
Interesting. A text that says some things in Java are not as slow as people believe, and yet it fails to deliver any proof. For example, it says synchronized methods are not slow, and yet it includes no benchmark or test to back up that claim.
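For what it's worth, the kind of test being asked for might look roughly like the sketch below. It only covers the uncontended, single-threaded case; the class and method names (SyncBench, incPlain, incLocked) are made up for illustration, and the timings will vary a lot between JVMs and machines, so no numbers are claimed here.

```java
// Hypothetical micro-benchmark sketch: uncontended synchronized vs. plain method calls.
// Results depend heavily on the JVM, hardware, and warm-up; treat them as rough.
class SyncBench {
    private long plain;
    private long locked;

    void incPlain() { plain++; }
    synchronized void incLocked() { locked++; }

    long plainCount() { return plain; }
    synchronized long lockedCount() { return locked; }

    // Calls the chosen method `iterations` times and returns the elapsed nanoseconds.
    static long time(SyncBench b, boolean useLock, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (useLock) {
                b.incLocked();
            } else {
                b.incPlain();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        SyncBench b = new SyncBench();
        int n = 5_000_000;
        // Warm-up pass so the JIT has a chance to compile both paths.
        time(b, false, n);
        time(b, true, n);
        System.out.println("plain ns:        " + time(b, false, n));
        System.out.println("synchronized ns: " + time(b, true, n));
    }
}
```

A contended test (several threads hitting the same lock) would be a separate experiment and could give very different results.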
And about the strings example:
If you want to prove that the immutable String class is not slow, the right way to do it is to write a program that performs a lot of string operations and then compare its speed with one of the existing non-immutable string classes for Java.
I really wish that the Slashdot editors would read the stories and dismiss the ones without any information.
(Yes, I know, the Slashcode is free, so I could just make my own news site, but life is too short, and my studies take too much time.)
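On the strings example: a comparison along the lines suggested above could be sketched like this, using the standard StringBuffer as the mutable class. ConcatBench and its method names are hypothetical, the loop size is arbitrary, and a single timed run like this is only a rough indication.

```java
// Hypothetical comparison sketch: repeated String concatenation vs. a mutable StringBuffer.
class ConcatBench {
    // Builds the result with immutable Strings; each += produces a new String object.
    static String withString(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += 'x';
        }
        return s;
    }

    // Builds the same result by appending to one mutable buffer.
    static String withBuffer(int n) {
        StringBuffer sb = new StringBuffer(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20_000;
        long t0 = System.nanoTime();
        String a = withString(n);
        long t1 = System.nanoTime();
        String b = withBuffer(n);
        long t2 = System.nanoTime();
        System.out.println("String +=    ns: " + (t1 - t0));
        System.out.println("StringBuffer ns: " + (t2 - t1));
        System.out.println("same result: " + a.equals(b));
    }
}
```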
One thing to remember is that Java is a 'marketed' language. Hence, be aware of inevitable corporate propaganda. That's not to say that Java is bad, but it is heavily pushed.
Here's a bit of an antidote: Why Java will always be slower than C++
Good advice. People sometimes seem to want to solve the problem before knowing what the problem statement is. While their actions may not degrade performance significantly, they often times do not help.
I've learned over time that everything is relative. There is no cut and dried right and wrong in a lot of cases, but degrees of both. The real answer depends on your need, and not all needs are the same.
But that's not really a good thing. Sun pushed on the JIT on the theory that that would address performance problems. It didn't. The Perl and Python runtimes are much slower than Java's, but Perl and Python applications generally start up much faster and are considerably more responsive.
Java is as sluggish as ever, and more bloated than it has ever been. What is really responsible for Java's poor performance for real-world applications is its class loading, memory footprint, and just plain awful library design and implementation.
Finding where your software spends most of its time can be hard. Having a tool measure the resource and time consumption of regions of source code is critical in finding bottlenecks and improving performance.
Why? Because it depends so much on the performance optimizations the JVM employs.
Let's take them one by one:
1. Final methods and classes - when you call a final method from the same class you save a lookup in the virtual method table (there is no doubt about which method is going to be called, as it couldn't have been overridden in a descendant), and furthermore the method can be inlined. On a "stupid" JVM (read: from Sun) you won't see any difference; on an optimized one you will.
2. Synchronization can become a bottleneck on SMP systems, because it implies cache synchronization (exiting a synchronized block is a memory barrier) - you clearly aren't going to see it on a single processor. But not using synchronization is just as bad: you should use synchronization with *all* variables that are shared, because you do want memory barriers for correctness.
3. Immutable objects - this one clearly depends on the garbage collector that you use.
Conclusion: the performance of these tricks depends on two things - your JVM and Amdahl's law (how often these improvements are going to manifest themselves).
The Raven
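To make the first two points concrete, here is a small sketch of a class whose shared state is only ever touched under a lock, with methods marked final. SharedCounter and run are hypothetical names, and whether final actually buys anything is, as said above, up to the JVM.

```java
// Hypothetical sketch: final methods plus synchronized access to shared state.
class SharedCounter {
    private int value; // shared between threads, only touched under the lock

    // final: cannot be overridden, so a JVM is free to skip virtual dispatch or inline it.
    final synchronized void increment() { value++; }

    // Reads also take the lock, so the reader sees writes made by other threads.
    final synchronized int get() { return value; }

    // Runs `threads` threads that each increment `perThread` times; returns the final count.
    static int run(int threads, int perThread) {
        SharedCounter c = new SharedCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    c.increment();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // prints 400000
    }
}
```

Drop the synchronized keyword from increment() and the final count is no longer guaranteed, which is the correctness half of the argument.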
I'm all for calling a spade a spade, but you can't have your cake and eat it too.
JNI is the NATIVE INTERFACE. For those that don't already know, that's the interface to the underlying operating system. If the OS misbehaves, hiccups, or is inconsistent, when did it become JAVA's responsibility to clean up? When somebody decided that JAVA was getting a black eye because OS call foo(bar) was crashing the application, or better yet didn't behave exactly like foo(bar) on every OS that provides the JVM.
Don't like AWT? Well, maybe that's because it's built on top of JNI. Enough said.
Don't like Swing? Well, you'd better like AWT. If you don't want the OS to do your GUI work and you don't want the JVM to do your GUI work, maybe you should just get a dry erase marker. You can draw the boxes you need on the screen provided you use a tissue between display updates.
String requires no more attention than any other bit of JAVA code. If you create dozens of objects for the sole purpose of garbage collection, you either just learned JAVA, you're unaware of what you're doing, or you don't care.
And about garbage collection. JAVA's garbage collection may not be your cup of tea, but neither are the memory leaks that are still being cleaned up in systems that lack automatic garbage collection.
So pick your poison. If JAVA isn't perfect, that doesn't make it horrible. JAVA is a good language by most standards, but be honest by stating that it isn't good by your standards.
My biggest reason for liking JAVA is that it forces people to stop writing bad C code. Which is exactly what it was designed to do.
Remember the DOS days? Why was GWBASIC so slow? Why were QuickBASIC programs so slow? Why, later, were Visual Basic programs so slow?
Because those versions of BASIC were all, essentially, interpreted.
Java is essentially an interpreted language, despite JITs and JNIs and whatever.
Interpreted languages are slow. That's why no one writes full-blown applications in BASH.
My journal has hot
However, the article perpetuates another myth: "Synchronization should be easy. The more things you synchronize, the better off you are."
My hard experience says otherwise. First off, making multithreaded programs work correctly is very hard. Therefore, multiple threads should be avoided if at all possible. You can avoid a lot of these problems in many cases if you use a function like "select()" in a single-threaded program (which, IIRC, Java unfortunately doesn't support). Even though it looks harder to program, it ends up being easier to debug.
However, sometimes you just can't avoid threads. IMHO, adding "synchronized" as a language keyword and encouraging easy creation of threads was a mistake. That doesn't begin to solve your problems. For example, it does nothing to help you avoid deadlocks. In fact, sprinkling synchronized blocks around your program is a recipe for deadlocks and unexpected timing-dependent buggy behavior.
If you must use multiple threads, there should be one main thread that runs almost all of the program's logic, and a set of highly constrained, carefully controlled worker threads. These threads should not interact with any other (mutable) data structures in the program. Ideally, there should be at most two synchronization points in the program: a work queue and a results queue. The elements of these queues should package up all of the state needed for a worker thread to solve a piece of a problem or deliver its results.
With an approach like this that has minimal synchronization, there's no need to add a keyword to the language or put synchronization into many library container classes. And of course, performance is hardly an issue at all when you only synchronize twice per worker thread run.
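The two-queue arrangement described above could be sketched as follows. This uses java.util.concurrent.BlockingQueue from the standard library purely as a convenient queue; the class name QueueWorkers, the squaring workload, and the -1 stop marker are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one main thread, constrained workers, two shared queues.
class QueueWorkers {
    // Stop marker for workers; real work items in this sketch must be non-negative.
    private static final int STOP = -1;

    // Squares each input on worker threads; the only shared state is the two queues.
    static List<Long> squareAll(List<Integer> inputs, int workers) {
        BlockingQueue<Integer> work = new LinkedBlockingQueue<>();
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    while (true) {
                        int item = work.take();   // synchronization point 1
                        if (item == STOP) {
                            break;
                        }
                        results.put((long) item * item); // synchronization point 2
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[i].start();
        }
        try {
            for (Integer in : inputs) {
                work.put(in);
            }
            for (int i = 0; i < workers; i++) {
                work.put(STOP);
            }
            List<Long> out = new ArrayList<>();
            for (int i = 0; i < inputs.size(); i++) {
                out.add(results.take());
            }
            for (Thread t : pool) {
                t.join();
            }
            Collections.sort(out); // results arrive in arbitrary order
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<Integer> inputs = new ArrayList<>();
        Collections.addAll(inputs, 1, 2, 3, 4);
        System.out.println(squareAll(inputs, 2)); // prints [1, 4, 9, 16]
    }
}
```

Each queue element carries everything the worker needs, so the workers never touch any other mutable structure in the program.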
Yes, if I need speed, I use C, the same as anyone else. If I am writing a Web application, I use Java. That's an area where Java excels. And maybe I'll get lucky enough to be able to code a project in Assembly or Lisp, who knows? Programming does not follow the "jack of all trades, expert at none" theory. General concepts map well across the spectrum.
I find it discouraging that there are so many programmers who only want to learn as much about their job as it takes to be merely good enough. Don't they feel any pride, or any desire to excel at something?
Coders who can only handle one language should be paid minimum wage; that is all they are worth. That is because it is neither the language nor the implementation that is important. It is the knowledge of how to program which will ensure your career and pay your bills.
A lot of them are things that actually used to be good advice, but for some reason or another (changes in hardware, compilers, etc.) aren't anymore. For example, it used to be a good idea in C to iterate through arrays by incrementing a pointer and dereferencing it instead of incrementing an index and then using array subscripting -- that way you had one increment per iteration, instead of one increment plus one offset calculation (basically you saved the addition that takes place during the array subscripting). However, on many modern C and C++ compilers in many situations, array subscripting will actually be faster than the pointer-incrementing method, because it's easier for the compiler to perform certain optimizations with. [Reference: Michael L. Scott, Programming Language Pragmatics (Morgan Kaufmann, 2000).]
There's quite a bit of other stuff like this out there as well.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
I guess you already know that the OS reports each thread's memory separately, even when all threads are in fact using the same chunk of shared memory?
I had a funny time finding this one out for myself when a Java program I launched used 2x more than my physical memory and swap space combined.
Of course you may ask me why I was running an enterprise application on a 486, but that's another story...
I can just as easily pull some facts out of my ass. Jeez, you Java guys are thin-skinned. I work with Java every day and I agree with the original poster - it's fucking slow as hell to start. Grow up. After driving a supposedly turbo-charged 4 cylinder car for a couple of years I too began to think it was pretty fast - then I had the pleasure of driving an old V8 - no comparison. Be objective. Everything has its strengths and weaknesses. Java's libraries are its strength, and its startup time and memory footprint are definitely weaknesses.
Java is not always slower. Java's interpreted nature is generally seen as a weakness, but it has advantages too. For example, the JIT has profiling data immediately at hand when doing optimization, whereas compiled languages won't. Even in cases where compiled languages do use profile feedback, it may not be representative of the current program usage.
Try writing a simple recursive Fibonacci number calculator in both C++ and Java. The Java one is faster, when using a JIT enabled JVM. Of course, that is a contrived example, but it shows that just-in-time compiling can be faster.
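For anyone who wants to try it, the Java half of that experiment might look like the sketch below. The class name Fib and the choice of n = 35 are arbitrary, and this makes no claim about how the timing compares with any C++ build; that depends on the compilers, flags, and JVM involved.

```java
// Hypothetical sketch of the naive recursive Fibonacci benchmark.
class Fib {
    // Deliberately naive: exponential number of calls, which is what stresses the JIT.
    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        int n = 35;
        long start = System.nanoTime();
        long result = fib(n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("fib(" + n + ") = " + result + " in " + elapsedMs + " ms");
    }
}
```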
That's a dumb benchmark. That will tell you that the program *starts* slow...and that, perhaps, invoking an application for each time you want to do something would be roughly the equivalent of starting a new web browser (IE on Windows doesn't count) every time you want to look at a new web page...or booting your operating system every time you want to run a program.
People don't generally write one-off small apps they intend to run hundreds of times a day in java. That's not what it's designed to do.
If you want to compare performance, do something real-like and have it run once in your two or so languages to get a base, and then run them a few thousand times. Something more like this:
http://www.bagley.org/~doug/shootout/
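The "run it once for a base, then run it a few thousand times" idea can be sketched in a few lines. WarmupBench, measure, and the toy workload are hypothetical stand-ins; a real comparison would use a realistic task and more careful statistics.

```java
// Hypothetical sketch: compare the first (cold) run of a task with its average over many runs.
class WarmupBench {
    // Written to by the workload so the JIT cannot discard the computation entirely.
    static volatile int sink;

    // Toy stand-in for "something real-like".
    static int workload() {
        int sum = 0;
        for (int i = 0; i < 10_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Returns {first-run nanoseconds, average nanoseconds over `runs` further runs}.
    static long[] measure(Runnable task, int runs) {
        long start = System.nanoTime();
        task.run();
        long first = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        long average = (System.nanoTime() - start) / runs;
        return new long[] { first, average };
    }

    public static void main(String[] args) {
        long[] t = measure(() -> { sink = workload(); }, 5_000);
        System.out.println("first run ns:   " + t[0]);
        System.out.println("average run ns: " + t[1]);
    }
}
```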
-- The world is watching America, and America is watching TV.
There is a difference between using up all your memory and running out of memory. Java maxes out the memory that it's allowed to take in the circumstances I was talking about.
Suppose you have a tight loop that creates a few objects and then disposes of them. Further suppose that loop spikes the CPU, as would be the case for something under constant load or with a lot of data to process. Then you will be creating objects faster than the GC can get rid of them, because the GC is a lower priority than your main thread, which is doing the important stuff. GC is, by its nature, a lower priority task.
But sometimes it just needs to be higher priority. In the above case, the program will take up all the memory it's allowed. As you suggested elsewhere, you could simply not allow it to take up that much memory. But there might be valid cases where it needs that much memory. In these cases, virtual memory is the sad reality, but correct action is better than no action. However, you do not want all your cases to be this degenerate. So you are forced to decide between simple failure when much memory is needed, or maxing out a large portion of memory in all cases. I shouldn't have to make that choice.
As others have suggested, I could tweak the GC to deallocate faster. But this is a flaw. Java should make things simpler, not more difficult. Telling the computer when I'm done with an object is a simpler solution (to me, anyway) than having to tweak runtime parameters.
If I couldn't trust someone to use new and delete correctly, I wouldn't trust them to do much at all for me. Yes, there are many programmers that mess things up with these operators, but they aren't very good programmers. If you can't use new and delete or free and malloc correctly, then there's probably a lot of other things you can't do well either. Memory management is rather fundamental to computer science.
Now, the transparent memory access in C++ is certainly dangerous. But I'm not talking about direct access to memory. I'm talking about garbage collection on demand.
For example, I do not see how things would become much more dangerous in Java if you added a delete operator to complement the new operator. The operator would have the semantics that meant "delete, now". If you tried to access a deleted object, it would throw an exception. The level of danger afforded by such a feature would be vastly outweighed by the advantages wrought.
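Java has no such operator, but the "throw an exception on access after delete" half of the idea can be imitated at the library level. The sketch below is only that: ReleasableBuffer is a made-up class, and close() merely drops the reference, so the garbage collector still decides when the memory is actually reclaimed.

```java
// Hypothetical sketch: explicit release with an exception on use-after-release.
// This imitates the proposed semantics only; it does not force immediate deallocation.
class ReleasableBuffer implements AutoCloseable {
    private byte[] data;

    ReleasableBuffer(int size) {
        data = new byte[size];
    }

    // Returns the backing array, or throws if the buffer was already released.
    private byte[] checked() {
        if (data == null) {
            throw new IllegalStateException("buffer used after release");
        }
        return data;
    }

    int size() {
        return checked().length;
    }

    // "delete, now": drop the only reference so the array becomes collectable.
    @Override
    public void close() {
        data = null;
    }

    public static void main(String[] args) {
        ReleasableBuffer buf = new ReleasableBuffer(1024);
        System.out.println(buf.size()); // prints 1024
        buf.close();
        try {
            buf.size();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```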
While you can't state that a given idiom will improve performance of a program, you also can't state that a given idiom will have no effect, either. In particular, the author claims that making methods final has no effect, but fails to show it. The whole article was marked by the same lack of solid evidence the author was decrying. As far as I'm concerned, these Java idioms are still open to question, and the information content was minimal.
I used to be the best damn VB programmer there was until around version 6.0. Well, maybe not the best, but I sure did some cool stuff with VB. I got started years ago with GWBASIC too. IBM made one of the first PC compilers for BASIC called "BASIC Compilers 1.0 and 2.0" This was the pre-mouse era, 1985-87'ish days. For lack of a good editor, I used the GWBASIC interface. When we finally compiled with the 1.0 compiler, the code ran a good bit faster. Then, we got the 2.0 compiler. Oh god how slowly our code ran with 2.0. I don't know what happened with 2.0 but it just didn't cut it. We ended up selling our program compiled with the 1.0 compiler.
Later, around 1988, we got our hands on the first Microsoft QuickBASIC compilers. The compiled code then ran like lightning! We found that QuickBASIC used the new P-code interpreter. I used QuickBASIC quite a bit. Later, in the early 90's, I changed to VB around version 2.0. VB was quite nimble for program size and execution speed. But when we got version 4.0, it would compile to native code. This was the best performance I ever got out of VB before I realized...hey, the language just isn't giving me what I wanted...e.g....smaller size, and faster speed.
Actually, I had been programming in C since around 1989. And, the more I used C, the more skilled I got at writing C code. The break point came with VB 6.0. I just couldn't justify using VB anymore as it wasn't a good environment for writing command line system administration programs. I was entering into systems administration/programming as my main job, so my code needed to run fast, lean, and mean.
When the Java phenomenon hit, I saw clearly from my VB experience that this just wasn't going to be a pretty scene. On the one hand, you had plenty of newbies who really didn't know anything about programming spouting the Java mantra. And, with the Internet, it seemed the entire world was going Java. The two bubbles of Java and the Internet seemed to go hand and hand.
The state of Java now is pure and simply...just bloat. For example, just download and run Morpheus. The thing is written in Java, and for just a simple application, uses 54 Meg. What the heck, 54 Meg!!!??? And, this C# stuff that Microsoft is producing just isn't any better. A simple C# hello world program uses 11 Meg!
I just can't use applications that take seconds to minutes to load up, then use up all my memory, then run slow as Christmas. I'm a systems programmer, I write applications that load, do what they need to do as fast as possible, then completely unload and get out of the way. My apps have to run background'ed so they don't interfere with the users. I don't want my memory used for program space, it's supposed to be used for data space!
You might say that memory, disk, and GHz are cheap now. Well my response is...so are your coding skillz! The one thing I've realized over the years is that the more time you spend programming in these high level languages, the more real experience you lose where you could be coding smaller and more efficiently nearer to the core. I'm trying to say that maybe I wasted a good bit of time in VB when I shouldn't have. The same goes for the Java guys. We had a great programming language called C, and C++ that got pushed to the back-burner because of some people who decided that garbage collection and VM's were a "good thing". Well, maybe they are, but at the expense of expertise in programming? I mean come on, if you can't handle pointers, can you really call yourself a programmer?
I'm so tired of programming environments that keep people from shooting themselves in the foot, or coddle beginning coders. Programming is a skill and an art. It is a good bit like architectural design. Java and VB just turn things into manufactured housing. Do you "want" to live in a double-wide home, or one that was designed to be functional, strong, and beautiful?
I suppose I'm being too hard on everyone here. Heck, I still use VB
If you're going to bitch about somebody else not showing numbers, then why would anybody believe you when you don't post any numbers of your own? Especially if 2 out of the 3 things you mention are your own damn fault (using too many immutables and building too many objects). I wouldn't be surprised if you blamed lock contention for your slow programs (your #1)
That's an excellent example of what I consider the best rule for proper coding: make your goal as clear as possible. The code using array subscripting is easier for less-experienced maintenance programmers to work on and doesn't need to be replaced when you find a clever compiler does a better job.
Jesus of Nazareth did not die so we could enjoy eggs and chocolate bunnies!
Are you sure? Christians enjoy eggs and chocolate bunnies because despite the attempts of many clueless religious nuts, Christianity did not succeed in stamping out the natural human instinct for enjoying life, in the form of festivals and celebrations in which everyone can share - not just those who believe that their life is controlled by an imaginary being.
Celebrations of renewal and fertility in springtime, involving common symbols such as eggs and rabbits, were widespread before Christianity came along, and they continue today. In one sense, your sig is accurate, in that Jesus of Nazareth has nothing to do with it, other than the fact that the churches that exploit his name co-opted other traditions as part of their relentless assimilation of followers. But in another sense, the sig is wrong, since based on his record, I suspect Jesus of Nazareth would have no problem with people enjoying eggs and chocolate bunnies, even - and perhaps especially - on the anniversary of his death.
Most StringBuffer objects are not used from multiple threads. If you need synchronization, use it. In general you don't.
In particular, if you use a+b on String objects, the compiler will translate to use of a StringBuffer, even though that StringBuffer is manifestly not accessible from any other thread. I don't know if the JVM will optimize away the synchronization -- if it does, the analysis to allow it to do so is an overhead anyway.
The problems with StringBuffer are varied:
1. It is overcomplicated because it uses reference counting to avoid a single object allocation.
2. It guarantees thread safety even though that is not required for most of its uses.
3. The thread safety protects the integrity of the StringBuffer, but is not enough for most code that could use a StringBuffer from multiple threads: the granularity of the locking is too fine. This is the same flaw that led to the introduction of the unsynchronized collection classes (such as ArrayList to replace most uses of Vector).
4. It is overkill for the uses the Java compiler makes of it in performing string concatenation, and a performance penalty is paid.
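Point 3 is easy to show in code. Each StringBuffer.append() call is thread-safe on its own, but a two-step operation is not atomic unless the caller holds a lock across both steps. The sketch below (PairAppender is a hypothetical name) takes that outer lock on the buffer itself, at which point the buffer's internal locking adds nothing.

```java
// Hypothetical sketch: per-call thread safety is too fine-grained for compound operations.
class PairAppender {
    // Appends the pair "ab" `perThread` times from each of `threads` threads.
    static String run(int threads, int perThread) {
        StringBuffer sb = new StringBuffer();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // Without this block another thread could append between 'a' and 'b'.
                    synchronized (sb) {
                        sb.append('a');
                        sb.append('b');
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = run(4, 1_000);
        // With the outer lock, the result is nothing but intact "ab" pairs.
        System.out.println(s.length() + " chars, only pairs: " + s.replace("ab", "").isEmpty());
    }
}
```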
One other interesting thing to note is that my UnsychronizedStringBuffer did not have a default constructor. This forced developers to specify an estimate for the size of the buffer, and so avoided most reallocations -- another performance win, just by restricting an interface.
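The original class isn't shown, so the following is only a guess at its general shape: an unsynchronized buffer whose sole constructor demands a capacity estimate. SizedCharBuffer and everything in it are hypothetical. (Later JDKs ship java.lang.StringBuilder, an unsynchronized counterpart to StringBuffer, though it does keep a default constructor.)

```java
import java.util.Arrays;

// Hypothetical sketch: an unsynchronized buffer with no default constructor.
// Not thread-safe by design; intended for use from a single thread.
class SizedCharBuffer {
    private char[] chars;
    private int length;

    // The only constructor: the caller must think about the expected size up front.
    SizedCharBuffer(int estimatedCapacity) {
        if (estimatedCapacity < 0) {
            throw new IllegalArgumentException("capacity must be non-negative");
        }
        chars = new char[Math.max(estimatedCapacity, 1)];
    }

    // Appends s, growing the array only if the estimate turned out too small.
    SizedCharBuffer append(String s) {
        int needed = length + s.length();
        if (needed > chars.length) {
            chars = Arrays.copyOf(chars, Math.max(needed, chars.length * 2));
        }
        s.getChars(0, s.length(), chars, length);
        length = needed;
        return this;
    }

    @Override
    public String toString() {
        return new String(chars, 0, length);
    }

    public static void main(String[] args) {
        SizedCharBuffer buf = new SizedCharBuffer(16);
        buf.append("Hello, ").append("world");
        System.out.println(buf); // prints Hello, world
    }
}
```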
I'm using JBuilder 8 this week. It's as good as any other desktop application on my Thinkpad R32. Sometimes I demo it to people and the coup de grace is always "... and it's written in Java," which usually gets the response "but... I thought Java was slow!".
Bad interactive java is easy to write, just like bad MFC applications are easy to write.
The C++ world is full of myths about what does and doesn't enhance performance. Amongst my favourites...
In each of these cases, there is some overhead involved if you actually use the language feature, but generally not otherwise with any recent compiler. However, those overheads are usually less than hand-crafting the equivalent functionality (e.g., long jumps, function look-up tables a la C) would incur. Furthermore, if you actually understand the implications of these features, you can keep the overhead way down. The next time I see someone criticise templates for code bloat, and then demonstrate in the next post that they've never come across templated wrappers for generic base classes, I'm going to have to lecture them. }:-)
On the flip side...
Most of these get much more credit than they deserve. The first is true often, but not always: it sometimes shafts the optimiser in many compilers. The second is not true with any recent compiler. The third is true sometimes, but not nearly as often as you might expect: optimisers miss many of the apparent (to humans) possibilities anyway, and spot some of the others with or without a const there.
As always, the rule of thumb is to write correct, maintainable code first, and then to use compiler-specific, profiler-induced hackery where (and only where) required. Whether you're writing a database or a graphic engine, this is pretty much always good advice.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.