Java Performance Urban Legends
An anonymous reader writes "Urban legends are kind of like mind viruses; even though we know they are probably not true, we often can't resist the urge to retell them (and thus infect other gullible "hosts") because they make for such good storytelling. Most urban legends have some basis in fact, which only makes them harder to stamp out. Unfortunately, many pointers and tips about Java performance tuning are a lot like urban legends -- someone, somewhere, passes on a "tip" that has (or had) some basis in fact, but through its continued retelling, has lost what truth it once contained. This article examines some of these urban performance legends and sets the record straight."
The best tip in the article, which really applies to any language (even to choice of languages), is IMHO
"Save optimizations for situations where performance improvements are actually needed, and employ optimizations that will make a measurable difference."
One thing to remember is that Java is a 'marketed' language. Hence, be aware of inevitable corporate propaganda. That's not to say that Java is bad, but it is heavily pushed.
Here's a bit of an antidote: Why Java will always be slower than C++
But that's not really a good thing. Sun pushed the JIT on the theory that it would address performance problems. It didn't. The Perl and Python runtimes are much slower than Java's, but Perl and Python applications generally start up much faster and are considerably more responsive.
Java is as sluggish as ever, and more bloated than it has ever been. What is really responsible for Java's poor performance for real-world applications is its class loading, memory footprint, and just plain awful library design and implementation.
I'm all for calling a spade a spade, but you can't have your cake and eat it too.
JNI is the Java NATIVE INTERFACE. For those who don't already know, it's the interface to native code, and through that code to the underlying operating system. If the OS misbehaves, hiccups, or is inconsistent, when did it become JAVA's responsibility to clean up? Apparently when somebody decided that JAVA was getting a black eye because OS call foo(bar) crashed the application or, better yet, didn't behave exactly like foo(bar) on every other OS that provides a JVM.
Don't like AWT? Well, maybe that's because it's built on top of JNI. Enough said.
Don't like Swing? Well, you'd better like AWT. If you don't want the OS to do your GUI work and you don't want the JVM to do your GUI work, maybe you should just get a dry-erase marker. You can draw the boxes you need on the screen, provided you use a tissue between display updates.
String requires no more attention than any other bit of JAVA code. If you create dozens of objects for the sole purpose of garbage collection, you either just learned JAVA, you're unaware of what you're doing, or you don't care.
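To make the point about throwaway objects concrete, here is a minimal Java sketch (the class and method names are illustrative, not taken from the article) contrasting string concatenation in a loop with a single reused StringBuilder. Both return the same text; the first allocates a fresh String on every iteration.

```java
public class JoinDemo {
    // Naive version: every += allocates a new String holding all the text
    // accumulated so far, so the previous String immediately becomes garbage
    // and the prefix is copied again on each iteration.
    static String joinNaive(String[] parts) {
        String out = "";
        for (String p : parts) {
            out += p;
        }
        return out;
    }

    // Builder version: one StringBuilder grows in place (amortized), and only
    // the final toString() creates the result String.
    static String joinBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"urban", " ", "legend"};
        System.out.println(joinNaive(parts));   // prints "urban legend"
        System.out.println(joinBuilder(parts)); // prints "urban legend"
    }
}
```

For three strings the difference is irrelevant; it only becomes measurable when the loop runs many thousands of times, which is exactly when it is worth changing.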
And about garbage collection. JAVA's garbage collection may not be your cup of tea, but neither are the memory leaks that are still being cleaned up in systems that lack automatic garbage collection.
So pick your poison. If JAVA isn't perfect, that doesn't make it horrible. JAVA is a good language by most standards; if you dislike it, be honest and say it isn't good by your standards.
My biggest reason for liking JAVA is that it forces people to stop writing bad C code. Which is exactly what it was designed to do.
However, the article perpetuates another myth: "Synchronization should be easy. The more things you synchronize, the better off you are."
My hard experience says otherwise. First, making multithreaded programs work correctly is very hard, so multiple threads should be avoided if at all possible. In many cases you can sidestep these problems by using a function like "select()" in a single-threaded program (which Java has offered only since J2SE 1.4, via java.nio.channels.Selector). Even though it looks harder to program, it ends up being easier to debug.
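Here is a minimal sketch of that single-threaded, select()-style approach using java.nio.channels.Selector. The class and method names are hypothetical, and in-process Pipe channels stand in for sockets only so the example needs no network; one thread waits on every channel and handles whichever is ready, with no locks anywhere.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SelectLoop {
    // Opens `count` pipes, writes one message into each, then reads them all
    // back from a single thread by multiplexing with one Selector.
    static List<String> readAll(int count) {
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < count; i++) {
                Pipe pipe = Pipe.open();
                pipe.sink().write(ByteBuffer.wrap(("msg" + i).getBytes(StandardCharsets.US_ASCII)));
                pipe.sink().close(); // reader sees end-of-stream after the message
                Pipe.SourceChannel source = pipe.source();
                source.configureBlocking(false); // required before registering
                // The attachment accumulates the bytes read from this channel only.
                source.register(selector, SelectionKey.OP_READ, new StringBuilder());
            }
            ByteBuffer buf = ByteBuffer.allocate(64);
            int open = count;
            while (open > 0) {
                selector.select(); // block until at least one channel is readable
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // the selected-key set is not cleared automatically
                    Pipe.SourceChannel source = (Pipe.SourceChannel) key.channel();
                    StringBuilder acc = (StringBuilder) key.attachment();
                    buf.clear();
                    int n = source.read(buf);
                    if (n < 0) {
                        // End of stream: the message for this channel is complete.
                        received.add(acc.toString());
                        source.close(); // closing the channel also cancels its key
                        open--;
                    } else {
                        buf.flip();
                        acc.append(StandardCharsets.US_ASCII.decode(buf));
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        // Order depends on which channels the Selector reports as ready first.
        System.out.println(readAll(3));
    }
}
```

All state lives in the one thread and in each key's attachment, which is what makes this style easy to reason about: there is never a second thread that could observe a half-updated buffer.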
However, sometimes you just can't avoid threads. IMHO, adding "synchronized" as a language keyword and encouraging easy creation of threads was a mistake. The keyword doesn't begin to solve your problems; for example, it does nothing to help you avoid deadlocks. In fact, sprinkling synchronized blocks around your program is a recipe for deadlocks and unexpected, timing-dependent buggy behavior.
If you must use multiple threads, there should be one main thread that runs almost all of the program's logic, and a set of highly constrained, carefully controlled worker threads. These threads should not interact with any other (mutable) data structures in the program. Ideally, there should be at most two synchronization points in the program: a work queue and a results queue. The elements of these queues should package up all of the state needed for a worker thread to solve a piece of a problem or deliver its results.
With an approach like this that has minimal synchronization, there's no need to add a keyword to the language or put synchronization into many library container classes. And of course, performance is hardly an issue at all when you only synchronize twice per worker thread run.
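The two-queue design above can be sketched in Java with java.util.concurrent.BlockingQueue (which arrived in Java 5, after this discussion). This is a minimal sketch under stated assumptions: the Task and Result classes, the sum-of-squares workload, and the STOP sentinel are hypothetical placeholders. Note that no synchronized keyword appears; all coordination goes through the work queue and the results queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuePipeline {
    // A task carries all the state a worker needs; workers touch nothing else.
    static final class Task {
        final int id;
        final long n;
        Task(int id, long n) { this.id = id; this.n = n; }
    }

    // A result carries everything the main thread needs back.
    static final class Result {
        final int id;
        final long sumOfSquares;
        Result(int id, long sumOfSquares) { this.id = id; this.sumOfSquares = sumOfSquares; }
    }

    // Sentinel telling a worker to exit; compared by identity.
    private static final Task STOP = new Task(-1, 0);

    // Computes 1^2 + ... + n^2 for each input on `workers` threads.
    // The two queues are the only objects shared between threads.
    static long[] run(long[] inputs, int workers) {
        BlockingQueue<Task> work = new LinkedBlockingQueue<>();
        BlockingQueue<Result> results = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Task task = work.take(); // synchronization point 1
                        if (task == STOP) {
                            return;
                        }
                        long sum = 0;
                        for (long i = 1; i <= task.n; i++) {
                            sum += i * i;
                        }
                        results.put(new Result(task.id, sum)); // synchronization point 2
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        long[] out = new long[inputs.length];
        try {
            for (int i = 0; i < inputs.length; i++) {
                work.put(new Task(i, inputs[i]));
            }
            for (int w = 0; w < workers; w++) {
                work.put(STOP); // one sentinel per worker
            }
            // The main thread collects results in whatever order they finish.
            for (int i = 0; i < inputs.length; i++) {
                Result r = results.take();
                out[r.id] = r.sumOfSquares;
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] sums = run(new long[] {1, 2, 3, 10}, 2);
        System.out.println(java.util.Arrays.toString(sums)); // [1, 5, 14, 385]
    }
}
```

Because each Task is immutable and owned by exactly one worker at a time, the workers never need to lock anything; the queues do all the handoff.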
Yes, if I need speed, I use C, the same as anyone else. If I am writing a Web application, I use Java. That's an area where Java excels. And maybe I'll get lucky enough to be able to code a project in Assembly or Lisp, who knows? Programming does not follow the "jack of all trades, expert at none" theory. General concepts map well across the spectrum.
I find it discouraging that so many programmers only want to learn enough about their job to be merely good enough. Don't they feel any pride, or any desire to excel at something?
Coders who can only handle one language should be paid minimum wage; that is all they are worth. That is because it is neither the language nor the implementation that is important. It is the knowledge of how to program which will ensure your career and pay your bills.
A lot of them are things that actually used to be good advice, but for some reason or another (changes in hardware, compilers, etc.) aren't anymore. For example, it used to be a good idea in C to iterate through arrays by incrementing a pointer and dereferencing it instead of incrementing an index and then using array subscripting -- that way you had one increment per iteration, instead of one increment plus one offset calculation (basically you saved the addition that takes place during the array subscripting). However, on many modern C and C++ compilers in many situations, array subscripting will actually be faster than the pointer-incrementing method, because it's easier for the compiler to optimize. [Reference: Michael J. Scott, Programming Language Pragmatics (Morgan Kaufmann, 2000).]
There's quite a bit of other stuff like this out there as well.
Java is not always slower. Java's interpreted nature is generally seen as a weakness, but it has advantages too. For example, the JIT has profiling data immediately at hand when optimizing, whereas ahead-of-time compilers don't. Even when ahead-of-time compilers do use profile feedback, that profile may not be representative of how the program is currently being used.
Try writing a simple recursive Fibonacci number calculator in both C++ and Java. The Java one is faster when run on a JIT-enabled JVM. Of course, that is a contrived example, but it shows that just-in-time compiling can be faster.
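For reference, the Java side of that contrived benchmark might look like the sketch below. The class name and the default n = 35 are arbitrary choices, not from the comment. Timing is taken inside the process, so JVM startup is excluded; a C++ version would be the same function with a different main, and the outcome will vary with the JVM and compiler flags used.

```java
public class Fib {
    // Deliberately naive doubly recursive Fibonacci: exponential running time,
    // dominated by call overhead, which is what this micro-benchmark exercises.
    static int fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 35;
        long start = System.nanoTime();
        int result = fib(n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("fib(" + n + ") = " + result + " in " + elapsedMs + " ms");
    }
}
```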
That's a dumb benchmark. That will tell you that the program *starts* slow...and that, perhaps, invoking an application for each time you want to do something would be roughly the equivalent of starting a new web browser (IE on Windows doesn't count) every time you want to look at a new web page...or booting your operating system every time you want to run a program.
People don't generally write one-off small apps they intend to run hundreds of times a day in Java. That's not what it's designed to do.
If you want to compare performance, do something realistic: run it once in each of your two or so languages to get a baseline, and then run it a few thousand times. Something more like this:
http://www.bagley.org/~doug/shootout/
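A minimal sketch of the "run once, then run a few thousand times" idea on the Java side, separating the cold first run from later, JIT-compiled runs inside one JVM. The workload and the iteration count are hypothetical placeholders; for serious measurements a dedicated harness such as OpenJDK's JMH is more trustworthy than hand-rolled timing.

```java
import java.util.function.LongSupplier;

public class WarmupBench {
    // Written after every run so the JIT cannot simply discard the work as unused.
    static volatile long blackhole;

    // Runs `workload` `iterations` times in the same JVM and returns the elapsed
    // nanoseconds of each run, so the first (cold) run can be compared with the
    // later runs after the JIT has had a chance to compile the hot code.
    static long[] timeRuns(LongSupplier workload, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            blackhole = workload.getAsLong();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Hypothetical example workload: sum of i*i for i in [0, 1,000,000).
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] nanos = timeRuns(WarmupBench::workload, 2000);
        long rest = 0;
        for (int i = 1; i < nanos.length; i++) {
            rest += nanos[i];
        }
        System.out.println("first run:     " + nanos[0] / 1000 + " us");
        System.out.println("later average: " + rest / (nanos.length - 1) / 1000 + " us");
    }
}
```

Reporting the first run separately keeps startup and warm-up costs visible instead of letting them either dominate or vanish from the comparison.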
Not all programs have the same requirements. In most cases developer time is much more valuable than CPU time.
Stay as close to the CPU core as possible, but far enough away to be effective. C and C++ are the only languages that fill this role.
Ridiculous. Most good programming techniques are independent of language. If you can't develop effectively in anything but C or C++, then your "skillz" could use some work.
Think before you code.
Yes. Think whether the performance gained by using a low-level language is actually significant, and if so whether it offsets the increased development time and greater risk of uncontrolled failures (e.g. exploitable buffer overruns).
C is a fine language, and is often necessary. But it's a premature optimization to insist on always using it because of nebulous performance concerns.
The C++ world is full of myths about what does and doesn't enhance performance. Amongst my favourites...
In each of these cases, there is some overhead involved if you actually use the language feature, but generally not otherwise with any recent compiler. However, those overheads are usually less than what hand-crafting the equivalent functionality (e.g., long jumps, function look-up tables a la C) would incur. Furthermore, if you actually understand the implications of these features, you can keep the overhead way down. The next time I see someone criticise templates for code bloat, and then demonstrate in the next post that they've never come across templated wrappers for generic base classes, I'm going to have to lecture them. }:-)
On the flip side...
Most of these get much more credit than they deserve. The first is true often, but not always: it sometimes shafts the optimiser in many compilers. The second is not true with any recent compiler. The third is true sometimes, but not nearly as often as you might expect: optimisers miss many of the apparent (to humans) possibilities anyway, and spot some of the others with or without a const there.
As always, the rule of thumb is to write correct, maintainable code first, and then to use compiler-specific, profiler-induced hackery where (and only where) required. Whether you're writing a database or a graphics engine, this is pretty much always good advice.