Java Performance Tuning, 2nd Ed.
Every developer has written a microbenchmark (a bit of code that does something 100-1000 times in a tight loop and measures the time taken by the supposedly "expensive operation") to try to prove an argument about which way is "more efficient" based on execution time. The problem is that in a dynamic, managed environment like the 1.4.x JVM, there are more factors that you don't control than ones that you do, and it can be difficult to say whether one piece of code will be "more efficient" than another without testing with actual usage patterns. The second edition of Java Performance Tuning provides substantial benchmarks (not just simple microbenchmarks) with thorough coverage of the JDK, including loops, exceptions, strings, threading, and even underlying JVM improvements in the 1.4 VM. This book is one of a kind in its scope and completeness.
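Below is a minimal sketch of the kind of timing harness described above, with one common mitigation added: an untimed warm-up pass before measuring, so that class loading and JIT compilation are less likely to dominate the result. MicroBench and timeNanos are hypothetical names. System.nanoTime() is a Java 5+ call; a 1.4-era harness would have used System.currentTimeMillis(). Even with a warm-up, this is still a microbenchmark and remains exposed to the uncontrolled factors the review mentions.

```java
final class MicroBench {
    // Runs `task` warmupIterations times without timing it, then times
    // `iterations` further runs and returns the elapsed nanoseconds.
    // Dead-code elimination, GC pauses and the VM mode (-client, -server,
    // -Xint) can all still skew the number this returns.
    static long timeNanos(Runnable task, int warmupIterations, int iterations) {
        for (int i = 0; i < warmupIterations; i++) {
            task.run(); // warm-up: give the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }
}
```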
The Gory Details
The best part of this book is that it not only tells you how fast various standard Java operations are (sorting strings, dealing with exceptions, etc.), but the author has also kept all of the timing information from the previous edition. This shows you how the VM's performance has changed from version 1.1.8 up to 1.4.0, and it's very clear that things are getting better. The author also breaks out the timing information for three different flavors of the 1.4.0 JVM: mixed interpreted/compiled mode (standard), server (with HotSpot), and interpreted mode only (no runtime optimization applied).
Part 1 : Lies, Damn Lies and Statistics
The book starts off with three chapters of sage advice about the tools and process of profiling/tuning. Before you spend any time profiling, you have to have a process and a goal. Without setting goals, the tuning process will never end and it will likely never be successful.
The author outlines a general strategy that will give you a great starting point for your tuning task forces. Chapter 2 presents the profiling facilities that are available in the Java VM and how to interpret the results, while chapter 3 covers VM optimizations (different garbage collectors, memory allocation options) and compiler optimizations.
Part 2 : The Basics
Chapters 4-9 cover the nuts-and-bolts, code-level optimizations that you can implement. Chapter 4 discusses various object allocation tweaks, including lazy initialization, canonicalizing objects, and how to use the different types of references (Phantom, Soft, and Weak) to implement priority object pooling. Chapter 5 tells you more than you ever wanted to know about handling Strings in Java. Converting numbers (floats, decimals, etc.) to Strings efficiently, string matching: it's all here in gory detail with timings and sample code.
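Here is one way to canonicalize objects, sketched under assumptions. It is not the book's code, and Canonicalizer is a hypothetical name. A WeakHashMap maps each value to a weak reference to its canonical instance, so equal values share one object while unused canonical instances can still be garbage-collected. The map value has to be a WeakReference rather than the object itself: a strong value would keep its own key reachable, and the entry would never be cleared.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical helper: hands out one shared instance per distinct (equal) value. */
final class Canonicalizer<T> {
    // Keys are held weakly by WeakHashMap; values are wrapped in WeakReference
    // so they do not strongly reference their own keys.
    private final Map<T, WeakReference<T>> table = new WeakHashMap<>();

    synchronized T canonicalize(T candidate) {
        WeakReference<T> ref = table.get(candidate);
        T existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing; // reuse the instance we already have
        }
        table.put(candidate, new WeakReference<>(candidate));
        return candidate; // first time seen: it becomes the canonical instance
    }
}
```

After canonicalization, equality checks between canonical instances can use == instead of equals(). This is the payoff the technique trades memory bookkeeping for.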
This chapter also shows the author's depth and maturity. When presenting his algorithm for converting integers to Strings, he notes that his implementation previously beat the pants off Sun's, but in 1.3.1/1.4.0 Sun made a change that now beats his code. He analyzes the new implementation and discusses why it's faster, without losing face. That is just one of many gems in this updated edition. Chapter 6 covers the cost of throwing and catching exceptions, passing parameters to methods, and accessing variables of different scopes (instance vs. local) and different types (scalar vs. array). Chapter 7 covers loop optimization with a Java bent. The author offers proof that an exception-terminated loop, while bad programming style, can offer better performance than more accepted practices.
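The following sketch contrasts the two loop styles from Chapter 7. The names are hypothetical and this is not the book's benchmark code. The second version drops the explicit i < values.length test and relies on the JVM's mandatory array bounds check to end the loop with an ArrayIndexOutOfBoundsException. Whether it is actually faster depends on the VM. Modern JITs can often eliminate the redundant bounds check from the conventional loop, and throwing an exception has its own cost, so measure before adopting it.

```java
final class LoopTermination {
    // Conventional loop: an explicit bounds test on every iteration.
    static long sumWithTest(int[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Exception-terminated loop: no explicit test; the JVM's own array
    // bounds check throws once i runs past the last element.
    static long sumWithException(int[] values) {
        long sum = 0;
        try {
            for (int i = 0; ; i++) {
                sum += values[i];
            }
        } catch (ArrayIndexOutOfBoundsException endOfArray) {
            // Running off the end is the normal exit path in this style.
        }
        return sum;
    }
}
```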
Chapter 8 covers IO, focusing on using the proper flavor of java.io class (stream vs. reader, buffered vs. unbuffered) to achieve the best performance for a given situation. The author also covers performance issues with object serialization (used under the hood in most Java distributed computing mechanisms) in detail, and wraps up the chapter with a 12-page discussion of how best to use the "new IO" package (java.nio) introduced with Java 1.4. Sadly, the author doesn't offer a detailed timing comparison of the 1.4 NIO API with the existing IO API. Chapter 9 covers Java's native sorting implementations and how to extend their framework for your specific application.
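This sketch shows the buffered vs. unbuffered choice. BufferedReading is a hypothetical name, not from the book. Both paths read one byte at a time. The difference is whether each read() goes straight to the underlying stream or is served from BufferedInputStream's in-memory buffer. With a FileInputStream or a socket stream, the unbuffered path can mean one native call per byte. The 8192-byte buffer size is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BufferedReading {
    // Counts bytes by calling read() once per byte, optionally through a
    // BufferedInputStream. The stream is closed when counting finishes.
    static long countBytes(InputStream raw, boolean buffered) {
        try (InputStream in = buffered ? new BufferedInputStream(raw, 8192) : raw) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real use you would pass something like a new FileInputStream(path). An in-memory ByteArrayInputStream also works for trying it out, although buffering only pays off when each raw read is expensive.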
Part 3 : Threads, Distributed Computing and Other Topics
Chapters 10-14 cover a grab bag of topics, including threading, proper Collections use, distributed computing paradigms, and an optimization primer that covers full-life-cycle approaches to optimization. Chapter 10 does a great job of presenting threading, common threading pitfalls (deadlocks, race conditions), and how to solve them for optimal performance (e.g., properly scoping locks).
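Here is a minimal sketch of scoping locks properly. The thread-local work (here, formatting a string) happens before entering the synchronized block, and the lock is held only while touching shared state. NarrowLocking is a hypothetical class. Chapter 10 describes the principle; this class is only an illustration of it.

```java
import java.util.ArrayList;
import java.util.List;

final class NarrowLocking {
    private final Object lock = new Object();
    private final List<String> log = new ArrayList<>(); // shared, guarded by lock

    // Formatting runs outside the lock; only the list update is synchronized,
    // so other threads wait for a much shorter time.
    void record(int id, double value) {
        String entry = String.format("%d=%.2f", id, value); // per-thread work
        synchronized (lock) {
            log.add(entry);
        }
    }

    int size() {
        synchronized (lock) {
            return log.size();
        }
    }
}
```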
Chapter 11 provides a wonderful discussion about one of the most powerful parts of the JDK, the Collections API. It includes detailed timings of using ArrayList vs. LinkedList when traversing and building collections. To close the chapter, the author discusses different object caching implementations and their individual performance results.
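This sketch shows why traversal style matters for ArrayList vs. LinkedList. The helper names are hypothetical. List.get(i) is constant-time on an ArrayList but walks the chain of nodes on a LinkedList. An index-based loop over a LinkedList therefore does O(n^2) work, while an Iterator visits each node once for either implementation.

```java
import java.util.Iterator;
import java.util.List;

final class ListTraversal {
    // Indexed traversal: O(n) total for ArrayList, O(n^2) for LinkedList,
    // because each LinkedList.get(i) walks from an end of the list.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Iterator traversal: O(n) for both implementations.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }
}
```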
Chapter 12 gives some general optimization principles (with code samples) for speeding up distributed computing including techniques to minimize the amount of data transferred along with some more practical advice for designing web services and using JDBC.
Chapter 13 deals specifically with designing/architecting applications for performance. It discusses how performance should be addressed in each phase of the development cycle (analysis, design, development, deployment), and offers tips and a checklist for your performance initiatives. The puzzling thing about this chapter is why it appears at the end of the book instead of near the front, with all of the other process-related material; it makes much more sense to put this material together up front.
Chapter 14 covers various hardware and network aspects that can impact application performance including: network topology, DNS lookups, and machine specs (CPU speed, RAM, disk).
Part 4 : J2EE Performance
Chapters 15-18 deal specifically with the performance of the J2EE APIs: EJBs, JDBC, Servlets and JSPs. These chapters consist mainly of tips or suggested patterns (use coarse-grained EJBs, apply the Value Object pattern, etc.) rather than the very low-level performance tips and metrics provided in earlier chapters. You could say that the author is getting lazy, but the truth is that, given the huge number of appserver/database vendor combinations, it would be very difficult to establish a meaningful performance baseline without a large testbed.
Chapter 15 revisits Chapter 1, Tuning Strategy, re-tooled with a J2EE focus. The author reiterates that a good testing strategy determines what to measure, how to measure it, and what the expectations are. From there, the author presents possible solutions, including load balancing. The chapter also contains about 1.5 pages on tuning JMS, which seem to have been added to make the book J2EE 1.3 acronym compliant.
Chapter 16 provides excellent information about JDBC performance strategies. The author presents a proxy implementation to capture accurate profiling data and minimize changes to your code once the profiling effort is over. The author also covers data caching, batch processing and how the different transaction levels can affect JDBC performance.
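The book's JDBC proxy is not reproduced here. What follows is a hypothetical, generic sketch of the same idea using java.lang.reflect.Proxy. It wraps any implementation of an interface, times each call, and accumulates totals per method. Once profiling is over, you remove the instrumentation by not wrapping the object. A real JDBC version would wrap Connection and also the Statement and ResultSet objects it hands back, which this sketch does not do.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class TimingProxy {
    // Totals keyed by "Interface.method", shared by all proxies created here.
    static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args); // delegate to the real object
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the real exception, not the reflective wrapper
            } finally {
                String key = iface.getSimpleName() + "." + method.getName();
                NANOS.computeIfAbsent(key, k -> new LongAdder()).add(System.nanoTime() - start);
                CALLS.computeIfAbsent(key, k -> new LongAdder()).increment();
            }
        };
        return (T) Proxy.newProxyInstance(
                TimingProxy.class.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```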
Chapter 17 covers JSPs and servlets, with very little earth-shattering information. The author presents tips such as GZipping the content before returning it to the client and minimizing custom tags. This chapter is easily the weakest section of the book. Admittedly, JSPs are difficult to optimize because much of the code that actually runs is generated by the JSP translator/compiler, but this chapter either needs to be beefed up or dropped from future editions.
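This sketch shows GZipping a response using only java.util.zip. The servlet API is left out, and GzipBody and its methods are hypothetical names. A servlet would compress only when the client's Accept-Encoding header mentions gzip, and would then set a Content-Encoding: gzip response header. The header check here is deliberately simplified and ignores q-values such as gzip;q=0.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipBody {
    // Compresses a response body into gzip format.
    static byte[] gzip(String body) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // complete only after gz has been closed
    }

    // Simplified check of the request's Accept-Encoding header (ignores q-values).
    static boolean acceptsGzip(String acceptEncodingHeader) {
        return acceptEncodingHeader != null && acceptEncodingHeader.contains("gzip");
    }
}
```

Repetitive markup, such as table rows generated in a loop, usually compresses very well. Already-compressed content such as JPEG images gains little.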
Finally, Chapter 18 provides a design/architecture-time approach to EJB performance. The author presents standard EJB patterns that lend themselves to squeezing greater performance out of the often-maligned EJB. The patterns include data access object, page iterator, service locator, message facade, and others. Again, there's nothing earth-shattering in this chapter. Chapter 19 is a list of resources with links to articles, books, and profiling/optimizing projects and products.
What's Bad?
Since the book was published, the 1.4.1 VM has been released with the much-anticipated concurrent garbage collector. The author mentions that he received an early version of 1.4.1 from Sun to test with. However, the text doesn't state that he used the concurrent garbage collector, so the book gives no indication of how this new feature performs.
The J2EE performance chapters aren't as strong as the J2SE chapters. After seeing the statistics and extensive code samples of the J2SE sections, I expected a similar treatment for J2EE. Many of the J2SE performance practices still apply to J2EE (serialization most notably, since that is how EJB, JMS, and RMI ship method parameters/results across the wire), but it would be useful to fortify these chapters with actual performance metrics.
So What's In It For Me?
This book is indispensable for the architect drafting the performance requirements/testing process, and contains sage advice for the programmer as well. It's the most up-to-date publication dealing specifically with the performance of Java applications, and is a one-of-a-kind resource.
You can purchase Java Performance Tuning, 2nd Edition from bn.com.
Each String is around 64 bytes of memory minimum. What a stupid decision to make such a fundamental data type so heavyweight.
I have noticed my Java programs run considerably faster under the Sun Forte/One IDE. Once the Java app is on its own (especially when run through a browser), it slows down considerably. Does anyone else have experience with this phenomenon?
We ported some of our internal Java business applications to C# for use with Mono, and empirical results already suggest the solution is several times faster than the Java code. The port was very easy, with each line of Java code mapping onto one line of C# or less. Porting the UI to Gtk# was more difficult, but we find the Gtk# code more maintainable, and the UI (along with the Gtk+ WIMP plugin) integrates much more nicely with Windows than Swing. As a result, we'll be investigating a switch to Linux over the next few months for some of our point-of-sale terminals, and it should be easy thanks to the portability of Mono and Gtk#.
We also ported some of our backend tools for use with Mono. With the newly released Mono JIT runtime, Mini, we've achieved some truly stunning results. It turns out that some of the optimisations in the new JIT are better than those used by GCC, so once the code is loaded in memory, it performs better than raw C code. Although I don't yet have hard numbers to back up these results (the transition is still in progress), it has to be said that Mono is the real answer to Java performance. Because it is Open Source, we can also contribute back to the runtime to make it better suit our needs. It also plays nicely with Red Hat 9's NPTL threading implementation, which is more than I can say for the current crop of Java JREs.
Why would users care whether you use a J2EE or PHP4 backend? You must be thinking of Java applets, which very few people actually use for any "web page development". And as for chapters 15-18, which talk about J2EE, what's wrong with using J2EE to generate web pages and content?
Why do programming languages have to be an either/or situation? Everyone here assumes that anyone who programs in Java does not know C/C++. Why is that? Can't someone know multiple programming languages? I know many (too many to really list here) and find it asinine that people really think everyone should program in just one language.
Ick.
The albatross doesn't need killing; it's already dead. The albatross was hanging from the mariner's neck because he had killed it, and by doing so had brought bad luck upon his ship.
Quoting from memory here, because I can't be bothered to go find my copy of the poem:
As I said, that's from memory, so there are probably plenty of mistakes in there, but I'm sure a little googling will turn up a proper copy of the poem.
Previously, the startup slowdown was due to the system having to load, verify, and link the twenty or so classes a simple program depends upon. Pjava and J2ME-CDC solved that by storing an image of the heap with the system classes already loaded, verified, and linked (and quickened) so the system was run-ready almost immediately. I wonder if the J2SE folks picked up on that? Alternatively, they could just be skipping the verify for those classes in the signed rt.jar, and offline preverify them prior to signature - the verifier always was the slow part of the process.
Your point about threads is well taken, and applies more generally to much of Java programming. Java's language and libraries make it all too easy to write architecturally slow programs; you really still have to fully understand what you're doing in order to write a decent program, regardless of the language.
## W.Finlay McWalter ## http://www.mcwalter.org ##
I challenge you to make a C++/C# application that is thread-safe and can scale to millions of page views per day without writing a ton of supporting code. With a good J2EE app server, a Java coder essentially just has to wrap his thread-unsafe code in a synchronized block, and he's done: his app is now thread-safe.
Additionally, the "cross-platform doesn't matter for sysadmins" is a false statement; our CIO asked our net ops group "what would be the impact of us moving to an Intel platform?" and our sysadmins (after consulting with the coders) replied "absolutely no impact". That made our CIO very, very happy. Again, I challenge you to move your C++ apps from Solaris to Linux, or even to Windows, without any hiccup.
All of these other arguments are very specious: "I don't have enough RAM" will get you a reply of "go down to Fry's and spend $125 on another GB" every time. Processor speeds, even on Sun boxes, are getting to the point where the processor will never be a bottleneck for anything. Sure, Java won't run as fast as a natively compiled app. Neither will Perl, PHP, Tcl, or what have you. Raw processor speed is not as important when you have a couple of GHz to play with.
Ugh. If C++ took the Java route, alternative operating systems would be impossible. C is so popular because it has a very minimal runtime system. Java is extremely difficult to port because of its huge runtime. Java and C++ aren't really aimed at the same market. C++ is aimed at systems programming, where people can take the time to find the best external libraries for particular jobs and need the performance of native code. Java is aimed (right now) at the server market, where having a quickly accessible, well-documented (though not necessarily top-quality, if only because of the lack of competition) platform is more important. Personally, I think Java has more to fear from languages like Python (which have extensive class libraries and are much more high-level to boot) than from languages like C++.
A deep unwavering belief is a sure sign you're missing something...
For most of these transformations to be correct, the compiler has to prove there is only one pointer to the object in the whole program. Whole-program analyses are expensive, and so is points-to analysis. And there is just not that kind of time to spare in a JIT, where every second spent analysing the program is a second spent not executing it.
The optimizations the book proposes are all hit-or-miss adventures. Even for a programmer with intimate knowledge of the code, it is sometimes difficult to predict whether a change will help or hamper performance. The compiler has even less chance of doing so correctly, and nobody likes a compiler that slows down their code trying to optimize it.
This post was compiled with `% gec -O`. email me if you need the sources
I used to side with the purists. I've seen operator overloading so badly abused in some C++ programs that it is terrible.
But I recently had to write a Java program that did financial calculations (rarer, even in business software, than you might think). You don't want to use floating point (for all the classic reasons), and, in this case, you don't want to use integers because you need power functions for interest calculations and so forth.
The classic solution appears to be to use the BigDecimal class. I decided to wrap this in my own Money class that would include the financial functions and would also do currency conversions.
This kind of specialized or "value added" numeric type is exactly where operator overloading should be available. Java would take a huge leap forward in utility if you could just overload the =, +, -, *, and / operators (= being assignment).
I understand why Java's designers did what they did. If they don't want to do this, then for goodness' sake, do a hack like the "+" support for Strings and the "," support in for statements to give at least BigDecimal the basic operator set!
Or bite the bullet and give us operator overloading. I've come around 180 degrees on this issue. I want my overloaded operators!
While I'm at it, let me trot out my favorite Java rant. Sun should LGPL or BSD license the entire Java SDK and APIs. (I actually think they should GPL it, but I'd settle for this.) They are their own worst enemy. Java has a strong market position, and I don't think it's going to go away any time soon, but it could completely destroy C# and .NET if some of the nagging problems could be dealt with by those who care. They are so concerned about "control" of the "standard" that they are endangering their de facto standard. Sure, there's gcj and kaffe, but why not just get us all working right on the real thing?
I would say the same thing about IBM's Java SDK, but I believe (someone correct me if I am wrong) they have some licensed Sun intellectual property in their SDK.
I actually worry about C# because it does support operator overloading. It is the only thing tempting about it to me.
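Here is a minimal sketch of the Money-over-BigDecimal idea from the comment above, under several assumptions. It uses two decimal places, banker's rounding (HALF_EVEN), and whole-period compounding only, since BigDecimal.pow takes an int exponent. It does no currency conversion. Money and its methods are hypothetical, not the commenter's class. The sketch also shows what the missing operator overloading costs: a + b has to be written a.plus(b).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical value type; not a standard library class. */
final class Money {
    private static final int SCALE = 2; // assumes a currency with cents
    private final BigDecimal amount;

    Money(String amount) {
        this(new BigDecimal(amount)); // parse from a string, never from a double
    }

    private Money(BigDecimal amount) {
        this.amount = amount.setScale(SCALE, RoundingMode.HALF_EVEN);
    }

    Money plus(Money other)      { return new Money(amount.add(other.amount)); }
    Money minus(Money other)     { return new Money(amount.subtract(other.amount)); }
    Money times(BigDecimal factor) { return new Money(amount.multiply(factor)); }

    // Compound interest: principal * (1 + rate)^periods, rounded once at the end.
    Money compound(BigDecimal ratePerPeriod, int periods) {
        BigDecimal growth = BigDecimal.ONE.add(ratePerPeriod).pow(periods);
        return new Money(amount.multiply(growth));
    }

    @Override
    public String toString() { return amount.toPlainString(); }
}
```

For example, new Money("1000.00").compound(new BigDecimal("0.05"), 2) yields 1102.50, and new Money("0.10").plus(new Money("0.20")) yields exactly 0.30, which is the classic case where binary floating point goes wrong.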
I'm unsatisfied by the idea that hardware is cheaper than developer time. WordPerfect allegedly attempted to make a version in Java, but scrapped it because of speed concerns.
If your product just barely runs within an acceptable time frame, then you are betting that any given customer will agree that it is acceptable. If a customer doesn't agree, then they will not use your product. Thus while you save money on developer time, you lose potential (or existing) customers. Worse, late in the game (after spending significant R&D time), management may decide to scrap the project when its performance doesn't cut the mustard.
Yes, I realize that there are full-fledged GUIs written in Java (IDEA, Eclipse, etc.), but I definitely notice their performance penalties (even on a 2GHz machine with 0.5GB of memory). Moreover, that memory isn't enough to keep these monsters happy.
Now imagine a workstation running a dozen Java apps, each taking up a majority of the CPU. Current hardware can't even support this sort of environment without some hair-pulling.
Moreover, the idea that you can "throw hardware at it" assumes a certain level of parallelizability. Some problems are inherently serial, and there is a definite maximum performance a single thread can achieve in a given hardware generation.
Moreover there is an incredible danger in sacrificing performance to the hardware gods.
We're going to assume for the moment that we're talking about real developers here and that the idea of O(k), O(n) and O(nlog(n)) are religiously adhered to.
Assume that a given method is k times slower than it could be for a given environment (where k need not be an integer, e.g., 1.2). Now extend this across an entire enterprise app.
Now realize that methods are generally called inside loops, so the delay is amplified. While each loop may only be O(n log n), there are m such loops (where m grows as n does). With nested loops the work goes as m^2, and if every step is k times slower, it becomes (k*m)^2 = k^2 * m^2; that is, the overall slowdown is a factor of k^2.
Now if k is only 1.2, then we're fine, but since we have an interpreted environment, simple operations really take many times longer than the equivalent C. Thus k is really 4 to 20 times the conceptual cost, and k^2 ranges from 16 to 400.
A Java app has the potential to be 400 times slower than a C equivalent. Given the fuzzy numbers being used, I'll simply state that:
An arbitrarily abstract language is capable of producing sufficient values of k that there exists a level of complexity of code that will produce wait times that exceed the performance capabilities of a given hardware generation.
In short, we can get carried away in abstracting the hardware and overcomplexifying the software to the point of self-destruction.
One good example was an [unnamed] JDBC driver that was written in very elegant object-oriented code but unfortunately accounted for 90% of the CPU and memory load of the application using it. The overhead per operation was small but subtle. Each component allocated and threw away objects, and each such operation cost initialization time and added garbage-collection pressure. There was never anything worse than O(n), but there were several such O(n) calls within a single JDBC method invocation. The k's compounded enormously.
The thing is that you wouldn't really notice the overhead if you only accessed JDBC once or twice. Unfortunately, we happened to make JDBC calls in loops, which exposed the inefficiency. By switching to a different driver (which performed optimizations such as using a static StringBuffer to avoid ANY memory allocations), we made the JDBC overhead negligible.
The moral of the story is to identify critical loops AND critically accessed sections at design time and to consider their performance. This is profiling 101, but I hear more and more Java developers arrogantly dismiss such problems.
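The buffer-reuse trick credited to the faster driver can be sketched like this. SqlTextBuilder and selectById are hypothetical names, not the driver's code. The class keeps one StringBuilder and resets it with setLength(0). This keeps the buffer's allocated capacity, so repeated calls don't allocate new buffers. There are two caveats. First, toString() still allocates the resulting String, so this avoids buffer allocations rather than all allocations. Second, a single static buffer shared across threads would need synchronization, which is why the sketch keeps one buffer per instance.

```java
final class SqlTextBuilder {
    // One reusable buffer per instance (not thread-safe; do not share instances
    // across threads without external synchronization).
    private final StringBuilder buffer = new StringBuilder(256);

    String selectById(String table, String idColumn) {
        buffer.setLength(0); // reset contents; allocated capacity is kept
        buffer.append("SELECT * FROM ").append(table)
              .append(" WHERE ").append(idColumn).append(" = ?");
        return buffer.toString(); // copies the chars into a new String
    }
}
```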
-Michael
It does under the hood whenever you use + for concatenation; this is why using String + String in a loop is inefficient: you create a new StringBuffer object per iteration. The solution in this case is to declare the StringBuffer outside the loop and call append() explicitly inside it.
I think you're missing the point the parent was trying to make. Aren't there much bigger things to worry about than writing around bugs in the compiler?
Two years ago, everyone used StringBuffer. Today, everyone knows the plus operator is as fast as it should be, so they opt for readability, except for people who (like yourself) still use a StringBuffer when doing concatenation in a for loop.
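The two approaches under debate can be sketched as follows. LoopConcat is a hypothetical name. The sketch uses StringBuilder, the unsynchronized Java 5+ counterpart of StringBuffer, which offers the same append() API. The way javac compiles + has changed over time: first a StringBuffer, later a StringBuilder, and since Java 9 an invokedynamic call. In every case, though, each += inside the loop produces a new String containing everything built so far.

```java
final class LoopConcat {
    // Each += builds a new String and copies all previous characters into it,
    // so joining n parts copies O(n^2) characters in total.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One buffer declared outside the loop; append() grows it in place.
    static String joinWithBuffer(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }
}
```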
Two years from now, the compiler will be optimized to use a StringBuffer *even* when the concatenation takes place in a loop. My code will still be readable (and run a tiny bit faster). Your code will run at the same speed and people will scratch their head - "why was he using a StringBuffer for concatenation?"
Why can't we just write our Java code as readably as possible, and then go back over it with an optimization tool when we're done, looking for bottlenecks? If your internal time is billed out at $50 per hour and you want to save your company money, you aren't going to spend 4 hours creating a custom garbage collector just to save another 5k of RAM; you're going to go out and buy another stick of memory.
This reminds me how broken many (most?) corporate accounting systems are. Where I work, for a stick of RAM (or software, or whatever), it would take at least four hours spread over a couple weeks just to figure out who to submit the request to, wait for our "purchasing agent" to get a couple signatures from bureaucrats, wait for the purchase order to work its way to the top of a pile, and finally get the RAM only to discover they ordered the wrong type. All the while, they'll happily pay for labor hours wasted on slow computers with inadequate RAM (for example).
Why there is such a fundamental disconnection between spending money on labor versus spending it on time-saving equipment and software leaves me questioning reality.
Healthcare article at Kuro5hin
How much does that extra development time cost?
Writing one's own java.lang.String takes time. Writing routines to convert com.donkeybollocks.String to java.lang.String and back again takes time. Supporting it takes time. And time is money. Me, I'd rather spend an extra £100 on a faster processor, or a GB of RAM, and take a 25% performance improvement.
Come on guys, one of the major wins of the OO methodology is code reuse. Time was when programmers would always have to write their own I/O routines; I thought those days were long gone. Rewriting fundamental parts of the Java API is just plain silly, unless it has a bug or a serious limitation (e.g., it isn't thread-safe).
The more advanced the technology, the more open it is to primitive attack
I don't know what plethora of things they taught you in CS school, but much of the wisdom they taught some of us can be summarized as:
- Big O matters. Optimization of constants is an expensive luxury.
- Reimplementing the wheel for the sake of marginal efficiency is a sure way to get a square and inefficient wheel.
Most algorithms in common use are provided in the standard libraries of each language. If one isn't, it can be implemented in any language by virtue of that language's Turing-completeness. This guarantees you big-O efficiency, which is what matters in the long run.
The article complains about Java being slow for the sake of its pcode nature. That's a constant factor, not bigO. It's automatically defeated by "CPU is cheap, RAM is cheap", i.e.: constant factor acceleration is cheap.
You better have a good reason to worry about constant factors: if your program demands so much from the machine that the constants make the difference on whether it's practical or not, you better be experimenting with the 'bleeding edge' or there's something really wrong with your program.
Efficient algorithms are used in every language by any programmer worth 2 bucks. Java has the advantage of implementing a bunch of them in standard libraries that work quite well, thank you. Someone who uses bubblesort in Java outside of a classroom is not lazy, he's an idiot. Implementing bubblesort is more complex and expensive than calling Arrays.sort(). The same thing actually applies to any programming language.
If your concerns about speed as a typical sysadmin (servers and workstations), or even worse as a developer, are dominated by constant factors, it's time to go back to CS school and take data structures and algorithm analysis.
Freedom is the freedom to say 2+2=4, everything else follows...
If someone were building a compiler that contemplated making such optimizations, wouldn't it be better to have an option to output optimization hints to the programmer? If I use a certain switch, the compiler emits another output file of notes about the source code and the optimizations it suggests might be in order.
Naturally, the programmer might see that since both array parameters a and b point to the same array, that this is not really a possible optimization. This realization by the programmer is equivalent to the programmer realizing to insert an assert( a != b );
Why not move the optimizations into the source code directly? That makes the optimization transparent. Furthermore, all optimizations would then be supervised by the programmer, not automatically hacked in by the compiler, which could fail if the same array were passed in for both parameters a and b.
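This sketch shows why aliasing blocks such reordering. The two hypothetical methods perform the same updates, b[i] += a[i - 1], in opposite orders. When a and b are different arrays, the results match, so a compiler could choose either order. When the same array is passed for both, the results diverge, because each forward step reads a value that the previous step just wrote. In Java, the programmer-supplied guarantee mentioned above would be written assert a != b; and is checked only when the JVM runs with -ea.

```java
final class Aliasing {
    // Forward order: b[i] += a[i - 1] for i = 1 .. length-1.
    static void forward(int[] a, int[] b) {
        for (int i = 1; i < b.length; i++) {
            b[i] += a[i - 1];
        }
    }

    // The same updates in reverse order. This matches forward() only when a
    // and b are different arrays; an optimizer may reorder like this only if
    // it can prove that they never alias.
    static void backward(int[] a, int[] b) {
        for (int i = b.length - 1; i >= 1; i--) {
            b[i] += a[i - 1];
        }
    }
}
```

With x = {1, 1, 1, 1}, forward(x, x) leaves {1, 2, 3, 4}, while backward applied to a fresh copy in the same way leaves {1, 2, 2, 2}.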
The price of freedom is eternal litigation.
20% sounds like an inflated number. The typical app spends most of its time waiting for I/O or doing silly stuff for user interaction. A 20% overall difference would imply what, 50%, 100% slower when it's actually working?
There's some overhead, but it's never that bad. Sure, the overhead matters, which is why there's an investment on improving VM technology, providing access to native operations, etc.
But also the worst overhead offenders are not VM issues, but application design issues: blocking I/O, threading bugs, NOT using multithreading when you should, etc. Also, using Swing/AWT in a non-trivial GUI. I'm beginning to consider Swing/AWT just a giant bug.
However, let's assume the 20% speed difference in an application...
It's not that constant factors don't matter at all, it's that they matter the least. And when you have so many more important problems to fix, they take the last place in the priority list.
By definition, choosing your implementation language is an early decision that takes place long before that list is filled up. Although speed is always a concern, algorithmic speed is more important than PL speed, and independent of it. So PL speed shouldn't have that much to do with the decision.
Saving 20% in hardware is great, and the efficiency marketing advantage is important too. But those advantages are worth nothing if your application is buggier, less extensible, and less flexible than your competitors', and especially if it gets to the market AFTER your competitors.
Developing applications in C is more expensive and complicated than developing them in Java, and the difference is typically more than the difference in speed. (I'm not saying that C applications are inherently worse than Java apps, just that to develop the same application with the same extensibility, features and stability takes more time, and a bunch of non-standard libraries).
Development costs add up much more quickly than hardware costs, and unlike hardware costs, development costs are not guaranteed a return in performance. These are not dollars and cents, but hundreds and tens of dollars: development is more expensive than hardware.
Then you go to the client 6 months after your competitors and try to sell them the application. If they haven't already bought from the competition, you'll try to convince them that although your application is more expensive, although it hasn't been tested in the market for as long as the other ones, and although they'll need a C programmer versed in your favorite non-standard libraries to maintain it (you do give them API documentation and the tools to maintain it, right?), your application will save them a few bucks in hardware.
Unless the difference in hardware costs runs to more than 4 digits, I think your customer will advise you to take an accounting class.
If the difference is more than 4 digits, you are pushing the technology and you need to care about constant factors.
Either that or you're not dealing with a typical application. For example, a scientific analysis program that spends most of its time in pure computation needs all the juice it can get. Although I understand Java works fine for pure computational tasks.
Now, if you can make your applications in C as cheap, fast, and safely as with Java, then you have great C developers so you should just keep doing that. Most people can't.