Performance Benchmarks of Nine Languages
ikewillis writes "OSnews compares the relative performance of nine languages and variants on Windows: Java 1.3.1, Java 1.4.2, C compiled with gcc 3.3.1, Python 2.3.2, Python compiled with Psyco 1.1.1, Visual Basic, Visual C#, Visual C++, and Visual J#. His conclusion was that Visual C++ was the winner, but in most of the benchmarks Java 1.4 performed on par with native code, even surpassing gcc 3.3.1's performance. I conducted my own tests pitting Java 1.4 against gcc 3.3 and icc 8.0 using his benchmark code, and found Java to perform significantly worse than C on Linux/Athlon."
Not sure of the accuracy. Benchmark is on a loop:
32-bit integer math: using a 32-bit integer loop counter and 32-bit integer operands, alternate among the four arithmetic functions while working through a loop from one to one billion. That is, calculate the following (while discarding any remainders)....
It also relies on the strength of the compiler, not just the strength of the language.
The Custom Mary
Why did VB do so badly on I/O compared to the other .NET benchmarks? They were pretty much equal up until the I/O benchmarks. Any chance of getting the code published that was used to test this?
I conducted my own tests pitting Java 1.4 against gcc 3.3 and icc 8.0 using his benchmark code, and found Java to perform significantly worse than C on Linux/Athlon.
Why is this a surprise? C has been most commonly used for so long because of its speed and efficiency. I think anyone who has done much work with either developing or running large-scale Java programs knows that speed can definitely be an issue.
Not everything is analogous to cars. Car analogies rarely work.
They probably cheat and use undocumented native OS calls.
Unable to read configuration file '/bigassraid/htdig//conf/14229.conf'
Geocrawler error message.
Don't forget that it is also perceived as slow since just about any application anyone has seen for a desktop environment written in Java has a sluggish GUI.
Yeah, I know Java's strengths aren't in the Desktop arena, they're in development and the back-end.
What is interesting in these functions is that, as pointed in the article, there seems to be something wrong with Sun's implementation for Java.
For many math functions Java uses a software implementation rather than the built-in hardware functions on the processor. This is to ensure that these functions perform exactly the same on different architectures. This probably accounts for the difference in performance.
In theory, there is no difference between theory and practice, in practice there is.
Benchmark code like this does not represent how these languages are used in practice. Idiomatic Java code tends to be full of dynamic classes and indirection galore. Just testing "arithmetic and trigonometric functions [...] and [...] simple file I/O" is not going to tell you anything about how fast these languages are in the real world.
The Java performance is best explained by an article by Prof Kahan: "How JAVA's Floating-Point Hurts Everyone Everywhere" http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf also see "Marketing vs. Mathematics" http://www.cs.berkeley.edu/~wkahan/MktgMath.pdf I suspect the relatively poor floating-point performance of gcc is also caused by the desire to achieve accurate results.
Don't forget about the Win32 Compiler Shootout
Note that Python is pretty easy to extend in C/C++, so that speed critical parts can be rewritten in C if the performance becomes an issue. Writing the whole program in C or C++ is a premature optimization.
Save your wrists today - switch to Dvorak
In the case of Java, you find that the Intel floating point trig instructions don't meet the Java machine spec. So they had to implement them as a function.
It all depends if you want accuracy or speed.
- "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
Someone should do a study on the time taken to design, implement and debug a reasonably complex chunk of code under C++ and Java. I'm pretty sure that the result would show the huge advantage of Java over C++.
The difference b/w Java and C++ would be dwarfed by the difference b/w Java and Python. Java may be 30-40% more productive than C++, but Python is 1000% more productive than Java. And yes, this applies to larger projects. J2EE may come to its own w/ projects that have hundreds of mediocre programmers, but if you have a mid-size team of highly skilled developers creating something new & unique (something like Zope or Chandler), Python will trounce the competition.
There were a number of problems with this benchmark, which are addressed in the OSNews thread about the article.
Namely:
- They only test a highly specific case of small numeric loops that is pretty much the best-case scenario for a JIT compiler.
- They don't test anything higher level, like method calls, object allocation, etc.
Concluding "oh, Java is as fast as C++" from these benchmarks would be unwise. You could conclude that Java is as fast as C++ for short numeric loops, of course, but that would be a different bag of cats entirely.
A deep unwavering belief is a sure sign you're missing something...
If more people would use the SWT libraries (part of the Eclipse project) instead of the crappy AWT/Swing libraries, then this misconception would go away. SWT works by mapping everything to native OS widgets if possible, giving it the look, feel, and speed of a native app. I used Eclipse for quite a while before finding out that it is almost 100% pure Java (other than the JNI code necessary for the native calls).
Site was showing signs of Slashdotting, so I'll quote one of the more important sections...
Results
Here are the benchmark results presented in both table and graph form. The Python and Python/Psyco results are excluded from the graph since the large numbers throw off the graph's scale and render the other results illegible. All scores are given in seconds; lower is better.
Language           int    long  double    trig     I/O    TOTAL
Visual C++         9.6    18.8     6.4     3.5    10.5     48.8
Visual C#          9.7    23.9    17.7     4.1     9.9     65.3
gcc C              9.8    28.8     9.5    14.9    10.0     73.0
Visual Basic       9.8    23.7    17.7     4.1    30.7     85.9
Visual J#          9.6    23.9    17.5     4.2    35.1     90.4
Java 1.3.1        14.5    29.6    19.0    22.1    12.3     97.6
Java 1.4.2         9.3    20.2     6.5    57.1    10.1    103.1
Python/Psyco      29.7   615.4   100.4    13.1    10.5    769.1
Python           322.4   891.9   405.7    47.1    11.9   1679.0
Beware: In C++, your friends can see your privates!
So, yes, you can construct programs, even some useful compute intensive programs, that perform as well or better on Java than they do in C. But that still doesn't make Java suitable for high-performance computing or building efficient software.
Benchmarks like the one published by OSnews don't test for these limitations. Microbenchmarks like those are still useful: if a language doesn't do well on them, that tells you that it is unsuitable for certain work; for example, based on those microbenchmarks alone, Python is unlikely to be a good language for Fortran-style numerical computing. But those kinds of microbenchmarks are so limited that they give you no guarantees that an implementation is going to be suitable for any real-world programming even if the implementation performs well on all the microbenchmarks.
I suggest you go through the following exercise: write a complex number class, then write an FFT using that complex number class, "void fft(Complex array[])", and then benchmark the resulting code. C, C++, and C# all will perform reasonably well. In Java, on the other hand, you will have to perform memory allocations for every complex number you generate during the computation.
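For anyone who wants to try the exercise, here is a rough sketch of its shape, in Python rather than Java. The `Complex` class, its `allocations` counter, and this recursive `fft` are illustrative assumptions, not code from the benchmark, and unlike the `void fft(Complex array[])` signature above it returns a new list instead of working in place. The counter only shows how many temporary objects an object-per-value design creates; it says nothing about absolute speed.

```python
import cmath


class Complex:
    """Tiny complex number type; every arithmetic result is a new object."""

    allocations = 0  # hypothetical counter: how many Complex objects were created

    def __init__(self, re, im=0.0):
        Complex.allocations += 1
        self.re = re
        self.im = im

    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)

    def __sub__(self, other):
        return Complex(self.re - other.re, self.im - other.im)

    def __mul__(self, other):
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [values[0]]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    result = [None] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)          # twiddle factor
        twiddle = Complex(w.real, w.imag) * odd[k]     # two new objects
        result[k] = even[k] + twiddle                  # one new object
        result[k + n // 2] = even[k] - twiddle         # one new object
    return result


if __name__ == "__main__":
    signal = [Complex(float(i)) for i in range(8)]
    before = Complex.allocations
    spectrum = fft(signal)
    print("objects allocated during fft:", Complex.allocations - before)
    print([(round(c.re, 6), round(c.im, 6)) for c in spectrum])
```

Each butterfly in this version creates four new objects, which is the kind of per-element allocation traffic the comment is pointing at. A language with stack-allocated value types can avoid it.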
It's in many ways unfortunate that with JDK 1.2 (Swing) and onwards, Sun pretty much dumped fast native support for GUI rendering. It has its benefits -- full control, easier portability -- but the fact is that simple GUI apps felt faster with 1.1 than they have done ever since (or even more). This is, alas, especially noticeable on X-windows, perhaps since often the whole window is rendered as one big component as opposed to normal x app components (in latter case, x-windows can optimize clipping better).
Years ago (in late 90s, 97 or 98), I wrote a full VT-52/100/102/220 terminal emulator with telnet handling (plus for fun plugged in a 3rd party then-open SSH implementation). After optimizing display buffer handling, it was pretty much on par with regular xterm, on P100 (Red hat whatever, 5.2?), as in felt about as fast, and had as extensive vt-emulation (checked with vttest). Back then I wrote the thing mostly to show it can be done, as all telnet clients written in Java back then were horribly naive, doing full screen redraw and other flicker-inducing stupidities... and contributed to the perception that Java is and will be slow. I thought it had more to do with programmers not optimizing things that need to be optimized.
It's been a while since then; last I tried it on JDK 1.4.2... and it still doesn't FEEL as fast, even though technically speaking all java code parts ARE much faster (1.1 didn't have any JIT compiler; HotSpot, as tests show, is rather impressive in optimizing). It's getting closer, but then again, my machine has almost an order of magnitude more computing power now, as probably does gfx card.
To top off problems, in general Linux implementation has been left with much less attention than windows version (or Solaris, but Solaris is at least done by same company). :-/
I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
The optimisers in Sun's Java VM work on run-time profiling - they identify the most frequently run sections of code and use the more elaborate optimisation steps on these segments alone.
Benchmarks that consist of one small loop will do very well under this scheme, as the critical loop will get all of the optimisation effort, but I suspect that in programs where the CPU time is more distributed over many code sections, this scheme will perform less well.
C doesn't have the benefit of this run-time profiling to aid in optimising critical sections, but it can more afford to apply its optimisations across the entire codebase.
I'd be interested to see results of a benchmark of code where CPU time is more distributed.
According to these benchmarks it doesn't.
The short of it is that GCC 3.2.1 is highly competitive with ICC 7.0, except for two cases:
- FP-intensive code on the Pentium 4
- Code that allows Intel C++ to auto-generate SSE vector code for it
The Python 'long' type is not a machine type such as a 32 or 64 or perhaps even 128 bit integer/long.
It is an arbitrary-precision integer type! That's why Python's scores on the Long test are so much higher (slower) than the other languages.
I wonder what Java scores when the benchmark is reimplemented using BigInteger instead of the 'long' machine type.
Python uses a highly efficient Karatsuba multiplication algorithm for its longs (although that only starts to kick in with very big numbers).
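A small sketch of the difference, written for Python 3 (where the old `int` and `long` types have been merged into a single arbitrary-precision `int`). The helper names `multiply_exact` and `wrap_to_int64` are made up for illustration; the second one only emulates what a signed 64-bit machine integer would hold after overflow.

```python
def multiply_exact(a, b):
    """Python integers grow as needed, so the product is always exact."""
    return a * b


def wrap_to_int64(value):
    """Emulate two's-complement wraparound of a signed 64-bit machine integer."""
    value &= (1 << 64) - 1            # keep only the low 64 bits
    if value >= (1 << 63):            # reinterpret the top bit as the sign
        value -= (1 << 64)
    return value


if __name__ == "__main__":
    big = multiply_exact(2**62, 4)    # 2**64, one bit too wide for 64 bits
    print(big, "needs", big.bit_length(), "bits")
    print("as a wrapped 64-bit value:", wrap_to_int64(big))
```

The exactness is not free: every operation has to go through the general big-number code path instead of a single machine instruction, which is consistent with the slow `long` column for Python in the table.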
There was an interesting article in Dr Dobb's a few months back. They did a performance (C++) comparison of 6 or so compilers, gcc included. The end result was that performance-wise (execution AND code size) gcc came in last place in all their testing. However, gcc did win when it came to conformance to the C++ standard as it was the only compiler that supported all the language features.
Because as we all know VC++ and the other Microsoft languages are so widely available for Linux/BeOS. I'm sorry but your comment is pure troll. It would be interesting to have things like GCC under Linux on the same computer there too, but you can't compare Microsoft's .NET to anything under Linux, because .NET doesn't run under Linux (I know about Mono, but that isn't MS's runtime).
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
OK, Speed does matter a lot.
But what about type safety? Java has no generic typed containers, like the STL. This means you tend to find errors at runtime instead of at compile time.
I need to know that my code is as safe as possible. I don't want a user to find a bug because my hand tests didn't get 100% code coverage every time.
And how about predictable performance? I would much rather know that this function will take 200ms all of the time instead of 100ms most of the time and 10 s occasionally due to garbage collection.
Then, with more and more languages, especially ones with VMs, you get further and further away from the hardware. The end result: you lose performance. It does more and more for you, but at the expense of real optimizations, the kind that only you can do.
Now the zealots will come out and say, "Language X is better than language Y, see!" To me this argument is boring. I tend to use the appropriate tool for the job. So:
Yes, my teams use many languages, but they also put their effort to where they get the biggest bang for the buck. And in any business approach, that's the key goal. You don't see carpenters use saws to hammer in nails or drive screws. Wise up!
...tizzyd
Productive for you now ... but what about 6 months down the road? What if you want to release your product to the world, how hard is it to extend it?
The advantages over Java are even increased 6 months down the road. Python code is much more readable and maintainable, hence easier to extend. Dynamically typed object model scales incredibly well.
I used to think the same about Perl vs Java, until I started looking at frameworks like Cocoon and they're all written in Java.
Comparing Perl to Java is foolish, Perl is more like Awk than a general purpose programming language, and not meant for large projects at all.
Life isn't like a box of chocolates. It's more like a jar of jalapenos. What you do today, might burn your ass tomorrow.
The Pentium trig instructions are not IEEE compliant (they don't return the correct values for large magnitude arguments). gcc errs on the side of caution and generates slow, software-based wrappers that correct for the limitations of the Pentium instructions by default. Other compilers (e.g., Intel and probably Microsoft) just generate the in-line instructions with no correction. When you look at the claimed superiority of other compilers over gcc, it is usually such tradeoffs that make gcc appear slower.
You can enable inline trig functions in gcc as well, either with a command line flag, or an include file, or by using "asm" statements on a case-by-case basis. Check the documentation. With those enabled, gcc keeps up well with other compilers on trig functions.
Just because Swing is slow does not make it crappy. It nicely meets what it was designed to do. I use Swing applications all the time. Today we have 1GHz processors, so it's not even an issue any longer, but it won't be allowed to die...
Eclipse is nice, I love Eclipse. But I don't mistake it for a Swing replacement. AWT has a purpose, as do Swing and SWT; they are all different.
I believe AWT should be as fast as SWT because it's also natively implemented.
Raw performance will ALWAYS be an issue. If you can handle 100,000 hits per day on the same hardware that I can handle 1,000,000 (and these are not made up numbers, we see this kind of discrepancy in web applications all the time), then I clearly will be able to do MORE business than you and do it cheaper. That gives me a competitive advantage from now till the end of time. If you throw more hardware at the problem, well, so can I and I'll still be ahead of you.
.NET environment are compiled down to executable code, then executed.
.NET environment are compiled to a form of executable code (I don't think it's actual .NET byte code, but it may be) and then executed.
Performance realities do not go away, no matter how much we may wish they would. Now, does that mean you're going to go write major portions of your web application in assembly to speed it up? No, probably not. But your database vendor may very well use some tricks like that to speed up the key parts of their database. You sink or swim by your database, so don't say it doesn't matter because it absolutely does.
Anyway, in my day-to-day operations, I can think of quite a few things that get compiled directly to executable code even though they don't have to be. Why would you do this if performance wasn't an issue and we could just throw more hardware at it?
1. Regular expressions in the
2. XSL transformations in the
3. The XmlSerializer class creates a special compiled executable specifically created to serialize objects into XML (byte code!!).
And the list just goes on and all of this eventually ends up getting JITed as well. My pages are 100% XML based, go through many transformation steps to get to where they need to be, and on average render in about 70-100ms (depending upon the number of database calls I need to make and the size of the data). This all happens without spiking our CPU utilization to extreme levels. There is *NO WAY* I could've done this on our hardware if nobody cared about performance.
As always, a good design is the most important factor. But a good design that performs well will always be superior to one that doesn't.
Bryan
They should have written their site in one of the higher-performing languages.
RP
Raw performance will ALWAYS be an issue. If you can handle 100,000 hits per day on the same hardware that I can handle 1,000,000 (and these are not made up numbers, we see this kind of discrepancy in web applications all the time), then I clearly will be able to do MORE business than you and do it cheaper.
You raise excellent points. For many enterprise and server applications, performance is an issue. But I never said one should care nothing about performance, only that in many applications the cost of the coder also impacts financial results.
For the price of one software engineer for a year (call it 50k to 100k burdened labor rate), I can buy between 20 to 100 new PCs (at $1000 to $3000 each). If the programmer is more expensive or the machines are less expensive, then the issue is even more in favor of worrying about coder performance.
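To make that arithmetic concrete, here is a quick sketch using only the figures quoted above (a $50k to $100k engineer-year and $1000 to $3000 per PC). The function name `pcs_per_engineer_year` is just an illustrative label.

```python
def pcs_per_engineer_year(engineer_cost, pc_price):
    """Whole PCs that one year of engineer cost would buy."""
    return engineer_cost // pc_price


if __name__ == "__main__":
    for engineer_cost in (50_000, 100_000):
        for pc_price in (1_000, 3_000):
            count = pcs_per_engineer_year(engineer_cost, pc_price)
            print(f"${engineer_cost} engineer, ${pc_price} PCs: {count} machines")
```

With those inputs the range comes out at roughly 16 to 100 machines, in the same ballpark as the 20 to 100 quoted above.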
The trade-off between the hardware cost of the code and the wetware cost is not obvious in every case. A small firm that can double its server capacity for less than the price of a coder, or the creators of an infrequently-used application, may not need high performance. On the other hand, a large software seller that sells core performance apps might worry more about speed. My only point is that ignoring the cost of the coder is wrong.
These different languages create a choice of whether to throw more hardware at a problem or throw more coders at the problem.
Two wrongs don't make a right, but three lefts do.
Sorry, dude, but SWT is nowhere *near* as complete as Swing, in terms of functionality. I know, I've tried to use it. Basically, because SWT was designed more or less specifically with Eclipse in mind, it has massive gaps in its APIs (for example, the imaging model is *severely* lacking). Worse, it's difficult to deploy, and even more difficult to use, as the documentation is remarkably incomplete. So, as much as I hate to say it, SWT simply can't replace Swing right now, and I don't expect it to any time soon.
Enumerators? Reflection?
I'm only a beginner in C# and Java, but I know both have reflection, and the proposed Java 1.5 has enums. Kudos to C# for having them first :-), but Java 1.5 has them better, as first-class objects.
Also .net/IIS is a better platform for webdevelopment.
Better for whom? Why? Doesn't it have the severe shortcoming of platform lockdown? ;-).
I can write a c#.net app in 1/4th the code of a java one. Go take a look at Microsoft's petshop program if you do not believe me.
I can write an assembly app in 1/4 the code of a Python one. Assuming, of course, that the Python app wasn't written for small code size... The simile is very accurate; Sun didn't write their petshop for small size.
The Java Petshop reimplementation here spanks both Sun's and Microsoft's petshop in terms of size, and pretty clearly demonstrates that both languages could do better.
BTW, I absolutely love C# -- from what I've done with it so far. My only complaint is that its support is at best halfhearted for other platforms, and I will not allow my work to be tied down to one platform. This is the only thing that kept me from learning K (well, K is portable, the only problem is that it's only available from one vendor, Kx systems). Anyhow, I think C#'s bytecode is far beyond anything Sun's ever going to do with Java.
Also Windows2k3 is as stable as Linux now. NT4 is old. The situation has improved dramatically. I have never even seen a blue screen on Windows2k yet!
I agree with all of that, but it's not enough. I have seen blue screens and system crashes on 2000 and XP (XP far, FAR FAR more often than 2000). But then I've seen system crashes on Linux, so I'm not just complaining about MS
-Billy
That's a feature built into Java 1.5, but you can get a test reference implementation which is about 96% of the features now to try it out. It has a really clean syntax and provides the benefit you seek.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Changing this to 'linesToWrite = [myString] * ioMax' dropped time on my system from 2830ms to 1780ms (I'd like to note that I/O on my system was already much faster than his *best* I/O score, thank you very much Linux)
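The loop being replaced is not quoted in the comment, so the `build_with_loop` function below is only a guess at the kind of code involved. The payload string and the count are hypothetical too, and smaller than a real benchmark run. The sketch just shows that list repetition builds the same list as an append loop, and lets you time both on your own machine.

```python
import timeit

my_string = "abcdefghijklmnopqrstuvwxyz1234567890\n"  # hypothetical payload line
io_max = 100_000                                      # hypothetical line count


def build_with_loop():
    """Append the same string once per iteration (assumed original style)."""
    lines = []
    for _ in range(io_max):
        lines.append(my_string)
    return lines


def build_with_repeat():
    """List repetition, as in 'linesToWrite = [myString] * ioMax'."""
    return [my_string] * io_max


if __name__ == "__main__":
    assert build_with_loop() == build_with_repeat()
    print("loop  :", timeit.timeit(build_with_loop, number=20))
    print("repeat:", timeit.timeit(build_with_repeat, number=20))
```

The repetition form does the copying inside the interpreter's own C code rather than executing bytecode once per element, which is a plausible reason for the speedup reported above.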
In the trig test, I used numarray to decrease the runtime from 47660.0ms to *6430.0ms*. The original timing matches his pretty closely, which means that numarray would probably beat his gcc timings handily, too. Any time you're working with a billion numbers in Python, it's a safe bet that you should probably use numarray!
I didn't immediately see how to translate his other mathematical tests into numarray, but I noted that his textual explanation in the article doesn't match the (python) source code!
(My system is a 2.4GHz Pentium IV running RedHat 9)
Hate stupid software on freshmeat? Laugh at
The Windows version of Python is much slower. Testing with Python2.3 + psyco on a 2.4ghz p4 running Linux 2.4.20 yields impressive results
$ python -O Benchmark.py
Int arithmetic elapsed time: 13700.0 ms with
Trig elapsed time: 8160.0 ms
$ java Benchmark
Int arithmetic elapsed time: 13775 ms
$ java -server Benchmark
Int arithmetic elapsed time: 9807 ms
(n.b. this is only a small subset of the tests- I didn't feel like waiting. Trig was not run for java because it took forever.)
To dismiss a few common myths...
1) Python IS compiled to bytecode on its first run. The bytecode is stored on the filesystem in $PROGNAME.pyc.
2) the -O flag enables runtime optimization, not just faster loading time. On average you get a 10-20% speed boost.
3) Python is a string and list manipulation language, not a math language. It does so significantly faster than your average C coder could do so, with a hell of a lot less effort.
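Point 1 is easy to check from inside the interpreter. This is a minimal sketch using the built-in `compile()` and the standard `dis` module; the helper name `compile_and_run` and the sample source line are made up for illustration. As far as I know, CPython writes the cached bytecode file for imported modules rather than for the script you launch directly, so treat the exact file name in point 1 with some caution.

```python
import dis


def compile_and_run(source):
    """Compile source text to a code object, execute it, return both results."""
    code = compile(source, "<example>", "exec")   # source text -> bytecode
    namespace = {}
    exec(code, namespace)                         # run the compiled bytecode
    return code, namespace


if __name__ == "__main__":
    code, namespace = compile_and_run("total = sum(range(10))")
    print("total =", namespace["total"])
    print(len(code.co_code), "bytes of bytecode")
    dis.dis(code)                                 # human-readable disassembly
```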
Faster processors should enable us to achieve more, not achieve the same old stuff much less efficiently.