Performance Benchmarks of Nine Languages
ikewillis writes "OSnews compares the relative performance of nine languages and variants on Windows: Java 1.3.1, Java 1.4.2, C compiled with gcc 3.3.1, Python 2.3.2, Python compiled with Psyco 1.1.1, Visual Basic, Visual C#, Visual C++, and Visual J#. His conclusion was that Visual C++ was the winner, but in most of the benchmarks Java 1.4 performed on par with native code, even surpassing gcc 3.3.1's performance. I conducted my own tests pitting Java 1.4 against gcc 3.3 and icc 8.0 using his benchmark code, and found Java to perform significantly worse than C on Linux/Athlon."
I am not a compiler nerd (IANACN?), so maybe someone else can answer the following simple question:
Why are the Microsoft languages so fast with the Trig functions?
I'm a 2000 man.
Not sure of the accuracy. Benchmark is on a loop:
32-bit integer math: using a 32-bit integer loop counter and 32-bit integer operands, alternate among the four arithmetic functions while working through a loop from one to one billion. That is, calculate the following (while discarding any remainders)....
It also relies on the strength of the compiler, not just the strength of the language.
The Custom Mary
Why did VB do so badly on IO compared to the other .NET languages? They were pretty much equal up until the IO benchmark. Any chance of getting the code that was used for this test published?
Well, for performance it does. For cross platform compilation it rocks the house. If you really want performance you need to be using something like Intel's C compiler (which oddly was not tested)
Finkployd
I conducted my own tests pitting Java 1.4 against gcc 3.3 and icc 8.0 using his benchmark code, and found Java to perform significantly worse than C on Linux/Athlon.
Why is this a surprise? C has been the most commonly used language for so long because of its speed and efficiency. I think anyone who has done much work developing or running large-scale Java programs knows that speed can definitely be an issue.
Not everything is analogous to cars. Car analogies rarely work.
I see once again that Eugenia (a supposed pro-Linux pro-BeOS person who doesn't use Windows) has done all her benchmarks under Windows. I have a feeling that Python would perform a lot better if it was running in a proper POSIX environment (linked against Linux's libraries instead of the Cygwin libs). Probably the C code compiled with GCC would perform a fair bit better too.
Why benchmark the various ".NET languages" (those languages whose compilers target the CLR)? Every compiler targeting the CLR produces Intermediate Language, or more specifically MSIL. The only differences you'd find are in the optimizations performed by each compiler, which usually don't amount to much (for example, VB.NET allocates a local variable for the old "Function = ReturnValue" syntax whether you use it or not).
Look at the results for C# and J#. They are almost exactly the same, except for the IO, which I highly doubt. Compiler optimizations could squeeze a few more ns or ms out of each procedure, but nothing like that. After all, it's the IL from the mscorlib.dll assembly that's doing most of the work for both languages in exactly the same way (it's already compiled and won't differ in execution).
When are people going to get this? I know a lot of people who claim to be ".NET developers" but only know C# and don't realize that the class libraries can be used by any language targeting the CLR (and each has its shortcuts).
...some analysis of the code generated by Visual C++ and gcc side by side, particularly for those trig calls. If there's that great a discrepancy between the runtimes, that's a good clue that either one of the compilers is under-optimising (i.e. missing a trick), or the other is over-optimising (i.e. applying some transformation that only approximates what the answer should be). I didn't see any mention of the numerical results obtained being checked against what they ought to be (or even against each other).
:)
As any games/DSP programmer will tell you, there are a million ways to speed up trig providing that you don't *really* care after 6dps or so.
OK, maybe I'm just bitter because I was expecting gcc 3.1 to wipe the floor.
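To make the "fast but approximate trig" point concrete, here is a rough Python sketch. It is not what Visual C++, gcc or any other benchmarked compiler actually does; `fast_sin` and `max_abs_error` are hypothetical names, and the degree-9 Taylor polynomial is just one easy approximation, valid only on [-pi/2, pi/2]. It also shows the kind of cross-check against a reference result that the benchmark article does not mention doing.

```python
import math

def fast_sin(x):
    """Approximate sin(x) on [-pi/2, pi/2] with a degree-9 Taylor polynomial.

    Horner form of x - x^3/3! + x^5/5! - x^7/7! + x^9/9!.
    Illustrative only: no argument reduction, so it is wrong outside the range.
    """
    x2 = x * x
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0 * (1.0 - x2 / 72.0))))

def max_abs_error(samples=10001):
    """Largest difference from math.sin over evenly spaced points in the range."""
    lo, hi = -math.pi / 2, math.pi / 2
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / (samples - 1)
        worst = max(worst, abs(fast_sin(x) - math.sin(x)))
    return worst

if __name__ == "__main__":
    print("worst absolute error:", max_abs_error())
```

The error is largest at the ends of the interval, so a benchmark that only compares timings would never notice that the cheaper routine is returning slightly different numbers.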
These sigs are more interesting tha
Benchmark code like this does not represent how these languages are used in practice. Idiomatic Java code tends to be full of dynamic classes and indirection galore. Just testing "arithmetic and trigonometric functions [...] and [...] simple file I/O" is not going to tell you anything about how fast these languages are in the real world.
Given the ever-accelerating clock speed of processors, is the raw performance of languages that big an issue? Except for CPU-intensive programs (3-D games, high-end video/audio editing), current CPUs offer more than enough horsepower to handle any application. (Even 5-year-old CPUs handle almost every task with adequate speed.) Thus, code performance is not a big issue for most people.
On the other hand, the time and cost required by the coder is a bigger issue (unless you outsource to India). I would assume that some languages are just easier to design for, easier to write in, and easier to debug. Which of these languages offers the fastest time to "bug-free" completion for applications of various sizes?
Two wrongs don't make a right, but three lefts do.
The Java performance is best explained by an article by Prof Kahan: "How JAVA's Floating-Point Hurts Everyone Everywhere" http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf Also see "Marketing vs. Mathematics" http://www.cs.berkeley.edu/~wkahan/MktgMath.pdf I suspect the relatively poor floating-point performance of gcc is also caused by the desire to achieve accurate results.
Don't forget about the Win32 Compiler Shootout
Note that Python is pretty easy to extend in C/C++, so that speed critical parts can be rewritten in C if the performance becomes an issue. Writing the whole program in C or C++ is a premature optimization.
Save your wrists today - switch to Dvorak
Ximian's Mono has a C# compiler for open OS's:
http://www.go-mono.com/c-sharp.html
Someone should do a study on the time taken to design, implement and debug a reasonably complex chunk of code under C++ and Java. I'm pretty sure that the result would show the huge advantage of Java over C++.
The difference b/w Java and C++ would be dwarfed by the difference b/w Java and Python. Java may be 30-40% more productive than C++, but Python is 1000% more productive than Java. And yes, this applies to larger projects. J2EE may come to its own w/ projects that have hundreds of mediocre programmers, but if you have a mid-size team of highly skilled developers creating something new & unique (something like Zope or Chandler), Python will trounce the competition.
There were a number of problems with this benchmark, which are addressed in the OSNews thread about the article.
Namely:
- They only test a highly specific case of small numeric loops that is pretty much the best-case scenario for a JIT compiler.
- They don't test anything higher level, like method calls, object allocation, etc.
Concluding "oh, Java is as fast as C++" from these benchmarks would be unwise. You could conclude that Java is as fast as C++ for short numeric loops, of course, but that would be a different bag of cats entirely.
A deep unwavering belief is a sure sign you're missing something...
Site was showing signs of Slashdotting, so I'll quote one of the more important sections...
Results
Here are the benchmark results presented in both table and graph form. The Python and Python/Psyco results are excluded from the graph since the large numbers throw off the graph's scale and render the other results illegible. All scores are given in seconds; lower is better.
                   int    long  double    trig     I/O   TOTAL
Visual C++         9.6    18.8     6.4     3.5    10.5    48.8
Visual C#          9.7    23.9    17.7     4.1     9.9    65.3
gcc C              9.8    28.8     9.5    14.9    10.0    73.0
Visual Basic       9.8    23.7    17.7     4.1    30.7    85.9
Visual J#          9.6    23.9    17.5     4.2    35.1    90.4
Java 1.3.1        14.5    29.6    19.0    22.1    12.3    97.6
Java 1.4.2         9.3    20.2     6.5    57.1    10.1   103.1
Python/Psyco      29.7   615.4   100.4    13.1    10.5   769.1
Python           322.4   891.9   405.7    47.1    11.9  1679.0
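For anyone who wants to play with the numbers, a small Python sketch that re-adds the five columns and compares them with the quoted TOTAL. The figures are copied from the results above; the dictionary layout and function names are my own. A few totals differ from the column sums by 0.1, presumably because the article rounded each column separately.

```python
# Scores in seconds, copied from the quoted results: (int, long, double, trig, I/O), TOTAL
RESULTS = {
    "Visual C++":   ((9.6, 18.8, 6.4, 3.5, 10.5), 48.8),
    "Visual C#":    ((9.7, 23.9, 17.7, 4.1, 9.9), 65.3),
    "gcc C":        ((9.8, 28.8, 9.5, 14.9, 10.0), 73.0),
    "Visual Basic": ((9.8, 23.7, 17.7, 4.1, 30.7), 85.9),
    "Visual J#":    ((9.6, 23.9, 17.5, 4.2, 35.1), 90.4),
    "Java 1.3.1":   ((14.5, 29.6, 19.0, 22.1, 12.3), 97.6),
    "Java 1.4.2":   ((9.3, 20.2, 6.5, 57.1, 10.1), 103.1),
    "Python/Psyco": ((29.7, 615.4, 100.4, 13.1, 10.5), 769.1),
    "Python":       ((322.4, 891.9, 405.7, 47.1, 11.9), 1679.0),
}

def recomputed_totals(results):
    """Sum the five per-test scores for each language, rounded to one decimal."""
    return {name: round(sum(scores), 1) for name, (scores, _) in results.items()}

def rank_by_column(results, column):
    """Language names ordered fastest-first for one test (0=int ... 4=I/O)."""
    return sorted(results, key=lambda name: results[name][0][column])

if __name__ == "__main__":
    for name, total in recomputed_totals(RESULTS).items():
        print(f"{name:14s} listed {RESULTS[name][1]:7.1f}  recomputed {total:7.1f}")
    print("trig, fastest first:", rank_by_column(RESULTS, 3))
```

Ranking by a single column makes it obvious how much the overall order depends on the trig and I/O tests rather than on the integer and floating-point loops.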
Beware: In C++, your friends can see your privates!
Keep in mind too that these benchmarks were all run on Windows. I think gcc plays a lot nicer with glibc than with the Windows native libraries. Also, as pointed out, gcc is about being portable, not about being the most optimized compiler.
-t
http://unmoldable.com W:"No one of consequence" I:"I must know" W:"Get used to disappointment"
Using the IBM Java VM, I've consistently been able to cut my runtimes in half compared with the Sun VM. Anyone currently using the Sun VM for production work should test the IBM one and consider the switch.
The application that I benchmarked is data, network and memory intensive, although not math intensive, so that's what I can speak for. We consistently use 2 GB of main memory and pump a total of 2.5 TB (yes, TB) of data (doing a whole bunch of AI-style work inside the app itself) through the application over its life cycle, and we cut our total runtime from 6 days to 2.8 days by switching to the IBM VM.
You are not testing the languages, you are testing the compilers. If you test a language with a crummy compiler (gcc sucks compared to commercial optimized C++ compilers) you will think the language is slow, when in fact, the compiler just sucks. The only valid comparisons that can be made are same language, different compilers.
They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.
So, yes, you can construct programs, even some useful compute intensive programs, that perform as well or better on Java than they do in C. But that still doesn't make Java suitable for high-performance computing or building efficient software.
Benchmarks like the one published by OSnews don't test for these limitations. Microbenchmarks like those are still useful: if a language doesn't do well on them, that tells you that it is unsuitable for certain work; for example, based on those microbenchmarks alone, Python is unlikely to be a good language for Fortran-style numerical computing. But those kinds of microbenchmarks are so limited that they give you no guarantees that an implementation is going to be suitable for any real-world programming even if the implementation performs well on all the microbenchmarks.
I suggest you go through the following exercise: write a complex number class, then write an FFT using that complex number class, "void fft(Complex array[])", and then benchmark the resulting code. C, C++, and C# all will perform reasonably well. In Java, on the other hand, you will have to perform memory allocations for every complex number you generate during the computation.
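For anyone who wants to try the exercise above, here is a rough Python sketch of its shape, not a tuned implementation. The `Complex` class and the recursive radix-2 `fft` are my own illustrative choices (the post only gives the signature `void fft(Complex array[])`), and the input length is assumed to be a power of two. Every `+`, `-` and `*` builds a fresh `Complex` object, which is the allocation pattern the post attributes to Java.

```python
import math

class Complex:
    """Minimal complex number; each arithmetic operation allocates a new object."""
    __slots__ = ("re", "im")

    def __init__(self, re, im=0.0):
        self.re = re
        self.im = im

    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)

    def __sub__(self, other):
        return Complex(self.re - other.re, self.im - other.im)

    def __mul__(self, other):
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)

def fft(values):
    """Recursive radix-2 FFT; len(values) is assumed to be a power of two."""
    n = len(values)
    if n == 1:
        return [values[0]]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [None] * n
    for k in range(n // 2):
        angle = -2.0 * math.pi * k / n
        twiddle = Complex(math.cos(angle), math.sin(angle)) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

if __name__ == "__main__":
    impulse = [Complex(1.0), Complex(0.0), Complex(0.0), Complex(0.0)]
    print([(c.re, c.im) for c in fft(impulse)])
```

Timing this against a version that works on two plain float arrays (real and imaginary parts) gives a feel for how much of the cost is object churn rather than arithmetic.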
The optimisers in Sun's Java VM work on run-time profiling: they identify the most frequently run sections of code and apply the more elaborate optimisation steps to these segments alone.
Benchmarks that consist of one small loop will do very well under this scheme, as the critical loop will get all of the optimisation effort, but I suspect that in programs where the CPU time is more distributed over many code sections, this scheme will perform less well.
C doesn't have the benefit of this run-time profiling to aid in optimising critical sections, but it can more afford to apply its optimisations across the entire codebase.
I'd be interested to see results of a benchmark of code where CPU time is more distributed.
According to these benchmarks it doesn't.
The short of it is that GCC 3.2.1 is highly competitive with ICC 7.0, except for two cases:
FP-intensive code on the Pentium 4
Code that allows Intel C++ to auto-generate SSE vector code for it
The Python 'long' type is not a machine type such as a 32 or 64 or perhaps even 128 bit integer/long.
It is an arbitrary-precision integer type! That's why Python's scores on the long test are so much higher (slower) than the other languages'.
I wonder what Java scores when the benchmark is reimplemented using BigInteger (its closest equivalent) instead of the 'long' machine type.
Python uses a highly efficient Karatsuba multiplication algorithm for its longs (although that only starts to kick in with very big numbers).
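A quick illustration of the difference, written for a current Python 3 interpreter (where `int` is the arbitrary-precision type that Python 2.3 called `long`). The helper `wrap_to_signed_64` is a hypothetical name; it only emulates what a fixed-width 64-bit machine integer would hold after the same operation.

```python
MASK_64 = (1 << 64) - 1

def wrap_to_signed_64(value):
    """Emulate two's-complement wraparound of a signed 64-bit machine integer."""
    value &= MASK_64
    return value - (1 << 64) if value >= (1 << 63) else value

# Python keeps every bit of the product, growing the number as needed.
exact = (2 ** 62) * 4

# A machine 'long' would silently keep only the low 64 bits.
wrapped = wrap_to_signed_64(exact)

if __name__ == "__main__":
    print("exact:  ", exact)
    print("wrapped:", wrapped)
    print("bits needed:", exact.bit_length())
```

Keeping results exact means Python cannot compile the loop body down to a handful of machine instructions, so it does more work per operation than the languages using a native 64-bit type.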
There was an interesting article in Dr Dobb's a few months back. They did a C++ performance comparison of 6 or so compilers, gcc included. The end result was that performance-wise (execution speed AND code size) gcc came in last place in all their testing. However, gcc did win when it came to conformance to the C++ standard, as it was the only compiler that supported all the language features.
OK, Speed does matter a lot.
But what about type safety? Java has no generic typed containers, like the STL. This means you tend to find errors at runtime instead of at compile time.
I need to know that my code is as safe as possible. I don't want a user to find a bug because my hand tests didn't get 100% code coverage every time.
And how about predictable performance? I would much rather know that this function will take 200ms all of the time instead of 100ms most of the time and 10 s occasionally due to garbage collection.
Then, with more and more languages, especially ones with VMs, you get further and further away from the hardware. The end result: you lose performance. It does more and more for you, but at the expense of real optimizations, the kind that only you can do.
Now the zealots will come out and say, "Language X is better than language Y, see!" To me this argument is boring. I tend to use the appropriate tool for the job. So:
Yes, my teams use many languages, but they also put their effort to where they get the biggest bang for the buck. And in any business approach, that's the key goal. You don't see carpenters use saws to hammer in nails or drive screws. Wise up!
...tizzyd
Productive for you now ... but what about 6 months down the road? What if you want to release your product to the world? How hard is it to extend?
The advantages over Java are even increased 6 months down the road. Python code is much more readable and maintainable, hence easier to extend. Dynamically typed object model scales incredibly well.
I used to think the same about Perl vs Java, until I started looking at frameworks like Cocoon and they're all written in Java.
Comparing Perl to Java is foolish, Perl is more like Awk than a general purpose programming language, and not meant for large projects at all.
The Pentium trig instructions are not IEEE compliant (they don't return the correct values for large magnitude arguments). gcc errs on the side of caution and generates slow, software-based wrappers that correct for the limitations of the Pentium instructions by default. Other compilers (e.g., Intel and probably Microsoft) just generate the in-line instructions with no correction. When you look at the claimed superiority of other compilers over gcc, it is usually such tradeoffs that make gcc appear slower.
You can enable inline trig functions in gcc as well, either with a command line flag, or an include file, or by using "asm" statements on a case-by-case basis. Check the documentation. With those enabled, gcc keeps up well with other compilers on trig functions.
Windows was a good choice for this test, because many of the development languages that were used in this test aren't really mature enough on *nix (e.g. the .NET languages and arguably Java). A better test would be running everything on both OSes, because GCC is really more optimized towards Linux, while VC++ is more optimized towards Windows. I would rather have seen VC++ vs. Borland C++, because that is a more real-world business example.
It is amusing that the obsession with raw speed never goes away, even though computers have gotten thousands of times faster since the days of the original wisdom about how one shouldn't be obsessed with speed. Programmers put down Visual Basic as slow when it was an interpreted language running on a 66MHz 486. It was still put down as slow when it shared the same machine-code-generating back-end as Visual C++ running on a 3GHz Pentium 4. And still some people--usually people with little commercial experience--continue to insist that speed is everything.
Here's a bombshell: if you have a nice language, and that language doesn't have any hugely glaring drawbacks (such as simple benchmarks filling up hundreds of megabytes of memory), then don't worry about speed. From past experience, I've found it's usually easy to start with what someone considers to be a fast C or C++ program. Then I write a naive version in Python or another language I like. And guess what? My version will be 100x slower. Sometimes this is irrelevant. 100x slower than a couple of microseconds doesn't matter. Other times it does matter. But it usually isn't important to be anywhere near as fast as C, just to speed up the simpler, cleaner Python version by 2-20x. This can usually be done by fiddling around a bit, using a little finesse, trying different approaches. It's all very easy to do, and one of the great secrets is that high-level optimization is a lot of fun and more rewarding than assembly level optimization, because the rewards are so much greater.
This is mostly undiscovered territory, but I found one interesting link.
Note that I'm not just talking about diddly high-level tasks in a language like Python, but even things like image processing. It doesn't matter. Sticking to C and C++ for performance reasons, even though you know there are better languages out there, is a backward way of thinking.
They should have written their site in one of the higher-performing languages.
RP
I was a bit surprised by this quote in the article:
"Even if C did still enjoy its traditional performance advantage, there are very few cases (I'm hard pressed to come up with a single example from my work) where performance should be the sole criterion when picking a programming language. I"
I can only assume from this that he has never done, or known anyone who has done, any real-time programming. If you're going to write something like a car engine management system, performance is the ONLY criterion, hence a lot of these sorts of systems are still hand-coded in assembler, never mind C.
Raw performance will ALWAYS be an issue. If you can handle 100,000 hits per day on the same hardware on which I can handle 1,000,000 (and these are not made-up numbers, we see this kind of discrepancy in web applications all the time), then I clearly will be able to do MORE business than you and do it cheaper.
You raise excellent points. For many enterprise and server applications, performance is an issue. But I never said one should care nothing about performance, only that in many applications the cost of the coder also impacts financial results.
For the price of one software engineer for a year (call it 50k to 100k burdened labor rate), I can buy roughly 20 to 100 new PCs (at $1000 to $3000 each). If the programmer is more expensive or the machines are less expensive, then the issue is even more in favor of worrying about coder performance.
The trade-off between the hardware cost of the code and the wetware cost is not obvious in every case. A small firm that can double its server capacity for less than the price of a coder, or the creators of an infrequently used application, may not need high performance. On the other hand, a large software seller that sells core performance apps might worry more about speed. My only point is that ignoring the cost of the coder is wrong.
These different languages create a choice of whether to throw more hardware at a problem or throw more coders at the problem.
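The back-of-the-envelope numbers above are easy to redo for your own situation. A tiny Python sketch, using only the salary and PC price ranges quoted in the post; the function name is made up for illustration.

```python
def machines_per_engineer(yearly_cost, machine_price):
    """How many PCs one year of an engineer's burdened cost would buy."""
    return yearly_cost / machine_price

# Extremes of the ranges quoted above: 50k-100k per engineer, $1000-$3000 per PC.
fewest = machines_per_engineer(50_000, 3_000)   # cheap engineer, expensive PCs
most = machines_per_engineer(100_000, 1_000)    # expensive engineer, cheap PCs

if __name__ == "__main__":
    print(f"between {fewest:.1f} and {most:.0f} machines per engineer-year")
```

With the quoted figures the range works out to about 17 to 100 machines, so "20 to 100" is a fair round-number summary.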
Ummm.. Slashdot is written in Perl, as are many other large projects. I've yet to see anything like Slashdot written in Awk.
I heard there was a vote b/w Perl, Awk, Intercal and sed, and Perl won by a narrow margin.
His benchmark isn't fair: he's omitting the frame pointer on VC++ but not on gcc. How is that fair?
Guido van Rossum noted in an interview the following statistic, and I think it bears considerably on appropriateness:
So then, unless you quantify the types of apps you build, the team you use, and the results that are expected, my experience has shown me that most of the time, for business apps, it's overkill. Now, if you're in a dev team at a software company, well then, I could consider the other side.
...tizzyd
That's a feature built into Java 1.5, but you can get a test reference implementation which is about 96% of the features now to try it out. It has a really clean syntax and provides the benefit you seek.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Comparison against gcc, gcj and Java 1.4.1 on the same host:
I was somewhat surprised by the difference in the trig tests, as both appear to use libm. Not surprised that the IO was slower; the Java IO classes are nifty but do add quite a bit of overhead compared to fputs/fgets.
(Sorry about the formatting, it was the best I could do)
Changing this to 'linesToWrite = [myString] * ioMax' dropped time on my system from 2830ms to 1780ms (I'd like to note that I/O on my system was already much faster than his *best* I/O score, thank you very much Linux)
In the trig test, I used numarray to decrease the runtime from 47660.0ms to *6430.0ms*. The original timing matches his pretty closely, which means that numarray would probably beat his gcc timings handily, too. Any time you're working with a billion numbers in Python, it's a safe bet that you should probably use numarray!
I didn't immediately see how to translate his other mathematical tests into numarray, but I noted that his textual explanation in the article doesn't match the (python) source code!
(My system is a 2.4GHz Pentium IV running RedHat 9)
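A small sketch of the kind of change described above. The loop version is only a guess at what the original benchmark line looked like, since it isn't quoted here; the function names are hypothetical and the snippet targets a current Python 3 rather than 2.3.

```python
import timeit

def build_with_loop(line, count):
    """Build the list one append at a time (assumed shape of the original code)."""
    lines = []
    for _ in range(count):
        lines.append(line)
    return lines

def build_with_repeat(line, count):
    """Build the same list with sequence repetition, as in '[myString] * ioMax'."""
    return [line] * count

def time_both(line="abcdefgh\n", count=100_000, repeats=5):
    """Return (loop_seconds, repeat_seconds) for building the list 'repeats' times."""
    loop_s = timeit.timeit(lambda: build_with_loop(line, count), number=repeats)
    repeat_s = timeit.timeit(lambda: build_with_repeat(line, count), number=repeats)
    return loop_s, repeat_s

if __name__ == "__main__":
    loop_s, repeat_s = time_both()
    print(f"loop: {loop_s:.4f}s  repeat: {repeat_s:.4f}s")
```

Both functions return equal lists; the repetition form just moves the loop out of interpreted bytecode and into the interpreter's own C code.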
Hate stupid software on freshmeat? Laugh at
The Windows version of Python is much slower. Testing with Python 2.3 + psyco on a 2.4GHz P4 running Linux 2.4.20 yields impressive results
$ python -O Benchmark.py
Int arithmetic elapsed time: 13700.0 ms with
Trig elapsed time: 8160.0 ms
$ java Benchmark
Int arithmetic elapsed time: 13775 ms
$ java -server Benchmark
Int arithmetic elapsed time: 9807 ms
(n.b. this is only a small subset of the tests- I didn't feel like waiting. Trig was not run for java because it took forever.)
To dismiss a few common myths...
1) Python IS compiled to bytecode on its first run. The bytecode of imported modules is stored on the filesystem in a matching .pyc file.
2) the -O flag enables runtime optimization, not just faster loading time. On average you get a 10-20% speed boost.
3) Python is a string and list manipulation language, not a math language. It does so significantly faster than your average C coder could do so, with a hell of a lot less effort.
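Point 1 is easy to see for yourself. A minimal sketch using only the standard library on a current Python 3 (the thread is about 2.3, where the details differ); `add_one` and the source string are made-up examples.

```python
import dis

def add_one(x):
    return x + 1

# compile() turns source text into a code object holding bytecode;
# exec() then runs that bytecode without re-parsing the source.
code = compile("total = 1 + 2", "<example>", "exec")
namespace = {}
exec(code, namespace)

# dis lists the bytecode instructions the interpreter executes for a function.
instructions = list(dis.get_instructions(add_one))

if __name__ == "__main__":
    print("total =", namespace["total"])
    print("raw bytecode length:", len(code.co_code))
    dis.dis(add_one)
```

The bytecode is still interpreted one instruction at a time, which is why tight numeric loops stay slow unless something like Psyco translates them further.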
I see just one small issue with the benchmarks. Microsoft says that all .NET languages are compiled at run time. This means that the first pass of execution through a function has compile time added on top of the execution time, which somewhat falsifies the .NET execution time benchmark. I did some simple tests that confirm this. To my surprise, the .NET languages are actually faster than Visual C++, Borland C++ or GNU C++ for a simple 1/n series calculation, without visible loss of accuracy. Don't ask me how this is possible. I don't know, but it is what my benchmark shows. My best guess would be that the just-in-time compiler is better at optimizing code for the CPU of the particular machine it runs on, or maybe it is better at filling the cache. The key to the benchmark is to write the software in such a way that it runs through the function at least two times. The first time it runs just to allow the just-in-time compiler to compile the code, and then it runs subsequent times to measure performance. Below is the schematic of my benchmark:
// CurrentTime is a placeholder here for a system time function in ticks
// lprt is a placeholder for a nice formatting print here
// (Time likewise stands for whatever type CurrentTime returns)

double benchmark(int number_of_iterations);

int main(void)
{
    Time start, end;
    double outcome;

    // This is to allow .NET "just in time" compiler to compile the benchmark function
    benchmark(1);

    for (int i = 1; i < 11; ++i)
    {
        start = CurrentTime();
        outcome = benchmark(i * 1000000);
        end = CurrentTime();
        lprt(i, outcome, end - start);
    }
    return 0;
}

// This is the body of the benchmark
double benchmark(int number_of_iterations)
{
    double s, t;
    s = 0.0;
    t = 1.0;
    for (int i = 1; i < number_of_iterations; ++i)
    {
        s += 1.0 / t;
        t += 1.0;
    }
    return s;
}

As you can see above, I run the benchmark function once with a counter of 1 and ignore its outcome before starting to measure time. The key is to allow the compiler to compile the benchmark function before running the actual benchmark. Once that is done, I run the benchmark 10 times with successively larger counters, from 1 million to 10 million, and print the number of iterations (in millions), the outcome (to check accuracy) and the time it takes to run. The idea is that, under the assumption that the benchmark time is a linear function of the number of iterations, I can easily find a linear best fit between number of cycles and run time in the form of
time = a * number_of_cycles + b
and then use the value of a as the measurement of the benchmark. The value of b is a good check of how the benchmark behaves. If it is large, then something went wrong. In my case it was always close to zero. I'm away from my home computer right now and I don't have all the compilers that were tested in this article, so I can't repeat those benchmarks modified to use this method at the moment, but you guys might try to do it yourselves.
Some people might challenge this by stating that the compile time for .NET is part of the execution time, but I disagree. My position is that in most real-life cases the software runs through particular functions many times, and that is what creates long execution times. It is rare that a single function call creates a long execution time that annoys the user.
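For anyone who wants to try the method without the placeholder functions, here is a rough Python sketch of the same idea: one untimed warm-up call, ten timed runs of increasing size, then an ordinary least-squares fit of time = a * number_of_cycles + b. The names `linear_fit` and `measure` are mine, `time.perf_counter` stands in for CurrentTime, and since the standard CPython interpreter traditionally has no JIT, the warm-up call is only there to mirror the structure.

```python
import time

def benchmark(number_of_iterations):
    """Partial sum of the 1/n series, same loop bounds as the schematic."""
    s, t = 0.0, 1.0
    for _ in range(1, number_of_iterations):
        s += 1.0 / t
        t += 1.0
    return s

def linear_fit(xs, ys):
    """Least-squares slope a and intercept b for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def measure(step=100_000, runs=10):
    """Warm up once, time 'runs' calls of growing size, return the fitted (a, b)."""
    benchmark(1)  # untimed warm-up call, as in the schematic
    counts, times = [], []
    for i in range(1, runs + 1):
        n = i * step
        start = time.perf_counter()
        benchmark(n)
        times.append(time.perf_counter() - start)
        counts.append(n)
    return linear_fit(counts, times)

if __name__ == "__main__":
    a, b = measure()
    print(f"seconds per iteration a = {a:.3e}, fixed overhead b = {b:.3e}")
```

The slope a is the per-iteration cost being compared between languages, and an intercept b that is large relative to the individual timings suggests some one-time cost is still leaking into the measurement.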
Best regards.
Why are the two best Borland languages never included in benchmarks? Maybe I'm just the odd-ball who doesn't use C++, Java, or Micro$oft. - TMK