Performance Benchmarks of Nine Languages
ikewillis writes "OSnews compares the relative performance of nine languages and variants on Windows: Java 1.3.1, Java 1.4.2, C compiled with gcc 3.3.1, Python 2.3.2, Python compiled with Psyco 1.1.1, Visual Basic, Visual C#, Visual C++, and Visual J#. His conclusion was that Visual C++ was the winner, but in most of the benchmarks Java 1.4 performed on par with native code, even surpassing gcc 3.3.1's performance. I conducted my own tests pitting Java 1.4 against gcc 3.3 and icc 8.0 using his benchmark code, and found Java to perform significantly worse than C on Linux/Athlon."
I am not a compiler nerd (IANACN?), so maybe someone else can answer the following simple question:
Why are the Microsoft languages so fast with the Trig functions?
I'm a 2000 man.
Not sure of the accuracy. Benchmark is on a loop:
32-bit integer math: using a 32-bit integer loop counter and 32-bit integer operands, alternate among the four arithmetic functions while working through a loop from one to one billion. That is, calculate the following (while discarding any remainders)....
It also relies on the strength of the compiler, not just the strength of the language.
The Custom Mary
Why did VB do so badly on IO compared to the other .NET benchmarks? They were pretty much equal up until the IO benchmarks. Any chance of getting the code published that was used to test this?
Well, for performance it does. For cross platform compilation it rocks the house. If you really want performance you need to be using something like Intel's C compiler (which oddly was not tested)
Finkployd
I conducted my own tests pitting Java 1.4 against gcc 3.3 and icc 8.0 using his benchmark code, and found Java to perform significantly worse than C on Linux/Athlon.
Why is this a surprise? C has been most commonly used for so long because of its speed and efficiency. I think anyone who has done much work with either developing or running large-scale Java programs knows that speed can definitely be an issue.
Not everything is analogous to cars. Car analogies rarely work.
I see once again that Eugenia (a supposed pro-Linux pro-BeOS person who doesn't use Windows) has done all her benchmarks *under* Windows. I have a feeling that Python would perform a lot better if it was running in a proper POSIX environment (linked against Linux's libraries instead of the Cygwin libs). Probably the C code compiled with GCC would perform a fair bit better too.
Why benchmark the various ".NET languages" (those languages whose compilers target the CLR)? Every compiler targeting the CLR produces Intermediate Language, or more specifically MSIL. The only differences you'd find are in the optimizations performed by each compiler, which usually don't amount to much (like VB.NET allocating a local variable for the old "Function = ReturnValue" syntax whether you use it or not).
Look at the results for C# and J#. They are almost exactly the same, except for the IO, which I highly doubt. Compiler optimizations could squeeze a few more ns or ms out of each procedure, but nothing like that. After all, it's the IL from the mscorlib.dll assembly that's doing most of the work for both languages in exactly the same way (it's already compiled and won't differ in execution).
When are people going to get this? I know a lot of people that claim to be ".NET developers" but only know C# and don't realize that the class libraries can be used by any language targeting the CLR (and each has its shortcuts).
...some analysis of the code generated by Visual C++ and gcc side by side, particularly for those trig calls. If there's that great a discrepancy between the runtimes, that's a good clue that either one of the compilers is under-optimising (i.e. missing a trick), or the other is over-optimising (i.e. applying some transformation that only approximates what the answer should be). I didn't see any mention of the numerical results obtained being checked against what they ought to be (or even against each other).
:)
As any games/DSP programmer will tell you, there are a million ways to speed up trig provided that you don't *really* care beyond 6 decimal places or so.
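One of those ways is a precomputed table with linear interpolation. The following is only a sketch of that idea, not what any of the benchmarked compilers do; `TABLE_SIZE` and `fast_sin` are made-up names, and with 1024 entries the result should agree with `math.sin` to within about 1e-5. In Python the table is unlikely to be faster than the C-implemented `math.sin`, so treat it as an illustration of the trade-off rather than a speed-up.

```python
import math

TABLE_SIZE = 1024  # hypothetical resolution; larger tables trade memory for accuracy
STEP = 2.0 * math.pi / TABLE_SIZE
# One extra entry so interpolating in the last slot never reads past the end.
SINE_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]


def fast_sin(x):
    """Approximate sin(x) by linear interpolation in a precomputed table."""
    # Reduce the argument to the range [0, 2*pi].
    x = math.fmod(x, 2.0 * math.pi)
    if x < 0.0:
        x += 2.0 * math.pi
    position = x / STEP
    # Clamp so that rounding at the very top of the range stays inside the table.
    index = min(int(position), TABLE_SIZE - 1)
    fraction = position - index
    low = SINE_TABLE[index]
    high = SINE_TABLE[index + 1]
    return low + (high - low) * fraction
```

Doubling the table size cuts the interpolation error by roughly a factor of four, which is the kind of accuracy-for-speed dial the comment above is talking about.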
OK, maybe I'm just bitter because I was expecting gcc 3.1 to wipe the floor.
These sigs are more interesting tha
Well, unfortunately, comparing Java to C# on a Windows machine is like comparing a bird's and a dolphin's ability to swim in water; several components of C# are integrated right into the operating system, so naturally it's going to run faster on a Windows machine. Compare C#, C++ and Java on machines where the components aren't integrated and then we will have a FAIR benchmark.
Oh wait! C# only runs on one operating system. Can you name any other development languages that only run on ONE OS, boys and girls? Neither can I.
This is my sig. There are many like it but this one is mine.
Benchmark code like this does not represent how these languages are used in practice. Idiomatic Java code tends to be full of dynamic classes and indirection galore. Just testing "arithmetic and trigonometric functions [...] and [...] simple file I/O" is not going to tell you anything about how fast these languages are in the real world.
Someone should do a study on the time taken to design, implement and debug a reasonably complex chunk of code under C++ and Java. I'm pretty sure that the result would show the huge advantage of Java over C++.
The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.
code
Given the ever-accelerating clock speed of processors, is the raw performance of languages that big an issue? Except for CPU-intensive programs (3-D games, high-end video/audio editing), current CPUs offer more than enough horsepower to handle any application. (Even 5-year-old CPUs handle almost every task with adequate speed.) Thus, code performance is not a big issue for most people.
On the other hand, the time and cost required by the coder is a bigger issue (unless you outsource to India). I would assume that some languages are just easier to design for, easier to write in, and easier to debug. Which of these languages offers the fastest time to "bug-free" completion for applications of various sizes?
Two wrongs don't make a right, but three lefts do.
The Java performance is best explained by an article by Prof Kahan: "How JAVA's Floating-Point Hurts Everyone Everywhere" http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf also see "Marketing vs. Mathematics" http://www.cs.berkeley.edu/~wkahan/MktgMath.pdf I suspect the relatively poor floating-point performance of gcc is also caused by the desire to achieve accurate results.
It's a pity that the present-day language of choice for high-performance computing, Fortran 90/95/HPF, was not covered in this study. There has been anecdotal evidence that C++ has approached Fortran, performance-wise, in recent years, but I've yet to see a proper comparison of the two languages.
Tubal-Cain smokes the white owl.
Don't forget about the Win32 Compiler Shootout
Note that Python is pretty easy to extend in C/C++, so that speed critical parts can be rewritten in C if the performance becomes an issue. Writing the whole program in C or C++ is a premature optimization.
Save your wrists today - switch to Dvorak
IMO a program should use whatever tools are available and appropriate for the job, and not just worry about what is faster.
There were a number of problems with this benchmark, which are addressed in the OSNews thread about the article.
Namely:
- They only test a highly specific case of small numeric loops that is pretty much the best-case scenario for a JIT compiler.
- They don't test anything higher level, like method calls, object allocation, etc.
Concluding "oh, Java is as fast as C++" from these benchmarks would be unwise. You could conclude that Java is as fast as C++ for short numeric loops, of course, but that would be a different bag of cats entirely.
A deep unwavering belief is a sure sign you're missing something...
Site was showing signs of Slashdotting, so I'll quote one of the more important sections...
Results
Here are the benchmark results presented in both table and graph form. The Python and Python/Psyco results are excluded from the graph since the large numbers throw off the graph's scale and render the other results illegible. All scores are given in seconds; lower is better.
Language          int    long  double    trig     I/O    TOTAL
Visual C++        9.6    18.8     6.4     3.5    10.5     48.8
Visual C#         9.7    23.9    17.7     4.1     9.9     65.3
gcc C             9.8    28.8     9.5    14.9    10.0     73.0
Visual Basic      9.8    23.7    17.7     4.1    30.7     85.9
Visual J#         9.6    23.9    17.5     4.2    35.1     90.4
Java 1.3.1       14.5    29.6    19.0    22.1    12.3     97.6
Java 1.4.2        9.3    20.2     6.5    57.1    10.1    103.1
Python/Psyco     29.7   615.4   100.4    13.1    10.5    769.1
Python          322.4   891.9   405.7    47.1    11.9   1679.0
Beware: In C++, your friends can see your privates!
Keep in mind too that these benchmarks were all run on Windows. I think gcc plays a lot nicer with glibc than with the Windows native libraries. Also, as pointed out, gcc is about being portable, not about being the most optimized compiler.
-t
http://unmoldable.com W:"No one of consequence" I:"I must know" W:"Get used to disappointment"
I found it interesting to note in the benchmark design that the Visual C++ compile used the "omit frame pointer" option, while gcc did not. It seems to be the consensus over at the Gentoo Forums that this flag makes a fairly noticeable difference (if negatively impacting debug options), and I'd like to see the C piece re-run using this option. It's tough enough to compare apples to apples in tests such as these, but at least try to use the same compile flags where available.
Dan
Using the IBM Java VM, I've consistently been able to cut my runtimes in half compared with the Sun VM. Anyone currently using the Sun VM for production work should test the IBM one and consider the switch.
The application that I benchmarked is data, network and memory intensive, although not math intensive, so that's what I can speak for. We consistently use 2 GB of main memory and pump a total of 2.5 TB (yes, TB) of data (doing a whole bunch of AI-style work inside the app itself) through the application over its life cycle, and we cut our total runtime from 6 days to 2.8 days by switching to the IBM VM.
You are not testing the languages, you are testing the compilers. If you test a language with a crummy compiler (gcc sucks compared to commercial optimized C++ compilers) you will think the language is slow, when in fact, the compiler just sucks. The only valid comparisons that can be made are same language, different compilers.
They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.
So, yes, you can construct programs, even some useful compute intensive programs, that perform as well or better on Java than they do in C. But that still doesn't make Java suitable for high-performance computing or building efficient software.
Benchmarks like the one published by OSnews don't test for these limitations. Microbenchmarks like those are still useful: if a language doesn't do well on them, that tells you that it is unsuitable for certain work; for example, based on those microbenchmarks alone, Python is unlikely to be a good language for Fortran-style numerical computing. But those kinds of microbenchmarks are so limited that they give you no guarantees that an implementation is going to be suitable for any real-world programming even if the implementation performs well on all the microbenchmarks.
I suggest you go through the following exercise: write a complex number class, then write an FFT using that complex number class, "void fft(Complex array[])", and then benchmark the resulting code. C, C++, and C# all will perform reasonably well. In Java, on the other hand, you will have to perform memory allocations for every complex number you generate during the computation.
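For anyone who wants to try the exercise, here is a rough Python sketch of its shape. The `Complex` class and the recursive radix-2 `fft` below are illustrative assumptions, not code from the benchmark, and unlike the `void fft(Complex array[])` signature above this version returns a new list instead of working in place. Note how every `+`, `-` and `*` creates a fresh `Complex` object, which is the per-number allocation pattern the comment describes for Java.

```python
import math


class Complex:
    """Minimal value-style complex number; every operation allocates a new object."""
    __slots__ = ("re", "im")

    def __init__(self, re, im=0.0):
        self.re = re
        self.im = im

    def __add__(self, other):
        return Complex(self.re + other.re, self.im + other.im)

    def __sub__(self, other):
        return Complex(self.re - other.re, self.im - other.im)

    def __mul__(self, other):
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])  # transform of the even-indexed samples
    odd = fft(values[1::2])   # transform of the odd-indexed samples
    result = [None] * n
    for k in range(n // 2):
        angle = -2.0 * math.pi * k / n
        twiddle = Complex(math.cos(angle), math.sin(angle)) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result
```

Timing this against a version that keeps the real and imaginary parts in two flat arrays of floats would show how much of the cost is object churn rather than arithmetic.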
The optimisers in Sun's Java VM work on run-time profiling - they identify the most-run sections of code and use the more elaborate optimisation steps on these segments alone.
Benchmarks that consist of one small loop will do very well under this scheme, as the critical loop will get all of the optimisation effort, but I suspect that in programs where the CPU time is more distributed over many code sections, this scheme will perform less well.
C doesn't have the benefit of this run-time profiling to aid in optimising critical sections, but it can more afford to apply its optimisations across the entire codebase.
I'd be interested to see results of a benchmark of code where CPU time is more distributed..
Come on guys, by carefully choosing benchmarks that simple I can easily prove any language is faster than GCC.
According to these benchmarks it doesn't.
The short of it is that GCC 3.2.1 is highly competitive with ICC 7.0, except for two cases:
FP-intensive code on the Pentium 4
Code that allows Intel C++ to auto-generate SSE vector code for it
A deep unwavering belief is a sure sign you're missing something...
I would say yes. If you are coding just for WinTel platforms I would use MS tools and/or the Intel compiler. If you are coding only for SPARC/Solaris, use their compiler.
GCC, like Apache, is meant to be correct and portable first, fast second. Despite this, I wouldn't say that its performance sucks; I would say that it is the fastest cross-platform option available (as compared to Java, the only other cross-platform non-interpreted language in the test group).
I like java for some things, and the performance has even improved a bit lately. However if I am doing ANYTHING that has to scale and perform well under heavy load that uses cryptographic functions (especially public key encipherment), there is no way I can even seriously consider Java.
Someone (meaning anyone other than me) should do a benchmark of THAT, I'm sure it would be quite telling.
Finkployd
The Python 'long' type is not a machine type such as a 32 or 64 or perhaps even 128 bit integer/long.
It is an arbitrary precision integer type! That's why Python's scores on the Long test are so much higher (slower) than the other languages.
I wonder what Java scores when the benchmark is reimplemented using BigInteger instead of the 'long' machine type.
Python uses a highly efficient Karatsuba multiplication algorithm for its longs (although that only starts to kick in with very big numbers).
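To give an idea of what Karatsuba multiplication does, here is a small sketch written in Python itself. It is not CPython's implementation (that lives in C and uses its own cutoffs); the function name `karatsuba` and the small-operand threshold of 1024 are arbitrary choices for illustration. The trick is to split each operand in two and get away with three recursive multiplications instead of four.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers with Karatsuba's divide-and-conquer scheme."""
    # Small operands: fall back to ordinary multiplication.
    if x < 1024 or y < 1024:
        return x * y
    # Split both numbers around half the bit length of the larger one.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_high, x_low = x >> half, x & mask
    y_high, y_low = y >> half, y & mask
    # Three recursive products instead of four.
    low = karatsuba(x_low, y_low)
    high = karatsuba(x_high, y_high)
    cross = karatsuba(x_low + x_high, y_low + y_high) - low - high
    # Recombine: x*y = high * 2^(2*half) + cross * 2^half + low.
    return (high << (2 * half)) + (cross << half) + low
```

Because the integers never overflow, the result can be checked directly against the built-in `*` operator for operands of any size.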
There was an interesting article in Dr Dobb's a few months back. They did a performance (C++) comparison of 6 or so compilers, gcc included. The end result was that performance-wise (execution AND code size) gcc came in last place in all their testing. However, gcc did win when it came to conformance to the C++ standard as it was the only compiler that supported all the language features.
It's well known that benchmarks aren't the be-all and end-all. They're often just statistics geared towards an ultra-specific application (remember all those /. stories about benchmark-cheating vendors?)
I've seen examples of gcc in a Cygwin shell kicking Visual C++'s ass at load-up times of huge image data on a WinTel box. I've also seen Java (JDK 1.3) annihilate native C code on console apps calculating complex mathematical formulas on a Linux box. This goes for both AMD and Intel chips.
Moral of the story? These languages are all suited to specific uses. Analyze your tasks, your platforms, and your compilers. Learn how to use optimizations properly. Evaluate your need for portability. Do a few tests for performance in different languages and compilers to see which one actually is fastest for your current application.
There is no single "fastest" language.
OK, Speed does matter a lot.
But what about type safety? Java has no generic typed containers, like the STL. This means you tend to find errors at runtime instead of at compile time.
I need to know that my code is as safe as possible. I don't want a user to find a bug because my hand tests didn't get 100% code coverage every time.
And how about predictable performance? I would much rather know that this function will take 200 ms all of the time instead of 100 ms most of the time and 10 s occasionally due to garbage collection.
Conveniently I have the same system configuration as ikewillis (dual 2.0 GHz Athlon MP), but am running Windows XP instead of Linux. I also have Intel C++ 8.0, which he used on Linux to generate his results.
So I ran the same tests that he ran under Linux under Windows. Here are my results from Intel C++ 8.0, with Profile Guided Optimization turned off (comparing to his with PGO on):
Running the same tests under Windows with PGO turned on, the numbers did not change except on the least-significant digits, so I won't bother to list those too. Before running the tests, I set the program to run at high priority on one processor to avoid unnecessary interference from other running applications, or unnecessary processor-jumping--although when I tried it without, there wasn't much of a difference (< 1%).
Conclusions? First, it seems the 64-bit integer performance problem is something that exists only for Intel C++ 8.0 on Linux, not Windows. Second, it seems stdlib I/O performance is significantly higher under Linux than under Windows for this benchmark.
It's hard for thee to kick against the pricks.
Yeah, with the increase in hardware price/performance the performance consideration is becoming less and less of a consideration in _most_ applications. There are still environments where efficiency is of paramount concern (the combination of great speed and low resource drag). Examples I work with are real-time financial trading applications, network back-bone servers (routers firewalls, intelligent switches, etc.), mobile and embedded devices, server daemons, and network applications (packet sniffers, etc.).
For general business processing applications and most web applications, efficiency is less of a concern and cost/time-to-market/maintainability/security are bigger.
I like these benchmarks but would like to see ones that also benchmark the other characteristics of languages (such as lines of code to do a common task, number of tests that need to be performed to validate common functions, memory space, etc. etc.)
Then, with more and more languages, especially ones with VMs, you get further and further away from the hardware. The end result: you lose performance. It does more and more for you, but at the expense of real optimizations, the kind that only you can do.
Now the zealots will come out and say, "Language X is better than language Y, see!" To me this argument is boring. I tend to use the appropriate tool for the job. So:
Yes, my teams use many languages, but they also put their effort to where they get the biggest bang for the buck. And in any business approach, that's the key goal. You don't see carpenters use saws to hammer in nails or drive screws. Wise up!
...tizzyd
I would like to see benchmarks between Java vendors on the same platform for 1.4.x. Specifically, I'd like to see Sun JVM, IBM J9, and BEA JRockit. The question is how do other commercial JVMs really stack up against the Sun standard.
I would also like to see benchmarks of the same JVM across different operating systems on the same processor, namely Windows, Linux, BSD, and (if it matters) Solaris x86. The question is how do other JVMs stack up against the Windows 'standard'.
It would also be nice to see a 'leveling' benchmark across different processors, specifically comparing a suite of Java benchmarks on WinTel and MacOS.
The Pentium trig instructions are not IEEE compliant (they don't return the correct values for large magnitude arguments). gcc errs on the side of caution and generates slow, software-based wrappers that correct for the limitations of the Pentium instructions by default. Other compilers (e.g., Intel and probably Microsoft) just generate the in-line instructions with no correction. When you look at the claimed superiority of other compilers over gcc, it is usually such tradeoffs that make gcc appear slower.
You can enable inline trig functions in gcc as well, either with a command line flag, or an include file, or by using "asm" statements on a case-by-case basis. Check the documentation. With those enabled, gcc keeps up well with other compilers on trig functions.
I don't know about anyone else, but most programs in the real application world do a lot of string manipulation, and I have seen some pretty shocking results of string manipulation benchmarks showing Java the worst with the C++ class second, and Python actually leading the pack. It would be useful to also see the overhead calcs for object management too. Java is so memory heavy we have problems with machines that have 4 Gig of RAM configured.
It's pretty stupid to run the Python benchmarks in a non-native environment.
Yet again OS News publishes a completely meaningless story.
Everyone is living in a personal delusion, just some are more delusional than others.
Windows was a good choice for this test, because many of the development languages that were used in this test aren't really mature enough on *nix (i.e. the .NET languages and arguably Java). A better test would be running the tests on both OSes, because GCC is really more optimized towards Linux, while VC++ is more optimized towards Windows. I would have rather seen VC++ vs. Borland C++, because that is a more real-world business example.
It is amusing that the obsession with raw speed never goes away, even though computers have gotten thousands of times faster since the days of the original wisdom about how one shouldn't be obsessed with speed. Programmers put down Visual Basic as slow when it was an interpreted language running on a 66MHz 486. It was still put down as slow when it shared the same machine code generating back-end as Visual C++ running on a 3GHz Pentium 4. And still some people--usually people with little commercial experience--continue to insist that speed is everything.
Here's a bombshell: if you have a nice language, and that language doesn't have any hugely glaring drawbacks (such as simple benchmarks filling up hundreds of megabytes of memory), then don't worry about speed. From past experience, I've found it's usually easy to start with what someone considers to be a fast C or C++ program. Then I write a naive version in Python or another language I like. And guess what? My version will be 100x slower. Sometimes this is irrelevant. 100x slower than a couple of microseconds doesn't matter. Other times it does matter. But it usually isn't important to be anywhere near as fast as C, just to speed up the simpler, cleaner Python version by 2-20x. This can usually be done by fiddling around a bit, using a little finesse, trying different approaches. It's all very easy to do, and one of the great secrets is that high-level optimization is a lot of fun and more rewarding than assembly level optimization, because the rewards are so much greater.
This is mostly undiscovered territory, but I found one interesting link.
Note that I'm not talking about diddly high-level tasks in a language like Python, but even things like image processing. It doesn't matter. Sticking to C and C++ for performance reasons, even though you know there are better languages out there, is a backward way of thinking.
Python did pretty badly in the tests. The reason is that in Python it takes a long time to translate a variable name into a memory address (It happens at runtime instead of compile time).
The benchmark code has stuff that basically looks like this:
Adding 1 to i takes no time at all but looking up i takes a little time. In C this is going to be a lot faster.
Python did really badly when "i" from the example above was a long compared to when it was a long in C. That's because Python has big number support but in C a long is limited to just 4 bytes.
Python did OK in the trig section because the trig functions are implemented in C. It still suffers because it takes a long time to look up variables though.
In real life, variable look up time is sometimes a factor. However, for programs that I've written getting data from the network, or database was the bottleneck.
I know it ties into the GCC libs, but does G++ behave any better/worse than GCC? Comparing VC++ (a C++ compiler) and GCC (a C compiler) is a bit skewed. Also, how about a comparison of GCC on Windows vs. Linux (comparable machines), just to see if the OS has any bearing on things?
Third, Java 1.4.2 performed as well as or better than the fully compiled gcc C benchmark, after discounting the odd trigonometry performance. I found this to be the most surprising result of these tests, since it only seems logical that running bytecode within a JVM would introduce some sort of performance penalty relative to native machine code. But for reasons unclear to me, this seems not to be true for these tests.
I don't know why the reasons are not clear to him. Perhaps it's because he still thinks the JVM is "running bytecode" and does not understand what JITs did or what HotSpot compilers do. Bytecode is only run for the first few passes, after which it's optimized into native code. Native being whatever the compiler of the C program used to compile the JVM could do. This is fundamental. Which explains his results, and points to a poor HotSpot implementation where trig functions are concerned.
Why didn't they include ActivePerl?
In the article it rather sounds like they just assumed Python performance would be an indicator of performance for interpreted languages generally, but is there anything to back this up?
This Like That - fun with words!
I actually use C++ for portability, not speed or generic programming (which are nice to have).
If you avoid platform, compiler, and processor specific features, C++ is even more portable than Java. Java on the other hand tends to drag all platforms down to the least common denominator, then requires the use of contorted logic and platform extensions just to attain acceptable performance.
People seem to have forgotten the original intention of C: portable code.
So it couldn't possibly be a problem with the app? It has to be the language it's written in? By that logic, C/C++ must suck really badly, because we all know how unreliable Windows 95 was. Puuuurlease.
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
They should have written their site in one of the higher-performing languages.
RP
I was a bit surprised by this quote in the article:
"Even if C did still enjoy its traditional performance advantage, there are very few cases (I'm hard pressed to come up with a single example from my work) where performance should be the sole criterion when picking a programming language. I"
I can only assume from this that he has never done, or known anyone who has done, any realtime programming. If you're going to write something like a car engine management system, performance is the ONLY criterion, hence a lot of these sorts of systems are still hand coded in assembler, never mind C.
1) JIT optimizations don't always kick in until a function has been run several times. Since the benchmarks only run once, they are crippled on Java.
2) Java's IO functions work on UTF-8 or another system-dependent character set. So in essence Java is doing twice the amount of work during the IO benchmark.
I'm sure other people will comment as well, but overall these numbers are not that surprising for code that was just copied and pasted from C code. Why do people expect that ANY language will perform well using another language's code?
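Point 1 is usually handled by giving the code some untimed warm-up runs before measuring. Here is a minimal sketch of such a harness; it is written in Python only to show the structure (CPython has no JIT of this kind), and `time_with_warmup` with its default run counts is a made-up helper, not part of the published benchmark.

```python
import time


def time_with_warmup(func, warmup_runs=3, timed_runs=5):
    """Call func a few times untimed, then return the best of several timed calls."""
    for _ in range(warmup_runs):
        func()  # untimed: gives a JIT (or caches) a chance to settle
    timings = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)
```

The same structure carries over to a Java harness: call the benchmark method a few times first, then time the later calls.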
Raw performance will ALWAYS be an issue. If you can handle 100,000 hits per day on the same hardware that I can handle 1,000,000 (and these are not made up numbers, we see this kind of discrepancy in web applications all the time), then I clearly will be able to do MORE business than you and do it cheaper.
You raise excellent points. For many enterprise and server applications, performance is an issue. But I never said one should care nothing about performance, only that in many applications the cost of the coder also impacts financial results.
For the price of one software engineer for a year (call it 50k to 100k burdened labor rate), I can buy between 20 and 100 new PCs (at $1000 to $3000 each). If the programmer is more expensive or the machines are less expensive, then the issue is even more in favor of worrying about coder performance.
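The arithmetic behind that range is easy to check with the figures quoted above. The helper name below is made up, and the calculation ignores power, hosting and administration, so it is only a back-of-the-envelope sketch. The low end comes out a little under the round figure of 20.

```python
def machines_per_engineer(engineer_cost, machine_cost):
    """Number of whole machines that one engineer-year would buy."""
    return engineer_cost // machine_cost


fewest = machines_per_engineer(50_000, 3_000)   # cheapest engineer, priciest PC
most = machines_per_engineer(100_000, 1_000)    # priciest engineer, cheapest PC
print(fewest, most)  # prints: 16 100
```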
The trade-off between the hardware cost of the code and the wetware cost is not obvious in every case. A small firm that can double its server capacity for less than the price of a coder, or the creators of an infrequently-used application, may not need high performance. On the other hand, a large software seller that sells core performance apps might worry more about speed. My only point is that ignoring the cost of the coder is wrong.
These different languages create a choice of whether to throw more hardware at a problem or throw more coders at the problem.
Two wrongs don't make a right, but three lefts do.
I ran four tests using Portable.net and mono. For laziness reasons I only ran the Int and Trig benchmarks. All tests were performed on a 2.4 GHz P4.
First, I compiled Benchmark.cs using cscc (portable.net) and mcs (mono).
I then ran each binary with mono and ilrun (portable.net). Results are interesting.
Portable.net compiler: cscc -O3 Benchmark.cs
$ ilrun Benchmark.portable.exe
Int arithmetic elapsed time: 12996 ms
Trig elapsed time: 28700 ms
$ mono Benchmark.portable.exe
Int arithmetic elapsed time: 16235 ms
Trig elapsed time: 4534 ms
Mono Compiler: mcs Benchmark.cs
$ ilrun Benchmark.exe
Int arithmetic elapsed time: 13784 ms
Trig elapsed time: 27939 ms
$ mono Benchmark.exe
Int arithmetic elapsed time: 15994 ms
Trig elapsed time: 4596 ms
As you can see, Portable.net has slightly faster Int math, but crumbles under the trig functions. There is no significant difference between the compilers.
The Portable.net runtime had a serious bug where the time calculated was an order of magnitude out. I used the unix time command to get a more accurate result.
It would be interesting to do this comparison using Microsoft.NET as well. I would assume Microsoft.NET would absolutely crush these results.
n.b. Please note this was not a comprehensive benchmark. I disabled some of the tests because I didn't feel like waiting (So sue me), while X, xmms, xchat, etc were running.
We weren't quite ready to release it, but we've been working on a language performance comparison test of our own. It is available at:
http://scutigena.sourceforge.net/
It's designed as a framework that ought to run cross-platform, so you can run it yourself. We haven't added it yet, but I think we really want to divide the tests into two categories. "Get it done" - and each language implements it the best way for that language, and "Basic features comparison" - where each language has to show off features like lists, hash tables, how fast function calls are, and so forth.
It's an ongoing project, so new participants are welcome! I would appreciate it if comments went to the appropriate SF mailing lists instead of here, so that I can better keep track of them.
http://www.welton.it/davidw/
His benchmark isn't fair, he's omitting the frame pointer on VC++ but not gcc. How is that fair?
Guido van Rossum noted in an interview the following statistic, and I think it bears considerably on appropriateness:
So then, unless you quantify the types of apps you build, the team you use, and the results that are expected, my experience has shown me that most of the time, for business apps, it's overkill. Now, if you're in a dev team at a software company, well then, I could consider the other side.
...tizzyd
That's a feature built into Java 1.5, but you can get a test reference implementation which is about 96% of the features now to try it out. It has a really clean syntax and provides the benefit you seek.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Reminds me of my 6th grade 'science fair' project.
I took a couple different compilers, languages, did some loops and math and such, timed them all.
"Which computer language is the fastest"
About half way through the project I realized how big of a waste of time it was.
What kinds of things should you be testing?
Speeds of function calls???
Implement various sorting algorithms?
Audio/Video compression/decompression?
When it comes down to it, it's all the same math, and any good compiler is going to come close to making the same darn code.
By now, we all know that you use one language for one thing, and another language for another. For various reasons.
A hammer is your only tool if all your problems are nails, isn't that the cliche?
Comparison against gcc, gcj and Java 1.4.1 on the same host:
I was somewhat surprised by the difference in the trig tests, as both appear to use libm. Not surprised that the IO was slower; the Java IO classes are nifty but do add quite a bit of overhead compared to fputs/fgets.
(Sorry about the formatting, it was the best I could do)
Changing this to 'linesToWrite = [myString] * ioMax' dropped time on my system from 2830ms to 1780ms (I'd like to note that I/O on my system was already much faster than his *best* I/O score, thank you very much Linux)
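A minimal sketch of the change described above, assuming the original benchmark built its list of lines with an append loop. The original loop is not shown in this post, so `build_with_loop` is a hypothetical stand-in; `myString` and `ioMax` follow the names quoted above, but their values here are made up.

```python
import time

myString = "abcdefghijklmnopqrstuvwxyz\n"  # assumed test line, not from the benchmark
ioMax = 1_000_000                          # assumed line count, not from the benchmark


def build_with_loop(line, count):
    """Hypothetical 'before' version: grow the list one append at a time."""
    lines = []
    for _ in range(count):
        lines.append(line)
    return lines


def build_with_repeat(line, count):
    """The 'after' version from the post: list repetition in one operation."""
    return [line] * count


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, loop_s = timed(build_with_loop, myString, ioMax)
    linesToWrite, repeat_s = timed(build_with_repeat, myString, ioMax)
    print("append loop: %.1f ms" % (loop_s * 1000))
    print("[x] * n:     %.1f ms" % (repeat_s * 1000))
```

Both versions produce the same list; the repetition form just avoids running a million interpreted loop iterations.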
In the trig test, I used numarray to decrease the runtime from 47660.0ms to *6430.0ms*. The original timing matches his pretty closely, which means that numarray would probably beat his gcc timings handily, too. Any time you're working with a billion numbers in Python, it's a safe bet that you should probably use numarray!
I didn't immediately see how to translate his other mathematical tests into numarray, but I noted that his textual explanation in the article doesn't match the (python) source code!
(My system is a 2.4GHz Pentium IV running RedHat 9)
Hate stupid software on freshmeat? Laugh at
The Windows version of Python is much slower. Testing with Python 2.3 + psyco on a 2.4GHz P4 running Linux 2.4.20 yields impressive results:
$ python -O Benchmark.py
Int arithmetic elapsed time: 13700.0 ms with
Trig elapsed time: 8160.0 ms
$ java Benchmark
Int arithmetic elapsed time: 13775 ms
$ java -server Benchmark
Int arithmetic elapsed time: 9807 ms
(n.b. this is only a small subset of the tests- I didn't feel like waiting. Trig was not run for java because it took forever.)
To dismiss a few common myths...
1) Python IS compiled to bytecode on its first run. The bytecode is stored on the filesystem in $PROGNAME.pyc (this caching applies to imported modules; the top-level script is recompiled on each run).
2) the -O flag enables runtime optimization, not just faster loading time. On average you get a 10-20% speed boost.
3) Python is a string and list manipulation language, not a math language. It does so significantly faster than your average C coder could do so, with a hell of a lot less effort.
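To illustrate point 1, here is a small sketch using only the standard library. It compiles a line of source to a code object and then executes that bytecode. The source string and the `<example>` filename are made up for illustration, and the exact opcodes printed by `dis` vary between Python versions.

```python
import dis

# Hypothetical one-line program used only for illustration.
source = "total = sum(i * i for i in range(10))"

# compile() turns source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# exec() runs the already-compiled bytecode in the given namespace.
namespace = {}
exec(code, namespace)

if __name__ == "__main__":
    print(type(code.co_code), len(code.co_code), "bytes of bytecode")
    dis.dis(code)  # human-readable listing of the bytecode
    print("total =", namespace["total"])
```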
I see just one small issue with the benchmarks. Microsoft claims that all .NET languages are compiled at runtime. This means that the first pass of execution through a function has compile time added on top of the execution time, which somewhat distorts the .NET benchmark figures. I did some simple tests that confirm this. To my surprise, .NET languages are actually faster than Visual C++, Borland C++ or GNU C++ for a simple 1/n series calculation, without visible loss of accuracy. Don't ask me how that is possible. I don't know, but it is what my benchmark shows. My best guess is that the just-in-time compiler is better at optimizing code for the CPU in the particular machine it runs on, or maybe it is better at filling the cache. The key is to write the benchmark so that it runs through the function at least twice: the first time just to let the just-in-time compiler compile the code, and then subsequent times to measure performance. Below is a schematic of my benchmark:
double benchmark(int number_of_iterations);
int main (void)
{
    Time start,end;
    double outcome;
    // This is to allow .NET "just in time" compiler to compile the benchmark function
    benchmark(1);
    for(int i = 1; i < 11; ++i)
    {
        // CurrentTime is a placeholder here for a system time function in ticks
        start = CurrentTime();
        outcome = benchmark(i*1000000);
        end = CurrentTime();
        // lprt is a placeholder for a nice formatting print here
        lprt (i,outcome,end-start);
    }
}
double benchmark (int number_of_iterations)
{
    double s,t;
    s = 0.0;
    t = 1.0;
    // This is the body of the benchmark
    for(int i = 1; i < number_of_iterations; ++i)
    {
        s += 1.0/t;
        t += 1.0;
    }
    return (s);
}
As you can see above, I run the benchmark function once with a counter of 1 and ignore its outcome before starting to measure time. The key is to let the compiler compile the benchmark function before running the actual benchmark. Once that is done, I run the benchmark 10 times with successively larger counters, from 1 million to 10 million (the i*1000000 in the code), and print the number of iterations (in millions), the computed sum (as an accuracy check) and the time it takes to run. The idea here is that, assuming the benchmark time is a linear function of the number of iterations, I can easily find a linear best fit between the number of cycles and the run time in the form
time = a * number_of_cycles + b
and then use the value of a as the measurement of the benchmark. The value of b is a good check on how the benchmark behaves. If it is large, something went wrong. In my case it was always close to zero. I'm away from my home computer right now and I don't have all the compilers that were tested in this article, so I can't repeat those benchmarks with this method at the moment, but you guys might try it yourselves.
Some people might challenge this by stating that the compile time for .NET is part of the execution time, but I disagree. My position is that in most real-life cases the software runs through particular functions many times, and that is what creates long execution times. It is rare that a single function call creates an execution time long enough to annoy the user.
Best regards.
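The warm-up-then-fit method described in this post can be sketched roughly as follows. This is an illustration under assumptions, not the poster's code: `fit_line` and `run` are hypothetical helper names, the step size is much smaller than in the post so the sketch finishes quickly, and CPython has no JIT, so the warm-up call here only mirrors the structure of the method.

```python
import time


def benchmark(number_of_iterations):
    """Partial sum of the 1/n series, mirroring the loop in the post."""
    s = 0.0
    t = 1.0
    for _ in range(1, number_of_iterations):
        s += 1.0 / t
        t += 1.0
    return s


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a * xs + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def run(step=20_000, runs=10):
    """Warm up once, time successively larger runs, then fit time = a*n + b."""
    benchmark(1)  # warm-up call; result deliberately ignored
    sizes, times = [], []
    for i in range(1, runs + 1):
        n = i * step
        start = time.perf_counter()
        benchmark(n)
        times.append(time.perf_counter() - start)
        sizes.append(n)
    return fit_line(sizes, times)


if __name__ == "__main__":
    a, b = run()
    print("a (seconds per iteration): %.3e" % a)
    print("b (fixed overhead, seconds): %.3e" % b)
```

The slope a is the per-iteration cost used as the score; an intercept b far from zero suggests one-off costs leaked into the measurement.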
Not only that, but it helps to have a benchmark that actually tests the things it claims to test.
It's a Good Thing (tm) this didn't make the front page.
The author states "I am by no means an expert in benchmarking; I launched this project largely as a learning experience" and it shows. The man has an associate's degree in computer science, and a Ph.D. in psychology. His list of publications consists of his dissertation, a single published paper, and excerpts within a 15-year-old travel guide.
Just a cursory glance at the first page of his article shows that he has no clue as to how things work. He states "I first tried to eliminate the CLR from the Visual C++ benchmark by turning off the language's "managed" features with the #pragma unmanaged directive, but I was surprised to see that this didn't lead to any performance gains." If he cannot understand why something that generates a few thousand CPU instructions of initial overhead doesn't change the speed of an I/O- and loop-limited program, he isn't skilled enough to interpret his own results.
His benchmarks never actually test his first, second, third, or fifth question. His fourth question is actually addressed better when his contrived test is compared on relative measure with his two Java tests.
Look at his benchmark programs (found here). Some of those tests can, should, and will have compiler-specific optimizations, having nothing to do with the language. General 'counting loops', which is the only thing he is using, have long been known to produce bad benchmarks. He claims to be testing 64-bit floating point math, but in fact, many of his examples use 80-bit floating point.
Just for fun, look at his VC and Java 1.4 floating point tests. Now look at his compiler options. It is painfully obvious that the compiler saw "He explicitly said this is a Pentium 4, I can use parallel floating point instructions!" where the other compilers could not. Saying that those languages are inherently faster than the other compiled languages is lunacy.
This is hardly news. This is a BAD example of benchmarking, and would be given a poor grade in a graduate level CS class.
//TODO: Think of witty sig statement
That would explain a lot then.
Any recent version of Visual C++ supports "intrinsics", i.e., conversions of a function call directly into a specific machine code instruction, or "perfect in-lining" if you prefer.
That means that if you use something like abs(x) in C++, these compilers would convert directly into an instruction to get x to the head of the floating point stack, then a FPU abs opcode, with no overheads at all. Same goes for trig functions and other variations supported by the FPU.
Moreover, IME the VC++ optimiser is quite smart about intrinsics, and in lengthy calculations will often arrange for the right values to be coming to the head of the stack in the correct order as cheaply as possible, even if it means planning ahead a few instructions. If you look at the assembly language output from VC++ for a numerical computation, it tends to have a series of instructions to stack what it needs, with the occasional calculation opcode thrown in between them, and then a whole series of neatly co-ordinated calculation opcodes at the end.
I don't see how any language that has function calling at all could keep up with this low-level, direct-to-FPU approach, and surely any language using software emulation of floating point code won't even come close.
I don't know whether VC# and VB.Net use the same trick, but given how much effort MS put into sharing things across the product range prior to releasing VS.Net, it must be a good bet. There are some advantages to writing platform-specific code. :-)
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Why are the two best Borland languages never included in benchmarks? Maybe I'm just the odd-ball that doesn't use C++, Java, or Micro$oft. - TMK
By replacing the Reader/Writer classes in the java benchmark with their InputStream/OutputStream counterparts I realized a 24% IO performance boost when running the 1.4 JVM with the -server option. The Stream classes don't bother with the unicode conversions that the Reader/Writer classes do. Since the other benchmarks didn't perform unicode conversions (at least the C/C++ ones - can't speak for the other langs), this seemed like a reasonable modification.
Curiously, without the -server option, this resulted in a 78% performance HIT.
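As a loose analogy only (this is Python, not the Java benchmark code), the same split exists between text-mode files, which encode every string on the way out, and binary-mode files, which write bytes as given. The test line, line count and file names below are assumptions made up for this sketch, and the timings say nothing about the JVM numbers above.

```python
import os
import tempfile
import time

# Hypothetical test line; the benchmark's actual line is not shown in this post.
LINE = "abcdefghijklmnopqrstuvwxyz1234567890\n"


def write_text(path, count):
    """Write through the text layer, which encodes str to bytes on each write."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for _ in range(count):
            f.write(LINE)


def write_binary(path, count):
    """Write pre-encoded bytes, skipping the per-write encoding step."""
    data = LINE.encode("utf-8")
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(data)


def timed(func, path, count):
    """Return elapsed seconds for one call of func(path, count)."""
    start = time.perf_counter()
    func(path, count)
    return time.perf_counter() - start


def compare(count=100_000):
    """Time both writers and confirm they produce identical files."""
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "text.txt")
        bin_path = os.path.join(tmp, "binary.txt")
        text_s = timed(write_text, text_path, count)
        bin_s = timed(write_binary, bin_path, count)
        with open(text_path, "rb") as a, open(bin_path, "rb") as b:
            same = a.read() == b.read()
    return text_s, bin_s, same


if __name__ == "__main__":
    text_s, bin_s, same = compare()
    print("text mode:   %.1f ms" % (text_s * 1000))
    print("binary mode: %.1f ms" % (bin_s * 1000))
    print("identical output:", same)
```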
- Marty
A few posts mention that .NET is for Windows only. That's completely untrue. .NET executables can be run on Linux, FreeBSD, MacOS X, and maybe even other platforms in the near future. .NET runtimes:
Mono - http://go-mono.com - C# and VB
DotGNU - http://dotgnu.org - C#
For Windows, Microsoft's .NET Framework is the winner, being almost twice as fast as Mono, with DotGNU being the slowest, but I've only done synthetic SciMark2 benchmarks.
(Also don't forget you can compile Java to .NET assembly :)
I mean, the functions are so simple, the code generated doesn't stress anything. Not any of the 'advanced' compiler, or even architecture features. None of the good features of a JIT.
I mean seriously, they do math on all the ints from one to one billion. Why even bother? Adding large 32 bit ints takes exactly the same amount of time as adding small ones (but I guess you save one variable by doing math with the counter. Or one extra line of code saved)
I'm sorry, but this is the most pointless compiler benchmark ever.
A good language comparison would be to have a bunch of groups of people try to code up the best implementation they could in whatever language, of some complex problem, and use that as the baseline.
autopr0n is like, down and stuff.
You are completely wrong. Java programs are taken and converted into machine code on the target platform. Saying that it's "the same as 'C' code" because the JVM is written in C is like saying that if I were to write a C++ compiler in Python, running the resulting binaries would be the same as running Python code.
In other words, idiotic.
autopr0n is like, down and stuff.
I think enough other people have already pointed out that the kind of comparison done in this article is rather useless. Different languages are designed for different uses, and while some languages might favour faster code, others might favour ease of development or portability.
Anyway, even when remaining within the same language or language family, the benchmarks are still quite meaningless, for instance when you want to compare the performance of MSVC++ and GCC. The benchmark has several flaws:
- the code is too trivial. It doesn't show how good the compilers really are at optimizing
- the code is too library dependent. For instance, in the trig benchmark, only the runtime library is really benchmarked and not the code generated by the compiler itself
- for the floating point benchmarks, the options chosen for both compilers do not match. For MSVC++, the options chosen favour speed over accuracy, while the GCC options favour accuracy over speed.
The last point can very easily be illustrated with the trig benchmark.
On my computer (P4, 2.8GHz), I get the following results:
1) Options from the article: 10.9s
2) additional option -ffast-math : 6.9s
(this option is also a significant win for the double benchmark)
3) options above plus linking with CRT_fp8.o : 2.8s
The last option may need some explanation:
Programs compiled by MSVC++ set the math coprocessor to 64-bit precision by default, while GCC programs set it to 80-bit. Linking with CRT_fp8.o on Windows platforms makes GCC programs behave like MSVC++ programs and use only 64-bit precision. For arithmetic operations this makes no difference, but the built-in transcendental functions become much faster if you reduce the precision of the coprocessor. So all in all, we were able to cut the run time of the trig benchmark by a factor of 3.9 just by changing the compilation options. This is almost exactly the difference seen in the article between the MSVC++ and the GCC results for the trig benchmark.
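For reference, the speedup factors implied by the three timings listed above can be worked out with a few lines of arithmetic. The dictionary labels are shorthand invented for this sketch; the timings are the ones quoted in this post.

```python
# Trig benchmark timings in seconds, as quoted in this post.
timings = {
    "article options": 10.9,
    "plus -ffast-math": 6.9,
    "plus CRT_fp8.o": 2.8,
}

baseline = timings["article options"]

# Speedup relative to the options used in the article.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

if __name__ == "__main__":
    for name, factor in speedups.items():
        print("%-18s %5.1f s   x%.2f" % (name, timings[name], factor))
```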
All in all, for trivial benchmarks like this, if you chose matching compilation options, different compilers give you almost the same results.
The only real weakness that GCC is showing is 64-bit integer arithmetic. It is badly implemented in GCC and could be vastly improved.
Marcel
I assume you don't actually mean "standard" but "common".
;)
:)
Unofficially standard
Does it have pointers or not? (Yes it does - and yes they are restricted - and the issues that causes are not overly severe).
It has references, which aren't the same thing. C++ has references (e.g. char&) and pointers (e.g. char*). And yes, when I say it makes "pointers safer" I mean referencing. Whatever
Java is a FAR, FAR, FAR, better ___LANGUAGE___ than C++ - ignore the VM, the libraries, etc... and look at the language - Java is way better.
I have. I've used Java and C++ in two rather large projects. And I really can't see why Java is that much better. Or, indeed, that much different.
I don't have to worry about pointers going astray (but then again, I don't have pointers full stop), or garbage collection, but apart from that, what's the difference? Ignoring the libraries, that's the only thing I can think of that's different. Well, except that C++ has multiple inheritance, and so forth.
Perhaps I've missed something. Could you explain why it is "way better"? Remember to ignore the VM and the libraries, and just focus on the language.
C++ would have been stillborn if it was not C-like - Java would have been if it was not C++ like.
Agreed. That doesn't make Java a good language, though; that's just a reason why it's bad.
That said, I've never really understood that argument. I mean, how long does it take to learn a new language? A few hours for an experienced programmer, really. Making Java C++-like may have saved a day or so of programming time on one programming project, but that's nothing compared to how long software takes to develop. On the other hand, companies rarely act in a logical fashion when it comes to software.
I think the author underestimates the impact of having Athlon-specific optimizations turned on for his C compiler (where applicable), while the Java HotSpot JIT compiler likely only optimizes very well for the Pentium 4.
Conclusion would be: the JIT-compiled Java on an Athlon is poorly or not at all optimized.
Sidenote: the original author of the original benchmark wanted to compare .NET languages with Java and used gcc as a reference as well. Interesting for me is that Java can compete with .NET.
Further: obviously trig functions (which could be compiled to a single math processor opcode) are not optimized at all in Java. At the language level, calling a trig function is a call to a static method in the class Math. If that is "mapped" one to one to machine code, it results in a JSR to the C function, which contains only a few opcodes; but what a C compiler will compile to one opcode becomes about 10 opcodes in trivially 'mapped' Java.
angel'o'sphere
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
I agree with 2/3 of your replies, but you need to look up the difference between C++ destructors and Java's finalize(). Hint: one is useful, the other usually isn't. :-)
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.