Linux Number Crunching: Languages and Tools
ChaoticCoyote writes " You've covered some of my past forays into benchmarking, so I thought Slashdot might be interested in Linux Number Crunching: Benchmarking Compilers and Languages for ia32. I wrote the article while trying to decide between competing technologies. No one benchmark (or set of benchmarks) provides an absolute answer -- but information helps make reasonable decisions. Among the topics covered: C++, Java, Fortran 95, gcc, gcj, Intel compilers, SMP, double-precision math, and hyperthreading."
Interesting numbers. Have you considered benchmarking Octave or rlab also? (Or is there a native MATLAB for Linus now?)
There is a quick and dirty intro to K over at Kuro5hin.
Some more links for more inforation:
Kernigan's benchmark test
more examples
Kx: the people who make K and KDB
I'm hardly a Java junky, but I've spent a lot of time recently with the language and I've heard a lot of complaints from my peers about Java being slow. Most of the time, just like this author, they're wrong! Java isn't slow, but sometimes you do have to program more thoughtfully to make Java fast.
First things first, though. No one would ever claim that JDK 1.4 is the ultimate Java speed demon. Even the "HotSpot" in server mode is going to be slow if your code isn't written well. But the author fails to do any profiling, and fails to give anyone even a hint as to why Java doesn't perform well. But I shouldn't get on him about his coding, or lack of profiling... neither issue is the reason his test showed Java to be slow.
The real problem: Firs, I'll cut him some slack for not profiling. However, I won't cut him any slack for using an interpreter instead of a JIT compiler. Java's been shown time and time again to be as fast as FORTRAN/C++ when using a good compiler, rather than an interpreter. *sigh* When will the madness end? A 0.07 second query to Google should explain that one to even a novice. Java IS fast. Interpreted byte-code is slow. Java != interpreted byte code; Java is a language.
Anyway, here's a link to a weak, biased, and not so rigorous argument backing up that statement. But, it's an easy read for Java newbies, so I'll risk posting it anyway: Java is Fast for Dummies(tm)
Yes, but indeed if you're really looking to benchmark only, comparing a row-based database engine with a column-based one is like comparing an apple to an orange. Both are fruits, both give you calories, but they're quite different.
Now as we're going off-topic from the original submission, one could benchmark KDB with Sybase IQ Multiplex. Here you're talking about 2 column-based db engines. In my testing, KDB is indeed up to an order of magnitude (10x) faster than Sybase IQ which is itself 2 orders of magnitude (100x) faster than row-based database engines.
However, as the article in the post says, benchmarks don't give the whole story.
Apart from the usual learning curve issues and available management tools (which KDB sadly lacks compared to Sybase IQ), there is one fundamental difference between the 2 db engines (and Oracle, DB2, Sybase ASE, etc...):
KDB is single-process, and does not pool memory. I'm not saying this is bad, but it makes for very interesting architectural issues when designing a system. For example, if you're going to use KDB, you're better off with the fastest possible single-CPU system. The best platform for KDB is probably the fastest Intel P4 Xeon, dual-processor, and as much RAM as possible on the machine. One processor will be used exclusively for KDB, the other for the OS. To grow, you'll implement a farm of those.
On the other hand, the other major DB engines generally perform much better in multi-CPU systems such as 16-way Sun servers. They pool the memory and use all the CPUs you'll give them. This makes for a more expensive single system, but an easier implementation if your application is larger than what a single dual Intel box can provide. In such a case, KDB will need one write engine and multiple read engines, significant storage pooling issues, etc...
Anyway, one last point regarding column-based database engines: they are certainly amazing for reporting and most read commands. Where they lose to row-based engines is in inserts, and in selects that return data from a large number of columns.
In the former case, you trick KDB and Sybase IQ into performing batch inserts (where the loading of columns will only be "wasted" once per batch). In the latter case, you're going to be hurt with KDB and Sybase IQ whatever you do, as they'll have to load in memory all the columns out of which you need the data.
Bottom line:
If you need OLTP (lots of inserts/updates) and aren't worried about extreme speeds, go for Oracle, Sybase ASE, DB2, etc...
If you need fast reporting with very quick time to market, go for Sybase IQ Multiplex.
If you need the absolute ultimate in reporting speed and have the time and resource to apply to it, go for KDB.
time python -O almabench.py user: 22m19.354s
gcc -ffast-math -O3 almabench.cpp -lm time ./a.out
user: 0m50.348s
C++ is only 27 times faster than Python for planetary simulations.
Almabench.py is my own conversion from the cpp source. I will send it to the author for possible addition to the benchmark.
-- Imperial units must die --
Here's some the "better" parts of Java:
Don't get me wrong. Java's fine for certain applications. For lightweight networking stuff, I think it's almost unparalleled. It's also pretty good for prototyping C++ stuff. It's good for lightweight tasks that break down logically into threads -- Java has nice threading support.
My beef is that Java is not, despite its supporters' loud claims (which have been going on for years), remotely performance-competitive with C.
The language simply has some foundational performance limitations in it. It was designed that way, and tweaking implementations cannot get around that.
I agree that there are some nice things about Java.
Rapid Development
Damn straight. Java is a great prototyping language.
Hotspot
Not bad, but not that incredible, either. The benchmarks I've seen haven't shown HotSpot to be incredible, and besides, competitors like C (gcc) have branch-profiling code of their own.
Secure Software
True. There are some improvements. But buffer overflows are less and less common in C (due to *excellent* libraries like glib), and have been fixed in other languages without anywhere near the performance hit of Java (like Ocaml).
One of the big factors remaining is just HOW you write code in Java.
It may be a personal thing, but I have a deep dislike of languages where you have to modify your regular coding style to get decent performance at a given point. It used to be BASICs...you'd use some nasty trick and you could actually get decent performance out of the thing. Then MATLAB. *God* I hate vectorizing operations. I expect that a MATLAB guru simply does this in his sleep, but I find it incredibly frusterating to totally rethink code in an any areas where performance matters.
These things slow Java down, btu also make it more uniform which makes it easier (faster) to
A fair number of the uniformity improvements in Java could have come from simply tweaking syntax (int[50] x instead of int x[50], for example).
I'm all for modern language features...I just think that doing anything that implies a necessary performance hit is a bad idea. If someone wants a given feature, they can slap it on top. I can make C++ have a virtual function, but I can't make Java run quickly.
If you made every function in your C++ classes virtual, used RTTI and Strings to do runtime linking, etc. your C++ programs would be slower too!
Ya, but Stroustroup went to a lot of work to ensure that you only "pay for what you use".
So, I'm not out to bash Java as a usable language. It has some major pluses. However, specifically in the performance arena, Java definitely has issues.
May we never see th