Slashdot Mirror


Linux Number Crunching: Languages and Tools

ChaoticCoyote writes "You've covered some of my past forays into benchmarking, so I thought Slashdot might be interested in Linux Number Crunching: Benchmarking Compilers and Languages for ia32. I wrote the article while trying to decide between competing technologies. No one benchmark (or set of benchmarks) provides an absolute answer -- but information helps make reasonable decisions. Among the topics covered: C++, Java, Fortran 95, gcc, gcj, Intel compilers, SMP, double-precision math, and hyperthreading."

22 of 318 comments

  1. Octave by sql*kitten · · Score: 5, Interesting

    Interesting numbers. Have you considered benchmarking Octave or rlab also? (Or is there a native MATLAB for Linux now?)

  2. He didn't include K. by Jayson · · Score: 5, Interesting
    K is a high-performance array language. It is based on APL and Lisp. It really shines when crunching obscene amounts of data, so this benchmark seems like a perfect fit for the language. The proof of K's speed lies in KDB, a database written entirely in K. On TPC benchmarks it spanks Oracle and other leading databases (including some amazing scaling across processors: simple table scans with 2.5 billion rows take 1 second and multi-dimensional aggregations take 10-20 seconds).

    There is a quick and dirty intro to K over at Kuro5hin.

    Some more links for more information:
    Kernighan's benchmark test
    more examples
    Kx: the people who make K and KDB

    1. Re:He didn't include K. by DGolden · · Score: 5, Insightful

      You should consider the readability of the language for someone WHO KNOWS THE LANGUAGE, dammit.

      I don't go round claiming Japanese or Arabic is unreadable - I just don't know the language.

      The analogy extends further - it is possible to construct almost unreadable drivel in natural languages, and it is possible to construct almost unreadable drivel even in python.

      However normal code written in python, forth, common lisp, or even, god forbid, c++ or perl is readable to someone who knows the language.

      Now, some programming languages are closer to english in appearance than others. However, for long-term use, that doesn't matter so much - it's just a barrier to entry for lazy people.

      I don't happen to know K. I do know APL, though - APL didn't look like K, since it had its own non-ASCII symbol set. I do find it difficult to read the relatively new ASCII-fied line-noise APL-derived languages. But that's because I haven't bothered to learn 'em! I do suspect they would be harder to learn than APL, since the ASCII symbols are already overloaded with so many other meanings - but once I'd learned them, I would expect that problem to fade - just like I'm not confused that "Gift" in German means "poison" in English.

      Actually, now that Unicode is widely supported, I would love to see a resurgence in APLs that use APL symbols, since they're much clearer to me - but so many people have been using the ASCIIfied APLs for so long now that that may never happen.

      --
      Choice of masters is not freedom.
  3. Good. by neksys · · Score: 4, Interesting
    No one benchmark (or set of benchmarks) provides an absolute answer -- but information helps make reasonable decisions

    Ah ha! Someone who understands what benchmarks are for and how to use them - it sometimes seems like the corporate world uses numbers from benchmarks only when they prove their claims. Of course, that's the difference between open source and the business world - open source (ideally) looks at every benchmark result and asks "now how can we get all of these numbers better than the competition?" while more traditional businesses ask, "Which of these numbers make our product look the best?". *shrug* it's just nice to see benchmarks used properly, is all.

  4. Java is slow? by nsample · · Score: 5, Interesting


    I'm hardly a Java junky, but I've spent a lot of time recently with the language and I've heard a lot of complaints from my peers about Java being slow. Most of the time, just like this author, they're wrong! Java isn't slow, but sometimes you do have to program more thoughtfully to make Java fast.


    First things first, though. No one would ever claim that JDK 1.4 is the ultimate Java speed demon. Even the "HotSpot" in server mode is going to be slow if your code isn't written well. But the author fails to do any profiling, and fails to give anyone even a hint as to why Java doesn't perform well. But I shouldn't get on him about his coding, or lack of profiling... neither issue is the reason his test showed Java to be slow.


    The real problem: First, I'll cut him some slack for not profiling. However, I won't cut him any slack for using an interpreter instead of a JIT compiler. Java's been shown time and time again to be as fast as FORTRAN/C++ when using a good compiler, rather than an interpreter. *sigh* When will the madness end? A 0.07 second query to Google should explain that one to even a novice. Java IS fast. Interpreted byte-code is slow. Java != interpreted byte code; Java is a language.


    Anyway, here's a link to a weak, biased, and not so rigorous argument backing up that statement. But, it's an easy read for Java newbies, so I'll risk posting it anyway: Java is Fast for Dummies(tm)

    1. Re:Java is slow? by AG · · Score: 5, Informative

      gcj really is within 10% of g++ on this benchmark; unfortunately, he built the gcj program without the all-important -ffast-math option (and -funroll-loops). This is a huge penalty for gcj: it is more than 2x slower without them.

      I sent him a note and hopefully he'll update his page.
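      Why the flag matters so much: by default GCC treats floating-point addition as non-associative (which it is), so it will not split a sum into several partial sums for SIMD registers or otherwise reorder the terms. -ffast-math turns on, among other things, -fassociative-math, which permits that reordering, and gcj used the same GCC optimizer as g++. The Python sketch below only models the effect with made-up helper names (sequential_sum, lane_sum); it is not what the compiler emits, and whether a given loop actually vectorizes depends on the GCC version and target.

```python
import math

def sequential_sum(xs):
    # Strict left-to-right order: what the compiler must keep without reassociation.
    total = 0.0
    for x in xs:
        total += x
    return total

def lane_sum(xs, lanes=2):
    # Interleaved partial sums, as a vectorized loop would keep in SIMD lanes,
    # combined at the end. The compiler may only do this under -fassociative-math.
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    total = 0.0
    for p in partial:
        total += p
    return total

xs = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(xs))  # 1.0 (the first 1.0 is lost to rounding)
print(lane_sum(xs))        # 2.0
print(math.fsum(xs))       # 2.0 (correctly rounded reference)
```

      The two orders round differently, which is why GCC only reorders when told it may; the payoff is that the reordered loop can run several additions at once.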

    2. Re:Java is slow? by X · · Score: 4, Insightful

      The -server option actually will impose significant overhead for this benchmark. The -server option is not going to do any of its significant optimisations without a TON of work.

      All your statements about C++ having an advantage over Java in terms of memory management are silly of course, since the Java runtime performs these exact kinds of optimisations with Java programs. Because the decision is made at runtime rather than compile time it is actually possible for the Java runtime to make better optimisations than the C++ compiler/developer (whose decisions all have to be made a priori). I'm not saying this means Java always wins, because it most certainly does not, but I'm just saying that the "disadvantage" you are talking about is actually a misunderstanding of the conceptual differences between these two models.

      --
      sigs are a waste of space
    3. Re:Java is slow? by X · · Score: 4, Interesting

      For the record, I actually worked with the JPL evaluating Java's floating point performance. This was in the JDK 1.3 era, when HotSpot was still new. They had initially ported a highly optimised C library to Java and found the performance about in line with what this guy got (4-10x slower... actually it was an order of magnitude worse than this until they used the JIT ;-). The Java code showed many of the same performance errors that this guy's code has, as is common when you just do a line-by-line translation to Java, rather than rewriting the code from scratch. I did rewrite the code base, and managed to get the performance within the 10%-30% range. Using JDK 1.4 I'd have a few other tricks available to me which would probably get it even closer (maybe even faster).

      --
      sigs are a waste of space
  5. Re:Very true by homb · · Score: 5, Interesting

    Yes, but indeed if you're really looking to benchmark only, comparing a row-based database engine with a column-based one is like comparing an apple to an orange. Both are fruits, both give you calories, but they're quite different.
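    To make the row/column distinction concrete, here is a toy Python sketch with made-up data (the symbols AAA, BBB, CCC and all values are placeholders). A row store keeps each record's fields together, so aggregating one field still walks every record; a column store keeps one array per field, so the same aggregate reads only that array. Real engines such as KDB or Sybase IQ add compression, on-disk layout and much more, none of which is modeled here.

```python
from array import array

# Row-oriented: each record's fields are stored together.
rows = [
    ("AAA", 10.25, 300),
    ("BBB", 20.50, 1000),
    ("CCC", 30.75, 500),
]

# Column-oriented: one contiguous array per field.
cols = {
    "sym": ["AAA", "BBB", "CCC"],
    "price": array("d", [10.25, 20.50, 30.75]),
    "size": array("q", [300, 1000, 500]),
}

def total_size_rows(rows):
    # Visits every record, even though only one field is needed.
    return sum(size for _sym, _price, size in rows)

def total_size_cols(cols):
    # Reads only the single array that holds the "size" column.
    return sum(cols["size"])

print(total_size_rows(rows), total_size_cols(cols))  # 1800 1800
```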

    Now as we're going off-topic from the original submission, one could benchmark KDB with Sybase IQ Multiplex. Here you're talking about 2 column-based db engines. In my testing, KDB is indeed up to an order of magnitude (10x) faster than Sybase IQ which is itself 2 orders of magnitude (100x) faster than row-based database engines.

    However, as the article in the post says, benchmarks don't give the whole story.

    Apart from the usual learning curve issues and available management tools (which KDB sadly lacks compared to Sybase IQ), there is one fundamental difference between the 2 db engines (and Oracle, DB2, Sybase ASE, etc...):

    KDB is single-process, and does not pool memory. I'm not saying this is bad, but it makes for very interesting architectural issues when designing a system. For example, if you're going to use KDB, you're better off with the fastest possible single-CPU system. The best platform for KDB is probably the fastest Intel P4 Xeon, dual-processor, and as much RAM as possible on the machine. One processor will be used exclusively for KDB, the other for the OS. To grow, you'll implement a farm of those.

    On the other hand, the other major DB engines generally perform much better in multi-CPU systems such as 16-way Sun servers. They pool the memory and use all the CPUs you'll give them. This makes for a more expensive single system, but an easier implementation if your application is larger than what a single dual Intel box can provide. In such a case, KDB will need one write engine and multiple read engines, significant storage pooling issues, etc...

    Anyway, one last point regarding column-based database engines: they are certainly amazing for reporting and most read commands. Where they lose to row-based engines is in inserts, and in selects that return data from a large number of columns.
    In the former case, you trick KDB and Sybase IQ into performing batch inserts (where the loading of columns will only be "wasted" once per batch). In the latter case, you're going to be hurt with KDB and Sybase IQ whatever you do, as they'll have to load in memory all the columns out of which you need the data.
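    The batch-insert trick and the wide-select penalty can be sketched the same way. ColumnTable below is a hypothetical toy class, not an API of KDB or Sybase IQ: insert_batch transposes a batch of rows once and extends each column a single time, while fetch_row has to visit every column to rebuild one record.

```python
from array import array

class ColumnTable:
    """Toy column store: one typed array per column (hypothetical)."""

    def __init__(self, names, typecodes):
        self.names = list(names)
        self.cols = {n: array(t) for n, t in zip(self.names, typecodes)}

    def insert_batch(self, rows):
        # Transpose the batch, then extend each column once per batch
        # rather than touching every column for every single row.
        for name, values in zip(self.names, zip(*rows)):
            self.cols[name].extend(values)

    def fetch_row(self, i):
        # Wide select: every column must be read to rebuild one record.
        return tuple(self.cols[n][i] for n in self.names)

    def __len__(self):
        return len(self.cols[self.names[0]]) if self.names else 0

t = ColumnTable(["id", "price"], ["q", "d"])
t.insert_batch([(1, 9.5), (2, 10.25), (3, 11.0)])
print(len(t), t.fetch_row(1))  # 3 (2, 10.25)
```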

    Bottom line:

    If you need OLTP (lots of inserts/updates) and aren't worried about extreme speeds, go for Oracle, Sybase ASE, DB2, etc...

    If you need fast reporting with very quick time to market, go for Sybase IQ Multiplex.

    If you need the absolute ultimate in reporting speed and have the time and resources to apply to it, go for KDB.

  6. JIT? by EnglishTim · · Score: 4, Informative

    The article claims that he *is* using a Just-In-Time compiler. What makes you think otherwise?

  7. gcj results incorrect - 2x worse than truth by AG · · Score: 4, Insightful

    Compiling his benchmark with -ffast-math and -funroll-loops more than doubles the performance of the gcj built benchmark on my P3.

    This brings it within spitting distance of g++.

  8. He didn't include python by more · · Score: 5, Interesting
    Results on my P4 1.5 GHz, Red Hat 8.0, gcc 3.2:

    $ time python -O almabench.py
    user 22m19.354s

    $ gcc -ffast-math -O3 almabench.cpp -lm
    $ time ./a.out
    user 0m50.348s

    C++ is only 27 times faster than Python for planetary simulations.
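    The "27 times" figure follows from the two user times above:

```latex
\frac{22\ \mathrm{min}\ 19.354\ \mathrm{s}}{50.348\ \mathrm{s}}
  = \frac{1339.354\ \mathrm{s}}{50.348\ \mathrm{s}} \approx 26.6
```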

    Almabench.py is my own conversion from the cpp source. I will send it to the author for possible addition to the benchmark.

    --

    -- Imperial units must die --

  9. Fortran compilers and Linux by EmagGeek · · Score: 4, Interesting
    Here is a more in-depth comparison of Fortran 90 compilers for linux. They compared Intel, NAG, Lahey, and a couple of other compilers. Here is a comparison of Fortran 77 compilers from the same folks. GNU g77 is actually the slowest of them all, and I've actually confirmed that it is the slowest of a group consisting of DEC/Win32, Lahey/Linux, and g77. I've always dreamed of the day that open source developers would throw some real brainweight at a really well optimized Fortran compiler for linux, but it looks like I'll just have to keep dreaming. Lahey is only $199 or so, but they place some HORRIBLE licensing restrictions on the binaries that are created with their compiler. The DEC/Win32 compiler is also really nice, but since I'm not in school anymore, I'm not licensed to use it, and even if I _wanted_ to whore myself out to Micro$oft, I couldn't afford to.

    Just to put some things into perspective, here are some numerical results. These were obtained on my dual-Athlon 1.4 GHz with 1 GB of RAM. The task was to compute the TE and TM surface currents induced on a circular cylinder 10 wavelengths in circumference and having a relative permittivity equal to 4-j2. The program simultaneously solves the perfect electric conducting case. The surface was discretized into 60 cells using 120 unknowns (way overkill, but just to prove the point) using the Integral Equation Asymptotic Phase method.

    g77 Compiler (-O2 -malign-double -funroll-loops): 24.11s
    Lahey Compiler (equivalent parameters): 16.45s

    As you can see, there's really no comparison, except that the lahey-created binary uses about 10% more RAM than does the one created with g77. This is just a summary comparison as I did not go into measuring the difference in the error of the two results compared to a reference solution. I'm assuming that both solutions are about the same with regard to accuracy.
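    Expressed as ratios of the two run times quoted above (same machine, same problem):

```latex
\frac{t_{\mathrm{g77}}}{t_{\mathrm{Lahey}}} = \frac{24.11\ \mathrm{s}}{16.45\ \mathrm{s}} \approx 1.47,
\qquad
1 - \frac{t_{\mathrm{Lahey}}}{t_{\mathrm{g77}}} = 1 - \frac{16.45}{24.11} \approx 0.32
```

    That is, the g77 binary takes about 47% longer, or equivalently the Lahey binary needs about 32% less time.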

  10. Java is really, really slow by 0x0d0a · · Score: 5, Insightful

    ...I've heard a lot of complaints from my peers about Java being slow.

    Allow me to join the chorus.

    Java isn't slow, but sometimes you do have to program more thoughtfully to make Java fast.

    No. Java *can* be made less mind-bogglingly slow by avoiding certain things...preallocating a pool of objects and using primitive types (like int) whenever possible helps. The way the language is designed makes it *easy* to be mind-bogglingly slow. That doesn't mean that going out of your way to avoid these things makes Java fast. It makes it only "slow".

    Java is Fast for Dummies

    Ah, yes. A link telling how Java isn't *really* that slow on "javaworld.com". I took a skim.

    The first two pages say basically "Java isn't that slow". They then start rambling about various features that make Java a good language.

    They claim that Java programs load faster than native programs. (The article was written, BTW, in Feb '98, to give an idea of how full of BS they are). This is stupid. JVM startup and load time *dwarfs* application link time. Write "hello world" in C++ and in Java.

    First, they laud the small executable size of Java as being a performance boost based on binary format. Everything I've read points the *other* way...Java *is* fairly compact, but can contain data that isn't nicely aligned along host boundaries.

    Second, what they're talking about, if it's even accurate these days, which I doubt, has a lot to do with the lousiness of the Windows runtime linker. This isn't really an issue for Linux.

    Third, while insinuating that minimizing code size provides a performance boost, they talk about how great it is that Java lets you use *built in* libraries, whereas C++ programs need to *bundle* libraries. What? That's stupid. They're shifting the libraries around, but it sure as hell isn't decreasing the total amount of data that needs to get loaded.

    Fourth, this gem: Finally, Java contains special libraries that support images and sound files in compressed formats, such as Joint Photographic Expert Group (JPEG) and Graphics Interchange Format (GIF) for images, and Adaptive μ-law encoded (AU) for audio. In contrast, the only natively supported formats in Windows NT are uncompressed: bitmap (BMP) for images and wave (WAV) for audio. Compression can reduce the size of images by an order of magnitude and audio files by a factor of three. Additional class libraries are available if you want to add support for other graphics and sound formats.

    They're billing this as *improving* performance? Yeah, I'd love to have my app blow CPU time decompressing a JPEG image instead of reading a slightly larger BMP image if I'm trying to minimize load time. Oh, and have it load all the JPEG loading code, too.

    They then proceed to ramble about selective loading, and try to imply that Java's runtime linking is faster than C++'s.

    They *then* show off smaller binary sizes by embedding a BMP in the C++ binary and a GIF in the Java binary. Impressive.

    They then claim that claims of poor Java performance are based on non-JIT implementations. This neatly lets them avoid actually citing numbers. Sure, I'll agree that Java went from "Performance Hideous" to "Performance Bad". Everyone uses JIT these days, and damned if Java isn't *still* slow.

    They then try to talk about how JIT allows code to be optimized just like C++. Wow. Yup, JIT sure is known for impressive optimization, isn't it?

    They then use the most artificial, contrived benchmarks I've ever seen (which conveniently avoid almost all of the Java pitfalls...they don't need to do array access, they're trivial to implement without heap allocation...)

    They finish up talking about how C++ RTTI performance sucks compared to Java (ignoring the fact that Java hits RTTI code *far* more often than C++ does, like every time it yanks something out of a generic container class).

    Finally, they finish up by talking about a bunch of random Java features that they think are great, like garbage collection "First, your programs are virtually immune to memory leaks." Hope you don't use hash tables, buddy.
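    The hash-table jab refers to a real limitation: a garbage collector only frees objects that are no longer reachable, so a table that keeps a reference to everything put into it grows without bound, which is a leak in every practical sense. Python is also garbage collected, so it can show the effect; Record and load are hypothetical names, and WeakValueDictionary is just one standard-library remedy (Java has comparable weak-reference classes).

```python
import gc
import weakref

class Record:
    """Hypothetical payload object."""
    def __init__(self, key):
        self.key = key

strong_cache = {}                           # holds every value forever
weak_cache = weakref.WeakValueDictionary()  # entry vanishes once unused

def load(key, cache):
    rec = Record(key)
    cache[key] = rec
    return rec

for k in range(1000):
    load(k, strong_cache)  # caller discards the result...
    load(k, weak_cache)    # ...but only the strong cache keeps it alive

gc.collect()
print(len(strong_cache), len(weak_cache))  # 1000 0 (on CPython)
```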

    Next, they talk about how a JVM can defrag memory. I'm going to have to just crack up at that. This isn't a performance boost unless you're using a language that *hideously* fragments memory and eats memory like a *beast* (granted, Java is the best candidate I know of). Runtime memory defragmentation went out of fashion with the classic Mac OS...it's pretty much a bad idea as long as you have a hard drive available. VM systems are pretty damn good these days...if you're trying to maximize performance, there are almost always better things to be doing than blowing cycles and bandwidth defragging memory. There's a reason we don't do it any more.

    Basically, my conclusion is that "Java is Fast for Dummies" is primarily aimed at, well, dummies.

    1. Re:Java is really, really slow by 0x0d0a · · Score: 4, Insightful

      Oh, and a follow-up to my previous post. These clowns spent a long time talking about how people can "ignore JIT overhead" because it's almost completely insignificant "most of the time". Fine. Then they spend 80% of the article talking about *binary load time*, which is essentially only an issue under the *exact* same conditions that the JIT is -- once for a single chunk of code. If they're pimping launch time, they sure as hell shouldn't be ignoring JIT time.

    2. Re:Java is really, really slow by 0x0d0a · · Score: 5, Interesting

      Here are some of the "better" parts of Java:

      Don't get me wrong. Java's fine for certain applications. For lightweight networking stuff, I think it's almost unparalleled. It's also pretty good for prototyping C++ stuff. It's good for lightweight tasks that break down logically into threads -- Java has nice threading support.

      My beef is that Java is not, despite its supporters' loud claims (which have been going on for years), remotely performance-competitive with C.

      The language simply has some foundational performance limitations in it. It was designed that way, and tweaking implementations cannot get around that.

      I agree that there are some nice things about Java.

      Rapid Development

      Damn straight. Java is a great prototyping language.

      Hotspot

      Not bad, but not that incredible, either. The benchmarks I've seen haven't shown HotSpot to be incredible, and besides, competitors like C (gcc) have branch-profiling code of their own.

      Secure Software

      True. There are some improvements. But buffer overflows are less and less common in C (due to *excellent* libraries like glib), and have been fixed in other languages without anywhere near the performance hit of Java (like Ocaml).

      One of the big factors remaining is just HOW you write code in Java.

      It may be a personal thing, but I have a deep dislike of languages where you have to modify your regular coding style to get decent performance at a given point. It used to be BASICs...you'd use some nasty trick and you could actually get decent performance out of the thing. Then MATLAB. *God* I hate vectorizing operations. I expect that a MATLAB guru simply does this in his sleep, but I find it incredibly frustrating to totally rethink code in any area where performance matters.

      These things slow Java down, but also make it more uniform, which makes it easier (faster) to

      A fair number of the uniformity improvements in Java could have come from simply tweaking syntax (int[50] x instead of int x[50], for example).

      I'm all for modern language features...I just think that doing anything that implies a necessary performance hit is a bad idea. If someone wants a given feature, they can slap it on top. I can make C++ have a virtual function, but I can't make Java run quickly.

      If you made every function in your C++ classes virtual, used RTTI and Strings to do runtime linking, etc. your C++ programs would be slower too!

      Ya, but Stroustrup went to a lot of work to ensure that you only "pay for what you use".

      So, I'm not out to bash Java as a usable language. It has some major pluses. However, specifically in the performance arena, Java definitely has issues.

  11. Comments by the Author by ChaoticCoyote · · Score: 5, Informative

    Almost *ALL* of my email is related to Java. I'll be adding the IBM JDK and older versions of the Sun JDK later today, as per reader request.

    I'm making minor updates to the article as the day passes. I appreciate comments from everyone; once I'm through with my e-mail, I'll respond to these Slashdot comments.

    Additional benchmarks will be added to the article with time; I'm putting together a single-precision ("float") benchmark, for example.

  12. People who don't know by Raedwald · · Score: 4, Insightful
    The C++ code does not use any object-oriented or exception-handling features of C++; essentially, this is a C program with minor C++ convenience features

    God, it feels like I've spent most of my professional life arguing with Fortran programmers. These people are ignorant, but arrogant. They think that because they have a PhD in Engineering (or Physics, or whatever) and can produce a syntactically correct Fortran program, they know how to program, and can ignore advice backed up by thirty years of software engineering research and experience. Bizarrely, what little knowledge they have is about 35 years out of date, even for those in their twenties. They live in a ghetto.

    As anyone with even the slightest real computing knowledge knows, what gives you performance is the algorithm chosen, not the implementation. Therefore, what matters is how easy it is to implement a good algorithm. That means how easy it is to write a program that implements a difficult-to-understand algorithm (because a good algorithm is often an inobvious one; of course, there are some exceptions). Which means that support for modern programming techniques that help you produce easy-to-understand programs is important for producing high-performance programs. You know, things like the following that are absent from the still widely used Fortran-77:

    • Requiring that all variables be explicitly declared before use (the lack of this accounts for one third of all coding errors in even carefully written Fortran programs). A requirement enforced by all other compiled languages now in common use.
    • Programmer-defined data structures (struct in C). Widely available elsewhere since the late 1960s.
    • Structured control structures (while, etc.). Old timers might be aware that the goto battles were fought and won in the rest of the world by about 1970.
    • Aggregation of subroutines into modules or packages. Widely available elsewhere since the mid 1970s.
    • Support for abstract data types. Widely available elsewhere since the early 1980s.
    • Support for polymorphism and inheritance (object orientation). Widely available elsewhere since the mid 1980s.

    So, comparing the performance of a toy Fortran program with its translation into C++ or Java shows nothing.
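    A small, hypothetical illustration of the claim that algorithm choice matters more than implementation: evaluating a polynomial by rebuilding each power of x from scratch costs roughly n²/2 multiplications for n coefficients, while Horner's rule rewrites the same polynomial as c0 + x(c1 + x(c2 + ...)) and needs only about n. Compiler flags applied to the first version will not recover that asymptotic difference.

```python
def poly_naive(coeffs, x):
    # Sum of c_k * x**k, rebuilding each power with k multiplications:
    # about n*n/2 multiplications for n coefficients.
    total = 0.0
    for k, c in enumerate(coeffs):
        power = 1.0
        for _ in range(k):
            power *= x
        total += c * power
    return total

def poly_horner(coeffs, x):
    # Horner's rule: c0 + x*(c1 + x*(c2 + ...)), one multiply per coefficient.
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total

coeffs = [1.0, 2.0, 3.0]  # 1 + 2x + 3x^2
print(poly_naive(coeffs, 2.0), poly_horner(coeffs, 2.0))  # 17.0 17.0
```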

    What has happened is a second Software Engineering Crisis. The first crisis was in the mainstream, data processing, part of the software industry. The introduction of more powerful computers resulted in large, complex programs that were failures because they were complicated (see The Mythical Man-Month). Since then, we have developed software engineering techniques to deal with their problems, so now large programs can be much more complex (composed of many parts) without being excessively complicated (difficult to produce and understand). Over about the last twelve years, the increasing performance of computers has meant that number-crunching programs (e.g. CFD programs) don't merely process large amounts of data; they are also large and complicated in their own right. The Software Engineering Crisis has caught up with engineers and scientists. The sad thing is, many don't know it, or ignore the advice (and screamingly obvious signs) that it is here.

    --
    Ne mæg werig mod wyrde wiðstondan, ne se hreo hyge helpe gefremman.
  13. I can only try... by ChaoticCoyote · · Score: 4, Informative

    ...the tools I have at hand. I have nothing against TowerJ, but can't test it if I don't own it. As for Java, I made note of a lack of flags, which is different from complaining. The -O flag is no longer supported by Sun's JDK according to the documentation.

  14. Re:Development time vs. run time by ChaoticCoyote · · Score: 4, Insightful

    Using Object-Oriented constructs is no guarantee that a program is maintainable or even readable. I have seen some horrifying OOP code in my life, written by people so enamoured of syntax that they drown their code in it.

    In numerical applications, an extra 10% can be the difference between success and failure. I'm corresponding with a fellow who works in meteorology; his company uses commodity boxes to compete with government-funded monopolies. For him, the ability to gain 10% is crucial.

    I am all in favor of object-oriented programming -- but my philosophy matches that of Bjarne Stroustrup, who refers to his language as having "multiple paradigms." Use OO when it makes sense -- but use the right tool for the task at hand. C++ does not force you to use OOP when it doesn't make sense.

    Many numerical applications make more sense when using short variable names (that match formulas in texts) and a function-based approach (again, matching mathematical idiom).
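    As a hypothetical example of that style (not code from the benchmark), here is Newton's method for Kepler's equation, M = E - e sin E, written as a plain function whose variable names are the textbook symbols: M the mean anomaly, e the eccentricity, E the eccentric anomaly. The starting guess and tolerance are arbitrary choices for the sketch.

```python
import math

def solve_kepler(M, e, tol=1e-12, max_iter=50):
    """Solve Kepler's equation M = E - e*sin(E) for E (radians), 0 <= e < 1.

    Newton's method on f(E) = E - e*sin(E) - M, with f'(E) = 1 - e*cos(E).
    """
    E = M if e < 0.8 else math.pi  # simple starting guess
    for _ in range(max_iter):
        f = E - e * math.sin(E) - M
        fp = 1.0 - e * math.cos(E)
        dE = f / fp
        E -= dE
        if abs(dE) < tol:
            break
    return E

E = solve_kepler(1.0, 0.5)
print(E - 0.5 * math.sin(E))  # close to 1.0, the input M
```

    Each line of the loop reads like the formula it implements, which is the point being made: here short names help rather than hurt.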

  15. awk by Charles+Dodgeson · · Score: 4, Funny

    Surely people remember that most excellent O'Reilly book, Numerical Recipes in AWK. Unfortunately the 1998 review of it has disappeared.

    --
    Prime numbers are exactly what Alan Greenspan says they are -S. Minsky
  16. An old policy by ChaoticCoyote · · Score: 4, Informative

    Back when I wrote reviews for print magazines (back when there were print magazines, that is), it was standard policy to limit reviews to the actual, shipping, commercial product. Demos often lack critical features (like optimizers) or are tuned for benchmark tests, so I've kept to that policy now that I write reviews for my own web site.

    I own licenses for the Intel compilers, for example -- and, of course, gcc and Sun Java don't cost anything in the first place. I'm considering my options in this case; long gone are the days when a dozen Fortran or C compilers would arrive at my door for a magazine review. Heck, there aren't a dozen compiler vendors left... ;)