Linux Number Crunching: Languages and Tools
ChaoticCoyote writes "You've covered some of my past forays into benchmarking, so I thought Slashdot might be interested in Linux Number Crunching: Benchmarking Compilers and Languages for ia32. I wrote the article while trying to decide between competing technologies. No one benchmark (or set of benchmarks) provides an absolute answer -- but information helps make reasonable decisions. Among the topics covered: C++, Java, Fortran 95, gcc, gcj, Intel compilers, SMP, double-precision math, and hyperthreading."
Well, not to knock K (I know nothing about it), but the timing for database operations will almost always be based on algorithm design, almost never on compiler efficiency. Of course, certain languages make trying out different algorithms easier than others. Maybe K's good at that.
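To put the algorithm-versus-compiler point in concrete terms, here is a minimal sketch. The class and method names are made up for illustration, and nothing here comes from the article's benchmark. A linear scan costs O(n) per lookup no matter how well the loop is compiled, while a hash index built once answers each lookup in expected constant time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: same question, two algorithms.
public class LookupSketch {
    // Linear scan: O(n) per lookup, however clever the compiler is.
    static int findByScan(int[] keys, int wanted) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == wanted) {
                return i;
            }
        }
        return -1;
    }

    // Build a hash index once (O(n)); each later lookup is expected O(1).
    static Map<Integer, Integer> buildIndex(int[] keys) {
        Map<Integer, Integer> index = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            index.putIfAbsent(keys[i], i); // keep the first position, like the scan
        }
        return index;
    }

    static int findByIndex(Map<Integer, Integer> index, int wanted) {
        return index.getOrDefault(wanted, -1);
    }

    public static void main(String[] args) {
        int[] keys = new int[100_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i * 7;
        }
        Map<Integer, Integer> index = buildIndex(keys);
        // Both calls return the same position; only the work done differs.
        System.out.println(findByScan(keys, 699_993));
        System.out.println(findByIndex(index, 699_993));
    }
}
```

With many lookups against the same data, the choice between these two usually matters far more than any optimisation flag.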
Compiling his benchmark with -ffast-math and -funroll-loops more than doubles the performance of the gcj-built benchmark on my P3.
This brings it within spitting distance of g++.
The -server option will actually impose significant overhead for this benchmark. The -server option is not going to do any of its significant optimisations without a TON of work.
All your statements about C++ having an advantage over Java in terms of memory management are silly of course, since the Java runtime performs exactly these kinds of optimisations with Java programs. Because the decision is made at runtime rather than compile time it is actually possible for the Java runtime to make better optimisations than the C++ compiler/developer (whose decisions all have to be made a priori). I'm not saying this means Java always wins, because it most certainly does not, but I'm just saying that the "disadvantage" you are talking about is actually a misunderstanding of the conceptual differences between these two models.
sigs are a waste of space
...I've heard a lot of complaints from my peers about Java being slow.
Allow me to join the chorus.
Java isn't slow, but sometimes you do have to program more thoughtfully to make Java fast.
No. Java *can* be made less mind-bogglingly slow by avoiding certain things...preallocating a pool of objects and using primitive types (like int) whenever possible helps. The way the language is designed makes it *easy* to be mind-bogglingly slow. That doesn't mean that going out of your way to avoid these things makes Java fast. It makes it only "slow".
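For anyone wondering what "using primitive types and preallocating" looks like in practice, here is a rough sketch. The class and method names are hypothetical, and it makes no claim about how large the speed difference is on any particular JVM. The boxed version creates an Integer object per element (small values may come from a shared cache), while the primitive version does a single array allocation up front.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: boxed collection versus preallocated primitive array.
public class PrimitiveSketch {
    // Each add() boxes the int into an Integer object
    // (small values may be served from a shared cache).
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        long total = 0;
        for (Integer v : values) {
            total += v; // unboxed again on every read
        }
        return total;
    }

    // One allocation up front, then plain int loads and stores.
    static long sumPrimitive(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Both print the same sum; only the allocation behaviour differs.
        System.out.println(sumBoxed(1_000_000));
        System.out.println(sumPrimitive(1_000_000));
    }
}
```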
Java is Fast for Dummies
Ah, yes. A link telling how Java isn't *really* that slow, on "javaworld.com". I took a skim.
The first two pages say basically "Java isn't that slow". They then start rambling about various features that make Java a good language.
They claim that Java programs load faster than native programs. (The article was written, BTW, in Feb '98, to give an idea of how full of BS they are). This is stupid. JVM startup and load time *dwarfs* application link time. Write "hello world" in C++ and in Java.
First, they laud the small executable size of Java as being a performance boost based on binary format. Everything I've read points the *other* way...Java *is* fairly compact, but can contain data that isn't nicely aligned along host boundaries.
Second, what they're talking about, if it's even accurate these days, which I doubt, has a lot to do with the lousiness of the Windows runtime linker. This isn't really an issue for Linux.
Third, while insinuating that minimizing code size provides a performance boost, they talk about how great it is that Java lets you use *built in* libraries, whereas C++ programs need to *bundle* libraries. What? That's stupid. They're shifting the libraries around, but it sure as hell isn't decreasing the total amount of data that needs to get loaded.
Fourth, this gem: "Finally, Java contains special libraries that support images and sound files in compressed formats, such as Joint Photographic Expert Group (JPEG) and Graphics Interchange Format (GIF) for images, and Adaptive µ-law encoded (AU) for audio. In contrast, the only natively supported formats in Windows NT are uncompressed: bitmap (BMP) for images and wave (WAV) for audio. Compression can reduce the size of images by an order of magnitude and audio files by a factor of three. Additional class libraries are available if you want to add support for other graphics and sound formats."
They're billing this as *improving* performance? Yeah, I'd love to have my app blow CPU time decompressing a JPEG image instead of reading a slightly larger BMP image if I'm trying to minimize load time. Oh, and have it load all the JPEG loading code, too.
They then proceed to ramble about selective loading, and try to imply that Java's runtime linking is faster than C++'s.
They *then* show off smaller binary sizes by embedding a BMP in the C++ binary and a GIF in the Java binary. Impressive.
They then claim that claims of poor Java performance are based on non-JIT implementations. This neatly lets them avoid actually citing numbers. Sure, I'll agree that Java went from "Performance Hideous" to "Performance Bad". Everyone uses JIT these days, and damned if Java isn't *still* slow.
They then try to talk about how JIT allows code to be optimized just like C++. Wow. Yup, JIT sure is known for impressive optimization, isn't it?
They then use the most artificial, contrived benchmarks I've ever seen (which conveniently avoid almost all of the Java pitfalls...they don't need to do array access, they're trivial to implement without heap allocation...)
They finish up talking about how C++ RTTI performance sucks compared to Java (ignoring the fact that Java hits RTTI code *far* more often than C++ does, like every time it yanks something out of a generic container class).
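A small sketch of the "yanks something out of a generic container" case, with hypothetical names. With a raw (untyped) collection every retrieval needs a downcast, and that cast is a runtime type check. Generics, added in Java 5, tidy up the source, but under type erasure the compiler still emits the same cast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a runtime type check on every container read.
public class CastSketch {
    @SuppressWarnings("rawtypes")
    static int totalLength(List items) {
        int total = 0;
        for (int i = 0; i < items.size(); i++) {
            // get() returns Object; the cast is checked at runtime and
            // throws ClassCastException if the element is not a String.
            String s = (String) items.get(i);
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Object> items = new ArrayList<>();
        items.add("gcj");
        items.add("g++");
        System.out.println(totalLength(items));
    }
}
```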
Finally, they finish up by talking about a bunch of random Java features that they think are great, like garbage collection: "First, your programs are virtually immune to memory leaks." Hope you don't use hash tables, buddy.
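In case the hash table jab isn't obvious, here is a minimal sketch of that kind of leak, using made-up class and method names. The garbage collector only frees objects that are unreachable, so anything left sitting in a long-lived table stays alive whether or not the program ever looks at it again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a "leak" in a garbage-collected language.
public class LeakSketch {
    // Imagine this object lives as long as the application does.
    private final Map<Integer, byte[]> buffers = new HashMap<>();

    // Leaky: the buffer is registered but never removed, so it stays
    // reachable through the map and can never be collected.
    void handleRequest(int id) {
        buffers.put(id, new byte[1024]);
        // ... work with the buffer ...
    }

    // Fixed: the entry is removed once the work is done.
    void handleRequestFixed(int id) {
        buffers.put(id, new byte[1024]);
        try {
            // ... work with the buffer ...
        } finally {
            buffers.remove(id);
        }
    }

    int liveEntries() {
        return buffers.size();
    }

    public static void main(String[] args) {
        LeakSketch server = new LeakSketch();
        for (int id = 0; id < 1000; id++) {
            server.handleRequest(id);
        }
        // Every buffer from the leaky path is still held by the map.
        System.out.println(server.liveEntries());
    }
}
```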
Next, they talk about how a JVM can defrag memory. I'm going to have to just crack up at that. This isn't a performance boost unless you're using a language that *hideously* fragments memory and eats memory like a *beast* (granted, Java is the best candidate I know of). Runtime memory defragmentation went out of fashion with the classic Mac OS...it's pretty much a bad idea as long as you have a hard drive available. VM systems are pretty damn good these days...if you're trying to maximize performance, there are almost always better things to be doing than blowing cycles and bandwidth defragging memory. There's a reason we don't do it any more.
Basically, my conclusion is that "Java is Fast for Dummies" is primarily aimed at, well, dummies.
May we never see th
You should consider the readability of the language for someone WHO KNOWS THE LANGUAGE, dammit.
I don't go round claiming japanese or arabic is unreadable - I just don't know the language.
The analogy extends further - it is possible to construct almost unreadable drivel in natural languages, and it is possible to construct almost unreadable drivel even in python.
However normal code written in python, forth, common lisp, or even, god forbid, c++ or perl is readable to someone who knows the language.
Now, some programming languages are closer to english in appearance than others. However, for long-term use, that doesn't matter so much - it's just a barrier to entry for lazy people.
I don't happen to know K. I do know APL, though - APL didn't look like K, since it had its own non-ASCII symbol set. I do find it difficult to read the relatively new asciified line-noise APL-derived languages. But that's because I haven't bothered to learn 'em! I do suspect they would be harder to learn than APL, since the ASCII symbols are already overloaded with so many other meanings - but once I'd learned them, I would expect that problem to fade - just like I'm not confused that "gift" in German means "poison" in english.
Actually, now that Unicode is widely supported, I would love to see a resurgence in APLs that use APL symbols, since they're much clearer to me - but so many people have been using the ASCIIfied APLs for so long now that that may never happen.
Choice of masters is not freedom.
What is it with Java people that they are all up in arms when someone posts what we all already know? Java is *slow*. It is wonderful for certain types of applications, but it is *slow*.
When I write an application, I want to write it in the most appropriate language, not just in the language I have a fetish for. If performance is important, you need to know how various languages perform. This page shows how java (the full system, i.e. language, compiler, runtime, everything) performs for a certain class of application, and it is performing badly.
As for why it is performing badly, that does not matter at all. The mere fact that it is doing so makes it inappropriate for this type of application. And that is a valid conclusion in my book.
God, it feels like I've spent most of my professional life arguing with Fortran programmers. These people are ignorant, but arrogant. They think that because they have a PhD in Engineering (or Physics, or whatever) and can produce a syntactically-correct Fortran program, they know how to program, and can ignore advice backed up by thirty years of software engineering research and experience. Bizarrely, what little knowledge they have is about 35 years out of date, even for those in their twenties. They live in a ghetto.
As anyone with even the slightest real computing knowledge knows, what gives you performance is the algorithm chosen, not the implementation. Therefore, what matters is how easy it is to implement a good algorithm. Which means, how easy it is to write a program that implements a difficult-to-understand algorithm (because an inobvious algorithm-- of course, there are some exceptions). Which means that support for modern programming techniques that help you produce easy-to-understand programs is important for producing high-performance programs. You know, things like the following that are absent from the still widely used Fortran-77:
So, comparing the performance of a toy Fortran program with its translation into C++ or Java shows nothing.
What has happened is a second Software Engineering Crisis. The first crisis was in the mainstream, data-processing part of the software industry. The introduction of more powerful computers resulted in large, complex programs that were failures because they were complicated (see The Mythical Man-Month). Since then, we have developed software engineering techniques to deal with their problems, so now large programs can be much more complex (composed of many parts) without being excessively complicated (difficult to produce and understand). Since about twelve years ago, the increasing performance of computers means that number-crunching programs (e.g. CFD programs) don't merely process large amounts of data; they are also large and complicated in their own right. The Software Engineering Crisis has caught up with engineers and scientists. The sad thing is, many don't know it, or ignore the advice (and screamingly obvious signs) that it is here.
Ne mæg werig mod wyrde wiðstondan, ne se hreo hyge helpe gefremman.
Now, some programming languages are closer to english in appearance than others. However, for long-term use, that doesn't matter so much - it's just a barrier to entry for lazy people.
I don't think so. Using symbols for expressions is not the same as using English, especially if one is developing in more than one language.
You should also consider other facts:
the introduction of new members in the development team, especially if the new members don't know the language
maintenance and service after installation; it may not be the original developer (who was so efficient in K) that maintains the project
readability; after long coding hours, ',' is easily mistaken for '.', for example; and with so many symbols packed together in such tiny space, the problem only gets worse.
learning curve; much higher than a language based on English.
relevance to documentation/pseudocode; for example, it is much easier to make someone understand that the developed code follows the pseudocode defined in a project's specifications when the code is as close as possible to the English language than when the code is a bunch of symbols thrown together.
Although Python/Lisp/C++/whatever are readable, this is because they are based on English. The more a language is based on English, the better it is for big project development. That's why every coding style says "use readable variable names"...If things were as you claim, we would all be programming in assembly or with 0s and 1s; after all, how hard is it to make a mistake with 2 symbols only :-) ?
Don't forget that Hypercard has been claimed as one of the best programming environments because of its ability to program almost in English.
The attitude "symbols are ok, as long as I understand them" shows that you are ignorant of the issues of real development (with managers, deadlines, multiple and heterogeneous environments, different coders, testers, bug reports and bug databases, etc).
Finally, the obfuscated C code contests would not exist if code readability did not play an important role!!!
Using Object-Oriented constructs is no guarantee that a program is maintainable or even readable. I have seen some horrifying OOP code in my life, written by people so enamoured of syntax that they drown their code in it.
In numerical applications, an extra 10% can be the difference between success and failure. I'm corresponding with a fellow who works in meteorology; his company uses commodity boxes to compete with government-funded monopolies. For him, the ability to gain 10% is crucial.
I am all in favor of object-oriented programming -- but my philosophy matches that of Bjarne Stroustrup, who refers to his language as having "multiple paradigms." Use OO when it makes sense -- but use the right tool for the task at hand. C++ does not force you to use OOP when it doesn't make sense.
Many numerical applications make more sense when using short variable names (that match formulas in texts) and a function-based approach (again, matching mathematical idiom).
All about me
Firstly, I would like to echo the sentiment in kfg's reply to your post.
:-)
Secondly, all html formatting seems to have stopped working - dunno why, so I apologise for the poor formatting of this post.
Thirdly:
*** Now, some programming languages are closer to english in appearance than others. However, for long-term use, that doesn't matter so much - it's just a barrier to entry for lazy people.
***
*** I don't think so. Using symbols for expressions is not the same as using English, especially if one is developing in more than one language.
***
O.K. I may have been oversimplifying - but: ENGLISH IS SYMBOLS TOO. Using English is exactly the same as using symbols - since using english, or any language, *is* using symbols. That's how humans abstract. In written english you have letters, composed into compound symbols (aside: these compound symbols, "words", are often treated as primitive symbols by fast readers, whose symbol-recognition wetware seems to recognise them in one swoop.)
Now, one could argue "then why not use familiar symbols" - but I find using familiar symbols just because they're familiar to be a bit silly and often dangerously misleading. Think of the emotive symbols "theft" or "piracy" applied to "violation of copyright law", in reality a quite different concept. Or "=" used for both assignment and equality testing (arrgh!!!!), neither of which corresponds to the mathematical "=", which is a statement of equality.
***You should also consider other facts:
***
***the introduction of new members in the development team, especially if the new members don't know the language
***
It is often better to just budget for bringing the new member up to speed in the language. Or just decide to only hire "speakers" of the language in question. You'll have to bring the new member up to speed on all the little peculiarities of your codebase anyway, a much harder task for the new member than merely learning another new computer language.
***maintenance and service after installation; it may not be the original developer (who was so efficient in K) that maintains the project
***
No, but one would hire or train someone capable of maintaining it. Here's usually where the strong arguments for using COBOL or Java come in - "what if one can't hire a K developer in 3 years?", and so on. But learning a new language should be VERY EASY for anyone who calls himself a "programmer" - the hard part will be understanding the code, not the language.
***readability; after long coding hours, ',' is easily mistaken for '.', for example; and with so many symbols packed together in such tiny space, the problem only gets worse.
***
Yes, that is true - but, interestingly, the number of lines of code written by a programmer in a day stays roughly constant - regardless of language used - so the more verbose the language, the less your programmer is doing. And non-ASCII APLs, for example, have much easier to distinguish symbols than #\, and #\.
***learning curve; much higher than a language based on English.
***
Somewhat higher.
Note that I consider "A language based on english" very different to a language in which symbols^Wkeywords happen to correspond somewhat to english words. There are very few programming languages based on english, in which the grammatical structure of the language corresponds closely to english. Many programming languages, however, have keywords based (loosely - printf ??? like "print" followed by a stifled sneeze...) on english words - but the symbols are strung together in ways that are very different to english, and have their own meanings in that language that are typically very different to their english meanings. You can english-open an english-door, but you can mainly only unix-open a unix-file. I won't mention "ontological commitments" right now.
Once you know what "printf" does, you manipulate it as a whole in a c program; you don't spell it out each time you write it ("p-r-i-n-t-f" spells "print"). A Chinese person can write in C quite easily without knowing much English - once he knows what the symbol "printf" does in the context of C, he doesn't *need* to know that there is an english word "print" or that "f" is the first letter of the english word "formatted".
So english-like symbols can indeed help in the discovery phase, when you are trying to guess what a symbol does - but so could spanish-like. Or chinese-like. RTFTM (Read-the-fucking-translated-manual) can help here - as a programmer you are manipulating symbols when you are writing programs, and one of the things programmers spend most of their time doing is looking up definitions of the meanings of symbols in a particular context - if the definition is in french, and you're french, you can use the symbol, even if the symbol doesn't look french.
***relevance to documentation/pseudocode; for example, it is much easier to make someone understand that the developed code follows the pseudocode defined in a project's specifications when the code is as close as possible to the English language than when the code is a bunch of symbols thrown together.
***
Not if the documentation or pseudocode is in any language other than english. Note that I am European, so I am probably more used to a multilingual environment than you if you are American.
***Although Python/Lisp/C++/whatever are readable, this is because they are based on English. The more a language is based on English, the better it is for big project development.
***
I would disagree; e.g. Lisp is SO not based on english. The symbols defined by the language spec are. The "sentence structure" is almost completely alien.
***
That's why every coding style says "use readable variable names"...***
Yes - but readability depends on the reader. And I'd prefer "use meaningful variable names". So english variable names make sense on a big project, since chances are your reader speaks english. But would you suggest using "BEGIN" and "END" instead of "{" and "}" on the same project ???
*** If things were as you claim, we would all be programming in assembly or with 0s and 1s; after all, how hard is it to make a mistake with 2 symbols only :-) ?
***
Not at all - you misrepresent me. That would be antithetical to forming abstractions via new symbols. (aside: don't forget, most stuff is in fact written in "portable assembler"-like languages such as C - most good assemblers allow macros and therefore the beginnings of higher-level languages - e.g. the Amiga m68k macro assembler's .i header include files had a near 1:1 mapping to the Amiga C .h header include files, including macros for structs and so on.)
***Don't forget that Hypercard has been claimed as one of the best programming environments because of its ability to program almost in English.
***
(a) Hypercard was claimed as one of the best programming environments for "non-programmers".
(b) Hypercard was one of those few languages where the grammatical structure of the language, not just the keywords, is english-like.
***The attitude "symbols are ok, as long as I understand them" shows that you are ignorant of the issues of real development (with managers, deadlines, multiple and heterogeneous environments, different coders, testers, bug reports and bug databases, etc).
***
I assure you, that's not my attitude. (I am a professional developer and encounter all of the above issues on a daily basis.) English-like symbols do indeed help when trying to understand a system. They can also mislead - a variable called "applecount" that holds a count of "all oranges, pears and apples since last tuesday" is very annoying, for example.
My attitude is "symbols are ok, so long as the intended readership (including the computer :-) ) understands them". I include English-like symbols in that.
***Finally, the obfuscated C code contests would not exist if code readability did not play an important role!!!
***
True. To make the point in my previous comment more concrete: But have you ever seen legalese or a patent document? They're supposed to be in english - all legalese looks obfuscated because they're trying to fit a precise layer on top of an imprecise natural language, and patents are deliberately obfuscated as a matter of course.
Choice of masters is not freedom.