C++ the Clear Winner In Google's Language Performance Tests
Paul Dubuc writes "Google has released a research paper (PDF) that suggests C++ is the best-performing language on the market. It's not for everyone, though. They write, '...it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.'"
I thought this was common knowledge. Did anyone ever doubt that?
My blog. Good stuff (when I remember to update it). Read it.
RTFA and take a good hard look at what they compared it to: Java, Scala, and Go. This post is a complete non-story.
This would have been much more fun if more languages and compilers had been benchmarked. Really interesting read though.
HTTP/1.1 400
Wow, they compared a whole four languages: C++, Java, Go and Scala, of which, C++ is the fastest. Is this seriously a surprise to anyone?
Yes, it's true that C/C++ is generally faster than other languages, but when it comes to writing bug-proof code, it's not so good. It's very easy to write past the end of arrays and use bad pointers, amongst other things. From a career point of view, C/C++ is bad. I should know: my main expertise is in it and I am struggling to find a job. There seem to be way more jobs for Java and C# programmers.
FTA: "All languages but C++ are garbage collected, where Scala and Java share the same garbage collector."
That's got to play a factor here.
There's no -1 for "I don't get it."
This jibes with "common sense" and the computer-language shoot-out
It's not useless. It's nice to see multiple studies with different approaches coming to the same conclusions.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
They didn't test BASIC? Lame...
-I only code in BASIC.-
I'm disappointed, everybody knows that when the going get tough FORTRAN get going
this post contain no useful information, no need to mod it down
A slower language just means you need to buy more rack space. A more expensive development language (like C++, which needs more skilled coders, more debug time, etc.) means that you need to buy more developer man-hours. As far as the business world is concerned, I thought everyone had already come to the conclusion that, as far as business apps go, C++ isn't generally worth it.
Leave C++ for the games, the operating systems, and the frameworks that the higher level languages run on. But since everyone already knows this, TFA is kind of pointless.
The fact that they left out C# seems odd as well. It makes me wonder what the point was, really.
- Spryguy
There are three kinds of people in this world: those that can count and those that can't
Notice one of the comments pointed out that Borland Pascal was one of the fastest executing languages next to ASM. I remember that Borland Pascal (in 19991) executed almost 10 times faster than Borland C++ on a consistent basis on the same systems.
This only points out that tests need to compare apples and apples. I would be quite surprised if any C++ can execute an FFT as fast as my Lahey Fortran 95.
If I was going to pick only one language to work with, it would probably be LISP, but Haskell comes a very close second. I like code that does exactly what I want it to do with no side effects.
There is much more to comparing languages than is reported in the article, including testing the language's suitability for a given task.
"The mind works quicker than you think!"
If pure speed is the sole criterion with tuning effort having zero consideration, wouldn't masterful Assembly or opcode be the fastest?
"Love heals scars love left." -- Henry Rollins
In the absence of evidence to the contrary, it's reasonable to assume that C#'s performance results would be about the same as Java's. Testing C vs. C++ vs. Fortran would be much more interesting. (There is no such language as "C/C++" and it's really irritating when people lump them together, as many commenters on this story have.)
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
In some situations Java will be faster than unoptimized C++ - JIT compilation will do enough of a better job than vanilla C++ to make the difference. In general, C++ will clearly be faster. However, I think what most of the people you're qualifying as idiots get up in arms about (rightly) is the assumption that so many programmers seem to make that Java will be many times slower than C++. That's (usually) just wrong.
In particular, here's what Google's analysis had to say about it on page 9:
They go further to say that they deliberately chose not to optimize the Java further, but several of the other C++ optimizations would have applied to Java.
For most programming tasks, use the language that produces testable, maintainable code, and which is a good fit for the kind of problem you're solving. If it performs badly (unlikely on modern machines), profile it and optimize the critical sections. If you have to, write the most critical sections in C or assembly.
If you're choosing the language to write your app based on how it performs, you are likely the one making bad technical decisions.
Of course it's going to be a C that wins. It's pushed close to the iron. C++ is for the careful, exacting personalities. I simply don't have the patience to use it on a day to day basis. Scripting is for the frenetic like me. If you pick a scripting language, you're selling out performance for keeping your rapid development and sanity. You can write beautiful safe stuff in C++ too. Use what you're comfortable with.
'...it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.' I think that's the problem in a lot of systems today. Too many "average" programs are being written by too many "average programmers", and this is why there are problems such as memory leaks, etc. I feel too many have been spoilt by the ridiculous memory and processing speeds available in today's computers. Whatever happened to understanding how functions perform and refining code to be lean and mean?
From the paper (section 6,E: Java Tunings): "Note that Jeremy deliberately refused to optimize the code further, many of the C++ optimizations would apply to the Java version as well."
--------------------------------------------- "In the end, we're all just water and old stars."
Assembler is by far the best performing language. But most of the time, Java performs well enough with modern hardware, so that performance becomes a moot point. As the article mentions, Java is the easiest to implement. It also has the advantage of being cross platform, something which the article doesn't address.
... most people are doing it wrong.
Good use of C++ fills a very small niche of people that want a relatively high-level language but care about a statement like "the compiler generates good code for this"... You want some of the properties of C, like being close to the hardware and generating straightforward machine code. Add to that some things that make OO easier. Add type safety. Add templates and function objects, which due to inlining gives you much better machine code than the typical C approach of a function pointer and a void* to provide such extensibility. What you have is kind of like "a better C". It has a lot of quirks and baggage, but with the proper understanding of the good and the bad it's really good for the sort of niche where these choices make sense.
The problem I have encountered is that bad C++ programmers are a dime a dozen. I don't think I've ever had any co-workers who would understand my previous paragraph. I know from reading books and papers and Internet that people who "get" C++ exist. The best I can say is that the vast majority of people using a C++ compiler don't know what they are doing. Instead I've met a lot of people writing C++ code who probably should be writing in Java anyway; they discard most of what C++ is good for, usually because they don't really understand it and they're trying to write Java-ish code in C++. The results aren't pretty.
In a way I agree with what Linus said in one of his famous emails, where some silly person was suggesting to rewrite git in C++. To paraphrase: Choosing C as a language is good even if the only thing it accomplishes is it keeps out the bad C++ programmers.
I guess the silver lining is that no two people can agree on what "good" C++ is. So maybe I'm just being too harsh in my assessment.
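A minimal sketch of the function-object point made earlier in this comment, assuming a plain vector of ints. The names compare_ints, sort_c_style, Ascending and sort_cpp_style are made up for illustration. Whether the comparison really gets inlined depends on the compiler and optimization level, so treat this as an illustration of the mechanism, not a benchmark.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C-style extensibility: the comparison is reached through a function
// pointer, and the elements arrive as untyped void*.
int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

void sort_c_style(std::vector<int>& values) {
    std::qsort(values.data(), values.size(), sizeof(int), compare_ints);
}

// C++-style extensibility: the comparison is a function object whose type
// is a template argument of std::sort, so its body is visible at the call
// site and the compiler is free to inline it.
struct Ascending {
    bool operator()(int lhs, int rhs) const { return lhs < rhs; }
};

void sort_cpp_style(std::vector<int>& values) {
    std::sort(values.begin(), values.end(), Ascending{});
}
```

Both functions produce the same ordering; the difference is only in how much the optimizer can see, and the template version also keeps the element type checked at compile time.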
The cases in which performance is critical are becoming fewer and fewer, now that hardware is getting faster and faster. No wonder Microsoft is focussing on JavaScript and HTML5. It seems that at the moment the greatest effort in performance improvement is put into JavaScript.
I don't know about that, maybe it was faster than Borland C++, but I did a lot of work (disassembly) on Borland Pascal compilation units and executables back in the day, and the code was horrible. HORRIBLE. Didn't even have the most basic peephole optimizations (though someone wrote an external application to do that). It was fast to compile though, due to being one-pass, but that right there sacrifices optimizations.
So if TP/BP was faster than BC++, it was only because BC++ must have been even worse than I can imagine.
Belief is the currency of delusion.
The fact that they left out C# seems odd as well. It makes me wonder what the point was, really.
Google has an official list of four languages that they approve for use: Java, JavaScript, C++, and Python.
Most of their business runs on Java. Search, Google Maps, etc.
YouTube mostly runs on Python, IIRC.
So, they compared C++, on their list; Java, on their list; Scala, which is not on their list but runs on the JVM and thus would work for them; and a language invented at Google. They didn't test Python, but if they had it would have come in last place (and I say that as one who loves Python; it's reality).
Why is it surprising that Google was testing languages that Google might use for projects?
P.S. Exciting things are happening in the Python world with the PyPy project. PyPy is a Python system that is written in Python. You might expect that to be slow, and for years it was slow; but it includes a Just-in-Time compiler that generates native machine code, and it isn't slow anymore. In fact, it is now faster than the C Python, because it can optimize away lots of stuff that C Python has to do.
http://pypy.org/
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Here's some strong evidence to the contrary:
http://reverseblade.blogspot.com/2009/02/c-versus-c-versus-java-performance.html
trollfood / fanboi-food post is obvious debating the effectiveness/awesomeness/speediness of a programming language is like saying "apples are better cuz oranges have hidden pips"
Which is exactly the point. You compare performance exactly in order to create evidence one way or the other.
The Tao of math: The numbers you can count are not the real numbers.
Nope, C++ certainly not for everyone. But the most powerful tools rarely are.
/* No Comment */
You mean suspend/resume? That was deprecated like 10 years ago, but the functions required to do it right were in from the beginning (since 1.0).
Many talk about time to develop, and certainly C/C++ sucks in that capacity. And many talk about code complexity and likelihood of bugs due to memory management, and certainly there's much to be said there too. But C/C++ fails me in one very simple business category: code robustness.
Every project I work on has clients doing one very important thing -- turning the entire thing upside-down and backwards three times. In C/C++, I'd get to start from scratch every time. In Perl, I get to shuffle a few lines of code to invert a feature, and that's it.
It's really painful to write C/C++ code in a flexible manner for future arbitrary upgrade. And that's what I need in order to run my business. The other speeds can be made up by throwing money at additional hardware. I can't tell a client that it'll take months to make a decent upgrade, nor charge them for that much time.
Lowest-level language with careful tweaking yields best performance. Film at 11.
" Functional languages should excel at this, they have been ruling the program transformation/analysis space for a long time."
I think the key is homoiconicity, not functionalness. Prolog isn't a functional language, and it's mainly used for domain specific languages/one-off compilers.
Google has released a research paper (PDF) that suggests C++ is the best-performing language on the market.
No, they didn't. They compared four languages (C++, Go, Java and Scala) using a single algorithm, and two implementations (initial and improved) per language. Out of those, the optimized C++ turned out to be the fastest and the least memory hungry, whereas the improved Scala version used the least source code, and the improved Go version compiled the fastest.
None of this allows generalization to "best-performing programming language on the market".
It's not for everyone, though. They write, '...it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.'
This is a very important point. If you are Google, you probably have developers who can do this kind of tuning, and you will probably benefit from it (the developer effort is expensive, but inefficient software may well be more expensive at Google's scale).
In general, though, what you want to consider is not only the best performance that has been produced by the world on a single problem, but also the performance on different problems, the variation in performance between implementations, the average performance, and the development time.
In 2000, Erann Gat (now Ron Garret) published a paper (PDF) that showed the results of comparing 16 implementations written by 14 programmers, in C or C++ (lumped together), Java, and Common Lisp or Scheme (lumped together). These results show that the fastest programs were written in C or C++, while Lisp produced the fastest programs on average and offered the least variation in performance. Lisp also offered the shortest development time on average.
Of course, this is old data. If anyone has performed a similar study more recently, or including different problems to be solved rather than a single one, I would be very interested.
Meanwhile, the Computer Language Benchmarks Game compares many language implementations across several different tasks, with multiple programs for each task, and shows that the results differ depending on exactly how you measure.
Apparently, if you want the fastest programs, you should go with C, C++, Ada, ATS (Fortran, Common Lisp, and Python also produced fast programs, but weren't as good on average). If you want short programs (which may be expected to correlate with short development time), you might want to go with Ruby, Python, Perl, Lua, or JavaScript. If you want short development time, but also reasonable performance, then Go, Scala, or Haskell may be good choices (or you could go the time-tested route of writing what you can in rapid development languages, and the parts that need to be fast in high-performance languages).
Please correct me if I got my facts wrong.
There is no such thing as a real "computer language speed test"; this is really a test of compilers, VMs/interpreters, and environments. The first question that has to be raised is of course: when the program is running on hardware, what "language" is it written in? The hardware sure doesn't give a shit. You can compile almost any language into native code, including VM driven languages such as Java and Perl. Now granted TFA states that they ran their loop 15k times to try to minimize the effect of the load time and run as much JIT compiled code as possible, but that's still not the same as compiling it directly into native code.
Which brings up another point: are they really testing the "programming language" (which is just a bunch of specification and usually implementation hints), or are they testing the compiler/environment they are on? Code compiled with GCC and run on a Linux box will probably perform differently than code compiled with Microsoft's compiler running on Windows, which will behave differently than code compiled with LLVM/Clang running on OS X... You can probably say the same thing about Java compilers; I'm assuming they used the Oracle reference javac, but there are other Java compilers out there. How do you test the speed of the "language" when so much of that performance depends on the compiler and environment you are running on?
Which leads into my final point, when does a language stop becoming something written in that language? Although not tested this time, probably the best example of this point is Ada. Anyone who has coded in Ada knows how insanely strict it can be, it constantly does things like bounds checking to ensure that data stored in subtypes is within the bounds of those types. However on most Ada compilers most of these checks can be disabled with just a couple compiler flags. Obviously the resulting code is going to be faster than if you kept the checks in, but does it stop becoming Ada at that point? You can make a similar case for Java and JNI. JNI is completely legal in the Java language specification, but when you use JNI does your program stop being a Java program? Could you have optimized it further by using JNI?
This is merely a test of whatever compilers/VMs they used in whatever environment they ran the code in, nothing more, nothing less.
Monstar L
Surely assembly would be the highest performing language, while also requiring the most extensive tuning efforts and a level of sophistication not available to the average programmer...
More importantly tho, would be to know which language provides the best balance between performance and work required?
And of course another thing to consider is how often the code will run. Not much point saving an hour writing it, only to waste 500 hours of CPU time running it.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
I think the biggest problem with Java is the memory bloat. A lot of the developers that work with Java have a very limited understanding of how things work under the hood. This usually leads to programs that include references to "good to have" packages/frameworks that contain some functionality they need, which in turn leads to memory bloat. If you then have a Java program that needs to run in multiple instances, you don't gain the one-memory-image, many-processes functionality that most OSes support. (One of the core features that Dalvik has, if I remember correctly.)
Since Java has GC, a lot of Java folks seem to think memory use isn't something you should care about, since the "GC" can handle it for them. This leads to bad coding practices. No matter the language, you should always know the scope of an object's life.
IMHO it doesn't matter what language you are using as long as you know what you are doing. Java often gets "bashed" by C/C++ programmers, but I think that's because Java as a language has a lower threshold with regard to prerequisite knowledge about CS and is easier to learn, which produces programmers that think about computers as a mysterious black box that "just works (mostly)", and the code reflects that in functionality and poor performance.
--- Reality doesn't care about your opinions, it happens anyway and if you are in the way you'll get squished.
a-choo - excuse me. So a Toyota Camry is the best all-round performer of vehicles on the road today, but if you want super performance, buy a Ferrari, and if you want to haul freight, buy a Mack truck. Really? Wow. Could a more pointless generalization possibly be made about programming languages? What it really points to is that Google have now totally lost the plot in their love affair with their own cleverness. Hello Borg Mk II
It often turns out programmers are not as good at the assembly as they might think. I'm not saying a little hand tuned assembly isn't useful for some things but the "I can do better than a compiler," attitude usually has no basis in reality. Good ones are pretty clever at optimizing things. So maybe if you have an area of your code that uses a lot of time (as determined by a profiler, don't think you are good at identifying it yourself) and write a hand tuned in-line assembly version (maybe starting with the assembly the compiler generates). However you don't go and write the whole program in assembly.
C++, of course, is something you can quite easily write a very large program in, or operating system for that matter. Not quite as easy as a fully managed language, but perfectly feasible to deal with large scale and indeed it is what a large number of projects use.
The closer you get to writing in pure binary, the more efficient your code can potentially be. I'm sorry for the guys who invested so much time mastering Java and still have hopes to become world renowned programmers. It is no wonder that embedded developers are in such high demand.
Java and C# are rather close on that list, aren't they? And both are slower than C++, sometimes significantly, on all but one test.
Plus those tests are mostly very small numerical benchmarks which will not stress the GC much and which are not very representative of typical applications. As the google paper showed, gc tuning is an important part of application performance for larger projects.
What, no INTERCAL? It could have blown away the competition; it has "COME FROM"!
The paper doesn't talk about using:
1. Custom allocators (stack and pool based)
2. Move semantics
3. Modern C++ idioms such as expression templates
I believe if these were used, the performance would have been even better and left the rest "truly" in the dust...
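A minimal sketch of item 2 (move semantics), assuming a C++11 compiler. Record and store are hypothetical names used only for illustration and have nothing to do with the code in the paper.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Record {
    std::string name;
    std::vector<int> samples;
};

// Takes the record by value and moves it into the container. A caller that
// passes a temporary or std::move(x) hands over the existing heap buffers
// instead of paying for a deep copy of name and samples.
void store(std::vector<Record>& out, Record r) {
    out.push_back(std::move(r));
}
```

Called as store(records, std::move(big)), the samples buffer owned by big ends up inside the container without being copied; big is left in a valid but unspecified state afterwards.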
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
Please show me how to implement Macros -proper Lisp-style macros- in C or C++ so that they are available in standard C (or C++).
Thanks.
Bad analogies are like waxing a monkey with a rainbow.
Is that a challenge Google?
The competition was Java, Scala and Go. Quite a limited set and definitely missing C for one. And unlike C++ optimization, optimizing C is something you can do without understanding details of the compiler. So I will definitely stay with C for everything that has high performance needs, even if I have to do the OO aspect myself.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
This is what matters for me. I want my program to have acceptable speed just by writing it in the most straightforward way. Squeezing out the last bit of performance isn't usually necessary, but 10x slower is also unacceptable. Many people defend slow languages by saying that premature optimization is the root of all evil, but in C/C++, you simply don't need to do ugly premature optimization in most cases, since the program often runs fast enough as long as all best practices are followed.
C++ seems to be useful in most cases as long as all developers know it well (using correct data structures, not copying large objects around unnecessarily, etc.), including important libraries in STL and Boost.
C is also good with the right libraries such as glib. It is more verbose than C++ and manually freeing all the memory can take a bit of developer time, but it is also easier to learn.
Fortran 90 is even easier, but is only suitable for numerical programs without complicated data structures. Java is also easier and fast enough, and may be a good choice for non-numerical work (numerical programming is still a bit awkward).
Scripting languages are, for the most part, still too slow for non-scripting work. Sure, a program in these languages can be made faster by writing some speed-critical parts in C, or by writing the program carefully so that my particular implementation can optimize it well, but this will usually take more effort and make the program harder to maintain.
The benchmark imposes the restriction of using "idiomatic constructs". But that's just what you wouldn't do if you had to optimize a performance critical section by hand.
Nevertheless, for deciding among those four languages, the conclusions are probably still reasonable. The verbiage before just serves as an alibi.
What the paper really says is "we don't want to use Go"; there is probably pressure to do just that inside Google.
Seems to me they are. For one, their MHz are pushing up. Not by leaps and bounds, but they keep improving. Take a look at the last three generations of mid-range quad core Intel chips ($300ish):
--Sandy Bridge (current) goes from 3.4GHz to 3.8GHz depending on core load (meaning it changes speed dynamically).
--Lynnfield goes from 2.93GHz to 3.5GHz depending on core load.
--Yorkfield 2.67GHz, no boosting.
Yorkfield came out in March of 2008, Sandy Bridge in January of 2011. So in about 3 years the base clock rate has gone up 700MHz and the peak clock rate over a GHz.
Now that aside, the CPUs are getting much more efficient per clock. You find that each generation has significant per clock gains over the last one. Same code, same MHz, faster execution.
Then of course there are new instructions. AVX in particular really boosts Sandy Bridge's floating point speed.
Add it up, they've increased in speed a lot.
I agree that threading is important, as I noted all this is on quad core processors, but the idea that you can't do threaded code in C++ is silly. The most simple demonstration against that is all the PC games out there. Nearly all of them are done in C++ (Visual C++ to be specific) and these days they are all multi-threaded. For more intense threading in C++ apps, have a look at audio software and plugins.
If your products are all ones where speed is not an issue, and there are plenty that are not, and where the overhead of Java (not so much things like memory, more like administration, in that Java wants to update all the time and you have to have it installed before your app works) is OK, then wonderful. However there are a lot of applications out there where performance matters, and it still seems to be that languages that generate native code, like C++, do a better job.
Even if I prefer Fortran to C due to the ease of handling matrices, I haven't seen any case where Fortran is faster than C. Therefore "when the going get tough FORTRAN get going" cannot be due to speed. Or, any links to benchmarks to back it?
Talk to Java heads they'll tell you Java is already faster than C++. They can show you some contrived tests to demonstrate this too! Of course they never seem to have a good answer for why if this is the case all performance important apps (like games, audio software, etc) are written in something else or why Java linpack pulls like 750Mflops on a system where native compiled (from C) Linpack gets 41Gflops. However they are sure it is faster!
I haven't used Apple's new compiler, but under Leopard (10.5), using GCC, the actual machine code that is produced for objective c is *awful*. Constantly re-using the same registers, putting them away, bringing them back... it's as if the compiler uses a one-type-of-each-register model. Speaking as an assembly programmer, for any particular procedure that I looked at, I could have *trivially* whipped GCC's ass back into the stone age for execution efficiency.
Apple says the new compiler produces better code, and all I have to say at this point is that shouldn't have been difficult to do, because obj-c -> gcc sucks ass.
I've fallen off your lawn, and I can't get up.
No they're not, they just better understand the argument than you do apparently, and Google's conclusion doesn't disagree with them.
I think everyone agrees C++ can theoretically perform faster, but as Google notes you need to perform a lot of optimisations by hand which requires a lot of time, and a high degree of skill to perform.
The argument for Java is that it performs as well as C++ when you either do not have the time to do these optimisations, or do not have the skill to do these optimisations. There is a further argument that in this day and age there is no point doing such optimisations because the performance edge is small enough to be not worth investing the time in.
The fundamental point is that an extremely skilled developer, with no deadlines, will always be able to produce better performing code in C++ than Java, but:
- Time is money
- Skills cost money
- The code may be harder to read and maintain due to optimisation
- The software will be inherently more prone to developer errors that can lead to security vulnerabilities
So in the real world C++'s performance advantages are more often than not, irrelevant. That is, the real world doesn't really care about performance penis waving C++ developers 99.99% of the time, because they'd rather have code that's portable, easily maintainable, has a smaller potential attack area, and is developed in a cost effective manner. It is these real world factors that give Java the edge, and under these real life factors that C++ struggles to compete. That is why Java, and other languages like C# and .NET have eaten massively into C++'s marketshare in the last decade.
The arguments implying C++ is the one true language are produced only by those lifelong code monkeys who never got past that role because they simply never understood the practical or business side of things. Without a doubt they can write great code that performs well, but there will be inherent disadvantages: simply because humans cannot be 100% consistent in avoiding mistakes, their code will over time be more prone to security vulnerabilities, will be less maintainable, and will cost more to develop.
It's really about the best tool for getting the job done without worrying about some performance gain only noticeable to a user if they see the sub-second comparison metrics listed for them, and I'm glad to see others point that out too. Knowing when to and when not to use a certain tool is the sign of a great developer, not dismissing dissenting opinions as vocal idiots without apparently even understanding the argument they make.
Why would anyone expect average programmers to write well performing code in any language?
My conclusion is that the performance advantages of C++ are easily offset by its ability to confuse the programmer. C++ is arcane and complicated; Java is more limited; Python is a lot cleaner. If you know exactly what you're doing, then C++ can be the fastest option, but in real-life situations, with deadlines and average programmers, don't go there.
I've read the entire thread, and one common complaint about C++ is manual memory management.
So, my question is this: why don't people use smart pointers? boost has the class boost::shared_ptr which works with lots of compilers, and the upcoming c++ will have std::shared_ptr.
With smart pointers, there is no need to manage memory manually. I've written whole apps (130-150Kloc) without a single delete statement and without a single memory issue by using smart pointers.
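A minimal sketch of what that looks like, assuming a compiler that already ships std::shared_ptr (boost::shared_ptr is used the same way). Widget and build_and_drop are hypothetical names made up for the illustration, and the live counter exists only to make the automatic cleanup visible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type used only for this illustration.
struct Widget {
    static int live;            // number of Widget objects currently alive
    std::string name;
    explicit Widget(std::string n) : name(std::move(n)) { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Creates two widgets, shares one of them, and returns how many were alive
// just before the function ends. No delete appears anywhere.
int build_and_drop() {
    std::vector<std::shared_ptr<Widget>> widgets;
    widgets.push_back(std::make_shared<Widget>("a"));
    widgets.push_back(std::make_shared<Widget>("b"));

    std::shared_ptr<Widget> alias = widgets[0];  // second owner of "a"
    assert(alias.use_count() == 2);

    return Widget::live;
    // widgets and alias go out of scope here; each Widget is destroyed
    // when its last shared_ptr owner disappears.
}
```

After build_and_drop() returns, Widget::live is back to zero. One caveat: two objects that hold shared_ptr references to each other never reach a count of zero, so cyclic structures still need std::weak_ptr on one side.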
Look at performance critical applications: signal processing, image processing, graphics, database implementation, operating systems. Any time performance is high on the list people pick C, C++, Fortran, and sometimes Ada. Compilation is the only feasible choice. One thing that compilers give is the ability to tune the algorithm based on real execution profiling. If you don't compile, at some point you will run into language implementation decisions that are out of the coder's control, and you will hit a wall. This means more work, but greater effort gives better results. If other things like programmer time or ease of modification are important then non-compiled languages may be better. One size does not fit all.
Why is Snark Required?
Alex Trebek: This language has the best performance. It's almost "D".
Watson: What is JAVA?
Is this actually newsworthy? A better question is: which one allows programmers to perform better? I'm not just talking lines of code, but rather how much time is spent developing in general. I guess that would be hard to measure but it would make for a far better research paper.
After teaching programming for a while now, it's my very strong suspicion that far more performance bottlenecks are caused by programmers who don't even understand what time and space complexity is, than are caused by competent programmers implementing in one language over another.
For instance, I just set a scripting assignment (to be completed in bash 3.0, which lacks associative arrays) which asked students to process syslog files and report how many times different applications appeared in the logs. Some students did this by repeatedly using grep and wc over the entire log file, and wondered why I deducted marks when it took a minute to run on moderately large logs.
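To make the complexity point concrete, here is a rough single-pass version of that task. It is written in C++ rather than bash purely for illustration, and it assumes the traditional syslog layout "Mon DD HH:MM:SS host app[pid]: message"; app_name and count_apps are hypothetical names. Running grep once per application reads the whole file k times for k applications, while this reads it once.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Extracts the application name from one log line, assuming the layout
// "Mon DD HH:MM:SS host app[pid]: message". Returns "" if the line is too short.
std::string app_name(const std::string& line) {
    std::istringstream fields(line);
    std::string month, day, time, host, tag;
    if (!(fields >> month >> day >> time >> host >> tag)) return "";
    // Cut the tag at the first '[' or ':' so "sshd[812]:" becomes "sshd".
    return tag.substr(0, tag.find_first_of("[:"));
}

// One pass over the log: O(n log k) for n lines and k distinct applications.
std::map<std::string, int> count_apps(std::istream& log) {
    std::map<std::string, int> counts;
    std::string line;
    while (std::getline(log, line)) {
        const std::string app = app_name(line);
        if (!app.empty()) ++counts[app];
    }
    return counts;
}
```

Without associative arrays, the same one-pass effect is available in a shell pipeline by extracting the field, sorting it, and counting adjacent duplicates.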
God help anyone who uses a program they've written.
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
We all know Google and Microsoft Bing are internally implemented in Brainfuck. Why don't they simply admit it? They're trying to hide that fact by publishing language-comparison mumbo-jumbo while using the specially crafted Brainfuck processors to keep their domination.
Just that you will go mad figuring out the Hollerith field way of specifying strings and will never complete the project. But if you actually got the code to compile and run, Fortran would be the most awesomest language.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
So they ran a test between two statically-compiled native-code languages and two dynamically-compiled bytecode languages, and then seem surprised when one of the native-code languages wins?
I'm also noticing a marked lack of C# (not something I'm personally a fan of, but nothing if not a mainstream competitor to the languages listed) and Python (another language that gets "used for real stuff" inside Google) in the test. Why is that? Not that I'd expect Python to do very well in a performance test against languages like these, but it still seems an odd omission.
Low level languages like C++ are highly optimized for the current computing environment. But they are hard to program. The future belongs to programming environments that are:
1) hard for a programmer to understand, because they are data flow or multi-processing or highly data-local (cache optimized) or synchronous (GPU).
2) unknown or variable rather than fixed. You may not know where your program will be executing from day to day or what the specs of the machine are.
Languages good at this don't exist yet. And we have not even begun to tap functional programming's potential.
The other problem with C++ is it's very easy to make logical errors in.
Some drink at the fountain of knowledge. Others just gargle.
Besides, different languages are better at different things. I bet Assembly is a heck of a lot faster than C++, but I wouldn't want to try coding many of the applications that use C++ into Assembly.
... eventually; often when it's too late to do anything quick, easy or cheap to make it better.
Given the number of Googlers involved with producing this article, I'm more surprised by them not using TeX quotation marks correctly than the result of the competition. (Not to mention using crappy tables with way too many ruled lines which do nothing to aid understanding of the organisation of the tables.)
Experience frequently shows that people that overlook such details writing an article will often by nature be weak C++ team programmers. More than with most other languages, good C++ coding demands someone who is intrinsically observant, obsessively perfectionist and aware of often obscure side effects and implications of everything they write. This is obviously important on the most basic level so that they don't make mistakes, but even more important that they understand how other people could misinterpret complex code and algorithms and potential consequences.
It is this last level of empathy that is the sign of great C++ programmers vs. the good or clever. "Clever" solitary programmers (such as some of those drawn to perl) are in fact the worst to have in C++ team programming scenarios where their desire to demonstrate their own ability often ends up with some over-templated prematurely optimised nightmare which is then left as a trap for someone seeking to modify it later on.
I fail to see anything looking over Java's spec that doesn't match to the capabilities provided by Intel and AMD's SSE2 implementation. So what, precisely, is it that Java is doing that keeps it from using the massive amount of FPU power present on modern processors?
Also if that is a problem Java has, well then that is a major performance issue right there since most performance intensive apps these days are FP heavy. When people start talking about wanting to crunch a lot of numbers, they are usually talking FP. 3D graphics, games, audio processing, EM simulation, etc you are talking FP heavy stuff. If Java is going to run at about 2% speed, that is a real issue.
The lack (by choice apparently) of operator overloading cripples this language for any scientific application. They say they chose to deny operator overloading because it produces 'obfuscated' code. I would reply with something along the lines of:
c++:
Matrix A,B,C;
Matrix D = A*A + B/C + A*C;
Java:
Matrix A,B,C;
Matrix D = A.mult(A).plus(B.div(C)).plus(A.mult(C));
Although my Java is pretty rusty, I'm pretty sure the more obfuscated code is definitely not the C++! Now try moving this to a massively parallel environment.
Let's not forget that Java strings are more similar to Pascal than to C. A string in Java is an object that encapsulates a char[], a start index and a length. So for example, taking a substring from i to j is O(1) in time and memory for Java—it just returns a new String object with the same char[] but a new start index and length.
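For comparison, a rough C++ analog of that shared-buffer idea is std::string_view (C++17). This is only an illustration of the "same buffer, new start and length" representation, not Java's implementation, and slice is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns a view of text[begin, end) without copying any characters.
// The view is just a pointer plus a length into the original buffer,
// so building it takes constant time and no extra memory for characters.
std::string_view slice(std::string_view text, std::size_t begin, std::size_t end) {
    return text.substr(begin, end - begin);
}
```

Given std::string s = "hello world", slice(s, 6, 11) compares equal to "world" and its data() pointer is s.data() + 6. The cost of sharing is the same in both languages: the slice is only usable while the original buffer is alive, and a tiny slice can keep a very large buffer around.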
Are you adequate?
The interesting part is not that C++ is faster, but that it was quite comparable to Java and only became much, much faster after several Google engineers optimized it!
Let this be the final evidence in what I call the "great C++ conspiracy":
Good coders make good code. If you want the fastest code you have to go beyond compiler optimization. You need programmers who are not only at the top of their field, but can dream in code. So, if you want a language to reliably produce the best results, you have to make sure that it attracts the right kind of people and kills the weak.
A language that only works for people who have given up any regular sleeping patterns, social ties, and hope. A language that will cause any sane person to stab themselves with a null pointer. A language where operators can be overloaded, and anything could mean anything else in a different context. Beautiful if done right, but treacherous and deadly to the uninitiated. What better, than to design a language that is the coding equivalent of walking through Mordor!
I am not sure what you mean by handling strings right to left (parsing?), but the main difference between traditional C and Pascal was that C strings usually didn't have an explicit length anywhere (the string just continued until the first 0x00 character), whereas the first element of a Pascal string was the explicit length of the string (at least in Borland Pascal the maximum length of a normal string was 255 characters for this reason).
Both of these approaches have pros and cons (and sometimes zero ended strings were used in Pascal and some C functions expect explicit length information). And Borland Pascal in some version supported also special strings that you could use as both Pascal and C style strings (they had explicit length at the beginning of the string like normal Pascal strings, but they also had extra 0x00 byte after the end of the string) which was quite useful, if you had to call some external functions/procedures which expected C style string parameters.
For when your only value metric is run-time speed.
Of course, if you actually have to take into account stuff like speed of adding features to the system, or skill-level of available programmers, or robustness, you're probably better off using anything else. Fortunately, most software is not speed-critical and you can use more reasonable programming languages.
C++ is a good language... but only for the limited circumstances in which it excels. Otherwise, the overhead entailed in coding in it isn't worth the cost.
That is all.
Your average programmer either isn't doing C++ or won't be for long...
That is not to say it's a bad language, but it's like giving a kid a table saw or an oxyacetylene torch. Sooner or later something bad is going to happen. Made worse by the fact that every single person doing C++ thinks they are an expert and proceeds to use every obscure feature at the same time.
Jeremy Manson brought the performance of Java on par with the original C++ version. This version is kept in the java_pro directory. Note that Jeremy deliberately refused to optimize the code further, many of the C++ optimizations would apply to the Java version as well.
So they intentionally optimized the Java version much less than the C++ version, and they know the Java version could be sped up by applying the same optimizations they used for the C++ version. And (who would believe it?) the C++ version came out faster than the Java version! What a surprise!
Truly nothing to see here. Please move along.
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
It has been so many years since I last programmed using x86 assembly that I didn't remember all the differences between the Pascal calling convention and the C convention. I did remember the difference in cleaning the stack (caller/callee) when a function/procedure ends, but I didn't remember the difference in the order in which parameters were pushed to the stack, and I didn't remember which registers were used in function calls.
However, as far as I know, this has little to do with parsing source code or strings. And as far as I know, the typical way to parse strings in C (e.g. strtok) and Pascal tends to be from left to right (source code, after it is tokenized, is a different matter). However, parsing programming languages is often done from right to left, but this depends on the implementation of the parser. It is also so many years since my compiler course that I cannot remember if there are some programming languages that absolutely must be parsed in some particular direction.
I think Borland/Turbo Pascal had only strings that were shortstrings (and some versions of Borland Pascal also supported C style strings relatively well). I always wished that they would support longer strings, and it is nice to hear that they increased the maximum length of Pascal style strings in Delphi. I used Turbo/Borland Pascal a lot, but I have never used Delphi and I am not familiar with it.
Nowadays I use mostly just Java, C(++) and Perl, but sometimes I feel that I should try using Pascal again as I really liked it. I recently noticed that there is a cross platform IDE/library called Lazarus which is some kind of Delphi clone. Perhaps I should try it. On the other hand, maybe learning Scala or OCaml would be even more useful...
I am not sure if I understand what you mean. Of course, you can manually keep track of the length (or have pointer also to the end of the string) of C style strings, if you want to and sometimes it is done for performance or safety reasons.
But by default/convention, strings in C tend to be just pointers to the first character of the string and where the only way to find the length/end of the string is going forward until you find the first 0x00 character. And often the string is truncated by just writing new 0x00 character somewhere between start of the string and previous end of the string. The same trick can often be used for splitting string to separate tokens/strings in place -- without copying data anywhere.
BTW, if I remember correctly, the latest Borland Pascals did offer some kind of 32-bit support (using DPMI?). However, I cannot remember any details about it.
I still think I am not able to understand what you mean. I guess I would need source code example to understand it. :)
It is true that you can use sizeof in some string situations (at least with static character arrays and literals, I cannot think of any other uses at the moment), but it doesn't work, for example, with dynamically allocated strings:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main() {
char s1[] = "1234567890";
char *s2 = "1234567890";
char *s3 = strdup(s1);
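printf("sizeof=%zu\n", sizeof(s1)); /* array: 10 characters + terminating 0 */
printf("sizeof=%zu\n", sizeof(s2)); /* pointer: size of a char* */
printf("sizeof=%zu\n", sizeof(s3)); /* pointer: size of a char* */
printf("sizeof=%zu\n", sizeof("1234567890")); /* literal: array of 11 chars */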
free(s3);
return 0;
}
prints (on a system with 64-bit pointers):
sizeof=11
sizeof=8
sizeof=8
sizeof=11
It's news in the sense that it shows that modern Java JIT compilers still can't optimize as efficiently as modern C++ AOT compilers (note that Java isn't really interpreted, at least not if you run that code a few thousand times to "warm up" beforehand - it's all compiled to native code).
Though, really, this perf difference has more to do with the fact that Java memory model is more high-level (in particular, it is verifiably memory safe), and that places much more of a burden on optimizer to implement it efficiently (e.g. escape analysis is needed to allocate objects on stack). In C++, you just give the programmer the ability to deal with memory as he sees fit, which leads to fast code when it's written correctly, and spectacular buffer overflow exploits when it's not.
You jest, but what they've tested here is not a language, but a compiler and a runtime.
As someone who writes compilers, I can tell you that the language is just syntax and form. The only "performance" test you can do on a damned language is to see how fast it is to code something. Here's a basic BASIC example.
PRINT "Retarded Test is Retarded."
C:
puts( "Retarded Test is Retarded." );
Java:
System.out.println( "Retarded Test is Retarded." );
C++:
std::cout << "Retarded Test is Retarded." << std::endl;
Out of all of these languages, BASIC has the most "performance" in terms of simplicity, key presses, parsing, number of symbols / tokens, etc. (for this test).
What's to keep all of these languages from compiling (JIT for interpreted/VM languages) the above source codes down to the EXACT SAME sequence of machine code instructions?? NOTHING!!!!!!! Is BASIC now just as fast as C++ or Assembly Language for that matter?
Point being: Languages have little to do with CPU performance of compiled / executing code -- They didn't even test the damn C++ code in more than one COMPILER, which is where your real optimization happens. Seriously, I wrote a language that's a subset of JavaScript that can optionally compile to machine code, and is faster than the exact same valid JavaScript code running in your web browser. Comparable code in G++ is slower, does this now make the JavaScript language faster than C++ ???
Nothing to see here folks. They didn't test shit. If the two input programs perform the same tasks and produce the same output / results, but are simply described in two different languages, then what's to keep any sufficiently advanced optimizing compiler from generating the EXACT SAME machine code? NOTHING. Languages are how you describe to the compiler what you want to do, it's the compiler / interpreter that is responsible for "CPU" performance.
Look at performance critical applications: High profile web sites, User Customizable Modules, Super Computer projects, database integration, etc. Any time performance is high on the list people pick Java, Lua, Lisp, Pascal, PHP, Perl, Python and sometimes JavaScript. Platform independent (byte-code) deployment is the only feasible choice. The most expensive part of a project is its development time. Using a language that reduces this is essential.
One thing that VM byte-code interpreters and just in time (JIT) compilers give is the ability to tune the algorithm based on real execution profiling for multiple hardware platforms, without additional software rewrites. If you compile to machine code, at some point you will run into language implementation decisions that are out of the naive coder's control, and you will hit a wall because they could never imagine all future platforms.
The computing field changes at an accelerating rate. Now with ARM, x64, and even newer x86 chips, or even newer compilers on the same hardware, newly compiled code runs faster than the old binary you built 5 years ago. It is folly to produce software that is not portable to future hardware. This means more work on the part of the platform implementers, but the gain is significant because every program benefits when you upgrade the platform without having to spend development money on a per project basis. Centralized improvements give better distributed results.
If other things like program execution speed are important (they rarely are, most frequent bottlenecks are IO latency -- even in video decoding) then use a native language interface to a compiled language like C (+ Assembly), which nearly all VM languages have support for. This allows you to take advantage of the ease of modification and make the best of programmer time, without sacrificing the ability to fine tune a small processor intensive section (for each targeted environment). Compiled languages may be faster once, but one size does not fit all -- and one set of machine code instructions is rarely fastest for all platforms, even when it's the same platform (x86 code compiled for a generic target is slower than AMD or Intel specific targeted code).
RTTI and exceptions
I can see how these are useful on PC- or smartphone-class hardware, but the support library adds substantial space overhead on embedded systems with 288 KiB of RAM and a 16.8 MHz CPU.
in fact, C++ adds compile time code reduction through the template facility
Templates are easy to inadvertently abuse once templated methods get instantiated once for each of several types. For example, sorting an array of each of several types may result in the sort routine getting copied for each type, compared to qsort() which uses the size of an element. What are the best practices to keep template bloat from happening?
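One commonly suggested pattern is a thin template wrapper over a single non-template worker, so that only a few lines are instantiated per type. The sketch below is an assumption about how that could look for the sorting example; sort_raw, compare_less and sort_array are hypothetical names, and it only suits trivially copyable element types because qsort moves raw bytes.

```cpp
#include <cstddef>
#include <cstdlib>

// Non-template worker: compiled once, shared by every element type.
void sort_raw(void* base, std::size_t count, std::size_t size,
              int (*cmp)(const void*, const void*)) {
    std::qsort(base, count, size, cmp);
}

// Per-type comparison: the only substantial code instantiated per T.
template <typename T>
int compare_less(const void* a, const void* b) {
    const T& x = *static_cast<const T*>(a);
    const T& y = *static_cast<const T*>(b);
    return (y < x) - (x < y);   // -1 if x < y, 1 if y < x, 0 otherwise
}

// Thin typed wrapper: keeps type safety at the call site.
template <typename T>
void sort_array(T* data, std::size_t count) {
    sort_raw(data, count, sizeof(T), &compare_less<T>);
}
```

The trade is code size for speed: the comparison now goes through a function pointer instead of being inlined into the sort loop, which is exactly what std::sort avoids by being fully templated.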
I guess a sufficiently determined programmer can hand compile Boost:Spirit to C code
And in fact, this is how early C++ compilers worked, by outputting C code.
C++ is very sensitive to target architecture and compiler flags. That, and if they really did run everything on a Pentium 4, your tests will not reproduce theirs. Pentium 4 architecture was radically different from later Intel chips, AFAIK mostly due to a very deep pipeline.
Sure - but there are a few things to note.
1. The flags used are in their own Makefiles. Also, these days, it really is not very sensitive. Most decent compilers have very little variation in performance when twiddling with detailed optimization flags.
2. The later Intel chips are enormously more important to anyone alive, and are more representative of real figures that will be observed, anyway.
3. It still seems extremely notable that the Java versions - both simple and optimized - outperformed the C++ version.
The last point is critical. It turns the whole article on its head. Instead of supporting all of the "of course, this is obvious" comments, it makes this a notable result. Even with very poor programming, the Java version outperforms the tuned C++ version on modern hardware.
(And, definitely, the Java version was not written by a Java expert. It spews heap like crazy, uses Integer wrapped objects, uses generics where specialized primitive collections really need to be used, uses a custom set implementation that linearly scans the list on every insert, uses LinkedList where an ArrayDeque or similar should be used, etc.)
Any test like this wouldn't even be noted anywhere if it wasn't from Google, of course. It would be nice to have available the context which makes the results so radically different from anything realistic. (The only information we get is the "Pentium IV" bit, and a Java flag that does not actually exist.) I suspect that in addition to being run on antique hardware, it was also run with a JVM from 8 years ago.
It's surprising to me because it's so insular and doesn't really "prove" anything. It doesn't say "Hey, here's why we use Java and reject C#" or anything else that's in any way useful, that I can see.
Again, I just don't see the point of this. I suppose it might have some meaning inside Google, but then why report on it like a story to anyone outside Google?
- Spryguy
There are three kinds of people in this world: those that can count and those that can't