Bulldozer Server Benchmarks Not Promising
New submitter RobinEggs writes "Some reviews of Bulldozer's server performance have arrived. Ars Technica has the breakdown, and the results are pretty ugly. Apparently Bulldozer fares just as poorly with servers as with desktops. From the article: 'One reason for the underwhelming performance on the desktop is that the Bulldozer architecture emphasizes multithreaded performance over single-threaded performance. For desktop applications, where single-threaded performance is still king, this is a problem. Server workloads, in contrast, typically have to handle multiple users, network connections, and virtual machines concurrently. This makes them a much better fit for processors that support lots of concurrent threads. ... It looks as though the decisions that hurt Bulldozer on the desktop continue to hurt it in the server room. Although the server benchmarks don't show the same regressions as were found on the desktop, they do little to justify the design of the new architecture.' It's probably much too early to start editorializing about the end of AMD, or even to say with certainty that Bulldozer has failed, but my untrained eye can't yet see any possible silver lining in these new processors."
Bulldozers do not make good servers. Use a computer. Problem solved.
How many more years will slashdot have an off-by-one error on your Score in your profile?
And yet three supercomputers built on those Opterons were ordered in the last four weeks? And in a month, one of them, which is being revamped from its position as the #3 supercomputer in the world, will be the #1 supercomputer in the world when complete? Were the people at Lockheed Martin also morons for choosing an Opteron-based supercomputer?
Why was an article that was apparently written to bash AMD included on Slashdot despite its apparent bias?
Read radical news here
The standard of writing at "Ars Technica" has declined far more than AMD's performance relative to Intel.
Recall the Itanium from Intel and HP. It started out with great hype more than ten years ago. When the first benchmarks came, no one wanted to believe them. Still, that particular architecture is about to die.
Unfortunately, Bulldozer may end up with a similar fate. The big difference is that Intel had its regular desktop cpu line-up to finance the Itanium disaster. If nothing can be much improved on the AMD cpu side, can the shrinking graphics card business save AMD?
I hope so.
I always liked the AMD CPUs, mostly for almost equal computing power for less money, but at the moment this no longer seems to be true when I look at the benchmarks (doesn't matter if desktop or server).
I really don't get the conclusion.
Bulldozer is faster than the Xeon chip on all CPU benchmarks that can generate enough threads to fill all cores.
Each Bulldozer core is as fast as a core on an Opteron 6100.
It looks exactly like the cpu I want in my web/db server, and my supercomputer.
We need healthy competition to Intel, to keep pushing tech forward and prices down. Sadly AMD simply has not performed over the last year or two, with no real answers to Intel's I series.
I thought we all switched to full-fat floating-point operations over 15 years ago when the Pentium hit the mainstream and everyone finally had an on-die FPU in their PC
It's application dependent. I doubt much FP stuff gets done in cryptography, routing, and many simulations.
When someone says that a CPU was designed around multiple threads, I think virtualization. Yeah, you can argue that servers are multithreaded in that they have to handle multiple users connecting, but that's bull. I can write a badly threaded application that doesn't effectively use the multiple cores...
So how do these cpus perform with something like ESX running on them?
Scott
That's perfect for running BOINC though, which is very good at using multiple cores at their full capacity. Useless for the business, but great for contributing to science projects :-)
Bulldozer chips are in short supply due to sales. Because AMD is not able to immediately meet Opteron demand, it is keeping the 8150 supply low, binning the chips as Opterons instead, and therefore leaving the desktop market undersupplied. Read the informative thread below.
http://www.overclock.net/t/1171264/compared-3-different-bulldozer-fx-8120s-want-to-know-the-difference/10
Bulldozer 8150s have been in short supply on Newegg and Amazon. Sometimes they are out of stock, and you can't even put them on a watchlist.
Way too many sales for a 'failed' processor?
One element has me curious about how these benchmarks were prepared: Is the benchmark software compiled on the target platform/cpu combination with all available optimisations of that platform?
Many of these benchmarks ship a binary or library (or a set thereof) built for a single target platform: the one the benchmark's original developers were working on. It is usually pre-compiled, usually for Intel, on an Intel system, by an Intel compiler, with Intel optimisations, or at least two of the four. The same binary is then run on whatever systems have a compatible architecture. This has a high potential to produce skewed results on non-Intel platforms, as not all manufacturers use the same optimisations.
While this specific processor may not be as great as it should have been, I feel that benchmarks in themselves are usually flawed and must be taken with a grain of salt until real-world software that isn't in a lab-style environment is attempted on it.
Actually, it was 20 years ago with the 486 that we all got FPUs on die.
Maybe it's early, but I had a hard time seeing the comparisons they were trying to make. Also, when Ars compared pricing (system X is 400k, system Y is 600k), what the hell was that? Stats like that are usually accompanied by a link to, or a citation for, said system. It said benchmarks were "here", but I didn't see any. I'd like to see benchmark details such as the OS. It may be too early to judge, as this is the first-generation chip. And will Bulldozer perform better under the next iteration of Windows (if that was the control)?
Windows does not (yet) know how to properly schedule threads on that hardware. This has caused issues with all the benchmarks, not unlike what happened when Intel Hyperthreading was first released. Once the proper support is added to the OS kernels, the results should be much better.
TPC-C was performed on Windows 2008; see http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=111111501
AnandTech tested on Windows 7.
It is known that Windows 7 and 2008 are not optimized for Bulldozer, especially at the task scheduling level.
So we do not know the real power of the Bulldozer architecture in the Windows world yet
See http://hexus.net/tech/news/cpu/32394-bulldozer-benchmarks-correct-definitive which unfortunately only has very few benchmarks.
You can also look at the phoronix site, where Bulldozer is tested on Linux.
Or maybe you should have finished that paragraph that explains:
Server workloads, in contrast, typically have to handle multiple users, network connections, and virtual machines concurrently. This makes them a much better fit for processors that support lots of concurrent threads. Some commentators have even suggested that Bulldozer was, first and foremost, a server processor; relatively weak desktop performance was to be expected, but it would all come good in the server room.
You're bashing them for not understanding exactly what the paragraph is meant to show that they do understand. Epic fail.
Live today, because you never know what tomorrow brings
I had a 486sx. It may technically have had an FPU on the die, but it was defective and disabled.
AnandTech provides much more knowledgeable and professional reviews. They had this to say about AMD's new chip: "Unfortunately, with the current power management in ESXi, we are not satisfied with the Performance/watt ratio of the Opteron 6276. The Xeon needs up to 25% less energy and performs slightly better. So if performance/watt is your first priority, we think the current Xeons are your best option. The Opteron 6276 offers a better performance per dollar ratio. It delivers the performance of $1000 Xeon (X5650) at $800. Add to this that the G34 based servers are typically less expensive than their Intel LGA 1366 counterparts and the price bonus for the new Opteron grows. If performance/dollar is your first priority, we think the Opteron 6276 is an attractive alternative." http://www.anandtech.com/show/5058/amds-opteron-interlagos-6200/14
You are thinking the same thing that I had in the back of my mind. The changes in hardware could very well be just enough to exceed what the existing kernels are designed to handle properly. Hyperthreading is a case in point. Once Windows/Linux/BSD/Oracle and such do in fact make the changes needed to take full advantage of the hardware, the tests will be more valid. Now, if all or some of them don't see the need to make any changes, then we can use the word "flop" to describe the current CPU, since if the hardware design requires changes in software to exploit the new features and the software does not change: flop, flop and flop.
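To put rough numbers on that quote, here is a small sketch of the arithmetic. It assumes the two chips deliver equal performance (the review says the Xeon is "slightly better", so this flatters the Opteron a little) and takes the "up to 25% less energy" figure at its maximum. The helper name `advantage` is made up for illustration.

```python
def advantage(cost_a, cost_b):
    """How many times more performance per unit of cost A delivers than B,
    assuming both deliver the same performance (an assumption, see above)."""
    return cost_b / cost_a


# Opteron 6276 at $800 vs. Xeon X5650 at $1000, equal performance assumed.
perf_per_dollar = advantage(800, 1000)

# Xeon using 25% less energy than the Opteron (0.75 vs. 1.0), best case.
perf_per_watt = advantage(0.75, 1.0)

print(f"Opteron performance per dollar: {perf_per_dollar:.2f}x the Xeon")
print(f"Xeon performance per watt:      {perf_per_watt:.2f}x the Opteron")
```

So under those assumptions the Opteron is about 1.25x better per dollar of purchase price, while the Xeon is about 1.33x better per watt. Which one wins depends on how long the box runs and what power costs you.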
It's application dependent. I doubt much FP stuff gets done in cryptography, routing, and many simulations.
So it's like the Cyrix 6x86?
Give me Classic Slashdot or give me death!
Though I'm suspicious that Bulldozer is going down remarkably like NetBurst (NetBurst made design compromises for marketable massive clock gains, Bulldozer similarly makes compromises to boost the now-marketable core count) and time may prove that wrong, but this article was crap.
It looked like they cherry picked some benchmarks from the world at large with no control. As pointed out in the article, the tpmC benchmark had massive storage differences and the cost delta means there were probably node count differences. There are so many things in play that it is impossible to derive any sort of statement specifically about the processors. The article, however uses that as a point to show AMD is more expensive to make AMD look bad but in the same breath says better SSDs probably drove the benefit to steal AMD's thunder. He can't have it both ways. I'm inclined to believe the storage architecture was the key in terms of cost and performance given the nature of the test.
Later, the article says AMD should have just done 16-core Magny-Cours. Clearly AMD should hire him as he is a genius who *must* have considered all the complexities and figured out a way to achieve that core density when no one else in the industry has. No one pretends for a second that a bulldozer module matches 2 'real' cores, but they can't just wave their wand and make a 16-core package of the old architecture. Bulldozer is all about trying to ascertain the 'important' bits of a core and share other bits in the hopes the added resource gives most of the benefit of an additional core without the downsides that make it impossible to do that many cores on a socket.
XML is like violence. If it doesn't solve the problem, use more.
Bulldozer can't consistently beat Phenom X6 in desktop workloads.
It can't consistently beat Magny-Cours in server workloads.
It doesn't seem to be any more power-efficient than AMD's last generation, despite being built on a smaller process node (32nm vs 45nm).
At what point does AMD simply admit Bulldozer is a failure, pull the plug, and write off the sunk costs? Putting good money after bad is a classic business mistake that has killed many companies.
AMD should continue improving their existing cores on the 32nm process (they already have some of the work done with Llano) and forget about their "revolutionary new" architecture which is basically this decade's Prescott.
Or, heck, see if it's possible to scale up the Bobcat cores for mainstream desktop use. Don't forget, Intel's very successful Core 2 Duo came from a previous design (Pentium M) that had been reserved to laptops. AMD will probably have more luck increasing performance (both raw clock and IPC) on Bobcat than trying to tame the heat, insane transistor count, and long pipeline of Bulldozer.
Tech Report demonstrated this to be the case by setting the thread affinity on their tests, so that threads were locked to specific cores, using only one core per module. They saw as much as a 30% improvement in the single-threaded or lightly threaded benchmarks. Other sources, including AMD itself, have demonstrated as much as a 10% improvement in performance from using a better thread scheduler. AMD has whitepapers discussing this issue.
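For anyone who wants to try the affinity trick themselves, here is a minimal sketch. It assumes a Linux system where the two cores of a module get consecutive logical CPU numbers (0 and 1 share a module, 2 and 3 share the next, and so on); that numbering is my assumption, not something taken from the Tech Report test, so check your own topology first. `one_core_per_module` and `pin_current_process` are hypothetical helper names; `os.sched_getaffinity` and `os.sched_setaffinity` are the real standard-library calls and exist only on platforms that support them.

```python
import os


def one_core_per_module(cpu_ids, cores_per_module=2):
    """Pick the first logical CPU of each module.

    Assumes sibling cores of a module are numbered consecutively,
    e.g. CPUs 0 and 1 form module 0, CPUs 2 and 3 form module 1.
    """
    return {cpu for cpu in cpu_ids if cpu % cores_per_module == 0}


def pin_current_process():
    """Restrict this process to one core per module (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # no affinity API in the os module on this platform
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
    chosen = one_core_per_module(allowed)
    if chosen:                          # never try to set an empty mask
        os.sched_setaffinity(0, chosen)
    return chosen


if __name__ == "__main__":
    print("pinned to CPUs:", pin_current_process())
```

Run a lightly threaded benchmark with and without the pinning and compare. A scheduler that knows about modules would make the same choice on its own.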
As for changing the OS kernels... Windows 8 already has the changes. Windows 7 and Server 2008 may get them in a future update (Service Pack?). Linux kernel support is ready and is available in a kernel patch. Compiler support is now included in VS 2010. So, not necessarily a flop; but, might be a short while before the full capability of the architecture is realized.
After clicking on links I finally found some benchmarks. As usual, they were bullshit. Can't these people think of a test that can put them through real hoops? I used to throw 60G pcap files (1 minute of traffic) at machines to determine if the hardware could run our IPS software. The machine with the fewest millions of threads not yet processed won. The application opened a thread for every packet that traversed a 1G nic. The content of each packet was then sent (branched) through the appropriate inspections simultaneously; one thread for each protocol check, one thread for each header check, one thread for each regular expression on the body, making a potential (65,535^2^10k + 4^252^200) new threads per second. No branch prediction can be used in this kind of test because the traffic is never predictable so every path for every packet must be traversed completely. Note: the 10k and 200 are the number of rules (regular expressions) applied to the packets.
Having to work for a living is the root of all evil.
I can't see a way of phrasing it differently, so it seems a pointless exercise. Especially considering my writing tends to be verbose and hard to read at best and you had difficulty with what I hope is writing that went through an editor. Of course you stopped reading before the sentence gave the explanation so maybe if I just repeat that sentence and the few following it verbatim:
If you really want my unskilled wording:
Previous benchmarks on the desktop variants of the architecture were unimpressive. The architecture emphasizes server features over desktop features and hence the server variant should be much better. Now that server benchmarks are in, however, the results are terrible.
All of the integer ops are executed in those units, so yes, they are important. Every single loop and jump and code branch executed by the processor is dependent on some integer arithmetic being performed at as low a latency as possible. Even on a completely FPU-less system, you'd be surprised exactly how few floating-point ops are actually necessary. Without an FPU you can still do: compiling, digital simulations, run kernels and do virtualization, web/file/database etc. serving, networking, cryptography.
Look at the Sun T1/T2 CPUs, they are designed to have low-FPU power because the market they target doesn't care : "One of the limitations of the T1 design is that a single floating point unit (FPU) is shared between all 8 cores, making the T1 unsuitable for applications performing a lot of floating point mathematics. However, since the processor's intended markets do not typically make much use of floating-point operations, Sun does not expect this to be a problem. Sun provides a tool for analysing an application's level of parallelism and use of floating point instructions to determine if it is suitable for use on a T1 or T2 platform."
The US Office of Management and Budget (OMB) has a virtual to physical server target of 15:1.
Every large business, and most medium sized ones, are going to try to (at least) match that target.
(Although memory seems to be a bigger constraint.)
They're still not likely to use all the cores unless they have some peculiar workload. They'll run out of RAM and IO (on a single server) first.
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
Clearly. If you think you can stop reading mid-sentence in a pretty standard sentence structure while concluding the opposite of what was being stated, then yes, your understanding of English is severely lacking.
Yeah, but FPU performance sucked rocks until the Pentium came along and pipelined it.
Wrong. The 386-SX had a 16-bit memory bus (vs. 32-bit on the DX). It had no FPU; that went in a separate socket.
The cyrix was actually considerably better than intel at integer. This.... not so much.
Consider me sold. It's like a cluster array in the bedroom, without having to worry about the networking headache.
Plus you won't need any heating in the winter.
Start moving some of that crap to the GPU side of Bulldozer. There are a few things that the GPU could be dedicated to with OpenCL and such.
In a server, it's essentially wasted silicon unless fully utilized.
Which for anyone not playing Quake was an absolutely fantastic CPU. I had one of the PR166 ones (aka 133MHz; IIRC it was actually an IBM), and it absolutely trounced my roommate's P90 (which actually cost a little more) in everything. It was actually pretty amazing. Then there was Quake, which was hand-optimized for the Pentium U-V pipeline architecture and made heavy use of FP. The result was that in Quake my PR166 @ 133MHz performed on par with the 90MHz Pentium, which cost about the same amount of money. I didn't understand the smear campaign back then, and I still don't. Everyone insisted on comparing the PR166 against the P166 which cost significantly more, and complaining when it didn't best it in one or two benchmarks. Dollar for dollar the PR166 kicked the crap out of anything Intel offered. In its worst case it was roughly equal to the equivalently priced Pentium, but in its best case it was equal to a processor that cost ~3x more.
The general consensus was that the Cyrix had crappy floating point, but its cycle latencies were actually equal to or better than the Pentium's. What killed it was its inability to execute Pentium-scheduled FP code efficiently. I rolled a couple of microbenchmarks of my own just doing streaming multiplies and divides, and the Cyrix could actually beat the Pentium at the same clock rate when the code wasn't explicitly optimized for the Pentium (i.e. 486-scheduled FP was faster on the Cyrix than on the Pentium).
Then there was the mystery of why Microsoft disabled the L2 CPU cache in NT when using a cyrix... Which is another whole discussion.
In all, the only real gripe I had about the CPU was its absolute need for good cooling. Without a good fan/heat-sink properly applied it would crash.
Com'on Ars... you can do better than that. Give us some chart pr0n to gloss over.
There are no poor processors, only poor software...
There you are, staring at me again.
I just read the article about "AMDs Wegschaufler" in c't (http://www.heise.de/ct/inhalt/2011/25/158/), and instead of just talking about bla-bla benchmarks they actually tested the CPU for themselves. What did Ars[e] Technica do to get such bad results? Their crap about a "bulldozer server benchmark catastrophe" doesn't even relate to real life anymore, because the numbers I saw were quite good. Yes, a Sandy Bridge server CPU from Intel could change that, but until it is here, the Bulldozer server CPU has the performance crown.
If they can fix whatever is killing the performance of the on-chip caches (previous reviews indicated that the L3 appears to be a bottleneck) and/or figure out how to get the clock speeds up (it supposedly has deeper pipelines than K10, so it theoretically should clock higher), Bulldozer could still be a competitive part. This is, of course, dependent on AMD surviving for long enough to do that. I wonder how long they can get by on their GPU revenues?
The cyrix was actually considerably better than intel at integer
Clock for clock, yes, but Cyrix played stupid marketing games and sold, for example, their 133MHz part as a 6x86 PR166, and it was generally slightly slower than a P166 and a lot slower in floating point. I think that's what killed the brand: if they'd called it a 133MHz part, everyone would have thought it was faster than a Pentium (and it was cheaper than a 133MHz Pentium).
I am TheRaven on Soylent News
I have read that bulldozer is having compiler issues in the desktop space. Apparently, the current gcc, Microsoft, Intel, etc. compilers are having problems with acceleration, core allocation, etc. Fixes are on the way and some compilers, such as Open64 5.0, will apparently drastically improve bulldozer performance. Could the same problem be occurring here?
Everyone insisted on comparing the PR166 against the P166 which cost significantly more
Well, that's probably because Cyrix invited the comparison themselves with their choice of name/code and the implication it carried.
Maybe it *was* a good chip for the money (I don't know, never had one), but you can't say Cyrix were blameless there!
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Prior to this, absolutely no mention of the desktop is made. It would be an entirely different thing if the article was well written, and that paragraph started with something along the lines of : While we realize the performance of this processor on the desktop doesn't mean dick, we looked at it anyway, and it was no surprise when it didn't perform well. One reason ....., but it didn't. It's no surprise you didn't pick up on this, however. Most people today cannot write well, and with a Slash User ID# as huge as yours, you know doubt got most of your "eleet" English skills reading the walls on Facebook.
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
OK, before anyone else says it, I meant "no doubt" not "know doubt". Mea culpa ;-)
They didn't look at desktop performance so your desired language would be ridiculous.
And it isn't the start of the article, and it isn't the first mention of the desktop. Which you might know if you didn't stop reading as soon as you misinterpreted something due to your crappy understanding of English.
Heck it isn't even the first mention of desktop in the summary, even the snippet you did read. The sentence directly before the one you hate so much provided the server/desktop context.
You stop reading because a quote in a slashdot summary, which would often be taken from the middle of an article and be notoriously out of context, isn't an introduction. Well done.
And even though I said my writing is pretty bad and it'd be better to read the hopefully edited article text, I have to go back to writing class because you think the first paragraph of an article doesn't exist.
Whatever. Congrats, you stopped reading in the middle of a sentence that comes from the middle of an article and came to the opposite interpretation from the one that reading a few words further would provide. And given that you then had to post about it, it wasn't a time constraint that prevented you from reading a few more letters. Of course that's everyone else's fault and nothing to do with your reading comprehension.
And oh no, a large Slashdot user ID, I must be a baby moron. It couldn't be that I stopped posting with my old one due to it being obviously my name and not wanting it to be so clear that I occasionally post from work.
Yes, It was a reference to previous reports not what was actually being looked at in that article.
Zlib, libpng, jpeg libraries are used in almost everything, yet these libraries are stuck as single-core libraries because they were all initially written before 2003.
Huffman compression doesn't work very well in parallel unless the underlying data structure was specifically designed for it. With JPEG, you can use SIMD instructions to boost iDCT performance, but you can't do much about the Huffman section. For that matter, LZ77 (used in zip and png) doesn't really parallelize well either.
That's the problem: you can't just replace legacy code. In many cases you have to replace the legacy algorithms as well, and that just isn't going to happen. Bzip2 parallelizes well because it first divides the data into specifically sized blocks (900k), and you can work on different blocks at the same time. The original LZMA (7zip) algorithm didn't support multithreading very well, but LZMA2 does. The thing is that for the foreseeable future we are going to continue to use deflate (ZIP, gz, header compression, etc.), JPEG, PNG, and other legacy formats, and these will *never* be able to leverage multi-core systems to their full efficiency.
An article about benchmarks without benchmark charts; that's the first time I've seen that. I'm really impressed.
I have seen a number of applications do GPGPU because it has a *lot* of theoretical potential. I saw quite a few places spend a lot of money assuming they'd sort it out. Most (not all) found that the advertised benefit was not feasible to use with their workload. In some cases it was because the development cost was high, but in many cases they found they really *couldn't* execute in that context no matter the cost.
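The block-splitting idea is easy to sketch in Python. This is only an illustration of the approach described above, not how any particular tool implements it: it cuts the input into 900,000-byte chunks, compresses each chunk as its own bzip2 stream on a worker thread, and concatenates the results. `parallel_bz2_compress` is a hypothetical helper name and the worker count is arbitrary; `bz2.compress`, `bz2.decompress` and `ThreadPoolExecutor` are standard library.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 900 * 1000  # bzip2's largest block size (level 9), in bytes


def parallel_bz2_compress(data, workers=4, block_size=BLOCK_SIZE):
    """Compress independent blocks concurrently and concatenate the streams."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, blocks))
    return b"".join(streams)


if __name__ == "__main__":
    payload = b"legacy formats everywhere " * 200_000
    packed = parallel_bz2_compress(payload)
    # bz2.decompress accepts several concatenated streams.
    assert bz2.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

This works because no block depends on the one before it. Deflate has no such boundary: each match can point back into the previous 32 KB of output, which is exactly the dependency that makes it awkward to split.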
From the other end, even Intel is making great strides in CPU capability. When people painfully started doing transcode on GPGPU, they made some pretty dramatic results. Then Sandy Bridge brought a transcoding engine along that blew all the GPGPU transcode work out of the water. Despite having indisputably weak GPUs, they are able to deliver potent responses to GPGPU usage of the GPU chips.
Either way, GP-GPU has no bearing on Bulldozer, the architecture doesn't seem particularly more amenable to GPGPU. With Bulldozer, AMD is gambling that somehow (between Piledriver and OS advances) that the limitations hurting their performance today will be alleviated. Intel had a similar sort of behavior around Netburst (except with an assumption of IA64 taking over as a long term strategy), and it didn't pan out for them. It may or may not pan out for AMD.
In addition, the 386-sx had only a 24-bit address bus, just like the IBM AT, limiting it to 16 MB of addressable memory (fetched 16-bits at a time).
The 386DX chip needed the companion 387 co-processor for floating point.
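The 16 MB figure falls straight out of the width of the address bus, as this small sketch shows. `addressable_bytes` is just an illustrative helper name.

```python
def addressable_bytes(address_lines):
    """Bytes reachable with the given number of address lines."""
    return 2 ** address_lines


MIB = 1024 * 1024

# 24 address lines: 386SX, and the 80286 in the IBM AT.
print(addressable_bytes(24) // MIB, "MiB")
# 32 address lines: 386DX.
print(addressable_bytes(32) // MIB, "MiB")
```

That is 16 MiB for the 24-bit parts and 4096 MiB for the 32-bit DX.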
Heh. I remember pricing up 16mb of RAM of my early 486dx with 30 pin SIMMs. It was cheaper to buy a pentium :D Can only guess at how expensive a 16 meg 386-SX would have been.
No, what killed Cyrix is system builders passing those chips off as "real" Intels, and charging full price for a cheap knockoff. Then, when someone knowledgeable would come by to fix or upgrade that machine, we'd see the Cyrix logo and blame every single problem on that dumb sticker.
Even today, few people know what's inside their own PC. Given how confusing it can be for people who actually sell machines for a living, such as myself, there really is little hope for the average user to make sense of all the different product lines and competing brands. I would say that 9 out of 10 clients of mine are completely at my mercy. If I were crooked, I could stick a $90 AMD chip in there, charge them for a $350 Intel i7, and they would never know. And I'm not just talking about residential clients, no... corporate sales too! I'm usually called in AFTER they get fleeced by a big name dealer, and AFTER I point out that they paid ten times too much for a basic server with previous-generation components. I had one office get quoted $1600 per workstation, for basic office machines with a 19" LCD. I got them something faster and quieter for half the price, with a larger LCD and enough profit to cover my time building, ghosting and unpacking them on-site. Don't even get me started on the servers, that's the low-hanging fruit...
-Billco, Fnarg.com