Is the Sparc T4 Too Little Too Late?
packetrat writes "Ars Technica reports on Monday's launch of the Sparc T4, and how it finally (nearly 20 years after everyone else) brings out-of-order execution to Sun Sparc ... er, Oracle Sparc. But the benchmarks that Oracle has thrown up (surprise) are a smokescreen for the fact that the processor is still woefully behind state of the art, and it serves mostly as a placeholder to keep the remaining Sparc user base from defecting to Intel — even as Oracle is selling systems based on Intel and Oracle Linux. With the right benchmarks, my minivan outperforms a Maserati. The T4 is a minivan."
Isn't this a repeat from yesterday?
Or are we going to see this story once per core?
With the right benchmarks, my minivan outperforms a Maserati. The T4 is a minivan.
When you're moving lots and lots of boxes, then yes, a minivan does outperform a Maserati. It's a pretty good analogy IMHO.
As Seymour Cray noted: Anyone can build a fast CPU. The trick is to build a fast system.
Personally I've always found SPARC boxes to be good with I/O.
Sun had out-of-order SPARCs for years, contrary to the article's claims. Sun had a two-pronged strategy: one line aimed at single-thread performance (the UltraSPARC series), the other at multithreaded performance (the T series). The UltraSPARCs were never really that good, so they were eventually dropped in favour of the Fujitsu SPARC64 series, and their replacement (code-named "Rock") was dropped by Oracle because progress seemed stalled forever. But they did indeed have out-of-order execution and register renaming, and Rock had a promising "pre-execution" thread that was supposed to alert the cache controllers ahead of time to prefetch data that couldn't be statically predicted, bringing cache misses down to near zero.
The multithreaded processors were meant mainly for I/O-bound tasks, and lots of them. Web servers are like this, or at least were back when web content was more static. On those workloads, a T-series SPARC system noticeably outperformed similarly priced competition of similar reliability (you could go a lot cheaper if you didn't care about component quality).
The single-thread improvements are being added to the T series because even I/O-bound systems often have compute-bound tasks. In particular, besides out-of-order execution and other techniques that benefit all threads, the T4 lets you designate one high-priority thread that gets to hog CPU resources, so I/O-bound threads don't get hung up waiting for a single CPU-bound task to finish.