Is the Sparc T4 Too Little Too Late?
packetrat writes "Ars Technica reports on Monday's launch of the Sparc T4, and how it finally (nearly 20 years after everyone else) brings out-of-order execution to Sun Sparc ... er, Oracle Sparc. But the benchmarks that Oracle has thrown up (surprise) are a smokescreen for the fact that the processor is still woefully behind state of the art, and it serves mostly as a placeholder to keep the remaining Sparc user base from defecting to Intel — even as Oracle is selling systems based on Intel and Oracle Linux. With the right benchmarks, my minivan outperforms a Maserati. The T4 is a minivan."
Sun had out-of-order SPARCs for years, contrary to the article's claims. Sun had a two-pronged strategy: one prong aimed at single-thread performance (the UltraSPARC series), the other at multithreaded performance (the T series). The UltraSPARCs were never really that good, so they were eventually dropped in favour of the Fujitsu SPARC64 series, and the replacement (code-named "Rock") was dropped by Oracle because progress seemed stalled forever. But they did indeed have out-of-order execution and register renaming, and Rock had a promising "pre-execution" thread that was supposed to alert the cache controllers ahead of time to prefetch data that can't be statically predicted, dropping cache misses to near zero.
The purpose of the multithreaded processor was mainly to support I/O-bound tasks, and lots of them. Web servers are like this, though more so in the past, when web content was more static. In those workloads, a T series SPARC system noticeably outperformed similarly priced competition (with similar reliability; you could go a lot cheaper if you didn't care about component quality).
The single-thread improvements in the T series are being added because even I/O-bound systems often have compute-bound tasks. In particular, the T4 lets you assign one high-priority thread that gets to hog CPU resources, in addition to out-of-order execution and other techniques that all threads benefit from, so I/O-bound threads don't get hung up waiting for a single CPU-bound task to finish.
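To put a rough number on the I/O-bound argument above, here is a toy latency-hiding model in Python. It is only a sketch under stated assumptions: each thread alternates a fixed slice of compute with a fixed wait (I/O or a cache miss), threads on a core overlap perfectly, and the function name `core_utilization` and the 1:7 compute-to-wait ratio are invented for illustration, not taken from any Sun or Oracle figures.

```python
def core_utilization(threads: int, compute: float, wait: float) -> float:
    """Fraction of time one core does useful work in a toy model.

    Assumption: every thread repeats `compute` time units of work followed
    by `wait` time units stalled, and stalls of different threads overlap
    perfectly. Real hardware is messier than this.
    """
    if threads < 1 or compute <= 0 or wait < 0:
        raise ValueError("need threads >= 1, compute > 0, wait >= 0")
    # Each thread keeps the core busy for compute / (compute + wait) of the
    # time; the core cannot be more than 100% busy.
    return min(1.0, threads * compute / (compute + wait))


if __name__ == "__main__":
    # Hypothetical workload: 1 unit of compute per 7 units of waiting.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads -> {core_utilization(n, 1.0, 7.0):.3f}")
```

With those made-up numbers a single thread keeps the core busy 12.5% of the time and eight threads saturate it; past that point extra threads add nothing, which is where a faster single thread starts to matter.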
I agree about the I/O. I worked at a large research centre with a 3000-tape LTO-4 library and a 200TB disk SAN (about 20 RAID arrays) attached to one 2-socket T2 machine. The machine didn't even budge when recovering from a couple of tapes, backing up to another 3, and pumping out 10Gbps to userland. It just gobbled up NFS traffic like crazy because it had 128 concurrent threads of capacity. Even Intel's high-end chip only has 20, and Intel gets all excited about it, but the Sparc has had 64 for 4+ years. Maybe it isn't so great with database load, I'm not sure, but it kicked butt as a fileserver.