Linux 2.6 And Hyper-Threading
David Peters writes "2CPU.com has posted an article on Hyper-Threading performance in Linux. They use Gentoo 1.4 and kernel 2.6.2 and run through several server-oriented benchmarks like Apache, MySQL and even Java server performance with Blackdown 1.4. The hardware they use in the tests is border-line ridiculous (3.2GHz Xeons, 3.2GHz P4 and P4 Prescott) and the results are actually quite interesting. It's a good read as he even takes the time to detail his system configuration all the way down to the CFLAGS used while compiling the software."
Has anybody run into a problem with Hyper-Threading and per-CPU licensing?
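For what it's worth, the licensing question mostly comes down to whether software counts logical or physical processors. Here is a rough sketch of telling the two apart on Linux, assuming the kernel exposes `processor` and `physical id` fields in /proc/cpuinfo (not every kernel or architecture does). The function name `count_cpus` and the sample text are made up for illustration, not taken from the article.

```python
def count_cpus(cpuinfo_text):
    """Return (logical, physical) processor counts from /proc/cpuinfo-style text.

    Assumption: each logical CPU has a "processor" line, and HT siblings
    share the same "physical id" value.
    """
    logical = 0
    packages = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value)
    # Without "physical id" lines we cannot tell siblings apart,
    # so fall back to treating every logical CPU as its own package.
    physical = len(packages) if packages else logical
    return logical, physical


# Hypothetical excerpt for a single P4 with Hyper-Threading enabled.
SAMPLE = """\
processor\t: 0
physical id\t: 0
siblings\t: 2

processor\t: 1
physical id\t: 0
siblings\t: 2
"""

if __name__ == "__main__":
    # On a real box you would pass open("/proc/cpuinfo").read() instead.
    logical, physical = count_cpus(SAMPLE)
    print("logical:", logical, "physical:", physical)
```

A per-CPU license check that only looks at the logical count would see two processors here, which is exactly the situation the question is asking about.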
The hardware they use in the tests is border-line ridiculous
I'm typing this on a 3.0 GHz Pentium 4 that has hyperthreading. The entire system cost me $1200 to build just before Christmas - including 1GB of RAM, a Radeon 9800 Pro video card and a 120GB SATA hard drive. Dell and IBM sell 3GHz notebooks now for a similar price.
My point is that a 3.2GHz CPU is not ridiculous in an age where 2.66GHz processors are considered entry-level (FYI, Dell is currently selling a 2.66GHz desktop for $499).
What are you still running on? A 486?
OK, time to redo the benchmarks: kernel 2.6.3 is out.
[joking]
It'll be nice when we see some Opteron benchmarks vs. the new Xeons.
-
"But Calvin is no kind and loving god! He's one of the _old_ gods! He demands sacrifice!"
The first one performs semi-miracles on repetitive build times where you aren't doing "incremental" builds. The second lets you distribute your compile to multiple build servers on the network (beware - there be daemons here).
Build times went from hours to minutes - it was great
I have mod points and I am not afraid to use them
Those sure are some interesting numbers. On the order of a 49% increase or 35% decrease in performance depending on the application. I always figured those high-GHz CPUs would be completely IO-bound. I guess this sometimes allows threads to run with what they've got in the on-chip cache.
Makes you wonder if a kernel could detect whether Hyper-Threading was helping or not and selectively enable it.
I did some informal testing between VC++ native code and C# compiled to .Net bytecode. I had a little loop calculating primes. The native C++ kept everything in registers, while the CLR turned everything into memory accesses relative to BP. I figured that would devastate performance, but on the Pentium 4, it was only 5% slower! It seems to have an L1 cache that's as fast as the registers. That will certainly make it easier on the compiler writers.
Sort of off topic, did anyone else see that article in MSDN about using .Net for serious number crunching? The author seemed to write the whole article as if he thought it was a good idea. Not that there wouldn't be some advantages to doing that (such as the possibility of tuning for the processor at runtime), but the one graph he showed comparing with native code had .Net running 50% to 33% slower!
Well the hardware is provided by the manufacturers for review (it is a hardware site after all). SPEC doesn't just go around handing out copies of their (very expensive) benchmarking applications.
You are an idiot. To start with, a CPU with HT has two discrete visible register sets. If you are so smart, how would you fix this imaginary performance hit by "handling" registers better?
Second, the SMT scheduler in -mm kernels isn't a hack. It is a general and extensible topology description that the scheduler uses to achieve exactly the behaviour it needs.
For the record though, the important point was that the stock 2.6 kernels do not yet handle HT in an ideal manner. The article doesn't mention if the Gentoo kernel used for the benchmarks is HT patched or not.
And with special thanks to Zack Brown, those interested can read summaries of HT issues here:
http://www.kerneltraffic.org/kernel-traffic/topics/Hyperthreading.html
My entire lab at school is filled with dual 3.2GHz Xeons with Quadro FX 1000 cards. People have those types of machines... or 100 of them.