Hyper-Threading Speeds Linux
developerWorks writes "The Intel Xeon processor introduces a new technology called Hyper-Threading (HT) that makes a single processor behave like two logical processors. The technology allows the processor to execute multiple threads simultaneously, which can yield significant performance improvement. But, exactly how much improvement can you expect to see? This article gives the results the investigation into the effects of Hyper-Threading (HT) on the Linux SMP kernel. It compares the performance of a Linux SMP kernel that was aware of Hyper-Threading to one that was not." Ah, the joys of high performance.
We've used XEON's on our DB server for a few months now. The performance has been outstanding. You also see 4 processors when you run top.
At first we thought this was an error, and got in touch with Dell's tech support. But the geeks there said this is normal behavior.
Newsfollow.com
Does SMP support automatically allow benefits from Hyperthreading, or does that require special support all it's own?
"Everything you know is wrong. (And stupid.)"
Moderation Totals: Wrong=2, Stupid=3, Total=5.
Of course multi-threaded applications are going to improve. What's your point?
For those who didn't RTFA:
Simple syscall 1.10 1.10 0%
Simple read 1.49 1.49 0%
Simple write 1.40 1.40 0%
Simple stat 5.12 5.14 0%
Simple fstat 1.50 1.50 0%
Simple open/close 7.38 7.38 0%
Select on 10 fd's 5.41 5.41 0%
Select on 10 tcp fd's 5.69 5.70 0%
Signal handler installation 1.56 1.55 0%
Signal handler overhead 4.29 4.27 0%
Pipe latency 11.16 11.31 -1%
Process fork+exit 190.75 198.84 -4%
Process fork+execve 581.55 617.11 -6%
Process fork+/bin/sh -c 3051.28 3118.08 -2%
is it just me? or does the linux kernel not perform so much better in SMP HT?
"Conclusion
Intel Xeon Hyper-Threading is definitely having a positive impact on Linux kernel and multithreaded applications. The speed-up from Hyper-Threading could be as high as 30% in stock kernel 2.4.19, to 51% in kernel 2.5.32 due to drastic changes in the scheduler run queue's support and Hyper-Threading awareness."
My questions: What's the downside? Is AMD doing anything similar?
Fight with computer brings SWAT team
Tested HT running couple large jobs on a 2 CPU box with each process using over a GB of RAM. Performance went down.
Also HT can play havoc with a openMosix cluster since processes can start being migrated around to CPU's that do not really exist and appear to have no load, yet the physical CPU may be 100% loaded in reality.
It is not all peaches and cream.
Does anyone know if AMD will be doing something similar or if their current processors do something like this? I know that many High Performance Clusters use SMP machines and multi-threaded code and could take advantage of HT. Many clusters are made with AMD processors due to the fact that they are so much less expensive than Intel.
Mecworks BLOG
> Well, unless you have a computer that has multiple physical CPUs.
You have a point, but he does, as well. SMT ("hyper-threading") should work automatically for multiprocessor systems. So if you have a dual processor, SMT-capable board in a system that's unaware of the SMT functionality, you should still get a boost from SMT. Unless Hyper-Threading is a really, really bizarre implementation of SMT. Reviewers should really compare against an SMP system that is incapable of doing SMT, because it'll do it automatically (or it should), even if you don't tell it to. Alternatively, you could approximate the same results by forcing the system to only use a number of threads equivalent to the number of processors. Not all programs can do this, though (compiling is the only thing that immediately comes to mind).
Granted, I've been out of the loop a bit, so I might be making some really off the wall (and inaccurate) assumptions about Intel's SMT implementation.
nope, you make incorrect assumptions, the hyperthreaded portion of the cpu shows up to software as a seperate cpu. For this reason a win2k pro machine has to have hyperthreading disabled on a dual xeon machine or else it will just use the first physical cpu and its child hyperthread. This is why artificial smp limitations suck. Also win2k server will only allow 4 cpus in standard edition so it can only utilize two physical cpus and their hyperthreads. Windows Server 2003 ups the amount of cpus allowed for standard edition to 8 to account for this.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
What you said was false.
Take the example of database & OLTP applications. Database transactions are heavily dependent on repeated access to RAM. Virtually no database is small enough to fit into cache, and there is often little regularity in which data is accessed. Memory latency will REQUIRE a non-SMT processor to wait IDLY each time there is a memory latency, which takes >100 proc cycles on a modern CPU. This has NOTHING to do with he p4 architecture or long pipelines.
"But remember that HT requires OS support, may require application support..."
HT does not require OS support as long as the OS is capable of recognizing more than 1 CPU. Any threaded app can benefit from HT.
SMT (hyperthreading) will become increasingly important when processors are able to execute more than 2 threads simultaneously.
This development is inevitable. Previously, each new processor generation was faster than the prior one at a given clock rate, because each new processor core had more execution units, and was therefore able to perform more work in parallel. This trend abruptly ended recently, for one reason: there is no more instruction-level parallelism (ILP) to exploit. It is impossible for a processor to look at a thead of execution and find more than a few instructions to execute in parallel.
The only parallelism left to exploit is THREAD-LEVEL parallelism (TLP). Therefore the only way to continually increase performance is to increase the number of threads that a CPU can execute in parallel. This requires two modifications to CPU cores: first, increase the number of thread contexts per CPU, and second, increase the number of pipelines to which those threads can be dispatched.
With the P4, it would be pointless to have more than 2 thread contexts, because there aren't enough CPU resources lying idle to execute more than 2 threads. But future CPUs could make use of more than 2 thread contexts by having enough CPU resources to execute all of them. Future CPUs could have 20 execution units or more, which would be enough to execute several threads. Remember that the number of transistors per CPU continues to increase exponentially.
It's easy to forsee a time when processors have 20 execution units (10 integer, 10 fp) and 4 thread contexts, offering more than triple the performance of a non-SMT cpu. In the future, non-SMT CPUs will make as little sense as a non-superscalar CPU would today.