Hyper-Threading Speeds Linux
developerWorks writes "The Intel Xeon processor introduces a new technology called Hyper-Threading (HT) that makes a single processor behave like two logical processors. The technology allows the processor to execute multiple threads simultaneously, which can yield significant performance improvement. But, exactly how much improvement can you expect to see? This article gives the results of the investigation into the effects of Hyper-Threading (HT) on the Linux SMP kernel. It compares the performance of a Linux SMP kernel that was aware of Hyper-Threading to one that was not." Ah, the joys of high performance.
Xeon folks aren't having the only fun. The 3 GHz Pentium 4 is also hyperthreaded for that crunchy flavor and great taste.
We've used Xeons on our DB server for a few months now. The performance has been outstanding. You also see 4 processors when you run top.
At first we thought this was an error, and got in touch with Dell's tech support. But the geeks there said this is normal behavior.
Newsfollow.com
>was aware of Hyper-Threading to one that was not."
But if you aren't going to use hyper threading you would use a UP (non-SMP) kernel, which would gain you considerable performance. The benefits are not so clear cut as many of the benchmarks show limited benefit from hyperthreading and would perform faster on a uniprocessor kernel.
Something that makes your computer run faster also makes free operating systems faster too?!
I wonder what it does for commercial OSes.
Sorry for the sarcasm, but isn't that obvious? If you have a processor that can do more work than another processor at equivalent MHz, it, by most estimations, will speed something up.
Not true for everything, but pretty close.
That what was all this school was for... to teach us how to solve our own problems. -- janeowit
All operating on a single chip!
Does SMP support automatically allow benefits from Hyperthreading, or does that require special support all its own?
"Everything you know is wrong. (And stupid.)"
Moderation Totals: Wrong=2, Stupid=3, Total=5.
The results on Linux kernel 2.4.19 show Hyper-Threading technology could improve multithreaded applications by 30%. Current work on Linux kernel 2.5.32 may provide performance speed-up as much as 51%.
while it may not be very useful for a single-user box(it actually looks like it would be a detriment), integrating it into client-server situations would give us some nice boosts in performance. web servers ought to see some real gains with this.
The World's Worst Webcomic!
Of course multi-threaded applications are going to improve. What's your point?
For those who didn't RTFA (first column HT-unaware kernel, second column HT-aware kernel, third column speed-up):

Test                            HT off      HT on   Speed-up
Simple syscall                    1.10       1.10         0%
Simple read                       1.49       1.49         0%
Simple write                      1.40       1.40         0%
Simple stat                       5.12       5.14         0%
Simple fstat                      1.50       1.50         0%
Simple open/close                 7.38       7.38         0%
Select on 10 fd's                 5.41       5.41         0%
Select on 10 tcp fd's             5.69       5.70         0%
Signal handler installation       1.56       1.55         0%
Signal handler overhead           4.29       4.27         0%
Pipe latency                     11.16      11.31        -1%
Process fork+exit               190.75     198.84        -4%
Process fork+execve             581.55     617.11        -6%
Process fork+/bin/sh -c        3051.28    3118.08        -2%
is it just me? or does the linux kernel not perform so much better in SMP HT?
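For anyone who wants to check the percentages in the numbers above, here is a rough Python sketch. It assumes the first number in each row is the latency on the kernel without HT support and the second is the latency with it, and it assumes the speed-up is the relative drop in latency. The article may compute it slightly differently, so treat the formula as a guess that happens to reproduce the rounded figures.

```python
# Latencies copied from the rows quoted above: (test, without HT, with HT).
ROWS = [
    ("Pipe latency", 11.16, 11.31),
    ("Process fork+exit", 190.75, 198.84),
    ("Process fork+execve", 581.55, 617.11),
    ("Process fork+/bin/sh -c", 3051.28, 3118.08),
]


def speedup_percent(without_ht, with_ht):
    """Relative change in latency; negative means the HT kernel was slower."""
    # Assumed formula: (baseline - new) / baseline, expressed in percent.
    return (without_ht - with_ht) / without_ht * 100.0


for name, off, on in ROWS:
    print(f"{name:26s} {speedup_percent(off, on):+.1f}%")
```

So the system call path gets a few percent slower at worst, which is a separate question from whether applications get faster.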
I saw this also on CNN... except it wasn't Don Knuth you Do-Do. It was Don Bluth of animation fame... All Dogs Go To Heaven, Titan AE, Dragons Lair, etc.
I know, there might be many places where it has been discussed before, but could someone please tell me if HT is only for threading or can it be used for processes, too. ...
And I know, they are essentially the same syscall under linux, and might be faster, b/c of synchronization issues wrt to the memory access IIRC
We are talking about 3.06GHz processors here, as far as desktop systems go.
If my 500MHz had it, that would be cool.
"Conclusion
Intel Xeon Hyper-Threading is definitely having a positive impact on Linux kernel and multithreaded applications. The speed-up from Hyper-Threading could be as high as 30% in stock kernel 2.4.19, to 51% in kernel 2.5.32 due to drastic changes in the scheduler run queue's support and Hyper-Threading awareness."
My questions: What's the downside? Is AMD doing anything similar?
Fight with computer brings SWAT team
The pretty detailed (for me anyway) article on Ars Technica concludes that performance on a HyperThreaded CPU will be very much dependent on the application mix. While research like this is useful it will probably always be a try and see scenario.
Tested HT running couple large jobs on a 2 CPU box with each process using over a GB of RAM. Performance went down.
Also HT can play havoc with an openMosix cluster since processes can start being migrated around to CPUs that do not really exist and appear to have no load, yet the physical CPU may be 100% loaded in reality.
It is not all peaches and cream.
Like most development shops, we do a great deal of development for multiprocessor machines so we write a lot of multithreaded code. Multithreaded code creates a whole host of new debugging pitfalls that don't show up if the developer is debugging on a single processor workstation. As John Robbins says in his terrific Debugging Applications book, if you are developing a multithreaded application, you better be certain you are doing your debugging in a multiprocessor environment.
From a development standpoint, will a hyperthreaded chip provide an adequate environment in duplicating the behavior of a multi-processor PC well enough that shops can buy cheaper, one CPU machines for development and still be confident in their results? I'm guessing nothing will replace the real thing but I'd be interested in any commentary.
if you had read the article, you would have seen that the kernel doesn't show too many signs of superb HT usage. In fact, performance degrades in many places.
Also, if you knew just an itsy bit about kernels, you would know that Microsoft has done some pretty good advancements and achievements in the SMP realm.
Well if I must say something, it's this: that's really going to put a fancy how-do-you-do in the knickers of all those pay-per-processor software types. I mean Oracle, for heaven's sake, is going to have to go absolutely bonkers trying to figure out how to screw the light-bulb into that buffalo (if you pardon my french). I mean what's a megalomaniac to do? I mean I've got expenses! I've got tricarbonfiberalloy yacht hulls to pay for! Can't have people going around trying to process code in a processor without us getting some slice of that monkey, I'll tell you right here and now sir! No sir! Maybe it's that you're not patriotic enough. Trying to cut corners, eh gov'nor? Now I'm gonna have to go and rewrite all the contracts stating explicitly that "processor" is defined as a virtual space for processing. Yes that ought to do it. But I'll still have to have the lawyers check it, just to make sure there aren't any loopies. Drat those lawyers! Taking all my money too!
"This isn't a study in computer science, its a study in human behavior"
Yeah
SMP support has existed since NT 4.
If you use NT 4 MP edition, 2k Pro or XP pro, HT just works if you have the hardware.
Linux had to change to accommodate it, as it bypasses the original system BIOS with its own code.
So what you meant to say was "once again Linux plays catchup to MicroSoft, but only about a year or so later this time, and not 5-10."
I don't need no instructions to know how to rock!!!!
Seems great for server applications but not so great for desktop usage. It would be nice to see some info on compile times.
Modular Redundancy--Because 4 out of 5 Nodes agree
If you overclock the Xeons (And newer P4 CPUs) too high...
"Prepare to go to HyperThread."
"Go to HyperThread!"
*WHOOSH*
"My God, they've gone plaid!"
(Just to keep on topic, this is a very informative shootout between HT/non-HT Intel and AMD SMP processors setups here.)
Just couldn't resist the Spaceballs reference, tho!
Win2K was capable of making use of hyperthreaded processors, though not aware of the difference between a hyperthreaded processor and 2 physical processors. Windows XP is aware of the difference and makes the right choices about thread prioritization / processor affinity - and licensing (XP Pro, which supports "2 processors" will still support "4 virtual processors" on a hyperthreaded machine.) There is nothing new about Linux supporting Hyperthreading. David
Say it aint so!!
Hyperthread support vs not.
Standard API calls (w/ hyper thread) Increase (a bad thing (tm)) of latency of calls by 1-6%.
STD workload (w/ hyper thread) Increase in throughput an average of 5-10%. Disk writes decreased throughput by 30%.
Client network perf: "Chat room" test, increase of throughput 22-28%.
Server network perf: File serving, increase of 9-31%.
Kernal 2.5.24 roughly doubles the above benefits.
Looks like no real downfalls... (How often are you running a single thread? Me either.)
Slashdot must have a very large varchar size on the comment_txt field in their db, or are they just saving as a blob?
"This isn't a study in computer science, its a study in human behavior"
Hyper-Threading Speeds Windows
Is there any comparison between a single, HyperThreaded, chip and two chips multiprocessed with SMP? I assume the results would be very similar.
SCO, Microsoft, P2P, what's your hot button?
...did Intel come up with that name in response to AMD's Hypertransport bus architecture, or did they independently decide that the Xeon needed something hyper?
Gentlemen! You can't fight in here, this is the War Room!
We've been doing some research in the Wagner Labs and we've seen many cases where an app is optimized for hitting the level II cache and thus reducing the pipelining done by optimizing on modern day compilers and when you use these apps on a hyperthreaded proc you actually see a performance DECREASE by the order of Olog(n) due to the fact that the instruction set is running parallel in the CPU and never leaves the LII cache, thusly never getting a chance of utilizing the advantages of hyperthreading.
Once again this proves the point made by Fred Brooks in "The Mythical Man Month" that even if you increase the technical levels of optimization you will only see actual real-world speed improvements in Olog(n)/4ac, on the average.
That said I do think there is quite a bit of potential for hyperthreading when the compilers are able to catch up, so to speak.
Warmest regards,
--Jack
Wagner LLC Consulting Co. - Getting it right the first time
...I don't understand how this helps. I'm typing this on a Dual 1.4 GHz system -- even if a process is multi-threaded, it's still not as fast as a 2800 MHz processor. In addition, many programs can't take advantage of SMP, rendering dual processors 'useless' (for any single process; Linux distributes processes across processors.)
So if 2*1400 1400? Shouldn't taking, say, the 3 GHz P4 and 'emulating' SMP actually slow things down slightly? I don't understand how it can help, and am actually surprised that it doesn't *hurt* speedwise.
________________________________________________
suwain_2
* 128-byte lock alignment
u zzword-cipher-reallignment too.
That's something new that you guys haven't heard of yet ;)
* Spin-wait loop optimization
* Non-execution based delay loops
* Detection of Hyper-Threading enabled processor and starting the logical processor as if machine was SMP
* Serialization in MTRR and Microcode Update driver as they affect shared state
* Optimization to scheduler when system is idle to prioritize scheduling on a physical processor before scheduling on logical processor
* Offset user stack to avoid 64K aliasing
Is that all?! I hoped it'd do the post-integer-supercooled-re-automation-longterm-b
HT = Yet *another* threat to Microsoft. Not only are the Linux servers reliable, but they just sped up 30 to 50%!
---
IMHO, of course.
May the SOURCE be with you.
While this is cool news, it doesn't help us PPC users. Does anyone know if this technology will make its way to the new IBM chips that Apple will (according to rumors) use?
Faster clock speed processors speed up Linux.
As a rock-and-roll Physicist once said, No matter where you go, there you are.
My OS course is getting a bit fuzzy already, but we were lucky enough to have a visiting professor from a real university teach it. Professor Tsuni was one of the best teachers I had.
But back to the point, and excuse the "obviousness" of the questions. But HT sounded like a way to more efficiently use the pipelines on modern processors by allowing multiple threads to work on them.
And here is the fuzzy part, or maybe I'm just not remembering correctly. Do the multiple threads need to be in the same process? If so, I remember the linux kernel threading actually throws threads out as new full processes, and I'm unsure of how the CPU can track that. Or is the scheduler smart enough to send processes down the queue in an order where threads that can share processor time easier are sent together?
Also, if some of the posts are correct it seems that multiple processors show up in top. Offhand I wonder whether this hampers or helps OpenMosix's algorithms that decide where to place processes to run.
CLOB, most likely. What do they need binary data for?
In soviet russia, first post fails you!
Actually, SMP support has been in NT since 3.1. From the start NT was designed to be multi-threaded.
As if there wasn't enough already...
processor : 0
bogomips : 3191.60
processor : 1
bogomips : 3198.15
According to that the logical processor is actually faster than the physical one! Just think of what you could wind up with if you instantiated a logical CPU on the logical CPU!
but surely someone who was qualified to comment would know how to spell kernel?
It's not that I want to pull you up for bad spelling but in order to speak with authority one must get the simple things right.
Apologies if you have some sort of linguistic problem but people who seriously study kernel performance see the word kernel constantly and therefore one would expect them to spell kernel kernel.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Multithreaded, sure but I'm not sure if it supported SMP, because I don't think it was a reality on Intel hardware yet (?) Of course, back then there was an Alpha tree so maybe it was.
At any rate, I find it easier to live as though NT 3.x never existed.
I don't need no instructions to know how to rock!!!!
What you've conveniently snipped out in your trollish post is all of the application benchmarks showing improvements. If you're not going to run any application code, you might as well shut the machine off and save the marginal stress on the environment.
Most of us have our computers do work and those applications, running on an OS which has *barely* slowed, will be able to do more work in the same amount of time under the HT-aware OS than under one which does not utilize the second, virtual processor.
I have discovered a truly marvelous sig, unfortunately the sig limit is too small to contain i
Anyone know of any details around SMP versions of HT CPUs. It's not a very google friendly set of search terms.
I expect that there would be a performance difference if the scheduler knew which were real cpus and which were half of an HT pair.
Even flags to fork concerning which processor to fork to. i.e. --this_cpu_but_different_HT_CPU
Because you might want the freedom to attempt to reduce the in-CPU cache misses and the like.
Likewise the implementation of Process Groups - setpgid() warrants investigation.
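There is no such flag to fork as far as I know, but on Linux kernels that provide the sched_setaffinity system call a process can restrict itself to chosen logical CPUs after it starts. A minimal Python sketch, assuming Linux and Python 3.3 or later; the helper name pin_to_cpu is made up for illustration, and working out which logical CPUs are two halves of the same HT pair is not shown here.

```python
import os


def pin_to_cpu(cpu_index):
    """Restrict the calling process to one logical CPU and return the new mask."""
    if not hasattr(os, "sched_setaffinity"):
        # Python only exposes these calls on platforms that support them (Linux).
        raise NotImplementedError("CPU affinity calls are not available here")
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    if cpu_index not in allowed:
        raise ValueError(f"CPU {cpu_index} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu_index})
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    first_cpu = min(os.sched_getaffinity(0))
    print("now restricted to:", sorted(pin_to_cpu(first_cpu)))
```

A child created with fork inherits the mask, so setting affinity just before or just after forking gets close to the "fork onto this CPU" idea, without the kernel needing a new fork flag.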
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Wow. It's posts like this that really make me feel old at 25.
They had 486 SMP systems. In fact there was an awesome upgrade that came out like ten years ago that let you put two 486 processors in one socket. Of course you needed the clearance for it. SMP was actually all the rage ten years ago for the same reason PowerPC was all the rage. Intel had a hard time scaling, so one of the solutions was to use multithreading and divide up the work.
OS/2 2.11 SMP was out in 1993, and NT 3.1 came out shortly thereafter. Both supported SMP. The Pentium Pro, which came out in late 1995, was highly optimized for 32-bit code and multiprocessing. 4 and 8 way Pentium Pro boards existed, and were somewhat common.
If anything, SMP is LESS common today. When was the last time you saw a 4-way SMP board for sale anywhere? You could easily get them back then. The reason it's less common today is processors really are a lot faster. Intel is doing this Hyperthreading crap because they know that orders of magnitude performance gains are a thing of the past, so multithreading is the key.
Of course, us old OS/2 fanatics were saying this ten years ago.
I don't read or respond to AC posts
If you're running code that's efficient on a P4 (few mis-predicted branches, low cache miss rate, good parallelism, etc.) then HT is pretty much useless.
If you're running code that's inefficient on a P4 (which pays for its high GHz with long pipelines, large latencies, a slow decode stage, and several other drawbacks), then HT can usually paper over a fair percentage of these problems. But remember that HT requires OS support, may require application support, and "your mileage will vary".
It's easy to make up & spread cool- and credible-sounding stuff. Finding & checking hard facts is hard work.
I have an idea about "logical processors". If, for some reason Intel decided to make 1 cpu to 3 "virtual" cpu's, could you boot up the computer on cpu1 with a special OS that allows you to boot the other cpu's into their own modes, while having the master OS deal with memory and drive accesses?
CPU1 - MasterOS
CPU2 - Linux 2.4.18
CPU3 - Win2k
It'd be even neater if you could shut down the os'es and reboot the chips.
To the kernel devs: is this possible?
In Europe P4 3.0 with HT costs ~745 euro (+tax)
An Asus A7M for dual Athlon costs ~260 euro (+tax)
Two Athlon XP 2200+ cost ~340 euro (+tax).
Alternatively you can get two Athlon MP 2000+ for
roughly the same money (if you don't trust the
XPs).
Now, please explain to me why would someone
with real SMP needs in mind (and NOT games)
consider the P4 with HT.
P.
P.S. I understand that the prices in the US are
different, but still, it is VERY expensive.
This is a known troll, please mod parent down.
Holy intellectual dishonesty, Batman!
NT and Windows 2000 do not support HT and never will. NT will not because it's been end-of-lifed, and Windows 2000 will not because of Microsoft policy. On a 2-CPU system with HyperThreading, NT and Windows 2000 will think they have 4 real CPUs (unsurprisingly, this is what a pre-HT version of Linux will see as well). HT support means the OS knows that it has, in this example, 2 real CPUs and 2 fakes, and the scheduler will weight the real CPUs accordingly.
XPPro SP1 is the first, and only shipping version of Windows to support HT.
SMP support has existed since NT 4.
Linux has also had SMP support for ages. The changes that Linux made recently to the kernel were specifically to handle virtual CPUs vs. physical CPUs. I am willing to bet the farm that NO current MS product tells the difference.
So what you meant to say was "once again Linux plays catchup to MicroSoft, but only about a year or so later this time, and not 5-10."
Sounds more like MS will be catching up (if they even bother) to Linux instead of the other way around. Sorry to feed the trolls but with a +5 informative, it had to be done. Some moderators need to be shot.
seSales, Point of Sale software for OS X.
NGPT 2.2.0 tops both Linuxthreads and NPTL
Keep in mind that NPTL paved the way for the kernel changes that NGPT also makes use of.
I'm sure that the NPTL team won't simply give up.
Anyway - it looks like Linux will finally have a good SMP threading library.
Simply put, you'll need two or more processes consuming all available CPU power before you'll see some real benefits from HT. If you're severely IO-bound, running a high-end FC SAN solution on an old P2 server will outperform a 5 GHz machine with a mediocre disk.
So - yes, not all people and applications will benefit from this. But no - it is not try and see.
Stop the brainwash
While Win2K will see a hyper-threaded CPU as 2 physical CPUs, WinXP is smart enough to see it as a CPU and a Virtual CPU. At the last Intel conference I attended they made sure to emphasize that while XP Home doesn't support 2 physical processors it will properly recognize a hyper-threaded CPU and allocate resources accordingly. Do you think Intel would enable the technology in the P4 3 GHz (a desktop CPU) without making sure Microsoft supported it in their desktop operating systems?
I see the real benefit of Hyper-Threading being increased stability especially for development boxes. In the case where you have an infinite loop bug your CPU usage will eventually hit 100% and the computer will lock up. With Hyper-Threading only one virtual processor would lock up, the other will remain free so you would be able terminate the process and save yourself from crashing and rebooting the system.
Some people just cannot spell. I started out in Anthropology but could never learn to spell peseant (or is it peasant or maybe peasent or pesant). They all look correct to me even tho I've read many books and articles on rural cultures in developing countries. I have a brother who can't spell the word engineer and yet makes 6 figures in this economy designing circuit boards for some well known companies. His partner writes the documentation. If you judge competence in a field based on a mispelled word in /. post, then (not to be too harsh) you're an idiot.
So, in a nutshell, what MS says is: Windows 2000 counts processors in a broken way and requires you to buy licenses for every logical processor, even though you won't get nearly as much processing power as you would if you really had that many physical processors. But rather than fix this bug, we're going to solve the problem by making you buy .NET, which counts processors correctly. So either way, if you're going to use hyperthreading, expect to send us more money.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
This is fine, I guess, if you're going to run a processor as slow (!) as this. Point being that a hyperthreaded system will place greater demands on the RAM bandwidth.
With a slow processor they may be using 80% of the available bandwidth instead of 60% with HT switched off. Upping the processor speed to ... say 3GHz, where HT is enabled in vanilla P4's ... and we can expect to see the memory bandwidth being toasted continuously. Under these conditions I doubt we would see a speedup at all, and quite possibly the reduced cache efficiency would reduce it.
Executive Summary: Can we do this again with a non-Xeon P4 3GHz?
Dave
I write a blog now, you should be afraid.
If the results are similar to running SMP with two processors (and they look roughly similar), isn't a system with 2 Athlon-MPs still cheaper for a given performance level?
"with their freedom lost all virtue lose" - Milton
Obviously that depends.
If your web server is just doing static content, then probably not as a 486 can saturate a T1.
If your web server is doing dynamic content, then possibly.
I just upgraded my two web servers to AMD Athlon 1.2 ghz processors with 1.5 gigs of RAM each.
Don't tell me I have to buy another CPU and motherboard combo -- again....
Of course, you are wrong.
XP Pro and even XP Home support hyperthreading. They know the difference between physical and logical and they treat them accordingly (including licensing ramifications - you can have 2 logical processors on home, 4 on pro vs. 1 / 2 physical processors.)
David
You can't get much faster than Ninnle Linux! Blisteringly fast performance!
My statement was that I doubt any MS product does this. I did not say that MS did not recognize the hyperthreaded CPU.
Can you show anywhere that someone has tested XP to show that it handles this scheduling correctly?
seSales, Point of Sale software for OS X.
Rumor/legend: OS/2 did not have SMP at the divorce; MS got it up in NT, but IBM couldn't on OS/2. Gordon Letwin made a semi-public bet that if IBM could by some date, he'd fly "everyone" [the usenet group? no one was ever sure] to Seattle for lunch. IBM flailed, and failed; Letwin won.
i'll be curious to see how WebSphere application server for linux performs on these new chips.
Currently websphere's performance on linux/intel is pretty crappy. Poor java cannot multithread properly on intel, so instead it spawns off multiple processes. Currently one application server will spawn off about 90 processes on linux, while the equivalent on AIX will spawn off only 2-3.
If you think IBM is promoting linux now, just imagine how much they'll be promoting linux once their beloved WebSphere runs smoothly on it?
While the technology may be new to Intel, it is 4 decades old. That's not new in my book.
This is nothing new. The Cray MTA supports 128 threads per processor and can scale up to 256 processors in a single system.
And the OS is a BSD variant.
Dual P4 2.4GHz Xeons. Had to compile my own vanilla kernel to get it to see it as 4, but it does...
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
bogomips : 4771.02
--
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
bogomips : 4784.12
--
processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
bogomips : 4784.12
--
processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
bogomips : 4784.12
Odd that the logical processors have more bogomips...Kernel is 2.4.20.
This thing is fast...Apache compiled and installed in about 17 seconds. Kernel compiled in about a minute. The Java app that is going to run on this took about 16 seconds to compile and load...compare that to about 41 seconds on our old dual P3 1GHz machine...NICE.
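If you want to count what the kernel sees without eyeballing the listing, something like the following Python sketch would do. It parses text in the "key : value" layout shown above; the SAMPLE string is a trimmed copy of that listing, and the function name parse_cpuinfo is just something I made up. On a live box you would feed it the contents of /proc/cpuinfo instead.

```python
# Trimmed copy of the listing above; "--" separator lines are ignored by the parser.
SAMPLE = """\
processor : 0
model name : Intel(R) Xeon(TM) CPU 2.40GHz
bogomips : 4771.02
--
processor : 1
model name : Intel(R) Xeon(TM) CPU 2.40GHz
bogomips : 4784.12
--
processor : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
bogomips : 4784.12
--
processor : 3
model name : Intel(R) Xeon(TM) CPU 2.40GHz
bogomips : 4784.12
"""


def parse_cpuinfo(text):
    """Return one dict per logical processor found in cpuinfo-style text."""
    cpus = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or separator line
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpus.append({})  # a "processor" line starts a new entry
        if cpus:
            cpus[-1][key] = value
    return cpus


cpus = parse_cpuinfo(SAMPLE)
print(len(cpus), "logical processors")
for cpu in cpus:
    print(cpu["processor"], cpu["bogomips"])
```

Note this only counts logical processors. Nothing in the fields shown above says which entries share a physical package, so the sketch does not try to guess.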
Don't companies like (guessing) Oracle charge by how many processors you use with their software? I know for solaris (even intel) you are licensed by how many cpus you can use. (Just like windows I guess, 1, 2, 4, 8+ cpus)
Also since XP Home is only single processor capable where does that leave the home users that buy 3.x Ghz computers? Surely it wouldn't be long before someone figures out how to swap a multiprocessor HAL into XP Home...
To simplify greatly, if the CPU has separate units for integer and floating-point math (for example), Hyperthreading means you can use these units in parallel. Therefore, HT will not speed up pure integer or pure FP math, like SMP would. It will only speed up things if you run different kinds of process simultaneously.
Also, many people have noted that HT sometimes slows things down a bit. I don't find this very surprising because the OS needs more work to organize things for HT, but it may not have more CPU resources than a non-HT version.
Personally, I think HT is a good idea because it's using the existing hardware more efficiently in a true hacker spirit. However, it's nowhere near proper SMP.
Escher was the first MC and Giger invented the HR department.
Tom's test is flawed because he ran them on an OS which wasn't Hyperthreading aware. It simply thought it had two CPUs.
You schedule processes differently on two CPUs than you do on a single CPU. So, Hyperthreading tricks XP into using techniques which are optimal for a two processor machine when it really only has one.
So, for example, you might go through the trouble of scheduling a thread on the 2nd CPU (since the first is busy). On a two CPU machine that 2nd thread runs. On a HT machine it barely gets run at all unless the first thread gets significantly memory bound. So, you reschedule the thread back on the first one after the first thread is scheduled out. Now you've scheduled the same thread twice to get just over one quantum of work done.
You can easily see how CPU gets wasted.
Once XP is aware of Hyperthreading, it will rarely if ever slow down when Hyperthreading is switched on.
And this is the reason you cannot apply Tom's results broadly to Hyperthreading in general, just HT on non-aware versions of XP.
One of the major impediments to increasing CPU performance has been increasing memory latency. Memory latency has grown worse as CPUs have gotten faster. Accessing RAM will now cause a >150 cycle latency, during which the processor sits IDLE.
Cache only partly mitigates this problem. Some applications, such as databases and OLTP, are heavily dependent on repeatedly accessing non-cached RAM. There is no way to cache all the relevant data, since virtually all databases are larger than can fit in any present cache, no matter how large, and there is sometimes no way to predict which data will be accessed. ALL of these applications have CPUs that spend much of their time being IDLE, waiting for memory to be returned.
SMT (hyperthreading) allows the processor to perform useful work during these otherwise idle periods, by allowing the cpu to switch to a thread that is not blocked on memory access. The "idle bubbles" in the execution pipeline can therefore be "filled in" by useful work that advances the state of relevant programs.
SMT can cause a degradation in performance because it can lead to "cache thrashing." In an SMT-naive kernel, two unrelated threads could be scheduled for the same physical CPU. These unrelated threads will likely share very little code or data. The two threads will therefore "compete" for the single shared cache, with each thread's data being repeatedly displaced by the other's.
This difficulty can be substantially mitigated by making the kernel aware of "virtual processors," and by implementing scheduling algorithms to minimize the impact. The performance of hyperthreading will likely improve as kernels are better able to exploit it.
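To put a rough number on the "idle bubbles" argument, here is a toy model in Python. It is my own simplification, not anything from the article: assume each thread is stalled on memory for some fraction of cycles, assume the stalls of different threads are independent, and ignore cache thrashing and contention for execution units entirely. Under those assumptions the core has useful work whenever at least one thread is not stalled.

```python
def pipeline_utilisation(stall_fraction, threads):
    """Fraction of cycles with at least one thread ready, if stalls are independent.

    Toy model only: it ignores cache thrashing and shared execution resources,
    so it is an optimistic upper bound rather than a prediction.
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # The core idles only when every thread is stalled at the same time.
    return 1.0 - stall_fraction ** threads


for stall in (0.1, 0.5, 0.8):
    one = pipeline_utilisation(stall, 1)
    two = pipeline_utilisation(stall, 2)
    print(f"stall {stall:.0%}: 1 thread {one:.0%} busy, 2 threads {two:.0%} busy")
```

The model at least shows the shape of the argument: code that rarely stalls has little to gain from a second thread context, while memory-bound code has a lot of idle time to fill.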
Incorrect, OS/2 was SMP since 2.1. The OS/2 SMP model is still known to be one of the best SMP models to have ever been written. Click on this link http://www.byte.com/art/9406/sec11/art2.htm and learn something about OS/2 SMP (oh geez, it's 1994) and SMP in general.
I have seen screenshots of Windows task manager showing (2) CPU performance graphs.
Since the "Professional" line of NT/2K/XP kernels only support two processors, does this mean you can only use one HT CPU?
It really bugs me when I see benchmark numbers relied upon when they have not been presented as statistically significant.
Whenever you run a benchmark, you MUST run it multiple times and do the proper statistical calculations for standard deviation.
It is NOT VALID to do one run, and it is NOT VALID to average a bunch of runs without knowing what the deviation is.
Sometimes a benchmark's time will vary by more than 100%. Sometimes the reasons are valid, sometimes they are because of an error in the benchmark.
Without this sort of validation, the numbers presented should not be trusted.
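For what it's worth, the bookkeeping is not hard. A minimal Python sketch using only the standard library; the function names time_runs and summarise and the dummy workload are placeholders, and a serious benchmark would also need warm-up runs and a quiet machine, which this does not attempt.

```python
import statistics
import time


def time_runs(func, runs=10):
    """Call func repeatedly and return the wall-clock time of each run in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarise(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # needs at least two samples
    cv = stdev / mean if mean else float("nan")
    return mean, stdev, cv


def workload():
    # Placeholder workload; substitute the operation being benchmarked.
    sum(i * i for i in range(100_000))


mean, stdev, cv = summarise(time_runs(workload))
print(f"mean {mean:.6f}s  stdev {stdev:.6f}s  cv {cv:.1%}")
```

If the difference between two configurations is smaller than the spread within each one, the comparison says nothing either way.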
SMT (hyperthreading) will become increasingly important when processors are able to execute more than 2 threads simultaneously.
This development is inevitable. Previously, each new processor generation was faster than the prior one at a given clock rate, because each new processor core had more execution units, and was therefore able to perform more work in parallel. This trend abruptly ended recently, for one reason: there is no more instruction-level parallelism (ILP) to exploit. It is impossible for a processor to look at a thread of execution and find more than a few instructions to execute in parallel.
The only parallelism left to exploit is THREAD-LEVEL parallelism (TLP). Therefore the only way to continually increase performance is to increase the number of threads that a CPU can execute in parallel. This requires two modifications to CPU cores: first, increase the number of thread contexts per CPU, and second, increase the number of pipelines to which those threads can be dispatched.
With the P4, it would be pointless to have more than 2 thread contexts, because there aren't enough CPU resources lying idle to execute more than 2 threads. But future CPUs could make use of more than 2 thread contexts by having enough CPU resources to execute all of them. Future CPUs could have 20 execution units or more, which would be enough to execute several threads. Remember that the number of transistors per CPU continues to increase exponentially.
It's easy to foresee a time when processors have 20 execution units (10 integer, 10 fp) and 4 thread contexts, offering more than triple the performance of a non-SMT CPU. In the future, non-SMT CPUs will make as little sense as a non-superscalar CPU would today.
Any SMP-capable operating system supports HT. On the other hand, license issues make combining true SMP and HT a pain on non-server versions of Windows.
You're totally missing that part of the beauty of HT is the transparency.
On the other hand you can write things SPECIFICALLY for HT to deal with things such as cache issues, but saying that windows doesn't support it at all is rather misleading and makes it seem like people wouldn't see any improvements at all.
You Sir, Are an Idiot.
What about "make -j" speedup. Eg. How much does it speed up kernel builds? That's a good high-level, non-synthetic benchmark that's relevant to most of us Slashdotters and developers.
These people measured the time it takes to complete a system call?? That's moronic. No amount of SMP will offer *any* speedup in that test. The advantages of SMT come when you are running multithreaded compute-intensive applications OR running many processes at high load.
There's no A in kernel. Common begg^Hinner error.
Option 1 : 4 Logical CPUS
Option 2 : 2 x Dual Athlon OpenMosix
Option 3 : 4 node cheap Athlon OpenMosix
All much of a muchness I suspect.
I'd lean toward Option 2, since it's got real SMP, and OpenMosix, and redundancy, and coolness factor.
then at least give them some freakin credit. that is clearly a copy and paste on developerWorks' part. an introduction such as "i read this over at such and such site" is plenty sufficient.
OSNews article timestamp: 2003-01-14 02:08:14
Slashdot article timestamp: Tuesday January 14, @02:36PM
Keep in mind, a 30% gain (for the 2.4 series) in a 2GHz machine would equate to a machine that performed server-oriented functions at an effective 2.6GHz.
When they benchmarked 2.5.32, they showed a 51% increase, which would boost your effective server performance to 3GHz.
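The arithmetic behind those two figures, as a small sketch. It assumes, as the comparison above does, that a throughput gain can be read as a proportional increase in clock speed, which is a simplification; `effective_ghz` is just an illustrative name.

```python
def effective_ghz(base_ghz, gain_percent):
    """Scale a base clock by a percentage throughput gain."""
    return base_ghz * (1 + gain_percent / 100)

# 2.0 GHz base with the 30% (2.4 series) and 51% (2.5.32) gains.
print(round(effective_ghz(2.0, 30), 2))  # 2.6
print(round(effective_ghz(2.0, 51), 2))  # 3.02
```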
Granted, the way I understand it, the actual coordination of core components for the two threads is hard-wired or in firmware. That means Intel can still improve HT, to get a better performance boost. To further that line, consider if Intel were to add additional core sections of their CPUs, to be allocated dynamically by the firmware. That means you're increasing your per-clock performance without the major overhead of developing a whole new CPU core.
I can't see Microsoft standing for it. Intel could put all the pieces for two CPUs on the same die, and call it HT. You might have all the functionality of a dual-CPU setup, with less latency, and still have it show up as a single HT-enabled processor.
With the way Microsoft's handling SMP machines (with CPU licenses), in addition to their statement that they are developing a 64-bit version of Windows based on the Hammer architecture, I think AMD's future looks pretty bright.
What's this Submit thingy do?
I tested HT on dual Xeon and my experience is VERY positive. My opinion is that 20% speed improvement is very conservative: my experience shows that code written with HT in mind (see Intel's develop. guide), gets 50% speed-up (compiled with icc). Many other standard tools I use daily (like BLAST) show significant improvement too. In fact, for my applications, to reach 2*2.8 Xeons I need something like 8*750MHz Ultra3 CPUs or 4*667 EV67 :) This is quite impressive for me, if cost is taken into the picture!
In general, I think HT is a very clever idea allowing much better use of CPU. I hope they'll come up with 4* and more. This is also an interesting challenge for M$: now, when in 6 months SMP is on every desktop, all kernel internals suddenly need to be SMP safe (including third party buggy drivers): looking back at the long way Linux evolved to the current state and multiplying on their rate of innovation, it's not going to be trivial (good for Linux, of course).
Users don't care about 5% lower throughput, but since the web browser can run simultaneously with a background compilation, the user experience will be much smoother (everyone says that). In the single-threaded benchmarks, there is only one thread running, so no speedups can be expected.
It all starts with the long pipelines and being able to dispatch several instructions at once. The problem was that some of the execution units would go idle due to instructions that couldn't be reordered effectively enough.
The idea behind Hyperthreading is to have an additional context also dispatching instructions which need not be in any particular order with respect to the first thread. This allows the CPU to actually run at closer to 100% capacity.
The upside is that when things work out well, fewer execution units sit idle and waste cycles. The downside is that if one thread does manage to fully utilize the CPU, you don't benefit, and will likely pay a penalty for the extra scheduling.
From what I've seen, many apps benefit. Heavy and well optimized floating point computation can lose on HT. Some of that can be helped by a more aware scheduler that tries to pair up primarily integer threads with threads doing a lot of floating point.
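A toy sketch of that pairing idea, not how any real scheduler does it. It assumes each thread has already been classified as mostly integer ("int") or mostly floating point ("fp"), which is the hard part in practice; `pair_for_siblings` is a made-up name.

```python
def pair_for_siblings(threads):
    """Pair threads to share the two logical CPUs of one physical core.

    threads: list of (name, kind) with kind "int" or "fp" (assumed labels).
    Mixed int/fp pairs are formed first so the two threads compete less
    for the same execution units; leftovers are paired with each other.
    """
    ints = [name for name, kind in threads if kind == "int"]
    fps = [name for name, kind in threads if kind == "fp"]
    pairs = []
    while ints and fps:
        pairs.append((ints.pop(0), fps.pop(0)))
    leftovers = ints + fps  # at most one of these lists is non-empty
    while len(leftovers) >= 2:
        pairs.append((leftovers.pop(0), leftovers.pop(0)))
    if leftovers:
        pairs.append((leftovers.pop(0), None))  # runs without a sibling
    return pairs

print(pair_for_siblings([("a", "int"), ("b", "fp"), ("c", "int"), ("d", "int")]))
```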
Microsoft has done some pretty good advancements and achievements in the SMP realm
Hmmm. Given that SMP has been around an awfully long time, I find this a little hard to believe. And I also remember talking to a senior DB guy at Microsoft where he was explaining how they'd just started to do SMP optimization in the OS - this was for an NT SP in 1998 or 99.
What's incorrect? The split was before 2.0 shipped.
Google Groups for 'Letwin SMP' says (April '93 posting!) his wager was "that IBM would *not* ship an SMP-capable OS/2 by March, 1993". And that guy, ten years back, was conceding defeat.
I will admit this doesn't address when NT got SMPed. Maybe Letwin's bet was just bagging on OS/2, not pumping NT.
As to what's "still known", I'll leave that to the peanut gallery.
on my local dual P IV 2.4 GHz Dell PE2650: .Net/windows 2003 server.
for SP3 on windows 2000 Advanced server, 2 logical per 1 physical.
This will not be fixed in any release of windows 2000.
It will be fixed in
check the MSKB for details.
2000 and probably WinNT 4 saw the HT as two processors but WinXP differentiates between virtual and physical CPUs and schedules them appropriately
I have two Hyper-Threading Xeons in my server and it compiles a kernel like I had typed 'ls'.
So can I use this to run two distributed computing projects at once? Or several instances of one? The Folding@Home project keeps track of how many "active CPUs" that have responded in the last week. Does a hyperthreaded processor count as 2?
My statement was that I doubt any MS product does this.
Do you really think that Intel implements ANY feature without running it by Microsoft first? Think about it.
Anyway, Linux 2.5 is still in Alpha (long way away from mainstream distribution), and Windows 03 will be shipping in April, so we'll call this one for MSFT.
OS/2 2.x SMP was very poor performing, and Microsoft benchmarketed the hell out of the fact that NT did better.
At least with the application we were running (Notes Server), OS/2 SMP wasn't a win at all, while NT scaled almost linearly. Strange because Notes was very thread-happy (in fact too thread-happy, and it would hit some limit in OS/2 and panic).
Furthermore, OS/2 2.x SMP wasn't in the box and required some special IBM Salesman Ass-Kissing mojo that we apparently didn't have.
Ironically, OS/2 currently has one of the best SMP implementations on x86 available. But back in 1993-4, it was not so good.
I just ordered several dual P4-XEON, 1 Gig RAM, 80 Gig HD workstations for $1800 apiece from Dell. Besides the amazing price, I am very interested to see how a single processor HT compares to two non HT processors. I understand that HT can easily be controlled by a BIOS switch so it shouldn't be that hard. I fully expect the two non HT processors to win, but the question is by how much? Which will perform better for memory bound tasks? Which will perform better for context switches? Which will perform better for high levels of lock contention? How will the interactivity of the user interface compare? Interesting questions, yet I haven't seen any good data on this yet. My hope is that a single HT processor will provide 80% of the benefit a dual processor gives me today - that would be a major win for everyone!
Also, with DDR or Rambus costing nearly triple what SDRAM costs, I wonder if some enterprising company will develop a chipset that can interleave access to two SDRAM DIMM's for performance similar to DDR. Even if the two DIMM's have to be on completely separate electrical busses, I would think that it would result in a lower total cost for most popular combinations of performance levels and memory capacity.
Through a merger, we recently converted to being a Big Blew shop, so when procurement dropped my new IBM x255 in my lap, I was extremely surprised to see that I had more processors (8) than the box physically supports (4). Freaky. Then, I find through some querying that there is apparently work being done in 2.5 to "turn off" this "feature." Makes you wonder if everyone really thinks this is an improvement. But this is a real head trip if you don't know anything about it and your first encounter is in dmesg!
On the whole, though, I'm pleased (so far) with its performance, though I haven't done any real benchmarking. But, the fact that anyone would want to turn this off still bugs me just a little bit - are people as scared of this as I am? I must be getting old...
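For anyone else surprised by the count, here is a rough sketch of telling logical from physical processors by parsing /proc/cpuinfo text. It assumes the kernel exposes "processor" and "physical id" lines, as x86 Linux kernels with HT support generally do; older kernels may not. The sample text is fabricated to mimic a 4-socket HT box, and `count_cpus` is an illustrative name.

```python
def count_cpus(cpuinfo_text):
    """Return (logical CPU count, physical package count)."""
    logical = 0
    packages = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value)
    return logical, len(packages)

# Fabricated sample: 8 logical CPUs spread over 4 physical packages.
sample = "\n".join(
    f"processor\t: {i}\nphysical id\t: {i // 2}\nsiblings\t: 2\n"
    for i in range(8)
)
print(count_cpus(sample))
```

On a real machine you would pass in `open("/proc/cpuinfo").read()` instead of the sample.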
main(){char I,l,O[]={'-',1-1,0,(1<<5)-1,0+'-',-10-1,-10,11-0,
Incorrect. The "make" command takes care of the parallelism by using a dependency file (Makefile). If a makefile is written poorly (sequential compile commands with no dependency information given) then you will get no gain from forking additional make processes as the compile commands will execute sequentially on *one* processor. GCC itself does not utilize more than one processor.
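A small sketch of why the dependency information matters. This is not how make is implemented; it only groups targets into "waves" that could be built at the same time, using Python's standard `graphlib` (3.9+). The target names are invented, and the poorly written makefile is approximated here as a chain where each step waits for the previous one.

```python
from graphlib import TopologicalSorter

def parallel_waves(deps):
    """deps maps target -> set of prerequisites.

    Returns a list of waves; every target in a wave has all of its
    prerequisites built already, so the wave could run in parallel.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves

# Independent object files: three compiles can run at once.
good = {"app": {"a.o", "b.o", "c.o"}}
# Everything chained: only one job is ever runnable, whatever -j says.
poor = {"app": {"c.o"}, "c.o": {"b.o"}, "b.o": {"a.o"}}

print(parallel_waves(good))
print(parallel_waves(poor))
```

In the first case a wave of width three keeps several processors busy; in the second every wave has width one.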
HT is not completely transparent. For example, MTRR registers are not per virt. processor and need more care. So you need to modify the SMP kernel to (correctly) support HT. At least that's my impression from reading newsgroups.
There are plenty of paid MSFT astroturfers
lurking around here.
When will the myth of slashdot consensus die?
Do you think that maybe the people who hate intel are not the same ones who like Xeon HT processors?
Just because it appears that a majority of slashdot readers think one way sometimes, more than likely it just means that the other 90% don't care enough to comment.
The SMP in 2.11 wasn't necessarily the best, but I was just talking about when it really first occurred, so I may have misworded it. They actually had to write the SMP for some pretty interesting hardware (486 SMP) and needed to do some amazing wizardry to make it all work right. After OS/2 Warp came out, the SMP was amazing and NT couldn't touch it. I remember reading an article about the SMP software engineer (name escapes me right now) who was a programming savant, and IBM by all means had hired the best of the best. A lot of his work was put on hold for OS/2 Warp as IBM was making OS/2 into a microkernel OS for the then new PowerPC architecture (which was an ultra amazing OS, but never truly released to the public). This is what basically killed OS/2 in the end and held up the SMP implementation that today is one of the best in the world (on Intel hardware).
If I dedicate one CPU to manage all of the other CPUs, will OS/2 finally run efficiently?
You're buying into the myth of HT transparency. License issues and "things such as cache issues" aren't the problem--the Achilles' heel of HT is that in a physical SMP system you can schedule two processor intensive tasks on two virtual CPUs on the same physical CPU, while the other physical CPU is completely idle. This will lead to unpredictable, unrepeatable, and just plain bad performance. On a system with only one physical CPU this is not an issue, but you'd better make sure you don't run HT on a multi-CPU system without some OS scheduler support.
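A toy illustration of the placement problem, under the assumption that the scheduler knows which logical CPUs share a physical package. It is not the Linux scheduler's algorithm; `pick_cpu_naive` and `pick_cpu` are made-up names, and the CPU numbering is invented.

```python
def pick_cpu_naive(busy, siblings):
    """Treat every logical CPU as equal: take the lowest idle one."""
    idle = [c for cpus in siblings.values() for c in cpus if c not in busy]
    return min(idle) if idle else None

def pick_cpu(busy, siblings):
    """Prefer an idle logical CPU on the least loaded physical package."""
    best = None  # (busy siblings on that package, logical CPU id)
    for package, cpus in sorted(siblings.items()):
        idle = [c for c in cpus if c not in busy]
        if not idle:
            continue
        load = len(cpus) - len(idle)
        if best is None or load < best[0]:
            best = (load, idle[0])
    return None if best is None else best[1]

# Two physical packages, each exposing two logical CPUs (hypothetical ids).
siblings = {0: [0, 1], 1: [2, 3]}
busy = {0}  # one CPU-bound task already runs on package 0

print(pick_cpu_naive(busy, siblings))  # shares package 0 with the busy task
print(pick_cpu(busy, siblings))        # lands on the idle package 1
```

The naive choice doubles up on one physical processor while the other sits idle, which is exactly the case described above.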