Hyperthreading Hurts Server Performance?
sebFlyte writes "ZDNet is reporting that enabling Intel's new Hyperthreading Technology on your servers could lead to markedly decreased performance, according to some developers who have been looking into problems that have been occurring since HT has been shipping automatically activated. One MS developer from the SQL server team put it simply: 'Our customers observed very interesting behaviour on high-end HT-enabled hardware. They noticed that in some cases when high load is applied SQL Server CPU usage increases significantly but SQL Server performance degrades.' Another developer, this time from Citrix, was just as blunt. 'It's ironic. Intel had sold hyperthreading as something that gave performance gains to heavily threaded software. SQL Server is very thread-intensive, but it suffers. In fact, I've never seen performance improvement on server software with hyperthreading enabled. We recommend customers disable it.'"
Anybody who understands HT has been saying this since chips supported it. I have it enabled because I find that at typical loads our DB servers' performance benefits from HT-aware scheduling. Welcome to 2002.
Well, a technology with a name such as "HyperThreading" is targeted more at end users who don't know about processors than at SQL "performance tuners" who try to squeeze out every cycle of processing power.
HyperThreading might help poorly written thread management (independent audio and video subsystems for example), but not true multithreading, that's for sure.
I read the Intel assembly guide section on hyperthreading, and it clearly states that performance will drop if you don't take the shared cache into consideration. The two logical threads contend for the cache, causing the performance problems that were described. For hyperthreading to deliver a real benefit, the program, the OS, or the compiler needs to determine that hyperthreading is enabled and shape the code to use less than half the cache. It's been known that way since the beginning, and frankly, it is silly that MS is scratching its head wondering why this is. Lower the cache footprint, and I'll bet that performance rises dramatically.
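To make the "less than half the cache" idea concrete, here is a minimal Python sketch that sizes the tiles of a blocked (tiled) matrix multiply so the working set fits in one logical CPU's share of a shared cache. It assumes the cache size is known, that the sharing threads split the cache evenly, and that three b-by-b tiles of 8-byte doubles must be resident at once; `tile_dim` is a hypothetical helper, not an Intel API.

```python
from math import isqrt


def tile_dim(cache_bytes: int, threads_sharing_cache: int,
             elem_bytes: int = 8, tiles_resident: int = 3) -> int:
    """Largest square tile edge b such that `tiles_resident` b-by-b tiles
    fit in this thread's share of a cache used by several threads."""
    if threads_sharing_cache < 1:
        raise ValueError("at least one thread must use the cache")
    # Assumption: the threads split the cache evenly.
    share = cache_bytes // threads_sharing_cache
    # tiles_resident * b * b * elem_bytes <= share
    return isqrt(share // (tiles_resident * elem_bytes))


if __name__ == "__main__":
    l2 = 512 * 1024  # e.g. a 512 KB L2 cache
    print("HT off:", tile_dim(l2, 1))  # one thread has the whole cache
    print("HT on: ", tile_dim(l2, 2))  # two logical CPUs share it
```

For a 512 KB cache this gives a tile edge of 147 elements with the cache to itself and 104 when two logical CPUs share it. The even split is only an approximation, since the two threads contend for the shared cache rather than getting fixed halves, and real tuning would leave some headroom for other data.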
It's not new. It was also found to have some serious security issues a while back, which is why servers should leave it disabled anyway.
A question I find more interesting: what is the performance gap between dual-CPU and dual-core systems?
Those of us who care to measure for themselves rather than buy Intel's propaganda have noticed this long ago. I bet the people quoted in the article noticed it long ago as well, but it has only recently become "politically correct" to share that knowledge.
They're talking about servers. What about our normal Windows PCs (alright, not all of you are on Windows)? Should we turn it off too?
Perhaps this ushers in a new era of computing, where Intel chips underperform AMD ones.
Oh, wait...
...most of us who are involved in high-end server administration have known the performance cost of HT since it started shipping on server processors. Now if only we could persuade the PHB to switch to AMD.....
Probably the developers of SQL Server didn't understand how to get the best out of a hyperthreading architecture. There's a big difference between 'real' threads and 'pseudo' (time-sliced) threads. I'm betting it's the software that's at fault here and not Intel's architecture.
If you have a system thread cleaning out blocks of disk cache memory, then of course it is going to suffer. The whole point of hyperthreading was that one thread could run while another was waiting for I/O.
The first tests on Linux when Hyperthreading came out were also pretty discouraging.
I don't want to start a flamewar, but every time I see an Intel commercial where the announcer says "Pentium 4 with HT Technology", it sounds like a stupid marketing ploy. It's supposed to offer better performance in heavily threaded apps, but apparently it doesn't. Also, the commercials never explain to the customer what HT is. If they had a great piece of technology, they would at least take 10 seconds to explain the benefits, but they never do. They say a catchphrase, and that's really what it all seems to boil down to.
Well, AFAIK, the HTT thing only allows the processor to sort of split its execution units (FPU, ALU, etc.) so that one can work on one thread and another on a different thread. If an application relies heavily on one of those units (and my somewhat uninformed feeling is that software like SQL probably works mostly on the ALU), it can't possibly GAIN performance. On the other hand, I can see the effort of trying to pigeonhole the idle threads onto the wrong execution unit (will it even try that?) completely borking performance. So yeah, no surprises here.
Just doing a search for "hyperthreading problems" in Google will give you hundreds of articles about this. I remember complaints coming out right after the release of those CPUs from places like Anandtech and so on. In some cases, having hyperthreading enabled could cause a crash. Nice job, Intel. I'll stick with my Athlon.
gasmonso http://religiousfreaks.com/
This sort of effect has been talked about for as long as I remember hearing about hyperthreading. It was common knowledge long before the chips came out that running two threads on the same cache can cause performance issues. One can see this with two chips sharing an L2 cache, so why should it be a surprise here?
The real question is whether this issue can be optimized for. If developers design their code with HT in mind, will this still be a problem because the other thread may belong to another process, or would properly optimized code be able to deal with this?
Most importantly, is this a rare effect or a common one? Would it be rare or common if you optimized your programs for an HT machine?
HyperThreading was never meant for server computers. It is for desktop usage. However, whether you will gain any performance by using it is unclear to me. What if you compile your applications with Intel's compiler? Does anyone have links to test reports?
Hyperthrashing?
-jcr
Like so much else in this industry, it's a bunch of hype with no real meat. Now that Intel can't sell their stupid processors based on numbers alone (not much gain from a 1GHz to a 3GHz) they're trying to pull this crap. Hyperthreading is a frickin' joke, there's no software that can take advantage of it (including Windows). I hate Microsoft. I hate Intel. I hate you. Good day.
As someone above pointed out, Intel openly acknowledges that performance can be hurt. I don't know what you mean about it not being acceptable to notice this, as I've seen this sort of issue mentioned in pretty much every article I've read on HT, going quite far back.
HT is just another chip technology like any other. It is only in the rarest circumstances that a new technology will be better/faster for everything. These things all have tradeoffs and the question is whether the benefits are enough to exceed the disadvantages.
I really think you are being a little unfair to Intel. If you had evidence that it decreased performance for most systems even when the software was compiled with HT in mind, then you might have a point. As it is, however, this is no different from IBM touting its RISC technology or AMD talking about its SIMD capabilities. For each of these technologies you could find some code that would actually run slower. If you happen to be running code that makes heavy use of hardware-optimized string instructions, a RISC system can actually make things worse, not to mention a whole host of other issues. The SIMD capabilities of most x86 processors required switching the FPU state, which took time as well.
It's only reasonable that companies want to publicize their newest fancy technology, and they are hardly unsavory because they don't put the potential disadvantages front and center in their advertisements and PR material. When you go on a first date, do you tell the girl about your loud snoring, how you cheated on your ex, or your other bad qualities? Of course not. One doesn't lie about these things, but it is only natural to want to put your best face forward, and it seems ridiculous to hold Intel to a higher standard than an individual in these matters.
But can you handle being slashdotted?
The usual response is to disable it in the BIOS.
One possible solution (code patch):
http://sourceforge.net/mailarchive/message.php?msg_id=12403341
Other threads about hyperthreading problems (slowdowns):
http://sourceforge.net/search/?forum_id=6330&group_id=9028&words=hyperthreading&type_of_search=mlists
developer http://flamerobin.org
The article seems to focus only on Windows. To get good performance from hyperthreading, the scheduler has to be aware of situations that could lead to decreased performance and avoid them. So is this a problem with the Windows scheduler being unable to deal with hyperthreading or is hyperthreading really broken? How is hyperthreading performance on other operating systems?
Another question one needs to ask is, how is performance on single and dual CPU systems? Getting good performance on a dual CPU HT system (which means four logical CPUs) is more complicated and thus requires more sophisticated algorithms in the scheduler.
Applications are most likely not to blame for the decreased performance. Such hardware differences should be dealt with by the kernel. Occasionally the scheduler should keep one thread idle whenever that leads to the best performance. Only when there is a performance benefit should both threads be used at the same time.
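One way to read "keep one thread idle" is as a placement rule: put a newly runnable thread on a completely idle physical core before doubling up on an HT sibling, and optionally refuse the sibling altogether. The Python sketch below is a hypothetical illustration of that policy with a made-up topology format (logical CPU id mapped to physical core id); it is not how the Windows or Linux schedulers are actually implemented.

```python
from typing import Dict, Optional, Set


def pick_cpu(topology: Dict[int, int], busy: Set[int],
             allow_sibling: bool = True) -> Optional[int]:
    """Choose a logical CPU for a newly runnable thread.

    topology maps logical CPU id -> physical core id; busy holds the
    logical CPUs that are already running something.
    """
    idle = sorted(cpu for cpu in topology if cpu not in busy)
    if not idle:
        return None
    busy_cores = {topology[cpu] for cpu in busy}
    # First choice: an idle logical CPU whose whole physical core is idle,
    # so the thread does not compete with a sibling for caches and units.
    for cpu in idle:
        if topology[cpu] not in busy_cores:
            return cpu
    # Only HT siblings of busy cores remain. A scheduler that has decided
    # sharing would hurt can leave them idle instead.
    return idle[0] if allow_sibling else None
```

With two cores of two threads each, `{0: 0, 1: 0, 2: 1, 3: 1}`, and CPU 0 busy, this picks CPU 2 (the other core) rather than CPU 1 (CPU 0's sibling).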
The hardware vendors are on drugs. They imagine that if they issue a bunch of PR articles, exploitation of new hardware features will happen by itself. It won't and the hw vendors are in for a rude shock.
I second the person who said programmers shouldn't be writing code to the cache size of a processor. How well your code fits in cache is not something you can control at run time. Different releases of a CPU often have different cache sizes. And frankly, developers should always try to achieve tight, efficient code, not develop to a particular cache size.
I have had an ATI All-in-Wonder 9800 for more than a year now. I never really used the tuner part until a few weeks ago, when I took delivery of several new LCDs and decided that I could be watching a little TV on one while working.
The 9800 sits in my XP box, which rarely gets rebooted: games, browsing, etc. My Mac mini and Linux boxes sit in their places with a KVM.
Well, after using the tuner part, it looks great with my digital cable. But the box would lock up, and I couldn't kill the process of the ATI MMC software. This happened anywhere from a few times an hour to at least once a day. I was on the point of sticking an old Hauppauge in there, or using another MMC.
After much digging, I found a thread on how HT could cause issues with the software. I disabled it in the BIOS, since I don't really need it for anything, and ran the tuner for 48 hours solid without a lockup.
Now perhaps ATI is at fault for the software, but then again HT caused the incompatibility in my book.
Puto
I know asking them to do research is a stretch, but the submitter should at least read the article before submitting it. The quote was from a Technical Director at a consulting company that sells Citrix software, not from a developer at Citrix. Hyperthreading can definitely help the performance of MetaFrame running under Windows 2003. Enabling it in the BIOS on a server running Windows 2000 was where the problem lay.
I don't know about you guys, but I run many Linux servers. I have a mix of CPUs, and the HT servers seem to perform better than the non-HT servers. Is Linux better optimized for HT?
As far as I can tell, hyperthreading was designed entirely around the idea that two dissimilar threads would run at the same time, for example an I/O-bound thread with an FPU-bound thread, or the like.
Running two identical threads on the same processor intuitively seems like it would result in a slowdown, as you've got more overhead than the thread running alone, with the same tasks being executed.
Kinda like trying to toast bread by putting one piece in, then rapidly taking it out and putting a different one in, repeating as needed, versus having two separate slots, or just toasting one at a time.
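The "two dissimilar threads" intuition can be illustrated with a deliberately crude model: one core with a single ALU and a single FPU, each usable once per cycle, shared by two in-order instruction streams. This Python toy ignores caches, latencies, and everything else that matters on a real Pentium 4; the function name and unit labels are made up for illustration. It only shows why two identical ALU-bound threads gain nothing while an ALU-bound/FPU-bound pair can overlap.

```python
from typing import Dict, List


def cycles_to_finish(threads: List[List[str]], units: Dict[str, int]) -> int:
    """Toy model: each cycle, every thread may issue its next instruction
    (in order) if a unit of the required kind is still free that cycle."""
    for stream in threads:
        for kind in stream:
            if units.get(kind, 0) < 1:
                raise ValueError(f"no {kind!r} unit available")
    pos = [0] * len(threads)
    cycles = 0
    while any(p < len(s) for p, s in zip(pos, threads)):
        free = dict(units)
        n = len(threads)
        # Rotate which thread gets first pick so neither one starves.
        for k in range(n):
            i = (cycles + k) % n
            if pos[i] < len(threads[i]):
                kind = threads[i][pos[i]]
                if free[kind] > 0:
                    free[kind] -= 1
                    pos[i] += 1
        cycles += 1
    return cycles
```

With `core = {"alu": 1, "fpu": 1}` and ten-instruction streams, two ALU-only threads take 20 cycles (no better than running them back to back), while an ALU-only thread paired with an FPU-only thread finishes in 10.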
Before claiming HT is crap, how about taking a look at some actual test reports? It must be easily tested, so where can we find some real proof?
Hyperthreading is a gimmick to keep Intel's overly long pipeline busy. At one point the wisdom was for processors to have long instruction pipelines. The problem arises when branch prediction fails and trashes your pipeline. AMD saw that the long pipelines were harmful and shortened them on the Athlon line. The rest is history.
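The cost of a long pipeline can be put into a simple, commonly used back-of-the-envelope formula: each mispredicted branch adds roughly a pipeline's worth of flushed cycles to the average cycles per instruction (CPI). The sketch below assumes the flush penalty scales with pipeline depth and uses made-up branch statistics; it is an illustration, not a model of any particular Intel or AMD core.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty_cycles: float) -> float:
    """Average cycles per instruction once branch mispredictions are counted.

    Every mispredicted branch throws away the work in flight, costing
    roughly `flush_penalty_cycles`, which grows with pipeline depth.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


if __name__ == "__main__":
    # Illustrative numbers only: 20% of instructions are branches,
    # and 5% of those are mispredicted.
    for penalty in (10, 20, 30):
        print(penalty, round(effective_cpi(1.0, 0.20, 0.05, penalty), 3))
```

For these inputs the CPI rises from 1.1 to 1.3 as the penalty grows from 10 to 30 cycles: the deeper the pipeline, the more each misprediction costs.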
As far as I'm concerned, the fiasco of P4 being far worse than P3, and the apparent inability to do a turnabout, means Intel is a broken company. They should have just tacked the new P4 instructions on a P-M core and called it the P5. Oh wait...
"I read the intel assembly guide section regarding hyperthreading, and it clearly states that performance will drop if you don't take the shared cache into consideration." This is a general problem. The Xbox 360 has similar issues, with 3 cores sharing the same cache. Multiple independent CPUs, each with its own local memory (as in a multiprocessor or the PS3's SPUs), don't suffer from these issues.
HT is a very simple concept: virtualize 2 CPUs by cutting all caches in half and allocating each half to one of the CPUs, and allow the ALUs to process data from either thread. This can give good performance, for instance when one thread has a cache miss and is waiting for data from main memory (or, god forbid, there is a fault and you need to read from the HDD). In normal single-CPU operation, this ties up resources, and that thread can't make any progress. With HT on, the second thread can continue processing data. Or even without a cache miss, there are 4 (or more) ALUs on the die, and only certain types of applications can effectively make use of them all simultaneously. Having HT gives a higher probability that all the resources on the chip are used. But the cost, as I said above, is (effectively) cutting the cache sizes in half. And cache is king for some applications. There are many job types where doubling the cache gives much better performance than even doubling the CPU speed (well, that is probably pushing it, but certainly adding 10% more cache can be better than a 10% higher clock rate), as it means less time going to main memory.
It isn't a foolproof technology, but it has its benefits. SQL can be very heavy on the cache, and I'm not surprised that it doesn't perform optimally without some tuning.
It only works its magic if you're running one thread that's using the ALU and another thread that's heavy on the FPU. This is why I can run 3 instances of WoW on a 2.8GHz HT P4 while my AMD Athlon XP 3400 slows to a crawl in this situation.
Of course a database server isn't going to take advantage of a hyperthreaded CPU. It doesn't do any FPU at all. Perhaps it is possible that since the operating system thinks it has two processors when there's only one that you could see a performance problem under heavy load because it's doing too many context switches on the same CPU and the cache is getting thrashed.
Is it two complete cores? Front-side bus speed? Memory speed? Etc.
The IBM 970MP that Apple is using for the dual-core PowerMacs was designed right. And thanks to cache snooping (among other things), a dual-core 970MP can be slightly faster than a dual-processor setup at the same clock and bus speeds.
Another multicore chip to look at for being done right is the Sun UltraSPARC T1 processor. Up to 8 cores with 4 threads per core. Sun's threading model in this processor doesn't have the faults that Intel's HyperThreading does.
Intel's HT technology seems as much of a patch on the architecture as Microsoft's updates are on Windows.
Besides the cache considerations, which were discussed by numerous people here, there is one aspect that hasn't been mentioned.
The reason hyperthreading was introduced in the first place was to reduce the "idle" time of the processor. Pentium 4 class processors have an extremely long pipeline, and this often leads to pipeline stalls, e.g. when the processing of an instruction cannot proceed because it depends on the result of a previous instruction. The idea of hyperthreading is that whenever there is a potential pipeline stall, the processor switches to the other thread, which hopefully can continue its execution because it isn't stalled by some dependency. Now, most pipeline stalls occur when the code being executed isn't optimized for Pentium 4 class processors. However, the better optimized your code is for the Pentium 4, the fewer pipeline stalls you have and the better your CPU utilization is with a single thread.
Marcel
Actually, the big hitters today typically use IBM mainframe technology. Machines so fault-tolerant that they can lose CPU and memory cards and keep right on running, and end up with uptimes measured in years.
Sun equipment is a bad joke compared to IBM iron. Some banks and big firms have been using the same software for decades; once you get something debugged to the point that it never crashes, and your needs don't vary too much (finance is a pretty well-understood field), you just want it to work. Period.
-Z
I remember early discussions on LKML where developers realized that if you were to run a high-priority thread on one virtual processor and a low-priority thread on the other VP, you'd have a priority imbalance, a situation you'd want to avoid. The developers solved the problem by adding a tunable parameter that indicated the assumed amount of "extra" performance you could get out of the CPU from HT. In other words, with 1 CPU, max load is 100%; with two physical CPUs, max load is 200%; with one HT CPU, max load would be set to something on the order of 115% to 130%. So, when your hi-pri thread is running and the lo-pri thread wants to run, we let the lo-pri thread run only 15% of the time (or something like that), resulting in only a modest impact on the hi-pri thread but an improvement in overall system throughput.
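The bookkeeping described above can be sketched as a simple per-tick budget: if one HT core is assumed to be worth, say, 115% of a plain CPU, a low-priority thread on the sibling may only use the extra 15% while a high-priority thread is running. This Python function is a hypothetical illustration of that idea; its name and the tick-based accounting are assumptions, and it is not the actual Linux kernel code or its tunable.

```python
def lowpri_ticks(total_ticks: int, ht_capacity_pct: int) -> int:
    """Count scheduler ticks granted to a low-priority thread on the HT
    sibling of a CPU kept busy by a high-priority thread the whole time.

    ht_capacity_pct is the assumed capacity of one HT core, e.g. 115
    meaning "HT adds about 15%". The low-priority thread may consume
    only that extra share, not half of the core.
    """
    extra_pct = max(0, ht_capacity_pct - 100)
    lo_ran = 0
    for tick in range(total_ticks):
        # Budget earned so far, in whole ticks.
        allowed = (tick + 1) * extra_pct // 100
        if lo_ran < allowed:
            lo_ran += 1  # let the low-priority thread run this tick
    return lo_ran
```

Over 100 ticks, a 115% setting lets the low-priority thread run 15 ticks and a 130% setting 30 ticks, while 100% (no assumed HT benefit) keeps it off the sibling entirely.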
That being said, I infer from the article that Windows does not do any such priority fairness checking. Consider the example they gave in the article. The DB is running, and then some disk-cache cleaner process comes along and competes for CPU cache. If the OS were SMART, it would recognize that the system task is of a MUCH lower priority and either not run it or only run it for a small portion of the time.
As said by others commenting on this article, the complainers are being stupid for two reasons. One, Intel already admitted that there are lots of cases where HT can hurt performance, so shut up. And Two, there are ways to ameliorate the problem in the OS, but since Windows isn't doing it, they should be complaining to Microsoft, not misdirecting the blame at Intel, so shut up.
(Note that I don't like Intel too terribly much either. Hey, we all hate Microsoft, but when someone is an idiot and blames them for something they're not responsible for, it doesn't help anyone.)
I never accept assertions that a configuration option like HyperThreading is always good or always bad. It's never black and white. The answer is always: it depends on the application. In my experience, a busy Linux Java-based web application that does a lot of context switching and a lot of I/O to back-end applications uses less CPU when hyperthreading is enabled. Collective wisdom aside, it works for my application, so I am leaving it on.
I thought you couldn't report any performance issues of MS SQL Server :)
Certain applications take a big hit in performance with HT turned on. It's not just server apps. I don't know the specific class of problems, but some of our software has been benchmarked running faster with HT off.
Hyperthreading Speeds Linux.
In a nutshell:
- hyperthreading decreases syscall speed by a few percent
- on single-threaded workloads, the effect is often negligible, with occasional large improvements or degradations
- on multithreaded workloads, around 30% improvement is common
- Linux 2.5 (which introduced HT-awareness) performs significantly better than Linux 2.4
So, from that benchmark (and others like it, just STFW), it appears that HT offers significant benefits; you need multithreading to take advantage of it, and having an HT-aware OS helps.
Please correct me if I got my facts wrong.
We run RH AS2.1 on most machines right now, and hyperthreading is disabled (under any kernel) because of this performance hit; it can grind a heavily active database into a big backlog.
So far it looks like AS3 and a newer kernel resolves the issue - but we don't have a big spread of those servers in the DC just yet so may not be a good sampling of HT enabled instances.
You do know that Windows has priority levels too?
morcego
``How well your code fits in cache is not something you can control at run time.''
You most certainly can, and the speed gains can be significant. One way to do it:
- write a version of your code optimized for 256 KB cache
- write a version of your code optimized for 512 KB cache
Use the contents of /proc/cpuinfo to see how much cache you really have, and choose the version of your code to run based on that.
I'm sure there are better ways, but this is just proof that it's possible. Whether or not it's advisable depends on the situation.
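The two-version approach described above might look like this in Python. The routine names are hypothetical stand-ins for two builds of the same code, the 512 KB threshold is an assumption, and the parsing relies on the "cache size" line that x86 Linux kernels typically print in /proc/cpuinfo. That line reports a single cache level per CPU, so treat it as a hint rather than a full description of the cache hierarchy.

```python
import re
from typing import Callable, Optional


def cache_size_kb(cpuinfo_text: str) -> Optional[int]:
    """Return the first "cache size" value (in KB) from /proc/cpuinfo text."""
    match = re.search(r"^cache size\s*:\s*(\d+)\s*KB", cpuinfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


# Hypothetical variants of the same routine, tuned for different caches.
def process_256k(data: list) -> str:
    return "256 KB variant"


def process_512k(data: list) -> str:
    return "512 KB variant"


def choose_variant(cache_kb: Optional[int]) -> Callable[[list], str]:
    # Fall back to the smaller-cache version when the size is unknown.
    if cache_kb is not None and cache_kb >= 512:
        return process_512k
    return process_256k


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            kb = cache_size_kb(f.read())
    except OSError:  # not Linux, or /proc not available
        kb = None
    print(kb, choose_variant(kb)([]))
```

Falling back to the smaller-cache variant is a conservative choice: code tuned for 256 KB still runs correctly on a larger cache, just without using all of it.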
I don't have a HT-capable proc (AMD Athlon XP 1700), so I don't know anything from personal experience.
I decided to check out how PostgreSQL did with HT.
The first link (1) was suggesting to someone--who was having performance problems under FreeBSD--to turn off HT. Of course, that may not be related to PostgreSQL itself, but rather FreeBSD. I really don't know.
The next thing I found showed some mixed results with ext2 under Linux (2). Some things showed a gain with HT, but others didn't.
Another link (3) commented that HT with Java requires special consideration when coding.
I didn't come up with anything useful under PostgreSQL, so I checked out Linux.
According to Linux Electrons, Linux performance can drop without proper setup.
I use Nuendo for professional music recording, and even though their latest version says it's HT-aware, the performance is poor. In fact, in several instances it only takes a few loaded instruments for it to peak the CPU; change it back to a basic CPU with HT off and it works fine.
My understanding is that it's this way with Cubase as well.
So all these Xeons around the farm are laptop CPUs or something?
AMD is the way to go.
So this is all just another misconfiguration by sql server dbas!
Front page news!
HA! HA! HA! HA! HA! HA! HA!
No, if you care about fault-tolerance and extreme availability, you go for HP Non-Stop servers (Tandem/NSK)... /G
HT is like cramming another engine into the Yugo, then attaching a Y adaptor to the intake manifolds so that both of them share that single-barrel carb. The car probably won't go any faster, and the handling will suffer.
I routinely run 4 batch threads running simulation cases. The entire batch (on a dual Xeon with HT) runs in 4 threads in about 27% of the time it takes when serialized. So I can report that HT is very effective for my compute-intensive situation.
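For readers converting the figure above: finishing in 27% of the serialized time is a speedup of about 1/0.27, or roughly 3.7x. A small Python helper for that arithmetic, assuming the serialized and threaded runs do identical work; the efficiency figure divides by the four logical CPUs the poster ran, not by physical cores, and the function names are illustrative.

```python
def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """How many times faster the parallel run finished."""
    return serial_seconds / parallel_seconds


def efficiency(speedup_value: float, workers: int) -> float:
    """Fraction of ideal linear speedup achieved across `workers`."""
    return speedup_value / workers


if __name__ == "__main__":
    s = speedup(1.0, 0.27)  # batch finishes in 27% of the serialized time
    print(f"speedup {s:.2f}x, efficiency over 4 logical CPUs {efficiency(s, 4):.0%}")
```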
Sorry to inject actual data into a religious discussion.
Can anyone explain to me the exact difference between HT and CMT? I'm wondering if these same issues would plague Sun's new Niagara processor.
I hate to break it to you, but NT has been taking advantage of HT in an efficient manner much longer than Linux. HT is a bandaid for poor compiler technology and a mediocre architecture.
A Google search for "Microsoft SQL Server 2000 Hyperthreading" gives the answer in the first two links. Problem solved.
Instead of arguing whether HT is advantage/disadvantage, why not make simple tests and see what the results are??! So, it's very fcking easy to say "this sucks!" and not provide any evidence. Let's SEE SOME COLD HARD FACTS AND THEN DECIDE!
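A minimal Python sketch of such a test: time the same workload several times under each BIOS setting and compare medians, which are less sensitive to one-off outliers than a single run. The workload is whatever you actually care about (a query mix, a build, a batch job). The function names here are illustrative assumptions, and the HT setting itself still has to be changed in the BIOS between runs.

```python
import statistics
import time
from typing import Callable


def median_runtime(workload: Callable[[], object], repeats: int = 5) -> float:
    """Run `workload` several times and return the median wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def relative_change(baseline: float, candidate: float) -> float:
    """Negative means the candidate configuration was faster."""
    return (candidate - baseline) / baseline
```

For example, if the HT-off median is 10.0 s and the HT-on median is 8.0 s, `relative_change(10.0, 8.0)` is -0.2, i.e. HT made that workload 20% faster on that machine.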
Unfortunately Windows looks at an HT CPU as if it had multiple cores (true SMP). If Microsoft would change the Windows Scheduler to properly treat an HT CPU by adjusting the way it distributes threads and processes to the two virtual CPUs, then there should be a performance gain and no penalty.
--
http://www.gloryhoundz.com/
JIT code can be cached persistently so that startup costs are only paid once. AS400 does this sweetly. And JIT doesn't add significantly to memory footprint (there is a fixed overhead - think about Transmeta), but certain types of garbage collection do - the fast ones (e.g. generational). When it comes to memory management, you can make it small, fast, automatic - pick any two.
I may agree that HyperThreading as implemented in the x86 architecture is a hack, but I wouldn't dismiss the original idea of HT, as implemented in the Tera supercomputers. It was designed to have hundreds of thread contexts in hardware, so if it has to wait on memory, there will be some other thread available to run. There are enough threads available that it can do without a cache, while utilising the full memory bandwidth. This quite neatly avoids cache consistency problems that can kill massively parallel performance.
a.
I don't entirely agree. AMD's multi-core architecture was targeted at servers from the start, and it's quite fair to say servers benefit from it. With HT, servers do not benefit in their 'server' capacity. That is, if the server is loaded (and it is a server's job to be loaded), then the HT benefit is reduced.
Basically, HT provides greater responsiveness rather than performance, so it should be targeted toward desktops, not servers.
It's OK to have drawbacks, but they should not be in the very thing you are designing for, or rather advertising for.
Why only HyperThreading? Performance can drop even on new dual-core processors, as they share L2 cache. Dual core cannot increase single-thread performance if the thread is memory-bound (though the drop is less severe than with HT). HyperThreading was meant to increase processor throughput, but it will only work if the program has a decent cache footprint.
Not at all. One of the big problems with HyperThreading as Intel has designed it is that they did not provide sufficient memory bandwidth to be able to feed both threads. This problem also plagues Intel's "dual core" chips. Ultimately, it eliminates the supposed benefit of switching over to the other thread when the first is blocked on memory access because as soon as the second thread needs information from memory it will actually slow things down. Also, even if there were sufficient memory bandwidth, the comparatively long fetch times would still mean that the CPU would be blocked parts of the time waiting on memory because there are only two threads available. Finally, there is a cost to switching between the threads, so even if you had the memory bandwidth and enough threads to prevent idle time it would still lose time because of the overhead in switching to another thread whenever the active one gets blocked.
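The argument above, and the Tera and Niagara designs mentioned elsewhere in the thread, can be summarized with a simplified latency-hiding model: with n hardware threads that each compute for C cycles and then stall for L cycles on memory, the core stays busy only if there are enough threads to cover the stall, and every switch costs s cycles. The Python sketch below is that simplified model with illustrative numbers. It ignores the memory-bandwidth limit the poster raises, which would cap utilization further.

```python
def utilization(threads: int, compute_cycles: float, miss_latency: float,
                switch_cost: float = 0.0) -> float:
    """Fraction of cycles doing useful work in a simple multithreading model.

    Each thread computes for `compute_cycles`, then waits `miss_latency`
    cycles for memory; meanwhile the core switches to another ready
    thread, paying `switch_cost` cycles per switch.
    """
    # Too few threads: each one contributes its own compute share.
    unsaturated = threads * compute_cycles / (compute_cycles + switch_cost + miss_latency)
    # Enough threads: the core is never idle, but switches still cost cycles.
    saturated = compute_cycles / (compute_cycles + switch_cost)
    return min(unsaturated, saturated)
```

With C = 20 and L = 180 and free switching, one thread keeps the core busy about 10% of the time, two threads about 20%, and it takes around ten threads to cover the stall entirely. Adding a 5-cycle switch cost caps utilization at 80% no matter how many threads are available.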
On the other hand, the new UltraSPARC T1 (aka, Niagara) has massive memory bandwidth, shorter fetch times, four threads per core rather than two, and zero penalty for switching between threads. The result is incredible throughput with a total of 32 hardware threads (8 cores with four threads per core) in a single chip. And by the way, that single chip draws a fraction of the power and generates much less heat than a single Intel HT processor (I swear it seems like the systems are blowing out cool air).
Note, however, that the T1 chip may not be ideal for all workloads. It does have a relatively slow single-threaded performance, so it works best when running highly concurrent applications with minimal locking, or when running several applications concurrently. For some applications, it may be desirable to use processor sets to limit the set of threads that it can see and/or to run multiple copies concurrently and load balance across them. But for others that are designed to scale well (e.g., those that already run well on larger systems like the E6800 with 24 UltraSPARC-III or the E2900 with 12 dual-core UltraSPARC-IV chips), then they can take full advantage of the available processing power.
For the tests that I've run with an application that does scale, a system with a single UltraSPARC T1 chip easily doubles up the performance of a system with two 3.2GHz HT Xeons (regardless of whether HT was enabled or disabled). Of course, I haven't been lucky enough to test with the officially shipping version of the T1 chip (the ones I've been able to use have been running at a slower clock rate, and some of them have had some of the cores disabled), so that performance gap may actually be larger than I have been able to measure so far.
>> CPU usage increases significantly but SQL Server performance degrades
That's called "saturation". Happens to every piece of server software. There is ALWAYS a point where "requests per second" start going down and latencies begin to go up. And from there it usually goes WAY downhill unless you take the load off or reduce it significantly to let the software catch up and recover.
God, I hate when developers are allowed to do perf testing. They test a simple scenario without full understanding of what's going on and make wild conclusions from it to get "visibility" which at large companies like MSFT often leads to promotion. Then they go ahead and solve a "problem" which doesn't exist.
This is not to say that HT doesn't degrade performance. I've heard that from Intel folks themselves that in some scenarios it does. But when a "developer" does perf testing, I take that with a three pound grain of salt.
Is this just Windows, or did they test other x86 OSes as well? I.e., could it be a problem with the OS itself and not hyperthreading?
Does slashdot hate my posts?
Roughly the same thing in theory. The difference is that Sun applications are typically compiled using Sun compilers, that Sun hardware doesn't suck, and that the Sun compiler actually knows when and how to make use of the threading benefits.
;)
So no, you shouldn't have problems at all with the Niagara proc unless you do something stupid like shove Linux on it.
Well.
All I've seen is better performance with HT and kernel 2.6.
Maybe that's because 2.6 is so much better, maybe it's ALSO because of HT. We will never know.
You'd think, wouldn't you, that HT would cut in half or more the very expensive (in cycles) context switching involved in moving to a new thread or handling an interrupt. This is in addition to giving the processor something to do while the other thread is stalled on latency to main memory. Strange to see it go the other way instead.
Software shouldn't be expected to handle hardware quirks. It's up to the hardware to run the software efficiently.
Seems to me a hardware fix would be to partition the cache into two pieces when HT is enabled and running, and use the whole cache for the processor otherwise.
With 2MB caches per processor now becoming available, would this be such a bad thing? IIRC once you're up to 256KB of cache you've already got a hit rate near 90%. That severely limits your possible improvement to less than 10% regardless of how much more cache you add. And yes I am aware that increasing the processor multiplier does make every cache miss worse in proportion, but still having HT run more efficiently in the bargain could make this tradeoff worth it. And that's even before you consider uneven partitioning if the OS can determine that one thread needs more cache than the other.
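The "less than 10%" above is headroom in hit rate. The time cost depends on how many cycles each miss costs. The standard average-memory-access-time formula, AMAT = hit_time + miss_rate * miss_penalty, makes that visible. The latencies below are hypothetical round numbers, not measurements of any particular Xeon.

```python
# Average memory access time in CPU cycles:
#   AMAT = hit_time + miss_rate * miss_penalty
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    hit_time = 10.0       # hypothetical cache hit latency, cycles
    miss_penalty = 300.0  # hypothetical trip to main memory, cycles
    for hit_rate in (0.90, 0.95, 0.99):
        cycles = amat(hit_time, 1.0 - hit_rate, miss_penalty)
        print(f"hit rate {hit_rate:.0%}: AMAT {cycles:.1f} cycles")
```

With these assumed numbers, going from a 90% to a 99% hit rate cuts the average access from 40 to 13 cycles. As the post notes, a higher multiplier raises the miss penalty in cycles, which makes each lost hit more expensive. So halving the cache each HT thread sees could cost more than the hit-rate figure alone suggests.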
is still a kludge. HT was a cheap hack to get extra performance under certain scenarios. Looks like they're getting called out for it. Dual-core is the right answer; HT wasn't.
Quack, quack.
(erons). But the price makes them a hard sell. I'll definitely be keeping my eye on these things as soon as the price points start to line up. I want to see AMD succeed in the server market, but for now (aside from Sun and a few HP systems) Xeon is still the dominant player.
I have two identical high-end dual-CPU desktops, both with HT enabled, sitting on my desk. One runs Windows XP, the other a 64-bit Linux. What I observe every day is how much the Windows scheduler sucks. I don't know how long MSFT's marketing department has known about HT, but their OS definitely doesn't know about it yet (start an update in Subversion or a compilation in VC and go drink some coffee, because the computer is unusable). On Linux, on the other hand, HT really improves both responsiveness and throughput. I'm waiting to test a quad dual-core box with HT enabled ;)
The parent post is common sense, which seems infrequent. I have found the range to be quite wide. When rendering animations from Blender, I have found that Hyperthreading gives nearly 70% higher throughput when turned on. For encoding MPEG2 using Tmpgenc (under Wine), I see around 40% improvement with HT on. Clearly, these two applications benefit quite a bit from HT due to a small computational footprint and/or low cache contention, etc. On the other hand, on my system, on-screen 3D acceleration in the NVIDIA driver (under Linux at least) appears to suffer with HT, with frame rates around 10-20% lower than with HT disabled.
So, I see improvements ranging from -20% to +70% depending on the application, with many applications seeing only small differences one way or the other. Like many things, this tends to turn into a religious debate when the fact is that it varies case-by-case.
First, it doesn't matter if the server uses threads or processes. Threads have a minor performance advantage for startup and context switching, and some disadvantages for memory allocation speed (finding VM space is a hashing problem) and some locking overhead. For the most part though, with tasks that just crunch numbers (including scanning memory) or make system calls, there isn't all that much difference.
Running 2 threads per CPU is not cheating. It's normal to run 1 thread per CPU plus 1 thread per concurrent blocking IO operation. That could come out to be 2 threads per CPU.
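The rule of thumb above (one thread per CPU plus one per concurrent blocking I/O operation) maps directly onto a thread-pool size. This is a minimal sketch; the helper name `worker_thread_count` is hypothetical. Note that `os.cpu_count()` reports logical CPUs, so on an HT box it already counts each sibling separately.

```python
import os
from typing import Optional


def worker_thread_count(concurrent_blocking_io: int, cpus: Optional[int] = None) -> int:
    """Hypothetical helper: one thread per CPU plus one thread per
    concurrent blocking I/O operation."""
    if concurrent_blocking_io < 0:
        raise ValueError("concurrent_blocking_io must be non-negative")
    if cpus is None:
        # Logical CPUs (HT siblings included); cpu_count() may return None.
        cpus = os.cpu_count() or 1
    return cpus + concurrent_blocking_io


if __name__ == "__main__":
    from concurrent.futures import ThreadPoolExecutor

    # With 4 CPUs and 4 outstanding blocking reads this gives 8 threads,
    # i.e. the "2 threads per CPU" case from the post.
    print(worker_thread_count(4, cpus=4))
    with ThreadPoolExecutor(max_workers=worker_thread_count(4)) as pool:
        print(sum(pool.map(abs, range(-3, 4))))
```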
Blame my ignorance, but this HT fiasco is nothing more than greed for GHz, which ignorantums such as me buy as "FAST". Pipeline stalls made the P4 worse than the P3 in many cases. Of course, you have to fabricate some benchmarks... "oh, let's just put in some more to fill these, yeah, let's name them threads, hyper-threados"
Twice the ALU power and half the power.
;-)
That's not a hard sell. If you're doing number crunching of any kind in a professional setting, an AMD X2 or Opteron will pay for itself quickly.
Oh, and that way you're not funding the never-ending chain of stupidity that is the P4 design team.
Tom
Someday, I'll have a real sig.
When you get down to that level, even accessing a variable counts as "I/O". ;-)
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Of course I agree. Unfortunately the person ultimately responsible for deciding (in my case in favor of the cheaper server line that will get the job done) has to weigh the pros and the cons included in the broader picture. This year will probably be the first year we end with a profit, the right hardware wouldn't have made that possible.
Still your long-term argument holds, but try to explain that to your investors and you can see how it starts to get a little thornier.
Also, I think it's amusing that they talk about performance degradation like it's some awful thing, considering their example software is all blue-light-special junk. Complaining about HT slowing down this particular server is like complaining that a certain brand of gasoline makes your Yugo run slower. If you care about speed, you probably don't drive a Yugo, and if you care about computers, you probably don't use SQL Server or Citrix.
just needs an excuse for writing that sorry-ass software (which, like any other touted MS product, they bought, way back in '91)!
Scott McNealy to Michael: "Suck my Sun!" Michael Dell to Scott : "Lick my Dell!"
To all those people thinking that HT multiplies CPU performance by 2, here is a very simple experiment that will prove you wrong. Define those functions in your shell (BSD, Linux, Cygwin, whatever):
Then, on an HT-enabled box, benchmark the function running a single CPU-intensive process:
It took 5.2 seconds. Now do it with the function running two CPU-intensive processes:
It takes twice the time (10.4 secs). If HT really offered twice the performance, it would have taken the same amount of time (5.2 secs), because HT would have run the 2 processes on the 2 "independent" CPUs; as you can see, this is not the case. The explanation is that the execution units are shared between the 2 logical CPUs. By contrast, on this dual Opteron 244 box I happen to have in front of me, both benchmarks give the same numbers (p1 = 4.4 secs and p2 = 4.5 secs), because on a true SMP (or dual-core) box the 2 CPUs are independent and don't share their execution units. As other people have correctly pointed out, HT is a way to reduce the impact of pipeline stalls and execution-unit under-utilisation; it is not a way to magically "multiply" raw CPU performance by 2.
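The shell functions from that experiment did not survive in this copy. The following is a hypothetical Python equivalent, not the poster's code. It runs one busy-loop process, then two, and compares wall-clock times. It uses processes rather than threads because CPython's GIL would serialize CPU-bound threads. On a modern multicore machine the OS will normally place the two processes on different physical cores. To reproduce the HT case you would pin them to two sibling logical CPUs with `os.sched_setaffinity` (Linux only). The sibling IDs are machine-specific; on Linux they are listed in `/sys/devices/system/cpu/cpu0/topology/thread_siblings_list`.

```python
import multiprocessing as mp
import os
import time
from typing import List, Optional


def burn(iterations: int, cpu: Optional[int] = None) -> int:
    """CPU-bound busy loop standing in for a 'CPU-intensive process'."""
    if cpu is not None and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})  # pin this process to one logical CPU
    total = 0
    for i in range(iterations):
        total += i * i
    return total


def run_parallel(n_procs: int, iterations: int, cpus: Optional[List[int]] = None) -> float:
    """Start n_procs processes doing identical work and return the wall-clock
    seconds until all of them finish. If cpus is given, process k is pinned
    to cpus[k % len(cpus)]."""
    procs = []
    for k in range(n_procs):
        cpu = cpus[k % len(cpus)] if cpus else None
        procs.append(mp.Process(target=burn, args=(iterations, cpu)))
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    work = 5_000_000  # hypothetical size; raise it until p1 takes a few seconds
    siblings = None   # e.g. [0, 1] if those are HT siblings on your box (Linux)
    p1 = run_parallel(1, work, siblings)
    p2 = run_parallel(2, work, siblings)
    print(f"p1 = {p1:.2f} s, p2 = {p2:.2f} s, ratio = {p2 / p1:.2f}")
```

Read the ratio the same way as the post. A value near 1.0 means the two processes had independent execution resources. A value near 2.0 means they were effectively sharing one set.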
I've personally seen that HT technology can kill performance on Novell NetWare 6.5 on very high-end servers. (Performance increased more than tenfold when HT was disabled.)
It is incorrect, IMHO, to say that Hyperthreading and increased performance are related, at least on server platforms. Unless the server applications are conceived with multithreading in their design, it would be incorrect to assume that Intel HT would somehow figure out the assembly code and speed it up.
Of course I know that. It just seemed to me that if Windows had a way to decide not to use both virtual processors because the only two runnable threads were of different priorities (or run the lower-pri thread only part of the time), then the people complaining would never have noticed a problem.
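The policy described above can be sketched for a single HT core with two logical CPUs. This is a hypothetical toy, not how the Windows scheduler actually works. The `Thread` type and the "higher number means more important" convention are assumptions. The top-priority runnable thread always gets a logical CPU. A second thread is placed on the sibling only if it has equal priority; otherwise the sibling is left idle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Thread:
    name: str
    priority: int  # hypothetical convention: higher number = more important


def pick_for_core(runnable: List[Thread]) -> Tuple[Optional[Thread], Optional[Thread]]:
    """Choose threads for the two logical CPUs of one HT core. A lower-priority
    thread is never co-scheduled next to a higher-priority one, so it cannot
    compete with it for the shared execution units and cache."""
    if not runnable:
        return None, None
    # sorted() is stable, so equal-priority threads keep their queue order.
    ordered = sorted(runnable, key=lambda t: t.priority, reverse=True)
    first = ordered[0]
    if len(ordered) > 1 and ordered[1].priority == first.priority:
        return first, ordered[1]
    return first, None  # leave the sibling logical CPU idle
```

The post's other option, running the lower-priority thread on the sibling only part of the time, would replace the `None` branch with a time-sliced or duty-cycled decision.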