Hyperthreading Hurts Server Performance?
sebFlyte writes "ZDNet is reporting that enabling Intel's new Hyperthreading Technology on your servers could lead to markedly decreased performance, according to some developers who have been looking into problems that have been occurring since HT has been shipping automatically activated. One MS developer from the SQL server team put it simply: 'Our customers observed very interesting behaviour on high-end HT-enabled hardware. They noticed that in some cases when high load is applied SQL Server CPU usage increases significantly but SQL Server performance degrades.' Another developer, this time from Citrix, was just as blunt. 'It's ironic. Intel had sold hyperthreading as something that gave performance gains to heavily threaded software. SQL Server is very thread-intensive, but it suffers. In fact, I've never seen performance improvement on server software with hyperthreading enabled. We recommend customers disable it.'"
Anybody who understands HT has been saying this since chips supported it. I have it enabled because I find that at typical loads our DB servers' performance benefits from HT-aware scheduling. Welcome to 2002.
Well, a technology with a name such as "HyperThreading" is targeted more to end users who don't know about processors, rather than SQL "Performance Tuners" who try to squeeze every cycle of processing power.
HyperThreading might help poorly written thread management (independent audio and video subsystems for example), but not true multithreading, that's for sure.
I read the section of the Intel assembly guide on hyperthreading, and it clearly states that performance will drop if you don't take the shared cache into consideration. The two logical threads contend for the cache, causing the performance problems that were described. For hyperthreading to be a true benefit, the program, the OS or the compiler needs to determine that hyperthreading is enabled and shape the code to use less than half the cache. It's been known that way since the beginning, and frankly, it is silly that MS is scratching their heads wondering why this is. Lower the cache footprint, and I'll be willing to bet that performance rises dramatically.
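To make "use less than half the cache" concrete, here is a minimal sketch of working out a per-thread working-set budget when two logical CPUs share one cache. The function name, the 10% headroom and the 512 KB example size are assumptions for illustration only; none of them come from Intel's guide.

```python
def per_thread_cache_budget(cache_bytes, threads_sharing_cache=2, margin_percent=10):
    """Return a working-set budget (bytes) for one thread on a shared cache.

    Hypothetical helper: splits the cache evenly between the threads that
    share it, then holds back margin_percent of that share as headroom.
    """
    if cache_bytes <= 0 or threads_sharing_cache < 1:
        raise ValueError("cache size and thread count must be positive")
    if not 0 <= margin_percent < 100:
        raise ValueError("margin_percent must be in [0, 100)")
    even_share = cache_bytes // threads_sharing_cache
    return even_share * (100 - margin_percent) // 100


if __name__ == "__main__":
    l2 = 512 * 1024  # assumed 512 KB cache shared by two logical CPUs
    print(per_thread_cache_budget(l2))        # budget per thread with HT on
    print(per_thread_cache_budget(l2, 1, 0))  # whole cache with HT off
```

A program could then size its buffers or processing tiles to stay under that budget whenever it detects that HT is on.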
Marxism is the opiate of dumbasses
indeed has once again proved it is expensive to be poor.
Question I find more interesting: What is the performance gap between dual CPU vs Dual-core?
If you mod me down, I *will* introduce you to my sister!
Those of us who care to measure for ourselves rather than buy Intel's propaganda noticed this long ago. I bet the people quoted in the article noticed it long ago as well, but it has only recently become "politically correct" to share that knowledge.
Perhaps this ushers a new era of computing, where Intel chips underperform AMD ones.
Oh, wait...
If you have a system thread cleaning out blocks of disk cache memory then of course it is going to suffer. The whole point of hyperthreading was that one thread could run while another was waiting for I/O.
The first tests on Linux when Hyperthreading came out were also pretty discouraging.
Mielipiteet omiani - Opinions personal, facts suspect.
I don't want to start a flamewar, but every time I see an Intel commercial where the announcer says "Pentium 4 with HT technology", it sounds like a stupid marketing ploy. It's supposed to offer better performance in heavily threaded apps, but apparently it doesn't. Also, the commercials never explain to the customer what HT is. If they had a great piece of technology, they would at least take 10 seconds to explain the benefits, but they never do. They say a catch phrase, and that's really what it all seems to boil down to.
public class null extends java applet { System.out.print ("Tabula Rasa"); }
Well, AFAIK, the HTT thing only allows the processor to sort of split execution units (FPU, ALU, etc.) so that one can work on one thread and the other on another. If an application leans heavily on one of those units (and my somewhat uninformed feeling is that software like SQL probably works mostly on the ALU), it can't possibly GAIN performance. On the other hand, I can see the effort of trying to pigeonhole the idle threads on the wrong execution unit (will it even try that?) completely borking performance. So yeah, no surprises here.
This sort of effect has been talked about for as long as I remember hearing about hyperthreading. It was common knowledge long before the chips came out that running two threads on the same cache can cause performance issues. One can see this with two chips sharing an L2 cache so why should it be a surprise here?
The real question is whether this issue can be optimized for. If the developers design their code with HT in mind, will this still be a problem, since the other thread may belong to another process? Or would properly optimized code be able to deal with this?
Most importantly is this a rare effect or a common one? Would it be rare or common if you optimize your programs for an HT machine?
If you liked this thought maybe you would find my blog nice too:
Hyperthrashing?
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
Probably the developers of SQL Server didn't understand how to get the best from a hyperthreading architecture. There's a big difference between 'real' threads and 'pseudo' (time-sliced) threads. I'm betting it's the software that's at fault here and not Intel's architecture.
Maybe start by betting your Karma, Mr. AC? AFAIK Intel's HT is intended to improve the perceived performance for users, e.g. in front of a GUI, by reducing response time. There has to be a catch, because if it was so easy, everyone would have done it before instead of painstakingly optimizing the CPU.
I'm still trying to figure out what people mean by 'social skills' here.
As someone who commented above pointed out, Intel openly acknowledges that performance can be hurt. I don't know what you mean about it not being acceptable to notice this, as I've seen this sort of issue mentioned in pretty much every article I've read on HT, starting quite far back.
HT is just another chip technology like any other. It is only in the rarest circumstances that a new technology will be better/faster for everything. These things all have tradeoffs and the question is whether the benefits are enough to exceed the disadvantages.
I really think you are being a little unfair to Intel. If you had evidence that it decreased performance for most systems even when the software was compiled taking HT into account, then you might have a point. However, as it is, this is no different than IBM touting its RISC technology or AMD talking about their SIMD capabilities. For each of these technologies you could find some code which would actually run slower. If you happen to be running code which makes heavy use of some hardware-optimized string instructions, a RISC system can actually make things worse, not to mention a whole host of other issues. The SIMD capabilities of most x86 processors required switching the FPU state, which took time as well.
It's only reasonable that companies want to publicize their newest fancy technology, and they are hardly unsavory because they don't put the potential disadvantages centrally in their advertisements/PR material. When you go on a first date, do you tell the girl about your loud snoring, how you cheated on your ex, or other bad qualities about yourself? Of course not. One doesn't lie about these things, but it is only natural to want to put the best face forward, and it seems ridiculous to hold Intel to a higher standard than an individual in these matters.
If you liked this thought maybe you would find my blog nice too:
Usual response is to disable it from the BIOS.
One possible solution (code patch):
http://sourceforge.net/mailarchive/message.php?msg_id=12403341
Other threads with hyperthreading problems (slowdowns):
http://sourceforge.net/search/?forum_id=6330&group_id=9028&words=hyperthreading&type_of_search=mlists
developer http://flamerobin.org
The article seems to focus only on Windows. To get good performance from hyperthreading, the scheduler has to be aware of situations that could lead to decreased performance and avoid them. So is this a problem with the Windows scheduler being unable to deal with hyperthreading or is hyperthreading really broken? How is hyperthreading performance on other operating systems?
Another question one needs to ask is, how is performance on single and dual CPU systems? Getting good performance on a dual CPU HT system (which means four logical CPUs) is more complicated and thus requires more sophisticated algorithms in the scheduler.
Applications are most likely not to blame for the decreased performance. Such hardware differences should be dealt with by the kernel. Occasionally the scheduler should keep one thread idle whenever that leads to the best performance. Only when there is a performance benefit should both threads be used at the same time.
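Where the scheduler does not do this on its own, the "keep one sibling idle" idea can be roughly approximated from user space on Linux. The sketch below picks one logical CPU per physical core from the sysfs `thread_siblings_list` files. The helper names are made up for illustration, and it is only a stand-in for what the comment says the kernel should be doing.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(siblings_by_cpu):
    """Pick the lowest-numbered logical CPU from each group of HT siblings.

    siblings_by_cpu maps a logical CPU id to the text of its
    thread_siblings_list file.
    """
    chosen = {parse_cpu_list(text)[0] for text in siblings_by_cpu.values()}
    return sorted(chosen)


def read_siblings(base="/sys/devices/system/cpu"):
    """Read thread_siblings_list for each CPU this process may use (Linux only)."""
    result = {}
    for cpu in os.sched_getaffinity(0):
        path = f"{base}/cpu{cpu}/topology/thread_siblings_list"
        with open(path) as handle:
            result[cpu] = handle.read()
    return result


def pin_to_physical_cores():
    """Restrict the current process to one logical CPU per core (Linux only)."""
    os.sched_setaffinity(0, one_cpu_per_core(read_siblings()))
```

Calling `pin_to_physical_cores()` at startup leaves the second logical CPU of each core unused by this process. Whether that helps or hurts is something you would have to measure for your own workload.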
Do you care about the security of your wireless mouse?
I second the person who said programmers shouldn't be writing code to the cache size of a processor. How well your code fits in cache is not something you can control at run time. Different releases of the CPU often have different cache sizes. And frankly, developers should always try to achieve tight, efficient code, not develop to a particular cache size.
Think Deeply.
I have had an ATI All-in-Wonder 9800 for about a year now. I never really used the tuner part until a few weeks ago, when I took delivery of several new LCDs and decided that I could be watching a little TV on one while working.
The 9800 sits in my XP box, which rarely gets rebooted: games, browsing, etc. My Mac mini and Linux boxes sit in their places behind a KVM.
Well, after I started using the tuner part, it looked great with my digital cable. But the box would lock up, and I couldn't kill the process of the ATI MMC software. Sometimes it happened a few times an hour, and at least once a day. I was on the point of sticking an old Hauppauge in there, or using another MMC.
After much digging I found a thread on how HT could cause issues with the software. I disabled it in the BIOS, since I do not really need it for anything, and ran the tuner for 48 hours solid without a lockup.
Now perhaps ATI is at fault for the software, but then again HT caused the incompatibility in my book.
Puto
The Revolution Will Not Be Televised
I know asking for them to research is a stretch, but the submitter should at least read the article before submitting it. The quote was from a Technical Director at a consulting company that sells Citrix software, not from a developer at Citrix. Hyperthreading can definitely help performance of Metaframe running under Windows 2003. Enabling it in the BIOS on a server running Windows 2000 was where the problem resided.
I don't know about you guys, but I run many Linux servers. I have a mix of CPUs, and the HT servers seem to perform better than the non-HT servers. Is Linux better optimized for HT?
-- these are only opinions and they might not be mine.
As far as I can tell, everything that hyperthreading was designed around was the idea that two dissimilar threads would run at the same time, for example, an I/O bound thread with a FPU-bound thread, or the like.
Running two identical threads on the same processor intuitively seems like it would result in a slowdown, as you've got more overhead than the thread running alone, with the same tasks being executed.
Kinda like trying to toast bread by putting one piece in, then rapidly taking it out and putting a different one in, repeat as needed, vs having two separate slots, or just toasting one at a time.
Hyperthreading is a gimmick to keep Intel's overly long pipeline busy. At one point the wisdom was for processors to have long instruction pipelines. The problem arises when branch prediction fails and trashes your pipeline. AMD saw that the long pipelines were harmful and shortened them on the Athlon line. The rest is history.
As far as I'm concerned, the fiasco of P4 being far worse than P3, and the apparent inability to do a turnabout, means Intel is a broken company. They should have just tacked the new P4 instructions on a P-M core and called it the P5. Oh wait...
"I read the intel assembly guide section regarding hyperthreading, and it clearly states that performance will drop if you don't take the shared cache into consideration." This is a general problem. The Xbox 360 has similar issues, with 3 cores sharing the same cache. Having multiple independent CPUs, each with its own local memory (like a multiprocessor system or the PS3's SPUs), doesn't suffer from these issues.
HT is a very simple concept: virtualize 2 CPUs by cutting all caches in half and allocating each half to one of the CPUs, and allow the ALUs to process data from either thread. This can give good performance, for instance when one thread has a cache miss and is waiting for data from main memory (or, god forbid, there is a fault and you need to read from the HDD). In normal single-CPU operation, this ties up resources, and that thread can't make any progress. With HT on, the second thread can continue processing data. Even without a cache miss, there are 4 (or more) ALUs on the die, and only certain types of applications can effectively make use of them all simultaneously. Having HT allows a higher probability that all the resources on the chip are used. But the cost, as I said above, is effectively cutting the cache sizes in half. And cache is king for some applications. There are many job types where doubling the cache gives much better performance than even doubling the CPU speed (well, that is probably pushing it, but certainly adding 10% more cache can be better than a 10% higher clock rate), as it means less time going to main memory.
It isn't a foolproof technology, but it has its benefits. SQL can be very heavy on the cache, and I'm not surprised that it doesn't perform optimally without some tuning.
"MS SQL was designed and likely largely tested in a single processor system and multiprocessor or HT support is somewhat less than optimal. So MS SQL is likely best tuned to single processor."
Where did you get this wallop of information? It is not true. MS SQL Server performs very well in multiprocessor environments (not using Hyperthreading). Check out the TPC benchmarks if you don't believe me: http://www.tpc.org/
No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
Usually not, no.
Best would - of course - be to perform your own test, but enabling HT on desktops usually improves the multi-app flow and reduces the cases of boxes "locking" with one application eating all the resources.
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
What was the difference when you tried ?
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Is it two complete cores? Front side bus speed? Memory speed? Etc.
The IBM 970MP that Apple is using for the dual-core PowerMacs was designed right. And due to the cache snooping (among other things), a dual-core 970MP can be slightly faster than a dual-processor setup at the same clock and bus speeds.
Another multicore chip to look at for being done right is the Sun UltraSPARC T1 processor. Up to 8 cores with 4 threads per core. Sun's threading model in this processor doesn't have the faults that Intel's HyperThreading does.
Intel's HT technology seems like a bad patch on the architecture, much like Microsoft's updates to Windows.
Besides the cache considerations which were discussed by numerous people here, there is one aspect that hasn't been mentioned.
The reason hyperthreading was introduced in the first place was to reduce the "idle" time of the processor. The Pentium 4 class processors have an extremely long pipeline, and this often leads to pipeline stalls, e.g. when the processing of an instruction cannot proceed because it depends on the result of a previous instruction. The idea of hyperthreading is that whenever there is a potential pipeline stall, the processor switches to the other thread, which hopefully can continue its execution because it isn't stalled by some dependency. Now, most pipeline stalls occur when the code being executed isn't optimized for Pentium 4 class processors. So the better your code is optimized for the Pentium 4, the fewer pipeline stalls you have, the better your CPU utilisation is with a single thread, and the less there is left for hyperthreading to gain.
Marcel
Actually, the big hitters today typically use IBM mainframe technology. Machines so fault-tolerant that they can lose CPU and memory cards and keep right on running, and end up with uptimes measured in years.
Sun equipment is a bad joke compared to IBM iron. Some banks and big firms have been using the same software for decades; once you get something debugged to the point that it never crashes, and your needs don't vary too much (finance is a pretty well-understood field), you just want it to work. Period.
-Z
I remember early discussions from LKML where developers realized that if you were to run a high-priority thread on one virtual processor and a low-priority thread on the other VP, you'd have a priority imbalance and a situation that you'd want to avoid. The developers solved the problem by adding a tunable parameter that indicated the assumed amount of "extra" performance you could get out of the CPU from HT. In other words, with 1 CPU, max load is 100%; with two physical CPUs, max load is 200%; with one HT CPU, max load would be set to something on the order of 115% to 130%. So, when your hi-pri thread is running and the lo-pri thread wants to run, we let the lo-pri thread run only 15% of the time (or something like that), resulting in only a modest impact on the hi-pri thread but an improvement in overall system throughput.
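The arithmetic behind that tunable can be sketched as a toy model. To be clear, this is not the actual kernel code: the function names and the 115% default are assumptions that simply mirror the numbers in the description above.

```python
def max_load_percent(physical_cpus, ht_enabled=False, ht_capacity_percent=115):
    """Total load the scheduler assumes the machine can carry, in percent."""
    per_cpu = ht_capacity_percent if ht_enabled else 100
    return physical_cpus * per_cpu


def low_priority_share_percent(ht_capacity_percent=115):
    """Share of time a low-priority thread may occupy the sibling logical CPU.

    Toy rule: the sibling gets only the "extra" capacity that HT is
    assumed to add on top of one full CPU.
    """
    if ht_capacity_percent < 100:
        raise ValueError("an HT core is assumed to be worth at least one CPU")
    return min(ht_capacity_percent - 100, 100)


if __name__ == "__main__":
    print(max_load_percent(1))             # 100: one plain CPU
    print(max_load_percent(2))             # 200: two physical CPUs
    print(max_load_percent(1, True, 130))  # 130: one HT CPU, optimistic setting
    print(low_priority_share_percent())    # 15: lo-pri thread runs 15% of the time
```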
That being said, I infer from the article that Windows does not do any such priority fairness checking. Consider the example they gave in the article. The DB is running, and then some disk-cache cleaner process comes along and competes for CPU cache. If the OS were SMART, it would recognize that the system task is of a MUCH lower priority and either not run it or only run it for a small portion of the time.
As said by others commenting on this article, the complainers are being stupid for two reasons. One, Intel already admitted that there are lots of cases where HT can hurt performance, so shut up. And Two, there are ways to ameliorate the problem in the OS, but since Windows isn't doing it, they should be complaining to Microsoft, not misdirecting the blame at Intel, so shut up.
(Note that I don't like Intel too terribly much either. Hey, we all hate Microsoft, but when someone is an idiot and blames them for something they're not responsible for, it doesn't help anyone.)
I never accept assertions that a configuration option like HyperThreading is always good or always bad. It's never black and white. The answer is always: it depends on the application. In my experience, a busy Linux Java-based web serving application that does a lot of context switching and a lot of I/O to back-end applications uses less CPU when hyperthreading is enabled. Collective wisdom aside, it works for my application, so I am leaving it on.
While HT degrades faster than two CPU systems for reason of contention of more components than just I/O and memory, if properly programmed it will add to throughput.
Given that MS SQL isn't exactly a rare piece of software, what fraction of software will actually take advantage of hyperthreading? It's sort of the Itanium argument all over again: who cares how wonderful the architecture is if no software is able to use it well? If I were building server software, my primary performance metrics would be single-core (no HT/dual-core) and multi-CPU (SMP) benchmarks, and HT/dual-core performance would mostly be whatever the SMP code delivers. Dual-core seems to handle SMP code quite well, so specifically targeting HT CPUs seems like a really niche target, considering the limited gain it has even under the best of conditions.
Live today, because you never know what tomorrow brings
I thought you couldn't report any performance issues of MS SQL Server :)
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Certain applications take a big hit in performance with HT turned on. It's not just server apps. I don't know the specific class of problems, but some of our software has been benchmarked running faster with HT off.
"Of course a database server isn't going to take advantage of a hyperthreaded CPU. It doesn't do any FPU at all."
Actually MS SQL Server and Sybase ASE do use the FPU. I'm not sure about Oracle though.
Jason L. Froebe
Team Sybase
No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
Hyperthreading Speeds Linux.
In a nutshell:
- hyperthreading decreases syscall speed by a few percent
- on single-threaded workloads, the effect is often negligible, with occasional large improvements or degradations
- on multithreaded workloads, around 30% improvement is common
- Linux 2.5 (which introduced HT-awareness) performs significantly better than Linux 2.4
So, from that benchmark (and others like it, just STFW) it appears that HT offers significant benefits; you need multithreading to take advantage of it, and having a HT-aware OS helps.
Please correct me if I got my facts wrong.
We run RH AS2.1 on most machines right now, and hyperthreading is disabled (under any kernel) because of this performance hit; it can grind a heavily active database into a big backlog.
So far it looks like AS3 and a newer kernel resolves the issue - but we don't have a big spread of those servers in the DC just yet so may not be a good sampling of HT enabled instances.
MS SQL was designed and likely largely tested in a single processor system and multiprocessor or HT support is somewhat less than optimal. So MS SQL is likely best tuned to single processor.
Are you high, or are you just in the habit of randomly making up nonsensical stuff? While we're at it, which morons modded that post to +4 Insightful? Do you really think that Microsoft would design and target their database server platform for use in only single-CPU servers? Database applications are always CPU intensive, and Microsoft, Oracle, and other vendors of database software spend ridiculous amounts of time optimizing their software to be heavily multi-threaded exactly so that it will perform well on multiple-CPU systems.
Every couple of months either there's a new press release from Microsoft or Oracle indicating that hardware vendor X has set a new record for the highest TPC marks on database processing by using some new multi-CPU configuration and their software. Do you really think that Microsoft could compete in those conditions if they only wrote SQL server for single CPU configurations?
you do know that windows has priority levels too ?
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
morcego
MS SQL was designed and likely largely tested in a single processor system and multiprocessor or HT support is somewhat less than optimal.
;-)
lol, with an outrageous claim like this for dedicated server software, you really need to provide an unbiased source.
Beware: In C++, your friends can see your privates!
``How well your code fits in cache is not something you can control at run time.''
You most certainly can, and the speed gains can be significant. One way to do it:
- write a version of your code optimized for 256 KB cache
- write a version of your code optimized for 512 KB cache
Use the contents of /proc/cpuinfo to see how much cache you really have, and choose the version of your code to run based on that.
I'm sure there are better ways, but this is just proof that it's possible. Whether or not it's advisable depends on the situation.
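For what it's worth, the /proc/cpuinfo approach could look something like the sketch below. It assumes the x86 Linux "cache size : N KB" line format, and the two `run_tuned_for_*` functions are hypothetical placeholders for the two builds of your code.

```python
import re

# Matches lines such as "cache size\t: 512 KB" in /proc/cpuinfo.
CACHE_LINE = re.compile(r"^cache size\s*:\s*(\d+)\s*KB", re.MULTILINE)


def parse_cache_kb(cpuinfo_text):
    """Return the first "cache size" value in KB, or None if absent."""
    match = CACHE_LINE.search(cpuinfo_text)
    return int(match.group(1)) if match else None


def read_cache_kb(path="/proc/cpuinfo"):
    """Read the cache size from /proc/cpuinfo; None if it is unavailable."""
    try:
        with open(path) as handle:
            return parse_cache_kb(handle.read())
    except OSError:
        return None


def run_tuned_for_256k(data):
    """Placeholder for the version optimized for a 256 KB cache."""
    return "256k", sum(data)


def run_tuned_for_512k(data):
    """Placeholder for the version optimized for a 512 KB cache."""
    return "512k", sum(data)


def pick_variant(cache_kb):
    """Choose the 512 KB version only when that much cache is reported."""
    if cache_kb is not None and cache_kb >= 512:
        return run_tuned_for_512k
    return run_tuned_for_256k


if __name__ == "__main__":
    variant = pick_variant(read_cache_kb())
    print(variant([1, 2, 3]))
```

Falling back to the smaller version when the size is unknown is the conservative choice: code tuned for 256 KB still fits in a bigger cache, but not the other way around.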
Please correct me if I got my facts wrong.
I don't have a HT-capable proc (AMD Athlon XP 1700), so I don't know anything from personal experience.
I decided to check out how PostgreSQL did with HT.
The first link (1) was suggesting to someone--who was having performance problems under FreeBSD--to turn off HT. Of course, that may not be related to PostgreSQL itself, but rather FreeBSD. I really don't know.
The next thing I found showed some mixed results with ext2 under Linux (2). Some things showed a gain with HT, but not others.
Another link (3) commented that HT with Java requires special consideration when coding.
I didn't come up with anything useful under PostgreSQL, so I checked out Linux.
According to Linux Electrons, Linux performance can drop without proper setup.
// file: mice.h
#include "frickin_lasers.h"
I use Nuendo for professional music recording, and even though their latest version says it's HT aware, the performance is poor. In fact, in several instances it only takes a few loaded instruments for it to peak the CPU; change it back to basic CPU with HT off and it works fine.
My understanding is that it's this way with Cubase as well.
"If any question why we died, Tell them because our fathers lied."
So all these Xeons around the farm are laptop CPUs or something?
Dewey, what part of this looks like authorities should be involved?
Why? Because CPU-intensive applications can't help but work better under dual-core systems. Even if MSSQL server was inexplicably designed for a single CPU, you'd end up with it running on one CPU and the OS and everything else on the other, and it would, indeed, be faster.
Under hyperthreading, of course, that doesn't work at all.
I always thought the idea of hyperthreading was a little dodgy, but I didn't know enough about CPU design to prove it. I'm glad other people are saying the same thing. It always seemed like it would be faster to introduce a new instruction that's basically 'save and restore context', and then rewrite the process scheduler to use it instead of doing that manually, than to do it how Intel did it, which is to switch back and forth between two specific contexts outside of software control.
If corporations are people, aren't stockholders guilty of slavery?
AMD is the way to go.
No, if you care about fault-tolerance and extreme availability, you go for HP Non-Stop servers (Tandem/NSK)... /G
Really? I guess some AC knows more than the Intel reps that were at last week's SQL 2005 launch. Every other word out of the reps' mouths was about hyper-threading, and they were talking to SQL users.
It's sort of the Itanium argument all over again, who cares how wonderful the architecture is if no software is able to use it well?
A very insightful comment! I'd mod you +1 Insightful if I had the points.
w00t
Can anyone explain to me the exact difference between HT and CMT? I'm wondering if these same issues would plague Sun's new Niagara processor.
What's a carburetor? Or a Yugo, for that matter?
-1 for old and moldy as well as nonsensical.
Yes, definitely.
Along with the rest of the machine.
emt 377 emt 4
Where did you get this wallop of information? It is not true. MS SQL Server performs very well in multiprocessor environments (not using Hyperthreading). Check out the TPC benchmarks if you don't believe me: http://www.tpc.org/
Wow, this post sure attracted a lot of flame bait from M$ 'n FUD crew.
Read the original post, "and likely largely tested in a single processor system".
I don't think Microsoft gave its developers the $5.8M USD machine in the #4 www.tpc.org spot, one you can't even buy yet, to develop MS SQL. It was more likely developed on a single-processor PC and only later tested on the bigger iron.
Instead of looking at the www.tpc.org site, where vendors post systems you can buy, why not look at what organizations are really buying?
http://www.top500.org/lists/2005/11/
There must be some reason that Microsoft is consistently excluded completely from the top 10 by *real* world purchases. I didn't check to see how far down the list you have to go to see a Microsoft product. I guess those Dells run Linux nicely.
Go ahead M$ pundits, mod this down too. After all it is the M$ way. You don't like the facts so you FUD it and mod it down.
Anyone have any links to any test reports?
Not terribly scientific, but when I ran seti 3.x on my HT box with Linux I got the following results:
1 seti at a time ran in about 4 hours for 6 units per day.
2 instances of seti at a time was about 5.2 hours per unit at 9.2 seti units per day.
So I ran 2 seti instances to get the throughput as I was after the work unit count.
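The trade-off in those numbers (throughput versus time per unit) is easy to check. A small sketch, with function names of my own choosing:

```python
def units_per_day(instances, hours_per_unit):
    """Throughput: work units finished per 24 hours across all instances."""
    return instances * 24 / hours_per_unit


def compare(single_hours, dual_hours):
    """Return (throughput gain, per-unit slowdown) of two instances vs one."""
    gain = units_per_day(2, dual_hours) / units_per_day(1, single_hours)
    slowdown = dual_hours / single_hours
    return gain, slowdown


if __name__ == "__main__":
    gain, slowdown = compare(4.0, 5.2)
    print(f"1 instance:  {units_per_day(1, 4.0):.1f} units/day")
    print(f"2 instances: {units_per_day(2, 5.2):.1f} units/day")
    print(f"throughput x{gain:.2f}, each unit takes x{slowdown:.2f} as long")
```

With the figures above, two instances give roughly 54% more units per day, while each individual unit takes about 30% longer to finish.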
Unfortunately Windows looks at an HT CPU as if it had multiple cores (true SMP). If Microsoft would change the Windows Scheduler to properly treat an HT CPU by adjusting the way it distributes threads and processes to the two virtual CPUs, then there should be a performance gain and no penalty.
--
http://www.gloryhoundz.com/
JIT code can be cached persistently so that startup costs are only paid once. AS400 does this sweetly. And JIT doesn't add significantly to memory footprint (there is a fixed overhead - think about Transmeta), but certain types of garbage collection do - the fast ones (e.g. generational). When it comes to memory management, you can make it small, fast, automatic - pick any two.
I may agree that HyperThreading as implemented in the x86 architecture is a hack, but I wouldn't dismiss the original idea of HT, as implemented in the Tera supercomputers. It was designed to have hundreds of thread contexts in hardware, so if it has to wait on memory, there will be some other thread available to run. There are enough threads available that it can do without a cache, while utilising the full memory bandwidth. This quite neatly avoids cache consistency problems that can kill massively parallel performance.
a.
Are you high, or are you just in the habit of randomly making up nonsensical stuff?
No, not high. Just willing to take pro-M$ flame bait today.
I guess I overestimated the intelligence of the /. readership, especially those from the PC world.
The fact is, if you are writing software to be efficient on a single processor the architecture of the software will be much different than if you know you have 32 processors. And neither is best for the other.
For single-processor speed you don't want the overhead of interprocess communication, so you can skip it and sequentially do what you need to do without worrying about what the other processes are doing. In fact, this is how most programs operate, as the coding is much easier.
For multiprocessor systems you want to distribute as evenly as possible the work across as many processors and I/O busses as you have. It is worth the effort of code, threads, interprocess communications layer with mutex, locks and individual disk writers. But this model would run slower on a single CPU.
The HT model isn't dual CPU in performance but does allow for 2 threads on the system to be active at once, at the expense of individual thread performance. Do we want single process speed or throughput? Example:
Classic seti 3.x on Fedora Linux.
- with 1 seti running takes 4 hours
- with 2 seti running each takes 5.2 hours
So if I want the fastest seti I want to run 1. If I want the most seti I want to run 2 to keep the processor busy to maximum performance.
And MS SQL, like it or not, will have its ups and downs depending on how it was architected.
I don't entirely agree. AMD's multi-core architecture was targeted from the start toward servers, and it's quite fair to say servers benefit from it. With HT, servers are not benefiting from it in their 'server' capacity. That is, if the server is loaded (which is the job of a server: to be loaded), then the HT benefit is reduced.
Basically HT provides greater responsiveness rather than performance, so it should be targeted toward desktops, not servers.
It's OK to have drawbacks, but they should not be in the thing you are designing for, or rather advertising for.
Why hyperthreading only? Performance can drop even on new dual-core processors, as they share the L2 cache. Dual core cannot increase single-thread performance if the thread is memory bound (though it suffers less severely than HT). Hyperthreading was meant to increase processor throughput, but it will work only if the program has a decent cache footprint.
They called me mad, and I called them mad, and damn them, they outvoted me. -Nathaniel Lee
That is the most uninformed and dumbass thing I've seen written on Slashdot in a while, and that's saying something.
http://www.tpc.org/tpcc/results/tpcc_perf_results.asp
Check the www.tpc.org top 10 list. At #8, from way back in 2003, is a 64 way HP Itanium system running SQL Server. Everything else in the top 10 is more recent.
I really wish slashdot had a -1 (Idiot) mod.
- Think of it as evolution in action -
Not at all. One of the big problems with HyperThreading as Intel has designed it is that they did not provide sufficient memory bandwidth to be able to feed both threads. This problem also plagues Intel's "dual core" chips. Ultimately, it eliminates the supposed benefit of switching over to the other thread when the first is blocked on memory access because as soon as the second thread needs information from memory it will actually slow things down. Also, even if there were sufficient memory bandwidth, the comparatively long fetch times would still mean that the CPU would be blocked parts of the time waiting on memory because there are only two threads available. Finally, there is a cost to switching between the threads, so even if you had the memory bandwidth and enough threads to prevent idle time it would still lose time because of the overhead in switching to another thread whenever the active one gets blocked.
On the other hand, the new UltraSPARC T1 (aka, Niagara) has massive memory bandwidth, shorter fetch times, four threads per core rather than two, and zero penalty for switching between threads. The result is incredible throughput with a total of 32 hardware threads (8 cores with four threads per core) in a single chip. And by the way, that single chip draws a fraction of the power and generates much less heat than a single Intel HT processor (I swear it seems like the systems are blowing out cool air).
Note, however, that the T1 chip may not be ideal for all workloads. It does have a relatively slow single-threaded performance, so it works best when running highly concurrent applications with minimal locking, or when running several applications concurrently. For some applications, it may be desirable to use processor sets to limit the set of threads that it can see and/or to run multiple copies concurrently and load balance across them. But for others that are designed to scale well (e.g., those that already run well on larger systems like the E6800 with 24 UltraSPARC-III or the E2900 with 12 dual-core UltraSPARC-IV chips), then they can take full advantage of the available processing power.
For the tests that I've run with an application that does scale, a system with a single UltraSPARC T1 chip easily doubles up the performance of a system with two 3.2GHz HT Xeons (regardless of whether HT was enabled or disabled). Of course, I haven't been lucky enough to test with the officially shipping version of the T1 chip (the ones I've been able to use have been running at a slower clock rate, and some of them have had some of the cores disabled), so that performance gap may actually be larger than I have been able to measure so far.
>> CPU usage increases significantly but SQL Server performance degrades
That's called "saturation". Happens to every piece of server software. There is ALWAYS a point where "requests per second" start going down and latencies begin to go up. And from there it usually goes WAY downhill unless you take the load off or reduce it significantly to let the software catch up and recover.
God, I hate when developers are allowed to do perf testing. They test a simple scenario without full understanding of what's going on and make wild conclusions from it to get "visibility" which at large companies like MSFT often leads to promotion. Then they go ahead and solve a "problem" which doesn't exist.
This is not to say that HT doesn't degrade performance. I've heard that from Intel folks themselves that in some scenarios it does. But when a "developer" does perf testing, I take that with a three pound grain of salt.
Is this just Windows, or did they test other x86 OSes as well? I.e., could it be a problem with the OS itself and not hyperthreading?
Only 'flamers' flame!
Does slashdot hate my posts?
Roughly the same thing in theory. The difference is that Sun applications are typically compiled using Sun compilers, Sun hardware doesn't suck, and the Sun compiler actually knows when and how to make use of the threading benefits.
;)
So no, you shouldn't have problems at all with the Niagara proc unless you do something stupid like shove Linux on it.
Well.
All I've seen is better performance with HT and kernel 2.6.
Maybe that's because 2.6 is so much better, maybe it's ALSO because of the HT. We will never know.
NO SIG
You'd think, wouldn't you, that HT would cut in half or more the very expensive (in cycles) context switching involved in moving to a new thread or handling an interrupt. This is in addition to giving the processor something to do while the other thread is stalled on latency to main memory. Strange to see it go the other way instead.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Software shouldn't be expected to handle hardware quirks. It's up to the hardware to run the software efficiently.
Seems to me a hardware fix would be to partition the cache into two pieces when HT is enabled and running -- use the whole cache for the processor otherwise.
With 2MB caches per processor now becoming available, would this be such a bad thing? IIRC once you're up to 256KB of cache you've already got a hit rate near 90%. That severely limits your possible improvement to less than 10% regardless of how much more cache you add. And yes I am aware that increasing the processor multiplier does make every cache miss worse in proportion, but still having HT run more efficiently in the bargain could make this tradeoff worth it. And that's even before you consider uneven partitioning if the OS can determine that one thread needs more cache than the other.
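A back-of-envelope way to look at that tradeoff is the usual average-access-time formula. The sketch below is Python; the 90% hit rate comes from the paragraph above, while the latencies and the 88% hit rate for a halved cache are invented purely for illustration.

```python
def avg_access_cycles(hit_rate, hit_cycles, miss_cycles):
    """Average cost of a memory access: every access pays hit_cycles,
    and the fraction that misses pays miss_cycles on top."""
    return hit_cycles + (1.0 - hit_rate) * miss_cycles

# Assumed numbers, not measurements: 10-cycle cache hit, 200-cycle trip to memory.
whole_cache = avg_access_cycles(0.90, hit_cycles=10, miss_cycles=200)
half_cache = avg_access_cycles(0.88, hit_cycles=10, miss_cycles=200)

print(f"whole cache: {whole_cache:.1f} cycles per access")
print(f"half cache:  {half_cache:.1f} cycles per access")
print(f"slowdown:    {half_cache / whole_cache - 1.0:.1%}")
```

With these made-up figures, losing two points of hit rate costs about 13% in average access time, so how much partitioning hurts depends heavily on how expensive a miss is.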
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
is still a kludge. HT was a cheap hack to get extra performance under certain scenarios. Looks like they're getting called out for it. Dual-core is the right answer, HT wasn't.
Quack, quack.
(erons). But the price makes them a hard sell. I'll definitely be keeping my eye on these things, as soon as the price points start to line up. I want to see AMD succeed in the server market, but for now (aside from Sun and a few HP systems) Xeon is still the dominant player.
Quack, quack.
I have two identical high-end dual-CPU desktops, both with HT enabled, sitting on my desk. One runs Win XP, the other a 64-bit Linux. The thing I observe every day is how much the Windows scheduler sucks. I don't know how long MSFT's marketing dept. has known about HT, but their OS definitely doesn't know about it yet (start an update in Subversion or a compilation in VC, then go drink some coffee, as the computer is unusable). On Linux, on the other hand, HT really improves both responsiveness and throughput. I'm waiting to test a quad- dual-core box with HT enabled ;)
The parent post is common sense, which seems infrequent. I have found the range to be quite wide: when rendering animations from Blender, I have found that hyperthreading results in nearly 70% faster throughput when turned on. For rendering MPEG2 using Tmpgenc (under Wine), I see around 40% improvement with HT on. Clearly, these two applications benefit quite a bit from HT due to small computational footprint and/or low cache contention, etc. On the other hand, on my system, on-screen 3D acceleration in the NVIDIA driver (under Linux at least) appears to suffer with HT, with frame rates that are around 10-20% slower than with HT disabled.
So, I see improvements ranging from -20% to +70% depending on the application, with many applications seeing only small differences one way or the other. Like many things, this tends to turn into a religious debate when the fact is that it varies case-by-case.
If you just make up crap, why don't you even make it believable? You sir are simply a troll.
No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
First, it doesn't matter if the server uses threads or processes. Threads have a minor performance advantage for startup and context switching, and some disadvantages for memory allocation speed (finding VM space is a hashing problem) and some locking overhead. For the most part though, with tasks that just crunch numbers (including scanning memory) or make system calls, there isn't all that much difference.
Running 2 threads per CPU is not cheating. It's normal to run 1 thread per CPU plus 1 thread per concurrent blocking IO operation. That could come out to be 2 threads per CPU.
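As a rough sketch of that sizing rule in Python (the helper name `pool_size` is made up here, and note that `os.cpu_count()` reports logical CPUs, so HT siblings are counted as CPUs):

```python
import os

def pool_size(blocking_io_ops, cpus=None):
    """One thread per CPU plus one per concurrent blocking I/O operation."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # cpu_count() can return None
    return cpus + blocking_io_ops

# Example: a 2-CPU box that typically has 2 blocking reads in flight.
print(pool_size(2, cpus=2))
```

In that example the rule gives 4 threads on 2 CPUs, which is the "2 threads per CPU" case.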
Twice the ALU power and half the power.
;-)
That's not a hard sell. If you're doing number crunching of any kind in a professional setting an AMDx2 or opt will pay for itself quickly.
Oh that and you're not funding the never ending chain of stupidity that is the P4 design team
Tom
Someday, I'll have a real sig.
When you get down to that level, even accessing a variable counts as "I/O". ;-)
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Of course I agree. Unfortunately the person ultimately responsible for deciding (in my case in favor of the cheaper server line that will get the job done) has to weigh the pros and the cons included in the broader picture. This year will probably be the first year we end with a profit, the right hardware wouldn't have made that possible.
Still your long-term argument holds, but try to explain that to your investors and you can see how it starts to get a little thornier.
Quack, quack.
Also, I think it's amusing that they talk about performance degradation like it's some awful thing, considering their example software is all blue-light-special junk. Complaining about HT slowing down this particular server is like complaining that a certain brand of gasoline makes your Yugo run slower. If you care about speed, you probably don't drive a Yugo, and if you care about computers, you probably don't use SQL Server or Citrix.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
just needs an excuse for writing that sorry-ass software (which they, like any other touted MS product, bought way back in '91)!
Scott McNealy to Michael: "Suck my Sun!" Michael Dell to Scott : "Lick my Dell!"
To all those people thinking that HT multiplies CPU performance by 2, here is a very simple experiment that will prove you wrong. Define those functions in your shell (BSD, Linux, Cygwin, whatever):
Then, on an HT enabled box, benchmark the function running a single CPU intensive process:
It took 5.2 seconds. Now do it with the function running two CPU intensive processes:
It takes twice the time (10.4 secs). If HT really offered twice the performance, it would have taken the same amount of time (5.2 secs) because HT would have run the 2 processes on the 2 "independent" CPUs, but as you can see this is not the case. The explanation is that the execution units are shared between the 2 logical CPUs. Whereas on this dual Opteron 244 box I happen to have in front of me, both benchmarks give nearly the same numbers: p1 = 4.4 secs and p2 = 4.5 secs, because on a true SMP (or dual-core) box, the 2 CPUs are obviously independent and don't share their execution units. As is correctly pointed out by other people, HT is a way to reduce the impact of pipeline stalls and execution unit under-utilisation; it is not a way to magically "multiply" raw CPU performance by 2.
I've personally seen that HT technology can kill performance on Novell NetWare 6.5 on very high-end servers. (Performance increased more than tenfold when HT was disabled.)
It is incorrect, IMHO, that Hyperthreading and increased performance are related, at least on server platforms. Unless the server applications are conceived with multithreading in their design, it would be incorrect to assume that Intel HT would somehow figure out the assembly code and facilitate performance.
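For anyone who wants to repeat that kind of experiment, here is a rough Python take on the same idea: time one CPU-bound process, then two started together. The names `BURN`, `timed_run` and `slowdown` and the loop size are hypothetical, and interpreter startup time is included in both measurements.

```python
import subprocess
import sys
import time

# CPU-bound busy loop executed by each child interpreter (arbitrary workload).
BURN = "x = 0\nfor i in range({n}):\n    x += i * i\n"

def timed_run(procs, n=2_000_000):
    """Start `procs` CPU-bound children together; return wall-clock seconds until all exit."""
    code = BURN.format(n=n)
    start = time.perf_counter()
    children = [subprocess.Popen([sys.executable, "-c", code]) for _ in range(procs)]
    for child in children:
        child.wait()
    return time.perf_counter() - start

def slowdown(n=2_000_000):
    """Return (time for 1 process, time for 2 processes, ratio of the two)."""
    one = timed_run(1, n)
    two = timed_run(2, n)
    return one, two, two / one

one, two, ratio = slowdown(200_000)
print(f"1 process: {one:.2f}s, 2 processes: {two:.2f}s, ratio: {ratio:.2f}")
```

A ratio near 2 is what the single HT processor above showed, and a ratio near 1 is what the dual Opteron showed. On a machine with more than one physical core, the two children would also have to be pinned to the two logical CPUs of the same core (for example with `taskset` on Linux) for the comparison to say anything about HT.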
Of course I know that. It just seemed to me that if Windows had a way to decide not to use both virtual processors because the only two runnable threads were of different priorities (or run the lower-pri thread only part of the time), then the people complaining would never have noticed a problem.