Hyper-Threading Explained And Benchmarked
John Martin writes "2CPU.com has posted an updated article about Hyper-threading performance. They discuss the technology behind it, provide benchmarks, and make observations on what the future holds for hyper-threading. It's actually an easy, interesting read.
Of note, they'll be publishing Part II in the near future which will detail hyper-threading performance under Linux 2.6. Hardware geeks will probably appreciate this."
There was an interesting discussion on the Plan9 newsgroup about hyperthreading recently, read here
For those more technically inclined I would suggest reading Intel's Hyper-Threading Technology Architecture and Microarchitecture whitepaper instead.
I hate to say it, but your logic is flawed.
To put hyperthreading into your car analogy:
Hyperthreading is like a car that has power assisted steering. If you want, you can switch it off; you'll likely have a slightly smoother time with it on. But if you want the control (or don't trust it) then you can switch it off.
For the geek who reads posts as a stack of strings delimited by <br>, Nobody's forcing you to use hyperthreading. Use it, don't use it. Don't complain that it's a Bad Thing[tm] simply because you're being given the choice
Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"
"they'll be publishing Part II in the near future"
Part II should've been published concurrently, using idle time... tch!
If you are really interested in the how and why of hypertreading in suggest you read trough the lecture notes of Computer System Architecture at MIT OpenCourseWare. This gives you enough background to race trough all the articles at Ars Techica et al.
karma police: arrest this man, he talks in maths; he buzzes like a fridge, he's like a detuned radio. [radiohead]
Perhaps I'm feeding a troll here, but....
64 bits, while not interesting in and of itself, is interesting in AMD's implementation. I have an UltraSparc sitting on my desk at work, and I assure you it's one of the most boring machines in the world. Why is AMD interesting? In the Opteron/Athlon 64 they've fixed some of the shortcomings of the x86 architecture. More registers. Access to more than 4GB of RAM without menutia (like Intel uses). Things that were expensive in a register-starved 32 bit processor aren't on an Athlon64.
No, it's not innovative, not by a longshot. It's the same damn thing Intel did when they introduced the 80386. But it continues the line unbroken, and that's why the processor is important.
Hyperthreading is interesting, I agree, but I'd much prefer more affordable dual processor machines. Why in the world do Intel, AMD, and Microsoft go out of their way to keep SMP machines off the desktop? Apple certainly is going in the opposite direction.
Whether it's something obvious like the Pentium off by 1+1=1.9999943 error
The Pentium math bug was with division, not addition, and it only occurred in very specific circumstances. So while it supports your general point that complicated systems are more difficult to debug, that wasn't a very good example of an "obvious" bug. Careless, yes.
One thing that was good for the industry was to move away from the complex instruction set (CISC) towards a reduced set of instructions (RISC), and we have seen the speed improvements as well as a general reduction in hardware bugs since that time.
You do realize that Intel x86 processors are still CISC, right? (OK, actually internally they do execute things very much like a RISC chip, but the instruction set is still CISC, and modern x86 processors are certainly not any _simpler_ for having some RISC-like elements to them.
Besides, RISC chips don't actually have fewer instructions. Most of them these days have more. The difference between CISC and RISC is that RISC chips don't have certain complicated, slow instructions, but rather break these up into smaller pieces. For example, CISC processors usually have an instruction to move memory-to-memory while RISC only moves memory-to-register and register-to-memory. Also, CISC processors often have a division instruction while many RISC processors instead just have a multiplicitive inverse instruction (so to compute a/b you instead compute a*inv(b)).
But to add Hyperthreading, an untested and unproven technology which can guarantee no more than a 12% speed improvement, is folly. Better to amp the CPU clock and deal with a known like heat than to risk your company's livelihood on letting the CPU figure out which thread is which. That is something an OS is much more reliable in handling.
Now that's just ridiculous. Hyperthreading is not untested or unproven. Similar ideas have been discussed in academic papers for years; Intel was just the first to put it into a modern CPU. It's hardly untested, either - Intel started seeding the first Hyperthreading-capable processors what, two years ago now? At that point I wouldn't have suggested running a mission-critical application on a machine with Hyperthreading enabled, but now? You'd be crazy not to if it actually speeds up the application you need to run.
The reality is that in order to advance the speed of computer processors, it's necessary to make them more complicated.
Well Intel is already encountering heat problems which limit how fast they can crank the clockspeed. Hyperthreading is a moderately successful attempt to make use of the available execution units on the chip which would otherwise sit idle. It's also not so new and untested, it has been implemented but not enabled on earlier P4 steppings.
Athlon and Athlon64 are generally better able to make use of their execution units, and wouldn't benefit from HT as much as P4/Xeon.
I think they made a mistake here. ..."
From the article:
"Sandra's CPU benchmark is obviously quite optimized for hyperthreading at this point, and the numbers certainly show that. We see an average improvement of ~39% when hyper-threading is enabled on the P4
The numbers are:
4328 without HT
7125 with HT
You could say that disabling HT makes this benchmark 39% slower. But the the increase by turning HT on is
7125/4328-1 = 1.646 - 1 = 0.646 = 64.6 %
Hrmpf.
I do believe that HT does have future, perhaps not in its present form, but still.
:)
I do remember when there was that RISC vs CISC thing in the 80s, people were saying that CISC was obsolete, RISC being the future and so on. What we see today is not pure RISC processors but something in between. -- It's just that the answer was not that pure or clean as people thought at first.
Few years ago there was BeBox and its BeOS. Well, BeOS had the philosophy for a machine not having a single super-powerful-burning-hot processor but, instead, several low-power combined.
Well, Hyper-Threading may push distributed processing technology to the desktop, to the masses, so we might have interesting changes in software and hardware philosophy in the future.
Sort of romantic thinking... But one can dream.
... I learned from this article.
Why would you want to have a virtual double processor when... you can actually get a second one? Both changes require that you change your motherboard (One for HT, one for Dual Sockets). Dual Celerons sounds like a good cheap buy, or even Dual Athlons. Why bother with this? Except for the coolness factor of having your POST screen littered with "Hyperthreading Enabled", and in most cases it's not even called that, i forgot what they really write on the screen. Seriously, i wouldnt put my money that HT will be even copied to other manufacturers any time soon, unlike SSE or MMX.
Trolls dont like to be Flamebait, because they burn so well. Protect our Troll heritage!
Unfortunately, historically CPU speed has increased faster than memory bandwidth. That's why we've had ever more layers of cache added to our systems, to make up for the relative deficiency.
Unless things change, a technology that works better with a higher ratio of memory bandwith / CPU speed is likely to become progressively less, not more effective.
Of course, there's always the argument that marketing reasons have pushed CPU clockspeed faster than memory bandwidth, and that Intel et al will just shift their focus more towards memory in future. But defying the tide of 'what people think they want' is usually risky.
I have found HyperThreading a real boost for developing operator training simulators (think giant custom computer game for process plant operators [eg: Oil refineries, gas plants, chemicals, etc...]) where the a single thread will totally consume the resources of a single CPU (we call it "no-wait" where the simulation calculates what happens in the next 2 seconds and then immediately jumps to the next timestep, thus fast forwarding through slow parts of a process start-up such as warming a reactor).
An issue we encounter is the DCS (Distributed Control System) interface (the bit that links the PC to the fancy membrane keyboards, touch screens, alarm annunciators that the operator uses on the real plant [to maximise training benefit]). Although the interface typically only uses 0.5 to 2% of the CPU, when the simulation goes flat out, there is a noticable impact on other threads to the point where there is timeouts on data requests from the operator console.
In summary, if you have a system where some threads are IO bound (in our case, processing requests coming across via ethernet) and other threads are CPU intensive (high end numerical calculations) you will see a definite benifit. It allows us to give every team member a machine fit for the job at approximately 1/3 the cost (those of you who wish to argue that SMP machines are cheaper, we are bound by corporate purchasing agreements where SMP falls into the "Workstation" catagory while a uni-processor HT machine falls into the far cheaper "Desktop" catagory).
If you are performing just purely calculations and need to run two parallel threads, I would recommend a SMP or similar machine.
As always your milage may vary.
ZombieEngineer
Just to clarify here, this is not the same idea as the Cell architecture.
The Cell architecture (which may or may not be used for the PS3) is a multi-processor system designed for scalability; It really does have several processors running at the same time. In contrast, 'Hyperthreading' runs multiple threads on a single processor's core.
They both require multi-threaded code to achieve performance improvements, but fundamentally they're really quite different, and yield quite different price / performance trade-offs.
Being a console programmer, and having done quite a bit of work on the PS2, there is something in your comment that is a common misperception. You say that hyperthreading works great when you have people who know their processor upside and down and are not afraid of assembler, well, I am not afraid of assembler, and have done quite a bit of it. The problem is that writing in assembler tends to be slow, especially when trying to do heavy optimization. This takes time, a luxury generally not available to those of us in video games who tend to have hard christmas deadlines to ship our product. For Sony to assume that people are going to learn how to program in assembly is a mistake, as learning assembly isn't the issue, having the time to optimize the code in assembly is the issue. This isn't helped by the fact that most of the tools made available to us are piss poor, which makes working on the code much more difficult. For example, the PS2 has the vector units that are generally programmed in assembly. Not only do you need to make sure that the processing done by the vector units synchronizes with your main CPU, but you don't have ANY sort of debugging capability on these. Because of this, programming vector unit code is incredibly slow.
In addition, video games are things that don't always lend themselves particularly well to running in multiple threads. I have my artificial intelligence code, collision & physics code, and my rendering code. These 3 parts are the main parts of the code that take roughly 90-95% of the total CPU time available to me. I can't run collisions and physics until after the AI has run, and I can't run my rendering until the collision & physics have been run. I can multi-thread individual game objects, but even these constantly interact with each other. This isn't normally a problem if you double buffer it in a way that, for example, after the AI has run, I keep the current frame's AI output around somewhere while I run the next frame, but this requires additional memory, another resource that is scarce on consoles.