Hyper-Threading Explained And Benchmarked

Re:Ever buy a car with auto-everything? by pdbaby · 2004-01-06 21:04 · Score: 5, Interesting

I hate to say it, but your logic is flawed.

To put hyperthreading into your car analogy:
Hyperthreading is like a car that has power assisted steering. If you want, you can switch it off; you'll likely have a slightly smoother time with it on. But if you want the control (or don't trust it) then you can switch it off.

For the geek who reads posts as a stack of strings delimited by <br>, Nobody's forcing you to use hyperthreading. Use it, don't use it. Don't complain that it's a Bad Thing[tm] simply because you're being given the choice

--
Global symbol "$deity" requires explicit package name at line 2. - If only $scripture started "use strict;"

Re:Just Marketing BS by Intel to get suckers to bu by idiotnot · 2004-01-06 21:19 · Score: 5, Interesting

Perhaps I'm feeding a troll here, but....

64 bits, while not interesting in and of itself, is interesting in AMD's implementation. I have an UltraSparc sitting on my desk at work, and I assure you it's one of the most boring machines in the world. Why is AMD interesting? In the Opteron/Athlon 64 they've fixed some of the shortcomings of the x86 architecture. More registers. Access to more than 4GB of RAM without minutiae (like Intel uses). Things that were expensive in a register-starved 32 bit processor aren't on an Athlon64.

No, it's not innovative, not by a longshot. It's the same damn thing Intel did when they introduced the 80386. But it continues the line unbroken, and that's why the processor is important.

Hyperthreading is interesting, I agree, but I'd much prefer more affordable dual processor machines. Why in the world do Intel, AMD, and Microsoft go out of their way to keep SMP machines off the desktop? Apple certainly is going in the opposite direction.

Re:SMT by John+Courtland · 2004-01-06 21:28 · Score: 3, Interesting

Yeah, this is the idea behind the new Cell architecture in the PS3. Dumping the old ideas of having a single threaded model and doing everything in multiple threads where global data can be dynamic with each thread containing its own local storage. Done properly, it's blazingly fast. Done poorly, and you end up with race conditions, blocking semaphores, and generally poor code and poor performance. The only problem is that using the paradigms we have today, very few are capable of programming this style right now. The closest people I can think of are the Michael Abrashes, optimization zealots (not saying it's a bad thing), who know their processor upside and down and are not afraid of assembler, or rescheduling instructions to get the most power out of each cycle, instead of letting an optimizing compiler do it for them.

--
Slashdot is proof that Sturgeon's Law applies to mankind.

Wrong percentages? by OMG · 2004-01-06 21:29 · Score: 5, Interesting

I think they made a mistake here.
From the article:
"Sandra's CPU benchmark is obviously quite optimized for hyperthreading at this point, and the numbers certainly show that. We see an average improvement of ~39% when hyper-threading is enabled on the P4 ..."

The numbers are:
4328 without HT
7125 with HT

You could say that disabling HT makes this benchmark 39% slower. But the increase by turning HT on is
7125/4328-1 = 1.646 - 1 = 0.646 = 64.6 %

Hrmpf.
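
For anyone who wants to check the arithmetic above, here is a minimal sketch. The two scores are the ones quoted from the article; the variable names are just mine.

```python
# Sandra CPU scores quoted in the comment above.
without_ht = 4328
with_ht = 7125

# Gain measured against the HT-off score (the baseline you start from).
gain_from_enabling = with_ht / without_ht - 1

# Loss measured against the HT-on score (what you give up by disabling it).
loss_from_disabling = 1 - without_ht / with_ht

print(f"Enabling HT:  +{gain_from_enabling:.1%}")
print(f"Disabling HT: -{loss_from_disabling:.1%}")
```

Same two numbers, different baseline, which is probably where the ~39% came from.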

Re:Wrong percentages? by Glonoinha · 2004-01-07 03:02 · Score: 3, Interesting

Crap you are right - just by turning on HT on the same box he saw a 65% boost in performance.

I think it was a case of -wanting- to see a specific number and juggling things in his head until he got the number he wanted. Intel touts the 30% range and if he initially got the 65% number he probably discarded it and kept juggling the books to get the number in the 30's that he wanted.

As someone that has a P4 2.4 (not HT) box sitting right next to a P4 2.4 (HT) box I will assure you that in real life you are not going to see a 65% sustained boost in performance in day to day use. Not 30% sustained boost either, unless you are only running apps that are heavily optimized and multithreaded.

--
Glonoinha the MebiByte Slayer

Being philosophical on this... by keeboo · 2004-01-06 21:33 · Score: 5, Interesting

I do believe that HT does have a future, perhaps not in its present form, but still.

I do remember when there was that RISC vs CISC thing in the 80s, people were saying that CISC was obsolete, RISC being the future and so on. What we see today is not pure RISC processors but something in between. -- It's just that the answer was not as pure or clean as people thought at first.

A few years ago there was BeBox and its BeOS. Well, BeOS had the philosophy for a machine not having a single super-powerful-burning-hot processor but, instead, several low-power combined.
Well, Hyper-Threading may push distributed processing technology to the desktop, to the masses, so we might have interesting changes in software and hardware philosophy in the future.

Sort of romantic thinking... But one can dream. :)

Cache Contention by Detritus · 2004-01-06 21:59 · Score: 3, Interesting

Do any modern chips support per-process cache reservation? That would alleviate some of the problems reported in the article.

--
Mea navis aericumbens anguillis abundat

RISC gives you more bang for your buck by putaro · 2004-01-06 22:04 · Score: 4, Interesting

All things being equal, RISC gives you more bang for your buck. The difference is that Intel has pushed CISC, or specifically the x86 architecture, as fast or faster than RISC by using more bucks. The amount of R&D dollars poured into x86 vs the amount poured into PowerPC or Alpha is overwhelming.

When I was at Apple our processor architect, Phil Koch, gave a talk in, I think, 1997, where he said that the PowerPC consortium had essentially optimized for power consumption and dollars spent on R&D. What was amazing at that time was that PowerPC was competitive with Intel given much lower power consumption and much lower investment of R&D dollars. However, no one really cared about lower power consumption so it didn't translate into any real advantage. Without the R&D dollar leverage given by RISC, however, the PowerPC would not have been able to compete at all. Pushing the 68K architecture to be competitive with Intel with the same R&D dollars as PowerPC would have been impossible.

Re:RISC gives you more bang for your buck by Waffle+Iron · 2004-01-07 03:19 · Score: 4, Interesting

All things being equal, RISC gives you more bang for your buck.
Maybe, maybe not. However, it's hard to tell because nobody makes RISC or CISC processors anymore. The RISC concept, implemented in CPUs like the MIPS R3000, originally meant very simple hardware without pipeline interlocks, instruction schedulers, or more than an absolute bare-bones set of instructions. The current Power PC does not match this at all; it is closer to the current X86.
By the same token, CISC used to mean that many or most instructions were implemented in microcode on the processor. Once again, that's no longer the case. All X86s now have a RISC-like core and resemble the Power PC far more than the 80286.
Pure RISC designs and pure CISC designs have both been superseded by a hybrid approach, and neither one would be competitive today outside the embedded device market.
Basically, you were being fed a line of company FUD to get you all excited about their choice of CPU. Today, cache memory dominates the chip real estate, and CPU performance and power consumption are dictated almost exclusively by cache size and silicon process technology rather than these surface architectural details.

Quick Q by AvengerXP · 2004-01-06 22:33 · Score: 5, Interesting

Why would you want to have a virtual double processor when... you can actually get a second one? Both changes require that you change your motherboard (One for HT, one for Dual Sockets). Dual Celerons sounds like a good cheap buy, or even Dual Athlons. Why bother with this? Except for the coolness factor of having your POST screen littered with "Hyperthreading Enabled", and in most cases it's not even called that, I forgot what they really write on the screen. Seriously, I wouldn't put my money on HT even being copied by other manufacturers any time soon, unlike SSE or MMX.

--
Trolls dont like to be Flamebait, because they burn so well. Protect our Troll heritage!

Re:Quick Q by Ramze · 2004-01-07 01:26 · Score: 2, Interesting

I believe AMD has plans to incorporate more than one CPU on-die in the future. First 2, then 4, etc.
It'll be interesting to see what happens to "hyperthreading" when dual and quad processors come standard on desktop systems for home users.
I look at Hyperthreading as a quick hack to improve response times on a few things. It's a minor speed boost as well, but I think it has enough drawbacks to merit it as only a minor improvement which may not always be a good idea to have enabled. I doubt it will stick around once true dual-processor systems are in the majority, though that's not going to be anytime soon.
In any case, Intel knows it's not a major marketing point or they'd be screaming "Hyperthreading is what you have to have!" like they did w/ MMX.
My response is more like "Hyperthreading... woohoo... call me when you come up with something more interesting."

Re:Quick Q by iconian · 2004-01-07 05:15 · Score: 2, Interesting

It's not that simple. I believe the cheapest HT processor from Intel is the P4 2.4 Ghz, priced at $161. You can buy one Athlon XP 2400+ for $75. A dual processor Athlon motherboard probably costs more than a single processor Pentium 4 motherboard and you will probably have to pay for a bigger power supply unit. However, I don't think dual logical processors in a single Pentium 4 can beat 2 real Athlon XP 2400+ processors performance wise and in performance-price ratio. (Note: I do not work for NewEgg.)

Cache contention with Hyperthreading by xyote · 2004-01-06 22:36 · Score: 4, Interesting

Threads using hyperthreading or SMT share the cache. This can be a problem if the threads are from different processes and not sharing memory. Your cache is effectively halved (with 2 hyperthreads). On the other hand, it could be a real benefit if your threads were from the same process sharing the same memory. You don't have the cache thrashing which could occur on a multi-cpu system. Since cache misses can really kill performance, this could be quite a performance boost.

To really exploit this, you'd need gang scheduling in the operating system. But it's unlikely that SMT would remain around long enough for any efforts to exploit it to be feasible. CMP with separate cache would likely take over before then since it would behave more like separate cpu's from a performance standpoint and thus offer more consistent behavior.
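
A toy illustration of the shared-cache effect described above, not a model of any real P4 cache. Everything here is an assumption for the sake of the sketch: a fully associative LRU cache of 8 lines, two hardware threads modelled by strictly interleaving their accesses, and made-up addresses.

```python
from collections import OrderedDict

def count_misses(cache_lines, accesses):
    """Count misses for an access stream on a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses

def interleave(a, b):
    """Alternate accesses from two threads, a crude stand-in for SMT."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out

CACHE_LINES = 8   # hypothetical cache size, in lines
PASSES = 100      # how many times each thread walks its working set

working_set = list(range(6))               # 6 lines: fits in the cache alone
thread_a = working_set * PASSES
thread_b_same = working_set * PASSES       # same process, same memory
thread_b_other = [addr + 1000 for addr in working_set] * PASSES  # unrelated process

shared_misses = count_misses(CACHE_LINES, interleave(thread_a, thread_b_same))
disjoint_misses = count_misses(CACHE_LINES, interleave(thread_a, thread_b_other))

print("threads sharing memory:    ", shared_misses, "misses")
print("threads not sharing memory:", disjoint_misses, "misses")
```

With the shared working set only the cold misses remain. With two unrelated working sets the combined footprint (12 lines) no longer fits, and a cyclic walk under LRU misses on every access. Real caches are set-associative and real access patterns are less regular, so take the size of the gap with a grain of salt.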

Re:Just Marketing BS by Intel to get suckers to bu by drsmithy · 2004-01-06 22:38 · Score: 4, Interesting

Why in the world do Intel, AMD, and Microsoft go out of their way to keep SMP machines off the desktop? Apple certainly is going in the opposite direction.

No, they aren't. The Apple "common desktop" oriented machines - the eMac, iMac and perhaps at a stretch the 1.6Ghz G5 - are all single CPU machines and are likely to remain so now the G5 has finally appeared (price alone, without going into other aspects, puts the dual G5s into workstation/high-end enthusiast desktop territory).

Apple briefly flirted with putting dual CPUs into their nearly-home-desktop machines, but this was driven by the massive speed deficit at the time of G4 CPUs - they *had* to have dual CPUs to be even remotely competitive. No matter what else Apple's marketing department might have tried to say.

If you could option a dual CPU onto an eMac, and all the iMacs were dual CPU, then your comment would be accurate. Two high-end machines out of a base range of seven (and that's ignoring the laptops) is not a paradigm shift. By that measure, just about any major manufacturer is "going in the opposite direction".

bad programming ... by Anonymous Coward · 2004-01-06 22:42 · Score: 1, Interesting

i'm not 100% sure bout this but i just got da
fishy feeling that hyper threading really is just
to make life easier for novice/beginner programmer
to write programs in "high" level languages (say
Vbasic, or just basic ;) ) that can compete in
performance to programs written by cracks, say in
assembler or C / C++.

i believe CPU manufactures shouldn't care about
this but should cater to the cracks, not the
beginners.(*)

looking at what programs are written in and then
adapting the CPU to this isn't really the
way to go /methinks. especially if what i'm
guessing should turn out to be true it would be
terrible for a MAINSTREAM processor to make these
bold claims.

i mean it would be okay to market a
"hyperthreading" as a optimizing CPU for
high-level languages or something but making the
claim that it also speeds up execution times for a
assembler program that has been optimized on paper
by the programmer is ... wrong.

(*) of course the market goes where the money is
but at least label the product correctly ...

p.s. anyone noticed how long "calc.exe" takes to
load on AMD Athlons?

Future prognosis for HT by sam0ht · 2004-01-06 22:48 · Score: 5, Interesting

From the article: "As bus speeds increase, and more cache becomes available on die, hyper-threading is going to be more and more efficient. It appears to be somewhat of an engineering symbiotic relationship."

Unfortunately, historically CPU speed has increased faster than memory bandwidth. That's why we've had ever more layers of cache added to our systems, to make up for the relative deficiency.

Unless things change, a technology that works better with a higher ratio of memory bandwidth / CPU speed is likely to become progressively less, not more effective.

Of course, there's always the argument that marketing reasons have pushed CPU clockspeed faster than memory bandwidth, and that Intel et al will just shift their focus more towards memory in future. But defying the tide of 'what people think they want' is usually risky.

Re:Just Marketing BS by Intel to get suckers to bu by ThaReetLad · 2004-01-06 22:59 · Score: 3, Interesting

I wouldn't say that Intel and AMD are against dual CPU machines on the desktop exactly, it's just that they cost too much for most users, and most of the time money is better spent on a high end single processor machine than a dual processor one. Of course that is mostly to do with the fact that most SMP systems available up until now haven't scaled very well, not least because with Athlon MPs and Xeons the second CPU has to share the available bandwidth with the first. Now though there is the Opteron dual processor system and for the first time low end SMP systems scale memory bandwidth linearly with the number of CPUs so a system with 2 CPUs operates almost twice as fast as a single CPU machine, whereas before you'd be lucky to get a 50% improvement. What will be interesting to see in 2005 will be the dual core Athlon FX type chips. These will basically be 2 of the current Athlon 64 (754 pin) CPUs on a single die each with its own single channel memory controller. The question is, what are they going to call these chips? They'll have a PR rating of about 6800, just using 2 of the currently available cores!!

--
You can't win Darth. If you mod me down, I shall become more powerful than you could possibly imagine

Situations where HT really becomes useful by ZombieEngineer · 2004-01-06 23:36 · Score: 5, Interesting

I have found HyperThreading a real boost for developing operator training simulators (think giant custom computer game for process plant operators [eg: Oil refineries, gas plants, chemicals, etc...]) where a single thread will totally consume the resources of a single CPU (we call it "no-wait" where the simulation calculates what happens in the next 2 seconds and then immediately jumps to the next timestep, thus fast forwarding through slow parts of a process start-up such as warming a reactor).

An issue we encounter is the DCS (Distributed Control System) interface (the bit that links the PC to the fancy membrane keyboards, touch screens, alarm annunciators that the operator uses on the real plant [to maximise training benefit]). Although the interface typically only uses 0.5 to 2% of the CPU, when the simulation goes flat out, there is a noticeable impact on other threads to the point where there are timeouts on data requests from the operator console.

In summary, if you have a system where some threads are IO bound (in our case, processing requests coming across via ethernet) and other threads are CPU intensive (high end numerical calculations) you will see a definite benefit. It allows us to give every team member a machine fit for the job at approximately 1/3 the cost (those of you who wish to argue that SMP machines are cheaper, we are bound by corporate purchasing agreements where SMP falls into the "Workstation" category while a uni-processor HT machine falls into the far cheaper "Desktop" category).

If you are performing just purely calculations and need to run two parallel threads, I would recommend a SMP or similar machine.

As always your mileage may vary.

ZombieEngineer

HT is awesome by Jeppe+Salvesen · 2004-01-07 00:25 · Score: 4, Interesting

In the app we develop here at work, we are highly conscious of performance and scalability. Simply put - the more transactions we can process, the bigger and happier the customers. And more money in our pockets.

With Xeon with HT, our performance has increased quite dramatically. We use Perl, so we simply fork off the jobs that do the processing. The result is that we fill all the four virtual processors in Linux if we have a sufficient number of jobs running.
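
Roughly what "fork off the jobs" looks like, sketched in Python rather than Perl and assuming a Unix-like system. The job function and the job ids are hypothetical stand-ins, not the poster's actual code; the point is only that each job is a separate process the kernel is free to place on any logical CPU.

```python
import os

def process_job(job_id):
    """Hypothetical stand-in for one CPU-heavy unit of work."""
    return sum(i * i for i in range(10_000 + job_id))

def run_forked(job_ids):
    """Fork one child per job, then wait for all of them to finish."""
    children = {}
    for job_id in job_ids:
        pid = os.fork()
        if pid == 0:
            # Child process: do the work and exit without returning
            # into the parent's code path.
            try:
                process_job(job_id)
                os._exit(0)
            except Exception:
                os._exit(1)
        children[pid] = job_id  # parent: remember which child has which job

    exit_codes = {}
    for pid, job_id in children.items():
        _, status = os.waitpid(pid, 0)
        exit_codes[job_id] = os.WEXITSTATUS(status)
    return exit_codes

# One job per logical CPU, e.g. four on a dual Xeon with HT enabled.
exit_codes = run_forked(range(4))
print(exit_codes)
```

Whether four logical CPUs actually give you anything close to four jobs' worth of throughput depends on the workload, as the rest of the thread argues.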

--

Stop the brainwash

how to enable for older processors? by Pivot · 2004-01-07 00:38 · Score: 2, Interesting

I have a computer with dual Xeon 1.7GHz. Those apparently have HT capability built in, but it's not enabled in the BIOS. Anyone know a way to circumvent this to enable HT on these?

Re:SMT by jtshaw · 2004-01-07 01:50 · Score: 3, Interesting

You're right, very few people can code a program that works well on an SMT processor. It is a lot to keep track of and quite honestly, most of the code I have seen churned out at software companies was done in such a rush because of deadlines the programmers didn't have time to optimize their code.

However, there is no reason why you can't take two single threaded processes and use one to fill the holes in the pipeline left by the other so SMT should still have a decent benefit if the kernel scheduler is prepared for this.

Memory bottleneck (was: Future prognosis for HT) by davecb · 2004-01-07 01:53 · Score: 4, Interesting

One of the reasons for hyperthreading (aka chip multithreading) is the slowness of memory and cache.

If you refer back to Marc Tremblay's CMT Article, you'll see that one of the approaches is to run one thread until it blocks on a memory read, then run another until it blocks and so on, repeating for as many threads as it takes to soak up all the wasted time waiting for the memory fetches.
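
A back-of-the-envelope simulation of that switch-on-stall idea, under heavily simplified assumptions of my own: every thread computes for a fixed number of cycles, then stalls on memory for a fixed number of cycles, and switching threads is free. The cycle counts are made up, not taken from any real chip.

```python
def utilization(num_threads, compute_cycles, stall_cycles, total_cycles=10_000):
    """Fraction of cycles the core spends doing useful work."""
    ready_at = [0] * num_threads             # cycle at which each thread can run again
    remaining = [compute_cycles] * num_threads
    busy = 0
    current = None                           # thread currently on the core
    for cycle in range(total_cycles):
        if current is None:
            # Pick the first thread whose memory fetch has completed.
            for t in range(num_threads):
                if ready_at[t] <= cycle:
                    current = t
                    break
        if current is None:
            continue                         # every thread is waiting on memory
        busy += 1
        remaining[current] -= 1
        if remaining[current] == 0:
            # The thread blocks on a memory read; switch away from it.
            ready_at[current] = cycle + 1 + stall_cycles
            remaining[current] = compute_cycles
            current = None
    return busy / total_cycles

for n in (1, 2, 4, 8):
    print(n, "thread(s):", utilization(n, compute_cycles=10, stall_cycles=30))
```

With 10 cycles of work per 30 cycles of waiting, one thread keeps the core busy a quarter of the time, and it takes about four threads to soak up the stalls. Past that point extra threads buy nothing in this model.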

The Sun paper on their plans for it is here. Have a look at page 5 for the diagram.

--dave (biased, you understand) c-b

--
davecb@spamcop.net

Analogy by attonitus · 2004-01-07 02:15 · Score: 4, Interesting

This could be analogous to two people in moderate shape being able to pile more wood in total, than a single person who's in great shape

Could be, but isn't. A better analogy would be two people using the same narrow corridor to chop and pile wood. If one piles wood, whilst the other chops, then they perform better than one person. If they both chop wood, and then both pile wood then they waste lots of time trying to squeeze past each other and accidentally hitting each other with axes.

Okay, so it's not that much better an analogy. But at least it bears some relevance to HyperThreading.

Re:SMT by nikh · 2004-01-07 02:24 · Score: 5, Interesting

Just to clarify here, this is not the same idea as the Cell architecture.

The Cell architecture (which may or may not be used for the PS3) is a multi-processor system designed for scalability; It really does have several processors running at the same time. In contrast, 'Hyperthreading' runs multiple threads on a single processor's core.

They both require multi-threaded code to achieve performance improvements, but fundamentally they're really quite different, and yield quite different price / performance trade-offs.

HT and VMWare: perfect together! by pw700z · 2004-01-07 02:30 · Score: 3, Interesting

I use VMware workstation extensively... and HT rocks. Ever have a virtual machine go to 100% CPU utilization, and your machine slow down to a crawl? With the extra 20% of cpu available, your system can still function and be responsive, and allow you to deal with whatever is going on. Or I can run two VMs and get much better performance out of them and the system as a whole.

Re:SMT by Radius9 · 2004-01-07 03:14 · Score: 5, Interesting

Being a console programmer, and having done quite a bit of work on the PS2, there is something in your comment that is a common misperception. You say that hyperthreading works great when you have people who know their processor upside and down and are not afraid of assembler, well, I am not afraid of assembler, and have done quite a bit of it. The problem is that writing in assembler tends to be slow, especially when trying to do heavy optimization. This takes time, a luxury generally not available to those of us in video games who tend to have hard christmas deadlines to ship our product. For Sony to assume that people are going to learn how to program in assembly is a mistake, as learning assembly isn't the issue, having the time to optimize the code in assembly is the issue. This isn't helped by the fact that most of the tools made available to us are piss poor, which makes working on the code much more difficult. For example, the PS2 has the vector units that are generally programmed in assembly. Not only do you need to make sure that the processing done by the vector units synchronizes with your main CPU, but you don't have ANY sort of debugging capability on these. Because of this, programming vector unit code is incredibly slow.

In addition, video games are things that don't always lend themselves particularly well to running in multiple threads. I have my artificial intelligence code, collision & physics code, and my rendering code. These 3 parts are the main parts of the code that take roughly 90-95% of the total CPU time available to me. I can't run collisions and physics until after the AI has run, and I can't run my rendering until the collision & physics have been run. I can multi-thread individual game objects, but even these constantly interact with each other. This isn't normally a problem if you double buffer it in a way that, for example, after the AI has run, I keep the current frame's AI output around somewhere while I run the next frame, but this requires additional memory, another resource that is scarce on consoles.
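
The double-buffering trick mentioned above can be sketched like this. It is a schedule illustration only, with hypothetical stage names and strings standing in for real frame data: each stage reads the buffer the previous stage produced on the previous tick, so within a single tick the three stages have no dependency on each other and could in principle run on separate threads.

```python
def run_pipeline(num_frames):
    """Software-pipelined frame loop: AI -> physics -> render, one tick apart."""
    ai_out = None        # AI result buffered from the previous tick
    physics_out = None   # physics result buffered from the previous tick
    rendered = []

    # Two extra ticks drain the frames still in flight at the end.
    for tick in range(num_frames + 2):
        # Every stage reads only last tick's buffers.
        new_render = f"render({physics_out})" if physics_out is not None else None
        new_physics = f"physics({ai_out})" if ai_out is not None else None
        new_ai = f"ai({tick})" if tick < num_frames else None

        if new_render is not None:
            rendered.append(new_render)

        # Swap buffers. Holding the old outputs alive during the tick is
        # the extra memory cost mentioned above.
        ai_out, physics_out = new_ai, new_physics
    return rendered

frames = run_pipeline(3)
print(frames)
```

The price is exactly what the comment says: a second copy of each stage's output, plus two frames of added latency between AI and what ends up on screen.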

Slashdot Mirror
