Ars Technica on Hyperthreading
radiokills writes "Ars Technica has a highly-informative technical paper up on Hyper-Threading. It's a technical overview of how simultaneous multithreading works, and what problems it will introduce. It also explains why comparing the technology to SMP is Apples to Oranges, in a sense. Starting with the 3 GHz Pentium 4, this tech will be standard in Intel's desktop lines (it's already in the Xeon), so this is important stuff."
But I'd bet it gives quite a boost to interactive performance. SMP setups tend to be wonderfully responsive under background loads (much more so than the sum of the CPU speeds would suggest), so I'd guess that letting the CPU run more than one thread at a time would make the UI a little more responsive on single-processor machines. Now all we need is for the UNIX developers to stop being afraid of multithreading, and maybe some of us UNIX users would be able to take advantage of this :0
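For anyone wondering what "not being afraid of multithreading" looks like in practice, here is a minimal Python sketch of the usual pattern: push the long-running job onto a worker thread and keep the main (UI) loop free to respond. The names `heavy_job` and `run_in_background` are hypothetical, and because of CPython's global interpreter lock this shows the program structure only, not any HT speedup.

```python
# Illustrative sketch; heavy_job and run_in_background are hypothetical names.
import queue
import threading


def heavy_job(n: int) -> int:
    """Stand-in for a long background computation (hypothetical workload)."""
    return sum(i * i for i in range(n))


def run_in_background(n: int, results: "queue.Queue[int]") -> threading.Thread:
    """Start heavy_job on a worker thread; its result is posted to `results`."""
    worker = threading.Thread(target=lambda: results.put(heavy_job(n)), daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    results: "queue.Queue[int]" = queue.Queue()
    worker = run_in_background(2_000_000, results)
    ticks = 0
    # The "UI loop": keeps doing small, responsive units of work
    # while the worker thread grinds away in the background.
    while worker.is_alive():
        ticks += 1
        worker.join(timeout=0.01)
    print("ui ticks while waiting:", ticks, "result:", results.get())
```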
A deep unwavering belief is a sure sign you're missing something...
Yes, but since no one has the supersentient compiler and assembler that HT requires, very few programs are able to really take advantage of it.
I dig innovation. I dig more impressive chips. But it's getting to the point where boxes with top-of-the-line CPUs are like those old VWs with Porsche engines in them: there comes a point when improving one part doesn't really matter any more.
All's true that is mistrusted
If you plan to use any of these features effectively on Windows, you'll need to upgrade to Windows .NET Server. Windows 2000 can't distinguish between virtual and physical processors, so if the BIOS doesn't set up a two-CPU (physical) system the right way, it will end up ignoring the second physical processor. My source:
www.microsoft.com/windows2000/docs/hyperthreading.doc
So that's how we can put the thread through the needle even faster? Wow... back in MY day, we had to use our fingers to do that, by candlelight, when you couldn't even see the friggin' hole! :P
And so we go, on with our lives
We know the truth, but prefer lies
Lies are simple, simple is bliss
I'm personally more partial to calling it Simultaneous Multi-Threading (SMT), as opposed to Hyper-Threading, which is the brand name Intel created for the concept. Sort of like Xerox versus photocopy. Of course, there are some mix-ups among those who think of multithreading as an OS-level feature rather than a hardware one. Eh, personal preference.
What is music when you despise all sound?
when will someone develop a processor that will automatically multithread tasks? i.e., you don't have to explicitly ask for new threads; it optimizes the code into threads for you?
yes, I realize this is anti-geek, so this processor would also allow you to take control of thread creation by flipping a register or something.
It's incredibly difficult to automatically parallelize a program well, even when you can run a preprocessor on it and spend days on the computation; doing it in real time in hardware is harder still. This is currently done to a small extent by the pipelining hardware of modern CPUs, and even that small bit of automatic parallelization is ridiculously complex and slows things down (which is why the Itanium dumped it and put the onus on the compiler to parallelize enough for pipelining to work). If it's that difficult to do for the relatively meager parallelization requirements of pipelining, actually breaking a program into separate execution threads is damn near impossible with current technology (at least with any efficiency even remotely approaching that of writing the program to be properly multithreaded in the first place).
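Whether a loop can be split up mechanically often comes down to loop-carried dependencies. A minimal Python sketch (all function names hypothetical): the first loop has independent iterations and can be farmed out to a thread pool as-is; the second is a running sum, where each iteration needs the previous result, so naively splitting it across threads gives the wrong answer.

```python
# Illustrative sketch; square_all, running_sum and naive_split_running_sum
# are hypothetical names, not any tool's API.
from concurrent.futures import ThreadPoolExecutor


def square_all(xs):
    # Independent iterations: element i never reads element j's result,
    # so the work can be split across workers without changing the answer.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda x: x * x, xs))


def running_sum(xs):
    # Loop-carried dependency: out[i] needs out[i - 1], so the iterations
    # cannot simply be handed to different threads.
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out


def naive_split_running_sum(xs, chunks=2):
    # What a naive "just split the loop" parallelizer would compute:
    # each chunk starts from zero instead of from the previous chunk's total.
    size = (len(xs) + chunks - 1) // chunks
    parts = [xs[i:i + size] for i in range(0, len(xs), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        pieces = list(pool.map(running_sum, parts))
    return [v for piece in pieces for v in piece]


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(square_all(data), running_sum(data), naive_split_running_sum(data))
```

Fixing the second case takes a different algorithm (a parallel prefix scan plus a fix-up pass), which is exactly the kind of restructuring that is hard to do automatically.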
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
What's next, LudicrousThreads?
obligatory Spaceballs reference
mp3's are only for those with bad memories
oh no!
Sincerely,
Intel
--
pants ahoy
The company that now owns the Cray name does something very much like this on a fairly grand scale in its own architecture, the MTA (Multi-Threaded Architecture). There, each processor switches among 128(!) hardware threads to exploit the concurrency available while threads are waiting on memory accesses, etc.
I don't know where you're getting your info about Oracle, but it's wrong. Oracle licensing is determined per-physical CPU. This was something we made doubly-sure to check up on when migrating from our old Oracle server to our new one (dual Xeon w/HT).
On the downside of HT: until the 2.6 (or 3.0, subject to Linus' whim) kernel comes out, there's no point in enabling HT on a Linux box. Because the 2.4 scheduler is unaware of HT, all logical CPUs are treated the same, and the scheduler ends up starving one physical CPU. Performance on a dual 1.8 GHz Xeon with 1 GB of RDRAM and HT enabled under 2.4.10 is roughly 5-15% slower than with HT disabled.
2.5.31 with the HT patch dramatically reverses these numbers, providing average performance 30% better than 2.4.10 without HT. YMMV, of course, and I'm not talking about OS performance; I'm talking about Oracle's performance. Still, a 30% increase just for flipping a switch in the BIOS and recompiling the kernel is nothing to sneeze at.
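An HT-aware scheduler has to know which logical CPUs are siblings on the same physical core. A minimal Python sketch of that mapping, with a hypothetical `sibling_groups` helper; it assumes the "processor", "physical id" and "core id" fields that current Linux kernels print in /proc/cpuinfo (2.4-era kernels and non-x86 machines may not expose all of them, in which case it returns an empty map).

```python
# Illustrative sketch; sibling_groups is a hypothetical helper.
# Assumes /proc/cpuinfo records with "processor", "physical id" and "core id".
from collections import defaultdict


def sibling_groups(cpuinfo_text: str) -> dict:
    """Map (physical id, core id) -> list of logical CPU numbers sharing that core."""
    groups = defaultdict(list)
    record = {}
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical CPU's record.
            if all(k in record for k in ("processor", "physical id", "core id")):
                key = (int(record["physical id"]), int(record["core id"]))
                groups[key].append(int(record["processor"]))
            record = {}
            continue
        name, _, value = line.partition(":")
        record[name.strip()] = value.strip()
    return dict(groups)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:   # Linux only
        for core, cpus in sorted(sibling_groups(f.read()).items()):
            print("package/core", core, "-> logical CPUs", cpus)
```

On a dual-Xeon box with HT enabled you would expect each (package, core) key to list two logical CPUs; a scheduler that ignores this pairing can pile work onto both siblings of one package while the other package sits idle.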
Arr! The laws of physics be a harsh mistress!
KernelTrap has had some articles on Linux's support for HT. Ingo Molnar has been working on tuning the scheduler for HT systems. Articles are here:
http://kerneltrap.org/node.php?id=406
http://kerneltrap.org/node.php?id=391
</karmawhoring>
Using your sig line to advertise for friends is lame.
They call this stuff Symmetric Multi Threading, but I think that name is a bit misleading. While the thread scheduling itself is symmetric (all process threads are created equal and receive equal execution time), the shared resources on the CPU (cache, shared registers) are NOT symmetric. Since these shared resources are in essence handled on the way in to the execution unit, it becomes really easy to starve the processor when you have contention for one of those resources.
While proper application development can alleviate some of this issue, it will depend heavily on the actual usage patterns of the system. When much of the data coming in from memory overlaps (like the file system cache on a web server), you don't worry too much about threads stepping on each other's registers. This sounds fantastic for data servers.
Desktop systems, on the other hand, almost never work this way. When you're playing MP3s in the background while web surfing and checking your email, you're already working with vastly different areas of data. Throw the OS and various background processes into the mix and you've pretty much eliminated any gain, and possibly slowed things down due to cache contention.
While this was touched on at the end of the article, I don't think it was given enough weight. It doesn't just depend on what applications you're running and whether they were written to take advantage of it; it depends on what you want to do with the whole system. For serving data, this will certainly be good (especially with multiple CPUs!). For desktop systems, this is a non-starter.
I'm not disparaging the technology - far from it. I'm just waiting for Intel and Microsoft to market this to my mom as a way to have higher quality DVD playback - at twice the cost. And her buying it. Again.
Culture is more than commerce
From http://www.hardocp.com/article.html?art=MzEw :
As Barton and MP were mentioned, I did think to ask what [Richard Heye, AMD Vice President of Platform Engineering and Infrastructure and the Computation Products Group] thought about the threat of Intel's Hyperthreading. While I see Hyperthreading as possibly becoming a very useful add-on for the Intel CPU, I can assure you that Richard Heye does not. In fact, the subject of Hyperthreading seemed to excite him. Mr. Heye explained that he had been reading papers on the subject for years and that for Intel to bring Hyperthreading to market successfully, they (Intel) were going to have to throw many more dollars at the marketing side than the development side of the issue.
"If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
When Intel switched from the P3 architecture to the P4 architecture, they increased the depth of the pipeline from 10 to 20 stages, I believe. My understanding is that this significantly increased the performance penalty for branch mispredictions and anything else that requires flushing the pipeline. I'm curious whether adding SMT will increase the misprediction penalty even more: must both threads be flushed, or only the one that mispredicted? If both, are there cases where the penalty would outweigh the benefit?
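The usual back-of-the-envelope way to see why a deeper pipeline hurts is to add the expected flush cost to the base cycles per instruction. A minimal Python sketch with a hypothetical `cpi_with_mispredicts` function; the branch fraction, misprediction rate and penalties below are placeholder numbers, not P3 or P4 measurements, and the model does not answer the SMT question of whether the other thread's in-flight work is also lost.

```python
# Illustrative sketch; cpi_with_mispredicts is a hypothetical helper and
# all numbers used with it here are placeholders, not measured values.
def cpi_with_mispredicts(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once pipeline flushes are included.

    Each mispredicted branch costs roughly `flush_penalty` extra cycles,
    and that penalty grows with pipeline depth.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


if __name__ == "__main__":
    # Placeholder workload: 20% branches, 5% of them mispredicted.
    for penalty in (10, 20):
        print(penalty, cpi_with_mispredicts(1.0, 0.20, 0.05, penalty))
```

Doubling the flush penalty doubles the misprediction term, so the deeper pipeline only pays off if the higher clock rate more than makes up for it.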
First Falcon-1 to orbit, then Falcon-9. Then I can die a happy man.
How many processor licenses does Oracle charge for a POWER4, which is literally two PPC processor cores on a single die? What about a clustering approach that presents a server farm as a single virtual CPU?
So many technologies can blur the processor count that Oracle and Microsoft are using whatever is the best-case scenario for them. If licensing is by physical silicon only, future iterations of on-die multiprocessing will really hamper software providers' profitability, something you know they will not stand for.
If it were exclusively per CPU, you would also see a lot of shops always buying the absolute fastest processors available, and specialty shops selling factory-overclocked versions of those processors. Reduced licensing costs would actually make the price of exotic cooling methods and reduced CPU life look good.
The same rule applies to colocation in a different way: how much power can you stuff into 1U of rack space?
If the most costly machine you can buy is a 48-CPU machine that fits into 3U using quad-processor cards on a backplane, but it costs less in the long term because you are not paying for the 24U of rack space that dual-processor 1U machines would take, you buy it. Even if your per-CPU cost is 10 times that of more conventional systems, the machine pays for itself in rack space costs in 10 months. After 18 months you upgrade the machine, because by then you are paying twice as much for per-CPU licenses as you could be paying with modern hardware.
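The rack-space side of that argument is simple arithmetic. A minimal Python sketch with hypothetical `rack_units` and `payback_months` helpers; the comment gives no prices, so any hardware or per-U monthly cost you feed into `payback_months` is your own assumption.

```python
# Illustrative sketch; rack_units and payback_months are hypothetical helpers.
import math


def rack_units(total_cpus: int, cpus_per_box: int, units_per_box: int) -> int:
    """Rack units needed to host `total_cpus` in boxes of a given size."""
    boxes = math.ceil(total_cpus / cpus_per_box)
    return boxes * units_per_box


def payback_months(extra_hardware_cost: float, units_saved: int,
                   cost_per_unit_month: float) -> float:
    """Months until the saved rack space pays off a pricier, denser machine."""
    return extra_hardware_cost / (units_saved * cost_per_unit_month)


# The comment's comparison: 48 CPUs as dual-processor 1U boxes vs one 3U chassis.
dual_1u = rack_units(48, cpus_per_box=2, units_per_box=1)    # 24 boxes -> 24U
big_box = rack_units(48, cpus_per_box=48, units_per_box=3)   # 1 chassis -> 3U
print(dual_1u, big_box, dual_1u / big_box)  # 24 3 8.0
```

Plug your own colocation rate into `payback_months(extra_cost, dual_1u - big_box, rate)` before trusting any payback figure, including the 10 months quoted above.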
Note to businesses: Upgrade now while prices are depressed, and interest rates are low. Sticking with your old hardware is costing you in the long term.
Take out a loan and upgrade. If your hardware is over 18 months old, you can cut your licensing costs in half. Don't sit on hardware when you are just waiting for it to break.
IT is not a static business. Do not keep your hardware until it has no resale value. Do not keep your hardware until you are paying twice as much for licenses as you could be paying. Do not balk at high up-front costs if they save you 10 times that up-front cost in licensing and rack space. Do not keep old machines that cost you three times as much in electricity at a given performance level.
Do a real cost analysis; put in the time. This is the perfect time to upgrade. Competition has never been fiercer for the dollars you have to spend. You will get more value for your dollar now than you ever have before.
IT is crap as capital: it has no value in three years. Keep your IT expenditures dynamic to avoid riding your capital investment into the ground. Playing the depreciation tax game will not save you nearly as much as keeping old hardware costs you in other areas.
Disclaimer: I am not invested in any IT infrastructure provider and I do not do IT consulting. I just have to run my own shop like the rest of you.
If voting were effective, it would be illegal by now.
You have a very significant misunderstanding of preemptive multitasking. There is no situation in which a locked process cannot be killed on a single-CPU system but can be on a multiple-CPU system.
When the locked application's timeslice runs out, other applications get a turn, and from one of those it is possible to kill the locked application. This is one of the reasons preemptive multitasking became popular.
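A quick way to convince yourself: start a child process that spins forever and never yields, then kill it from the parent. A minimal Python sketch with a hypothetical `kill_spinning_process` helper; on Linux it pins the parent (and therefore the child, which inherits the affinity) to a single CPU via `os.sched_setaffinity`, so both really share one processor. On other platforms the pinning step is skipped and the demo may use separate CPUs.

```python
# Illustrative sketch; kill_spinning_process is a hypothetical helper.
import os
import subprocess
import sys
import time


def kill_spinning_process(spin_seconds: float = 0.5) -> int:
    """Start a child that spins forever, kill it, and return its exit code.

    The child never yields voluntarily; the parent still gets to run and kill
    it because the OS scheduler preempts the child when its timeslice expires.
    """
    original = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
    if original:
        # Linux only: pin this process to one CPU; the child inherits the mask.
        os.sched_setaffinity(0, {min(original)})
    try:
        child = subprocess.Popen([sys.executable, "-c", "while True: pass"])
        time.sleep(spin_seconds)   # let the child hog the (single) CPU for a while
        child.kill()               # SIGKILL on POSIX, TerminateProcess on Windows
        return child.wait()        # reap it; a negative signal number on POSIX
    finally:
        if original:
            os.sched_setaffinity(0, original)   # restore the parent's CPU mask


if __name__ == "__main__":
    print("child exit code:", kill_spinning_process())
```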
Since lots of people seem to be missing the point of "hyperthreading", as Intel is calling it, I feel like jumping in and trying to clarify a little bit.
Processor clocks have gotten faster and faster and faster over the last decade, by multiple orders of magnitude. Not only that, but processors have incorporated increasingly clever tricks to process the data available to them. Memory speeds have increased too, but even with DDR and all that great stuff, they haven't kept pace. So there are times when your super-fast processor is just sitting there waiting because it has run out of data to process.
Even if you could (cheaply) make memory that actually ran at 2 GHz or whatever, that would not solve an even more fundamental problem that makes the situation worse: because of the speed of light, a 2 GHz processor will spend a significant number of cycles waiting whenever it has to go to main memory for something to process.
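Rough numbers help here. At 2 GHz a cycle lasts 0.5 ns, and light in a vacuum covers only about 15 cm in that time; signals in real wires are slower, and DRAM adds its own access time on top of the trip. A minimal Python sketch with hypothetical helper names; the 60 ns memory latency is an assumed figure for illustration, not a measurement.

```python
# Illustrative sketch; helper names are hypothetical and the 60 ns latency
# below is an assumed example value, not a measured one.
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def cycle_time_ns(clock_ghz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1.0 / clock_ghz


def light_distance_per_cycle_cm(clock_ghz: float) -> float:
    """How far light travels in a vacuum during one cycle (wires are slower)."""
    return SPEED_OF_LIGHT_M_PER_S * cycle_time_ns(clock_ghz) * 1e-9 * 100


def stall_cycles(memory_latency_ns: float, clock_ghz: float) -> float:
    """Cycles spent waiting on a memory access of the given latency."""
    return memory_latency_ns * clock_ghz


if __name__ == "__main__":
    ghz = 2.0
    print(f"cycle: {cycle_time_ns(ghz)} ns, "
          f"light per cycle: {light_distance_per_cycle_cm(ghz):.1f} cm, "
          f"cycles lost to an assumed 60 ns access: {stall_cycles(60, ghz)}")
```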
So, here's a question for you: if the processor has to wait a really long time, perhaps long enough to execute something like 50 instructions, what should it do during that time? Should it:
Well, the idea behind hyperthreading (a/k/a thread-level parallelism) is that the processor should make some sort of effort to do something.
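A toy model makes the idea concrete: one execution unit, several hardware threads, each of which issues a short burst of instructions and then stalls on memory. Every cycle the core issues from the next thread that isn't stalled. A minimal Python sketch with a hypothetical `utilization` function; the burst length and latency are made-up parameters, the threads in this model tend to stall in lockstep (which understates the benefit), and it ignores the cache contention raised elsewhere in this thread. Push the thread count way up and you get roughly the MTA approach described above.

```python
# Illustrative toy model; utilization and its parameters are hypothetical,
# not a description of how any real core schedules threads.
def utilization(num_threads: int, compute: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which a single execution unit issues an instruction.

    Each hardware thread issues `compute` instructions, one per cycle, then
    waits `latency` cycles on memory; every cycle the core issues from the
    next ready thread in round-robin order.
    """
    remaining = [compute] * num_threads   # instructions left before the next stall
    ready_at = [0] * num_threads          # cycle at which each thread may issue again
    busy = 0
    nxt = 0
    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (nxt + offset) % num_threads
            if ready_at[t] <= cycle:
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Burst done: this thread now stalls on a memory access.
                    remaining[t] = compute
                    ready_at[t] = cycle + 1 + latency
                nxt = (t + 1) % num_threads
                break
    return busy / cycles


if __name__ == "__main__":
    for n in (1, 2, 8):
        print(n, "threads ->", round(utilization(n, compute=10, latency=50), 2))
```

With one thread the unit is idle most of the time; adding threads fills some of those stall cycles, which is the whole pitch, while the core itself stays a single core.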
So, IMHO hyperthreading isn't stupid or a marketing ploy. It's a genuine attempt (one that many processor makers are working on, by the way) to solve a genuine problem. And not only a genuine problem, but one that will increasingly become a bottleneck. (It's already bad enough that it has its own name: "The Von Neumann Bottleneck".)
And by the way, the advantage of this over two processors is that you don't have to build two chips! You don't get double the performance, but it's quite possible that you might get a better bang for the buck. (Notice I said "might".)
Also note on the cache pollution issue (where one thread slows down another by "hogging" the cache and actually causing slower execution for another) that there are ways to mitigate this problem. An obvious one that comes to mind is to bias the processor towards executing a particular one of the threads. That way, one thread runs much more often and should tend to have what it needs in the cache.
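One simple form of that bias is a weighted round-robin over issue slots. A minimal Python sketch with a hypothetical `biased_schedule` function; the 3:1 weighting is an arbitrary example, not a policy any shipping processor is known to use.

```python
# Illustrative sketch; biased_schedule is a hypothetical policy for
# choosing which hardware thread gets each issue slot.
from itertools import cycle, islice


def biased_schedule(weights, slots):
    """Issue-slot order for a weighted round-robin between hardware threads.

    weights[i] is how many consecutive slots thread i gets per round, so
    weights=(3, 1) favours thread 0 three-to-one and should let it keep
    more of its working set in the shared cache.
    """
    pattern = [tid for tid, w in enumerate(weights) for _ in range(w)]
    return list(islice(cycle(pattern), slots))


if __name__ == "__main__":
    print(biased_schedule((3, 1), 8))  # [0, 0, 0, 1, 0, 0, 0, 1]
```

The trade-off is fairness: the favoured thread runs closer to full speed, while the other one mostly soaks up whatever slots would otherwise be wasted.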
Anyway, until the economy gets better and I find a way not to be one of the masses of unemployed software developers anymore, I'm not buying one of these fancy processors...
To scale well you want to lock data rather than code and that can lead to many locks when you are operating on many structures. Ideally these locks each have less contention and better data sharing than "bigger" locks.
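A common way to lock data rather than code is lock striping: one lock per slice of the data instead of one lock around the whole update routine. A minimal Python sketch with a hypothetical `StripedCounter` class; in CPython the GIL limits real parallelism, so treat it as an illustration of the locking structure rather than a benchmark.

```python
# Illustrative sketch; StripedCounter is a hypothetical class showing
# lock striping (one lock per slice of the data, not one global lock).
import threading


class StripedCounter:
    """Per-key counters guarded by one lock per stripe of keys.

    Threads touching keys in different stripes don't contend with each
    other; a single big lock would serialize every update. The stripe
    count is an arbitrary choice.
    """

    def __init__(self, stripes: int = 16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [dict() for _ in range(stripes)]

    def _stripe(self, key) -> int:
        return hash(key) % len(self._locks)

    def increment(self, key, amount: int = 1) -> None:
        i = self._stripe(key)
        with self._locks[i]:          # lock only the slice this key lives in
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + amount

    def get(self, key) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)
```

More stripes mean less contention but more locks to manage, which is the trade-off the comment describes.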
UC San Diego has been a leader in research on hyperthreading. We used to have the Tera MTA, which kinda pioneered the whole field, and we have Dr. Dean Tullsen (and his lab of students), whose hyperthreading architecture was used in the new, now-cancelled Alpha chip.
References: The Tera: http://www.cs.ucsd.edu/users/carter/Tera/tera.htm
Dean Tullsen: http://charlotte.ucsd.edu/users/tullsen/
I was one of the first five students to use the Tera after it came out of development, and I decided to take a different approach to evaluating its performance. I didn't like what the Tera corporate benchmarkers were doing: taking applications with known parallelism, writing a serial version of the code, and then posting glowing reviews of the Tera automatically finding the parallelism, ignoring that the number of pragmas they had to put into the code to let the compiler discover the parallelism was more work than just writing parallel code oneself.
I instead called them on their advertising that their compiler could discover latent parallelism in any computation-heavy code. I noticed that John Carmack's .plan file at the time openly questioned the same claim, so I took BSP, the single-threaded, computation-intensive Quake2 utility (LIGHT and VIS are multithreaded), and ran it on the Tera. In a nutshell: it couldn't find parallelism. The 300 MHz Tera supercomputer ran at the equivalent speed of a 600 MHz Pentium, which is crap considering the incredible memory bandwidth and number of computational units it had available.
When I reported the results to Carmack, his response was, "I have never been a big believer in magically parallelizing dusty deck codes. I don't mind specifying explicitly parallel activities and threads, especially with the large payoffs involved."
Cheers,
Bill Kerney