Ars Technica on Hyperthreading
radiokills writes "Ars Technica has a highly-informative technical paper up on Hyper-Threading. It's a technical overview of how simultaneous multithreading works, and what problems it will introduce. It also explains why comparing the technology to SMP is Apples to Oranges, in a sense. Starting with the 3 GHz Pentium 4, this tech will be standard in Intel's desktop lines (it's already in the Xeon), so this is important stuff."
Does AMD have anything similar? Dan 20sec rule.
Remember, democracy never lasts long. It soon wastes, exhausts and murders itself. John Adams (1814)
I refuse to support Intel as long as they support Palladium and DRM.
If you don't know what Zoo Blacklisting is, click here.
But I'd bet it gives quite a boost to interactive performance. SMP setups tend to be wonderfully responsive under background loads (much more so than the sum of the CPU speeds would suggest) so I'd guess that allowing the CPU to run more than one thread at a time would make the UI a little more responsive on single-proc machines. Now, all we need are the UNIX developers to stop being afraid of multithreading and maybe some of us UNIX users would be able to take advantage of this :0
A deep unwavering belief is a sure sign you're missing something...
Yes, but since no one has a supersentient compiler and assembler like ht requires, very few programs are able to really take advantage of this.
I dig innovation. I dig more impressive chips. But it's getting to the point where boxes with top of the line CPUs are like those old VWs with Porsche engines in them: there comes a point when improving one part doesn't really matter any more.
All's true that is mistrusted
If you plan to use any of these features effectively on Windows you'll need to upgrade to Windows.NET Server. Windows 2000 can't distinguish between virtual and physical processors, so if the BIOS doesn't set up a two (real) CPU system the right way it will end up ignoring the second physical processor. My source:
www.microsoft.com/windows2000/docs/hyperthreading.doc
So that's how we can put the thread through the needle even faster? Wow... back in MY day, we had to use our fingers to do that, in candle light, when you couldn't even see the friggin' hole! :P
And so we go, on with our lives
We know the truth, but prefer lies
Lies are simple, simple is bliss
come on, more like oranges to tangerines...
you are dealing with data instruction streams going on independently, sure maybe only x2 or more with SMP, but x anything is infinitely greater than x1 when dealing with threads.
and what is really the difference with oranges and tangerines? man i hate tangerines... if anything they are worse than oranges, but so similar. all tangerines should be destroyed. and thus i have proven why hyper-threading will fail.
on that note:
ARE YOU A PHP DEVELOPER? WORK WITH ME AND MAKE MILLIONS!
Web Developer II
MARIJUANA, SHROOMS, X: ONLINE?! - E
I'm personally more partial to calling it Symmetric Multi-Threading as compared to Hyperthreading which is the brandname Intel created for the concept. Sort of like Xerox versus Photocopy. Of course there are some mix-ups for those who seem to think of the multi-threading as OS based and not hardware. Eh, personal preference.
What is music when you despise all sound?
when will someone develop a processor that will automatically multithread tasks? i.e. you don't have to explicitly ask for new threads, it optimizes the code into threads for you?
yes, I realize this is anti-geek, so this processor would also allow you to take control of thread creation by flipping a register or something.
I would agree that a SMP system holds up well. I run 2x 200MHz Pentium Pro, and it gives solid performance as a desktop. I wonder if this tech would allow a slower clock speed chip, thus cooler, that still exhibited good performance. It seems like a good idea for laptops, etc.
Like pi? Try 10,000 digits.
It's incredibly difficult to automatically parallelize a program well, even when you can run a preprocessor on it and spend days on computations; doing it in real-time in hardware is even more difficult. This is currently done to a small extent in the pipelining hardware of modern CPUs, and even that small bit of automatic parallelization is ridiculously complex and slows things down (which is why the Itanium dumped it, and put the onus on the computer to parallelize sufficiently for pipelining to work). If it's that difficult to do for the relatively meager parallelization requirements of pipelining, actually breaking the program into separate execution threads is damn near impossible with current technology (at least with any efficiency even remotely approaching writing a program to be properly multithreaded in the first place).
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
In reference to the Itanium's pipelining, I of course meant "put the onus on the compiler..."
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
To make optimal use of hyperthreading, I'm guessing the OS guys will have to do some work, like making sure that two threads with huge, non-overlapping data sets don't get scheduled at once, and trying to schedule threads that have overlapping data sets together. And it points out another thing. Again, just when we thought we had enough, we need MORE MEMORY BANDWIDTH. The tests show that while the dual channel RDRAM was fast enough for the two HT-enabled 2.0 GHz Xeons, it wasn't enough for the two 2.4 GHz Xeons.
A deep unwavering belief is a sure sign you're missing something...
What's next, LudicrousThreads?
obligatory spaceballs reference
mp3's are only for those with bad memories
They Love Hyperthreading. Licencing is determined per CPU reported to the OS not per actual piece of silicon.
Double your licencing cost for a 5% to 30% performance improvement? I don't think so. Hyperthreading is DOA for enterprise.
Luckily MS has decided to enable 2 CPUs in XP Home so you don't have to ante up another hundred bucks for XP Professional for the 5% to 30% performance improvement.
Junkware.
If voting were effective, it would be illegal by now.
oh no!
Sincerely,
Intel
--
pants ahoy
The company that now owns the name Cray does something very much like this on a fairly grand scale on its own architecture, the MTA (Multi-Threaded Architecture). Here, each processor switches between 128(!) hardware threads to take advantage of the sort of concurrency you can get from waiting for memory access, etc.
Hyperthreading needs to be used carefully. With certain applications you will end up with significant performance decreases with it enabled. Hyperthreading adds additional overhead to threading models and schedulers.
scott
I for one applaud any advance in computer technology - even one that has questionable benefits. There are bound to be lots of stumbling blocks as engineers try to increase the computing power of home PCs.
Some people will say "what do you need more computing power for" - well on the Discovery Channel a few nights ago there was a documentary about using CAT scan data and visualization techniques to generate a real-time rendered 3D image of a human brain about to undergo surgery. The render farm they used was massive. It was a computer science research lab in the UK (sorry, don't know which one). Having the power to do that on a chip that is affordable might save lives.
Granted, this is only a small step, with questionable benefits, but it is always good to try and push the boundaries.
This is a very good article to read for those who are not really familiar with how a processor actually does its work. The first three pages or so are generally what a senior-level college OS course will teach you.
The distinction between a program in memory and a process in execution is important. It is also important to understand the illusion of simultaneous execution that is achieved through concurrent processes using context switches.
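For anyone who wants to see that illusion in miniature, here is a rough sketch in Python. It is only a toy: the generators stand in for processes, each `yield` stands in for a context switch point, and the names `task` and `round_robin` are made up for this illustration, not taken from the article.

```python
from collections import deque

def task(name, steps):
    """A toy 'process': each yield marks a point where it can be switched out."""
    for i in range(steps):
        yield f"{name}{i}"

def round_robin(tasks):
    """Run each task for one step, then context-switch to the next one."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))
            ready.append(current)   # not finished: back of the run queue
        except StopIteration:
            pass                    # finished: drop it from the queue
    return trace

trace = round_robin([task("A", 3), task("B", 2)])
print(trace)
```

Only one task ever runs at a time, yet the interleaved trace looks as if A and B made progress together. That is the single-CPU case; SMP and SMT are about actually running more than one instruction stream at once.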
Given all that, the article makes it easy to understand where your performance gains (and losses) happen having multi-processors, and indeed in having multi-processing on the same chip.
All in all a good read.
From the article:
(On a related note, this brings to mind one of my favorite .sig file quotes: "A message from the system administrator: 'I've upped my priority. Now up yours.'")
He stole my .sig !!
A message from the system administrator: 'I've upped my priority. Now up yours.'
KernelTrap has had some articles on Linux's support of HT. Ingo Molnar has been working on tuning the scheduler for HT systems. Articles are here:
http://kerneltrap.org/node.php?id=391
http://kerneltrap.org/node.php?id=406
</karmawhoring>
Using your sig line to advertise for friends is lame.
I know the Hammer is 64 bit, but I've no idea about its multithreading properties...Anyone?
Invoicing, Time Tracking, Reporting
... crack open the machine and demonstrate that there isn't but the one CPU. Really, the price tag of the software needs to be determined *outside* of the product being paid for -- especially on proprietary systems.
<joke>
California might not have spent so much on Oracle licensing costs had they not relied on a calculator running this little jobber:
if (CPU_Count < 16) {
// why would they run on a machine with less
// than 16 cpus? it's an insult to our software!
ChargeForLicenses(Random(255) + 16);
}
else {
// Now we're playing with power!
ChargeForLicenses(CPU_Count);
}
</joke>
Win2k will have twice as many opportunities to freeze?
This is more of an issue of programming language support.
There are languages (well, mostly modifications to existing languages) that allow one to create a program that will scale to any number of processors.
It's actually a very tough problem, because most coders think in terms of doing x, then y, then z. You really need to think in terms of "I need these things done and they have these dependencies, but other than that, divide and conquer any way you want."
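A minimal sketch of that "things to do plus dependencies" style, assuming Python 3.9+ for the standard `graphlib` module. The task names and the helper `run_with_dependencies` are hypothetical, chosen just for this example; a real parallel language would do far more.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from graphlib import TopologicalSorter

def run_with_dependencies(deps, work, max_workers=4):
    """deps maps task -> set of tasks it needs first; work maps task -> callable.

    Any task whose dependencies are satisfied is handed to the pool, so
    independent tasks may run at the same time. Returns completion order.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()                      # raises CycleError on circular deps
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for name in sorter.get_ready():
                running[pool.submit(work[name])] = name
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()           # re-raise any error from the task
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)         # may unlock dependent tasks
    return finished

# x and y are independent; z needs both of them.
deps = {"x": set(), "y": set(), "z": {"x", "y"}}
work = {name: (lambda: None) for name in deps}
order = run_with_dependencies(deps, work)
print(order)
```

The programmer states only what depends on what; the order in which x and y actually run is left to the runtime.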
parallel programming languages on Google
A speech...
They call this stuff Symmetric Multi Threading, but I think that name is a bit misleading. While the thread scheduling itself is symmetric (all process threads are created equal and receive equal execution time), the shared resources on the CPU (cache, shared registers) are NOT symmetric. Since these shared resources are in essence handled on the way in to the execution unit, it becomes really easy to starve the processor when you have contention for one of those resources.
While proper application development can alleviate some of this issue, it will depend heavily on the actual usage patterns of the system. When you have a lot of overlap coming in from memory (like the file system cache on a web server), you don't worry too much about threads stepping on each others' registers. This sounds fantastic for data servers.
Desktop systems, on the other hand, almost never work this way. When you're playing MP3s in the background while web surfing and checking your email, you're already working with vastly different areas of data. Throw the OS and any various background processes into the mix and you've pretty much eliminated any gain and possibly slowed down due to cache contention.
While this was touched on at the end of the article, I don't think it was given enough weight. It doesn't just depend on what applications you're running and whether they were written to take advantage of it. It depends on what you want to do with the whole system. For serving data, this will certainly be good (especially with multiple CPUs!). For desktop systems, this is a non-starter.
I'm not disparaging the technology - far from it. I'm just waiting for Intel and Microsoft to market this to my mom as a way to have higher quality DVD playback - at twice the cost. And her buying it. Again.
Culture is more than commerce
when will someone develop a processor that will automatically multithread tasks? i.e. you don't have to explicitly ask for new threads, it optimizes the code into threads for you?
There should be no such thing as a sequential or algorithmic task. Programs should be parallel to start with. The biggest problem in software engineering is the age-old practice of using the algorithm as the basis of programming. This is the primary reason that software is so unreliable and so hard to develop. Objects in the real world are concurrent. Why should our software objects be any different?
is this similar to or in someway related to HyperBicycles? Please, no techie answers, I don't really understand that stuff.
This is awesome, because SMP is the future baby.
After running single CPU systems for 10 years I finally anted up and built a dual 1 gig P III box, and I would never go back.
There are many many reasons for this. First off, my computer hasn't locked up in probably 4 months; I always have a free processor to kill the app! Even though fairly few programs are multi-threaded (SETI@HOME, Photoshop, etc.), I still use both CPUs evenly by running 3 or 4 things at once...
I still love to be able to burn a CD, listen to music and play Counter-Strike all at the same time.
I heard of a higher-up at Transmeta saying that SMP was crap one time. What a moron, no wonder they aren't doing that well.
Now maybe when we start seeing asynchronous processor systems come down to the desktop level is when things will really start to cook..
It's not the OS it's the user that sucks. If it's user friendly, you get stupider people. - clinko
There is a writeup on Hyperthreading along with some videos from a Q/A session with Intel representatives at their last IDF (Intel Developer Forum).
http://www.hardwareanalysis.com/content/article/1532/
Yes, yes, I realize you don't need hyperthreading for that and regular multitasking is good enough...
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
What I think is going to be the quintessential question is, how many algorithms can be expressed fully in a hyperthread context? Kind of like the difficulties with multi-processor and hyper-cubic architecture. And as someone else pointed out, how do you debug such a beast? Kind of like trying to fix a modern jet engine with a crowbar and rusty nail. The fix-it part hasn't kept up with the create-it part.
Regardless of the technical achievements that are coming out of Intel (and hyperthreading is indeed an achievement to be applauded), the bottom line is that Intel's chips have become totally irrelevant to me, regardless of their performance, since they will contain DRM restrictions.
I'm pinning my hopes on Apple and maybe even China's new Dragon chip for my future computing needs.
www.enthea.org
A good solution to this is to only allow threads from the same process to share a physical CPU via "hyper-threading". This makes it possible for the programmer to provide explicitly for their "cooperation", and even without programmer support, threads from the same process are more likely to use similar TLB and cache data. Traditional time-slicing will still get competing processes in and out of the CPU.
I think a lot of people are missing where this will help out a lot. Java servers. Java systems under high loads benefit a lot from multi-threading, and this can only help.
> Hyperthreading adds additional overhead to threading models and schedulers.
How So?
The CPU simply appears as an SMP pair. But better, if you only have 1 process to schedule you get 100% of the physical silicon, rather than 50%.
If you have multiple processes, the CPU simply uses instruction slots that a single pipeline would have otherwise left unused.
It doesn't ADD overhead to anything in the software. It can be OPTIMIZED to IMPROVE some things. But out of the box, there is no substantial overhead added into the system.
So...
Have you SEEN these chips in action? Do you actually have a motherboard that has an Intel implementation of DRM?
Why not wait to see what happens, instead of spreading FUD?
Will the Itanium have DRM?
It's hard to be religious when certain people are never incinerated by bolts of lightning.
Netware 5 & 6 fully support hyper-threading.
Dual core was in the original P4 design, but was dropped due to lack of time to test. I'm not sure, but it may actually still be in the silicon, just disabled.
there is no freaking way DRM (in the CPU/MB) is going to happen for at least several years.
If it does, then we can blame Intel but they haven't done anything YET.
You sir are spreading FUD in the purest sense.
"No."
Hi, I'm having trouble reaching the linked site. Although if the page shows what I suspect it does, I probably don't really want to see it anyway.
When Intel switched from the P3 architecture to the P4 architecture, they increased the depth of their pipeline from 10 to 20, I believe. My understanding was that this significantly increased the performance penalty for mispredicts for branches and whatnot requiring a flush of the pipeline. I am curious if adding SMT to this will increase the penalty for mispredicts even more, if both threads must be flushed or only the one. If this is the case, are there cases where the penalty would outweigh the benefit?
First Falcon-1 to orbit, then Falcon-9. Then I can die a happy man.
And with the addition of DRM in the intel lines we can know exactly how quickly we won't be able to install the patch for the broken / insecure bit of GNU code.
Wake me up when it's about AMD.
So, if CAML programs can be threaded, does that mean Bill Gates gets to go to heaven?
Conspiracy theory of the day:
When was the last time a software company found a "cure" for a basic programming problem like deadlocks, race conditions, or garbage collection?
See, they aren't interested in actually solving problems once and for all, since it's not as profitable!
And that's why we need to abolish patents!
mod this guy up!
he's a notorious Usenet kook! slashdot is the perfect place for him!
I'm posting this on a Dell P530 development desktop, running Windows 2000 Server.
The CPU is a single Intel Xeon 2.2 GHz.
Hyperthreading can be turned on or off in the BIOS of the machine. I turned it on before I installed Win2K.
The system was seen as a dual CPU machine from the time I installed it from the original CD, before I applied any service pack.
If I disable hyperthreading in the BIOS and boot Win2K, then I only see one CPU.
I have a second Xeon CPU on order for this machine as it is dual capable. Once I get it, it should make it look like a quad CPU in Win2K.
FYI, I am also running another OS on the system, Warp Server for E-business with the SMP kernel. Unfortunately the OS2APIC.PSD driver only detected one CPU even with hyperthreading enabled. I contacted the OS/2 kernel developer at IBM Austin, who told me that somehow there needed to be explicit support for it in OS/2 SMP for it to work.
I also left about 20 GB unpartitioned on my hard disk for Linux, but I haven't gotten around to installing it yet. Thread support in Linux has historically been poor and this is the main reason why I haven't done so. With the availability of the NPTL library, I'm looking forward to installing Linux, as NPTL becomes the standard pthreads library for Linux.
-- Julien Pierre http://www.madbrain.com/blog
"Years later when Apple brought dual-processing to its PowerMac line, SMP was officially mainstream"
Actually, it's the other way around: there were dual processor PowerMacs, as well as BeBoxes, long before dual Celeron was available. What--more than any other single thing--brought SMP popularity to personal users was the Abit BP6.
According to this article Windows XP home and Pro already support Hyperthreading as does Linux Kernel 2.4.x and later.
ASUS has released BIOS upgrades to the P4T533 line of motherboards that now support Hyperthreading.
And rumors persist that Hyperthreading is on the current P4 chips (Socket 478?) and may be enabled at a later time if all goes well.
http://www.kubuntu.org/
Due to the restrictions for queueing instructions on a given thread (50% for each virtual processor), I would think this would help the P4 in a couple ways.
This is kind of like reducing the 20 stage pipeline to a 10 stage pipeline. Caveat being Hyperthreading which dilutes the number of instructions from a mispredicted thread, but does not restrict the pipeline stages.
It kind of makes me wonder why Intel didn't stop at Superthreading with a hardcoded interleaving of the 2 threads... That would give a hard and fast improvement on branch misprediction and it might have made a lot of the other logic much simpler.
---- Smokin' another sig.
I've got a Abit VP6 mobo, and I currently have dual PIII 866s in it. What I was wondering is, can I use PIII Xeon chips in it, or do they require a special board? 'Cause dual 1 GHz Xeons would be SOOOO sweet...
The roots of education are bitter, but the fruit is sweet.
--Aristotle
Just buy dual 1GHz P3s, it would be the same thing.
First of all, Xeons use Slot 2. Second, they are essentially EXACTLY THE SAME CORE as normal P3s if you get 256KB cache Xeons. The higher cache models are still the same core just with more cache. Plus I don't think they make the faster P3 Xeons with more than 256KB of cache, which essentially makes them only marketable to people that don't know what they are doing.
Don't waste your money on P3 Xeons. Normal 1GHz P3s do dual just fine and cost tons less, and would perform *identically*. That is why Intel removed SMP capability from normal P4s: so that people who want a new Intel-based SMP system have to buy their ludicrously priced line of processors, which again are pretty much exactly the same except with a different socket.
Presently the high end PC market is drying up as users come to realize a Celery 1 GHz and some vid RAM will take them anywhere they want to go. With them went the market that allowed Moore's Law to propagate during the commoditization (ugly word) of the PC. Aside from the server/workstation market who are the buyers for this technology?
"Academicians are more likely to share each other's toothbrush than each other's nomenclature."
Cohen
here's a question:
Why use SMT on a $500-1000 processor to effectively subdivide 3 GHz among processes when there should be motherboards that support four to eight $50 processors that run at 2 GHz, and allow the subdivision of 8, 10, whatever gigahertz among processes?
Basically, Intel doesn't have a mass-market application for gigahertz processors. DivX encoders may need it, but ma on IE doesn't. So they try to lump as much processing onto the CPU as possible, damned if it should be there. If they had their way, your NVIDIA 3D processing would be directly on the chip too (SSE, MMX, or AMD's 3DNow!). But they can't do that economically with what's needed.
Hey, I'm just your average shit and piss factory.
Sun's MAJC (Microprocessor Architecture for Java Computing or something like that) tried to automatically transparently split threads into multiple threads using some kind of weird speculative logic. I don't think it worked too well...
Incidentally, that chip was also supposed to do SMT and single-chip-SMP and SIMD. Dunno how well it fared, I kinda forgot about the chip after its second schedule slip, and I haven't seen it mentioned much since then... it should have been out for at least a year now.
Since lots of people seem to be missing the point of "hyperthreading", as Intel is calling it, I feel like jumping in and trying to clarify a little bit.
Processor clocks have gotten faster and faster and faster and faster over the last decade. Multiple orders of magnitude faster. Not only that, but processors have incorporated increasingly clever tricks to process the data they have available to them. Memory speeds have increased too, but even with DDR and all that great stuff, they haven't kept pace. So there are times when your super-fast processor is just sitting there waiting around because it's run out of data to process.
Even if you could (cheaply) make memory that actually ran at 2 GHz or whatever, this would not solve an even more fundamental problem that makes the situation worse: due to the speed of light, a 2 GHz processor is going to have to wait a really significant amount of time if it has to wait on main memory before it's time to process something.
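To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The speed of light is the vacuum figure (signals in real wiring are slower), and the 100 ns memory latency is purely an assumed round number for illustration, not a measurement of any particular system.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # vacuum; real signals in wires are slower

def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz

def light_distance_per_cycle_cm(clock_hz):
    """How far light travels during one clock cycle, in centimetres."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100

def stall_cycles(clock_hz, latency_ns):
    """Clock cycles that elapse while waiting latency_ns for memory."""
    return latency_ns * clock_hz / 1e9

clock = 2e9                      # the 2 GHz processor from the post
ASSUMED_LATENCY_NS = 100         # assumption: illustrative main-memory latency

print(cycle_time_ns(clock))                    # one cycle, in ns
print(light_distance_per_cycle_cm(clock))      # distance light covers per cycle
print(stall_cycles(clock, ASSUMED_LATENCY_NS)) # cycles lost per memory wait
```

At 2 GHz a cycle is half a nanosecond, in which light covers roughly 15 cm, and that is one way, before the memory has done any work at all.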
So, here's a question for you: if the processor has to wait a really long time, maybe enough time to execute maybe like 50 instructions, what should it do during that time? Should it:
Well, the idea behind the hyperthreading (a/k/a thread-level parallelism) is that the processor should make some sort of effort to do something.
So, IMHO hyperthreading isn't stupid or a marketing ploy. It's a genuine attempt (one that many processor makers are working on, by the way) to solve a genuine problem. And not only a genuine problem, but one that will increasingly become a bottleneck. (It's already bad enough that it has its own name: "The Von Neumann Bottleneck".)
And by the way, the advantage of this over two processors is that you don't have to build two chips! You don't get double the performance, but it's quite possible that you might get a better bang for the buck. (Notice I said "might".)
Also note on the cache pollution issue (where one thread slows down another by "hogging" the cache and actually causing slower execution for another) that there are ways to mitigate this problem. An obvious one that comes to mind is to bias the processor towards executing a particular one of the threads. That way, one thread runs much more often and should tend to have what it needs in the cache.
Anyway, until the economy gets better and I find a way not to be one of the masses of unemployed software developers anymore, I'm not buying one of these fancy processors...
To scale well you want to lock data rather than code and that can lead to many locks when you are operating on many structures. Ideally these locks each have less contention and better data sharing than "bigger" locks.
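A small sketch of what "locking data" can look like, in Python. The `Counter` class and `hammer` helper are invented for this example; the point is only that each structure carries its own lock, so threads working on different structures never contend with each other the way they would on one big lock around the code.

```python
import threading

class Counter:
    """A data structure that carries its own lock ("locking data")."""
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # Only threads touching THIS counter ever wait on this lock.
        with self.lock:
            self.value += 1

def hammer(counter, times):
    """Increment one counter repeatedly from the calling thread."""
    for _ in range(times):
        counter.increment()

counters = [Counter() for _ in range(4)]
# Two threads per counter: contention exists only within each pair.
threads = [threading.Thread(target=hammer, args=(c, 10_000))
           for c in counters for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

totals = [c.value for c in counters]
print(totals)
```

The trade-off is the one the post mentions: more locks to manage, but each one is contended by fewer threads.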
Coming from a limey perspective, Prescott brings up images of our beloved deputy PM - not a pleasant sight: John Prescott in his prime.
He's also known for being large, rough and prone to overconsumption of resources (2 Jags) - maybe Intel do know what they're doing.....
If green text on a black background isn't your style, check out the HT article over at www.extremetech.com
running factor 12344322343342231127
number of parallel processes / total executiontime
1 2.87000000000000000000
2 2.99500000000000000000
3 3.53000000000000000000
4 4.00500000000000000000
5 5.12600000000000000000
6 6.05500000000000000000
7 7.31285714285714285714
8 8.04875000000000000000
there are 2 Xeon 2 GHz CPUs in the system with hyperthreading activated, thus pretending there are 4 CPUs
in the case of 4 or 8 parallel processes the execution time is still about 0.70 times (70% of) what one would expect from only 2 CPUs; that means about 40% more performance.. not too bad..
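The 0.70 figure can be rechecked from the table. The sketch below is one reading of it, stated as an assumption: without hyperthreading, n equal jobs on 2 physical CPUs would run in waves of two, each wave taking the single-job time of 2.87. The function names are made up for this illustration.

```python
# Total execution time by number of parallel processes, from the table above.
times = {1: 2.87, 2: 2.995, 3: 3.53, 4: 4.005,
         5: 5.126, 6: 6.055, 7: 7.31285714285714285714, 8: 8.04875}
PHYSICAL_CPUS = 2

def expected_without_ht(n):
    """Assumed baseline: n equal jobs run in waves of two on two plain CPUs."""
    waves = -(-n // PHYSICAL_CPUS)        # ceiling division
    return waves * times[1]

def ratio(n):
    """Measured time with HT divided by the assumed no-HT baseline."""
    return times[n] / expected_without_ht(n)

for n in (4, 8):
    r = ratio(n)
    print(n, round(r, 3), f"{(1 / r - 1) * 100:.0f}% more throughput")
```

Both the 4-process and 8-process cases land at about 0.70 of the baseline, which works out to a bit over 40% more throughput under that assumption.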
mond.
From my understanding of the article - HT (with appropriately written code executing) drastically reduces wasted cycles in the execution core - so what you should be comparing is 4 or 8 normal CPUs with 4 or 8 HT capable CPUs - and yes I'm sure they'll cost more, at least at first - a Porsche 993 costs 10 times as much as a Ford Focus, but only goes less than twice as fast, but in certain circumstances it's useful to have all the power in one unit.
I would agree that it makes no sense right now to buy a 2 CPU HT architecture system over an 8 way (or however many you can get for the same cash) SMP system, but as the technique is perfected the price of these CPUs will come down, and as compilers and application/OS developers use more multi-threading in their products, it will make more sense.
Just so that it won't be later patented...
The starvation issues with symmetric-multithreading can easily be addressed by keeping an instruction count for each virtual thread; perhaps hooked to an interrupt the OS can use to tell when each thread has consumed its allotted processor resource.
That way, threads that have been starved for resources will remain in the processor core longer than any that happen to "hog" a resource. In other words, instead of time slicing, you can use instruction slicing to ensure fair use of the scheduler between contending threads.
Voila! Problem solved. (Not counting the dozen man-years it would take to implement.)
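A toy model of the difference, in Python. This is not how any real CPU or OS does it; the per-thread instructions-per-cycle rates are invented numbers, and the two functions exist only to contrast handing out equal time with handing out equal instruction budgets.

```python
def time_sliced(ipc, cycles_per_turn, turns):
    """Every thread gets the same number of cycles per turn.

    Returns instructions retired per thread: a starved thread
    (low instructions-per-cycle) falls behind.
    """
    return {name: rate * cycles_per_turn * turns for name, rate in ipc.items()}

def instruction_sliced(ipc, budget, turns):
    """Every thread stays resident until it has retired `budget` instructions.

    Returns (instructions retired, cycles consumed) per thread: progress is
    equal, and the starved thread simply stays in the core longer.
    """
    retired = {name: budget * turns for name in ipc}
    cycles = {name: budget * turns / rate for name, rate in ipc.items()}
    return retired, cycles

# Hypothetical rates: one thread hogs shared resources, the other is starved.
ipc = {"hog": 2.0, "starved": 0.5}

print(time_sliced(ipc, cycles_per_turn=100, turns=10))
retired, cycles = instruction_sliced(ipc, budget=1000, turns=10)
print(retired, cycles)
```

With time slicing the hog retires four times as many instructions; with instruction slicing both retire the same amount and the starved thread is charged four times the cycles instead.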
I think the poster's argument goes like this.
When you HT a CPU you, basically, divide the core between two processes. So a 2.8GHz CPU roughly gives out 1.4G to each process.
Today, a dual 1.4GHz CPU costs $200. A single 2.8GHz Xeon CPU costs $500.
Could that $300 difference be better spent on a making a dual/quad/Nway support chipset?
Yes, the 2.8G will get cheaper, but so will the 1.4. Someday a 4G Xeon will come 'round, but the 2G would still be far cheaper.
The point is, I think, that HT is an incremental advantage. Yes, it keeps more transistors firing on the CPU, but if you double the number of transistors with multi-CPU SMP you end up with even more of them to fire -- at less cost.
What the poster misses, tho, is that it costs Intel just about as much to make a 2.8G part as it does a 1.4. It would much rather sell 1 CPU at the higher price than 4 at the lower.
UC San Diego has been a leader in research on hyperthreading. We used to have the Tera MTA, which kinda pioneered the whole field, and we have Dr. Dean Tullsen (and his lab of students), whose hyperthreading architecture was used in the new, now-cancelled, Alpha chip.
I was one of the first five students to use the Tera after it came out of development. I decided to take a different approach in evaluating its performance. I didn't like what the Tera corporate benchmarkers were doing, which was taking applications with known parallelism, writing a serial version of the code, and then posting glowing reviews of the Tera automatically finding parallelism, ignoring that the number of pragmas they had to put into the code to allow the compiler to discover parallelism was more work than just writing a parallel code oneself.
I instead called them on their advertising that their compiler could discover latent parallelism in any computation-heavy code. I noticed John Carmack's .plan file at the time openly questioned the same claim, so I took the single threaded, computation-intensive utility for Quake2 (BSP; LIGHT & VIS are multithreaded) and ran it on the Tera. Nutshell: it couldn't find parallelism. The 300MHz Tera supercomputer ran at the equivalent speed of a 600MHz Pentium. Which is crap considering the incredible memory bandwidth and number of computational units it had available.
When I reported the results to Carmack, his response was, "I have never been a big believer in magically parallizing dusty deck codes. I don't mind specifying explicitly parallel activities and threads, especially with the large payoffs involved."
References: The Tera: http://www.cs.ucsd.edu/users/carter/Tera/tera.html
Dean Tullsen: http://charlotte.ucsd.edu/users/tullsen/
Cheers,
Bill Kerney
that is all :D
Darn... I thought all the P3 Xeons had the 2MB cache... thanks!
The roots of education are bitter, but the fruit is sweet.
--Aristotle
Very few things actually get manufactured these days, because in an
infinitely large Universe, such as the one in which we live, most things one
could possibly imagine, and a lot of things one would rather not, grow
somewhere. A forest was discovered recently in which most of the trees grew
ratchet screwdrivers as fruit. The life cycle of the ratchet screwdriver is
quite interesting. Once picked it needs a dark dusty drawer in which it can
lie undisturbed for years. Then one night it suddenly hatches, discards its
outer skin that crumbles into dust, and emerges as a totally unidentifiable
little metal object with flanges at both ends and a sort of ridge and a hole
for a screw. This, when found, will get thrown away. No one knows what the
screwdriver is supposed to gain from this. Nature, in her infinite wisdom,
is presumably working on it.
- this post brought to you by the Automated Last Post Generator...