Intel Hyperthreading In Reality

One Quesion.... by DocChaos · 2002-02-20 09:35 · Score: 1

WHY? I mean, come on... If you want two processors, shouldn't you have 2 processors in the system???

--
DocChaos -------- I may be crazy, but then again I may be crazy.

Re:One Quesion.... by Matrim9 · 2002-02-20 09:40 · Score: 1

Allows more processors in less space. I don't know about performance figures...
Re:One Quesion.... by Glonk · 2002-02-20 09:43 · Score: 3, Informative

WHY? I mean, come on... If you want two processors, shouldn't you have 2 processors in the system???

Maybe because SMT makes the die 5% bigger, while 2 processors is upwards of 100% bigger? This is where a thing called "cost" comes in.

SMT essentially allows for the CPUs to be used more efficiently. A lot of the time an ALU will sit idle while the FPUs work, and with SMT both can work at the same time on different threads.
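The idle-unit argument can be illustrated with a toy model. This is only a sketch under loud assumptions: a made-up core with exactly one integer unit ("I") and one floating-point unit ("F"), threads that issue at most one operation per cycle in program order, and rotating priority when two threads want the same unit. None of it reflects the real Xeon pipeline, and `run_back_to_back` and `run_smt` are hypothetical names.

```python
def run_back_to_back(threads):
    """Cycles needed when threads run one after another, one op per cycle."""
    return sum(len(ops) for ops in threads)


def run_smt(threads):
    """Cycles needed when every thread may issue in the same cycle,
    as long as the unit it needs ("I" or "F") is still free."""
    pcs = [0] * len(threads)          # next op index per thread
    cycles = 0
    n = len(threads)
    while any(pc < len(ops) for pc, ops in zip(pcs, threads)):
        busy = set()                  # units already claimed this cycle
        # Rotate priority each cycle so no thread is starved.
        for k in range(n):
            i = (cycles + k) % n
            if pcs[i] < len(threads[i]):
                unit = threads[i][pcs[i]]
                if unit not in busy:
                    busy.add(unit)
                    pcs[i] += 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    mixed = ["IIII", "FFFF"]          # one integer-heavy, one FP-heavy thread
    same = ["IIII", "IIII"]           # both threads fight over the integer unit
    print(run_back_to_back(mixed), run_smt(mixed))
    print(run_back_to_back(same), run_smt(same))
```

In this toy, the integer-plus-FP pair finishes in half the cycles, while two integer-only threads gain nothing, which is the "it depends on the workload" caveat in miniature.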
Re:One Quesion.... by Anonymous Coward · 2002-02-20 09:45 · Score: 0

No, it doesn't allow more processors in less space. It *tricks* the os into thinking there are 2. Notice...there are not 2.
Re:One Quesion.... by Matrim9 · 2002-02-20 09:51 · Score: 1

It does allow more processing in less space, which was the point I was making.
Re:One Quesion.... by guamman · 2002-02-20 10:20 · Score: 1

But what if you want four processors (as I do)? Now it makes that possible. There will always be people out there who want more power, this is just another way to give it to them.
Re:One Quesion.... by rhost89 · 2002-02-20 10:54 · Score: 1

Reminds me of IBM's Power arch for their z/series mainframes. 1 CPU is actually 8 CPUs. Is it any faster?? IMHO not any faster than if you had 8 actual CPUs in there. It's really more of a space/cost/development cutting move than a performance move. Intel just wants to spin it that way (as they always do, e.g. MHz inflation: 2 GHz P4 !> 1.6 GHz P3)

--
I will bend your mind with my spoon
Re:One Quesion.... by Anonymous Coward · 2002-02-20 11:24 · Score: 0

Well, stands to reason that if the processors are on the same die then they don't have to communicate as much as two separate processors...less overhead.
Re:One Quesion.... by Anonymous Coward · 2002-02-20 11:27 · Score: 0

the cpu will be able to have the pipelines operate at a better capacity. means it can do a better branch prediction 'n stuff.
better parallelizing its work.

frankly, it's just a way to optimize efficiency of the cpu.
Re:One Quesion.... by khuber · 2002-02-20 12:01 · Score: 1

As usual, old tech gets posted to Slashdot and a slew of stupid comments follows. Is this not a tech site?
All this (SMT) is is basically another identical set of registers so overhead due to context switching can be moved to the CPU. Now the CPU can keep some of the state of a running task instead of the OS. Intel is not the only company using it -- Sun has similar technology in their MAJC processor which was announced a long time ago.
There's a little more to it since you can run two separate processes in the pipeline at once, but it's still not like having two actual CPU cores on a single die.
-Kevin
Re:One Quesion.... by Anonymous Coward · 2002-02-20 12:15 · Score: 0

It's not about extra processors, it's about using your processor(s) more efficiently. Making it appear to be two processors is just the easiest (read most backwards compatible) way of making use of the reclaimed cycles.
Re:One Quesion.... by mandolin · 2002-02-20 14:36 · Score: 2

This is where a thing called "cost" comes in.
The cost savings also applies to the motherboard (and the motherboard design) since you don't have two (real) cpus contending for resources. Last I checked (couple years ago), the price between a dual-proc mobo and the single-proc equivalent was approx $100, a significant fraction of their cost.
Re:One Quesion.... by SK-null · 2002-02-20 15:11 · Score: 1

Sorry, but zSeries aren't based on PowerPC.
And yes, it's all about how much power you can buy with a given budget. But isn't everything?
Re:One Quesion.... by blackwings · 2002-02-20 20:00 · Score: 1

Exactly!
Re:One Quesion.... by rhost89 · 2002-02-21 07:17 · Score: 1

I didn't say PowerPC, I said Power arch... as in Power Generation 5 and Power Generation 6 CPUs used by IBM on their OS/390s. They are basically 8 CPUs in one CPU: 2 CPUs per pack, 4 packs per die.

--
I will bend your mind with my spoon

That explains it by Pharmboy · 2002-02-20 09:37 · Score: 4, Funny

That would explain why they cost twice as much as they should :-)

I am not as concerned with how it "tricks" the OS as much as I am about performance and reliability. Tell me how this actually makes the chip BETTER and I might get excited.

--
Tequila: It's not just for breakfast anymore!

Re:That explains it by Phs2501 · 2002-02-20 09:47 · Score: 5, Informative

Basically, as I understand it, it allows closer to 100% use of your CPU at any time.
Modern CPU's have many different execution units. Depending on the code running, not all of them may have work scheduled. Future work may depend on previous results; obviously you can't do this in parallel. The idea of "HyperThreading" is to run more than one thread of execution at a time with the multiple execution units - so more work gets done per clock cycle.
A quick Google search turned up an article here. At one point I read a really excellent article on single-processor multithreading (discussing a future Alpha processor) but I can't find it anymore. Hopefully AMD will do something like this as well for a future Hammer processor.
Re:That explains it by Anonymous Coward · 2002-02-20 09:47 · Score: 1, Informative

> I am not as concerned with how it "tricks" the OS as much as I am about performance and reliability. Tell me how this actually makes the chip BETTER and I might get excited.

Read the freaking article.

[simplified summary]

The processor can handle thread scheduling better than the os can handle thread scheduling. Claiming to be 2 processors pushes half of the scheduling from the os to the processor. Net performance gain is expected to be around 10% when the number of active threads is at least twice the number of processors.
Re:That explains it by Pharmboy · 2002-02-20 10:02 · Score: 1

Basically, as I understand it, it allows closer to 100% use of your CPU at any time.
I don't think I made myself klear in my comments:) Until I actually see them in the field, and hear slashdotters and others say "yea, they are reliable, and they perform better" I am going to be very skeptical. Also, since you don't get a 100% increase in performance with 2 cpus, I wonder what kind of overhead it produces.
Also, will I have to build an SMP kernel for the thing on a linux box? (just curious, the link you gave was dead :-/ )
And to the AC's who post "read the article", I did, from the Gamingpc site (well most of it, their site is slow as sin). I generally don't buy Xeons to play Counter Strike, I buy them to run servers. Since Intel has not manufactured Xeons for quite a while, I DO question reliability and performance of this unproven chip. At least you can use them in your old Xeon MB's.

--
Tequila: It's not just for breakfast anymore!
Re:That explains it by Steveftoth · 2002-02-20 12:05 · Score: 2, Funny

So basically Intel is saying that you can get up to a 30% performance boost with Hyperthreading enhanced code right? Therefore their design of the P4 is only 70% (probably less) efficient. Same can be said of the Athlon as their CPU is not 100% either. It would be interesting to see how much of a performance boost Athlons get due to this technology. I venture to say that it would be less since the Athlons do more work per clock than P4s.
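A quick check of that arithmetic, as a hedged sketch: if (and this is an assumption, not something Intel or the article claims) hyperthreading brought the execution units to 100% use, then a speedup of 1.3x would imply the single-threaded baseline was using at most 1/1.3 of the core, which is about 77% rather than 70%. The function name `implied_utilization` is made up for illustration.

```python
def implied_utilization(speedup):
    """Upper bound on baseline utilization, ASSUMING the core is fully
    busy after the speedup. speedup=1.3 means a 30% boost."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1.0")
    return 1.0 / speedup


if __name__ == "__main__":
    for boost in (0.10, 0.30):
        u = implied_utilization(1.0 + boost)
        print(f"{boost:.0%} boost -> baseline at most {u:.1%} utilized")
```

If the core is still not fully busy with two threads, the real baseline figure is lower, so "probably less" still stands.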
Re:That explains it by tcr · 2002-02-20 12:57 · Score: 2, Funny

Basically, as I understand it, it allows closer to 100% use of your CPU at any time.

Wow, I hope AMD run with this kind of idea as well.

Who'll need central heating in their home anymore? ;-)

--

Information wants to be beer.
Re:That explains it by sharkey · 2002-02-21 07:10 · Score: 2

Basically, as I understand it, it allows closer to 100% use of your CPU at any time.

Sssoooo... Dr. Watson will use 99.5% of my CPU, rather than just 99%?

--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
Re:That explains it by gkatsi · 2002-02-21 10:02 · Score: 1

http://www.realworldtech.com/page.cfm?section=news&AID=RWT121300000000&date=12-13-2000
http://www.realworldtech.com/page.cfm?section=news&AID=RWT122600000000&date=12-26-2000
http://www.realworldtech.com/page.cfm?section=news&AID=RWT011601000000&date=01-16-2001

Excellent articles about Alpha EV8
Re:That explains it by Anonymous Coward · 2002-02-21 18:46 · Score: 0

How about:

Alpha EV8 (Part 1): Simultaneous Multi-Threat
Alpha EV8 (Part 2): Simultaneous Multi-Threat
Alpha EV8 (Part 3): Simultaneous Multi-Threat

Anonymous Coward

Hmmm. Money saving idea? by caferace · 2002-02-20 09:39 · Score: 4, Funny

So I can just buy half a processor, and get full functionality? ;)

Re:Hmmm. Money saving idea? by TheGeneration · 2002-02-20 09:57 · Score: 1

Hmm, it really depends on how it is being accomplished. since 1=2, and 2=4 then it must be 1+1 and 2+2 so by that standard 3=6, and 1/2=1

However, if it's using a method of squaring above 1 then 3 would equal 9, and 1/2 would equal 1/4.

--

The Generation
I'd say something witty here, but I'm not that bright.
Re:Hmmm. Money saving idea? by Anonymous Coward · 2002-02-20 10:06 · Score: 0

However, if it's using a method of squaring above 1

Uhh... you mean squaring any processors other than 1 right? :)
Re:Hmmm. Money saving idea? by GTRacer · 2002-02-20 10:41 · Score: 2

Uhh... you mean squaring any processors other than 1 right?
But I thought processors were rectangular to avoid installation boo-boos. Imagine the support calls when users can't figure out which side of a squared proc is which!
GTRacer
- "Dogbert's Help Desk - How may I not help you?"

--
Defending IP by destroying access to it? That makes sense, RIAA/MPAA. Go to the corner until you can play nice!
Re:Hmmm. Money saving idea? by Anonymous Coward · 2002-02-20 15:40 · Score: 0

So I can just buy half a processor, and get full functionality? ;)
which half are you going to buy?

the outside? K.
Re:Hmmm. Money saving idea? by Fjord · 2002-02-21 05:39 · Score: 1

As funny as this sounds, this is the exact problem with 486s. A few months ago I bought a replacement 486 chip for one that had died. When I went to install it I realized that it could go in any way. I emailed the guy I bought it from and luckily he knew the correct orientation (I could not find a motherboard manual on the web).

--
-no broken link

Hyperthreading? by NWT · 2002-02-20 09:39 · Score: 0, Redundant

Hyperthreading, the feature that can theoretically turn your 2 physical CPU's into 4 virtual CPU's

This is quite kewl, i could turn my 2 cpus to 4, then to 8, 16, 32 ... *zwusch*
I like magic stuff like that :)

--
Life sucks.

Re:Hyperthreading? by mini+me · 2002-02-20 09:51 · Score: 1

Turn your brand new (read: expensive) Intel CPU into 32, 386 processors.
What a deal!!!
Re:Hyperthreading? by metalpet · 2002-02-20 11:49 · Score: 2, Funny

32,768 sorry. had to.
Re:Hyperthreading? by Anonymous Coward · 2002-02-20 12:26 · Score: 0

yeah but who ever heard of a 768 processor?

I wish someone would hyperthread... by Anonymous Coward · 2002-02-20 09:39 · Score: 1, Funny

...my keyboard so I could in essence have four hands.

Hyperthreading useless on Win2K? by syzxys · 2002-02-20 09:39 · Score: 5, Informative

Hyperthreading is a pretty cool idea, especially for those of us who would like to see SMP move more into the mainstream.

According to this article, though (posted on 2cpu.com), the Windows 2000 scheduler doesn't know how to take advantage of hyperthreading, since it doesn't know how to take advantage of virtual processors. (I suppose Windows XP does?) Go figure. Anyway, this looks like it's probably worth checking into. I'm sure Linux will support it!

---
Have you crashed Windows XP with a simple printf recently? Try it!

Re:Hyperthreading useless on Win2K? by NWT · 2002-02-20 09:48 · Score: 0, Redundant

Here's what the article says:

In our case, since we ran with dual Xeon processors (each with hyperthreading capabilities), the OS and software see this as four physical CPUs, even though there are only two physical CPUs running. As you can see by the device and task managers in Windows XP, the OS sees our system with four physical CPU's. Even though Windows 2000 and Windows XP only officially support two CPU's, both operating systems were able to run properly with the Hyperthreaded CPU's. This means you don't have to upgrade to a 4-processor OS like Windows 2000 server to take advantage of this technology.

--
Life sucks.
Re:Hyperthreading useless on Win2K? by Anonymous Coward · 2002-02-20 10:07 · Score: 0

For obsessive compulsive karma whores it does.
Re:Hyperthreading useless on Win2K? by Blue+Lozenge · 2002-02-20 10:09 · Score: 5, Informative

Here is a quote from the article:
Since Hyperthreading is implemented on the hardware level, the motherboard sees a single hyperthread-compatible CPU as two physical CPUs. Thus, software that is written for multiple CPUs will be tricked into thinking there is a second CPU in the system, and will run the appropriate multithreaded code if available. Since Windows XP and 2000 are coded to take advantage of multiple CPU's, it too sees a hyperthreaded CPU as two.

It would seem that you don't need special OS support beyond standard SMP.
Re:Hyperthreading useless on Win2K? by bmajik · 2002-02-20 11:17 · Score: 2

Disclaimer: here's what I've heard from "credible sources". You may want to verify this with a bunch of benchmarks.

What actually happens is that W2k thinks it has 2 CPUs, when it really has say 1.3 effective CPUs (hyperthreading isn't a 2x perf speedup by any means!!)

This sends the scheduler into fits on w2k. Additionally, it means you can't use a copy of w2k licensed for 2 cpus on a 2 cpu box if each cpu features hyperthreading, since it will look like a 4 cpu box.

Basically, stay away from hyperthreading unless you're using xp, or some other OS that handles it right (do any other oses handle it right ?)

--
My opinions are my own, and do not necessarily represent those of my employer.
Re:Hyperthreading useless on Win2K? by Cyberdyne · 2002-02-20 12:05 · Score: 3, Informative

Additionally, it means you cant use a copy of w2k licensed for 2 cpus on a 2 cpu box if each cpu features hyperthreading, since it will look like a 4 cpu box.

According to a recent post on linux-kernel, there's a BIOS-level hack to work around this: the "real" CPUs are always listed before the "virtual" CPUs. So, if you boot a copy of XP licensed for 4 CPUs on a machine with 4 hyperthreaded CPUs, it will use all four real CPUs, and ignore the hyperthreaded element. (The downside is that processor IDs aren't as obvious under Linux; you'd expect CPU#1 to be the "second half" of CPU#0, but it isn't...)
Re:Hyperthreading useless on Win2K? by (outer-limits) · 2002-02-20 15:52 · Score: 1

I believe this must infringe on the DMCA, when will Microsoft be taking Intel to court. I can hardly wait.

--

Microsoft - Where would you like to go today, Maybe Jail?
Re:Hyperthreading useless on Win2K? by Anonymous Coward · 2002-02-20 19:42 · Score: 0

Except that things are never that simple. A single hyperthreaded CPU is not the same as two CPUs. One reason (among others) is the caching behavior of the hyperthreaded CPU vs. that of multiple CPUs. If the OS isn't aware that the caching behavior is NOT the same as the multiple CPU case, you're in for a lot of performance problems down the line.

Reuven
Re:Hyperthreading useless on Win2K? by fred3666 · 2002-02-21 03:15 · Score: 1

While it is true that w2k licensed for 2 CPUs will "see" 4 CPUs when two SMT Xeon's are installed, Windows XP has supposedly been fixed in this regard. I would guess that it reads the CPUid to find out if there are known SMT CPUs in use.

The problem? Windows XP Professional maxes out at 2 CPUs. There is no WinXP Server, Advanced Server or Datacenter to support additional processors (virtual or otherwise).

So your maximum power under XP Pro will be 2 SMT Xeons or 4 virtual CPUs.

The next server release of Windows (.NET?) will be designed for known SMT CPUs like XP so here is when you could load up a ridiculous number of SMT CPUs and go nuts.

You lot should all realize that these CPUs will be ridiculously expensive and server boards that support more than 2 SMT CPUs are also prohibitively expensive so don't expect a 4 SMT (8 virtual) system playing Quake or Max Payne anytime soon. Actually even a regular Xeon motherboard is expensive enough, never mind a 4 CPU one.

Watch out for the XBox 2 from Intel by Asterax · 2002-02-20 09:40 · Score: 0, Offtopic

Heh, the XBox 2 is coming; now, instead of Microsoft, Intel will get into the video game action.

Re:Watch out for the XBox 2 from Intel by gbv23 · 2002-02-20 10:26 · Score: 1

Intel built the motherboard that is used in the xbox
Re:Watch out for the XBox 2 from Intel by Anonymous Coward · 2002-02-20 22:07 · Score: 0

no they didn't

nvidia did. MS has an Intel license which Nvidia had access to because they were contracted by MS to design the mb,gfx & sound chips for the xbox

More info on the Prestonia XEON Chip by lemonhed · 2002-02-20 09:40 · Score: 1

The XEON chip started shipping back in January. The Prestonia server chip is made on the 0.13-micron process. Intel cancelled the .18 micron process last year to focus on the .13 micron process. That is amazing!!!

A lot of people are giving reviews for the new XEON chip

Here is a link To another review of the XEON chip.

Ouch by Andrewkov · 2002-02-20 09:40 · Score: 5, Funny

Wow, kinda sucks if your OS has a per CPU license, like NT and Win2K server!

Re:Ouch by syzxys · 2002-02-20 09:44 · Score: 1

Yeah, we'll have to "upgrade" to Win2K Advanced Server to get enough processor licenses for a dual processor hyperthreading box. Otherwise we'll be violating the EULA, and they might hit us or something. (*cringes*). Great, I can't wait.
---
Have you crashed Windows XP with a simple printf recently? Try it!
Re:Ouch by Amazing+Quantum+Man · 2002-02-20 09:47 · Score: 2, Redundant

RTFA. From the article:

Even though Windows 2000 and Windows XP only officially support two CPU's, both operating systems were able to run properly with the Hyperthreaded CPU's. This means you don't have to upgrade to a 4-processor OS like Windows 2000 server to take advantage of this technology.

--
Fascism starts when the efficiency of the government becomes more important than the rights of the people.
Re:Ouch by Anonymous Coward · 2002-02-20 09:49 · Score: 0

It still violates the EULA, which is what they were saying. Sure, it probably works great, but Microsoft nails you every time you turn around because they can.
Re:Ouch by Wesley+Felter · 2002-02-20 10:44 · Score: 2

What exactly does the EULA say? Does it say you're only allowed to use 2 physical CPUs or 2 CPU contexts?
Re:Ouch by KidSock · 2002-02-20 13:38 · Score: 3, Interesting

Wow, kinda sucks if your OS has a per CPU license, like NT and Win2K server!

You want to know something even better? The way CPU ids are managed is by bits in an integer. Every other bit represents the "virtual" CPUs. Now when the Windows kernel is selecting CPUs for scheduling purposes it enumerates them in order. This means that when a process is scheduled to run on the next available CPU there's a very good chance it will get a virtual CPU even though a real CPU is completely free. So if you have a 4 CPU machine (4 real, 4 virtual means 8 Hyperthreaded total) and you have 4 processes that can run, only 2 real CPUs will be used.

Ok, ok, ok stop laughing. Here's the kicker. MS fixed this. But did they provide the fix to their customers? No. You have to get the Data Center version to enumerate your CPUs properly!
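The enumeration effect is easy to reproduce in a toy model. This is a sketch only: it assumes a naive scheduler that hands the first N runnable tasks to the first N logical CPUs in enumeration order, which is a simplification and not a description of the actual Windows scheduler. The function names and the `(package, sibling)` tuples are hypothetical.

```python
def interleaved(n_physical):
    """Logical CPUs ordered package by package: (0,0), (0,1), (1,0), (1,1), ..."""
    return [(p, s) for p in range(n_physical) for s in range(2)]


def physical_first(n_physical):
    """All first siblings listed before any second sibling:
    (0,0), (1,0), ..., then (0,1), (1,1), ..."""
    return [(p, s) for s in range(2) for p in range(n_physical)]


def packages_used(order, n_tasks):
    """Physical packages touched when tasks go to the first n_tasks logical CPUs."""
    return {package for package, _sibling in order[:n_tasks]}


if __name__ == "__main__":
    for name, order in (("interleaved", interleaved(4)),
                        ("physical-first", physical_first(4))):
        used = packages_used(order, 4)
        print(f"{name}: 4 tasks land on {len(used)} physical CPUs")
```

With the interleaved ordering, four tasks share two packages; listing one sibling per package first spreads the same four tasks over all four packages.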
Re:Ouch by evil_one · 2002-02-20 13:56 · Score: 2

Let me know when I, a typical home user, can afford one of these chips in place of a "regular" one, and then we'll look at what OSs properly support the CPUs.
In the mean time, it's the corporations that will be buying computers based on this chip, and have the money to purchase the OS to match.

--
Desperation is a stinky cologne
Re:Ouch by KidSock · 2002-02-20 15:20 · Score: 2

Uh, how about never. This is only an issue on machines with 4 procs or more.
Re:Ouch by Anonymous Coward · 2002-02-20 18:55 · Score: 0

i think that's the point he's making. it says that it's a problem on 4 processor servers, and that only companies would have it, so the company can just buy the software
Re:Ouch by Anonymous Coward · 2002-02-20 19:55 · Score: 0

It's not quite as simple as enumerating CPUs in order. There's a definite algorithm that's executed to decide which CPU to schedule a ready process on, and the best decision is not always what you might think it is. For example, assume that you had two hyperthreaded CPUs, which show up in the OS as CPUs 0-3 (where 0/1 and 2/3 constitute the two physical processors). Furthermore, say that virtual CPU 1 is running some process P0, and that P1 is ready to be scheduled. The question to answer is which virtual processor to schedule P1 onto?

The easy answer to this is to always schedule P1 on either CPU 2 or CPU 3 to get the most utilization. However, if the last time P1 was scheduled it was scheduled on one of virtual CPUs 0 or 1 (the first physical processor), cache information for P1 may still be in the cache associated with that physical processor. In this case it may very well be desirable to schedule P1 on CPU 0 even though it will have to share the CPU with P0! However, if a significant amount of time has passed since P1 has been scheduled, we may be back in the case where CPU 2 or 3 is more desirable since most of the cache information is gone.

In general, it's hard to find absolutely correct answers to these questions. People generally produce somewhat ad-hoc algorithms based on some particular chain of logic, and then test out the different algorithms on "typical" workloads. The algorithm that wins generally depends on the different definitions of "typical."

Reuven
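The cache-affinity trade-off in that example can be written down as one of the "somewhat ad-hoc" heuristics the post mentions. This is a sketch, not any real kernel's algorithm: the 10 ms `warm_ms` cutoff is an invented threshold, two logical CPUs per package is assumed, and `pick_cpu` is a hypothetical name.

```python
def package_of(cpu, threads_per_package=2):
    """Physical package that a logical CPU number belongs to (0/1 -> 0, 2/3 -> 1)."""
    return cpu // threads_per_package


def pick_cpu(n_cpus, busy, last_cpu, ms_since_last_run, warm_ms=10):
    """Choose a logical CPU for a ready process.

    busy: set of logical CPUs already running something.
    last_cpu: logical CPU the process last ran on.
    warm_ms: assumed cutoff below which the old cache contents are worth chasing.
    """
    free = [c for c in range(n_cpus) if c not in busy]
    if not free:
        return None
    # Cache probably still warm: stay on the same package, even if shared.
    if ms_since_last_run <= warm_ms:
        for c in free:
            if package_of(c) == package_of(last_cpu):
                return c
    # Cache probably cold: prefer a package nobody else is using.
    busy_packages = {package_of(c) for c in busy}
    for c in free:
        if package_of(c) not in busy_packages:
            return c
    return free[0]


if __name__ == "__main__":
    # P0 occupies logical CPU 1; P1 last ran on package 0.
    print(pick_cpu(4, {1}, last_cpu=0, ms_since_last_run=2))    # recently run
    print(pick_cpu(4, {1}, last_cpu=0, ms_since_last_run=500))  # long idle
```

Which threshold wins depends on the workload used to tune it, which is exactly the caveat about "typical" above.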
Re:Ouch by KidSock · 2002-02-20 20:17 · Score: 2

Well, I don't remember the exact details although I think it had to do with how CPU ids were initially assigned on boot rather than how processes were actively scheduled. Regardless I think the effect I described is accurate. I wish I could remember where I saw the discussion.

Argh, 12 pages! by mESSDan · 2002-02-20 09:42 · Score: 5, Informative

Make sure you use the Printer Friendly view, that way you don't get 12 pages of slashdotted hell! Look here.

--

-- Dan

Think of it as out of order execution ..glorified by esses · 2002-02-20 09:42 · Score: 3, Informative

Basically what they're doing is simply taking unused processor resources and allocating them to another thread. You can now have multiple _threads_ of execution simultaneously... truly simultaneously.

Thread X is using registers B and C
Thread Y is able to use registers A and D.

These threads can be executed together without a context switch... and the processor will hunt out these relationships in hardware. That's what "the big deal" is.

Until now, when a processor "multitasks", it's simply switching from one thread of execution to the next... it allocates separate time to two different threads.... Now it can allocate the exact same timeslice to multiple threads as long as there isn't a resource dependency.

If your program can be architected to take advantage of this (or your OS can schedule tasks like this), you'll get a huge benefit (read: if it works on SMP systems, it'll get some benefit on this as well).

Finally! by rw2 · 2002-02-20 09:43 · Score: 3, Funny

I've been waiting for literally *years* for a CPU that will trick my operating system! Nirvana, I kiss you!

Overheating by chukm · 2002-02-20 09:43 · Score: 1

With AMD's past history with overheating under heavy use (read overclocking), wouldn't this hyperthreading just compound the issue by tricking the OS into overworking the CPUs?

Re:Overheating by Anonymous Coward · 2002-02-20 09:48 · Score: 0

...or a "virtual" cluster :-)
Re:Overheating by Matrim9 · 2002-02-20 09:49 · Score: 1

No. Heavy use does not mean overclocking, not at all. If you run your CPU pegged, the chip wouldn't overheat... and isn't the topic of the post about Intel doing this?
Re:Overheating by Anonymous Coward · 2002-02-20 09:54 · Score: 1, Interesting

Dear Clueless,

The article is about Intel not AMD.

Failure due to overheating is generally a user incompetence problem not a design problem.

High-end business servers aren't typically overclocked.

No, nobody really buys these processors for anything but high-end business servers.

Hyperthreading doesn't magically make the CPU work at 120% capacity. Even without using hyperthreading, different tasks could make the processor work just as hard.

Hope this clears up some issues for you.
Re:Overheating by castlan · 2002-02-20 10:24 · Score: 3, Interesting

Overheating looks like a valid concern in this case. While overclocking will push the limits of heat dissipation, that is not the same as heavy use. An overclocked processor will still generate significant heat even when in an idle loop.

The issue that concerns me is that most consumer CPUs aren't designed with true heavy use in mind, and the specs usually consider that most of the time, the standard processor is not pegged. This can be an issue if full time compute processes don't give the processor time to idle, as in Seti or Distributed.Net usage. That is why these projects specifically warn against overclocking - the combination adds up.

Now even with a full time load, like with the Distributed.net client, the entire processor die isn't generating heat - some of the CPU logic remains idle. This still allows for a buffer for heat dissipation, as slight as it might be. Now with this hyperthreading technique, most of the die can be actively generating heat simultaneously, pushing the heat generation potential higher than the specs likely considered.

Considering that the largest problem that all Intel processors have had since the Pentium 60 involves inability to deal with sufficient heat dissipation, this concerns me deeply. I fear the day, soon approaching, when the Intel processor code names are based on the Black Body Effect: The low end "black", "dull red" and "infra-red" models are outmatched by the "Hot-white" and "blue blaze" series, but much of the extraordinary cost is attributed to maintaining active-cooling systems that are spontaneous-combustion-retardant.

And the melting point of silicon substrate with various doping agents will soon become common knowledge.

-castlan
Re:Overheating by Anonymous Coward · 2002-02-20 10:39 · Score: 0

The hyperthreading tries to fill unused function units with instructions from a virtual CPU. Depending on whether they dynamically shutdown parts of the CPU when not in use, this might or might not have an impact on power.

Having a higher utilization on the CPU might add extra heat in the CPU. At a system level, the overall amount of heat would be less than having multiple CPUs with equivalent processing power.

Does that mean AMD would start naming their Athlon as XP2000+ *2 when they have something similar ? ;)

I don't know if VLIW would have a better performance (Compiler schedule instructions trying to fill as many function units as it can) if binary compatibility is not an issue.
Re:Overheating by csbruce · 2002-02-21 03:57 · Score: 2

So I should stop running my seti@home processes?

Just imagine by Jesse+Duke · 2002-02-20 09:43 · Score: 1, Funny

a cluster of a cluster of these ...

You don't have a choice but to imagine... by Anonymous Coward · 2002-02-20 09:45 · Score: 0

...a Beowulf cluster of hyperthreaded Intel processors. They're all virtual anyway.

Doesn't Win2k Pro... by Matrim9 · 2002-02-20 09:45 · Score: 0, Redundant

Only support a max 2 processors? Does this mean that anyone who wants more than 1 real processor has to run the Server edition of one or the other? And... what about per-cpu licenses? If I wanted to run my dual-Xeon mobo before, I only needed a license for 2 cpus...

Re:Doesn't Win2k Pro... by Anonymous Coward · 2002-02-20 09:49 · Score: 0

Ah ah ah! Don't say the M word. Dude this is slashdot. Do you want to be shot?
Re:Doesn't Win2k Pro... by Anonymous Coward · 2002-02-20 09:58 · Score: 0

RTFA. Win2k pro worked fine with 2 real/4 virtual CPUs. Windows apparently counts the physical number of CPUs not the virtual number.
Re:Doesn't Win2k Pro... by Matrim9 · 2002-02-20 10:00 · Score: 1

RTFP. You'd still be violating the letter of the EULA. ["You used the multiprocessing system 4 times, so you're paying up!"] It's a moot point if you CAN do it or not.
Re:Doesn't Win2k Pro... by Joe+U · 2002-02-20 10:49 · Score: 1

I thought the EULA said CPU's not virtual CPU's. I don't see a violation.

Sounds great in theory, not great in practice. by yobbo · 2002-02-20 09:45 · Score: 1, Interesting

Sure, it does sound good having the ability to pose as 2 cpu's, but you won't get the performance that you would from a real dual cpu setup.

And, because of this AMD is at advantage. Athlon is much much smaller at an equal fabrication process, so even if hyperthreading took off, AMD would be able to combine 2 cpu cores in the one chip and still be able to compete easily in terms of die size and attain a higher level of performance, because 2 real cpu's will beat 1 cpu posing as two any day.

Re:Sounds great in theory, not great in practice. by Anonymous Coward · 2002-02-20 09:56 · Score: 0

But if you had two of these processors, not only would you get the performance of two processors, but you'd get the performance of hyperthreading as well..
Re:Sounds great in theory, not great in practice. by yobbo · 2002-02-20 10:00 · Score: 1

And if you had 2 dual processors on a chip, you too would get the extra performance - more so than 2 hyperthreading enabled processors.

The point is - the P4's die is so large that any advantage of hyperthreading can quickly be countered by putting 2 small die processors on a single chip, such as athlon.
Re:Sounds great in theory, not great in practice. by dreamchaser · 2002-02-20 10:00 · Score: 2

Let me guess...if AMD had come up with this, you'd be telling us how it was bad for Intel too. What nonsense! AthlonXP's are NOT 'much smaller' at an equal fabrication process (I am assuming you meant the process size).

Don't get religious about your CPU's...it's not only bad form but it's childish too :)
Re:Sounds great in theory, not great in practice. by Anonymous Coward · 2002-02-20 10:04 · Score: 1, Interesting

AMD can't currently combine 2 CPU cores on one chip due to issues other than die size. The biggest two are power consumption/routing and thermal output.

And why would you compare 2 CPUs to 1 hyperthreaded CPU? # of CPUs is limited by the CPU hardware itself. If AMD's processor had a limitation of 8, then it wouldn't matter if it was 4x2 or 8x1 in configuration. And if Intel's processor also had a limitation of 8, then 8 hyperthread CPUs could do more work/time than either a 4x2 or 8x1 configuration of AMD CPUs (assuming all CPUs had the same base performance of course).
Re:Sounds great in theory, not great in practice. by Master+Bait · 2002-02-20 11:19 · Score: 1

After reading the article through, I can see how right you are. Seriously, that Xeon is a DOG. The benchmarks between this and the Athlon MP clearly show little advantage.
The only measured advantage showed up in the Linux kernel compile at j=4 where SMT gained 2 seconds out of a total of about 2 minutes for the Xeon. I've been hearing Intel make a lot of excuses about how their new technologies need 'better compilers' and 'optimized code'. Sounds more like the emperor's new clothes to me. Maybe Intel wouldn't need the 'new compilers' excuse if they kicked up the Xeon's level 1 cache up a lot more than the paltry 8k... SMT wants a lot of L1 cache IMO.
Since this is the first I've read about SMT Xeon benchmarks, I decided to run over to pricewatch to check up on the prices for the products used in the test:

2ghz Xeon $430
Supermicro Xeon motherboard $541
AthlonMP 1900 $255
Asus dual Athlon motherboard $210

Which would you use? Which would you use if you were Michael Dell? By the way, isn't IBM putting two or more cpus in one of their Power4 chips?
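
For anyone who wants the totals rather than the line items, here is a rough sketch in Python. It assumes a dual-processor build (two CPUs plus one board), which the list above does not spell out; the prices are the ones quoted, and the `platform_cost` helper is just a made-up name for illustration.

```python
# Pricewatch figures quoted in the post above (US dollars).
PRICES = {
    "xeon_2ghz": 430,
    "supermicro_xeon_board": 541,
    "athlon_mp_1900": 255,
    "asus_dual_athlon_board": 210,
}

def platform_cost(cpu_price, board_price, cpus=2):
    """Cost of the CPUs plus one motherboard; cpus=2 is an assumption."""
    return cpus * cpu_price + board_price

xeon_total = platform_cost(PRICES["xeon_2ghz"], PRICES["supermicro_xeon_board"])
athlon_total = platform_cost(PRICES["athlon_mp_1900"], PRICES["asus_dual_athlon_board"])

# Totals and how many times more the Xeon platform costs.
print(xeon_total, athlon_total, round(xeon_total / athlon_total, 2))
```

Memory, cases and power supplies are left out, so this only compares the CPU and board part of the bill.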

--
"Only in their dreams can men truly be free 'twas always thus, and always thus will be."
--Tom Schulman
Re:Sounds great in theory, not great in practice. by n0-0p · 2002-02-20 11:51 · Score: 1

The main idea, though, is that you get more bang for your buck through better use of an existing processor. A 10% - 30% increase in processing power for a 5% increase in cost is pretty good math. Intel's prices may be high for the CPU's, but the technology is viable and it's only a matter of time before other CPU manufacturers start using it.
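
To put a number on the "good math" above, a quick sketch: it takes the 10% to 30% gain and 5% cost figures at face value and treats performance per unit cost as a simple ratio, which is an assumption and not a measurement.

```python
def perf_per_cost(perf_gain, cost_increase):
    """Relative performance per unit cost after the change (1.0 = no change)."""
    return (1 + perf_gain) / (1 + cost_increase)

low = perf_per_cost(0.10, 0.05)    # pessimistic end of the claimed range
high = perf_per_cost(0.30, 0.05)   # optimistic end of the claimed range

print(f"{low:.3f} to {high:.3f}")
```

Anything above 1.0 means more work per dollar than before, so both ends of the claimed range come out ahead under these assumptions.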
Re:Sounds great in theory, not great in practice. by n0-0p · 2002-02-20 12:00 · Score: 1

I almost forgot one of the really important points for Intel. Hyperthreading allows the processor to take much better advantage of the bandwidth of RDRAM, given RDRAM's memory latency. The two technologies complement each other very nicely, but once again RDRAM is another relatively pricey technology.
Re:Sounds great in theory, not great in practice. by Lithium+Element · 2002-02-20 13:09 · Score: 1

I would use the Xeons myself. Athlon-based systems really lack the kind of reliability that say a server needs...though that is mostly because of chipset problems, not the CPU itself. I believe that Michael Dell would use the Xeon as well...since it's a bit hard to sell an Athlon MP to a large company who needs a system that won't lockup (or burn up) on its own accord.
Re:Sounds great in theory, not great in practice. by Anonymous Coward · 2002-02-20 13:17 · Score: 0

Michael Dell pulls up PriceWatch and orders his CPUs from places called "Accubyte" and "PC XPERTS"?
Re:Sounds great in theory, not great in practice. by Master+Bait · 2002-02-20 19:04 · Score: 1

Athlon-based systems really lack the kind of reliability that say a server needs...though that is mostly because of chipset problems, not the CPU itself.
Seriously, what reliability problems are there in the 760mpx? I've heard about an IO/APIC error with Linux kernels, but this is a setup problem which is solved by disabling MPS1.4 in the BIOS. Same with some gamers not being able to run Quake under Windows--solved by reverting to an earlier Nvidia driver.
But you mention reliability. Data corruption? Boards failing while in use? I don't think so, but if you'd like to point some specific issues out to me, I'd like to know.

--
"Only in their dreams can men truly be free 'twas always thus, and always thus will be."
--Tom Schulman
Re:Sounds great in theory, not great in practice. by Lithium+Element · 2002-02-21 14:09 · Score: 1

I mainly meant VIA-based Athlon boards...I don't know much about AMD's current chipset offerings these days, though I know I'd much prefer their stuff if I owned one of their CPUs (I used to). With VIA all you have to say is "4 in 1"...since getting things right the first time has never been one of VIA's strong points. I know for a fact that their chipsets have some serious problems with burst transfers over PCI, which tends to make them somewhat worthless in any real mission-critical environment. I suppose someday AMD will shake off the consumer-class CPU reputation that it still has. When your biggest supporters are gamers and people who just want a 'bang for your buck' system, gaining a positive reputation for corporate servers and such will take some time. Maybe some Athlon MPs and this chipset you speak of would be sufficiently reliable, but keep in mind that AMD isn't well known at all for SMP stuff and definitely isn't pushed very strongly by the big boys (Dell, etc.) in corporate/business markets. Don't get me wrong though, I don't think that Intel is king of CPUs or anything, but an Intel-based SMP setup is a bit less risky...though a bit more expensive certainly.

The advantage of hyperthreading by howlingfrog · 2002-02-20 09:49 · Score: 3, Informative

A number of people have posted asking what the point was of making a single processor act like two processors. It's actually explained in the article linked to above.

Apparently, the big deal is that a single processor can only handle one thread at a time--multitasking works by breaking programs down into threads, and working on one thread for a little while, then another, then another, then back to the first. But at any given time, only one thread is being actively executed. Hyperthreading changes this--a single processor can work on two threads truly simultaneously. This makes multitasking a hell of a lot more efficient.
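
The "one thread for a little while, then another" part can be sketched as a toy round-robin scheduler. This is only an illustration in Python: the `time_slice` function, the quantum of 2 ticks and the thread lengths are all made up, and a real OS scheduler is far more involved.

```python
from collections import deque

def time_slice(work, quantum=2):
    """Round-robin schedule: returns the thread id that ran on each tick."""
    queue = deque(work.items())          # (thread_id, remaining_ticks)
    timeline = []
    while queue:
        tid, remaining = queue.popleft()
        run = min(quantum, remaining)    # run for one quantum at most
        timeline.extend([tid] * run)
        if remaining > run:              # not finished: back of the queue
            queue.append((tid, remaining - run))
    return timeline

timeline = time_slice({"A": 3, "B": 4})
print("".join(timeline))
```

Every tick in the timeline belongs to exactly one thread, which is the limitation described above; with two logical processors some ticks could carry work from both threads.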

--
The original Howling Frog is a fictional character and has no UID.

Re:The advantage of hyperthreading by greymond · 2002-02-20 10:03 · Score: 4, Informative

but then there's this: "While this looks great for showing off to co-workers or friends, you will absolutely NOT get the performance of four CPUs running in your system (I can't stress this enough). As you'll see in our benchmarks later, even if software is written to take advantage of SMP, you rarely ever see performance gains with Hyperthreading enabled."

--
Ave Molech Setting
Re:The advantage of hyperthreading by Webz · 2002-02-20 10:59 · Score: 1

Since you seem to know what you're talking about... =)

Is it me or does this seem like a game of catchup from sloppy, previous implementation? The CNET article about this says hyperthreading makes use of unused parts of the processor. What I want to know is why wasn't it being used in the first place... Like... Why is this news worthy, as in, what makes hyperthreading so revolutionary? The way you've explained it, it seems very fundamental.
Re:The advantage of hyperthreading by eviltwinimposter · 2002-02-21 08:05 · Score: 1

"Imagine what we could do if humans could use 100% of their brains!"

"You could jerk around all over the floor. It's called a seizure."

"Oh."

Re:Think of it as out of order execution ..glorifi by bjk4 · 2002-02-20 09:49 · Score: 5, Interesting

In hyperthreading, the logical processors do not share registers, just function units. Thus, if one logical processor needs to multiply while the other needs to add, they may share the CPU resources simultaneously.

This was developed in response to the observation that individual function units remained idle for multiple cycles while the current process was busy doing one kind of operation.
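
A toy model of that sharing, sketched in Python. It assumes one op per function unit per cycle, in-order issue within each thread, and fixed priority for the first thread; none of that is taken from Intel documentation, and `cycles_needed` is a made-up name.

```python
def cycles_needed(thread_a, thread_b):
    """Cycles to drain both instruction streams when each function unit
    accepts one op per cycle and each thread issues in order."""
    a = b = cycles = 0
    while a < len(thread_a) or b < len(thread_b):
        busy = set()                      # units claimed this cycle
        if a < len(thread_a):             # thread A always gets first pick
            busy.add(thread_a[a])
            a += 1
        if b < len(thread_b) and thread_b[b] not in busy:
            busy.add(thread_b[b])         # B issues only if its unit is free
            b += 1
        cycles += 1
    return cycles

mixed = cycles_needed(["mul"] * 4, ["add"] * 4)      # different units
clashing = cycles_needed(["mul"] * 4, ["mul"] * 4)   # same unit
print(mixed, clashing)
```

With one thread multiplying and the other adding, both streams finish in the time of one; when both want the multiplier, the second thread simply waits its turn.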

-B

A trial balloon? by barole · 2002-02-20 09:50 · Score: 1

The impression I got from the story is that Intel put this in now so that they can figure out what bottlenecks they face in turning hyperthreading into an advantage. Also, it gives compiler writers, OS writers, and application writers some exposure to the technology. I would not be surprised if a few years down the road this is a big win for some environments (ie, not office2.005k).

Actually, Intel put a second set of registers by Anonymous Coward · 2002-02-20 09:51 · Score: 0

From the article:

Hyperthreading is simply a method of placing a second set of registers on the processor core, allowing the processor to execute two "threads" at once.

Yeah it should really be by Daath · 2002-02-20 09:52 · Score: 2, Funny

Hyper means overexcited - how about Underexcitethreading - Makes your PC with two CPU's think it only has one! :P Oh wait... That's like windows 98...

--
Any technology distinguishable from magic, is insufficiently advanced.

Re:Yeah it should really be by compwiz3688 · 2002-02-20 10:03 · Score: 1

That's like windows 98...
or the Win9x series in general :)
Re:Yeah it should really be by Anonymous Coward · 2002-02-20 12:19 · Score: 0

Opposite of hyper is hypo.
Re:Yeah it should really be by castlan · 2002-02-20 13:36 · Score: 1

It seems the compound word you seek would be "HypoThreading", as opposed to HyperThreading. This might not be as facetious as you intended, as there are some approaches to clustering or compute serving that might be considered an implementation of this hypothreading.

I can't recall off-hand how IBM handles this, but Linux doesn't support quite as many CPUs as does, say, IRIX. So the current solution seems to be to partition a large system into multiple smaller systems. Maybe it would be feasible to virtualize some form of Hypothreading to allow one system partition to utilize an entire enterprise class system.

Then again, since this is a gross "hypersimplification", I think not.

-castlan

Re:Think of it as out of order execution ..glorifi by dtr20 · 2002-02-20 09:54 · Score: 3, Interesting

The Intel Xeon actually has *another* set of registers to cope with the second thread.

Unfortunately, the big slowdown in computers is accessing memory and peripherals on the various buses. Looking at the details of the Xeon, it still competes (and queues) for access to memory.

It's also worth considering that although programs tend to have a few threads to look after things like printing while you carry on writing your document, you tend to be using one or maybe two threads heavily at once and the rest are just mostly idle, waiting on hardware and interrupts.

Intel themselves are claiming 10% speed improvement, even when compiled to take account of SMP, or 30% for specially optimised code (yeah as if that's going to be popular). Don't get fooled into thinking your PC is going 2x faster.

The printer friendly version... by JPriest · 2002-02-20 09:55 · Score: 2, Informative

the site is partly /.'d already but the printer friendly (non graphic) version seems to actually still load. http://www.gamepc.com/reviews/printreview.asp?review=ppso&mscssid=&tp=

--
Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.

Feel the cold hand of the slashdot effect.... by Anonymous Coward · 2002-02-20 09:56 · Score: 0

"The remote site or network may be down. Please try again."

Wonder if that server is vibrating around the room right about now.... (remember the simpsons episode where they raced the washer and dryer in moe's tavern?)

It's not for games, stupid by Animats · 2002-02-20 09:57 · Score: 5, Informative

All this "hyperthreading" does is share some ALU resources between multiple threads. The big win is if one thread does lots of FPU work and the other doesn't. If both "hyperthreads" are hitting the CPU's computational resources hard, it probably won't help much.

And it may hurt. A downside of "hyperthreading" is that the threads contend for cache space, so if the threads are executing very different code, the cache miss rate will rise. Of course, this happens in ordinary threading on each context switch, but with "hyperthreading", there's a context switch of sorts on every instruction cycle. If this effect shows up, it will show in L1 cache miss rates.
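
Here is a deliberately worst-case sketch of that cache contention, in Python. It models a tiny direct-mapped cache with one address per line and two threads whose working sets map onto the same lines; the sizes and the `miss_rate` helper are invented for illustration and say nothing about the real Xeon L1.

```python
def miss_rate(accesses, num_lines=4):
    """Miss rate of a direct-mapped cache holding one address per line."""
    lines = [None] * num_lines
    misses = 0
    for addr in accesses:
        slot = addr % num_lines          # direct-mapped placement
        if lines[slot] != addr:
            misses += 1
            lines[slot] = addr           # evict whatever was there
    return misses / len(accesses)

loop_a = [0, 1, 2, 3] * 8    # thread A's working set fits the cache
loop_b = [4, 5, 6, 7] * 8    # thread B's working set maps to the same lines
# Alternate one access from A with one from B, as if they ran together.
interleaved = [addr for pair in zip(loop_a, loop_b) for addr in pair]

alone = miss_rate(loop_a)
shared = miss_rate(interleaved)
print(alone, shared)
```

Run alone, thread A only misses on its first pass. Interleaved access by access, each thread keeps evicting the other's lines and nothing ever hits. Real caches are set-associative and real threads overlap less neatly, so the effect would normally be much milder than this.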

This isn't a totally new idea, either. The first step in this direction was the peripheral processor for the CDC 6600, in the 1960s, which appeared as ten peripheral processors to the programmer. Internally, it was ten sets of registers and one ALU, doing one instruction for each machine state in turn. Basic/4, a forgotten minicomputer manufacturer, tried a similar idea in the 1970s.
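
That "ten sets of registers and one ALU" arrangement can be sketched as follows. This is a toy Python model, not the actual CDC 6600 peripheral processor instruction set: each "program" is just a list of numbers to add to that slot's accumulator, and `barrel_run` is a made-up name.

```python
def barrel_run(programs, cycles):
    """One shared ALU; slot i is served on every len(programs)-th cycle."""
    accumulators = [0] * len(programs)    # one register set per slot
    pcs = [0] * len(programs)             # one program counter per slot
    for cycle in range(cycles):
        slot = cycle % len(programs)      # strict round-robin rotation
        program = programs[slot]
        if pcs[slot] < len(program):
            accumulators[slot] += program[pcs[slot]]   # the single ALU op
            pcs[slot] += 1
    return accumulators

# Ten slots; slot i adds the value i three times.
programs = [[i] * 3 for i in range(10)]
print(barrel_run(programs, 30))
```

To the programmer it looks like ten slow processors. The hardware only ever performs one operation per cycle.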

On the other hand, this apparently isn't that tough a feature to add to an already-superscalar CPU, so why not?

Re:It's not for games, stupid by rtaylor · 2002-02-20 11:10 · Score: 2

Have you *ever* seen or heard of a new idea?

Humans tend to base their thoughts off of what they learned. So, new thoughts are always based off other thoughts of either yourself or others.

New is pretty tough. Generally means you kept your thought path secret long enough to go through enough revolutions that it no longer resembles anyone elses thoughts. You know what that means? Complete lack of progress. Group thoughts tend to move a little quicker than an individuals.

--
Rod Taylor
Re:It's not for games, stupid by mickwd · 2002-02-20 11:27 · Score: 2

Or maybe they just had to wait for a patent to expire ?
Re:It's not for games, stupid by Anonymous Coward · 2002-02-20 11:44 · Score: 0

AFAIK, P4 hyperthreading is supposed to work around the fact that the U- and V-pipes are rarely filled. If you can feed two separate threads to both pipes then you won't have as much contention that causes the pairings to fail. Assuming they've added more instructions that the V-pipe can process, FPU utilization is not necessary for full performance--FPU ALWAYS pairs with integer ops anyway.

As for cache miss...this is a Xeon which usually has a lot of L2 which should minimize the problem.
Re:It's not for games, stupid by moogla · 2002-02-20 15:00 · Score: 1

1) U-V pipe only was used in the original pentium. Pentium pro and up use an out-of-order executing RISC core.

2) That L2 cache is probably shared between the cores, if so, it'll still suffer some.

--
Black holes are where the Matrix raised SIGFPE

Re:Think of it as out of order execution ..glorifi by fitten · 2002-02-20 10:00 · Score: 1

It is easier to think of this in terms of the OS concepts of a paging system.
When Thread1 gets a Load-Fault (think of Pagefault), it executes Thread2 instructions that it has cached. Much like an OS will execute Process2 if Process1 hits a pagefault. Basically what the hardware is there to do is to fill up the bubbles in the pipeline that are caused by one thread with instructions from another thread.
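
The bubble-filling idea can be sketched with a tiny simulator. This is a simplified Python model under stated assumptions: one issue slot per cycle, a "miss" that blocks its own thread for a fixed penalty of 3 cycles, and lower-numbered threads getting priority. The `run` function and the instruction names are hypothetical.

```python
def run(threads, penalty=3):
    """Cycles until every instruction has issued, with one issue slot per
    cycle and a thread blocked for `penalty` cycles after a 'miss'."""
    pcs = [0] * len(threads)
    ready_at = [0] * len(threads)    # first cycle each thread may issue again
    cycle = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        for i, t in enumerate(threads):
            if pcs[i] < len(t) and ready_at[i] <= cycle:
                if t[pcs[i]] == "miss":
                    ready_at[i] = cycle + 1 + penalty
                pcs[i] += 1
                break                # only one instruction issues per cycle
        cycle += 1                   # if nobody was ready, this was a bubble
    return cycle

stream = ["op", "miss", "op", "op"]
one_thread = run([stream])
two_threads = run([stream, stream])
print(one_thread, two_threads)
```

One thread alone spends three cycles idle behind its miss. With a second thread available, most of those idle cycles get used, so two copies of the work finish in well under twice the time.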

More Intel marketing by hobit · 2002-02-20 10:01 · Score: 3, Informative

This is just SMT (simultaneous multithreading)

Some other complaints about this "invented at Intel" terminology can be found at The Register.

Also Toronto has a nice slide show (pdf) on the topic.

For the record I contributed a little tiny bit to this stuff when I was at Intel (I found what I think was the first multi-processor bug for SMT.)

--
As Nietsche famously said, "If you stare too long into the Abyss, 1d4 Tanar'ri of random type will attack you."

Re:More Intel marketing by SpaceLifeForm · 2002-02-20 10:59 · Score: 1

Would the bug be known as 'F00F F00F'?

--
You are being MICROattacked, from various angles, in a SOFT manner.

Looks like... by Tony.Tang · 2002-02-20 10:05 · Score: 2, Redundant

Looks like GamePC's website isn't running one of these babies yet.

Slashdotted already. :(

Problems With Slashdot by wizarddc · 2002-02-20 10:05 · Score: 0, Offtopic

I don't know where else to post this, but has anyone been experiencing problems
with Slashdot? More specifically, cookie problem, such as not being logged in
when you visit the site? For the past three days, I've had to log into Slashdot
multiple times throughout the day. I've also gotten errors on the site that force
the cookie to be written to the screen. Is there any place where we can see/post
bugs/bugfixes for the site? Or a thread where we can see status on the site in
general? I enjoy reading the site too much to get fed up with small problems.

--
Th

Re:Problems With Slashdot by JPriest · 2002-02-20 10:28 · Score: 1

I will agree with you on this, sometimes clicking "X replies beneath your current threshold." links or on replies takes me back to the slashdot main page. But I believe CowboyNeal is the correct contact for bug reports. email addy is pater@/.

--
Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.
Re:Problems With Slashdot by SpaceLifeForm · 2002-02-20 11:05 · Score: 1

Yes, I experienced the cookie problem yesterday. Earlier today, I believe slashdot was /.-ed.
So, a site status page would be nice, but it would need to be separated from the main site.

--
You are being MICROattacked, from various angles, in a SOFT manner.

Yup correct by esses · 2002-02-20 10:07 · Score: 2, Insightful

I was trying to simplify things... I probably went a bit too far.

The register level contentions are alleviated with out of order execution (more or less).

A good example of where hyperthreading helps is the front side bus. Processors tend to spend over 80% of their time executing out of cache. Thus the front side bus is sitting idle (or performing simple snoops).

If one thread is going to be memory intensive (video streaming for example... or texture manipulation), or even I/O intensive and thus results in a lot of transactions along the FSB... it can occur at the same time as a second thread that's FPU intensive (assuming the I/O intensive one isn't FPU intensive as well).

SMT by mrm677 · 2002-02-20 10:11 · Score: 5, Informative

Simultaneous Multithreading (SMT) is not a new idea, although no one to my knowledge has implemented it yet. Intel just calls it "Hyperthreading"...it is essentially SMT.

And yes, this is a very good idea. A modern superscalar out-of-order processor, like the Athlon and Pentium Pro (and later), can issue and retire multiple instructions per clock cycle. However, it can *only* do this if there is enough instruction-level parallelism (ILP). Turns out, there is not enough ILP in current programs to take full advantage of the chip's processing capabilities. Issue slots and function units go unused due to dependencies in the program and cache misses that stall the processing. A typical processor can only look at about 32 instructions at a time. This is not a large enough window to execute future instructions out-of-order when such a stall occurs.

However, 2 threads of execution will likely fill all of the issue slots. They are also independent threads of execution, so dependencies don't exist between them. This means that when the pipeline stalls due to a cache miss, the other thread can keep on retiring instructions.
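
A back-of-the-envelope sketch of the issue-slot argument, in Python. It assumes a 2-wide machine and models each thread as one chain of serially dependent instructions, so a thread can issue at most one instruction per cycle; that is a caricature of real code, and `cycles_to_drain` is a made-up name.

```python
def cycles_to_drain(chains, width=2):
    """Each entry is the length of one serially dependent chain. At most one
    instruction per chain and `width` instructions in total issue per cycle."""
    remaining = list(chains)
    cycles = 0
    while any(remaining):
        issued = 0
        for i, left in enumerate(remaining):
            if left and issued < width:
                remaining[i] -= 1     # next instruction of this chain issues
                issued += 1
        cycles += 1
    return cycles

def utilization(chains, width=2):
    """Fraction of issue slots that actually carried an instruction."""
    return sum(chains) / (width * cycles_to_drain(chains, width))

print(utilization([8]), utilization([8, 8]))
```

One dependent chain leaves half the issue slots empty no matter how wide the machine is. Two independent chains, one per thread, fill them, which is the whole pitch for SMT in this simplified picture.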

To all those saying that this is dumb, I suggest you study some modern architecture (I'm not talking about your undergrad architecture course either). A paper I read recently studied the effects of SMT on a simulated Alpha processor. The results were astounding with very few changes to the processor core. I heard that the next Alpha was slated to include SMT before Intel killed it.

Re:SMT by anzha · 2002-02-20 10:22 · Score: 1

Cray has been working on this for some time at the higher end. At least Tera, they were for a long time. Their MTA does exactly that, but with 128 cpu equivalents and in GaAs too.
It would be interesting to see if they end up rolling some of their MTA technology into the follow-on to their next vector machine. Hardware threading with vector processing might be interesting...even if not for the desktop. :D

--
Do you know why the road less traveled by is littered with the bones of the unwary?
Re:SMT by TurkishGeek · 2002-02-20 11:05 · Score: 1

Clearwater Networks has been shipping a network processor unit that uses SMT, to my knowledge this is the only implementation so far. IBM's Blue Gene research supercomputer will also use SMT.

As you pointed out, it is not a big deal to add SMT to a modern superscalar core. Most SMT studies show that the performance improvement maxes out at 4-way simultaneous multithreading however, so SMT will only be a short term solution in the quest for speed..

The paper you've read was most likely written by Joel Emer, a very influential Alpha architect and a major SMT researcher. He was behind the push to include SMT in Alpha 21464; he is now an Intel Fellow if I remember correctly. With Emer an Intel employee and director now, it is very likely that most of the Alpha SMT research is being integrated into the next generation x86 and EPIC processor plans...

--
Zigbee Central: A Zigbee weblog
Re:SMT by Daeslin · 2002-02-20 11:23 · Score: 2, Informative

Isn't that what IBM's Power4 chip does? 4 cores on one silicon with certain shared resources....

--

I like lots of people. That doesn't mean I go carting them around the galaxy with me. --Dr. Who
Re:SMT by sagei · 2002-02-20 11:51 · Score: 2

Isn't that what IBM's Power4 chip does? 4 cores on one silicon with certain shared resources....

Power4 is different, it is a multi-core CPU -- this means there are actually multiple (in Power4's case, two I believe) CPU cores on each die.

SMT just duplicates certain parts (say, the registers) and they share the resources of the core.

--

Robert Love
Re:SMT by gouldtj · 2002-02-20 12:20 · Score: 2

I agree. The Alpha guys have been working on this for quite a while. If you want to learn a little bit more about it here is my Master's Thesis on the topic. (Actually on scheduling in an SMT system, and also looking at four threads, but the intro should be enlightening). Also the bibliography should provide you with everything you'd want to read about it (at least 2 years ago).
Re:SMT by Slowping · 2002-02-20 13:05 · Score: 5, Informative

I got my undergrad architecture class at the University of Washington CSE department, and was fortunate enough to have a few lectures on SMT in my architecture class.
Professor Hank Levy has a whole bunch of interesting SMT papers; covering the architecture, performance analysis, compiler optimizations, etc.
Here is the presentation Prof Levy used during his guest lecture about SMT when I took the class.

--
(\(\
(^.^)
(")")
*beware the cute-bunny virus
Re:SMT by mrm677 · 2002-02-20 15:36 · Score: 1

Yeah...I was being a little arrogant. I see far too many posts on Slashdot about people talking out of their asses.

My undergrad architecture course was pretty basic and didn't cover anything advanced like SMT, multiple issue, OOO cores, speculation, etc.

I looked over your prof's website and will read some of his papers when I get a chance.
Re:SMT by Pussy+Is+Money · 2002-02-20 17:21 · Score: 1

Don't forget as well that the Pentium 4's ALU's clocks are doubly pumped, and as such it takes a lot of instructions to keep them busy. SMT helps here.

--
Pushin' 'n dealin', shovin' 'n stealin'
Re:SMT by pslam · 2002-02-21 00:00 · Score: 1

And yes, this is a very good idea. A modern superscalar out-of-order processor, like the Athlon and Pentium Pro (and later), can issue and retire multiple instructions per clock cycle. However, it can *only* do this if there is enough instruction-level parallelism (ILP).
The solution to a lack of ILP is to fix the instruction set. Personally I think of SMT as an extremely inefficient way to keep execution units filled. The x86 instruction set has very little in the way of implicit parallelism - or at least it's very hard to work out what's interdependent. You get your ILP in SMT basically because the two threads of execution have few dependencies on each other.
The bottleneck in x86 is definitely issuing instructions to execution units. But SMT only helps because the instruction set is poor to begin with. I'd go as far as say that SMT wouldn't be worth the effort on any decent architecture. Not that DSPs are decent - but some DSPs have parallelism built into every instruction to the extent that you basically specify what to do with every execution unit.
SMT makes more efficient use of execution units, but that's only because the number of execution units per thread has now been halved. In essence, it's artificially made the execution units the bottleneck instead of instruction issue. You have to ask yourself whether unfilled execution units is actually a bad thing when there are so many other areas in a processor which could be "optimised". I get the feeling that "unfilled execution units" is the next Intel marketing tool rather like clock speed is. It's questionable whether this slight increase in efficiency outweighs the loss in cache, memory bandwidth per thread and branch prediction. And there's the question of whether the amount of silicon this takes could have been better used in another way.
Turns out, there is not enough ILP in current programs to take full advantage of the chips processing capabilities.
And this is where it all falls down, really. It is very hard to write efficient parallel algorithms on real world processors. It's even harder to make a parallel algorithm efficient on SMT because the two threads of execution aren't really 100% independent. I suspect most people will decide not to bother writing for SMT because the development effort far outweighs the potential gains.
Re:SMT by mrm677 · 2002-02-21 05:02 · Score: 1

No, you are wrong. The instruction set is irrelevant. IA-32 is dynamically converted to a RISC-like language. Anti-dependencies are dynamically renamed. There are dozens of physical registers for which many instances of the logical registers can be renamed if there is no true dependence.
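
The renaming step can be sketched in a few lines of Python. This is a simplified illustration that assumes an unlimited supply of physical registers and three-operand instructions; real hardware has a finite pool and a free list, and the `rename` function is hypothetical, not how any particular chip does it.

```python
def rename(instructions, num_logical=4):
    """Rewrite (dest, src1, src2) tuples of logical registers so that every
    write gets a fresh physical register."""
    mapping = {r: r for r in range(num_logical)}   # logical -> physical
    next_free = num_logical
    renamed = []
    for dest, src1, src2 in instructions:
        s1, s2 = mapping[src1], mapping[src2]      # read current names first
        mapping[dest] = next_free                  # then give dest a new home
        renamed.append((next_free, s1, s2))
        next_free += 1
    return renamed

program = [
    (2, 0, 1),   # r2 = r0 op r1
    (0, 3, 3),   # r0 = r3 op r3   anti-dependence: overwrites r0 read above
    (1, 0, 2),   # r1 = r0 op r2   true dependence on both earlier results
]
print(rename(program))
```

After renaming, the second instruction writes a new physical register instead of the one the first instruction reads, so the two can execute in either order. The third still has to wait for both results, because that dependence is real.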

What you are suggesting is to move to a VLIW-type instruction set. VLIW is exactly specifying what function unit does what, in the instruction itself. DSP instruction sets are VLIW-like. So is the MMX extensions (sort of). IA-64 is doing exactly this except that it is allowing some dynamic behavior instead of everything being static which is VLIW-purism.
Re:SMT by pslam · 2002-02-22 13:40 · Score: 1
No, you are wrong. The instruction set is irrelevant. IA-32 is dynamically converted to a RISC-like language.
In theory. In real life no such compiler or dynamic translator exists that does a good job of this. In real life even an O(n^n) algorithmic time translator does not exist which does a job as perfect as you suggest, or at least hasn't been written yet.
If code translation in IA-32 is as good you say, then why do "Optimisation Guides" exist? Simple: because code translation is nowhere even near perfect. You still need a million and one hints and compile-time optimisations to get your code "near the metal" as far as efficiency goes.
What I suggest is that, instead of piling an immense - and algorithmically intractable - amount of work into the hardware translator, it should be done at compile time, or perhaps even a software JIT. Adding SMT just makes optimisation even harder than before. In fact, it makes writing code with pipeline effects in mind completely impossible - so you will never be able to know if your hand crafted assembler will run well.
Whatever the solution, IA-32 is such a poor match for:
- Recognising independent code sequences.
- Optimising.
- Translating.
- Code density.
- Parallelism.
... that it's a stupidly uphill battle to get it running efficiently. Especially in hardware.
DSP instruction sets are VLIW-like. So is the MMX extensions (sort of).
MMX is different to a lot of DSP instruction sets in that it's SIMD and not MIMD. In IA-64 the approach that's taken is specifying many explicitly independent instructions on independent data in a bundle. In MMX the approach is to specify a single (ordered) instruction with many explicitly independent data words. One could argue that MIMD is far more expressive. But in practise SIMD takes far less silicon and therefore usually ends up far more efficient.
When it comes down to it, software dynamic code translators (or JIT) is pretty much the equaliser as far as instruction sets go. In theory. In practise, the input and output instruction sets are extremely important. A really crap architecture like x86 makes for a really hard job for the translator - and therefore your code runs slowly. Especially in a purely hardware translator. Theory makes the fatal flaw of assuming you have an unbounded amount of time in which to translate your arbitrary code to theoretically perfect target machine code. Hardware implementations have a few clocks before they stall the pipeline. Software implementations have a small ratio of execution to translation time before it impacts.
And then we have real-world P3 and P4 architectures:
- Dynamic register renaming and memory reference renaming is by no means perfect. In fact, you have to "hint" to the processor about free registers by writing zeros.
- Memory writes still go out in order and analysis is not very comprehensive.
- Branches are not analysed on the algorithmic level, they are translated wholesale. A simple loop will not be loop unrolled, for instance.
You could probably ditch 25% of an Athlon/P4 if you got rid of IA-32 and invented an input instruction set which was a better match for the target processor.

Linux? by WetCat · 2002-02-20 10:11 · Score: 1

Just curious, how Linux kernel will work on that processor. Will there be any improvements?

Re:Linux? by Anonymous Coward · 2002-02-20 10:15 · Score: 0

Just curious, how Linux kernel will work on that processor. Will there be any improvements?

no.

These machines are wonderful by Xanthos28 · 2002-02-20 10:13 · Score: 2, Offtopic

Last friday I got my first taste of the Xeon processor. I work for a company that makes heavily optimizing OpenMP compilers, and we tend to get some of the latest hardware in short order. Last friday, I set up a machine with:

Dual Xeon 2.0Ghz CPUs (3997 bogomips on RH7.2)
1Ghz ram
36Gb disk

This machine is extremely fast. A test suite that runs in 4 hours on a dual PIII 800MHz (512MbRam) runs in about 45 minutes on this machine.
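
For scale, the speedup implied by those two run times can be set against the clock ratio. A rough Python sketch follows; it takes the quoted times at face value and ignores that the two machines also differ in memory size and architecture, so the "per clock" figure is only a crude indicator.

```python
old_minutes = 4 * 60      # dual PIII 800 MHz, as quoted
new_minutes = 45          # dual Xeon 2.0 GHz, as quoted

speedup = old_minutes / new_minutes       # how many times faster overall
clock_ratio = 2000 / 800                  # MHz of new CPU over old CPU
per_clock = speedup / clock_ratio         # gain beyond the clock increase

print(f"{speedup:.2f}x overall, {clock_ratio:.2f}x clock, {per_clock:.2f}x left over")
```

The left-over factor lumps together everything that is not clock speed: hyperthreading, cache, memory bandwidth, the extra RAM, and whatever else changed between the two boxes.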

Re:These machines are wonderful by tiomapengineer · 2002-02-20 10:15 · Score: 1

1GHz RAM!!!!! Where can I find that? :-)
Re:These machines are wonderful by Xanthos28 · 2002-02-20 10:19 · Score: 1

*chortle* forgive me, its been a long day:)

1Gb Ram:)
Re:These machines are wonderful by fitten · 2002-02-20 10:49 · Score: 1

1 Gigabit of RAM?
Re:These machines are wonderful by Hektor_Troy · 2002-02-20 11:17 · Score: 1

So, you have 128 MB of RAM and a 4.5 GB harddrive?

Oh, you meant GigaByte and not Gigabit. Maybe you should drink some more cofee :-)

--
We do not live in the 21st century. We live in the 20 second century.
Re:These machines are wonderful by Anonymous Coward · 2002-02-21 03:19 · Score: 0

Er, forgive me for saying so but... yes, it is impressive. However, do some math, and you find that the relative performance of the 800 mhz deal as compared to the combined 2ghz deal is about equal. That's all.
Re:These machines are wonderful by eviltwinimposter · 2002-02-21 08:17 · Score: 1

Oh, you meant GigaByte and not Gigabit. Maybe you should drink some more cofee :-)
Or coffee?

GamePC? by gUmbi · 2002-02-20 10:15 · Score: 2

Anyone else think it odd that GamePC is reviewing this? Do ANY gamers run Xeons?

Jason.

Re:GamePC? by GooberToo · 2002-02-21 01:44 · Score: 2

Anyone else think it odd that GamePC is reviewing this? Do ANY gamers run Xeons?

Yes and it shows as they had little technical information and failed to provide a worthy review on how programs and OS's may better use this technology. Basically it said if you're already using 100% of your CPU you're not going to get another CPU out of this. Last I heard, a CPU can only provide 100%. Surprise. I personally was not very surprised to read that, however, they could have tested many other situations to see how well it performed, that is, cases where less than 100% was being used as is often the case on workstations and servers. Let's face it, if your production server is constantly at 100%, you need a new work load or a new server (unless it's purely a computational server whereby, I believe the technology assertions seem to imply that this is not its targeted mode of operation). Pretty much the same goes for workstations too. So in that regard, the test was pretty much worthless as it was completely unreflective of how the technology might actually get used in the real world.

Was anyone really surprised that 1 != 2? Aside from the author, I know I certainly was not. And that, I think is why they certainly were not qualified to perform a meaningful benchmark with this technology.

Re:Think of it as out of order execution ..glorifi by CustomDesigned · 2002-02-20 10:16 · Score: 1

But if this works, then all the CPU functional units will get higher duty cycles, and now we'll need a bigger fan!

Copy of text in case of /. effect by segfaultdot · 2002-02-20 10:16 · Score: 3, Interesting

Prestonia Xeon 2.0 GHz vs. Athlon MP 1900+
www.gamepc.com

2/19/2002

While Intel and AMD have seemingly taken a breather from their constant one-upmanship in the consumer processor market, things are still churning along for the workstation and server markets. While the consumer level chips from both companies (Pentium 4 and Athlon XP) bring in large portions of cash, the workstation and server processors are where the real money is made. These processors go for a much higher price premium on the market and are commonly used in more expensive multiprocessor setups.

The customers who buy these chips tend to buy large quantities and like to use them for multiple years without any issues. Therefore, stability and reliability are the most important factors in buying a chip here with raw performance coming in second. Sure, having an incredibly fast processor is nice, but if you're constantly having to reboot the systems due to processor or motherboard stability problems, the system becomes more of a burden than help. Thus, there is a constant struggle for IT managers to either go for the fastest workstation chip on the market, or go with the chip that's known for excellent stability. Both Intel and AMD are striving to become the processor manufacturer that gives workstation users both the best performance and best stability on the market.

Intel has the Xeon family, which has had a foothold in the low-end server / high-end workstation market for multiple years now, stemming back to the original Pentium II Xeon. The Xeon now clocks up to 2.2 GHz and comes equipped with features like 512k on-die cache, a 400 MHz front side bus, and some nifty on and off-die thermal monitoring features. Their new "Prestonia" Xeon family was just recently released to market, which is what we're looking at today.

AMD, on the other hand, has the Athlon MP. Renowned for its incredible price/performance ratio, the Athlon MP has had a tough time making a name for itself as a big time server chip, although it has done fantastically well in the workstation market. The combination of a fairly low cost processor along with similarly priced motherboard and memory has made the Athlon MP platform quite the hit. The Athlon MP was recently bumped in speed up to 1.6 GHz, which carries the AMD PR rating of 1900+.

Today at GamePC, we're looking at two of the fastest consumer-level multiprocessing chips on the planet, Intel's "Prestonia" Xeon 2.0 GHz right alongside AMD's top of the line Athlon MP 1900+. Let's boogie.

Intel "Prestonia" Xeon 2.0 GHz
The Prestonia family of processors is to the Xeon what the Northwood family is to the Pentium 4. The Prestonia Xeon shares all the benefits of the original Pentium 4 Xeon, like a 400 MHz FSB, double-pumped ALU units, and SSE-2 instruction support, but it also has a few added bonus features which make it far and away better than its predecessor.

Just as Intel recently did with their Pentium 4 family, the Prestonia Xeon is manufactured on Intel's new 0.13 micron manufacturing processes, which allow for a smaller die area, along with lower power consumption and lower heat emissions. Not only does this make the Prestonia Xeon cheaper to produce, but the lower heat amounts come in very handy when dealing with dual and quad CPU configurations in a small form factor like a 1U or 2U rackmount. For example, the original 2.0 GHz Xeon produced a maximum of 77.5W of heat, while the new Prestonia Xeon at 2.0 GHz produces only 58W.
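To put the two heat figures quoted above in proportion, here is a minimal Python sketch of the arithmetic. The function name and constants are illustrative only; the 77.5W and 58W values are the ones from the article, and doubling the saving for a dual-CPU box is a simplifying assumption.

```python
def percent_reduction(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100.0


# Maximum heat figures quoted in the article, in watts.
ORIGINAL_XEON_W = 77.5
PRESTONIA_XEON_W = 58.0

if __name__ == "__main__":
    drop = percent_reduction(ORIGINAL_XEON_W, PRESTONIA_XEON_W)
    print(f"Prestonia produces about {drop:.1f}% less heat per CPU")
    # Assumption: in a dual-CPU box the absolute saving simply doubles.
    saving = 2 * (ORIGINAL_XEON_W - PRESTONIA_XEON_W)
    print(f"Dual-CPU saving: {saving:.1f} W")
```

That works out to roughly a quarter less heat per processor, which is the margin that matters in a cramped 1U or 2U chassis.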

While reducing the manufacturing process, Intel also managed to stick an extra 256 kB of L2 cache on to the processor die, giving it a total of 512 kB of full-speed on-die cache. As we've seen before with the Pentium 4 Northwood, adding another 256k of cache on to the Pentium 4's core can add up to 10-15% to application performance. Thus, the Prestonia Xeon gets that same speed increase compared to previous Xeon processors. Rumor has it that Intel will announce Xeon CPU's in the future with extra on-die cache, as was the case with the original Pentium II and III Xeons.

Both the original Xeon and Prestonia Xeon use roughly the same packaging, so telling the CPU's apart can be difficult unless you have one right in front of you. Intel has the CPU markings on the bottom of the Xeon CPU's, as opposed to the Pentium 4 CPU's which have the markings right on the CPU's heat spreader. A quick flip of the CPU reveals the CPU's vital information. As you can see by the Xeon's S-SPEC codes, this is a 2.0 GHz Xeon with 512kB of L2 cache, running on a 400 MHz FSB, while running at 1.5V core voltage.

Even though there's a new core running underneath, Intel decided to keep the original Socket-603 form factor of the original Xeons, allowing you to upgrade to these newer chips without buying a new motherboard. As Xeon motherboards can be extremely expensive, this is a very, very good thing.

Besides the new manufacturing-level features of the processor, there has been one buzzword that has been gaining all the attention lately. Hyperthreading, the feature that can theoretically turn your 2 physical CPU's into 4 virtual CPU's. Let's investigate.

What Actually IS Hyperthreading?
Hyperthreading is actually a technology that's been around for quite a long time in microprocessing, but has never been used in a consumer-level product like the Pentium 4 Xeon. The technology itself is based on Simultaneous Multi-Threading (SMT) and was codenamed "Jackson Technology" by Intel while in development. At the last IDF, they gave this technology a name that fits in better with the Pentium 4 architecture, Hyperthreading.

Hyperthreading is simply a method of placing a second set of registers on the processor core, allowing the processor to execute two "threads" at once. Every time you run a piece of software, the software is sending threads to the CPU for it to execute and process. Until now, consumer level processors can only handle one thread at any given time. While a processor may go through thousands of threads per second, the CPU can only physically execute one at a time. In a dual CPU system, the computer can process two threads by sending one to each CPU. Hyperthreading takes the concept of executing multiple threads and brings it down to the single CPU level.

Hyperthreading allows the CPU to manage two threads at once, although this doesn't necessarily mean there are two CPU cores on the same die. Each register set can handle one thread, but each thread has to fight for processor resources like storing data in cache and sending it out through the front side bus. This means a single CPU with hyperthreading capabilities will not perform the same as two physical CPU's in an SMP configuration. While the ability to execute two threads at once was one of the main reasons why SMP was brought to market (symmetrical multi-processing, i.e dual CPU systems), the costs of going to SMP, such as SMP compatible motherboards and processors, in most cases far outweigh the benefits.

Unfortunately, since the threads have to fight for resources, there can be conflicts. If two threads want to use the same processor resources at the same time, they have to get in a queue to do so. Since most every piece of software on the market is written to only take advantage of a single CPU, suddenly throwing a single processor application on a dual/quad processor system will show literally no advantage in performance. Even as of today, only a small percentage of applications (mainly workstation/server applications) are multi-threaded to take advantage of multiple CPU's.

To get the full advantage of Hyperthreading technology, the software will have to be "optimized" for it. Whether this means re-compiling the software to support Hyperthreading through a new Intel compiler or just adding a few more lines of code, we're not certain. Intel states in their technical documents that software written to take advantage of SMP will get in upwards of 10% performance gain with a Hyperthreading capable CPU. If the software is optimized specifically for Hyperthreading, Intel has seen performance gains up to 30%.
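As a rough way to read Intel's two figures above, the sketch below converts a Hyperthreading gain into an "equivalent" CPU count. This is only an illustration: `effective_cpus` is a made-up helper, and it assumes that physical CPUs scale perfectly, which real SMP systems do not.

```python
def effective_cpus(physical_cpus: int, ht_gain: float) -> float:
    """Equivalent CPU count if each physical CPU gets `ht_gain` extra throughput.

    Assumes ideal scaling across physical CPUs (an optimistic simplification).
    """
    return physical_cpus * (1.0 + ht_gain)


if __name__ == "__main__":
    # Gains quoted from Intel: about 10% for SMP-aware code, up to 30% if optimized.
    for label, gain in (("SMP-aware", 0.10), ("HT-optimized", 0.30)):
        equivalent = effective_cpus(2, gain)
        print(f"{label}: 2 physical CPUs behave like about {equivalent:.1f} CPUs")
```

Even with the optimistic 30% figure, a dual system lands well short of the four CPUs the OS will report.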

Nowadays, where SMP is common in workstations and servers (and in some cases, desktops), there is a lot of multi-threaded code out there. The latest major operating systems can handle multiple processors, most professional video / audio editing software can use the CPUs, and even games are just starting to take advantage of a second CPU if available. This is the market that Intel's looking to capitalize on.

Hyperthreading in Reality
The buzz around Hyperthreading is that a single Xeon system will be seen as two CPUs, while a dual Xeon system will be seen as a quad CPU system. Of course, people immediately think, "Wow, two CPUs for the price of one!" This is certainly not the case with Hyperthreading, just as dual processors do not give you double the power of a single processor.

Since Hyperthreading is implemented on the hardware level, the motherboard sees a single hyperthread-compatible CPU as two physical CPUs. Thus, software that is written for multiple CPUs will be tricked into thinking there is a second CPU in the system, and will run the appropriate multithreaded code if available. Since Windows XP and 2000 are coded to take advantage of multiple CPU's, they too see a hyperthreaded CPU as two.

In our case, since we ran with dual Xeon processors (each with hyperthreading capabilities), the OS and software see this as four physical CPUs, even though there are only two physical CPUs running. As you can see by the device and task managers in Windows XP, the OS sees our system with four physical CPU's. Even though Windows 2000 and Windows XP only officially support two CPU's, both operating systems were able to run properly with the Hyperthreaded CPU's. This means you don't have to upgrade to a 4-processor OS like Windows 2000 server to take advantage of this technology.
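For anyone who wants to see the same effect from software rather than from the Task Manager, here is a small Python sketch. `os.cpu_count()` is a real standard-library call that reports logical processors; the `expected_logical_cpus` helper and its default of two threads per CPU are assumptions made to match the article's description.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical processors the OS reports (1 if it cannot tell)."""
    return os.cpu_count() or 1


def expected_logical_cpus(physical_cpus: int, threads_per_cpu: int = 2) -> int:
    """Logical CPUs an OS would list for SMT-capable chips.

    threads_per_cpu=2 mirrors the two register sets described in the article.
    """
    return physical_cpus * threads_per_cpu


if __name__ == "__main__":
    print(f"OS reports {logical_cpu_count()} logical CPU(s)")
    print(f"Dual Hyperthreaded Xeons should show up as {expected_logical_cpus(2)}")
```

The count the OS reports says nothing about how much real execution hardware sits behind each logical CPU.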

While this looks great for showing off to co-workers or friends, you will absolutely NOT get the performance of four CPUs running in your system (I can't stress this enough). As you'll see in our benchmarks later, even if software is written to take advantage of SMP, you rarely ever see performance gains with Hyperthreading enabled. In fact, in many applications, you see a performance drop with Hyperthreading enabled, as there is a great deal of overhead when splitting data up over four CPU's to process. Perhaps this is why Intel is recommending motherboard makers leave Hyperthreading disabled in the BIOS.

It's quite possible that Intel implemented Hyperthreading to take advantage of the Xeon architecture's longer pipeline, an often criticized design element of the Pentium 4 and Xeon families. With Hyperthreading, they can start a second process after the first one is farther down the pipe. From a theoretical standpoint, the code would have to either be highly optimized for the Prestonia or limit the use of branch prediction, since there are now two sets of independent data in the processor. If you look at Hyperthreading like this, it would appear to be the next generation of the P4's out-of-order speculative execution engine.

From what I now understand about Hyperthreading, it's my belief that Intel is planning to use Hyperthreading in all of its future Pentium 4 products down the road. The Xeon is simply the first guinea pig to actually have the logic enabled on the die. As Intel already has the Hyperthreading logic in the current Pentium 4 hardware but has not enabled it, you've got a sure sign that Intel will simply flip the switch to activate the logic when Hyperthreading applications are actually available. If Intel convinces developers that Hyperthreading is worth their time to optimize for, this could be an incredible feature 1-2 years down the road. As for now, it's fairly useless, but certainly interesting in the sometimes bland world of computer processing.

AMD Athlon MP 1.6 GHz (1900+)
The Athlon MP 1.6 GHz is the latest and greatest from AMD's server/workstation family of CPUs, which have gained an extremely large amount of credibility lately due to their incredible price / performance ratio compared to Intel's Pentium 4 and Pentium 4 Xeon families. While it slightly lags behind AMD's own Athlon XP 1.67 GHz (2000+) in raw clock speed, the Athlon MP 1.6 GHz is quite a bit more expensive than the Athlon XP 1.67 GHz, despite the fact that both can run SMP quite well.

The Athlon MP is based on the "Palomino" Athlon architecture, which is built on the 0.18 micron manufacturing process. While the Palomino chips create quite a bit less heat than the "Thunderbird" variant of the Athlon, the Palominos still create quite a lot of heat, which can be a problem in dense rackmount situations. The chip itself is based on the Socket-A form factor, which means it should be compatible with most single processor Athlon boards, as well as all the dual Socket-A boards on the market now. As you'll no doubt notice, the new Athlon XP/MP processors are coming with green packaging, although they still use the same organic packaging as previous Athlon MP/XP CPU's.

The Palomino Athlon core comes equipped with 128 kB of L1 cache, along with 256 kB of L2 cache. While we've heard rumors that AMD may up the cache amounts on their upcoming 0.13 micron "Thoroughbred" processors, we haven't received any indication that this is anything more than a rumor.

Getting a closer look at the Athlon MP 1900+, you can see the Athlon's famous bridges are not "cut", like Athlon XP chips hitting the market. This means with a simple pencil and a motherboard that supports clock adjustments, you can overclock these processors to much higher clock speeds than intended. Of course, workstation and server users would most likely never do this, as overclocking is inherently risky, but we thought it was worth mentioning.

As you can see from reading the core, our Athlon MP processors are of a fairly recent "AGNGA" core stepping. The first line of text says "AMP1900", which denotes our chip as an Athlon MP 1900+. AMD runs the exact same processor core on both the Athlon XP and MP processors, albeit the MP models go through an extra round of multiprocessor "validation". Performance wise, these two cores are exactly the same.

The biggest threat for AMD and the Athlon MP is the fact that the platform has been plagued by a lack of absolute stability. While the Tyan Thunder K7 and Tiger MP boards still wrangle with edge-case stability scenarios, the AMD 760MPX motherboards have been plagued with chipset problems and many board revisions. In fact, the release of the 760MPX has undone much of AMD's work in making the Athlon MP synonymous with stability. We absolutely love the Athlon processors, but the platforms still aren't up to the level we were hoping for by now. Still, as more platforms are getting released, the situation IS getting better.

Just the facts, ma'am.

Intel Prestonia Xeon 2.0 GHz

AMD Athlon MP 1900+

Specification                 Prestonia Xeon 2.0 GHz     Athlon MP 1900+
Clock Speed                   2.0 GHz (2000 MHz)         1.6 GHz (1600 MHz)
L1 Cache                      8 kB                       128 kB
L2 Cache                      512 kB                     256 kB
L2 Cache Speed                Clock Speed (2.0 GHz)      Clock Speed (1.6 GHz)
L2 Cache Associativity        8-Way                      16-Way
Form Factor                   Socket-603                 Socket-A
Front Side Bus Speed          400 MHz                    266 MHz
Manufacturing Technology      0.13 Micron                0.18 Micron
MMX Instruction Support       Yes                        Yes
SSE Instruction Support       Yes                        Yes
SSE-2 Instruction Support     Yes                        No
3DNow! Instruction Support    Partial                    Yes

The Platforms

Supermicro P4DC6+ i860

Asus A7M266-D AMD 760MPX

Specification          Supermicro P4DC6+                  Asus A7M266-D
Chipset                Intel 860                          AMD 760MPX
CPU Support            Up to 2 x Xeon 2.2 GHz+ CPUs       Up to 2 x Athlon MP 1.6 GHz+ CPUs
Memory Type            PC-800 RDRAM                       PC-2100 DDR SDRAM
Memory Capacity        2 GB Max (4 RIMMS)                 3.5 GB Max (4 DIMMS)
Memory Type Support    Standard / ECC                     Standard / ECC
AGP Expansion          AGP Pro 50                         AGP Pro 50
PCI Expansion          2 x 64-bit (66 MHz) Slots          2 x 64-bit (66 MHz) Slots
                       4 x 32-bit (33 MHz) Slots          3 x 32-bit (33 MHz) Slots
Onboard SCSI           Adaptec AIC-7899W Ultra160 SCSI    N/A
Onboard Ethernet       Intel 82559 10/100 Port            N/A
Onboard Audio          AC97 Audio                         C-Media 6 Channel Audio
Onboard Video          N/A                                N/A

Pentium 4 Xeon "Prestonia" Testbed System Configuration

Processors 2 x Intel Pentium 4 Xeon 2.0 GHz "Prestonia" (8k L1, 512k L2)
Cooling Intel Socket-603 Retail Coolers
Memory 512MB Samsung PC-800 RDRAM (4 x 128M)
Motherboard Supermicro P4DC6+ (Intel 860 Chipset)
Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
Software Windows XP w/ DirectX 8.1, Intel 3.2 Chipset Drivers

Pentium 4 "Northwood" Testbed System Configuration

Processors Intel Pentium 4 2.0 GHz "Northwood" (8k L1, 512k L2)
Cooling Intel Socket-478 Retail Cooler
Memory 512MB Crucial PC-800 RDRAM (4 x 128M)
Motherboard Asus P4T-E (Intel 850 Chipset)
Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
Software Windows XP w/ DirectX 8.1, Intel 3.2 Chipset Drivers

AMD Athlon MP Testbed System Configuration

Processors 2 x AMD Athlon MP 1.6 GHz (1900+) "Palomino" (128k L1, 256k L2)
Cooling AMD Socket-A Retail Coolers
Memory 512MB Crucial PC-2100 DDR SDRAM (2 x 256M)
Motherboard Asus A7M266-D (AMD 760-MPX Chipset)
Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
Software Windows XP w/ DirectX 8.1, AMD 1.30 Driver Pack

AMD Athlon XP Testbed System Configuration

Processors AMD Athlon XP 1.67 GHz (2000+) "Palomino" (128k L1, 256k L2)
Cooling AMD Socket-A Retail Cooler
Memory 512MB Samsung PC-2100 DDR SDRAM (2 x 256M)
Motherboard Asus A7V266-E (VIA KT-266A Chipset)
Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
Software Windows XP w/ DirectX 8.1, VIA 4-In-1 4.37 Service Pack

Lab Notes

* All tests run with VSync (Vertical Sync) Disabled.
* Nvidia Detonator XP (23.11) Driver used in all testing.
* All RDRAM memory run with "Nap" mode disabled.
* All DDR memory run at CAS 2.5 latency.

Benchmarking Software

* Adobe Photoshop 6.01
* LAME MP3 Encoder 3.91
* Kinetix 3D Studio MAX
* Red Hat Linux 7.2
* SiSoft Sandra 2002
* Windows Media Encoder 8.0

SiSoft Sandra 2002 is a synthetic Windows benchmark.
The benchmarks can stress CPU, Memory, or Processor Instruction abilities.
Higher Sandra scores mean better overall performance.

CPU Benchmark - Hyper-Threading Support (SMT) Enabled
(Higher Scores are Better)

CPU Benchmark - Hyper-Threading Support (SMT) Disabled
(Higher Scores are Better)

Memory Benchmark
(Higher Scores are Better)

SiSoft's Sandra, while being a synthetic Windows benchmark, is one of the few pieces of software on the market with some level of Hyperthreading support. This is through Sandra's "SMT" test, which to be honest, gave us extremely sporadic results at first. Once we figured out what exactly was happening with the test, we were able to finally lay down some solid numbers.

First off, it's quite easy to see that the dual Athlon MP setup simply rules the roost when it comes to raw CPU performance. Even with the Athlon MP chips at 1.6 GHz, it's easily able to outpace the dual Xeon 2.0 GHz processors, with or without Hyperthreading enabled. Even the highest performing Xeon setup still trails the dual Athlon MP 1900+ by roughly 30%.

When Hyperthreading was enabled, we can certainly see some performance gains being had by the Xeon setups. One CPU with Hyperthreading gained 18% in this benchmark, while two CPU's with Hyperthreading gained 23%. Of course, this is simply a synthetic test, and to achieve any real world performance gains like this, the software would have to be specifically optimized for Hyperthreading.

Upon looking at the results, we're not positive what effect the SMT test has on our scores. As you can see in the first graph, even with Hyperthreading disabled in hardware on the dual 2.0 GHz Xeons, the system still scored higher with the software Hyperthreading test enabled than with it disabled, by a margin of nearly 2000.

In terms of memory performance, Xeon systems still maintain quite a large margin over the current Athlon MP systems. Thanks to the Xeon / i860 dual channel RDRAM memory interface, you've got quite a bit more available bandwidth compared to the Athlon MP / 760MPX single channel DDR interface.

Adobe's Photoshop 6.0 is the world's most popular image creation/editing software.
We run a series of filters on an image, while measuring the time taken to perform them.
The times for each filter are added up. Lower times mean faster performance.

Adobe Photoshop 6.01 Filter Benchmark
(Lower times are Better)

Adobe's Photoshop thrives on fast FPU units along with lots of memory bandwidth and capacity. Even though Photoshop is multi-threaded, the software only really takes advantage of multiple processors on a few select filters. Thus, running a second processor doesn't necessarily help Photoshop that much, at least in this case.

In our test, we see the simple single Athlon XP 2000+ processor beating out both the dual Athlon MP 1900+ and dual Xeon systems. While the other platforms were merely seconds away, it's clear that the Athlon-based systems take the cake for best overall Photoshop performance. We see the addition of a second Athlon MP processor took nearly 8 seconds off the benchmark time. Not bad, but we were hoping for more.

Hyperthreading shows itself here to be more of a nuisance than a help to performance. With Hyperthreading enabled, the dual Xeon 2.0 GHz system actually slows down by 5 seconds, while a single Xeon 2.0 GHz with Hyperthreading speeds up by 2 seconds. As you'll likely guess, Photoshop is not optimized for Hyperthreading, so any performance gains seem to be purely coincidental.

Keep in mind, we ran this test with the Adobe 6.01 patch installed, along with Adobe's specially released SSE-2 filter package, and the Xeons still couldn't fully stand up to AMD's new Athlon processors.

3D Studio is one of the most popular 3D editing suites on the market today.
We render a 50-frame scene with over 40,000 faces and 20,000 vertices.
Lower render times mean faster processing performance.

3D Studio MAX "Tank" Render Test
(Lower Times are Better)

3D Studio MAX, and any kind of 3D rendering software, relies almost 100% on the CPU for final scene rendering. Thus, multiprocessor systems are almost required for any kind of professional level 3D modeling software. 3DS Max is indeed able to fully take advantage of multiple processors.

In our test render, we again see AMD take the cake, as the dual Athlon MP 1900+ system rendered our scene the quickest. While the Dual Xeon 2.0 GHz system was just about one minute behind, the Athlon systems simply rock for these kinds of applications. Even our single Athlon XP 2000+ system managed to render a few seconds faster than Intel's dual Xeon 2.0 GHz box.

As for Hyperthreading, again we see mixed results. A single processor with Hyperthreading actually helps out, cutting 15 seconds off our rendering time. Two processors with Hyperthreading hurt a lot, as it added an extra 1:56 to our final render time. Ouch.

Windows Media Encoder is a free Windows video encoding suite.
We take a 50MB MPEG file, and encode it to Windows Media 8 (.wmv) format.
We test at 320x240 Resolution using the WM8 for Cable/DSL encoding method.

50MB MPEG Video to Windows Media Video Encode
(Lower times are Better)

While the Xeon was crushed by the Athlon MP in the previous two tests, the tables turn for video encoding. Encoding our MPEG movie was incredibly fast with the Dual Xeons, the fastest score we've seen for this test to date. Windows Media Encoder 8 is extremely efficient with multiple processors, giving a 30-40% boost in encoding times for both the Xeon and Athlon MP platforms.

Even though the Xeon is the clear winner in these tests, Hyperthreading again disappoints. A single Xeon with Hyperthreading tacks another 20 seconds on to our encoding time, while Dual Xeons add another 29 seconds. Disappointing, to say the least.

MP3 Encoding is extremely CPU intensive, and tests the CPU's raw FPU performance.
We use LAME 3.89, which has optimizations for MMX, 3DNow, and SSE.
A 200MB .wav file is encoded to a 160 kbps MP3, and we record the time to encode.

200MB Wav to MP3 File Encode
(Lower Times are Better)

MP3 encoding through LAME is entirely CPU based, but since the program isn't multithreaded, we don't see any performance gains when adding a second processor. Thus, winning this benchmark is simply a case of having the best FPU performance in a single processor situation, which the Athlon clearly does.

The Pentium 4 / Xeon platforms are 9-10 seconds slower, no matter what motherboard or processor combination is used. Both the Athlon MP and Xeon systems give very respectable encoding performance, but the Athlon MP/XP are clearly the winners here.

Red Hat is currently the most popular Linux distribution in the world.
We test by recompiling the 2.4.9 kernel using the "make bzImage -j#" command.
Depending on the # of threads, compiling time can be different, especially with SMP.
Lower compile times mean better processing performance.

Red Hat 2.4.9 Kernel Compile - 1 Thread
(Lower times are Better)

Red Hat 2.4.9 Kernel Compile - 2 Threads
(Lower times are Better)

Red Hat 2.4.9 Kernel Compile - 4 Threads
(Lower times are Better)

Compiling a Linux kernel is extremely stressful on the CPU, and as we tested with the SMP-compatible 2.4.9 Red Hat kernel, we were able to see some very nice performance gains with our multiprocessor systems. As the 2.4.9 kernel also has support for "Jackson Technology" (aka SMT / Hyperthreading), we were hoping to see what Hyperthreading was capable of doing in a Linux environment.

When the kernel is compiled with a single thread, the systems don't show any real performance gains with a second processor installed. Compiling with two or more threads is where you really start to see the performance gains of SMP with Linux.

With two threads running, compile times are nearly cut in half with two CPU's installed. The Dual 2.0 GHz Xeons manage to compile the kernel quickest at 1:57, while the Athlon MP 1900+ setup is nipping at its heels with a 2:05 compile time. Compiling an entire Linux kernel in under two minutes is simply an incredible showing of CPU power, any way you look at it.

For curiosity's sake, we decided to run a compile with four simultaneous threads. As dual Hyperthreading-enabled Xeons can physically take four threads at once, we figured it would be a good test. Unfortunately, there were only 1-2 second differences in compile times between 2 and 4 threads. Compiling the kernel with 2, 3, 4, 5 and more threads gave roughly the same compile times.
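The kind of comparison described above can be scripted. The Python sketch below is one hedged way to do it, not the procedure GamePC actually used: `build_command` and `time_builds` are hypothetical helpers, the `make clean` between runs is an added assumption, and the script expects to be called from inside an already configured kernel source tree.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List


def build_command(jobs: int, target: str = "bzImage") -> List[str]:
    """Build the make invocation from the article: make bzImage -j#."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", target, f"-j{jobs}"]


def _run_checked(cmd: List[str]) -> None:
    """Run a command in the current directory and fail loudly on error."""
    subprocess.run(cmd, check=True)


def time_builds(
    job_counts: Iterable[int],
    run: Callable[[List[str]], object] = _run_checked,
) -> Dict[int, float]:
    """Time one build per job count and return seconds keyed by job count."""
    timings: Dict[int, float] = {}
    for jobs in job_counts:
        # Assumption: clean first so every run starts from the same state.
        run(["make", "clean"])
        start = time.perf_counter()
        run(build_command(jobs))
        timings[jobs] = time.perf_counter() - start
    return timings
```

Calling `time_builds([1, 2, 4])` from the kernel source directory would give one wall-clock time per thread count, which is enough to see whether going from 2 to 4 jobs changes anything on a given box.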

The Final Word
Both the Prestonia Xeon and Athlon MP are incredible processors, and both engineering teams deserve a round of kudos for producing some incredibly fast SMP-capable CPU's. Each CPU has a specific area where you'll see one dominate over the other, although the majority of the tests were fairly close between the two CPU's.

In my opinion, the Prestonia Xeon is the better CPU of the two for mission critical / server applications. The Intel 860 platform seems to be incredibly stable, considering its relatively short time on the market. Not one instance comes to mind where we ran into compatibility issues with our Dual Xeon systems, something we can't say for the Athlon MP systems we set up. Unfortunately, you pay the price for the Intel name, as Xeon systems are extremely expensive. The CPU's and motherboards are both extremely expensive, which makes the Xeon hard to recommend for the workstation market.

The workstation market is much better served by the Athlon MP processor, as its price / performance ratio is unbeatable. For most workstation applications, the Athlon MP will even be a better performer, despite its lower price tag. We would love to see AMD put a few more server-specific features on their MP processors to justify their heightened price tags over the Athlon XP, but even as they are now, the MP's are a great deal for the amount of processing power you get in that tiny little core.

As for the Xeon's Hyperthreading technology, it's hard not to be disappointed with the scores we got throughout our testing. Hyperthreading sounds like an incredibly useful processor feature in theory, but in practice, it's useless without compatible software on the market. Only time will tell if developers want to take on the Hyperthreading challenge, and the few developers we've talked to have not been that impressed with the technology thus far. If nothing else, Hyperthreading will certainly be an interesting technology to watch over the next few years.

This time next year, it's quite possible that we may be dealing with McKinley and Clawhammer as the workstation processors of choice, if Intel and AMD have their way. While it's anyone's guess if 64-bit processing is ready to come down to the consumer level, this article certainly proves that current 32-bit processors have more than enough power to handle today's applications.

Re:Copy of text in case of /. effect by BeeShoo · 2002-02-20 10:39 · Score: 1

Prestonia???

Well... it shouldn't ever freeze up :-)

AS/400 by crow · 2002-02-20 10:19 · Score: 4, Interesting

I believe that this was done in the IBM AS/400 using a special version of the PowerPC chip. There was a talk on this at the Ottawa Linux Symposium last summer. According to the IBM people, it mostly worked great, but there were a few issues with spin locks--the CPU saw that one thread was busy (in a spin lock), so it never switched to the other one (that was holding the lock). The Intel implementation may be slightly different, but this is something to look at.

When your hardware isn't exactly what the software was written for, you tend to have weird bugs like that. I would not be surprised if Windows, Linux, FreeBSD, and other OSes need minor patches to work well with this new hyperthreading from Intel.

Re:AS/400 by Deagol · 2002-02-20 10:53 · Score: 2

It's available on RS/6000, too. Our department recently got a p660 server. I was cruising the docs when I stumbled onto something about "hardware threading". AIX 4.3.3 and up can utilize this feature, though we haven't tried it yet.

--
Method of processing duck feet
Re:AS/400 by rtaylor · 2002-02-20 11:06 · Score: 2

Sounds like that spin lock thing would happen on dual cpu's too.

Just needs 3. 2 to spin, one to be holding the lock which isn't running. This is mainly the reason that after a tick the kernel evaluates what's doing useful stuff and what isn't, and schedules accordingly. So in this case, as in all other lock cases, it requires the kernel to interrupt.

The program should have put itself to sleep if it didn't get the lock after a tick, as it can assume it'll be a while. PostgreSQL had a large debate not that long ago about the best timing for spin / sleep on SMP. Sleeping immediately isn't the best thing in multi-cpu cases -- which this is pretending to be.
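The spin-then-sleep idea from this post can be sketched in user space. The Python below is only an illustration of the pattern, not the PostgreSQL or kernel implementation; `acquire_spin_then_sleep` and its default tuning values are made up for the example.

```python
import threading
import time


def acquire_spin_then_sleep(
    lock: threading.Lock,
    spin_attempts: int = 1000,
    sleep_seconds: float = 0.001,
) -> int:
    """Acquire `lock` by polling briefly, then backing off with short sleeps.

    Returns how many times the caller slept before getting the lock.
    """
    sleeps = 0
    while True:
        # Spin phase: cheap if the holder is running and about to release.
        for _ in range(spin_attempts):
            if lock.acquire(blocking=False):
                return sleeps
        # Sleep phase: give the CPU away so the lock holder can make progress.
        time.sleep(sleep_seconds)
        sleeps += 1


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()
    # Another thread releases the lock after 50 ms.
    threading.Timer(0.05, lock.release).start()
    sleeps = acquire_spin_then_sleep(lock, spin_attempts=100)
    print(f"acquired after sleeping {sleeps} time(s)")
    lock.release()
```

The sleep is what lets a descheduled lock holder run again, which is the situation the IBM anecdote describes; how long to spin before sleeping is the tuning question the PostgreSQL debate was about.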

--
Rod Taylor
Re:AS/400 by Anonymous Coward · 2002-02-20 17:02 · Score: 0

Idiot. Why spout shit rather than keep quiet when you don't know what you're talking about. Spinlocks spin. Semaphores sleep.
Re:AS/400 by Anonymous Coward · 2002-02-21 00:40 · Score: 0

The codename of that special PowerPC version was Northstar, I don't recall the marketing name. What Northstar does is not SMT -- it is missing the 'simultaneous' part: Northstar can only have instructions from a single thread at any particular stage of the pipeline, whereas a true SMT processor can have instructions from several threads simultaneously in the same pipe stages. IIRC Northstar only switches threads on cache misses, which is why you can get problems with spinlocks.

why not 2 CPUs to pretend it's 1??? by superpulpsicle · 2002-02-20 10:19 · Score: 0

I'd like to get two 2gig processors, and have it run symmetrically as 4gigs.

Re:why not 2 CPUs to pretend it's 1??? by be-fan · 2002-02-20 11:56 · Score: 2

Because parallelization isn't the same thing as ratcheting up the clock speed. If everything else in the system scales (which it doesn't, generally) a 2GHz chip is exactly twice as fast as a 1GHz chip. However, a chip with 4 integer pipes (or two chips with two integer pipes) isn't twice as fast as one chip with two integer pipes, simply because a lot of the time only one can be working due to the serial nature of the code.
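
The point about serial code can be put in numbers with Amdahl's law. That formula is a standard textbook result, not something the comment itself cites, and the parallel fractions below are made-up inputs rather than measurements.

```python
def amdahl_speedup(parallel_fraction, units):
    """Speedup from `units` execution units when only `parallel_fraction`
    of the work can be spread across them (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / units)

# Doubling the execution units only doubles throughput when nothing is serial.
for p in (0.0, 0.5, 0.9, 1.0):
    print(f"parallel fraction {p:.1f}: 2 units give {amdahl_speedup(p, 2):.2f}x")
```

With half the work serial, two units give about 1.33x rather than 2x, which is the gap the comment is describing.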

--
A deep unwavering belief is a sure sign you're missing something...

As an owner of an SMP system... by Sivar · 2002-02-20 10:20 · Score: 2

As an owner of an SMP system, I can say with confidence that even having two /real/ processors, which is better than one hyperthreading processor, isn't of any great benefit to Windows users anyway (see comments above about HT on Win2K) other than for servers (shudder) and for running several very CPU intensive apps at once, which very few people do.
In *nix, however, I have improved my buildworld times by thirty percent. *That's* useful.

--
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra

Re:As an owner of an SMP system... by castlan · 2002-02-20 14:56 · Score: 1

buildworld is compilation. That might imply that development can benefit significantly, whether on Unix or Windows NT. How about the inverse... do you find that servers running on (I'll assume you run) BSD perform better with SMP than they would under uniprocessor systems?

If all you serve is static webpages under a linear Apache, then I doubt it. If you run CGI, or possibly development versions of Apache (it looks like they might be working on threading it), you will likely see a performance gain.

When it comes to Windows, IMO preferred for desktop/workstation use over serving any day, a significant advantage may come in improved responsiveness when the GUI environment doesn't have to compete with other system components. This is most obvious even under a single CPU running BeOS, but an SMP Win2K would likely behave similarly smoothly. Hopefully X11 environments running on FreeBSD and Linux (which by and large really need all the help they can get here) can also gain significant responsiveness. Even if the Window Manager/Environment aren't multithreaded, at least they won't have to compete with every other daemon and service on the system.

-castlan
Re:As an owner of an SMP system... by llamalicious · 2002-02-21 11:01 · Score: 2

Since this is a highly subjective matter, I won't disagree with your comment, I will simply interject my own, which happens to have a different point of view.

On my dual processor machine, running windows, I noticed a *significant* increase in performance when I added the second processor (after, that is, I told Windows that my machine was a dual processor system, it doesn't auto-detect that after it's been installed. You'll have to set it manually if you had WinNT, 2K or XP Pro installed before you added the 2nd proc)
When running multiple programs, like Photoshop, DreamWeaver, ftp software, Director and Flash (at the same time) I can now comfortably allow one program to do whatever it needs to do on one processor, while the other remains available to the OS to assign threads to in the meantime.

Simply stating that 2 processors is of no great benefit without some quantifying data is a little weak, IMO

Virtuous Virtual by fm6 · 2002-02-20 10:21 · Score: 2

It's only a virtual processor. Send them a virtual license fee!

Re:Hyperthreading useless on Win2K? (OT) by syzxys · 2002-02-20 10:25 · Score: 0, Offtopic

I did. It works in C++ too. :-)

---
Windows 2000/XP stable? safe? secure? 5 lines of simple C code say otherwise!

Not really the "first look" (more info) by bbqBrain · 2002-02-20 10:28 · Score: 2, Informative

Posted 1/14 on anandtech:

http://www.anandtech.com/cpu/showdoc.html?i=1576

--

One of the reasons that I became a lawyer was to avoid ever having to hire one. -SPYvSPY

Some Dell systems support these... by hyoo · 2002-02-20 10:29 · Score: 2

The latest BIOS update for my dual PIV Xeon (Dell Precision 530) says that it added support for SMT/JT... I wonder if they had already tested these CPUs on my system. I WANT!!!! Drool...

Re:Some Dell systems support these... by cbodine · 2002-02-21 13:59 · Score: 1

Hyperthreading is part of Jackson Technology if I remember correctly and was to be out by the middle of this year. The problem I saw with it was the new Xeons would come in two configs, one for single and one for dual setups. It seems Intel just wants more money by making two different CPUs of the same model. Still, the price for a dual Xeon board is not bad, about 450-ish, and the CPUs are about the same as most P4s.

--
Dr. Suess: 'Gandalf, Gandalf! Take the ring! I am too small to carry this thing!' 'I can not, will not hold the One.

Error in the review by Anonymous Coward · 2002-02-20 10:31 · Score: 0

A minor issue, but they report the Athlon as having 128k of L1 cache but the Xeon as having only 8k... Now, maybe I'm wrong but I'm assuming they have the same cache config as past athlons which was 64k L1D and 64k L1I... The P4 has only 8k L1D, but also has instruction and trace cache which is not being counted...

/. Effect by Spam808 · 2002-02-20 10:33 · Score: 1

I guess GamePC's servers got overrun or something, because it seems that none of their pages (not even a month-old review of a Coolermaster case) will load up. You guys at /. should really look into making a list of web sites exempted from ever being mentioned on this site, because it's a little too much at times. P.S. Could you please have some kind of post about a page on Microsoft.com? I'd lllooovvveee for that site to go down... :-p

--
"I get my jollies building computers" Steve Jobs, 1983

It's good - but more for Win2K/XP that for Unices by camusatan · 2002-02-20 10:35 · Score: 2, Informative

For tasks that can be easily split into two threads, I have a feeling that hyperthreading could be better than two processors. But, since threading seems to be better implemented on Windows, NT boxen might enjoy the benefits more.

The best example of how to split a task into threads (that I like to use) is rendering a 3D image to screen. If you want to split that task so that two threads (and thus two processors) can work on it, you just make one thread handle 'even' scan lines and one thread handle 'odd' lines. Keeping the caches coherent between the two CPUs can be difficult - they're both executing the same code, and might also be twiddling around some piece of memory that they share.
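
A small Python sketch of the even/odd scan line split, assuming a placeholder `shade` function and a tiny made-up frame size. In CPython the global interpreter lock keeps the two threads from running pixel code truly in parallel, so this only illustrates how the work is divided, not any speedup.

```python
import threading

WIDTH, HEIGHT = 8, 6  # tiny hypothetical frame

def shade(x, y):
    # Placeholder per-pixel computation (an assumption, not a real renderer).
    return (x * y) % 256

def render_rows(framebuffer, first_row, step):
    # Each worker fills only its own rows, so the two threads never
    # write to the same scan line.
    for y in range(first_row, HEIGHT, step):
        framebuffer[y] = [shade(x, y) for x in range(WIDTH)]

def render_two_threads():
    framebuffer = [None] * HEIGHT
    workers = [
        threading.Thread(target=render_rows, args=(framebuffer, 0, 2)),  # even lines
        threading.Thread(target=render_rows, args=(framebuffer, 1, 2)),  # odd lines
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return framebuffer
```

Both workers read the same code and write into one shared buffer, which is exactly the sharing pattern that makes cache coherency traffic between two physical CPUs a concern.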

My point is, with this hyperthreading business, that there's only one cache - so no more cache coherency bothers. I might be concerned that the arithmetic units or whatever else that are on the chip might be in contention for use - but they can just add more of 'em in later steppings of the CPU.

The problem for us Unix-lovin' folk is that Unix-esque OSes don't often take threading very seriously. OpenBSD, for example, doesn't even have a kernel-threading implementation (correct me if I am wrong!) The 'Unix Way' is to just fork a process and run two process images. That's fifty billion times easier to debug than two threads that step on each other's data (see deadlock). But the forking method - even with nifty things like copy-on-write process images and such - doesn't seem to use as little memory, or perform as quickly, or process-switch as fast.

When I speak to developers who know their stuff (more than I do) they say - on NT, make a whole bunch of threads and make them talk to each other with semaphores and stuff - on Unix, fork and write to a pipe. Nothing fundamentally wrong with that division, but advances such as this Hyperthreading thing won't work as well on Linux, I don't think.
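
For reference, the "fork and write to a pipe" pattern looks roughly like this in Python on a Unix-like system. The function name `square_in_child` and the squaring workload are hypothetical stand-ins; only `os.pipe`, `os.fork`, `os.write`, `os.waitpid` and friends are real calls.

```python
import os

def square_in_child(n):
    """Fork a child, let it compute n*n, and read the answer from a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy-on-write image of the parent.
        os.close(read_fd)
        os.write(write_fd, str(n * n).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: close the unused write end, then read until EOF.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return int(data)
```

The two processes share nothing by default, so there is no data to step on; the price is that every result has to be serialized through the pipe.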

Moderation moderation moderation by Anonymous Coward · 2002-02-20 10:37 · Score: 0

The moderators have failed again. The parent of this post was on topic and perhaps even funny. Certainly not worthy of a -1 rating though!

You know kids, just because an anon posts it doesn't mean it's offtopic...

(This note, however, most certainly is. So mark this one off topic and the parent on)

Other systems.... Tera by fitten · 2002-02-20 10:37 · Score: 2, Interesting

Another system that used something like this was Tera: http://www.sdsc.edu/SDSCwire/v3.18/tera.html However, what they did was have 128 contexts per CPU and it round-robin'd through them all. You could also "daisychain" multiple CPUs together in a system. It was interesting but I don't know what ever happened to the machines they were building.

Re:Other systems.... Tera by Anonymous Coward · 2002-02-20 19:13 · Score: 0

see cray's mta architecture. (tera became the new cray)
Re:Other systems.... Tera by soyle · 2002-02-20 21:57 · Score: 1

The MTA (multi threaded architecture) is alive and well. Tera bought Cray at SGI's garage sale and subsequently changed their name to Cray (http://www.cray.com).

The idea behind the MTA is to provide support for threads on the CPU itself. When a thread stalls on reading memory the CPU instantly switches to a different thread, hopefully allowing you 100% cpu usage. At the same time you can have a large number of CPUs all having access to the same global shared memory. I can't find a decent description on Cray's website anymore, but I'm sure it must be in there somewhere.

Tera has begun shipping their CPUs made from CMOS, which apparently makes them faster and cheaper than gallium arsenide. It's really quite an interesting idea, and it would be cool to see this technology trickling down into desktop computers.

Wait a second by MoneyT · 2002-02-20 10:38 · Score: 1

Why do we want to trick the computer? Isn't windows confused enough without believing it's a multi processor comp too?

--
T Money
World Domination with a plastic spoon since 1984

Re:Think of it as out of order execution ..glorifi by mrm677 · 2002-02-20 10:38 · Score: 2, Informative

Registers in modern processors get renamed. Intel gets away with having such few logical registers in their ISA (instruction-set architecture) because they have dozens of physical registers.

All hyperthreading will do is just maintain a different program counter and re-order buffer for each thread. There are probably other minor details as well, but don't get caught up in registers from a programmer's point of view. There is magic under the hood that the programmer will never ever be aware of. At some point in your program, there may be 8 or so "EAX" registers. Later on, this same register may be renamed to an "ESP" register.
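
A toy Python model of register renaming, to make the "8 or so EAX registers" remark concrete. The class `RenameTable` is a hypothetical illustration: it never frees physical registers at retirement and says nothing about how any real Intel core implements renaming.

```python
class RenameTable:
    """Toy renamer: every write to a logical register gets a fresh
    physical register; reads use the most recent mapping."""

    def __init__(self, logical_names, physical_count):
        self.free = list(range(physical_count))
        # Give every logical register an initial physical home.
        self.mapping = {name: self.free.pop(0) for name in logical_names}

    def rename(self, dest, sources):
        """Return (physical_dest, physical_sources) for one instruction."""
        phys_sources = [self.mapping[s] for s in sources]  # read old mappings first
        phys_dest = self.free.pop(0)                        # fresh register for the result
        self.mapping[dest] = phys_dest
        return phys_dest, phys_sources
```

Two back-to-back writes to "EAX" land in two different physical registers, so several versions of "EAX" can be in flight at once while the programmer still sees a single name.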

Re:Hyperthreading useless on Win2K? (OT) by Anonymous Coward · 2002-02-20 10:40 · Score: 0

Does it work only from Administrator account or from any user account?

Thats Cool... but.. by WndrBr3d · 2002-02-20 10:43 · Score: 1

Does it have QUANTISPEED ?!

QuantiSpeed makes my CDs burn faster.

;-)

Re:Thats Cool... but.. by Anonymous Coward · 2002-02-20 11:35 · Score: 0

No, QuantiSpeed makes the internet faster

Are they *REALLY* tricking the OS? ;) by Geek+In+Training · 2002-02-20 10:43 · Score: 2

Has anybody cracked one of the new Xeons apart yet? How do we know that Intel didn't really slip two cores onto the same processor card... then, one processor would appear as one, two as four!! They sell for thousands more than they cost to make, anyways, right? Who's going to know?

Hmmm?!? :)

--
SlashSigTheorem: Humorous, Political, Critical, Constructive- If you have a .sig, someone WILL complai

Re:Are they *REALLY* tricking the OS? ;) by cbodine · 2002-02-21 14:03 · Score: 1

Well, to answer your question about tricking the OS, the answer is no. It is a BIOS-level thing that will be handled by the board. Or more correctly, it is a new feature of the board that the system should just be able to use.
So the answer is Yes and No.
About the CPU splitting I think you are on to something.

--
Dr. Suess: 'Gandalf, Gandalf! Take the ring! I am too small to carry this thing!' 'I can not, will not hold the One.

Re:It's good - but more for Win2K/XP that for Unic by camusatan · 2002-02-20 10:45 · Score: 1

Crap - just finished the article where they said that it doesn't really help that much.

Whoops! Well, I'll stick with the '..later steppings' line - when they start slapping a couple more ALU's and other processing units on the CPU so that more 'stuff' is available for the second thread, then it could be good!

Until then, I guess it's kinda like the Itanium - good as an experiment to see what things can be like in the near future, but not as useful for day-to-day operations.

Oh well - wish I could just edit my comment above!

Re:Hyperthreading useless on Win2K? (OT) by Anonymous Coward · 2002-02-20 10:48 · Score: 0

Not only does it work from any user account, supposedly it even works from Guest, if you have that account enabled.

Re:Hyperthreading useless on Win2K? (OT) by Anonymous Coward · 2002-02-20 10:49 · Score: 0

What code did you use in C++? This doesn't work:

#include <iostream>

int main()
{
    // Write a tab followed by two backspaces to the console forever.
    for (;;)
    {
        std::cout << "\t\b\b";
    }

    return 0;
}

The stream should have gotten flushed at some point, and there is no significant impact on the process or system.

VC6 + Win2K SP2

Non SMP OSs would be obsolete! by Zot · 2002-02-20 10:53 · Score: 1

Win9x and XP Home don't allow multi processors, so they would not gain any benefit from this, right?

Re:Non SMP OSs would be obsolete! by TeknoHog · 2002-02-20 11:47 · Score: 1, Troll

Win9x and XP Home don't allow multi processors, so they would not gain any benefit from this, right?
You are so right! After the introduction of Hyperthreading, Win9x and XP Home are no longer the perfect, stable, reliable OSes that take full advantage of the hardware.
Oh, hmm.. wait..

--
Escher was the first MC and Giger invented the HR department.
Re:Non SMP OSs would be obsolete! by Lithium+Element · 2002-02-20 13:32 · Score: 1

The consumer-grade versions of Windows are usually a waste for uni-processor systems frankly.

PHB Alert! by RetroGeek · 2002-02-20 10:59 · Score: 1

All those PHB's will be drooling over this so they can have bragging rights.

Just so Word can idle more......

--

- - - - - - - - - - -
I am a programmer. I am paid to produce syntax not grammar. Deal with it.

Article says Hyperthreading no match for Athlon by Anonymous Coward · 2002-02-20 11:00 · Score: 1, Informative

From the article mentioned above:

As for the Xeon's Hyperthreading technologies, it's hard not to be disappointed with the scores which we got throughout our testing. Hyperthreading sounds like an incredibly useful processor feature in theory, but in practice, It's useless without compatible software on the market. Time will only tell if developers want to take on the Hyperthreading challenge, and the few developers we've talked to have not been that incredibly impressed with the technology thus far. If nothing else, Hyperthreading will certainly be an interesting to watch out for over the next few years.

Re:Hyperthreading useless on Win2K? (OT) by syzxys · 2002-02-20 11:08 · Score: 1

You're right, that code (even before Slashcode killed it for you) won't work. The standard C++ library eats the backspaces, and they're the key to making it crash. Someone (usually a library) has to be calling (eventually) WriteConsole with tabs and backspaces in a loop. Apparently, the C++ standard library isn't doing that, so chalk one up to C++, but only because it's not outputting what you told it to. Now, is that a feature or a bug? :-)

---
Windows 2000/XP stable? safe? secure? 5 lines of simple C code say otherwise!

Re:Hyperthreading useless on Win2K? (OT) by Anonymous Coward · 2002-02-20 11:14 · Score: 0

If not allowing you to shoot yourself in the foot is a bug, then what language besides C and asm isn't buggy?

Can this feature be added to Linux by PD · 2002-02-20 11:33 · Score: 3, Funny

Is there some way that Linux can be limited to a certain number of CPU's? It sure would be wonderful if there were separate versions of Linux for each possible number of CPU's. If you had a kernel that was only written for two CPU's, it should properly not work at all on 4 CPU's, preferably with a message saying "send more money to your vendor". And while they are at it, is there some way that XFree can limit the number of xterms to less than 4, so that if a user wanted to open 6 xterms they would have to download the XFree that ran with 6 xterms? Think of the marketing possibilities that can be used to improve Linux!

--
If tits were wings it'd be flying around.

bogomips?!? by Bobartig · 2002-02-20 11:33 · Score: 1

I read constantly that bogomips are not a measure of processor speed, absolute or relative, do not translate into performance, and are only used to assign cache timing. Is there something beyond this that I'm missing? Why include a bogomips rating for this dual Xeon behemoth?

--
This is where I get my recommended daily allowance of "Foot in Mouth."

Re:bogomips?!? by red5 · 2002-02-20 11:37 · Score: 1

IDRK (I Don't Really Know): My guess is that when comparing procs based on the same core (P3 vs Xeon), a bogomip isn't all that bad a measurement.
Just my $0.02 (CAN).

--
I know I'm going to hell, I'm just trying to get good seats.
Re:bogomips?!? by Chirs · 2002-02-20 18:42 · Score: 1

Bogomips are basically how fast you can spin in a tight loop. For the clock speed, this isn't anything unusual, most modern processors can achieve a bogomips of twice their nominal processor speed.

hrm, well. by Anonymous Coward · 2002-02-20 11:34 · Score: 0

If you compare the intel parts to other alternatives, you're already getting half the processor, and for twice the price as well!


No "Virtual CPUs" by castlan · 2002-02-20 11:39 · Score: 1

Win2K Pro allows 1-2 processors. If you need 4 processors, then you need Windows 2000 Server. 8 requires Advanced Server. I think that Datacenter can allow up to 16 processors in theory. Enabling HyperThreading seems to mean "lying to Windows" as a cheap hack to enable multithreading on a non-SMP machine. I doubt this induces processor-hosted schizophrenia (rather, multiple personality disorder).

In the article they specifically stated that a dual processor machine will run on Windows 2000 Professional without problems. The hardware can tell Windows that it has double processors, but this does not give you the effect of 4 virtual CPUs with 2 CPUs. This is only a cheap hack to compensate for prior assumptions in computing - namely, that a single CPU will have reduced performance with multithreaded code. This is because a context switch for each thread can waste time, making multithreaded code seem expensive. Even on Intel hardware, the BeOS had fairly cheap context switches, and the exceptional performance on even single CPU systems anecdotally disproved some generally accepted notions on the impact of multithreaded code.

In this article, benchmarks seemed to indicate that while single CPUs with HyperThreading enabled slightly outperformed those with disabled HT, the dual-proc systems tended to lose with HT enabled. This could jibe with the above when you consider that a single CPU with Hyperthreading really isn't 2 virtual CPUs. Rather, by masquerading as 2 CPUs to Windows, Windows will send code that is multithreaded, which can be more efficient with an effective scheduler. On the other hand, with 2 CPUs, there is no advantage in telling Windows you have 4 CPUs, because either way you are getting multithreaded code. Thus, the negative consequences based on the hack (lie) are no longer compensated by threaded code outperforming nonthreaded code.
Note that all of this assumes that there is No Real Change in the hardware functionality when enabling Hyperthreading, other than the CPU requesting multithreaded code. It would have been interesting to see a screenshot of the CPU load monitor with a non-idle task load to see how the load delta is presented on the 2 "virtual" CPUs belonging to each real CPU.

Perhaps this will allow Windows to achieve responsiveness approaching that of BeOS even on single CPU systems, although dual-procs will always be better for abstracting interface performance from system load.

Interesting quote from the article by Sivar · 2002-02-20 11:48 · Score: 1

"First off, it's quite easy to see that the dual Athlon MP setup simply rules the roost when it comes to raw CPU performance. Even with the Athlon MP chips at 1.6 GHz, it's easily able to outpace the dual Xeon 2.0 GHz processors, with or without Hyperthreading enabled. Even the highest performing Xeon setup still trails the dual Athlon MP 1900+ by roughly 30%."

--
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra

Re: Was once touted for all PowerPCs by tupps · 2002-02-20 11:50 · Score: 1

I know of a couple of people who mentioned that one of the features of the G4 was that the SMP was handled in such a way that it was possible to put 2 G4s on the same CPU die. As there are no Macs with this nor any 3rd party vendors that have this available as an upgrade I assumed it hadn't gone ahead.

Looks like IBM have used it in their high end servers. Now if only I could get OSX running on them...

--
Go out and get sailing!

Re:Hyperthreading useless on Win2K? (OT) by Anonymous Coward · 2002-02-20 12:12 · Score: 0

Perl

I'm curious by abhinavnath · 2002-02-20 12:14 · Score: 1

Why do the Pentiums have only 8 kB L1 cache as compared to the Athlon's 128 kB? I suppose this is somewhat o/t, but I'm curious as to why there's such a big difference.

--
My other sig is also a .Porsche

Re: Was once touted for all PowerPCs by khuber · 2002-02-20 12:15 · Score: 1

You may be in luck - I heard that eventually Power and PowerPC will be consolidated to the Power architecture.

-Kevin

Intel isn't IBM, and computers shouldn't lie. by castlan · 2002-02-20 12:16 · Score: 4, Interesting

Despite VMware, the Intel architecture isn't really virtualized. I am willing to believe that IBM can actually put multiple CPU cores on one die, for many SMP benefits without the downside of requiring multiple CPU dies and infrastructure.

On the other hand, for Intel this seems to be unrelated, and a cheap hack, where the CPU presents an SMP interface to the OS only to request multithreaded code. Requesting multithreaded code should be possible without sending misinformation to the basic Operating System services. This seems to be a symptom of the disconnect inherent in having different companies producing the Architecture and the Operating System. It seems unlikely that you would see such a crude hack for fundamental systems infrastructure coming from an integrated vendor such as SGI, Sun, or even Apple. MIPS and PPC/Power don't lie about such things, because the support chipsets are actually supported by the Operating Systems.

Really, this seems reminiscent of C/H/S mangling in storage interfaces and long file name mangling in the vFat filesystem. While it has taken many generations (in computer years) to abstract the state of the art from such persistent kludges, they still haunt the consumer computing space to this day. Now do we really need to go even further backwards and fuck with how the CPU interfaces with the core system?

Of course, perhaps I am not giving the Wintel juggernaut the benefit of the doubt... after all, they will have to bear the consequences of such a choice, so maybe they aren't just adding saddlebags to the most basic part of the computing system. Perhaps they have the foresight to resolve this in the next few years before making this burdensome interface an accepted standard. It would be shameful to add more standardized interface limitations... such as the standardized "Gates" hopping of the 640K limit on commodity computing hardware, or the IDE limit of 540MB, no 2GB, no 8GB, no 36GB, no wait 140GB, Ad Nauseam. All of these arbitrary limitations were the results of cutting corners and not sending fully honest information to basic system interfaces. Can't the OS send efficient multithreaded code for processing without living a lie?

-castlan

Re:Intel isn't IBM, and computers shouldn't lie. by Anonymous Coward · 2002-02-21 23:47 · Score: 0

"Intel isn't IBM"?

" ... for Intel this seems to be unrelated, and a cheap hack, where the CPU presents an SMP interface to the OS only to to request multithreaded code."?

Did you realize this "cheap hack" is exactly the same thing as the IBM RS64 RISC CPU's "Hardware Multithreading" (HMT)? Turn on HMT on the p680 and it looks like the machine has 48 CPUs. IBM says this can increase performance by up to 22% for some workloads.

The reason it only boosts performance by 22% and does not double is because HMT allows a second thread to run when the CPU is in a wait state. For example, if the primary thread of execution causes the CPU to look for data in the L2 cache which takes several CPU cycles to access, the second thread is executed instantaneously. When the second thread completes or causes a wait state, the primary thread again executes. So basically HMT/Hyperthreading is a way of increasing a single CPU's efficiency in executing code.
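
A toy Python simulation of the "run the other thread while one is stalled" idea. Everything here is an assumption for illustration: the `simulate` function, the one-instruction-per-cycle core, the 4-cycle stall and the instruction strings. It does not model the RS64 or the Xeon, and it does not reproduce the 22% figure.

```python
def simulate(threads, stall_cycles=4):
    """Simulate a core that issues one instruction per cycle and always
    prefers the lowest-numbered ready thread (thread 0 is the primary).

    Each thread is a string: 'c' is one cycle of compute, 'm' is a load
    that stalls its thread for `stall_cycles` cycles after it issues.
    Returns (total_cycles, busy_cycles).
    """
    pcs = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)   # cycle at which each thread may issue again
    cycle = busy = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        for i, t in enumerate(threads):
            if pcs[i] < len(t) and ready_at[i] <= cycle:
                if t[pcs[i]] == 'm':
                    ready_at[i] = cycle + 1 + stall_cycles
                pcs[i] += 1
                busy += 1
                break               # at most one instruction issues per cycle
        cycle += 1
    return cycle, busy
```

In this model one thread running "cmc" takes 7 cycles, and two such threads together take 9 cycles instead of 14, because the second thread fills part of the first thread's stall. The gain depends entirely on how often the threads stall, which matches the comment's point that the benefit is workload dependent.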

IBM has a white paper on this on their web page. It covers the types of workloads that HMT benefits. I bet the same holds true for Hyperthreading on Xeon.

Re:It's good - but more for Win2K/XP that for Unic by chfleming · 2002-02-20 12:19 · Score: 1

Dude are you completely talking out of your arse?

The CPU can't see the difference between a thread and a process. This CPU doesn't include a kernel.

Your argument is 100% invalid.

only 512mb ram by Steveftoth · 2002-02-20 12:20 · Score: 2

What kind of setup are they testing? Anyone who is willing to spend that kind of money for those processors had better put no less than 2 gigs of ram on those boards! Who puts 512 megs on a board like that? I mean really!

Re:only 512mb ram by Lithium+Element · 2002-02-20 13:20 · Score: 1

If you're only testing then it doesn't really matter does it? You see, when you're using less memory than your system has then adding more doesn't change things. I suspect that anyone who would buy such a thing would be doing so on behalf of their company as well...so they would likely have more than 512MB installed, but I don't really see how you've determined that >= 2GB RAM is the magical number for such a setup. Think *application* here...
Re:only 512mb ram by Steveftoth · 2002-02-21 10:37 · Score: 2

I'm just saying that most people who use 'server' boards tend to stick a lot of ram in them as usually the ram isn't the major cost of a board like this, the CPU is. Hence, you might as well load it up with ram so that you know that's not the bottleneck. But there is no methodology to the 2 gigs of ram number. Most PC boards don't support more than 1.5 gigs most of the time.

Re:Hyperthreading useless on Win2K? (OT) by Anonymous Coward · 2002-02-20 12:21 · Score: 0

Nope. If I want to walk off the end of an array, or reference an empty hash value, Perl will automatically create it for me.

No, Perl will give you a lot of bullets, but it won't let you do everything.

GCC compiler options by jimbo3123 · 2002-02-20 12:31 · Score: 1

I am not sure about this, but I would suspect if one were to compile the kernel with the GCC multithread option "/j n" (if I remember correctly), (n being the number of threads the executable being split up into), then the kernel would be able to take advantage of a SMT processor.

--
There should be a moderation category "Dumbest Comment EVER"

How the CDC 6600 PPU worked by Huusker · 2002-02-20 12:38 · Score: 1, Offtopic

This isn't a totally new idea, either. The first step in this direction was the peripheral processor for the CDC 6600, in the 1960s, which appeared as ten peripheral processors to the programmer. Internally, it was ten sets of registers and one ALU, doing one instruction for each machine state in turn.

Here is how it worked. On the CDC 6600 when the CPU wanted to do I/O it would store a request packet into a magic memory address. The next virtual PPU would scoop it up and shovel the bits into the device. There was no DMA. The PPUs polled the I/O port to push each word of data. They also did most of the 'system call' functions. For example for a context switch the PPU would order the CPU to dump its registers and halt, then the PPU would swap in the new registers and order it to resume and load.

Each PPU ran one instruction before switching. The design documents called the switching logic the 'barrel', as in the drinking song Roll Out the Barrel. The design engineers must have liked their beer :-)
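
The barrel arrangement described above can be sketched in a few lines of Python. The function `barrel` and the instruction labels are hypothetical; this only shows the rotation order, not the real CDC 6600 instruction set or timing.

```python
def barrel(programs, slots):
    """Round-robin 'barrel' execution: one shared ALU visits each register
    set in turn and executes exactly one instruction for it per visit.

    `programs` holds one instruction list per virtual processor.
    Returns the (processor, instruction) pairs in execution order.
    """
    pcs = [0] * len(programs)
    trace = []
    for slot in range(slots):
        p = slot % len(programs)  # the barrel rotates one position per slot
        if pcs[p] < len(programs[p]):
            trace.append((p, programs[p][pcs[p]]))
            pcs[p] += 1
        # Otherwise this processor's slot goes unused; the barrel still rotates.
    return trace
```

Unlike switch-on-stall designs, the rotation here is fixed, so each virtual processor gets a predictable share of the ALU whether or not the others have work.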

Re:It's good - but more for Win2K/XP that for Unic by mrm677 · 2002-02-20 12:40 · Score: 1

I sort of agree with him. If an SMT processor runs 2 threads from different processes, how will it handle virtual memory? I suppose you could implement an SMT processor with separate TLB tables...that would be the only way. I wonder if Hyperthreading does this?? Otherwise running 2 threads from different processes won't work.

Oh, and from my knowledge, most UNICES flush the TLB on a context-switch. Someone please correct me if I'm wrong. I do realize that some architectures (MIPS R10000 I believe) can associate a PID with each TLB entry. This would also be a solution to the problem.
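
A toy Python model of the tagged-TLB solution mentioned above, where each entry carries an address-space id so two processes can have translations resident at once. The class `TaggedTLB` and the page and frame numbers are made up for illustration; whether Hyperthreading actually works this way is exactly the open question in the comment.

```python
class TaggedTLB:
    """Toy TLB whose entries are keyed by (address-space id, virtual page)."""

    def __init__(self):
        self.entries = {}

    def insert(self, asid, vpage, pframe):
        self.entries[(asid, vpage)] = pframe

    def lookup(self, asid, vpage):
        # A hit needs both the page number and the address-space tag to match,
        # so no flush is required when switching between processes.
        return self.entries.get((asid, vpage))
```

With the tag in the key, the same virtual page in two processes maps to two different frames without either entry evicting the other.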

Didn't Power PC chips do this like 10 years ago? by t0qer · 2002-02-20 12:43 · Score: 2

I was going to post a link to the mercury news article I read 10 years ago but it would have cost me 3 bucks, in these times that's 3 bucks I wish I had (GWB*)

Anyways, this was around the time of NT 4.0, when I believe Apple, IBM, Motorola, and MS pooled their resources together to make the PowerPC chip. One of the things I remember distinctly from the article was that the flaw in Intel's RISC chips was their single instruction pipeline to the core, while the PPC chips had 3.

This same argument/article was made later with the introduction of the G3 series of processors if memory serves me correctly.

Not Intel bashing at all, in fact everything 'cept the TV and microwave got an Intel chip in my house. Just trying to make an interesting point.

I'd rather... by stdcallsign · 2002-02-20 12:44 · Score: 1

I think I'd rather see a decent implementation that tricks the OS into thinking that 2 processors is 1 processor. So you don't have to have multi-threaded/process code in order to gain performance from multiple procs. Just a thought. stdcallsign

Re:I'd rather... by be-fan · 2002-02-20 13:15 · Score: 2

Damn lazy developers. This fear of multithreaded code is why Linux stuff like galeon is so horrendously single-threaded. BeOS might not have succeeded, but aggressive multi-threading is something they got right. There might have been implementation issues, but the concept itself (which is rather effectively used in Windows 2000 as well, though it's not hyped as such) was sound, and oh what a user experience!

--
A deep unwavering belief is a sure sign you're missing something...
Re:I'd rather... by SK-null · 2002-02-20 15:56 · Score: 1

Galeon is multi-threaded, dumb ass!

Yawn (was: I'm curious) by RFC959 · 2002-02-20 13:01 · Score: 2

(I was going to make this a root-level comment, but it's somewhat relevant to this...) I'll be interested in Intel chips when Intel stops skimping on cache memory. Intel says the new Xeons have a whopping !!!512K!!! L2 cache! Wow!

Actually, they should be ashamed to sell that as suitable for heavy duty. This is freaking 2002, not 1992. An UltraSparc III has 8 MB of L2 cache. A MIPS R12000 has (or can have) the same amount. IBM Power4s have similar amounts. (USIII has 32K instruction and 64K data L1 cache, and R12k has 32K of L1, for the sake of comparison.)

I admit I don't have any hard data to back this up, but it's my suspicion that it's in large part the large L2 cache that causes Sparcs to thrash Intels at some tasks. There's a good page on some processor design considerations at SETI@UNC.

Re:Yawn (was: I'm curious) by SK-null · 2002-02-20 15:48 · Score: 1

Cache speed (the USIII's 8 MB are off chip, for example) and price.
Still, you're bloody right.
With the Pentium 4, Intel decided that only Xeons would have MP support, so that is the only difference from a regular Pentium 4 (same cache size).
In the good old Pentium II/III days, every chip supported MP and the Xeons just had more L2 cache (up to 2 MB).

The problem with this should be obvious... by Jayde+Stargunner · 2002-02-20 13:05 · Score: 2

Gamepc.com

They're taking what is designed as a server processor, one optimized for server tasks (such as web page serving, which probably scales to multiple CPUs and hyperthreading rather efficiently), and benchmarking it on Quake. *shrug* Really, who buys Xeons for a gaming PC? And, if they do, WHY?

These are server CPUs and should be benchmarked with server benchmarks.

-Jayde

--
What's a sig?

Re:The problem with this should be obvious... by Matchu · 2002-02-20 13:39 · Score: 1

You didn't read the article, did you?

They didn't run Quake benchmarks. They ran standard processor and memory benchmarks, as well as non-server tests like mpeg video and mp3 encoding. They also compiled kernels. And in the conclusion they discussed stability and compatibility issues.

So, maybe you should read before posting.
Re:The problem with this should be obvious... by Jayde+Stargunner · 2002-02-20 17:41 · Score: 2

Notice that I didn't say "Quake III Arena" or anything. It was just to illustrate a general point. It's obviously a gaming-oriented site.

Adobe Photoshop filters? Yeah. LOTS of servers run Adobe Photoshop filters. I hear lots of people who shell out big bucks for Xeon servers use them for encoding MP3 files too.

Where are the web serving or database benchmarks? That's what these processors are for, not for making pretty lens flares or opening Sandra to see how many CPUMarks you've got goin' on.

We all know that general-purpose memory and processor benchmarks mean nothing in the real world. Yet, they used no "real-world" tests for the normal application of the processor itself.

-Jayde

--
What's a sig?

100%? by cosmol · 2002-02-20 13:51 · Score: 1

What is your definition of 100% efficiency?

Think about that for a while and you will realize your post makes no sense whatsoever.

Re:100%? by Steveftoth · 2002-02-21 10:46 · Score: 2

100% would be when the processor never stalls due to lack of instructions, which happens all the time in all processors. When all branch predictions are correct. Basically, when every gate on a processor does work that accomplishes the goal of executing instructions correctly. As of right now, all processors do work that they don't need to.

When a processor comes to a branch instruction, rather than waiting till the result is determined (which happens at the end of the pipeline, not the beginning, so however long your pipeline is, you'll be waiting that many clock cycles), it tries to predict whether that branch will actually be taken. Sometimes it's right, sometimes wrong. Other times, the processor is ready to process instructions, but the cache doesn't have the instruction in its memory, so it has to go out to the L2 cache, and maybe the L3 cache or main memory. This causes the whole CPU to pause while it waits.

This problem is what HyperThreading is supposed to help lessen. It tries to reduce the amount of time that a CPU spends waiting for instructions from the cache by keeping 2 separate 'threads' going at once, so if one stalls, it executes the other, which hopefully has the information it needs to continue in either the registers or the main cache. Inefficient processors spend a lot of the time simply waiting, doing nothing, or doing work (taking incorrect branches) that is wrong (and hence thrown away).
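To put rough numbers on the "if one stalls, it executes the other" idea, here is a toy sketch in Python. It is not how the Xeon actually works: the one-issue-per-cycle core, the `run` function, and the burst lengths are all made up for illustration, and it assumes a stalled thread's memory wait can overlap with the other thread's work.

```python
def run(threads):
    """Simulate a toy core that can issue one unit of work per cycle.

    Each thread is a list of (work_cycles, stall_cycles) bursts. While a
    thread is stalled (waiting on memory), the core may run another thread.
    Returns (total_cycles, busy_cycles).
    """
    # Flatten every thread into a sequence of 'W' (work) and 'S' (stall) steps.
    queues = [[step for work, stall in t for step in "W" * work + "S" * stall]
              for t in threads]
    pos = [0] * len(queues)
    total = busy = 0
    while any(p < len(q) for p, q in zip(pos, queues)):
        total += 1
        issued = False
        for i, q in enumerate(queues):
            if pos[i] >= len(q):
                continue                 # this thread has finished
            if q[pos[i]] == "S":
                pos[i] += 1              # memory wait proceeds in the background
            elif not issued:
                pos[i] += 1              # this thread gets the core this cycle
                issued = True
                busy += 1
            # otherwise the thread is ready but the core is taken; it waits
    return total, busy


thread = [(2, 3), (2, 3)]                # 2 cycles of work, then a 3-cycle stall
alone_total, alone_busy = run([thread])
pair_total, pair_busy = run([thread, thread])
print(alone_busy / alone_total, pair_busy / pair_total)
```

With these invented numbers, one thread keeps the core busy 4 cycles out of 10, while two threads together keep it busy 8 cycles out of 12. The core is still not at "100%", but far less of its time is spent waiting.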

Re:Finally! (indeed) by castlan · 2002-02-20 13:57 · Score: 1

Perhaps the parent could be considered offtopic. I would argue that this post is almost worthy of its own submission, and in that respect could have earned at least a +2 interesting. In relation to the thread it augments, it seems to be a witty commentary on the effects of misinformation not just in computing systems, but in the much wider fields of computer science in relation to existential philosophy. In this aspect, this post may have been worthy of +2 or +3 insightful. Maybe even +1 funny for the sarcastic delivery.

I am posting under this message in solidarity, but I regret that the AC didn't use a name in this instance. It would have been less vulnerable to mistreatment by a moderator not worthy of their points. Since this post will not likely reach a well-deserved +4 before being archived, I will reprint this simply elegant statement below.

Thoughtless moderation helps nobody, and weakens the community. Rather than waste your moderator points, you should have saved them for a more thoughtful moderator to use. Now for some delicious irony, I fully expect to be modded down both -1 redundant and -1 offtopic.

cheers.

Re:Finally! (indeed) (Score:-1)
by Anonymous Coward on Wednesday February 20, @05:53PM (#3040304)

Yes, just think about how much more quickly intelligent agents can be developed using genetic algorithms in a scenario with a tricky god.

Xeon still beats Athlon for SMP by kawaichan · 2002-02-20 14:00 · Score: 1

For uni systems, Athlon is a good platform with an excellent processor, but for those who want a two-CPU system, Xeon is still the only game in town.

You've gotta understand, this is the first time AMD made its own 2x chipset. Intel has been doing this for a long time and stability is just great.

Oh, another thing on the hyperthreading: it seems to offer fairly good performance to programs that make use of SMP, stuff like 3dmax. It's not 4 CPUs for the price of two, but it's probably 3 CPUs for the price of two.

--

kawai

Compaq and the Alpha by cpeterso · 2002-02-20 14:37 · Score: 1

The results were astounding with very little changes to the processor core. I heard that the next Alpha was slated to include SMT before Intel killed it.

If the SMT results were so impressive, why would Compaq kill the Alpha? I read an interview with the lead Alpha designer and he said that the Alpha processor had been stretched to its limits and could not support new improvements like EPIC or 64 bit addressing. He was part of Compaq's strategy team for choosing the Intel IA64 for Compaq's future server family.

--
cpeterso

Re:Compaq and the Alpha by Anonymous Coward · 2002-02-20 15:17 · Score: 0

new improvements like EPIC or 64 bit addressing

sure. except alpha has been doing 64bit since day one.
Re:Compaq and the Alpha by SK-null · 2002-02-20 15:34 · Score: 1

First, Compaq sold the Alpha because Alphas never sold well enough and were becoming too big a burden for Compaq.
Second, EPIC is a concept behind ISA design, like CISC, RISC, and VLIW (here is where someone will say that EPIC is just Intelspeak for VLIW). If you design an ISA by one, you can't improve it into another. You design another ISA from scratch.
Third, the Alpha supports 64 bit addressing and data manipulation. Actually, unlike most other 64 bit RISC ISAs, Alpha doesn't even have a 32 bit relative.
Re:Compaq and the Alpha by Anonymous Coward · 2002-02-20 19:12 · Score: 0

I thought most 64 bit processors today are 64 bits, but only have 48 bit pointers... At least that's what I heard about Sun, Alpha, and Itanium.
Re:Compaq and the Alpha by Anonymous Coward · 2002-02-21 00:32 · Score: 0

There's been an Alpha architect posting on comp.arch who maintains that the official explanation is management bullshit, and that the engineers were very confident they could keep Alpha the top performing processor, if only they were given enough resources. Of course Compaq management would rather that someone else did the processor development so they can concentrate on their perceived core business of putting boxes together from commodity parts.
Re:Compaq and the Alpha by mrm677 · 2002-02-21 05:04 · Score: 1

EPIC is indeed Intel-speak for VLIW...sort of. True VLIW is where everything is statically scheduled. EPIC relaxes this. It groups independent operations into a single instruction word. The processor core is then allowed to schedule this however it wants.

Re:Hyperthreading useless on Win2K? (OT) by Anonymous Coward · 2002-02-20 14:45 · Score: 0

How's that not letting you shoot yourself in the foot? It's a bug with Windows, not the language. The output should technically work (if not backspace as much as intended) instead of crashing the OS.

intel's version isn't for anything else either! by RockyJSquirel · 2002-02-20 14:51 · Score: 2, Interesting

Notice that the Linux kernel build on two threads went slower with "hyperthreading" on than without it. And compiling is as eclectic a task as possible. I can imagine that highly optimized loops in graphics programs already max out some chip resource (like the float ALUs) so that multithreading them in this scheme does no good, but when compiling fails to parallelize, you know that Intel must have screwed the implementation up, big time.

Multiple processors sharing the same cache on a single chip ought to be a big win, whether they share ALUs or not. In some cases a setup like this should significantly outperform regular multi-processors (when both processors are dirtying each other's caches). Intel must have screwed something up.

The benchmarks show that the current implementation of "hyperthreading" is basically useless. The idea could work very well though.

Rocky J. Squirrel

Re:intel's version isn't for anything else either! by Animats · 2002-02-20 18:22 · Score: 2

Compiling is branch-heavy, with no inner loop and little numeric computation. Heavy load on the cache, light load on the ALU. Worst case for this sort of resource-sharing.
The best case would be a tight numeric loop that needed the FPU resources about 50% of the time. Then, two "hyperthreads" could load up the FPU effectively. So you could code inner loops to exploit this thing. Maybe.
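As a back-of-the-envelope version of that best case, here is a small Python sketch. It assumes an idealized core where the FPU is the only shared bottleneck and the two threads interleave perfectly; the function name `ideal_two_thread_speedup` is made up, and real chips will do worse because the cache and decoder are shared too.

```python
def ideal_two_thread_speedup(fpu_fraction):
    """Upper bound on the throughput gain from running two identical threads
    on one core, assuming the FPU is the only contended resource.

    fpu_fraction: share of a single thread's cycles that need the FPU (0..1].
    """
    if not 0 < fpu_fraction <= 1:
        raise ValueError("fpu_fraction must be in (0, 1]")
    # Two threads can at most double throughput, and the shared FPU
    # cannot be busy more than 100% of the time.
    return min(2.0, 1.0 / fpu_fraction)


for f in (0.25, 0.5, 0.8, 1.0):
    print(f, ideal_two_thread_speedup(f))
```

Under these assumptions a loop that needs the FPU half the time can, at best, double its throughput with a second thread, while a loop that already saturates the FPU gains nothing.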

hey! by Afrosheen · 2002-02-20 14:54 · Score: 2

Imagine a beowulf cluster of these! w00t!

Re:It's good - but more for Win2K/XP that for Unic by camusatan · 2002-02-20 14:57 · Score: 1

Am I talking out of my ass? I sure am! And I assume you are, too, since we don't have the actual programmer's guide for the CPU. So bear with me a minute.

If the processor could handle two completely separate processes, it would be just a dual-core CPU, right? And if it were, it wouldn't be called 'SMT' - it would be called a multi-core CPU. Furthermore, this technique of SMT is designed to work around 'small' problems like pipeline stalls - that's not something you're going to solve by doing a full task-switch within the CPU - task switches take a long time. Switching from one thread to another that lives in the same process-space seems more like what they would be trying to do. So from that, I would figure that SMT only works with two threads with the same address space.

We know that the CPU needs some OS extensions in order to run - that's mentioned in the article. Those extensions would be for, I presume, scheduling. In other words, code that says: "when one thread/process is running, schedule another to run also if it lives in the same memory space. But if you've got two separate runnable processes hanging out, only have one actively running."

All that I'm saying is conjecture, sure, and in some other discussions on this topic people who actually know what they're talking about are probably saying far more interesting things than this.

Re:It's good - but more for Win2K/XP that for Unic by moogla · 2002-02-20 15:23 · Score: 1

Conjecture, exactly.

It (HT Xeon) does use two separate TLBs. Not only that, each core has its own register set including segments and TSS, LDT, the whole shebang. In fact, an issue mentioned earlier in the discussion and the article was that the two threads would sometimes conflict in their usage of the cache (wanting dispersed regions of memory), thus giving a higher cache miss rate than if they were two separate cores. Furthermore, stock versions of Windows were used. They are not aware of the SMT nature of the processor; they just see two CPUs, and act accordingly. How could they know not to use threads that use different memory regions when it's never previously been an issue?

--
Black holes are where the Matrix raised SIGFPE

SPARC is in deep trouble. by Anonymous Coward · 2002-02-20 15:27 · Score: 0

Each improvement in the Pentium processor is another nail in the coffin of the SPARC processor. A Pentium-based notebook running Linux costs $1500. An equivalent SPARC-based notebook costs $5000. Yes. Some dumb schmuck is selling a SPARC-based notebook. Read " PC maker ships Sun-based workstation"

When Sun Microsystems announced that it would sell Intel/AMD-based servers running Linux, Sun basically declared that the SPARC processor would be discontinued within 10 years.

U-V-W pipe by yerricde · 2002-02-20 15:33 · Score: 2

U-V pipe only was used in the original pentium. Pentium pro and up use an out-of-order executing RISC core.

The P6 core, used in Pentium Pro, Pentium II, Celeron, and Pentium III, has an OoO core with three functional units: one that can execute any kind of instruction and two thin ones that can do only simple instructions. Think of it as a U-V-W pipe where U is a fat pipe and V and W are skinny.

The Pentium 4 core, on the other hand, has six pipelines, and three of them (the double-pumped ALUs) can handle two micro-ops at once. However, the decoder can feed only three micro-ops per cycle per thread. Hyperthreading goes a long way toward keeping its pipes fed.

--
Will I retire or break 10K?

Re:U-V-W pipe by moogla · 2002-02-22 05:31 · Score: 1

Sort of. Except that since the Pentium Pro core completely rewrites and rearranges each instruction into micro-ops, you never get to write code to exploit the three pipes. The original Pentium directly translated each instruction for each execution unit, hence you could order each instruction for parallel execution. With the Pentium Pro, you can't really guarantee anything since the instruction rewriter is the one doing all the ordering for parallelism. You can actually make its job harder by trying to help it out. P5 optimized code tends to stall out the P6 series.

--
Black holes are where the Matrix raised SIGFPE

Cherilyn LaPierre Patent Term Extension Act? by yerricde · 2002-02-20 15:40 · Score: 1

Or maybe they just had to wait for a patent to expire?

Good thing the big drug companies haven't scrounged up $6 million apiece to donate to Congress to get the Cherilyn LaPierre Patent Term Extension Act passed.

(See also the work of her late partner Sonny Bono, the campaign contributions that led to that law, and the lawsuit to get it overturned.)

--
Will I retire or break 10K?

Re:Are they *REALLY* tricking the OS? ;) The TRUTH by castlan · 2002-02-20 15:44 · Score: 2, Funny

You have a point. Most people don't know it, but modern processors use a hazardous combination of Potassium Fluoride and spent Plutonium to regulate clock speed, which is the real reason that it isn't safe to overclock your computer!

That density is especially thick with server CPUs, especially the Xeon. That is why, to date, nobody has set off large enough of a reaction to be deadly with overclocking PCs, but that is not the case with Xeons, whose Plutonium content is dangerously close to the Critical Mass. And you thought your Intel CPU ran hot! Everybody who runs Xeon servers knows better than to play with the clock speed.

In fact, that is why you can't usually buy Xeons without ECC RAM: the radiation put off by the computation would too readily disrupt the memory state information. What, you bought that nonsense about solar flares or other sources of random radiation causing bit decay? Oh, and FYI, don't run Distributed.net or Seti@home on your Xeon if you have fillings or a mercury thermometer in the area, unless you are interested in a direct demonstration of fusion in action.

Really, since the Cold War ended, microprocessors have been constantly dropping in price. Why was this phenomenon never observed until the '80s? Moore's Law my ass, how about "military surplus in action." It is much too expensive to store all of this spent plutonium in federal compounds. So the FCC had to ensure that computers had sufficient shielding, and now, there you go. Let me reiterate that ever since the "Pentium Pro" (Plutonium Recycling Operation) and with each new generation of CPU, active cooling remains a matter of utmost priority.

Really, they would have just put it in the drinking water supplies to distribute the threat amongst all of our nation's citizens (like taxes and fluoride) except that in secret military tests they found that the subjects' teeth and bones started to glow in the dark, which would have been too obvious to cover up for long. So they stationed the subjects in parts of Japan, Las Vegas, and tropical locations so that the glowing would be concealed by all of the neon lights and overpowering sun (causing a sun tan, to help cover the light emissions) and classified the research.

Well, in any case, I highly recommend that you don't ever "crack" one of the more recent Xeons apart. Rather, you should carefully and delicately disassemble them in lead casings. Much of the extra "thousands" in cost is spent on proper protective casings, which was the real reason for Intel to do away with the Socket interfaces for Slot 1 in the first place. New high performance ceramics and lead-magnesium alloys have allowed the protective casings to shrink again, but you still need to be careful. And don't EVER let two Intel cores come in contact with each other, or not even the Liquid Hydrogen active cooling system will save you...

Dammit! I should have posted anonymously! Now they'll get me! It's a good thing I'm posting from OpenBSD virtualized inside of Tinfoil Hat Linux! I can use the HyperThreading Hyperspace technology to encrypt my essence and escape! Fight the Future.

***Disclaimer***
Fluoride is good for your teeth and bones. Fluorosis is nothing but a commie... er, a Terrorist plot. If you don't ingest fluoride and develop fluorosis then the terrorists will have won! Make sure you brush your teeth with an ADA approved sodium-fluoride "activated" toothpaste, and only drink potassium-fluoride supplemented water. Ignore naturally occurring "healthy" calcium-fluoride that only hippies and tree-huggers advocate!

Qoheleth and Intel Hyperthreading hyperbole by Anonymous Coward · 2002-02-20 16:02 · Score: 0

Following a couple of links, I get to the Intel article http://developer.intel.com/technology/hyperthread/ (Hyper-Threading Technology) which, in keeping with the modern trend to patent things that others have been doing for years, includes the claim

Hyper-Threading Technology will enable the world's first simultaneous multi-threaded (SMT) processor.

Well, the truth is that this is nothing more than a new name for virtual multiprocessors, which go back at least as far as the Honeywell H-800 in the late 1950s. Anyone with extensive experience on the Control Data Corporation 6000 series probably remembers that the Peripheral Processors (PPs) on the CDC 6400, 6500, 6600 and 6700 were implemented as virtual multiprocessors.

Here is a thing, it is new.
It is not new, it has already been
for there is nothing new under the Sun.

Is the kernel patched for HyperThreading? by r6144 · 2002-02-20 16:25 · Score: 2, Interesting

I remember that Alan Cox wrote a patch to deal with Hyperthreading just a few months ago. One thing it does is to avoid putting two threads on the same hyperthreaded CPU when there are spare physical CPUs, i.e., distinguish between physical and hyperthreaded CPUs.
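For anyone wondering what "distinguish between physical and hyperthreaded CPUs" could look like, here is a rough Python sketch of the placement rule as described above. It is not the actual kernel patch; `pick_cpu`, the `busy` set, and the `package_of` map are hypothetical names used only for illustration.

```python
def pick_cpu(busy, package_of):
    """Pick a logical CPU for a new thread.

    busy:       set of logical CPU ids that are already running something
    package_of: dict mapping logical CPU id -> physical package id
    Returns a logical CPU id, or None if every logical CPU is busy.
    """
    idle = [cpu for cpu in sorted(package_of) if cpu not in busy]
    if not idle:
        return None
    busy_packages = {package_of[cpu] for cpu in busy}
    # First choice: a logical CPU on a physical package that is completely idle.
    for cpu in idle:
        if package_of[cpu] not in busy_packages:
            return cpu
    # Otherwise share a package with a thread that is already running.
    return idle[0]


# Two physical Xeons, each showing up as two logical CPUs.
package_of = {0: 0, 1: 0, 2: 1, 3: 1}
print(pick_cpu({0}, package_of))       # prefers CPU 2 over CPU 0's sibling
```

A scheduler that only sees four identical CPUs could just as easily put the second thread on CPU 1, leaving a whole physical processor idle.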

Is it in the test kernel?

use different areas of the processor by hyrdra · 2002-02-20 16:47 · Score: 2

A good analogy to how this works would seem to be segmented downloading. On a fast connection, a segmented download splits up a file into chunks and then opens multiple connections on the same interface, and this tends to utilize more of the available burst bandwidth.

Despite the error in this, splitting up a program into two threads to run on one processor seems logical. It allows for advances in parallelism, which is what processors (even single ones) like and optimize for. This way if two threads are running, one can be making heavy use of the ALU and the other the FPU, which are physically separate areas of the processor, instead of one section sitting idle while the other reports 100% usage to the OS. One thread can be loading and moving data into memory while the other does number crunching...AT THE SAME TIME.

This seems like a very good model, and I can see where it would increase performance by a huge magnitude if implemented on RISC systems, since instructions typically take only a few clock cycles to complete, and most programs are written to perform them sequentially. In hyperthreading, the processor could deal with several instructions at once (like they do already), only the difference would be these wouldn't be JMP guesses or preparing executed code in case of a branch.

Cool stuff, Intel is in the right direction. It would be interesting if someone would write a program to test an ideal HT condition, like a program with two threads, one doing logic stuff and the other floating point. What would the performance increase be?

--

"I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95

Great - now I have to get a 2 CPU license by Sean+Clifford · 2002-02-20 17:44 · Score: 1

Great - now I have to get a 2 CPU license for Micro$oft stuff...

can it do this? by Anonymous Coward · 2002-02-20 17:44 · Score: 0

Can the new Xeon trick the user license too?

Intel is desperate by BeerSlurpy · 2002-02-20 19:06 · Score: 2

Die size, die size, die size.

The larger and more complex the chip, the more it costs to make, and the higher the probability that there will be a defect in a randomly chosen chip. It is more cost effective to make one good cpu than to make two crappy cpus and put them on a single grid array.
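To see why die size bites so hard, here is a quick Python sketch using the simple textbook Poisson yield model, where the chance of a die having no defects is exp(-area x defect density). The wafer cost, die counts, and defect density below are invented numbers, not anybody's real figures.

```python
import math


def die_yield(area_cm2, defects_per_cm2):
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-area_cm2 * defects_per_cm2)


def cost_per_good_die(wafer_cost, dies_per_wafer, area_cm2, defects_per_cm2):
    """Wafer cost spread over only the dies that actually work."""
    good_dies = dies_per_wafer * die_yield(area_cm2, defects_per_cm2)
    return wafer_cost / good_dies


# Hypothetical numbers: doubling the die area halves the dies per wafer.
small = cost_per_good_die(1000.0, 100, 1.0, 0.5)
large = cost_per_good_die(1000.0, 50, 2.0, 0.5)
print(small, large, large / small)
```

In this model a die twice as big costs more than twice as much per working chip, because you get fewer dies per wafer and each one is more likely to catch a defect.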

Intel is trying to get back on top in terms of performance, even if it means taking an inelegant approach and making the chips extremely expensive to produce. Note that Intel's fastest offering is barely as fast as the fastest Athlon; to accomplish this, they had to move to .13 micron and use a die size that was STILL larger than the Athlon's. I predict that Sledgehammer on EV6 will be much more interesting news than hyperthreading.

Marketing by Anonymous Coward · 2002-02-20 20:08 · Score: 0

Why "Hyper" threading? That's very misleading.
It sounds as if a thread was being created for each function call. How are we going to call that then?

Slow Xeons by Hieronymus+Howard · 2002-02-20 20:55 · Score: 2

My company has just bought us developers Dell 530 dual 1.7GHz Xeon workstations. Nice, you may think, but it feels bloody slow. 15 seconds to compile 50 lines of C, using gcc (using cygwin on NT). Something really seems wrong with this box.

Not only that, but the idiot who ordered these PCs really overspecced them for development work (mostly editing & compiling), but ordered the bottom-of-the-range monitors for them (17" 60Hz @ 1280x1024). People are complaining of eyestrain and headaches. I kept the 19" monitor from my old PC, but I'm so close to quitting this job.

HH

Re:Slow Xeons by magus1988 · 2002-02-26 15:52 · Score: 1

Maybe you are doing something wrong? I have a Dell 530 dual 1.4 GHz with 1 gig of memory. I can compile the entire Linux kernel in 2 minutes and 12 seconds on it. And it's way over 50 lines of code. Make sure the CPU isn't in "compatible mode". If you go into setup (hit F2 to do so) there is a field for the CPU. Make sure it doesn't show the speed of the CPU as "compatible"; if it does, move your cursor to it, and hit the left or right arrow to change it to "1.7".

Why does this work? by jarran · 2002-02-20 22:28 · Score: 1

Imagine you have just one task running. In a SMP system, one processor would be busy while the other idles.

What will happen when we have one real CPU, but two virtual CPUs? Won't the OS send the idle task to one of the virtual CPUs, thus halving performance?

If we are "tricking" the OS into thinking we have two CPUs, I see no reason why this won't happen.

I must be missing something somewhere.

1 = 2 , 2 =4.... by martin · 2002-02-20 22:55 · Score: 2

Sounds about right given Intel's previous mathematics 'errata' for everything about the 386.

:-)

martin

Marketing on Crack by AndyChrist · 2002-02-20 23:55 · Score: 1

Umm....Intel is trying to sell software companies on Hyperthreading, so they go and sell lots of P4s with it disabled.
Makes sense. Crack the eggs, kill the chicken.

Re:Think of it as out of order execution ..glorifi by GooberToo · 2002-02-21 01:32 · Score: 2

There are many issues which the article did not address at all. For example, I would have loved to know how it affected system latency. If overall performance drops by 1-2% but process latency improves by, say, 5%-10%, then for workstation users this may be a worthwhile trade-off.

Also, the article seems to push very hard for raw CPU performance. Allow me to clarify. While Intel does seem to indicate that performance boosts can be achieved, I didn't really read it to mean that total aggregate CPU performance would be gained or, if it is, certainly not by much. Let me put it like this. It smells like this technology is geared to help out systems which normally run at 80%-90% of their total CPU, whereby HyperThreading would allow for efficient use of the difference while requiring only common SMP application support.

Also, I didn't read that HyperThreading was geared to be directly taken advantage of by Linux or Win platforms. I suspect that there are significant OS optimizations that can be made for more intelligent scheduling and improved processor affinity. Here, I can see that processor affinity may make significant differences in overall performance. While Win's CPU affinity is only slightly better than that of Linux's current scheduler, I'm hoping that significant affinity improvements will go a long way toward addressing possible shortfalls with this technology. As such, it certainly would have been interesting to see how well Linux did with the new O(1) scheduler in development, as it has many optimizations which specifically address better CPU affinity. Plus, if the scheduler can make the distinction between a virtual CPU and its associated owner, I can see that it may make sense to allow for processor bias between physical and virtual CPUs within a scheduler. After all, if a process is to migrate, it would seemingly (best guess here) make sense to allow it to migrate to a virtual self first before it migrates to another CPU entirely. If a process is currently executing on a physical CPU, does it make sense to allow it to migrate to a virtual CPU on a physically different CPU? I'm guessing that would make for a significant performance hit. How would it perform if process migration were only allowed to occur to its own virtual CPU? I'd certainly like to know.

By allowing the scheduler to make intelligent migration and accordingly biased decisions, I'm guessing that any OS may be able to make significant performance in-roads while using the HyperThreading technology. As such, I'm guessing that more significant performance gains can be achieved by having the OS HyperThreading aware rather than attempting to heavily optimize at the application level. With proper OS support, I'm guessing that little more than simply SMP application support will place this technology in a completely different light.
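The "migrate to a virtual self first" guess above could be sketched like this in Python. This is only an illustration of the bias being proposed, not code from any real scheduler; `migration_order` and `package_of` are hypothetical names, and whether sibling-first is actually a win would have to be measured.

```python
def migration_order(current, idle, package_of):
    """Rank idle logical CPUs as migration targets for a task on `current`.

    Logical CPUs on the same physical package (which share its cache) come
    first; CPUs on other packages come last. Ties are broken by CPU id.
    """
    return sorted(
        idle,
        key=lambda cpu: (package_of[cpu] != package_of[current], cpu),
    )


# Two physical CPUs, each exposing two virtual CPUs.
package_of = {0: 0, 1: 0, 2: 1, 3: 1}
print(migration_order(0, [3, 1, 2], package_of))   # sibling CPU 1 ranks first
```

The idea is that a task moved to its sibling keeps whatever it already has in the shared cache, while a move to the other physical CPU starts cold.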

yep by muchandr · 2002-02-21 01:46 · Score: 1

I have seen Intel confirming elsewhere that their Hyperthreading is based on the SMT design they got with Alpha, which is in turn based on Washington's SMT. I went to an SMT talk by Washington guys while at Berkeley - very elegant and impressive stuff indeed.

stepping stone to true mp machines by slaida1 · 2002-02-21 02:03 · Score: 1

Looks like a good way to persuade programmers to do their stuff with MP processing in mind.

--
Preserve old classics: copy your collection onto all hard drives.

Re:Think of it as out of order execution ..glorifi by fitten · 2002-02-21 03:41 · Score: 1

Yes, I was thinking that thread mobility in those benchmarks was probably the reason why the poor scores were generated. Along with moving the threads, you of course have all the problems associated with it. If you look at the benchmarks on that page, Hyperthreading usually makes the single CPU faster, which we could have predicted (and is surely what Intel wants). On the dual CPU configuration, it actually hinders performance in many cases. There are some other issues with treating one of these CPUs as two virtual CPUs. First, it isn't really two CPUs, and threads scheduled on it will not necessarily be executed with fair time slices. If you schedule two threads on one of these CPUs, it is possible that one of the threads uses the majority of the resources for the timeslice and the other does little, until you reschedule it because the timeslice has passed. So basically, you can pay the penalty of two context switches for little/no work done. Also, remember that a CPU, even in highly optimized code, is not running at 100% of its potential. Using hyperthreading you aren't getting a 30% faster CPU; you are (or should be) using the one you have more efficiently in terms of keeping the various execution units busy. I read somewhere that a typical instruction stream in a P3/P4 only keeps 33% to 50% of the execution units (ALU, FPU, etc.) busy. Hyperthreading simply tries to keep more units busy more of the time.

SMP and Hyperthreading by SkyLeach · 2002-02-21 04:02 · Score: 1

The real benefit here for me is that if the technology is adopted all around then most software will be written to take advantage of SMP. I have been running SMP for a long time, even though many people have tried to say it's not helping me. I can demonstrate the goodies though:

1.) While playing games that are not SMP enabled, I still benefit because the game can beat the crap outa one CPU while the OS (graphics, networking, system management, messaging, dlls/sos etc...) share the other processor.

2.) Most graphics software is SMP compliant, so I get the benefit of that when running an SMP compliant OS (NT, 2K, XP, Linux, BSD, etc...)

So for us SMP geeks this is a win-win situation.

--
My $0.02 will always be worth more than your €0.02, so :-p

Question about compilers. by Anonymous Coward · 2002-02-21 16:00 · Score: 0

Isn't this already done in software, as one of the possible optimizations?

Moreover, this looks like some kind of parallelization... Doesn't the Transmeta chip do this after translating x86 codes? (code-morphing?)

If it's not done yet, well, it should be.

Re:Question about compilers. by Anonymous Coward · 2002-02-22 03:57 · Score: 0

As nobody seems to have noticed my question (probably because it got score 0, which 99% of the time happens to me), I'd like to make it more interesting:

Can we do this in software? I mean, like the Transmeta technology, but on a smaller scale? This would be like just-in-time compilations of Java & Oberon, only this time working on native code...

I ask it because my old machine has a CPU bottleneck. Sometimes a program reaches 95% CPU usage, but most of the time it's 80% idle.

I hope someone sees this. :-\

STOP seti@home !!! by castlan · 2002-02-22 18:19 · Score: 1

Should you stop running Seti@home? Most definitely you should stop!

But not because of heat issues. As long as you aren't overclocking your components, and have adequate cooling, then your computer should be fine. But aren't some things more important than your computer?

Let me ask you this: Have you even considered what would happen if we actually did make contact with alien life? If they are advanced enough that we could communicate, then most likely they are at least as advanced as we humans in other aspects as well. The aspects I am concerned with range from political ambition to weapons of extreme destructive technology!

What if they invaded America? They could reveal just how fragile the current state of our republic is, in light of its stagnation under the "two party system"! If they "bodysnatched" the presidential candidates for both the Republican and Democratic parties, then we would have no choice but to vote for one of them! As stated by Kodos and Kang, you couldn't vote for a third party, as that would just be throwing your vote away! So don't blame Homer J Sixpack.

What if they penetrated the White House's security by posing as a curvaceous floozy or hard working intern? From the "JFK room" they could chew nitrogen gum and frustrate our president with their sexy artificial bodies. Then when we finally have no choice but to use our nuclear missile against their mothership, the head alien would inhale the formidable payload and become pleasantly intoxicated. Then they would destroy all of our civilization, and most of the people, so that no one remained but Natalie Portman, who would have the task of repopulating our cities, and effectively mother the entire human race!

...actually, that doesn't sound half bad. Forget that freak with his Gramma's record player, if I got to share the bunker with Natalie, then I'm all for it! C'mon then, get cracking! How many units have you completed? Don't dally man, this is the future of the Human Race we're talking about! And while you're at it, do you think you could get me a box of grits? Miss Portman likes 'em hot.

-castlan

Still, computers shouldn't lie. by castlan · 2002-02-22 20:15 · Score: 1

Intel isn't IBM. When Intel gets their chip process under 100 nanometers, they may look into providing multiple CPU cores on one die, as IBM does today.

While what Intel is working on now is unrelated to what I was giving IBM credit for, you are correct in that IBM RS64 III and above offer HMT. This would mean that I was initially incorrect: this would be an "expensive hack".

The reason performance with some workloads can increase by 22% and not double is that the CPUs are not doubled. It is not a good idea to lie to your computer, and that is why HMT is disabled by default. HMT falsifies the CPU information presented to some system interfaces, which can cause problems. It invalidates the dynamic CPU allocation feature, which seems to me a more important feature. So altering the allocation of physical CPUs is dynamic, while enabling and disabling this dubious feature requires a reboot? That is unfortunate.

Most strikingly, while HMT under Windows is most valuable in single processor systems, to enable multithreaded code, it is not even an option in single processor AIX systems. Even singlethreaded code will benefit from double processors, as it allows double the threads to execute. This is not the case for HMT's "virtual" doubling, as the thread will never idle, allowing the next thread to take over.

In fact, HMT seems just an attempt to compensate for the limited threading model of Unix-generation systems. While moving the thread scheduler into the hardware is theoretically the best way to get the performance necessary for a responsive threading performance, it shouldn't be necessary to lie to an insufficient interface to achieve it.

I do agree that the performance improvements of HMT should be achievable, but instead of violating interfaces, those interfaces should be fixed. I recall running Distributed.net's client on a 16 way Onyx. When they implemented CPU autodetection, the client would automatically spawn 16 threads, completely bogging down the system. In a matter of seconds the system was almost completely unresponsive, so that I had to use a serial terminal to kill the client. When I specified that only 15 threads should be spawned, the responsiveness was silky and smooth, even as I watched the threads shift around between processors (which were running at different clock speeds). Why didn't running one thread on an O2 produce a similar effect? It seems that the Unix legacy is still carrying baggage from its single processor origin in its scheduling model. I would love to see if running the Dnet client on an SMP Linux box (preferably 4+ CPUs) would produce similar results under 2.2 or 2.4, as it seems that Linux has actually made the most progress in this area, despite SGI's and IBM's capability of running single partition system images on hundreds of processors.
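
For what it's worth, the "15 threads instead of 16" workaround is easy to express in code. This is just a sketch of the idea in Python, not how the Dnet client actually does it; worker_count and its reserve parameter are names I made up. Note that os.cpu_count() reports logical CPUs, so on a box with hardware multithreading enabled it would count the virtual ones too.

```python
import os


def worker_count(detected_cpus, reserve=1):
    """Number of compute threads to spawn: leave `reserve` CPUs free for
    the OS and interactive work, but always run at least one worker."""
    return max(1, detected_cpus - reserve)


if __name__ == "__main__":
    # os.cpu_count() may return None if the count cannot be determined.
    cpus = os.cpu_count() or 1
    print(f"detected {cpus} CPUs, spawning {worker_count(cpus)} workers")
```

On the 16 way box this gives 15 workers, and on a single processor machine it still gives 1, which matches what I saw on the O2.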

I would actually like to play with an SMP Xeon with HMT, so that I could put BeOS on it and see if its nifty little scheduler fares any better, in light of its "pervasive multithreading".

If they are the same technology, then it is really too bad that Intel couldn't just use the more descriptive term "Hardware Multithreading". Doesn't this technology just come from the Alpha anyway, or did IBM develop it first?

BTW, you really should get an account. It is free, and it makes me feel better about responding to your post with any amount of effort, as there is a better chance of you actually seeing my response, and it lets me recognize you for future correspondence. Thanks for the info about the RS64 line. I really haven't looked into AIX much since we moved to LinuxPPC.

-castlan

285 comments