Intel Hyperthreading In Reality
A reader writes: "Looks like GamePC has got the first look at Intel's new Xeon processor, which has the new super-fantastico Hyperthreading technology, which tricks your OS into thinking one CPU is two CPUs, two CPUs is four. Looks neat in theory, benchies included."
WHY? I mean, come on... If you want two processors, shouldn't you have 2 processors in the system???
DocChaos -------- I may be crazy, but then again I may be crazy.
I am not as concerned with how it "tricks" the OS as much as I am about performance and reliability. Tell me how this actually makes the chip BETTER and I might get excited.
Tequila: It's not just for breakfast anymore!
So I can just buy half a processor, and get full functionality? ;)
Hyperthreading, the feature that can theoretically turn your 2 physical CPU's into 4 virtual CPU's
... *zwusch* :)
This is quite kewl, i could turn my 2 cpus to 4, then to 8, 16, 32
I like magic stuff like that
Life sucks.
...my keyboard so I could in essence have four hands.
Hyperthreading is a pretty cool idea, especially for those of us who would like to see SMP move more into the mainstream.
According to this article, though (posted on 2cpu.com), the Windows 2000 scheduler doesn't know how to take advantage of hyperthreading, since it doesn't know how to take advantage of virtual processors. (I suppose Windows XP does?) Go figure. Anyway, this looks like it's probably worth checking into. I'm sure Linux will support it!
---Have you crashed Windows XP with a simple printf recently? Try it!
Heh, the XBox 2 is coming. Now, instead of Microsoft, Intel will get into the video game action.
The XEON chip started shipping back in January. The Prestonia server chip is made on the 0.13-micron process. Intel cancelled the .18 micron process last year to focus on the .13 micron process. That is amazing!!!
A lot of people are posting reviews of the new XEON chip.
Here is a link to another review of the XEON chip.
Wow, kinda sucks if your OS has a per CPU license, like NT and Win2K server!
Make sure you use the Printer Friendly view, that way you don't get 12 pages of slashdotted hell! Look here.
-- Dan
Basically what they're doing is simply taking unused processor resources and allocating them to another thread. You can now have multiple _threads_ of execution simultaneously... truly simultaneously.
Thread X is using registers B and C.
Thread Y is able to use registers A and D.
These threads can be executed together without a context switch... and the processor will hunt out these relationships in hardware. That's what "the big deal" is.
Until now, when a processor "multitasks", it's simply switching from one thread of execution to the next... it allocates separate time to two different threads... Now it can allocate the exact same timeslice to multiple threads as long as there isn't a resource dependency.
If your program can be architected to take advantage of this (or your OS can schedule tasks like this), you'll get a huge benefit (read: if it works on SMP systems, it'll get some benefit on this as well).
I've been waiting for literally *years* for a CPU that will trick my operating system! Nirvana, I kiss you!
With AMD's past history with overheating under heavy use (read overclocking), wouldn't this hyperthreading just compound the issue by tricking the OS into overworking the CPUs?
a cluster of a cluster of these ...
...a Beowulf cluster of hyperthreaded Intel processors. They're all virtual anyway.
Only support a max of 2 processors? Does this mean that anyone who wants more than 1 real processor has to run the Server edition of one or the other? And... what about per-CPU licenses? If I wanted to run my dual-Xeon mobo before, I only needed a license for 2 CPUs...
Sure, it does sound good having the ability to pose as 2 cpu's, but you won't get the performance that you would from a real dual cpu setup.
And, because of this AMD is at advantage. Athlon is much much smaller at an equal fabrication process, so even if hyperthreading took off, AMD would be able to combine 2 cpu cores in the one chip and still be able to compete easily in terms of die size and attain a higher level of performance, because 2 real cpu's will beat 1 cpu posing as two any day.
A number of people have posted asking what the point was of making a single processor act like two processors. It's actually explained in the article linked to above.
Apparently, the big deal is that a single processor can only handle one thread at a time: multitasking works by breaking programs down into threads, and working on one thread for a little while, then another, then another, then back to the first. But at any given time, only one thread is being actively executed. Hyperthreading changes this: a single processor can work on two threads truly simultaneously. This makes multitasking a hell of a lot more efficient.
The original Howling Frog is a fictional character and has no UID.
In hyperthreading, the logical processors do not share registers, just function units. Thus, if one logical processor needs to multiply while the other needs to add, they may share the CPU resources simultaneously.
This was developed in response to the observation that individual function units remained idle for multiple cycles while the current process was busy doing one kind of operation.
-B
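To make the function-unit point above concrete, here is a toy sketch in Python. It is not how the Xeon actually schedules anything: it assumes exactly one adder and one multiplier, one cycle per operation, and a time-sliced baseline that issues a single operation per cycle. The function names are made up for illustration.

```python
def cycles_time_sliced(thread_a, thread_b):
    """Baseline assumption: only one thread is active at a time,
    and it issues one operation per cycle."""
    return len(thread_a) + len(thread_b)


def cycles_shared_units(thread_a, thread_b):
    """Two logical processors share one unit of each kind ("add", "mul").
    Both may issue in the same cycle if they need different units."""
    pending = [list(thread_a), list(thread_b)]
    cycles = 0
    while any(pending):
        busy = set()  # function units already claimed this cycle
        # Alternate priority each cycle so neither thread starves.
        order = (0, 1) if cycles % 2 == 0 else (1, 0)
        for t in order:
            if pending[t] and pending[t][0] not in busy:
                busy.add(pending[t].pop(0))
        cycles += 1
    return cycles


if __name__ == "__main__":
    multiplies = ["mul"] * 4
    adds = ["add"] * 4
    print("time sliced:        ", cycles_time_sliced(multiplies, adds))
    print("mul + add threads:  ", cycles_shared_units(multiplies, adds))
    print("mul + mul threads:  ", cycles_shared_units(multiplies, multiplies))
```

When one thread multiplies while the other adds, the two streams overlap completely in this model. When both want the multiplier, they queue for it and nothing is gained.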
The impression I got from the story is that Intel put this in now so that they can figure out what bottlenecks they face in turning hyperthreading into an advantage. Also, it gives compiler writers, OS writers, and application writers some exposure to the technology. I would not be surprised if a few years down the road this is a big win for some environments (i.e., not office2.005k).
Hyper means overexcited - how about Underexcitethreading - Makes your PC with two CPU's think it only has one! :P Oh wait... That's like windows 98...
Any technology distinguishable from magic, is insufficiently advanced.
The Intel Xeon actually has *another* set of registers to cope with the second thread.
Unfortunately, the big slowdown in computers is accessing memory and peripherals on the various buses. Looking at the details of the Xeon, it still competes (and queues) for access to memory.
It's also worth considering that although programs tend to have a few threads to look after things like printing while you carry on writing your document, you tend to be using one or maybe two threads heavily at once and the rest are just mostly idle, waiting on hardware and interrupts.
Intel themselves are claiming 10% speed improvement, even when compiled to take account of SMP, or 30% for specially optimised code (yeah as if that's going to be popular). Don't get fooled into thinking your PC is going 2x faster.
the site is partly /.'d already but the printer friendly (non graphic) version seems to actually still load.
http://www.gamepc.com/reviews/printreview.asp?review=ppso&mscssid=&tp=
Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.
"The remote site or network may be down. Please try again."
Wonder if that server is vibrating around the room right about now.... (remember the simpsons episode where they raced the washer and dryer in moe's tavern?)
And it may hurt. A downside of "hyperthreading" is that the threads contend for cache space, so if the threads are executing very different code, the cache miss rate will rise. Of course, this happens in ordinary threading on each context switch, but with "hyperthreading", there's a context switch of sorts on every instruction cycle. If this effect shows up, it will show in L1 cache miss rates.
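A rough way to see the cache contention effect described above is a tiny LRU cache simulation in Python. Everything here is an assumption for illustration: a fully associative cache of 8 lines, two threads that each loop over their own 6 lines, and perfect instruction-by-instruction interleaving. Real L1 caches are set associative and much larger, so treat the numbers as a cartoon.

```python
from collections import OrderedDict


def miss_rate(accesses, capacity):
    """Fraction of accesses that miss in a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used line
    return misses / len(accesses)


def loop_over(base, size, passes):
    """Address trace of a thread that sweeps `size` lines, `passes` times."""
    return [base + i for _ in range(passes) for i in range(size)]


def interleave(a, b):
    """Alternate accesses from two traces, one from each per step."""
    return [x for pair in zip(a, b) for x in pair]


if __name__ == "__main__":
    thread_a = loop_over(base=0, size=6, passes=100)
    thread_b = loop_over(base=1000, size=6, passes=100)
    print("one after the other:", miss_rate(thread_a + thread_b, capacity=8))
    print("interleaved:        ", miss_rate(interleave(thread_a, thread_b), capacity=8))
```

Each working set fits in the cache on its own, so running the threads one after the other only pays the cold misses. Interleaved, the combined working set of 12 lines no longer fits in 8, and with LRU replacement every access misses in this model.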
This isn't a totally new idea, either. The first step in this direction was the peripheral processor for the CDC 6600, in the 1960s, which appeared as ten peripheral processors to the programmer. Internally, it was ten sets of registers and one ALU, doing one instruction for each machine state in turn. Basic/4, a forgotten minicomputer manufacturer, tried a similar idea in the 1970s.
On the other hand, this apparently isn't that tough a feature to add to an already-superscalar CPU, so why not?
It is easier to think of this in terms of the OS concepts of a paging system.
When Thread1 gets a Load-Fault (think of Pagefault), it executes Thread2 instructions that it has cached. Much like an OS will execute Process2 if Process1 hits a pagefault. Basically what the hardware is there to do is to fill up the bubbles in the pipeline that are caused by one thread with instructions from another thread.
Some other complaints about this "invented at Intel" terminology can be found at The Register.
Also Toronto has a nice slide show (pdf) on the topic.
For the record I contributed a little tiny bit to this stuff when I was at Intel (I found what I think was the first multi-processor bug for SMT.)
As Nietzsche famously said, "If you stare too long into the Abyss, 1d4 Tanar'ri of random type will attack you."
Looks like GamePC's website isn't running one of these babies yet.
:(
Slashdotted already.
I don't know where else to post this, but has anyone been experiencing problems with Slashdot? More specifically, cookie problems, such as not being logged in when you visit the site? For the past three days, I've had to log into Slashdot multiple times throughout the day. I've also gotten errors on the site that force the cookie to be written to the screen. Is there any place where we can see/post bugs/bugfixes for the site? Or a thread where we can see status on the site in general? I enjoy reading the site too much to get fed up with small problems.
Th
I was trying to simplify things... I probably went a bit too far.
The register-level contentions are alleviated with out-of-order execution (more or less).
A good example of where hyperthreading helps is the front side bus. Processors tend to spend over 80% of their time executing out of cache. Thus the front side bus is sitting idle (or performing simple snoops).
If one thread is going to be memory intensive (video streaming for example... or texture manipulation), or even I/O intensive, and thus results in a lot of transactions along the FSB... it can run at the same time as a second thread that's FPU intensive (assuming the I/O intensive one isn't FPU intensive as well).
Simultaneous Multithreading (SMT) is not a new idea, although no one to my knowledge has implemented it yet. Intel just calls it "Hyperthreading"...it is essentially SMT.
And yes, this is a very good idea. A modern superscalar out-of-order processor, like the Athlon and Pentium Pro (and later), can issue and retire multiple instructions per clock cycle. However, it can *only* do this if there is enough instruction-level parallelism (ILP). Turns out, there is not enough ILP in current programs to take full advantage of the chip's processing capabilities. Issue slots and function units go unused due to dependencies in the program and cache misses that stall the processing. A typical processor can only look at about 32 instructions at a time. This is not a large enough window to execute future instructions out-of-order when such a stall occurs.
However, 2 threads of execution will likely fill all of the issue slots. They are also independent threads of execution, so dependencies don't exist between them. This means that when the pipeline stalls due to a cache miss, the other thread can keep on retiring instructions.
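The stall-hiding argument can be sketched with a small Python model. The assumptions are loud ones: a single issue slot per cycle, in-order threads, and each instruction described only by a latency in cycles (1 for a cache hit, 20 for a made-up miss penalty). No real processor is this simple, and the function names are hypothetical.

```python
def cycles_back_to_back(thread_a, thread_b):
    """Run one thread to completion, then the other. A stalled thread
    leaves the core idle for its whole latency."""
    return sum(thread_a) + sum(thread_b)


def cycles_two_contexts(thread_a, thread_b):
    """Two hardware contexts share one issue slot. Each cycle the first
    thread that is not waiting on an earlier instruction issues its next one."""
    threads = [list(thread_a), list(thread_b)]
    ready_at = [0, 0]  # cycle at which each thread can issue again
    cycle = 0
    while any(threads):
        for t in (0, 1):
            if threads[t] and ready_at[t] <= cycle:
                ready_at[t] = cycle + threads[t].pop(0)
                break  # only one issue slot per cycle in this toy model
        cycle += 1
    return max(ready_at)


if __name__ == "__main__":
    with_miss = [1, 1, 20, 1, 1]  # third instruction misses the cache
    all_hits = [1] * 20
    print("back to back:", cycles_back_to_back(with_miss, all_hits))
    print("two contexts:", cycles_two_contexts(with_miss, all_hits))
```

In this model the second thread issues during the cycles the first one spends waiting on its miss, which is the "other thread can keep on retiring instructions" effect. Two threads that never stall would gain nothing here, since they would just take turns on the single issue slot.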
To all those saying that this is dumb, I suggest you study some modern architecture (I'm not talking about your undergrad architecture course either). A paper I read recently studied the effects of SMT on a simulated Alpha processor. The results were astounding with very few changes to the processor core. I heard that the next Alpha was slated to include SMT before Intel killed it.
Just curious how the Linux kernel will work on that processor. Will there be any improvements?
Last Friday I got my first taste of the Xeon processor. I work for a company that makes heavily optimizing OpenMP compilers, and we tend to get some of the latest hardware in short order. Last Friday, I set up a machine with:
Dual Xeon 2.0GHz CPUs (3997 bogomips on RH7.2)
1GB RAM
36GB disk
This machine is extremely fast. A test suite that runs in 4 hours on a dual PIII 800MHz (512MB RAM) runs in about 45 minutes on this machine.
Anyone else think it odd that GamePC is reviewing this? Do ANY gamers run Xeons?
Jason.
But if this works, then all the CPU functional units will get higher duty cycles, and now we'll need a bigger fan!
Prestonia Xeon 2.0 GHz vs. Athlon MP 1900+
www.gamepc.com
2/19/2002
While Intel and AMD have seemingly taken a breather from their constant one-upmanship in the consumer processor market, things are still churning along for the workstation and server markets. While the consumer level chips from both companies (Pentium 4 and Athlon XP) bring in large portions of cash, the workstation and server processors are where the real money is made. These processors go for a much higher price premium on the market and are commonly used in more expensive multiprocessor setups.
The customers who buy these chips tend to buy large quantities and like to use them for multiple years without any issues. Therefore, stability and reliability are the most important factors in buying a chip here with raw performance coming in second. Sure, having an incredibly fast processor is nice, but if you're constantly having to reboot the systems due to processor or motherboard stability problems, the system becomes more of a burden than help. Thus, there is a constant struggle for IT managers to either go for the fastest workstation chip on the market, or go with the chip that's known for excellent stability. Both Intel and AMD are striving to become the processor manufacturer that gives workstation users both the best performance and best stability on the market.
Intel has the Xeon family, which has had a foothold in the low-end server / high-end workstation market for multiple years now, stemming back to the original Pentium II Xeon. The Xeon now clocks up to 2.2 GHz and comes equipped with features like 512k on-die cache, a 400 MHz front side bus, and some nifty on and off-die thermal monitoring features. Their new "Prestonia" Xeon family was just recently released to market, which is what we're looking at today.
AMD, on the other hand, has the Athlon MP. Renowned for its incredible price/performance ratio, the Athlon MP has had a tough time making a name for itself as a big time server chip, although it has done fantastically well in the workstation market. The combination of a fairly low cost processor along with similarly priced motherboard and memory has made the Athlon MP platform quite the hit. The Athlon MP was recently bumped in speed up to 1.6 GHz, which uses the AMD PR rating of 1900+.
Today at GamePC, we're looking at two of the fastest consumer-level multiprocessing chips on the planet, Intel's "Prestonia" Xeon 2.0 GHz right alongside AMD's top of the line Athlon MP 1900+. Let's boogie.
Intel "Prestonia" Xeon 2.0 GHz
The Prestonia family of processors is to the Xeon what the Northwood family is to the Pentium 4. The Prestonia Xeon shares all the benefits of the original Pentium 4 Xeon, like a 400 MHz FSB, double-pumped ALU units, and SSE-2 instruction support, but it also has a few added bonus features which make it far and away better than its predecessor.
Just as Intel recently did with their Pentium 4 family, the Prestonia Xeon is manufactured on Intel's new 0.13 micron manufacturing processes, which allow for a smaller die area, along with lower power consumption and lower heat emissions. Not only does this make the Prestonia Xeon cheaper to produce, but the lower heat amounts come in very handy when dealing with dual and quad CPU configurations in a small form factor like a 1U or 2U rackmount. For example, the original 2.0 GHz Xeon produced a maximum of 77.5W of heat, while the new Prestonia Xeon at 2.0 GHz produces only 58W.
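For a sense of scale, the arithmetic on the two heat figures quoted above can be worked through in a few lines of Python. Only the 77.5W and 58W numbers come from the article. Multiplying by the CPU count assumes every processor runs at its maximum at once, which is a simplification.

```python
def heat_reduction(old_watts, new_watts):
    """Fractional drop in maximum heat output going from old to new."""
    return (old_watts - new_watts) / old_watts


if __name__ == "__main__":
    original_xeon = 77.5   # watts, original 2.0 GHz Xeon (from the article)
    prestonia_xeon = 58.0  # watts, Prestonia 2.0 GHz Xeon (from the article)
    saved = original_xeon - prestonia_xeon
    print(f"per CPU: {saved:.1f} W less ({heat_reduction(original_xeon, prestonia_xeon):.1%})")
    for cpus in (2, 4):
        print(f"{cpus} CPUs: {cpus * original_xeon:.1f} W -> {cpus * prestonia_xeon:.1f} W")
```

That works out to roughly a quarter less heat per processor, or 39W less to remove from a dual CPU chassis under these assumptions.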
While reducing the manufacturing process, Intel also managed to stick in an extra 256 kB of L2 cache on to the processor die, giving it a total of 512 kB of full-speed on-die cache. As we've seen before with the Pentium 4 Northwood, adding another 256k of cache on to the Pentium 4's core can add up to 10-15% added application performance. Thus, the Prestonia Xeon gets that same speed increase compared to previous Xeon processors. Rumor has it that Intel will announce Xeon CPU's in the future with extra on-die cache, as was the case with the original Pentium II and III Xeons.
Both the original Xeon and Prestonia Xeon use roughly the same packaging, thus telling the CPU's apart can be difficult unless you have one right in front of you. Intel has the CPU markings on the bottom of the Xeon CPU's, as opposed to the Pentium 4 CPU's which have the markings right on the CPU's heat spreader. A quick flip of the CPU reveals the CPU's vital information. As you can see by the Xeon's S-SPEC codes, this is a 2.0 GHz Xeon with 512kB of L2 cache, running on a 400 MHz FSB, while running at 1.5V core voltage.
Even though there's a new core running underneath, Intel decided to keep the original Socket-603 form factor of the original Xeons, allowing you to upgrade to these newer chips without buying a new motherboard. As Xeon motherboards can be extremely expensive, this is a very, very good thing.
Besides the new manufacturing-level features of the processor, there has been one buzzword that has been gaining all the attention lately. Hyperthreading, the feature that can theoretically turn your 2 physical CPU's into 4 virtual CPU's. Let's investigate.
What Actually IS Hyperthreading?
Hyperthreading is actually a technology that's been around for quite a long time in microprocessing, but has never been used in a consumer-level product like the Pentium 4 Xeon. The technology itself is based on Simultaneous Multi-Threading (SMT) and was codenamed "Jackson Technology" by Intel while in development. At the last IDF, they gave this technology a name that fits in better with the Pentium 4 architecture, Hyperthreading.
Hyperthreading is simply a method of placing a second set of registers on the processor core, allowing the processor to execute two "threads" at once. Every time you run a piece of software, the software is sending threads to the CPU for it to execute and process. Until now, consumer level processors can only handle one thread at any given time. While a processor may go through thousands of threads per second, the CPU can only physically execute one at a time. In a dual CPU system, the computer can process two threads by sending one to each CPU. Hyperthreading takes the concept of executing multiple threads and brings it down to the single CPU level.
Hyperthreading allows the CPU to manage two threads at once, although this doesn't necessarily mean there are two CPU cores on the same die. Each register set can handle one thread, but each thread has to fight for processor resources like storing data in cache and sending it out through the front side bus. This means a single CPU with hyperthreading capabilities will not perform the same as two physical CPU's in an SMP configuration. While the ability to execute two threads at once was one of the main reasons why SMP was brought to market (symmetrical multi-processing, i.e dual CPU systems), the costs of going to SMP, such as SMP compatible motherboards and processors, in most cases far outweigh the benefits.
Unfortunately, since the threads have to fight for resources, there can be conflicts. If two threads want to use the same processor resources at the same time, they have to get in a queue to do so. Since most every piece of software on the market is written to only take advantage of a single CPU, suddenly throwing a single processor application on a dual/quad processor system will show literally no advantage in performance. Even as of today, only a small percentage of applications (mainly workstation/server applications) are multi-threaded to take advantage of multiple CPU's.
To get the full advantage of Hyperthreading technology, the software will have to be "optimized" for it. Whether this means re-compiling the software to support Hyperthreading through a new Intel compiler or just adding a few more lines of code, we're not certain. Intel states in their technical documents that software written to take advantage of SMP will get in upwards of 10% performance gain with a Hyperthreading capable CPU. If the software is optimized specifically for Hyperthreading, Intel has seen performance gains up to 30%.
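To put the 10% and 30% figures in perspective, here is a short Python calculation. It assumes "performance gain" means more work done per unit of time, so runtime shrinks by a factor of 1 / (1 + gain). The 60 second baseline is invented for the example.

```python
def runtime_after_gain(baseline_seconds, gain):
    """Runtime once throughput improves by `gain` (0.10 means 10% more
    work per second). Assumes gain is a throughput improvement."""
    return baseline_seconds / (1.0 + gain)


if __name__ == "__main__":
    baseline = 60.0  # hypothetical job length in seconds
    for label, gain in (("SMP-aware code, 10%", 0.10),
                        ("Hyperthreading-optimized, 30%", 0.30),
                        ("a true second CPU at best, 100%", 1.00)):
        print(f"{label}: {runtime_after_gain(baseline, gain):.1f} s")
```

A one minute job drops to about 54.5 seconds at 10% and about 46.2 seconds at 30%, a long way from the 30 seconds that ideal doubling would give.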
Nowadays, where SMP is common in workstations and servers (and in some cases, desktops), there is a lot of multi-threaded code out there. The latest major operating systems can handle multiple processors, most professional video / audio editing software can use the CPUs, and even games are just starting to take advantage of a second CPU if available. This is the market that Intel's looking to capitalize on.
Hyperthreading in Reality
The buzz around Hyperthreading is that a single Xeon system will be seen as two CPUs, while a dual Xeon system will be seen as a quad CPU system. Of course, people immediately think, "Wow, two CPUs for the price of one!" This is certainly not the case with Hyperthreading, just as dual processors do not give you double the power of a single processor.
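One standard way to see why extra processors do not multiply performance is Amdahl's law, which the article does not mention by name. It is brought in here only as an illustration. The parallel fractions below are made-up inputs, not measurements from this review, and the model treats every logical CPU as a full CPU, which is generous to Hyperthreading.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup when only `parallel_fraction` of the work
    can be spread across `processors`, and the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    for fraction in (0.25, 0.50, 0.90):  # hypothetical parallel fractions
        two = amdahl_speedup(fraction, 2)
        four = amdahl_speedup(fraction, 4)
        print(f"{fraction:.0%} parallel: 2 CPUs -> {two:.2f}x, 4 CPUs -> {four:.2f}x")
```

Only a perfectly parallel program reaches 2x on two processors. Anything with a serial portion falls short, before any sharing of caches or buses is even considered.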
Since Hyperthreading is implemented on the hardware level, the motherboard sees a single hyperthread-compatible CPU as two physical CPUs. Thus, software that is written for multiple CPUs will be tricked into thinking there is a second CPU in the system, and will run the appropriate multithreaded code if available. Since Windows XP and 2000 are coded to take advantage of multiple CPU's, they too see a hyperthreaded CPU as two.
In our case, since we ran with dual Xeon processors (each with hyperthreading capabilities), the OS and software see this as four physical CPUs, even though there are only two physical CPUs running. As you can see by the device and task managers in Windows XP, the OS sees our system with four physical CPU's. Even though Windows 2000 and Windows XP only officially support two CPU's, both operating systems were able to run properly with the Hyperthreaded CPU's. This means you don't have to upgrade to a 4-processor OS like Windows 2000 server to take advantage of this technology.
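Software sees the same thing the task manager does. As a small illustration (Python, which is not what the review used), the standard library call `os.cpu_count()` reports logical CPUs, so on a machine like the one described it would be expected to return 4 rather than 2. The wrapper name below is made up for the example.

```python
import os


def logical_cpu_count():
    """Number of logical CPUs the OS exposes. os.cpu_count() can return
    None when the count cannot be determined, so fall back to 1."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    # On a hyperthreaded system this counts logical processors,
    # not physical packages.
    print("logical CPUs visible to software:", logical_cpu_count())
```

A program that sizes its worker pool from this number will start one worker per logical CPU, whether or not there is a physical processor behind each one.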
While this looks great for showing off to co-workers or friends, you will absolutely NOT get the performance of four CPUs running in your system (I can't stress this enough). As you'll see in our benchmarks later, even if software is written to take advantage of SMP, you rarely ever see performance gains with Hyperthreading enabled. In fact, in many applications, you see a performance drop with Hyperthreading enabled, as there is a great deal of overhead when splitting data up over four CPU's to process. Perhaps this is why Intel is recommending motherboard makers leave Hyperthreading disabled in the BIOS.
It's quite possible that Intel implemented Hyperthreading to take advantage of the Xeon architecture's longer pipeline, an often criticized design element of the Pentium 4 and Xeon families. With Hyperthreading, they can start a second process after the first one is farther down the pipe. From a theoretical standpoint, the code would have to either be highly optimized for the Prestonia or limit the use of branch prediction, since there are now two sets of independent data in the processor. If you look at Hyperthreading like this, it would appear to be the next generation of the P4's out-of-order speculative execution engine.
From what I now understand about Hyperthreading, it's my belief that Intel is planning to use Hyperthreading in all of its future Pentium 4 products down the road. The Xeon is simply the first guinea pig to actually have the logic enabled on the die. As Intel already has the Hyperthreading logic in the current Pentium 4 hardware, but has not enabled it, you've got a sure sign that Intel will simply flip the switch to activate the logic when Hyperthreading applications are actually available. If Intel convinces developers that Hyperthreading is worth their time to optimize for, this could be an incredible feature 1-2 years down the road. As for now, it's fairly useless, but certainly interesting in the sometimes bland world of computer processing.
AMD Athlon MP 1.6 GHz (1900+)
The Athlon MP 1.6 GHz is the latest and greatest from AMD's server/workstation family of CPUs, which have gained an extremely large amount of credibility lately due to their incredible price / performance ratio compared to Intel's Pentium 4 and Pentium 4 Xeon families. While slightly lagging behind AMD's own Athlon XP 1.67 GHz (2000+) in raw clock speed, the Athlon MP 1.6 GHz is quite a bit more expensive than the Athlon XP 1.67 GHz, despite the fact that both can run SMP quite well.
The Athlon MP is based on the "Palomino" Athlon architecture, which is based on the 0.18 micron manufacturing process. While the Palomino chips create quite a bit less heat than the "Thunderbird" variant of the Athlon, the Palominos still create quite a lot of heat, which can be difficult for dense rackmount situations. The chip itself is based on the Socket-A form factor, which means it should be compatible with most single processor Athlon boards, as well as all the dual Socket-A boards on the market now. As you'll no doubt notice, the new Athlon XP/MP processors are coming with green packaging, although they still use the same organic packaging as previous Athlon MP/XP CPU's.
The Palomino Athlon core comes equipped with 128 kB of L1 cache, along with 256 kB of L2 cache. While we've heard rumors that AMD may up the cache amounts on their upcoming 0.13 micron "Thoroughbred" processors, we haven't received any indication that this is anything more than a rumor.
Getting a closer look at the Athlon MP 1900+, you can see the Athlon's famous bridges are not "cut", like Athlon XP chips hitting the market. This means with a simple pencil and a motherboard that supports clock adjustments, you can overclock these processors to much higher clock speeds than intended. Of course, workstation and server users would most likely never do this, as overclocking is inherently risky, but we thought it was worth mentioning.
As you can see from reading the core, our Athlon MP processors are of a fairly recent "AGNGA" core stepping. The first line of text says "AMP1900", which denotes our chip as an Athlon MP 1900+. AMD runs the exact same processor core on both the Athlon XP and MP processors, albeit the MP models go through an extra round of multiprocessor "validation". Performance wise, these two cores are exactly the same.
The biggest threat for AMD and the Athlon MP is the fact that the platform has been plagued by a lack of absolute stability. While the Tyan Thunder K7 and Tiger MP boards still wrangle with edge-case stability scenarios, the AMD 760MPX motherboards have been plagued with chipset problems and many board revisions. In fact, the release of the 760MPX has undone much of AMD's work in making the Athlon MP synonymous with stability. We absolutely love the Athlon processors, but the platforms still aren't up to the level we were hoping for by now. Still, as more platforms are getting released, the situation IS getting better.
Just the facts, ma'am.
Intel Prestonia Xeon 2.0 GHz
AMD Athlon MP 1900+
Feature                     Prestonia Xeon 2.0 GHz    Athlon MP 1900+
Clock Speed                 2.0 GHz (2000 MHz)        1.6 GHz (1600 MHz)
L1 Cache                    8 kB                      128 kB
L2 Cache                    512 kB                    256 kB
L2 Cache Speed              Clock Speed (2.0 GHz)     Clock Speed (1.6 GHz)
L2 Cache Associativity      8-Way                     16-Way
Form Factor                 Socket-603                Socket-A
Front Side Bus Speed        400 MHz                   266 MHz
Manufacturing Technology    0.13 Micron               0.18 Micron
MMX Instruction Support     Yes                       Yes
SSE Instruction Support     Yes                       Yes
SSE-2 Instruction Support   Yes                       No
3DNow! Instruction Support  Partial                   Yes
The Platforms
Supermicro P4DC6+ i860
Asus A7M266-D AMD 760MPX
Feature                 Supermicro P4DC6+                   Asus A7M266-D
Chipset                 Intel 860                           AMD 760MPX
CPU Support             Up to 2 x Xeon 2.2 GHz+ CPUs        Up to 2 x Athlon MP 1.6 GHz+ CPUs
Memory Type             PC-800 RDRAM                        PC-2100 DDR SDRAM
Memory Capacity         2 GB Max (4 RIMMs)                  3.5 GB Max (4 DIMMs)
Memory Type Support     Standard / ECC                      Standard / ECC
AGP Expansion           AGP Pro 50                          AGP Pro 50
PCI Expansion, 64-bit   2 x 64-bit (66 MHz) Slots           2 x 64-bit (66 MHz) Slots
PCI Expansion, 32-bit   4 x 32-bit (33 MHz) Slots           3 x 32-bit (33 MHz) Slots
Onboard SCSI            Adaptec AIC-7899W Ultra160 SCSI     N/A
Onboard Ethernet        Intel 82559 10/100 Port             N/A
Onboard Audio           AC97 Audio                          C-Media 6 Channel Audio
Onboard Video           N/A                                 N/A
Pentium 4 Xeon "Prestonia" Testbed System Configuration
Processors 2 x Intel Pentium 4 Xeon 2.0 GHz "Prestonia" (8k L1, 512k L2)
Cooling Intel Socket-603 Retail Coolers
Memory 512MB Samsung PC-800 RDRAM (4 x 128M)
Motherboard Supermicro P4DC6+ (Intel 860 Chipset)
Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
Software Windows XP w/ DirectX 8.1, Intel 3.2 Chipset Drivers
Pentium 4 "Northwood" Testbed System Configuration
Processors Intel Pentium 4 2.0 GHz "Northwood" (8k L1, 512k L2)
Cooling Intel Socket-478 Retail Cooler
Memory 512MB Crucial PC-800 RDRAM (4 x 128M)
Motherboard Asus P4T-E (Intel 850 Chipset)
Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
Software Windows XP w/ DirectX 8.1, Intel 3.2 Chipset Drivers
AMD Athlon MP Testbed System Configuration
Processors 2 x AMD Athlon MP 1.6 GHz (1900+) "Palomino" (128k L1, 256k L2)
Cooling AMD Socket-A Retail Coolers
Memory 512MB Crucial PC-2100 DDR SDRAM (2 x 256M)
Motherboard Asus A7M266-D (AMD 760-MPX Chipset)
Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
Software Windows XP w/ DirectX 8.1, AMD 1.30 Driver Pack
AMD Athlon XP Testbed System Configuration
Processors AMD Athlon XP 1.67 GHz (2000+) "Palomino" (128k L1, 256k L2)
Cooling AMD Socket-A Retail Cooler
Memory 512MB Samsung PC-2100 DDR SDRAM (2 x 256M)
Motherboard Asus A7V266-E (VIA KT-266A Chipset)
Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
Software Windows XP w/ DirectX 8.1, VIA 4-In-1 4.37 Service Pack
Lab Notes
* All tests run with VSync (Vertical Sync) Disabled.
* Nvidia Detonator XP (23.11) Driver used in all testing.
* All RDRAM memory run with "Nap" mode disabled.
* All DDR memory run at CAS 2.5 latency.
Benchmarking Software
* Adobe Photoshop 6.01
* LAME MP3 Encoder 3.91
* Kinetix 3D Studio MAX
* Red Hat Linux 7.2
* SiSoft Sandra 2002
* Windows Media Encoder 8.0
SiSoft Sandra 2002 is a synthetic Windows benchmark.
The benchmarks can stress CPU, Memory, or Processor Instruction abilities.
Higher Sandra scores mean better overall performance.
CPU Benchmark - Hyper-Threading Support (SMT) Enabled
(Higher Scores are Better)
CPU Benchmark - Hyper-Threading Support (SMT) Disabled
(Higher Scores are Better)
Memory Benchmark
(Higher Scores are Better)
SiSoft's Sandra, while being a synthetic Windows benchmark, is one of the few pieces of software on the market with some level of Hyperthreading support. This is through Sandra's "SMT" test, which to be honest, gave us extremely sporadic results at first. Once we figured out what exactly was happening with the test, we were able to finally lay down some solid numbers.
First off, it's quite easy to see that the dual Athlon MP setup simply rules the roost when it comes to raw CPU performance. Even with the Athlon MP chips at 1.6 GHz, it's easily able to outpace the dual Xeon 2.0 GHz processors, with or without Hyperthreading enabled. Even the highest performing Xeon setup still trails the dual Athlon MP 1900+ by roughly 30%.
When Hyperthreading was enabled, we can certainly see some performance gains being had by the Xeon setups. One CPU with Hyperthreading gained 18% in this benchmark, while two CPU's with Hyperthreading gained 23%. Of course, this is simply a synthetic test, and to achieve any real world performance gains like this, the software would have to be specifically optimized for Hyperthreading.
Looking at the results, we're not sure what effect the SMT test has on our scores. As the first graph shows, even with Hyperthreading (hardware) disabled on the dual 2.0 GHz Xeons, the system still scored higher on the Hyperthreading (software) test than with Hyperthreading (software) disabled, by a margin of nearly 2000.
In terms of memory performance, Xeon systems still maintain quite a large margin over the current Athlon MP systems. Thanks to the Xeon / i860 dual channel RDRAM memory interface, you've got quite a bit more available bandwidth compared to the Athlon MP / 760MPX single channel DDR interface.
Adobe's Photoshop 6.0 is the world's most popular image creation/editing software.
We run a series of filters on an image, while measuring the time taken to perform them.
The times for each filter are added up. Lower times mean faster performance.
Adobe Photoshop 6.01 Filter Benchmark
(Lower times are Better)
Adobe's Photoshop thrives on fast FPU units along with lots of memory bandwidth and capacity. Even though Photoshop is multi-threaded, the software only really takes advantage of multiple processors on a few select filters. Thus, running a second processor doesn't necessarily help Photoshop that much, at least in this case.
In our test, we see the simple single Athlon XP 2000+ processor beating out both the dual Athlon XP 1900+ and dual Xeon systems. While the other platforms were merely seconds away, it's clear that the Athlon-based systems take the cake for best overall Photoshop performance. We see the addition of a second Athlon MP processor took nearly 8 seconds off the benchmark time. Not bad, but we were hoping for more.
Hyperthreading shows itself here to become more of a nuisance than actually helping performance. With Hyperthreading enabled, the dual Xeon 2.0 GHz system actually slows down by 5 seconds, while a single Xeon 2.0 GHz with Hyperthreading speeds up by 2 seconds. As you'll likely guess, Photoshop is not optimized for Hyperthreading, so any performance gains seem to be purely coincidental.
Keep in mind, we ran this test with the Adobe 6.01 patch installed, along with Adobe's specially released SSE-2 filter package, and the Xeons still couldn't fully stand up to AMD's new Athlon processors.
3D Studio is one of the most popular 3D editing suites on the market today.
We render a 50-frame scene with over 40,000 faces and 20,000 vertices.
Lower render times mean faster processing performance.
3D Studio MAX "Tank" Render Test
(Lower Times are Better)
3D Studio MAX, and any kind of 3D rendering software, relies almost 100% on the CPU for final scene rendering. Thus, multiprocessor systems are almost required for any kind of professional level 3D modeling software. 3DS Max is indeed able to fully take advantage of multiple processors.
In our test render, we again see AMD take the cake, as the dual Athlon MP 1900+ system rendered our scene the quickest. While the Dual Xeon 2.0 GHz system was just about one minute behind, the Athlon systems simply rock for these kinds of applications. Even our single Athlon XP 2000+ system managed to render a few seconds faster than Intel's dual Xeon 2.0 GHz box.
As for Hyperthreading, again we see mixed results. A single processor with Hyperthreading actually helps out, cutting 15 seconds off our rendering time. Two processors with Hyperthreading hurt a lot, as it added an extra 1:56 to our final render time. Ouch.
Windows Media Encoder is a free Windows video encoding suite.
We take a 50MB MPEG file, and encode it to Windows Media 8 (.wmv) format.
We test at 320x240 Resolution using the WM8 for Cable/DSL encoding method.
50MB MPEG Video to Windows Media Video Encode
(Lower times are Better)
While the Xeon was crushed by the Athlon MP in the previous two tests, the tables turn for video encoding. Encoding our MPEG movie was incredibly fast with the Dual Xeons, the fastest score we've seen for this test to date. Windows Media Encoder 8 is extremely efficient with multiple processors, giving a 30-40% boost in encoding times for both the Xeon and Athlon MP platforms.
Even as the Xeon is the clear winner in these tests, Hyperthreading again disappoints. A single Xeon with Hyperthreading tacks on another 20 seconds to our encoding time, while Dual Xeons adds on another 29 seconds. Disappointing, to say the least.
MP3 Encoding is extremely CPU intensive, and tests the CPU's raw FPU performance.
We use LAME 3.89, which has optimizations for MMX, 3DNow, and SSE
A 200MB
200MB Wav to MP3 File Encode
(Lower Times are Better)
MP3 encoding through LAME is entirely CPU based, but since the program isn't multithreaded, we don't see any performance gains when adding a second processor. Thus, winning this benchmark is simply a case of having the best FPU performance in a single processor situation, which the Athlon clearly does.
The Pentium 4 / Xeon platforms are 9-10 seconds slower, no matter what motherboard or processor combination is used. Both the Athlon MP and Xeon systems give very respectable encoding performance, but the Athlon MP/XP are clearly the winners here.
Red Hat is the most popular Linux distribution in the world currently
We test by recompiling the 2.4.9 kernel using the "make bzImage -j#" command.
Depending on the # of threads, compiling time can be different, especially with SMP.
Lower compile times mean better processing performance.
Red Hat 2.4.9 Kernel Compile - 1 Thread
(Lower times are Better)
Red Hat 2.4.9 Kernel Compile - 2 Threads
(Lower times are Better)
Red Hat 2.4.9 Kernel Compile - 4 Threads
(Lower times are Better)
Compiling a Linux kernel is extremely stressful on the CPU, and as we tested with the SMP-compatible 2.4.9 Red Hat kernel, we were able to see some very nice performance gains with our multiprocessor systems. As the 2.4.9 kernel also has support for "Jackson Technology" (aka, SMT / Hyperthreading), we were hoping to see what Hyperthreading was capable of doing in a Linux environment.
When the kernel is compiled with a single thread, the systems don't show any real performance gains with a second processor installed. Compiling with two or more threads is where you really start to see the performance gains of SMP with Linux.
With two threads running, compile times are nearly cut in half with two CPU's installed. The Dual 2.0 GHz Xeons manage to compile the kernel quickest at 1:57, while the Athlon MP 1900+ setup is nipping at its heels with a 2:05 compile time. Compiling an entire Linux kernel in under two minutes is simply an incredible showing of CPU power, any way you look at it.
For curiosity's sake, we decided to run a compile with four simultaneous threads. As dual Hyperthreading-enabled Xeons can physically take four threads at once, we figured it would be a good test. Unfortunately, there were only 1-2 second differences in compile times between 2 and 4 threads. Compiling the kernel with 2, 3, 4, 5 and more threads gave roughly the same compile times.
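The thread-count sweep described above could be scripted roughly as follows. This is only a sketch, not the script used for the review: `build_command` and `time_build` are hypothetical helper names, and the code assumes it is pointed at an already-configured kernel source tree.

```python
import subprocess
import time


def build_command(jobs):
    """Return the make invocation for a given number of parallel jobs."""
    return ["make", "bzImage", f"-j{jobs}"]


def time_build(jobs, source_dir="."):
    """Run one build and return the wall-clock time in seconds.

    Assumes source_dir is a configured kernel tree. A fair comparison
    would also need a clean tree before every run, which is not done here.
    """
    start = time.perf_counter()
    subprocess.run(build_command(jobs), cwd=source_dir, check=True)
    return time.perf_counter() - start
```

Calling `time_build(1)`, `time_build(2)` and `time_build(4)` in turn, with a clean tree each time, would give the three numbers compared in the graphs.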
The Final Word
Both the Prestonia Xeon and Athlon MP are incredible processors, and both engineering teams deserve a round of kudos for producing some incredibly fast SMP-capable CPU's. Each CPU has a specific area where you'll see one dominate over the other, although the majority of the tests were fairly close between the two CPU's.
In my opinion, the Prestonia Xeon is the better CPU of the two for mission critical / server applications. The Intel 860 platform seems to be incredibly stable, considering its relatively short time on the market. Not one instance comes to mind where we ran into compatibility issues with our Dual Xeon systems, something we can't say for the Athlon MP systems we set up. Unfortunately, you pay the price for the Intel name, as Xeon systems are extremely expensive. The CPU's and motherboards are both extremely expensive, which makes the Xeon hard to recommend for the workstation market.
The workstation market is much better suited by the Athlon MP processor, as its price / performance ratio is unbeatable. For most workstation applications, the Athlon MP will even be the better performer, despite its lower price tag. We would love to see AMD put a few more server-specific features on their MP processors to justify their heightened price tags over the Athlon XP, but even as they are now, the MP's are a great deal for the amount of processing power you get in that tiny little core.
As for the Xeon's Hyperthreading technologies, it's hard not to be disappointed with the scores which we got throughout our testing. Hyperthreading sounds like an incredibly useful processor feature in theory, but in practice, it's useless without compatible software on the market. Only time will tell if developers want to take on the Hyperthreading challenge, and the few developers we've talked to have not been that incredibly impressed with the technology thus far. If nothing else, Hyperthreading will certainly be an interesting technology to watch out for over the next few years.
This time next year, it's quite possible that we may be dealing with McKinley and Clawhammer as the workstation processors of choice, if Intel and AMD have their way. While it's anyone's guess if 64-bit processing is ready to come down to the consumer level, this article certainly proves that current 32-bit processors have more than enough power to handle today's applications.
I believe that this was done in the IBM AS/400 using a special version of the PowerPC chip. There was a talk on this at the Ottawa Linux Symposium last summer. According to the IBM people, it mostly worked great, but there were a few issues with spin locks--the CPU saw that one thread was busy (in a spin lock), so it never switched to the other one (that was holding the lock). The Intel implementation may be slightly different, but this is something to look at.
When your hardware isn't exactly what the software was written for, you tend to have weird bugs like that. I would not be surprised if Windows, Linux, FreeBSD, and other OSes need minor patches to work well with this new hyperthreading from Intel.
I'd like to get two 2gig processors, and have it run symmetrically as 4gigs.
As an owner of an SMP system, I can say with confidence that even having two /real/ processors, which is better than one hyperthreading processor, isn't of any great benefit to Windows users anyway (see comments above about HT on Win2K) other than for servers (shudder) and for running several very CPU intensive apps at once, which very few people do.
In *nix, however, I have improved my buildworld times by thirty percent. *That's* useful.
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
It's only a virtual processor. Send them a virtual license fee!
I did. It works in C++ too. :-)
---Windows 2000/XP stable? safe? secure? 5 lines of simple C code say otherwise!
Posted 1/14 on anandtech:
http://www.anandtech.com/cpu/showdoc.html?i=1576
One of the reasons that I became a lawyer was to avoid ever having to hire one. -SPYvSPY
The latest BIOS update for my dual PIV Xeon (Dell Precision 530) says that it added support for SMT/JT... I wonder if they had already tested these CPUs on my system. I WANT!!!! Drool...
A minor issue, but they report the Athlon as having 128k of L1 cache but the Xeon as having only 8k... Now, maybe I'm wrong but I'm assuming they have the same cache config as past athlons which was 64k L1D and 64k L1I... The P4 has only 8k L1D, but also has instruction and trace cache which is not being counted...
I guess GamePC's servers got overrun or something, because it seems that none of their pages (not even a month-old review of a Coolermaster case) will load up. You guys at /. should really look into making a list of web sites exempted from ever being mentioned on this site, because it's a little too much at times.
P.S. Could you please have some kind of post about a page on Microsoft.com? I'd lllooovvveee for that site to go down... :-p
"I get my jollies building computers" Steve Jobs, 1983
The best example of how to split a task into threads (that I like to use) is rendering a 3D image to screen. If you want to split that task so that two threads (and thus two processors) can work on it, you just make one thread handle 'even' scan lines and one thread handle 'odd' lines. Keeping the caches coherent between the two CPU's can be difficult - they're both executing the same code, and might also be twiddling around some piece of memory that they share.
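To make the even/odd split concrete, here is a rough sketch. Everything in it is made up for illustration: `shade` is a stand-in for real per-pixel work, and the tiny frame size is arbitrary. In CPython the global interpreter lock keeps the two threads from running Python bytecode at the same moment, so this shows the partitioning only, not a speedup.

```python
import threading

WIDTH, HEIGHT = 8, 6


def shade(x, y):
    """Stand-in for real per-pixel rendering work (hypothetical)."""
    return (x * 31 + y * 17) % 256


def render_rows(framebuffer, first_row, step):
    """Fill every `step`-th scan line, starting at `first_row`."""
    for y in range(first_row, HEIGHT, step):
        framebuffer[y] = [shade(x, y) for x in range(WIDTH)]


def render_threaded():
    """Render with two threads: one takes even lines, one takes odd lines."""
    framebuffer = [None] * HEIGHT
    workers = [
        threading.Thread(target=render_rows, args=(framebuffer, 0, 2)),  # even
        threading.Thread(target=render_rows, args=(framebuffer, 1, 2)),  # odd
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return framebuffer


def render_sequential():
    """Reference version: one thread walks every scan line in order."""
    framebuffer = [None] * HEIGHT
    render_rows(framebuffer, 0, 1)
    return framebuffer
```

The two threads never write the same row, so no locking is needed; the shared data is only the code and whatever scene data `shade` would read.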
My point is, with this hyperthreading business, that there's only one cache - so no more cache coherency bothers. I might be concerned that the arithmetic units or whatever else that are on the chip might be in contention for use - but they can just add more of 'em in later steppings of the CPU.
The problem for us Unix-lovin' folk is that Unix-esque OS'es don't often take threading very seriously. OpenBSD, for example, doesn't even have a kernel-threading implementation (correct me if I am wrong!) The 'Unix Way' is to just fork a process and run two process images. That's fifty billion times easier to debug than two threads that step on each other's data (see deadlock). But the forking method - even with nifty things like copy-on-write process images and such - doesn't seem to use as little memory, or perform as quickly, or process-switch as fast.
When I speak to developers who know their stuff (more than I do) they say - on NT, make a whole bunch of threads and make them talk to each other with semaphores and stuff - on Unix, fork and write to a pipe. Nothing fundamentally wrong with that division, but advances such as this Hyperthreading thing won't work as well on Linux, I don't think.
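For anyone who hasn't seen the "fork and write to a pipe" style, a minimal sketch looks something like this. It is Unix-only (`os.fork` is not available on Windows), and `fork_and_pipe` is just a name picked for the example.

```python
import os


def fork_and_pipe(message):
    """Fork a child that writes `message` (bytes) to a pipe; the parent reads it."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: runs in its own process image and talks back via the pipe.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
        finally:
            os._exit(0)  # leave immediately, never fall through to parent code
    # Parent: close the write end so read() sees EOF once the child is done.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return b"".join(chunks)
```

Nothing is shared between the two processes except the pipe, which is why this style is easier to debug than threads poking at common data.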
The moderators have failed again. The parent of this post was on topic and perhaps even funny. Certainly not worthy of a -1 rating though!
You know kids, just because an anon posts it doesn't mean it's offtopic...
(This note, however, most certainly is. So mark this one off topic and the parent on)
Another system that used something like this was Tera: http://www.sdsc.edu/SDSCwire/v3.18/tera.html However, what they did was have 128 contexts per CPU and it round-robin'd through them all. You could also "daisychain" multiple CPUs together in a system. It was interesting but I don't know what ever happened to the machines they were building.
Why do we want to trick the computer? Isn't windows confused enough without believing it's a multi processor comp too?
T Money
World Domination with a plastic spoon since 1984
Registers in modern processors get renamed. Intel gets away with having such few logical registers in their ISA (instruction-set architecture) because they have dozens of physical registers.
All hyperthreading will do is just maintain a different program counter and re-order buffer for each thread. There are probably other minor details as well, but don't get caught up in registers from a programmer's point of view. There is magic under the hood that the programmer will never ever be aware of. At some point in your program, there may be 8 or so "EAX" registers. Later on, this same register may be renamed to a "ESP" register.
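A toy model of the renaming idea, for the curious. This is a deliberately simplified sketch and not how any particular Intel core does it: `RenameTable` is a made-up class, and the bookkeeping that returns old physical registers to the free list is left out.

```python
class RenameTable:
    """Toy register renamer: each write to a logical register gets a fresh
    physical register, so several versions of "EAX" can exist at once."""

    def __init__(self, logical_names, physical_count):
        self.free = list(range(physical_count))
        # Give every logical register an initial physical home.
        self.mapping = {name: self.free.pop(0) for name in logical_names}

    def read(self, name):
        """Return the physical register currently backing a logical one."""
        return self.mapping[name]

    def write(self, name):
        """Allocate a new physical register for a write and return it.

        A real core would recycle the old physical register once the
        instruction retires; that step is omitted in this sketch.
        """
        new_phys = self.free.pop(0)
        self.mapping[name] = new_phys
        return new_phys
```

Two back-to-back writes to "EAX" land in two different physical registers, which is what lets independent instructions that reuse the same logical name run out of order.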
Does it work only from Administrator account or from any user account?
Does it have QUANTISPEED ?!
;-)
QuantiSpeed makes my CDs burn faster.
Has anybody cracked one of the new Xeons apart yet? How do we know that Intel didn't really slip two cores onto the same processor card... then, one processor would appear as one, two as four!! They sell for thousands more than they cost to make, anyways, right? Who's going to know?
:)
Hmmm?!?
SlashSigTheorem: Humorous, Political, Critical, Constructive- If you have a
Whoops! Well, I'll stick with the '..later steppings' line - when they start slapping a couple more ALU's and other processing units on the CPU so that more 'stuff' is available for the second thread, then it could be good!
Until then, I guess it's kinda like the Itanium - good as an experiment to see what things can be like in the near future, but not as useful for day-to-day operations.
Oh well - wish I could just edit my comment above!
Not only does it work from any user account, supposedly it even works from Guest, if you have that account enabled.
What code did you use in C++? This doesn't work:
#include <iostream>
int main()
{
    for (;;)
    {
        // Write a tab followed by two backspaces, forever.
        std::cout << "\t\b\b";
    }
    return 0;
}
The stream should have gotten flushed at some point, and there is no significant impact on the process or system.
VC6 + Win2K SP2
Win9x and XP Home don't allow multi processors, so they would not gain any benefit from this, right?
All those PHB's will be drooling over this so they can have bragging rights.
Just so Word can idle more......
- - - - - - - - - - -
I am a programmer. I am paid to produce syntax not grammar. Deal with it.
From the article mentioned above:
As for the Xeon's Hyperthreading technologies, it's hard not to be disappointed with the scores which we got throughout our testing. Hyperthreading sounds like an incredibly useful processor feature in theory, but in practice, it's useless without compatible software on the market. Only time will tell if developers want to take on the Hyperthreading challenge, and the few developers we've talked to have not been that incredibly impressed with the technology thus far. If nothing else, Hyperthreading will certainly be an interesting technology to watch out for over the next few years.
You're right, that code (even before Slashcode killed it for you) won't work. The standard C++ library eats the backspaces, and they're the key to making it crash. Someone (usually a library) has to be calling (eventually) WriteConsole with tabs and backspaces in a loop. Apparently, the C++ standard library isn't doing that, so chalk one up to C++, but only because it's not outputting what you told it to. Now, is that a feature or a bug? :-)
---Windows 2000/XP stable? safe? secure? 5 lines of simple C code say otherwise!
If not allowing you to shoot yourself in the foot is a bug, then what language besides C and asm isn't buggy?
Is there some way that Linux can be limited to a certain number of CPU's? It sure would be wonderful if there were separate versions of Linux for each possible number of CPU's. If you had a kernel that was only written for two CPU's, it should properly not work at all on 4 CPU's, preferably with a message saying "send more money to your vendor". And while they are at it, is there some way that XFree can limit the number of xterms to less than 4, so that if a user wanted to open 6 xterms they would have to download the XFree that ran with 6 xterms? Think of the marketing possibilities that can be used to improve Linux!
If tits were wings it'd be flying around.
I read constantly that bogomips are not a measure of processor speed, absolute or relative, do not translate into performance, and are only used to assign cache timing. Is there something beyond this that I'm missing? Why include a bogomips rating for this dual Xeon behemoth?
This is where I get my recommended daily allowance of "Foot in Mouth."
If you compare the intel parts to other alternatives, you're already getting half the processor, and for twice the price as well!
Win2K Pro allows 1-2 processors. If you need 4 processors, then you need Windows 2000 Server. 8 requires Advanced Server. I think that Datacenter can allow up to 16 processors in theory. Enabling HyperThreading seems to mean "lying to Windows" as a cheap hack to enable multithreading on a non-SMP machine. I doubt this induces processor-hosted schizophrenia (rather, multiple personality disorder).
In the article they specifically stated that a dual processor machine will run on Windows 2000 Professional without problems. The hardware can tell Windows that it has double processors, but this does not give you the effect of 4 virtual CPUs with 2 CPUs. This is only a cheap hack to compensate for prior assumptions in computing - namely, that a single CPU will have reduced performance with multithreaded code. This is because a context switch for each thread can waste time, making multithreaded code seem expensive. Even on Intel hardware, the BeOS had fairly cheap context switches, and the exceptional performance on even single CPU systems anecdotally disproved some generally accepted notions on the impact of multithreaded code.
In this article, benchmarks seemed to indicate that while single CPUs with HyperThreading enabled slightly outperformed those with disabled HT, the dual-proc systems tended to lose with HT enabled. This could jibe with the above when you consider that a single CPU with Hyperthreading really isn't 2 virtual CPUs. Rather, by masquerading as 2 CPUs to Windows, Windows will send code that is multithreaded, which can be more efficient with an effective scheduler. On the other hand, with 2 CPUs, there is no advantage in telling Windows you have 4 CPUs, because either way you are getting multithreaded code. Thus, the negative consequences based on the hack (lie) are no longer compensated by threaded code outperforming nonthreaded code.
Note that all of this assumes that there is No Real Change in the hardware functionality when enabling Hyperthreading, other than the CPU requesting multithreaded code. It would have been interesting to see a screenshot of the CPU load monitor with a non-idle task load to see how the load delta is presented on the 2 "virtual" CPUs belonging to each real CPU.
Perhaps this will allow Windows to approach the responsiveness of BeOS even on single CPU systems, although dual-procs will always be better for abstracting interface performance from system load.
"First off, it's quite easy to see that the dual Athlon MP setup simply rules the roost when it comes to raw CPU performance. Even with the Athlon MP chips at 1.6 GHz, it's easily able to outpace the dual Xeon 2.0 GHz processors, with or without Hyperthreading enabled. Even the highest performing Xeon setup still trails the dual Athlon MP 1900+ by roughly 30%."
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
I know of a couple of people who mentioned that one of the features of the G4 was that the SMP was handled in such a way that it was possible to put 2 G4's on the same CPU die. As there are no Mac's with this nor any 3rd party vendors that have this available as an upgrade I assumed it hadn't gone ahead.
Looks like IBM have used it in their high end servers. Now if only I could get OSX running on them...
Go out and get sailing!
Perl
Why do the Pentiums have only 8 kB L1 cache as compared to the Athlon's 128 kB? I suppose this is somewhat o/t, but I'm curious as to why there's such a big difference.
My other sig is also a
You may be in luck - I heard that eventually Power and PowerPC will be consolidated to the Power architecture.
-Kevin
Despite VmWare, The Intel architecture isn't really virtualized. I am willing to believe that IBM can actually put multiple CPU cores on one die, for many SMP benefits without the downside of requiring multiple CPU dies and infrastructure.
On the other hand, for Intel this seems to be unrelated, and a cheap hack, where the CPU presents an SMP interface to the OS only to request multithreaded code. Requesting multithreaded code should be possible without sending misinformation to the basic Operating System services. This seems to be a symptom of the disconnect inherent in having different companies producing the Architecture and the Operating System. It seems unlikely that you would see such a crude hack for fundamental systems infrastructure coming from an integrated vendor such as SGI, Sun, or even Apple. The Mips and PPC/Power don't lie about such things, because the support chipsets are actually supported by the Operating Systems.
Really, this seems reminiscent of C/H/S mangling in storage interfaces and long file name mangling in the vFat filesystem. While it has taken many generations (in computer years) to abstract the state of the art from such persistent kludges, they still haunt the consumer computing space to this day. Now do we really need to go even further backwards and fuck with how the CPU interfaces with the core system?
Of course, perhaps I am not giving the Wintel juggernaut the benefit of the doubt... after all, they will have to bear the consequences of such a choice, so maybe they aren't just adding saddlebags to the most basic part of the computing system. Perhaps they have the foresight to resolve this in the next few years before making this burdensome interface an accepted standard. It would be shameful to add more standardized interface limitations... such as the standardized "Gates" hopping of the 640K limit on commodity computing hardware, or the IDE limit of 540MB, no 2GB, no 8GB, no 36GB, no wait 140Gb, Ad Nauseam. All of these arbitrary limitations were the results of cutting corners and not sending fully honest information to basic system interfaces. Can't the OS send efficient multithreaded code for processing without living a lie?
-castlan
Dude are you completely talking out of your arse?
The CPU can't see the difference between a thread and a process. This CPU doesn't include a kernel.
Your argument is 100% invalid.
What kind of setup are they testing? Anyone who is willing to spend that kind of money for those processors had better put no less than 2 gigs of ram on those boards!
Who puts 512 megs on a board like that ? I mean really!
Nope. If I want to walk off the end of an array, or reference an empty hash value, Perl will automatically create it for me.
No, Perl will give you a lot of bullets, but it won't let you do everything.
I am not sure about this, but I would suspect if one were to compile the kernel with the GCC multithread option "/j n" (if I remember correctly), (n being the number of threads the executable being split up into), then the kernel would be able to take advantage of a SMT processor.
There should be a moderation category "Dumbest Comment EVER"
Here is how it worked. On the CDC 6600 when the CPU wanted to do I/O it would store a request packet into a magic memory address. The next virtual PPU would scoop it up and shovel the bits into the device. There was no DMA. The PPUs polled the I/O port to push each word of data. They also did most of the 'system call' functions. For example for a context switch the PPU would order the CPU to dump its registers and halt, then the PPU would swap in the new registers and order it to resume and load.
Each PPU ran one instruction before switching. The design documents called the switching logic the 'barrel', as in the drinking song Roll Out the Barrel. The design engineers must have liked their beer :-)
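The barrel idea is easy to mimic in a few lines. This is a toy sketch, not a model of the actual CDC hardware: `run_barrel` is an invented name, "instructions" are just labels, and the loop simply skips contexts that have run out of work, which is a simplification.

```python
def run_barrel(programs):
    """Execute one instruction per context per turn, round-robin.

    `programs` is a list of instruction lists, one per virtual PPU.
    Returns the (context, instruction) pairs in execution order.
    """
    counters = [0] * len(programs)
    trace = []
    while any(pc < len(prog) for pc, prog in zip(counters, programs)):
        for ctx, prog in enumerate(programs):
            pc = counters[ctx]
            if pc < len(prog):
                # This context gets exactly one instruction, then the
                # barrel rotates to the next context.
                trace.append((ctx, prog[pc]))
                counters[ctx] = pc + 1
    return trace
```

With three contexts the trace interleaves them one instruction at a time, so no single context ever holds the execution unit for two steps in a row while others still have work.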
I sort of agree with him. If an SMT processor runs 2 threads from different processes, how will it handle virtual memory? I suppose you could implement an SMT processor with separate TLB tables...that would be the only way. I wonder if Hyperthreading does this?? Otherwise running 2 threads from different processes won't work.
Oh, and from my knowledge, most UNICES flush the TLB on a context-switch. Someone please correct me if I'm wrong. I do realize that some architectures (MIPS R10000 I believe) can associate a PID with each TLB entry. This would also be a solution to the problem.
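To spell out the difference between flushing on a context switch and tagging entries with a process or address-space id, here is a toy comparison. Both classes are invented for illustration and say nothing about how the Xeon actually handles its TLB.

```python
class TaggedTLB:
    """Toy TLB whose entries are keyed by (address-space id, virtual page)."""

    def __init__(self):
        self.entries = {}

    def insert(self, asid, vpage, frame):
        self.entries[(asid, vpage)] = frame

    def lookup(self, asid, vpage):
        """Return the cached frame, or None on a TLB miss."""
        return self.entries.get((asid, vpage))


class FlushingTLB:
    """Toy untagged TLB: it has to be emptied on every context switch."""

    def __init__(self):
        self.entries = {}

    def insert(self, vpage, frame):
        self.entries[vpage] = frame

    def lookup(self, vpage):
        """Return the cached frame, or None on a TLB miss."""
        return self.entries.get(vpage)

    def context_switch(self):
        # Without a tag, stale entries would map the new process's
        # addresses to the old process's frames, so drop everything.
        self.entries.clear()
```

With tags, two processes can both have the same virtual page cached and each still gets its own frame back, which is the property two simultaneous threads from different processes would need.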
I was going to post a link to the mercury news article I read 10 years ago but it would have cost me 3 bucks, in these times that's 3 bucks I wish I had (GWB*)
Anyways, this was around the time of NT4.0, when I believe apple, IBM, motorola, and MS pooled their resources together to make the power PC chip. One of the things I remember distinctly from the article was the flaw in Intel's RISC chips was their single instruction pipeline to the core, while the PPC chips had 3.
This same argument/article was made later with the introduction of the G3 series of processors if memory serves me correctly.
Not intel bashing at all, in fact everything cept the TV and microwave got a intel chip in my house. Just trying to make an interesting point.
I think I'd rather see a decent implementation that tricks the OS into thinking that 2 processors is 1 processor. So you don't have to have multi-threaded/process code in order to gain performance from multiple procs. just a thought. stdcallsign
Actually, they should be ashamed to sell that as suitable for heavy duty. This is freaking 2002, not 1992. An UltraSparc III has 8 MB of L2 cache. A MIPS R12000 has (or can have) the same amount. IBM Power4s have similar amounts. (USIII has 32K instruction and 64K data L1 cache, and R12k has 32K of L1, for the sake of comparison.)
I admit I don't have any hard data to back this up, but it's my suspicion that it's in large part the large L2 cache that causes Sparcs to thrash Intels at some tasks. There's a good page on some processor design considerations at SETI@UNC.
Gamepc.com
They're taking what is designed as a server processor, what is designed to be optimized for server tasks (such as web page serving which probably scales to multiple CPU and hyperthreading rather efficiently), and benchmarking it on Quake. *shrug* Really, who buys Xeon's for a gaming PC? And, if they do, WHY?
These are server CPUs and should be benchmarked with server benchmarks.
-Jayde
What's a sig?
Think about that for a while and you will realize your post makes no sense whatsoever.
I am posting under this message in solidarity, but I regret that the AC didn't use a name in this instance. It would have been less vulnerable to mistreatment by a moderator not worthy of their points. Since this post will not likely reach a well deserved +4 before being archived, I will reprint this simply elegant statement below.
Thoughtless moderation helps nobody, and weakens the community. Rather than waste your moderator points, you should have saved them for a more thoughtful moderator to use. Now for some delicious irony, I fully expect to be modded down both -1 redundant and -1 offtopic.
cheers.
Re:Finally! (indeed) (Score:-1)
by Anonymous Coward on Wednesday February 20, @05:53PM (#3040304)
For uni systems, Athlon has a good platform with excellent processors, but for those who want a two-CPU system, Xeon is still the only game in town.
You've gotta understand, this is the first time AMD made its own 2x chipset. Intel has been doing this for a long time and stability is just great.
Oh, another thing on the hyperthreading thing: it seems to offer fairly good performance to programs that make use of SMP, stuff like 3dmax. It's not 4CPU for the price of two, but it's probably 3CPU for the price of two.
kawai
The results were astounding with very little changes to the processor core. I heard that the next Alpha was slated to include SMT before Intel killed it.
If the SMT results were so impressive, why would Compaq kill the Alpha? I read an interview with the lead Alpha designer and he said that the Alpha processor had been stretched to its limits and could not support new improvements like EPIC or 64 bit addressing. He was part of Compaq's strategy team for choosing the Intel IA64 for Compaq's future server family.
cpeterso
How's that not letting you shoot yourself in the foot? It's a bug with Windows, not the language. The output should technically work (if not backspace as much as intended) instead of crashing the OS.
Notice that the Linux kernel build on two threads went slower with "hyperthreading" on than without it. And compiling is as eclectic a task as possible. I can imagine that highly optimized loops in graphics programs already max out some chip resource (like the float alus) so that multithreading them in this scheme does no good, but when compiling fails to parallelize, you know that intel must have screwed the implementation up, big time.
Multiple processors sharing the same cache on a single chip ought to be a big win, whether they share alus or not. In some cases a set up like this should significantly out-perform regular multi-processors (when both processors are dirtying each other's caches). Intel must have screwed something up.
The benchmarks show that the current implementation of "hyperthreading" is basically useless. The idea could work very well though.
Rocky J. Squirrel
Imagine a beowulf cluster of these! w00t!
If the processor could handle two completely separate processes, it would be just a dual-core CPU, right? And if it were, it wouldn't be called 'SMT' - it would be called a multi-core CPU. Furthermore, this technique of SMT is designed to work around 'small' problems like pipeline stalls - that's not something you're going to solve by doing a full task-switch within the CPU - task switches take a long time. Switching from one thread to another that lives in the same process-space seems more like what they would be trying to do. So from that, I would figure that SMT only works with two threads with the same address space.
We know that the CPU needs some OS extensions in order to run - that's mentioned in the article. Those extensions would be for, I presume, scheduling. In other words, code that says: "when one thread/process is running, schedule another to run also if it lives in the same memory space. But if you've got two separate runnable processes hanging out, only have one actively running."
All that I'm saying is conjecture, sure, and in some other discussions on this topic people who actually know what they're talking about are probably saying far more interesting things than this.
Conjecture, exactly.
It (HT Xeon) does use two separate TLBs. Not only that, each core has its own register set including segments and TSS, LDT, the whole shebang. In fact, an issue mentioned earlier in the discussion and the article was that the two threads would sometimes conflict in usage of the cache (wanting dispersed regions of memory), thus making the cache miss rate higher than if they were two separate cores. Furthermore, stock versions of Windows were used. They are not aware of the SMT nature of the processor, they just see two CPUs, and act accordingly. How could they know not to use threads that use different memory regions when it's never previously been an issue?
Black holes are where the Matrix raised SIGFPE
Each improvement in the Pentium processor is another nail in the coffin of the SPARC processor. A Pentium-based notebook running Linux costs $1500. An equivalent SPARC-based notebook costs $5000. Yes. Some dumb schmuck is selling a SPARC-based notebook. Read "PC maker ships Sun-based workstation"
When Sun Microsystems announced that it would sell Intel/AMD-based servers running Linux, Sun basically declared that the SPARC processor would be discontinued within 10 years.
The U-V pipe was used only in the original Pentium. The Pentium Pro and up use an out-of-order executing RISC core.
The P6 core, used in Pentium Pro, Pentium II, Celeron, and Pentium III, has an OoO core with three functional units: one that can execute any kind of instruction and two thin ones that can do only simple instructions. Think of it as a U-V-W pipe where U is a fat pipe and V and W are skinny.
The Pentium 4 core, on the other hand, has six pipelines, and three of them (the double-pumped ALUs) can handle two micro-ops at once. However, the decoder can feed only three micro-ops per cycle per thread. Hyperthreading goes a long way toward keeping its pipes fed.
Will I retire or break 10K?
Or maybe they just had to wait for a patent to expire?
Good thing the big drug companies haven't scrounged up $6 million a piece to donate to Congress to get the Cherilyn LaPierre Patent Term Extension Act passed.
(See also the work of her late partner Sonny Bono, the campaign contributions that led to that law, and the lawsuit to get it overturned.)
Will I retire or break 10K?
You have a point. Most people don't know it, but modern processors use a hazardous combination of Potassium Fluoride and spent Plutonium to regulate clock speed, which is the real reason that it isn't safe to overclock your computer!
That density is especially thick with server CPUs, especially the Xeon. That is why, to date, nobody has set off large enough of a reaction to be deadly with overclocking PCs, but that is not the case with Xeons, whose Plutonium content is dangerously close to the Critical Mass. And you thought your Intel CPU ran hot! Everybody who runs Xeon servers knows better than to play with the clock speed.
In fact, that is why you usually can't buy Xeons without ECC RAM: the radiation put off by the computation would too readily disrupt the memory state information. What, you bought that nonsense about solar flares or other sources of random radiation causing bit decay? Oh, and FYI, don't run Distributed.net or Seti@home on your Xeon if you have fillings or a mercury thermometer in the area, unless you are interested in a direct demonstration of fusion in action.
Really, since the Cold War ended, microprocessors have been constantly dropping in price. Why was this phenomenon never observed until the '80s? Moore's Law my ass, how about "military surplus in action." It is much too expensive to store all of this spent plutonium in federal compounds. So the FCC had to ensure that computers had sufficient shielding, and now, there you go. Let me reiterate that ever since the "Pentium Pro" (Plutonium Recycling Operation) and with each new generation of CPU, active cooling remains a matter of utmost priority.
Really, they would have just put it in the drinking water supplies to distribute the threat amongst all of our nation's citizens (like taxes and fluoride) except that in secret military tests they found that the subjects' teeth and bones started to glow in the dark, which would have been too obvious to cover up for long. So they stationed the subjects in parts of Japan, Las Vegas, and tropical locations so that the glowing would be concealed by all of the neon lights and overpowering sun (causing a sun tan, to help cover the light emissions) and classified the research.
Well, in any case, I highly recommend that you don't ever "crack" one of the more recent Xeons apart. Rather, you should carefully and delicately disassemble them in lead casings. Much of the extra "thousands" in cost is spent on proper protective casings, which was the real reason for Intel to do away with the Socket interfaces for Slot 1 in the first place. New high performance ceramics and lead-magnesium alloys have allowed the protective casings to shrink again, but you still need to be careful. And don't EVER let two Intel cores come in contact with each other, or not even the Liquid Hydrogen active cooling system will save you...
Dammit! I should have posted anonymously! Now they'll get me! It's a good thing I'm posting from OpenBSD virtualized inside of Tinfoil Hat Linux! I can use the HyperThreading Hyperspace technology to encrypt my essence and escape! Fight the Future.
***Disclaimer***
Fluoride is good for your teeth and bones. Fluorosis is nothing but a commie... er, a Terrorist plot. If you don't ingest fluoride and develop fluorosis then the terrorists will have won! Make sure you brush your teeth with an ADA approved sodium-fluoride "activated" toothpaste, and only drink potassium-fluoride supplemented water. Ignore naturally occurring "healthy" calcium-fluoride that only hippies and tree-huggers advocate!
I remember that Alan Cox wrote a patch to deal with Hyperthreading just a few months ago. One thing it does is to avoid putting two threads on the same hyperthreaded CPU when there are spare physical CPUs, i.e., distinguish between physical and hyperthreaded CPUs.
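The placement rule described above can be sketched in a few lines. This is only an illustration of the idea, not the actual kernel patch: the function name `pick_logical_cpu`, the `siblings` map (logical CPU id to physical package id), and the `busy` set are all hypothetical.

```python
from collections import defaultdict


def pick_logical_cpu(siblings, busy):
    """Choose a logical CPU for a newly runnable thread.

    siblings: dict mapping logical CPU id -> physical package id
              (a made-up layout for illustration).
    busy:     set of logical CPU ids that are already running a thread.
    Returns a logical CPU id, or None if every logical CPU is busy.
    """
    # Count how many busy logical CPUs each physical package already has.
    load = defaultdict(int)
    for cpu in busy:
        load[siblings[cpu]] += 1

    idle = [cpu for cpu in sorted(siblings) if cpu not in busy]
    if not idle:
        return None

    # Prefer an idle logical CPU on the least-loaded physical package,
    # so a fully idle physical CPU wins over the sibling of a busy one.
    return min(idle, key=lambda cpu: (load[siblings[cpu]], cpu))


# Two physical CPUs, each exposing two logical CPUs (assumed layout).
layout = {0: 0, 1: 0, 2: 1, 3: 1}
print(pick_logical_cpu(layout, busy={0}))
```

With one thread already on logical CPU 0, the sketch skips its sibling and picks a logical CPU on the other physical package; the sibling is only used once both packages have work.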
Is it in the test kernel?
A good analogy to how this works would seem to be segmented downloading. On a fast connection, a segmented download splits up a file into chunks and then opens multiple connections on the same interface, and this tends to utilize more of the available burst bandwidth.
Despite the error in this, splitting up a program into two threads to run on one processor seems logical. It allows for advances in parallelism, which is what processors (even single ones) like and optimize for. This way if two threads are running, one can be making heavy use of the ALU and the other the FPU, which are physically separate areas of the processor, instead of one section sitting idle while the other reports 100% usage to the OS. One thread can be loading and moving data into memory while the other does number crunching...AT THE SAME TIME.
This seems like a very good model, and I can see where it would increase performance by a huge magnitude if implemented on RISC systems, since instructions typically take only a few clock cycles to complete, and most programs are written to perform them sequentially. In hyperthreading, the processor could deal with several instructions at once (like they do already), only the difference would be these wouldn't be JMP guesses or preparing executed code in case of a branch.
Cool stuff, Intel is in the right direction. It would be interesting if someone would write a program to test an ideal HT condition, like a program with two threads, one doing logic stuff and the other floating point. What would the performance increase be?
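A real answer needs real hardware, but the "ideal condition" is easy to see in a toy model. The sketch below assumes a made-up core with exactly one ALU and one FPU, in-order issue, and one instruction per thread per cycle; `run_cycles` and the unit names are hypothetical, and nothing here reflects the actual Pentium 4 pipeline.

```python
def run_cycles(threads, units=("alu", "fpu")):
    """Count cycles needed to drain in-order instruction streams.

    threads: list of instruction streams; each instruction is the name of
             the unit it needs and must appear in `units`.
    units:   the execution units of the toy core, one of each kind.
    """
    pcs = [0] * len(threads)  # next instruction index per thread
    cycles = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        free = set(units)  # every unit is available at the start of a cycle
        # Rotate which thread gets first pick so neither one starves.
        start = cycles % len(threads)
        for k in range(len(threads)):
            i = (start + k) % len(threads)
            if pcs[i] < len(threads[i]) and threads[i][pcs[i]] in free:
                free.remove(threads[i][pcs[i]])
                pcs[i] += 1
        cycles += 1
    return cycles


int_thread = ["alu"] * 100  # pure integer work
fp_thread = ["fpu"] * 100   # pure floating-point work

one_after_another = run_cycles([int_thread]) + run_cycles([fp_thread])
side_by_side = run_cycles([int_thread, fp_thread])
same_unit = run_cycles([int_thread, int_thread])
print(one_after_another, side_by_side, same_unit)
```

In this model the logic-plus-float pair finishes in half the cycles when run side by side, while two integer threads gain nothing because they fight over the same unit. Real code mixes unit types and stalls on memory, so measured gains would sit somewhere in between.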
"I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95
Great - now I have to get a 2 CPU license for Micro$oft stuff...
Can the new Xeon trick the user license too?
Die size, die size, die size.
Intel is trying to get back on top in terms of performance, even if it means taking an inelegant approach and making the chips extremely expensive to produce. Note that Intel's fastest offering is barely as fast as the fastest Athlon; to accomplish this, they had to move to .13 micron and use a die size that was STILL larger than the Athlon's. I predict that Sledgehammer on EV6 will be much more interesting news than hyperthreading.
The larger and more complex the chip, the more it costs to make, and the higher the probability that there will be a defect in a randomly chosen chip. It is more cost effective to make one good CPU than to make two crappy CPUs and put them on a single grid array.
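The die-size argument can be put into rough numbers with the textbook Poisson yield model, where the fraction of defect-free dies is exp(-area x defect density). The sketch below is only an illustration: the function names are made up, edge loss on the wafer is ignored, and the wafer cost, wafer area, and defect density are invented figures, not Intel or AMD data.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of dies expected to have zero defects (Poisson model)."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


def cost_per_good_die(wafer_cost, wafer_area_cm2, die_area_cm2, defects_per_cm2):
    """Wafer cost spread over the dies that actually work."""
    dies = math.floor(wafer_area_cm2 / die_area_cm2)  # ignores edge loss
    good = dies * poisson_yield(die_area_cm2, defects_per_cm2)
    return wafer_cost / good


# Invented numbers, purely for comparison of a small die against one twice as big.
small = cost_per_good_die(3000.0, 300.0, 1.0, 0.5)
large = cost_per_good_die(3000.0, 300.0, 2.0, 0.5)
print(f"small die: {small:.2f}  large die: {large:.2f}  ratio: {large / small:.2f}")
```

Doubling the die area halves the number of dies per wafer and also lowers the yield of each one, so under this model the cost per working chip more than doubles.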
Why "Hyper" threading? That's very misleading.
It sounds as if a thread was being created for each function call. How are we going to call that then?
My company has just bought us developers Dell 530 dual 1.7GHz Xeon workstations. Nice, you may think, but it feels bloody slow. 15 seconds to compile 50 lines of C, using gcc (using cygwin on NT). Something really seems wrong with this box.
Not only that, but the idiot who ordered these PCs really overspecced them for development work (mostly editing & compiling), but ordered the bottom-of-the-range monitors for them (17" 60Hz @ 1280x1024). People are complaining of eyestrain and headaches. I kept the 19" monitor from my old PC, but I'm so close to quitting this job.
HH
What will happen when we have one real CPU, but two virtual CPUs? Won't the OS send the idle task to one of the virtual CPUs, thus halving performance?
If we are "tricking" the OS into thinking we have two CPUs, I see no reason why this won't happen.
I must be missing something somewhere.
Sounds about right given Intel's previous mathematics 'errata' for everything about the 386.
:-)
martin
Umm....intel is trying to sell software companies on Hyperthreading, so they go and sell lots of P4s with it disabled.
Makes sense. Crack the eggs, kill the chicken.
There are many issues which the article did not address at all. For example, I would have loved to know how it affected system latency. If overall performance is down 1-2% and process latency has been improved by, say, 5%-10%, this may be a worthwhile trade-off for workstation users.
Also, the article seems to push very hard for raw CPU performance. Allow me to clarify. While Intel does seem to indicate that performance boosts can be achieved, I didn't really read it to mean that total aggregate CPU performance would be gained or, if it is, certainly not by much. Let me put it like this. It smells like this technology is geared to help out systems which normally run at 80%-90% of their total CPU, whereby HyperThreading would allow for efficient use of the difference while requiring only common SMP application support.
Also, I didn't read that HyperThreading was geared to be directly taken advantage of by Linux or Win platforms. I suspect that there are significant OS optimizations that can be made for more intelligent scheduling and improved processor affinity. Here, I can see that processor affinity may make significant differences in overall performance. While Win's CPU affinity is only slightly better than that of Linux's current scheduler, I'm hoping that significant affinity improvements will go a long way toward addressing possible shortfalls with this technology. As such, it certainly would have been interesting to see how well Linux did with the new O(1)-scheduler in development, as it has many optimizations which specifically address better CPU affinity. Plus, if the scheduler can make the distinction between virtual CPUs and their associated owner, I can see that it may make sense to allow for processor bias between physical and virtual CPUs within a scheduler. After all, if a process is to migrate, it would seemingly (best guess here) make sense to allow it to migrate to a virtual self first before it migrates to another CPU entirely. If a process is currently executing on a physical CPU, does it make sense to allow it to migrate to a virtual CPU on a physically different CPU? I'm guessing that would make for a significant performance hit. How would it perform if process migration were only allowed to occur to its own virtual CPU? I'd certainly like to know.
By allowing the scheduler to make intelligent migration and accordingly biased decisions, I'm guessing that any OS may be able to make significant performance inroads while using the HyperThreading technology. As such, I'm guessing that more significant performance gains can be achieved by making the OS HyperThreading-aware rather than attempting to heavily optimize at the application level. With proper OS support, I'm guessing that little more than simple SMP application support will place this technology in a completely different light.
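The migration bias floated above can be written down as a ranking rule. This is a sketch of the policy only, under the assumption that moving to a sibling virtual CPU on the same physical package is cheaper than crossing packages; `migration_targets` and the `siblings` map (logical CPU id to physical package id) are hypothetical, and whether the policy actually helps is exactly the open question.

```python
def migration_targets(current, siblings, idle):
    """Rank idle logical CPUs as migration targets for a task on `current`.

    siblings: dict mapping logical CPU id -> physical package id
              (an assumed layout for illustration).
    idle:     iterable of idle logical CPU ids.
    Sibling virtual CPUs on the task's own physical package come first,
    then logical CPUs on other packages, each group in id order.
    """
    home = siblings[current]
    candidates = [cpu for cpu in idle if cpu != current]
    # False sorts before True, so same-package CPUs lead the list.
    return sorted(candidates, key=lambda cpu: (siblings[cpu] != home, cpu))


# Two physical CPUs, each exposing two virtual CPUs (assumed layout).
layout = {0: 0, 1: 0, 2: 1, 3: 1}
print(migration_targets(2, layout, idle={0, 1, 3}))
```

Restricting migration to the task's own virtual CPU, as the last question suggests, would amount to keeping only the candidates for which the first key is False.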
I have seen Intel confirming elsewhere that their Hyperthreading is based on the SMT design they got with Alpha, which is in turn based on Washington's SMT. I went to an SMT talk by Washington guys while at Berkeley - very elegant and impressive stuff indeed.
Looks like a good way to persuade programmers to do their stuff with MP processing in mind.
Preserve old classics: copy your collection onto all hard drives.
Yes, I was thinking that thread mobility in those benchmarks was probably the reason why the poor scores were generated. Along with moving the threads, you of course have all the problems associated with it. If you look at the benchmarks on that page, Hyperthreading usually makes the single CPU faster, which we could have predicted (and is surely what Intel wants). On the dual CPU configuration, it actually hinders performance in many cases. There are some other issues with treating one of these CPUs as two virtual CPUs. First, it isn't really two CPUs, and threads scheduled on it will not necessarily be executed with fair time slices. If you schedule two threads on one of these CPUs, it is possible that one of the threads uses the majority of the resources for the timeslice and the other does little - until you reschedule it because the timeslice has passed. So basically, you can pay the penalty of two context switches for little/no work done. Also, remember that a CPU, even in highly optimized code, is not running at 100% of its potential - using hyperthreading you aren't getting a 30% faster CPU, you are (or should be) using the one you have more efficiently in terms of keeping the various execution units busy. I read somewhere that a typical instruction stream in a P3/P4 only keeps 33% to 50% of the execution units (ALU, FPU, etc.) busy. Hyperthreading simply tries to keep more units busy more of the time.
The real benefit here for me is that if the technology is adopted all around then most software will be written to take advantage of SMP. I have been running SMP for a long time, even though many people have tried to say it's not helping me. I can demonstrate the goodies though:
1.) While playing games that are not SMP enabled, I still benefit because the game can beat the crap outa one CPU while the OS (graphics, networking, system management, messaging, dlls/sos etc...) share the other processor.
2.) Most graphics software is SMP compliant, so I get the benefit of that when running an SMP compliant os (NT, 2K, XP, Linux, BSD, etc...)
So for us SMP geeks this is a win-win situation.
My $0.02 will always be worth more than your €0.02, so
Isn't this already done in software, as one of the possible optimizations?
Moreover, this looks like some kind of parallelization... Doesn't the Transmeta chip do this after translating x86 code? (code-morphing?)
If it's not done yet, well, it should be.
Should you stop running Seti@home? Most definitely you should stop!
But not because of heat issues. As long as you aren't overclocking your components, and have adequate cooling, then your computer should be fine. But aren't some things more important than your computer?
Let me ask you this: Have you even considered what would happen if we actually did make contact with alien life? If they are advanced enough that we could communicate, then most likely they are at least as advanced as we humans in other aspects as well. The aspects I am concerned with range from political ambition to weapons of extreme destructive technology!
What if they invaded America? They could reveal just how fragile the current state of our republic is, in light of its stagnation under the "two party system"! If they "bodysnatched" the presidential candidates for both the Republican and Democratic parties, then we would have no choice but to vote for one of them! As stated by Kodos and Kang, you couldn't vote for a third party, as that would just be throwing your vote away! So don't blame Homer J Sixpack.
What if they penetrated the White House's security by posing as a curvaceous floozy or hard working intern? From the "JFK room" they could chew nitrogen gum and frustrate our president with their sexy artificial bodies. Then when we finally have no choice but to use our nuclear missile against their mothership, the head alien would inhale the formidable payload and become pleasantly intoxicated. Then they would destroy all of our civilization, and most of the people, so that no one remained but Natalie Portman, who would have the task of repopulating our cities, and effectively mother the entire human race!
...actually, that doesn't sound half bad. Forget that freak with his Gramma's record player, if I got to share the bunker with Natalie, then I'm all for it! C'mon then, get cracking! How many units have you completed? Don't dally man, this is the future of the Human Race we're talking about! And while you're at it, do you think you could get me a box of grits? Miss Portman likes 'em hot.
-castlan
Intel isn't IBM. When Intel gets their chip process under 100 nanometers, they may look into providing multiple CPU cores on one die, as IBM does today.
While what Intel is working on now is unrelated to what I was giving IBM credit for, you are correct in that IBM RS64 III and above offer HMT. This would mean that I was initially incorrect: this would be an "expensive hack".
The reason performance with some workloads can increase by 22% and not double is that the CPUs are not doubled. It is not a good idea to lie to your computer, and that is why HMT is disabled by default. HMT falsifies the CPU information presented to some system interfaces, which can cause problems. It invalidates the dynamic CPU allocation feature, which seems to me a more important feature. So altering the allocation of physical CPUs is dynamic, while enabling and disabling this dubious feature requires a reboot? That is unfortunate.
Most strikingly, while HMT under Windows is most valuable in single processor systems, to enable multithreaded code, it is not even an option in single processor AIX systems. Even singlethreaded code will benefit from double processors, as it allows double the threads to execute. This is not the case for HMT's "virtual" doubling, as the thread will never idle, allowing the next thread to take over.
In fact, HMT seems just an attempt to compensate for the limited threading model of Unix-generation systems. While moving the thread scheduler into the hardware is theoretically the best way to get the performance necessary for a responsive threading performance, it shouldn't be necessary to lie to an insufficient interface to achieve it.
I do agree that the performance improvements of HMT should be achievable, but instead of violating interfaces, those interfaces should be fixed. I recall running Distributed.net's client on a 16 way Onyx. When they implemented CPU autodetection, the client would automatically spawn 16 threads, completely bogging down the system. In a matter of seconds the system was almost completely unresponsive, so that I had to use a serial terminal to kill the thread. When I specified that only 15 threads should be spawned, the responsiveness was silky and smooth, even as I watched the threads shift around between processors (which were running at different clock speeds). Why didn't running one thread on an O2 produce a similar effect? It seems that the Unix legacy is still carrying baggage from its single processor origin in its scheduling model. I would love to see if running the Dnet client on an SMP Linux box (preferably 4+ CPUs) would produce similar results under 2.2 or 2.4, as it seems that Linux has actually made the most progress in this area, despite SGI and IBM's capability of running Single partition system images on hundreds of processors.
I would actually like to play with an SMP Xeon with HMT, so that I could put BeOS on it and see if its nifty little scheduler fares any better, in light of its "pervasive multithreading".
If they are the same technology, then it is really too bad that Intel couldn't just use the more descriptive term "Hardware Multithreading". Doesn't this technology just come from the Alpha anyway, or did IBM develop it first?
BTW. You really should get an account. It is free, and it makes me feel better about responding to your post with any amount of effort, as there is a better chance of you actually seeing my response, and it lets me recognize you for future correspondence. Thanks for the info about the RS64 line. I really haven't looked into AIX much since we moved to LinuxPPC.
-castlan