Slashdot Mirror


Intel Hyperthreading In Reality

A reader writes: "Looks like GamePC has got the first look at Intel's new Xeon processor, which has the new super-fantastico Hyperthreading technology, which tricks your OS into thinking one CPU is two CPUs, two CPUs is four. Looks neat in theory, benchies included."

13 of 285 comments

  1. Sounds great in theory, not great in practice. by yobbo · · Score: 1, Interesting

    Sure, it does sound good having the ability to pose as 2 cpu's, but you won't get the performance that you would from a real dual cpu setup.

    And because of this, AMD is at an advantage. The Athlon is much, much smaller at an equal fabrication process, so even if hyperthreading took off, AMD would be able to combine 2 cpu cores in the one chip and still be able to compete easily in terms of die size and attain a higher level of performance, because 2 real cpu's will beat 1 cpu posing as two any day.

    1. Re:Sounds great in theory, not great in practice. by Anonymous Coward · · Score: 1, Interesting

      AMD can't currently combine 2 CPU cores on one chip due to issues other than die size. The biggest two are power consumption/routing and thermal output.

      And why would you compare 2 CPUs to 1 hyperthreaded CPU? # of CPUs is limited by the CPU hardware itself. If AMD's processor had a limitation of 8, then it wouldn't matter if it was 4x2 or 8x1 in configuration. And if Intel's processor also had a limitation of 8, then 8 hyperthread CPUs could do more work/time than either a 4x2 or 8x1 configuration of AMD CPUs (assuming all CPUs had the same base performance of course).

  2. Re:Think of it as out of order execution ..glorifi by bjk4 · · Score: 5, Interesting

    In hyperthreading, the logical processors do not share registers, just function units. Thus, if one logical processor needs to multiply while the other needs to add, they may share the CPU resources simultaneously.

    This was developed in response to the observation that individual function units remained idle for multiple cycles while the current process was busy doing one kind of operation.

    -B
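
    The sharing of function units described above can be sketched as a toy cycle counter. This is only an illustration under made-up assumptions (one adder, one multiplier, every operation takes one cycle, and the names `cycles_one_at_a_time` and `cycles_shared` are hypothetical); it is not how the Xeon actually schedules work.

```python
from collections import deque

# Toy model, not real hardware: the core has one "add" unit and one "mul"
# unit, and every operation occupies its unit for exactly one cycle.

def cycles_one_at_a_time(thread_a, thread_b):
    """One thread owns the core at a time and issues one op per cycle."""
    return len(thread_a) + len(thread_b)

def cycles_shared(thread_a, thread_b):
    """Both logical processors may issue in the same cycle,
    provided they need different function units."""
    queues = [deque(thread_a), deque(thread_b)]
    cycles = 0
    while any(queues):
        busy = set()  # units already claimed in this cycle
        # Alternate which logical processor gets first pick each cycle.
        order = (0, 1) if cycles % 2 == 0 else (1, 0)
        for index in order:
            queue = queues[index]
            if queue and queue[0] not in busy:
                busy.add(queue.popleft())
        cycles += 1
    return cycles

multiplies = ["mul"] * 4
adds = ["add"] * 4

no_conflict = cycles_shared(multiplies, adds)   # units never collide
full_conflict = cycles_shared(adds, adds)       # both want the adder
sequential = cycles_one_at_a_time(multiplies, adds)
```

    In this toy model the multiply-heavy and add-heavy streams finish together in half the cycles, while two add-heavy streams gain nothing because they queue for the same unit.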

  3. Re:Overheating by Anonymous Coward · · Score: 1, Interesting

    Dear Clueless,

    The article is about Intel not AMD.

    Failure due to overheating is generally a user incompetence problem not a design problem.

    High-end business servers aren't typically overclocked.

    No, nobody really buys these processors for anything but high-end business servers.

    Hyperthreading doesn't magically make the CPU work at 120% capacity. Even without using hyperthreading, different tasks could make the processor work just as hard.

    Hope this clears up some issues for you.

  4. Re:Think of it as out of order execution ..glorifi by dtr20 · · Score: 3, Interesting

    The Intel Xeon actually has *another* set of registers to cope with the second thread.

    Unfortunately, the big slowdown in computers is accessing memory and peripherals on the various buses. Looking at the details of the Xeon, it still competes (and queues) for access to memory.

    It's also worth considering that although programs tend to have a few threads to look after things like printing while you carry on writing your document, you tend to be using one or maybe two threads heavily at once and the rest are just mostly idle, waiting on hardware and interrupts.

    Intel themselves are claiming 10% speed improvement, even when compiled to take account of SMP, or 30% for specially optimised code (yeah as if that's going to be popular). Don't get fooled into thinking your PC is going 2x faster.

  5. Copy of text in case of /. effect by segfaultdot · · Score: 3, Interesting



    Prestonia Xeon 2.0 GHz vs. Athlon MP 1900+
    www.gamepc.com

    2/19/2002

    While Intel and AMD have seemingly taken a breather from their constant one-upmanship in the consumer processor market, things are still churning along for the workstation and server markets. While the consumer level chips from both companies (Pentium 4 and Athlon XP) bring in large portions of cash, the workstation and server processors are where the real money is made. These processors go for a much higher price premium on the market and are commonly used in more expensive multiprocessor setups.

    The customers who buy these chips tend to buy large quantities and like to use them for multiple years without any issues. Therefore, stability and reliability are the most important factors in buying a chip here with raw performance coming in second. Sure, having an incredibly fast processor is nice, but if you're constantly having to reboot the systems due to processor or motherboard stability problems, the system becomes more of a burden than help. Thus, there is a constant struggle for IT managers to either go for the fastest workstation chip on the market, or go with the chip that's known for excellent stability. Both Intel and AMD are striving to become the processor manufacturer that gives workstation users both the best performance and best stability on the market.

    Intel has the Xeon family, which has had a foothold in the low-end server / high-end workstation market for multiple years now, stemming back to the original Pentium II Xeon. The Xeon now clocks up to 2.2 GHz and comes equipped with features like 512k on-die cache, a 400 MHz front side bus, and some nifty on and off-die thermal monitoring features. Their new "Prestonia" Xeon family was just recently released to market, which is what we're looking at today.

    AMD, on the other hand, has the Athlon MP. Renowned for its incredible price/performance ratio, the Athlon MP has had a tough time making a name for itself as a big time server chip, although has done fantastically well in the workstation market. The combination of a fairly low cost processor along with similarly priced motherboard and memory have made the Athlon MP platform quite the hit. The Athlon MP was recently bumped in speed up to 1.6 GHz, which uses the AMD PR rating of 1900+.

    Today at GamePC, we're looking at two of the fastest consumer-level multiprocessing chips on the planet, Intel's "Prestonia" Xeon 2.0 GHz right alongside AMD's top of the line Athlon MP 1900+. Let's boogie.

    Intel "Prestonia" Xeon 2.0 GHz
    The Prestonia family of processors is to the Xeon what the Northwood family is to the Pentium 4. The Prestonia Xeon shares all the benefits of the original Pentium 4 Xeon, like a 400 MHz FSB, double-pumped ALU units, and SSE-2 instruction support, but it also has a few added bonus features which make it far and away better than its predecessor.

    Just as Intel recently did with their Pentium 4 family, the Prestonia Xeon is manufactured on Intel's new 0.13 micron manufacturing processes, which allow for a smaller die area, along with lower power consumption and lower heat emissions. Not only does this make the Prestonia Xeon cheaper to produce, but the lower heat amounts come in very handy when dealing with dual and quad CPU configurations in a small form factor like a 1U or 2U rackmount. For example, the original 2.0 GHz Xeon produced a maximum of 77.5W of heat, while the new Prestonia Xeon at 2.0 GHz produces only 58W.
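
    To put the two heat figures quoted above in perspective, here is a small worked calculation. The wattages come from the paragraph above; the helper name `percent_reduction` and the two-CPU total are illustrative assumptions, not figures from Intel.

```python
def percent_reduction(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100.0

ORIGINAL_XEON_WATTS = 77.5   # original 2.0 GHz Xeon, maximum, as quoted above
PRESTONIA_XEON_WATTS = 58.0  # Prestonia Xeon at 2.0 GHz, as quoted above

saved_per_cpu = ORIGINAL_XEON_WATTS - PRESTONIA_XEON_WATTS
reduction = percent_reduction(ORIGINAL_XEON_WATTS, PRESTONIA_XEON_WATTS)

# Assumption for illustration: a dual-CPU rackmount simply doubles the saving.
saved_dual = 2 * saved_per_cpu

print(f"Saved per CPU: {saved_per_cpu:.1f} W ({reduction:.1f}% lower)")
print(f"Saved in a dual-CPU box: {saved_dual:.1f} W")
```

    That works out to 19.5 W less per processor, or roughly a quarter of the original chip's maximum heat output.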

    While reducing the manufacturing process, Intel also managed to fit an extra 256 kB of L2 cache onto the processor die, giving it a total of 512 kB of full-speed on-die cache. As we've seen before with the Pentium 4 Northwood, adding another 256 kB of cache to the Pentium 4's core can improve application performance by 10-15%. Thus, the Prestonia Xeon gets that same speed increase compared to previous Xeon processors. Rumor has it that Intel will announce Xeon CPU's in the future with extra on-die cache, as was the case with the original Pentium II and III Xeons.

    Both the original Xeon and Prestonia Xeon use roughly the same packaging, so telling the CPU's apart can be difficult unless you have one right in front of you. Intel has the CPU markings on the bottom of the Xeon CPU's, as opposed to the Pentium 4 CPU's which have the markings right on the CPU's heat spreader. A quick flip of the CPU reveals the CPU's vital information. As you can see by the Xeon's S-SPEC codes, this is a 2.0 GHz Xeon with 512kB of L2 cache, running on a 400 MHz FSB, while running at 1.5V core voltage.

    Even though there's a new core running underneath, Intel decided to keep the original Socket-603 form factor of the original Xeons, allowing you to upgrade to these newer chips without buying a new motherboard. As Xeon motherboards can be extremely expensive, this is a very, very good thing.

    Besides the new manufacturing-level features of the processor, there has been one buzzword that has been gaining all the attention lately. Hyperthreading, the feature that can theoretically turn your 2 physical CPU's into 4 virtual CPU's. Let's investigate.

    What Actually IS Hyperthreading?
    Hyperthreading is actually a technology that's been around for quite a long time in microprocessing, but has never been used in a consumer-level product like the Pentium 4 Xeon. The technology itself is based on Simultaneous Multi-Threading (SMT) and was codenamed "Jackson Technology" by Intel while in development. At the last IDF, they gave this technology a name that fits in better with the Pentium 4 architecture, Hyperthreading.

    Hyperthreading is simply a method of placing a second set of registers on the processor core, allowing the processor to execute two "threads" at once. Every time you run a piece of software, the software is sending threads to the CPU for it to execute and process. Until now, consumer level processors can only handle one thread at any given time. While a processor may go through thousands of threads per second, the CPU can only physically execute one at a time. In a dual CPU system, the computer can process two threads by sending one to each CPU. Hyperthreading takes the concept of executing multiple threads and brings it down to the single CPU level.

    Hyperthreading allows the CPU to manage two threads at once, although this doesn't necessarily mean there are two CPU cores on the same die. Each register set can handle one thread, but each thread has to fight for processor resources like storing data in cache and sending it out through the front side bus. This means a single CPU with hyperthreading capabilities will not perform the same as two physical CPU's in an SMP configuration. While the ability to execute two threads at once was one of the main reasons why SMP was brought to market (symmetrical multi-processing, i.e., dual CPU systems), the costs of going to SMP, such as SMP compatible motherboards and processors, in most cases far outweigh the benefits.

    Unfortunately, since the threads have to fight for resources, there can be conflicts. If two threads want to use the same processor resources at the same time, they have to get in a queue to do so. Since most every piece of software on the market is written to only take advantage of a single CPU, suddenly throwing a single processor application on a dual/quad processor system will show literally no advantage in performance. Even as of today, only a small percentage of applications (mainly workstation/server applications) are multi-threaded to take advantage of multiple CPU's.

    To get the full advantage of Hyperthreading technology, the software will have to be "optimized" for it. Whether this means re-compiling the software to support Hyperthreading through a new Intel compiler or just adding a few more lines of code, we're not certain. Intel states in their technical documents that software written to take advantage of SMP will get in upwards of 10% performance gain with a Hyperthreading capable CPU. If the software is optimized specifically for Hyperthreading, Intel has seen performance gains up to 30%.
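
    As a rough sanity check on what those percentages would mean for a real job, the sketch below converts a performance gain into a run time. It assumes that a "10% performance gain" means 10% more work done per unit of time, which is one possible reading of Intel's figures; the 120-second baseline and the name `time_after_gain` are made up for illustration.

```python
def time_after_gain(baseline_seconds: float, gain: float) -> float:
    """Run time after a throughput gain, e.g. gain=0.10 for a 10% gain.

    Assumption: the gain is a throughput increase, so time scales by 1 / (1 + gain).
    """
    return baseline_seconds / (1.0 + gain)

BASELINE = 120.0  # hypothetical job length in seconds, not a measured result

smp_aware = time_after_gain(BASELINE, 0.10)      # software written for SMP
ht_optimized = time_after_gain(BASELINE, 0.30)   # software tuned for Hyperthreading
true_second_cpu = time_after_gain(BASELINE, 1.00)  # idealized perfect doubling

for label, seconds in [
    ("SMP-aware code, 10% gain", smp_aware),
    ("Hyperthreading-optimized, 30% gain", ht_optimized),
    ("Idealized doubling, 100% gain", true_second_cpu),
]:
    print(f"{label}: {seconds:.1f} s")
```

    Under that reading, even the best case claimed by Intel shaves a two-minute job to about a minute and a half, well short of the one minute an idealized second CPU would give.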

    Nowadays, where SMP is common in workstations and servers (and in some cases, desktops), there is a lot of multi-threaded code out there. The latest major operating systems can handle multiple processors, most professional video / audio editing software can use the CPUs, and even games are just starting to take advantage of a second CPU if available. This is the market that Intel's looking to capitalize on.

    Hyperthreading in Reality
    The buzz around Hyperthreading is that a single Xeon system will be seen as two CPUs, while a dual Xeon system will be seen as a quad CPU system. Of course, people immediately think, "Wow, two CPUs for the price of one!" This is certainly not the case with Hyperthreading, just as dual processors do not give you double the power of a single processor.

    Since Hyperthreading is implemented on the hardware level, the motherboard sees a single hyperthread-compatible CPU as two physical CPUs. Thus, software that is written for multiple CPUs will be tricked into thinking there is a second CPU in the system, and will run the appropriate multithreaded code if available. Since Windows XP and 2000 are coded to take advantage of multiple CPU's, they too see a hyperthreaded CPU as two.

    In our case, since we ran with dual Xeon processors (each with hyperthreading capabilities), the OS and software see this as four physical CPUs, even though there are only two physical CPUs running. As you can see by the device and task managers in Windows XP, the OS sees our system with four physical CPU's. Even though Windows 2000 and Windows XP only officially support two CPU's, both operating systems were able to run properly with the Hyperthreaded CPU's. This means you don't have to upgrade to a 4-processor OS like Windows 2000 Server to take advantage of this technology.
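
    Software can ask the OS how many processors it reports, which is the number a multithreaded program would typically size its thread pool by. A minimal sketch, assuming a Python interpreter is available on the machine; `os.cpu_count()` reports logical CPUs, and the helper names and the two-threads-per-package figure are illustrative assumptions.

```python
import os

def logical_cpu_count() -> int:
    """Number of logical CPUs the OS reports (falls back to 1 if unknown)."""
    count = os.cpu_count()
    return count if count is not None else 1

def expected_logical_cpus(physical_cpus: int, threads_per_cpu: int = 2) -> int:
    """What the OS would be expected to report with Hyperthreading enabled.

    Assumption: each physical CPU exposes `threads_per_cpu` logical processors.
    """
    return physical_cpus * threads_per_cpu

print("Logical CPUs reported on this machine:", logical_cpu_count())
print("Expected for a dual hyperthreaded Xeon:", expected_logical_cpus(2))
```

    The reported count says nothing about how much real execution hardware sits behind each logical CPU, which is why sizing work by that number alone can disappoint.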

    While this looks great for showing off to co-workers or friends, you will absolutely NOT get the performance of four CPUs running in your system (I can't stress this enough). As you'll see in our benchmarks later, even if software is written to take advantage of SMP, you rarely ever see performance gains with Hyperthreading enabled. In fact, in many applications, you see a performance drop with Hyperthreading enabled, as there is a great deal of overhead when splitting data up over four CPU's to process. Perhaps this is why Intel is recommending motherboard makers leave Hyperthreading disabled in the BIOS.

    It's quite possible that Intel implemented Hyperthreading to take advantage of the Xeon architecture's longer pipeline, an often criticized design element of the Pentium 4 and Xeon families. With Hyperthreading, they can start a second process after the first one is farther down the pipe. From a theoretical standpoint, the code would have to either be highly optimized for the Prestonia or limit the use of branch prediction, since there are now two sets of independent data in the processor. If you look at Hyperthreading like this, it would appear to be the next generation of the P4's out-of-order speculative execution engine.

    From what I now understand about Hyperthreading, it's my belief that Intel is planning to use Hyperthreading in all of its future Pentium 4 products down the road. The Xeon is simply the first guinea pig to actually have the logic enabled on the die. As Intel already has the Hyperthreading logic in the current Pentium 4 hardware but has not enabled it, you've got a sure sign that Intel will simply flip the switch to activate the logic when Hyperthreading applications are actually available. If Intel convinces developers that Hyperthreading is worth their time to optimize for, this could be an incredible feature 1-2 years down the road. As for now, it's fairly useless, but certainly interesting in the sometimes bland world of computer processing.

    AMD Athlon MP 1.6 GHz (1900+)
    The Athlon MP 1.6 GHz is the latest and greatest from AMD's server/workstation family of CPUs, which have gained an extremely large amount of credibility lately due to their incredible price / performance ratio compared to Intel's Pentium 4 and Pentium 4 Xeon families. While slightly lagging behind AMD's own Athlon XP 1.67 GHz (2000+) in raw clock speed, the Athlon MP 1.6 GHz is quite a bit more expensive than the Athlon XP 1.67 GHz, despite the fact that both can run SMP quite well.

    The Athlon MP is based on the "Palomino" Athlon architecture, which is built on the 0.18 micron manufacturing process. While the Palomino chips create quite a bit less heat than the "Thunderbird" variant of the Athlon, the Palominos still create quite a lot of heat, which can be difficult for dense rackmount situations. The chip itself is based on the Socket-A form factor, which means it should be compatible with most single processor Athlon boards, as well as all the dual Socket-A boards on the market now. As you'll no doubt notice, the new Athlon XP/MP processors are coming with green packaging, although they still use the same organic packaging as previous Athlon MP/XP CPU's.

    The Palomino Athlon core comes equipped with 128 kB of L1 cache, along with 256 kB of L2 cache. While we've heard rumors that AMD may up the cache amounts on their upcoming 0.13 micron "Thoroughbred" processors, we haven't received any indication that this is anything more than a rumor.

    Getting a closer look at the Athlon MP 1900+, you can see the Athlon's famous bridges are not "cut", like Athlon XP chips hitting the market. This means with a simple pencil and a motherboard that supports clock adjustments, you can overclock these processors to much higher clock speeds than intended. Of course, workstation and server users would most likely never do this, as overclocking is inherently risky, but we thought it was worth mentioning.

    As you can see from reading the core, our Athlon MP processors are of a fairly recent "AGNGA" core stepping. The first line of text says "AMP1900", which denotes our chip as an Athlon MP 1900+. AMD runs the exact same processor core on both the Athlon XP and MP processors, albeit the MP models go through an extra round of multiprocessor "validation". Performance wise, these two cores are exactly the same.

    The biggest threat for AMD and the Athlon MP is the fact that the platform has been plagued by a lack of absolute stability. While the Tyan Thunder K7 and Tiger MP boards still wrangle with edge-case stability scenarios, the AMD 760MPX motherboards have been plagued with chipset problems and many board revisions. In fact, the release of the 760MPX has undone much of AMD's work in making the Athlon MP synonymous with stability. We absolutely love the Athlon processors, but the platforms still aren't up to the level we were hoping for by now. Still, as more platforms are getting released, the situation IS getting better.

    Just the facts, ma'am.

    Intel Prestonia Xeon 2.0 GHz

    AMD Athlon MP 1900+

    Specification               Prestonia Xeon 2.0 GHz    Athlon MP 1900+
    Clock Speed                 2.0 GHz (2000 MHz)        1.6 GHz (1600 MHz)
    L1 Cache                    8 kB                      128 kB
    L2 Cache                    512 kB                    256 kB
    L2 Cache Speed              Clock Speed (2.0 GHz)     Clock Speed (1.6 GHz)
    L2 Cache Associativity      8-Way                     16-Way
    Form Factor                 Socket-603                Socket-A
    Front Side Bus Speed        400 MHz                   266 MHz
    Manufacturing Technology    0.13 Micron               0.18 Micron
    MMX Instruction Support     Yes                       Yes
    SSE Instruction Support     Yes                       Yes
    SSE-2 Instruction Support   Yes                       No
    3DNow! Instruction Support  Partial                   Yes

    The Platforms

    Supermicro P4DC6+ i860

    Asus A7M266-D AMD 760MPX

    Specification         Supermicro P4DC6+                 Asus A7M266-D
    Chipset               Intel 860                         AMD 760MPX
    CPU Support           Up to 2 x Xeon 2.2 GHz+ CPUs      Up to 2 x Athlon MP 1.6 GHz+ CPUs
    Memory Type           PC-800 RDRAM                      PC-2100 DDR SDRAM
    Memory Capacity       2 GB Max (4 RIMMS)                3.5 GB Max (4 DIMMS)
    Memory Type Support   Standard / ECC                    Standard / ECC
    AGP Expansion         AGP Pro 50                        AGP Pro 50
    PCI Expansion         2 x 64-bit (66 MHz) Slots         2 x 64-bit (66 MHz) Slots
                          4 x 32-bit (33 MHz) Slots         3 x 32-bit (33 MHz) Slots
    Onboard SCSI          Adaptec AIC-7899W Ultra160 SCSI   N/A
    Onboard Ethernet      Intel 82559 10/100 Port           N/A
    Onboard Audio         AC97 Audio                        C-Media 6 Channel Audio
    Onboard Video         N/A                               N/A

    Pentium 4 Xeon "Prestonia" Testbed System Configuration

    Processors 2 x Intel Pentium 4 Xeon 2.0 GHz "Prestonia" (8k L1, 512k L2)
    Cooling Intel Socket-603 Retail Coolers
    Memory 512MB Samsung PC-800 RDRAM (4 x 128M)
    Motherboard Supermicro P4DC6+ (Intel 860 Chipset)
    Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
    Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
    Software Windows XP w/ DirectX 8.1, Intel 3.2 Chipset Drivers

    Pentium 4 "Northwood" Testbed System Configuration

    Processors Intel Pentium 4 2.0 GHz "Northwood" (8k L1, 512k L2)
    Cooling Intel Socket-478 Retail Cooler
    Memory 512MB Crucial PC-800 RDRAM (4 x 128M)
    Motherboard Asus P4T-E (Intel 850 Chipset)
    Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
    Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
    Software Windows XP w/ DirectX 8.1, Intel 3.2 Chipset Drivers

    AMD Athlon MP Testbed System Configuration

    Processors 2 x AMD Athlon MP 1.6 Ghz (1900+) "Palomino" (128k L1, 256k L2)
    Cooling AMD Socket-A Retail Coolers
    Memory 512MB Crucial PC-2100 DDR SDRAM (2 x 256M)
    Motherboard Asus A7M266-D (AMD 760-MPX Chipset)
    Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
    Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
    Software Windows XP w/ DirectX 8.1, AMD 1.30 Driver Pack

    AMD Athlon XP Testbed System Configuration

    Processors AMD Athlon XP 1.67 Ghz (2000+) "Palomino" (128k L1, 256k L2)
    Cooling AMD Socket-A Retail Cooler
    Memory 512MB Samsung PC-2100 DDR SDRAM (2 x 256M)
    Motherboard Asus A7V266-E (VIA KT-266A Chipset)
    Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
    Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
    Software Windows XP w/ DirectX 8.1, VIA 4-In-1 4.37 Service Pack

    Lab Notes

    * All tests run with VSync (Vertical Sync) Disabled.
    * Nvidia Detonator XP (23.11) Driver used in all testing.
    * All RDRAM memory run with "Nap" mode disabled.
    * All DDR memory run at CAS 2.5 latency.

    Benchmarking Software

    * Adobe Photoshop 6.01
    * LAME MP3 Encoder 3.91
    * Kinetix 3D Studio MAX
    * Red Hat Linux 7.2
    * SiSoft Sandra 2002
    * Windows Media Encoder 8.0

    SiSoft Sandra 2002 is a synthetic Windows benchmark.
    The benchmarks can stress CPU, Memory, or Processor Instruction abilities.
    Higher Sandra scores mean better overall performance.

    CPU Benchmark - Hyper-Threading Support (SMT) Enabled
    (Higher Scores are Better)

    CPU Benchmark - Hyper-Threading Support (SMT) Disabled
    (Higher Scores are Better)

    Memory Benchmark
    (Higher Scores are Better)

    SiSoft's Sandra, while being a synthetic Windows benchmark, is one of the few pieces of software on the market with some level of Hyperthreading support. This is through Sandra's "SMT" test, which to be honest, gave us extremely sporadic results at first. Once we figured out what exactly was happening with the test, we were able to finally lay down some solid numbers.

    First off, it's quite easy to see that the dual Athlon MP setup simply rules the roost when it comes to raw CPU performance. Even with the Athlon MP chips at 1.6 GHz, it's easily able to outpace the dual Xeon 2.0 GHz processors, with or without Hyperthreading enabled. Even the highest performing Xeon setup still trails the dual Athlon MP 1900+ by roughly 30%.

    When Hyperthreading was enabled, we can certainly see some performance gains being had by the Xeon setups. One CPU with Hyperthreading gained 18% in this benchmark, while two CPU's with Hyperthreading gained 23%. Of course, this is simply a synthetic test, and to achieve any real world performance gains like this, the software would have to be specifically optimized for Hyperthreading.

    Upon looking at the results, we're not positive what effect the SMT test has on our scores. As you can see in the first graph, even with Hyperthreading (hardware) disabled on the dual 2.0 GHz Xeons, the system still managed a higher score on the Hyperthreading (software) test than with Hyperthreading (software) disabled, a margin of nearly 2000.

    In terms of memory performance, Xeon systems still maintain quite a large margin over the current Athlon MP systems. Thanks to the Xeon / i860 dual channel RDRAM memory interface, you've got quite a bit more available bandwidth compared to the Athlon MP / 760MPX single channel DDR interface.

    Adobe's Photoshop 6.0 is the world's most popular image creation/editing software.
    We run a series of filters on an image, while measuring the time taken to perform them.
    The times for each filter are added up. Lower times mean faster performance.

    Adobe Photoshop 6.01 Filter Benchmark
    (Lower times are Better)

    Adobe's Photoshop thrives on fast FPU units along with lots of memory bandwidth and capacity. Even though Photoshop is multi-threaded, the software only really takes advantage of multiple processors on a few select filters. Thus, running a second processor doesn't necessarily help Photoshop that much, at least in this case.

    In our test, we see the simple single Athlon XP 2000+ processor beating out both the dual Athlon MP 1900+ and dual Xeon systems. While the other platforms were merely seconds away, it's clear that the Athlon-based systems take the cake for best overall Photoshop performance. We see the addition of a second Athlon MP processor took nearly 8 seconds off the benchmark time. Not bad, but we were hoping for more.

    Hyperthreading proves to be more of a nuisance than a help here. With Hyperthreading enabled, the dual Xeon 2.0 GHz system actually slows down by 5 seconds, while a single Xeon 2.0 GHz with Hyperthreading speeds up by 2 seconds. As you'll likely guess, Photoshop is not optimized for Hyperthreading, so any performance gains seem to be purely coincidental.

    Keep in mind, we ran this test with the Adobe 6.01 patch installed, along with Adobe's specially released SSE-2 filter package, and the Xeons still couldn't fully stand up to AMD's new Athlon processors.

    3D Studio is one of the most popular 3D editing suites on the market today.
    We render a 50-frame scene with over 40,000 faces and 20,000 vertices.
    Lower render times mean faster processing performance.

    3D Studio MAX "Tank" Render Test
    (Lower Times are Better)

    3D Studio MAX, and any kind of 3D rendering software, relies almost 100% on the CPU for final scene rendering. Thus, multiprocessor systems are almost required for any kind of professional level 3D modeling software. 3DS Max is indeed able to fully take advantage of multiple processors.

    In our test render, we again see AMD take the cake, as the dual Athlon MP 1900+ system rendered our scene the quickest. While the Dual Xeon 2.0 GHz system was just about one minute behind, the Athlon systems simply rock for these kinds of applications. Even our single Athlon XP 2000+ system managed to render a few seconds faster than Intel's dual Xeon 2.0 GHz box.

    As for Hyperthreading, again we see mixed results. A single processor with Hyperthreading actually helps out, cutting 15 seconds off our rendering time. Two processors with Hyperthreading hurt a lot, as it added an extra 1:56 to our final render time. Ouch.

    Windows Media Encoder is a free Windows video encoding suite.
    We take a 50MB MPEG file, and encode it to Windows Media 8 (.wmv) format.
    We test at 320x240 Resolution using the WM8 for Cable/DSL encoding method.

    50MB MPEG Video to Windows Media Video Encode
    (Lower times are Better)

    While the Xeon was crushed by the Athlon MP in the previous two tests, the tables turn for video encoding. Encoding our MPEG movie was incredibly fast with the Dual Xeons, the fastest score we've seen for this test to date. Windows Media Encoder 8 is extremely efficient with multiple processors, giving a 30-40% improvement in encoding times for both the Xeon and Athlon MP platforms.

    Even though the Xeon is the clear winner in these tests, Hyperthreading again disappoints. A single Xeon with Hyperthreading tacks on another 20 seconds to our encoding time, while Dual Xeons add on another 29 seconds. Disappointing, to say the least.

    MP3 Encoding is extremely CPU intensive, and tests the CPU's raw FPU performance.
    We use LAME 3.89, which has optimizations for MMX, 3DNow, and SSE
    A 200MB .wav file is encoded to a 160 kbps MP3, we record the time to encode.

    200MB Wav to MP3 File Encode
    (Lower Times are Better)

    MP3 encoding through LAME is entirely CPU based, but since the program isn't multithreaded, we don't see any performance gains when adding a second processor. Thus, winning this benchmark is simply a case of having the best FPU performance in a single processor situation, which the Athlon clearly does.

    The Pentium 4 / Xeon platforms are 9-10 seconds slower, no matter what motherboard or processor combination is used. Both the Athlon MP and Xeon systems give very respectable encoding performance, but the Athlon MP/XP are clearly the winners here.

    Red Hat is the most popular Linux distribution in the world currently
    We test by recompiling the 2.4.9 kernel using the "make bzImage -j#" command.
    Depending on the # of threads, compiling time can be different, especially with SMP.
    Lower compile times mean better processing performance.
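
    A test like the one described above could be scripted along these lines. This is a hedged sketch, not the harness GamePC used: the helper names `build_command` and `timed_run`, the kernel source path, and the idea of cleaning the tree between runs are all assumptions.

```python
import subprocess
import time

def build_command(jobs: int, target: str = "bzImage") -> list:
    """Build the argument list for `make <target> -j<jobs>`."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", target, f"-j{jobs}"]

def timed_run(command, cwd=None) -> float:
    """Run a command to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        command,
        cwd=cwd,
        check=True,                  # raise if the build fails
        stdout=subprocess.DEVNULL,   # keep compiler chatter out of the timing log
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start

# Example usage (commented out so nothing is built by accident).
# The source path is hypothetical, and running "make clean" between passes
# is an assumption so that every pass compiles the same amount of code:
#
# for jobs in (1, 2, 4):
#     subprocess.run(["make", "clean"], cwd="/usr/src/linux", check=True)
#     seconds = timed_run(build_command(jobs), cwd="/usr/src/linux")
#     print(f"-j{jobs}: {seconds:.0f} s")
```

    Wall-clock time is used rather than CPU time, since the point of the benchmark is how long the person at the keyboard waits.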

    Red Hat 2.4.9 Kernel Compile - 1 Thread
    (Lower times are Better)

    Red Hat 2.4.9 Kernel Compile - 2 Threads
    (Lower times are Better)

    Red Hat 2.4.9 Kernel Compile - 4 Threads
    (Lower times are Better)

    Compiling a Linux kernel is extremely stressful on the CPU, and as we tested with the SMP-compatible 2.4.9 Red Hat kernel, we were able to see some very nice performance gains with our multiprocessor systems. As the 2.4.9 kernel also has support for "Jackson Technology" (aka, SMT / Hyperthreading), we were hoping to see what Hyperthreading was capable of doing in a Linux environment.

    When the kernel is compiled with a single thread, the systems don't show any real performance gains with a second processor installed. Compiling with two or more threads is where you really start to see the performance gains of SMP with Linux.

    With two threads running, compile times are nearly cut in half with two CPU's installed. The Dual 2.0 GHz Xeons manage to compile the kernel quickest at 1:57, while the Athlon MP 1900+ setup is nipping at its heels with a 2:05 compile time. Compiling an entire Linux kernel in under two minutes is simply an incredible showing of CPU power, any way you look at it.
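
    For anyone who wants to compare the two quoted compile times directly, here is a small worked calculation. The times are the ones given above; the helper name `to_seconds` is illustrative.

```python
def to_seconds(clock: str) -> int:
    """Convert a "minutes:seconds" string such as "1:57" into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)

xeon = to_seconds("1:57")     # dual 2.0 GHz Xeons, two threads
athlon = to_seconds("2:05")   # dual Athlon MP 1900+, two threads

gap_seconds = athlon - xeon
gap_percent = gap_seconds / xeon * 100.0

print(f"Xeon: {xeon} s, Athlon MP: {athlon} s")
print(f"Athlon MP takes {gap_seconds} s longer ({gap_percent:.1f}% more time)")
```

    The gap is 8 seconds, or a little under 7% of the Xeon's time, which fits the "nipping at its heels" description.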

    For curiosity's sake, we decided to run a compile with four simultaneous threads. As dual Hyperthreading-enabled Xeons can physically take four threads at once, we figured it would be a good test. Unfortunately, there were only 1-2 second differences in compile times between 2 and 4 threads. Compiling the kernel with 2, 3, 4, 5 and more threads gave roughly the same compile times.

    The Final Word
    Both the Prestonia Xeon and Athlon MP are incredible processors, and both engineering teams deserve a round of kudos for producing some incredibly fast SMP-capable CPU's. Each CPU has a specific area where you'll see one dominate over the other, although the majority of the tests were fairly close between the two CPU's.

    In my opinion, the Prestonia Xeon is the better CPU of the two for mission critical / server applications. The Intel 860 platform seems to be incredibly stable, considering its relatively short time on the market. Not one instance comes to mind where we ran into compatibility issues with our Dual Xeon systems, something we can't say for the Athlon MP systems we set up. Unfortunately, you pay the price for the Intel name, as Xeon systems are extremely expensive. The CPU's and motherboards are both extremely expensive, which makes the Xeon hard to recommend for the workstation market.

    The workstation market is much better suited by the Athlon MP processor, as its price / performance ratio is unbeatable. For most workstation applications, the Athlon MP will even be a better performer, despite its lower price tag. We would love to see AMD put a few more server-specific features on their MP processors to justify their heightened price tags over the Athlon XP, but even as they are now, the MP's are a great deal for the amount of processing power you get in that tiny little core.

    As for the Xeon's Hyperthreading technologies, it's hard not to be disappointed with the scores which we got throughout our testing. Hyperthreading sounds like an incredibly useful processor feature in theory, but in practice, it's useless without compatible software on the market. Only time will tell if developers want to take on the Hyperthreading challenge, and the few developers we've talked to have not been that incredibly impressed with the technology thus far. If nothing else, Hyperthreading will certainly be an interesting technology to watch over the next few years.

    This time next year, it's quite possible that we may be dealing with McKinley and Clawhammer as the workstation processors of choice, if Intel and AMD have their way. While it's anyone's guess if 64-bit processing is ready to come down to the consumer level, this article certainly proves that current 32-bit processors have more than enough power to handle today's applications.

  6. AS/400 by crow · · Score: 4, Interesting

    I believe that this was done in the IBM AS/400 using a special version of the PowerPC chip. There was a talk on this at the Ottawa Linux Symposium last summer. According to the IBM people, it mostly worked great, but there were a few issues with spin locks--the CPU saw that one thread was busy (in a spin lock), so it never switched to the other one (that was holding the lock). The Intel implementation may be slightly different, but this is something to look at.

    When your hardware isn't exactly what the software was written for, you tend to have weird bugs like that. I would not be surprised if Windows, Linux, FreeBSD, and other OSes need minor patches to work well with this new hyperthreading from Intel.
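    The spin lock problem described above can be shown with a toy model. It is not the AS/400's or Intel's actual switching policy. It only assumes a core that keeps running the current hardware thread until that thread explicitly gives way, with thread 0 spinning on a lock that thread 1 holds. The cycle counts and the function name are made up for illustration.

```python
def run(spinner_yields, max_cycles=1000):
    """Simulate one core that switches hardware threads only on an explicit hint.

    Thread 0 spins on a lock held by thread 1. Thread 1 needs 3 cycles of
    work before it releases the lock. Returns the cycle at which thread 0
    acquires the lock, or None if it never does within max_cycles.
    """
    lock_held_by = 1
    holder_work_left = 3
    current = 0  # the spinning thread happens to be running first
    for cycle in range(max_cycles):
        if current == 0:
            if lock_held_by is None:
                return cycle  # the spinner finally gets the lock
            # The lock is still held. To the core, spinning looks like work,
            # so it only switches if the spin loop yields explicitly.
            if spinner_yields:
                current = 1
        else:
            holder_work_left -= 1
            if holder_work_left == 0:
                lock_held_by = None  # release the lock
                current = 0          # holder is done; back to the spinner
    return None


print(run(spinner_yields=False))  # None: the lock holder never gets to run
print(run(spinner_yields=True))   # 4: holder runs, releases, spinner acquires
```

    On x86, the PAUSE instruction is the hint meant for spin-wait loops of this kind. Whether a given OS needs patching to use such a hint is exactly the sort of minor fix the comment anticipates.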

  7. Re:Overheating by castlan · · Score: 3, Interesting

    Overheating looks like a valid concern in this case. While overclocking will push the limits of heat dissipation, that is not the same as heavy use. An overclocked processor will still generate significant heat even when in an idle loop.

    The issue that concerns me is that most consumer CPUs aren't designed with true heavy use in mind, and the specs usually consider that most of the time, the standard processor is not pegged. This can be an issue if full time compute processes don't give the processor time to idle, as in Seti or Distributed.Net usage. That is why these projects specifically warn against overclocking - the combination adds up.

    Now even with a full time load, like with the Distributed.net client, the entire processor die isn't generating heat - some of the CPU logic remains idle. This still allows for a buffer for heat dissipation, as slight as it might be. Now with this hyperthreading technique, most of the die can be actively generating heat simultaneously, pushing the heat generation potential higher than the specs likely considered.

    Considering that the largest problem all Intel processors have had since the Pentium 60 is insufficient heat dissipation, this concerns me deeply. I fear the day is soon approaching when the Intel processor code names are based on the Black Body Effect: The low end "black", "dull red" and "infra-red" models are outmatched by the "Hot-white" and "blue blaze" series, but much of the extraordinary cost is attributed to maintaining active-cooling systems that are spontaneous-combustion-retardant.

    And the melting point of silicon substrate with various doping agents will soon become common knowledge.

    -castlan

  8. Other systems.... Tera by fitten · · Score: 2, Interesting

    Another system that used something like this was Tera: http://www.sdsc.edu/SDSCwire/v3.18/tera.html However, what they did was have 128 contexts per CPU and it round-robin'd through them all. You could also "daisychain" multiple CPUs together in a system. It was interesting but I don't know what ever happened to the machines they were building.
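    To see why cycling through that many contexts can pay off, here is a rough sketch of round-robin issue. Only the figure of 128 contexts comes from the comment above. The assumption that a context must wait a fixed `latency` cycles (for memory, say) after each instruction, the value 100, and the function name are all illustrative.

```python
def utilization(num_contexts, latency, cycles=10_000):
    """Fraction of cycles in which the core manages to issue an instruction.

    After a context issues, it is assumed not to be ready again for
    `latency` cycles. Each cycle the core scans the ring of contexts,
    starting just after the last one served, and issues from the first
    ready context it finds.
    """
    ready_at = [0] * num_contexts
    issued = 0
    next_ctx = 0
    for cycle in range(cycles):
        for step in range(num_contexts):
            ctx = (next_ctx + step) % num_contexts
            if ready_at[ctx] <= cycle:
                ready_at[ctx] = cycle + latency
                issued += 1
                next_ctx = (ctx + 1) % num_contexts
                break
    return issued / cycles


print(utilization(num_contexts=1, latency=100))    # 0.01
print(utilization(num_contexts=128, latency=100))  # 1.0
```

    With a single context the core sits idle for most cycles. With more contexts than the latency is long, some context is always ready and the core issues every cycle.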

  9. Intel isn't IBM, and computers shouldn't lie. by castlan · · Score: 4, Interesting

    Despite VMware, the Intel architecture isn't really virtualized. I am willing to believe that IBM can actually put multiple CPU cores on one die, for many SMP benefits without the downside of requiring multiple CPU dies and infrastructure.

    On the other hand, for Intel this seems to be unrelated, and a cheap hack, where the CPU presents an SMP interface to the OS only to request multithreaded code. Requesting multithreaded code should be possible without sending misinformation to the basic Operating System services. This seems to be a symptom of the disconnect inherent in having different companies producing the Architecture and the Operating System. It seems unlikely that you would see such a crude hack for fundamental systems infrastructure coming from an integrated vendor such as SGI, Sun, or even Apple. The Mips and PPC/Power don't lie about such things, because the support chipsets are actually supported by the Operating Systems.

    Really, this seems reminiscent of C/H/S mangling in storage interfaces and long file name mangling in the vFat filesystem. While it has taken many generations (in computer years) to abstract the state of the art from such persistent kludges, they still haunt the consumer computing space to this day. Now do we really need to go even further backwards and fuck with how the CPU interfaces with the core system?

    Of course, perhaps I am not giving the Wintel juggernaut the benefit of the doubt... after all, they will have to bear the consequences of such a choice, so maybe they aren't just adding saddlebags to the most basic part of the computing system. Perhaps they have the foresight to resolve this in the next few years before making this burdensome interface an accepted standard. It would be shameful to add more standardized interface limitations... such as the standardized "Gates" hopping of the 640K limit on commodity computing hardware, or the IDE limit of 540MB, no 2GB, no 8GB, no 36GB, no wait 140GB, Ad Nauseam. All of these arbitrary limitations were the results of cutting corners and not sending fully honest information to basic system interfaces. Can't the OS send efficient multithreaded code for processing without living a lie?

    -castlan

  10. Re:Ouch by KidSock · · Score: 3, Interesting

    Wow, kinda sucks if your OS has a per CPU license, like NT and Win2K server!

    You want to know something even better? The way CPU ids are managed is by bits in an integer. Every other bit represents the "virtual" CPUs. Now when the Windows kernel is selecting CPUs for scheduling purposes it enumerates them in order. This means that when a process is scheduled to run on the next available CPU there's a very good chance it will get a virtual CPU even though a real CPU is completely free. So if you have a 4 CPU machine (4 real, 4 virtual means 8 Hyperthreaded total) and you have 4 processes that can run, only 2 real CPUs will be used.

    Ok, ok, ok stop laughing. Here's the kicker. MS fixed this. But did they provide the fix to their customers? No. You have to get the Data Center version to enumerate your CPUs properly!
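    The arithmetic behind that "only 2 real CPUs" claim can be checked with a small sketch. It assumes the layout described above, where bits 2k and 2k+1 of the mask are the two logical halves of physical CPU k. The function names are invented for illustration and this is not how any real Windows scheduler is written.

```python
def physical_core(logical_id):
    """Assumed layout: logical ids 2k and 2k+1 share physical CPU k."""
    return logical_id // 2


def assign_in_order(num_procs, cpu_mask):
    """Give each process the lowest-numbered free logical CPU in the bitmask."""
    assigned = []
    free = cpu_mask
    for _ in range(num_procs):
        if free == 0:
            break  # no logical CPUs left
        lowest = free & -free                     # isolate the lowest set bit
        assigned.append(lowest.bit_length() - 1)  # bit position = logical id
        free &= ~lowest                           # mark it as taken
    return assigned


# 4 physical CPUs with Hyperthreading: 8 logical CPUs, mask 0b11111111.
cpus = assign_in_order(num_procs=4, cpu_mask=0xFF)
print(cpus)                                   # [0, 1, 2, 3]
print(len({physical_core(c) for c in cpus}))  # 2 physical CPUs in use
```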

  11. intel's version isn't for anything else either! by RockyJSquirel · · Score: 2, Interesting

    Notice that the Linux kernel build on two threads went slower with "hyperthreading" on than without it. And compiling is as eclectic a task as possible. I can imagine that highly optimized loops in graphics programs already max out some chip resource (like the float alus) so that multithreading them in this scheme does no good, but when compiling fails to parallelize, you know that intel must have screwed the implementation up, big time.

    Multiple processors sharing the same cache on a single chip ought to be a big win, whether they share alus or not. In some cases a set up like this should significantly out-perform regular multi-processors (when both processors are dirtying each other's caches). Intel must have screwed something up.

    The benchmarks show that the current implementation of "hyperthreading" is basically useless. The idea could work very well though.

    Rocky J. Squirrel

  12. Is the kernel patched for HyperThreading? by r6144 · · Score: 2, Interesting

    I remember that Alan Cox wrote a patch to deal with Hyperthreading just a few months ago. One thing it does is to avoid putting two threads on the same hyperthreaded CPU when there are spare physical CPUs, i.e., distinguish between physical and hyperthreaded CPUs.

    Is it in the test kernel?
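    For reference, the placement rule described in this comment might look something like the sketch below. It is not Alan Cox's patch or any actual kernel code. It assumes logical CPUs 2k and 2k+1 are siblings on physical CPU k, and the function name is hypothetical.

```python
def pick_cpu(busy, num_physical):
    """Choose a logical CPU for a new thread.

    `busy` is the set of logical CPU ids already running something.
    A logical CPU whose sibling is also idle (a completely free physical
    CPU) is preferred. Only when no such CPU exists does the thread
    share a physical CPU with another one.
    """
    idle = [c for c in range(2 * num_physical) if c not in busy]
    for cpu in idle:
        sibling = cpu ^ 1  # flip the low bit to get the other half
        if sibling not in busy:
            return cpu
    return idle[0] if idle else None


busy = set()
placement = []
for _ in range(5):
    cpu = pick_cpu(busy, num_physical=4)
    placement.append(cpu)
    busy.add(cpu)
print(placement)  # [0, 2, 4, 6, 1]
```

    The first four threads each land on their own physical CPU, and only the fifth has to double up.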