AMD Delays Hammer
TeJarz writes "C|Net reports that their next processor (Hammer) has been rescheduled from its original Q4 release to Q1 2003. To quote C|Net: 'The delays are occurring to accommodate the release of a new version of Athlon with a 333MHz bus, said Crank. Current Athlons come with a 200MHz bus and 256KB of secondary cache.' Let's hope this doesn't get moved again."
Current Athlons have 266 bus. You can still get the older 200 bus, but it died out about a year ago. Sorted in price on pricewatch
What is the reason for the delay? Can it really be that it's just a business decision (as they seem to say) rather than a technological problem? It seems that AMD _needs_ this jump in 64 bit computing, the sse2 registers, and boost in performance on Intel. So to me, if it is a business decision, it is a poor one.
Everything I have seen shows that Intel is doing much better in performance and climbing. AMD claims there is no real technological reason, yet there must be. Anyone have insights? It seems that it would be prudent for AMD to issue better explanations -- how could it hurt to be honest? I want to see competition, if they are going to lag in performance, then they present no reason for people to buy. (A similarly performing Intel chip is close in price right now)
Guess what? I got a fever! And the only prescription.. is more cowbell!
Any place I can look for some doc on that issue ? :( The fingers have all been pointed at software optimization and we are doing some heavy duty examinations but it sounds all too pat to me...
We are migrating from our Alphas to dual P4's and seeing a serious drop in performance that should not exist
errr....umm...*whooosh* *whoosh* Is this thing on ?
I can probably send you some test code (same for anyone else who asks), but I'll have to check with my advisor first. The smallest I've made the test code is a bit under 300 lines. It's been run on Alpha 21264 EV67, Athlon C, Athlon XP, P4, and P-III, and one other Pentium-ish platform. At least two (I believe it's actually three) profilers have been run to find the bottleneck; it appears to be the floating point unit stalling for data.
Here are the timings. Note that these are just via "time" on GNU/Linux or a wall clock on Windows (or something -- I didn't do the Windows tests).
P4 dual Xeon 1.7GHz/gcc: 82 seconds
P3 1000/msvc: 18 seconds
Athlon C 600/msvc: 2 seconds
P3 1000/msvc, using floats and sse:
2 seconds
Alpha 667/gcc: 2 seconds
Athlon XP 1900+ 0.88 seconds
I guess the Athlon's clock was closer to the P4's clock than I recalled in my original post. Either way, the slowdown on the Pentiums can be easily seen.
-Paul Komarek
They got beaten today.
Down 7% on Intel's 2 cent per share dividend.
They'll get beaten again tomorrow.
They'll get beaten at Christmas.
They'll get beaten until Sledgehammer is released.. not Claw hammer which will have no x86-64 desktop software support right off the bat, and will have to rely solely on it's pure x86 performance.
Microsoft shafted them on the X-box because Intel paid Microsoft 200 million to use the Pentium III. Nvidia was stuck with an unused AMD integrated chipset for X-box and Nforce was born.
Intel will pay Microsoft to shaft them again. No x86-64 Windows XP for AMD despite AMD testifying on Microsoft's behalf in exchange for anti-trust testimony. AMD made an unenforceable agreement with Microsoft. To enforce it would perjure themselves.
So Intel wins again.
Until Sledgehammer arrives. Sledgehammer is a server/workstation chip and will have full support of the dominant server operating system, Linux. Microsoft must support Sledgehammer or risk losing more of their already weak server market share.
Long after Microsoft has done the work to get Windows running on the Server, Microsoft will incorporate x86-64 support into their desktop OS.
Probably about the same time as they support Hyper threading and SSE3 for Intel.
If voting were effective, it would be illegal by now.
Everyone always makes the same really annoying mistake when it comes to athlon fsbs. Athlon front side busses do not run at 200MHz and 266MHz. They offer bandwidth equivalent to 200MHz and 266MHz by using both sides of the clock (DDR) on 100MHz and 133MHz fsbs. All new athlons use 133MHz DDR fsbs. The hammers will support 166MHz DDR memory busses, offering performance equivalent to 333MHz SDR memory.
However, the notion of "fsb" is a little blurred with the hammer. Hammers will be directly connected to dimm banks and have integrated memory controllers, so the speed of the fsb will no longer be a determining factor in memory bandwidth. (* see mp note below) The traditional fsb to the traditional northbridge will be replaced by a "high speed" hypertransport link to a chip that connects to the agp slot, and has another (slower) hypertransport link to what could be called the south bridge. This "south bridge" will then connect the pci bus, serial ports, hard drives, usb ports, and any other devices that need to talk to the processor or main memory.
*What does this mean for MP systems? Well, that's actually the really cool part. By moving the memory controller onto the processor and providing communication between processors over a hypertransport link (3.2GB/sec for dual, 6.4GB/sec for quad and above), memory bandwidth actually increases as more cpus are added! This is in contrast to a normal MP system where as more cpus are added, there is increased competition for a fixed resource (main memory) which is already the bottleneck in many single processor applications.
That's my rant on terminology. Here's the question:
I'm no kernel hacker, and I certainly don't know anything about writing schedulers, but it seems like this would require a change in how processes are handled in hammer mp systems. In traditional mp systems, every processor has equal access to main memory. If a process gets moved from one cpu to another, there's initial overhead to do the moving, but after that it can still get to its areas in memory without any problems. On a hammer mp system, migrating a process from one cpu to another would mean that in order to access its memory it would have to reach out of its cpu's hypertransport link, into another cpu's memory controller (which may or may not be busy) and into the attached ram. Considering there would not be enough bandwidth available on the 3.2GB/sec hypertransport bus (in the case of a dp system) for both processors to reach into eachothers 166MHz DDR memory at the same time without suffering a performance hit, it seems like there would definitely be an advantage to keeping processes close to their data.
What changes would this require to scheduling and process management code, if any? Has this already been addressed, or are there people working on it in the linux kernel?
You ever notice how all the Hammers are clock speed locked at 800MHz? Yea, there's a reason for that. They're having problems cranking the clock speed up. For 800MHz they're fast as hell, beating P4 with twice the frequency, but they're not gonna release them until they clock faster than current Athlons so they're trying different types of transitors and what not.
How the hell do I know that??? Look where I live, take a guess...The birds outside my window know things.
it appears to be the floating point unit stalling for data.
2 /p4_240 0-01.htmlt hwood.htme ct.htm
s or.htm ?iid=Homepage+Find_Products_Processors&
Well, if it's stalling for data, your problem is probably that the P4 has a *tiny* L1 data cache compared to... uh... anything. It's only 8K, compared to the Athlons 64K. See the following URLs:
http://www.tomshardware.com/cpu/02q2/02040
http://www.geek.com/procspec/intel/nor
http://www.geek.com/procspec/amd/k7sel
It's probably also worth noting that Intel does NOT list the P4 as a "server processor". The P4 is listed as a desktop or workstation processor. Only P3, Xeon, and Itanium chips are recommended for server use:
http://www.intel.com/products/browse/proces
You might want to show that to management and reconsider your purchase of P4 equipment. Even a P3 is likely to perform better.
Something I've never seen a good explanation of -- is there performance-wise any difference between a 266 MHz clock with data transferred once per clock and a 133 MHz clock with data transferred twice per clock (despite the actual clock ticking rate of course)?
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Why should they rush the Hammer when the Itanium is failing as is? They know they can't push people to use their 64-bit capabilities, just like people didn't switch to Alphas. Squeeze every ounce of strength from the Athlon as they possibly can for now. Let Intel push the IA64 standard on everyone first to create a demand to migrate from 32-bit to 64-bit. That's where AMD plans to make their killing.
I would imagine it would be better to release Hammer ASAP and create the 64-bit market themselves. Then again, I don't know the logistics required for such a launch, nor do I know exactly how much better, if any better, x86-64 would perform. Let's face it, not many people care about 64-bit versus 32-bit, they only know what the dork at CompUSA tells them. And if Hammers can't outscore P4's in the 32-bit apps that very short-sighted people care about, then there is really no place for Hammer in the consumer market.
From what I've heard, mostly from internet gossip, is that AMD is having problems making Hammer scale high enough to beat the P4 in 32-bit apps, although it only requires roughly 1 Hammer MHz to beat 3 P4 MHz. I've also heard that AMD is having problems making Hammers run above 800MHz. With the expected debut of the P4 at clock speeds above 3GHz, the Hammer doesn't stand much of a chance in 32-bit apps.
In short, don't expect to see Hammers until Intel manages to salvage the Itanic.