AMD Delays Hammer
TeJarz writes "C|Net reports that their next processor (Hammer) has been rescheduled from its original Q4 release to Q1 2003. To quote C|Net: 'The delays are occurring to accommodate the release of a new version of Athlon with a 333MHz bus, said Crank. Current Athlons come with a 200MHz bus and 256KB of secondary cache.' Let's hope this doesn't get moved again."
...have a 266MHz bus
AMD = All Microprocessors Delayed
Current Athlons have a 266 bus. You can still get the older 200 bus, but it died out about a year ago. Sorted by price on Pricewatch
You're joking, right?
Do you think it's called Wintel because they couldn't figure out how to spell AMD?
If you wanna get rich, you know that payback is a bitch
Anybody here have stock in AMD? I've been long on the company for like two years now, but it never seems to finally launch the Hammer!!! I was hoping for a Christmas release, but that's not gonna happen now...... my stocks will get beaten tomorrow!!! :(
who works at AMD, we were talking about this tuesday, that the Hammer chips will be released next year, and I told him I thought late this year. Well, looks like he was right.
Because the sooner it comes out, the sooner I get to play with a 64-bit OS development on a machine that gets top performance and doesn't cost $20,000. That alone is reason enough for AMD to ship it sooner.
A deep unwavering belief is a sure sign you're missing something...
I, for one, am hoping to replace our Alphas with cpus from the AMD Hammer series. We're about to buy a bunch of P4-based machines despite the problems we've had with certain tight loops in scientific code performing 80 times slower than a similarly clocked Athlon (according to Athlon advertised "speed", not actual clock). No, I'm not exaggerating, and this has been verified independently -- the P4 cpu has some huge weak spots that really suck if you hit them. If Hammer were out and working properly, we probably wouldn't buy the P4 machines to hold us over.
We need 64 bit machines to accommodate massive memory for our research. I'm really hoping the Hammer can provide a relatively inexpensive and *commoditized* 64 bit platform for us to work on, compared to existing 64 bit (workstation/server) platforms. And I want it yesterday. Actually, I want it last year.
I have no idea what the editors or submitter meant, of course.
-Paul Komarek
"Let's hope this doesn't get moved again."
There's a damn good reason I want this to come out soon. The sooner AMD comes out with Hammer the sooner Intel has some extremely serious competition. If Hammer can stand up to its hype the P4 won't look so hot, especially if Hammer ramps well in clock speed. Strong competition = lowering of prices. Also, Athlon XPs would then be pushed into the value market. So not only would Intel be forced to drop prices on their desktop and server CPUs, but AMD's old lineup would become an absolute steal. Sounds good for the average consumer, eh? Let's hope for no more delays.
-Yoweigh
What is the reason for the delay? Can it really be that it's just a business decision (as they seem to say) rather than a technological problem? It seems that AMD _needs_ this jump in 64 bit computing, the sse2 registers, and boost in performance over Intel. So to me, if it is a business decision, it is a poor one.
Everything I have seen shows that Intel is doing much better in performance and climbing. AMD claims there is no real technological reason, yet there must be. Anyone have insights? It seems that it would be prudent for AMD to issue better explanations -- how could it hurt to be honest? I want to see competition, if they are going to lag in performance, then they present no reason for people to buy. (A similarly performing Intel chip is close in price right now)
Guess what? I got a fever! And the only prescription.. is more cowbell!
A delay because of Palladium, which will be included by default starting with the Hammer. It was probably delayed because Longhorn, aka DRM-Windows, was delayed, and it's needed to actually use the cryptography in the cpu.
http://saveie6.com/
Incidentally, you can get a nice new dual Alpha 21264 667 4u rackmount with 4GB ram and 18GB scsi (64 bit) for = $14,000 these days. With educational discount, you can buy a Compaq ES40 (with single cpu to start) for $20K. I have no idea what the used 21164 machines are selling for these days.
I don't have the same motivation for 64 bit machines (I need them for cycle servers with big memory), but I'm just as anxious for a commoditized 64 bit platform to emerge.
-Paul Komarek
Any place I can look for some doc on that issue ? :( The fingers have all been pointed at software optimization and we are doing some heavy duty examinations but it sounds all too pat to me...
We are migrating from our Alphas to dual P4s and seeing a serious drop in performance that should not exist.
errr....umm...*whooosh* *whoosh* Is this thing on ?
The biggest problem with current processors is that to design such devices we *have* to use dynamic logic. Ask any VLSI design engineer.. that is no joke. In fact many multipliers and dividers have to be hand edited! So delays are expected and it does not reflect upon the designers and companies in any way.
Before you ask.. I do not work for AMD, I work in another VLSI company, that's why I say.. it's tough. Millions of gates, thousands to be hand edited, it's a bitch.. but as they say the fruits of labour are sweet... and for AMD Hammer is going to be the sweetest
My Aurora : http://www.youtube.com/watch?v=o91ZsGwJYyg
FB : https://www.facebook.com/TanveersPhotography
As for the 333MHz bus, I heard somewhere that the memory bus speed isn't the bottleneck for the Athlon processors...
Depends on what you're doing.
The P4, especially configured expensively, has a kickass memory subsystem on the motherboard (dual-channel anything will score high on bandwidth-bound tests). The fact that the Athlon doesn't has hurt its relative benchmark results even more than the speed war has.
I still love the Athlon, and I still avoid the P4 on (personal) principle, but a faster memory bus is a Good Thing for AMD.
The Hammer will live or die on this too. We don't have a real-world test of how well its memory subsystem works yet. The NUMA scheme for multiprocessor systems also gives me pause (without migration/copying of non-local pages, it'll bog down like crazy under certain conditions).
A well-performing Hammer will push AMD back into prominence. I strongly suspect that they're at least partly buying time to tune the core.
I'm betting they're adding Palladium. It seems likely, since these days you must make sacrifices to gain things. XP Service Pack 1 will fix a few security holes, but at the cost of your privacy. Hammer will be 64-bit and more powerful than anything you've got now, but will probably be Palladium-enabled. Or maybe I'm being a pessimist and they're not adding Palladium. Let's hope not :|
Happy New Year, it's 1984!
struct conspiracy theory {
;-)
real : MS;
int : palladium;
int * : hammer;
hmmm, is it for integrating Palladium support?
};//end of struct
Sorry... couldn't help it
The hammer is a critical product for AMD that they would never delay unless there were *major* problems with it.
1) AMD is currently losing huge amounts of money. The hammer would have allowed them to sell at the high-performance end of the market again where the sales prices are higher and might have helped them reduce the flow of red ink.
2) The delay will badly hurt AMD partners such as motherboard and chipset vendors who have developed supporting products for hammer.
3) The hammer had a potential performance lead over Intel that will be greatly eroded by the time it finally appears.
4) Critical software development for hammer will be slowed which will slow eventual market acceptance of hammer.
5) The delay will build momentum for Itanium.
6) The delay will greatly reduce the pressure on Microsoft to support hammer and will give Microsoft the opportunity to also build momentum for Itanium. Depending on market conditions when the hammer finally appears, it is now even possible that Microsoft will never need to support hammer.
7) This delay is so serious that it creates real doubts that hammer will *ever* be a viable product.
LOL. Busi.
Best... comment... ever.
"If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
"(Man) tries to live his own life as if he were telling a story. But you have to choose: live or tell." --Sartre
Hard as that may be to believe, some people use their computers for real work. And some of those people run into that dreaded 4G limit--4G is not a lot of memory anymore these days. And many of these people would love to have the choice of a Hammer over Itanium.
The poster dies in a fit of agony...
A feeling of having made the same mistake before: Deja Foobar
They're waiting so they can ship the new chip bundled with Duke Nukem Forever. ;)
All movements for social change begin as missions, evolve into businesses, and end up as rackets.
This guy is quite rude, offensive and disrespectful to others and is very arrogant. Ignore him and add him to your foe list like me. Also he has been very supportive of drm in a troll-like way and feels free to flame other people who actually like their fair use rights.
Now, after Athlons have been out for a long, LONG time, I have an Athlon XP 2000+ sitting around, patiently waiting for me to get a motherboard and case to go along with its RAM and 80GB hard drive.
.. but then you blow your entire "i don't need to waste money" premise by buying a state-of-the-art processor and then sitting on it until it's middle of the road? i don't get it.
dude - if you've got an Athlon2000+ that's "patiently waiting around" then you must have bought the thing when it was brand new -- and paid a hell of a premium to let it sit doing nothing. same chip's probably half price now. i can almost buy your "my 550 celeron runs everything i need!" story
i could live a little longer in this prison
I can probably send you some test code (same for anyone else who asks), but I'll have to check with my advisor first. The smallest I've made the test code is a bit under 300 lines. It's been run on Alpha 21264 EV67, Athlon C, Athlon XP, P4, and P-III, and one other Pentium-ish platform. At least two (I believe it's actually three) profilers have been run to find the bottleneck; it appears to be the floating point unit stalling for data.
Here are the timings. Note that these are just via "time" on GNU/Linux or a wall clock on Windows (or something -- I didn't do the Windows tests).
P4 dual Xeon 1.7GHz/gcc: 82 seconds
P3 1000/msvc: 18 seconds
Athlon C 600/msvc: 2 seconds
P3 1000/msvc, using floats and sse: 2 seconds
Alpha 667/gcc: 2 seconds
Athlon XP 1900+: 0.88 seconds
I guess the Athlon's clock was closer to the P4's clock than I recalled in my original post. Either way, the slowdown on the Pentiums can be easily seen.
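As a rough sketch, the slowdown ratios implied by these timings can be worked out as below. The numbers are copied from the list above; the dictionary keys and the function name `slowdown_vs_fastest` are just illustrative labels, and nothing here corrects for the different compilers or clock speeds.

```python
# Wall-clock timings in seconds, copied from the list in this post.
timings = {
    "P4 dual Xeon 1.7GHz/gcc": 82.0,
    "P3 1000/msvc": 18.0,
    "Athlon C 600/msvc": 2.0,
    "P3 1000/msvc, floats and sse": 2.0,
    "Alpha 667/gcc": 2.0,
    "Athlon XP 1900+": 0.88,
}


def slowdown_vs_fastest(times):
    """Return each machine's run time as a multiple of the fastest run."""
    fastest = min(times.values())
    return {name: seconds / fastest for name, seconds in times.items()}


ratios = slowdown_vs_fastest(timings)

# Print the machines from fastest to slowest with their relative slowdown.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:30s} {ratio:6.1f}x")
```

These are raw wall-clock ratios only, so they say nothing about why the P4 run is slow.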
-Paul Komarek
but it turns out you can't touch this.
There's nothing Intelligent about Intelligent Design.
May I ask why you are going to P4s instead of just getting more Alphas? You yourself said you are losing quite a bit of performance with the P4 compared with the Alpha, but you don't say why more Alphas aren't an option.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
I think you are getting confused: memory bandwidth has been Athlon's biggest bottleneck by far for a long time now. If you don't believe me, check out the Memory bench results. So far AMD just hasn't been able to compete with the 400 and 533 MHz FSB.
"The defense of freedom requires the advance of freedom" - George W Bush
Any place I can look for some doc on that issue ?
Darek Mihocka of emulators.com has written a whole bunch of stuff about the Pentium 4. He has examples of code that performs badly on Pentium 4, although I'm not sure how the most recent versions of the P4 would work on his code samples.
http://www.emulators.com/pentium4.htm
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Hell, I got a 500MHz Alpha system for $300, used.
You are thinking about this all wrong. You seem to believe that a thousand dollar AMD chip is going to perform like a mainframe. You may be seriously disappointed when you figure out that these new 64-bit chips aren't going to make your current systems obsolete at all.
I live in Austin, and have friends who work at AMD. AMD may make a great processor, but their motherboards suck because the motherboard testing department's manager tries very hard not to find any bugs. (Test stuff that you know will work. Never install an OS, just use a ghost image of preinstalled windows XP copied through the network onto the hard drive. Testing with linux is a no-no, because you actually find reproducible bugs in the hardware! We can't have that, we're a testing department...)
At least one woman was fired for making a Linux test CD and distributing it internally around the company, against that manager's wishes. Her name's on the test CD, and it was still being used inside AMD last week, but she answered too many Linux questions for people outside her department and as such was labeled "not a team player" in the internal politics. As far as I can tell, that was the most knowledgeable linux person they had anywhere near that area.
AMD makes great processors, but until they get a new motherboard testing department, they'll have nothing to put them in.
And of course the AthlonXP. I personally don't like the name but they are a company who is out to get money so I don't oppose them naming it this way. It just means more people are comfortable buying a budget chip.
unzip; strip; touch; finger; mount; fsck; more; yes; unmount; sleep
FWIW, that went out with the K6-III at the latest. None of the Athlons or Durons I've installed have had the Windows logo on them in any manner--not printed, not engraved.
20 January 2017: the End of an Error.
Price/performance on the Alphas is low for most of our applications, making the only Alpha selling point its 64-bitness for big memory. Many of our apps don't need that much space and can run on x86. The few apps that do need 64-bitness will be run on our existing Alphas. If we could get dual Alpha 1GHz machines for the same price as dual P4 Xeons, we would.
There's also the issue that finding replacement sysadmins for the Alphas isn't as easy as it is for the x86 machines. Alphas aren't much different to admin, but it can be a bit of a speedbump.
-Paul Komarek
I'm glad you're happy with your slow Celeron, but don't assume that the rest of us don't need the fastest CPU possible. Time is money, and the faster I or someone else can get work done the better. There are a ton of apps out there right now where that speed CPU is just no longer a viable solution.
I do happen to agree with not freaking out about a processor release date. But do realize that people are excited about this cpu for a reason.
BTW many of us here were using computers and programming before you were born. You're only 20 for Pete's sake.
If you wanna get rich, you know that payback is a bitch
We've sort of considered IA-64, but don't really want to make that expensive a leap into that performance mess. We're not going to buy into a *radically* new architecture weeks after its release, either. And we can't afford to spend lots of time tuning our code for one platform or another -- portability is key. gcc is our compiler of choice because we don't have to screw with (as many) platform-specific issues.
;-) The Windows tests were done by a different person, on his own time, 2500 miles from me. I did the P4, Athlon XP, and Alpha tests using gcc. You can at least compare those numbers. And in the end, the compiler (in general) should not make a 200% to 8000% performance difference.
"But judging from the benchmarks you posted further below, I question your know-how. You compare a GCC-compiled program running on a Pentium 4 to MSVC-compiled programs running on Athlons?"
I could snottily retort that I question your reading know-how, since msvc was for P-III and Athlon C, while gcc was for all the rest; but I won't.
For the tests, I used the same compilers we use for development and distribution. We don't have time to screw with the industry's popularity contests. We do algorithmic and data structure work, aiming for 10000% speed-ups that just aren't available through compiler cleverness. The Intel compiler won't help us when compiling on Alpha, MIPS, or PA-RISC.
-Paul Komarek
If you weren't anonymous, I might take you more seriously. =-) I'm very interested in price/performance and maintainability. I think the Itaniums will lose on price/performance for a long time. Since Itanium(2) is so far from being a commodity processor (like our Alphas), we'd have to expend extra effort to maintain them and train others to maintain them.
The bottom line is that Itaniums only seem to make sense for people who get them for free.
-Paul Komarek
Your response is not particularly well-related to my post. I hope for the Hammer to replace our Alphas in providing big memory per process. I didn't claim that the P4 performs 80 times slower. In fact, the P4 Xeon 1.7GHz generally outperforms my Athlon XP 1900+; of course, it also costs more. But on a bit of code I wrote for my research, it goes 80 times slower.
I can't spend all my research time optimizing for one silly cpu. My code is run on about 4 different cpus (only two different instruction sets, though) at present. Another cpu (with another instruction set) will be added soon. If Intel wants my code to run fast on the P4 Xeon, they can contribute to GCC; but I don't care. I'm happy to recommend that users of my code buy Athlons or Alphas.
-Paul Komarek
Everyone always makes the same really annoying mistake when it comes to athlon fsbs. Athlon front side busses do not run at 200MHz and 266MHz. They offer bandwidth equivalent to 200MHz and 266MHz by using both sides of the clock (DDR) on 100MHz and 133MHz fsbs. All new athlons use 133MHz DDR fsbs. The hammers will support 166MHz DDR memory busses, offering performance equivalent to 333MHz SDR memory.
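A minimal sketch of the arithmetic behind that terminology rant. The 64-bit (8-byte) bus width is an assumption made here for the peak-bandwidth figure, and the function names are purely illustrative. Nominal clocks are rounded to whole MHz, so 166MHz DDR comes out as 332 rather than the marketed 333.

```python
def effective_rate_mhz(clock_mhz, transfers_per_clock):
    """Transfers per microsecond: real clock times transfers per tick."""
    return clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Theoretical peak in MB/s. The 64-bit default width is an assumption."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_rate_mhz(clock_mhz, transfers_per_clock) * bytes_per_transfer


# DDR moves data on both clock edges, so transfers_per_clock is 2.
for clock in (100, 133, 166):
    rate = effective_rate_mhz(clock, 2)
    peak = peak_bandwidth_mb_s(clock, 2)
    print(f"{clock}MHz DDR -> '{rate}MHz' effective, {peak:.0f} MB/s peak")
```

On paper a 133MHz DDR bus and a hypothetical 266MHz single-data-rate bus of the same width have the same peak bandwidth. This model only counts transfers, so it says nothing about latency or signalling differences between the two.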
However, the notion of "fsb" is a little blurred with the hammer. Hammers will be directly connected to dimm banks and have integrated memory controllers, so the speed of the fsb will no longer be a determining factor in memory bandwidth. (* see mp note below) The traditional fsb to the traditional northbridge will be replaced by a "high speed" hypertransport link to a chip that connects to the agp slot, and has another (slower) hypertransport link to what could be called the south bridge. This "south bridge" will then connect the pci bus, serial ports, hard drives, usb ports, and any other devices that need to talk to the processor or main memory.
*What does this mean for MP systems? Well, that's actually the really cool part. By moving the memory controller onto the processor and providing communication between processors over a hypertransport link (3.2GB/sec for dual, 6.4GB/sec for quad and above), memory bandwidth actually increases as more cpus are added! This is in contrast to a normal MP system where as more cpus are added, there is increased competition for a fixed resource (main memory) which is already the bottleneck in many single processor applications.
That's my rant on terminology. Here's the question:
I'm no kernel hacker, and I certainly don't know anything about writing schedulers, but it seems like this would require a change in how processes are handled in hammer mp systems. In traditional mp systems, every processor has equal access to main memory. If a process gets moved from one cpu to another, there's initial overhead to do the moving, but after that it can still get to its areas in memory without any problems. On a hammer mp system, migrating a process from one cpu to another would mean that in order to access its memory it would have to reach out of its cpu's hypertransport link, into another cpu's memory controller (which may or may not be busy) and into the attached ram. Considering there would not be enough bandwidth available on the 3.2GB/sec hypertransport bus (in the case of a dp system) for both processors to reach into each other's 166MHz DDR memory at the same time without suffering a performance hit, it seems like there would definitely be an advantage to keeping processes close to their data.
What changes would this require to scheduling and process management code, if any? Has this already been addressed, or are there people working on it in the linux kernel?
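To put a rough number on "keeping processes close to their data", here is a toy model, not a description of how any real scheduler or the Hammer hardware behaves. It assumes a process spends some fraction of its memory traffic on remote pages and that local and remote accesses each run at a fixed bandwidth. The figures used (about 2.66GB/sec local for a 64-bit 166MHz DDR channel, 1.6GB/sec remote as half of the 3.2GB/sec link) are assumptions for illustration only.

```python
def effective_bandwidth(local_bw, remote_bw, remote_fraction):
    """Blend local and remote bandwidth by the share of remote traffic.

    Time to move one unit of data is the weighted sum of the local and
    remote transfer times, so the result is a weighted harmonic mean.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    time_per_unit = (1.0 - remote_fraction) / local_bw + remote_fraction / remote_bw
    return 1.0 / time_per_unit


LOCAL_GB_S = 2.656   # assumed: 166MHz DDR on a 64-bit channel
REMOTE_GB_S = 1.6    # assumed: half of a 3.2GB/sec link shared by two cpus

# Show how throughput falls as more of the working set lives on the other cpu.
for fraction in (0.0, 0.25, 0.5, 1.0):
    bw = effective_bandwidth(LOCAL_GB_S, REMOTE_GB_S, fraction)
    print(f"{fraction:4.0%} remote -> {bw:.2f} GB/s")
```

The model ignores latency, contention at the remote memory controller, and caching, all of which would matter in practice.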
The P4's x87 FPU and x86 ALU are just plain slow compared to P3s and Athlons. Though I am surprised your code is running 82x slower. I'd expect more like 2-8x slower for compute bound code. You can get a somewhat sensationalistic overview of why it's so slow at this link.
If you want more in-depth numbers you can compare appendix C of the Intel Pentium 4 Optimization Manual with chapter 29 of Agner Fog's Pentium/II/III Optimization Manual. You can see the Athlon numbers in Appendix F of AMD's Athlon Optimization Manual.
If you want to do number crunching with Pentium 4s your best bet is to use the SSE2 instructions/registers. You should be able to get a noticeable speedup by using the Intel C++ compiler and telling it to use SSE2 instructions. If you want to eke out max performance you'll have to use assembly language. Though you can probably get most of the way there using the Intel C++ Compiler's SSE2 intrinsics.
I'm curious as to why your code is so much slower on a P4 than on an Athlon. The best way to find out would be to look at the assembly code that gcc is producing. You can do that by using gcc's -S option. If you'd like send me the C code and the output from -S and I'll see if I see anything obvious.
I'm somewhat paranoid about posting my email address. My paranoia seems to work, as I've received no more than the occasional spam in the last few years. My email address is my slashdot user name at woh.rr.com.
You ever notice how all the Hammers are clock speed locked at 800MHz? Yea, there's a reason for that. They're having problems cranking the clock speed up. For 800MHz they're fast as hell, beating P4 with twice the frequency, but they're not gonna release them until they clock faster than current Athlons so they're trying different types of transistors and what not.
How the hell do I know that??? Look where I live, take a guess...The birds outside my window know things.
it appears to be the floating point unit stalling for data.
Well, if it's stalling for data, your problem is probably that the P4 has a *tiny* L1 data cache compared to... uh... anything. It's only 8K, compared to the Athlon's 64K. See the following URLs:
http://www.tomshardware.com/cpu/02q2/020402/p4_2400-01.html
http://www.geek.com/procspec/intel/northwood.htm
http://www.geek.com/procspec/amd/k7select.htm
It's probably also worth noting that Intel does NOT list the P4 as a "server processor". The P4 is listed as a desktop or workstation processor. Only P3, Xeon, and Itanium chips are recommended for server use:
http://www.intel.com/products/browse/processor.htm?iid=Homepage+Find_Products_Processors&
You might want to show that to management and reconsider your purchase of P4 equipment. Even a P3 is likely to perform better.
Even a P3 is likely to perform better.
:-) I just mean that if the P4 is performing like crap for your applications, then you shouldn't use that processor.
And by saying that, I don't mean to imply that I think the P3 is a good choice, (I like the Athlons
Something I've never seen a good explanation of -- is there performance-wise any difference between a 266 MHz clock with data transferred once per clock and a 133 MHz clock with data transferred twice per clock (despite the actual clock ticking rate of course)?
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
that's much bigger news than some delay - even bigger news is that it is not the first processor they produced :) "C|Net reports that their next processor (Hammer)". How low can you go :)
Intel does have their own compiler, that they sell, I guess they'd rather have you running their compiler than gcc.
That said, which version of gcc did you use? There seem to be vast differences between them (and certain companies seem to like 2.96.x which is NOT a valid gcc version. If gcc -v gives you the 2.96.x version, get a new gcc) and there are reports about speed increases in the 3.x series.
I was mostly curious, I really would like to see that code of yours, but I realise that you wouldn't wanna give it away. Any chance you could write some dummy code that gives the same results (as far as the P4 being slow that is)?
I may be able to give the testing code away; I'm waiting for a response from my advisor (GMT-5, so it will be a few more hours before he's even awake =-). Send me an email if you'd like the code.
The P4 used gcc 3.x, while the Alpha and Athlon XP used gcc 2.96. =-) If anything, this should give the P4 an advantage.
The Athlon C and P-III results were all msvc. I don't know which version was used, because I didn't do the tests.
-Paul Komarek
Why should they rush the Hammer when the Itanium is failing as is? They know they can't push people to use their 64-bit capabilities, just like people didn't switch to Alphas. Squeeze every ounce of strength from the Athlon as they possibly can for now. Let Intel push the IA64 standard on everyone first to create a demand to migrate from 32-bit to 64-bit. That's where AMD plans to make their killing.
I would imagine it would be better to release Hammer ASAP and create the 64-bit market themselves. Then again, I don't know the logistics required for such a launch, nor do I know exactly how much better, if any better, x86-64 would perform. Let's face it, not many people care about 64-bit versus 32-bit, they only know what the dork at CompUSA tells them. And if Hammers can't outscore P4's in the 32-bit apps that very short-sighted people care about, then there is really no place for Hammer in the consumer market.
From what I've heard, mostly from internet gossip, AMD is having problems making Hammer scale high enough to beat the P4 in 32-bit apps, although it only requires roughly 1 Hammer MHz to beat 3 P4 MHz. I've also heard that AMD is having problems making Hammers run above 800MHz. With the expected debut of the P4 at clock speeds above 3GHz, the Hammer doesn't stand much of a chance in 32-bit apps.
In short, don't expect to see Hammers until Intel manages to salvage the Itanic.
The memory system wasn't a problem, but it's becoming a bottleneck now. When I look at the latest benchmarks, it looks more and more like the P4 catching up to the Athlon in terms of IPC. This is probably due to the memory bandwidth holding the Athlon XP back.
This isn't a particularly new requirement. You have to be careful about selecting pages for processes today even on single CPU systems to avoid cache thrashing. Because of the way first or second-level CPU caches map to physical memory, certain memory-access patterns lead to constant reloading of the cache, making it pretty ineffective, even worse than if it wasn't there in the first place. By carefully mapping physical pages to virtual memory the OS can avoid this problem. Solaris does this, I don't know about Linux. Probably.
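The usual name for this kind of careful page selection is page coloring. A minimal sketch of the idea follows, assuming a physically indexed, set-associative cache. The cache geometry in the example (512KB, 2-way, 4KB pages) is hypothetical and does not describe any particular CPU, and the function names are made up for illustration.

```python
def num_colors(cache_size, associativity, page_size):
    """Number of distinct page 'colors' for a physically indexed cache.

    One way of the cache spans cache_size / associativity bytes; each
    page-sized slice of that span is one color.
    """
    return max(1, cache_size // (associativity * page_size))


def page_color(physical_addr, cache_size, associativity, page_size):
    """Color of the physical page containing physical_addr."""
    frame_number = physical_addr // page_size
    return frame_number % num_colors(cache_size, associativity, page_size)


# Hypothetical geometry: 512KB cache, 2-way set associative, 4KB pages.
CACHE, WAYS, PAGE = 512 * 1024, 2, 4096

print("colors:", num_colors(CACHE, WAYS, PAGE))
# Pages with the same color compete for the same cache sets.
for frame in (0, 1, 64, 65):
    print("frame", frame, "-> color", page_color(frame * PAGE, CACHE, WAYS, PAGE))
```

An allocator following this scheme would try to hand a process physical pages spread across colors, so that its hot pages do not all land in the same cache sets.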
So, this is one new requirement for the memory management code. No problem, we just make sure all process pages belong to one particular CPU and schedule this process to this CPU only. Everything is fast and nice. Intel is doomed. Or is it? Not so fast, all this is probably a bad idea:
We can't make sure pages on the right CPU are even available. What if they are not? Give out wrong pages? This would lead to running times which are not reproducible. This is really bad. It gets worse. What if the right CPU is not available because it's running some other process?
Probably it's best to allocate evenly distributed pages (some fast, some not so fast) to processes and not schedule them specially in any way.
Easy ;)
Anything with 64-bit doom is good enough for me :)
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
Folks,
While AMD works out the bugs of their Hammer line of CPUs, don't forget that AMD still has a card to play in terms of CPU competition with Intel: the Barton-core Athlon CPU due later this fall.
Unlike the Athlon CPU core designs since the original Thunderbird-core Athlons, the Barton-core Athlon sports a larger 512 KB L2 cache on the CPU die, which will offer dramatic performance increases, especially with memory-intensive programs. Remember, the current Thoroughbred-core Athlon CPU rated at 2600+ already has reached parity with the Intel Pentium 4 2.53 GHz part, and that's with only 256 KB of L2 cache on the CPU die and using DDR266 DDR-SDRAM! What will the Barton-core Athlon do?
If this means Barton sooner rather than later, I'm happy... although from what I've read Barton (166 MHz FSB, 512k cache) is still slated for Q1'03. Sigh.
Why? Because I'd like to get a Barton CPU for my next computer. I'm already in the waiting game for the NV30 and (to a much lesser extent) Serial ATA, so putting a better CPU on the list isn't a big deal.
Why not Hammer? Because I know better than to buy a first generation CPU with first generation motherboards. Barton is just a mild revision to a 4 year old CPU core, and the motherboards are now hitting their 6th generation (KT133, KT133A, KT266, KT266A, KT333, KT400).
For those who need the speed, power, and addressability of a 64-bit chip this announcement sucks, but for those just looking for a faster current generation chip it's not entirely bad.
Err, I have no such illusions. I expect the Hammer to be about 20-30% faster at a given clock than existing chips, which is somewhat optimistic, but entirely within the expectations for the chip. I want a 64-bit machine because there are some things in OS development that are more fun when you have 64 bits of address space. Things like single-address space operating systems and persistent virtual memory stores become feasible with 64 bits of address space while they aren't so nice to implement with only 32 bits.
A deep unwavering belief is a sure sign you're missing something...
It's probably also worth noting that Intel does NOT list the P4 as a "server processor". The P4 is listed as a desktop or workstation processor.
Quite honestly, I think workstations tend to be more floating-point intensive than servers. For example, how many floating-point calculations does 3D CAD software do vs. Sendmail or LDAP?
So, new PC customers should be buying "servers" for any graphics, mathematics, or scientific work. This only increases my dislike of Intel's marketing tactics.
Perhaps Intel should market the P4 as an administrative assistant's toy, and let the engineers and scientists go to Sun, SGI, HP, and IBM for real workstations?
Healthcare article at Kuro5hin
Could somebody give some insight why AMD is after adequacy.org?
Was it because of the humor article "Is Your Son a Computer Hacker?" where it said:
"If your son has requested a new "processor" from a company called "AMD", this is genuine cause for alarm. AMD is a third-world based company who make inferior, "knock-off" copies of American processor chips. They use child labor extensively in their third world sweatshops, and they deliberately disable the security features that American processor makers, such as Intel, use to prevent hacking. AMD chips are never sold in stores, and you will most likely be told that you have to order them from internet sites. Do not buy this chip!"
Any info would be appreciated, since adequacy.org redirects to Kuro5hin. What's AMD's beef?
By moving the memory controller onto the processor and providing communication between processors over a HyperTransport link (3.2GB/sec), memory bandwidth actually increases as more CPUs are added! This is in contrast to a normal MP system where, as more CPUs are added, there is increased competition for a fixed resource (main memory) which is already the bottleneck in many single-processor applications.
This is true only if the processors are running tasks with unrelated working sets (and if the data for each task is in that processor's memory).
If you have tasks that require memory managed by another processor, you have to go through the HyperTransport link and the other processor's memory controller to get it. This will be _slow_. HT is decent, but nowhere near as good as a direct connection to memory, and there _will_ be delays due to arbitration on the second chip and the various buffering stages the data transfer has to go through.
So, for multiple processors working on a shared workload, you're screwed.
The only way to ameliorate this is to have very smart OS-level memory management that can duplicate shared-but-not-modified pages across multiple memory banks, and both OS and processor support for update-based coherence between the banks. The hardware support for this is a bit tricky, and the OS support will be a nightmare if the OS wasn't NUMA-friendly to begin with.
And in some cases - like tasks on multiple processors competing for access to a lock or all heavily modifying the same data page - you're screwed no matter what you do.
So, don't rejoice yet. We'll only know for sure how well this will work when we have Hammer systems on our desks.
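The trade-off described above can be sketched with a toy model. Everything here is an assumption for illustration: the bandwidth figures are invented, contention on the links is ignored, and the "remote" figure simply stands in for whatever a trip through another chip's memory controller actually delivers.

```python
# Toy model: per-CPU memory bandwidth on a shared bus vs. per-CPU controllers.
# All numbers are hypothetical; this is not a measurement of any real system.


def shared_bus_per_cpu(total_bw, n_cpus):
    """Shared-bus MP: every CPU competes for one fixed pool of bandwidth."""
    return total_bw / n_cpus


def numa_per_cpu(local_bw, remote_bw, remote_fraction):
    """Per-CPU controller: average the time per byte over local and remote
    accesses, then invert to get effective bandwidth."""
    time_per_byte = (1.0 - remote_fraction) / local_bw + remote_fraction / remote_bw
    return 1.0 / time_per_byte


if __name__ == "__main__":
    LOCAL_BW = 2.0   # GB/s to a CPU's own memory (assumed)
    REMOTE_BW = 0.5  # GB/s effective via another CPU's controller (assumed)
    for n_cpus in (1, 2, 4, 8):
        bus = shared_bus_per_cpu(LOCAL_BW, n_cpus)
        print(f"{n_cpus} CPUs on a shared bus: {bus:.2f} GB/s each")
    for remote_fraction in (0.0, 0.25, 0.5, 1.0):
        bw = numa_per_cpu(LOCAL_BW, REMOTE_BW, remote_fraction)
        print(f"{remote_fraction:.0%} remote accesses: {bw:.2f} GB/s per CPU")
```

With unrelated working sets (remote fraction near zero) each CPU keeps its full local bandwidth no matter how many CPUs are added; as the remote fraction grows, the slower path dominates the average.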
I haven't tried the PPC yet, but we've considered PPC machines. Vector processing could really improve parts of our algorithms. However, we can't afford to spend time on things like AltiVec and SSE, and must remain portable.
-Paul Komarek
You can't believe how much I agree with you. I'm very tired of having to wade through "server" literature just to find a good workstation. OTOH, if you think of the workstation as a "cycle server", well .... =-)
-Paul Komarek
I now have permission to send the code, but I don't have an email address for you.
-Paul Komarek
Patience my troll, good things come to those who wait. Better things come to those who wait longer. Besides, it's not like you've never delayed anything in your life. No one is perfect...
In college, really poor, need a flatscreen.
I've received a lot of email about my test code. I obtained permission to distribute my test code, and have made a web page with it. For those who are interested, please see http://www.andrew.cmu.edu/~komarek/work/RobustCholeskyPerf/RobustCholeskyPerf.html.
-Paul Komarek
It competes against the Pentium 4. If the K7 tops out at "3000+" as I've read, AMD is going to need Hammer to compete with 3GHz+ Pentium 4s.
The whole "Wintel" relationship is mostly just a huge myth. At best, Intel and Microsoft are forced-friends.
Both Intel and Microsoft have realized for quite some time that they need each other to survive, at least for the time being, but neither of them is happy about that fact, and both are striving to change it. Microsoft has been a fairly strong supporter of AMD for some time now, while Intel was one of the first major hardware companies to jump on the Linux bandwagon.
Think about it, Intel is pretty much the dominant force in PC hardware, being not only the primary supplier of processors, but also the #1 supplier of motherboard chipsets, video solutions, and being well up there if not #1 when it comes to audio and NICs (most of this is now integrated into motherboards). They are also the driving force behind most buses and interconnects, i.e. PCI and AGP (AMD's HyperTransport is a notable exception here), not to mention the fact that they defined the ATX form factor that virtually all current PCs make use of (albeit in a somewhat bastardized format for some of the big OEMs).
Microsoft, on the other hand, is the dominant force in software, having the most common operating system, office software, web browser, e-mail program, etc. etc. ad nauseam.
The end result is that Microsoft is the only company in the PC world that Intel doesn't have a fair degree of control over (AMD fights, but ends up adopting Intel-compatible technology in the end, i.e. MMX/SSE/SSE2). Similarly, Intel is the only company that Microsoft doesn't exert a significant amount of control over in the PC world, though even there I'd say that MS mostly won that battle a while back.
Anyway, long story short, MS and Intel are not so much allies as simply two companies forced together by circumstance.
I'm concerned by this too. The central HyperTransport links in an 8-way system will be heavily contended for many workloads. (Not to mention the latency even in the absence of contention.) Something has to give.
If I understand correctly, the Sledgehammer has an extra HT link port, which will let them add extra processors as a mesh instead of a chain. The problem still occurs, but it's less crippling for large processor counts.
I'd have to double-check this feature, of course.
They may also be banking on fewer processors being used for most applications. I'm actually kind of impressed with their four-processor demo. It's large enough to be impressive, and for shared-bus schemes to bog down, but small enough that even with randomly-distributed data, really long hops will be uncommon. I don't see them competing in the Starfire/Sunfire's market any time soon, so larger systems might not be a problem.
It'll be fun to see what happens.
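The chain-versus-mesh point can be checked with a quick hop count. This is only a sketch: the 8-CPU chain and the 2x4 grid are assumed layouts chosen for illustration, not AMD's actual topology, and hops are counted with a plain breadth-first search that ignores contention and routing policy.

```python
from collections import deque


def chain(n):
    """Adjacency list for n CPUs linked in a line (at most 2 links each)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def mesh(rows, cols):
    """Adjacency list for CPUs laid out on a rows x cols grid."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    neighbours.append(rr * cols + cc)
            adj[r * cols + c] = neighbours
    return adj


def hop_stats(adj):
    """Return (average hops, worst-case hops) over all pairs of distinct CPUs."""
    total = pairs = worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst, hops in dist.items():
            if dst != src:
                total += hops
                pairs += 1
                worst = max(worst, hops)
    return total / pairs, worst


if __name__ == "__main__":
    print("8-CPU chain:", hop_stats(chain(8)))
    print("2x4 mesh:   ", hop_stats(mesh(2, 4)))
```

In this model the chain averages 3 hops with a worst case of 7, while the 2x4 grid averages 2 hops with a worst case of 4, and no CPU in the grid needs more than three links.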
How does this work architecturally? Does each CPU have its own (fast) address space and can access the address spaces of other CPUs more slowly? Or is there an intermediate translation layer, so that all of memory now looks like a cache of one big address space? Or, most likely, does the OS have to manage the logical address/CPU mapping problem as if it were virtual memory? It's not quite like paging, though, because you can still read memory on the other CPU; it's just slow. So only big or frequently accessed stuff needs to be copied. I think. The VM system is going to have to be very, very clever to manage this beast well.
Some early mainframes had memory arrangements like this. The CDC 6600, the IBM 360/90, and the UNIVAC 1110 all had both "fast" and "slow" memory. But they didn't have paged virtual memory, so the fast/slow memory thing had to be managed explicitly. Mostly, this didn't work, and as a result, mixed speed memory got a bad name.
But now, we've probably got to go there, simply because speed-of-light lag and crossbar bottlenecking limit the speed of multiprocessors.
The notion of machines with really fast memory-to-memory copying hardware has some interesting implications. It may lead to some new OS architectures for large systems. But I'm too tired to think this through.
Thanks for that link! I'm going to try to watch that thread; there seems to be at least a handful of sharp people posting there.
-Paul Komarek
I read on the LKML that SUMO is used on 8 or fewer CPUs while NUMA is used for 9+ CPUs.
If that's true, I don't think the majority of us have to worry about NUMA concerns.
Efficient use of Hyper-Threading requires proper kernel scheduler support regardless of what OS you use. Without scheduler support, you can slow yourself down big time.