AMD Delays Hammer
TeJarz writes "C|Net reports that their next processor (Hammer) has been rescheduled from its original Q4 release to Q1 2003. To quote C|Net: 'The delays are occurring to accommodate the release of a new version of Athlon with a 333MHz bus, said Crank. Current Athlons come with a 200MHz bus and 256KB of secondary cache.' Let's hope this doesn't get moved again."
Intel is in on it too. http://www.msnbc.com/news/806011.asp
A delay because of Palladium, which will be included by default starting with the Hammer. It was probably delayed because Longhorn (aka DRM-Windows) was delayed, and Longhorn is needed to actually use the cryptography in the CPU.
The biggest problem with current processors is that to design such devices we *have* to use dynamic logic. Ask any VLSI design engineer; that is no joke. In fact, many multipliers and dividers have to be hand-edited! So delays are expected, and they don't reflect badly on the designers or the companies in any way.
Before you ask: I do not work for AMD. I work at another VLSI company, which is why I say it's tough. Millions of gates, thousands of them to be hand-edited; it's a bitch. But as they say, the fruits of labour are sweet... and for AMD, Hammer is going to be the sweetest.
The hammer is a critical product for AMD that they would never delay unless there were *major* problems with it.
1) AMD is currently losing huge amounts of money. The hammer would have allowed them to sell at the high-performance end of the market again where the sales prices are higher and might have helped them reduce the flow of red ink.
2) The delay will badly hurt AMD partners such as motherboard and chipset vendors who have developed supporting products for hammer.
3) The hammer had a potential performance lead over Intel that will be greatly eroded by the time it finally appears.
4) Critical software development for hammer will be slowed which will slow eventual market acceptance of hammer.
5) The delay will build momentum for Itanium.
6) The delay will greatly reduce the pressure on Microsoft to support hammer and will give Microsoft the opportunity to also build momentum for Itanium. Depending on market conditions when the hammer finally appears, it is now even possible that Microsoft will never need to support hammer.
7) This delay is so serious that it creates real doubts that hammer will *ever* be a viable product.
Technically it's 133, just double-pumped (obviously the 533 P4s don't actually have a 533 MHz FSB; that would be just silly). I hope they are referring to the base speed in this case as well.
I live in Austin, and have friends who work at AMD. AMD may make a great processor, but their motherboards suck because the motherboard testing department's manager tries very hard not to find any bugs. (Test stuff that you know will work. Never install an OS, just use a ghost image of preinstalled windows XP copied through the network onto the hard drive. Testing with linux is a no-no, because you actually find reproducible bugs in the hardware! We can't have that, we're a testing department...)
At least one woman was fired for making a Linux test CD and distributing it internally around the company, against that manager's wishes. Her name's on the test CD, and it was still being used inside AMD last week, but she answered too many Linux questions for people outside her department and as such was labeled "not a team player" in the internal politics. As far as I can tell, that was the most knowledgeable linux person they had anywhere near that area.
AMD makes great processors, but until they get a new motherboard testing department, they'll have nothing to put them in.
In case not everyone knows what "double pumped" or "DDR FSB" means, let me explain. The clock that sets how often data is transferred ticks over and over to keep the pace. An Athlon transfers data twice per tick; a Pentium 4 transfers data four times per tick. Right now most Athlons run at a 133 MHz "DDR FSB". Mine already runs at 166 MHz (overclocked, of course), and let me tell you, it's sweet. I can't wait to see everyone have access to 166 MHz FSB Athlons.
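The arithmetic behind these bus names is just base clock times transfers per tick. Here is a minimal sketch: the pumping factors (2 for DDR, 4 for the P4) come from the posts above, while the 64-bit data bus width is an assumption used only for the bandwidth column.

```python
# Sketch: effective FSB rate = base clock x data transfers per clock tick.
# Pumping factors follow the post: 2 for the Athlon's DDR bus, 4 for the
# Pentium 4's quad-pumped bus. The 64-bit data bus width is an assumption
# used only for the bandwidth figure.

PUMPING = {"SDR": 1, "DDR": 2, "QDR": 4}


def effective_mt_per_s(base_clock_mhz: float, scheme: str) -> float:
    """Millions of transfers per second for a given bus scheme."""
    return base_clock_mhz * PUMPING[scheme]


def peak_bandwidth_mb_per_s(base_clock_mhz: float, scheme: str,
                            bus_width_bits: int = 64) -> float:
    """Peak data bandwidth in MB/s (10^6 bytes per second)."""
    return effective_mt_per_s(base_clock_mhz, scheme) * bus_width_bits / 8


if __name__ == "__main__":
    # "133" and "166" MHz are rounded names for 133.33 and 166.67 MHz.
    for name, clock, scheme in [
        ("Athlon, 100 MHz DDR", 100.0, "DDR"),
        ("Athlon, 133 MHz DDR", 400 / 3, "DDR"),
        ("Athlon, 166 MHz DDR", 500 / 3, "DDR"),
        ("Pentium 4, 133 MHz QDR", 400 / 3, "QDR"),
    ]:
        print(f"{name}: {effective_mt_per_s(clock, scheme):.0f} MT/s, "
              f"{peak_bandwidth_mb_per_s(clock, scheme):.0f} MB/s peak")
```

The 100 MHz and 166 MHz DDR rows correspond to the "200MHz" and "333MHz" buses in the C|Net quote, and the last row to the Pentium 4's "533" FSB.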
The P4's x87 FPU and x86 ALU are just plain slow compared to P3s and Athlons. Though I am surprised your code is running 82x slower. I'd expect more like 2-8x slower for compute bound code. You can get a somewhat sensationalistic overview of why it's so slow at this link.
If you want more in-depth numbers you can compare appendix C of the Intel Pentium 4 Optimization Manual with chapter 29 of Agner Fog's Pentium/II/III Optimization Manual. You can see the Athlon numbers in Appendix F of AMD's Athlon Optimization Manual.
If you want to do number crunching with Pentium 4s, your best bet is to use the SSE2 instructions and registers. You should be able to get a noticeable speedup by using the Intel C++ compiler and telling it to use SSE2 instructions. If you want to eke out maximum performance you'll have to use assembly language, though you can probably get most of the way there using the Intel C++ Compiler's SSE2 intrinsics.
I'm curious as to why your code is so much slower on a P4 than on an Athlon. The best way to find out would be to look at the assembly code that gcc is producing. You can do that by using gcc's -S option. If you'd like send me the C code and the output from -S and I'll see if I see anything obvious.
I'm somewhat paranoid about posting my email address. My paranoia seems to work, as I've received no more than the occasional spam in the last few years. My email address is my slashdot user name at woh.rr.com.
If Intel creates demand for IA64, how in the world is AMD going to sell Hammers that are based on a non-compatible ISA?
Well, it's not obvious how, and that's why the method is patented. Their solution has something to do with two phase locked clocks or whatever, iirc.
Latency.
With single data rate a new address can be sent every clock for all memory requests.
With double data rate, a new address can still be sent only once per clock, i.e. on every other clock edge, while data moves on every edge. Effectively this means transferring double the data for each request, while the number of requests doesn't change.
This isn't a very serious problem, since single bytes or bus-width chunks of data aren't usually transferred; whole cachelines of 32/64 bytes are. These generate sequential bursts of 4/8 transfers, nullifying most of the potential latency problem caused by the "half-clocked" address generation.
OK, so why the addresses can't be sent like the data is another question, which someone with more knowledge might explain. Maybe it would complicate things too much, since the request-answer mechanism has to be pipelined to keep accepting new requests until previous requests are served. Or maybe the physical bus has some limitation, like using the same pins for address and data, which would simply make it impossible to send new addresses (on the falling edge of the clock) while receiving data.
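To see why one address per clock is usually enough, here is a minimal sketch counting data beats per cacheline fill. It assumes an 8-byte (64-bit) data bus and one address phase per request; neither figure is stated in the post.

```python
# Sketch: data beats per cacheline fill versus the single address phase.
# Assumptions (not from the post): an 8-byte data bus, one address per
# request, and an address bus running at single data rate.


def beats_per_line(line_bytes: int, bus_bytes: int = 8) -> int:
    """Data transfers needed to move one cacheline (ceiling division)."""
    return -(-line_bytes // bus_bytes)


def clocks_per_line(line_bytes: int, transfers_per_clock: int,
                    bus_bytes: int = 8) -> float:
    """Bus clocks that the data phase of one cacheline fill occupies."""
    return beats_per_line(line_bytes, bus_bytes) / transfers_per_clock


if __name__ == "__main__":
    for line in (32, 64):
        for tpc, label in ((1, "SDR"), (2, "DDR")):
            print(f"{line}-byte line, {label}: {beats_per_line(line)} beats, "
                  f"{clocks_per_line(line, tpc):.0f} clocks of data per address")
```

Even with DDR data, a 32-byte line keeps the data bus busy for two clocks per address, so the address bus sits idle at least half the time.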
Essentially this would be a NUMA (non-uniform memory access) system. As far as I know, Linux 2.6 will have support for these systems.
In a real NUMA machine there would be a hierarchy of clusters of processors. Each cluster functions a bit like a traditional SMP system, but the clusters are interconnected over "low"-bandwidth busses. This makes memory accesses across clusters slower than direct accesses into the clusters' memory.
Both the VM and the scheduler will have to know about this.
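A back-of-the-envelope sketch of why page placement matters: the expected latency is a weighted mix of local and remote accesses. The latencies and local-hit fractions below are made-up illustrative numbers, not measurements of any real machine.

```python
# Sketch: expected memory latency on a NUMA machine.
# LOCAL_NS and REMOTE_NS are hypothetical illustrative values.

LOCAL_NS = 100.0   # assumed latency to the CPU's own cluster memory
REMOTE_NS = 300.0  # assumed latency across the cluster interconnect


def average_latency_ns(local_fraction: float,
                       local_ns: float = LOCAL_NS,
                       remote_ns: float = REMOTE_NS) -> float:
    """Expected latency when `local_fraction` of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Pages spread evenly over 4 nodes (25% local) versus a hypothetical
    # NUMA-aware VM and scheduler keeping 90% of accesses local.
    for frac in (0.25, 0.90):
        print(f"{frac:.0%} local -> {average_latency_ns(frac):.0f} ns average")
```

The gap between the two rows is what the VM (placing pages) and the scheduler (keeping tasks near their pages) are trying to close.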
Another point with NUMA systems is the possibility of gaps in main memory (discontiguous memory). Kernel hackers are currently working on support for that (the discontigmem patch, merged in 2.5.34).
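The core idea of discontiguous-memory support can be sketched as a lookup over populated page-frame ranges with holes between them. The ranges, page size, and function names here are hypothetical illustrations, not the kernel's actual data structures.

```python
# Sketch: physical memory as sorted, half-open page-frame ranges with holes.
# Ranges and function names are hypothetical; this is not kernel code.

from bisect import bisect_right

PAGE_SIZE = 4096  # bytes; a common x86 page size

# 1 GiB starting at address 0, a hole, then 1 GiB starting at 4 GiB.
RANGES = [(0x00000, 0x40000), (0x100000, 0x140000)]
_STARTS = [start for start, _ in RANGES]


def pfn_is_present(pfn: int) -> bool:
    """True if the page frame number lies inside a populated range."""
    i = bisect_right(_STARTS, pfn) - 1
    return i >= 0 and pfn < RANGES[i][1]


def present_bytes() -> int:
    """Total populated memory, not counting the holes."""
    return sum(end - start for start, end in RANGES) * PAGE_SIZE


if __name__ == "__main__":
    for pfn in (0x10, 0x40000, 0x100010):
        print(hex(pfn), pfn_is_present(pfn))
    print(present_bytes() // 2**30, "GiB present")
```

A flat array indexed by page frame number would waste entries on the hole; checking ranges is what lets the memory map skip it.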
The only situation in which book value means anything is when a company is going to be taken over and sold off in pieces, as happened a lot in the '80s. Even then it is not worth what you might hope. For instance, if AMD were in trouble there would be very few possible buyers for their factories, and those buyers won't buy assets at book value; they will wait for AMD to get into worse trouble and try to get a bargain. Many investors have been burned when their holdings were liquidated below book, and creditors were first in line for the proceeds.
When liquidation is not a current risk, how should you value a company? The economist's answer is that the value of a company is the estimated present value of the future returns from owning it. Those returns come out of profits. You shouldn't care much whether those profits are reinvested in the company (for bigger profits later), paid out as dividends (immediate cash) or spent on stock buybacks (raising the stock price to pay you back indirectly).
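The "present value of future returns" definition can be made concrete with a small discounted-cash-flow sketch. Yearly timing, a constant discount rate, and the example figures are assumptions for illustration only.

```python
# Sketch: value = present value of future returns (discounted cash flow).
# Assumptions: returns arrive once a year and the discount rate is constant.


def present_value(cash_flows: list[float], discount_rate: float) -> float:
    """Discount yearly returns (year 1, 2, ...) back to today."""
    return sum(cf / (1 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def perpetuity_value(annual_earnings: float, discount_rate: float) -> float:
    """Limit of present_value for the same earnings forever: E / r."""
    if discount_rate <= 0:
        raise ValueError("discount_rate must be positive")
    return annual_earnings / discount_rate


if __name__ == "__main__":
    # Hypothetical: $1 of earnings per share per year, discounted at 6%.
    print(round(present_value([1.0] * 10, 0.06), 2))  # ten years of earnings
    print(round(perpetuity_value(1.0, 0.06), 2))      # earnings forever
```

For earnings that simply continue unchanged forever, value = E / r, so a "fair" price-to-earnings ratio is 1 / r. Under that no-growth assumption, the 15-20 P/E range discussed here corresponds to a discount rate of roughly 5-6.7%.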
Given that the present is the best guide we have to the future, the best estimate of future profits is current profits (also called earnings). So the ratio between the price of the stock and current earnings is a pretty good guide to how over- or underpriced a company that you expect to continue along is. The importance of this ratio is justified by basic economics, whether the company has huge fixed investments that show up as a large book value (e.g. Intel) or has low inventories and depends on rapid turnover with many small profits (e.g. Dell).
This ratio will, of course, mean less if you expect to see the company sold off in pieces, or if the company is growing and you expect larger future profits. But overall in the long term you tend to see most companies with P/E ratios in the range 15-20. Now let us look at the P/Es of the companies you listed:
AMD: N/A (no earnings!)
Dell: 38.33
Intel: 18.45
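A minimal sketch of the comparison being made: compute P/E as price over earnings per share, report N/A when there are no earnings, and compare against the 15-20 long-run range. The function names are illustrative; only the two P/E values in the example come from the list above.

```python
# Sketch: P/E ratio and a comparison against the 15-20 long-run range.

from typing import Optional

TYPICAL_LOW, TYPICAL_HIGH = 15.0, 20.0


def pe_ratio(price: float, earnings_per_share: float) -> Optional[float]:
    """Price / trailing earnings per share; None when there are no earnings."""
    if earnings_per_share <= 0:
        return None  # reported as N/A, like AMD above
    return price / earnings_per_share


def describe(pe: Optional[float]) -> str:
    """Classify a P/E against the typical range."""
    if pe is None:
        return "N/A (no earnings)"
    if pe > TYPICAL_HIGH:
        return f"{pe:.2f}: above the typical range"
    if pe < TYPICAL_LOW:
        return f"{pe:.2f}: below the typical range"
    return f"{pe:.2f}: within the typical range"


if __name__ == "__main__":
    for name, pe in (("AMD", None), ("Dell", 38.33), ("Intel", 18.45)):
        print(name, describe(pe))
```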
This means that unless you expect AMD to turn around, it isn't worth much. Dell is historically overpriced, but by a much smaller factor than book value led you to think, and Intel is at a historically reasonable price. Of course the current market is still historically overvalued. (You may wonder how stocks could be overvalued after several years of being hammered. Well, that shows the size of the bubble that preceded it.)
An excellent introduction to how this all works is How to Buy Stocks by Louis Engel.
Of course the wise technology investor will focus on 2 questions when it comes to chip companies. The first is who will survive to see the standard consumer PC face the 32-bit barrier. The second is whose 64-bit strategy is more likely to win in the market.
I don't know if AMD will survive. If they do, then their 64-bit strategy is much better. Intel will definitely survive but breaking backwards compatibility is a darned big iceberg for the Itanic. Transmeta is in serious trouble, but they have the best 64-bit strategy. (It is, "We can ship whatever wins!" They literally implemented AMD's instruction set before AMD could.)
The additional latency required to synchronize the address with the rising edge rather than either edge is negligible when you consider the total amount of time required to perform the fetch from L3 or L4. Therefore there is no need to endure the more complex design to implement this.
Most data is fetched in bursts. So there are typically 4 or more data phases per request. Consequently, there is no need for as high bandwidth for the address bus as for data.
Plus, as another post said, it reduces the power requirements. This, combined with the fact that there are typically 4 or 8 data transfers per address is why P4 has gone to QDR buses. This way, there is one address per cycle, and an entire 4-unit burst can be completed in a cycle so the address bus could theoretically be completely saturated. Once you pass QDR (to Octal DR?), you may start requiring a higher data rate on the address bus as well for performing two 4-unit bursts per cycle.
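The saturation argument can be checked with a few lines of arithmetic. This sketch assumes, as the post does, one address per request, a 4-transfer burst, and an address bus that carries one address per clock; back-to-back requests are also assumed.

```python
# Sketch: how busy a one-address-per-clock bus gets when every request
# fetches a fixed-length burst. Assumes back-to-back requests.


def address_slots_needed_per_clock(data_transfers_per_clock: int,
                                   burst_length: int = 4) -> float:
    """Requests per bus clock needed to keep the data bus fully busy."""
    return data_transfers_per_clock / burst_length


def address_bus_utilization(data_transfers_per_clock: int,
                            burst_length: int = 4,
                            address_rate: int = 1) -> float:
    """Fraction of address-bus capacity (address_rate per clock) used."""
    needed = address_slots_needed_per_clock(data_transfers_per_clock,
                                            burst_length)
    return needed / address_rate


if __name__ == "__main__":
    for label, tpc in (("SDR", 1), ("DDR", 2), ("QDR", 4), ("octal", 8)):
        util = address_bus_utilization(tpc)
        note = "  <- needs a faster address bus" if util > 1 else ""
        print(f"{label}: {util:.0%} of address bus{note}")
```

QDR lands exactly at 100%, and an octal data rate would need two addresses per clock, matching the post's reasoning.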