Slashdot Mirror


AMD Delays Hammer

TeJarz writes "C|Net reports that their next processor (Hammer) has been rescheduled from its original Q4 release to Q1 2003. To quote C|Net: 'The delays are occurring to accommodate the release of a new version of Athlon with a 333MHz bus, said Crank. Current Athlons come with a 200MHz bus and 256KB of secondary cache.' Let's hope this doesn't get moved again."

7 of 346 comments (clear)

  1. Good by Billly+Gates · · Score: 3, Informative

    A delay from palladium which will be included by default starting with the Hammer. It was probably delayed because longhorn aka drm-Windows was delayed and its needed to actually use the cyptography in the cpu.

    1. Re:Good by Billly+Gates · · Score: 5, Informative
      Oops I forgot to include this from the faq.



      Q: Can Linux, FreeBSD or another open source OS run on "Palladium" hardware?

      A: Virtually anything that runs on a Windows-based machine today will still run on a "Palladium" machine (there are some esoteric exceptions[1]). If you currently have a machine that runs both Linux and Windows, you would be able to have that same functionality on a "Palladium" machine.

      The exceptions are here



      [1] These exceptions include the following:

      1.)Some debuggers may need to be updated to work in the "Palladium" environment, but they can still work.

      2.)Some special performance tools may need to be updated.

      3.)Software that writes directly to TCPA hardware will need to be updated.

      4.)Memory scrub routines (at the hardware level) will need attention.

      5.)Third-party crash dump software may need to be updated.

      6.)BIOS mode hibernation features will need to be updated to work with "Palladium."



      Its these 6 reasons why palladium is still beta and why AMD is probably waiting before releasing Hammer.

  2. the other side... by tanveer1979 · · Score: 3, Informative
    Lot of posts are screaming "again, again"... but the fact is a 64 bit processor is one devil to design.
    The biggest problem with current processors is that to design such devices we *have* to use dynamic logic. Ask any VLSI design engineer.. that is no joke. Infact many multipliers and dividers have to be hand edited! So delays are expected and it does reflect upon the desigers and companiesd in any way.

    Before you ask.. I do now work for AMD, i work in another VLSI company, thats why i say.. its tough. Millions of gates thousands to be hand edited its a bitch.. but as they say the fruits of labour are sweet... and for AMD hammer is going to be the sweetest

    --
    My Aurora : http://www.youtube.com/watch?v=o91ZsGwJYyg
    FB : https://www.facebook.com/TanveersPhotography
  3. Re:Current Athlons by packeteer · · Score: 4, Informative

    In case everyone doesn't know what "double pumped" or "DDR FSB" mean let me explain. The clock that sets how often data is transfered clicks over and over to keepo the pace. On an Athlon it transfers data twice for every click. On a Pentium 4 its 4 times a click. Right now most Athlons run at 133mhz "DDR FSB". Mine already runs at 166mhz (overclocked of course) and let me tell you its sweet. I cant wait to see everyone have access to 166 mhz FSB Athlons.

    --
    unzip; strip; touch; finger; mount; fsck; more; yes; unmount; sleep
  4. Why Pentium IVs are slow by stewartjm · · Score: 4, Informative

    The P4's x87 FPU and x86 ALU are just plain slow compared to P3s and Athlons. Though I am surprised your code is running 82x slower. I'd expect more like 2-8x slower for compute bound code. You can get a somewhat sensationalistic overview of why it's so slow at this link.
    If you want more in-depth numbers you can compare appendix C of the Intel Pentium 4 Optimaztion Manual with chapter 29 of Agner Fog's Pentium/II/III Optimization Manual. You can see the Athlon numbers in Appendix F of AMD's Athlon Optimization Manual.
    If you want to do number crunching with Pentium 4s your best bet is to use the SSE2 instructions/registers. You should be able to get a noticable speedup by using the Intel C++ compiler and telling it to use SSE2 instructions. If you want to eek out max performance you'll have to use assembly language. Though you can probably get most of the way there using the Intel C++ Compiler's SSE2 intrinsics.
    I'm curious as to why your code is so much slower on a P4 than on an Athlon. The best way to find out would be to look at the assembly code that gcc is producing. You can do that by using gcc's -S option. If you'd like send me the C code and the output from -S and I'll see if I see anything obvious.
    I'm somewhat paranoid about posting my email address. My paranoia seems to work, as I've received no more than the occasional spam in the last few years. My email address is my slashdot user name at woh.rr.com.

  5. Re:is there a real difference? by kimmo · · Score: 5, Informative

    Latency.

    With single data rate a new address can be sent every clock for all memory requests.

    With double data rate a new address can be send with every other "clock", but while data transmission rate stays the same. Effectively this means transferring double data for each request, while the amount of requests doesn't change.

    This isn't very serious problem, since single bytes/bus wide data aren't usually transferred, but whole cachelines of 32/64 bytes. They will generate 4/8 sequential burst requests nullifying much of the "halfclocked" address generation potential latency problems.

    Ok, so why can't the addresses be sent like the data is another question which someone else with more knowledge might explain.. Maybe it would complicate things too much since the request-answer mechanism should be pipelined to accept new requests until previous requests are served. Or maybe the physical bus has some limitations, like using the same pins for address/data, which would simply make it impossible to send new addresses simultaneously (on falling edge of clock) while receiving data.

  6. Re:short rant and a question by Erik+Hensema · · Score: 4, Informative

    Essentially this would be a NUMA system (non-uniform memory architecture). As far as I know Linux 2.6 will have support for these systems.

    In a real NUMA machine there would be a hierarchy of clusters of processors. Each cluster functions a bit like a traditional SMP system, but the clusters are interconnected over "low"-bandwidth busses. This makes memory accesses across clusters slower than direct accesses into the clusters' memory.

    Both the VM and the scheduler will have to know about this.

    Another point with NUMA systems is the possibility of gaps in the main memory (discontinues memory). Kernel hackers are currently working on support for that (discontigmem patch, merged in 2.5.34).

    --

    This is your sig. There are thousands more, but this one is yours.