Slashdot Mirror


AMD64 Preview

Araxen writes "Over at Anandtech.com they have an interesting preview of AMD's 64-bit processor on an nForce3 mobo. The results are very impressive, with the Athlon64 soundly beating Intel's best P4 processor in their gaming benchmarks. This was only in 32-bit mode, no less! I can't wait for 64-bit benchmarks to come out!"

33 of 290 comments

  1. Opteron Benchmarks, not Athlon 64 by ultor · · Score: 5, Informative

    The benchmarks are from a 2 GHz Opteron, not an Athlon 64. It is intended to give an example of the performance from the new chip. Unfortunately, at introduction, only the Athlon FX, running on ECC memory, will be capable of using dual-channel memory. And from what I've heard, this CPU will cost $600 or more. The first non-ECC dual-channel platform will be introduced in 2004.

  2. Not an Athlon64, but an Opteron by doormat · · Score: 4, Informative

    Anandtech is only comparing single-processor Opteron performance against everything else, not, in fact, Athlon64 performance. The primary difference is that the Opteron has a dual-channel memory subsystem, whereas the Athlon64 has a single-channel system. This difference will have an effect on performance.

    --
    The Doormat

    If you're not outraged, then you're not paying attention.
    1. Re:Not an Athlon64, but an Opteron by heli0 · · Score: 4, Informative

      There will actually be two lines at launch. The 940-pin Athlon64 FX (a 1-way Opteron) will have dual-channel DDR, while the cheaper 754-pin Athlon64 will have single-channel DDR.


      Athlon64 Showing Up
      Pricing for Athlon 64 leaks: 939 pin chip won't be compatible with 940 CPU

      --
      Whenever the offence inspires less horror than the punishment, the rigour of penal law is obliged to give way...
    2. Re:Not an Athlon64, but an Opteron by MBCook · · Score: 5, Informative
      Your comment is somewhat misleading. There are TWO Athlon 64s being launched (or as Overclockers.com called them, the "Opteron that's not an Opteron but is an Opteron" and the "Opteron that's really not an Opteron", or something like that). Anandtech compared the equivalent of an Athlon64 FX, not an Athlon64. Here is the skinny:

      Athlon64 FX
      This is a 1xx Opteron. It's still dual channel, and it uses ECC memory (for now?). This is the "performance" part, the high-end one. If we're trying to find who has the fastest CPU, this is the one to test. Their tests are quite valid for this, IMHO.

      Athlon64
      This is the "budget" Athlon64. It only has one memory channel; I don't know if it has ECC or not. Yes, this will be slower, but it will also be cheaper, and the motherboards for it could be cheaper too (since it doesn't have that second memory channel).

      So, I think that this is a very important article. Look how fast an Opteron/Athlon64 FX is compared to a P4. A 2 GHz Opteron/Athlon64 FX is beating a 3 GHz P4. This is all on a 32-bit OS and software. When you run 64-bit software that knows about all the extra registers and can do 64-bit math natively should it need to, the computer will be fast. Tim Sweeney said that native versions of UT2003 (or something) were running up to 20% faster on x86-64 without optimisations, just from going to 64-bit mode. And for most of us, the fact that it can easily manage over 4GB of memory is, for now, only icing on the cake.

      AMD has a great processor. I can't wait to see more info on these things. The fact that it does so well in 32-bit mode is important, since you currently can't get Windows for the processor (there is no x86-64 version of Windows out yet). If it was a great processor, but you were forced to get terrible performance for 6+ months after buying one (because it wasn't good with 32-bit software like Windows and what you run), would anyone buy it? This thing is faster today, and should only get faster when you run native software. I'm saving my pennies (and yes, I know it will take a lot of pennies ;).

      --
      Comment forecast: Bits of genius surrounded by a sea of mediocrity.
    3. Re:Not an Athlon64, but an Opteron by sirsnork · · Score: 5, Informative

      All Opterons are made on the same line; they all have the exact same core. They are then tested to see if they work in 8-way; if not, they are tested for 4-way, and if they fail that they are tested for 2-way. In 2-way they only use 2 of the 3 HT controllers, and only one talks to another CPU (the second connects to the HT controller). In a 4-way config the CPUs use 2 HT controllers to talk to CPUs, and one of them uses the third to talk to an HT controller (in fact, sometimes they have 2 CPUs talking to HT controllers, one for PCI-X and the other for all the rest). Finally, in 8-way, 4 of the CPUs use all 3 of their HT controllers and the 4 at the edges only use 2, but again some also have to talk to the HT controller(s) on the "outside".

      --

      Normal people worry me!
    4. Re:Not an Athlon64, but an Opteron by Jeff+DeMaagd · · Score: 3, Informative

      I don't know how this fits with your listing, but it looks like there might be another derivative budget chip at or below A64 that might not have a 64 bit mode at all:


      AMD to ship Athlon 64s as Athlon XPs

      I do find it amusing that people are commenting how good something is or is not before the damn product has been released, particularly when there is so little hard information on what it will really amount to.

      So far one difficulty I see is the lack of Hammer boards that have AGP _and_ PCI-X slots or at least 64 bit/66MHz PCI slots, and they commented on this in that review last I checked. I think part of the assumption was that because these systems are for servers, AGP isn't needed, or if AGP is needed, it was assumed that PCI I/O slots weren't that critical.

  3. Re:Interesting by Dun+Malg · · Score: 3, Informative
    I just wonder if it can compete with the Intel x86-64 line of processors.

    Huh? There's no such thing as an "Intel x86-64" processor. x86-64 is solely AMD's implementation.

    --
    If a job's not worth doing, it's not worth doing right.
  4. Re:64-benchmarks wont be good by Slack3r78 · · Score: 4, Informative

    I actually read this this morning, and there are a couple of important things to note - the chip being 'previewed' isn't actually an Athlon64 - it's a 1.8GHz Opteron overclocked to 2.0GHz, which is the expected clock rate of the first A64, PR-rated at 3200+. It'll give us an idea of what to expect, but nothing too specific.

    The other important thing to note is that the comparisons were mostly against P4s and an Athlon XP, with a dual 3.06GHz Xeon thrown in for good measure, all 32-bit chips. And the 'Athlon64' owned most of the competition, showing that its 32-bit mode is just as good as rumored. There were no Itaniums in the competition, so only 32-bit modes can be compared here. However, if the A64 turns out to be as good in its native 64-bit mode as the 32-bit numbers might lead you to believe, the Athlon 64 looks like it very well could be a force to be dealt with.

  5. Semantics, maybe, but... by Murdock037 · · Score: 3, Informative

    Intel doesn't have an x86-64 line of processors. They have an IA64 line of processors.

    The two apparently aren't interchangeable. There's a coming battle in which software companies have to choose between the two, or support both, which would be tough on both them and consumers.

    Apparently, AMD's x86-64 set is easier to deal with, and more of a natural progression from where the processors are now. (It also apparently runs 32-bit code at rates comparable to 32-bit chips at the same clock speed.) Intel's IA-64 is a total reworking, and a bitch to work with, from what I've read.

    In the end, it seems like the smart choice would be for everybody to toss their hat in with x86-64 (which means Intel would have to, as well, and essentially concede defeat and lose face); it probably won't happen, though, because Intel is Intel.

    Check out this article at the Inquirer, which I've basically just paraphrased, but it does go into some interesting Windows 64 dealings.

  6. 64bit performance gains... by Natalie's+Hot+Grits · · Score: 5, Informative

    Before anybody starts talking about how little 64-bit CPUs actually increase performance, let me tell everyone what 64-bit mode will actually bring to the table over the Opteron/Athlon64 32-bit modes:

    1) More registers. This will get us a fair performance increase from the start, as compilers will have more registers to work with when doing calculations on multiple pieces of data.

    2) Support for larger system memory sizes. This won't help you in video games, but it will help you with high-end Photoshop work and other applications (provided you spend the money to put more memory into your system).

    3) Native operations on 64-bit data. Typically, when you want to do operations on a 64-bit integer on a 32-bit CPU, you have to split up the work in software. Now, with 64-bit registers, you will be able to do operations on 64-bit integers in the same time as it takes to do the same operation on a 32-bit integer.

    4) When using native 64-bit mode, certain legacy instructions of x86-32 are deprecated. This is a cleanup for the x86 ISA, which in the past has contained literally EVERYTHING that the previous generation of CPU supported. AMD's x86-64 ISA eliminates these legacy features and moves them into firmware emulation (don't worry, it won't degrade any modern 32-bit code, just terribly outdated stuff from the 386 days, which doesn't need 2GHz of power in the first place).
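
    To make point 3 concrete, here is a rough sketch of what "splitting up the work in software" means for a 64-bit add on a 32-bit machine. It is Python standing in for what a compiler would emit (roughly an add followed by an add-with-carry); the function name is made up for illustration and nothing here comes from the article.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit register's worth of bits


def add64_with_32bit_ops(a, b):
    """Add two unsigned 64-bit values using only 32-bit-wide pieces."""
    # Split each operand into a low and a high 32-bit half.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # First add the low halves and remember the carry out of bit 31.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # Then add the high halves plus the carry, wrapping at 64 bits total.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo
```

    With 64-bit registers the whole function collapses into a single add instruction, which is the saving described in point 3.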

    On top of these performance enhancements that 64 bit mode brings you, you get all of this just because you are using AMD's Opteron/Athlon64 CPU:

    1) Dual channel DDR Memory interface, with memory controller on the die of the CPU. This reduces latency and improves memory bandwidth so dramatically that even Intel's off die memory controller can't keep up (this is why video games are so much faster on the amd64 platform than on athlon-32 platform)

    2) HyperTransport bus to the south bridge, which will give high bandwidth access to the PCI bus, PCI-X, and other IO intensive controllers. Eventually AGP slots will be phased out for PCI-X slots which will be universal for both video, and other devices.

    3) When using multiple CPUs in the same system, the new AMD-64 platform gives you dedicated memory bandwidth to each CPU installed. On the Intel and athlon-32 platforms, all the CPUs in the system share the same memory controller, which runs either single- or dual-channel DDR anywhere from 266MHz to 400MHz.

    --
    Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.
    1. Re:64bit performance gains... by p3d0 · · Score: 4, Informative
      Nice summary. I would only add a couple of things:
      • 64-bit math on IA32 requires register pairs. With 8 GPRs, one of which is reserved for the stack pointer, that means you can only keep 3 long-longs in registers. On AMD64, even if you dedicate another register to the frame pointer, you can still get 14 long-longs in registers: almost a factor of 5 improvement.
      • The benefits from the memory subsystem will be offset by the fact that objects containing pointers will be twice as big as on IA32. That means objects could have twice the cache footprint and twice the memory bandwidth requirements.
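
      The register arithmetic in the first bullet can be checked with a few lines. This is only a sketch of the counting argument; the helper name is made up, and the reserved-register counts are the ones assumed in the bullet (stack pointer on IA32, stack pointer plus frame pointer on AMD64).

```python
def long_longs_in_registers(total_gprs, reserved, regs_per_value):
    """How many 64-bit values fit in the general-purpose registers."""
    return (total_gprs - reserved) // regs_per_value


# IA32: 8 GPRs, stack pointer reserved, each 64-bit value needs a register pair.
ia32 = long_longs_in_registers(total_gprs=8, reserved=1, regs_per_value=2)

# AMD64: 16 GPRs, stack and frame pointers reserved, one register per value.
amd64 = long_longs_in_registers(total_gprs=16, reserved=2, regs_per_value=1)

print(ia32, amd64, round(amd64 / ia32, 2))
```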
      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    2. Re:64bit performance gains... by p3d0 · · Score: 4, Informative
      I meant data objects, as in object-oriented programming. Not object files. OO data tends to have a lot of pointers.

      Having said that, object files will be bigger too. I'm not sure where you're getting your 10-15%; have you actually checked? I don't have access to our AMD64 boxes right now so I can't take a look at the object files, but I think the difference could easily be more than that for object-oriented code, for a number of reasons:

      • Probably 2/3 of the instructions in hot code will need a REX prefix, either because they use registers r8-r15, or because they manipulate addresses.
      • Only mov instructions can use an 8-byte immediate. Anything else that needs an 8-byte immediate must load that immediate into a register first with a 10-byte mov instruction, possibly spilling whatever was in that register. We could be talking about 3 extra instructions totalling maybe 18 bytes on an instruction that used to occupy maybe 6 bytes. Class tests in a polymorphic inline cache are particularly affected by this. Also, relocations (i.e. jumps between different DLLs) must be 64 bits because there's no reason to think DLLs will be loaded within a 32-bit offset of each other.
      • Autos that are pointers now occupy twice as much stack space, making your stack frame that much less likely to fit within an 8-bit signed offset (i.e. 127 bytes). That means you can't use [esp+12h] addressing to access your locals, but rather [rsp+12345678h], which requires three extra bytes (not to mention the REX prefix). Highly optimized functions often have lots of variables, especially after inlining, and in OO code, lots of the variables are pointers, so this one could hurt.
      • Similarly, the AMD64 linkage convention on Linux has 6 parameters passed in registers (while IA32 has none), which also makes the stack frame bigger. This can be mitigated by using a frame pointer, but if you don't dedicate a register as a frame pointer, then you need to access your parameters with the stack pointer (rsp), and the parameters are always at the largest offsets from rsp. Result: parameters are likely not to be reachable with an 8-bit offset from rsp.
      If I had to estimate off the top of my head, I'd guess code would be more like 25% bigger, while OO data could be as much as 50% bigger. (Remember that each object contains a pointer to its class or vft, and many object fields are pointers.)
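
      For a feel of where a figure like 50% for OO data can come from, here is a small sketch using Python's struct module to size a made-up object: one class/vft pointer, two pointer fields, and three 32-bit ints. The field mix is purely an assumption for illustration, and the layout is packed, so real compilers may add alignment padding on top.

```python
import struct


def object_size(pointer_code, n_pointer_fields, n_int32_fields):
    """Packed size in bytes of a hypothetical object.

    pointer_code is "I" for 4-byte pointers or "Q" for 8-byte pointers.
    The leading "<" selects standard sizes with no alignment padding.
    """
    # One extra pointer for the class/vft pointer every object carries.
    layout = "<" + pointer_code * (1 + n_pointer_fields) + "i" * n_int32_fields
    return struct.calcsize(layout)


size32 = object_size("I", n_pointer_fields=2, n_int32_fields=3)
size64 = object_size("Q", n_pointer_fields=2, n_int32_fields=3)
growth = size64 / size32 - 1

print(size32, size64, f"{growth:.0%}")
```

      An object made of nothing but pointers would double in size, and one with no pointer fields at all would only grow by the class pointer, so the result depends heavily on the mix.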
      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    3. Re:64bit performance gains... by Performer+Guy · · Score: 2, Informative

      1) Is NOT an innate property of a 64-bit processor. You could build a 32-bit processor with more registers.

      2) is valid

      3) True but almost no software I use does much 64 bit type processing.

      4) You could do this with a compiler if the instruction is slow; you don't save die area because you need to support it in 32-bit mode.

      You missed out the biggest winner, the massive cache on this processor, 1MB I believe, that's a big step up.

      You put a cache that size on a 32 bit Athlon and you will see some big improvements.

      I don't think it's right to say 64 bit is inherently faster, if your application needs it then yes, but for 32 bit class apps, 32 bit mode is faster.

  7. Re:64-benchmarks wont be good by kenneth_martens · · Score: 2, Informative
    Intel's IA-64 emulates 32-bit unlike AMD's 64-bit chips which have 32-bit hardware. So we can expect AMD to beat Intel easily in 32-bit stuff.

    If you had RTFA, you would know that the benchmarks compared the Athlon64 against Pentium 4s and Xeons, not against IA64. What the benchmarks show is that the 32-bit performance of the Athlon64 is on par or better than the best Pentium 4 processors, and is better than the current Xeons. IA64 is not benchmarked in the article.

    The 64-bit performance of the Athlon64 is not being benchmarked in the article; it is the 32-bit performance relative to leading 32-bit processors that is the issue.
  8. Re:Intel's response by Glasswire · · Score: 3, Informative

    Prescott with PNI new instructions, 1MB L2 cache, clocking up to 4GHz and beyond, 800MHz front side bus and increased software support for Hyperthreading (e.g. 2.6.x Linux kernels know how to do HT scheduling much more efficiently).

    Watch the Xmas benchmarks, that's when it matters...

  9. Re:64-benchmarks wont be good by Distinguished+Hero · · Score: 5, Informative

    How the frell did this get modded up? Please RTFA before commenting/modding.

    The benchmark was against a P4 (as well as a dual Xeon), which runs IA-32 natively, not the Itanium.

    The A64 is a consumer chip, designed to be purchased and used by consumers. The Itanium processor costs more than a whole top-of-the-line consumer computer. The A64 and the Itanium are not targeted at the same market segment, and neither is the Opteron, which is supposed to go up against the Xeon.

    The reason everyone is looking forward to a benchmark of an A64 running a native 64-bit application on a 64-bit OS is that not only is X86-64 considerably cleaner than IA-32, but the A64 also has two times as many SSE2 and General Purpose registers, which should yield significantly better results than the A64 running in 32-bit mode (which is already outperforming the P4 in a lot of benchmarks).

    By the way, before someone points out that the benchmarked processor is an overclocked Opteron and not an A64, AMD is currently planning on releasing a version of the A64 which is just a rebranded Opteron 1xx along with the single-channel version of the A64.

    --
    Uttering logically derived and empirically supported truths to the disciples of the orthodox establishment.
  10. Intel Itanium vrs. AMD Opteron/Athlon64 by mjuarez · · Score: 4, Informative

    Just to set some things straight:

    - Itanium, Intel's 64-bit chip, uses a totally different architecture (EPIC) from the current Pentium x86 line of chips. This architecture is NOT compatible with x86, so you effectively need a recompile for existing software to work on Itanium. There is an EMULATION mode for x86 in Itanium, which is absolutely unusable according to various sources on the Net. You will DEFINITELY not want to run a game on it. Finally, prices for a low-end 1.0GHz Itanium chip start at approx. $800.

    - AMD's Opteron/Athlon64 chips are compatible with everything you are running right now at 32 bits. You can install a complete 32-bit operating system on one, and everything will run just as it does today, albeit a little bit faster. There is no need for an "emulator". And, of course, you can already use Linux at full 64 bits, available from SuSE, Red Hat and Mandrake. Also, Microsoft will release a 64-bit version of XP at the end of the year.

    Marcos

  11. Re:About 64-bit gaming performance by amorsen · · Score: 4, Informative
    a benchmark of 64-bit gaming performance (say, its 3D calculation or its AI plotting performance) would be mostly a waste of time, as you would very likely only see equal performance at best.

    This would have been the case if IA-32 was a sane architecture. Athlon64 in IA-32 mode has only 8 visible general purpose registers, whereas it has 16 in 64-bit mode. That makes 64-bit mode a win in almost all cases. Technically it would have made sense for AMD to introduce a new 32-bit mode, but it would probably have been bad for marketing.

    --
    Finally! A year of moderation! Ready for 2019?
  12. First Look at Windows XP 64bit for AMD64 by rchatterjee · · Score: 5, Informative

    GamePC is running a first look of Windows XP 64bit edition for the AMD64 (x86-64) architecture.

  13. Re:Intel's response by Anonymous Coward · · Score: 1, Informative

    Appro 4U Quad Opteron Server. That ought to contain one, don't you think?

  14. Re:About 64-bit gaming performance by Ospeovedizer · · Score: 3, Informative

    What you say is true, if the only improvement of AMD64 is 64-bit support. However, AMD64 also doubles the number of general-purpose and XMM (for SSE, SSE2) registers to 16 of each. This will make many programs run faster, as having 8 general-purpose registers is just not enough. Far too much time is spent swapping data into and out of registers on x86.
    The additional registers are really what I like about AMD64. I couldn't care less about 64-bit for now.

    --
    "We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2
  15. Re:64-benchmarks wont be good by parkanoid · · Score: 4, Informative

    Sorry, but Hyper-Threading isn't really used to "take any advantage of the dualies". From the Intel page: "Hyper-Threading Technology is a form of simultaneous multi-threading technology (SMT) where multiple threads of software applications can be run simultaneously on one processor" (emphasis mine)

  16. 20% Gain by MBCook · · Score: 3, Informative
    IIRC, Tim Sweeney said that by recompiling one of the versions of UT (2003 maybe?) for the x86-64 platform without optimizations, they saw up to a 20% performance boost. Now if they were to optimize the code on top of that, they could probably get a little more.

    So even for programs that don't need to use 64 bit math, moving them to the x86-64 platform can speed them up. It won't improve your typing speed in Word, but it can probably speed up most if not all your games if they are simply recompiled.

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
  17. Re:About 64-bit gaming performance by mjuarez · · Score: 2, Informative

    However, what happens when the operating system does a context switch or some other exception occurs? The latency from saving the processor context is going to go way up as you have to save far more data to memory and then load the same large amount of data in for the new context.

    There is no "context-switch" delay. The processor takes exactly the same amount of time doing a context switch in 64-bit mode as in 32-bit mode. Remember that the processor has to do a certain number of clocks per second, and it cannot "fall behind" or get delayed.

    Now, if your programmers decide that they want to work on 64-bit-wide data instead of the 32-bit they used on the old system, you suddenly find that your processor is having to move double the amount of data around the system. You have to hope that any increases in memory bandwidth the engineers included are enough to cater for this.

    If you read the article, you will have noticed that Opteron has an integrated memory controller. In this case, it means the controller was moving data at 2.0GHz. This adds up to significant increases in performance in the benchmarks, as could be seen in the article.

    I think the main thing I'm trying to say is that 64-bit computing isn't necessarily faster than 32-bit computing. Indeed, because some of the overheads can be double or quadruple, it can be a performance hit.

    Absolutely true. It can be slower (just take a look at Itanium :-), but it shouldn't be! Did you even read the article? In most of the benchmarks, the Opteron was even faster than dual Xeons (although I'm not sure the benchmarks were fully using the additional processor). I didn't see a "performance hit" anywhere in the benchmarks.

  18. Re:Well I'm hopefull. by mjuarez · · Score: 2, Informative

    Tyan and Arima already have dual motherboards out there. The Tyan K8W looks really nice for a workstation or high-end gaming machine. The 4P motherboards are not "available" per se; they're only sold as complete systems. Check out Appro, Angstrom Computer or Racksaver for some 4P servers if you're looking for Opteron servers.

  19. Re:Will it be secure? by MROD · · Score: 3, Informative

    Not exactly.

    Within the MMU look-up tables, memory pages can be marked as executable or not. Hence, if a program tries to jump to memory in a protected page (i.e. one not marked as executable), it will cause an exception.

    The current x86 MMU doesn't have this ability, unlike some processors such as the Sun UltraSPARC (though not any versions previous to this).

    --

    Agrajag: "Oh no, not again!"
  20. One More thing by Nazmun · · Score: 2, Informative

    Most of the slashdotters already pointed out the other important stuff...

    But I'd like to point out that the Itanium will not be competition for the Opteron in most cases. Itaniums are super expensive chips that run in servers and are totally incompatible with x86 (32-bit or 64-bit) software unless in emulation mode, in which they run very slowly. If you were to run x86 software on Itanium, then more than likely the Opterons would easily win anyway.

    --
    Hmmm... Pie...
  21. no more 'next page' style, please ;-( by TheGratefulNet · · Score: 3, Informative
    here ( http://www.anandtech.com/printarticle.html?i=1856) is the printable (all continuous) version.

    causing hit counters to go up artificially just to see 'next page' drives me nuts!

    --
    "It is now safe to switch off your computer."
  22. Re:So why didn't Intel do this? Politics by afidel · · Score: 4, Informative

    And conventional wisdom was correct. They just underestimated the power of the entrenched software library. Intel processors since the Pentium Pro have basically been RISC cores with an x86->RISC translator in front. This allows them to ramp up the speed of the core, and even change core architectures, while still running all the old code. This comes at the fairly small cost of the gates needed for the translation frontend. It has another advantage in that CISC operations take up less room in cache, so you get much better utilization out of your expensive cache resources. Intel started the Itanium project for two reasons: one, HP needed a new flagship chip and they are a large enough customer to sway Intel, and two, they were tired of Cyrix and AMD copying their designs, so they were going to make a tightly controlled architecture where EVERYTHING was covered by patents and copyright; that way they thought they could have the whole pie to themselves. What they didn't realize is that while they are a big player, the only reason people keep using their chips is that they have maintained that backwards compatibility path. Throw that away and Intel is just another chip maker, and others like IBM, Motorola, etc. may look better.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  23. AMD64 already has non-executable pages. by tamyrlin · · Score: 4, Informative

    Quoting the AMD64 Architecture Programmer's Manual Volume 2: System Programming:

    "The NX bit in the page-translation tables specifies whether instructions can be executed from the page."

    So non-executable pages are already present in AMD64.
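
    For anyone wondering what that check amounts to, here is a toy sketch. It assumes the long-mode page-table entry layout as I remember it from the manual (present bit at bit 0, NX at bit 63) and ignores everything else, including the EFER.NXE enable; the constant and function names are invented for the example.

```python
PRESENT_BIT = 1 << 0   # assumed: entry maps a page at all
NX_BIT = 1 << 63       # assumed: "no execute" flag in the top bit


def can_execute(pte):
    """True if an instruction fetch from this page would be allowed."""
    return bool(pte & PRESENT_BIT) and not (pte & NX_BIT)


code_page = 0x0000_0000_0040_0000 | PRESENT_BIT
stack_page = 0x0000_0000_7FFF_0000 | PRESENT_BIT | NX_BIT

print(can_execute(code_page), can_execute(stack_page))
```

    An OS that sets NX on stack and heap pages gets an exception instead of a jump into injected data, which is the protection discussed upthread.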

  24. Re:About 64-bit gaming performance by be-fan · · Score: 2, Informative

    It has to look like it's doing this, but doesn't have to do this :) A P4 has about 128 internal registers, and uses renaming hardware to present 8 to a given task. A given task may use more than 8 registers, where the CPU figures out ways to avoid spilling a register and does a rename instead. Now, during a context switch, the CPU doesn't actually have to dump the full context of the processor out to memory. Most of the state gets buffered, either in the internal register file, or in one of the write queues. Also, I doubt modern processors flush the i-cache. The i-cache on the P4 is actually the trace cache, and flushing it would involve dumping about 8kb of traces that took a lot of work to make. In reality, it's probably lazily replaced with new traces as the new process executes.

    FYI: The big win with the AMD64 is not that the processor has more physical registers (it probably doesn't) but that its larger window of 16 GPRs enables the compiler's optimizer to do a much better job with register allocation.

    --
    A deep unwavering belief is a sure sign you're missing something...
  25. This is good, but don't count on XP 64-Bit by aelfwyne · · Score: 2, Informative

    If the AMD64 version of Windows XP 64-Bit is as stripped down as the current Intel version... then don't bother considering what performance would be like there anyway... check here for a list of things *NOT* included in XP 64-bit:

    http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/winxppro/reskit/prka_fea_tfiu.asp

    But I guess we can do without features like Media Player, POSIX Compliance, Power Management, Windows Installer, and more... I guess..... just to have a 64-bit OS...

    --
    -- If it ain't broke - overclock it more.
  26. Re:About the wattage... by sirsex · · Score: 2, Informative

    To first order, power increases linearly with clock speed and with the square of the voltage: P = CFV^2.
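
    A quick worked example of that scaling, as a sketch only: the capacitance, frequency and voltage numbers below are invented to show the ratios and are not measurements of any real chip.

```python
def dynamic_power(capacitance, frequency, voltage):
    """First-order dynamic power: P = C * F * V^2."""
    return capacitance * frequency * voltage ** 2


# Made-up baseline: 1 nF switched capacitance, 2 GHz, 1.5 V.
base = dynamic_power(1e-9, 2e9, 1.5)

double_clock = dynamic_power(1e-9, 4e9, 1.5) / base    # twice the frequency
double_volts = dynamic_power(1e-9, 2e9, 3.0) / base    # twice the voltage

print(double_clock, double_volts)
```

    Doubling the clock doubles the power, while doubling the voltage quadruples it, which is why overclocks that need a voltage bump get hot so quickly.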