Slashdot Mirror


AMD's 64-bit Plot

ceebABC writes "In a long interview with eWEEK, AMD's CEO Hector Ruiz talks about struggling to compete with Intel, but more importantly about their upcoming 64-bit processors. He says that AMD's 64-bit chips will be priced comparably to the 32-bit ones, and backwards compatible. He also thinks there will be a market for desktop 64-bit systems. Skip to the last page for the most interesting stuff."

31 of 507 comments

  1. Re:Big deal. by Devil's+BSD · · Score: 2, Informative
    Nintendo64 was 64 bits.

    Yeah, and I have a 128-bit graphics card. (I know, they have like 100 Mbit ethernet cards now. :) ) However, the GPU and processor are totally different. The graphics card has more bits but obviously it doesn't run as fast as the CPU. All it does is make your fragfest a little more purty by letting you see the giblets all over. Having the CPU 64 bits is quite different, security-wise, code-wise, and speed-wise. If you have a 64-bit 2 GHz processor and a 32-bit 2 GHz processor, the 64-bit processor is going to be much faster. This speeds up the whole system, not just the rate at which you make giblets fly.

    --
    I'm the Devil the Windows users warned you about.
  2. Benchmarks by Anonymous Coward · · Score: 4, Informative

    Here are some benchmarks for an Opteron.

    http://www.aceshardware.com/

  3. 64 bits=$8=8 bytes etc??? by Devil's+BSD · · Score: 2, Informative

    OK people, I know some of you are trying to be humorous, but really the 64 bits is the size of the registers and how much data the processor handles at once. Which means at 64 bits, the processor can process (theoretically) twice as much data per second as a 32-bit processor. Which also means it can handle any number up to 2^64.

    --
    I'm the Devil the Windows users warned you about.
    1. Re:64 bits=$8=8 bytes etc??? by Visigothe · · Score: 3, Informative

      This isn't totally correct.

      "64bit" refers to the size of the instruction word, not "how much data the processor handles at once". That is a function of pipelining, ALUs, branch prediction, etc. This can be proved by a recompile of a 32bit application with 64bit flags. The application won't be "magically" twice as fast.

      There is something else... a 64bit app may even be *slower* as the cache can only hold half the number of words, given an equal cache size. Cache misses are a huge performance hit these days, as RAM is much slower than Cache RAM.

      Of course the big difference between AMD and IBM is that the new 64bit PPC970 doesn't take a performance hit switching between 32 and 64bit applications. This has more to do with the PPC ISA than anything in the processor.

      The only thing that 64bits will give "normal" users is the ability to address a *huge* amount of LOGICAL memory. In most cases, it doesn't make sense to make 64bit versions of applications, due to the above cache issue. Also, note the allusion that users will require more RAM for 64bit applications, as it will be needed to store the larger word size.


  4. Re:They don't *WANT* to make money?!?! by DjMd · · Score: 5, Informative

    I love that everyone read that story and thought it meant that they were leaving the desktop market, when it really said that they were going to diversify outside of the desktop market, as in do more in addition to their desktop market...

    (a quote from first paragraph of the Forbes article "[a] strategy of developing processors for a wider range of products outside computers ...")

    --
    DJMD - The fourth man - Planetary
  5. Re:Big deal. by Junks+Jerzey · · Score: 5, Informative

    If you have a 64-bit 2 GHz processor and a 32-bit 2 GHz processor, the 64-bit processor is going to be much faster. This speeds up the whole system, not just the rate at which you make giblets fly.

    No. That's a myth. As it stands, Pentiums for many years now have sported 64-bit buses and 64-bit FPUs (well, 80-bit FPUs actually), so we're not talking about bus size and FPU width. We're talking about:

    1. All addresses being 64-bits.
    2. All internal integer registers being 64-bits.

    For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.

    For #2, realize that some integer operations are O(N) where N is the number of bits involved. 64-bit multiplication and division are slower than the same 32-bit operations. Period.

    The gain with 64-bit processors is one of address space and nothing more.

  6. Re:Will This be Linux's first killer app? by JKR · · Score: 5, Informative
    ...and I have not heard of microsoft having anything ready for this market

    MS have been quietly getting ready for 64 bit for at least 2 years; they've been shipping a 64 bit SDK on my MSDN disks for over a year. There are 64 bit NVidia drivers for WinXP-64. What makes you think MS isn't already there?

  7. Windows runs in 64 bit by KPU · · Score: 2, Informative

    Check the Windows XP 64 bit edition website. I hate to burst your bubble, but Microsoft knows what it's doing.

  8. 32-bit compatible = a temporary half-solution by justanumber · · Score: 4, Informative

    No real benefit will come until genuine 64-bit apps hit the consumer market. This will be a steep learning curve for most developers who have only ever known 16 or 32-bit programming.

    The problems to be hurdled are:

    1) Reliance on the fact that size of pointer is equal to size of int.

    2) Reliance on a particular byte order in the machine word.

    3) Using type long and presuming that it always has the same size as int.

    4) Alignment of stack variables.

    5) Different alignment rules in structures and classes.

    6) Pointer arithmetic.

    A lot of engineering (and developer re-education) work also needs to be put into not only these issues, but also designing the application so that it is actually getting the most out of each clock cycle.
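    Two of the pitfalls above (pointer-equals-int and byte order) can be sketched in a few lines of Python. This is only an illustration: the 32-bit `int` width, the helper name, and both addresses are assumptions made up for the example, not taken from any real program.

```python
import struct

INT_BITS = 32  # assumed width of a C int on both 32-bit and 64-bit targets


def store_pointer_in_int(pointer_value):
    """Mimic casting a pointer to a 32-bit int: the upper bits are dropped."""
    return pointer_value & ((1 << INT_BITS) - 1)


# Hypothetical addresses, chosen only for illustration.
low_address = 0x08048000           # fits in 32 bits, survives the round trip
high_address = 0x00007FFF12345678  # needs more than 32 bits, gets truncated

# Pitfall 2: the same 32-bit value laid out in memory in two byte orders.
little_endian = struct.pack("<I", 0x11223344)
big_endian = struct.pack(">I", 0x11223344)

print(hex(store_pointer_in_int(low_address)))
print(hex(store_pointer_in_int(high_address)))
print(little_endian.hex(), big_endian.hex())
```

    Code that stashes pointers in ints keeps working as long as every address happens to fit in 32 bits, which is why this kind of bug tends to show up late.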

  9. More bits not useful to games? by Inoshiro · · Score: 5, Informative

    Have you ever done a physics engine? When you are working with vectors, you want as much precision as you can get. More precision means more bits.

    --
    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  10. Re:Big deal. by Dave_bsr · · Score: 5, Informative

    Increased maximum memory helps.
    Opteron's extra registers help.
    64-bit calculations are easier, they don't have to be put into multiple 32-bit parts.

    So...a 32-person bus is just as good as a 64-person bus? It may be harder to design and build, but when you have to move >32 people it's nice to have that big of a bus running around.

    What I'm saying is, being 64-bit DOES make you faster. Not twice as fast, but definitely faster and more powerful.
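    The "multiple 32-bit parts" point can be made concrete with a rough Python sketch of what a 32-bit machine has to do for one 64-bit add. The function name is made up for illustration, and real hardware would use an add followed by an add-with-carry instruction rather than this arithmetic.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_parts(a, b):
    """Add two 64-bit values using only 32-bit halves and a carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo             # first 32-bit add
    carry = lo_sum >> 32             # carry out of the low half (0 or 1)
    lo = lo_sum & MASK32

    hi = (a_hi + b_hi + carry) & MASK32  # second add, with the carry folded in
    return (hi << 32) | lo


print(hex(add64_with_32bit_parts(0x00000001FFFFFFFF, 1)))
```

    A machine with 64-bit registers does the same work in a single add, which is where the saving comes from for code that really uses 64-bit integers.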

    --


    Who is this Anonymous Coward character, how does he post so much, and why is he always such a whore?
  11. Re:Microsoft Quote, and Kernel Dev Question by fastpathguru · · Score: 2, Informative

    Kernel 2.4.20 has x86-64 support built-in.

    Look for SuSE's Andi Kleen in the release-notes.

    fpg

  12. Re:Just to remind people why more bits is good.. by Christopher+Thomas · · Score: 5, Informative

    2^64 addressing is not the only benefit of the change. FPUs see additional benefit when they have more bits. More bits means more precision; this is very important and desirable, especially when working with numbers that have fractional components. For proper 3D rendering, physics models, and anything else that involves computing numbers that have fractional parts, more is better. When the FPU can handle a double in one clock cycle because it works natively on 64-bit IEEE floating point numbers, you will notice a performance boost in addition to the increased accuracy.

    Um, all current x86s already handle 64-bit IEEE double-precision floats natively (actually more like 80 bits, for "extended double-precision"). The FP register file has been this wide for quite a while.

    There will be no performance or precision boost for floating-point math from moving the rest of the chip to 64-bit registers/datapaths.

  13. Re:Just to remind people why more bits is good.. by blamanj · · Score: 2, Informative

    A nit. Orders of magnitude are generally thought of in the decimal realm. Thus 2^64, which is a 20-digit number, is only 10 orders of magnitude greater than 2^32 (a 10-digit number).
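    The digit counts are easy to check; a quick Python sketch (variable names are just for illustration):

```python
import math

digits_64 = len(str(2 ** 64))  # decimal digits in 2^64
digits_32 = len(str(2 ** 32))  # decimal digits in 2^32

# Ratio of the two address spaces, expressed in decimal orders of magnitude.
orders_of_magnitude = math.log10(2 ** 64 / 2 ** 32)

print(digits_64, digits_32, round(orders_of_magnitude, 2))
```

    The exact ratio is 2^32, a little under ten decimal orders of magnitude.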

    I wouldn't be too sure about the 100 years part either. But it ought to be good for at least 10.

  14. Re:Wow by TheAncientHacker · · Score: 4, Informative

    Actually, IBM was pretty damn sure that people needed 80386 systems. What they were also just as sure about was that an 80386 based PC would cannibalize sales from their System/36 systems. The folks up in Rochester, Minnesota (where the System/36 and later AS/400 come from) went to Armonk (IBM Headquarters) and had the IBM Executive Committee block the 80386 based PC.

    The industry stalled for a while because NOBODY had introduced anything for the PC compatible industry that wasn't a clone of IBM's systems or peripherals until then. Finally, Compaq risked the company with the DeskPro 386 and IBM was in serious trouble.

  15. Re:Big deal. by Christopher+Thomas · · Score: 5, Informative

    For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.

    wouldn't the chance of cache misses depend on the caching policy? How does the data size matter?

    Data size matters because a program will typically access a fixed number of working variables, not a fixed amount of data. If a program's working set size stays at, say, 1000 words, and you move from a 32-bit to a 64-bit architecture, you need a cache with twice as much storage space to hold the working set without thrashing.
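    As a back-of-the-envelope sketch of that argument in Python: the 1000-word working set is the figure from the paragraph above, while the 4 KB cache size and the function names are assumptions picked only to make the effect visible.

```python
def working_set_bytes(word_count, word_bits):
    """Bytes needed to hold word_count machine words of the given width."""
    return word_count * word_bits // 8


def fits_in_cache(word_count, word_bits, cache_bytes):
    return working_set_bytes(word_count, word_bits) <= cache_bytes


CACHE_BYTES = 4096  # hypothetical cache size, for illustration only

for bits in (32, 64):
    size = working_set_bytes(1000, bits)
    print(bits, size, fits_in_cache(1000, bits, CACHE_BYTES))
```

    This models the worst case where every working variable doubles in size; how often that actually happens depends on how much of the data is pointers or explicitly 64-bit.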

    There's easily enough die area to double the sizes of the L1 and L2 caches; the problem is that it slows down cache access (more latency cycles fetching something from L1 is a Bad Thing).

    Certain types of load work with constant size instead of constant word count, but most of those deal with working sets large enough that you'll thrash no matter what.

    The gain with 64-bit processors is one of address space and nothing more.

    Which includes better behaviour for those programs that have to fake larger address space. That would be a speed increase.

    Nothing running on x86 will do that. Unless you're running old DOS programs in real mode, you're already working with a flat address space. Typically 2 gigs of this is available to user programs (with the rest being mapped to kernel or device space). If you have a problem with a working set larger than 2 gigabytes, you already have a Sun/$other_vendor machine to solve it on.

    Larger address space targets the _future_ problem of desktop users who want many gigabytes of memory.

    A fringe benefit is being able to more efficiently map multi-gigabyte files into memory space, but performance for this kind of task is limited by disk latency and controller bandwidth, not memory architecture.

  16. Re:Hmm by Screaming+Lunatic · · Score: 5, Informative
    There's really not much need for 64bits even in gaming...but the 64-bitness of the chips is not at all important for games for the foreseeable future.

    That's the biggest bunch of crap that I've ever heard. There are a bunch of games that do fixed point math because floating point does not give you enough accuracy.

    Collision detection would certainly benefit from improved precision. Physics suck in games because it is difficult to do fast and accurate at the same time.

    Epic has promised a 64bit version of games. I'm guessing they are doing so for a very good reason. And they are doing this despite the fact that they use a comparatively very robust physics engine in Karma.

    I'm guessing you've never implemented a physics engine or even taken a Numerical Analysis course or read any books. So how about pulling your head out of your ass before disseminating FUD.

  17. Re:Just to remind people why more bits is good.. by Zaak · · Score: 2, Informative

    The physical interconnect is of secondary importance to the internal implementation. If your program counter and other registers have 64 bits internally, then to make a processor which can actually use 2^64 bytes of memory, you just need to add more address lines to the IC. No big deal. When your registers are only 32 bits (as they are in the IA32 processors we have now) it's not easy to make a processor which can use more than 2^32 bytes of memory. You have to use icky segmentation schemes and other ugliness.

    TTFN

  18. Re:Big Bets on Table by Jace+of+Fuse! · · Score: 3, Informative

    or does the emulator just pick up x86 instructions and translate them to IA64 instructions?

    As I understand it, AMD's 64-Bit processors actually have hardware for supporting the previous 32-Bit instructions. I could be misunderstanding, but if I'm not this will naturally mean that with 32-Bit instructions the AMD chip will outperform Intel's emulation.

    Intel is banking heavily on people finally ditching x86 for good. There are good reasons for people to ditch x86, but there is one good reason to keep it: Legacy Support. How important that is will depend on the person and their needs.

    --

    "Everything you know is wrong. (And stupid.)"

    Moderation Totals: Wrong=2, Stupid=3, Total=5.
  19. Re:Just to remind people why more bits is good.. by kma · · Score: 3, Informative
    Servers have already hit this limit. That's why there are special instructions (a return to segmented memory access) on P3 and P4 processors, allowing up to 64gb of RAM in 4gb segments to be addressed.
    Bzzt. The feature you're describing is known as PAE, for physical address extension. It doesn't work via "real mode" style DOS segmentation. Each program's virtual address space is still 4GB, and pointers are still a flat 32 bits. PAE simply changes the hardware page table structure so the 4GB "window" of your virtual address space can look out onto more than 4GB of physical memory. Even though no one process can access more memory than before, you can run multiple 4GB processes on a single machine.

    Miraculously, someone at Intel stowed the x86 crackpipe, preventing some sort of segmented/overlay nightmare like the one you describe.
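    A toy Python model of that "window" idea, under loud assumptions: real PAE uses multi-level hardware page tables, whereas this sketch uses a flat dictionary per process, and the page numbers, frame numbers, and names are all invented for illustration.

```python
PAGE_SIZE = 4096           # 4 KB pages
VIRTUAL_LIMIT = 2 ** 32    # each process still sees a flat 32-bit space
PHYSICAL_LIMIT = 2 ** 36   # 36-bit physical addresses, i.e. 64 GB


def translate(page_table, virtual_address):
    """Map a 32-bit virtual address to a (possibly >4 GB) physical address."""
    if not 0 <= virtual_address < VIRTUAL_LIMIT:
        raise ValueError("virtual address does not fit in 32 bits")
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    physical = page_table[page_number] * PAGE_SIZE + offset
    if physical >= PHYSICAL_LIMIT:
        raise ValueError("physical address does not fit in 36 bits")
    return physical


# Two hypothetical processes map the same virtual page to different frames.
process_a = {0x10: 0x00100}    # frame in the first 4 GB
process_b = {0x10: 0x200000}   # frame starting at the 8 GB mark
same_pointer = 0x10123         # identical 32-bit pointer in both processes

print(hex(translate(process_a, same_pointer)))
print(hex(translate(process_b, same_pointer)))
```

    The pointer value never grows past 32 bits; only the frame numbers stored in the page tables do.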
  20. 32 bits != 4 gig max by cartman · · Score: 5, Informative

    32 bit architectures are not limited to 4 gigabytes of memory. "32 bit processor" refers to the width of the DATA bus (and registers). It does not refer to the width of the address bus.

    For example, the z80 and 6502 were 8-bit processors, but they supported more than 256 bytes of RAM (2^8 bytes). The 68000 and 80286 were 16-bit processors, but they supported more than 64k of RAM (2^16 bytes). That's because the 8-bit processors had 16-bit address busses, and the 16-bit processors often had 24-bit address busses.

    The current Pentium 4 Xeon chip supports 64 gig of RAM, despite being a 32-bit processor.
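    The numbers follow directly from the address width, independent of the data width. A small Python sketch (the labels are descriptive only; the bit counts are the ones quoted above):

```python
def addressable_bytes(address_bits):
    """Bytes reachable with the given number of address bits."""
    return 2 ** address_bits


examples = {
    "8-bit CPU with 16-bit addresses": 16,
    "16-bit CPU with 24-bit addresses": 24,
    "32-bit CPU with 36-bit physical addresses": 36,
}

for label, bits in examples.items():
    print(label, addressable_bytes(bits))
```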

    64-bit computing means that you can hold a 64-bit quantity (long int or double) in a register. Also, you can load, store, or perform arithmetic on such quantities using one instruction and often in one clock cycle.

    This offers very few benefits for the end consumer. Mostly it's about perception: consumers will perceive that a 64-bit chip is twice as good as a 32-bit one.

    1. Re:32 bits != 4 gig max by mr_data_esq · · Score: 3, Informative
      Mostly, I agree. In fact, I spend lots of time writing software for 8 and 16 bit machines, and I spend half that time turning single bits on and off.

      One thing I'd like to point out, though: I've noticed that an awful lot of mathematics is being done using doubles (i.e., 64-bit floats) these days. It's partially laziness, but it's also really the case that 32-bit IEEE floats only give you 24 bits of accuracy. Doing math with doubles really cuts down on roundoff errors, so a lot of people switch to doubles and forget about it.
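      The 24-bit limit is easy to see from Python, whose own floats are 64-bit doubles. This sketch round-trips values through the `struct` module's 32-bit float format; the helper name is made up for illustration.

```python
import struct


def to_float32(value):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


exact_in_double = 16777217.0             # 2**24 + 1, exact as a double
as_float32 = to_float32(exact_in_double)  # needs 25 bits, so it gets rounded

print(exact_in_double, as_float32)
print(to_float32(0.1) == 0.1)
```

      Integers up to 2^24 survive the 32-bit format intact; one past that is the first to be rounded.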

    2. Re:32 bits != 4 gig max by Bert64 · · Score: 4, Informative

      AFAIK the 68000 was a 32bit processor, with 24bit address bus and 16bit external bus. The later 68020 increased everything to 32bit.
      However, the P4 actually has a 32bit address bus, with hacks to address 36bit space, but that's what it is.. a hack, the extra address space is not directly available to apps. There is also likely to be a performance hit when using these hacks..

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    3. Re:32 bits != 4 gig max by Ninja+Programmer · · Score: 2, Informative
      • 32 bit architectures are not limited to 4 gigabytes of memory. "32 bit processor" refers to the width of the DATA bus (and registers). It does not refer to the width of the address bus.
      This is a marketing term, not a technical one. And as far as the current 32 bit processors are concerned, you have it exactly backwards. x86's today have SSE registers which can hold 64 bit integers. I think the PPC's AltiVec registers can also hold 64bit integers. However, neither processor can access a 64 bit address space.

      But even the marketeers know that you don't call these processors "64bit". With the notable exception of Nintendo, the only time a processor is labelled "64 bits" is when its *address* space is 64 bits, not just its registers/ALUs.
    4. Re:32 bits != 4 gig max by cartman · · Score: 2, Informative

      What you said is incorrect. "32 bit" has ALWAYS referred to the width of the data bus/registers.

      SSE2 and AltiVec have 128-bit registers. This is so they can hold 4 32-bit quantities (the point is to be able to do 32-bit operations on 4 values at a time). It is still a 32-bit processor.

      If the number of bits described the address bus width, then x86 is 36-bit, DEC Alpha is 36-bit (NOT 64-bit), 6502 is 16-bit, 68000 is 24-bit, 80286 is 24-bit, etc. This is clearly not the case. When was the last time you heard anyone refer to a Xeon as a 36-bit processor, or a 68000 as a 24-bit processor?

  21. Re:Big deal. by ameoba · · Score: 4, Informative

    I think you mean CISC.

    RISC = Reduced Instruction Set Computer
    CISC = Complex ...

    The basic idea of (most) RISC chip designs, such as the MIPS, Alpha, PowerPC & Sparc, was to have a large number of general purpose registers, fixed length instructions that could only refer to those registers, and only a handful of instructions that specifically read/wrote to main memory (which is why they're also referred to as 'load/store' architectures). This simplistic design allowed them to push clock speeds without too much trouble. RISC processors also adopted superscalar designs (having multiple execution units, allowing the execution of multiple instructions 'simultaneously') before their CISC counterparts.

    In contrast to the simplicity of the RISC systems, there are the CISC chips, such as the x86 and the old VAX processors, which tried to make their instructions resemble high-level languages, as well as having a smaller number of registers, many of them having a special purpose. With variable length instructions, and many different modes of operation for each instruction, the CISC methodology generally resulted in much larger, more complex chip designs that were harder to speed up, pipeline & make superscalar.

    To compare the two, let's take a simple operation, such as taking two numbers from memory & adding them together. A generic RISC system would do something like:
    1) load 1st number into Register 1
    2) load 2nd number into Register 2
    3) add the value in R1 to R2, putting the value in R3
    4) copy the value from Register 3 to memory
    ...and not have any other way to solve the problem

    where a CISC chip would more likely do something like:
    1) add the value at memory location 1 to the value at memory location 2, and store in a special Accumulator register
    2) copy the Accumulator register back to memory

    The difference being that where the RISC machine only had one addition operation (register+register->register), the CISC machine would have a handful of them, depending on where the data came from (memory (using multiple forms of reference), registers, constants, and various combinations).
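    The two instruction sequences above can be mimicked with a toy Python model. This is not any real instruction set: memory is a plain dictionary, the register names are invented, and each function just returns how many "instructions" it used.

```python
def risc_add(memory, addr_a, addr_b, addr_out):
    """Load/store style: memory is touched only by explicit loads and stores."""
    registers = {}
    registers["r1"] = memory[addr_a]                      # 1) load
    registers["r2"] = memory[addr_b]                      # 2) load
    registers["r3"] = registers["r1"] + registers["r2"]   # 3) register add
    memory[addr_out] = registers["r3"]                    # 4) store
    return 4  # instructions executed


def cisc_add(memory, addr_a, addr_b, addr_out):
    """Memory-operand style: the add itself reads both operands from memory."""
    accumulator = memory[addr_a] + memory[addr_b]         # 1) memory add
    memory[addr_out] = accumulator                        # 2) store
    return 2  # instructions executed


risc_memory = {0: 40, 1: 2, 2: 0}
cisc_memory = {0: 40, 1: 2, 2: 0}
risc_count = risc_add(risc_memory, 0, 1, 2)
cisc_count = cisc_add(cisc_memory, 0, 1, 2)
print(risc_memory[2], risc_count, cisc_memory[2], cisc_count)
```

    Both end up with the same result in memory; the difference is how much work each individual instruction is allowed to hide.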

    In the early 80s, the RISC/CISC debate was a hot one in academia, and RISC won out there, by virtue of its simplicity & ease of improvement. By the mid 80s, the debate was starting again in industry, as a number of RISC chips started entering the marketplace, where Intel's x86 architecture won by virtue of the IBM PC.

    The whole debate is pretty much a moot point now, since Intel's new x86 chips have RISC cores wrapped by a thin layer to translate the complex instructions. As an added bonus, the new 64b x86 systems should be adding a bunch of extra registers, further negating the penalty of the architecture.

    --
    my sig's at the bottom of the page.
  22. P4 no longer cooler operating than Athlon by Brian+Stretch · · Score: 3, Informative

    The new Thoroughbred Revision B Athlons (XP 2400+ and higher) made a significant drop in power consumption (1.65V core), while the 3GHz P4 guzzles more electrons than any Athlon (have you seen the heatsink Intel bundles with that thing?!). The Hammer series uses Silicon-On-Insulator technology to keep power consumption (heat) down, to the point that the larger Hammer core consumes about the same amount of power as the TBred RevB. AMD is gunning for the high-density rackmount market with the Opteron where efficient power use is critical. They'll get it too.

    I have a dual CPU Athlon 2400+ box, 2GHz each, using Thermalright SLK800 heatsinks and 80mm adjustable fans set to 2500RPM. My temps are 41C/43C/42C (case/CPU1/CPU2) at the moment with about 25% CPU utilization. Power consumption (as measured by my UPS load monitor) is the same as the dual Athlon 1800+ chips (1.53GHz) the new CPUs replaced.

  23. FUD disguised as a technical comment. by Ninja+Programmer · · Score: 5, Informative
    • 1. All addresses being 64-bits.
      For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.
    This is incorrect. The Hammer "long mode" uses 32 bits as the default data size. 64 bits are only used for pointers and explicitly overridden 64 bit operands. I.e., you still have to declare "long long" or "int64" or whatever, in your languages to access those 64 bits. All your old 32-bit data still occupies the same space.

    Furthermore, measurements by AMD indicate that op-code size did not increase with the expanded instructions, but actually *decreased* because the additional registers decreased the typical amount of spill/fill code emitted.

    Therefore there is no additional cache pressure. The "code bloat" problem remains solely in the hands of the software developer, and is *NOT* worsened in any way by hammer.
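    A rough Python sketch of which data grows and which does not. The two data models (4-byte int everywhere, 4- versus 8-byte pointers) and the natural-alignment rule are simplifying assumptions for illustration; real compilers and ABIs have more cases.

```python
# Assumed type sizes in bytes for a 32-bit and a 64-bit data model.
DATA_MODELS = {
    "32-bit": {"int": 4, "pointer": 4},
    "64-bit": {"int": 4, "pointer": 8},
}


def struct_size(fields, model):
    """Size of a record whose fields are each aligned to their own size."""
    sizes = DATA_MODELS[model]
    offset = 0
    largest = 1
    for field in fields:
        size = sizes[field]
        offset = (offset + size - 1) // size * size  # pad up to alignment
        offset += size
        largest = max(largest, size)
    return (offset + largest - 1) // largest * largest  # tail padding


int_record = ["int", "int", "int", "int"]   # plain 32-bit data
list_node = ["int", "pointer"]              # e.g. a linked-list node

for model in DATA_MODELS:
    print(model, struct_size(int_record, model), struct_size(list_node, model))
```

    Under these assumptions plain integer data stays the same size, while pointer-heavy structures such as list nodes are where any growth would come from.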
    • 2. All internal integer registers being 64-bits.
      For #2, realize that some integer operations are O(N) where N is the number of bits involved. 64-bit multiplication and division are slower than the same 32-bit operations. Period.
    This is also incorrect. There are numerous well known techniques used in ALU design that make precious few operations "O(bits)". Again, AMD specifically targeted this. For example: the 64-bit integer multiply in hammer is *FASTER* (per clock) than the 32-bit integer multiply in either the Athlon or Pentium 4.

    The reason AMD is able to do this is because arithmetic and logic operations can largely be implemented in a "more gates for more speed" fashion. They are closer to O(ln(N)) than O(N). But at this level of circuit design, you don't necessarily think in those terms (since N is constant, everything just looks like O(1)) -- these high speed circuit designers worry about other technical things like "latch speed".

    The 64 bit integer divide may be a little slower, however, again you need to explicitly use 64 bit ints in your software, and division is a comparatively uncommon operation.
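    One textbook example of "more gates for more speed" is a parallel-prefix (Kogge-Stone style) adder, where the number of carry-combining levels grows with log2 of the width rather than with the width itself. The Python sketch below simulates that scheme on integers used as bit vectors; it is an illustration of the general idea, not a description of the circuits in any AMD or Intel part.

```python
def kogge_stone_add(a, b, width):
    """Add two width-bit values with a parallel-prefix carry network.

    Returns (sum modulo 2**width, number of prefix levels used).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    generate = a & b        # bit positions that create a carry
    propagate = a ^ b       # bit positions that pass a carry along
    half_sum = propagate    # sum bits before carries are applied

    levels = 0
    distance = 1
    while distance < width:
        # Every bit position is updated at once within a level.
        generate |= propagate & (generate << distance)
        propagate &= propagate << distance
        distance <<= 1
        levels += 1

    carries = (generate << 1) & mask  # carry into each bit position
    return (half_sum ^ carries) & mask, levels


print(kogge_stone_add(0xFFFFFFFF, 1, 32))
print(kogge_stone_add(2 ** 63 - 1, 1, 64))
```

    Doubling the width from 32 to 64 bits adds one level (5 to 6) in this model, at the cost of more gates and wiring per level.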
    • The gain with 64-bit processors is one of address space and nothing more.
    This is the largest gain (big DB people will be very happy with it) but it certainly is not the only gain. Remember that there are now twice as many SSE registers. This opens up some performance possibilities for multimedia applications.

    Although I don't know that it's related to SSE, it should be pointed out that EPIC (as in the video game company) has ported the Unreal engine to x86-64! Like most people, I was quite surprised that they did this, however, they apparently found doing it to be worthwhile.

    Do not underestimate the upside of going to 64 bits in the way that AMD has done it. They have literally made it a no-lose scenario -- that alone should spur (mostly new) application developer interest.
  24. Re:Big Bets on Table by Ninja+Programmer · · Score: 2, Informative
    • Both Intel and AMD have been betting big on 64 bit computing and it will be interesting to see how this plays out.
    They had nowhere else to go. If we start hitting the 4GB limit, and there is no solution, software developers and end-users will eventually be crying bloody murder like they were when Intel's 640KB limitation was hit. (That time Intel was slow to react -- this time around AMD and Intel are trying to have a solution in place *before* it becomes a problem.)
    • Itanium 1 was a flop. Itanium 2 has respectable performance, but is not IA-32 backward compatible, where AMD x86-64 is backward compatible.
    Well I like to dig on Intel as much as the next guy, but technically speaking, IA64 is backward compatible with IA32 (it does have a bona fide IA32 mode.) But it's slow as molasses (they might as well be emulating IA-32.)

    That being said, I don't think Windows device drivers are going to work on IA-64 (the IA-32 mode is not involved in the boot process in any way.) IA-64's compatibility is in fact a "joke", though technically there.

    The backward compatibility mode in Hammer is very different. You can boot 32-bit Windows on it, play your old DOS games on it or whatever and you will not know the difference (except it will be a lot faster.)
  25. The "bitness" of a micro-architecture by Luminous+Coward · · Score: 2, Informative

    I have to share this insightful comment I read on Usenet 3 years ago:

    The "bit width" of a CPU is not strictly defined by a single architectural
    attribute. Several candidates for a "normative" bit width exist:

    - word width of the general purpose registers

    - width of internal data paths

    - width of external data paths

    - width of the ALU

    - width of the architected address range

    There are probably more...

    Back in the days of the 8 bit processors, ALUs were 8 bit wide, but address
    range was already 16 bit.

    In the age of 16 bit processors, registers and ALUs were 16 bits wide, but
    often there were more than 16 address bits. Segmented addressing was needed
    to make use of more than 64 KB for a single process.

    When the first 32 bit CPUs appeared, they had 32 bit wide general purpose
    registers and 32 bits of architected address space. But for example the 68000
    had only a 16 bit ALU and its data bus was only 16 bits wide. Of the address
    bits, only 24 were externally visible on pins.

    Nowadays, with "64 bit CPUs" a reality for high-end computers, the address
    width is the important criterion. Only a true 64 bit machine can linearly
    address more than 4 GB for each running process. And when you handle pointer
    variables that are 64 bits wide, it makes a lot of sense to have 64 bit wide
    registers, a 64 bit ALU and 64 bit wide internal data paths. All current
    64 bit CPUs that I know of meet this definition of "64 bit".

    Internal bus widths tend to be wider (think of the 256 bit wide backside L2
    bus of Coppermine or the G5), and registers have been wider than the "bitness"
    ever since FPUs have moved on-chip (you don't even need to consider AltiVec or
    SSE). External buses are sometimes narrower (to save some pins and a lot of
    bucks on packaging) and sometimes wider (to better feed the new and fast CPU
    cores from the same old memory chips).

    So, for all intents and purposes, the x86 architecture was 16 bit until and
    including the 286, and is 32 bit from the 386 onwards. AMD's K8 will probably
    extend it to 64 bits. The P6 core is 32 bits, but it has some extensions to
    enable it to address 64 GB of physical RAM. But every single process can only
    address 4 GB directly, since pointers are still 32 bits wide. AFAIK K7 and P7
    also have these extensions, but are still 32 bit cores.

    BTW, the G5 is also rumored to be able to address 64 GB of physical RAM.
    There are four unused bits in each of the "segment registers" which could be
    used by the OS to select one of sixteen banks of 4 GB each. But processes
    would still be limited to 4 GB of directly addressable memory.

    Holger Bettag

  26. Re:Hmm by drinkypoo · · Score: 3, Informative
    Well I'm forced to play my hole card, IPv6 addresses. As we know the size of an IPv4 address is 32 bits and an IPv6 address is 128 bits. A machine with 32 bit registers will need to use four of them to store an entire IPv6 address at once. A machine with 64 bit registers will only need to use two.
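    The register count is simple to check with Python's standard `ipaddress` module. The function name is made up for illustration, and the address is one from the 2001:db8::/32 documentation range.

```python
import ipaddress


def split_into_registers(address, register_bits):
    """Split a 128-bit IPv6 address into register-sized words, high word first."""
    value = int(ipaddress.IPv6Address(address))
    mask = (1 << register_bits) - 1
    count = 128 // register_bits
    return [(value >> (register_bits * i)) & mask for i in reversed(range(count))]


example = "2001:db8::1"
words_32 = split_into_registers(example, 32)
words_64 = split_into_registers(example, 64)

print([hex(word) for word in words_32])
print([hex(word) for word in words_64])
```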

    While x86-compatible CPUs have generally not been used in dedicated networking devices until very recently due to the cost to performance ratio, they have become a fairly popular high-performance embedded solution lately. Hammer should be an extremely attractive solution in the high-performance embedded space because:

    • It is inexpensive. The interview that has spawned all this discussion seems to indicate that they will be in the same price range as the current 32 bit Athlon offerings. (AMD's processor roadmap indicates that only the 2-8 way SMP version (Sledgehammer) will be called Opteron, and Clawhammer will be known as Athlon 64 and Mobile Athlon 64.)
    • It has an integrated memory controller. Those willing to use hypertransport between their custom silicon and the CPU might need only a north bridge, or perhaps even a simpler solution. A Hypertransport to PCI bridge would be sufficient for most needs, if not all, given the use of hypertransport to do whatever actual work you have in mind.
    • It is 64 bit, and as such can shovel a lot of data quickly, or work with large integers (Again quickly, since you can work with an integer of any size on a CPU with at least two digits' worth of space or at the very least one digit and a carry/borrow flag.)
    • The part is (well, will be) available with very high clock rates; while clock rate is by no means the only defining factor in the performance of a system, it is nonetheless a factor.

    Another nice factor of using large integers instead of floating point is that when you absolutely positively have to get the result back in the same number of cycles each time, you can do this. Math coprocessors are just that, coprocessors. I haven't kept up so I don't know just how fast you can expect things to come back from them these days, and if they are actually scheduled or not, but at least in the olden days you had to shovel the data at the math co, then query it to find out if it was done. One problem was that if you queried it too fast it might not have set the flags properly yet and you would get bogus results. Ah, x86 is so classy!

    The address space will become significant to us all very quickly if we start doing entirely memory-mapped I/O. Isn't this an issue of the Hurd at the moment? While there are other ways to solve it (but who wants to deal with segmented addressing? not me!) certainly there are many advantages to mem-mapped I/O.

    And finally, sure games do fine, but more power means bigger, shinier games with more gibs! Also the reason GPUs have become so popular is that CPU speed wasn't growing fast enough to satisfy the desires of the game industry. Expect to see some more graphics-related processing to be done in the CPU for a while, namely multires (the reduction of vertices in a model one at a time with re-meshing in between, with the greatest number of vertices assigned to the appropriate models and usually determined by a scoring system, using very high-vertex-count models which may never be rendered with all visible vertices plotted EVER.) Multires and the most simple of occlusion techniques is enough to make a scalable game which will look very good on even low-end hardware and still look fantastically better on high-end equipment. It does cost you CPU though, and I'm sure you can see where I'm going with this. Of course multires will be an inherent feature of a future generation of 3D accelerators which will do even more for the developer and likely have even crappier drivers.

    Also the memory bandwidth of hammer doesn't seem like it's all that outstanding except that it's integrated into the CPU and so you can expect to do less waiting. The real advantages in terms of memory bandwidth will be in SMP systems. Of course I don't know too many people planning to go to Clawhammer who aren't planning to go to dual Clawhammer, but if they are less inexpensive than promised I'll be one sucker with only one of 'em.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"