AMD's 64-bit Plot
ceebABC writes "In a long interview with eWEEK, AMD's CEO Hector Ruiz talks about struggling to compete with Intel, but more importantly about their upcoming 64-bit processors. He says that AMD's 64-bit chips will be priced comparably to the 32-bit ones, and backwards compatible. He also thinks there will be a market for desktop 64-bit systems. Skip to the last page for the most interesting stuff."
Here are some benchmarks for an Opteron.
http://www.aceshardware.com/
I love that everyone read that story and thought it meant that they were leaving the desktop market, when it really said that they were going to diversify outside of the desktop market, as in do more in addition to their desktop market...
(a quote from the first paragraph of the Forbes article: "[a] strategy of developing processors for a wider range of products outside computers...")
DJMD - The fourth man - Planetary
If you have a 64-bit 2 GHz processor and a 32-bit 2 GHz processor, the 64-bit processor is going to be much faster. This speeds up the whole system, not just the rate at which you make giblets fly.
No. That's a myth. As it stands, Pentiums for many years now have sported 64-bit buses and 64-bit FPUs (well, 80-bit FPUs actually), so we're not talking about bus size and FPU width. We're talking about:
1. All addresses being 64-bits.
2. All internal integer registers being 64-bits.
For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.
For #2, realize that some integer operations are O(N) where N is the number of bits involved. 64-bit multiplication and division are slower than the same 32-bit operations. Period.
The gain with 64-bit processors is one of address space and nothing more.
MS have been quietly getting ready for 64 bit for at least 2 years; they've been shipping a 64 bit SDK on my MSDN disks for over a year. There are 64 bit NVidia drivers for WinXP-64. What makes you think MS isn't already there?
No real benefit will come until genuine 64-bit apps hit the consumer market. This will be a steep learning curve for most developers who have only ever known 16 or 32-bit programming.
The problems to be hurdled are:
1) Reliance on the fact that size of pointer is equal to size of int.
2) Reliance on a particular byte order in the machine word.
3) Using type long and presuming that it always has the same size as int.
4) Alignment of stack variables.
5) Different alignment rules in structures and classes.
6) Pointer arithmetic.
A lot of engineering (and developer re-education) work also needs to be put into not only these issues, but also designing the application so that it is actually getting the most out of each clock cycle.
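As a rough illustration of items 1 and 3 in the list above, here is a small sketch using Python's ctypes module to report C type sizes on whatever machine runs it. The helper names (c_type_sizes, store_pointer_in_int) and the sample address are made up for this example, and the sizes printed depend on the platform's ABI, so treat it as a demonstration rather than a reference.

```python
import ctypes

def c_type_sizes():
    """Return the sizes, in bytes, of a few C types under this platform's ABI."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "pointer": ctypes.sizeof(ctypes.c_void_p),
    }

def store_pointer_in_int(address):
    """Mimic the classic bug: squeezing a pointer value into a 32-bit unsigned int."""
    # ctypes does no overflow checking, so the high bits are silently dropped.
    return ctypes.c_uint32(address).value

sizes = c_type_sizes()
for name, size in sizes.items():
    print(f"sizeof({name}) = {size}")

# A hypothetical address above 4 GB does not survive the round trip.
addr = 0x0000000123456789
print(hex(addr), "->", hex(store_pointer_in_int(addr)))
```

Code that assumes sizeof(int) == sizeof(pointer) or sizeof(long) == sizeof(int) will only be right on some of the platforms this runs on, which is the whole problem.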
Have you ever done a physics engine? When you are working with vectors, you want as much precision as you can get. More precision means more bits.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Increased maximum memory helps.
Opteron's extra registers help.
64-bit calculations are easier: they don't have to be split into multiple 32-bit parts.
So...a 32-person bus is just as good as a 64-person bus? It may be harder to design and build, but when you have to move >32 people it's nice to have that big of a bus running around.
What I'm saying is, being 64-bit DOES make you faster. Not twice as fast, but definitely faster and more powerful.
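To make the point about 64-bit calculations concrete, here is a minimal sketch of what a 32-bit machine has to do to add two 64-bit numbers: split each into halves, add the low halves, and carry into the high halves. The function name and the MASK32 constant are invented for this example; a real compiler would emit an add followed by an add-with-carry.

```python
MASK32 = 0xFFFFFFFF

def add64_with_32bit_parts(a, b):
    """Add two 64-bit unsigned values using only 32-bit-wide pieces."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo
    carry = lo >> 32          # 1 if the low halves overflowed, else 0
    lo &= MASK32

    # The high half wraps around, just as a 64-bit register would.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo

print(hex(add64_with_32bit_parts(0xFFFFFFFF, 1)))
```

A 64-bit integer unit does the same job with a single add instruction.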
Who is this Anonymous Coward character, how does he post so much, and why is he always such a whore?
2^64 addressing is not the only benefit of the change. FPUs see additional benefit when they have more bits. More bits means more precision; this is very important and desirable, especially when working with numbers that have fractional components. For proper 3D rendering, physics models, and anything else that involves computing numbers that have fractional parts, more is better. When the FPU can handle a double in one clock cycle because it works natively on 64-bit IEEE floating point numbers, you will notice a performance boost in addition to the increased accuracy.
Um, all current x86s already handle 64-bit IEEE double-precision floats natively (actually more like 80 bits, for "extended double-precision"). The FP register file has been this wide for quite a while.
There will be no performance or precision boost for floating-point math from moving the rest of the chip to 64-bit registers/datapaths.
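A quick way to see that floating-point width and pointer width are separate things: the sketch below checks both on the machine it runs on. It assumes a Python build whose float is an IEEE 754 double, which is the case on mainstream platforms; round_trip_single is a helper invented for this example.

```python
import struct
import sys

# Width of a pointer in this Python build: 4 bytes on 32-bit, 8 on 64-bit.
pointer_bytes = struct.calcsize("P")

# An IEEE 754 double is 8 bytes with a 53-bit significand either way.
double_bytes = struct.calcsize("d")
significand_bits = sys.float_info.mant_dig

def round_trip_single(x):
    """Squeeze x through a 32-bit float to show what less precision looks like."""
    return struct.unpack("f", struct.pack("f", x))[0]

print("pointer bytes:", pointer_bytes)
print("double bytes:", double_bytes, "significand bits:", significand_bits)
print("0.1 as double:", repr(0.1))
print("0.1 via single:", repr(round_trip_single(0.1)))
```

The double figures come out the same on a 32-bit and a 64-bit build; only the pointer size changes.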
Actually, IBM was pretty damn sure that people needed 80386 systems. What they were also just as sure about was that an 80386-based PC would cannibalize sales from their System/36 systems. The folks up in Rochester, Minnesota (where the System/36 and later AS/400 come from) went to Armonk (IBM Headquarters) and had the IBM Executive Committee block the 80386-based PC.
The industry stalled for a while because NOBODY had introduced anything for the PC compatible industry that wasn't a clone of IBM's systems or peripherals until then. Finally, Compaq risked the company with the DeskPro 386 and IBM was in serious trouble.
For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.
Wouldn't the chance of cache misses depend on the caching policy? How does the data size matter?
Data size matters because a program will typically access a fixed number of working variables, not a fixed amount of data. If a program's working set size stays at, say, 1000 words, and you move from a 32-bit to a 64-bit architecture, you need a cache with twice as much storage space to hold the working set without thrashing.
There's easily enough die area to double the sizes of the L1 and L2 caches; the problem is that it slows down cache access (more latency cycles fetching something from L1 is a Bad Thing).
Certain types of load work with constant size instead of constant word count, but most of those deal with working sets large enough that you'll thrash no matter what.
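The working-set arithmetic above is easy to check. The sketch below takes the 1000-word figure from the post and assumes, as the post does, that every word doubles in size; the 4 KiB cache is a hypothetical number picked only so the effect is visible.

```python
def working_set_bytes(num_words, word_bytes):
    """Bytes needed to hold a working set of num_words machine words."""
    return num_words * word_bytes

def fits_in_cache(num_words, word_bytes, cache_bytes):
    """True if the whole working set can sit in the cache at once."""
    return working_set_bytes(num_words, word_bytes) <= cache_bytes

WORDS = 1000          # working set size used in the post
CACHE = 4 * 1024      # hypothetical 4 KiB cache, for illustration only

for word_bytes in (4, 8):
    need = working_set_bytes(WORDS, word_bytes)
    print(f"{word_bytes * 8}-bit words: {need} bytes, fits: "
          f"{fits_in_cache(WORDS, word_bytes, CACHE)}")
```

With 4-byte words the set fits; with 8-byte words it needs about twice the cache and no longer does.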
The gain with 64-bit processors is one of address space and nothing more.
Which includes better behaviour for those programs that have to fake larger address space. That would be a speed increase.
Nothing running on x86 will do that. Unless you're running old DOS programs in real mode, you're already working with a flat address space. Typically 2 gigs of this is available to user programs (with the rest being mapped to kernel or device space). If you have a problem with a working set larger than 2 gigabytes, you already have a Sun/$other_vendor machine to solve it on.
Larger address space targets the _future_ problem of desktop users who want many gigabytes of memory.
A fringe benefit is being able to more efficiently map multi-gigabyte files into memory space, but performance for this kind of task is limited by disk latency and controller bandwidth, not memory architecture.
That's the biggest bunch of crap that I've ever heard. There are a bunch of games that do fixed point math because floating point does not give you enough accuracy.
Collision detection would certainly benefit from improved precision. Physics suck in games because it is difficult to do fast and accurate at the same time.
Epic has promised 64-bit versions of its games. I'm guessing they are doing so for a very good reason. And they are doing this despite the fact that they use a comparatively very robust physics engine in Karma.
I'm guessing you've never implemented a physics engine or even taken a Numerical Analysis course or read any books. So how about pulling your head out of your ass before disseminating FUD.
32 bit architectures are not limited to 4 gigabytes of memory. "32 bit processor" refers to the width of the DATA bus (and registers). It does not refer to the width of the address bus.
For example, the Z80 and 6502 were 8-bit processors, but they supported more than 256 bytes of RAM (2^8 bytes). The 68000 and 80286 were 16-bit processors, but they supported more than 64k of RAM (2^16 bytes). That's because the 8-bit processors had 16-bit address busses, and the 16-bit processors often had 24-bit address busses.
The current Pentium 4 Xeon chip supports 64 gigs of RAM, despite being a 32-bit processor.
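The numbers above all fall out of 2^(address bits). Here is a small sketch that works them out; the function names are made up for this example, and the 36-bit entry is an assumption added here because it is the address width that matches the 64 gig figure.

```python
def addressable_bytes(address_bits):
    """Bytes reachable with an address of the given width."""
    return 2 ** address_bits

def human(n):
    """Format a power-of-two byte count using binary units."""
    units = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"

# (address width, what it corresponds to in the post)
examples = [
    (8, "data width of a Z80 or 6502"),
    (16, "address bus of those 8-bit chips"),
    (24, "address bus of a 68000 or 80286"),
    (32, "flat 32-bit addressing"),
    (36, "assumed width behind the 64 gig figure"),
]
for bits, note in examples:
    print(f"{bits:2d} bits -> {human(addressable_bytes(bits)):>9}  ({note})")
```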
64-bit computing means that you can hold a 64-bit quantity (long int or double) in a register. Also, you can load, store, or perform arithmetic on such quantities using one instruction and often in one clock cycle.
This offers very few benefits for the end consumer. Mostly it's about perception: consumers will perceive that a 64-bit chip is twice as good as a 32-bit one.
I think you mean CISC.
...
...and not have any other way to solve the problem
RISC = Reduced Instruction Set Computer
CISC = Complex Instruction Set Computer
The basic idea of (most) RISC chip designs, such as the MIPS, Alpha, PowerPC & Sparc, was to have a large number of general purpose registers, fixed length instructions that could only refer to those registers, and only a handful of instructions that specifically read/wrote to main memory (which is why they're also referred to as 'load/store' architectures). This simple design allowed them to push clock speeds without too much trouble. RISC processors also adopted superscalar designs (having multiple execution units, allowing the execution of multiple instructions 'simultaneously') before their CISC counterparts.
In contrast to the simplicity of the RISC systems, there are the CISC chips, such as the x86 and the old VAX processors, which tried to make their instructions resemble high-level languages, as well as having a smaller number of registers, many of them having a special purpose. With variable length instructions, and many different modes of operation for each instruction, the CISC methodology generally resulted in much larger, more complex chip designs that were harder to speed up, pipeline & make superscalar.
To compare the two, let's take a simple operation, such as taking two numbers from memory & adding them together. A generic RISC system would do something like:
1) load 1st number into Register 1
2) load 2nd number into Register 2
3) add the value in R1 to R2, putting the value in R3
4) copy the value from Register 3 to memory
whereas a CISC chip would more likely do something like:
1) add the value at memory location 1 to the value at memory location 2, and store in a special Accumulator register
2) copy the Accumulator register back to memory
The difference being that where the RISC machine only had one addition operation (register+register->register), the CISC machine would have a handful of them, depending on where the data came from (memory (using multiple forms of reference), registers, constants, and various combinations).
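The two instruction sequences above can be mocked up as toy interpreters. This is only a sketch: the opcode names (load, add, store, add_mem_mem, store_acc) are invented for illustration and do not correspond to any real instruction set.

```python
def risc_add(memory, a, b, dest):
    """Load/store style: only load and store touch memory."""
    regs = [0] * 4
    program = [
        ("load", 1, a),        # R1 <- mem[a]
        ("load", 2, b),        # R2 <- mem[b]
        ("add", 3, 1, 2),      # R3 <- R1 + R2
        ("store", 3, dest),    # mem[dest] <- R3
    ]
    for op, *args in program:
        if op == "load":
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "add":
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "store":
            reg, addr = args
            memory[addr] = regs[reg]
    return len(program)        # number of instructions executed

def cisc_add(memory, a, b, dest):
    """Memory-operand style: one add reads both operands straight from memory."""
    acc = 0
    program = [
        ("add_mem_mem", a, b),   # ACC <- mem[a] + mem[b]
        ("store_acc", dest),     # mem[dest] <- ACC
    ]
    for op, *args in program:
        if op == "add_mem_mem":
            x, y = args
            acc = memory[x] + memory[y]
        elif op == "store_acc":
            (addr,) = args
            memory[addr] = acc
    return len(program)

mem_risc = [5, 7, 0]
mem_cisc = [5, 7, 0]
print(risc_add(mem_risc, 0, 1, 2), mem_risc)
print(cisc_add(mem_cisc, 0, 1, 2), mem_cisc)
```

Both end up with the same sum in memory; the RISC version takes more, simpler instructions, while the CISC version needs a decoder that understands more instruction forms.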
In the early 80s, the RISC/CISC debate was a hot one in academia, and RISC won out there, by virtue of its simplicity & ease of improvement. By the mid 80s, the debate was starting again in industry, as a number of RISC chips started entering the marketplace, where Intel's x86 architecture won by virtue of the IBM PC.
The whole debate is pretty much a moot point now, since Intel's new x86 chips have RISC cores wrapped by a thin layer to translate the complex instructions. As an added bonus, the new 64-bit x86 systems should be adding a bunch of extra registers, further negating the penalty of the architecture.
my sig's at the bottom of the page.
- 1. All addresses being 64-bits.
- For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.
This is incorrect. The Hammer "long mode" uses 32 bits as the default data size. 64 bits are only used for pointers and explicitly overridden 64-bit operands. I.e., you still have to declare "long long" or "int64" or whatever in your language to access those 64 bits. All your old 32-bit data still occupies the same space.
Furthermore, measurements by AMD indicate that op-code size did not increase with the expanded instructions, but actually *decreased* because the additional registers decreased the typical amount of spill/fill code emitted.
Therefore there is no additional cache pressure. The "code bloat" problem remains solely in the hands of the software developer, and is *NOT* worsened in any way by Hammer.
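One way to see which data grows and which does not is to lay out a couple of structures with ctypes. The Pixel and ListNode structures are hypothetical examples, not anything from AMD's documentation; the sizes reported depend on the ABI of the machine running the sketch.

```python
import ctypes

class Pixel(ctypes.Structure):
    # Four explicitly 32-bit fields: 16 bytes no matter how wide pointers are.
    _fields_ = [
        ("r", ctypes.c_int32),
        ("g", ctypes.c_int32),
        ("b", ctypes.c_int32),
        ("a", ctypes.c_int32),
    ]

class ListNode(ctypes.Structure):
    # A 32-bit value plus a pointer: this one grows when pointers do,
    # and picks up padding so the pointer stays aligned.
    _fields_ = [
        ("value", ctypes.c_int32),
        ("next", ctypes.c_void_p),
    ]

pointer_size = ctypes.sizeof(ctypes.c_void_p)
print("pointer size:", pointer_size)
print("sizeof(Pixel):", ctypes.sizeof(Pixel))
print("sizeof(ListNode):", ctypes.sizeof(ListNode))
```

Plain 32-bit data keeps its size, which is the claim above; pointer-heavy structures such as linked lists and trees are where the extra footprint shows up.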
- 2. All internal integer registers being 64-bits.
- For #2, realize that some integer operations are O(N) where N is the number of bits involved. 64-bit multiplication and division are slower than the same 32-bit operations. Period.
This is also incorrect. There are numerous well-known techniques used in ALU design that make precious few operations "O(bits)". Again, AMD specifically targeted this. For example: the 64-bit integer multiply in Hammer is *FASTER* (per clock) than the 32-bit integer multiply in either the Athlon or Pentium 4.
The reason AMD is able to do this is because arithmetic and logic operations can largely be implemented in a "more gates for more speed" fashion. They are closer to O(ln(N)) than O(N). But at this level of circuit design, you don't necessarily think in those terms (since N is constant, everything just looks like O(1)) -- these high speed circuit designers worry about other technical things like "latch speed".
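To illustrate the "more gates for more speed" idea, here is a sketch comparing a ripple-carry adder, where the carry passes through one stage per bit, with a parallel-prefix adder in the Kogge-Stone style, where the carries are resolved in about log2(N) levels. This is a textbook technique picked for illustration; nothing here claims it is the circuit AMD actually used, and the function names are made up.

```python
def ripple_carry_add(a, b, bits):
    """Add bit by bit; the carry passes through one stage per bit."""
    carry, total = 0, 0
    for i in range(bits):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, bits                      # (sum, carry-chain stages)

def prefix_add(a, b, bits):
    """Resolve all carries in parallel, doubling the span at each level."""
    mask = (1 << bits) - 1
    generate = a & b                        # bit positions that create a carry
    propagate = a ^ b                       # bit positions that pass a carry on
    half_sum = propagate
    levels, span = 0, 1
    while span < bits:
        # Combine each group with the group 'span' bits below it.
        generate |= propagate & (generate << span)
        propagate &= (propagate << span) | ((1 << span) - 1)
        generate &= mask
        propagate &= mask
        span *= 2
        levels += 1
    carries = (generate << 1) & mask        # carry into each bit position
    return half_sum ^ carries, levels       # (sum, prefix levels)

for width in (32, 64):
    a, b = (1 << width) - 1, 1              # worst case: carry runs end to end
    print(width, ripple_carry_add(a, b, width), prefix_add(a, b, width))
```

Going from 32 to 64 bits doubles the ripple-carry chain but adds only one level to the prefix adder, at the cost of many more gates per level.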
The 64-bit integer divide may be a little slower; however, again you need to explicitly use 64-bit ints in your software, and division is a comparatively uncommon operation.
- The gain with 64-bit processors is one of address space and nothing more.
This is the largest gain (big DB people will be very happy with it) but it certainly is not the only gain. Remember that there are now twice as many SSE registers. This opens up some performance possibilities for multimedia applications.
Although I don't know that it's related to SSE, it should be pointed out that Epic (as in the video game company) has ported the Unreal engine to x86-64! Like most people, I was quite surprised that they did this; however, they apparently found doing it to be worthwhile.
Do not underestimate the upside of going to 64 bits in the way that AMD has done it. They have literally made it a no-lose scenario -- that alone should spur (mostly new) application developer interest.