AMD's 64-bit Plot
ceebABC writes "In a long interview with eWEEK, AMD's CEO Hector Ruiz talks about struggling to compete with Intel, but more importantly about their upcoming 64-bit processors. He says that AMD's 64-bit chips will be priced comparably to the 32-bit ones, and backwards compatible. He also thinks there will be a market for desktop 64-bit systems. Skip to the last page for the most interesting stuff."
Yeah, and I have a 128-bit graphics card. (I know, they have like 100 Mbit ethernet cards now. :) ) However, the GPU and processor are totally different. The graphics card has more bits but obviously it doesn't run as fast as the CPU. All it does is make your fragfest a little more purty by letting you see the giblets all over. Having the CPU 64 bits is quite different, security-wise, code-wise, and speed-wise. If you have a 64-bit 2 GHz processor and a 32-bit 2 GHz processor, the 64-bit processor is going to be much faster. This speeds up the whole system, not just the rate at which you make giblets fly.
I'm the Devil the Windows users warned you about.
Here are some benchmarks for an Opteron.
http://www.aceshardware.com/
OK people, I know some of you are trying to be humorous, but really the 64 bits is the size of the registers and how much data the processor handles at once. Which means at 64 bits, the processor can process (theoretically) twice as much data per second as a 32-bit processor. Which also means it can handle any unsigned integer up to 2^64 - 1 in a single register.
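To put numbers on the register-width point, here is a minimal sketch of the largest unsigned value a 32-bit and a 64-bit register can hold, and of what happens when you go one past it. The helper names `max_unsigned` and `wrap_add` are made up for this illustration.

```python
def max_unsigned(bits: int) -> int:
    """Largest unsigned integer that fits in a register of the given width."""
    return (1 << bits) - 1


def wrap_add(a: int, b: int, bits: int) -> int:
    """Add two values the way fixed-width hardware does: overflow wraps around."""
    return (a + b) & max_unsigned(bits)


max32 = max_unsigned(32)  # 2^32 - 1
max64 = max_unsigned(64)  # 2^64 - 1

# One past the 32-bit maximum wraps to zero in a 32-bit register,
# but is an ordinary value in a 64-bit register.
overflow_in_32 = wrap_add(max32, 1, 32)
same_sum_in_64 = wrap_add(max32, 1, 64)
```

Note that this only says which values fit in one register; it says nothing about how fast anything runs.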
I love that everyone read that story and thought it meant that they were leaving the desktop market, when it really said that they were going to diversify outside of the desktop market, as in do more in addition to their desktop market...
(a quote from the first paragraph of the Forbes article: "[a] strategy of developing processors for a wider range of products outside computers...")
DJMD - The fourth man - Planetary
If you have a 64-bit 2 GHz processor and a 32-bit 2 GHz processor, the 64-bit processor is going to be much faster. This speeds up the whole system, not just the rate at which you make giblets fly.
No. That's a myth. As it stands, Pentiums for many years now have sported 64-bit buses and 64-bit FPUs (well, 80-bit FPUs actually), so we're not talking about bus size and FPU width. We're talking about:
1. All addresses being 64-bits.
2. All internal integer registers being 64-bits.
For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.
For #2, realize that some integer operations are O(N) where N is the number of bits involved. 64-bit multiplication and division are slower than the same 32-bit operations. Period.
The gain with 64-bit processors is one of address space and nothing more.
MS have been quietly getting ready for 64 bit for at least 2 years; they've been shipping a 64 bit SDK on my MSDN disks for over a year. There are 64 bit NVidia drivers for WinXP-64. What makes you think MS isn't already there?
Check the Windows XP 64 bit edition website. I hate to burst your bubble, but microsoft knows what it's doing.
No real benefit will come until genuine 64-bit apps hit the consumer market. This will be a steep learning curve for most developers who have only ever known 16 or 32-bit programming.
The problems to be hurdled are:
1) Reliance on the fact that size of pointer is equal to size of int.
2) Reliance on a particular byte order in the machine word.
3) Using type long and presuming that it always has the same size as int.
4) Alignment of stack variables.
5) Different alignment rules in structures and classes.
6) Pointer arithmetic.
A lot of engineering (and developer re-education) work also needs to be put into not only these issues, but also designing the application so that it is actually getting the most out of each clock cycle.
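A few of the pitfalls in the list above (pointer size equal to int size, byte order, structure layout) can be sketched with Python's standard `struct` module. This is only an illustration: the address `ptr` is a made-up value, and the `"<"` format prefix deliberately uses packed standard sizes, so a real compiler would add alignment padding on top of what is shown here.

```python
import struct

# Hypothetical 64-bit address, chosen only for illustration.
ptr = 0x0000_0001_2345_6780

# Pitfall 1: code that stores a pointer in a 32-bit int keeps only the low half.
truncated = ptr & 0xFFFFFFFF

# struct refuses to pack the value into 32 bits instead of truncating silently.
try:
    struct.pack("<I", ptr)
    fits_in_32_bits = True
except struct.error:
    fits_in_32_bits = False

# Pitfall 2: the same 32-bit value has a different byte layout per byte order.
little_endian = struct.pack("<I", 0x11223344)
big_endian = struct.pack(">I", 0x11223344)

# Pitfall 5: a record holding an int and a pointer grows when the pointer widens.
# Packed sizes only; native alignment rules would typically pad the second one further.
node_with_32bit_ptr = struct.calcsize("<iI")
node_with_64bit_ptr = struct.calcsize("<iQ")
```

Code that hard-codes the first size, or that reads the bytes back assuming one order, is exactly the kind that breaks on a port.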
Have you ever done a physics engine? When you are working with vectors, you want as much precision as you can get. More precision means more bits.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Increased maximum memory helps.
Opteron's extra registers help.
64-bit calculations are easier: they don't have to be split into multiple 32-bit parts.
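For anyone wondering what "multiple 32-bit parts" looks like, here is a rough sketch of a 64-bit addition done the way a 32-bit machine has to do it: add the low halves, then add the high halves plus the carry. The function name is made up for the example; a 64-bit CPU does the same job in one add instruction.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_parts(a: int, b: int) -> int:
    """Add two 64-bit values using only 32-bit-wide pieces and an explicit carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo
    carry = lo >> 32          # 1 if the low halves overflowed 32 bits
    lo &= MASK32

    hi = (a_hi + b_hi + carry) & MASK32  # overflow out of bit 63 is discarded
    return (hi << 32) | lo
```

Two adds and carry bookkeeping instead of one add. Multiplication and division split this way get considerably messier.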
So...a 32-person bus is just as good as a 64-person bus? It may be harder to design and build, but when you have to move >32 people it's nice to have that big of a bus running around.
What I'm saying is, being 64-bit DOES make you faster. Not twice as fast, but definitely faster and more powerful.
Who is this Anonymous Coward character, how does he post so much, and why is he always such a whore?
Kernel 2.4.20 has x86-64 support built-in.
Look for SuSE's Andi Kleen in the release-notes.
fpg
2^64 addressing is not the only benefit of the change. FPUs see additional benefit when they have more bits. More bits means more precision; this is very important and desirable, especially when working with numbers that have fractional components. For proper 3D rendering, physics models, and anything else that involves computing numbers that have fractional parts, more is better. When the FPU can handle a double in one clock cycle because it works natively on 64-bit IEEE floating point numbers, you will notice a performance boost in addition to the increased accuracy.
Um, all current x86s already handle 64-bit IEEE double-precision floats natively (actually more like 80 bits, for "extended double-precision"). The FP register file has been this wide for quite a while.
There will be no performance or precision boost for floating-point math from moving the rest of the chip to 64-bit registers/datapaths.
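A quick way to see that double precision has nothing to do with the integer register width: the format is fixed by IEEE 754, not by the CPU's "bitness". The sketch below assumes a Python build whose `float` is an IEEE 754 double, which is the case on essentially every platform, 32-bit or 64-bit.

```python
import struct
import sys

# A double is 8 bytes wide in the IEEE 754 binary64 format.
double_bytes = struct.calcsize("<d")

# It carries a 53-bit significand, on a 32-bit and a 64-bit CPU alike.
significand_bits = sys.float_info.mant_dig

# Precision changes only when the floating-point format changes:
# round-tripping 0.1 through a 32-bit float loses digits.
as_single = struct.unpack("<f", struct.pack("<f", 0.1))[0]
single_error = abs(as_single - 0.1)
```

So the precision you get is a property of using `float` versus `double` in your code, not of the width of the integer registers.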
A nit. Orders of magnitude are generally thought of in the decimal realm. Thus 2^64, which is a 20-digit number, is only 10 orders of magnitude greater than 2^32 (a 10-digit number).
I wouldn't be too sure about the 100 years part either. But it ought to be good for at least 10.
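The digit counts are easy to check. The sketch below counts the decimal digits of both numbers and also computes the exact ratio, which comes out a little under ten decimal orders of magnitude.

```python
import math

digits_in_2_32 = len(str(2**32))   # decimal digits of 4294967296
digits_in_2_64 = len(str(2**64))   # decimal digits of 18446744073709551616

# The ratio 2^64 / 2^32 is 2^32, so the gap in decimal orders of magnitude
# is log10(2^32), i.e. 32 * log10(2).
orders_of_magnitude = math.log10(2**64 / 2**32)
```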
Actually, IBM was pretty damn sure that people needed 80386 systems. What they were also just as sure about was that an 80386-based PC would cannibalize sales from their System/36 systems. The folks up in Rochester, Minnesota (where the System/36 and later AS/400 come from) went to Armonk (IBM Headquarters) and had the IBM Executive Committee block the 80386-based PC.
The industry stalled for a while because NOBODY had introduced anything for the PC compatible industry that wasn't a clone of IBM's systems or peripherals until then. Finally, Compaq risked the company with the DeskPro 386 and IBM was in serious trouble.
For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.
Wouldn't the chance of cache misses depend on the caching policy? How does the data size matter?
Data size matters because a program will typically access a fixed number of working variables, not a fixed amount of data. If a program's working set size stays at, say, 1000 words, and you move from a 32-bit to a 64-bit architecture, you need a cache with twice as much storage space to hold the working set without thrashing.
There's easily enough die area to double the sizes of the L1 and L2 caches; the problem is that it slows down cache access (more latency cycles fetching something from L1 is a Bad Thing).
Certain types of load work with constant size instead of constant word count, but most of those deal with working sets large enough that you'll thrash no matter what.
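The working-set argument is just arithmetic. Here is a minimal sketch using the 1000-word figure from above; the 4 KiB cache size is a made-up number picked so the effect is easy to see, not the size of any real L1.

```python
CACHE_BYTES = 4 * 1024      # hypothetical cache size, for illustration only
WORKING_SET_WORDS = 1000    # the figure used in the comment above


def footprint_bytes(words: int, word_bytes: int) -> int:
    """Bytes of cache needed to hold a working set of fixed word count."""
    return words * word_bytes


footprint_32 = footprint_bytes(WORKING_SET_WORDS, 4)  # 32-bit words
footprint_64 = footprint_bytes(WORKING_SET_WORDS, 8)  # 64-bit words

fits_with_32bit_words = footprint_32 <= CACHE_BYTES
fits_with_64bit_words = footprint_64 <= CACHE_BYTES
```

Same program, same number of variables, but the wider words no longer fit. This only bites when every word actually becomes 64 bits wide.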
The gain with 64-bit processors is one of address space and nothing more.
Which includes better behaviour for those programs that have to fake larger address space. That would be a speed increase.
Nothing running on x86 will do that. Unless you're running old DOS programs in real mode, you're already working with a flat address space. Typically 2 gigs of this is available to user programs (with the rest being mapped to kernel or device space). If you have a problem with a working set larger than 2 gigabytes, you already have a Sun/$other_vendor machine to solve it on.
Larger address space targets the _future_ problem of desktop users who want many gigabytes of memory.
A fringe benefit is being able to more efficiently map multi-gigabyte files into memory space, but performance for this kind of task is limited by disk latency and controller bandwidth, not memory architecture.
That's the biggest bunch of crap that I've ever heard. There are a bunch of games that do fixed point math because floating point does not give you enough accuracy.
Collision detection would certainly benefit from improved precision. Physics suck in games because it is difficult to do fast and accurate at the same time.
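For readers who haven't seen fixed-point math, here is a minimal sketch of a 16.16 format: a 32-bit integer with 16 integer bits and 16 fractional bits. The format choice and helper names are assumptions for illustration and are not taken from any particular game engine.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # the value 1.0 in 16.16 fixed point


def to_fixed(x: float) -> int:
    """Convert a float to 16.16 fixed point."""
    return int(round(x * ONE))


def to_float(f: int) -> float:
    """Convert 16.16 fixed point back to a float."""
    return f / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two 16.16 values; the raw product carries 32 fractional bits."""
    return (a * b) >> FRAC_BITS


# The intermediate product of two 32-bit fixed-point values needs more than
# 32 bits, which is where a 64-bit integer register comes in handy.
big = to_fixed(300.0)
intermediate_bits = (big * big).bit_length()
```

On a 32-bit CPU that intermediate product has to be handled as a register pair; on a 64-bit CPU it fits in one register.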
Epic has promised a 64-bit version of games. I'm guessing they are doing so for a very good reason. And they are doing this despite the fact that they use a comparatively very robust physics engine in Karma.
I'm guessing you've never implemented a physics engine or even taken a Numerical Analysis course or read any books. So how about pulling your head out of your ass before disseminating FUD.
The physical interconnect is of secondary importance to the internal implementation. If your program counter and other registers have 64 bits internally, then to make a processor which can actually use 2^64 bytes of memory, you just need to add more address lines to the IC. No big deal. When your registers are only 32 bits (as they are in the IA32 processors we have now) it's not easy to make a processor which can use more than 2^32 bytes of memory. You have to use icky segmentation schemes and other ugliness.
TTFN
or does the emulator just pick up x86 instructions and translate them to IA64 instructions?
As I understand it, AMD's 64-Bit processors actually have hardware for supporting the previous 32-Bit instructions. I could be misunderstanding, but if I'm not this will naturally mean that with 32-Bit instructions the AMD chip will outperform Intel's emulation.
Intel is banking heavily on people finally ditching x86 for good. There are good reasons for people to ditch x86, but there is one good reason to keep it: Legacy Support. How important that is will depend on the person and their needs.
"Everything you know is wrong. (And stupid.)"
Moderation Totals: Wrong=2, Stupid=3, Total=5.
Miraculously, someone at Intel stowed the x86 crackpipe, preventing some sort of segmented/overlay nightmare like the one you describe.
32 bit architectures are not limited to 4 gigabytes of memory. "32 bit processor" refers to the width of the DATA bus (and registers). It does not refer to the width of the address bus.
For example, the Z80 and 6502 were 8-bit processors, but they supported more than 256 bytes of RAM (2^8 bytes). The 68000 and 80286 were 16-bit processors, but they supported more than 64k of RAM (2^16 bytes). That's because the 8-bit processors had 16-bit address busses, and the 16-bit processors often had 24-bit address busses.
The current Pentium 4 Xeon chip supports 64 gig of RAM, despite being a 32-bit processor.
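The relationship between address-bus width and addressable memory is a power of two, so the figures above are easy to reproduce. In this sketch the 36-bit case is the width that works out to 64 GB; the helper name is made up.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

eight_bit_cpu_ram = addressable_bytes(16)     # 8-bit CPUs with a 16-bit address bus
sixteen_bit_cpu_ram = addressable_bytes(24)   # 16-bit CPUs with a 24-bit address bus
flat_32bit_space = addressable_bytes(32)      # what a 32-bit pointer can reach
physical_36bit_space = addressable_bytes(36)  # 36 physical address bits
```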
64-bit computing means that you can hold a 64-bit quantity (long int or double) in a register. Also, you can load, store, or perform arithmetic on such quantities using one instruction and often in one clock cycle.
This offers very few benefits for the end consumer. Mostly it's about perception: consumers will perceive that a 64-bit chip is twice as good as a 32-bit one.
I think you mean CISC.
...
...and not have any other way to solve the problem
RISC = Reduced Instruction Set Computer
CISC = Complex Instruction Set Computer
The basic idea of (most) RISC chip designs, such as the MIPS, Alpha, PowerPC & SPARC, was to have a large number of general purpose registers, fixed length instructions that could only refer to those registers, and only a handful of instructions that specifically read/wrote to main memory (which is why they're also referred to as 'load/store' architectures). This simple design allowed them to push clock speeds without too much trouble. RISC processors also adopted superscalar designs (having multiple execution units, allowing the execution of multiple instructions 'simultaneously') before their CISC counterparts.
In contrast to the simplicity of the RISC systems, there are the CISC chips, such as the x86 and the old VAX processors, which tried to make their instructions resemble high-level languages, as well as having a smaller number of registers, many of them having a special purpose. With variable length instructions, and many different modes of operation for each instruction, the CISC methodology generally resulted in much larger, more complex chip designs that were harder to speed up, pipeline & make superscalar.
To compare the two, let's take a simple operation, such as taking two numbers from memory & adding them together. A generic RISC system would do something like:
1) load 1st number into Register 1
2) load 2nd number into Register 2
3) add the value in R1 to R2, putting the value in R3
4) copy the value from Register 3 to memory
whereas a CISC chip would more likely do something like:
1) add the value at memory location 1 to the value at memory location 2, and store in a special Accumulator register
2) copy the Accumulator register back to memory
The difference being that where the RISC machine only had one addition operation (register+register->register), the CISC machine would have a handful of them, depending on where the data came from (memory (using multiple forms of reference), registers, constants, and various combinations).
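The two instruction sequences above can be mimicked with a toy simulation. This is only a sketch: the mnemonics in the traces are invented for the example and do not correspond to any real instruction set, and memory is just a Python list.

```python
def run_risc(mem: list) -> list:
    """Load/store style: memory is touched only by explicit loads and stores."""
    regs = {}
    trace = []
    regs["r1"] = mem[0]
    trace.append("load  r1, [0]")
    regs["r2"] = mem[1]
    trace.append("load  r2, [1]")
    regs["r3"] = regs["r1"] + regs["r2"]   # the only add form: register + register
    trace.append("add   r3, r1, r2")
    mem[2] = regs["r3"]
    trace.append("store [2], r3")
    return trace


def run_cisc(mem: list) -> list:
    """Memory-operand style: the add instruction reads memory itself."""
    trace = []
    acc = mem[0] + mem[1]                  # one of many add forms: memory + memory
    trace.append("add   acc, [0], [1]")
    mem[2] = acc
    trace.append("store [2], acc")
    return trace


risc_mem = [5, 7, 0]
cisc_mem = [5, 7, 0]
risc_trace = run_risc(risc_mem)
cisc_trace = run_cisc(cisc_mem)
```

Both end up with the same result in memory; the RISC version uses more, simpler instructions, while the CISC version packs the memory accesses into the add itself.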
In the early 80s, the RISC/CISC debate was a hot one in academia, and RISC won out there, by virtue of its simplicity & ease of improvement. By the mid 80s, the debate was starting again in industry, as a number of RISC chips started entering the marketplace, where Intel's x86 architecture won by virtue of the IBM PC.
The whole debate is pretty much a moot point now, since Intel's new x86 chips have RISC cores wrapped by a thin layer to translate the complex instructions. As an added bonus, the new 64b x86 systems should be adding a bunch of extra registers, further negating the penalty of the architecture.
my sig's at the bottom of the page.
The new Thoroughbred Revision B Athlons (XP 2400+ and higher) made a significant drop in power consumption (1.65V core), while the 3GHz P4 guzzles more electrons than any Athlon (have you seen the heatsink Intel bundles with that thing?!). The Hammer series uses Silicon-On-Insulator technology to keep power consumption (heat) down, to the point that the larger Hammer core consumes about the same amount of power as the TBred RevB. AMD is gunning for the high-density rackmount market with the Opteron where efficient power use is critical. They'll get it too.
I have a dual CPU Athlon 2400+ box, 2GHz each, using Thermalright SLK800 heatsinks and 80mm adjustable fans set to 2500RPM. My temps are 41C/43C/42C (case/CPU1/CPU2) at the moment with about 25% CPU utilization. Power consumption (as measured by my UPS load monitor) is the same as the dual Athlon 1800+ chips (1.53GHz) the new CPUs replaced.
- 1. All addresses being 64-bits. For #1, realize that this is going to greatly increase the data size of many applications. The larger the data size, the higher the chance of cache misses. In general, this is a loss, not a win.
This is incorrect. The Hammer "long mode" uses 32 bits as the default data size. 64 bits are only used for pointers and explicitly overridden 64-bit operands. I.e., you still have to declare "long long" or "int64" or whatever in your language to access those 64 bits. All your old 32-bit data still occupies the same space.
Furthermore, measurements by AMD indicate that op-code size did not increase with the expanded instructions, but actual *decreased* because the additional registers decreased the typical amount of spill/fill code emitted.
Therefore there is no additional cache pressure. The "code bloat" problem remains solely in the hands of the software developer, and is *NOT* worsened in any way by hammer.
- 2. All internal integer registers being 64-bits. For #2, realize that some integer operations are O(N) where N is the number of bits involved. 64-bit multiplication and division are slower than the same 32-bit operations. Period.
This is also incorrect. There are numerous well-known techniques in ALU design that leave precious few operations "O(bits)". Again, AMD specifically targeted this. For example: the 64-bit integer multiply in Hammer is *FASTER* (per clock) than the 32-bit integer multiply in either the Athlon or Pentium 4.
The reason AMD is able to do this is because arithmetic and logic operations can largely be implemented in a "more gates for more speed" fashion. They are closer to O(ln(N)) than O(N). But at this level of circuit design, you don't necessarily think in those terms (since N is constant, everything just looks like O(1)) -- these high speed circuit designers worry about other technical things like "latch speed".
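To give a feel for "more gates for more speed", here is an idealised sketch comparing the carry-path depth of a ripple-carry adder (carry passes through every bit in turn) with a parallel-prefix adder (carries combined in a tree). The numbers count abstract logic levels only and ignore fan-out, wiring, and everything else real circuit designers worry about. This is not a description of AMD's actual design.

```python
import math


def ripple_carry_depth(bits: int) -> int:
    """Carry must pass through every bit position in turn: depth grows as O(N)."""
    return bits


def prefix_adder_depth(bits: int) -> int:
    """Carries are combined in a balanced tree: depth grows as O(log N)."""
    return math.ceil(math.log2(bits))


ripple_32, ripple_64 = ripple_carry_depth(32), ripple_carry_depth(64)
prefix_32, prefix_64 = prefix_adder_depth(32), prefix_adder_depth(64)
```

Doubling the width doubles the ripple depth but adds only one level to the tree, at the cost of a lot more gates.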
The 64-bit integer divide may be a little slower; however, again you need to explicitly use 64-bit ints in your software, and division is a comparatively uncommon operation.
- The gain with 64-bit processors is one of address space and nothing more.
This is the largest gain (big DB people will be very happy with it) but it certainly is not the only gain. Remember that there are now twice as many SSE registers. This opens up some performance possibilities for multimedia applications.
Although I don't know that it's related to SSE, it should be pointed out that EPIC (as in the video game company) has ported the Unreal engine to x86-64! Like most people, I was quite surprised that they did this; however, they apparently found doing it to be worthwhile.
Do not underestimate the upside of going to 64 bits in the way that AMD has done it. They have literally made it a no-lose scenario -- that alone should spur (mostly new) application developer interest.
- Both Intel and AMD have been betting big on 64 bit computing and it will be interesting to see how this plays out.
They had nowhere else to go. If we start hitting the 4GB limit and there is no solution, software developers and end-users will eventually be crying bloody murder like they were when Intel's 640KB limitation was hit. (That time Intel was slow to react -- this time around AMD and Intel are trying to have a solution in place *before* it becomes a problem.)
- Itanium 1 was a flop. Itanium 2 has respectable performance, but is not IA-32 backward compatible, where AMD x86-64 is backward compatible.
Well I like to dig on Intel as much as the next guy, but technically speaking, IA64 is backward compatible with IA32 (it does have a bona fide IA32 mode). But it's slow as molasses (they might as well be emulating IA-32).
That being said, I don't think Windows device drivers are going to work on IA-64 (the IA-32 mode is not involved in the boot process in any way). IA-64's compatibility is in fact a "joke", though technically there.
The backward compatibility mode in Hammer, is very different. You can boot 32-bit windows on it, play your old DOS games on it or whatever and you will not know the difference (except it will be a lot faster.)
I have to share this insightful comment I read on Usenet 3 years ago:
The "bit width" of a CPU is not strictly defined by a single architectural
attribute. Several candidates for a "normative" bit width exist:
- word width of the general purpose registers
- width of internal data paths
- width of external data paths
- width of the ALU
- width of the architected address range
There are probably more...
Back in the days of the 8 bit processors, ALUs were 8 bit wide, but address
range was already 16 bit.
In the age of 16 bit processors, registers and ALUs were 16 bits wide, but
often there were more than 16 address bits. Segmented addressing was needed
to make use of more than 64 KB for a single process.
When the first 32 bit CPUs appeared, they had 32 bit wide general purpose
registers and 32 bits of architected address space. But for example the 68000
had only a 16 bit ALU and its data bus was only 16 bits wide. Of the address
bits, only 24 were externally visible on pins.
Nowadays, with "64 bit CPUs" a reality for high-end computers, the address
width is the important criterion. Only a true 64 bit machine can linearly
address more than 4 GB for each running process. And when you handle pointer
variables that are 64 bits wide, it makes a lot of sense to have 64 bit wide
registers, a 64 bit ALU and 64 bit wide internal data paths. All current
64 bit CPUs that I know of meet this definition of "64 bit".
Internal bus widths tend to be wider (think of the 256 bit wide backside L2
bus of Coppermine or the G5), and registers have been wider than the "bitness"
ever since FPUs have moved on-chip (you don't even need to consider AltiVec or
SSE). External buses are sometimes narrower (to save some pins and a lot of
bucks on packaging) and sometimes wider (to better feed the new and fast CPU
cores from the same old memory chips).
So, by all intents and purposes, the x86 architecture was 16 bit until and
including the 286, and is 32 bit from the 386 onwards. AMD's K8 will probably
extend it to 64 bits. The P6 core is 32 bits, but it has some extensions to
enable it to address 64 GB of physical RAM. But every single process can only
address 4 GB directly, since pointers are still 32 bits wide. AFAIK K7 and P7
also have these extensions, but are still 32 bit cores.
BTW, the G5 is also rumored to be able to address 64 GB of physical RAM.
There are four unused bits in each of the "segment registers" which could be
used by the OS to select one of sixteen banks of 4 GB each. But processes
would still be limited to 4 GB of directly addressable memory.
Holger Bettag
While x86-compatible CPUs have generally not been used in dedicated networking devices until very recently due to the cost to performance ratio, they have become a fairly popular high-performance embedded solution lately. Hammer should be an extremely attractive solution in the high-performance embedded space because:
Another nice factor of using large integers instead of floating point is that when you absolutely positively have to get the result back in the same number of cycles each time, you can do this. Math coprocessors are just that, coprocessors. I haven't kept up so I don't know just how fast you can expect things to come back from them these days, and if they are actually scheduled or not, but at least in the olden days you had to shovel the data at the math co, then query it to find out if it was done. One problem was that if you queried it too fast it might not have set the flags properly yet and you would get bogus results. Ah, x86 is so classy!
The address space will become significant to us all very quickly if we start doing entirely memory-mapped I/O. Isn't this an issue of the Hurd at the moment? While there are other ways to solve it (but who wants to deal with segmented addressing? not me!) certainly there are many advantages to mem-mapped I/O.
And finally, sure games do fine, but more power means bigger, shinier games with more gibs! Also the reason GPUs have become so popular is that CPU speed wasn't growing fast enough to satisfy the desires of the game industry. Expect to see some more graphics-related processing done in the CPU for a while, namely multires (the reduction of vertices in a model one at a time with re-meshing in between, with the greatest number of vertices assigned to the appropriate models and usually determined by a scoring system, using very high-vertex-count models which may never be rendered with all visible vertices plotted EVER). Multires plus the simplest of occlusion techniques is enough to make a scalable game which will look very good on even low-end hardware and still look fantastically better on high-end equipment. It does cost you CPU though, and I'm sure you can see where I'm going with this. Of course multires will be an inherent feature of a future generation of 3D accelerators which will do even more for the developer and likely have even crappier drivers.
Also the memory bandwidth of hammer doesn't seem like it's all that outstanding except that it's integrated into the CPU and so you can expect to do less waiting. The real advantages in terms of memory bandwidth will be in SMP systems. Of course I don't know too many people planning to go to Clawhammer who aren't planning to go to dual Clawhammer, but if they are less inexpensive than promised I'll be one sucker with only one of 'em.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"