AMD64 Preview
Araxen writes "Over at Anandtech.com they have an interesting preview of AMD's 64 bit processor on an nForce3 mobo. The results are very impressive with the Athlon64 beating out Intel's best P4 processor soundly in their gaming benchmarks. This was only in 32-bit mode no less! I can't wait for 64-bit benchmarks to come out!"
The benchmarks are from a 2GHz Opteron, not an Athlon 64. It is intended to give an example of the performance from the new chip. Unfortunately, upon introduction, only the Athlon FX, running on ECC memory, will be capable of using dual-channel memory. And from what I've heard, this CPU will cost in the vicinity of $600+. The first non-ECC dual-channel platform will be introduced in 2004.
Anandtech is only comparing single processor Opteron performance against everything else, not, in fact, Athlon64 performance. The primary difference is that the Opteron has a dual channel memory subsystem, whereas the Athlon64 has a single channel system. This difference will have an effect on performance.
The Doormat
If you're not outraged, then you're not paying attention.
Huh? There's no such thing as an "Intel x86-64" processor. x86-64 is solely AMD's implementation.
If a job's not worth doing, it's not worth doing right.
I actually read this this morning, and there are a couple of important things to note - the chip being 'previewed' isn't actually an Athlon64 - it's a 1.8GHz Opteron overclocked to 2.0GHz, which is the expected clock rate of the first A64, PR-rated at 3200+. It'll give us an idea of what to expect, but nothing too specific.
The other important thing to note is that the comparisons were mostly against P4s and an Athlon XP, with a Dual 3.06GHz Xeon thrown in for good measure, all 32 bit chips. And the 'Athlon64' owned most of the competitions, showing that its 32 bit mode is just as good as rumored. There were no Itaniums in the competition, so only 32 bit modes can be compared here. However, if the A64 turns out to be as good in its native 64 bit mode as the 32 bit numbers might lead you to believe, the Athlon 64 looks like it very well could be a force to be dealt with.
When are some of these newer processors going to implement the executable permissions bit in the MMU so that the STACK can be NON-EXECUTABLE? (OK, I know some trampoline stuff needs executable stacks; well, it can ask for it where needed by setting the executable bit for a small region.)
And when are some of these new processors going to be fully virtualizable? I'm talking about PUSHF and POPF generating exceptions like directly setting the interrupt flag does.
Think how easy plex86 would be to run on a processor that did this properly?
Code-morphing Transmeta (come on!), AMD (maybe?), Intel (no chance?)
Sam
blog.sam.liddicott.com
Intel doesn't have an x86-64 line of processors. They have an IA64 line of processors.
The two apparently aren't interchangeable. There's a coming battle in which software companies have to choose between the two, or support both, which would be tough on both them and consumers.
Apparently, AMD's x86-64 set is easier to deal with, and more of a natural progression from where the processors are now. (It also apparently runs 32-bit code at rates comparable to 32-bit chips at the same clock speed.) Intel's IA-64 is a total reworking, and a bitch to work with, from what I've read.
In the end, it seems like the smart choice would be for everybody to toss their hat in with x86-64 (which means Intel would have to, as well, and essentially concede defeat and lose face); it probably won't happen, though, because Intel is Intel.
Check out this article at the Inquirer, which I've basically just paraphrased, but it does go into some interesting Windows 64 dealings.
Before anybody starts talking about how little 64 bit CPUs actually increase performance, let me tell everyone what 64 bit mode will actually bring to the table over the Opteron/Athlon64 32 bit modes:
1) more registers. This will get us a fair performance increase from the start, as compilers will have more registers to work with when doing calculations on multiple pieces of data.
2) support for larger system memory sizes. This won't help you in video games, but it will help you in high end Photoshop work and other applications (provided you spend the money to get more memory put into your system)
3) native operations on 64 bit data. Typically, when you want to do operations on a 64 bit integer on a 32 bit CPU, you have to split up the work in software. Now, with 64 bit registers, you will be able to do operations on 64 bit integers in the same time it takes to do the same operation on a 32 bit integer.
4) when using native 64 bit mode, certain legacy instructions of x86-32 are deprecated. This is a cleanup for the x86 ISA, which in the past has contained literally EVERYTHING that the previous generation of CPU supported. AMD's x86-64 ISA eliminates these legacy features and moves them into firmware emulation (don't worry, it won't degrade any modern 32 bit code, just terribly outdated stuff from the 386 days, which doesn't need 2GHz of power in the first place)
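To make point 3 a bit more concrete, here is a rough sketch of what "splitting up the work in software" looks like for a 64 bit addition on a 32 bit machine. It is written in Python purely for illustration; the function name is made up, and real compilers emit an ADD followed by an ADC (add with carry) on x86 rather than anything like this.

```python
MASK32 = 0xFFFFFFFF  # keeps a value inside one 32-bit "register"

def add64_with_32bit_ops(a, b):
    """Add two unsigned 64-bit integers using only 32-bit-wide pieces.

    Hypothetical helper for illustration: it mimics the two-step
    add / add-with-carry sequence a 32-bit CPU needs.
    """
    # Split each operand into a low and a high 32-bit half.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves and remember whether they overflowed.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Step 2: add the high halves plus the carry from step 1.
    # Masking makes the result wrap around like a real 64-bit register.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo
```

With 64 bit registers the whole thing collapses into a single add, which is the saving the list above is talking about.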
On top of these performance enhancements that 64 bit mode brings you, you get all of this just because you are using AMD's Opteron/Athlon64 CPU:
1) Dual channel DDR Memory interface, with memory controller on the die of the CPU. This reduces latency and improves memory bandwidth so dramatically that even Intel's off die memory controller can't keep up (this is why video games are so much faster on the amd64 platform than on athlon-32 platform)
2) HyperTransport bus to the south bridge, which will give high bandwidth access to the PCI bus, PCI-X, and other IO intensive controllers. Eventually AGP slots will be phased out for PCI Express slots, which will be universal for both video and other devices.
3) when using multiple CPUs in the same system, the new AMD-64 platform gives you dedicated memory bandwidth to each CPU installed. On the Intel and athlon-32 platforms, all the CPUs in the system share the same memory controller, which runs either single or dual channel DDR anywhere from 266MHz - 400MHz.
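Some back-of-the-envelope numbers for point 3, as a sketch only. It assumes the "266MHz - 400MHz" figures are effective DDR transfer rates (DDR266 to DDR400) on a 64 bit (8 byte) wide channel, and it only computes theoretical peaks. The function names are invented for this example, and the cost of one CPU reaching memory attached to another CPU over HyperTransport is ignored.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak in MB/s: transfers/s x bytes per transfer x channels."""
    return mega_transfers_per_s * bus_bytes * channels

def per_cpu_bandwidth_mb_s(mega_transfers_per_s, channels, cpus, dedicated):
    """Peak bandwidth each CPU can count on when every CPU is busy.

    dedicated=True  models one memory controller per CPU (Opteron style).
    dedicated=False models all CPUs sharing a single controller.
    """
    total = peak_bandwidth_mb_s(mega_transfers_per_s, channels)
    return total if dedicated else total // cpus

# Dual-channel DDR400, two CPUs:
shared = per_cpu_bandwidth_mb_s(400, channels=2, cpus=2, dedicated=False)
dedicated = per_cpu_bandwidth_mb_s(400, channels=2, cpus=2, dedicated=True)
print(shared, dedicated)  # 3200 6400
```

So in the ideal case a dual-CPU box with per-CPU controllers has twice the aggregate memory bandwidth of one where both CPUs hang off the same controller.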
Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.
Prescott with the PNI (Prescott New Instructions) extensions, 1MB L2 cache, clocking up to 4GHz and beyond, 800MHz front side bus and increased software support for Hyperthreading (e.g. 2.6.x Linux kernels know how to do HT scheduling much more efficiently).
Watch the Xmas benchmarks, that's when it matters...
or so says Ars Technica. In addition most of the initial shipments will go to motherboard manufacturers for bundling with their boards. I really don't like the idea of that becoming common practice, as that much purchasing power will mean tight pricing controls. Read more Here.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
How the frell did this get modded up? Please RTFA before commenting/modding.
The benchmark was against a P4 (as well as a dual Xeon), which runs IA-32 natively, not the Itanium.
The A64 is a consumer chip, designed to be purchased and used by consumers. The Itanium processor costs more than a whole top of the line consumer computer. The A64 and the Itanium are not targeted at the same market segment and neither is the Opteron, which is supposed to go up against the Xeon.
The reason everyone is looking forward to a benchmark of an A64 running a native 64-bit application on a 64-bit OS is that not only is X86-64 considerably cleaner than IA-32, but the A64 also has two times as many SSE2 and General Purpose registers, which should yield significantly better results than the A64 running in 32-bit mode (which is already outperforming the P4 in a lot of benchmarks).
By the way, before someone points out that the benchmarked processor is an overclocked Opteron and not an A64, AMD is currently planning on releasing a version of the A64 which is just a rebranded Opteron 1xx along with the single-channel version of the A64.
Uttering logically derived and empirically supported truths to the disciples of the orthodox establishment.
Just to set some things straight:
- Itanium, Intel's 64-bit chip, uses a totally different architecture (EPIC) from the current Pentium x86 line of chips. This architecture is NOT compatible with x86, so you effectively need to recompile existing software for it to work on Itanium. There is an EMULATION mode for x86 in Itanium, which is absolutely unusable according to various sources on the Net. You will DEFINITELY not want to run a game on it. Finally, prices for a low-end 1.0GHz Itanium chip start at approx $800.
- AMD's Opteron/Athlon64 chips are compatible with everything you are running right now at 32 bits. You can install a complete 32-bit operating system on it, and everything will run just as today, albeit a little bit faster. There is no need for an "emulator". And, of course, you can already use Linux in full 64-bit mode, available from SuSE, Red Hat and Mandrake. Also, Microsoft will release a 64-bit version of XP at the end of the year.
Marcos
Of course, you can buy a dual-Opteron or even a quad-Opteron TODAY if you want, or you can wait until late this year to buy a Prescott system, which is not 64-bits nor multi-processing.
By the way, did you know Prescott, along with its mobile version Dothan, was delayed because it was dissipating almost 103 watts? For the record, Opteron is dissipating about 60 watts.
Marcos
This would have been the case if IA-32 was a sane architecture. Athlon64 in IA-32 mode has only 8 visible general purpose registers, whereas it has 16 in 64-bit mode. That makes 64-bit mode a win in almost all cases. Technically it would have made sense for AMD to introduce a new 32-bit mode, but it would probably have been bad for marketing.
Finally! A year of moderation! Ready for 2019?
GamePC is running a first look of Windows XP 64bit edition for the AMD64 (x86-64) architecture.
What you say is true, if the only improvement of AMD64 is 64-bit support. However, AMD64 also doubles the number of general-purpose and XMM (for SSE, SSE2) registers to 16 of each. This will make many programs run faster, as having 8 general-purpose registers is just not enough. Far too much time is given to swapping data into and out of registers on x86.
The additional registers are really what I like about AMD64. I couldn't care less about 64bit for now.
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2
Sorry, but Hyper-Threading isn't really used to "take any advantage of the dualies". From the intel page: "Hyper-Threading Technology is a form of simultaneous multi-threading technology (SMT) where multiple threads of software applications can be run simultaneously on one processor" (emphasis mine)
So even for programs that don't need to use 64 bit math, moving them to the x86-64 platform can speed them up. It won't improve your typing speed in Word, but it can probably speed up most if not all your games if they are simply recompiled.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Basically, you're saying that this is an important incremental improvement over previous x86 processors. Which describes every new x86 processor right back to the 8088. So you have to ask: why did Intel abandon the incremental approach with the Itanium? It's locked them in as the dominant CPU maker since forever.
Perhaps somebody was bored with the whole Pentium architecture.
10-15 years ago, everyone else in the industry thought x86 was a dead end. Massive amounts of investment poured into RISC alternatives like Alpha and PPC.
Perhaps Intel believed the conventional wisdom and felt they had to eventually drop x86.
causing hit counters to go up artificially just to see 'next page' drives me nuts!
--
"It is now safe to switch off your computer."
And conventional wisdom was correct. They just underestimated the power of the entrenched software library. Intel processors since the Pentium Pro have basically been RISC cores with an x86->RISC translator in front. This allows them to ramp up the speed of the core, and even change core architectures, while still running all the old code. It comes at the fairly small cost of the gates needed for the translation frontend. It has another advantage in that CISC operations take up less room in cache, so you get much better utilization out of your expensive cache resources. Intel started the Itanium project for two reasons. One, HP needed a new flagship chip, and they are a large enough customer to sway Intel. Two, Intel was tired of Cyrix and AMD copying their designs, so they were going to make a tightly controlled architecture where EVERYTHING was covered by patents and copyright; that way they thought they could have the whole pie to themselves. What they didn't realize is that while they are a big player, the only reason people keep using their chips is that they have maintained that backwards compatibility path. Throw that away and Intel is just another chip maker, and others like IBM, Motorola, etc. may look better.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Quoting the AMD64 Architecture Programmer's Manual Volume 2: System Programming:
"The NX bit in the page-translation tables specifies whether instructions can be executed from the page."
So non-executable pages are already present in AMD64.
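For anyone wondering what that looks like at the bit level, here is a small sketch. It assumes the long-mode page-table entry layout, where bit 63 is NX, bits 0 to 2 are present / writable / user, and bits 12 to 51 hold the physical page address. The helper names are invented for the example, and as far as I understand it the OS also has to turn the feature on (the NXE bit in the EFER register) before the CPU honours the bit.

```python
PTE_PRESENT = 1 << 0    # page is mapped
PTE_WRITABLE = 1 << 1   # writes allowed
PTE_USER = 1 << 2       # accessible from user mode
PTE_NX = 1 << 63        # no-execute: instruction fetches from the page fault
PTE_ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51: physical page address

def make_stack_pte(phys_addr):
    """Build an entry for a user stack page: readable, writable, NOT executable.

    Hypothetical helper, not taken from any real kernel.
    """
    return (phys_addr & PTE_ADDR_MASK) | PTE_PRESENT | PTE_WRITABLE | PTE_USER | PTE_NX

def is_executable(pte):
    """A present page is executable only if its NX bit is clear."""
    return bool(pte & PTE_PRESENT) and not (pte & PTE_NX)

stack_pte = make_stack_pte(0x1234000)
trampoline_pte = stack_pte & ~PTE_NX  # same page, execution explicitly allowed
```

Clearing the bit for just one page, as in `trampoline_pte`, is the "ask for it where needed" case for trampolines that was raised earlier in the thread.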
In Windows, you only get 2 GB of address space for your process (WinXP & expensive Win2K Server versions can give 3 GB, which helps). Into this address space is loaded your executable code (including all system DLLs) and your stack (by default 1 MB of address space is reserved for every thread), and these tend to be scattered around a bit, which breaks up the available address range considerably.
Now if your app needs to allocate large (200+ MB) areas of memory, how many of those do you think you can get from a 2 GB RAM machine? Not enough :-) In fact you may find that as little as 50-60% of your available RAM can be allocated in large chunks, and all the rest is only available as countless smaller fragments. The larger the contiguous RAM blocks you want, the fewer of them you can allocate.
With a 64 bit CPU, there's no more problem. The MMU can map scattered pages of your available physical RAM to any contiguous section of the massive 64 bit address range, and you can utilise all the RAM you have in any size chunk you wish :-)
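Here is a toy model of that fragmentation effect. The layout below (where the EXE, DLLs and stacks land inside the 2 GB range) is completely made up for illustration, as are the function names; only the arithmetic is the point.

```python
MB = 1024 * 1024

def free_gaps(space_size, used):
    """Return (start, size) for every free gap, given (start, size) used regions."""
    gaps = []
    cursor = 0
    for start, size in sorted(used):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + size)
    if cursor < space_size:
        gaps.append((cursor, space_size - cursor))
    return gaps

def blocks_that_fit(gaps, block_size):
    """How many contiguous blocks of block_size fit into the free gaps."""
    return sum(size // block_size for _, size in gaps)

# Hypothetical 2 GB user address space with code, DLLs and stacks scattered about.
SPACE = 2048 * MB
used = [(4 * MB, 16 * MB), (300 * MB, 8 * MB), (650 * MB, 2 * MB),
        (1000 * MB, 12 * MB), (1400 * MB, 4 * MB), (1600 * MB, 20 * MB),
        (1900 * MB, 60 * MB)]

gaps = free_gaps(SPACE, used)
total_free = sum(size for _, size in gaps)          # 1926 MB is free in total...
big_blocks = blocks_that_fit(gaps, 200 * MB)        # ...but only 5 x 200 MB fit
ideal_blocks = total_free // (200 * MB)             # 9 would fit if it were contiguous
```

In this made-up layout only about 52% of the free address space is usable as 200 MB chunks, even though just 122 MB is actually occupied. With a 64 bit address range the gaps between mappings can be made so large that this stops being a practical concern.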
Why would anyone engrave "Elbereth"?