AMD64 Preview
Araxen writes "Over at Anandtech.com they have an interesting preview of AMD's 64 bit processor on an nForce3 mobo. The results are very impressive with the Athlon64 beating out Intel's best P4 processor soundly in their gaming benchmarks. This was only in 32-bit mode no less! I can't wait for 64-bit benchmarks to come out!"
The benchmarks are from a 2GHz Opteron, not an Athlon 64. It is intended to give an example of the performance from the new chip. Unfortunately, upon introduction, only the Athlon FX, running on ECC memory, will be capable of using dual-channel memory. And from what I've heard, this CPU will cost in the vicinity of $600+. The first non-ECC dual-channel platform will be introduced in 2004.
Anandtech is only comparing single processor Opteron performance against everything else, not, in fact, Athlon64 performance. The primary difference is that the Opteron has a dual channel memory subsystem, whereas the Athlon64 has a single channel system. This difference will have an effect on performance.
The Doormat
If you're not outraged, then you're not paying attention.
Huh? There's no such thing as an "Intel x86-64" processor. x86-64 is solely AMD's implementation.
If a job's not worth doing, it's not worth doing right.
I actually read this this morning, and there are a couple of important things to note - the chip being 'previewed' isn't actually an Athlon64 - it's a 1.8GHz Opteron overclocked to 2.0GHz, which is the expected clock rate of the first A64, PR-rated at 3200+. It'll give us an idea of what to expect, but nothing too specific.
The other important thing to note is that the comparisons were mostly against P4s and an Athlon XP, with a Dual 3.06GHz Xeon thrown in for good measure, all 32 bit chips. And the 'Athlon64' owned most of the competitions, showing that its 32 bit mode is just as good as rumored. There were no Itaniums in the competition, so only 32 bit modes can be compared here. However, if the A64 turns out to be as good in its native 64 bit mode as the 32 bit numbers might lead you to believe, the Athlon 64 looks like it very well could be a force to be dealt with.
Intel doesn't have an x86-64 line of processors. They have an IA64 line of processors.
The two apparently aren't interchangeable. There's a coming battle in which software companies have to choose between the two, or support both, which would be tough on both them and consumers.
Apparently, AMD's x86-64 set is easier to deal with, and more of a natural progression from where the processors are now. (It also apparently runs 32-bit code at rates comparable to 32-bit chips at the same clock speed.) Intel's IA-64 is a total reworking, and a bitch to work with, from what I've read.
In the end, it seems like the smart choice would be for everybody to toss their hat in with x86-64 (which means Intel would have to, as well, and essentially concede defeat and lose face); it probably won't happen, though, because Intel is Intel.
Check out this article at the Inquirer, which I've basically just paraphrased, but it does go into some interesting Windows 64 dealings.
Before anybody starts talking about how little 64bit CPUs actually increase performance, let me tell everyone what 64 bit mode will actually bring to the table over the Opteron/Athlon64 32 bit modes:
1) more registers. This will get us a fair performance increase from the start, as compilers will have more registers to work with when doing calculations on multiple pieces of data.
2) support for larger system memory sizes. This won't help you in video games, but it will help you with high end Photoshop work and other memory-hungry applications (provided you spend the money to put more memory into your system)
3) native operations on 64 bit data. Typically, when you want to do operations on a 64 bit integer on a 32 bit CPU, you have to split up the work in software. Now with 64 bit registers, you will be able to do operations on 64 bit integers in the same time as it takes to do the same operation on a 32 bit integer.
4) when using native 64 bit mode, certain legacy instructions of x86-32 are deprecated. This is a cleanup for the x86 ISA, which in the past has contained literally EVERYTHING that the previous generation of CPU supported. AMD's x86-64 ISA eliminates these legacy features and moves them into firmware emulation (don't worry, it won't degrade any modern 32 bit code, just terribly outdated stuff from the 386 days, which doesn't need 2GHz of power in the first place)
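To make point 3 concrete, here is a rough sketch of what "splitting up the work in software" looks like for a 64 bit add on a 32 bit machine. The function names are made up for illustration, and Python integers stand in for registers; a real compiler would emit an add followed by an add-with-carry.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_on_32bit(a, b):
    """Add two unsigned 64 bit values using only 32 bit wide steps."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves and remember the carry out of bit 31.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # Step 2: add the high halves plus the carry (wraps at 64 bits).
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo


def add64_native(a, b):
    """What a 64 bit register does in a single operation."""
    return (a + b) & MASK64


if __name__ == "__main__":
    x, y = 0x00000001FFFFFFFF, 0x0000000000000001
    print(hex(add64_on_32bit(x, y)), hex(add64_native(x, y)))
```

Two dependent steps plus the carry bookkeeping on the 32 bit side, one step on the 64 bit side. Multiplication and division split up far less gracefully than addition does.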
On top of these performance enhancements that 64 bit mode brings you, you get all of this just because you are using AMD's Opteron/Athlon64 CPU:
1) Dual channel DDR Memory interface, with memory controller on the die of the CPU. This reduces latency and improves memory bandwidth so dramatically that even Intel's off die memory controller can't keep up (this is why video games are so much faster on the amd64 platform than on the athlon-32 platform)
2) HyperTransport bus to the south bridge, which will give high bandwidth access to the PCI bus, PCI-X, and other IO intensive controllers. Eventually AGP slots will be phased out for PCI-X slots which will be universal for both video and other devices.
3) when using multiple CPUs in the same system, the new AMD-64 platform gives you dedicated memory bandwidth to each CPU installed. On the Intel and athlon-32 platforms, all the CPUs in the system share the same memory controller, which runs either single or dual channel DDR anywhere from 266MHz - 400MHz.
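For a feel of what point 3 means in numbers, here is a back-of-the-envelope sketch. It assumes a 64 bit (8 byte) data bus per DDR channel and reads the "266MHz - 400MHz" figures as DDR transfer rates; the function names are mine, and these are theoretical peaks, not measured results.

```python
def ddr_peak_gb_per_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth of a DDR memory controller in GB/s."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


def per_cpu_gb_per_s(controller_gb_per_s, cpus, dedicated):
    """Peak bandwidth each CPU can count on.

    dedicated=True  -> every CPU has its own controller (on-die style)
    dedicated=False -> all CPUs share one controller (shared-bus style)
    """
    if dedicated:
        return controller_gb_per_s
    return controller_gb_per_s / cpus


if __name__ == "__main__":
    dual_ddr400 = ddr_peak_gb_per_s(400, channels=2)
    for cpus in (1, 2, 4):
        shared = per_cpu_gb_per_s(dual_ddr400, cpus, dedicated=False)
        own = per_cpu_gb_per_s(dual_ddr400, cpus, dedicated=True)
        print(f"{cpus} CPU(s): shared {shared:.1f} GB/s each, dedicated {own:.1f} GB/s each")
```

The shared number shrinks as you add CPUs while the dedicated one doesn't, which is the whole point. It ignores the cost of one CPU reaching memory attached to another CPU over HyperTransport.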
Two infinite things: your stupidity and mine. But I'm not sure about the latter. If my sig offends you, I'm sorry.
If you had RTFA, you would know that the benchmarks compared the Athlon64 against Pentium 4s and Xeons, not against IA64. What the benchmarks show is that the 32-bit performance of the Athlon64 is on par with or better than the best Pentium 4 processors, and is better than the current Xeons. IA64 is not benchmarked in the article.
The 64-bit performance of the Athlon64 is not being benchmarked in the article; it is the 32-bit performance relative to leading 32-bit processors that is the issue.
Prescott with PNI new instructions, 1MB L2 cache, clocking up to 4GHz and beyond, 800MHz front side bus and increased software support for Hyperthreading (e.g. 2.6.x Linux kernels know how to do HT scheduling much more efficiently).
Watch the Xmas benchmarks, that's when it matters...
How the frell did this get modded up? Please RTFA before commenting/modding.
The benchmark was against a P4 (as well as a dual Xeon), which runs IA-32 natively, not the Itanium.
The A64 is a consumer chip, designed to be purchased and used by consumers. The Itanium processor costs more than a whole top of the line consumer computer. The A64 and the Itanium are not targeted at the same market segment, and neither is the Opteron, which is supposed to go up against the Xeon.
The reason everyone is looking forward to a benchmark of an A64 running a native 64-bit application on a 64-bit OS is that not only is X86-64 considerably cleaner than IA-32, but the A64 also has two times as many SSE2 and General Purpose registers, which should yield significantly better results than the A64 running in 32-bit mode (which is already outperforming the P4 in a lot of benchmarks).
By the way, before someone points out that the benchmarked processor is an overclocked Opteron and not an A64, AMD is currently planning on releasing a version of the A64 which is just a rebranded Opteron 1xx along with the single-channel version of the A64.
Uttering logically derived and empirically supported truths to the disciples of the orthodox establishment.
Just to set some things straight:
- Itanium, Intel's 64-bit chip, uses a totally different architecture (EPIC) from the current Pentium x86 line of chips. This architecture is NOT compatible with x86, so effectively you need a recompile for existing software to work on Itanium. There is an EMULATION mode for x86 in Itanium, which is absolutely unusable according to various sources on the Net. You will DEFINITELY not want to run a game on it. Finally, prices for a low-end 1.0GHz Itanium chip start at approx $800.
- AMD's Opteron/Athlon64 chips are compatible with everything you are running right now at 32 bits. You can install a complete 32-bit operating system on it, and everything will run just as today, albeit a little bit faster. There is no need for an "emulator". And, of course, you can already use Linux at full 64 bits, available from SuSE, Red Hat and Mandrake. Also, Microsoft will release a 64-bit version of XP at the end of the year.
Marcos
This would have been the case if IA-32 was a sane architecture. Athlon64 in IA-32 mode has only 8 visible general purpose registers, whereas it has 16 in 64-bit mode. That makes 64-bit mode a win in almost all cases. Technically it would have made sense for AMD to introduce a new 32-bit mode, but it would probably have been bad for marketing.
Finally! A year of moderation! Ready for 2019?
GamePC is running a first look of Windows XP 64bit edition for the AMD64 (x86-64) architecture.
Appro 4U Quad Opteron Server. That ought to contain one, don't you think?
What you say is true, if the only improvement of AMD64 is 64-bit support. However, AMD64 also doubles the number of general-purpose and XMM (for SSE, SSE2) registers to 16 of each. This will make many programs run faster, as having 8 general-purpose registers is just not enough. Far too much time is given to swapping data into and out of registers on x86.
The additional registers are really what I like about AMD64. I couldn't care less about 64bit for now.
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2
Sorry, but Hyper-Threading isn't really used to "take any advantage of the dualies". From the intel page: "Hyper-Threading Technology is a form of simultaneous multi-threading technology (SMT) where multiple threads of software applications can be run simultaneously on one processor" (emphasis mine)
So even for programs that don't need to use 64 bit math, moving them to the x86-64 platform can speed them up. It won't improve your typing speed in Word, but it can probably speed up most if not all your games if they are simply recompiled.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
However, what happens when the operating system does a context switch or some other exception occurs? The latency from saving the processor context is going to go way up as you have to save far more data to memory and then load the same large amount of data in for the new context.
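To put a rough number on "far more data", here is a small sketch that counts only the general purpose and XMM registers, using the 8 vs 16 register counts quoted elsewhere in this thread. It deliberately ignores x87/MMX state, flags, segment registers and everything else a real context switch touches, so treat it as a lower-bound illustration, not a measurement. The function name is made up.

```python
def register_state_bytes(gprs, gpr_bits, xmm_regs, xmm_bits=128):
    """Bytes needed to hold the GPRs plus the XMM registers."""
    return gprs * gpr_bits // 8 + xmm_regs * xmm_bits // 8


if __name__ == "__main__":
    legacy = register_state_bytes(gprs=8, gpr_bits=32, xmm_regs=8)
    long_mode = register_state_bytes(gprs=16, gpr_bits=64, xmm_regs=16)
    print(legacy, long_mode)  # 160 384
```

So the register file to save is a few hundred bytes either way. Whether that shows up as measurable latency depends on how the OS and the memory system handle it, which this sketch says nothing about.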
:-), but it shouldn't! Did you even read the article? In most of the benchmarks, the Opteron was even faster than dual-Xeons (although I'm not sure the benchmarks were fully using the additional processor) I didn't see a "performance hit" anywhere in the benchmarks.
There is no "context-switch" delay. The processor takes exactly the same amount of time doing a context-switch at 64-bits as at 32-bits. Remember that the processor has to do a certain number of clocks per second, and it cannot "fall behind" or get delayed.
Now, if your programmers decide that they want to work on 64bit wide data instead of the 32bit they used on the old system, you suddenly find that your processor is having to move double the amount of data around their system. You have to hope that any increases in memory bandwidth the engineers included are enough to cater for this.
If you read the article, you will have noticed that Opteron has an integrated memory controller. In this case, it means the controller was moving data at 2.0GHz. This adds up to significant increases in performance in the benchmarks, as can be seen in the article.
I think the main thing I'm trying to say is that 64bit computing isn't necessarily faster than 32 bit computing. Indeed, because some of the overheads can be double or quadruple, it can be a performance hit.
Absolutely true. It can be slower (just take a look at Itanium
Tyan and Arima already have dual motherboards out there. The Tyan K8W looks really nice for a workstation or high-end gaming machine. The 4P motherboards are not "available" per se; they're only sold as complete systems. Check out Appro, Angstrom Computer or Racksaver for some 4P servers if you're looking for Opteron servers.
Not exactly.
Within the MMU look-up tables the memory pages can be marked as being executable or not. Hence, if a program tries to jump to memory in a protected page (ie. not marked as executable) it will cause an exception.
The current x86 MMU doesn't have this ability, unlike some processors such as the Sun UltraSPARC (though not any versions previous to this).
Agrajag: "Oh no, not again!"
Most of the slashdotters already pointed out the other important stuff...
But I'd like to point out that the Itanium will not be competition for the Opteron in most cases. Itaniums are super expensive chips that run on servers and are totally incompatible with x86 (32 bits or 64 bits) software unless it's in emulation mode, in which it runs very slowly. If you were to run x86 software on an Itanium, then more than likely the Opterons would easily win anyway.
Hmmm... Pie...
causing hit counters to go up artificially just to see 'next page' drives me nuts!
--
"It is now safe to switch off your computer."
And conventional wisdom was correct. They just underestimated the power of the entrenched software library. Intel processors since the Pentium Pro have basically been RISC cores with an x86->RISC translator in front. This allows them to ramp up the speed of the core, even change core architectures, while still running all the old code. It comes at the fairly small cost of the gates needed for the translation frontend. It has another advantage in that CISC operations take up less room in cache, so you get much better utilization out of your expensive cache resources. Intel started the Itanium project for two reasons: one, HP needed a new flagship chip and they are a large enough customer to sway Intel, and two, they were tired of Cyrix and AMD copying their designs, so they were going to make a tightly controlled architecture where EVERYTHING was covered by patents and copyright; that way they thought they could have the whole pie to themselves. What they didn't realize is that while they are a big player, the only reason people keep using their chips is that they have maintained that backwards compatibility path. Throw that away and Intel is just another chip maker, and others like IBM, Motorola, etc. may look better.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Quoting the AMD64 Architecture Programmer's Manual Volume 2: System Programming:
"The NX bit in the page-translation tables specifies whether instructions can be executed from the page."
So non-executable pages are already present in AMD64.
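A toy model of what that quote describes, for anyone who wants to see the mechanism. The flat dict standing in for the page tables, the class and function names, and the example addresses are all made up for illustration; real AMD64 page tables are multi-level. The two bit positions are the real ones for a page-table entry: bit 0 is Present and bit 63 is NX.

```python
PAGE_SIZE = 4096
PRESENT = 1 << 0   # bit 0 of a page-table entry
NX = 1 << 63       # bit 63: no-execute


class PageFault(Exception):
    """Stand-in for the exception the MMU raises."""


def check_instruction_fetch(page_table, vaddr):
    """Raise PageFault if code may not be fetched from vaddr."""
    entry = page_table.get(vaddr // PAGE_SIZE, 0)
    if not entry & PRESENT:
        raise PageFault(f"page not present at {vaddr:#x}")
    if entry & NX:
        raise PageFault(f"instruction fetch from no-execute page at {vaddr:#x}")
    return True


if __name__ == "__main__":
    # Hypothetical layout: page 0x400 holds code, page 0x7ff holds the stack.
    table = {0x400: PRESENT, 0x7FF: PRESENT | NX}
    print(check_instruction_fetch(table, 0x400 * PAGE_SIZE + 0x10))
    try:
        check_instruction_fetch(table, 0x7FF * PAGE_SIZE + 0x20)
    except PageFault as fault:
        print("fault:", fault)
```

Jumping into the stack page faults instead of executing, which is exactly the protection being discussed. The sketch leaves out the control bit the OS has to set before the processor honors NX at all.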
It has to look like it's doing this, but doesn't have to do this :) A P4 has about 128 internal registers, and uses renaming hardware to present 8 to a given task. A given task may use more than 8 registers, where the CPU figures out ways to avoid spilling a register and does a rename instead. Now, during a context switch, the CPU doesn't actually have to dump the full context of the processor out to memory. Most of the state gets buffered, either in the internal register file, or in one of the write queues. Also, I doubt modern processors flush the i-cache. The i-cache on the P4 is actually the trace cache, and flushing it would involve dumping about 8kb of traces that took a lot of work to make. In reality, it's probably lazily replaced with new traces as the new process executes.
FYI: The big win with the AMD64 is not that the processor has more physical registers (it probably doesn't) but that its larger window of 16 GPRs enables the compiler's optimizer to do a much better job with register allocation.
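A very crude sketch of why the bigger window matters to the compiler. Each value in a piece of code is "live" over some range of instructions; if more values are live at once than there are architectural registers, something has to be spilled to memory. The function names and the example live ranges below are invented, and this only looks at the worst point in the code. It is not a real register allocator, and it ignores that some GPRs (like the stack pointer) aren't freely usable.

```python
def peak_live_values(live_ranges):
    """Max number of values live at once; ranges are half-open (start, end)."""
    events = []
    for start, end in live_ranges:
        events.append((start, +1))
        events.append((end, -1))
    # Sorting puts an end (-1) before a start (+1) at the same position,
    # so a value that dies exactly where another is born frees its register.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def spilled_at_peak(live_ranges, registers):
    """How many values cannot be kept in registers at the busiest point."""
    return max(0, peak_live_values(live_ranges) - registers)


if __name__ == "__main__":
    # Hypothetical inner loop keeping 12 values alive across 20 instructions.
    loop = [(0, 20)] * 12
    print(spilled_at_peak(loop, registers=8), spilled_at_peak(loop, registers=16))
```

With 8 names the made-up loop has to push values out to the stack at its busiest point; with 16 it doesn't. Renaming hardware can hide some of that cost, but the loads and stores are still in the instruction stream.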
A deep unwavering belief is a sure sign you're missing something...
If the AMD64 version of Windows XP 64-Bit is as stripped down as the current Intel version... then don't bother considering what performance would be like there anyway... check here for a list of things *NOT* included in XP 64-bit:
http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/winxppro/reskit/prka_fea_tfiu.asp
But I guess we can do without features like Media Player, POSIX Compliance, Power Management, Windows Installer, and more... I guess..... just to have a 64-bit OS...
-- If it ain't broke - overclock it more.
To the first order, power increases linearly with speed and with the square of voltage: P = C*F*V^2
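Plugging numbers into that first-order rule, as a quick sketch. The capacitance term cancels when you take a ratio, so only the frequency and voltage ratios matter. The function name is made up, the 1.8GHz to 2.0GHz step is the overclock mentioned earlier in the thread, and the 10% voltage bump is purely a hypothetical for illustration.

```python
def relative_power(freq_ratio, voltage_ratio):
    """Power after a change divided by power before, from P = C*F*V^2."""
    return freq_ratio * voltage_ratio ** 2


if __name__ == "__main__":
    clock_only = relative_power(2.0 / 1.8, 1.0)
    with_voltage_bump = relative_power(2.0 / 1.8, 1.10)  # hypothetical +10% Vcore
    print(f"clock only: x{clock_only:.2f}, clock plus voltage: x{with_voltage_bump:.2f}")
```

The voltage term is the one that hurts: a small bump to keep an overclock stable costs more power than the clock increase itself.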