If I Had a Hammer
adpowers writes: "Anandtech is running an article about their preview of AMD's Hammer. They had one machine running 32-bit Windows and the other running 64-bit Linux. The Linux machine had a 32 bit program and an identical program that was compiled for 64-bit processor support. Both processors were less than 30 days old and running without any crashes, but they weren't at full speed." We did one Hammer story a day or two ago, but there have been several more posted since then (wild guess: the NDA expired). Tom's Hardware has a story, so does Gamespot.
Maybe I'm oversimplifying it.
There's a big difference between 32-bit-optimized code compiled for 64-bit, and code written and optimized for 64-bit and compiled for 64-bit.
Applications need to be programmed and optimized to make use of the extra registers, extra data paths, and extra instructions available on the new platform. Without that, application speeds can't be compared, even though the base code and output are the same.
Let's take the example of some of the first-generation PlayStation 2 code, which was actually code written for 32-bit machines on a different platform like the PC or the old PSX. Pure recompiling won't get you any major performance boost, so all the developers had to "re-do" the code to make use of the 128-bit Emotion Engine.
That's exactly why all these gamedev guys kept screaming that it is much harder to code for the PS2 than for other platforms. One part of that whole thing is this; the other part is changing graphics APIs.
PCs use DirectX/OpenGL, and the PS2 can use either custom renderers or OpenGL.
Put it in perspective: why don't 16-bit games recompiled for 32-bit give a "major" performance boost unless optimised code is included?
The article suggests that AMD write / release native compilers that plug into Visual Studio...which would be a good thing for MS programmers.
Simple enough to say.
I just wanted a lead-in for the following question:
Did anyone else see a banner ad for Visual Studio .NET on Slashdot yesterday? Or was I hallucinating?
Writers imply. Readers infer.
The extended paging bug wasn't a simple CPU bug; it was a complex bug between CPU, chipset and video card. Because the Hammer has a very different I/O architecture compared to the current Athlon, the parts of the CPU and chipset that caused the bug should be new designs anyway.
AGP seems to be a problem on the first samples, as all of the demonstration systems were running without AGP video cards.
Jan
This is the job of the compiler... If I recompile source code I expect the compiler to optimise the object code in the best way for the target!
No, let's not. The PS2 was so radically different from the PS1 (I've coded both) that it amounted to an architecture change, not just a platform upgrade. The PS1 is a pretty much bog standard CPU+VRAM+DRAM device. The PS2 is a dataflow architecture, with the idea being to set up datastreams, (with the code to execute being part of the stream), and to target those streams with a firing-condition model. This is amazingly versatile (and the device has the bus bandwidth and DMA channels to handle it, the PC doesn't) but it is *very* *very* different from the standard way coding is done. This is why PS2 games are still getting better two years down the line...
Actually I don't think it's much harder at all, it's just different. You have 3 independent CPUs, all of which are pretty damn fast considering they're only at 300MHz. The device can do (peak) 3 billion (3,000,000,000) general purpose floating point multiply/accumulates per second, and you can get pretty close to that figure, unlike most peak throughput estimates. It comes down to bandwidth again, and the use of an opportunistic programming methodology rather than a logical-progression methodology.
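To put that peak figure in per-clock terms, here is a minimal Python sketch using only the two numbers quoted above (3 billion multiply/accumulates per second, 300MHz). The function name is made up for illustration, and the result says nothing about how the work is split between the three CPUs.

```python
# Both constants are the figures quoted in the post, not measurements.
PEAK_MACS_PER_SECOND = 3_000_000_000   # peak multiply/accumulates per second
CLOCK_HZ = 300_000_000                 # 300 MHz


def macs_per_cycle(macs_per_second, clock_hz):
    """Average multiply/accumulates the whole device must retire per clock."""
    return macs_per_second / clock_hz


if __name__ == "__main__":
    per_cycle = macs_per_cycle(PEAK_MACS_PER_SECOND, CLOCK_HZ)
    print(f"peak multiply/accumulates per clock cycle: {per_cycle:g}")
```

Taken at face value, the quoted peak works out to ten multiply/accumulates every clock across the device.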
Having said that, I'm from a parallel computing background, so using only 3 CPUs is child's play.
Because there's a much more quantifiable change in going from 16-bit to 32-bit. Developers had been hacking around the 16-bit limit using 'near' and 'far' pointers (!!), which meant all the cruft from those 16-bit days was still sticking around and causing problems if you just recompiled.
Now they're (at long last!) in the 32-bit arena, there are no such problems. A char* ptr is still a char* ptr, it now just has a greater domain. No cruft. No problems.
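As a rough illustration of the "greater domain" point, the Python sketch below computes how many byte addresses a flat pointer of a given width can hold, and reports the pointer width of the interpreter it runs on. The function names are hypothetical, and Python is used only as a calculator here; it is not how the pointers in the post would be declared.

```python
import struct


def addressable_bytes(pointer_bits):
    """Number of distinct byte addresses a flat pointer of this width can hold."""
    return 2 ** pointer_bits


def native_pointer_bits():
    """Pointer width of the running interpreter in bits ("P" is a C void*)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    for bits in (16, 32, 64):
        print(f"{bits}-bit flat pointer: {addressable_bytes(bits):,} bytes")
    print(f"this interpreter uses {native_pointer_bits()}-bit pointers")
```

The pointer type stays the same in the source; only the range of values it can hold changes with the target.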
This isn't to say that compilers won't get better over time though - optimisation is an inexact science, and you'd hope to see improvements as compiler-writers see how to improve the optimising stage.
Enough...
Simon
Physicists get Hadrons!
You also compared the transition from x86 to x86-64 to the transition from PSX to PS2. That is also something very different. The PS2 is hard to code for because the design of the graphics subsystem and vector CPUs makes it very fast on the one hand, but also makes it very hard to use its full potential. The PS2 CPUs are also hard to use because the caches are too small.
When the 386 was introduced, things like games were coded in assembler, at least the performance-critical parts. Something that is coded in assembler can't be recompiled. Now even games are coded in high-level languages.
Jan
You're wrong in this.
:-) But my point here is that there is no change in the way you think - no change in the coding philosophy.
I have been working for SuSE on porting gcc and binutils for x86-64 for over a year now, and it has been pretty painless. After we had the basic system running, I ported a full-blown but small Linux system to it (sysvinit, linux-utils, vim etc.) and the only thing I had to do was to make configure scripts grok the x86_64-unknown-linux architecture.
If you take a look at the design papers on x86-64.org or amd.com, you will find that the architecture is very easy to port to. It's basically an Athlon with 64-bit addressing modes on top (a very simplified way of looking at it). What AMD has done is the exact same transition that Intel did from i286 to i386 - 16 to 32 bit.
The new architecture is impressively easy to handle, and gcc can by now optimize almost as well for x86-64 as for i386. It's really just a matter of recompiling.
And if you don't want to do that, run the 32-bit binary. The x86-64 architecture includes running i386 binaries at native speed. This is no marketing crap; it really is the same as you would expect from an Athlon.
Of course, if your application has assembler in it, you have to port this. But take a look at the docs again, and you'll feel very much at home there. Actually, the extra registers will give you a warm fuzzy feeling inside.
I appreciate your point, because for a lot of platforms it would be true. But on this one it simply isn't.
Bo Thorsen,
SuSE Labs.
Since that bug is already fixed on current Athlons, I seriously doubt it'll be a problem with Hammer.
299,792,458 m/s...not just a good idea, its the law!
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
Obviously you're not aware of how the Athlon works, among other things.
Internally, it has many more registers than four. x86 instructions only reference four registers, but internally the Athlon uses its full set to speed up the code, as well as exploiting several types of parallelism.
For higher level languages, it is even less of an issue. There may be some impact on my Java code as to whether "int" or "long" has faster operations, but I'll guarantee that all my code using "double" will fly. The best part is that I won't even have to recompile! =)
The other thing I'll gain is that all of my dynamic allocations will have much larger memory limits. The virtual memory limit per process for the first Linux port to Hammer is 511 GB.
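For a sense of scale, this small Python sketch works out how many address bits a 511 GB virtual address space needs and how it compares with the 4 GB a 32-bit pointer can cover. It assumes "GB" here means 2**30 bytes, which the post does not state, and the helper name is made up for illustration.

```python
GIB = 2 ** 30  # assumption: the post's "GB" is 2**30 bytes


def address_bits_needed(num_bytes):
    """Smallest pointer width in bits that can give each byte its own address."""
    return (num_bytes - 1).bit_length()


if __name__ == "__main__":
    limit = 511 * GIB                  # per-process limit quoted in the post
    four_gib = 4 * GIB                 # everything a 32-bit pointer can reach
    print(f"bits needed for 511 GB: {address_bits_needed(limit)}")
    print(f"bits needed for 4 GB:   {address_bits_needed(four_gib)}")
    print(f"ratio: {limit / four_gib:g}x the 32-bit address space")
```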
"slight physics problems" is right; and how!
;-)
I'm very doubtful that 128-bit machines will *ever* be built; though only a fool would say they definitely won't be built, this early in the game.
32-bit CPU's still take large chunks of silicon, and their features are approaching 1E-7 meters in size. 64-bit machines will not be severely limited until they are trying to manage about 10 orders of magnitude (1E10 times - well over 2^32 times) more circuit elements. If circuits are still basically planar in physical layout, this implies circuit features approaching 1E-12 meters (1E-7 / sqrt( 1E10 ))...
Since silicon atoms are roughly 2.5E-10 meters across, there might be a slight problem with building circuit features this small.
Put another way, the realistic limit for further process shrink is about 2 more orders of magnitude (the circuits would be just a few atoms across) - only 4 more orders of magnitude in total number of circuit elements, not 10.
So I really have a hard time seeing how a computer built with *chips*, that is smaller than a skyscraper, would ever need more than 64 address bits.
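The arithmetic in this post can be checked with a short Python sketch. All three constants are the figures quoted above, not independent data, and the function names are hypothetical. The only model assumed is the one the post states: a planar layout, so the number of circuit elements scales with the inverse square of the feature size.

```python
import math

# Figures quoted in the post.
FEATURE_SIZE_M = 1e-7      # current feature size, in meters
ELEMENT_GROWTH = 1e10      # growth in circuit elements before 64 bits become limiting
SILICON_ATOM_M = 2.5e-10   # rough diameter of a silicon atom, in meters


def shrunk_feature_size(feature_m, element_growth):
    """Feature size needed to fit element_growth times more elements on a planar chip."""
    return feature_m / math.sqrt(element_growth)


def element_gain(feature_m, new_feature_m):
    """How many times more elements fit after shrinking features on a planar chip."""
    return (feature_m / new_feature_m) ** 2


if __name__ == "__main__":
    needed = shrunk_feature_size(FEATURE_SIZE_M, ELEMENT_GROWTH)
    print(f"feature size needed: {needed:.1e} m")
    print(f"in atom diameters:   {needed / SILICON_ATOM_M:.3f}")

    realistic = FEATURE_SIZE_M / 100   # two more orders of magnitude of shrink
    print(f"realistic limit:     {realistic / SILICON_ATOM_M:.1f} atoms across")
    print(f"element gain there:  {element_gain(FEATURE_SIZE_M, realistic):.1e}")
```

Under those assumptions the required feature size comes out far smaller than a single atom, while a two-orders-of-magnitude shrink buys four orders of magnitude in element count, matching the figures in the post.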
-- Mike Greaves
Yes, the Athlon extracts parallelism from the incoming instruction stream.
The point is simply that there is more than one way to skin a cat. No one ever would have thought that x86 would scale so well or be so competitive with RISC. (Alpha was/is great, and I'll never understand why DEC/Compaq dropped the ball so badly there...)
When a program needs to deal with more variables than there are registers available, it still needs to spill data onto the stack (into RAM), which is AMAZINGLY slow.
Don't you mean L1 cache?
Your reference box, eh? Since you're breaking your NDA just to tell us you have one (I presume), why don't you enlighten us as to the clock speed on that puppy? ;-)
Regardless, a 50-fold improvement in calculation speed is unlikely to be the result of additional registers... It would more likely be the SIMD type instructions, which use additional registers even in IA32.
Quote:
AMD (NYSE: AMD) today announced that SuSE Linux AG, one of the world's leading providers of the Linux operating system, has submitted enhancements to the official Linux kernel.
Read the rest here: http://www.amdzone.com/releaseview.cfm?ReleaseID=810