If I Had a Hammer
adpowers writes: "Anandtech is running an article about their preview of AMD's Hammer. They had one machine running 32-bit Windows and the other running 64-bit Linux. The Linux machine had a 32 bit program and an identical program that was compiled for 64-bit processor support. Both processors were less than 30 days old and running without any crashes, but they weren't at full speed." We did one Hammer story a day or two ago, but there have been several more posted since then (wild guess: the NDA expired). Tom's Hardware has a story, so does Gamespot.
... I'm sure this will definitely 'hammer' down Intel..
boom boom.
RIP Spike Milligan
"Never let the truth get in the way of a good story..."
Maybe I'm oversimplifying it.
Did they have to add "option=nopentium" to the lilo boot parameter list? :-)
(Seriously though, I hope they haven't left the extended paging bug in)
no sig error.
There's a lot of difference between 32 bit optimized code compiled for 64 Bit, and code written and optimized for 64 bit and compiled for 64 bit.
Applications need to be programmed and optimized to make use of the extra registers, extra info paths, and extra instructions available on the new platform. Without that, the application speeds can't be compared, even though the base code and output are the same.
Let's take the example of some of the first-generation PlayStation 2 code, which was actually code written for 32-bit machines on a different platform like the PC or the old PSX. Pure recompiling won't get you any major performance boost, so all the developers had to "re-do" the code to make use of the 128-bit Emotion Engine.
That's exactly the reason why all these gamedev guys kept screaming that it is much harder to code for the PS2 than for other platforms. One part of that whole thing is this; the other part is changing graphics APIs.
PCs use DirectX/OpenGL, and the PS2 can use either custom renderers or OpenGL.
Put it in perspective: why don't 16-bit games recompiled for 32 bit give a "major" performance boost unless optimised code is included?
RMS made up a song years ago, in anticipation of this processor's release:
If I had a hammer,
I'd throw it in the morning,
I'd throw it in the evening,
All over this land;
I'd throw it at Loki,
I'd throw it at Fen-rir,
I'd throw it at the war between
The Gods and the Giants,
All-ll over this land...
--RMS
There may be another verse or two but I've forgotten. Anyway the headline made me think of this. I always remember these words to the song instead of the regular ones.
The article suggests that AMD write / release native compilers that plug into Visual Studio...which would be a good thing for MS programmers.
Simple enough to say.
I just wanted a lead-in for the following question:
Did anyone else see a banner ad for Visual Studio .NET on Slashdot yesterday? Or was I hallucinating?
Writers imply. Readers infer.
He is singing about his enormous GNU/COCK and how he considers only gods and entire nations worth of his GNU/COCK attention.
Not to mention the fact that current IA64 Linux distribs have the userland compiled for 32 bit.
heh
This is the job of the compiler... If I recompile source code I expect the compiler to optimise the object code in the best way for the target!
No, let's not. The PS2 was so radically different from the PS1 (I've coded both) that it amounted to an architecture change, not just a platform upgrade. The PS1 is a pretty much bog standard CPU+VRAM+DRAM device. The PS2 is a dataflow architecture, with the idea being to set up datastreams, (with the code to execute being part of the stream), and to target those streams with a firing-condition model. This is amazingly versatile (and the device has the bus bandwidth and DMA channels to handle it, the PC doesn't) but it is *very* *very* different from the standard way coding is done. This is why PS2 games are still getting better two years down the line...
Actually I don't think it's much harder at all, it's just different. You have 3 independent CPUs, all of which are pretty damn fast considering they're only at 300MHz. The device can do (peak) 3 billion (3,000,000,000) general purpose floating point multiply/accumulates per second, and you can get pretty close to that figure, unlike most peak throughput estimates. Bandwidth again, and the use of an opportunistic programming methodology rather than a logical-progression methodology.
Having said that, I'm from a parallel computing background, so using only 3 CPUs is child's play.
Because there's a much more quantifiable change in going from 16-bit to 32-bit. Developers had been hacking around the 16-bit limit using 'near' and 'far' pointers (!!), which meant all the cruft from those 16-bit days was still sticking around and causing problems if you just recompiled.
Now that they're (at long last!) in the 32-bit arena, there are no such problems. A char* ptr is still a char* ptr; it now just has a greater domain. No cruft. No problems.
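To make the near/far point concrete, here is a minimal sketch of 16-bit real-mode addressing, assuming the classic 8086 rule that a far pointer's linear address is segment * 16 + offset. The function name `real_mode_linear` and the sample values are made up for illustration. It shows one piece of the cruft: two different far pointers can name the same byte, which cannot happen with a flat 32-bit pointer.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of a 16-bit real-mode far pointer (segment * 16 + offset)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


# Two different segment:offset pairs (hypothetical values) that alias one byte.
a = real_mode_linear(0x1234, 0x0010)
b = real_mode_linear(0x1235, 0x0000)
print(hex(a), hex(b), a == b)

# A near pointer is only the 16-bit offset, so it reaches 64 KiB within a segment.
print("near pointer reach:", 2 ** 16, "bytes")
# A flat 32-bit pointer has a single representation per address.
print("flat 32-bit reach:", 2 ** 32, "bytes")
```

Comparing or doing arithmetic on far pointers therefore needed normalisation, which is the kind of baggage a plain recompile carried along.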
This isn't to say that compilers won't get better over time though - optimisation is an inexact science, and you'd hope to see improvements as compiler-writers see how to improve the optimising stage.
Enough...
Simon
Physicists get Hadrons!
You also compared the transition from x86 to x86-64 to the transition from PSX to PS2. That is also something very different. The PS2 is hard to code for because the design of the graphics subsystem and vector CPUs makes it very fast on the one hand, but also very hard to use to its full potential. The PS2 CPUs are also hard to use because the caches are too small.
When the 386 was introduced, things like games were coded in assembler, at least the performance critical parts. Something that is coded in assembler can't be recompiled. Now even games are coded in high level languages.
Jan
Most likely the hair, skin, dirt, and blood that form nuclei for fibers to cling to.
Hey, you asked!
Nah, jetzt, du muss diese Schwein-untenMensch von Moderator nach Paris schicken sonst bist du ein Zigeuner !!!
Man, yes I can tell. So why you feel compelled to write this stuff and then get modded up to 4 is beyond me.
First of all you have the classic "Umm, first of all it's hard enough to engineer a 64-bit CPU with related components." Face it, the 4Stack was made by one grad student.
Then you have the audacity to continue with the equally intellectual "Then there is the manufactoring details, etc, etc." Guess what, manufacturing does not care about bit width. They care about layers, metallisation and more. Bit width has an impact only insofar as it requires more interconnection, which usually requires more metal layers.
Then, finishing off with a perceived lack of economic benefit, you truly complete the works of the terminally uninformed.
Yes, for a general-purpose CPU 32 and 64 bits are OK; for graphics, DSP and number crunching, 128 bits can be required. And guess what, Cray has made 128 bit computers. I have rarely had the displeasure of reading such a pile of uninformed garbage. Even 5 minutes on Google would have shown it clearly, even to someone who has not been involved in both design AND fabrication AND register level programming for 14 years.
Oh yes, as for the difficulties of programming 64 bit processors, have you heard of Linux? You have even failed to notice Linux has been ported to Itanium, SPARC and Hammer. Well done.
The reason for the transition from 32 to 64 bit CPUs is commercial: you need the extra address space for database usage and with 32 bits there is a practical limit of about 100K hits per minute on a (imagine a pipe this big!) I was told this by an IBM executive ~5 years ago. It's all about address space and internal bandwidth to run gigantic websites with a minimum of hardware. (There are a few number crunchers who will also find this useful, but the market for webservers is the justification for the $500M investment to bring a machine into production; half of that, or more, is CPU design, validation and the cost to bring it to the threshold of production.)
You're wrong in this.
:-) But my point here is that there is no change in the way you think - no change in the coding philosophy.
I have been working for SuSE on porting gcc and binutils for x86-64 for over a year now, and it has been pretty painless. After we had the basic system running, I ported a full-blown but small Linux system to it (sysvinit, linux-utils, vim etc.) and the only thing I had to do was to make configure scripts grok the x86_64-unknown-linux architecture.
If you take a look at the design papers on x86-64.org or amd.com, you will find that the architecture is very easy to port to. It's basically an Athlon with 64 bit addressing modes on top (a very simplified way of looking at it). What AMD has done is the exact same transition that Intel did from i286 to i386, 16 to 32 bit.
The new architecture is impressively easy to handle, and gcc can by now optimize almost as well for x86-64 as for i386. It's really just a matter of recompiling.
And if you don't want to do that, run the 32 bit binary. The x86-64 architecture includes running i386 binaries at native speed. This is no marketing crap, it really is the same as you would expect from an Athlon.
Of course, if your application has assembler in it, you have to port this. But take a look at the docs again, and you'll feel very much at home there. Actually the extra registers will give you a warm fuzzy feeling inside.
I appreciate your point, because for a lot of platforms it would be true. But on this one it simply isn't.
Bo Thorsen,
SuSE Labs.
Obviously you're not aware of how the Athlon works, among other things.
Internally, it has many more registers than four. x86 instructions only reference four registers, but internally the Athlon uses its full set to speed up the code, as well as exploiting several types of parallelism.
For higher level languages, it is even less of an issue. There may be some impact on my Java code as to whether "int" or "long" has faster operations, but I'll guarantee that all my code using "double" will fly. The best part is that I won't even have to recompile! =)
The other thing I'll gain is that all of my dynamic allocations will have much larger memory limits. The virtual memory limit per process for the first Linux port to Hammer is 511 GB.
299,792,458 m/s...not just a good idea, it's the law!
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
Yes, the Athlon has shadow registers, but that doesn't obviate the need for more physical registers. The shadow registers are used to enable superscalar execution. When a program needs to deal with more variables than there are registers available, it still needs to spill data onto the stack (into RAM), which is AMAZINGLY slow. If you have code that executes a tight loop on more data points than fit in the registers of stock x86 but fewer than what's available on x86-64, the speedup is very impressive. I've seen maths codes execute with a 50 fold performance improvement after being recompiled for x86-64 on my reference box.
AMD has thrown its full weight behind Linux, according to "AMD Touts Linux Support for News Chips". Specifically, AMD supports efforts by SuSE to create updates of Linux for the x86-64.
Sun has taken a different approach of creating its own version of Linux for x86-64. This smells of hijacking (Linux).
"slight physics problems" is right; and how!
;-)
I'm very doubtful that 128-bit machines will *ever* be built; though only a fool would say they definitely won't be built, this early in the game.
32-bit CPUs still take large chunks of silicon, and their features are approaching 1E-7 meters in size. 64-bit machines will not be severely limited until they are trying to manage about 10 orders of magnitude (1E10 times, well over 2^32 times) more circuit elements. If circuits are still basically planar in physical layout, this implies circuit features approaching 1E-12 meters (1E-7 / sqrt( 1E10 ))...
Since silicon atoms are roughly 2.5E-10 meters across, there might be a slight problem with building circuit features this small.
Put another way, the realistic limit for further process shrink is about 2 more orders of magnitude (the circuits would be just a few atoms across) - only 4 more orders of magnitude in total number of circuit elements, not 10.
So I really have a hard time seeing how a computer built with *chips*, that is smaller than a skyscraper, would ever need more than 64 address bits.
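The arithmetic above can be checked with a short sketch. It uses only the figures quoted in the post (1E-7 m features, a 1E10 increase in circuit elements, 2.5E-10 m silicon atoms) and the post's own assumption that a planar layout makes element count scale with 1 / feature size squared. The names are made up for illustration.

```python
import math

CURRENT_FEATURE_M = 1e-7   # feature size quoted in the post
ELEMENT_FACTOR = 1e10      # extra circuit elements quoted in the post
SILICON_ATOM_M = 2.5e-10   # approximate atom size quoted in the post


def shrunk_feature(feature_m: float, element_factor: float) -> float:
    """Feature size needed for `element_factor` times more elements on a planar chip."""
    # Planar assumption: element count scales with 1 / feature_size**2.
    return feature_m / math.sqrt(element_factor)


needed = shrunk_feature(CURRENT_FEATURE_M, ELEMENT_FACTOR)
print(f"feature size needed: {needed:.1e} m")
print(f"that is {needed / SILICON_ATOM_M:.3f} of one silicon atom")

# The 'realistic' case from the post: two more orders of magnitude of shrink.
realistic = CURRENT_FEATURE_M / 100
print(f"realistic feature: {realistic / SILICON_ATOM_M:.0f} atoms across")
print(f"element gain from that shrink: {100 ** 2:.0e}")
```

Under those assumptions the required feature comes out well below the size of a single atom, while a 100x linear shrink yields only about 1E4 times more elements, matching the post.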
-- Mike Greaves
every problem would look like a nail.
If I recompile source code I expect the compiler to optimise the object code in the best way for the target!
Your expectation and what happens in reality are two very different things. It is entirely possible to write high level code that is optimized, perhaps biased would be a better term, towards one architecture at the expense of another.
Much of this is unintentional. A problem can often be solved in many ways, and sometimes a programmer tries out a couple of different solutions and picks the better performing one. Consider a 4x4 matrix multiply. One solution directly accesses array elements in memory; another preloads part of the array into temporary variables. One solution runs better on x86; the other tends to run better on RISC based systems.
The way you solve a problem in high level code is often effectively "hinting" to the compiler on how to generate code. We are still waiting for the mythical compiler that optimizes code in the "best way for the target".
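As a rough illustration of the two shapes of the 4x4 multiply described above, here is a sketch in Python. The function names are hypothetical, and Python is used only to show the source-level difference. Which form a given compiler turns into faster machine code on a given CPU is exactly the open question in this post, and this sketch does not measure it.

```python
N = 4


def matmul_direct(a, b):
    """Index straight into the arrays on every multiply-add."""
    c = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_preloaded(a, b):
    """Pull one row of `a` into temporaries first, then reuse them."""
    c = [[0] * N for _ in range(N)]
    for i in range(N):
        a0, a1, a2, a3 = a[i]  # temporaries a compiler may keep in registers
        for j in range(N):
            c[i][j] = a0 * b[0][j] + a1 * b[1][j] + a2 * b[2][j] + a3 * b[3][j]
    return c


# Made-up sample matrices; both forms must agree on the result.
a = [[i + j for j in range(N)] for i in range(N)]
b = [[i * j + 1 for j in range(N)] for i in range(N)]
print(matmul_direct(a, b) == matmul_preloaded(a, b))
```

Both functions compute the same product. The preloaded form needs more live values at once, which is the sort of thing that suits a machine with plenty of registers and hurts one that has to spill them.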
Quote:
AMD (NYSE: AMD) today announced that SuSE Linux AG, one of the world's leading providers of the Linux operating system, has submitted enhancements to the official Linux kernel.
Read the rest here: http://www.amdzone.com/releaseview.cfm?ReleaseID=8 10
The Hammer has 8 32/64 bit shared registers, and 8 more 64 bit only registers. What if they had 8 32-bit registers and then 16 more 64 bit registers? This would provide more registers, and thus more speed. I imagine that it would be tougher to implement, but I think it would lay a better foundation. Anyone have any insights on this? Also, how do these registers differ from the ones added by the 3DNow/SSE instruction sets?
Also, I would imagine some 64 bit extensions have been added to the regular x86 instructions. For example, a 64 bit move instead of the typical 32 bit move, etc. Does anyone know whether there are any different instructions that make use of the 64 bits? Is there any way to link 32 bit data with a 32 bit instruction, ala Itanic?
What really will kick butt is the overall performance of the system with HyperTransport, the built-in DDR controller, and the multiprocessing. I know I'm planning on building a 2-processor Clawhammer system at the end of 2003!
Too legit to quit!
This is not the best place to say it but... Windows isn't on the horizon and Penguins don't rule the market yet...
Second, its improvements are limited, more so than the ordinary 32 vs 64 bit comparison suggests.
Quoting from AMD's overview:
64-bit flat virtual addressing.
This is good if you care about more than 4 GB of memory, useless if you don't. Simple as that.
8 new general purpose registers (GPRs).
This is plain good. It will allow for faster code with more instruction-level parallelism.
8 new registers for streaming SIMD extensions (SSE).
Same as above.
64-bit wide GPRs and instruction pointer.
You need this to support the 64 bit addressing.
Some applications may also take advantage of the ability to manipulate 64-bit integers at once.
And that's it! Some might have noticed I didn't refer to floating point numbers. The FPU, which is already capable of handling 32, 64 and 80-bit floats, won't be extended. Same thing with MMX and SSE (with the exception of the 8 new registers).
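For the 4 GB figure in the list above, a quick sketch of where it comes from: a flat pointer of n bits can name 2^n distinct bytes. The helper name is made up, and the numbers are the architectural ceiling of the pointer width, not what any particular chip or OS actually exposes (another comment in this thread mentions a 511 GB per-process limit for the first Linux port).

```python
GIB = 2 ** 30  # bytes in one gibibyte


def flat_address_space_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses a flat pointer of this width can name."""
    if pointer_bits <= 0:
        raise ValueError("pointer width must be positive")
    return 2 ** pointer_bits


for bits in (32, 64):
    size = flat_address_space_bytes(bits)
    print(f"{bits}-bit flat pointers: {size // GIB} GiB addressable")
```

So a program that never needs more than 4 GiB of address space gains nothing from this particular feature, which is the point made above.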
Even if they have an OS supporting long mode, how willing will software vendors be to put money into an x86-64 64-bit version of their software?
Remember a few years ago, when all the Windows users had systems with a 32 bit processor but were only utilizing it for 16 bit apps? The real test of time is how long it takes for 64 bit apps to become mainstream.
I've got a problem. No it's not that my ass is getting ripped apart like Mr. Goatse.cx. That's never a problem. It's time for some more mellow sounds. So courtesy of the soundtrack to A Clockwork Orange is Singin' in the Rain.