SuSE Submits Enhancements for AMD Hammer

Posted by ryuzaki0 on Saturday March 2, 2002 @02:28AM from the getting-the-features-in dept.

ackthpt writes "SuSE has this press release, as they are submitting enhancements to the Linux kernel specific to AMD's x86-64 processor instruction set. Anticipated for the 2.6 kernel, some enhancements may appear in 2.4, as development is only beginning on 2.5. AMD's take on the announcement as well." nik notes that SuSE joins NetBSD in having ports to Hammer. Usenix members can see the paper Wasabi's Frank van der Linden wrote about the porting effort.

great, but what about GCC? by Khopesh · 2002-03-02 02:37 · Score: 5, Interesting

This is truly a great move in the right direction, but we also need to see something like GCC support and optimization for this new architecture. AMD, please: you are the expert on your chips. As Intel made its own free compiler, so too can you. Ideally, release your compiler under the MIT license, LGPL, GPL, or something similar; releasing an optimization for GCC would blow my mind.

--
Use my userscript to add story images to Slashdot. There's no going back.
1. Re:great, but what about GCC? by Ace+Rimmer · 2002-03-02 02:46 · Score: 5, Interesting
  
  There are some people at SuSE working on gcc Hammer optimizations; this is part of the contract between AMD and SuSE.
  
  --
  :wq
It's being done!! :) by Daath · 2002-03-02 02:41 · Score: 3, Interesting

FreeBSD is working on an x86-64 GCC! Actually AMD itself has sponsored this! Take a look at the link!

--
Any technology distinguishable from magic, is insufficiently advanced.
i sniff a server market takeover .. by Anonymous Coward · 2002-03-02 02:55 · Score: 1, Interesting

because intel put their itanium 64bit egg in the windows xp64 basket.
Freedom to Innovate by Paul+the+Bold · 2002-03-02 06:25 · Score: 5, Interesting

You will recall that when AMD demoed Hammer recently, they showed a 32-bit Windows system and a 64-bit Linux system. People commented that AMD preferred Linux over Windows and was therefore showing a more powerful Linux demo than Windows demo.

The truth is that there is not a 64-bit version of Windows for the Hammer. AMD was able to modify the existing Linux code to create their own 64-bit version of Linux. This is the best example of the freedom granted by the GPL that I have seen in months. AMD is releasing a new product at the end of the year, and they are able to create a demand for it NOW by having software for it NOW.

Do you remember the lag between the introduction of Intel's Itanium and a Windows version for Itanium? It was not well coordinated. AMD has done the opposite: they created a demand and a use several months before the release, and it's working. We are all drooling over a 64-bit architecture, and we will have 6-8 months to think about (and save up for) the purchase of a Hammer.

This is the freedom to innovate that is granted by the GPL and denied by the MS EULA. GPLed software is going to make AMD some money.

I feel all warm and fuzzy inside.
Open Source software vital to hammer success by AZPhysics · 2002-03-02 08:21 · Score: 3, Interesting

While Hammer will fly on 32-bit code, the 64-bit code will really differentiate the processor. Two-way ClawHammer Beowulfs should be a huge business. But the differentiation will really not show on Windows until (unless) they develop an x86-64 Windows. I wouldn't count on them doing that until Intel comes out with their version of x86-64 (note that I didn't say if). There will be great pressure to recompile and reoptimize open software to take advantage of the Hammer.
I think this is a wonderful advancement. I run SuSE on an Athlon now, and will probably run SuSE on a dual Hammer in a year and a half (I can't afford to be bleeding edge). I can't find many optimizations for the Athlon in compilers and such. However, with the Hammer, the optimizations will be out there. Not only will the compilers have flags, but entire distributions will likely be built with re-compiled applications. That would be something I would pay more for.
Doing SIMD without SIMD hardware is possible by pslam · 2002-03-03 04:47 · Score: 2, Interesting

Ignore the other replies - it is possible to do this, and it definitely is a speed increase. See the example code below. You just have to be careful about the packing arrangement of data in each word, and the overlap when performing operations on them.
Multiplication is a bad example, but it is possible to multiply several numbers at the same time by one or more coefficients. This usually isn't worth it unless the numbers are very small compared to the word size, e.g. 4 bits vs 32 bits.
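To make the packed multiply concrete, here is a minimal sketch, assuming four 4-bit values stored one per byte of a 32-bit word and a single coefficient no larger than 15. The names pack4, mul4 and lane are made up for illustration; the point is only that each product (at most 15 * 15 = 225) still fits in its own byte, so one multiply handles all four values.

```c
#include <stdint.h>

/* Hypothetical helper: store four 4-bit values (0..15), one per byte. */
uint32_t pack4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return (uint32_t)(a & 0xf) | (uint32_t)(b & 0xf) << 8 |
           (uint32_t)(c & 0xf) << 16 | (uint32_t)(d & 0xf) << 24;
}

/* Multiply all four bytes by one coefficient with a single multiply.
 * Assumes coeff <= 15 (enforced by the mask), so each product is at
 * most 225 and cannot carry into the neighbouring byte. */
uint32_t mul4(uint32_t packed, unsigned coeff)
{
    return packed * (uint32_t)(coeff & 0xf);
}

/* Extract byte i (0 = lowest) from a packed word. */
unsigned lane(uint32_t packed, unsigned i)
{
    return (packed >> (8 * i)) & 0xff;
}
```

For example, mul4(pack4(3, 15, 0, 7), 15) holds 45, 225, 0 and 105 in its four bytes. With larger inputs or coefficients the products would overlap, which is why this only pays off for very small numbers.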
However - there are a lot of operations which can be dramatically improved by packing data without any extra SIMD hardware. For example, you can perform some tricks with bit shifting to do pixel masking 32 bits (or 64!) at a time. You can do addition/subtraction trivially with the only thing to watch out for being the carry.
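As a rough illustration of the carry issue with packed addition, here is one common way to add four bytes at once inside a 32-bit word. This is a sketch under the assumption that each byte should wrap modulo 256 independently; add_bytes_packed is an illustrative name, not something from the code in this post.

```c
#include <stdint.h>

/* Add four 8-bit values at once; each byte wraps modulo 256 on its own.
 * The top bit of every byte is masked off before the add so that no
 * carry can cross a byte boundary, then the top bits are folded back
 * in with XOR. */
uint32_t add_bytes_packed(uint32_t a, uint32_t b)
{
    const uint32_t low7 = 0x7f7f7f7fu;
    const uint32_t top  = 0x80808080u;
    uint32_t sum = (a & low7) + (b & low7);  /* carries stay inside each byte */
    return sum ^ ((a ^ b) & top);            /* restore the top bit of each byte */
}
```

A plain a + b would let an overflow in one byte bump the byte above it; the masking costs a few extra logical operations, which is the packing overhead that has to be weighed against the gain.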
Whether it's worth it is a case-by-case decision. Sometimes the packing/unpacking/carry correction takes longer than the performance gain.
And here's an example where there's definitely a performance increase! I've used the code below to do motion blur in the past. It's slower than using MMX, but not by much. I wrote it so long ago I don't have any comparative figures though.
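unsigned *bufin = (unsigned *) buffer;      /* input image, 4 bytes per word */
unsigned *bufout = (unsigned *) motionbuf;  /* persistent blurred image */
unsigned mask1 = 0xfcfcfcfc;  /* clears the low 2 bits of every byte */
unsigned mask2 = 0xfefefefe;  /* clears the low bit of every byte */
for(unsigned n = (width * height) >> 2; n; n--) {
    unsigned in = *bufin++;
    unsigned out = *bufout;
    in &= mask1;
    in >>= 2;                   /* in / 4 in each byte */
    out &= mask2;
    out >>= 1;                  /* out / 2 in each byte */
    out += (out & mask2) >> 1;  /* + out / 4 in each byte */
    *bufout++ = in + out;
}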

The idea here is that the framebuffer persists the image. The input and output buffers are 8 bits per primary. Now, you could do this a single byte at a time, but that would suck for speed. Instead, 4 bytes are computed at once. The formula for each output byte is based on:
out = (out * 3 + in) / 4
This is actually performed here slightly less accurately:
out = out / 2 + out / 4 + in / 4
I remove some of the visible artifacts in practise by a post-processing stage where 1 bit of noise is added.
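To put a number on "slightly less accurately", here is a small per-byte sketch comparing the two formulas above. The function names are hypothetical and this is a scalar check, not the packed loop itself. Because each division truncates separately, the approximation never exceeds the exact value and falls short by at most 2.

```c
/* Exact per-byte blend: out = (out * 3 + in) / 4. */
unsigned blend_exact(unsigned out, unsigned in)
{
    return (out * 3 + in) / 4;
}

/* Per-byte equivalent of the packed computation:
 * out / 2 + out / 4 + in / 4, each division truncating on its own. */
unsigned blend_approx(unsigned out, unsigned in)
{
    return out / 2 + out / 4 + in / 4;
}

/* Largest shortfall of the approximation over all 8-bit inputs. */
unsigned max_blend_error(void)
{
    unsigned worst = 0;
    for (unsigned out = 0; out < 256; out++)
        for (unsigned in = 0; in < 256; in++) {
            unsigned err = blend_exact(out, in) - blend_approx(out, in);
            if (err > worst)
                worst = err;
        }
    return worst;
}
```

One visible consequence: with both inputs at 255 the exact formula stays at 255, while the approximation settles at 127 + 63 + 63 = 253.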
The bit masks are applied to prevent the shifts "leaking" into the next byte in the word. Now, on the topic of 64-bit: the above can be performed on 64-bit words with no performance loss. This means it goes twice as fast, since each iteration handles 8 bytes instead of 4. Although you'd be silly to do this on an architecture with SIMD instructions designed to do exactly this job.
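A 64-bit version of the same step might look like the sketch below. It assumes the buffers are already suitably aligned uint64_t arrays and that the byte count is a multiple of 8; blend8 and blur64 are illustrative names, and the masks are simply the 32-bit ones widened to 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend eight 8-bit channels at once: out/2 + out/4 + in/4 per byte. */
uint64_t blend8(uint64_t in, uint64_t out)
{
    const uint64_t mask1 = 0xfcfcfcfcfcfcfcfcull;  /* clears low 2 bits of each byte */
    const uint64_t mask2 = 0xfefefefefefefefeull;  /* clears low bit of each byte */
    in = (in & mask1) >> 2;       /* in / 4 per byte */
    out = (out & mask2) >> 1;     /* out / 2 per byte */
    out += (out & mask2) >> 1;    /* + out / 4 per byte */
    return in + out;              /* at most 253 per byte, so no carry */
}

/* Apply the blend to whole buffers, 8 bytes per iteration.
 * Assumes nbytes is a multiple of 8. */
void blur64(const uint64_t *in, uint64_t *out, size_t nbytes)
{
    for (size_t n = nbytes / 8; n; n--) {
        *out = blend8(*in++, *out);
        out++;
    }
}
```

The loop body has the same number of operations as the 32-bit one, so on a 64-bit machine the work per byte is roughly halved.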
On architectures without SIMD, tricks like this can give you several times speed increase. If anyone's interested in any other tricks I can pull some code onto a web page somewhere.