RC4 Code Achieves 319 MB/s On AMD64 Opteron
Marc Bevand writes "This recent paper is about optimizing RC4 for AMD64 processors. A working implementation is provided. Its encryption/decryption throughput reaches 319 MB/s on a single AMD Opteron x44 processor running at 1.8 GHz. This makes it, as of today, the world's fastest RC4 symmetric cipher implementation for general purpose CPUs. As the author of this work, I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas."
An anonymous reader adds some figures for the old implementation: "Opteron 244 1.8 GHz (32-bit) 163 MB/s; Opteron 244 1.8 GHz (64-bit) 135 MB/s."
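To put the quoted figures side by side, here is a rough back-of-the-envelope sketch. It assumes "MB" means 10^6 bytes and that all three numbers were measured under comparable conditions, neither of which the summary confirms; the function names are made up for illustration.

```python
# Rough arithmetic on the figures quoted in the summary.
# Assumptions (not stated in the source): 1 MB = 1,000,000 bytes, and the
# old and new implementations were benchmarked under comparable conditions.

CLOCK_HZ = 1.8e9  # Opteron at 1.8 GHz, as quoted


def cycles_per_byte(mb_per_s: float, bytes_per_mb: int = 1_000_000) -> float:
    """Clock cycles spent per byte processed at the given throughput."""
    return CLOCK_HZ / (mb_per_s * bytes_per_mb)


def speedup(new_mb_per_s: float, old_mb_per_s: float) -> float:
    """Ratio of the new throughput to the old one."""
    return new_mb_per_s / old_mb_per_s


print(round(cycles_per_byte(319), 2))  # 5.64 cycles per byte
print(round(speedup(319, 163), 2))     # 1.96x over the old 32-bit figure
print(round(speedup(319, 135), 2))     # 2.36x over the old 64-bit figure
```

If the figures are comparable, this is in line with the "about 2 times faster" estimate given by the author.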
Who wants to optimize RC4 for the PowerPC G5 chip (64-bit implementation) and do a bake-off? Hand-coding PPC assembly doesn't sound as fun as this PHP I'm working on at the moment, so someone else will have to tackle that!
That's good because it is yet another step toward the day when all traffic (HTTP, SMTP, etc.) travels encrypted (today only some pages are served this way, because of the processor load),
and because every time we hear good news about AMD we're happy :P
Everybody'll get TLS'ed
gtkaml.org
AFAIK, the VIAs *only* do AES, as they're designed to make good VPN endpoints. This is because some hefty AES subroutines are built into the hardware (with software drivers doing the rest).
So whilst this is all very handy, if you want encryption other than AES (which, if there were ever any significant flaws found in AES' maths, is a certainty) you'd want to dump those VIA boards and get yourself either a dedicated encryption device like an Encipher box (like an expensive version of the VIA) or just a beast of a machine to do encryption entirely in software (like an Opteron).
I personally shunt everything through DSA stunnels, so a VIA isn't much use to me.
Moderation Total: -1 Troll, +3 Goat
I just bought a new PC, and when compared to all the available options, the AMD64 option (I got an AMD64 2800+) was best. Slightly more expensive than the equivalent XP, cheaper than the P4. And they run so cool, it's the first PC I've had in years where I don't have to worry about the temperature. When I bought an XP 2600+ last year, I spent almost half the chip's price again on cooling.
Just because I'm running 32-bit Windows XP on it doesn't make it a bad purchase.
Also, I'm one of those people who bought a 386 instead of a 486 (then later a 486 instead of a pentium 1) because of the price difference. The price difference nowadays is nowhere near comparable to what it was then.
East Coast Brewers
Will the optimized AMD64 rc4 code provide any boost to those crunching rc5 on an AMD64?
No, they're entirely different. For a start, RC4 is a stream cipher whereas RC5 is a block cipher. They just share the same inventor, hence the names.
AFAICR, the RC5 effort uses the register width to try and crack many keys in one go anyway - a different approach to this, which is using the register width to generate more of a single stream in one go.
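For anyone who hasn't seen why RC4 counts as a stream cipher, here is a minimal sketch of the textbook algorithm in Python. This is the plain reference form, not the AMD64-optimized code from the paper, and the function names are my own. It is for illustration only; don't use it to protect real data.

```python
from typing import Iterator


def rc4_keystream(key: bytes) -> Iterator[int]:
    """Yield RC4 keystream bytes for the given key (textbook form)."""
    # Key scheduling: permute the 256-entry state table using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Keystream generation: one table swap and one lookup per output byte.
    i = j = 0
    while True:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) & 0xFF]


def rc4(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    stream = rc4_keystream(key)
    return bytes(b ^ next(stream) for b in data)


print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```

Each output byte depends on the table state left by the previous one, so the stream is produced serially. That is quite unlike testing many independent RC5 keys side by side.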
Dare I say that the fact that Intel produces a kick-ass compiler (for certain tasks anyway) has nothing to do with the fact that the same company produces CPUs. BTW - currently (AFAIK) the compiler is developed in Russia whereas chip design is done at "traditional" sites.
PS: Oh, of course, the Intel compiler won't ever support 3DNow!, but that's an issue of sponsorship. I mean - AMD doesn't have to design the compiler themselves. They would be equally OK sponsoring someone who knows how to do that.
I am the author of the paper.
You know, so *few* CPU-hungry apps are AMD64-optimized that it is almost shocking to see this unused CPU power... I *strongly* believe that such speedups (about 2 times faster) can be achieved in many areas such as video encoding, checksumming algorithms, etc. Servers and workstations will be the first ones to benefit from such optimizations.