RC4 Code Achieves 319 MB/s On AMD64 Opteron
Marc Bevand writes "This
recent paper
is about optimizing
RC4
for
AMD64
processors. A working implementation is
provided. Its encryption/decryption throughput
reaches 319 MB/s on a single AMD Opteron x44
processor running at 1.8 GHz. This makes it, as of today, the world's fastest RC4 symmetric cipher implementation for general purpose CPUs. As the author of this work, I would like to
point out that many CPU-hungry applications
have not been optimized for AMD64 yet.
In other words: such speedups can be expected
in other areas."
An anonymous reader adds some figures for the old implementation: "Opteron 244 1.8 GHz (32-bit) 163 MB/s; Opteron 244 1.8 GHz (64-bit) 135 MB/s."
I was initially disappointed with the performance of my Athlon64. CPU intensive 64bit code often seemed much slower than it's (heavily optimised) 32bit counterpart.
:)
Every now & then I come across some code optimised for 64bit processors, and it just flies - as more & more stuff gets the treatment, it will be like upgradingin for free
amd decides to provide a compiler for its chip, optimization will always be behind intel (who do. for linux also).
If all a machine is doing is encrypting, A64s and Opterons are a bit overkill. The VIA C3 C5P has an encryption engine that makes top-of-the-line processors look sad. I couldn't find results for RC4, but is a page from a review of the EPIA MII-12000 which shows AES results. First graph is EPIAs in software, second is a few Intel and AMD CPUs (software), and the MII-12000 in software (which gets creamed by the AXP 2500+ and the P4@2.4) and hardware (which totally obliterates everything).
Don't get me wrong it's good that code is optimised, but I think that RC4 would fly faster on an IA64 than an opteron if specifically optimised to take advantage of the CPU's features.
RC4 isn't really that relavent in real life as wep is crap & also easily done in hardware anyway.
The 64 bit advantage will suffer thesame fate as the 32bit advantage did for the 486, pentium & especially the Pentium Pro.
486 = 32bits, faster but people still bought 386's due to cost.
Pentium = 32bits, sometimes faster but again costs meant 486's stayed popular.
Pentium Pro = 32bit, 16 bit instrucations stalled it. WHen running pure 32bit code ran like the dogs, when running 16bit code (win 98) ran like a dog.
Problem is that your generally better off saving your cash, buying a cheap CPU (32bit in this case) and waiting for the 2nd/3rd Generation CPU. By that time prices will more reasonable and you will see the full advantages as programs will use the extra bits properly.
I mean come on MS still hasn't released a final AMD64 version of Winblows yet.
...to allow DRM encryption of movies to become standard :)
Anagram("United States of America") == "Dine out, taste a Mac, fries"
Who wants to optimize RC4 for the PowerPC G5 chip (64-bit implementation) and do a bake-off? Hand-coding PPC assembly doesn't sound as fun as this PHP I'm working on at the moment, so someone else will have to tackle that!
I'm holding out on the 64 bit systems until amd starts naming the chips commodore.
----
Squirrel
"I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas."
well, maybe in some areas.
Since this is a cipher, it obviously helps a lot when you can work on 64-Bit chunks of data instead of 32-Bit.
The same speedup can probably be seen with applications that use numbers larger than 32b (or 64b for floats), since the number of operations necessary will essentially halve.
But other than that, I don't see much room for huge speedups.
That's good because is yet another pace in the direction when all information (http, smtp etc.) will travel encrypted (since today only some pages are served this way, because of the processor loads)
and because everytime we hear good about AMD we're happy:P
Everybody'll get TLS'ed
gtkaml.org
I wish that every software company would put optimization first and features second. This way, we would not have to buy computers every few years. They can potentially last much longer.
Will the optimized AMD64 rc4 code provide any boost to those crunching rc5 on an AMD64?
No, they're entirely different. For a start, RC4 is a stream cipher whereas RC5 is a block cipher. They just share the same inventor, hence the names.
AFAICR, the RC5 effort uses the register width to try and crack many keys in one go anyway - a different approach to this, which is using the register width to generate more of a single stream in one go.
If you really need speed, you can use RC4 securely but you have to know what you are doing and be aware of these attacks so you can employ protective countermeasures. Otherwise you are better off to use a cipher like AES which is actually secure.