RC4 Code Achieves 319 MB/s On AMD64 Opteron
Marc Bevand writes "This recent paper is about optimizing RC4 for AMD64 processors. A working implementation is provided. Its encryption/decryption throughput reaches 319 MB/s on a single AMD Opteron x44 processor running at 1.8 GHz. This makes it, as of today, the world's fastest RC4 symmetric cipher implementation for general-purpose CPUs. As the author of this work, I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words, similar speedups can be expected in other areas."
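For reference, textbook RC4 fits in a few lines. A key-scheduling step (KSA) shuffles a 256-byte permutation S under control of the key. A generation step (PRGA) then swaps two entries of S per output byte and XORs the selected byte into the data. The C sketch below is the plain, unoptimized algorithm, included for illustration only. It is not the AMD64 code from the paper, and the names rc4_ctx, rc4_init and rc4_crypt are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context type: the 256-byte permutation S plus the two
 * PRGA indices, kept between calls so data can be streamed in pieces. */
typedef struct {
    uint8_t S[256];
    uint8_t i, j;
} rc4_ctx;

/* Key-scheduling algorithm (KSA): start from the identity permutation
 * and swap entries under control of the key bytes. */
static void rc4_init(rc4_ctx *ctx, const uint8_t *key, size_t keylen)
{
    assert(keylen >= 1 && keylen <= 256);
    for (int k = 0; k < 256; k++)
        ctx->S[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + ctx->S[k] + key[k % keylen]); /* wraps mod 256 */
        uint8_t t = ctx->S[k];
        ctx->S[k] = ctx->S[j];
        ctx->S[j] = t;
    }
    ctx->i = 0;
    ctx->j = 0;
}

/* Pseudo-random generation algorithm (PRGA): each step swaps two entries
 * of S and emits one keystream byte, which is XORed into the data.
 * Encryption and decryption are the same operation; in may equal out. */
static void rc4_crypt(rc4_ctx *ctx, const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t i = ctx->i, j = ctx->j;
    for (size_t n = 0; n < len; n++) {
        i = (uint8_t)(i + 1);
        j = (uint8_t)(j + ctx->S[i]);
        uint8_t t = ctx->S[i];
        ctx->S[i] = ctx->S[j];
        ctx->S[j] = t;
        out[n] = (uint8_t)(in[n] ^ ctx->S[(uint8_t)(ctx->S[i] + ctx->S[j])]);
    }
    ctx->i = i;
    ctx->j = j;
}
```

Because encryption is just an XOR with the keystream, re-initializing with the same key and running rc4_crypt over the ciphertext recovers the plaintext. With the commonly published test vector (key "Key", plaintext "Plaintext") this produces the bytes BB F3 16 E8 D9 40 AF 0A D3.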
An anonymous reader adds some figures for the old implementation: "Opteron 244 1.8 GHz (32-bit) 163 MB/s; Opteron 244 1.8 GHz (64-bit) 135 MB/s."
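Taking the quoted figures at face value, and assuming they were all measured on comparable 1.8 GHz Opteron hardware, the speedup of the new code over the older implementation works out as follows. The cycles-per-byte figure additionally assumes 1 MB = 10^6 bytes.

$$\frac{319}{163} \approx 1.96 \ \text{(vs. 32-bit)}, \qquad \frac{319}{135} \approx 2.36 \ \text{(vs. 64-bit)}$$

$$\frac{1.8 \times 10^{9}\ \text{cycles/s}}{319 \times 10^{6}\ \text{bytes/s}} \approx 5.6\ \text{cycles/byte}$$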
I was initially disappointed with the performance of my Athlon64. CPU-intensive 64-bit code often seemed much slower than its (heavily optimised) 32-bit counterpart.
:)
Every now & then I come across some code optimised for 64-bit processors, and it just flies. As more & more stuff gets the treatment, it will be like upgrading for free.
Sorry, it's not immediately obvious to me. Who are they?
AFAICR AMD paid SuSE to do the original work. I think the main developers were Jan Hubicka, the current x86-64 maintainer, and Andreas Jaeger. SuSE have a few more well-known GCC contributors: look at MAINTAINERS.
Will the optimized AMD64 RC4 code provide any boost to those crunching RC5 on an AMD64?
No, they're entirely different. For a start, RC4 is a stream cipher whereas RC5 is a block cipher. They just share the same inventor, hence the names.
AFAICR, the RC5 effort already uses the register width to try to crack many keys at once. That is a different approach from this one, which uses the register width to generate more of a single keystream at once.
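The idea of using the register width to produce more of a single stream at once can be illustrated roughly as follows. This is a hedged sketch of the general idea, not code from the paper. It gathers eight keystream bytes and then XORs them with eight data bytes using one 64-bit operation instead of eight byte-sized ones. The keystream itself is still produced one byte at a time, because each RC4 step depends on the previous one. The names rc4w_state, rc4w_setup, rc4w_next and rc4w_crypt are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state type: RC4 permutation plus the two PRGA indices. */
typedef struct {
    uint8_t S[256];
    uint8_t i, j;
} rc4w_state;

/* Standard RC4 key schedule (KSA). */
static void rc4w_setup(rc4w_state *st, const uint8_t *key, size_t keylen)
{
    assert(keylen >= 1 && keylen <= 256);
    for (int k = 0; k < 256; k++)
        st->S[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->S[k] + key[k % keylen]);
        uint8_t t = st->S[k];
        st->S[k] = st->S[j];
        st->S[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* One PRGA step: update the state and return the next keystream byte.
 * Each step depends on the previous one, so this part stays serial. */
static inline uint8_t rc4w_next(rc4w_state *st)
{
    uint8_t i = (uint8_t)(st->i + 1);
    uint8_t j = (uint8_t)(st->j + st->S[i]);
    uint8_t t = st->S[i];
    st->S[i] = st->S[j];
    st->S[j] = t;
    st->i = i;
    st->j = j;
    return st->S[(uint8_t)(st->S[i] + st->S[j])];
}

/* Eight bytes per iteration: gather 8 keystream bytes, then do one 64-bit
 * load, XOR and store instead of eight byte-sized ones. The remaining
 * len % 8 bytes are handled one at a time. */
static void rc4w_crypt(rc4w_state *st, const uint8_t *in, uint8_t *out, size_t len)
{
    size_t n = 0;
    for (; n + 8 <= len; n += 8) {
        uint8_t ks[8];
        for (int b = 0; b < 8; b++)
            ks[b] = rc4w_next(st);
        uint64_t k64, d64;
        memcpy(&k64, ks, 8);     /* same byte order as the data load below */
        memcpy(&d64, in + n, 8); /* memcpy avoids unaligned-access issues */
        d64 ^= k64;
        memcpy(out + n, &d64, 8);
    }
    for (; n < len; n++)
        out[n] = (uint8_t)(in[n] ^ rc4w_next(st));
}
```

The output is identical to plain byte-at-a-time RC4; for example, key "Secret" and plaintext "Attack at dawn" give 45 A0 1F 64 5F C3 5B 38 35 52 54 4B 9B F5. Gathering the keystream in a small byte array and copying it with memcpy keeps the sketch correct on any byte order; a hand-tuned AMD64 version would more likely assemble the word directly in a register with shifts.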