RC4 Code Achieves 319 MB/s On AMD64 Opteron
Marc Bevand writes "This recent paper is about optimizing RC4 for AMD64 processors. A working implementation is provided. Its encryption/decryption throughput reaches 319 MB/s on a single AMD Opteron 244 processor running at 1.8 GHz. This makes it, as of today, the world's fastest RC4 symmetric cipher implementation for general-purpose CPUs. As the author of this work, I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas."
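For readers who have not seen the cipher itself, here is a minimal sketch of textbook RC4 in Python. It is not the optimized AMD64 implementation from the paper, and the function names `rc4_keystream` and `rc4` are made up for this illustration. It only shows what the inner loop being optimized looks like: a 256-byte state table, two indices, and one swap plus one lookup per output byte.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for `key` (textbook algorithm, not optimized)."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 key must be 1..256 bytes")
    # Key-scheduling algorithm (KSA): permute the 256-byte state using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): one keystream byte per step.
    i = j = 0
    while True:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) & 0xFF]


def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data`; both directions are the same XOR with the keystream."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))
```

With the widely published test vector, `rc4(b"Key", b"Plaintext").hex()` gives `bbf316e8d940af0ad3`, and applying `rc4` again with the same key returns the plaintext.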
An anonymous reader adds some figures for the old implementation: "Opteron 244 1.8 GHz (32-bit) 163 MB/s; Opteron 244 1.8 GHz (64-bit) 135 MB/s."
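To put those figures in perspective, the short sketch below converts each throughput into CPU cycles per byte at 1.8 GHz and into a speedup relative to the new code. It assumes "MB" means 10^6 bytes, which the post does not specify; the helper name `cycles_per_byte` is just for this example.

```python
CLOCK_HZ = 1.8e9          # Opteron 244 clock speed quoted in the post
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes

def cycles_per_byte(mb_per_s: float, clock_hz: float = CLOCK_HZ) -> float:
    """CPU cycles spent per byte processed at the given throughput."""
    return clock_hz / (mb_per_s * BYTES_PER_MB)

NEW_MBPS = 319
figures = {
    "new implementation (64-bit)": 319,
    "old implementation (32-bit)": 163,
    "old implementation (64-bit)": 135,
}

for name, mbps in figures.items():
    print(f"{name}: {cycles_per_byte(mbps):.2f} cycles/byte, "
          f"new code is {NEW_MBPS / mbps:.2f}x as fast")
```

Under that assumption the new code spends roughly 5.6 cycles per byte, against about 11.0 and 13.3 for the old implementation in 32-bit and 64-bit mode, i.e. a speedup of a little under 2x and about 2.4x respectively.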
Unless AMD decides to provide a compiler for its chip, optimization will always be behind Intel (who do, for Linux also).
"I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas."
Well, maybe in some areas.
Since this is a cipher, it obviously helps a lot when you can work on 64-bit chunks of data instead of 32-bit.
The same speedup can probably be seen with applications that use numbers larger than 32 bits (or 64 bits for floats), since the number of operations necessary will essentially halve.
But other than that, I don't see much room for huge speedups.
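The claim about wide numbers can be illustrated with a small sketch. It adds two large unsigned integers one register-sized "limb" at a time, the way a CPU without wider arithmetic would, and counts the additions. The function `add_limbs` is hypothetical and simplified: it assumes both inputs fit in `total_bits` and that `total_bits` is a multiple of `limb_bits`.

```python
def add_limbs(a: int, b: int, total_bits: int, limb_bits: int):
    """Add a and b limb by limb with carry propagation.

    Returns (sum modulo 2**total_bits, number of limb additions performed).
    Assumes a, b < 2**total_bits and total_bits % limb_bits == 0.
    """
    mask = (1 << limb_bits) - 1
    result, carry, ops = 0, 0, 0
    for shift in range(0, total_bits, limb_bits):
        limb = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (limb & mask) << shift   # keep the low limb_bits of the sum
        carry = limb >> limb_bits          # pass the overflow to the next limb
        ops += 1
    return result, ops


a = 2**127 + 12345
b = 2**100 - 1
sum32, ops32 = add_limbs(a, b, total_bits=128, limb_bits=32)
sum64, ops64 = add_limbs(a, b, total_bits=128, limb_bits=64)
print(ops32, ops64)
```

For a 128-bit addition this takes four limb additions with 32-bit registers and two with 64-bit registers, which is the halving described above. It says nothing about code that already fits in 32 bits.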
I wish that every software company would put optimization first and features second. That way, we would not have to buy new computers every few years; they could potentially last much longer.