RC4 Code Achieves 319 MB/s On AMD64 Opteron
Marc Bevand writes "This recent paper is about optimizing RC4 for AMD64 processors. A working implementation is provided. Its encryption/decryption throughput reaches 319 MB/s on a single AMD Opteron x44 processor running at 1.8 GHz. This makes it, as of today, the world's fastest RC4 symmetric cipher implementation for general-purpose CPUs. As the author of this work, I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas."
An anonymous reader adds some figures for the old implementation: "Opteron 244 1.8 GHz (32-bit) 163 MB/s; Opteron 244 1.8 GHz (64-bit) 135 MB/s."
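For anyone who has not looked at RC4 itself, here is a minimal sketch of the textbook algorithm in portable C. It is not the code from the paper, and it makes no attempt at the AMD64-specific tuning the story is about; the names rc4_state, rc4_init and rc4_crypt are made up for this illustration. It only shows the byte-at-a-time inner loop that any optimized version has to speed up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the paper's code. */
typedef struct {
    uint8_t S[256];   /* the permutation table */
    uint8_t i, j;     /* the two running indices */
} rc4_state;

/* Key-scheduling algorithm (KSA). keylen must be at least 1. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    for (int n = 0; n < 256; n++)
        st->S[n] = (uint8_t)n;

    uint8_t j = 0;
    for (int n = 0; n < 256; n++) {
        j = (uint8_t)(j + st->S[n] + key[n % keylen]);
        uint8_t t = st->S[n];      /* swap S[n] and S[j] */
        st->S[n] = st->S[j];
        st->S[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* Keystream generation (PRGA) XORed over the input.
   Encryption and decryption are the same operation. */
static void rc4_crypt(rc4_state *st, const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t i = st->i, j = st->j;
    for (size_t n = 0; n < len; n++) {
        i = (uint8_t)(i + 1);
        j = (uint8_t)(j + st->S[i]);
        uint8_t t = st->S[i];      /* swap S[i] and S[j] */
        st->S[i] = st->S[j];
        st->S[j] = t;
        out[n] = in[n] ^ st->S[(uint8_t)(st->S[i] + st->S[j])];
    }
    st->i = i;
    st->j = j;
}
```

To use it, call rc4_init once with the key, then rc4_crypt on each buffer. Each output byte depends on the table swap made for the previous one, which is presumably why hand scheduling matters so much for this cipher.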
Unless AMD decides to provide a compiler for its chip, optimization will always lag behind Intel (who do, for Linux as well).
"I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas."
Well, maybe in some areas.
Since this is a cipher, it obviously helps a lot when you can work on 64-bit chunks of data instead of 32-bit ones.
The same speedup can probably be seen in applications that use integers wider than 32 bits (or floats wider than 64 bits), since the number of operations needed will essentially halve.
But other than that, I don't see much room for huge speedups.
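The halving argument above can be sketched with multi-word addition. This is a rough illustration only, with hypothetical helper names (add_limbs32, add_limbs64). A 128-bit value takes four 32-bit words but only two 64-bit words, so the carry loop runs half as many times on a 64-bit machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two little-endian numbers made of n 32-bit words.
   Returns the carry out of the top word. */
static uint32_t add_limbs32(uint32_t *r, const uint32_t *a,
                            const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t k = 0; k < n; k++) {
        uint64_t s = (uint64_t)a[k] + b[k] + carry;
        r[k] = (uint32_t)s;
        carry = (uint32_t)(s >> 32);
    }
    return carry;
}

/* Same operation with 64-bit words: half as many iterations
   for the same total width. */
static uint64_t add_limbs64(uint64_t *r, const uint64_t *a,
                            const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t k = 0; k < n; k++) {
        uint64_t s = a[k] + carry;
        uint64_t c1 = s < carry;   /* wrapped while adding the carry in */
        r[k] = s + b[k];
        uint64_t c2 = r[k] < s;    /* wrapped while adding b[k] */
        carry = c1 | c2;           /* at most one of the two can be set */
    }
    return carry;
}
```

Adding two 128-bit numbers is add_limbs32 with n = 4 or add_limbs64 with n = 2. Whether the wall-clock time actually halves depends on the compiler and the CPU, so treat this as a count of loop iterations rather than a benchmark.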
I wish that every software company would put optimization first and features second. This way, we would not have to buy computers every few years. They can potentially last much longer.
I don't know why everyone jumps off the horse as soon as they hear the magic word "assembly".
Seriously.
If you want to get 110% out of your hardware, you have to put effort in to get results out. Makes sense, doesn't it?
I'm not saying people who don't like ASM are sissies, not at all. But I'm saying that assembly has its place, just like so many other programming languages.
Powerful is he who overpowers his temptations.
But when other projects beckon that don't require assembler work, I'm not about to jump on one that does for "fun" either ;)
I'm not sure why you think that IA-64 would outperform AMD64. For those who don't know, IA-64 refers to Intel's VLIW instruction set that is used with the Itanium. RC4 generally is an integer type application, which the Opteron usually does better in (according to the SPEC results).
Itanium does really well on encryption in general. Hand-optimized code makes good use of the large register set, the modulo-scheduling of loops and powerful bit manipulation primitives.
IIRC Itanium held the top spot in SpecSSL for a while (I don't know where it stands now, and I don't think the numbers I have are current).
In fact, the only time Itanium does well in anything is when it has 6MB of L2 cache.
Stop drinking the AMD Kool-Aid.