Slashdot Mirror


RC4 Code Achieves 319 MB/s On AMD64 Opteron

Marc Bevand writes "This recent paper is about optimizing RC4 for AMD64 processors. A working implementation is provided. Its encryption/decryption throughput reaches 319 MB/s on a single AMD Opteron 244 processor running at 1.8 GHz. This makes it, as of today, the world's fastest RC4 symmetric cipher implementation for general purpose CPUs. As the author of this work, I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas." An anonymous reader adds some figures for the old implementation: "Opteron 244 1.8 GHz (32-bit) 163 MB/s; Opteron 244 1.8 GHz (64-bit) 135 MB/s."
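
To put the quoted throughput figures in perspective, they can be converted to CPU cycles per byte. This is only a back-of-the-envelope sketch: it assumes "MB" means 10**6 bytes (the summary does not say), and the helper name `cycles_per_byte` is made up for illustration.

```python
# Rough cycles-per-byte arithmetic for the throughput figures quoted above.
# Assumption: "MB" means 10**6 bytes; the summary does not say which.

CLOCK_HZ = 1.8e9  # Opteron 244 clock speed quoted in the summary


def cycles_per_byte(mb_per_s, clock_hz=CLOCK_HZ):
    """Return the average CPU cycles spent per byte at a given throughput."""
    return clock_hz / (mb_per_s * 1e6)


figures = {
    "optimized AMD64 code": 319,
    "old code, 32-bit": 163,
    "old code, 64-bit": 135,
}

for label, mb_per_s in figures.items():
    speedup = figures["optimized AMD64 code"] / mb_per_s
    print(f"{label}: {cycles_per_byte(mb_per_s):.2f} cycles/byte, "
          f"new code is {speedup:.2f}x this")
```

Under that assumption the new code spends roughly 5.6 cycles per byte, against about 11 and 13 for the old implementation in 32-bit and 64-bit mode.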

36 of 177 comments

  1. Optimisation is definately the key by datajack · · Score: 5, Informative

    I was initially disappointed with the performance of my Athlon64. CPU intensive 64bit code often seemed much slower than its (heavily optimised) 32bit counterpart.

    Every now & then I come across some code optimised for 64bit processors, and it just flies - as more & more stuff gets the treatment, it will be like upgrading for free :)

    1. Re:Optimisation is definately the key by Savage-Rabbit · · Score: 5, Funny

      it will be like upgrading for free :)

      Just don't get too excited. One of my coworkers made this same discovery a while back. Now he runs around the office wearing an "I love Opteron" T-Shirt and starts shouting "Intel is history - Power PC is dead!" every time somebody mentions the words Opteron or AMD in a sentence. Worst of all he attacks anybody who disagrees and tries to bite them. We tried to knock him out with a dart gun after he savaged a visiting IBM sales rep but even heavy duty veterinary tranquilisers don't seem to have any effect.

      :-D

      --
      Only to idiots, are orders laws.
      -- Henning von Tresckow
    2. Re:Optimisation is definately the key by Anonymous Coward · · Score: 2, Funny
      Power PC is dead?

      What the hell is he smoking?

      I know. The attacking and biting I can look past, but saying that Power PC is dead is just nuts.
    3. Re:Optimisation is definately the key by youknowmewell · · Score: 2, Funny
      Opteron or AMD
      Intel is history - Power PC is dead!
  2. until by iamnotacrook · · Score: 4, Insightful

    AMD decides to provide a compiler for its chips, optimization will always be behind Intel (who do, for Linux also).

    1. Re:until by isometrick · · Score: 4, Insightful

      I agree, to an extent. It's been said that Intel's compiler can outdo GCC in some performance benchmarks.

      GCC is no slouch though, and obviously Intel is performing some tricks that could also be implemented by GCC.

      I think it'd be a great move for AMD to work WITH GNU to optimize 64-bit AMD code from GCC.

      Seems like Intel is more prone to keeping secrets when it comes to processors. Maybe this is (yet another) way for AMD to give them a run for their money.

    2. Re:until by IWannaBeAnAC · · Score: 2, Informative

      Err, AMD have had developers working on optimizing GCC for quite a while now....

    3. Re:until by Gopal.V · · Score: 2

      ICC really sucks for compatibility ....

      For example, my code performs 5 times faster when compiled with gcc than when compiled with ICC ...

      Ok, maybe I'm a special case (I use computed GOTO). But you can't compile the kernel either :)

    4. Re:until by RupW · · Score: 5, Informative

      Sorry it's not immediately obvious to me. Who are they?

      AFAICR AMD paid SuSE to do the original work. I think the main developers were Jan Hubicka, the current x86-64 maintainer, and Andreas Jaeger. SuSE have a few more well-known GCC contributors: look at MAINTAINERS.

    5. Re:until by maxwell+demon · · Score: 2, Funny

      Quiche Eater!
      Real programmers can write FORTRAN in every language.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    6. Re:until by ceeam · · Score: 2, Interesting

      Dare I say that the fact that Intel produces a kick-ass compiler (for certain tasks anyway) has nothing to do with the fact that the same company produces CPUs. BTW - currently (AFAIK) the compiler is developed in Russia whereas chip design is done at "traditional" sites.
      PS: Oh, of course, Intel compiler won't ever support 3dnow, but that's the issue with sponsorship. I mean - AMD don't have to design the compiler themselves. They will be equally ok with sponsoring someone who knows how to do that.

  3. Somewhat OT, but... by bhtooefr · · Score: 4, Informative

    If all a machine is doing is encrypting, A64s and Opterons are a bit overkill. The VIA C3 C5P has an encryption engine that makes top-of-the-line processors look sad. I couldn't find results for RC4, but there is a page from a review of the EPIA MII-12000 which shows AES results. First graph is EPIAs in software, second is a few Intel and AMD CPUs (software), and the MII-12000 in software (which gets creamed by the AXP 2500+ and the P4@2.4) and hardware (which totally obliterates everything).

    1. Re:Somewhat OT, but... by MrNemesis · · Score: 4, Interesting

      AFAIK, the VIA's *only* do AES, as they're designed to make good VPN endpoints. This is cos some hefty AES subroutines are built into the hardware (with software drivers doing the rest).

      So whilst this is all very handy, if you want encryption other than AES (which, if there were ever any significant flaws found in AES' maths, is a certainty) you'd want to dump those VIA boards and get yourself either a dedicated encryption device like an Encipher box (like an expensive version of the VIA) or just a beast of a machine to do encryption entirely in software (like an Opteron).

      I personally shunt everything through DSA stunnels, so a VIA isn't much use to me.

      --
      Moderation Total: -1 Troll, +3 Goat
    2. Re:Somewhat OT, but... by mczak · · Score: 5, Informative
      AFAIK, the VIA's *only* do AES, as they're designed to make good VPN endpoints. This is cos some hefty AES subroutines are built into the hardware (with software drivers doing the rest).
      True. VIA padlock (as they call it) can currently only do AES in hardware (and it can also generate true random numbers). The next VIA chip called C7 (C5J Esther) however should be able to also do SHA-1, SHA-256 and parts of RSA in hardware (I think it should be available first half of 2005). That's of course still a limited set of encryption algorithms, but it's certainly an improvement.
    3. Re:Somewhat OT, but... by swillden · · Score: 2, Informative

      The government agency that selected and approved of AES are the same ones who approved of DES, oh so many years ago.

      And the same ones who were apparently surprised when flaws were found in SHA-1, which they also selected and approved. And the same ones who developed the Law Enforcement Access Field (LEAF) for Clipper, which was quickly broken by Matt Blaze.

      Thirty years ago when the NSA fixed IBM's Lucifer, which became DES, the NSA clearly had a huge amount of cryptologic knowledge that the public research community did not. Indeed, there really wasn't a public research community. That has changed. Although it's certain that the NSA has some tricks the public community does not, and it's obvious that there's nothing the public community knows that the NSA does not, it appears that the gap has closed considerably, and that it's entirely feasible for people to find flaws in NSA-approved, and even NSA-created ciphers.

      AES/Rijndael concerns lots of cryptographers because it is, by the only measure we really have available, the weakest of all of the AES candidates. All of the candidates use multiple "rounds" of computation, and all of them have been broken for some number of rounds less than the full algorithm. The difference between the number of rounds in the full algorithm and the largest number of rounds that has been broken is called the "margin of security", and Rijndael has a very thin margin of security, meaning that existing attacks only have to be extended by a little bit to break the full algorithm.

      Does this mean it's likely to be broken? No one knows, and only time will tell. Contingency plans are a good thing if you really care about security, though.

      I think that alone means it deserves the benefit of the doubt.

      You're insufficiently paranoid. Or, more likely, you just don't have anything you really need to protect. That's most people, actually.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  4. Not worth the outlay at present by cheezemonkhai · · Score: 4, Informative

    Don't get me wrong, it's good that code is optimised, but I think that RC4 would fly faster on an IA64 than an Opteron if specifically optimised to take advantage of the CPU's features.

    RC4 isn't really that relevant in real life as WEP is crap & also easily done in hardware anyway.

    The 64 bit advantage will suffer the same fate as the 32bit advantage did for the 486, Pentium & especially the Pentium Pro.

    486 = 32bits, faster but people still bought 386's due to cost.

    Pentium = 32bits, sometimes faster but again costs meant 486's stayed popular.

    Pentium Pro = 32bit, 16 bit instructions stalled it. When running pure 32bit code ran like the dogs, when running 16bit code (win 98) ran like a dog.

    Problem is that you're generally better off saving your cash, buying a cheap CPU (32bit in this case) and waiting for the 2nd/3rd Generation CPU. By that time prices will be more reasonable and you will see the full advantages as programs will use the extra bits properly.

    I mean come on MS still hasn't released a final AMD64 version of Winblows yet.

    1. Re:Not worth the outlay at present by joib · · Score: 3, Informative


      486 = 32bits, faster but people still bought 386's due to cost.


      The 386 was also a 32-bit processor...

    2. Re:Not worth the outlay at present by DigitumDei · · Score: 4, Interesting

      I just bought a new PC, and when compared to all the available options, the AMD64 option (I got an AMD64 2800+) was best. Slightly more expensive than the equivalent XP, cheaper than the P4. And they run so cool, it's the first PC I've had in years where I don't have to worry about the temperature. When I bought an XP 2600+ last year, I spent almost half the chip's price again on cooling.

      Just because I'm running a 32bit win XP on it doesn't make it a bad purchase.

      Also, I'm one of those people who bought a 386 instead of a 486 (then later a 486 instead of a pentium 1) because of the price difference. The price difference nowadays is nowhere near comparable to what it was then.

    3. Re:Not worth the outlay at present by Bert64 · · Score: 4, Informative

      Actually, the majority of SSL websites are using RC4..
      If you use Mozilla and Apache, you can use 256-bit AES encryption for SSL (try loading up paypal with a Mozilla based browser) but if either the server or client is Microsoft-based you're stuck with the much weaker 128bit RC4...
      MS - always behind the curve, no 256bit encryption, no 64bit os

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    4. Re:Not worth the outlay at present by nick-less · · Score: 2, Informative

      That's kinda vague. The 386 was a 32 bit processor with a 16bit data bus. It still could perform 32-bit arithmetic natively, but the bus was strangled.

      That's the 386SX you're talking about; the regular 386DX, which came out before the SX, was fully 32-bit.

    5. Re:Not worth the outlay at present by Sivar · · Score: 2
      Motorola, on the other hand, designed their 68000 to be a 32bit chip from the get go, which I believe was first introduced in 1979 or so, at least according to my data book titled "break away from the past". Makes you wonder why anyone thought it was a good idea to use the 8086 for the PC.

      I've spoken to some of the people that made this decision for various companies (e.g. Raytheon). The general consensus was that the difference between the 68K and the x86 was "night and day", but that the Intel chips were available dirt cheap (for the time) in massive quantities, and that there were more developers that knew the architecture (at least in their specific situations). Thus, yet again, business logic overrode technical/engineering prowess, and most of us have been using the worst major CPU architecture ever designed because of it.
      (Not that I blame Intel, they didn't have any other architecture to learn their mistakes from).
      --
      Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
    6. Re:Not worth the outlay at present by Anonymous Coward · · Score: 2, Insightful

      I'm not sure why you think that IA-64 would outperform AMD64. For those who don't know, IA-64 refers to Intel's VLIW instruction set that is used with the Itanium. RC4 generally is an integer type application, which the Opteron usually does better in (according to the SPEC results).

      Itanium does really well on encryption in general. Hand-optimized code makes good use of the large register set, the modulo-scheduling of loops and powerful bit manipulation primitives.

      IIRC Itanium held the top spot in SpecSSL for a while (don't know where it stands currently, I don't think the numbers are current).

      In fact, the only time Itanium does well in anything is when it has 6MB of L2 cache

      Stop drinking the AMD coolaid.

    7. Re:Not worth the outlay at present by RangerElf · · Score: 2, Informative

      Regular x86 has 8 GP registers, AMD64 has 16.

      Not if you want to actually use the stack pointer and your stack-frame base pointer; you have 4 GP regs (EAX .. EDX), two kinda-specific-purpose regs (ESI, EDI), one crippled-kinda-general-purpose-pointer reg (EBP) and one specific-purpose register (ESP).

      AND, if you want to do multiplications and divisions (the worst offenders, IMO), then two of the GP registers are already spoken for (EAX, EDX).

      So actually, the grandparent poster was right.

      -gus

  5. Finally enough horsepower... by Vo0k · · Score: 3, Funny

    ...to allow DRM encryption of movies to become standard :)

    --
    Anagram("United States of America") == "Dine out, taste a Mac, fries"
  6. PowerPC G5 by TiMac · · Score: 4, Interesting

    Who wants to optimize RC4 for the PowerPC G5 chip (64-bit implementation) and do a bake-off? Hand-coding PPC assembly doesn't sound as fun as this PHP I'm working on at the moment, so someone else will have to tackle that!

    1. Re:PowerPC G5 by fizze · · Score: 2, Insightful

      I don't know why everyone jumps off the horse as soon as they hear the magic word "assembly".
      Seriously.
      If you want to get 110% out of your hardware, you have to put effort in, to get effort out. Makes sense, doesn't it?

      I'm not saying people who don't like ASM are sissies, not at all. But I'm saying that assembly has its place, just as so many other programming languages.

      --
      Powerful is he who overpowers his temptations.
    2. Re:PowerPC G5 by TiMac · · Score: 2, Insightful
      Indeed.

      But when other projects beckon that don't require assembler work, I'm not about to jump on one that does for "fun" either ;)

    3. Re:PowerPC G5 by Anonymous Coward · · Score: 4, Informative
      What ouch? You're looking at something different; RC4 is not RC5-72...

      From distributed.net's pages, here's what it has to say on the Opterons for RC5-72 (uniprocessor)
      The Opteron 2420 achieved a score of 9,547,969.00.

      The 2GHz G5 for RC5-72 (uniprocessor) achieved a score of 15,057,412.00 (there are 2.5GHz chips available...) The best multi-cpu scores?
      A 2-way 2 GHz Opteron achieved a score of 15,145,274.67, but
      a 2-way 2.5GHz G5 smoked it with a score of 37,441,192.00.

      Apples to apples, my friend, apples to apples.

  7. chip names by Pompatus · · Score: 3, Funny

    I'm holding out on the 64 bit systems until amd starts naming the chips commodore.

    --

    ----
    Squirrel ... It's not just for breakfast anymore
  8. well... by mx.2000 · · Score: 4, Insightful

    "I would like to point out that many CPU-hungry applications have not been optimized for AMD64 yet. In other words: such speedups can be expected in other areas."

    well, maybe in some areas.
    Since this is a cipher, it obviously helps a lot when you can work on 64-Bit chunks of data instead of 32-Bit.

    The same speedup can probably be seen with applications that use numbers larger than 32b (or 64b for floats), since the number of operations necessary will essentially halve.

    But other than that, I don't see much room for huge speedups.
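
    The "number of operations will essentially halve" point can be illustrated with multi-word addition. The sketch below is a toy model, not a benchmark: `add_in_limbs` is a made-up helper that adds two 128-bit numbers one register-sized "limb" at a time and counts the add-with-carry steps.

```python
def add_in_limbs(a, b, total_bits, limb_bits):
    """Add two unsigned integers limb by limb, the way a CPU with
    limb_bits-wide registers would, and count the add-with-carry steps."""
    mask = (1 << limb_bits) - 1
    result, carry, steps = 0, 0, 0
    for shift in range(0, total_bits, limb_bits):
        # Add one limb from each operand plus the carry from the previous limb.
        limb_sum = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (limb_sum & mask) << shift
        carry = limb_sum >> limb_bits
        steps += 1
    # The final carry is dropped, i.e. the sum is taken modulo 2**total_bits.
    return result, steps


x = 0xFFFFFFFF_FFFFFFFF_00000000_FFFFFFFF
y = 0x00000000_00000001_00000000_00000001
sum32, steps32 = add_in_limbs(x, y, 128, 32)
sum64, steps64 = add_in_limbs(x, y, 128, 64)
print(f"32-bit limbs: {steps32} steps, 64-bit limbs: {steps64} steps")
```

    Both calls produce the same sum; the 64-bit version simply needs half as many steps, which is the effect described above for code working on values wider than 32 bits.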

  9. that's good by b100dian · · Score: 2, Interesting

    That's good because it is yet another step in the direction where all information (HTTP, SMTP, etc.) will travel encrypted (today only some pages are served this way, because of the processor load),
    and because every time we hear good news about AMD we're happy :P

    Everybody'll get TLS'ed

    --
    gtkaml.org
  10. Optimization First, Features Second by Space_Soldier · · Score: 4, Insightful

    I wish that every software company would put optimization first and features second. This way, we would not have to buy computers every few years. They can potentially last much longer.

    1. Re:Optimization First, Features Second by Smoo_Master · · Score: 2, Insightful

      I would tend to disagree with that. While one should weigh the performance against an overabundance of features, overzealous optimization can also result in problems. Remember that Knuth said "Premature optimization is the root of all evil."

      If someone took your idea to the extreme, you might get something like this:
      "What does it do?"
      "Nothing, but look how *fast* it does it?"

      I think the best solution is moderation in both ends.

    2. Re:Optimization First, Features Second by shplorb · · Score: 2, Insightful

      Buy a games console then =]

  11. Re:Does this change anything for rc5? by RupW · · Score: 5, Interesting

    Will the optimized AMD64 rc4 code provide any boost to those crunching rc5 on an AMD64?

    No, they're entirely different. For a start, RC4 is a stream cipher whereas RC5 is a block cipher. They just share the same inventor, hence the names.

    AFAICR, the RC5 effort uses the register width to try and crack many keys in one go anyway - a different approach to this, which is using the register width to generate more of a single stream in one go.
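
    For readers who have not seen it, here is a minimal sketch of RC4 as a stream cipher. It is the plain textbook algorithm, one byte per step, and not the optimized AMD64 code from the paper; the function names are chosen here for illustration.

```python
def rc4_keystream(key):
    """Yield RC4 keystream bytes for the given key (bytes)."""
    # Key-scheduling algorithm (KSA): permute 0..255 under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): one output byte per step.
    i = j = 0
    while True:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) & 0xFF]


def rc4(key, data):
    """Encrypt or decrypt data by XORing it with the keystream."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())  # bbf316e8d940af0ad3 (widely published test vector)
assert rc4(b"Key", ciphertext) == b"Plaintext"
```

    Each output byte depends on the state left behind by the previous step, which is why the work is about generating a single stream faster rather than trying many keys in parallel.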

  12. RC4 is not cryptographically strong by SiliconEntity · · Score: 2, Informative
    The RC4 stream cipher has a number of weaknesses. See Itsik Mantin's RC4 page; he is a crypto student who did his master's thesis on RC4. Among other weaknesses, the 2nd byte of the output is twice as likely to match the plaintext as it should be; there are weak keys; and it is possible to distinguish the output from randomness. Some of the attacks are practical and have been used to break the WEP wireless encryption algorithm, which uses RC4.

    If you really need speed, you can use RC4 securely but you have to know what you are doing and be aware of these attacks so you can employ protective countermeasures. Otherwise you are better off to use a cipher like AES which is actually secure.
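
    The second-byte weakness mentioned above can be observed with a small experiment. This is a rough sketch under stated assumptions: random 16-byte keys, a fixed seed, and made-up helper names. A keystream byte of 0 means the ciphertext byte equals the plaintext byte, and for an unbiased stream that should happen about once in 256 keys.

```python
import random


def rc4_first_bytes(key, count):
    """Return the first `count` RC4 keystream bytes for `key`."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = []
    i = j = 0
    for _ in range(count):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return out


def second_byte_zero_count(trials, seed=1):
    """Count how often the second keystream byte is 0 over random 16-byte keys."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        key = bytes(rng.getrandbits(8) for _ in range(16))
        if rc4_first_bytes(key, 2)[1] == 0:
            hits += 1
    return hits


trials = 20000
hits = second_byte_zero_count(trials)
print(f"second byte was 0 in {hits} of {trials} keys; "
      f"an unbiased stream would give about {trials / 256:.0f}")
```

    The count should come out at roughly twice the unbiased figure. One commonly suggested countermeasure is to discard the first part of the keystream before using it.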