Slashdot Mirror


AltiVec Unwrapped

paradesign writes "O'Reilly is running a nice article on AltiVec in the G4 chip. The article includes examples, with code, showing its effectiveness. For everyone who is uneducated as to exactly what AltiVec is, this is a must read."

14 of 38 comments

  1. Good by morbid · · Score: 2, Informative

    This is a good article giving a basic overview of SIMD coding using AltiVec. However, when Apple claims that MHz don't matter, they're only telling part of the story, because SSE (on the PIII and Athlon 4/XP) and 3DNow! (on the K6-2, K6-3 and Athlon) all do much the same thing. I hate to say it, but the Pentium IV even has double-precision SIMD in the form of SSE2, which makes it currently the only consumer-grade processor with double-precision SIMD. The AMD Hammer will have SSE2 as well when it comes out.

    --
    I'm out of my tree just now but please feel free to leave a banana.
    1. Re:Good by QuietRiot · · Score: 3, Funny

      I was curious about what kind of hardware in the x86 arena had the same capabilities. Does anyone know where one could find a rundown of the "extras" found on the various x86-based processors with capabilities similar to those described above?

      How do they compare to the AltiVec in terms of speed, precision, cache in/out, etc.?

      Oh! http://www.processor-emporium.co.uk seems to be a good reference site....

    2. Re:Good by Lars+T. · · Score: 2
      Gosh, you are right!

      Well, actually you are not, but that shouldn't keep you from trying. The 2nd example of using AltiVec uses the FP vector multiply-add instruction, a no-show on SSE(2) and 3DNow!. The 3rd example relies on the fact that the x[i] and y[i] vectors stay the same, which they don't on the x86 SIMD extensions. So in those examples we already have some of the differences between AltiVec and the lesser SIMDs; others are more registers and better instructions for shuffling data. IOW, again, MHz isn't everything, as shown by e.g. dnet rc5 scores.
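
      To make the two differences named above concrete, here is a minimal sketch in portable C rather than real AltiVec or SSE code. The `vec4f` type and both function names are made up for illustration; the loops only model what a 4-lane multiply-add and a two-operand, overwrite-the-destination multiply do to their operands.

```c
#include <stddef.h>

/* Hypothetical 4-lane float vector, standing in for one 128-bit register. */
typedef struct { float lane[4]; } vec4f;

/* Models a vector multiply-add: each lane computes a*b + c.
 * Inputs are passed by value and the result is a fresh value, so a and b
 * are still intact afterwards (the "non-destructive" form). */
static vec4f vec4f_madd(vec4f a, vec4f b, vec4f c)
{
    vec4f r;
    for (size_t i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    return r;
}

/* Models the two-operand style: dst = dst * src overwrites dst, so the
 * caller has to copy dst first if the original value is needed again. */
static void vec4f_mul_inplace(vec4f *dst, vec4f src)
{
    for (size_t i = 0; i < 4; ++i)
        dst->lane[i] *= src.lane[i];
}
```

      With a = {1,2,3,4}, b = {2,2,2,2} and c = {1,1,1,1}, `vec4f_madd(a, b, c)` gives {3,5,7,9} and leaves a unchanged, whereas `vec4f_mul_inplace(&a, b)` turns a itself into {2,4,6,8}.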

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    3. Re:Good by Perdo · · Score: 2

      And the purpose of this is?

      Let's review. Implementing AltiVec requires a code rewrite. If your application lends itself to parallel processing, why rely on a single processor that executes 4 instructions at a time when you could use 6 processors that are clocked 50% faster, execute 4 instructions in parallel most of the time, and are only sometimes reduced to two in comparison? You can still run 6 of them, 100% faster by clock speed, at a given price. As long as you are going to have to rewrite your code, you might as well rewrite it for a cluster.

      So, in our example, we pit 3 dual-processor 1533MHz Athlon XPs against 1 800MHz G4. Price point is $1600.

      In one corner, you have a single bottom end apple G4 tower at 800 mhz.

      800MHz PowerPC G4
      256K L2 cache
      256MB SDRAM memory
      40GB Ultra ATA drive
      CD-RW drive
      ATI Radeon 7500
      56K internal modem

      In the other corner we have 3U of dual-processor Athlon goodness (each price below is the total for that line).

      3 tyan tiger AMD 760mp chipset motherboards @ $522.
      6 1800XP Athlons @ $624 (yes they work).
      3 256mb PC2100 registered ecc DDR ram @ $195.
      3 1u cases w/300w power supplies @ $120.
      3 40gb hard drives @ $162.

      Price point is $1623.

      Now rewrite your code.

      Which takes 3 weeks, by which time Apple raises the price of the G4 another hundred dollars while the price of the cluster drops a hundred dollars.

      Ok, that was a flame, let's stick to matters at hand.

      Referencing this article, the Ars Technica article and the c't article (you know which one I'm talking about, that place where you dare not look, you'll find x86 there staring back at you) we can draw these conclusions:

      The G4 with AltiVec performs equally clock for clock with x86 w/SSE, with some rare exceptions where it performs 100% faster clock for clock.

      Best case scenario for our similarly priced systems, using your best case for the G4 benchmark, rc5:

      Single G4 800mhz 8,243,188 keys per second
      6 AMD 1800XP 32,987,538 keys per second

      Same price, x86 is 4 times as productive.

      Seti@home using Ars Lambchop benching wu: Identical!

      3.35 per work unit.

      x86 is 6 times as productive for the same price.

      CINT2000: base 648 - XP1800
      CINT2000: base 242 - G4 800mhz

      648 vs 242... and that is a single processor comparison!

      If we can optimise to scale, x86 is 16 times as fast for the same price.

      If you know of any benchmarks where Mac can compare favorably for the price, please let us all know. You are right, Mhz is not everything. But you have to get some numbers to back the claim that the G4 is even marginally close in performance to machines with well over twice the clockspeed. I'm sure that will convince us all to run out and buy Macs for number crunching :)
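
      The arithmetic behind the comparison above can be checked with a few lines of C. This is only a sketch: the figures are copied from the post, the function names are hypothetical, and the CINT2000 scaling assumes six processors scale perfectly, which is the post's own best case.

```c
#include <stddef.h>

/* Line totals quoted in the post for the three-node Athlon cluster (USD). */
static const int cluster_parts_usd[] = { 522, 624, 195, 120, 162 };

/* Sum of the parts list; the post gives $1623. */
static int cluster_total_usd(void)
{
    int total = 0;
    size_t n = sizeof cluster_parts_usd / sizeof cluster_parts_usd[0];
    for (size_t i = 0; i < n; ++i)
        total += cluster_parts_usd[i];
    return total;
}

/* How many times more work the cluster does than the single G4. */
static double speedup(double cluster_score, double g4_score)
{
    return cluster_score / g4_score;
}

/* rc5: aggregate keys per second as quoted in the post. */
static double rc5_speedup(void)
{
    return speedup(32987538.0, 8243188.0);
}

/* CINT2000 base: 648 per Athlon, multiplied by 6 CPUs (assumes ideal
 * scaling), against 242 for the G4. */
static double cint_speedup(void)
{
    return speedup(6.0 * 648.0, 242.0);
}
```

      `cluster_total_usd()` comes to 1623, `rc5_speedup()` to roughly 4.0 and `cint_speedup()` to a little over 16, in line with the figures quoted.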

      --

      If voting were effective, it would be illegal by now.

    4. Re:Good by Lars+T. · · Score: 2

      I could reply to your posting in detail, but I'm just gonna say: crawl up and live a long, miserable life.

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    5. Re:Good by Lars+T. · · Score: 2

      Sorry, but I have a life.

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  2. Re:Run away! by QuietRiot · · Score: 2

    Explain the "farad" temperature measurement please....

  3. OpenApple by QuietRiot · · Score: 2

    And why doesn't anyone besides Apple sell this stuff?? Is it possible to get a G4-enabled, AltiVec-enabled board somewhere without paying the Apple Tax?

    1. Re:OpenApple by ivan256 · · Score: 2

      Many people would be happy to sell you a board with a G4 on it. Maybe even 2 G4s!

      Marvell makes ATX boards with 1 or 2 7450s.

      Motorola makes a very nice ATX board with 2 7450s on it. They also have the Sandpoint platform which you can use with many different PPC chips.

      Merlancia seems to have some good stuff.

      There's a bunch more too, Tundra, GMS, Force, just do a search on google. You'll likely find though that Apple has the best prices. If you want to play with a PPC (I'm assuming you want to do some low level stuff for fun or profit) you'll end up spending $1500 on just a board from somewhere else, or $1500 on a complete system from Apple. The Apple systems retain their value for a long time too.

    2. Re:OpenApple by JohnsonJohnson · · Score: 2, Informative

      google is your friend.

  4. Re:Gladly! by QuietRiot · · Score: 2

    Funny... I suppose you think you're doing people a service.

  5. LinuxPPC AltiVec support? by "Zow" · · Score: 2

    Mostly out of curiosity (as I don't have a G4 on my desk anymore - it died), what does anyone know about the status of AltiVec support under LinuxPPC (as opposed to OSX, as discussed in the article)? A quick Google search indicates that Motorola made some patches for gcc a couple years ago, but that it wasn't exactly production quality.

    There's a website that supposedly has tools, but you have to register for their mailing list to see what they've got (and I get enough mail as it is).

    -"Zow"

  6. Check out the Ars Technica article by plsuh · · Score: 2, Informative

    Ars Technica did an article comparing the AltiVec and SSE/MMX2/3DNow! architectures. Written a while back, but still valid as the architectures have not changed.

    --Paul

  7. Re:implementation-specific coding by statusbar · · Score: 2

    Your guesses are correct.

    Each AltiVec register is 128 bits.

    You can use them as four 32-bit integers, four 32-bit floats, eight 16-bit integers, or sixteen 8-bit integers.
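
    As a rough picture of those four views, here is a portable C sketch, not actual AltiVec code. The `vreg128` union and the `vreg128_lanes` helper are hypothetical names, and the sketch assumes a 32-bit `float`.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register: the same 16 bytes
 * viewed as lanes of different widths. Assumes float is 32 bits wide. */
typedef union {
    uint32_t u32[4];   /* 4 x 32-bit integers  */
    float    f32[4];   /* 4 x 32-bit floats    */
    uint16_t u16[8];   /* 8 x 16-bit integers  */
    uint8_t  u8[16];   /* 16 x 8-bit integers  */
} vreg128;

/* Compile-time check that every view really covers 128 bits. */
_Static_assert(sizeof(vreg128) == 16, "vreg128 must be 128 bits wide");

/* Number of lanes for a given lane width in bits (8, 16 or 32). */
static unsigned vreg128_lanes(unsigned lane_bits)
{
    return 128u / lane_bits;
}
```

    Real AltiVec code would use the compiler's `vector` types (for example `vector float`) instead of a union; the sketch only shows how one 128-bit value splits into lanes.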

    There is a lot of information on altivec.org

    Jeff

    --
    ipv6 is my vpn