Slashdot Mirror


Introducing the PowerPC SIMD unit

An anonymous reader writes "AltiVec? Velocity Engine? VMX? If you've only been casually following PowerPC development, you might be confused by the various guises of this vector processing SIMD technology. This article covers the basics on what AltiVec is, what it does -- and how it stacks up against its competition."

83 comments

  1. The article makes a very good point... by tabkey12 · · Score: 4, Interesting

    This highlights one of the real advantages that AltiVec has over the various SIMD instruction sets available for x86 processors: its comparative stability. Every AltiVec processor since the original G4 has had the same essential functionality, the same large register pool that isn't shared with anything, and a reasonably complete set of likely operations. This has made it easier for support to become widespread: a program designed to take advantage of the original G4 will still get a noticeable performance improvement on today's G5. x86 SIMD was frankly botched - MMX was a very odd idea, and, though SSE & SSE2 have partially fixed the problem, the fact that SSE optimised code usually runs slower on an Athlon than 'unoptimised' code has severely limited its applications.

    1. Re:The article makes a very good point... by Anonymous Coward · · Score: 1, Interesting

      the fact that SSE optimised code usually runs slower on an Athlon than 'unoptimised' code has severely limited its applications.

      Oh? What's your source for that? Is that a problem with the processor or with the optimiser?

    2. Re:The article makes a very good point... by Anonymous Coward · · Score: 1, Funny

      You mean you read the article? That is .... strange.

    3. Re:The article makes a very good point... by adam31 · · Score: 4, Informative
      the fact that SSE optimised code usually runs slower on an Athlon than 'unoptimised' code has severely limited its applications.

      What does this even mean? I've written a great deal of optimized SSE code, and I can promise you that it works just as well on AMD. In fact, if you look at Athlon's pipeline, it does some really amazing things rescheduling and executing operations out-of-order. Fiddling around with ordering individual instructions is basically pointless because the scheduler has gotten so good at doing it on-the-fly.

      Can you cite a specific example? I've never run into this.

    4. Re:The article makes a very good point... by Anonymous Coward · · Score: 0

      What does this even mean?

      It means he is an Apple fanboy, nuff said...

    5. Re:The article makes a very good point... by Anonymous Coward · · Score: 3, Interesting

      Have you done any PPC programming? There are some big gotchas when optimizing for the G5, and I usually have to write things twice, because typical G4 AltiVec optimizations often run quite a bit slower on the G5. This is particularly true of the ones where we use the stream load/save hint instructions to help the processor schedule its memory accesses. Apple's site has a page dedicated to the differences between the G4 and G5 when it comes to optimizing. By contrast, I never ran into such gotchas when moving SSE code from the PIII to other processors. And I do a LOT of SIMD programming, in PPC and x86 asm, for pro audio products.

    6. Re:The article makes a very good point... by Anonymous Coward · · Score: 0

      MMX was a very odd idea

      MMX was an extremely good idea. Its final incarnation was limited, probably due to a mixture of influences, including cost (the P55C was developed in 1995-1996) and project deadlines. I don't recall any PowerMacs at the beginning of 1997 that had AltiVec, or any native SIMD capabilities.
      Even though MMX is limited it has been used to speed up a number of applications including scaling and blitting.

      I remember putting together my first Pentium MMX 200 system with 32MB of RAM, and it wasn't even in the same price ballpark as a lower-end 7600 (this was back when I was in school and used PowerMacs exclusively in one of our research facilities... I am not a Mac basher, just a realist).

    7. Re:The article makes a very good point... by drinkypoo · · Score: 2, Insightful

      It's a bunch of hooha. Ignore it. AltiVec has added new instructions; MMX and SSE have added new instructions and registers over time. Clearly the Intel-driven stuff has changed more than AltiVec. Of course, it's been around a couple of years longer, too: the Pentium MMX (P55C) came out in 1997, and the G4 in 1999. That's not really an excuse, but it's useful information anyway. Because AltiVec has changed less, you don't need to modify old code to make it run faster on newer processors with more powerful SIMD engines. With the Intel stuff, if you want the most out of the processor, you have had to optimize for MMX, then SSE, then SSE2... Now, if you wrote SSE2 code and tried to execute it on an Athlon XP, it wouldn't work at all; maybe that's what they mean :)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    8. Re:The article makes a very good point... by dusanv · · Score: 1

      The gain from using SSE on Athlons is smaller because they have decent ALUs and FPUs to begin with; they are tough to beat. Switching code to SSE on the P4 gives much more dramatic results because it's a POS CPU. He could just mean that...

    9. Re:The article makes a very good point... by Bert64 · · Score: 2, Interesting

      Which also means that well-optimized floating-point code (which is more likely, since compilers have had many years to improve their floating-point code generation) would run faster on an Athlon than poorly written SSE code, whereas the P4 would run the SSE code faster simply because it runs FPU code so poorly.
      I'm sure well-written SSE code would run faster on both platforms, at least in cases where vectorization makes sense. Intel implemented the weak FPU in the P4 to try to steer users onto SSE code even when it's not really appropriate.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    10. Re:The article makes a very good point... by Bert64 · · Score: 1

      It's interesting to see how the 1.67 GHz G4 chip holds its own against the G5 chips for cracking RC5, according to http://n0cgi.distributed.net/speed/query.php?cputype=all&arch=2&contest=rc572
      In fact, the 1.67 GHz G4 beats the 2 GHz G5 and isn't far behind the 2.5 GHz G5.

    11. Re:The article makes a very good point... by turgid · · Score: 1
      Intel implemented the weak fpu unit in the p4 to try and steer users onto sse code even when it's not really appropriate.

      SSE has scalar instructions too. Since the SSE registers are in a flat register file (cf. the stack in the legacy FPU descended from the 287/387), it's actually easier for a compiler to generate efficient code for SSE. SSE only does single precision, though, so for double precision you need SSE2/SSE3. Incidentally, if you look at the Intel manuals you will see that SSE3 is only a minor extension to SSE2, adding a few more data-shuffling instructions which save a few operations here and there.

    12. Re:The article makes a very good point... by Bert64 · · Score: 1

      Easier if you're starting from scratch, perhaps, but experience counts for a lot too, and people currently have far more experience optimizing for x86-style floating-point units. That said, it may not take as long to bring compilers up to the same level with SSE as it did with x87.

  2. AltiVec? Velocity Engine? VMX? by Zebadias · · Score: 0, Redundant

    It's vector processing...

    1. Re:AltiVec? Velocity Engine? VMX? by Anonymous Coward · · Score: 0

      It's vector processing...

      I'm sorry, did you have a point? Or should I mod you -1 Redundant? Ta.

  3. Altivec and OS X by siliconwafer · · Score: 4, Insightful

    I'd like to know if Mac OS X uses the Altivec instructions to their full potential. For example, the article mentions that a heavily loaded server can benefit greatly from Altivec if the TCP checksum algorithm uses it. Does the OS X TCP stack do this?

    1. Re:Altivec and OS X by Anonymous Coward · · Score: 1, Insightful

      Does the OS X TCP stack do this?

      Best guess no - you have to weigh up the immediate gains against the overhead of saving/reloading the SIMD registers when you enter/leave the TCP stack. Usually OS kernels avoid SIMD/FPU for this reason.

      (Uh, why was that modded troll?)

    2. Re:Altivec and OS X by crow · · Score: 3, Interesting

      But if you only need one or two of the registers for TCP, it might be a win to save and restore just those one or two registers (and presumably a status register). And if you only need them in the TCP stack, you could do the save only when the kernel calls those functions (which admittedly gets complicated if the kernel is preemptible).

    3. Re:Altivec and OS X by ip_fired · · Score: 5, Informative
      I googled around and found this article on Macworld:
      According to several developers Macworld talked to who are currently working on OS X applications, anytime the OS can take advantage of the AltiVec engine, it does. This ensures that the parts of the OS that can utilize AltiVec, such as working in the new user interface, experience a significant increase in performance.

      I don't know how much of OS X has AltiVec code, but there are many other Apple apps that use it. iTunes uses it for encoding music. I'm sure the video codecs in QuickTime use it as well.

      The Mac has a really nice optimization tool called Shark, which will help you find things that can be moved onto the AltiVec unit (it also helps with general optimization).
      --
      Don't count your messages before they ACK.
    4. Re:Altivec and OS X by mrseigen · · Score: 1

      If you go to Apple's Xcode page, Xcode 2.0 apparently "vectorizes" applications automatically to take advantage of this. So if not now, probably in the future (unless they're afraid of their newest tools, like Microsoft's internal use of Win2K).

    5. Re:Altivec and OS X by Chuckstar · · Score: 4, Informative

      Most (all?) Apple hardware does the checksum in hardware (built into the NIC). Add to that the inefficiency of using Altivec in the kernel, especially for small data sets, and it did not make sense for Apple to develop an Altivec version of the TCP checksum code.

      The reason the article mentions the checksum case is not because Apple is missing the boat, but because there was a nice research article written about writing optimized TCP checksum code for Altivec, providing a good set of example code for aspiring Altivec coders.
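      Why the checksum vectorizes at all: ones' complement addition is associative and commutative, so partial sums can be accumulated in independent lanes and combined once at the end. The sketch below is not the code from that research article. It is portable C that emulates four vector lanes with a plain array (an assumption made so it runs on any CPU), next to a scalar reference following RFC 1071.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator to 16 bits with end-around carry,
   then take the ones' complement (RFC 1071). */
static uint16_t fold_and_complement(uint64_t sum) {
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Scalar reference: add big-endian 16-bit words one at a time. */
uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint64_t sum = 0;
    size_t i = 0;
    for (; i + 1 < len; i += 2)
        sum += (uint64_t)((data[i] << 8) | data[i + 1]);
    if (i < len)                        /* odd trailing byte, zero-padded */
        sum += (uint64_t)data[i] << 8;
    return fold_and_complement(sum);
}

#define LANES 4  /* emulated lanes; a vector unit would do these adds in parallel */

/* Lane-parallel version: each lane keeps its own partial sum. */
uint16_t inet_checksum_lanes(const uint8_t *data, size_t len) {
    uint64_t lane[LANES] = {0};
    size_t i = 0;
    for (; i + 2 * LANES <= len; i += 2 * LANES)
        for (int k = 0; k < LANES; k++)
            lane[k] += (uint64_t)((data[i + 2 * k] << 8) | data[i + 2 * k + 1]);

    uint64_t sum = 0;
    for (int k = 0; k < LANES; k++)     /* horizontal add of the lanes */
        sum += lane[k];
    for (; i + 1 < len; i += 2)         /* leftover whole words */
        sum += (uint64_t)((data[i] << 8) | data[i + 1]);
    if (i < len)                        /* odd trailing byte */
        sum += (uint64_t)data[i] << 8;
    return fold_and_complement(sum);
}
```

      Both functions return the same value for any input; the expensive per-word work sits in the inner lane loop, while the fold runs only once per packet.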

    6. Re:Altivec and OS X by Matthias+Wiesmann · · Score: 2, Interesting
      Actually, I read about an optimisation on some Darwin mailing list where the AltiVec registers are not saved immediately on a context switch. Instead, the vector unit is configured so that using those registers causes an exception, and the service routine does the register saving then. The idea is that you are often preempted by a task that does not use the AltiVec registers, so saving and restoring them would be a waste.

      I can't remember if this was implemented in the end...
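      A toy model of that lazy-save idea, assuming a single CPU. The `vector_unavailable_trap()` function is a hypothetical stand-in for the hardware exception; this is a simulation in portable C, not Darwin kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NVREGS 32

typedef struct { unsigned char bytes[16]; } vreg;   /* one 128-bit register */
typedef struct { vreg saved[NVREGS]; } task;        /* per-task save area */

/* Simulated CPU state (hypothetical model, single CPU). */
static vreg  hw_regs[NVREGS];      /* live vector register file */
static task *vec_owner   = NULL;   /* task whose values are in hw_regs */
static bool  vec_enabled = false;  /* models the "vector unit enabled" bit */
static int   save_count  = 0;      /* how many full saves actually happened */

/* Context switch: copy no vector state, just disable the unit. */
static void context_switch(void) {
    vec_enabled = false;
}

/* Stand-in for the exception taken on the first vector use after a switch. */
static void vector_unavailable_trap(task *current) {
    if (vec_owner != current) {
        if (vec_owner != NULL) {                         /* save previous owner */
            memcpy(vec_owner->saved, hw_regs, sizeof hw_regs);
            save_count++;
        }
        memcpy(hw_regs, current->saved, sizeof hw_regs); /* load this task's state */
        vec_owner = current;
    }
    /* If the owner comes back, nothing is copied at all. */
    vec_enabled = true;
}

/* Every vector operation is gated by the enable bit. */
static void use_vector_unit(task *current) {
    if (!vec_enabled)
        vector_unavailable_trap(current);
    /* ... vector instructions would execute here ... */
}
```

      In the common case the comment describes (a task that never touches the vector unit runs in between), `save_count` stays at zero and the register file is never copied.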

    7. Re:Altivec and OS X by SewersOfRivendell · · Score: 2, Interesting

      Remember that we're talking about context switch overhead, not function call overhead. It doesn't matter whether you use three or thirty vector registers; the kernel still has to take the performance hit of figuring out which registers are in use. The compiler will generate instructions to set the appropriate bits in the VRSAVE register, but the OS still needs to check each bit in VRSAVE so that it knows which registers to save/restore.
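      The per-bit check can be sketched like this. It assumes PowerPC bit numbering, where bit 0 of the 32-bit VRSAVE register is the most significant bit and marks v0 as live; the function name and output array are illustrative, not taken from any kernel.

```c
#include <stdint.h>

/* Collect the vector registers marked live in a VRSAVE-style mask.
   Assumes big-endian bit numbering: mask bit 0 (the MSB) is v0,
   bit 31 (the LSB) is v31. Returns how many registers would need saving. */
int live_vector_regs(uint32_t vrsave, int out[32]) {
    int n = 0;
    for (int r = 0; r < 32; r++) {
        if (vrsave & (UINT32_C(0x80000000) >> r))   /* one bit per register */
            out[n++] = r;
    }
    return n;
}
```

      Even with only two bits set, this loop visits all 32 positions, which is the fixed cost the comment refers to; a real implementation could use a count-leading-zeros instruction to skip runs of clear bits.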

    8. Re:Altivec and OS X by cyngus · · Score: 1

      Anecdotally, I can tell you that OS X runs much better on a G4 or better than on a similarly clocked G3. For example, I have a 500 MHz iBook G3 and a first-generation 400 MHz G4. The G4 runs noticeably faster. My experience before OS X was that G3s and G4s performed pretty much on par. It wasn't until the release of OS X that Apple really started putting AltiVec optimizations into the OS.

    9. Re:Altivec and OS X by dbrutus · · Score: 2, Interesting

      Since the OS X TCP/IP stack is likely fully available in Darwin, why don't you go look and let us administrator types know?

    10. Re:Altivec and OS X by bill_mcgonigle · · Score: 4, Informative

      I'd like to know if Mac OS X uses the Altivec instructions to their full potential.

      No, at this point too much needs hand tuning for everything to fully utilize the potential of Altivec. Most serious DSP-class apps spend the effort to do this in critical code, but there's plenty of compiled code running in OSX that doesn't benefit from the parallel vectorization that the Altivec unit can offer.

      This is all about to change with GCC 4, which offers an SSA-based tree optimizer. SSA form is particularly useful for automatic vectorization of code. I'm not sure how efficient the first release will be, but it looks like good things are coming.
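      For a sense of what an auto-vectorizer looks for, here is a loop of the simple, countable, non-aliasing form such optimizers target. It is an illustrative sketch: whether a given GCC release actually vectorizes it depends on the target and flags (GCC 4.x used `-ftree-vectorize`, plus `-maltivec` on PowerPC).

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. The restrict qualifiers promise the compiler
   that x and y do not overlap, so it can process several elements per
   vector instruction without having to prove or check that at run time. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

      Without `restrict`, the compiler has to allow for `y` overlapping `x`, which is one of the most common reasons a loop like this stays scalar.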

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    11. Re:Altivec and OS X by MyDixieWrecked · · Score: 1

      OSX most definitely uses Altivec QUITE extensively.

      I switched from a 450 MHz G3 to a 450 MHz G4 a couple of years ago, and there's a HUGE performance difference in OSX's boot and response time.

      Of course, the GUI runs a bit faster on the G4, too, but that could be because of the AGP video card.

      --
      ...spike
      Ewwwwww, coconut...
    12. Re:Altivec and OS X by psavo · · Score: 1

      Also, AFAIK most kernel code tries hard not to use any math/vector coprocessor. In Linux, the RAID code is supposedly MMX/SSE-accelerated and takes care not to botch anything, but most other subsystems aren't.
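      RAID-5 parity is a natural fit for wide registers because it is just XOR across blocks. Below is a portable sketch of the idea using 64-bit words where the kernel would use MMX/SSE registers; it is not the Linux implementation, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* parity = XOR of all data blocks, one 64-bit word at a time.
   Assumes len is a multiple of 8 bytes. */
void xor_parity(uint8_t *parity, const uint8_t *const blocks[],
                size_t nblocks, size_t len) {
    for (size_t off = 0; off < len; off += 8) {
        uint64_t acc = 0;
        for (size_t b = 0; b < nblocks; b++) {
            uint64_t w;
            memcpy(&w, blocks[b] + off, sizeof w);   /* alignment-safe load */
            acc ^= w;
        }
        memcpy(parity + off, &acc, sizeof acc);
    }
}
```

      The same routine rebuilds a lost block: XOR the parity with the surviving data blocks.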

      --
      fucktard is a tenderhearted description
    13. Re:Altivec and OS X by edp · · Score: 2, Informative
      "No, at this point too much needs hand tuning for everything to fully utilize the potential of Altivec. Most serious DSP-class apps spend the effort to do this in critical code, but..."

      Of course it is rarely true that AltiVec instructions are used to their "full potential" in the sense you can usually find another CPU cycle to eliminate, but neither is it necessary to use hand tuning to get big boosts from AltiVec. We do the hand tuning for you (in C with AltiVec extensions or in assembly language) and provide optimized libraries, such as BLAS, vDSP, vImage, and others bundled into the Accelerate framework. (As another participant notes, some library interfaces and functions are not original to Apple, but Apple provides optimization.) The libraries are used in a variety of places throughout Apple's software, even inside the kernel, and are available to external developers.

      It does take a certain class of algorithms to get a lot of use from the libraries, but some of the routines are of general benefit. You would expect image processing, music encoding and decoding, and mathematical algorithms to run faster with AltiVec, but even simple things like copying memory are significantly faster when done with AltiVec instructions.

    14. Re:Altivec and OS X by bill_mcgonigle · · Score: 1

      even simple things like copying memory are significantly faster when done with AltiVec instructions.

      Do I get these with a generic compile of an XCode project (curious)?

    15. Re:Altivec and OS X by edp · · Score: 1
      "Do I get these with a generic compile of an XCode project (curious)?"

      Yes, in things like memcpy, you will get AltiVec instructions with just default switches. You could single-step through memcpy (actually a subroutine named __bigcopy) in the debugger and see the instructions.

      The compiler isn't going to automatically recognize you're doing an FFT routine and call an optimized routine instead of using your code. So, to use the optimized signal processing routines, you would add a reference to the Accelerate framework to your project and call the routines explicitly. There is work on having the compiler recognize more and more situations where it can use AltiVec code or prepared routines.

    16. Re:Altivec and OS X by bill_mcgonigle · · Score: 1

      Yes, in things like memcpy, you will get AltiVec instructions with just default switches.

      Excellent. Kudos to your team.

  4. AltiVec is nice... by Grand+V'izer · · Score: 5, Informative

    I've done some AltiVec programming in the past and found it a very effective use of my time. Since there's no mode-switching penalty for using the vector instructions, you can use them for some very trivial-but-common tasks, like replacing strlen(), vector operations on small tables, etc. I knocked 25% off the computation time of one of my projects just by vectorizing three functions. Of course there's a hitch: vector processing only works for certain kinds of algorithms and requires a change in mindset. In spite of that, it's a great tool to have in your box.

    --
    Not all random numbers are created equally.
    1. Re:AltiVec is nice... by evn · · Score: 5, Informative

      The other nice thing about AltiVec on OS X is that Apple has done a fairly good job of making it accessible without forcing the programmer to learn and use assembly language. Apple has included a number of AltiVec-optimized libraries that are ready to go "out of the box" with Xcode, and they automatically fall back to a scalar code path when running on a G3, which saves you a fair bit of work too. They include:

      • vImage: for image processing
      • vDSP: for signal processing
      • BLAS: the name says it all: "Basic Linear Algebra"
      • LAPACK: for solving systems of equations and matrix factorization
      • Vector Math Library: offloads common operations like square root, transcendental functions, division, etc. to VMX
      • vBasicOps: for simple algebra operations like integer addition, subtraction, etc.
      • VBN: for dealing with 256- to 1024-bit numbers easily

      Apple has documentation and source code for the libraries on its Developer Connection website. What good are vector units if nobody can make use of them? I can't wait for Apple to put the GPU's image processing abilities into my hands with Core Image/Core Video.
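      The "fall back to a scalar code path on a G3" behaviour is a runtime-dispatch pattern: probe the CPU once and pick a function pointer. This sketch assumes a hypothetical `cpu_has_vector_unit()` probe (on Mac OS X, the `hw.optional.altivec` sysctl is one way to answer that question), and its "vector" path is plain C unrolled by four, standing in for real AltiVec code.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*scale_fn)(float *v, size_t n, float k);

/* Plain scalar path, safe on any CPU (e.g. a G3). */
static void scale_scalar(float *v, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}

/* Stand-in for a vector path: four floats per step (one 128-bit
   register's worth), then a scalar tail. Real code would use vector ops. */
static void scale_vector4(float *v, size_t n, float k) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v[i] *= k; v[i + 1] *= k; v[i + 2] *= k; v[i + 3] *= k;
    }
    for (; i < n; i++)
        v[i] *= k;
}

/* Hypothetical probe; replace with a real feature check for your platform. */
static bool cpu_has_vector_unit(void) {
    return false;
}

static scale_fn pick_scale(bool has_vector) {
    return has_vector ? scale_vector4 : scale_scalar;
}

/* Chosen once at startup; callers just call scale(). */
static scale_fn scale = NULL;

static void init_dispatch(void) {
    scale = pick_scale(cpu_has_vector_unit());
}
```

      Callers never see which path was chosen, which is what lets a library ship one binary that runs on both G3s and G4s.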

    2. Re:AltiVec is nice... by swamp+boy · · Score: 1

      How is it helpful for replacing strlen()? Do you load up the string in a vector register and have it compare each byte to \0? Could you provide an example of how this is done? Thanks!

    3. Re:AltiVec is nice... by Lars+T. · · Score: 2, Informative

      A little Googling gives us a) the Freescale (née Motorola) AltiVec Libraries (login required), which include (among others) strlen, b) this code fragment, and c) a more general description.
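      To answer the strlen() question in portable terms: the trick is to test many bytes for zero at once, and only fall back to byte-by-byte work in the block that contains the terminator. AltiVec would compare 16 bytes per instruction (for example with `vec_cmpeq`). The sketch below swaps in the SIMD-within-a-register (SWAR) form of the same idea on a 64-bit integer so it runs on any CPU. It assumes the string starts 8-byte aligned and that reading to the end of the terminator's 8-byte word is safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero if and only if at least one of the 8 bytes in v is 0x00. */
static uint64_t has_zero_byte(uint64_t v) {
    return (v - ONES) & ~v & HIGHS;
}

/* Assumption: s is 8-byte aligned and readable up to the end of the
   8-byte word that contains the terminating NUL. */
size_t swar_strlen(const char *s) {
    size_t i = 0;
    for (;;) {
        uint64_t word;
        memcpy(&word, s + i, sizeof word);  /* one 8-byte load */
        if (has_zero_byte(word)) {
            while (s[i] != '\0')            /* locate the exact byte */
                i++;
            return i;
        }
        i += sizeof word;
    }
}
```

      `has_zero_byte` only reports a hit when some byte really is zero, so the byte loop runs once, over at most eight bytes.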

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    4. Re:AltiVec is nice... by Anonymous Coward · · Score: 0

      BLAS: the name says it all: "Basic Linear Algebra"
      LAPAC: for solving systems of equations and matrix factorization


      You make it sound like Apple invented all of these libraries. This isn't the case with these two (I already knew this), and I didn't bother looking up the others.

      BLAS has been around since before 1980 (probably), and LAPACK is relatively new, 15 or so years old. Both of these libraries have the interesting property that they will likely always be in use. In four hundred years, your AppleScripts and VB files won't exist, but there'll be folks arguing about the latest version of BLAS and LAPACK and whether it's as good/fast as the original.

  5. One of my favorite ArsTechnica articles by rsborg · · Score: 2, Informative

    is here. They talk about AltiVec on page 3. IIRC, it's the best-designed mass-market SIMD implementation out there.

    --
    Make sure everyone's vote counts: Verified Voting
    1. Re:One of my favorite ArsTechnica articles by adam31 · · Score: 3, Interesting
      Why does no one ever talk about Sony's VU assembly in their SIMD comparisons? The parent's linked article even cites the PS2 in its very first sentence, but then ignores it completely!

      The VUs have the sweetest SIMD instruction set I've seen. 32 registers (like altivec), but you can do component swizzling within an instruction, it has MADD and also a sweet Accumulate register that can be re-written to on successive cycles (throughput is worse if you accumulate results in a normal vector register, like you have to on all other SIMDs). So you can do a 4x4 matrix/vector multiply in just 4 instructions!
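      To make the "4x4 matrix/vector multiply in 4 instructions" point concrete, here is a portable C emulation. Each `madd_bcast` call corresponds roughly to one VU instruction: multiply a matrix column by one broadcast (swizzled) component of the vector and add into the accumulator. The types and helper are illustrative; the exact VU mnemonics are not reproduced.

```c
/* One 128-bit VU register: four floats (x, y, z, w). */
typedef struct { float c[4]; } vec4;

/* Column-major 4x4 matrix: col[j] is column j. */
typedef struct { vec4 col[4]; } mat4;

/* acc + a * b.c[lane] in all four lanes: a multiply-add whose second
   operand is one broadcast (swizzled) component. */
static vec4 madd_bcast(vec4 acc, vec4 a, vec4 b, int lane) {
    for (int i = 0; i < 4; i++)
        acc.c[i] += a.c[i] * b.c[lane];
    return acc;
}

/* M * v as four multiply-add steps, reusing acc the way the VU
   accumulator register is rewritten on successive cycles. */
static vec4 mat4_mul_vec4(const mat4 *m, vec4 v) {
    vec4 acc = {{0.0f, 0.0f, 0.0f, 0.0f}};
    acc = madd_bcast(acc, m->col[0], v, 0);  /* + column 0 * v.x */
    acc = madd_bcast(acc, m->col[1], v, 1);  /* + column 1 * v.y */
    acc = madd_bcast(acc, m->col[2], v, 2);  /* + column 2 * v.z */
    acc = madd_bcast(acc, m->col[3], v, 3);  /* + column 3 * v.w */
    return acc;
}
```

      Without in-instruction swizzling, each broadcast of `v.x`, `v.y`, and so on would typically need its own splat instruction before the multiply-add, which is where the extra instruction count on other SIMD sets comes from.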

      The big problem was that you didn't get any of the nice instruction scheduling/re-ordering that you get on PPC or x86 platforms, so the onus was on the programmer to NOP through latency issues (huge pain!)... They finally came out with the VCL that would process chunks of VU assembly and reschedule everything at compile time.

      The really sad thing is that Sony/IBM/Toshiba opted for AltiVec in the Cell. I guess it probably has better tools and IBM is highly leveraged into VMX, but VU was very, very clever considering that it pre-dates all these other SIMD instruction sets.

    2. Re:One of my favorite ArsTechnica articles by be-fan · · Score: 3, Informative

      AltiVec will only be used in the PowerPC core of the Cell. The vector coprocessors (the SPEs), will use some other instruction set.

      --
      A deep unwavering belief is a sure sign you're missing something...
    3. Re:One of my favorite ArsTechnica articles by Flaming+Death · · Score: 0

      Totally agree - the VUs were way ahead of their time. I remember finding so many amazing instructions in them that you could reduce entire algorithms to a few simple VU calls. The NOP padding was a total pain, but the fun was trying to use up those slots with integer ops :-)

      Like everything, people get on the bandwagon about gear that is better marketed, rather than necessarily completely better.

      Not sure about the AltiVec in the Cell. It does have some 'similar' componentry to the PowerPC, but it's a complete remake from scratch, so it will be interesting to see what the instruction sets are like. I would bet on very high compatibility with PS2-like instruction sets (i.e. MIPS- and VU-based).

  6. simdtech.org by kuwan · · Score: 4, Informative

    If anyone is interested, simdtech.org is probably the best resource you can find for AltiVec (or any other SIMD) programming. They have a number of tutorials and technical resources and the mailing list is the best there is. Motorola, Apple, and IBM engineers frequent the list so you can get help and information directly from the guys that created AltiVec as well as from those who program for it.

    --
    Join the Pyramid - Free Mini Mac

  7. API matters by dugenou · · Score: 4, Insightful
    /. got caught by buzzwords again with a shallow article.

    Anyway, what we need is not an autovec compiler, but instead a library with most CPU hungry algorithms well implemented with SIMD extensions.

    What about an open library, cross-platform, multimedia oriented, along the line of SUN's mediaLib ? Would SUN allow the re-use of their API ?

    I'm looking for such a library, with GPL/LGPL compatible license. The API has to be in C, to maximise audience. For many projects, C++ is not an option.

    The primary use will be DSP work in the GNU Radio project, but multimedia extensions could prove useful anywhere, from GUIs to audio/video apps.

    I would welcome any pointers to such an existing API/project, or I'm ready to start a new one if other people are interested.

    See also this previous story for cheap recycled comments.

    --
    Love salty crackers? catchy electronica? Try !
    1. Re:API matters by Kluge66 · · Score: 3, Informative
      I think you're looking for liboil: "Liboil is a library of simple functions that are optimized for various CPUs. These functions are generally loops implementing simple algorithms, such as converting an array of N integers to floating-point numbers or multiplying and summing an array of N numbers. Such functions are candidates for significant optimization using various techniques, especially by using extended instructions provided by modern CPUs (Altivec, MMX, SSE, etc.)." http://www.schleef.org/liboil/. The site seems to be down at the moment, but it's also listed on Freshmeat.

      I believe the GStreamer people are looking into using liboil. The license is two-clause BSD.
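      The kind of function liboil describes is small enough to show. Below are portable reference versions of the two examples from that description; the names are hypothetical and are not liboil's actual API. A library like this would pair such reference loops with CPU-specific (AltiVec, MMX, SSE) variants.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n 16-bit integers to floats. */
void convert_s16_to_f32(float *dest, const int16_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dest[i] = (float)src[i];
}

/* Multiply two arrays element-wise and sum the products (a dot product). */
float multsum_f32(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

      A SIMD variant of `multsum_f32` would keep several partial sums in vector lanes, which can change the floating-point rounding slightly compared with this sequential loop.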

  8. tradeoffs by idlake · · Score: 1, Flamebait

    Choosing something like AltiVec involves a bunch of trade-offs:

    -- How much work do I need to do in order to take advantage of it? Some BLAS implementations may support it and some Fortran 95 compilers may generate code for it for some primitives, but other than that, it's a lot of manual work to tune code for it. (My own experience with using the AltiVec instructions can only be described as "painful", among other things because the C interface to them is poorly defined and causes name conflicts.)

    -- What range of hardware can I choose from? Well, there is mainly one Apple rack-mount that runs OS X, a bunch of big Apple desktops in fancy cases, and a bunch of expensive IBM workstations. That's pretty limited.

    -- What's the bang for the buck? There are actually two parts to this: what's the bang for the buck for code not specifically hacked to take advantage of AltiVec, and what's the bang for the buck for code specifically hacked for AltiVec. For code not specifically tuned for AltiVec, the bang for the buck is not so great with either Apple or IBM. For the rest, it may be reasonable.

    Considering these issues, I continue to find AltiVec pretty unpersuasive. I think AltiVec won't take off until Intel's and AMD's SIMD instructions are equally good; until then, there is simply not enough incentive for software writers to consistently incorporate support for it. And frankly, we first need a market for commodity Linux PowerPC boxes before it really gets interesting. I wouldn't hold my breath.

    1. Re:tradeoffs by b1t+r0t · · Score: 2, Informative
      a bunch of big Apple desktops in fancy cases, and a bunch of expensive IBM workstations.

      Don't forget the small Apple desktop in a fancy case.

      --
      "Open source is good." - Steve Jobs
      "Open source is evil." - Microsoft
    2. Re:tradeoffs by Anonymous Coward · · Score: 0

      I wonder what the bang for the buck for those machines is. A cluster of Minis might be cheaper than an equivalently fast cluster of Xserves. Installing Yellow Dog on all of them might be a pain, though.

    3. Re:tradeoffs by Anonymous Coward · · Score: 0

      Wow, look at those moderation points. The Apple fanboi modsquad is apparently out in full force again tonight. Great going, guys, you tell us all what kind of people use Macintosh.

  9. Does it matter? by leereyno · · Score: 2, Insightful

    I don't know of anyone who makes an open-standards-based system using the PowerPC architecture. IBM did release a reference design for a PPC-based motherboard, but as far as I know no one ever produced it.

    Unless and until I can go down to Fry's and buy a motherboard based off of this chip and put it into a standard case, it really doesn't matter if the CPU is better or not. It is the system as a whole that matters, not the relative performance of one of its components. I'm not going to paint myself into a corner with a proprietary system from anyone, let alone Apple.

    Lee

    --
    Muslim community leaders warn of backlash from tomorrow morning's terrorist attack.
    1. Re:Does it matter? by Anonymous Coward · · Score: 1, Interesting

      You can just buy a $499 mini and get done with it.

    2. Re:Does it matter? by leereyno · · Score: 0, Flamebait

      If I wanted a slow laptop, I'd at least buy one that comes with a screen and a keyboard.

      Lee

    3. Re:Does it matter? by podperson · · Score: 4, Insightful

      I don't know of anyone who makes an open-standards-based system using the PowerPC architecture. IBM did release a reference design for a PPC-based motherboard, but as far as I know no one ever produced it.

      CHRP, the PowerPC Common Hardware Reference Platform, is what you're looking for, and it's been around since before there were Apple PowerPCs. AFAIK most, if not all, of the PowerPC-based workstations shipped by IBM, the BeBox, various third-party PowerPCs such as those from Power Computing, and many of Apple's machines (even today) are either compliant or as-close-to-compliant-as-makes-sense with this standard or its evolutions (so much so that some fanatics were able to get Rhapsody/OS X running on AIX PowerPC workstations).

      CHRP Links

      I'm not going to paint myself into a corner with a proprietary system from anyone, let alone Apple.

      Until I can make the computer from sand, copper ore, and crude oil using recipes downloaded from the internet (i.e. "The Diamond Age"), I don't see the useful distinction between being able to build a computer out of proprietary chips from one of, count them, two CPU manufacturers, a video card from one of, count them, two graphics card manufacturers, etc. and simply buying a computer that works.

    4. Re:Does it matter? by Donny+Smith · · Score: 1

      >> motherboard based off of this chip and put it into a standard case,

      > You can just buy a $499 mini and get done with it.

      uhm, did you notice the word **motherboard**?
      and how about a **standard** case?

      the grandparent is right - it's open but not actually.

      buy a BladeCenter loaded with IBM's PowerPC blades now, and pray to dear god that anyone other than IBM and its partners will quote you a maintenance agreement in 2006.
      "makes you think twice who you invite to your house"

    5. Re:Does it matter? by Anonymous Coward · · Score: 0
      Then buy an iBook. They have a G4, and start under $1000.

      Or go buy one of the many non-Apple PowerPC boards that you no doubt came across during the Google search you performed before posting.

      Wait, you didn't search before opening your mouth? oops!

    6. Re:Does it matter? by MetaPhyzx · · Score: 1
      Unless and until I can go down to Fry's and buy a motherboard based off of this chip and put it into a standard case, it really doesn't matter if the CPU is better or not. It is the system as a whole that matters, not the relative performance of one of its components. I'm not going to paint myself into a corner with a proprietary system from anyone, let alone Apple.


      that's a slightly oxymoronic way of expressing it, isn't it? There are descendants of CHRP that exist, such as the Power Mac. Last I checked, it is a "whole system".

      You may not be able to go down to Fry's (we don't even have Fry's here in this part of the Midwest) and get a G4 motherboard, but you sure can pick one up online. You can also get up to a 1.4 GHz G4 online. I'm more than sure you can get it into a standard case, too. Play around with it a bit.

      No one is saying you have to run OS X on your G4 motherboard and processor; Linux PPC, NetBSD and FreeBSD, Darwin and others are available.

      It may not be at the price point you want to pay, but the option is there.
      --
      Blacker than my baby girl's stare. Black like the veil that the muslimina wear. Black like the planet that they fear...
    7. Re:Does it matter? by Anonymous Coward · · Score: 0

      CHRP? Oh, what an Apple bore... and boor.

    8. Re:Does it matter? by Anonymous Coward · · Score: 1, Informative

      the new 'Amiga' is basically a PowerPC reference board. they make an ATX and a Mini-ITX version of the board. it will run the new AmigaOS / MorphOS / all (?) the various PPC Linuxes (Linii?).

      http://www.walibe.com/modules.php?name=News&file=article&sid=16

      http://slashdot.org/article.pl?sid=03/09/22/239215&tid=137&tid=138

      the AmigaOne boards are either G3 or G4. no G5.

    9. Re:Does it matter? by d0wnr11g3r · · Score: 1
      $499 is hardly painting yourself into a corner financially, unless a $30 KVM switch would break your bank.

      And what difference does it make if it's an already complete system, or not in a "generic" case?? If you program for AltiVec-enabled processors, it's probably going to be running on a Mac anyway. It's highly unlikely that any code you write will be running on IBM hardware, as it's even less common and much more expensive than any Mac (G4/G5) with AltiVec.

      Ignoring AltiVec isn't a very good idea either. As sales increase in the Mac arena (and they will/are currently), AltiVec is going to be more commonly used - think of all the games that could be ported! If you think that Apple is just going to die off, you're nuts - people have been saying that for 10 years now and it's not even close to being a possibility at this moment or in the future. Apple has definitely learned from the rough times in the mid-to-late '90s and is on a much more secure road than they were...but even then, Apple dying off just wasn't going to happen.

      Personally, having learned how to program assembly on PowerPC chips, I feel I have a much better understanding of how everything works and where you can optimize code than I would have taken away from anything x86 based. It may have had a slightly steeper learning curve and sometimes required re-thinking some processes to use vectors, but I think it was worth every minute.

    10. Re:Does it matter? by Anonymous Coward · · Score: 0

      Or he meant something which is comparable to the latest AMD and Intel offerings, which means at least a 2.5 GHz G5.

    11. Re:Does it matter? by dbrutus · · Score: 1

      You might want to keep an eye on these guys. Power.org members are likely to produce those PPC standard case motherboards in the reasonably near future.

    12. Re:Does it matter? by Anonymous Coward · · Score: 0

      Wow you're an ass, as such I have moderated you down, I hope others follow.

      How you like them apples jackass?

      Re:Does it matter? Monday March 07, @09:55AM 2
      attached to Introducing the PowerPC SIMD unit
      Re:Half of 200? Monday March 07, @08:49AM 2
      attached to The Story Behind Cell Phone Radiation Research
      Re:It's the Branding Monday March 07, @05:36AM 1 2
      attached to Problems With the Firefox Development Process
      Re:Yeah - So Who's Lovin' It? Sunday March 06, @06:48PM 0, Flamebait
      Re:Yeah - So Who's Lovin' It? Sunday March 06, @06:38PM 1 1, Flamebait
      attached to Openoffice 2.0 Preview
      Re:was a change required? Sunday March 06, @06:43PM 0, Offtopic
      Re:was a change required? Saturday March 05, @09:07PM 1 2
      attached to Wells Fargo Web-Enables ATMs
      Free download for AMD64 Saturday March 05, @09:31PM 1, Offtopic

    13. Re:Does it matter? by leereyno · · Score: 1

      No, all I meant was an ATX motherboard with PCI slots that I can plug standard components into.

      The strength of the PC has never been that it is always the best and the fastest. At various times other platforms have held that crown. The strength of the PC as a platform is that it is based upon interchangeable parts from multiple vendors. This ensures that at any given time you will always get the best bang for your buck. It also allows you to completely customize your system and pick and choose the precise components that you use to build it. A certain degree of customization is possible in modern Macs, but only because Apple was forced into it in order to compete with PCs. I understand that your average soccer mom doesn't care about any of this, but then I'm talking about a computer for ME, not for some technophobe. If Apple wants to corner the technophobe market then they're welcome to it, but that isn't a justification for the Apple-freak to then come and preach to me about how a Mac is the ideal computer for me as well, because Macs are not, never have been, and never will be.

      An open-standards-based 2.5 GHz G5 motherboard sounds like a VERY good idea to me. I'd love to be able to run a 64 bit version of Linux on something like that. It would primarily just be for the hell of it since I'd then have painted myself into a corner software-wise. The only stuff I could run on such a system would be those Linux applications that I could get to compile, and there are no guarantees when it comes to that. Of course if the price/performance ratio of such a system was good, then it is possible that at least a few people would start running such systems. They would be unlikely to ever threaten the dominance of Intel/AMD, but that doesn't mean they couldn't enjoy a strong cult following. Whether that would generate enough revenue to keep anyone in business is of course another matter. Actually I think that one area where such a board would do very well is in the server arena where software compatibility is not as much of an issue. If such a board can be the foundation of a capable server, and it runs cooler and uses less power, then I can see how it would be very appealing to companies that are running into problems keeping their server rooms from melting.

      Apple could probably clean up if they pursued such an approach by producing standards-based rack-mount systems for a competitive price. They could even use OS X on it if they wanted to. The problem is that Apple just isn't smart enough to do that. At any given turn the company can be expected to either shoot itself in the foot, or snatch defeat from the jaws of victory. Rare mistakes that actually lead to success, such as the iPod, are subsequently perverted and/or destroyed.

      I have no problem with the PowerPC architecture. My problem is with Apple and with the religious zealots who try to convert everyone around them into Mac users. I'm sorry, but I've been around Macs since the very first model shipped in late 1984. Once upon a time they were something to behold, but those days are long past. My very first computer was an Apple II+, which even now sits in my closet. The insanity that overtook Apple is something that pains me still. But as much as I might love the Apple that once was, I have no patience for dogma and ideology, especially not when it comes to topics like computers and technology where objective reality is anything but a mystery.

      Lee

      --
      Muslim community leaders warn of backlash from tomorrow morning's terrorist attack.
    14. Re:Does it matter? by Bert64 · · Score: 1

      Well, the Mac mini is small enough that it would easily fit inside a standard case. I'm sure you could build a cluster of them in a standard case and even build in a switch, possibly even a KVM, all in one standard-ish case.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
  10. Every AltiVec processor... by Anonymous Coward · · Score: 0, Insightful

    So, the G4 and the G5.

    What you're really talking about here is that with a greater variety of chip models, it's harder to do all-out optimization.

    Never mind that there are fewer G4s and G5s deployed combined than any one class of Intel or AMD chips which requires different SIMD optimizations.

    When your market is 20 times bigger, you can afford to optimize for Athlon XP, Athlon64, Opteron, and Pentium 4, and you're still putting in only a fifth of the relative development/installed-base effort. If you like, you can still go and optimize for all of the older and niche models without being in a worse situation than PowerPC developers.

    If the Athlon line weren't x86 compatible, I suppose they'd be praising the consistency of 3DNow!. Lack of x86 compatibility is not an advantage! x86 is the de facto standard. Following standards always costs a certain amount of design freedom and elegance, but the advantages are considerable.

    You really can't take someone too seriously who preaches the advantages of uniformity within a completely non-standard-compliant line of chips, while denigrating the popular standard for the relatively minor incompatibilities between its various implementations. As if small incompatibilities are bad and big ones are good!

  11. AltiVec instructions by Anonymous Coward · · Score: 1, Informative

    There's a book, "Vector Game Math Processors" by James Leiterman (ISBN 1-55622-921-6), that discusses programming PowerPC AltiVec, MIPS, and 80x86 SIMD instructions. I found it pretty useful when doing vector programming with AltiVec! Some instructions that other processors have and AltiVec doesn't are simulated with what he calls PseudoVec!

  12. portable vectorization by AeiwiMaster · · Score: 4, Interesting

    On the D programming newsgroup we have been talking about implementing a vectorization syntax, so that we can have portable vector code which approaches the speed of hand-coded vectorization.

    Here is something from the list.

    What is a vectorized expression? Basically, loops that do not specify any order of execution. If no order is specified, the compiler can of course choose any order that is efficient, or even distribute the code and execute it in parallel.

    Here are some examples.

    Adding a scalar to a vector.
    [i in 0..l](a[i]+=0.5)

    Finding size of a vector.
    size=sqrt(sum([i in 0..l](a[i]*a[i])));

    Finding the dot product:
    dot=sum([i in 0..l](a[i]*b[i]));

    Matrix vector multiplication.
    [i in 0..l](r[i]=sum([j in 0..m](a[i,j]*v[j])));

    Calculating the trace of a matrix
    res=sum([i in 0..l](a[i,i]));

    Taylor expansion on every element in a vector
    [i in 0..l](r[i]=sum([j in 0..m](a[j]*pow(v[i],j))));

    Calculating Fourier series.
    f=sum([j in 0..m](a[j]*cos(j*pi*x/2)+b[j]*sin(j*pi*x/2)))+c;

    Calculating (A+I)*v using the Kronecker delta-tensor : delta(i,j)={i=j ? 1 : 0}
    [i in 0..l](r[i]=sum([j in 0..m]((a[i,j]+delta(i,j))*v[j])));

    Calculating the cross product of two 3d vectors using the antisymmetric tensor/Permutation Tensor/Levi-Civita tensor
    [i in 0..3](r[i]=sum([j in 0..3,k in 0..3](anti(i,j,k)*a[j]*b[k])));

    Calculating determinant of a 4x4 matrix using the antisymmetric tensor
    det=sum([i in 0..4,j in 0..4,k in 0..4,l in 0..4]
    (anti(i,j,k,l)*a[0,i]*a[1,j]*a[2,k]*a[3,l]) );
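      For readers without the D thread at hand, here is a rough C++ rendering (not D, and not the proposed syntax) of the Levi-Civita examples as ordinary nested loops. The helper name `anti` simply mirrors the pseudocode and is hypothetical; it computes the permutation sign by counting inversions. Every loop body only accumulates into a sum, so the iterations could run in any order, which is exactly the freedom the proposal wants to hand to the compiler (with floating point, a different order can change the last bits of rounding).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Permutation (Levi-Civita) symbol for indices drawn from 0..N-1:
// 0 if any index repeats, otherwise +1/-1 by the parity of the inversion count.
template <std::size_t N>
int anti(const std::array<int, N>& idx) {
    int sign = 1;
    for (std::size_t p = 0; p < N; ++p) {
        assert(idx[p] >= 0 && idx[p] < static_cast<int>(N));
        for (std::size_t q = p + 1; q < N; ++q) {
            if (idx[p] == idx[q]) return 0;     // repeated index
            if (idx[p] > idx[q]) sign = -sign;  // each inversion flips the sign
        }
    }
    return sign;
}

// r[i] = sum over j,k of anti(i,j,k) * a[j] * b[k]
std::array<double, 3> cross(const std::array<double, 3>& a,
                            const std::array<double, 3>& b) {
    std::array<double, 3> r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i] += anti<3>({i, j, k}) * a[j] * b[k];
    return r;
}

// det = sum over i,j,k,l of anti(i,j,k,l) * a[0][i] * a[1][j] * a[2][k] * a[3][l]
double det4(const std::array<std::array<double, 4>, 4>& a) {
    double det = 0.0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                for (int l = 0; l < 4; ++l)
                    det += anti<4>({i, j, k, l}) * a[0][i] * a[1][j] * a[2][k] * a[3][l];
    return det;
}
```

The 256-term determinant loop is deliberately naive; it only shows that the pseudocode is well defined, not that it is a sensible way to compute a determinant.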

    1. Re:portable vectorization by Misagon · · Score: 1

      Interesting, but how do you enforce the property of parallelism without restricting the language that is available within the loop?

      --
      "We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
    2. Re:portable vectorization by Anonymous Coward · · Score: 0

      macstl already does some of this, through a custom implementation of the C++ standard component valarray:

      Adding a scalar to a vector: a += 0.5;

      Finding the size of a vector: size = sqrt(sum (a * a));

      Finding dot product: dot = sum (a*b);

      All this in C++ for maximum compatibility with existing code and a simple, intuitive syntax.
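      For comparison, here is a sketch of the same three expressions against the plain standard-library `std::valarray`, not macstl itself (the function names are illustrative). One syntactic difference: the standard spells the reduction as a member, `.sum()`, rather than a free `sum()`. What macstl adds is AltiVec/SSE code generation behind this interface, which a stock standard library is not guaranteed to do.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// a += 0.5: valarray supports compound assignment with a scalar.
void add_scalar(std::valarray<double>& a, double s) {
    a += s;
}

// size = sqrt(sum(a * a))
double norm(const std::valarray<double>& a) {
    const std::valarray<double> squares = a * a;
    return std::sqrt(squares.sum());
}

// dot = sum(a * b); element-wise ops on valarrays of different sizes are undefined.
double dot(const std::valarray<double>& a, const std::valarray<double>& b) {
    assert(a.size() == b.size());
    const std::valarray<double> products = a * b;
    return products.sum();
}
```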

      macstl currently doesn't have direct support for 2D matrices but they can be overlaid using slice and gslice. However the overhead of pulling scalar elements from scattered places and putting them into a vector may kill most performance gains from using vectorized code in these cases.
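      To make the slice overlay concrete, here is a minimal sketch with standard `std::slice`, assuming a row-major matrix stored in one flat valarray (the layout and the helper names `row`, `column` and `matvec` are assumptions, not macstl API). A row is a contiguous stride-1 slice, while a column is a stride-`cols` slice: that strided gather is the "scattered places" case the comment warns about.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Matrix stored row-major in one flat valarray: element (i, j) is m[i * cols + j].
// std::slice(start, length, stride) selects start, start + stride, ...

// Row i is contiguous: start i*cols, length cols, stride 1.
std::valarray<double> row(const std::valarray<double>& m, std::size_t cols, std::size_t i) {
    return m[std::slice(i * cols, cols, 1)];
}

// Column j is strided: start j, length rows, stride cols (a scattered gather).
std::valarray<double> column(const std::valarray<double>& m, std::size_t rows,
                             std::size_t cols, std::size_t j) {
    return m[std::slice(j, rows, cols)];
}

// r[i] = sum over j of m(i, j) * v[j], one row slice per output element.
std::valarray<double> matvec(const std::valarray<double>& m, std::size_t rows,
                             std::size_t cols, const std::valarray<double>& v) {
    assert(m.size() == rows * cols && v.size() == cols);
    std::valarray<double> r(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        const std::valarray<double> products = row(m, cols, i) * v;
        r[i] = products.sum();
    }
    return r;
}
```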

      One of the shortcomings of separately compiled code libraries like vecLib etc. is the granularity of the supplied functions. Altivec and SIMD work best with branchless, straight-line code and maximal use of CPU registers. Any separate library either has (1) general-purpose functions that work with a vector at a time, or (2) special-purpose functions that work with many vectors at a time. But (1) performs too little work within a function, so that any code which uses it will have too many branches (function calls and returns). And (2) performs a good amount of work before function return, but you as the client are forced to pick from very specialized functions such as FFT, image processing, etc.

      That's where inline code libraries like macstl come in. Besides being open source by necessity (since all the source is visible to humans and compilers alike), a good C++ compiler will inline all the function calls with a loop and smooth out the road for fast Altivec performance. The only other library I know of with a similar philosophy is Joel Falcou's EVE.
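      Inline libraries in this style are usually built on expression templates. The sketch below illustrates that general technique; it is not macstl's or EVE's actual code, and the types `Expr`, `Vec`, `Add` and `Mul` are hypothetical. `a + b * c` builds a small tree of references instead of temporary arrays, and assigning it to a `Vec` walks the tree in one loop, so once the compiler inlines everything there is a single straight-line loop body with no per-operation function calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: lets the operators below accept any vector expression.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Sum node: holds references to its operands and computes elements on demand.
// Nodes reference temporaries, so build and assign an expression in one statement.
template <class L, class R>
struct Add : Expr<Add<L, R>> {
    const L& l;
    const R& r;
    Add(const L& lhs, const R& rhs) : l(lhs), r(rhs) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// Product node, same idea.
template <class L, class R>
struct Mul : Expr<Mul<L, R>> {
    const L& l;
    const R& r;
    Mul(const L& lhs, const R& rhs) : l(lhs), r(rhs) {}
    double operator[](std::size_t i) const { return l[i] * r[i]; }
    std::size_t size() const { return l.size(); }
};

// Concrete storage; assigning any expression evaluates it in a single pass.
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double x = 0.0) : data(n, x) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    template <class E>
    Vec& operator=(const Expr<E>& e) {
        const E& ex = e.self();
        assert(ex.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = ex[i];  // one loop, no temporaries
        return *this;
    }
};

template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) { return Add<L, R>(l.self(), r.self()); }

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) { return Mul<L, R>(l.self(), r.self()); }
```

Usage: with `Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), x(4);`, the statement `x = a + b * c;` fills `x` with 7.0 in one pass. A real SIMD library would additionally load and store whole vector registers inside that loop.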

      macstl-generated code hasn't been benchmarked against gcc 4.0 autovectorized code, but its SSE/SSE2 generated code (did I mention it is cross-platform?) is actually faster than Intel ICC 8.1's autovectorized code.

      Cheers,
      Glen Low, Pixelglow Software
      www.pixelglow.com

    3. Re:portable vectorization by Alsee · · Score: 1

      I'm just guessing, but they probably don't try to restrict the language or "enforce" parallelism properties at all. It's probably like writing "function(x++,x++)", which could be evaluated as function(x,x) or as function(x,x+1) or as function(x+1,x). The behaviour is undefined, and it's considered the programmer's fault if he does it and gets buggy results.

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
  13. What it means... by turgid · · Score: 0, Flamebait
    ...is that this is yet another IBM PR fluff piece liberally sprinkled with FUD and half-truths.

    We get on average one of these per month posted here to slashdot as news.

    Nothing to see here. Move along please.

    1. Re:What it means... by Lars+T. · · Score: 1

      What it really means is that you didn't RTFA. Also no news.

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    2. Re:What it means... by turgid · · Score: 1
      What it really means is that you didn't RTFA. Also no news.

      Tell me, are they recruiting at IBM? Only I need a new job and I'm willing to sell my soul for the right price.

    3. Re:What it means... by Lars+T. · · Score: 0

      Did you lose your job of holding that "will hit myself for money" sign?

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

  14. Hold on there, partner. This isn't AltiVec stuff! by Paradox · · Score: 4, Informative
    Ahh Slashdot. First, let's mention this link which you were evidently too busy to provide. It links to two papers on how to tune for the G5. That way, someone can verify what I'm saying.

    The problems you're talking about are not the AltiVec's fault, and the AltiVec instruction set is still stable. Code will still run very quickly even if you don't optimize for the G5. But, let me bring a quote from one of those linked papers:

    Of course, your code may still need to be restructured to handle the increased latencies of the G5 Velocity Engine pipeline. Avoid small data accesses. Due to the increased latency to memory, the longer cache lines, and the nature of the CPU-to-memory bus, small data accesses should be avoided if possible. The entire system architecture has been designed to optimize the transfer of large amounts of data (i.e. maximize system memory throughput). As a side effect, the cost to handle small accesses can be very high and is quite inefficient.
    See, the problem you're complaining about is a problem with any port to the G5, or really any port from a slow-thin-memory-access system to a fast-wide-memory-access system. It has nothing to do with your AltiVec code. It just has to do with tuning for a larger L2 cache and a faster FSB rather than a slow FSB and a huge L3 cache.

    So let's not blame AltiVec for this. Except for a brief change in policy in the 745X G4, it seems like the AltiVec invocation has been stable for quite a while.

    --
    Slashdot. It's Not For Common Sense
  15. Embedded world by Anonymous Coward · · Score: 0

    In my work, we use PPC extensively because the bang per Watt is so very high compared to x86 and relatives. We make good use of very domain-specific operations that have been hand-tuned for Altivec and are very happy with the results. Even without using AV, we see similar performance for our code running on 500MHz PPC and 2.5GHz P4.

    AV instruction set kicks ass over SSE. And having 32 registers in each of the integer, FP, and SIMD register files also kicks ass. We can perform a lot of register-based operations on a lot of operands without dropping operands out of registers.

    My complaint about vector libraries is that they tend to be basic building blocks. The overhead of having to chain together multiple basic functions can add up; we write specialized versions that do more per operand. Like instead of vector-multiply-add, we might have vector-multiply-multiply-add-sqrt-atan (magnitude of (I,Q) is sqrt(I*I + Q*Q), phase is atan2(Q, I)). More operations per read/write of memory == better performance.
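    A scalar C++ sketch of that fusion idea, using the magnitude/phase example above. The function names are hypothetical and this is plain C++, not AltiVec code; it only illustrates the memory-traffic argument. The chained version mimics gluing library building blocks together (four passes over the data plus a temporary array), while the fused version loads each (I, Q) pair once and writes both results in the same pass.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Chained: each "building block" is a separate full pass over memory.
void mag_phase_chained(const std::vector<float>& i_samples, const std::vector<float>& q_samples,
                       std::vector<float>& mag, std::vector<float>& phase) {
    const std::size_t n = i_samples.size();
    assert(q_samples.size() == n);
    std::vector<float> tmp(n);
    for (std::size_t k = 0; k < n; ++k) tmp[k] = i_samples[k] * i_samples[k];   // pass 1: I*I
    for (std::size_t k = 0; k < n; ++k) tmp[k] += q_samples[k] * q_samples[k];  // pass 2: + Q*Q
    mag.resize(n);
    for (std::size_t k = 0; k < n; ++k) mag[k] = std::sqrt(tmp[k]);             // pass 3: sqrt
    phase.resize(n);
    for (std::size_t k = 0; k < n; ++k)
        phase[k] = std::atan2(q_samples[k], i_samples[k]);                      // pass 4: atan2
}

// Fused: one pass; each input is read once and both outputs are written.
void mag_phase_fused(const std::vector<float>& i_samples, const std::vector<float>& q_samples,
                     std::vector<float>& mag, std::vector<float>& phase) {
    const std::size_t n = i_samples.size();
    assert(q_samples.size() == n);
    mag.resize(n);
    phase.resize(n);
    for (std::size_t k = 0; k < n; ++k) {
        const float i = i_samples[k];
        const float q = q_samples[k];
        mag[k] = std::sqrt(i * i + q * q);  // magnitude of (I, Q)
        phase[k] = std::atan2(q, i);        // phase of (I, Q)
    }
}
```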

  16. Introducing.... 1999! by statemachine · · Score: 1

    I wrote AltiVec code in 1999, and I even have a faded AltiVec t-shirt from the same year. AltiVec is just not new.

    "Re-Introducing" would be a better title.