Introducing the PowerPC SIMD unit
An anonymous reader writes "AltiVec? Velocity Engine? VMX? If you've only been casually following PowerPC development, you might be confused by the various guises of this vector processing SIMD technology. This article covers the basics on what AltiVec is, what it does -- and how it stacks up against its competition."
I've done some AltiVec programming in the past and found it a very effective use of my time. Since there's no mode-switching penalty for using the vector instructions, you can use them for trivial but common tasks, like replacing strlen(), vector operations on small tables, etc. I knocked a lot of computation time (25%) off one of my projects just by vectorizing three functions. Of course there's a hitch: vector processing only works for certain kinds of algorithms and requires a change in mindset. In spite of that, it's a great tool to have in your box.
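To make the "vector operations on small tables" case concrete, here is a minimal sketch in C. The helper name add_floats is hypothetical, and it assumes the length is a multiple of 4 and that all three arrays are 16-byte aligned. The AltiVec path is only compiled when the compiler defines __ALTIVEC__ (for example with -maltivec on GCC); everywhere else the plain loop is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats.
   Assumption: n is a multiple of 4 and every pointer is 16-byte aligned. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    assert((uintptr_t)dst % 16 == 0);
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);

#if defined(__ALTIVEC__)
    for (size_t i = 0; i < n; i += 4) {
        /* vec_ld loads one aligned 16-byte vector (four floats). */
        vector float va = vec_ld(0, a + i);
        vector float vb = vec_ld(0, b + i);
        /* One vec_add performs four additions; vec_st writes them back. */
        vec_st(vec_add(va, vb), 0, dst + i);
    }
#else
    /* Scalar fallback for compilers or targets without AltiVec. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```

The alignment requirement is the main change in mindset: vec_ld and vec_st ignore the low four bits of the address, so unaligned data needs extra handling that this sketch leaves out.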
Not all random numbers are created equally.
What does this even mean? I've written a great deal of optimized SSE code, and I can promise you that it works just as well on AMD. In fact, if you look at Athlon's pipeline, it does some really amazing things rescheduling and executing operations out-of-order. Fiddling around with ordering individual instructions is basically pointless because the scheduler has gotten so good at doing it on-the-fly.
Can you cite a specific example, because I've never run into this.
I don't know how much of OS X has AltiVec code, but there are many other Apple apps that use it. iTunes uses it for encoding music. I'm sure the video codecs in QuickTime use it as well.
The Mac has a really nice optimization tool called Shark, which will help you find code that can be moved onto the AltiVec unit (it also helps with general optimization).
Don't count your messages before they ACK.
is here. They talk about AltiVec on page 3. IIRC, it's the best-designed mass-market SIMD implementation out there.
If anyone is interested, simdtech.org is probably the best resource you can find for AltiVec (or any other SIMD) programming. They have a number of tutorials and technical resources and the mailing list is the best there is. Motorola, Apple, and IBM engineers frequent the list so you can get help and information directly from the guys that created AltiVec as well as from those who program for it.
Most (all?) Apple hardware does the checksum in hardware (built into the NIC). Add to that the inefficiency of using Altivec in the kernel, especially for small data sets, and it did not make sense for Apple to develop an Altivec version of the TCP checksum code.
The reason the article mentions the checksum case is not because Apple is missing the boat, but because there was a nice research article written about writing optimized TCP checksum code for Altivec, providing a good set of example code for aspiring Altivec coders.
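For anyone who hasn't looked at what that checksum actually computes, here is a minimal scalar sketch of the Internet checksum as described in RFC 1071. This is not the code from the research article, and the function name inet_checksum is my own. The point is that the sum is a pile of independent 16-bit additions with the carries folded in at the end, which is the kind of structure a SIMD unit can work on several lanes at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one's-complement checksum over len bytes,
   treating the data as big-endian 16-bit words (RFC 1071 style). */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint64_t sum = 0;   /* wide accumulator so carries are not lost */
    size_t i = 0;

    /* Add up the 16-bit words. */
    for (; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero byte on the right. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the sample bytes used in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) the folded sum is 0xddf2, so this function returns its complement, 0x220d.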
There's a book, "Vector Game Math Processors" by James Leiterman (ISBN 1-55622-921-6), that discusses programming PowerPC-AltiVec, MIPS, and 80x86 SIMD instructions. I find it pretty useful when doing vector programming with AltiVec. Instructions that other processors have but AltiVec lacks are simulated with what he calls PseudoVec.
The problems you're talking about are not the AltiVec's fault, and the AltiVec instruction set is still stable. Code will still run very quickly even if you don't optimize for the G5. But, let me bring a quote from one of those linked papers:
See, the problem you're complaining about is a problem with any port to the G5, or really any port from a slow-thin-memory-access system to a fast-wide-memory-access system. It has nothing to do with your AltiVec code. It just has to do with tuning for a larger L2 cache and a faster FSB rather than a slow FSB and a huge L3 cache. So let's not blame AltiVec for this. Except for a brief change in policy in the 745X G4, it seems like the AltiVec invocation has been stable for quite a while.
Slashdot. It's Not For Common Sense
the new 'Amiga' is basically a PowerPC reference board. they make an ATX and a Mini-ITX version of the board. it will run the new AmigaOS / MorphOS / all (?) the various PPC Linuxes (Linii?).
http://www.walibe.com/modules.php?name=News&file=article&sid=16
http://slashdot.org/article.pl?sid=03/09/22/239215&tid=137&tid=138
the AmigaOne boards are either G3 or G4. no G5.
Don't forget the small Apple desktop in a fancy case.
I'd like to know if Mac OS X uses the Altivec instructions to their full potential.
No, at this point too much needs hand tuning for everything to fully utilize the potential of Altivec. Most serious DSP-class apps spend the effort to do this in critical code, but there's plenty of compiled code running in OSX that doesn't benefit from the parallel vectorization that the Altivec unit can offer.
This is all about to change with GCC 4, which offers an SSA-based tree optimizer. The SSA form is particularly useful for automatic vectorization of code. I'm not sure what the efficiency will be like in the first release, but it looks like good things are coming.
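As a rough illustration of the kind of loop an auto-vectorizer is aimed at, here is a minimal sketch. The function name saxpy is just the conventional BLAS-style label, used here for a hand-written loop. The restrict qualifiers tell the compiler the arrays don't overlap, which is usually what it needs to know before it can turn the loop into vector instructions. With GCC the relevant switch is -ftree-vectorize (plus -maltivec on PowerPC); whether any particular release actually vectorizes this loop is something you'd have to check in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for n elements.
   Assumption: x and y do not overlap, as promised by restrict. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    /* Catch the most obvious violation of the no-overlap promise. */
    assert(x != y || n == 0);

    /* No branches and no dependency between iterations, so each
       iteration can be computed independently of the others. */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Something like gcc -O2 -ftree-vectorize -maltivec -S on a PowerPC target lets you inspect whether vector instructions were emitted.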
I believe the GStreamer people are looking into using liboil. The license is two-clause BSD.
Of course it is rarely true that AltiVec instructions are used to their "full potential" in the sense that you can usually find another CPU cycle to eliminate, but neither is it necessary to use hand tuning to get big boosts from AltiVec. We do the hand tuning for you (in C with AltiVec extensions or in assembly language) and provide optimized libraries, such as BLAS, vDSP, vImage, and others bundled into the Accelerate framework. (As another participant notes, some library interfaces and functions are not original to Apple, but Apple provides optimization.) The libraries are used in a variety of places throughout Apple's software, even inside the kernel, and are available to external developers.
It does take a certain class of algorithms to get a lot of use from the libraries, but some of the routines are of general benefit. You would expect image processing, music encoding and decoding, and mathematical algorithms to run faster with AltiVec, but even simple things like copying memory are significantly faster when done with AltiVec instructions.
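As a small illustration of leaning on the libraries instead of hand tuning, here is a minimal sketch of a dot product that calls the BLAS routine cblas_sdot when built against Accelerate. The wrapper name dot and the USE_ACCELERATE macro are hypothetical, chosen for this example. Define the macro and link with -framework Accelerate on Mac OS X to take the library path. Without it, the plain loop is used so the code builds anywhere.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#if defined(USE_ACCELERATE)
#include <Accelerate/Accelerate.h>
#endif

/* Hypothetical wrapper: returns the sum of x[i] * y[i] for n elements. */
float dot(const float *x, const float *y, size_t n)
{
    /* The BLAS interface takes the length as an int. */
    assert(n <= (size_t)INT_MAX);

#if defined(USE_ACCELERATE)
    /* Stride 1 on both inputs: the elements are contiguous. */
    return cblas_sdot((int)n, x, 1, y, 1);
#else
    /* Portable fallback with the same result for these inputs. */
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
#endif
}
```

The caller never sees which path was taken, so the same source can pick up the tuned routine where it is available.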