Domain: pixelglow.com
Stories and comments across the archive that link to pixelglow.com.
Comments · 18
-
Re:more to it
Could be done, but short of summing the elements in a more concise way (v.sum()), what would it give you? The only optimized valarray implementation I know of is macstl, and that has not been maintained for more than 2 years, so it seems that no one actually cares.
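For reference, here is a minimal sketch of what v.sum() amounts to, using the standard std::valarray (C++11 or later). The function names sum_member and sum_accumulate are illustrative. Both compute the same result; the member function is simply the more concise spelling, and it leaves the library free to implement it however it likes.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <valarray>

// Built-in member function: concise, and the implementation is up to the library.
double sum_member(const std::valarray<double>& v) {
    assert(v.size() > 0);  // std::valarray::sum() requires a non-empty array
    return v.sum();
}

// The same result with the generic STL algorithm over the valarray's elements.
double sum_accumulate(const std::valarray<double>& v) {
    return std::accumulate(std::begin(v), std::end(v), 0.0);
}
```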
-
Re:Power isn't PPC
Have you really done any serious computing with PPC chips? If you would like to know more, see this: http://www.pixelglow.com/stories/macintel-faster-than-altivec/ -
Re:Graphviz
No worries. If you have access to a Mac, I'd also recommend http://www.pixelglow.com/graphviz/. The raster formats that can come out of GraphViz can be huge (e.g. 100,000x100,000 pixels) and quite difficult to deal with at times, but the GUI these guys have written can handle even huge graphs pretty snappily. -
conversion tools
One possible conversion tool is macstl (which comes with some scripts that claim to help in converting Altivec to SSE) - I haven't used it myself, only read the blurb.
-
Re:Short answer? No
On the other hand, ICC is generally lauded for its ability to vectorize code in a manner that lends itself to performance increases thanks to the SSE/SSE2/SSE3 vector units on Pentium chips.
Yup. The FORTRAN compiler often does some good autovectorization. Show me that the C compiler does too.
Hell, why not look at this? Ignore the Mac part, just look at the x86 results. Looks like ICC doesn't even beat Visual C++ for most of the tests they did here (at least not by much).
Apart from that, you can do some weird-ass stuff with AltiVec (and to a lesser extent with SSE/MMX/3DNow) that can never be generated by autovectorization.
-
Re:This is bullshit.
Now's the time to look at truly revolutionary cross-platform SIMD technology. Stuff that makes Altivec and SSE work almost the same, with the same front-end interface and the insane speed that SIMD is capable of. Stuff that compiles cleanly on Linux, Windows and Mac OS X, on x86/SSE/SSE2 and PowerPC/Altivec, with serious 9x to 22x speedups over scalar code. Now's the time to look seriously at macstl.
Cheers, Glen Low, Pixelglow Software.
-
Cocoa is the way to go
If you're serious about Mac OS X development and want to write applications which blend in well with the Mac environment (necessary if you expect Mac users to actually want to use your applications), then you must learn how Mac applications are supposed to look and behave and figure out how to apply the Human Interface Guidelines to your application. (Note that the HIG is not always 100% accurate; there are places where you should deviate from it slightly in order to match what Apple's apps do.) This is important: if you do anything un-Mac-like, your application will be bashed as a bad Windows port (even if you wrote it from scratch), and no Mac users will touch it.
You have two ways to go in terms of APIs: Cocoa and Carbon. Cocoa is a refined version of NeXT's OpenStep. Carbon is a cleaned-up version of the old Classic Mac OS API (but with a huge number of changes; it is much improved). Carbon is straight C code, and the concepts involved are more similar to Windows or X11 than Cocoa's design. But it's big and complicated, and would take a long time to learn. Also, there's a lot more stuff that you have to do in Carbon to make your application behave properly.
Cocoa gives you most of the behavior for free. You'll write less code, and you'll probably end up with a more Mac-like application. (It's entirely possible to write a well-behaved Carbon app, but you'll have a lot more to learn in order to do it right, as fewer things are done for you automatically.)
With Cocoa you'll have to learn Objective C. This is not a big deal. If you know C, then you can learn the handful of additions which comprise Objective C in less than an hour. It's a very simple language.
Theoretically you can program Cocoa apps in Java, but I do not suggest that you attempt this. Java does not really fit into the Cocoa model very well (it's not nearly dynamic enough) and was shoehorned in as a way to attract developers who refused to learn ObjC. This was a mistake on Apple's part, and they seem to have realized it ... they no longer promote the use of Java in Cocoa.
I strongly recommend that you spend some time with Hillegass's book on Cocoa. Objective C is an elegant language, and it is certainly the fastest way to develop Mac applications.
If you are a diehard C++ fan, then you may be better off with Carbon, but there will probably be a bigger learning curve as the Carbon libraries are more complex. Carbon has a long, complicated history, from its Pascal roots and old Classic Mac OS constructs (resource forks, FSSpecs, GWorlds, etc.) to repeated changes in design (GetNextEvent() replaced by WaitNextEvent() and then Carbon Events, QuickDraw replaced by Color QuickDraw and now by Quartz 2D), so it takes quite a while to figure it all out.
I'd give ObjC and Cocoa a chance first. You can always use Objective-C++ to combine an ObjC user interface with a C++ backend, if you need to port old code. Be sure to check out MacSTL as it provides some nifty stuff to treat some Core Foundation and Foundation objects in an STL manner.
As for tools ... well, the Xcode IDE and all the GNU tools (gcc, gdb, etc.) come with Mac OS X, so you shouldn't need to buy anything. -
Re:portable vectorization
macstl already does some of this, through a custom implementation of the C++ standard component valarray:
Adding a scalar to a vector: a += 0.5;
Finding the length (Euclidean norm) of a vector: size = sqrt(sum (a * a));
Finding the dot product: dot = sum (a*b);
All this in C++ for maximum compatibility with existing code and a simple, intuitive syntax.
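A minimal sketch of the same three operations against the standard std::valarray rather than macstl. One assumption to flag: in the standard library, sum is a member function, so the expressions are written (a * a).sum() and (a * b).sum() here; the free-function spelling above is macstl's, and this sketch does not claim to reproduce its API. The helper names add_scalar, length and dot are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// Adds a scalar to every element in place (the "a += 0.5" case).
void add_scalar(std::valarray<double>& a, double s) {
    a += s;
}

// Euclidean length: square root of the sum of the squared elements.
double length(const std::valarray<double>& a) {
    assert(a.size() > 0);  // valarray::sum() requires a non-empty array
    return std::sqrt((a * a).sum());
}

// Dot product of two equally sized vectors.
double dot(const std::valarray<double>& a, const std::valarray<double>& b) {
    assert(a.size() == b.size() && a.size() > 0);
    return (a * b).sum();
}
```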
macstl currently doesn't have direct support for 2D matrices, but they can be overlaid using slice and gslice. However, the overhead of pulling scalar elements from scattered places and putting them into a vector may kill most of the performance gains from using vectorized code in these cases.
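A sketch of that overlay with standard std::valarray, assuming a row-major matrix stored in one flat array; row, column and block are hypothetical helper names. A column is exactly the scattered-access case described above: its elements sit cols apart in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// std::slice(start, size, stride): one row is `cols` contiguous elements.
std::valarray<double> row(const std::valarray<double>& m, std::size_t cols, std::size_t r) {
    return m[std::slice(r * cols, cols, 1)];
}

// A column uses a stride of `cols`, so its elements are scattered in memory.
std::valarray<double> column(const std::valarray<double>& m, std::size_t cols, std::size_t c) {
    assert(cols > 0 && m.size() % cols == 0);
    return m[std::slice(c, m.size() / cols, cols)];
}

// std::gslice generalises slice to several dimensions: lengths {h, w} with
// strides {cols, 1} select an h x w sub-block whose top-left is (r0, c0).
std::valarray<double> block(const std::valarray<double>& m, std::size_t cols,
                            std::size_t r0, std::size_t c0,
                            std::size_t h, std::size_t w) {
    std::valarray<std::size_t> lengths = {h, w};
    std::valarray<std::size_t> strides = {cols, 1};
    return m[std::gslice(r0 * cols + c0, lengths, strides)];
}
```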
One of the shortcomings of separately compiled code libraries like vecLib is the granularity of the supplied functions. Altivec and other SIMD units work best with branchless, straight-line code and maximal use of CPU registers. Any separate library has either (1) general-purpose functions that work on one vector at a time, or (2) special-purpose functions that work on many vectors at a time. But (1) performs too little work within a function, so any code that uses it will have too many branches (function calls and returns). And (2) performs a good amount of work before returning, but you as the client are forced to pick from very specialized functions such as FFT, image processing, etc.
That's where inline code libraries like macstl come in. Besides being open source by necessity (since all the source is visible to humans and compilers alike), they let a good C++ compiler inline all the function calls within a loop and smooth out the road for fast Altivec performance. The only other library I know of with a similar philosophy is Joel Falcou's EVE.
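To illustrate the inlining idea, here is a minimal expression-template sketch. The types Vec, Add and Mul are hypothetical and far simpler than what macstl or EVE actually do (no SIMD, double only); the point is only that a + b * c builds a small tree of inline objects and the assignment evaluates it in one loop with no intermediate arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Expression nodes: hold references to their operands, compute one element on demand.
template <typename L, typename R>
struct Add {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

template <typename L, typename R>
struct Mul {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] * r[i]; }
    std::size_t size() const { return l.size(); }
};

// The only type that owns storage.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double x = 0.0) : data(n, x) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression evaluates it element by element in one fused loop;
    // an optimising compiler can inline the node operators into this loop body.
    template <typename E>
    Vec& operator=(const E& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Restrict the operators to Vec and expression nodes so they don't hijack other types.
template <typename T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <typename L, typename R> struct is_expr<Add<L, R>> : std::true_type {};
template <typename L, typename R> struct is_expr<Mul<L, R>> : std::true_type {};

template <typename L, typename R,
          typename = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>{l, r}; }

template <typename L, typename R,
          typename = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
Mul<L, R> operator*(const L& l, const R& r) { return Mul<L, R>{l, r}; }
```

Usage: with Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), v(4); the statement v = a + b * c; fills v with 7.0 in a single pass. Because the expression nodes hold references, they must be consumed in the same statement; storing a + b * c in an auto variable and evaluating it later would leave dangling references.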
macstl-generated code hasn't been benchmarked against gcc 4.0 autovectorized code, but its SSE/SSE2 generated code (did I mention it is cross-platform?) is actually faster than Intel ICC 8.1's autovectorized code.
Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com -
Re:How do I code this thing??
What I think is they'll provide some C++ framework or perhaps some meta-language so that programmers define small treatment units with clearly defined treatment, input and output streams, and interconnect them without having to write tons of boilerplate code, with abstractions that let them ignore the details of memory management and streaming to and from other treatment units running on other SPEs.
That sounds remarkably like C++ std::valarray, of which my macstl is a SIMD-optimized version. Just write v0 = sin (v1) + cos (v2) for arrays v0, v1, v2, and the library and compiler handle the rest, chunking the array for SIMD consumption. Now if only I could get my hands on a couple of Cells, macstl would be able to run seamlessly on Altivec, MMX/SSE and the Cell architecture!
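For comparison, this is the same expression against plain std::valarray (standard C++, not macstl). Whether it actually gets vectorized depends on the library and compiler; the sketch only shows the programming model. The function name sin_plus_cos is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// v0 = sin(v1) + cos(v2), element-wise; the loop is implied by the valarray operators.
std::valarray<float> sin_plus_cos(const std::valarray<float>& v1,
                                  const std::valarray<float>& v2) {
    assert(v1.size() == v2.size());
    return std::sin(v1) + std::cos(v2);
}
```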
BTW, the OpenMP spec might help here too; it's largely implemented by compilers like IBM XLC++ (hmm...) and Intel ICC, though not yet by gcc.
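A minimal sketch of the OpenMP route for the same computation, using std::vector and a hypothetical function name. It assumes the code is built with the compiler's OpenMP option to get the parallel speedup; a compiler without OpenMP support ignores the pragma and runs the loop serially, with the same results since each iteration is independent.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// v0[i] = sin(v1[i]) + cos(v2[i]); OpenMP splits the iterations across threads.
void sin_plus_cos_omp(const std::vector<float>& v1, const std::vector<float>& v2,
                      std::vector<float>& v0) {
    assert(v1.size() == v2.size() && v0.size() == v1.size());
    // A signed loop index keeps the loop valid for older OpenMP versions too.
    const long n = static_cast<long>(v1.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        v0[i] = std::sin(v1[i]) + std::cos(v2[i]);
    }
}
```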
Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com -
Re:Why? Altivec-optimized libraries supplied by Ap
SIMD works best when all the SIMD code is within a tight inner loop, with little or no branching or conditional code except the loop itself, and especially no function calls. This helps with loop unrolling and instruction scheduling, and minimizes pipeline bubbles.
The problem with any separately compiled library, including Apple's fine vecLib implementation, is that it puts the function call boundary in the wrong place.
The compiled library will have functions like vec_sin and vec_cos that work on a large set of vectors (as IBM's MASS library does), so your code to calculate sin(x)+cos(x) would look something like:
allocate 1000 of v1
allocate 1000 of v2
allocate 1000 of v3
allocate 1000 of v4
for each v1, v2: v2 = sin (v1) -- call to lib vec_sin
for each v1, v3: v3 = cos (v1) -- call to lib vec_cos
for each v4, v2, v3: v4 = v2 + v3 -- call to lib add
Note 4 memory allocations, 2 of which are for temporaries that won't be used again; note also 3 branches away to library functions.
Compare with macstl, which is massively inlined yet works on an element-by-element basis:
allocate 1000 of v1
allocate 1000 of v4
for each v4, v1: v4 = sin(v1) + cos(v1)
This saves 2 expensive allocations and inlines the function calls within the loop, so there is no conditional code or branching there.
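The two pseudocode versions above can be sketched in C++ roughly as follows. The lib_sin, lib_cos and lib_add functions are hypothetical stand-ins for separately compiled library routines (they are not vecLib or MASS calls), and the sketch uses scalar std::sin/std::cos rather than SIMD code, so it shows only the structural difference: three passes and two throwaway temporaries versus one fused pass.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for library routines: each is a complete pass over
// the data, and the caller must supply an output buffer of the same size.
void lib_sin(const std::vector<float>& in, std::vector<float>& out) {
    assert(out.size() == in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = std::sin(in[i]);
}

void lib_cos(const std::vector<float>& in, std::vector<float>& out) {
    assert(out.size() == in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = std::cos(in[i]);
}

void lib_add(const std::vector<float>& a, const std::vector<float>& b,
             std::vector<float>& out) {
    assert(a.size() == b.size() && out.size() == a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
}

// Library style: three allocations on top of v1 (v2 and v3 are temporaries
// that are never used again) and three separate calls, each a full pass.
std::vector<float> separate_passes(const std::vector<float>& v1) {
    std::vector<float> v2(v1.size()), v3(v1.size()), v4(v1.size());
    lib_sin(v1, v2);
    lib_cos(v1, v3);
    lib_add(v2, v3, v4);
    return v4;
}

// Fused style: one output allocation and one loop with no intermediate arrays.
// (Here std::sin and std::cos are still ordinary math-library calls; an inline
// SIMD library would substitute vector versions the compiler can inline.)
std::vector<float> fused(const std::vector<float>& v1) {
    std::vector<float> v4(v1.size());
    for (std::size_t i = 0; i < v1.size(); ++i) {
        v4[i] = std::sin(v1[i]) + std::cos(v1[i]);
    }
    return v4;
}
```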
The only way around this with separately compiled code is to put more and more functionality into a single call e.g. FFT, linear algebra, but you lose the flexibility of creating your own equations -- what if you don't want fast fourier transforms or linear algebra, but some funky trig function?
Check out http://www.pixelglow.com/stories/altivec-valarray-1/.
Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com -
Re:Isn't it what std::valarray is for?
I guess you didn't notice: http://www.pixelglow.com/macstl/valarray/.
-
Why limit yourself to Altivec when you have NVidia
Well, the processing power of Altivec or MMX/SSE/3DNow or whatever is nowhere near the power of the newest NVidia/ATI card you have surely bought for playing Doom III. Why not use it then? Get the Brook compiler! Furthermore, I see they introduce classes like vec, etc. Such classes have already been designed successfully for C++. Why not try porting Blitz to Altivec and/or to the GPU?
-
Re:Isn't it what std::valarray is for?
So, my question is: could an std::valarray specialization for processor-supported types serve as a basis for portable SIMD support in C++?
That's exactly what this is. If you read the part of his website about valarray, you'll see that it does extensive SIMD optimizations of valarray for both Altivec and MMX/SSE/SSE2/SSE3 platforms. He's even added "parallelized algorithms such as integer division, trigonometric functions and complex number arithmetic", which you'd have to code yourself in either assembly or the C-based intrinsics if you wanted to do the SIMD programming by hand.
So basically, this lets you write normal C++ code using std::valarray and then plug this in under the hood to get a nice speed boost.
-
About the RPL
The RPL (Reciprocal Public License) is an odd choice for this project. It is an even stronger viral copyleft than the GPL, to the point where the FSF takes issue with it. If you create a derivative work, you are required to 1) notify the original author, and 2) publish your changes even if you only use the program in-house. Furthermore, its definition of derivative work is much, much broader than the "linking" definition that the GPL uses.
The fact that it puts these additional requirements / restrictions on the user makes it incompatible with the GPL. In fact, considering the requirements placed on you by the license, I would expect that you will have difficulty incorporating this RPL library into any existing FLOSS project without running into license conflicts. The only thing I can see this being useful for is a new project that you don't mind releasing under the RPL, or with existing BSD style licensed code which you dual license as BSD/RPL (since BSD can be included in anything).
So this library does not appear to be very usable for the FLOSS world, although if you want to license it for proprietary software, you may. -
License issues
Be careful; the "open source" license (PDF) is not GPL-compatible. I don't even think it's BSD-compatible on first reading.
The Reciprocal Public License requires you to release all of your source code if you link to this library, even if your project is personal or used in-house only.
-
A Fujitsu scanner, SANE and Quartz Python bindings
A Fujitsu scanner such as the fi-4120c is what I'd recommend. You might have to stretch your budget a bit. The cheap HP sheet feeders are very unreliable; we went through two HP 5550c's, enduring constant paper jams, before switching to a better (Fujitsu) scanner.
Unfortunately you don't have much use for something like Acrobat Capture because you have handwritten notes to deal with. To process the files, SANE and/or TWAIN interfaces are reasonably easy to write code for. The cool thing about SANE is that you can run the saned daemon on any Mac or Linux box, and with a couple of lines of config file changes, it's instantly available over the network from any Mac, Windows, or Unix box (there are TWAIN bridges for Mac/Windows so it even shows up in Photoshop and so forth); there are also standalone GUI clients like XSane.
I wrote a document management system in Python/wxWidgets (for Windows) in about a month part-time, and it works very well. On either Mac or Windows, PDF makes sense because of the ubiquity of the viewers, even if you lose a bit in compression compared to more optimized formats such as DjVu. On Windows you can easily embed the Acrobat ActiveX control; on Mac OS X you have native PDF support, Panther's Preview kicks ass, and there are several open-source PDF browsing components, such as the ones out of TeXShop or Glen Low's Graphviz port, that you can embed in your own app.
Given a choice I would probably pick the Mac to do this project, because of the wonderful Quartz/CoreGraphics Python bindings. You can just draw right to PDF, and place PDF files as if they were images; for example, here's a short script to rotate a bunch of PDF files (sorry, Slashdot destroys Python indentation):
#!/usr/bin/python
# Rotate every page of each PDF named on the command line by 90 degrees,
# writing the result next to the input as '<name>-rotated.pdf'.
# Uses Apple's CoreGraphics Python bindings (Python 2 on Mac OS X).
from CoreGraphics import *
import math, sys

for inputPDFPath in sys.argv[1:]:
    inputProvider = CGDataProviderCreateWithFilename(inputPDFPath)
    inputPDF = CGPDFDocumentCreateWithProvider(inputProvider)
    if inputPDF is None:
        print >> sys.stderr, \
            "unable to open '%s': perhaps it is not a PDF file?" % inputPDFPath
        continue

    outputContext = CGPDFContextCreateWithFilename(
        inputPDFPath + '-rotated.pdf', None)

    # PDF page numbers start at 1.
    for pageNumber in xrange(1, inputPDF.getNumberOfPages() + 1):
        mediaBox = inputPDF.getMediaBox(pageNumber)
        # The rotated page swaps width and height.
        rotatedBox = CGRectMake(0, 0, mediaBox.getMaxY(), mediaBox.getMaxX())
        outputContext.beginPage(rotatedBox)
        outputContext.saveGState()
        # Turn the drawing 90 degrees clockwise and shift it back onto the page.
        outputContext.translateCTM(0, rotatedBox.size.height)
        outputContext.rotateCTM(-math.pi/2)
        outputContext.drawPDFDocument(mediaBox, inputPDF, pageNumber)
        outputContext.restoreGState()
        outputContext.endPage()

    # Flush and close this output file before moving on to the next input.
    outputContext.finish()

You could also use ReportLab, but because a lot of its PDF processing code is written in Python, it's somewhat slower and memory-hogging for high-volume use. (I used ReportLab on Windows for the above project, and use the CoreGraphics Python bindings for my research, so I do know what I'm talking about, mostly :)