Intel Releases Threading Library Under GPL 2
littlefoo writes "Intel Software Dispatch have announced the availability of the Threading Building Blocks (TBB) template library under the GPL v2 with the run-time exception — so this previously commercial only package is now open for all the use, whether for open-source projects or commercial offerings (although they are explicitly encouraging open source use). The interface is more task-based then thread-based, but with a somewhat different view of things than, e.g. OpenMP.
From the Intel release: 'Intel® Threading Building Blocks (TBB) offers a rich and complete approach to expressing parallelism in a C++ program. It is a library that helps you leverage multi-core processor performance without having to be a threading expert. Threading Building Blocks is not just a threads-replacement library. It represents a higher-level, task-based parallelism that abstracts platform details and threading mechanism for performance and scalability.'"
I attended a seminar about this at GDC (Game Developers Conference) this year. It is really nifty stuff, automatically parallelizes things for you and helps take the load off of the OS scheduler. It is also trivial to implement in many cases, for instance there are parallel loops that execute things in parallel, all you have to do is write it like a normal loop but use a different keyword (ok so it is a wee bit more involved, but you get the idea). If I recall correctly it is basically a thread-pool that manages scheduling itself better than the OS because it knows ahead of time the needs of the code. Also you don't have to know the # of cores or anything as it handles that transparently. Also it isn't limited to Intel processors, I'm pretty sure at GDC it was actually being demoed on some sparc machines. If I had the time and/or a reason to use it I would definately investigate further.
If you are about to mod me down, keep in mind that this post was most likely sarcastic.
Thats the thing, it makes programming easier by making the whole parallel thing a bit more transparent. Basically picture a foreach loop. This thing allows you to do the same thing but instead can do multiple instances of the loop at once and automatically uses the "optimal" number of threads based on the cores available, you just have to call parallel_for. It's not quite as simple as that but it certainly does take the grunt work out of parallelizing things.
If you are about to mod me down, keep in mind that this post was most likely sarcastic.
The AMD question was raised on their Forums, and there is no issues with TTB running on AMD CPUs.
And, if there was, well it's under the GPL now, and I'm sure someone would have added / corrected that mistake.
III.IIVIVIXIIVIVIIIVVIIIIXVIIIXIIIIIIIIVIIIIVVIII
It depends on which version of the GPL you use. There's a 'runtime exception' version (That Intel chose for this project) that allows you MORE freedom than the LGPL in the case of libraries.
Simply put, you can link in the code as a library without worrying about LGPL's library requirements. (Namely the need to be able to replace the library with an upgraded version.) Intel notes that this is necessary for C++ libraries because of the way they have to be linked.
For the parent's code, I doubt he chose to have this clause in the GPL he chose, and it wouldn't be possible with his.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Agreed it does look to take a lot of the grunt work out of writing parallel-processing code. There are supposedly Java and
Just junk food for thought...
Erm, yes, C++ has local classes, however there is a "BUT" and it's a big one:
Local classes / structs do not have external linkage and therefore can't be used as template arguments. So, for functors etc., which is precisely where you'd want something like a local class (ie. because you really want a closure), they are useless.
Hence why we have Boost lambda. Expect, and I agree with the GP, the syntax ends up so horrible (due to the constraints of C++, not in any way the fault of the Boost devs) that you end up not using it. Not a lot of point in trying to do something because it is technically cleaner and neater if it ends up unreadable and therefore unmaintainable (for that, there is always Perl).
C++ (or C) is where all the fast code is still written. Thus it is the most relevant place for this kind of thing. If you look at Intel's page, you'll see they sell compilers, but only for two languages: C/C++ and Fortran. The reason is that their compilers are specifically to get as much performance as possible on an x86/x64 chip. So they target the languages people use when they are performance oriented. There are lots of other great languages out there, but face it, you aren't (or at least shouldn't) be using a managed language like Java when every last clock cycle counts.
You'll find that this is rather evident in most games. While it is increasingly common to write large portions of the game in a scripting language since that make it easier to write and perhaps more importantly easier to mod, you'll find that the high speed stuff is still C++. Take Civ 4 for example. They wrote almost the whole damn game in XML and Python. All data (like unit definitions, technology tree, etc) is stored in XML files, all the scripting necessary to make them work is Python. Makes the game extremely easy to mod. However, the AI code, which they also released to end users, is in C++. The reason is that the AI is highly intensive and would have run too slow in Python. Also, the core engine of the game (not released to users) is C++ as well.
So it isn't surprising this is where Intel is targeting their optimisations. Also, I'd argue that to a large degree any of this kind of thing for a managed language is the responsibility of the runtime itself. If Java is to have better support for automatically threading things, the JRE is probably where that should be done.
I compiled and ran the examples on my AMD system. They run without issue.
Just junk food for thought...
We've been supporting Linux, Windows and Mac OS X for x86, x86-64 and Itanium processors in the commercial product for a year. And, yes, those include Intel and AMD processors. The commercial product information only lists those.
;-) We're talking about what to do together to add SPARC support to - which shouldn't be too hard but will take some work.
The commercial product information quoted does not include some ports which were completed for the open source project only days before the open source release.
Preparing for open source, we were able to get G5 for Mac OS X as well as support for Solaris and FreeBSD (both x86 and x86-64) working before releasing on Tuesday. It was tight - but they made it. I wasn't sure until the week before what we would have - but the team got them working. I think it will be easier now that the project is started - and we can let other join in to help us.
I should also say we got a bunch more Linux distributions working for builds too. We have tested them enough to see no issues - but we haven't enough experience to call them supported on the product pages (commercial product). Please look for the latest ports on the open source project threadingbuildingblocks.org. We'll work with anyone who has processors/system expertise and needs any advice we can offer. Understandably, we don't have a lot of non-Intel hardware inside Intel to test upon and we are hoping others can help a bit with that.
For compilers - we have gcc, Intel, Microsoft and Apple (gcc in Xcode environment) compilers all working with the builds. It seems like we may have something to do for Sun's compilers and/or environment working - some Sun engineers are in touch and helping us double check this. No schedule - just working together - which I have faith will get results to put out in an updated open source copy in the not too distant future - non-binding wish - this is not a promise
The biggest issues from processor to processor is knowing how to implement a few key locks, and atomic operations, best in assembly language. Since we have support for processors with both weak and strong memory consistency models - we know TBB is up to the task.
TBB is very strongly tied to shared memory, and so a port to a Cell processor (or a GPU) would be a bit more challenging - but might be doable for the Cell. We've had only a few discussions/thoughts - no progress I know of figuring out a good approach there. That will almost certainly take someone with more Cell experience than we have at this time. I'm open to learning - but I'd need a teacher for sure.