IBM's Chief Architect Says Software is at Dead End

Posted by ryuzaki0 on Tuesday January 30, 2007 @04:33AM from the its-da-end-of-da-woild dept.

j2xs writes "In an InformationWeek article entitled 'Where's the Software to Catch Up to Multicore Computing?' the Chief Architect at IBM gives some fairly compelling reasons why your favorite software will soon be rendered deadly slow because of new hardware architectures. Software, she says, just doesn't understand how to do work in parallel to take advantage of 16, 64, 128 cores on new processors. Intel just stated in an SD Times article that 100% of its server processors will be multicore by end of 2007. We will never, ever return to single processor computers. Architect Catherine Crawford goes on to discuss some of the ways developers can harness the 'tiny supercomputers' we'll all have soon, and some of the applications we can apply this brute force to."

10 of 334 comments

rendered deadly slow? by Dr+Kool,+PhD · 2007-01-30 04:39 · Score: 3, Informative

If you look at single-thread performance on Intel and AMD's dual/quad core chips, they meet or beat the best that single-core has to offer. I don't see why a multi-core system in the future will run single-thread apps any slower than right now. If anything I'd expect single-thread performance to increase incrementally as Intel and AMD are able to increase clock speeds.
Concurrency in software by NullProg · 2007-01-30 04:44 · Score: 4, Informative

Herb Sutter wrote about this topic two years ago. A great read for anyone who is interested.
http://www.gotw.ca/publications/concurrency-ddj.htm

Enjoy,

--
It's just the normal noises in here.
Multi-cores vs. internal parallelism by BritneySP2 · 2007-01-30 04:46 · Score: 4, Informative

IMHO, multi-cores are good for multitasking, which does not cover the whole problem of parallelism. Software (at least, in principle) _is_ ready: pure functional languages, for example, are perfectly suited for parallel processing; it is the lack of the CPUs with architectures that support internal concurrency (using a single core - as opposed to those providing support for multi-threading using multiple cores) that is the problem...
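
A minimal sketch of why purity helps, written in Python (which is not a pure functional language, so the purity here is only by convention). Because the made-up function `score` depends only on its argument and mutates nothing, its calls can be handed out in any order and the result still matches the serial version. The thread pool is just to keep the sketch short; in CPython, threads will not actually spread CPU-bound work across cores, and a process pool would be the closer analogue.

```python
from concurrent.futures import ThreadPoolExecutor

def score(n: int) -> int:
    """Pure by convention: the result depends only on n, no shared state."""
    return sum(i * i for i in range(n)) % 97

inputs = list(range(1, 200))

# Serial evaluation, one call after another.
serial = [score(n) for n in inputs]

# Same calls handed to a pool; map() yields results in input order,
# whatever order the workers happened to finish in.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(score, inputs))

print(serial == parallel)
```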
Concurrency is hard. by argent · 2007-01-30 04:52 · Score: 5, Informative

Concurrency is a hard problem, and unexpected interactions between asynchronous events in concurrent environments have been a periodic bugbear for almost as long as computers have been interactive.
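
One classic example of such an interaction is the lost update. The sketch below is a toy model in Python, not real threading: generators stand in for two tasks so the interleaving can be chosen by hand and the bug shows up every time. The names `increment` and `run` are invented for the illustration.

```python
def increment(shared):
    """One task doing a non-atomic read-modify-write in two steps."""
    seen = shared["count"]      # step 1: read the current value
    yield                       # point where another task may run
    shared["count"] = seen + 1  # step 2: write back a possibly stale value

def run(schedule):
    """Run two increment tasks, advancing them in the given order."""
    shared = {"count": 0}
    tasks = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(tasks[name], None)  # advance that task by one step
    return shared["count"]

one_after_another = run("AABB")  # A finishes before B starts
interleaved = run("ABAB")        # both read 0 before either writes

print(one_after_another, interleaved)
```

With real threads the second schedule happens only occasionally, which is a large part of why these bugs are so hard to reproduce.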

It's what made the Amiga look less reliable than its competitors... if you only ran one native program at a time it was a lot more stable than MacOS or MS-DOS, because the OS provided a much richer set of services so applications didn't have to replicate them... but most people took advantage of the multitasking and when something crashed in the background the lack of memory protection meant the whole thing went down, and non-native software that wasn't written with multitasking in mind could produce the most entertaining crashes.

These days we all have good protected mode multitasking operating systems, but we don't have good easy ways to distribute an application across multiple cores. Until we do, most applications are going to be written to run single-threaded and depend on the OS to use the other cores to speed up the rest of the system, both at the application level and doing things like running graphics libraries on another core.

Until we have so many cores that the OS can't make effective use of them, I don't think there's even going to be much of an attempt to make use of them for most developers. And then we're going to go through a painful period like we went through before Microsoft discovered multitasking.
1. Re:Concurrency is hard. by texag98 · 2007-01-30 06:05 · Score: 3, Informative
  
  I agree... anyone who's developed multi-threaded code for an SMP has probably run into the problems of debugging asynchronous thread events. This can make debugging, which is already tedious, even more tedious and time-consuming.
  
  As the number of cores increases different algorithmic approaches will need to be pursued to get the maximum performance. Many algorithms which are great for serial processors will perform poorly on a parallel architecture.
  
  I think many people don't realize just yet how big a paradigm shift multi-core really represents. Think of all the billions of lines of legacy code out there that were written for sequential computing. Scalability of code is also important: code written today is tomorrow's legacy code, and code that isn't scalable will eventually need to be revisited.
  
  Multi-core will probably also require a new look at memory systems in PCs... To keep a lot of cores busy you have to feed them, and that possibly means changes to the memory subsystems. It's not so bad now with so few cores on processors, but as they increase to 16, 32, etc. things start to get harder.
  
  In any case, multicore is here to stay and it will be exciting to see what changes come about in the next few years.
Re:Yeah, if you only run one program at a time.. by ArcherB · 2007-01-30 04:56 · Score: 3, Informative

The programs don't have to be specifically written to be multi-core aware as long as the OS is smart enough to send process to the idle cores.

While that is true of multi-core general-purpose processors like the x86, I don't think that works too well when talking about the Cell processor. The OS can't just assign a PowerPC-compiled app to an SPU and expect it to run. Apps have to be specifically coded to take advantage of the SPUs on the Cell.

--
There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
Concurrency is... by TodMinuit · 2007-01-30 05:04 · Score: 4, Informative

Concurrency is easy.

--
I wonder if I use bold in my signature, people will notice my posts.
programming for multi-core architectures by barnacle · 2007-01-30 05:17 · Score: 5, Informative

That was an interesting article, particularly the part about the hybrid "roadrunner" architecture.

However what is more relevant to today's non-supercomputing needs is SMP scalability.

One of the challenges with SMP scalability is cache coherency; synchronizing the caches on the processors is a costly operation (this is necessary to ensure that each processor has the same view of certain memory at the same time), normally (always?) done with a cache invalidation.

So the more invalidations you do, the more often the processor has to fetch memory from main memory, and the less it's using its cache. Processing slows down dramatically.

I've tried to design the qore programming language http://qore.sourceforge.net/ to be scalable on SMP systems. The new version (released today) has some interesting optimizations that have resulted in a large performance boost on SMP machines - the optimizations involve reducing the number of cache invalidations to the minimum (more than just reducing locking, although that is a part of it too - even an atomic update - for example on Intel an assembly lock and increment - involves a cache invalidation and therefore is an expensive operation on SMP platforms). There is more work to be done, but in simple benchmarks of affected code paths the performance increase was between 2 and 3 times as fast with the optimizations on the same qore code.
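
The general idea of touching shared memory less often can be sketched in Python. This is not how qore does it (the post does not show that), and Python cannot expose cache-line effects directly, so the sketch only shows the structure of the two approaches: one synchronized shared write per item versus one shared write per thread. All names are invented for the illustration.

```python
import threading

N_THREADS = 4
ITEMS_PER_THREAD = 10_000

# Variant 1: every increment goes through one shared, locked counter.
shared = {"total": 0}
shared_lock = threading.Lock()

def count_shared():
    for _ in range(ITEMS_PER_THREAD):
        with shared_lock:          # one synchronized shared write per item
            shared["total"] += 1

# Variant 2: each thread counts privately and publishes once at the end.
partials = [0] * N_THREADS

def count_local(slot):
    local = 0
    for _ in range(ITEMS_PER_THREAD):
        local += 1                 # thread-private, no synchronization
    partials[slot] = local         # a single shared write per thread

def run_all(target, with_slot):
    """Start N_THREADS threads on target and wait for all of them."""
    threads = [
        threading.Thread(target=target, args=((i,) if with_slot else ()))
        for i in range(N_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run_all(count_shared, with_slot=False)
run_all(count_local, with_slot=True)

print(shared["total"], sum(partials))
```

Both variants arrive at the same count; the second does it with 4 writes to shared data instead of 40,000.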

Anyway it would be interesting to know if other high-level programming languages have also taken the same approach (or will do so); as we go forward, it's clear that SMP scalability will be an important topic for the future...
Re:Clearing things up a bit by adam31 · 2007-01-30 05:46 · Score: 5, Informative

but its design counts on the programmers to code in 90%+ SIMD instructions to get the absolute fastest performance.

This is an often-repeated misconception. Cell abandons the practice of having different fp, integer, and vector registers... all registers are 128-bit and any instruction can be issued on any of them, and those instructions are generated by a C++ compiler. So saying that programmers code in these SIMD instructions is like saying that "x86's design counts on programmers to shuffle values between the fp stack, integer and vector registers, and code in separate fp, integer, and vector instructions to get the absolute fastest performance".
The reality is that Cell was targeted more at solving the memory problem than just doing SIMD stream processing. Engineers looked around and decided a 32kb L1 cache was silly... not having a cache-snooping DMA engine (or prefetch engine) would be silly. Putting nine cores on a bus with 7 GB/s bandwidth would be silly. Not being able to overlap memory latency with execution is silly. To solve all these problems, you give up having a single coherent address space.
But there is even more power in Java, .NET investments now... It is completely within the realm of possibility to write a runtime that executes your Java thread on SPU, or JITs the .NET to SPU code. It's a nice benefit that these are already handle-based rather than pointer-based languages, so the memory-mapping is a task of the runtime and transparent to the code. And IBM is working hard on native C++ code generation that is agnostic of the address space problem.
Re:Yeah, if you only run one program at a time.. by Jerf · 2007-01-30 06:16 · Score: 4, Informative
Couldn't the programs inherit the benefits of a multi-core system if the APIs they call are written to distribute the work to the cores? I know this probably isn't optimal but there must be some benefits from this.
In a word, no.

The more complicated answer is "Yes, in rare cases".

The problem is that programs written in your normal languages (C, C++, Java, C#, basically anything you've ever heard of) are totally synchronous; you cannot proceed on to the next statement until the previous one completes.

Thus, trying to parallelize something at the API is virtually worthless. I don't win anything if my "drawWindow" or "displayMPEGFrame" function flies off to another processor to do its work, if I still have to wait for it to complete before I can move on.
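
A rough sketch of that point in Python, with `draw_window` as a made-up stand-in for such an API call and a short sleep standing in for its work. Handing the call to another worker and then waiting on it right away buys nothing; overlap only appears once the caller itself is restructured to start several calls before collecting any results.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def draw_window(name):
    """Stand-in for an API call whose work runs on another worker."""
    time.sleep(0.1)
    return f"{name} drawn"

with ThreadPoolExecutor(max_workers=2) as pool:
    # Synchronous style: hand off, then block immediately. The caller
    # sits idle while the worker runs, so nothing overlaps.
    start = time.perf_counter()
    a = pool.submit(draw_window, "main").result()
    b = pool.submit(draw_window, "toolbar").result()
    blocking = time.perf_counter() - start

    # Restructured caller: start both, then collect. Only now can the
    # two calls run at the same time.
    start = time.perf_counter()
    fa = pool.submit(draw_window, "main")
    fb = pool.submit(draw_window, "toolbar")
    a2, b2 = fa.result(), fb.result()
    overlapped = time.perf_counter() - start

print(f"blocking: {blocking:.3f}s, overlapped: {overlapped:.3f}s")
```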

(This can be helpful if you have two types of processors, so in fact 3D graphics APIs can be looked at as working just this way. But we already have that.)

You might say, "But there are some operations that I can do that with, like loading a webpage!" We already can do that. It's called asynchronous IO; you fire your IO request, the hardware (with software assist) does its thing, and you get the results later. You might even fire off a lot of things and process them in the non-deterministic order they come back. UNIX has been doing that for about as long as it has been UNIX, via the select call.
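
For anyone who hasn't used it, a small self-contained sketch of the select pattern in Python. Local socket pairs stand in for real network requests (an assumption made so the example runs without a network), and the replies are handled in whatever order they become readable.

```python
import select
import socket

# Three connected socket pairs stand in for three outstanding requests.
pairs = [socket.socketpair() for _ in range(3)]
readers = {ours: i for i, (ours, _theirs) in enumerate(pairs)}

# "Responses" arrive in an order the reader does not control.
for i in (2, 0, 1):
    pairs[i][1].sendall(f"reply {i}".encode())

received = {}
pending = list(readers)
while pending:
    # Block until at least one socket has data, then handle the ready ones.
    ready, _, _ = select.select(pending, [], [], 5.0)
    if not ready:
        break  # timed out; nothing more is coming in this sketch
    for sock in ready:
        received[readers[sock]] = sock.recv(1024).decode()
        pending.remove(sock)

for ours, theirs in pairs:
    ours.close()
    theirs.close()

print(sorted(received.items()))
```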

The easy stuff has been done. To write programs that actually fill a multi-core CPU's capacity is going to require a paradigm change. Shared-memory threading isn't looking very good (too complex for any human to correctly implement). There are several candidate paradigms, but there is no clear winner at the moment, some of them may never work, and they all have one thing in common: They look nothing like current coding practices with threads (because, as I said, that's looking pretty useless if we can't get it working in the decades we've had to play with it).

The claims I've seen so far:
- Erlang-style concurrency: This is a ton of little threads that communicate solely through message passing, no shared state. On the plus side, it's got a working implementation that you can use today. On the down side (and this is my personal opinion), I'm not sure you really need the "functional" part of Erlang to use it (I think you just need threads that share nothing, and if you did that in a more conventional OO language it'd be fine), and Erlang's still quite short on libraries for anything outside of its core competency of network programming.
- Pure functional programming: Pure functional programming has the idea of no mutable state, which allows you to do certain things out-of-order automatically without fear of the system behaving non-deterministically. A lot of people are still making bold claims about this one, but I tend to agree with the papers that show the amount of implicit parallelism in real-world programs is fairly minimal; you're going to need to tell the system where the parallelism is for the foreseeable future.
- Stream programming: Probably ultimately a special case of Erlang-style processes, and only useful in certain domains (like sound processing).
- And of course, I'd be remiss to not mention the "suck it up and use threads" school of thought, but my feeling is that if programmers in general haven't gotten it right after 20 years, the claim that programmers are especially stupid becomes less plausible, and "the technology is uselessly complex in practice" must be the right answer.
This isn't exhaustive, it's off the top of my head, and there are endless variations on each of those themes.
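
To give a feel for the first item, here is a minimal sketch of share-nothing message passing in Python, using an ordinary thread and queues rather than Erlang processes. Nothing in Python enforces the no-sharing rule, so it holds here only by convention, and the message shapes (`("add", n)`, `("stop",)`) are invented for the illustration.

```python
import queue
import threading

def worker(inbox, outbox):
    """A share-nothing worker: its only contact with the world is messages."""
    total = 0                           # private state, never shared
    while True:
        msg = inbox.get()
        if msg == ("stop",):
            outbox.put(("total", total))
            return
        kind, value = msg               # messages are immutable tuples
        if kind == "add":
            total += value

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in (1, 2, 3, 4):
    inbox.put(("add", n))
inbox.put(("stop",))

reply = outbox.get(timeout=5)
t.join()
print(reply)
```

No lock appears anywhere in the caller or the worker; the queues are the only point of contact between them.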

If I had to lay money down, I'd go with "a language that used threadlets like Erlang and rigidly enforced no sharing, in an OO environment" winning, which does not really exist yet. (Probably the closest you get today would be Stackless Python with a manual enforcement of sending only immutables across t