IBM's Chief Architect Says Software is at Dead End

← Back to Stories (view on slashdot.org)

IBM's Chief Architect Says Software is at Dead End

Posted by ryuzaki0 on Tuesday January 30, 2007 @04:33AM from the its-da-end-of-da-woild dept.

j2xs writes "In an InformationWeek article entitled 'Where's the Software to Catch Up to Multicore Computing?' the Chief Architect at IBM gives some fairly compelling reasons why your favorite software will soon be rendered deadly slow because of new hardware architectures. Software, she says, just doesn't understand how to do work in parallel to take advantage of 16, 64, 128 cores on new processors. Intel just stated in an SD Times article that 100% of its server processors will be multicore by end of 2007. We will never, ever return to single processor computers. Architect Catherine Crawford goes on to discuss some of the ways developers can harness the 'tiny supercomputers' we'll all have soon, and some of the applications we can apply this brute force to."

9 of 334 comments (clear)

Min score:

Reason:

Sort:

Clearing things up a bit by AKAImBatman · 2007-01-30 04:34 · Score: 5, Insightful

In an InformationWeek article entitled 'Where's the Software to Catch Up to Multicore Computing?' the Chief Architect at IBM gives some fairly compelling reasons why your favorite software will soon be rendered deadly slow because of new hardware architectures.
*THUNK*

owww... my head...

There are a couple of serious problems with this statement. The most important one is that the article doesn't say that existing software will get slower. And there's a reason for that: Existing software will continue to run on the individual processor cores. Something that they've done for a long period of time. Old software may not get any faster due to a change in focus toward parallelism vs. increased core speed, but it's not going to suddenly come to a screeching halt any more than my DOS programs from 15 years ago are.

Secondly, multicore systems are not a problem. Software (especially server software!) has been written around multi-processing capabilities for a long time now. Chucking more cores into a single chip won't change that situation. So my J2EE server will happily scale on IBM's latest multicore Xenon PowerPC 64 processor.

Finally, what the article is really talking about is the difficulties in programming for the Cell architecture. Cell is, in effect, and entire supercomputer architecture shrunk to a single microprocessor. It has one PowerPC core that can do some heavy lifting, but its design counts on the programmers to code in 90%+ SIMD instructions to get the absolute fastest performance. By that, I mean that you need to write software that does the same transformation simultaneously across reasonably large datasets. (A simplification, but close enough for purposes of discussion.) What this means is that the Cell processor is the ultimate in Digital Signal Processor, achieving incredible thoroughput as long as the dataset is conductive to SIMD processing.

The "problem" the article refers to is that most programs are not targetted toward massive SIMD architectures. Which means that Cell is just a pretty piece of silicon to most customers. Articles like this are trying to change that by convincing customers that they'd be better served by targetting Cell rather than a more general purpose architecture.

With that out of the way, here's my opinion: The Cell Broadband Architecture is a specialized microprocessor that is perpendicular to the general market's needs. It has a lot of potential uses in more specialized applications (many of which are mentioned in the article), but I don't think that companies are ready to throw away their investment in Java, .NET, and PHP server systems. (Especially since they just finished divorcing themselves from specific chip architectures!) Architectures like the SPARC T1 and IBM's multicore POWER/PowerPC chips are the more logical path as they introduce parallelism that is highly compatible with existing software systems. The Cell will live on, but it will create new markets where its inexpensive supercomputing power will open new doors for data analysis and real-time processing.

--
Javascript + Nintendo DSi = DSiCade
1. Re:Clearing things up a bit by Red+Flayer · 2007-01-30 04:44 · Score: 5, Insightful
  
  The "problem" the article refers to is that most programs are not targetted toward massive SIMD architectures. Which means that Cell is just a pretty piece of silicon to most customers. Articles like this are trying to change that by convincing customers that they'd be better served by targetting Cell rather than a more general purpose architecture.
  
  In other words, a spokesperson from $COMPANY is trying to convince the market that they'll soon need to use $PRODUCT if they want best results, conveniently which is sold by the $COMPANY?
  
  Imagine that.
  
  Sorry, cynical today.
  
  --
  "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
2. Re:Clearing things up a bit by adam31 · 2007-01-30 05:46 · Score: 5, Informative
  
  but its design counts on the programmers to code in 90%+ SIMD instructions to get the absolute fastest performance.
  
  This is an often-repeated misconception. Cell abandons the practice of having different fp, integer, and vector registers... all registers are 128-bit and any instruction can be issued on any of them, and those instructions are generated by a C++ compiler. So saying that programmers code in these SIMD instructions is like saying that "x86's design counts on programmers to shuffle values between the fp stack, integer and vector registers, and code in separate fp, integer, and vector instructions to get the absolute fastest performance".
  The reality is that Cell was targeted more at solving the memory problem than just doing SIMD stream processing. Engineers looked around and decided a 32kb L1 cache was silly... not having a cache-snooping DMA engine (or prefetch engine) would be silly. Putting nine cores on a bus with 7 GB/s bandwidth would be silly. Not being able to overlap memory latency with execution is silly. To solve all these problems, you give up having a single coherent address space.
  But there is even more power in Java, .NET investments now... It is completely within the realm of possibility to write a runtime that executes your Java thread on SPU, or JITs the .NET to SPU code. It's a nice benefit that these are already handle-based rather than pointer based languages, so the memory-mapping is a task of the runtime and transparent to the code. And IBM is working hard on native C++ code generation that is agnostic of the address space problem.
That's ridiculous. by The-Bus · 2007-01-30 04:36 · Score: 5, Funny

I see no need for why we would ever need anything more than 640 cores per processor in the future.

--
Small potatoes make the steak look bigger.
Yeah, if you only run one program at a time.. by ruiner13 · 2007-01-30 04:39 · Score: 5, Insightful

What the author fails to take into account is that multi-core allows each program to effectively use a separate core to do its work, regardless of how it is programmed. All it takes is the OS to be smart enough to task each program to a free core, if available. The programs don't have to be specifically written to be multi-core aware as long as the OS is smart enough to send process to the idle cores. The programs that need more power than one core can deliver will usually have the multi-core support built in, as many games are starting to do now that the technology is taking off.

Notice I took the high ground and didn't make the obligatory windows virus scan jokes... :)

--
today is spelling optional day.
Concurrency is hard. by argent · 2007-01-30 04:52 · Score: 5, Informative

Concurrency is a hard problem, and unexpected interactions between asynchronous events in concurrent environments has been a periodic bugbear for almost as long as computers have been interactive.

It's what made the Amiga look less reliable than its competitors... if you only ran one native program at a time it was a lot more stable than MacOS or MS-DOS, because the OS provided a much richer set of services so applications didn't have to replicate them... but most people took advantage of the multitasking and when something crashed in the background the lack of memory protection meant the whole thing went down, and non-native software that wasn't written with multitasking in mind could produce the most entertaining crashes.

These days we all have good protected mode multitasking operating systems, but we don't have good easy ways to distribute an application across multiple cores. Until we do, most applications are going to be written to run single-threaded and depend on the OS to use the other cores to speed up the rest of the system, both at the application level and doing things like running graphics libraries on another core.

Until we have so many cores that the OS can't make effective use of them I don't think there's even going to be much of an attempt to make use of them for more developers. And then we're going to go through a painful period like we went through before Microsoft discovered multitasking.
You hit the nail right on the head by Salvance · 2007-01-30 04:53 · Score: 5, Interesting

The argument that software will get slower assumes that most consumer software will continue to have additional CPU requirements without being coded for multi-core applications. This doesn't make sense. The average consumer uses an Office product, e-mail, and a browser. None of these use anywhere close to 100% of the CPU for very long even on a Pentium 3, let alone on a 2GHz+ core in a multi-core processor.

Workstation computing will suffer some until software vendors catch up, but this is already happening (e.g. most CAD, Animation, Video Processing are starting to come out with multi-core optimized software). Sure, some apps will continue to be single-threaded, but eventually, who would buy them? Software vendors aren't dumb.

Games will probably speed up significantly as well. Imagine the possibilities of having a game engine where each AI character utilizes 100% of a single core? Game designers aren't going to sit around desiging games that run on single core engines, they always push the boundaries and will continue to do so.

--
Crack - Free with every butt and set of boobs
Stephen Wolfram has a solution by maynard · 2007-01-30 04:55 · Score: 5, Interesting

A New Kind of Science. Converting a range of standard CS algorithms into Cellular Automata networks is the very solution our brains use; a combination of message passing and feedback loops. If we want our computers to scale in parallel, we might want to look at how biology has solved the problem. A lot of people laughed at Wolfram when he initially published that book. I think he yet might have the last laugh.
programming for multi-core architectures by barnacle · 2007-01-30 05:17 · Score: 5, Informative

was an interesting article, particularly the part about the hybrid "roadrunner" architecture.

However what is more relevant to today's non-supercomputing needs is SMP scalability.

One of the challenges with SMP scalability is cache coherency; synchronizing the caches on the processors is a costly operation (this is necessary to ensure that each processor has the same view of certain memory at the same time), normally (always?) done with a cache invalidation.

So the more invalidations you do, the more often the processor has to fetch memory from main memory, and the less it's using its cache. Processing slows down dramatically.

I've tried to design the qore programming language http://qore.sourceforge.net/ to be scalable on SMP systems. The new version (released today) has some interesting optimizations that have resulting in a large performance boost on SMP machines - the optimizations involve reducing the number of cache invalidations to the minimum (more than just reducing locking, although that is a part of it too - even an atomic update - for example on intel an assembly lock and increment - involves a cache invalidation and therefore is an expensive operation on SMP platforms). There is more work to be done, but in simple benchmarks of affected code paths the performance increase was between 2 and 3 times as fast with the optimizations on the same qore code.

Anyway it would be interesting to know if other high-level programming languages have also taken the same approach (or will do so); as we go forward, it's clear that SMP scalability will be an important topic for the future...