IBM's Chief Architect Says Software is at Dead End

Clearing things up a bit by AKAImBatman · 2007-01-30 04:34 · Score: 5, Insightful

In an InformationWeek article entitled 'Where's the Software to Catch Up to Multicore Computing?' the Chief Architect at IBM gives some fairly compelling reasons why your favorite software will soon be rendered deadly slow because of new hardware architectures.

*THUNK*

owww... my head...

There are a couple of serious problems with this statement. The most important one is that the article doesn't say that existing software will get slower. And there's a reason for that: Existing software will continue to run on the individual processor cores. Something that they've done for a long period of time. Old software may not get any faster due to a change in focus toward parallelism vs. increased core speed, but it's not going to suddenly come to a screeching halt any more than my DOS programs from 15 years ago are.

Secondly, multicore systems are not a problem. Software (especially server software!) has been written around multi-processing capabilities for a long time now. Chucking more cores into a single chip won't change that situation. So my J2EE server will happily scale on IBM's latest multicore Xenon PowerPC 64 processor.

Finally, what the article is really talking about is the difficulty of programming for the Cell architecture. Cell is, in effect, an entire supercomputer architecture shrunk to a single microprocessor. It has one PowerPC core that can do some heavy lifting, but its design counts on the programmers to code in 90%+ SIMD instructions to get the absolute fastest performance. By that, I mean that you need to write software that does the same transformation simultaneously across reasonably large datasets. (A simplification, but close enough for purposes of discussion.) What this means is that the Cell processor is the ultimate in Digital Signal Processors, achieving incredible throughput as long as the dataset is conducive to SIMD processing.
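To make "the same transformation across a dataset" concrete, here is a rough sketch. Plain Python has no SIMD instructions, so this only mimics the shape of such code; LANES and both function names are made up for illustration and have nothing to do with the actual Cell toolchain.

```python
# Illustrative only: this mimics the *shape* of SIMD code, where one operation
# is applied to a whole group of values ("lanes") per step. It does not use
# real SIMD hardware. LANES and the function names are hypothetical.
LANES = 4

def scale_scalar(samples, gain):
    """One element per step, the way ordinary general-purpose code does it."""
    out = []
    for s in samples:
        out.append(s * gain)
    return out

def scale_lanewise(samples, gain):
    """LANES elements per step; each step stands in for a single SIMD op."""
    out = []
    for i in range(0, len(samples), LANES):
        group = samples[i:i + LANES]            # load one group of lanes
        out.extend([s * gain for s in group])   # same multiply on every lane
    return out

samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# Both versions compute the same result; only the grouping differs.
assert scale_scalar(samples, 0.5) == scale_lanewise(samples, 0.5)
```

The point is that the work has to be expressible as "do this one thing to every element". Code full of branches and pointer chasing does not fit that mold.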

The "problem" the article refers to is that most programs are not targeted toward massive SIMD architectures. Which means that Cell is just a pretty piece of silicon to most customers. Articles like this are trying to change that by convincing customers that they'd be better served by targeting Cell rather than a more general purpose architecture.

With that out of the way, here's my opinion: The Cell Broadband Architecture is a specialized microprocessor that is perpendicular to the general market's needs. It has a lot of potential uses in more specialized applications (many of which are mentioned in the article), but I don't think that companies are ready to throw away their investment in Java, .NET, and PHP server systems. (Especially since they just finished divorcing themselves from specific chip architectures!) Architectures like the SPARC T1 and IBM's multicore POWER/PowerPC chips are the more logical path as they introduce parallelism that is highly compatible with existing software systems. The Cell will live on, but it will create new markets where its inexpensive supercomputing power will open new doors for data analysis and real-time processing.

--
Javascript + Nintendo DSi = DSiCade

Re:Clearing things up a bit by Red+Flayer · 2007-01-30 04:44 · Score: 5, Insightful

The "problem" the article refers to is that most programs are not targeted toward massive SIMD architectures. Which means that Cell is just a pretty piece of silicon to most customers. Articles like this are trying to change that by convincing customers that they'd be better served by targeting Cell rather than a more general purpose architecture.

In other words, a spokesperson from $COMPANY is trying to convince the market that they'll soon need to use $PRODUCT, which is conveniently sold by $COMPANY, if they want best results?

Imagine that.

Sorry, cynical today.

--
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai

Re:Clearing things up a bit by markov_chain · 2007-01-30 05:27 · Score: 2, Insightful

Most apps won't benefit; after all, most apps barely use any CPU at all.

Things like audio/photo processing could benefit a lot, though. I imagine all you would need is a worker-thread design pattern, and let the OS do the rest.
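A minimal sketch of that worker-thread pattern, assuming the image has already been cut into independent tiles. The `brighten` filter and `process_image` helper are hypothetical names. One caveat: in CPython, threads will not speed up CPU-bound pure-Python code because of the global interpreter lock, so treat this as the structure of the pattern rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(tile, amount=10):
    """Hypothetical per-tile filter: add `amount` to each pixel, capped at 255."""
    return [min(255, p + amount) for p in tile]

def process_image(tiles, workers=4):
    """Hand each tile to a pool of worker threads; results keep tile order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, tiles))

tiles = [[0, 100, 250], [10, 20, 30], [255, 255, 0]]
print(process_image(tiles))  # [[10, 110, 255], [20, 30, 40], [255, 255, 10]]
```

The application only describes the units of work. How many of them run at once, and on which cores, is left to the pool and the OS scheduler.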

--
Tsunami -- You can't bring a good wave down!

Re:Clearing things up a bit by TheRaven64 · 2007-01-30 05:53 · Score: 4, Insightful

The problem with having say, 60 cores able to run in parallel is that our computation methods (turing based machine computation) are based on the basic "serial algorithm"

We've had theoretical foundations for parallel processing back since Turing (see non-deterministic Turing Machine) and rigorous theoretical frameworks such as π-calculus and CSP for decades. We even have languages like some Haskell dialects and Erlang that are built using these as foundations.
If you choose to use languages designed for a PDP-11, that's up to you, but the rest of us are quite happy writing code for 64+ processors in languages designed for parallelism.

--
I am TheRaven on Soylent News

Re:Clearing things up a bit by rifter · 2007-01-30 06:11 · Score: 2, Insightful

With any language I'm aware of you just have to implement it.
But there's the rub ... technically you can implement anything you want. Heck, you can do it in assembler! But do people do it and do they do it right? How much extra time and energy does it cost you and how difficult is the task? Why did database application developers care whether the database inherently supported transactions when they could just implement it in the application? Why did anyone care about Java's garbage collector when it can be (and was) implemented in C++? Why did people care about the C++ support for object orientation (or Java's enforcement of such) when you could implement such support yourself in C? Why does anyone use or care about GDI objects, GUI widgets, GUI APIs, and such when they can just write a program that draws stuff on the screen and implement that too? Why do we care that there are libraries providing threading functions, etc?
The point is that abstraction layers, libraries, frameworks, object classes, etc. exist specifically to provide functionality that is needed on a broad basis with an interface that is meant to be intuitive and usable for developers writing code within a given platform or context. The hope, too, is that they might enforce certain policies and methodologies and provide more optimal implementations as well as an easier means of correcting any problems down the road. (If I have 1000 apps, each implementing some functionality a certain way, and a bug shows up that affects all or a chunk of them, I have to update them all. If these 1000 apps call on a library that implements that function and a similar bug is found, I can just update that library, in theory.)

Re:Clearing things up a bit by cmacb · 2007-01-30 07:33 · Score: 2, Insightful

So many misconceptions, so little time...

I'm not sure if anyone above read past the third paragraph, but I see no evidence of it.

Noteworthy in the article was a combining of conventional X86 technology and Cell technology along with some state of the art memory management. (I'm not employed by or invested in any of the companies involved, just reporting the facts, ma'am.)

For the average user there is NO downside to multi core technology, so any statement to the contrary (in the article summary in this case, not the article itself) gets me worked up.

If you don't have a multi-core machine, go into your local superstore and start up the Windows Task Manager and watch the little graphs on the two (or more) processors as you open programs etc. See, both processors are being kept busy doing things in the background? Modern bloat-ware OSs have plenty of stuff that can be done in the background as you type up your little office party announcement. The "average" user can even benefit by being able to keep multiple programs running at the same time, doing a big complex document reformat, syncing your palm device, downloading the latest virus update, and so on. The OSs have been pretty good at multitasking for some time now, and while 16 or 32 processors are probably overkill, it is simply wrong to say there is no benefit and REALLY wrong to imply that there is some sort of disadvantage to multi-core processors (or even multiple single core processors) in almost ANY modern use of a computer. Beyond having multiple processors, advances are being made in controlling shared memory and I/O devices so those processors don't end up stepping on each other's toes. What is finally happening is the "commodity" PC is beginning to look like a mainframe of the 70s (and earlier, which had separate devices for memory and I/O management) which the OS, not the application programmer, could use to very great effect. So THERE.

Microsoft has been doing some "interesting" work in using the multiple processors in some graphics cards to do floating point calculations in the background and getting orders of magnitude faster results with some (specialized) applications. Of course the better way is to leave the graphics card alone to accelerate graphics (presumably that's why you paid big bucks for it) and build the basic PC to handle all these parallel tasks (not just spreadsheet recalcs, but ordinary I/O and memory management) more efficiently. Based on this article, sounds like the scientific community will be seeing such systems in the near future, and us ordinary desktop users will probably start benefiting from it (or some Dell/Apple/Intel/etc. variation) in a few years. It's great news.

Yes, this is a sales piece. Just like yesterday's "announcement" of the upcoming 45nm processors was a (gigantic) sales piece coordinated to show up here there and everywhere with pre-taped videos, Scoble blog DRAHMA meltdowns and more pictures of bunny suits than we've seen in ten years. How much actual "news" was there in the Intel/IBM announcements? Not a lot as I think they (and AMD, and IBM) have been hawking this change for almost a year now. Yesterday we learned about "Hafnium", or at least that aspect of it was news to me, and in all these product announcements, even when there isn't MUCH news, there are often tidbits that make it interesting. There is no reason to be complaining about the article, at least until it has been duped half a dozen times or so (and it probably will be.)

But Developers do? by Ckwop · 2007-01-30 04:37 · Score: 3, Insightful

Software, she says, just doesn't understand how to do work in parallel to take advantage of 16, 64, 128 cores on new processors.

But the developers do? When these processors become prevalent, people will design their software to utilise the parallel processing capability. What am I missing here?

Simon

Yeah, if you only run one program at a time.. by ruiner13 · 2007-01-30 04:39 · Score: 5, Insightful

What the author fails to take into account is that multi-core allows each program to effectively use a separate core to do its work, regardless of how it is programmed. All it takes is the OS to be smart enough to task each program to a free core, if available. The programs don't have to be specifically written to be multi-core aware as long as the OS is smart enough to send processes to the idle cores. The programs that need more power than one core can deliver will usually have the multi-core support built in, as many games are starting to do now that the technology is taking off.

Notice I took the high ground and didn't make the obligatory windows virus scan jokes... :)

--

today is spelling optional day.

Re:Compilers need to be better. by Rosco+P.+Coltrane · 2007-01-30 04:49 · Score: 2, Insightful

Don't be silly, it's not that simple: sure you can spread processes and threads across several cores, as opposed to using just one cpu to do it all, but what distributed computing is about is arranging the code in a single thread to take advantage of the presence of several cores. It's called parallelizing code, and it's an extremely tough branch of computer science.

Of course OSes can do load balancing on several cores with several processes, that's trivial... What's not is real parallel code.
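A small sketch of why this is hard, using made-up function names. The first loop has independent iterations and could be split across cores freely. The second carries a value from one iteration to the next, so cutting it in half naively gives the wrong answer.

```python
def squares(xs):
    # Each output depends only on its own input, so the iterations could be
    # handed to different cores in any order.
    return [x * x for x in xs]

def running_total(xs):
    # Each output depends on the previous one (a loop-carried dependency).
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out

def naive_split_running_total(xs):
    # What goes wrong if the two halves are computed independently, as if
    # each half had been given to its own core with no coordination.
    mid = len(xs) // 2
    return running_total(xs[:mid]) + running_total(xs[mid:])

data = [1, 2, 3, 4]
print(running_total(data))              # [1, 3, 6, 10]
print(naive_split_running_total(data))  # [1, 3, 3, 7]  (second half is wrong)
```

Parallel versions of the second loop do exist, but they require restructuring the algorithm, which is exactly the part that is not trivial.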

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash

Re:Architect by Overzeetop · 2007-01-30 05:08 · Score: 2, Insightful

Give it up. Protected titles ceased to be protected decades ago when industry decided that it didn't need to be regulated. Architect means nothing these days, nor does engineer or doctor. We can throw around RA and PE and MD all we want, but the common words will always be crapped on. Interestingly, accountant and lawyer seem pretty safe - I guess we just need to choose fields that nobody wants to be associated with if we want to keep our monikers pristine.

Overzeetop, PE

--
Is it just my observation, or are there way too many stupid people in the world?

Re:Concurrency in software by AKAImBatman · 2007-01-30 05:08 · Score: 2, Insightful

VLC would benefit greatly from concurrent SIMD cores like the Cell Broadband Engine. Depending on the particular compression stream, the cores could be decoding multiple frames of video/audio in parallel at a much faster rate than a regular PC. Even if the particular compression stream didn't allow for multiprocessing, the SIMD design would still allow for the decompression stream to be completed in a fraction of the time it would take general purpose instructions to perform. (The exact fraction is dependent on how large the bits of data are. It could be as little as twice as fast, or it could easily be four times as fast.)
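The "twice versus four times" figure falls out of simple lane counting. As a back-of-the-envelope sketch, assume a 128-bit SIMD register (my assumption for illustration) and ignore memory bandwidth and everything else that gets in the way in practice:

```python
REGISTER_BITS = 128  # assumed SIMD register width, for illustration only

def lanes(element_bits):
    """How many elements of the given size fit in one register."""
    return REGISTER_BITS // element_bits

for bits in (64, 32, 16, 8):
    # Idealized upper bound: one instruction handles `lanes` elements at once.
    print(f"{bits}-bit elements: up to {lanes(bits)}x per instruction")
```

Under that assumption, 64-bit values give two lanes and 32-bit values give four, and smaller elements would do better still.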

--
Javascript + Nintendo DSi = DSiCade

Re:Compilers need to be better. by xero314 · 2007-01-30 05:11 · Score: 2, Insightful

Attempts to parallelize the operations will give bad results.

I think what you meant was "attempts to parallelize the operations incorrectly may yield incorrect results."
The example that you had given above where you manually converted an algorithm from sequential to potentially parallel processed could easily be handled by a compiler. If your brain can handle the optimization so can a compiler given enough time. When writing in a higher level language (i.e. Not Assembly or Machine Code), like you used in your example, then you should be able to expect the compiler to handle those optimizations. Yes I realize all of this is in theory, but eventually reality has to catch up with theory if we expect to improve.

Re:rendered deadly slow? by Wesley+Felter · 2007-01-30 05:14 · Score: 3, Insightful

But software bloat increases faster than single-thread performance, thus making software run slower.

Re:Compilers by mabhatter654 · 2007-01-30 05:25 · Score: 2, Insightful

this is the best thread to reply to...

First issue... who says "most" programs CAN be recompiled? The first gen dual cores were basically duplicates of full processors, but as multicore becomes more popular, the cores will be more efficient and may start leaving out 100% compatibility in favor of sending threads to the better processor... that could save millions of gates per chip by tailoring some cores for FPU and some for SSE3 etc. This means in the future multicore processors won't automatically handle the old code more efficiently. In comment to your "multiple computer" comment, that's what happens with code that doesn't play nice NOW.. in the future, it may not be possible to have ALL the features of a full processor on ALL the cores.

In many companies they don't have access to code... sometimes the key parts are 20+ years old and the source physically lost.. very common in business/manufacturing. Sometimes it's not "profitable".. witness how long Adobe is taking to get a version for Intel macs... Sure it's JUST a recompile, but they don't WANT TO do it.. and normal users are legally not allowed.

The problem is not NEW programming languages, it's that much low-level stuff needs to be at least looked at and tested even if it's simply recompiled... that takes TIME and MONEY! If it's copyrighted software, there's nobody but the publisher that can legally do that! That means new versions with upgrade costs (and profit scalping). Like you said, forcing people to recompile usually makes them want to rewrite parts as well from being lost, misunderstood, or inefficient. That's a great time to bring in a new language to simplify things on one base of code and tools... On the other hand it's a great time to push Linux and OSS!!! After all, the code is open so there's nothing preventing somebody from doing the simple work of recompiling and testing on their own. (it still costs TIME, which isn't free, but at least it CAN be done).

Never, ever, not in a million years! by Phat_Tony · 2007-01-30 05:28 · Score: 2, Insightful

"We will never, ever return to single processor computers"

Does anyone think that's anything other than a stupid thing to say?

I mean, maybe we never will, and maybe it's really unlikely that we will anytime soon. But it seems that anytime there's a real revolutionary (rather than evolutionary) jump in processors, we may well go back to a single "core." For example, if they invented a fully optical processor that was insanely faster than anything in silicon, but they were very expensive to produce per core, and the price scaled linearly with the number of cores... sounds like we'd have single core computers around again for a while. And what about quantum computers? I don't even know what a "core" would be for a quantum computer, but are they by nature going to have a design that works on multiple problems simultaneously without being able to use that capacity to work on an individual problem faster? Even if that is the case, does the author know that, or are they just ignoring any possibility of non-silicon architectures?

Even within silicon, is it out of the realm of conceivability that someone will develop a radical new architecture that can use more transistors to make a single core faster such that it's competitive with using the same transistor count for multiple cores?

Considering how computers have spent a good 40 years continuously changing more quickly than any other technology in history, I'd be a bit more reserved in making sweeping generalizations about all possible future developments that might occur in the next forever.

Still, computer scientists seem to be in rough agreement that current software development models mostly don't produce programs that are multi-threaded enough to take optimal advantage of the current trend toward increased cores. Maybe it just sounds too boring when worded that way.

--
Can anyone tell me how to set my sig on Slashdot?

CPU not the bottleneck by Tablizer · 2007-01-30 05:28 · Score: 4, Insightful

Most apps get slow for these reasons:

1. Disk is slow
2. Network is slow
3. Junkware hogging CPU
4. Some prima donna process decided against my will that it wants to run a scan, Java RTE update, registry cleaning, etc., using up disk head movements, RAM, and CPU.

CPU is usually not the bottleneck except when other crap makes it the bottleneck.

--
Table-ized A.I.

It's the Language, stupid! by david.emery · 2007-01-30 05:45 · Score: 2, Insightful

I believe a big part of our problem is our piss-poor set of programming languages and their support for concurrency. C/C++ threads packages and Java's low level synchronization primitives make developing parallel/concurrent programs much more difficult than it should be. (Ada95/Ada05 gets it better, at least by raising the level of abstraction and supporting one approach to unifying concurrency synchronization, concurrency avoidance, and object-oriented programming.)

Additionally, there's the related problems of understanding concurrency. In the 80's and 90s in particular, there were a lot of fundamental research results in reasoning about concurrent systems. Nancy Lynch's work at MIT (http://theory.csail.mit.edu/tds/lynch-pubs.html) comes to my mind. I'm always dismayed at how little both new CS grads and practicing programmers know about distributed systems, and how poor their ability is collectively to reason about concurrency. It seems like most of the time when I say "race condition" or "deadlock", eyes glaze over and I have to go back and explain 'concurrency 101' to folks who I think should know this.
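For anyone whose eyes do glaze over at "race condition", here is a minimal sketch of one. The Counter class and run helper are hypothetical names. The unsafe version reads and then writes a shared value with no lock, so two threads can read the same old value and one update gets lost. Whether that actually happens on a given run depends on timing, which is what makes these bugs so hard to reproduce.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read-modify-write with no lock: another thread may run between the
        # read and the write, and its update is then overwritten.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock makes the read-modify-write a single indivisible step.
        with self._lock:
            self.value += 1

def run(method_name, threads=4, per_thread=10000):
    """Call the named Counter method from several threads; return the total."""
    counter = Counter()
    step = getattr(counter, method_name)

    def work():
        for _ in range(per_thread):
            step()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value

print(run("safe_increment"))    # always 40000
print(run("unsafe_increment"))  # at most 40000; may be lower on some runs
```

Note that the language gives no warning about the unsafe version. It runs, and most of the time it even prints the right number.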

Wasn't it Jim Gray (I sure hope he shows up safe and sound!) who coined the terms "Heisenbugs" and "Bohrbugs" to help describe concurrency and faults? (Wikipedia attributes this to Bruce Lindsay, http://en.wikipedia.org/wiki/Heisenbug) Not only is developing concurrent programs hard, debugging them is -really hard-, and our tools (starting with programming languages and emphasizing development tools/checkers), should be focused on substantially reducing or eliminating the need for debugging, or development effort will continue to grow.

Until we have more powerful tools -and training- (both academic and industrial) in using those tools, the Sapir-Whorf hypothesis (http://en.wikipedia.org/wiki/Sapir-Whorf_hypothesis) will apply: The lack of a language (programming language as well as 'spoken language') to talk about concurrency will make it nearly impossible for most programmers to develop concurrent programs. This applies to both MIMD and SIMD kinds of parallelism.

dave

"Build it and they will come" attitude by Animats · 2007-01-30 05:47 · Score: 4, Insightful

I've met some of the architects of the Cell processor, and they have a "build it and they will come" attitude. They've designed the computer; it's up to others to make it useful. This is probably not going to fly.

The Cell is a non-shared memory multiprocessor with quite limited memory per processor. There's only 256K per processor, which takes us back to before the 640K IBM PC. There are DMA channels to a bigger memory, but no caching. Architecturally, it's very retro; it's very similar to the NCube of the mid-1980s. It's not even superscalar. Cell processors are dumb RISC engines, like the old low-end MIPS machines. They clock fast, but not much gets done per clock.

Yes, you get lots of CPUs, but that may not help. On a server, what are you going to run in a Cell? Not your Java or Perl or Python server app; there's not enough memory. No way will an instance of Apache fit. You could put a copy of the TCP/IP stack in a Cell, but that's not where the CPU time goes in a web server. One IBM document suggests putting "XML acceleration" (i.e. XML parsing) in the server, but that's an answer looking for a problem. It might be useful for streaming video or audio; that's a pipelined process. If you need to compress or decompress or transcode or decrypt, the Cell might be useful. But for most web services, those jobs are done once, not during playout. Even MPEG4 compression might be too much for a Cell; you need at least two frames of storage, and it doesn't have enough memory for that.
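The two-frame claim is easy to check with rough numbers. The frame size and pixel format below are assumptions picked for illustration (standard-definition video, 8-bit YUV 4:2:0), not figures from the article or from IBM.

```python
LOCAL_STORE_BYTES = 256 * 1024   # 256K per processor, as stated above
WIDTH, HEIGHT = 720, 480         # assumed standard-definition frame
BYTES_PER_PIXEL = 1.5            # assumed 8-bit YUV 4:2:0 sampling

frame_bytes = int(WIDTH * HEIGHT * BYTES_PER_PIXEL)
two_frames = 2 * frame_bytes

print(frame_bytes)                               # 518400
print(two_frames)                                # 1036800
print(round(two_frames / LOCAL_STORE_BYTES, 2))  # 3.96
```

Under those assumptions a single frame is already about twice the local memory and two frames are about four times it, so the data would have to be streamed through in pieces over DMA.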

Now if they had, say, 16MB per CPU, it might be different.

The track record of non-shared memory supercomputers is terrible. There's a long history of dead ends, from the ILLIAC IV to the BBN Butterfly to the NCube to the Connection Machine. They're easy to design and build, but just not that useful for general purpose computing. Some volumetric simulation problems, like weather prediction, structural analysis, and fluid dynamics can be crammed into those machines, so there are jobs for them, but the applications are limited.

Shared-memory microprocessors look much more promising as general purpose computers. Having eight or sixteen CPUs in a shared-memory multicore configuration is quite useful. That's how SGI servers worked, and they had a good track record. Scaling up today's multicore shared-memory CPUs is repeating that idea, but smaller and cheaper.

At some point, you have to go to non-shared memory, but that doesn't have to happen until you hit maybe 16 CPUs sharing a few gigabytes of memory, which is about when the cache interconnects start to choke and speed of light lag to the far side of the RAM starts to hurt. That might even be pushed harder; there's been talk of 80 CPUs in a shared memory configuration. That's optimistic. But we know 16 will work; SGI had that years ago.

Then you go to a cluster on a chip, which is also well understood.

That's the near future. Not the Cell.

Re:Java does threading probably easiest of all by Pxtl · 2007-01-30 07:03 · Score: 2, Insightful

And Forth's manual stack-loading is practically 1:1 to the underlying OS too, why don't you use that? Garbage collection has nothing to do with the underlying OS, but we keep it around.

Mapping 1:1 to the underlying OS is not the be-all and end-all of linguistic constructs. Consider Actor-model languages, or dataflow-model languages - or the native rendezvous concepts from Ada. I'm not saying that any of these are ideal approaches (I hate Ada, for example) - I'm just saying that Algol-descended languages were designed to model procedures and formulas... so modelling concurrency doesn't come naturally to them.

Purely Functional Programming... by Tracy+Reed · 2007-01-30 08:39 · Score: 2, Insightful

...is the only way we are going to take advantage of multi-core CPUs and continue to improve our software. Only through purely functional code can you make guarantees about what can be executed simultaneously and let the machine sort it all out. I'm learning Haskell for this very reason.
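A rough sketch of the guarantee being described, written in Python rather than Haskell and using hypothetical names. A pure function gives the same results no matter what order the calls are evaluated in, so a runtime is free to reorder or overlap them. A function with hidden state does not.

```python
def pure_scale(x):
    # Result depends only on the argument.
    return x * 2

class ImpureScale:
    """Callable with hidden state: the result depends on how many calls came first."""
    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x * 2 + self.calls

def evaluate_in_order(fn, xs, order):
    """Apply fn to xs[i] for each i in `order`, storing results by position."""
    results = [None] * len(xs)
    for i in order:
        results[i] = fn(xs[i])
    return results

xs = [1, 2, 3, 4]
forward, backward = [0, 1, 2, 3], [3, 2, 1, 0]

print(evaluate_in_order(pure_scale, xs, forward))      # [2, 4, 6, 8]
print(evaluate_in_order(pure_scale, xs, backward))     # [2, 4, 6, 8]
print(evaluate_in_order(ImpureScale(), xs, forward))   # [3, 6, 9, 12]
print(evaluate_in_order(ImpureScale(), xs, backward))  # [6, 7, 8, 9]
```

In a purely functional language the compiler knows every function is of the first kind. In Python nothing enforces it, so the programmer has to keep that discipline by hand.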

Slashdot Mirror
