IBM's Chief Architect Says Software is at Dead End
j2xs writes "In an InformationWeek article entitled 'Where's the Software to Catch Up to Multicore Computing?' the Chief Architect at IBM gives some fairly compelling reasons why your favorite software will soon be rendered deadly slow because of new hardware architectures. Software, she says, just doesn't understand how to do work in parallel to take advantage of 16, 64, 128 cores on new processors. Intel just stated in an SD Times article that 100% of its server processors will be multicore by end of 2007. We will never, ever return to single processor computers. Architect Catherine Crawford goes on to discuss some of the ways developers can harness the 'tiny supercomputers' we'll all have soon, and some of the applications we can apply this brute force to."
owww... my head...
There are a couple of serious problems with this statement. The most important one is that the article doesn't say that existing software will get slower. And there's a reason for that: existing software will continue to run on the individual processor cores, something it has done for a long time. Old software may not get any faster due to a change in focus toward parallelism vs. increased core speed, but it's not going to suddenly come to a screeching halt any more than my DOS programs from 15 years ago have.
Secondly, multicore systems are not a problem. Software (especially server software!) has been written around multi-processing capabilities for a long time now. Chucking more cores into a single chip won't change that situation. So my J2EE server will happily scale on IBM's latest multicore Xenon PowerPC 64 processor.
Finally, what the article is really talking about is the difficulties in programming for the Cell architecture. Cell is, in effect, an entire supercomputer architecture shrunk to a single microprocessor. It has one PowerPC core that can do some heavy lifting, but its design counts on the programmers to code in 90%+ SIMD instructions to get the absolute fastest performance. By that, I mean that you need to write software that does the same transformation simultaneously across reasonably large datasets. (A simplification, but close enough for purposes of discussion.) What this means is that the Cell processor is the ultimate in Digital Signal Processors, achieving incredible throughput as long as the dataset is conducive to SIMD processing.
The "problem" the article refers to is that most programs are not targeted toward massive SIMD architectures, which means that Cell is just a pretty piece of silicon to most customers. Articles like this are trying to change that by convincing customers that they'd be better served by targeting Cell rather than a more general purpose architecture.
With that out of the way, here's my opinion: The Cell Broadband Engine Architecture is a specialized microprocessor design that is orthogonal to the general market's needs. It has a lot of potential uses in more specialized applications (many of which are mentioned in the article), but I don't think that companies are ready to throw away their investment in Java,
Javascript + Nintendo DSi = DSiCade
I see no reason why we would ever need anything more than 640 cores per processor in the future.
Small potatoes make the steak look bigger.
But the developers do? When these processors become prevalent, people will design their software to utilise the parallel processing capability. What am I missing here?
Simon
Worst. Summary. Ever. She's talking about technical computing. Regardless, software will evolve to take advantage of new hardware architectures - this has been proven by history. I'm not in the least worried. If this doesn't happen, then by definition the hardware architecture will be forced to change.
What the author fails to take into account is that multi-core allows each program to effectively use a separate core to do its work, regardless of how it is programmed. All it takes is the OS to be smart enough to task each program to a free core, if available. The programs don't have to be specifically written to be multi-core aware as long as the OS is smart enough to send processes to the idle cores. The programs that need more power than one core can deliver will usually have the multi-core support built in, as many games are starting to do now that the technology is taking off.
:)
Notice I took the high ground and didn't make the obligatory windows virus scan jokes...
today is spelling optional day.
If you look at single-thread performance on Intel and AMD's dual/quad core chips, they meet or beat the best that single-core has to offer. I don't see why a multi-core system in the future will run single-thread apps any slower than right now. If anything I'd expect single-thread performance to increase incrementally as Intel and AMD are able to increase clock speeds.
We just need a compiler to translate our code into threaded processes. Impossible? Think about the translation a C++ compiler does; by comparison this is a minimal task. Problem solved. Move along, nothing to see here.
Why do the individual cores have to become slower? Am I missing a logical step here?
Let's see, we have efficient and fast ALU/FPUs now. All of a sudden they'll become totally inefficient because we've gone quad-core?
Hey, biatch from IBM [or just poor reporter], shut your ignorant gob.
Tom
Someday, I'll have a real sig.
Has Netcraft confirmed this yet?
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
What do women know about computers? Someone tell her to go play Barbie.
The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
Herb Sutter wrote about this topic two years ago. A great read for anyone who is interested.
http://www.gotw.ca/publications/concurrency-ddj.htm
Enjoy,
It's just the normal noises in here.
IMHO, multi-cores are good for multitasking, which does not cover the whole problem of parallelism. Software (at least, in principle) _is_ ready: pure functional languages, for example, are perfectly suited for parallel processing; it is the lack of CPUs with architectures that support internal concurrency (using a single core - as opposed to those providing support for multi-threading using multiple cores) that is the problem...
IBM needs a new Chief Software Architect.
From this I get three questions in my head:
-Can compilers be improved to automatically use multiple cores and where are the limits of this?
-Multiple cores? Why not just treat it as multiple computers?
-Besides this, is there a solution to this in the form of new programming languages?
Surely part of it is true, but why not ask more money for the software if it needs to be rewritten to better handle new hardware and "security"? Five years later it seems it is still 32 bits and has DRM inside. Sound familiar? Len
Concurrency is a hard problem, and unexpected interactions between asynchronous events in concurrent environments have been a periodic bugbear for almost as long as computers have been interactive.
It's what made the Amiga look less reliable than its competitors... if you only ran one native program at a time it was a lot more stable than MacOS or MS-DOS, because the OS provided a much richer set of services so applications didn't have to replicate them... but most people took advantage of the multitasking and when something crashed in the background the lack of memory protection meant the whole thing went down, and non-native software that wasn't written with multitasking in mind could produce the most entertaining crashes.
These days we all have good protected mode multitasking operating systems, but we don't have good easy ways to distribute an application across multiple cores. Until we do, most applications are going to be written to run single-threaded and depend on the OS to use the other cores to speed up the rest of the system, both at the application level and doing things like running graphics libraries on another core.
Until we have so many cores that the OS can't make effective use of them I don't think there's even going to be much of an attempt to make use of them by most developers. And then we're going to go through a painful period like we went through before Microsoft discovered multitasking.
The argument that software will get slower assumes that most consumer software will continue to have additional CPU requirements without being coded for multi-core applications. This doesn't make sense. The average consumer uses an Office product, e-mail, and a browser. None of these use anywhere close to 100% of the CPU for very long even on a Pentium 3, let alone on a 2GHz+ core in a multi-core processor.
Workstation computing will suffer some until software vendors catch up, but this is already happening (e.g. most CAD, Animation, Video Processing are starting to come out with multi-core optimized software). Sure, some apps will continue to be single-threaded, but eventually, who would buy them? Software vendors aren't dumb.
Games will probably speed up significantly as well. Imagine the possibilities of having a game engine where each AI character utilizes 100% of a single core? Game designers aren't going to sit around designing games that run on single core engines; they always push the boundaries and will continue to do so.
Crack - Free with every butt and set of boobs
An issue is perhaps that we don't really have a programming language that really helps in thinking with multiple cores. Sure, there are constructs for threading and sharing memory and the like, but they usually boil down to os-specific function calls.
Take ANSI Pascal. How do you do OOP in Pascal with proper abstraction, inheritance, and polymorphism? Borland added extensions to the language and got around some of those issues. So did C by way of C++.
Thinking with multiple cores to solve problems isn't necessarily hard. Humans do parallel thinking all the time: walking, driving, dancing, etc. The old formats of linear code are just inadequate to really take advantage of lots and lots of cores.
That said, it can still be done. But it's like digging a canal with a bunch of sporks.
Why can't they shut up about computers and go back to
making buildings. You don't find coders telling
architects how to build skyscrapers.
A New Kind of Science. Converting a range of standard CS algorithms into Cellular Automata networks is the very solution our brains use; a combination of message passing and feedback loops. If we want our computers to scale in parallel, we might want to look at how biology has solved the problem. A lot of people laughed at Wolfram when he initially published that book. I think he might yet have the last laugh.
This sounds like a complaint that software has not yet caught up with hardware in their never-ending cycle where software expands to take up all available hardware resources.
Developers, start your eye-candy-izations!
Architects design buildings.
If engineers want to call themselves system architects, or chip architects, or something like that to try and pretend that they're somehow better than normal engineers, whatever. But don't refer to them with the title of Architect followed by their name. To design buildings and sign your name as a licensed architect has real legal implications, and a long and expensive process is required to get to that point.
It's very similar to the term Doctor, which you generally should not go around referring to yourself as unless you truly are a doctor. Just because architects aren't as well paid or respected as doctors in our culture doesn't mean it's ok to steal the title.
One time I threw a brick at a duck.
My native c++ apps will still run like the wind.
Everyone buying into Microsoft's huge, slow runtimes
(.NET where it isn't needed) will have to suffer.
Wait, this is /. I mean er yah! FreeBSD app runeth fast!
I rather feel hardware is at an end rather than software. It seems we've got to the point where stacking things on top of each other is the only way to keep improving things.
I like muppets.
Forgive my ignorance, but can't the OS just make each new app run on its own core? That would probably give us some overall apparent-speed-of-computer increases, without having to completely modify all existing stuff.
stuff |
If you read the article, you will see that Mrs. Crawford does not even come close to saying that "Software is at Dead End". She says software needs to catch up with the hardware.
Computers have more and more processors (and different kinds of processors, like GPUs), and currently most software isn't designed for that kind of environment. IBM has developed some clever ways to program these types of systems in a "general purpose" way.
That's the worst summary of a headline that I've ever read.
Concurrency is easy.
I wonder if I use bold in my signature, people will notice my posts.
Clearly this person hasn't heard of our lord and master (and commander) Tim O'Reilly. Web 2.0, with its centralised software-as-a-service paradigm will synergize the blah blah blah.
Ok, ok, buzz words aside (synergize!), with the move to web apps, desktop terminals, people browsing on gaming consoles, etc... I just don't see what the person is whinging about.
Ok, so hardware is moving towards parallelism (new buzzword? It's mine, and I've trademarked it, use it and I'll sue you!!). Software will need to be engineered differently to take advantage. This is nothing new. Old games don't take advantage of 3d cards. Once the hardware became relatively available, they started to. It's the same thing here.
But what I find most ironic is that further down the /. main page is an article about how businesses should move to dumb terminals for most users, implying server software, which implies software designed for parallelism. So I guess I don't see where there's some giant disconnect between software and hardware; there's just a shift coming up, and there'll be a "synching up" period. We've had these in the past and this won't be the last. Therefore, can we please put the rhetoric down, and step away from it, before someone loses an eye?
I knew IBM's Lotus Notes was a dead end years ago :)
The price is always right if someone else is paying.
IBM have invested heavily in multicore technology that is only effective for very parallel tasks.
However, not all tasks can be parallelized at all, and not all can be parallelized easily. This makes IBM's product unattractive to many potential customers.
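The usual way to put a number on "not all tasks can be parallelized" is Amdahl's law: if only a fraction p of the run time can be spread over N cores, the best possible speedup is 1 / ((1 - p) + p / N). A minimal Java sketch follows; the class name `Amdahl` and the method `speedup` are made up for illustration and come from nowhere in the article.

```java
public class Amdahl {
    // parallelFraction: share of the run time that can be spread over cores (0.0 to 1.0).
    // cores: number of cores available (at least 1).
    // Returns the upper bound on speedup given by Amdahl's law.
    public static double speedup(double parallelFraction, int cores) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || cores < 1) {
            throw new IllegalArgumentException("need 0 <= parallelFraction <= 1 and cores >= 1");
        }
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / cores);
    }

    public static void main(String[] args) {
        // Hypothetical workload that is 90% parallelizable.
        for (int cores : new int[] {1, 2, 16, 128}) {
            System.out.printf("90%% parallel, %d cores: %.2fx%n", cores, speedup(0.9, cores));
        }
    }
}
```

With 10% of the work stuck in serial code, the bound stays below 10x no matter how many cores are added, which is the point being made about unattractive returns for many customers.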
IBM is therefore also investing in people to go around telling prospective buyers that sure, if the developers just build 'multicore aware' programs, then multicore will help them a whole lot, yes sirree. And sure, there's an element of truth in that in most scenarios.
IBM's investment has been successful enough to appear on Slashdot.
The end.
Whence? Hence. Whither? Thither.
was an interesting article, particularly the part about the hybrid "roadrunner" architecture.
However what is more relevant to today's non-supercomputing needs is SMP scalability.
One of the challenges with SMP scalability is cache coherency; synchronizing the caches on the processors is a costly operation (this is necessary to ensure that each processor has the same view of certain memory at the same time), normally (always?) done with a cache invalidation.
So the more invalidations you do, the more often the processor has to fetch memory from main memory, and the less it's using its cache. Processing slows down dramatically.
I've tried to design the qore programming language http://qore.sourceforge.net/ to be scalable on SMP systems. The new version (released today) has some interesting optimizations that have resulted in a large performance boost on SMP machines - the optimizations involve reducing the number of cache invalidations to the minimum (more than just reducing locking, although that is a part of it too - even an atomic update - for example on Intel an assembly lock and increment - involves a cache invalidation and therefore is an expensive operation on SMP platforms). There is more work to be done, but in simple benchmarks of affected code paths the performance increase was between 2 and 3 times as fast with the optimizations on the same qore code.
Anyway it would be interesting to know if other high-level programming languages have also taken the same approach (or will do so); as we go forward, it's clear that SMP scalability will be an important topic for the future...
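To make the atomic-update point concrete, here is a rough Java sketch (not qore's implementation, which isn't shown here) of two ways to count events from several threads. `AtomicLong` makes every thread do an atomic update on one shared location, so that cache line bounces between cores; `LongAdder` from `java.util.concurrent.atomic` spreads updates over internal cells and only combines them when `sum()` is called. The class and method names (`CounterContention`, `runWorkers`, and so on) are invented for the example, and any actual speed difference depends on the hardware and the JVM.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class CounterContention {
    // Starts `threads` workers that each call `increment` `perThread` times,
    // then waits for all of them to finish.
    public static void runWorkers(int threads, int perThread, Runnable increment) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment.run();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    // Every increment is an atomic update on a single shared memory location.
    public static long countWithAtomic(int threads, int perThread) {
        AtomicLong shared = new AtomicLong();
        runWorkers(threads, perThread, shared::incrementAndGet);
        return shared.get();
    }

    // Increments are spread over internal cells; sum() adds the cells up at the end.
    public static long countWithAdder(int threads, int perThread) {
        LongAdder striped = new LongAdder();
        runWorkers(threads, perThread, striped::increment);
        return striped.sum();
    }
}
```

Both methods return the same total; the difference is only in how often cores have to fight over the same cache line while getting there.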
One of the first problems which Developers will need to master is Cache-Consistency. We are not being helped very much here. At the moment this is all very level/library specific. All the tricks for, say, elegantly sharing a session in a Hibernate based J2EE webapp are different for synching L3 cache structures between different cores. They are the same Cache-Consistency problem in principle.
Hibernate accomplishes a lot by giving in-code Hints [called AOP for some stupid reason]. These are instructions which are reasonably close in to where the code which deals with that data is located.
What is needed is a lot more consistent frameworks for describing what data should be shared between cores&servers. Some will be a priority and the rest can be fetched later. Similar to marking
to hint the compiler should use a cpu register to store the variable.
From the question 'Where's the Software to Catch Up to Multicore Computing?'. We need better consistency to describe to the computer what we'd like it to be doing, and we need a consistent system so we can learn it more easily when we're novices. Describing this data stuff cuts across many levels. Data design, language choice, IDE integration, build tools and then target deployment environments.
The human brain is not good at task parallelism, so advances in parallel computation MUST at some point
be driven at the compiler level. Try to remember 10 things at once. It's hard. Try writing a program with 600
threads. It's almost impossible.
This is where languages that have the concept of no side effects will come in and replace the concepts we have today.
Maybe we will all be programming in Haskell soon.
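Haskell enforces the no-side-effects rule; the same idea can be sketched in Java if you stick to it by hand. In the hypothetical example below, `square` reads and writes no shared state, so the runtime is free to split the range across cores and the answer cannot change. `PureParallel` and its methods are illustrative names only; `LongStream.parallel()` is the standard library call that asks for the work to be split up.

```java
import java.util.stream.LongStream;

public class PureParallel {
    // Pure function: the result depends only on the argument and no shared state is touched.
    public static long square(long n) {
        return n * n;
    }

    // Sums the squares of 1..upTo, either sequentially or split across worker threads.
    public static long sumOfSquares(long upTo, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, upTo);
        if (parallel) {
            range = range.parallel();
        }
        return range.map(PureParallel::square).sum();
    }
}
```

Because nothing in the loop body has a side effect, nobody has to reason about 600 threads: switching the flag changes how the work is scheduled, not what it computes.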
I think multiple cores are great. Now each of the spyware apps on my Windows box will have their own dedicated processor, and I can actually get something done on the ones that are freed up.
Bring it on!
--sugarman--
"We will never, ever return to single processor computers"
Does anyone think that's anything other than a stupid thing to say?
I mean, maybe we never will, and maybe it's really unlikely that we will anytime soon. But it seems that anytime there's a real revolutionary (rather than evolutionary) jump in processors, we may well go back to a single "core." For example, if they invented a fully optical processor that was insanely faster than anything in silicon, but they were very expensive to produce per core, and the price scaled linearly with the number of cores... sounds like we'd have single core computers around again for a while. And what about quantum computers? I don't even know what a "core" would be for a quantum computer, but are they by nature going to have a design that works on multiple problems simultaneously without being able to use that capacity to work on an individual problem faster? Even if that is the case, does the author know that, or are they just ignoring any possibility of non-silicon architectures?
Even within silicon, is it out of the realm of conceivability that someone will develop a radical new architecture that can use more transistors to make a single core faster such that it's competitive with using the same transistor count for multiple cores?
Considering how computers have spent a good 40 years continuously changing more quickly than any other technology in history, I'd be a bit more reserved in making sweeping generalizations about all possible future developments that might occur in the next forever.
Still, computer scientists seem to be in rough agreement that current software development models mostly don't produce programs that are multi-threaded enough to take optimal advantage of the current trend toward increased cores. Maybe it just sounds too boring when worded that way.
Can anyone tell me how to set my sig on Slashdot?
If I had a multi-core near-supercomputer, I'd stick it in my basement and use it to run VM'd or dumb terminals in every room in my house. I'd only use the full processing power for cracking and transcoding DRM'd video.
Most apps get slow for these reasons:
1. Disk is slow
2. Network is slow
3. Junkware hogging CPU
4. Some prima donna process decided against my will that it wants to run a scan, Java RTE update, registry cleaning, etc., using up disk head movements, RAM, and CPU.
CPU is usually not the bottleneck except when other crap makes it the bottleneck.
Table-ized A.I.
ring. . . Sorry. . .but I get sick of all those 'smart' people who think they can predict the future and evolution.
Turing imagined massively parallel machines, but the success of von Neumann architectures (hardware/software) has led to the current state of computers.
There are things that can only be achieved with massive parallel processing, but after 60+ years we are still trying to extend the current architecture to multiple (not massive) units.
I can understand the motivations and advantages but they are still fundamentally wrong.
What's in a sig?
The first person I met who was seriously looking at this question of how to design (and take advantage of) chips with lots of cores was Chuck Moore, the inventor of the Forth Language. I know he's been working on this general problem quite a while-and I'd tend to take what he has to say in this area _very_ seriously. Moore is an original thinker--and the "real deal".
Randall Burns
Why are $COMPANY and $PRODUCT both in the global namespace?
Or can only one company sell only one product?
I believe a big part of our problem is our piss-poor set of programming languages and their support for concurrency. C/C++ threads packages and Java's low level synchronization primitives make developing parallel/concurrent programs much more difficult than it should be. (Ada95/Ada05 gets it better, at least by raising the level of abstraction and supporting one approach to unifying concurrency synchronization, concurrency avoidance, and object-oriented programming.)
Additionally, there are the related problems of understanding concurrency. In the '80s and '90s in particular, there were a lot of fundamental research results in reasoning about concurrent systems. Nancy Lynch's work at MIT (http://theory.csail.mit.edu/tds/lynch-pubs.html) comes to my mind. I'm always dismayed at how little both new CS grads and practicing programmers know about distributed systems, and how poor their ability is collectively to reason about concurrency. It seems like most of the time when I say "race condition" or "deadlock", eyes glaze over and I have to go back and explain 'concurrency 101' to folks who I think should know this.
Wasn't it Jim Gray (I sure hope he shows up safe and sound!) who coined the terms "Heisenbugs" and "Bohrbugs" to help describe concurrency and faults? (Wikipedia attributes this to Bruce Lindsay, http://en.wikipedia.org/wiki/Heisenbug) Not only is developing concurrent programs hard, debugging them is -really hard-, and our tools (starting with programming languages and emphasizing development tools/checkers), should be focused on substantially reducing or eliminating the need for debugging, or development effort will continue to grow.
Until we have more powerful tools -and training- (both academic and industrial) in using those tools, the Sapir-Whorf hypothesis (http://en.wikipedia.org/wiki/Sapir-Whorf_hypothesis) will apply: The lack of a language (programming language as well as 'spoken language') to talk about concurrency will make it nearly impossible for most programmers to develop concurrent programs. This applies to both MIMD and SIMD kinds of parallelism.
dave
I've met some of the architects of the Cell processor, and they have a "build it and they will come" attitude. They've designed the computer; it's up to others to make it useful. This is probably not going to fly.
The Cell is a non-shared memory multiprocessor with quite limited memory per processor. There's only 256K per processor, which takes us back to before the 640K IBM PC. There are DMA channels to a bigger memory, but no caching. Architecturally, it's very retro; it's very similar to the NCube of the mid-1980s. It's not even superscalar. Cell processors are dumb RISC engines, like the old low-end MIPS machines. They clock fast, but not much gets done per clock.
Yes, you get lots of CPUs, but that may not help. On a server, what are you going to run in a Cell? Not your Java or Perl or Python server app; there's not enough memory. No way will an instance of Apache fit. You could put a copy of the TCP/IP stack in a Cell, but that's not where the CPU time goes in a web server. One IBM document suggests putting "XML acceleration" (i.e. XML parsing) in the server, but that's an answer looking for a problem. It might be useful for streaming video or audio; that's a pipelined process. If you need to compress or decompress or transcode or decrypt, the Cell might be useful. But for most web services, those jobs are done once, not during playout. Even MPEG4 compression might be too much for a Cell; you need at least two frames of storage, and it doesn't have enough memory for that.
Now if they had, say, 16MB per CPU, it might be different.
The track record of non-shared memory supercomputers is terrible. There's a long history of dead ends, from the ILLIAC IV to the BBN Butterfly to the NCube to the Connection Machine. They're easy to design and build, but just not that useful for general purpose computing. Some volumetric simulation problems, like weather prediction, structural analysis, and fluid dynamics can be crammed into those machines, so there are jobs for them, but the applications are limited.
Shared-memory microprocessors look much more promising as general purpose computers. Having eight or sixteen CPUs in a shared-memory multicore configuration is quite useful. That's how SGI servers worked, and they had a good track record. Scaling up today's multicore shared-memory CPUs is repeating that idea, but smaller and cheaper.
At some point, you have to go to non-shared memory, but that doesn't have to happen until you hit maybe 16 CPUs sharing a few gigabytes of memory, which is about when the cache interconnects start to choke and speed of light lag to the far side of the RAM starts to hurt. That might even be pushed harder; there's been talk of 80 CPUs in a shared memory configuration. That's optimistic. But we know 16 will work; SGI had that years ago.
Then you go to a cluster on a chip, which is also well understood.
That's the near future. Not the Cell.
build good compilers that support these options?
The Kruger Dunning explains most post on
Pick the right tool for the job.
I am TheRaven on Soylent News
"I work for IBM, and we've been doing multicore (well, multi-processor) for years. Buy IBM software"
-- Having a Creationist Museum is like having an Atheist place of worship
Berkeley tech report (inc. Patterson as author)
Brief summary (I heard the same talk when he spoke at PARC), computational problems are divisible into one of thirteen categories that range from matrix multiplication to finite state automata. Most existing research (academia and industry) into parallelism tends to focus on about seven of those categories that are most easily parallelized - think supercomputer cluster. Most apps that you or I use fall into the graph traversal or finite-state categories (think compilers, apps with an event loop, etc.), into which there is essentially no research. Patterson even suspects that finite state machines are inherently serial and CANNOT be parallelized.
So ... the apps that we already use can't really get faster on parallel cores without major, fundamental advances in computer science that don't seem to be approaching. Which means we'll be using our current apps for a LONG time.
Additional note: IBM (and other chip manufacturers) have a vested interest in telling everyone that parallelism is the future. They can't make faster chips anymore, they can only compete on sheer number of cores.
A witty [sig] proves nothing. --Voltaire
that mainframes had the right idea 25+ years ago.
Perl 6 (as it is designed) introduces a new concept of "junctions", which are a bit like arrays, but can be used in clever ways. One useful way is:
if ($fruit eq ("apple"|"orange"|"pear")) {
    print "sweet!";
}
But another intriguing way of using the junction will be parallel loops -
for ("apple"|"orange"|"pear") {
    eat($_);
}
... and instead of doing three consecutive eat's -
eat("apple");
eat("orange");
eat("pear");
the interpreter will run 3 threads, each with the eat() function.
As with the whole of Perl 6, this design is not finalized. Maybe it won't be like that at all. And of course threading is all non-safe and stuff.
But having threads in a vanilla for loop, instead of setting up thread with clever functions, modules, etc. is something new. If it will happen and unsuspecting programmers will just use it, hey - that'd be something special.
This comment does not exactly apply to the question put forth about performance of existing apps under multiple cores. However, I would like to bring up that, in my opinion, given my experience with artificial Neural networks and related work, that I expect, in some form or another, that it is likely that one could fairly easily argue:
1) The number of cores is going to increase
2) The current concept of an artificial neuron having some sort of value, with weights attributed to it, is too simple for how our human brains really work. It needs more than a simple value and one algorithm, so it will likely be replaced with a more complex model of values and algorithms, and the work on that requires a mini-process or in this case "a core"
I expect that given that there will be an increased number of cores, probably with an increase similar to hard disc, processor, or memory increases of the past (one 10MB hard disc increasing to 500GB today), that we will have thousands or even hundreds of thousands of cores.
As we learn more about how the brain works I believe that 2) will be accepted as true at some point.
So I expect that more and more new software will attempt to be more intuitive, as more and more people begin to agree that the software we have now in general is crap, in that it doesn't help the layman as much as it could do their jobs.
This intuitiveness will likely be in the form of artificial Neural Nets, paving the way for computing systems to begin to act like the science fiction computer systems we think of in "the future".
Just my two-cents guess...
The main issue with threaded programming is creating reliable programs. With multiple cores, you get a much higher level of parallelism than we have seen before.
There is a great (though depressing) paper that discusses the challenges:
http://www.ddj.com/dept/64bit/196901362?cid=RSSfeed_DDJ_All -- a very brief summary/review
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf -- full paper
splitting a task in a way that works well in parallel is a problem whether you're using threads or concurrency though, so i wouldn't say either method is particularly easy for someone who has a background in designing more linear solutions
Java's threading is practically 1:1 to the OS. It's incredibly easy to use.
new Thread() { public void run() { /* go nuts */ } }.start();
...built from ground with multiprocessors in mind, like... BeOS.
For those who have not tested it, it was really responsive, more responsive on my old PPC604 PowerMac-7300 than current OSes are on recent hyper-powerful hardware with mountains of memory.
So, maybe it's a known design problem for OS designers, and at least partially resolved.
Note: I'm not talking about thousands of cores; for me those are still for number crunching, video production... For normal use, a responsive OS like BeOS was, even on a single core chip, is more efficient in my daily tasks than recent OSes with too many layers and software history support [I use Linux at home, WinXP at work].
L.Pointal
Hardware has two purposes:
To the hardware vendors, the purpose of hardware is only to generate revenue. They will say or do just about anything to try to convince the rest of us to buy their product.
To most companies, and most individuals, the purpose of hardware is to run the software. Nobody uses a raw computer. You use a word processor, or a spreadsheet, or a database, or some other application program. The hardware is just there to run the applications.
I don't need a gazillion-gigahertz kilo-processor box on my desk. My company might need it in specialized units to run specific business or scientific applications. (It might even let us use fewer boxes to run the highly threaded apps that have been the norm in my business for the last 20+ years.)
You lost me at 'IBM Architect' :-)))
an IBM anonymous coward
I guess the competition is working on a Coyote project?
I think the Acme Corporation got prime on that contract. Their past performance was a bit sketchy, but they were the low bid...
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
I have a question.
Instead of, say, using a computer with a dual-core processor, why not use a computer with a single-core processor that is twice as fast? Is it because one double-speed CPU would cost far more than a dual-core CPU?
Thanks.
Um, editors? She "is chief architect for next-generation systems software at IBM Systems Group's Quasar Design Center". (FTFA) She's not "IBM's Chief Architect". I don't think IBM even has a Chief Architect.
The problem the layman has with understanding computers
is the language. We need more words to describe software.
In this case Source code would better take advantage of
a new architecture. The problem is the old "commodity ware"
that so many people run on PC's. Contrast that with the portability
of UNIX. An operating system that has been ported to (for me) countless
hardware architectures. The advantage UNIX has is that its portability is based on
source code. Not binaries compiled for a single-threaded Intel architecture.
The issue is not that software will not run well on many cores.
It is that the old binaries will not run well.
Many new software architectures (e.g. Java?, Smalltalk?, Perl 6?)
could/should run on many cores without programmers doing anything special.
http://en.wikipedia.org/wiki/MapReduce provides a nice model of parallel computation to be exploited: in a multicore chip, you can treat each core as a node with a very reliable network and fast I/O.
Don't like MapReduce and want something more... heterogeneous? You could use PVM: http://www.csm.ornl.gov/pvm/ ; which later became MPI: http://www.hlrs.de/organization/par/services/models/mpi/ or a nice collection of learning materials at: http://www.hlrs.de/organization/par/par_prog_ws/ .
Want something from Sun? OpenMP: http://developers.sun.com/sunstudio/articles/openmp/openmp_content.html
I think that we have the software to do it, already, yes.
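To make the MapReduce point concrete, here is a small sketch of the same shape of computation on a single multicore box. It is not Google's MapReduce or any cluster framework; it uses parallel streams from later Java versions (8 and up), and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical word-count sketch: the "map" step splits lines into words,
// the "reduce" step groups identical words and counts them.
final class WordCountSketch {

    static Map<String, Long> countWords(String[] lines) {
        return Arrays.stream(lines)
                .parallel() // let the runtime spread the work over the available cores
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+"))) // map: line -> words
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting())); // reduce
    }

    public static void main(String[] args) {
        System.out.println(countWords(new String[] {"to be or not", "to be"}));
    }
}
```

The caller never mentions threads. The runtime decides how to split the input, which is roughly the appeal of the model described above.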
No, but thanks for the link. I've been pretty much out of the C world since about 1998 - I code almost exclusively in Perl (5), which explains my Perl bias.
OpenMP seems nice, but the charm of the Perl 6 junctions idea (if and when it is implemented; I haven't tried the latest Pugs build yet) is that the programmer doesn't even know that he's writing multi-threaded code. And that's something.
So, I wrote a little app that uses the Accelerate library on OSX. Basically, I use it for running convolutions on images that are coming from a webcam - blur, laplacian, etc.
The task itself is highly parallelizable, so the library actually does create multiple threads and use image tiling to get me better performance when multiple cores are available. On machines where there's just one core, it doesn't use threading. However, they join all the threads before returning from the API call, so my program works identically either way.
Also, I wrote the app on a PPC, but it still ran very nicely on Intel machines through Rosetta, because it was just a call to the underlying library.
Powered by Web3.5 RC 2
I see some major problems with multi-core processors for home users who only really use their computers for internet browsing, email, and standard productivity. They basically don't need it. Unless they're playing intense games (which would have to be programmed to use multiple cores), or video editing, or doing software development... I just don't see it being as useful as everyone hoped... especially if the software itself is still only utilizing one core, or one thread at a time.
In the business world, I see people moving towards blade servers and dumb terminals, and so truly, the servers are the only ones who really need the multiple cores. Yet again though, the software needs to be written to make use of multi-threading and multiple cores.
Relocating to San Francisco / Palo Alto... Hire me?
Any server side app running in a remotely decent jee application server with Stateless beans for example will scale unbelievably well onto multi-proc systems.
any application with a handler thread per incoming connection request will benefit from multi-proc systems.
the beauty of things like jee application servers is that developers design applications as if everything was single threaded, then u flick a couple switches and all of a sudden ur app is replicated and running on a cluster of 100 machines....now consolidate that into one machine with 100 procs and ull see even more performance enhancements.
DB2 is another example of an app that scales amazingly on multi-proc machines.
designing multi-threaded applications really isnt as hard as most people make it out to be. everything in the real world is "multi-threaded". when u move ur left arm up, u can move ur right arm down. if u have a robot with 4 cores....spawn 4 threads, one for each limb....if u need to multiply 4 matrices....spawn 2 threads....each thread multiplies two matrices....join the threads.....then multiply the results. u just need to get creative in order to start taking advantage of multi-proc machines.
and when i think about it...i dont think theres a single app i could name that uses only one thread.
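The four-matrix example above can be sketched directly. This is only an illustration under simple assumptions (small dense matrices of long values, dimensions that line up, no overflow checks); the class and method names are hypothetical.

```java
// Sketch of "spawn 2 threads, each multiplies two matrices, join, multiply the results".
// It relies on matrix multiplication being associative: A*B*C*D == (A*B)*(C*D).
final class MatrixChainSketch {

    // Plain sequential matrix product.
    static long[][] multiply(long[][] a, long[][] b) {
        int rows = a.length;
        int inner = b.length;
        int cols = b[0].length;
        long[][] product = new long[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < inner; k++) {
                    product[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return product;
    }

    static long[][] multiplyFour(long[][] a, long[][] b, long[][] c, long[][] d) {
        final long[][][] partial = new long[2][][]; // one slot per worker, no sharing
        Thread left = new Thread(() -> partial[0] = multiply(a, b));
        Thread right = new Thread(() -> partial[1] = multiply(c, d));
        left.start();
        right.start();
        try {
            left.join();
            right.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        // The final product is inherently sequential: it needs both halves.
        return multiply(partial[0], partial[1]);
    }
}
```

Note that the last multiply cannot start until both threads are done, so even in this friendly case only two of the three products run in parallel.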
Can we use MPI http://www-unix.mcs.anl.gov/mpi/ on these new multicore processor architectures?
Finally, the end of painful Context Switching in multi-threaded environments. I can hardly wait!
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
When we went to Instruction Parallelism, first in mainframes such as the IBM 360/91 and 195, and later in microprocessors such as the Intel Pentium, the programmer didn't have to hand-code the parallel instructions. Yes you could hand-tune code to better run down the U & V pipes together in the Pentium, and compilers could and did optimize for it. But the hardware itself took the instruction stream as given to it, and dispatched a second instruction in parallel each time it determined that the second instruction met the criteria to run in parallel. Everything that's followed is simple refinement of this principle.
Transmeta did the same thing with instruction sets, translating them on the fly to match the underlying hardware in a special Code Morphing[tm] approach.
The next challenge for hardware is for the hardware itself to assign tasks to processing cores automatically. Look for context switches and instead pass them along to available processing resources. Do it once in the hardware, and all the software to follow will benefit from it.
I'm waiting...
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
...is the only way we are going to take advantage of multi-core CPUs and continue to improve our software. Only through purely functional code can you make guarantees about what can be executed simultaneously and let the machine sort it all out. I'm learning Haskell for this very reason.
Some tasks can't be done in parallel, and this is the Achilles heel of massively parallel architectures. See for instance http://en.wikipedia.org/wiki/Amdahl's_law . No parallel hardware and no parallel algorithms - no matter how clever - will help you if you have a task of a sequential nature (and they will help only somewhat, no matter how massively parallel the hardware is, if your task is partly sequential).
If one truckdriver can drive a truck 50 miles in one hour, how far can two drivers then drive the truck in the same amount of time?
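Amdahl's law puts a number on this. If a fraction p of the work can be spread over n cores and the rest is strictly sequential, the best possible speedup is 1 / ((1 - p) + p / n). A minimal sketch, with a made-up class name and an assumed 90% parallel fraction in the demo:

```java
// Amdahl's law: speedup = 1 / ((1 - p) + p / n)
//   p = fraction of the work that can run in parallel (0.0 to 1.0)
//   n = number of cores
final class AmdahlSketch {

    static double speedup(double parallelFraction, int cores) {
        double sequentialPart = 1.0 - parallelFraction;
        double parallelPart = parallelFraction / cores;
        return 1.0 / (sequentialPart + parallelPart);
    }

    public static void main(String[] args) {
        // Assumed workload for illustration: 90% parallel, 10% sequential.
        for (int cores : new int[] {2, 16, 128}) {
            System.out.println(cores + " cores: " + speedup(0.9, cores));
        }
    }
}
```

The truck is the p = 0 case: speedup(0.0, 2) is exactly 1, so the second driver buys nothing. With p = 0.9 the speedup can never reach 10, however many cores you add, because the sequential tenth always has to run by itself.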
From TFA: "Digital animation. Massive supercomputing power will let movie makers create characters and scenarios so realistic that the line will be blurred between animated and live-action movies."
Didn't they already do that with Jar-jar?
Slashdot: news for Apple. Stuff that Apple.
But software bloat increases faster than single-thread performance, thus making software run slower.
But if your application can run in its own thread the ongoing software bloat of the underlying OS won't affect its internal performance. (And the OS vendor may bloat it, but count on him to use enough multithreading to keep it out from underfoot - even if it chews up most of the processors in the multicore cluster.)
As long as you or your software vendor doesn't add inline bloating with the upgrades the application's performance will track that of the individual processors in the cluster - which will continue improving, though more gradually.
(If your vendor can't keep his engineers' hands off the critical code's performance his product is toast. Start looking for a replacement.)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
The article seems to be specifically addressing supercomputer-type parallel applications used in narrow industry niches. Not much new there unless you're the guy writing compilers for nuclear weapons simulations. Clearly general purpose business computing is not going to be rendered any more obsolete than the MVS based apps you have running on your z/OS mainframes today. And clearly, customers are not going to junk big multi-CPU PowerPC based AIX systems so quickly. Hell, we BARELY have our databases running right today. You think we're going to tear them down? Nope.
But in terms of special vertical systems, hell yes they're always looking for cheaper faster ways to run them. Anytime you can remove hardware and make it cheaper, faster, cooler and more reliable is a plus. Is a quad-core CPU roughly equal to a pair of dual-processor single-core machines running in a cluster? I hope so. And now I can remove a lot of network complexity. The problem is rewriting that special application to do that.
The new multicore systems are good for everyone - regardless of whether your application can take advantage of multiple cores or not. The new technology is certainly no reason to throw away existing code or skills.
You can run multiple processes (applications) on different processors - and I know for a fact that this is how Sun multiprocessor systems operate. For example, if you are using a Sun 420 (4 processors) - it can run 4 applications at 100% CPU utilization - which means you can get 4 times the work done that an equivalent single processor machine would normally be able to accomplish (+/- overhead for shared memory management). The operating system handles this, and schedules which processes get the various CPUs. I would imagine this is how Linux support also works (but don't quote me because I don't know for sure - someone in the audience can pipe in here).
So, there is no downside, as far as I am concerned. For your average end user of a home (laptop) system, having a dual core 64 bit system will allow him to run his favorite game on one processor, and TeamSpeak voice comms on the other - spreading out the work between the two processors for more efficient overall experience on the machine. For business users, having multicore machines is already making their operations more efficient, even without special software to make one application take advantage of multiple processors.
Eventually someone will build a module to allow developers to more easily generate applications that take advantage of the new capabilities on multiprocessor systems. There are only two arenas where I see this worth the effort: Video Gaming/Rendering (rendering and network bottlenecks abound - hence moves to offload CPU intensive algorithms to the video card to get every ounce of performance), and Super Computing/AI (where aggregation and analysis of large datasets is costly); everything else I can think of will work fine without the need to split the threads of execution between processors.
Just because you can do a thing doesn't mean you should rush out and do it.
Lodragan Draoidh
The more you explain it, the more I don't understand it. - Mark Twain
Funny how Erlang programmers are chuckling to themselves right now, isn't it?
Maybe IBM's chief software architect should look into parallelised languages before she declares parallelized architectures to be a roadblock. (And, before you chumps start booing and hissing from the bleachers about she's the chief software architect at one of the world's largest technical firms, well, let's just remember there's about a 1/8 chance globally that your phone call home to ask for money from Mom is running over Erlang.)
StoneCypher is Full of BS
I'm anchoring on your post because you've hinted at what I think this whole discussion is missing: there are several reasons to run more than one sequence of instructions simultaneously, and some of the problems are much easier than others.
For one thing, there is a huge difference between dividing a task into independent parts that can run in parallel, possibly combining the results afterwards, and running multiple concurrent, interacting threads. There is also a huge difference between explicit parallelism (a request comes into the server, and it fires off another thread to handle it) and implicit parallelism (I've got this expensive mathematical algorithm represented sequentially, and the compiler is going to spot opportunities to run parts of it in parallel and then combine the results).
Of these, explicit parallelism is easy, but both implicit parallelism and explicit concurrency are hard. (Implicit concurrency doesn't really make sense with these definitions.)
You described pure functional languages as "ideally suited for parallel processing". In a sense, yes, because of their underlying model they lend themselves to parallel processing. However, explicit parallelism is easy anyway, and sadly current research suggests that the scope for implicit parallelisation of algorithms is relatively limited in many applications. I'm not sure the big advantages coming from the functional programming world are because of this, though of course if someone does find a way to cost-effectively run algorithms in parallel for relatively short computations that might all change.
What is really interesting to me in the functional programming world isn't the "pure" concept of having no side effects, but rather the type systems that represent side effects explicitly, of which perhaps the best-known example is the monadic I/O concept in Haskell. See Simon Peyton-Jones's excellent paper Tackling the Awkward Squad for background on this and related areas.
My personal take is that this is too cumbersome to use in everyday programming as it currently stands. Indeed, the same view was expressed by several senior Microsoft programming language architects in a recent interview. IIRC, they gave the example that if you wanted expr1+expr2 for non-trivial expressions, you shouldn't always have to compute expr1 and expr2 and then combine the results in an explicit order just to satisfy the language's composition rules.
However, the principle of tracking side-effects and ordering them explicitly when it matters seems very sound. Ultimately, anything your program does matters only to the extent that it influences observable side-effects, and any internal computations can be arbitrarily reordered and parallelised as long as the side effects still come out the same.
From here, we can make the big jump to concepts like transactional memory, where related side effects that affect shared memory areas are grouped into database-style transactions, and the run-time framework ensures that either all related side-effects take effect together, or none do at all, as observed from any other thread that shares the memory space. I'm not really doing the concept justice with this summary: it is a vastly better approach to shared state than the classical thread-locking mechanisms, in expressive power, in safety, and in composability. The name of Simon Peyton-Jones appears often in the literature here as well, and I thoroughly recommend the work he and his colleagues have been doing to anyone who's interested in how we might deal with today's concurrency problems when our programming tools have grown up.
Similarly, you can start thinking of concurrent threads as sequences of actions whose side-effects may be arbitrarily ordered by default, and inter-thread communication as a mechanism to impose ordering where it is required.
The sorts of prototype implementations flying
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
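Java has no built-in transactional memory, so the following is not STM. It is a much smaller cousin of the idea described above: two related values are kept in one immutable snapshot, and an update either replaces the whole snapshot or is retried, so no other thread ever sees half of a transfer. All names here are hypothetical; the only library call is AtomicReference.compareAndSet.

```java
import java.util.concurrent.atomic.AtomicReference;

// Optimistic "all or nothing" update of two balances, as a stand-in for a
// memory transaction. Not a real STM: it covers one reference only.
final class AtomicTransferSketch {

    // Immutable snapshot of both balances; replaced as a unit.
    static final class Balances {
        final long from;
        final long to;

        Balances(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    private final AtomicReference<Balances> state;

    AtomicTransferSketch(long from, long to) {
        state = new AtomicReference<>(new Balances(from, to));
    }

    // Returns false (and changes nothing) if the source balance is too low.
    boolean transfer(long amount) {
        while (true) {
            Balances seen = state.get();
            if (seen.from < amount) {
                return false;
            }
            Balances next = new Balances(seen.from - amount, seen.to + amount);
            if (state.compareAndSet(seen, next)) {
                return true; // committed: both fields changed together
            }
            // Another thread committed first; re-read and retry.
        }
    }

    Balances snapshot() {
        return state.get();
    }

    // Runs several threads that each attempt one-unit transfers, then reports the result.
    static Balances demo(int threads, int transfersPerThread, long initial) {
        AtomicTransferSketch account = new AtomicTransferSketch(initial, 0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < transfersPerThread; j++) {
                    account.transfer(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return account.snapshot();
    }
}
```

There are no locks in the calling code, and the total of the two balances is preserved no matter how the threads interleave. What real transactional memory adds is the ability to compose such updates across many separate memory locations.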
"In an InformationWeek article entitled 'Where's the Software to Catch Up to Multicore Computing?' the Chief Architect at IBM gives some fairly compelling reasons why your favorite software will soon be rendered deadly slow because of new hardware architectures."
If the article says that, then how come Opera can't find the words "slow" or "deadly", much less "deadly slow" on that page?
"I don't know how to program in a functional language, I don't want to learn to program in a functional language, and I'll do my damndest best to make sure nobody starts using functional languages, even if it would be for the best."
Only physicians with a doctorate (e.g. a PhD or MD) should be called doctors. This applies to any other profession. No difference.
they are called threads ...
http://en.wikibooks.org/wiki/Ada_Programming/Task
Martin
In Java threading is a library feature controlled by the Thread class, in Ada threading is a first class language feature:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking
But more importantly: Ada uses rendezvous instead of semaphores which are a lot easier to handle and more elegant to look at.
Martin
I know I repeat myself:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking ;-).
Programming languages with built-in multitasking have been around for a long time. They just haven't made it into the "top ten". Mind you: Ada has a firm place in the top twenty.
Martin
Where the HELL did that mountain come from???
Coz eternity my friend, is a long *ing time.
An engineer, a chemist and a mathematician are staying in three adjoining cabins at an old motel. First the engineer's coffee maker catches fire. He smells the smoke, wakes up, unplugs the coffee maker, throws it out the window, and goes back to sleep. Later that night, the chemist smells smoke too. He wakes up and sees that a cigarette butt has set the trash can on fire. He says to himself, "Hmm, How does one put out a fire? One can reduce the temperature of the fuel below the flash point, isolate the burning material from oxygen, or both. This could be accomplished by applying water." So he picks up the trash can, puts it in the shower stall, turns on the water, and when the fire is out, goes back to sleep. The mathematician of course has been watching all this out the window. So later, when he finds that his pipe ashes have set the bedsheet on fire, he is not in the least taken aback. He says: "Aha! A solution exists!" and goes back to sleep.
:)
The existence of a solution doesn't mean the problem doesn't exist, and doesn't mean the problem isn't hard. The reason there are lots of tools to manage concurrency is not because concurrency is easy, it's because concurrency is hard. And so is getting people to use the tools. If it wasn't, we'd all be using Smalltalk or Lisp - derived languages, rather than a dozen languages modelled on 'C'.
Going back to my example of the Amiga, protected memory (as another poster has pointed out) is a tool that Commodore could have used to manage concurrency better in AmigaDOS. It was a well-known tool, but Commodore was unwilling to accept the additional cost of upgrading from the 68000 to the 68010 (used, for example, in the contemporary AT&T 3b1) and the accompanying performance impact.
Existing software is another complication that will lead to fine-grained concurrency having a bigger impact than it perhaps should. Again, the Amiga is a great example: Commodore did provide a hook to allow applications to live in protected memory and only use shared memory for access to shared resources (memory allocated without MEMF_PUBLIC set was intended for protected memory in later models), but unfortunately too many programmers left that flag off in allocating structures that had to be shared, or allocated them on the stack, so implementing PRIVATE memory would have broken too much software for them to follow through.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
This was a good focus group (oh, you didn't realize?) on the state of software as it relates to multicore architectures. I'm going to try and summarize what has got to be apparent to most people reading all these threads.
a) Education in the software development community must continue. Witness Intel's unprecedented outreach to global universities upon the launch of what I call "the multicore arms race" last year. Whether you are in the "j2xs is stupid" camp or the "j2xs is a crazy alarmist, but has a point" camp -- you can't deny the total disarray of responses. Guess what Mr(s) CIO, these are the people coding the software running your business. Take note here and tell me there is no way icebergs can bring down Titanic (good movie... I cried...). Look here and tell me the dot-net hockey stick growth curve will never end (uh...oops) -- meaning 'the one with the loudest mouth is NOT right... (s)he is just loud, man.'
b) I was not commenting on the 3% of developers that make up concurrency, grid computing, gaming (hard stuff!), most infrastructure ISV's (can invest in PhD's), etc... I was commenting on the 98% of the rest of us who code business software in Global 2000 organizations. I was commenting on the 200,000,000 lines of COBOL (that I helped write while with Accenture) powering corporations that don't have a single hint of parallelism in them. Last 5 years of Perl code? Nope, no parallelism. Last 5 years of Python? Nope, no parallelism. A significant portion of consultants on large projects in the 90's were actually MBA's who learned to program 3 months prior... wonder if that's still the case? Wonder if 98% of consultants have cracked the book on concurrent programming, OpenMP, MPI or Haskell yet?
Fully agree the J2EE, ESB camps are more or less immune due to bean pooling and threading support; however, each EJB cannot run parallel code (container won't allow) and there are many 20 second bean executions out there that could run in 8 seconds. Ever run Foo.Sort()? Not parallel; should be. Most likely an edge case?
c) Yes, if each core in each CPU continues to double in clock-speed every year, then this post will go down in the history books as truly "from da-end-of-da-world desk". But it's not me saying that clock speeds likely won't double every year (until breakthrough in physics occurs), it's the chip vendors themselves.
d) Yes, if your server has 40 running processes after boot up, then your apps will _seem_ to speed up because they have their own little "core kingdom" to use. My point wasn't 2007... it was 2010 when you have 64+ cores and the clock speed is still not increasing. 128+ cores and still no clock speed increase. Hmmm... those non-OLTP business apps don't _seem_ to be speeding up anymore do they? Yet you are paying through the nose for new processor technology. Your CFO will be asking about Return on Assets (ROA) soon...
In closing, I would like to address the small fraction of you that couldn't find "deadly slow" in the IBM article. I had put "(my words)" in my post to slashdot... for some reason that phrase went MIA when the post hit the air. The title, in retrospect, should have included a question mark on the end (in true passive-aggressive Internet behavior).
-j2xs
Java To Excess
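On the Foo.Sort() remark: Foo.Sort() is a placeholder in the post above, so the following is only an analogy. Later versions of the Java standard library (8 and up) did add a parallel variant of the plain array sort, Arrays.parallelSort, which shows what "should be parallel" can look like from the caller's side. The wrapper class and method name are made up.

```java
import java.util.Arrays;

// Hypothetical wrapper around the standard library's parallel array sort.
final class ParallelSortSketch {

    // Returns a sorted copy and leaves the caller's array untouched.
    static int[] sortedCopy(int[] input) {
        int[] copy = Arrays.copyOf(input, input.length);
        // Same contract as Arrays.sort(int[]); the library may split large
        // arrays across cores and may sort small ones sequentially.
        Arrays.parallelSort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedCopy(new int[] {5, 3, 9, 1})));
    }
}
```

The calling code is no harder to write than the sequential version, which is the point: the parallelism lives in the library, not in the business code.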
Hardware parallelism can be beneficial without having to rewrite every application. In some cases, the OS can take advantage of multiple CPUs to make single-threaded applications run faster. In my current job, I run single-threaded tasks in a batch environment. Our system allows us to run one task per core, thus taking advantage of a multi-core environment.
No, I will not work for your startup
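The "one task per core" batch setup described above runs separate processes under a batch scheduler. As a rough in-process analogy, and only that, here is a sketch that runs independent single-threaded tasks on a thread pool sized to the number of cores. The class and method names are hypothetical; the ExecutorService calls are from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Runs independent tasks with at most one running task per core.
final class BatchPerCoreSketch {

    static List<Long> runAll(List<Callable<Long>> tasks) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Long> results = new ArrayList<>();
            // invokeAll blocks until every task is done and keeps the input order.
            for (Future<Long> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task stays single-threaded and knows nothing about the others, so nothing has to be rewritten. The throughput gain comes entirely from running several of them at once.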
This is retarded. Claiming that software will be "clunky" or "sluggish" because hardware vendors decided to make an inefficient hardware design to cope with the fact that they can't or won't innovate is rather like saying that cars without engines don't run well because of the tires. Multi-core will have crappy software because multi-core is essentially a "hack" to get over the fact that silicon can't supply the density needed for high performance applications. Blaming software developers for the hardware industry's crappy hack is pretty pathetic. Instead of blaming software for their crappy designs, maybe they should work on moving away from silicon into spintronics or any of the other promising possible hardware advances.
If you read the article (http://www.informationweek.com/news/showArticle.jhtml?articleID=197001130/), you will see that Mrs. Crawford does not even come close to saying
that "Software is at Dead End". She says software needs to catch up with the hardware.
Computers have more and more processors (and different kinds of processors, like GPUs), and
currently most software isn't designed for that kind of environment. IBM has developed
some clever ways to program these types of systems in a "general purpose" way.
That's the worst summary of a headline that I've ever read.
ozgur uksal