Panic in Multicore Land


Posted by Zonk on Monday March 10, 2008 @11:30PM from the multi-cores-no-waiting dept.

MOBE2001 writes "There is widespread disagreement among experts on how best to design and program multicore processors, according to the EE Times. Some, like senior AMD fellow Chuck Moore, believe that the industry should move to a new model based on a multiplicity of cores optimized for various tasks. Others disagree on the grounds that heterogeneous processors would be too hard to program. The only emerging consensus seems to be that multicore computing is facing a major crisis. In a recent EE Times article titled 'Multicore puts screws to parallel-programming models', AMD's Chuck Moore is reported to have said that 'the industry is in a little bit of a panic about how to program multicore processors, especially heterogeneous ones.'"

367 comments

Panic? by jaavaaguru · 2008-03-10 23:34 · Score: 4, Insightful

I think "panic" is a bit of an over-reaction. I use a multicore CPU. I write software that runs on it. I'm not panicking.

--
Follow me
1. Re:Panic? by dnoyeb · 2008-03-10 23:38 · Score: 1
  
  Is it April 1st already?
  
  We have been writing multi-threaded software for years. There is nothing special about multicore. It's basically a cut-down version of a dual-CPU box. The only people who should have any concern at all are the scheduler writers. And even then there is no cause for "panic".
2. Re:Panic? by shitzu · 2008-03-10 23:42 · Score: 4, Insightful
  
  Still, the fact remains that the x86 processors (due to the OS-s that run on them, actually) have not gone much faster in the last 5-7 years. The only thing that has shown serious progress is power consumption and heat dissipation. I mean - the speed the user experiences has not improved much.
3. Re:Panic? by leenks · 2008-03-10 23:57 · Score: 5, Insightful
  
  How is an 80-core cpu a cut down version of a dual-CPU box? This is the kind of technology the authors are discussing, not your Core2 duo MacBook...
4. Re:Panic? by Anonymous Coward · 2008-03-11 00:03 · Score: 0, Interesting
  
  > the speed the user experiences has not improved much.
  
  User experience is not a useful metric for performance, unless you consider media encoding, decoding and rendering. 10 years ago I was running a P166; what kind of framerates would I get with a modern game using a software renderer? What kind of framerates would I get for decoding an HD video stream?
  
  Do you seriously think a 12-year-old P166 will provide a comparable user experience to a modern 8-core 3GHz machine? You're putting it down to "the OS's that run on them", which is interesting since user-mode x86 emulation with QEMU runs W2K faster on my laptop than on the hardware I ran it on back in 1999.
5. Re:Panic? by Cutie+Pi · 2008-03-11 00:03 · Score: 5, Informative
  
  Yeah, but if you extrapolate to where things are going, we're going to have CPUs with dozens if not hundreds of cores on them. (See Intel's 80 core technology demo as an example of where their research is going). Can you write or use general purpose software that takes advantage of that many cores? Right now I expect there is a bit of panic because it's relatively easy to build these behemoths, but not so easy to use them efficiently. Outside of some specialized disciplines like computational science and finance (that have already been taking advantage of parallel computing for years), there won't be a big demand for uber-multicore CPUs if the programming models don't drastically improve. And those innovations need to happen now to be ready in time for CPUs of 5 years from now. Since no real breakthroughs have come however, the chip companies are smart to be rethinking their strategies.
6. Re:Panic? by Chrisq · 2008-03-11 00:04 · Score: 4, Insightful
  
  Yes, "panic" is strong, but the issue is not with multi-tasking operating systems assigning processes to different processors for execution. That works very well. The problem is when you have a single CPU-intensive task, and you want to split that over multiple processors. That, in general, is a difficult problem. Various solutions, such as functional programming, threads with spawns and waits, etc. have been proposed, but none are as easy as just using a simple procedural language.
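  
  To make the "threads with spawns and waits" option concrete, here is a minimal sketch in Python. It is an illustration only, not anything from the article: the function names and the sum-of-squares workload are made up, and in CPython the global interpreter lock means pure-Python arithmetic like this will not actually run faster on threads. The point is the shape of the code: the programmer has to carve the task into ranges, spawn a worker per range, wait for all of them, and combine the partial results by hand.

```python
import threading


def sum_of_squares(lo, hi):
    """Sequential version: one thread does all the work for [lo, hi)."""
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n, workers=4):
    """Split [0, n) into `workers` ranges, spawn a thread per range, wait, combine."""
    results = [0] * workers          # one private slot per worker, so no locking needed
    step = -(-n // workers)          # ceiling division: size of each range

    def work(slot):
        lo = slot * step
        hi = min(lo + step, n)
        results[slot] = sum_of_squares(lo, hi)   # empty range if lo >= hi

    threads = [threading.Thread(target=work, args=(k,)) for k in range(workers)]
    for t in threads:                # "spawn"
        t.start()
    for t in threads:                # "wait"
        t.join()
    return sum(results)              # combine the partial results
```

  Even for a task this regular, the parallel version is several times longer than the one-line sequential one, which is roughly the complaint being made here.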
7. Re:Panic? by that+this+is+not+und · 2008-03-11 00:30 · Score: 1
  
  So the message here is that they won't be able to *sell* these things since there isn't a market, nor well-defined uses for them. I guess that would be a panic situation for marketing types.
  
  And since these things can't be 'just used' as a faster version of an 8088 processor, the way the CPU houses sold the 386 (frankly, one of the last cases where they had to scale a 'really big change' to an existing market that had almost no motivation to use the new features) there's a panic that people might just not buy the new stuff.
  
  Where will it lead!?! The hardware upgrade cycle feeds all sorts of mouths that might otherwise have to actually provide meaningful innovative products.
8. Re:Panic? by shitzu · 2008-03-11 00:39 · Score: 1, Informative
  
  I was speaking of the last 5-7 years.
  
  I have an old AMD-XP-something running Windows XP at home; it is 5 years old. I have a Core2Duo machine I sometimes use. I don't see much difference in day-to-day usage. Even if there is one, I would attribute most of that to faster drives and I/O.
9. Re:Panic? by ObsessiveMathsFreak · 2008-03-11 00:47 · Score: 4, Insightful
  
  > That works very well. The problem is when you have a single CPU-intensive task, and you want to split that over multiple processors. That, in general, is a difficult problem.
  
  It is, in general, an impossible problem.
  
  Most existing code is imperative. Most programmers write in imperative programming languages. Object orientation does not change this. Imperative code is not suited for multiple CPU implementation. Stapling things together with threads and messaging does not change this.
  
  You could say that we should move to other programming "paradigms". However, in my opinion, the reason we use imperative programs so much is that most of the tasks we want accomplished are inherently imperative in nature. Outside of intensive numerical work, most tasks people want done on a computer are done sequentially. The availability of multiple cores is not going to change the need for these tasks to be done in that way.
  
  However, what multiple cores might do is enable previously impractical tasks to be done on modest PCs. Things like NP problems, optimizations, simulations. Of course these things are already being done, but not on the same scale as things like, say, spreadsheets, video/sound/picture editing, gaming, blogging, etc. I'm talking about relatively ordinary people being able to do things that now require supercomputers, experimenting and creating on their own laptops. Multi core programs can be written to make this feasible.
  
  Considering I'm beginning to sound like an evangelist, I'll stop now. Safe money says PCs stay at 8 CPUs or below for the next 15 years.
  
  --
  May the Maths Be with you!
10. Re:Panic? by divisionbyzero · 2008-03-11 00:47 · Score: 5, Funny
  
  Developers aren't panicking. Their kernels are! Ha! Oh, that was a good one. Where's my coffee?
11. Re:Panic? by Saurian_Overlord · 2008-03-11 00:51 · Score: 5, Insightful
  
  "...the speed the user experiences has not improved much [in the last 5-7 years]."
  
  This may almost be true if you stay on the cutting edge, but not even close for the average user (or the power-user on a budget, like myself). 5 years ago I was running a 1.2 GHz Duron. Today I have a 2.3 GHz Athlon 64 in my notebook (which is a little over a year old, I think), and an Athlon 64 X2 5600+ (that's a dual-core 2.8 GHz, for those who don't know) in my desktop. I'd be lying if I said I didn't notice much difference between the three.
12. Re:Panic? by SlashV · 2008-03-11 00:54 · Score: 1
  
  > there won't be a big demand for uber-multicore CPUs if the programming models don't drastically improve. And those innovations need to happen now to be ready in time for CPUs of 5 years from now.
  
  Software always lags behind hardware development. The 80386 was launched in 1986. Useful 32-bit code only arrived in the '90s. Starting software innovations now, for CPUs that will only be available in 5 years, isn't very feasible or even useful.
13. Re:Panic? by 10101001+10101001 · 2008-03-11 01:15 · Score: 1
  
  > Can you write or use general purpose software that takes advantage of that many cores?
  
  A 3D video driver? So that all PCs will have a decent "graphics card"? I think game designers will come up with ways to use those extra CPUs such that even more CPUs will be needed. Or otherwise unthinkable things (mostly, the sort of thing that throwing strong parallel CPU power can solve but which is cost prohibitive today) will start being common.
  
  Now, will *most* software use many/all of them? No. But, then, most CPUs are idle most of the time right now. A much bigger issue (even today) is electricity usage, not CPU usage. But, let's ignore that elephant in the room.
  
  --
  Eurohacker European paranoia, gun rights, and h
14. Re:Panic? by GreatBunzinni · 2008-03-11 01:22 · Score: 1
  
  > However, what multiple cores might do is enable previously impractical tasks to be done on modest PCs. Things like NP problems, optimizations, simulations. Of course these things are already being done, but not on the same scale as things like, say, spreadsheets, video/sound/picture editing, gaming, blogging, etc. I'm talking about relatively ordinary people being able to do things that now require supercomputers, experimenting and creating on their own laptops. Multi core programs can be written to make this feasible.
  
  Idealisms... Unfortunately reality doesn't play by those rules, as thirty years ago bright minds predicted that knowing how to program a machine with high-level programming languages would also be a trivial thing for "relatively ordinary people". So what do we see? The "relatively ordinary people" do have powerful computers but they don't go much further than chatrooms, myspace and blogs.
  
  --
  Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
15. Re:Panic? by Chrisq · 2008-03-11 01:24 · Score: 1
  
  Impossible might be too strong. I don't think anyone has proved that you can't take a program written in a normal procedural language and somehow transform it to run on multiple processors. It's just that nobody has any idea of how it could be done. The fact that a skilled programmer may be able to look at a process and identify isolated components that can run in parallel means that some day a computer may be able to do the same.
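  
  A small sketch of what "identifying isolated components" means in practice, in Python with made-up function names and data (an illustration, not anything from the thread). The first loop has independent iterations, so it can be cut into chunks and the chunk results simply concatenated. The second carries a value from one iteration to the next, so naive chunking gives a wrong answer, and someone (a programmer today, perhaps a compiler some day) has to notice that each later chunk needs the total of everything before it.

```python
from itertools import accumulate


def squares(xs):
    """Each output depends only on its own input: iterations are independent."""
    return [x * x for x in xs]


def running_total(xs):
    """Each output depends on the previous one: a loop-carried dependency."""
    return list(accumulate(xs))


data = [1, 2, 3, 4, 5, 6]
left, right = data[:3], data[3:]     # pretend each half goes to its own core

# Independent work: processing the halves separately gives the right answer.
chunked_squares = squares(left) + squares(right)

# Dependent work: processing the halves separately silently gives a wrong answer.
naive_chunked_total = running_total(left) + running_total(right)

# It can be repaired, but only by knowing the structure of the computation:
# the second half must be shifted by the total of the first half.
offset = sum(left)
fixed_chunked_total = running_total(left) + [offset + t for t in running_total(right)]
```

  Here `chunked_squares` matches `squares(data)`, `naive_chunked_total` does not match `running_total(data)`, and `fixed_chunked_total` does.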
16. Re:Panic? by cloakable · 2008-03-11 01:27 · Score: 1
  
  The 80-core CPU will /still/ (in my opinion, anyhow), be a cut down version of a dual-CPU box, because those 80 cores are sharing the same north- and southbridge. Can we say bottleneck?
  
  What I would dearly love is to have SMP back, affordable to the average person and available in any high street computer shop. True, it'll make it more expensive to get two or four cores, but do you really need that many at the moment? My computer right now has a single, single-core Athlon XP 2000+ (1.6GHz), and it's running a LAMP setup with several virtualhosts, along with KDE, with stuff like a torrent client, PIM application, etc. And it still barely touches the CPU.
  
  So get rid of the crappy multicore stuff, and get back with true SMP. I have no dualcore computers, and two SMP, dual cpu ones.
  
  --
  No tyrant thrives when every subject says no.
17. Re:Panic? by TuringTest · 2008-03-11 01:30 · Score: 1
  
  > You could say that we should move to other programming "paradigms". However in my opinion, the reason we use imperative programs so much is because most of the tasks we want accomplished are inherently imperative in nature. Outside of intensive numerical work, most tasks people want done on a computer are done sequentially. The availability of multiple cores is not going to change the need for these tasks to be done in that way.
  
  A workable solution is using multiparadigm languages such as OCaml. This way you can program most of your sequential tasks inside imperative modules, yet avoid the race conditions and deadlocks by writing the difficult synchronization concerns in a more suitable functional language that connects the many imperative bits in a safe way. This implies the use of very abstract constructs such as monads and arrows (related to the concept of "inversion of control" in the object oriented world).
  
  For the record, Microsoft has recently announced commercial support for its research language F#, based on OCaml, for the .Net platform. Maybe they're onto something?
  
  --
  Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
18. Re:Panic? by Constantine+XVI · 2008-03-11 01:38 · Score: 1
  
  That's because no-one's really put much time/effort into making "Idiot-Capable(TM)" programming systems. Stuff like Automator in OSX is a step in the right direction. It's not some archaic language that requires semicolons and parentheses (or whitespace) everywhere; it's a nice little GUI, and you drag+drop stuff around to make a working script. Is your grandma going to be using it to solve cancer? No, but it's much further along than "10 PRINT 'Hello World!'", and one could easily start using it without much instruction.
  
  (Full disclosure: Ubuntu junkie)
  
  --
  "I think an etch-a-sketch with an ethernet port would beat IE7 in web standards compliance."
19. Re:Panic? by mollymoo · 2008-03-11 01:39 · Score: 2, Insightful
  
  The 386 could run existing 16-bit code faster than the processors it replaced, so there was a market for it despite the lack of 32-bit code. This is not the same situation; an 80-core processor won't run today's code any faster than an 8-core processor (assuming the cores are the same). Nobody will buy an 80-core processor till there is software which would benefit from it.
  
  --
  Chernobyl 'not a wildlife haven' - BBC News
20. Re:Panic? by TuringTest · 2008-03-11 01:44 · Score: 1
  
  > Idealisms... Unfortunately reality doesn't play by those rules, as thirty years ago bright minds predicted that knowing how to program a machine with high level programming languages would be also a trivial thing for "relatively ordinary people". So what do we see? The "relatively ordinary people" do have powerful computers but they don't go much further than chatrooms, myspace and blogs.
  
  That is not because reality is stubborn, it's because programmers are. The main research lines of programming languages have for a long time abandoned the initial pursuit of user-friendly languages, their last successes being in BASIC, Logo and the Fourth Generation Languages (e.g. SQL).
  
  When this track has been taken up again, friendly programming environments like Alice and whole new user-centered-programming paradigms such as Programming By Demonstration have emerged.
  
  --
  Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
21. Re:Panic? by Sebastian+Reichelt · 2008-03-11 01:45 · Score: 2, Insightful
  
  I think you are right that a lack of demand is the reason for the panic, but that is probably a broader issue: CPU manufacturers seem to be desperately looking for fields in which more processing power would be an advantage, even though it becomes more difficult to use. For the average user, even the increasing CPU speeds of the past have not shown much of a benefit, as software has become more demanding just because it could, not because users wanted features requiring a lot of CPU power (except in certain areas such as image processing). Now that CPU speeds cannot be increased much further, wasting of CPU time will also have to stop at the current level. It is not realistic for the same programmers who have been writing more and more inefficient code to start using multiple threads just to continue this trend.
  
  That must be the reason why CPU companies are looking for niches of the consumer market where there is a realistic chance of programmers actually utilizing all available processing power, despite the difficulties. It is no surprise to me that "gaming" is a common answer. But the only consumer-related answer I could find in the article is this: "It could also create desktops that automatically index personal pictures based on facial recognition software." Judge for yourself.
22. Re:Panic? by GreatBunzinni · 2008-03-11 01:50 · Score: 1
  
  You have a point. Nonetheless, we already have languages and tools which advanced the user-friendly aspect of programming quite a bit, not to mention WYSIWYG RAD tools which dumb things down quite a bit. Moreover, some complex languages like C++ are currently blessed with programming toolkits like Qt which make it possible for someone to put up a simple application in a matter of minutes. And then we have that ton of interpreted languages, some of which read almost like English.
  
  But even with all that progress behind our backs, we are only seeing myspace-bound mouse potatoes.
  
  --
  Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
23. Re:Panic? by mollymoo · 2008-03-11 01:51 · Score: 3, Insightful
  
  No matter how easy they make knitting I'm never going to do it, because I don't want to knit my own clothes. I just want ones which look good and work. No matter how easy you make programming most people just aren't going to do it, because they don't want to write their own programs. They just want programs that work.
  
  --
  Chernobyl 'not a wildlife haven' - BBC News
24. Re:Panic? by johannesg · 2008-03-11 01:56 · Score: 4, Insightful
  
  Let's not be too harsh on ourselves. In most systems today, the bottleneck is the hard disk, not the CPU. No amount of threading will rescue you if your memory has been swapped out.
  
  I write large and complex engineering applications. I have a few threads around, mostly for the purpose of doing calculation and dealing with slow devices. But I'm not going to add in more threads just because there are more cores for me to use. I'll add threads when performance issues require that I add threads, and not before.
  
  Most software today runs fine as a single thread anyway. The specialized software that requires maximum CPU performance (and is not already bottle-necked by HD or GPU access) will be harder to write, but for everything else the current model is just fine.
  
  If anything, Intel should worry about 99% of all people simply not needing 80 cores to begin with...
25. Re:Panic? by Penguin+Follower · 2008-03-11 01:56 · Score: 4, Informative
  
  Unless you're speaking of AMD SMP systems, the Intel systems up until recently share the FSB among all the CPUs. So from the Intel side of things, SMP vs multi-core is nearly the same (save for L2 cache sharing and whatnot). The only notable exception, on the Intel side, that I have noticed is that the recent Xeon systems (within like the last two years) seem to be using two "northbridges". For example, my "quad-core" Mac Pro tower that I bought in April of 2007. It has two dual-core Xeons and the motherboard has two northbridges (though Intel doesn't refer to their chipsets that way last I checked. They like to talk about "hubs".).
26. Re:Panic? by RCL · 2008-03-11 01:56 · Score: 1
  
  Ever tried programming for IBM's Cell?
  
  --
  Coding etudes
27. Re:Panic? by TemporalBeing · 2008-03-11 02:01 · Score: 3, Insightful
  
  "...the speed the user experiences has not improved much [in the last 5-7 years]."
  
  > This may almost be true if you stay on the cutting edge, but not even close for the average user (or the power-user on a budget, like myself). 5 years ago I was running a 1.2 GHz Duron. Today I have a 2.3 GHz Athlon 64 in my notebook (which is a little over a year old, I think), and an Athlon 64 X2 5600+ (that's a dual-core 2.8 GHz, for those who don't know) in my desktop. I'd be lying if I said I didn't notice much difference between the three.
  
  Do notice that in 5 years we have barely increased the clock frequency of the CPUs
  
  Do notice that multi-cores don't increase the overall clock frequency, just divide the work up among a set of lower clock frequency cores - yet most programs don't take advantage of that. ;-)
  
  Do notice that despite clock frequencies going from 33 MHz to 2.3 GHz, the user's perceived performance of the computer has either stayed the same (most likely) or diminished over that same time period.
  
  Do notice that programs are more bloated than ever, and programmers are lazier than ever.
  ...
  In the end the GP is right.
  
  --
  Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
28. Re:Panic? by DarkOx · 2008-03-11 02:09 · Score: 2, Insightful
  
  It's not the same as before though. In 1986 I could get something for my money buying a 386, even if there was no new software in my plans. You got speed. Moving your DOS-based accounting package from that PC-AT at 6 MHz to a 386 running at 20 MHz let you do your payroll cycle faster.
  
  Assuming clock rates don't increase much (and they have not been) and instruction sets don't improve much (and they have not been), then beyond 3-4 cores I don't get any kind of improvement in the desktop world. I don't even see much improvement in the server world other than for running vmware and a few applications like database software that is somewhat parallelized; even that stuff though stops scaling well in most cases past core 8.
  
  That means there will be no demand for new chips across the majority of the business sector. That is a big problem for Intel and AMD.
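  
  The "stops scaling past core 8" observation is what Amdahl's law predicts whenever part of a program stays serial: speedup on n cores is 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized. A quick sketch in Python follows; the values of p are assumptions picked for illustration, not measurements of any real package.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only `parallel_fraction` of the run time scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: half parallel, and 90% parallel.
    for p in (0.5, 0.9):
        for n in (1, 4, 8, 80):
            print(f"p={p:.1f} cores={n:3d} speedup={amdahl_speedup(p, n):.2f}x")
```

  With p = 0.9, going from 8 to 80 cores only raises the bound from about 4.7x to about 9x, and with p = 0.5 the bound never reaches 2x no matter how many cores are added.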
  
  --
  Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
29. Re:Panic? by billcopc · 2008-03-11 02:16 · Score: 1
  
  Well that's just, like, your opinion, man! Meanwhile, I've got an Intel quad, down from a 16-way Opteron (that was getting old). What I would like is a quad-quad Core 2, aka 16 cores and that lovely NUMA memory split, along with a very very wide SAS/SATA controller to feed in the data. Average Joe may not have a use for parallelism and sky-high clocks, but I most certainly do. I whine because my current rig only supports 8 GB of RAM; I want 32 or even 64! The simple explanation is that there are some things that become so ridiculously fast and trivial on a beastly machine, they make you wonder why you ever tried to solve them on a limited box in the first place. There are numerous tasks that work with huge datasets; trying to shove them into 4 GB on a single processor means you have to swap things in/out all the time, and you spend more time optimizing the caching algo than actually solving the original problem.
  
  --
  -Billco, Fnarg.com
30. Re:Panic? by ThreeIfByAir · 2008-03-11 02:27 · Score: 1
  
  If you retain the concept of a single north and south bridge, then yes, there's going to be a bottleneck. But on the multicore chips I've worked on, that doesn't exist: there are multiple memory controllers and multiple I/O controllers built right into the chip. Memory bandwidth is still an issue (and indeed often _is_ the major gating factor) but if you want to get really good performance out of a truly multicore architecture, you're going to have to rethink the memory and I/O connectivity. NUMA isn't that far out of the mainstream now, and if we're going to go all out for multicore, it's going to swim right into the mainstream.
31. Re:Panic? by xrobertcmx · 2008-03-11 02:38 · Score: 1
  
  Over the last 5 years I have moved from an Athlon 1800+ to a 3400+ to an X2 4400 and now to an X2 6000+ (had a lot of overtime). I also had and still have a P4 Mobile 3.0 w/HT laptop from Dell and now carry around a MacBook Pro (Core 2 Duo 2.16). 7 years would only add a K6-2 450 to the list. I've since donated the 1800 and the X2 4400 to family, but my 3400 is still in the office, now running openSUSE and doing all of my file and media hosting work. The one thing I noticed following the move to a dual-core machine was that the number of applications I would have open at one time increased. I tried this on the old Dell the other day when I needed to update it: I tried to do some additional work while sitting there, and it was painfully slow compared to my newer machines. On my desktop or even my Mac I can have 3 or 4 apps open, a DVD burning, and music playing. This is not the case on the single-core machines.
32. Re:Panic? by nekokoneko · 2008-03-11 02:39 · Score: 2, Interesting
  
  > Do notice that in 5 years we have barely increased the clock frequency of the CPUs. Do notice that multi-cores don't increase the overall clock frequency,
  
  Clock frequency is not indicative of CPU performance. For example, the Core 2 chips, despite generally operating at a lower frequency than the Pentium 4s, outperform them significantly.
  
  > just divide the work up among a set of lower clock frequency cores - yet most programs don't take advantage of that. ;-)
  
  If I'm not mistaken, even if a specific program was not designed to use several cores, the OS can still run different programs on each core, improving the overall user performance. Correct me if I'm wrong.
  
  > Do notice that despite clock frequencies going from 33 MHz to 2.3 GHz, the user's perceived performance of the computer has either stayed the same (most likely) or diminished over that same time period. Do notice that programs are more bloated than ever, and programmers are lazier than ever.
  
  Your second point in the blockquote corroborates the first one: the problem isn't that the CPU isn't getting faster, we're just throwing bigger and more bloated stuff at it.
33. Re:Panic? by TuringTest · 2008-03-11 02:40 · Score: 1
  
  We are talking about a different set of users. Qt and "Visual X" are NOT for the "relatively ordinary people"; they are for highly experienced application programmers (even if they DO ease the work for them).
  
  Anything that requires understanding the distinction between declaration and use of a variable is not suitable for people not trained in programming. The "almost read like English" approach was tried quite a while ago with COBOL and BASIC, and it doesn't cut it.
  
  --
  Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
34. Re:Panic? by Bert64 · 2008-03-11 02:44 · Score: 1
  
  People have been coding for machines with 80 or more CPUs for years, SGI had a nice line of large machines, Cray, Sun and IBM too...
  All that's happening is that this technology is being pushed further into the low end, it's nothing new.
  
  --
  http://spamdecoy.net - free throwaway anonymous email - avoid spam!
35. Re:Panic? by coats · 2008-03-11 02:57 · Score: 5, Informative
  Ditto. And the principles are pretty generic; they haven't changed since a decade before a seminar I gave six years ago at EPA's Office of Research and Development.
  And frankly, it helps a lot to write code that is microprocessor-friendly to begin with:
  
  Algorithms are important; that's where the biggest wins usually are.
  
  Memory is much slower than the processors, and is organized hierarchically.
  
  ALU's are superscalar and pipelined;
  
  Current processors can have as many as 100 instructions simultaneously executing in different stages of execution, so avoid data dependencies that break the pipeline.
  
  Parallel "gotchas" are at the bottom of this list...
  
  If the node-code is bad enough, it can make any parallelism look good to the user. But writing good node-code is hard;-( As a reviewer, I have recommended rejection for a parallel-processing paper that claimed 80% parallel efficiency on 16 processors for the author's air-quality model. But I knew of a well-coded equivalent model that outperformed the paper's 16-processor model-result on a single processor -- and still got 75% efficiency on 16 processors (better than 10x the paper-author's timing).
  fwiw.
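  
  For readers who want the arithmetic behind that review anecdote: parallel efficiency is speedup divided by processor count, so run time on n processors is the serial time divided by (efficiency * n). The sketch below, in Python, uses made-up timings (the 1280-second figure and the assumption that the tuned single-processor run merely ties the paper's 16-processor run are hypothetical, chosen only to make the numbers round).

```python
def parallel_time(t_serial, efficiency, processors):
    """Run time on `processors` CPUs at a given parallel efficiency (speedup / n)."""
    return t_serial / (efficiency * processors)


# Hypothetical timing for the paper's model on one processor, in seconds.
paper_serial = 1280.0
paper_16 = parallel_time(paper_serial, 0.80, 16)   # speedup of 12.8

# Assumption: the well-coded model on ONE processor only ties the paper's
# 16-processor result (the post says it actually outperformed it).
tuned_serial = paper_16
tuned_16 = parallel_time(tuned_serial, 0.75, 16)   # speedup of 12

# How much faster the tuned model is at 16 processors despite lower efficiency.
advantage = paper_16 / tuned_16
```

  Under those assumptions the tuned model comes out about 12 times faster on 16 processors, consistent with "better than 10x", even though its efficiency figure is the lower of the two. Efficiency alone says nothing about how good the single-processor code was.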
  --
  "My opinions are my own, and I've got *lots* of them!"
36. Re:Panic? by LWATCDR · 2008-03-11 02:57 · Score: 1
  
  I think you are talking about NUMA and not SMP.
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
37. Re:Panic? by punkrocher · 2008-03-11 03:01 · Score: 1
  
  > but none are as easy as just using a simple procedural language.
  
  You've never tried erlang, have you?
  
  --
  I can stand brute force, but brute reason is quite unbearable. There is something unfair about its use. It is hitting be
38. Re:Panic? by coats · 2008-03-11 03:04 · Score: 1
  
  No, we do know how to do it: it's just that it's an NP-complete problem, so complex that it is only feasible to apply for really tiny programs. The general solution would not be useful even if compilers had computers 1,000,000,000,000,000,000,000 times as fast as today's, to do the parallel-decomposition-and-compile on.
  
  --
  "My opinions are my own, and I've got *lots* of them!"
39. Re:Panic? by TemporalBeing · 2008-03-11 03:13 · Score: 2, Interesting
  
  > Clock frequency is not indicative of CPU performance. For example, the Core 2 chips, despite generally operating at a lower frequency than the Pentium 4's outperform them significantly.
  
  Each core would perform nearly the same as a similarly clocked P4; of course, optimizations in the instructions have changed since then too. But they would still perform similarly. Of course, comparing a P4 to a Core2 is like comparing Apples to Oranges as there are architecture changes across the whole chip that would change that (like the move away from P4's netburst architecture). So there are reasons other than clock frequency for that performance difference.
  
  > If I'm not mistaken, even if a specific program was not designed to use several cores, the OS can still run different programs in each core, improving the overall user performance. Correct me if I'm wrong.
  
  That only works across all the different programs. An OS cannot break a single program into multiple threads/processes for the program - the program has to be coded to do so.
  
  > Your second point in the blockquote corroborates the first one: the problem isn't that the CPU isn't getting faster, we're just throwing bigger and more bloated stuff at it.
  
  It's both issues. Programmers have gotten lazier and since roughly 2000 (at least from my perspective, likely before that) have come to rely on the ever increasing sizes of hard drives, RAM, and Clock Frequency. The prime directive in the Java community is if you don't like the performance, toss more hardware at it (e.g. Processors); however, that doesn't work if your 1.6 GHz single-core processor goes to a 1.8 GHz dual core consisting of two 1.1 GHz cores that roughly equate to a 1.8 GHz single-core processor in performance. They only equate because the OS can move processes and threads between them, but a program that is designed for a single process cannot take advantage of the second core, and thus effectively runs at the 1.1 GHz instead of the full 1.8 GHz. Programs that are designed to be multi-threaded (or multi-processed) would feel the full benefit of the second core.
  
  This also goes to the bloat - as programmers have typically stopped optimizing code. Thus there are more lines of code in delivered software - often having more and more abstraction layers in them, which doesn't help either. So the overall effect is that the software takes longer to do the same function.
  
  In the end, despite the increase in processing power, the programs run as slow or slower than before. Numerous reasons for it. The GP of my original post in this thread is still correct.
  
  --
  Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
40. Re:Panic? by Alsee · 2008-03-11 03:16 · Score: 5, Insightful
  
  spreadsheets, video/sound/picture editing, gaming, blogging
  
  Odd selection of examples. The processing of cells can almost trivially be allocated across 80 cores. Media work can almost trivially be split into chunks across 80 cores. Games are usually relatively easy to split, either by splitting the graphics into chunks or parallelizable physics or other parallelizable simulation aspects.
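
  To make the "split into chunks" idea concrete, here is a minimal sketch in Python. The function name `split_into_chunks` and the row and worker counts are assumptions for illustration, not anything taken from a real spreadsheet or media application; it only shows how a range of independent work items (cells, scanlines, frames) could be divided into near-equal contiguous pieces, one per core.

```python
def split_into_chunks(n_items, n_workers):
    """Return half-open (start, stop) ranges that together cover range(n_items).

    Chunk sizes differ by at most one item, and no empty chunks are
    produced when there are fewer items than workers.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, extra = divmod(n_items, n_workers)
    chunks = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker < extra else 0)
        if size == 0:
            break
        chunks.append((start, start + size))
        start += size
    return chunks
```

  Under these assumptions, `split_into_chunks(1080, 80)` would hand each of 80 workers 13 or 14 scanlines of a 1080-line frame.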
  
  Oh, and blogging.
  My optical mouse has enough processing horsepower inside for blogging.
  
  OPTICAL MOUSE CIRCUITRY:
  Has the user pressed a key?
  No.
  Has the user pressed a key?
  No.
  Has the user pressed a key?
  No.
  (repeat 1000 times)
  Has the user pressed a key?
  No.
  Has the user pressed a key?
  No.
  Has the user pressed a key?
  Yes.
  OOOO! YES!
  QUICK QUICK QUICK! HURRY HURRY HURRY! PROCESS A KEYPRESS! YIPEE!
  
  -
  
  --
  - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
41. Re:Panic? by TuringTest · 2008-03-11 03:17 · Score: 2, Insightful
  
  Ah, but they DO want their tedious tasks automated. If you provide users with a way to automate their tasks without them writing a whole program, just by learning what they do often, they will program the machine without knowing.
  
  --
  Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
42. Re:Panic? by Popsmear · 2008-03-11 03:28 · Score: 0
  
  "I think "panic" is a bit of an over-reaction. I use a multicore CPU. I write software that runs on it. I'm not panicking."
  
  The problem with your statement is, they are talking about programming the CPU not programs that run on the CPU.
43. Re:Panic? by nekokoneko · 2008-03-11 03:35 · Score: 2, Insightful
  
  Of course, comparing a P4 to a Core2 is like comparing Apples to Oranges as there are architecture changes across the whole chip that would change that (like the move away from P4's netburst architecture). So there are reasons other than clock frequency for that performance difference.
  That was my point. In opposition to what you had said, the fact that the clock frequency has not increased does not mean that CPU performance has not increased. Unless you didn't mean that an increase in clock frequency is necessary for an increase in performance, in which case I don't understand why you posted about clock frequency at all.
  That only works across all the different programs. An OS cannot break a single program into multiple threads/processes for the program - the program has to be coded to do so.
  Again, that was my point, quote with emphasis added: (...) the OS can still run different programs in each core, improving the overall user performance. I would suggest reading my post with a little more attention.
  In the end, despite the increase in processing power, the programs run as slow or slower than before. Numerous reasons for it. The GP of my original post in this thread is still correct.
  Quoting the GP, emphasis added: the fact remains that the x86 processors (due to the OS-s that run on them, actually) have not gone much faster in the last 5-7 years. The only thing that has shown serious progress is power consumption and heat dissipation. What do the OS's that run on them have to do with the processors' performance? Recent processors have had significant improvements in performance in the last 5-7 years, which makes the GP incorrect.
44. Re:Panic? by Original+Replica · 2008-03-11 03:50 · Score: 1
  
  we're going to have CPUs with dozens if not hundreds of cores on them. (See Intel's 80 core technology demo as an example of where their research is going). Can you write or use general purpose software that takes advantage of that many cores?
  
  For the average user, the first 30 or so cores will be taken up running various parts of Vista, anti-virus, and all the other support programs to make the computer ready to actually do what you want it to do. According to Windows Task Manager I'm running 35 processes right now. Sure at the moment each little process doesn't require its own core, but I have little doubt that many of those background processes will expand to fill the space if it is made available to them. Having remaining unused cores would at least remove much of the problem of OS bloat from the experience of the "average user", because like it or not the "average user" runs the most bloated OS out there, and will continue to do so for at least the next several years.
  
  --
  We are all just people.
45. Re:Panic? by cens0r · 2008-03-11 03:52 · Score: 3, Insightful
  
  If the 80 core processor can run 10 virtual machines as fast as one machine on the 8 core processor, I would be interested.
  
  --
  Jack Valenti and Orrin Hatch will be first up against the wall when the revolution comes.
46. Re:Panic? by mollymoo · 2008-03-11 03:53 · Score: 1
  
  I would run multiple apps (there are at least half a dozen I would never bother to close), play music and burn discs simultaneously on my last single-core machine, a 1.2GHz iBook. I could compile something at the same time too, the only limit I ever hit was lack of RAM. I've not found more cores to be any improvement in multi-tasking beyond the increase in processing power they bring.
  
  --
  Chernobyl 'not a wildlife haven' - BBC News
47. Re:Panic? by TemporalBeing · 2008-03-11 03:57 · Score: 3, Insightful
  
  What do the OS's that run on them have to do with the processors' performance? Recent processors have had significant improvements in performance in the last 5-7 years, which makes the GP incorrect.
  Perhaps you missed my statement about the user's perceived performance. It is true, I grant you, that hardware performance has gotten better. But the user's perception of that performance has not - it's gone the opposite. Some of that is because programmers rely on a single faster core to correct for their inept programming, lack of optimization, added abstraction layers, etc. However, that is no longer how processors function - they are now two slower processors working together.
  
  And yes, the OS can, and has been able to for years since SMP first came about, spread loads across multiple processors and cores. But that cannot change how a single program functions in and of itself - it cannot make that single program work at any given moment on more than one single core if it was not designed to do so (i.e. if the program is not designed to use multiple threads or processes).
  
  All-in-all, the OP is correct.
  
  --
  Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
48. Re:Panic? by Chrisq · 2008-03-11 03:57 · Score: 1
  
  From Wikipedia OCaml
  
  OCaml bytecode and native code programs can be written in a multithreaded style, with preemptive context switching. However, because the garbage collector is not designed for concurrency, symmetric multiprocessing is not supported
49. Re:Panic? by xrobertcmx · 2008-03-11 04:10 · Score: 1
  
  On a 1.5GHz G4 Powerbook with 1.25GB of RAM, I routinely had to close applications completely (one of the very few annoyances I had with OS X) to avoid slowdowns. But, yes, OS X and Linux both handle multiple apps much better than XP or Vista. CD/DVD burning would normally slow everything down to a crawl on most of my single core machines. K3B does not as much, but I know when burning an ISO and then checking my email on the 3400+ K3B completely locked up, made a nice coaster.
50. Re:Panic? by 0xABADC0DA · 2008-03-11 04:20 · Score: 1
  
  That works very well. The problem is when you have a single CPU-intensive task, and you want to split that over multiple processors. That, in general, is a difficult problem. It is in general, an impossible problem.
  
  Most existing code is imperative. Most programmers write in imperative programming languages. Object orientation does not change this. Imperative code is not suited for multiple CPU implementation. Stapling things together with threads and messaging does not change this. Actually it is not really a hard problem at all, but a solution won't magically make every program use every processor with 100% efficiency -- that seems to be the only result that people consider a solution. But what is impossible is solving it with today's operating systems and language implementations.
  
  Modern operating systems just cannot schedule and manage running fragments of code across multiple processors. Suppose you wanted to do a regex match against a list of 10,000 lines across up to 80 processors. First you would have to have created a pool of 80 threads, each locked to a different CPU. Then add some work to them so they each perform a subset of the matches. Then the OS has to switch to each thread, probably involving a TLB flush, and using large time slices since context switching is so costly. So even in the best case with already existing threads you're talking at least a couple thousand cycles and a few time slices lost in overhead per CPU. So this is a net loss in terms of realtime unless the regex is very costly.
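
  As a rough illustration of the regex scenario, the sketch below uses Python's standard `concurrent.futures` pool as it exists on today's operating systems, not any new scheduling model. The names `match_chunk` and `parallel_match` are made up for this example. Each worker gets one large chunk, so dispatch overhead is paid once per worker instead of once per line; whether that beats a plain loop still depends on how costly the regex is.

```python
import concurrent.futures as cf
import re


def match_chunk(pattern, lines):
    """Return the lines in one chunk that contain a match for pattern."""
    regex = re.compile(pattern)  # compiled once per chunk, not once per line
    return [line for line in lines if regex.search(line)]


def parallel_match(pattern, lines, workers=4,
                   executor_cls=cf.ProcessPoolExecutor):
    """Match pattern against lines, giving each worker one contiguous chunk."""
    lines = list(lines)
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with executor_cls(max_workers=workers) as pool:
        results = pool.map(match_chunk, [pattern] * len(chunks), chunks)
        # pool.map yields results in submission order, so line order is kept.
        return [line for chunk in results for line in chunk]
```

  Functions passed to a process pool have to be importable at module level, and on platforms that spawn fresh worker processes the calling code should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=cf.ThreadPoolExecutor` avoids process startup cost, but in CPython threads will generally not run CPU-bound regex matching on several cores at once.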
  
  Now consider an operating system that gave a program a timeslice across a certain number of processors instead of for each thread on a per-cpu basis. Then the program could do things like run the regex on each line across multiple CPUs, and the overhead would be only say a hundred cycles on each CPU to interrupt what it was currently doing. Now replace regex with some generic closure... this approach would let that closure *also* be able to do work across multiple CPUs, so that if some processors completed their assigned work quickly they would be helping the processors that had harder work assigned to them.
  
  The other part of the solution is to have a typesafe operating system, like Singularity or JXOS or JavaOS. First, running an action across multiple CPUs means memory will be allocated by one CPU and freed by another. Having garbage collection reduces the problem of locking this, since you essentially do a lock once to start/stop gc instead of at every malloc and every free. And much of this work can be done by other CPUs while a program is running in a 'narrow' part that can only be done in order. Also, if every program shares the same memory space then doing a multiple-CPU timeslice has much less overhead, and in fact CPUs can be dynamically redistributed between processes with so little waste that it is practical to do so.
  
  So you see, what you ask is impossible in practice with current operating systems and most C-style programming environments (anything not type safe). But it is possible, and our operating systems and programming will adapt to it, because purely linear execution will not scale. Fundamentally there will be limits on how fast operations can be done in sequence, and practically speaking it will be cheaper to make 80 cores than to make a core 80 times faster.
51. Re:Panic? by profplump · 2008-03-11 04:29 · Score: 1
  
  However, that is no longer how processors function - they are now two slower processors working together.
  
  But there aren't two slower processors. Core for core modern processors are faster than 5-year-old processors. Even if they have lower clock speeds.
  
  And while the average home user probably can't use 16 cores, they can use two cores. Their foreground app can get 100% of the cycles on core 1, and all the kernel-space calls can run on core 2 -- all the overhead of USB, TCP/IP, etc. can happen on a second core so the first core can run the user app exclusively.
52. Re:Panic? by Just+Some+Guy · 2008-03-11 04:29 · Score: 1
  
  Outside of intensive numerical work, most tasks people want done on a computer are done sequentially.
  At the highest levels, you're mostly right. At lower levels, you're definitely wrong. When an artist is applying filters in Photoshop, although they're only doing so one at a time, each filter's low-level code should be as parallelized as possible for best performance. When you open a web page, you don't want the browser to completely load one image, then completely load the next, then the next; you want quite a few coming in at the same time. Basically, each task people want done on a computer decomposes into a huge number of mostly parallelizable subtasks.
  
  I wrote a little replacement for Python's map() function one afternoon to play around with. Now, even though my implementation is unexciting, why wouldn't you want average boring code to be automatically spread across multiple processors if you can do so for free? Google did the same thing but on an entirely different level. Apparently they see quite a lot of value in the idea.
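
  The poster's code is not shown in the thread, so the following is only a guess at what such a drop-in replacement might look like, sketched with Python's standard `concurrent.futures` module. The name `parallel_map` and its parameters are assumptions for illustration.

```python
import concurrent.futures as cf
import os


def parallel_map(func, items, workers=None,
                 executor_cls=cf.ProcessPoolExecutor):
    """Behave like list(map(func, items)), but run func in a worker pool."""
    items = list(items)
    if len(items) < 2:
        # Not worth starting a pool for zero or one item.
        return [func(item) for item in items]
    workers = workers or os.cpu_count() or 1
    # Batch several items per task so inter-process traffic stays low.
    chunksize = max(1, len(items) // (workers * 4))
    with executor_cls(max_workers=workers) as pool:
        # Results come back in the same order as the inputs.
        return list(pool.map(func, items, chunksize=chunksize))
```

  With the default process pool, `func` has to be a picklable, module-level function, and the speedup only shows up when each call does enough work to outweigh the cost of shipping arguments and results between processes.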
  
  I guess what I'm saying is that even boring things like spreadsheets can be (and should be) optimized quite a lot so that ordinary people can get stuff done quicker. There are methods available to the average programmer that go a long way toward addressing these needs, and we really must start using them.
  
  --
  Dewey, what part of this looks like authorities should be involved?
53. Re:Panic? by Alsee · 2008-03-11 04:42 · Score: 1
  
  I'll unzip a new bag of java and get it in the pipe in a flash.
  
  -
  
  --
  - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
54. Re:Panic? by LWATCDR · 2008-03-11 04:45 · Score: 1
  
  I think that original post is very true. The end user has not experienced much increase in speed. Frankly I don't think most users can experience a lot of increase in the speed of their apps. For the most part all my apps run so fast that I am the limiting factor. Only a few programs make me wait: my compiler for large jobs, and when I transcode a file.
  Most of the time when I am waiting it is caused not by the CPU but my internet connection, hard drive, or DVD drive. For the vast number of users PCs are more than fast enough. What most people need are computers that are silent, use as little power as possible, and don't take forever to boot. Almost all PCs have more than enough CPU to edit pictures, surf the web, run Quicken, do VOIP, and play media.
  Of course I do have one program that needs a faster CPU and that is FSX, but then no PC on the planet seems to run FSX at 60FPS.
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
55. Re:Panic? by zopf · 2008-03-11 04:56 · Score: 1
  
  While I respect your high level of experience in the field, I disagree with the multicore prognosis you suggest.
  
  Throughout history, especially in the computing industry, experts have incorrectly come to the consensus that the next step in technology is unnecessary. I think that Intel is banking on experts like yourself being again shortsighted. In the past, we've seen software expand to saturate the capabilities of almost every device on the market.
  
  If they build it, software (and demand) will come.
  
  --
  Did you see the pool? They flipped the bitch!
56. Re:Panic? by TemporalBeing · 2008-03-11 05:03 · Score: 1
  
  However, that is no longer how processors function - they are now two slower processors working together.
  
  But there aren't two slower processors. Core for core modern processors are faster than 5-year-old processors. Even if they have lower clock speeds.
  
  And while the average home user probably can't use 16 cores, they can use two cores. Their foreground app can get 100% of the cycles on core 1, and all the kernel-space calls can run on core 2 -- all the overhead of USB, TCP/IP, etc. can happen on a second core so the first core can run the user app exclusively.
  Not exactly. A kernel-space call will delay the foreground app, thus even the scenario you propose would utilize both cores and starve the rest of the system. Now, you could use the one core for a foreground app, and the other core for all background apps; but that too would not be the best scenario. Honestly, the best utilization of both cores in most desktop scenarios would be to treat them like separate processors and utilize them nearly the same as before.
  
  But needless to say, the OS cannot break a single program/thread across more than one core.
  
  Let's take your example - primary program runs on the first core, the OS pushes all the kernel stuff onto the second core. Two things are wrong with this: (1) If the app was only designed for a single processor, then there will be all kinds of timing issues that will arise including but not limited to race conditions. (2) Assuming the OS solves the race conditions - which it can only do internally to itself, not the application it is providing service for, the first core will stall whenever any kernel request is made - even if just for a few milliseconds - as the request is handled by the second core, since the first core is now waiting. But it gets even better - some kernel operations are designed to operate without calling out, thus they will perform their function and return without causing a context switch, etc. Now, if it tried to do all OS functions on the second core, a context switch AND a core switch (double overhead) would now be required for even those calls that could be optimized to run without having to do so. So you've now slowed down your computer.
  
  But you're still left with the issue that the application may not be written to be thread safe - so now, your kernel does something (even if that is thread safe!) on a different core whilst the program continues on the original core and it has an adverse effect on the application since it happens faster than the application needs it to. (Been there - done that. Big problem and hard to find and resolve.)
  
  Ultimately, it doesn't matter to the home user whether the processor has one core, two cores, or 80 cores. What matters is whether their software runs. Now OS's mitigate a lot of these issues by leaving things mostly the same - programs typically operate on a single core just like they would have if there only existed a single core and single processor in the system - changing that would break a lot of applications, which OS kernel vendors have no desire to do - especially Microsoft. But the programs cannot take any more advantage of the multi-cores than they could before unless they were designed with SMP in mind, in which case the same benefit would be derived through multiple processors.
  
  --
  Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
57. Re:Panic? by bdjacobson · 2008-03-11 05:04 · Score: 2, Funny
  
  Might take a look at Gentoo again with 80 cores. I'd be done compiling in just 2 days!
58. Re:Panic? by phillips321 · 2008-03-11 05:13 · Score: 1
  
  Any chance of examples of these tasks you're doing to require that much grunt?
59. Re:Panic? by nekokoneko · 2008-03-11 05:21 · Score: 2, Insightful
  
  Perhaps you missed my statement about the user's perceived performance. It is true, I grant you, that hardware performance has gotten better. But the user's perception of that performance has not - it's gone the opposite.
  Yes, I had noticed that statement both in your post and in GP's post and there is anecdotal evidence that the perceived performance has not increased. The objectiveness of such a statement notwithstanding, one could argue that this increase in performance has not led to an increase in the users' perceived performance, but this argument has a tenuous relation at best with the other statements presented in your and the GP's post, such as statements about the increase in clock frequency. Particularly, the statement by the GP that x86 processors have not been speeding up for the past 5-7 years is patently false.
  And yes, the OS can, and has been able to for years since SMP first came about, spread loads across multiple processors and cores. But that cannot change how a single program functions in and of itself - it cannot make that single program work at any given moment on more than one single core if it was not designed to do so (i.e. if the program is not designed to use multiple threads or processes).
  I find it baffling that you insist on trying to explain to me the point I myself had made in my first post in the thread.
60. Re:Panic? by slashdot_commentator · 2008-03-11 05:25 · Score: 1
  
  Excuse my ignorance, but not only do I agree with what you said, I think it's useful to point out that the articles are really talking about a future hardware and software model about which 99.9% of the readers here will have no say as to what gets decided. Furthermore, the issues there don't really have relevance to the consumer dual cores we use today.
  
  Unless you are a part of that elite 1% who will either be writing video processing applications, massive simulations, or advanced videogame programming, you're best off avoiding the whole issue of parallel processing via threaded programming. Everyone wants to be able to max out their 2-4 cores, but the reality is that you're going to quadruple your development time in bugs, and potentially make your computer less stable running your damn app.
  
  There is already a paradigm for utilizing these multi-core CPUs. Threaded, distributed OS, AND using a Virtual Machine based language like Java, to do your application development (I'll be checking out C#. It should be interesting to see if Perl6 can exploit the multi-core, multiprocessing platforms.) VM languages are designed to primitively take advantage of multiprocessing, in a herky-jerky way, while minimizing the active involvement by the programmer to implement those details.
  
  You want to develop faster programs in shorter amounts of time? Focus more on basic algorithms and design, rather than threaded code. Threaded code is probably the assembler language programming of the 21st century.
  
  --
  There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
61. Re:Panic? by pato101 · 2008-03-11 05:37 · Score: 1
  
  true! mod parent up!
  There are several parallel needs out there besides making the desktop feel smoother.
62. Re:Panic? by Spy+Hunter · 2008-03-11 05:42 · Score: 1
  
  Safe money says PCs stay at 8 CPUs or below for the next 15 years.
  I think you've underestimated the consumer's appetite for applications that can take advantage of large numbers of cores. Things like image processing (panorama generation, object recognition, 3D reconstruction from images), better speech recognition, HD video editing and encoding, and especially realtime 3D graphics (eventually raytracing) are applications consumers can use and will want in the next 15 years, and can easily scale to use many cores.
  
  In 2010 Intel is planning to finish Larrabee, which is ostensibly a 3D graphics chip but is really nothing more than an array of 24 or 32 small x86 processors. It may not run Windows out of the box, but I think it is inevitable that it will run Linux, and I would not be at all surprised to see computers without a "CPU"; only a Larrabee chip.
  
  --
  main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
63. Re:Panic? by johannesg · 2008-03-11 06:19 · Score: 2, Informative
  
  Ah, sorry: I didn't mean to imply that it is unnecessary for the applications of tomorrow. Where I work we also do those massive simulations mentioned by another poster, and we welcome _any_ number of cores (one thing we simulated was the ATV, mentioned a few days ago on slashdot. The simulator runs on two machines with a total of ten cores between them, and when we started the work, we were afraid our state of the art 1GHz CPU's (single core, at that time) might not be fast enough. Hahaha, it seems so quaint now! ;-) ).
  
  What I did mean to imply is that something fundamental needs to change in the rest of the system as well before this becomes important, though, since right now most of the time I'm not waiting for the CPU, I'm waiting for the hard disk. That guy waiting for the address bar in IE? I'd bet a dollar that he is really waiting for his harddisk. Possibly IE is scanning some history file each time he types a character, and there might be some paging going on, and he might have some severe fragmentation issues, and some torrents open, and all those would combine to making something that should be lightning fast, unbearably slow.
  
  My dualcore, 2.4GHz machine with a staggering 3GB of RAM, occasionally feels slower than my ancient Amiga 500 (7.14MHz, 512KB of RAM, and no hard disk - and no paging file!). As soon as your application swaps out (and that is an activity Windows does as a hobby, just to spite you), you will lose significant time when you want it to come back to life.
  
  And as long as systems remain mostly limited by the harddisk, rather than the CPU, adding threads will not help. Even those massively parallel monster applications of tomorrow will just be spending their time waiting to be paged in.
64. Re:Panic? by JoelKatz · 2008-03-11 06:35 · Score: 2, Funny
  
  Bluntly, it doesn't sound like you have any idea what you're talking about, as nothing about what you said makes any sense at all. Why not stick to talking about things you understand? I'll just pick one example, but there are dozens:
  
  "But you're still left with the issue that the application may not be written to be thread safe - so now, your kernel does something (even if that is thread safe!) on a different core whilst the program continues on the original core and it has an adverse effect on the application since it happens faster than the application needs it to. (Been there - done that. Big problem and hard to find and resolve.)"
  
  A single core that was faster would have the same problem. If the application breaks if things happen "too fast", it *needs* to run slow. There's no hope of speeding it up without fixing it anyway. What does that have to do with multiple cores? Nothing.
  
  Okay, one more:
  
  "If the app was only designed for a single processor, then there will be all kinds of timing issues that will arise including but not limited to race conditions."
  
  This makes absolutely *NO* sense. I defy you to present a single case of an application designed for a single processor that runs into problems when the kernel does work on another processor.
  
  And, this claim:
  
  "Now OS's mitigate a lot of these issues by leaving things mostly the same - programs typically operate on a single core just like they would have if there only existed a single core and single processor in the system - changing that would break a lot of applications, which OS kernel vendors have no desire to do - especially Microsoft."
  
  I cannot believe that you have any idea what you're talking about. When you say "changing that", what are you talking about? What is it that OS kernel vendors aren't changing? They've made just about every possible change to support SMP and multicore that they could think of. If there's some change they have no desire to do, please tell us what it is.
65. Re:Panic? by default+luser · 2008-03-11 06:36 · Score: 2, Informative
  
  Clock frequency is not an indicative of CPU performance. For example, the Core 2 chips, despite generally operating at a lower frequency than the Pentium 4's outperform them significantly.
  
  But massive instruction per clock improvements do not happen very often in the x86 chip industry. In fact, I can count all the major improvements for the last 15 years on one hand:
  
  1995: Intel Pentium Pro (approximately 2 INT, 2 FP operations per clock, best case) introduces real time instruction rescheduling to the x86 world. The design can decode 3 instructions per clock. Yes, I am disregarding the Pentium, because you got NO performance improvement without an optimizing compiler.
  
  1997: MMX increases maximum number of integer instructions to 8 per cycle. But, because of the 64-bit data size, you really see little improvement unless using 16-bit or 8-bit types.
  
  1998/1999: 3DNow! and SSE double the potential throughput for 32-bit floating point, again not all that impressive.
  
  2001: the Pentium 4 actually REDUCES performance per clock, with a single instruction decoder, and heavy reliance on trace cache to make up for this. SSE2 gives the potential to increase FP throughput to 4 instructions per clock, per SSE unit, but a half-assed implementation by both Intel and AMD means nothing changes.
  
  2006/2007: the addition of more decode units on the Core 2, packed SSE instructions for both the Core 2 and the Phenom, and TWO 128-bit SIMD units means we see the first improvements in instructions per clock in years.
  
  --
  Man is the animal that laughs.
  And occasionally whores for Karma.
66. Re:Panic? by magus_melchior · 2008-03-11 06:39 · Score: 1
  
  Be glad it isn't Homer Simpson's brain in that mouse:
  
  Has the user pressed a key?
  No.
  Has the user pressed a key?
  No!
  Has the user pressed a key?
  No!!
  Has the user pressed a key?
  NO!! ...
  Has the user pressed a key?
  Look, if I process a keypress, will you stop asking me?
  Yes, of course.
  
  --
  "We are Microsoft. You shall be assimilated. Competition is futile."
67. Re:Panic? by homer_ca · 2008-03-11 06:49 · Score: 1
  
  Sure, a P166 with Netscape 4 worked fine for surfing back in the day. I could even play mp3s in the background. How's that for multitasking? However, web pages were also much simpler back then. Forget about it now that we have AJAX and embedded flash everywhere.
  
  7 years ago would put us in the P-III Coppermine era. That's actually fast enough to handle a modern AJAX or Flash webpage, but if you put it side by side with a modern computer it would be severely lacking, even if you loaded the P-III with 2001-era software to go light on RAM usage. We often look back on the past with rose colored glasses.
68. Re:Panic? by carlmenezes · 2008-03-11 07:08 · Score: 2, Interesting
  
  I'd like to ask a few related questions from a developer's point of view :
  
  1) Is there a programming language that tries to make programming for multiple cores easier?
  2) Is programming for parallel cores the same as parallel programming?
  3) Is anybody aware of anything in this direction on the C++ front that does not rely on OS APIs?
  
  --
  Find a job you like and you will never work a day in your life.
69. Re:Panic? by timeOday · 2008-03-11 07:33 · Score: 1
  
  But even Gentoo is bottlenecked by sequential package installation. And within a single package, many of the steps are single-threaded and sequential: downloading, ./configure, make install. And even within compilation, separate compilation directories are handled sequentially, and linking is single-threaded.
  It's a good example of a time-consuming job an end-user might actually want to run, yet it's bumping up against Amdahl's law already.
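
  For readers who want the formula: Amdahl's law says that if a fraction p of a job can be parallelized, the speedup on n cores is 1 / ((1 - p) + p / n). A small sketch in Python follows; the function name `amdahl_speedup` and the 90% figure in the note are assumptions for illustration, not measurements of any real Gentoo build.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many cores exist;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

  If, say, 90% of a build were parallelizable, 80 cores would give just under a 9x speedup by this formula, and no number of cores could push it past 10x.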
70. Re:Panic? by Ex-MislTech · 2008-03-11 08:18 · Score: 1
  
  You want 16 cores eh ?
  
  Quad x Quad with AMD from Asus:
  
  http://usa.asus.com/products.aspx?l1=9&l2=39&l3=575&l4=0&model=1868&modelmenu=1
  
  You want crazy I/O, try a 16 port multi-lane SATA II 3ware controller.
  
  http://www.newegg.com/Product/Product.aspx?Item=N82E16816116059
  
  If you need bigger I/O than that and you got a 'unlimited' budget call EMC
  for a RAM drive SAN, lol.
  
  Enjoy !
  
  --
  google "32 trillion offshore needs IRS attention"
71. Re:Panic? by Nykon · 2008-03-11 08:58 · Score: 1
  
  "Safe money says PCs stay at 8 CPUs or below for the next 15 years."
  
  Until Doom 6 comes out and requires 8 or more. I think we'll see a huge surge in 8+ core processors ;-)
  
  --
  "It's better to be a pirate then join the Navy"
72. Re:Panic? by leenks · 2008-03-11 09:25 · Score: 2, Informative
  
  Read http://view.eecs.berkeley.edu/wiki/The_Landscape_of_Parallel_Computing_Research:_A_View_From_Berkeley (specifically the white paper linked from it)
73. Re:Panic? by Sloppy · 2008-03-11 10:07 · Score: 3, Funny
  
  Nobody will buy an 80-core processor till there is software which would benefit from it.
  Fortunately, we already have that software. It's "make" with the "-j 80" option. Intel just needs to run a "Get Gentoo Now!" advertising campaign and their hardware marketing problem is solved.
  
  --
  As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
74. Re:Panic? by ChrisGilliard · 2008-03-11 10:14 · Score: 1
  
  Definitely an over-reaction. Whenever I see these kinds of articles I kind of laugh, because web/application servers and databases are already optimized for multi-core computing. They scale very well, and tons of smart people continually work to make them even more scalable on multi-core/multi-thread processors. Given that much of our computing is moving out onto the web (a trend which will continue until, for all intents and purposes, all of our processing is done on the web), this whole multi-core panic for desktop software is just completely irrelevant.
  
  --
  No Sigs!
75. Re:Panic? by Mad+Merlin · 2008-03-11 11:17 · Score: 2, Informative
  
  I'd like to ask a few related questions from a developer's point of view :
  
  1) Is there a programming language that tries to make programming for multiple cores easier?
  2) Is programming for parallel cores the same as parallel programming?
  3) Is anybody aware of anything in this direction on the C++ front that does not rely on OS APIs?
  1) Yes.
  2) Maybe.
  3) Yes.
  
  --
  Game! - Where the stick is mightier than the sword!
76. Re:Panic? by Mad+Merlin · 2008-03-11 12:11 · Score: 1
  
  Actually, downloading can be done in parallel with Gentoo, the parallel-fetch feature starts a background job to download packages while you're compiling. ./configure could (fairly easily) be done in parallel as well, but AFAIK it's currently not. make install is primarily I/O bound, and thus wouldn't really benefit from parallelization (although it could be done pretty easily as well). Compiling is definitely not sequential across directories, try make -j5 in pretty much any project with multiple source directories and you'll see.
  
  --
  Game! - Where the stick is mightier than the sword!
77. Re:Panic? by Mad+Merlin · 2008-03-11 12:34 · Score: 1
  
  And as long as systems remain mostly limited by the harddisk, rather than the CPU, adding threads will not help. Even those massively parallel monster applications of tomorrow will just be spending their time waiting to be paged in.
  
  Ah, but you need to consider another major (recent) development, that is, SSD. Prices on SSD are dropping faster than a skydiver without a parachute, and they greatly mitigate the major problem with conventional hard drives (seek time). At the current rate, I won't be surprised if SSD largely replaces conventional hard drives in the next year or two. So, in a year or two, the hard drive probably won't be the bottleneck that it is now, and we'll be better able to feed massively parallel CPUs more efficiently.
  
  --
  Game! - Where the stick is mightier than the sword!
78. Re:Panic? by carlmenezes · 2008-03-11 14:47 · Score: 1
  
  1) Is there a programming language that tries to make programming for multiple cores easier?
  1) Yes.
  Which one(s)?
  
  --
  Find a job you like and you will never work a day in your life.
79. Re:Panic? by nanostuff · 2008-03-11 15:40 · Score: 1
  
  I suspect the conclusion to this will be massive parallelization with serial programming. Core logic will use a single thread. Core logic rarely requires more than one thread. The remaining processors will be used to perform parallel tasks with trivial implementation. Games for example will be single threaded to the programmer, but will use vast parallelization in the compiler for ray tracing / illumination, AI, physics and other tasks that can 'parallelize' themselves.
80. Re:Panic? by Mad+Merlin · 2008-03-11 15:42 · Score: 1
  
  Well, Ada comes to mind... it has tasks (~= threads) as a primitive along with some interesting inter-thread communication and synchronization. The only problem is that Ada is incredibly awful to actually write anything in, kinda like Java, but more obnoxious (and inconsistent) and with even worse I/O capabilities (string support is basically nonexistent, for example). Consider yourself warned.
  
  Several other posters have suggested a functional language like Erlang or OCaml, but I've never actually used those... You could also look at some even more obscure languages like Occam, or special purpose languages like GLSL (which is massively implicitly parallel).
  
  --
  Game! - Where the stick is mightier than the sword!
81. Re:Panic? by 0111+1110 · 2008-03-11 16:48 · Score: 1
  
  I don't mean to state the obvious, but the vast majority of today's SSDs are not much faster than the hard drives they replace, especially when you consider both reading and writing, and especially when you compare SSD transfer rates to dual or quad RAID 0. Take a look at some of the actual benchmarks. You also have to take into account that flash memory supports a much more limited number of writes than standard hard drives, which don't really have any such limit. Now of course it is possible that some future tech will eliminate these issues, but we haven't seen it yet. The closest we have come to genuinely eliminating the hard drive speed issue is with the Gigabyte i-drive, which itself has many unresolved issues (like reliable battery back up) and does not really solve the problem in its current form.
  
  --
  Quite an experience to live in fear, isn't it? That's what it is to be a slave.
82. Re:Panic? by x2A · 2008-03-11 16:58 · Score: 1
  
  Well, there are a few examples where a second core can still take some work from the first even if the first core is running a single-threaded process, but yes, on the whole it may not be much (or even noticeable). One case is where I/O can be performed asynchronously. Take a network app (like a network game) that displays its current state on the screen until it receives a packet over the network, at which point it stops working on the screen, processes the incoming packet, then goes back to drawing the updated state on the screen. With a second core, the network interface card (NIC) could trigger the IRQ on the second core, which would pull the packet from the NIC, handle any network frame or TCP/IP stuff (such as combining fragmented packets that may have been received in or out of order), work out which stream it belongs to, do any firewally type stuff, put it in the buffer ready for the app to read it, and then signal the app to tell it there's data ready for it. The same can be true of asynchronous disc access (think virus scanner running on the second core) or display stuff (where the app changes the contents of screen elements but the OS is responsible for keeping the screen up to date with the changes, generating bitmaps from vector graphic fonts etc).
  
  But granted, this could be such a small portion of stuff that the savings are small, or such a large portion that dedicated hardware is used anyway (think graphics accelerators, tcp offload engines, hardware encoders, dsps etc), but possibly still worth noting.
  
  --
  The revolution will not be televised... but it will have a page on Wikipedia
83. Re:Panic? by 0111+1110 · 2008-03-11 18:00 · Score: 1
  
  Actually I am wondering if exactly the opposite is going to be true. Maybe massively multithreaded assembly language programming is going to be our future. After all, so far there is no high level language that solves the problems of massive parallelism. This blind faith that development time is orders of magnitude more precious than runtime has maybe proven to be wrong. Only overclockers are getting much single core speed increase from the recent full process step from 65nm to 45nm. CPU speed increases have been greatly slowing down and, failing a fundamental breakthrough in technology, may soon stop completely. So how do you make current programs run faster when the number of instructions per second no longer increases every year?
  
  We need to start teaching assembly language at our universities again. We need to make it an important part of our curriculum. We need to start encouraging people to actually think about efficiency again. We need to stop ending every discussion about optimization with the mere mention of the sin of 'premature optimization', with 'premature' often translating into 'never'. Instead of trying to continue to improve our native code compilers we are dumping them for JIT byte code 'compilers'.
  
  Maybe, at least for a while, we are going to have to stop thinking about a computer as just a black box. Maybe we are going to have to understand all the little hamsters running around on the inside. And we may also find a lot of the programming paradigms that we hold so dear come tumbling down. The companies that choose to ignore the possibilities of massive SMP and stick with their precious 'maintainable' code with all of its beautiful logic and readability and extensibility might find themselves passed by. Maybe it is time once again to think more like the machines running our code. Could assembly and ultra low level languages (even lower level than C) actually be the way forward? When you can get a speedup of 10x or 20x from parallelism, it is nontrivial and worth the extra effort in writing and maintaining the code. But it would be a very bitter pill for modern programmers to swallow.
  
  OTOH, maybe the way forward will be haskell/ocaml/erlang. Either way, it would be a fundamental change in the direction we are currently headed in the (mainstream) programming world. I don't think the answer will be Java.
  
  --
  Quite an experience to live in fear, isn't it? That's what it is to be a slave.
84. Re:Panic? by x2A · 2008-03-11 18:13 · Score: 1
  
  "But massive instruction per clock improvements do not happen very often in the x86 chip industry. In fact, I can count all the major improvements for the last 15 years on one hand:"
  
  Can count all the major improvements - that you know/can think of, sure you've demonstrated that. What about all the improvements that occur outside of the processor instruction set? Such as improvements to branch prediction and improvements to the pipeline so that branch prediction misses aren't so expensive (as well as additional instructions such as conditional move that can reduce need for branching)? Improved out-of-order instruction execution, fast loading/saving of contexts useful for multitasking, improvements to the MMU such as global and large pages mean less TLB misses, memory prefetching, greater memory bandwidth, and of course, hyperthreading (okay just kiddin on the last one ;-)
  
  --
  The revolution will not be televised... but it will have a page on Wikipedia
85. Re:Panic? by Mad+Merlin · 2008-03-11 18:28 · Score: 1
  
  Ah, but you're missing the point. I agree that the raw sequential read/write speeds of SSDs are not substantially different from those of a conventional hard drive; however, the seek time for an SSD is way under 1 ms, while conventional hard drives are in the 10 ms range. So, an SSD might be 10% slower to write a 2G file, but it also might be 1000% faster (or more) when reading a couple hundred 1M files. Furthermore, a RAID array (of any type) does not improve seek time, in fact it normally has a small detrimental effect. On the other hand, most any RAID array will help sequential read/write performance.
  
  Also, newer flash drives use transparent write leveling and other similar technologies to greatly extend the life of SSD, to the point where the device wearing out really isn't an issue at all, with expected lifetimes typically surpassing that of a conventional hard drive. Look at any Slashdot story in the last year or two about SSD and you'll see the issue hashed out in excruciating detail.
  
  --
  Game! - Where the stick is mightier than the sword!
86. Re:Panic? by TuringTest · 2008-03-11 22:50 · Score: 1
  
  Yes, but that's an implementation problem of the standard compiler, not a language feature. Different implementations (like, for example, .Net) could solve that. My point was that this kind of language provides a set of primitives more tailored to parallel programming than purely imperative languages do.
  
  --
  Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
87. Re:Panic? by Bert64 · 2008-03-11 22:55 · Score: 1
  
  What are you running on that single core Dell?
  I've always had lots of apps open, I was always able to play music while burning a dvd (or a cd, going back) and running several other apps at the same time without issues...
  I've heavily used systems from a lowly 14 MHz Amiga and a P100 all the way up to a quad core system and a 14 CPU Sun E4500 over the last few years... The CPU was almost never a bottleneck; so long as there's enough RAM to prevent the apps you're using being swapped, things always ran smoothly.
  
  --
  http://spamdecoy.net - free throwaway anonymous email - avoid spam!
88. Re:Panic? by Bert64 · 2008-03-12 00:08 · Score: 1
  
  Server apps most certainly do benefit from lots of CPUs, high end servers have been available with 64, 128, even 512 CPUs for years so there's been incentive to write such code for years...
  Right now, you can buy a 64 processor sun server on ebay for the price of a macbook pro...
  
  Most server apps have traditionally used one thread per user or connection, or a pool of threads; look at Apache, or any server spawned from inetd etc. The only problem is when your other system components can't keep up (memory, io), and this is where these big servers with their multi channel raid controllers with large caches, and banks of interleaved memory, really come into their own compared to low-end multicore systems which often have to share the same memory bus as their single core counterparts.
  It's also why high end machines (eg IBM blue gene) use relatively low clocked CPUs, so that the rest of the system can keep them constantly fed with data.
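  The "pool of threads" model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: handle_connection and serve are hypothetical names, the "connections" are plain integers standing in for real sockets, and it is not how Apache or inetd is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_connection(conn_id: int) -> str:
    # Stand-in for real per-connection work (parsing a request, disk or
    # network I/O, building a response).
    return f"connection {conn_id} handled"


def serve(connection_ids, pool_size: int = 4):
    # A fixed pool of worker threads picks up connections as they arrive,
    # instead of spawning one new thread per connection.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() hands results back in the order the work was submitted.
        return list(pool.map(handle_connection, connection_ids))


results = serve(range(8))
print(results[0])
print(len(results), "connections served")
```

  Note that in CPython, threads like these mostly help with I/O-bound work because of the global interpreter lock; a server written in C or Java would also spread CPU-bound work across cores with the same structure.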
  
  --
  http://spamdecoy.net - free throwaway anonymous email - avoid spam!
89. Re:Panic? by default+luser · 2008-03-12 04:43 · Score: 1
  
  Can count all the major improvements - that you know/can think of, sure you've demonstrated that. What about all the improvements that occur outside of the processor instruction set? Such as improvements to branch prediction
  
  The last two times branch prediction on x86 saw a HUGE improvement were with chips like the Pentium (which introduced a 2-entry branch history table), and the P6 and the K6, which introduced much more powerful branch history implementations. Everything beyond the P6/K6 generation has been minor tweaks; when you get above the most basic implementation of the branch history table, your performance improvements are tiny for the general-use case. Even additions like the loop predictor on the Pentium M don't add much, when you realize that most loops have lots of iterations.
  
  Improvements to the pipeline so that branch prediction misses aren't so expensive
  
  This would be great, except that most CPU pipelines ELONGATE as you add more features. Even the saga of the P6 -> Core2 has added more pipeline stages. And when you add more stages, you need better prediction, but there is only so much you can do: the Pentium 4 featured one of the most advanced branch predictors ever (but still only slightly better than previous), plus hyperthreading, all in an attempt to rein in the cost of clearing the pipeline from branch misses (and other I/O). It didn't work.
  
  (as well as additional instructions such as conditional move that can reduce need for branching)?
  
  Does not exist. You can execute both branches (at the expense of memory overhead) like Itanium does, or you can make a prediction (like all modern x86 processors), but only the programmer can actually remove branches.
  
  Improved out-of-order instruction execution,
  
  Which, as I highlighted, hasn't really improved in years. The Athlon took one step above the P6/K6 by adding a third floating-point unit, but kept the 3-wide decoders. The Pentium 4 added an ungodly number of rename registers, but that's mostly to handle the long pipeline and HyperThreading, and it didn't really improve performance per clock (see above).
  
  The Core2 and the Phenom are the first processors to actually improve out-of-order execution in years, with a combination of wider decode/retire path (Core2), support for packed SSE instructions (both), two 128-bit SSE units (both), and support for out-of-order loads/stores (both). The wider SSE registers double the potential performance for SSE2+ code, and the 4-wide decode path on the Core2 allows you to keep the integer pipes full while the SSE pipes work their magic.
  
  fast loading/saving of contexts useful for multitasking, and of course, hyperthreading (okay just kiddin on the last one ;-)
  
  I grouped these together, because they are essentially the same. Hyperthreading allows fast task switching without involving the operating system, because it makes one processor behave like two.
  
  Don't kid, HyperThreading isn't a bad idea, it just had no place in the (largely) single-threaded universe of the PC. Intel added it to bandage the P4's poor I/O and branching performance, but it wasn't enough to make the difference (plus it slowed down single-threaded apps). On I/O limited processors (like Sun's Niagara line), SMT is a wonderful thing.
  
  improvements to the MMU such as global and large pages mean less TLB misses, memory prefetching, greater memory bandwidth,
  
  Large pages, memory prefetch, larger caches - they all add their little bits, but nothing extraordinary. On the Pentium 4, for example, the move from Willamette to Northwood (double the cache, plus prefetch) yielded a 5-7% performance improvement, and nobody knows whether the cache or the prefetch was more important. Those are small potatoes.
  
  And as for memory bandwidth, it is not a question: it will improve, or processors will starve. But the amount of input from the processor side is startlingly low: only four times since the introduction of the 8088 have processor manufacturers widened the memory bus (16-bit 286, 32-bit 386, 64-bit P5, 128-bit P4/Athlon), and only with the 486 did we actually introduce on-chip cache.
  
  --
  Man is the animal that laughs.
  And occasionally whores for Karma.
90. Re:Panic? by slashdot_commentator · 2008-03-12 07:39 · Score: 1
  
  I'll start by saying I am a fan of assembler programming as well, and believe it needs to be in the CS curriculum. Not because it's a useful language, but because it "teaches" you how the hardware "thinks". If you don't develop for an embedded processor environment, you'll probably never see assembler again in your lifetime. But you will still need to understand how the hardware in your platform works.
  
  It's the computer scientists that like to dream and pretend that one day soon, you will be able to construct useful programs without ever having to understand the machine (high abstraction), but the drawback of abstraction is that it will always result in less efficient runtimes, or an inability to implement what you want to "express".
  
  Assembler will never be the language of choice for a multiprocessor environment. Even if you could master writing programs that could be spread over N processors and communicate threads to each other without deadlocking, it still would not be adopted. Assembler does not force you into particular coding patterns, and assembler programs tend to be very individual to the programmer, at his level of experience at the time. They are just too costly to maintain with one programmer, let alone a team of them, or a programmer drifting in and out of a company during the product's lifetime. The reason for assembler is to use a human to maximally optimize performance of a program. The driving force of faster hardware and distributed processors is to aggregate enough computing power NOT to need a human to optimize performance of an application.
  
  What MIGHT happen is a trend to design future processors such that the microinstructions available reflect the basic constructs of the language (e.g. java). So writing "futurejava" ends up writing the assembler runtime of that CPU. That's a feature which helps make virtual machines the popular trend in compiler/language design; it gets you closer to it.
  
  The real challenge of the new multicore processor programming is to exploit its specific design. When you have cores with L1 caches and L2 caches and out-of-order-execution pipelines, etc. etc., you can write up high level languages, and they will run on that CPU, but they may not best exploit the features of the CPU, and frankly run like a dog (as opposed to writing in a manner making the most used routines as small as possible to fit in the cache, or best optimize the use of predictive pipeline caching). The idea with VM based languages is to preserve the idea of an abstract language, but let the internal routines of the language best handle those issues while hiding them from the programmer.
  
  I suspect near term, that's how the "next" popular language will go. It will abstract concepts of concurrency and threads and IPC into a set of instructions, and it will be kludged into the VM/language. Perhaps it's as simple as a fork() instruction (and the VM/multiprocessor platform takes care of all the internals). It will have utterly unacceptable performance for some rarefied multiprocessor platforms, but it will be good enough for the average programmer, and thus become the standard. It'll be the COBOL of matrix processing platforms. I have my doubts about Haskell/Erlang/OCaml only because if they really satisfied industry requirements, they would already be the de facto languages. But perhaps they haven't caught on just because the platforms capable of running them haven't been widely available until recently.
  
  --
  There is no America. There is no democracy. There is only IBM and AT&T and DuPont, Dow, General Electric, and Exxon
91. Re:Panic? by smartdreamer · 2008-03-12 13:52 · Score: 1
  
  Most functional programming languages are a good step in this direction. Either Erlang with a message-based approach or the pure way of (most interesting in my opinion) Haskell is very easy to run concurrently. Pure languages avoid mutable elements (think variables) so only the sequencing issue remains. That can be handled with special instructions (seq) or by the language itself (there are many Haskell variants going this way).
92. Re:Panic? by billcopc · 2008-03-12 15:01 · Score: 1
  
  See that's precisely why the high end keeps getting lower. Those 3Ware cards look good on paper but they don't deliver the throughput. I get more speed out of my freebie onboard Intel Matrix Raid than I did out of any 3Ware or Promise card. I literally push 300mb/sec sustained on my current system, and it's just a semi-high-end gaming rig. Maybe I'm being too logical, but if I can get 300mb/sec with fake raid and a handful of cheap drives, I would expect at least double that from server-class gear, like it used to be in SCSI's glory days.
  
  As for the K10-based Opterons, they're just too weak! The original single and dual core Opterons were awesome because they beat the Xeon across the board: better performance per clock, lower power draw, better pricing and mass availability. It was a no-brainer! The new quad core AMDs just can't keep up, they're not all that much cheaper than quad core Xeons but most importantly: they're clocked too low. In all likelihood, a 16-core K10 system wouldn't be much faster, and could even be slower, than my old 16-core K8 system - from four years ago! It's been widely reported that K10 is on average 15% faster than the K8, clock for clock. That means my old 2.2ghz rig from 2003 is roughly equivalent to a 1.9ghz rig from today, and the fastest quad core Opteron you can buy today is 2.0ghz. I have no interest in dropping $10k on a workstation that's hardly 5% faster than the $10k workstation from 2003. If they could wind it up to 2.8ghz like they were supposed to, I'd actually get things done quicker on new hardware, which then justifies the investment. The Opteron isn't a bragging toy, it's a work tool. If it doesn't let me work faster and cheaper, then it is not worth a penny.
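  The clock-for-clock arithmetic above can be checked with a short Python sketch. It assumes performance scales linearly with clock speed and that both machines have the same number of cores, which is a simplification; the helper names are hypothetical, and the 15% figure is the one quoted in the comment, not an independent benchmark.

```python
# "15% faster clock for clock" figure taken from the comment above.
K10_PER_CLOCK_GAIN = 1.15


def k10_clock_matching(k8_ghz: float) -> float:
    """K10 clock speed (GHz) that would match a K8 running at k8_ghz."""
    return k8_ghz / K10_PER_CLOCK_GAIN


def k10_gain_over_k8(k10_ghz: float, k8_ghz: float) -> float:
    """Fractional speedup of a K10 at k10_ghz over a K8 at k8_ghz."""
    return k10_ghz * K10_PER_CLOCK_GAIN / k8_ghz - 1.0


print(f"2.2 GHz K8 ~ {k10_clock_matching(2.2):.2f} GHz K10")
print(f"2.0 GHz K10 vs 2.2 GHz K8: {k10_gain_over_k8(2.0, 2.2):+.1%}")
print(f"2.8 GHz K10 vs 2.2 GHz K8: {k10_gain_over_k8(2.8, 2.2):+.1%}")
```

  Under those assumptions the 2.2 GHz K8 matches a K10 at about 1.91 GHz, the 2.0 GHz part comes out a little under 5% ahead, and a 2.8 GHz part would have been roughly 46% ahead.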
  
  --
  -Billco, Fnarg.com
93. Re:Panic? by Fulcrum+of+Evil · 2008-03-13 07:03 · Score: 1
  
  but if I can get 300mb/sec with fake raid and a handful of cheap drives, I would expect at least double that from server-class gear, like it used to be in SCSI's glory days.
  
  Can you get 300M/sec on RAID5? Also, what's the system load?
  
  As for the K10-based Opterons, they're just too weak! The original single and dual core Opterons were awesome because they beat the Xeon across the board: better performance per clock, lower power draw, better pricing and mass availability. It was a no-brainer! The new quad core AMDs just can't keep up, they're not all that much cheaper than quad core Xeons but most importantly: they're clocked too low.
  
  AMD has better memory architecture, so it'll outperform Xeons at 4x because the Xeons will choke on memory (usually).
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
94. Re:Panic? by infonography · 2008-03-13 08:08 · Score: 1
  
  I take it you've never experienced an 80-CPU panic..
  
  Most likely both models will happen. Both camps are right. If anything we should see some new super-computer companies emerging in the next few years as the split widens.
  
  Overall it sounds like it's going to be fun.
  
  --
  Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
95. Re:Panic? by Fulcrum+of+Evil · 2008-03-13 08:09 · Score: 1
  
  Programmers have gotten lazier and since roughly 2000 (at least from my perspective, likely before that) have come to rely on the ever increasing sizes of hard drives, RAM, and Clock Frequency.
  
  It's called proper allocation of resources. Hardware is cheap and engineers are not, so why optimize if it's fast enough or if you can just toss another box at the problem?
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
96. Re:Panic? by Fulcrum+of+Evil · 2008-03-13 08:27 · Score: 1
  
  According to windows task manager I'm running 35 processes right now.
  
  And 32 of them are idle. It's always been like that.
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
97. Re:Panic? by Fulcrum+of+Evil · 2008-03-13 09:05 · Score: 1
  
  That's because no-one's really put much time/effort into making "Idiot-Capable(TM)" programming systems.
  
  Are you joking? We can't make idiot capable people for the most part, so why do you expect computers to do better?
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
98. Re:Panic? by Fulcrum+of+Evil · 2008-03-13 09:07 · Score: 1
  
  The main research lines of programming languages have for a long time abandoned the initial pursuit for user-friendly languages, their last successes being in BASIC, Logo and the Fourth Generation Languages (i.e. SQL).
  
  We have Ruby, which is about as friendly as you should expect. Programming is hard, and it's nothing to do with syntax.
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
99. Re:Panic? by Fulcrum+of+Evil · 2008-03-13 09:12 · Score: 1
  
  You won't be able to do anything in OCaml.Net that you can't already do in C#, so why bother with .Net?
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
100. Re:Panic? by billcopc · 2008-03-13 12:55 · Score: 1
  
  Can you get 300M/sec on RAID5? Also, what's the system load?
  Hell no! The fastest Raid-5 I've seen on the ICH9R is around 175mb/sec writes, which seemed to be the upper limit for that controller (more drives didn't help). It's still pretty damned decent, considering most "affordable" SATA Raid controllers are stuck in the slow PCI lane at 8mb/sec or less.
  
  I used to have high hopes for PCI-Express Raid controllers, but it wasn't long ago that I read reviews of Areca, Raidcore and LSI PCI-E 4x controllers that were just as slow as their 32bit/33mhz PCI grandfathers. I stopped looking. Clearly the storage industry doesn't get it. I already have a monster PC, I would gladly pay a nice chunk of change for a storage controller that's worthy of its CPU and Ram neighbors. I'll load it up with 16 SATA/SAS drives if it can saturate the bus!
  
  --
  -Billco, Fnarg.com
101. Re:Panic? by Fulcrum+of+Evil · 2008-03-13 13:55 · Score: 1
  
  Have you seen this? Looks like you can beat 175MB/s in a raid5 config, depending on file access patterns, of course.
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
102. Re:Panic? by uuxququex · 2008-03-14 09:05 · Score: 1
  
  Also, newer flash drives use transparent write leveling and other similar technologies to greatly extend the life of SSD
  Just curious: what happens if your SSD is 90% full? Are all writes going to be levelled around on the remaining 10%, or are full blocks actually being shifted all the time?
  Reason I'm asking is that I normally don't have that much space left on my hard disk, and if that proves to be a killer for SSD, then I'd rather stick to my old tech.
  Another thing, if SSDs have the same failure rate as USB thumbdrives then it paints a bleak picture. Those things die on me like there's no tomorrow...
103. Re:Panic? by qinjuehang · 2008-03-15 13:40 · Score: 1
  
  Would we manage to hit 80 cores before quantum computing? Right now a state-of-the-art Skulltrail PC has only 8 cores, and the average new computer has 2.
104. Re:Panic? by TuringTest · 2008-03-16 20:52 · Score: 1
  
  You couldn't do anything in C# that you can't already do in Assembler. So why bother?
  
  Higher abstraction, better reusability, easier to develop parallelism?
  
  --
  Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
105. Re:Panic? by TuringTest · 2008-03-16 20:56 · Score: 1
  
  Ruby is friendly to programmers. Research should strive to make it easy for non-programmers (user friendly for computer users).
  
  Programming is hard
  That's my point. The holy grail would be to find a way to make programming (i.e. creating automated tasks) possible without a programming language.
  
  --
  Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
106. Re:Panic? by PipsqueakOnAP133 · 2008-03-18 09:29 · Score: 1
  
  Actually, we don't even need to be looking back. Considering the Asus EEE is a "current" computer using what amounts to a 650 MHz P3, it's fairly clear that most people are buying machines that are far overspecced for their needs.
Self Interest by quarrel · 2008-03-10 23:35 · Score: 3, Informative

AMD's Chuck Moore presumably has a lot of self interest in pushing heterogeneous cores. They are combining ATI+AMD cores on a single die and selling the benefits in a range of environments including scientific computing etc.

So take it all with a grain of salt

--Q
1. Re:Self Interest by The_Angry_Canadian · 2008-03-10 23:47 · Score: 5, Informative
  
  The article covers many points of view, not only Chuck Moore's.
2. Re:Self Interest by davecb · 2008-03-11 00:10 · Score: 2, Insightful
  
  If he's saying that his multicore processors are going to be hard to program, then self-interest suggests he be very very quiet (;-))
  Seriously, though, adding what used to be a video board to the CPU doesn't change the programming model. I suspect he's more interested in debating future issues with more tightly coupled processors.
  --dave
  
  --
  davecb@spamcop.net
3. Re:Self Interest by xouumalperxe · 2008-03-11 00:16 · Score: 1
  
  Sure, I'll take it with a grain of salt. But he does have Moore as a surname, and the other guy pretty much nailed it. :)
4. Re:Self Interest by Hanners1979 · 2008-03-11 00:24 · Score: 2, Informative
  
  AMD's Chuck Moore presumably has a lot of self interest in pushing heterogeneous cores. They are combining ATI+AMD cores on a single die...
  
  It's worth noting that Intel will also be going down this route in a similar timeframe, integrating an Intel graphics processor onto the CPU die.
5. Re:Self Interest by GreatBunzinni · 2008-03-11 01:27 · Score: 1
  
  Indeed. The people pushing heterogeneous multicore systems (i.e., AMD) state that heterogeneous cores are the future while people pushing homogeneous, large scale multicore systems (i.e., Tilera) state that homogeneous multi-core systems will be the norm. Meanwhile IBM, which happens to be pushing both technologies through their big iron and their cell, states that both will be the future....
  
  Do you start to detect a pattern here?
  
  --
  Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
6. Re:Self Interest by ccoder · 2008-03-11 01:54 · Score: 1
  
  I read that as AMD's Chuck Norris, and was really interested then... now I'm just disappointed.
  
  Remember, Chuck Norris can kill two stones with one bird.
  
  --
  "During times of universal deceit, telling the truth becomes a revolutionary act" -- George Orwell
7. Re:Self Interest by pohl · 2008-03-11 02:02 · Score: 1
  
  If he's saying that his multicore processors are going to be hard to program, then self-interest suggests he be very very quiet (;-))
  This morning my son, already almost late for the bus to school, asked me to teach him how to make a cafe mocha (we lost an hour due to 'daylight savings time', and he wanted a jolt of caffeine). Teaching such a thing takes time. At best he had enough time to drink a shot of espresso. But what he really wanted was a cup of cocoa with caffeine in it. Could I get him to drink espresso?
  "I'll pull you a shot if you're man enough to drink it," I said, applying gentle pressure to his ego.
  "I am," he said weakly.
  I looked him hard in the eye: "Are you!?"
  He straightened his back, squared his shoulders, and solemnly replied: "yes."
  I turned my back to him so that he could not see my grin. I pulled him a double shot and handed it to him.
  He drank it.
  Hard to program, eh? Taunt enough geeks and someone will step up and solve the problem.
  
  --
  The "cue the foo posts in 3, 2, 1..." posts will commence with no subsequent foo posts in 3, 2, 1...
8. Re:Self Interest by NSIM · 2008-03-11 02:08 · Score: 1
  
  It's not just AMD that's putting dedicated cores into general purpose CPUs. Intel is also going down that path with integrated graphics, and Larrabee is a graphics engine implemented with lots of general purpose x86 CPU cores. I believe Sun has also done it with a 10 GBit/s Ethernet I/F. The real problem at the moment is deciding what to do with all those transistors: do you keep throwing them at more general purpose cores and honking big caches, or do you have cores with specialized functions? The problem only gets more pronounced as we move to smaller and smaller scale fabrication where building processors with billions of transistors becomes the norm.
9. Re:Self Interest by PlusFiveTroll · 2008-03-11 02:53 · Score: 1
  
  >Do you start to detect a pattern here?
  
  Yes, that it only makes sense to develop something that (you think) is going to be used (and profitable) in the future. The thing is they are all right. Push the right processor for the job it's best at.
10. Re:Self Interest by daveisfera · 2008-03-11 03:21 · Score: 1
  
  Isn't that the point of capitalism/free markets? AMD can try one thing, Intel another, and then the market will decide which is the better option.
11. Re:Self Interest by zippthorne · 2008-03-11 04:19 · Score: 1
  
  But it makes quite a bit of sense, if you think of the GPU as a vector/matrix processor, rather than simply a video organ.
  
  It makes sense because tasks that aren't parallelizable won't benefit from any number of cores beyond one. But tasks that are parallelizable, and massively so, would benefit much more from a "single" massively parallel core than from a few parallel sequential cores. Assuming they can be set up like that.
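  
  The trade-off above is usually quantified with Amdahl's law. A minimal sketch, assuming an idealized task with a fixed parallelizable fraction and no communication overhead; the function name and sample fractions are illustrative and not taken from the article:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the work can be split up."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.95):
        for n in (2, 8, 80):
            print(f"parallel={p:.2f} cores={n:3d} speedup={amdahl_speedup(p, n):.2f}")
```

  With a parallel fraction of 0 the result stays at 1 no matter how many cores are added, which is the "won't benefit from any number of cores beyond one" case.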
  
  --
  Can you be Even More Awesome?!
Should Mimick The Brain by curmudgeon99 · 2008-03-10 23:39 · Score: 5, Interesting

Well, the most recent research into how the cortex works has some interesting leads on this. If we first assume that the human brain has a pretty interesting organization, then we should try to emulate it.

Recall that the human brain receives a series of pattern streams from each of the senses. These pattern streams are in turn processed in the most global sense--discovering outlines, for example--in the v1 area of the cortex, which receives a steady stream of patterns over time from the senses. Then, having established the broadest outlines of a pattern, the v1 cortex layer passes its assessment of what it saw the outline of to the next higher cortex layer, v2. Notice that v1 does not pass the raw pattern it receives up to v2. Rather, it passes its interpretation of that pattern to v2. Then, v2 makes a slightly more global assessment, saying that the outline it received from v1 is not only a face but a face of a man it recognizes. Then, that information is sent up to v4 and ultimately to the IT cortex layer.

The point here is important. One layer of the cortex is devoted to some range of discovery. Then, after it has assigned some rudimentary meaning to the image, it passes it up the cortex where a slightly finer assignment of meaning is applied.

The takeaway is this: each cortex does not just do more of the same thing. Instead, it does a refinement of the level below it. This type of hierarchical processing is how multicore processors should be built.
1. Re:Should Mimick The Brain by El_Muerte_TDS · 2008-03-11 00:00 · Score: 4, Funny
  
  If we first assume that the human brain has a pretty interesting organization, then we should try to emulate it.
  
  I think it's pretty obvious there are serious design flaws in the human brain. And I'm not only talking about stability, but also reliability and accuracy.
  Just look at the world.
2. Re:Should Mimick The Brain by curmudgeon99 · 2008-03-11 00:32 · Score: 0, Flamebait
  
  Ass clown.
  
  I was making a serious point. While the brain is not perfect--which one would expect from something that was not designed but evolved--it is the best game in town. I think it's foolish to try to replicate the trial and error development of a million years. I think we have a great model before us in the brain and it only makes sense to emulate it.
3. Re:Should Mimick The Brain by maestroX · 2008-03-11 00:39 · Score: 1
  
  Oh please, *everyone* knows males are lousy at multitasking.
4. Re:Should Mimick The Brain by locster · 2008-03-11 00:44 · Score: 1
  
  In addition there is also some degree of feedback from higher level processing back to lower levels, e.g. v2 telling v1 "I think this is a man's face, reinterpret based on this context". Information flows in both directions.
5. Re:Should Mimick The Brain by ragtoplvr · 2008-03-11 01:06 · Score: 1
  
  I hope we do not mimic the political brain. Most if not all of them do not work well.
  
  I guess we should not mimic the management brain either.
  
  Or the .....
  
  Transputer and Occam Ahead of their time.
  
  Rod
6. Re:Should Mimick The Brain by doublebackslash · 2008-03-11 01:07 · Score: 1
  
  Yeah, okay, that is all well and good, but not everything can be arbitrarily broken down into parallel tasks.
  Take your example. Imagine v1 takes 2s of CPU time and cannot be split into smaller pieces to be processed. However v2 takes 25s and cannot be broken up into parallel tasks. v4 will execute slightly sooner because parts of v2 started processing slightly sooner than if there were no parallelism between v1 and v2, but the speedup is minimal since the large wait is on v2.
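  
  To put rough numbers on the v1/v2 example: a sketch assuming a stream of identical work items, one core per stage, indivisible stages, and free hand-off between stages. The function names are made up for illustration:

```python
def serial_time(stage_seconds, items):
    """Total time when one core runs every stage for every item, one after another."""
    return items * sum(stage_seconds)


def pipelined_time(stage_seconds, items):
    """Total time when each stage has its own core and items flow through in order.

    The first item pays for every stage; after that, a finished item leaves the
    pipeline once per run of the slowest stage.
    """
    return sum(stage_seconds) + (items - 1) * max(stage_seconds)


stages = [2, 25]  # v1 takes 2 s, v2 takes 25 s, as in the example above

one_item_gain = serial_time(stages, 1) / pipelined_time(stages, 1)
stream_gain = serial_time(stages, 100) / pipelined_time(stages, 100)
```

  Under these assumptions a single item gains nothing, and even a long stream can never beat (2 + 25) / 25 = 1.08x, because the 25 s stage sets the pace.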
  
  Take that down to a smaller level. Painting the screen, performing a sort, taking a sha1 checksum. They all have a non-parallel bottleneck. Painting the screen has to be done in the order things are viewed, you can calculate where things overlap, but eventually one thread has to paint it in order. A sort must eventually have a certain number of comparisons made, and they have to be made in a certain order, a small speedup can be had from using multiple cores but it ends up costing more CPU horsepower overall. Sha1/md5 checksums can only be processed in a certain order.
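  
  The checksum case is easy to demonstrate with Python's standard `hashlib`. In this small sketch the chunk contents are arbitrary sample data; feeding the same chunks in a different order gives a different digest, so the chunks cannot simply be handed to independent cores and combined afterwards:

```python
import hashlib

chunks = [b"multi", b"core", b"panic"]  # arbitrary sample data


def digest_in_order(parts):
    """Feed the parts to SHA-1 one at a time, in the order given."""
    h = hashlib.sha1()
    for part in parts:
        # Each update continues from the internal state left by the previous one.
        h.update(part)
    return h.hexdigest()


whole = hashlib.sha1(b"".join(chunks)).hexdigest()
in_order = digest_in_order(chunks)
reversed_order = digest_in_order(reversed(chunks))
```

  Here `in_order` matches `whole`, while `reversed_order` does not.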
  
  We can break small parts of most tasks up between cores, but beyond a certain level it becomes non-trivial to find work for all the hardware to do. Either the problem has to be re-engineered, such as developing a multi step sorting algorithm, or the problem has to be broken into single steps and each evaluated against the rest for order of execution to find out which steps can be executed out of order (since there is no guarantee which steps will happen first on a multi CPU system) and then writing an entire system to farm out arbitrary bits of code to multiple processors (either at the compiler level after giving it hints, or at the code level with threads, mutexes, semaphores, and shared memory).
  
  Web servers and the like, on the other hand, can see a benefit immediately since they already have more tasks running than CPUs to run them on (serving dozens of pages simultaneously). However, even those can start to see diminishing returns when all the threads begin to contend for memory.
  
  You can see this problem gets complex fast.
  
  --
  md5sum /boot/vmlinuz
  d41d8cd98f00b204e9800998ecf8427e /boot/vmlinuz
7. Re:Should Mimick The Brain by curmudgeon99 · 2008-03-11 01:22 · Score: 1
  
  I think you missed my point. In the brain v1 focuses on a broad task. v2 focuses on a finer task. v4 on still a finer task. The results of the work done by v1 are sent in summary form to v2. v2 also sends its summary up to v4. Likewise, the upper levels will send down their summary to the lower area to help focus. So, it is not to the point of discrete processes working on pieces of the same issue. Rather, each cortex layer focuses on a qualitatively different task.
8. Re:Should Mimick The Brain by radtea · 2008-03-11 01:41 · Score: 2, Interesting
  
  Models from nature are rarely the best way to go. Heavier than air flight only got off the ground when people stopped looking to birds and bats for inspiration. Wheeled vehicles have no resemblance to horses. Interestingly, we are still trying to understand the nuanced details of the flight of birds based on the aerodynamics we learned building highly un-bird-like flying machines.
  
  So while there's nothing wrong in looking at our radically imperfect understanding of the brain, which is in no better state than pre-flight understanding of bird aerodynamics, it is optimistic to expect that it will provide much guidance in building programmes for multi-core processors, or for building those processors themselves. Neural networks, the most famously brain-like system architecture, are famously hard to "programme" (train) and essentially impossible to debug (interpret).
  
  The article suggests that heterogeneous multi-core architectures may be best represented to the programmer as a set of heterogeneous APIs, much as graphics-specific APIs are now. While this is vaguely consistent with the idea that "different parts of the brain do different things", I don't think the brain analogy brings anything useful to the table, and past experience should make us very wary of trying to draw any deeper inferences from it. Aeroplanes do look vaguely like birds, but that doesn't mean we should dispense with vertical stabilisers...
  
  One could equally well argue that neurons in the brain are fairly homogeneous, and each core could be considered a neuron. We know that different parts of the brain are remarkably adaptable. Stroke patients often regain function due to other parts of the brain taking over from the bits that were destroyed. So on this analogy, homogeneous processors that could be adapted to multiple tasks is the way to go.
  
  Demonstrating fairly conclusively that the brain analogy is pretty much useless, as it can be manipulated to appear to support whichever side of the debate you've already decided is the right one.
  
  --
  Blasphemy is a human right. Blasphemophobia kills.
9. Re:Should Mimick The Brain by Jeremy+Erwin · 2008-03-11 02:15 · Score: 1
  
  "Mimicking the human brain" may or not be an effective programming strategy, but it's still a novel one that must be incorporated into schedulers, languages, and programming styles.
  
  And.... we're back to square one.
10. Re:Should Mimick The Brain by emj · 2008-03-11 02:20 · Score: 1
  
  I had a professor who was very sold on using biological brain ideas for computers, and for all his ideas, none really seemed good enough at that time. Probably because I knew too little, and the way the brain functions is a bit too complex for us atm.
  
  The things you say are very good ideas, but trying to implement them even in an OCR is very hard....
11. Re:Should Mimick The Brain by imgod2u · 2008-03-11 02:27 · Score: 1
  
  If all computer chips had to be good at was pattern recognition, this would be true. This is, in fact, how signal processors work. However, general purpose computing involves many things that are often quite challenging to the human brain (branching for example). So I don't think emulating the human brain will provide a better general purpose computing architecture.
  
  You should check out neural networks. They work based on the "do a bit and pass it along" principle. They're only good, however, for a certain subset of problems.
12. Re:Should Mimick The Brain by goose-incarnated · 2008-03-11 02:33 · Score: 1
  
  The takeaway is this: each cortex does not just do more of the same thing. Instead, it does a refinement of the level below it
  
  You have used 4 paragraphs to describe the word "pipeline".
  
  There is nothing wrong with running things in a pipeline, but it does mean that all the tasks being submitted to the pipeline are similar (the reason why graphics processing does so well in a pipelined architecture).
  
  --
  I'm a minority race. Save your vitriol for white people.
13. Re:Should Mimick The Brain by nschubach · 2008-03-11 02:33 · Score: 1
  
  Though you really didn't need to go the route of name calling, I have thought about a similar pattern for a while. I'm no engineer, nor would I really want to be in this, but I think what you're suggesting is akin to instruction sets where each core is assigned to process a specific set of data. Sort of. I had always thought it would be like telling the CPU to calculate the square of a number and have it send that process to a dedicated squaring core. Of course, you'd have to identify specific re-used functions and design cores specifically to process such things. You want to target your processor to ray tracing? Create a set of cores to calculate ray casting functions. The bad part of this idea is that you can't change the function of the cores the way your brain can probably adapt to different circumstances, so you find yourself with a very specific function that needs to be replaced often as needs change.
  
  I think the biggest issue is that these patterns you suggest from higher brain functions are merely 1s and 0s to the computer which looks not at patterns, but each bit individually. Maybe if we had 256-bit+ machines that could register multiple "words" to determine function per cycle and reroute the command, but as it is, we are struggling to register a few bytes per cycle.
  
  --
  Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
14. Re:Should Mimick The Brain by amplt1337 · 2008-03-11 03:40 · Score: 1
  
  If we first assume that the human brain has a pretty interesting organization, then we should try to emulate it.
  
  Well, gopher colonies have a pretty interesting organization too, but I don't think we need to be emulating those.
  
  The takeaway is this: each cortex does not just do more of the same thing. Instead, it does a refinement of the level below it. This type of hierarchical processing is how multicore processors should be built.
  
  You are correct that having specialized tools which are efficient at doing specialized things and then providing a summary is a good way to go. However, there is a much greater benefit to doing this for a human (or animal) brain, that needs to interpret stimulus from a real environment, identify a situation, apply several different heuristics, keep its heart beating, figure out if it needs to pee, etc. That's a lot of parallel processing needs that simply aren't there for the average computer application, because computers don't interact with the "real environment" so to speak.
  
  No, the earlier poster had it right -- most of the tasks that the Common User uses a computer for are pretty inherently serial, and require continued interaction. Outside of some specialized fields, the extra power simply isn't useful, so people won't pay for it, and a business strategy that stakes profitability on their willingness to do so is a flawed one. We're approaching the shores of "good enough."
  
  --
  Freedom isn't free; its price is the well-being of others.
15. Re:Should Mimick The Brain by blahplusplus · 2008-03-11 04:28 · Score: 1
  
  "Models from nature are rarely the best way to go."
  
  That's a pretty big claim there.
  
  "Heavier than air flight only got off the ground when people stopped looking to birds and bats for inspiration."
  
  This is completely incorrect. Heavier-than-air flight was not understood, not because people were trying to emulate birds, but because they did not understand the principles of flight. All flying things use principles for their locomotion; this changes depending on the size of the object (a bug vs a bird vs a plane) and how fast you want it to go, how much weight you want it to carry, etc. Engineering is not easy, and I say with the amount of technology already in life, life is a very safe bet. Before there were helicopters there were hummingbirds and dragonflies.
  
  "Wheeled vehicles have no resemblance to horses."
  
  Which has no bearing on anything. No one understood what powered locomotion in a horse, or much of anything for that matter until we found out the primary causes of locomotion.
  
  "Interestingly, we are still trying to understand the nuanced details of the flight of birds based on the aerodynamics we learned building highly un-bird-like flying machines."
  
  But that is the whole point: the aerodynamics you talk about was nowhere near understood when flight was invented, and what we're learning now about bird aerodynamics is that it's a lot more complicated and sophisticated than our solid-wing aircraft.
  
  You're putting the cart before the horse: Man stumbled his way into technology because he didn't know what to look for. The fact of the matter is that understanding the flight of organisms is nowhere near as easy as designing rudimentary flying vehicles. Our 'advanced technology' (from then) was the low-hanging fruit.
16. Re:Should Mimick The Brain by Kjella · 2008-03-11 05:03 · Score: 1
  
  I think it's pretty obvious there are serious design flaws in the human brain. And I'm not only talking about stability, but also reliability and accuracy.
  
  The stability is excellent in my opinion, it'll endure work 16 hours/day and 30000 sleep cycles over 80 years yet I don't recall ever hearing anything like the brain crashing. The primary cause of brain failure is lack of oxygen but that's like a server without electricity - you can hardly blame the brain for that. I guess it's more like one of those old Pentiums, they can be perfectly stable but the logic is still wrong.
  
  --
  Live today, because you never know what tomorrow brings
17. Re:Should Mimick The Brain by timeOday · 2008-03-11 07:43 · Score: 1
  
  Besides, there are well-known processing paradigms matching this description: data flow (e.g. unix pipelining, probably the easiest known way to do parallel programming) and production systems (based on pattern matching, which is easily made parallel - but you can often get the same speedup with indexing tricks that aren't!)
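  
  A rough sketch of the data-flow style in Python, in the spirit of a Unix pipe: each stage runs in its own thread and only talks to its neighbours through a queue. The helper names (`stage`, `run_pipeline`) and the sample stages are made up for illustration. In standard CPython the threads mostly show the structure rather than a CPU-bound speedup, because of the global interpreter lock:

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def stage(func, inbox, outbox):
    """Apply func to every item arriving on inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the end-of-stream marker downstream
            return
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Chain funcs like `a | b | c`, one thread per stage, and collect the output."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Three stages: trim whitespace, upper-case, measure length.
results = run_pipeline([str.strip, str.upper, len], ["  v1 ", " cortex", "it  "])
```

  No stage shares state with another; the queues are the only point of contact, which is what makes this style comparatively easy to get right.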
18. Re:Should Mimick The Brain by Mad+Merlin · 2008-03-11 14:03 · Score: 1
  
  You should check out neural networks. They work based on the "do a bit and pass it along" principle. They're only good, however, for a certain subset of problems.
  
  Nonsense, Neural Networks are the wave of the future!
  
  --
  Game! - Where the stick is mightier than the sword!
19. Re:Should Mimick The Brain by DiEx-15 · 2008-03-11 14:59 · Score: 1
  
  All I can say is that this would be a very good idea for current and future multi-core CPUs. Eventually programmers and designers will have to take hints on what has been going on inside us for millions of years to this new level. Very excellent idea sir/madam.
20. Re:Should Mimick The Brain by Superballs · 2008-03-11 16:54 · Score: 1
  
  I find it humourous that the above comment was modded as funny.
  
  When you really think about how susceptible our brain is to illusion and misinterpretation, I'm not sure it's too wise to base a computing architecture on its structure.
  
  --
  Howe due yoo keap uh gramur natsee bizzy four ours?
21. Re:Should Mimick The Brain by tabrnaker · 2008-03-11 18:57 · Score: 1
  
  I'm currently working on a development system based on human cognition.
  The hard part is finding the lowest common building blocks. A good clean simple design should be scalable in both the micro and macro directions.
  Once we find the most efficient ways to do something that is required at all layers (such as communication,cpu,motherboard,programs,modules,internet) we can optimize the hell out of those functions, as well as build specific cores for them.
  I really should stop talking my mouth off and actually code something. Believe me, i speak logic languages a lot better than i do english :)
22. Re:Should Mimick The Brain by curmudgeon99 · 2008-03-11 23:36 · Score: 1
  
  Interesting research. I suggest you pick up a copy of Jeff Hawkins' "On Intelligence", by the way. The most interesting aspect of it to me was that all human senses provide information to the brain in the form of "pattern streams". These streams of patterned information are the way we receive information such as sight, hearing, touch and anything else. He points out that the brain works on the same type of data no matter from where the input comes.
  
  However, some of the later stuff about how the brain actually stores information, such as the "columns" and the interactions of the various layers of cortex, was really difficult to grok.
  
  I myself have been about four years on an AI project that just may come to something interesting. There is one whole entire realm of the brain that no one has attempted to model. That's what I'm after.
23. Re:Should Mimick The Brain by tabrnaker · 2008-03-12 00:07 · Score: 1
  
  Thanks, i'll take a look at it.
  Layering is actually one of the things i'm having a design problem around. Of course, i'm going about it differently. Instead of modeling on the neurological level, i'm trying to model the abstract level on which cognition takes place. More Python than Assembler, and of course i'm writing the design proof in python.
  I figured that instead of writing a book to explain my theories of cognition and awareness, i'd just write a working model. I'm also looking forward to using it to model feature detectors in the visual system, i've got the glimmers of an algorithm for a binocular object detector i want to test out.
24. Re:Should Mimick The Brain by curmudgeon99 · 2008-03-12 01:01 · Score: 1
  
  Excellent. The most interesting aspect of the brain's behavior is that it attacks the problem of perception in stages. The initial layer just attempts--in the case of images--to detect the broad outlines. So, if the eyes pass in a pattern stream of the image of a pyramid, the first layer of perception just identifies: triangle. So, from the pattern stream of information, the first detection is just of the outlines. Then, from that large volume of input data the brain receives, it just passes to the next layer in the cortex: "triangle". Then, the next layer works from that basis and looks for details inside the triangle. So, from this simple division of labor, we have much simplified the problem. In fact, there is a layer that is optimized to only see faces, no matter their orientation (upside down, seen from the side, etc.)
  
  So, in short, I think this is the correct approach: divide and conquer.
25. Re:Should Mimick The Brain by tabrnaker · 2008-03-12 10:14 · Score: 1
  
  Only problem, where do you divide?
  I currently disagree with a large part of the visual model we currently have. There seems to be a big emphasis on equating the brain with a computer, and thereby limiting our models to the constraints against computers.
  I believe (unfounded as of yet) that the feature detectors do not just measure 2d simple geometric shapes. I think this happens when we keep thinking of light as a particle. Nature wouldn't limit itself to one modality of light.
  I believe that a lot of feature detectors take movement(spirals) and geometry(lots of pentagons) and translate it into simple static symbols visualized as a 2d surface. The brain then makes a new 3d static representation from two of these condensed static views.
  Reality might be running at 5Ghz, most of us only refresh ~60/sec :) Part of meditation is learning how to slow down one's consumption of the outside world, slowing helps to show the movement in everything. Most people just pop acid though.
  It's a weird phenomenon, but as you run your brain faster, 3d object recognition goes downhill. Object recognition is temporally computationally complex. Before i learned how to see, i often wondered if i could even think while 'seeing'. Now, i'm still pretty dumb while i'm seeing, it's like i'm average:). I was staring at a receipt trying to figure out a tip, and all i could see was a crumpled piece of paper in my hand, so i gave up. So i seriously can't do math when seeing, but if i shut off object processing, i can go back to doing math in my head, and using the viewscreen for visualization and graphing. And wouldn't you know it? Object recognition requires you to focus both eyes on the same thing. That seriously slows down detail collection. Hence photographic memory becomes much harder, easier to store flat text representations than a whole object scene plus the data.
  I'm sure science will figure it out sooner or later, just like the nose, it's more about recognition of geometry than 2d symbol interacts with 2d receptor. It'll be another golden age soon as we realize we can't remove the subjective observer from our observations. Once we do that, we can stop staring at the 2d shadow on Plato's cave wall, and turn around and start studying the causes.
  We are, in fact, trapped within the simple 2d reality that is produced by the senses.
  Sheesh, i always sound so corny. I guess it's better than the dry, monotone, technical, purely objective, autistic academic stuff i did in university :)
26. Re:Should Mimick The Brain by curmudgeon99 · 2008-03-12 10:19 · Score: 1
  
  Interesting post. And from your description of your predicament in regard to images, I would venture to guess that you're right handed.
27. Re:Should Mimick The Brain by tabrnaker · 2008-03-12 11:26 · Score: 1
  
  Not quite. Physically, my left side is dominant. My right hand used to be one of those contracted gimp arms. I did use it to write though, but because i had limited movement of my fingers my writing was extremely small and irregular.
  My right dealt in the micro, my left in the macro.
28. Re:Should Mimick The Brain by curmudgeon99 · 2008-03-13 12:26 · Score: 1
  
  Having taken several dozen stabs at this problem, I think you need first of all to have an architecture that can seek to actually combine two different and complementary "thought" processes. The brain is effective because it uses memories to offer feedback to itself, in real time. The UI is an annoying distraction from the idea: what does it mean to be a conscious entity, taking in simultaneous pattern streams of information from the senses, using those senses to reply with related things.
  
  Somehow, the way that sensory information rattles around inside that memory, with the flood of ideas and connotations feeding back into the mind, makes this consciousness.
  
  We usually think of a conscious person seeing themselves, or hearing themselves. But a deaf blind person is still conscious, getting a pattern stream of information from their fingertips, letting the mind make its own interior imaginary representation of the world.
  
  All that is the challenge. There are a trillion other things that could be said.
29. Re:Should Mimick The Brain by gbjbaanb · 2008-03-16 23:16 · Score: 1
  
  The takeaway is this: each cortex does not just do more of the same thing. Instead, it does a refinement of the level below it. This type of hierarchical processing is how multicore processors should be built.
  
  Oh no! You're saying my brain works like the OSI networking stack?!
  
  data stream from the eyeballs:
  
  Level 1: look at the pretty flashy lights
  Level 2: hey, some of those lights seem to be moving
  Level 3: and that chunk of light moves together
  Level 4: which is a recognizable object
  Level 5: I remember that object, it's my mate Dave.
  Level 6: Dave owes me a beer.
30. Re:Should Mimick The Brain by curmudgeon99 · 2008-03-16 23:37 · Score: 1
  
  Ha ha. No, I meant more like:
  
  V1 -- detect outline
  V2 -- detect details within outline from V1
  V4 -- further refinement
  IT -- etc.
Let's see the menu by Tribbin · 2008-03-10 23:39 · Score: 3, Interesting

Can I have... errr... Two floating point, one generic math with extra cache and two RISC's.

--
If you mod this up, your slashdot background will turn into a beautiful sunset!
1. Re:Let's see the menu by that+this+is+not+und · 2008-03-11 00:32 · Score: 1
  
  You're sounding like that IBM 'Drive Thru' radio commercial now.
2. Re:Let's see the menu by imikem · 2008-03-11 00:34 · Score: 5, Funny
  
  Would you like fries with that?
  
  --
  Perscriptio in manibus tabellariorum est.
3. Re:Let's see the menu by everphilski · 2008-03-11 01:48 · Score: 1
  
  and biggie size it, to go, with CHEESE FRIES!
4. Re:Let's see the menu by jgiltner · 2008-03-12 13:59 · Score: 1
  
  Have you seen IBM's new z6 processor? Quad core 4.4GHz, 2 I/O interfaces running 17 GB/s each, 2 interfaces for SMP function at 48 GB/s each, and 4 memory interfaces at 13 GB/s each. Each core has a decimal math accelerator, and each pair of cores shares a compression/decompression accelerator and an encryption/decryption accelerator.
  
  Their new z10 mainframe can have up to 20 of these (80 cores) of which 64 of the cores can be configured for the customer to use. The other 16 cores are used for spares and special processing.
  
  The z6 (or z10 CPU) is very closely related to IBM's Power6.
  
  http://en.wikipedia.org/wiki/IBM_z6
  
  http://www2.hursley.ibm.com/decimal/IBM-z6-mainframe-microprocessor-Webb.pdf
OpenMP? by derrida · 2008-03-10 23:40 · Score: 2, Informative

It is portable, scalable, standardized and supports many languages.

--
nemesis. Home of an experimental fe code.
1. Re:OpenMP? by kscguru · 2008-03-11 04:53 · Score: 1
  
  And OpenMP is completely useless outside of the narrow field of embarrassingly-parallel numeric computation for which it was developed. OpenMP assumes the only thing worth spreading across processors are tight kernels of code, so OpenMP only produces gains when very large fractions of code are exactly identical. OpenMP is nothing more than a thin wrapper around five or so common scatter-gather paradigms. OpenMP makes scientific code easier to write, and never claimed to do more.
  Pthreads is a more useful parallel API for any task outside scientific computing. But if you want the high-level overview, David Patterson's (yes, the David Patterson of The Book on computer architecture) multicore talk is quite good. (I think I found the right slide deck...)
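
  For anyone who hasn't met the scatter-gather pattern mentioned above, here is a rough sketch of its shape. It is Python rather than OpenMP, and `kernel`, `scatter_gather` and the default worker count are made-up names for illustration. It only shows the structure: in stock CPython, threads will not actually speed up a pure-Python CPU-bound kernel like this one.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(chunk):
    """The identical 'tight kernel' run on every chunk (hypothetical: sum of squares)."""
    return sum(x * x for x in chunk)


def scatter_gather(data, workers=4):
    """Scatter data into roughly equal chunks, run kernel on each, gather by summing."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(kernel, chunks))  # scatter + run
    return sum(partials)                           # gather
```

  The point of the pattern is that every worker runs exactly the same code on a different slice, which is why it suits numeric kernels and not much else.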
  
  --
  A witty [sig] proves nothing. --Voltaire
2. Re:OpenMP? by widman · 2008-03-13 07:52 · Score: 1
  
  Er... No. Check out the new STL and many other new things doing either OpenMP or something. OpenMP is not limited like SIMD. You can do a lot of mini-threads there. The pragma statements are good hints to the compiler. Check out GCC 4.3 on STL, Cell. And GCC 4.2 on OpenMP.
Languages by PsiCollapse · 2008-03-10 23:41 · Score: 2, Informative

That's why it's so important that languages begin to adopt threading primitives and immutable data structures. Java does a good job. Newer languages, like Clojure, are built from the ground up with concurrency in mind.
1. Re:Languages by chudnall · 2008-03-11 00:35 · Score: 5, Informative
  
  *cough*Erlang*cough*
  
  I think the wailing we're about to hear is the sound of thousands of imperative-language programmers being dragged, kicking and screaming, into functional programming land. Even the functional languages not specifically designed for concurrency do it much more naturally than their imperative counterparts.
  
  --
  Disclaimer: Evolution comes with NO WARRANTY, except for the IMPLIED WARRANTY of FITNESS FOR A PARTICULAR PURPOSE.
2. Re:Languages by Westley · 2008-03-11 00:47 · Score: 5, Informative
  
  Java doesn't do a good job. It does a "better than abysmal" job in that it has some idea of threading with synchronized/volatile, and it has a well-defined memory model. (That's not to say there aren't flaws, however. Allowing synchronization on any reference was a mistake, IMO.)
  
  What it *doesn't* do is make it easy to write verifiably immutable types, and code in a functional way where appropriate. As another respondent has mentioned, functional languages have great advantages when it comes to concurrency. However, I think the languages of the future will be a hybrid - making imperative-style code easy where that's appropriate, and functional-style code easy where that's appropriate.
  
  C# 3 goes some of the way towards this, but leaves something to be desired when it comes to assistance with immutability. It also doesn't help that the .NET 2.0 memory model is poorly documented (the most reliable resources are blog posts, bizarrely enough - note that the .NET 2.0 model is significantly stronger than the ECMA CLI model).
  
  APIs are important too - the ParallelExtensions framework should help .NET programmers significantly when it arrives, assuming it actually gets used. Of course, for other platforms there are other APIs - I'd expect them to keep leapfrogging each other in terms of capability.
  
  I don't think C# 3 (or even 4) is going to be the last word in bringing understandable and reliable concurrency, but I think it points to a potential way forward.
  
  The trouble is that concurrency is hard, unless you live in a completely side-effect free world. We can make it simpler to some extent by providing better primitives. We can encourage side-effect free programming in frameworks, and provide language smarts to help too. I'd be surprised if we ever manage to make it genuinely easy though.
3. Re:Languages by locster · 2008-03-11 00:49 · Score: 1
  
  Microsoft did some research in this area a few years back. See C Omega
4. Re:Languages by TheRaven64 · 2008-03-11 01:12 · Score: 3, Interesting
  
  For good parallel programming you just need to enforce one constraint:
  Every object (in the general sense, not necessarily the OO sense) may be either aliased or mutable, but not both.
  Erlang does this by making sure no objects are mutable. This route favours the compiler writer (since it's easy) and not the programmer. I am a huge fan of the CSP model for large projects, but I'd rather keep something closer to the OO model in the local scope and use something like CSP in the global scope (which is exactly what I am doing with my current research).
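
  A small sketch of the "aliased or mutable, but not both" rule, in Python purely for illustration. `Account` and `deposit` are hypothetical names. Python only checks this at runtime, and a frozen dataclass is shallow (a mutable field inside it could still be changed), so treat it as a picture of the idea rather than an enforcement mechanism.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Account:
    """Immutable, so any number of threads may hold a reference to it."""
    owner: str
    balance: int


def deposit(account, amount):
    """Return a new Account; the shared original is never modified."""
    return replace(account, balance=account.balance + amount)
```

  Since an "update" builds a new value, anything still holding the old reference keeps seeing a consistent snapshot.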
  
  --
  I am TheRaven on Soylent News
5. Re:Languages by Ngarrang · 2008-03-11 01:49 · Score: 1
  
  *cough*Erlang*cough*
  
  I think the wailing we're about to hear is the sound of thousands of imperative-language programmers being dragged, kicking and screaming, into functional programming land. Even the functional languages not specifically designed for concurrency do it much more naturally than their imperative counterparts.
  You will have to pry my imperative programming languages from my cold, dead fingers, thank you very much. Maybe when BASIC is made parallel invisibly...
  
  --
  Bearded Dragon
6. Re:Languages by autophile · 2008-03-11 02:43 · Score: 1
  
  Erlang does this by making sure no objects are mutable.
  
  I thought that was a feature of all functional languages, not just Erlang?
  --Rob
  
  --
  Towards the Singularity.
7. Re:Languages by mkramer · 2008-03-11 02:43 · Score: 1
  
  That seems like a reasonable core requirement, when dealing with traditional type cores.
  
  But, as Moore was saying, once you start to evaluate heterogeneous microprocessors, with processing units much more specialized than your basic general purpose processor (be it more DSP, GPU, FPGA, or whatever), you're going to have a lot more problems with the fact that the functionality of a single object may not be well suited to the optimum processing unit. Not to mention, we also need techniques for determining exactly which processing units are optimum for the particular task, based not only on raw functionality, but resource availability, and future resource needs.
  
  Homogeneous multi-cores are difficult as it is. The heterogeneous nightmare is what particularly frightens researchers in this area. We've seen some amazing performance success with these types of processors, but essentially purely with hand-tuned (or at least heavily hinted) source code. To actually sell microprocessors of these types with success, there's no doubt that we need either more intelligent tools or a miraculous generalized-yet-not-too-high-level parallel programming paradigm. Preferably both.
8. Re:Languages by Anonymous Coward · 2008-03-11 03:58 · Score: 0
  
  You will have to pry my imperative programming languages from my cold, dead fingers, thank you very much.
  
  Good luck with that dogma. While your single active core is glowing red hot and you're complaining about poor performance, your other 79 cores are in their idle loops. Looks pretty dumb to us.
9. Re:Languages by Anonymous Coward · 2008-03-11 07:00 · Score: 0
  
  I'd take java.util.concurrent over ParallelExtensions any day of the week. Having a strong set of concurrency data structures and primitives is far, far more valuable than a parallel "for" library that promotes abuse through data dependency errors. And considering that the authors of .Net's memory model have repeatedly compared their work to Java's, it's clear that this is where they got their inspiration from.
  
  In Java, you can write concurrent programs in a variety of ways, such as exclusively in a message-passing style by using concurrent queues. Before lambasting Java and talking up C#, you should go back and see what Java released well before Microsoft. This is the area where C# is catching up to Java.
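
  To make the message-passing style concrete, here is a minimal sketch. It uses Python's standard `queue.Queue` as a stand-in for a Java concurrent queue, so it is an analogue and not java.util.concurrent itself; `consumer`, `run` and the `_STOP` sentinel are hypothetical names.

```python
import queue
import threading

_STOP = object()  # sentinel telling the consumer to finish


def consumer(inbox, results):
    """Receive messages until the sentinel arrives; only this thread touches results."""
    while True:
        msg = inbox.get()
        if msg is _STOP:
            break
        results.append(msg * 2)


def run(messages):
    """Send each message through the queue and return what the consumer produced."""
    inbox = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(inbox, results))
    worker.start()
    for m in messages:
        inbox.put(m)
    inbox.put(_STOP)
    worker.join()  # after join, reading results from this thread is safe
    return results
```

  The two threads never share state while both are running; the queue is the only thing they have in common.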
10. Re:Languages by matsh · 2008-03-11 07:28 · Score: 2, Informative
  
  *cough*Scala *cough*
  
  An object oriented AND functional language built on top of the JVM, with seamless interoperability with Java, immutable collections and an Actors framework pretty much the same as in Erlang. I think it is the solution to the multi-core problem.
11. Re:Languages by Anonymous Coward · 2008-03-11 10:23 · Score: 0
  
  I think the implication is that it just might be your cold, dead fingers.
12. Re:Languages by jasonjacks0n · 2008-03-11 15:02 · Score: 1
  
  That's not to say there aren't flaws, however. Allowing synchronization on any reference was a mistake, IMO.
  
  Just out of curiosity, why do you think that was a mistake?
  
  (I'm not arguing the point.. in fact I have no opinion on it one way or another. Just wondering what you've run into that's made you decide that wasn't a good idea.)
  
  --
  This space intentionally left blank.
13. Re:Languages by Westley · 2008-03-11 19:50 · Score: 1
  
  It encourages people to synchronize on any old reference, such as "this" or "typeof(Foo)".
  
  This in turn leads locks to be more public than they should be, and also decreases the readability - you don't instantly know whether an object is going to be used for locking or not.
  
  I almost always lock on a readonly object variable which is privately created and never exposed.
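
  Roughly what I mean, sketched in Python since the idea is language-neutral. `Counter` and `hammer` are made-up names; the only point is that the lock is created privately and never handed to callers.

```python
import threading


class Counter:
    def __init__(self):
        self._lock = threading.Lock()  # private; never exposed to callers
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def hammer(counter, threads=8, per_thread=1000):
    """Increment the counter from several threads and return the final count."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter.value()
```

  Nobody outside the class can take the lock, so nobody outside the class can deadlock it or hold it longer than intended.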
14. Re:Languages by Westley · 2008-03-11 21:39 · Score: 1
  
  I'm well aware of java.util.concurrent, and I agree that having a strong set of primitives is a good idea. I believe that Parallel Extensions will provide some of the same kinds of primitives as well as the higher level "parallel for" etc. Both are useful.
  
  I haven't looked at Parallel Extensions as much as I'd like to, due to time pressures, but if there are important primitives that it's missing, those would certainly be worth telling the team - I'm sure they'd be very interested to hear other views.
  
  And yes, it's obvious that the .NET memory model was partly inspired by Java's, which may be part of the reason for making the same mistake of giving every object a monitor. Don't assume that I'm attacking Java's record from a point of view of ignorance. (Note also that I explicitly acknowledged that Java's memory model was a lot more clearly documented than .NET's. You implied that I've been entirely positive about C# and entirely negative about Java, which simply isn't true.)
  
  If Java 7 actually gets closures (I haven't kept track of what's currently in/out) that would certainly aid concurrent programming in Java too, btw.
15. Re:Languages by Westley · 2008-03-11 21:52 · Score: 1
  
  Scala is one of the many languages I haven't got round to yet, but plan to at some point.
  
  There's an interesting question of psychology though. I *suspect* that we're more likely to win people round to a functional way of thinking by gradually encouraging a functional style within a familiar imperative setting than by starting off with a "mainly functional" language which allows side-effects in some cases.
  
  I say this as someone with a long history of imperative programming, who is now coming round to the functional way of thinking - so I'm naturally hugely biased :)
  
  However it happens, there's definitely a resurgence of interest (outside the CS community) in functional languages - at least, I've certainly noticed them getting more attention recently.
Not *that* Chuck Moore by Hobart · 2008-03-10 23:42 · Score: 4, Informative

This article is referring to AMD's Charles R. "Chuck" Moore, who worked on the POWER4 and PowerPC 601, not the language and chip designer Charles H. "Chuck" Moore who invented Forth, ColorForth, et al. and was interviewed on slashdot.

--
o/~ Join us now and share the software ...
1. Re:Not *that* Chuck Moore by Hal_Porter · 2008-03-10 23:52 · Score: 5, Funny
  
  Those +1 Informative links go to wikipedia, an online encyclopedia.
  
  --
  echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
What about... by aurb · 2008-03-10 23:45 · Score: 2, Informative

...functional programming languages? Or flow programming?
Should Mimic DNA/cell process. by cabazorro · 2008-03-10 23:48 · Score: 1

Sounds wasteful, I know (data replication everywhere). But there is a reason for that. The process becomes resilient to unexpected changes (corruption). The bus is the enzymes, the cpu is the cell and thread of execution is, well, the DNA. The replication and communication process is autonomous.

--
- these are not the droids you are looking for -
The future is here by downix · 2008-03-10 23:48 · Score: 5, Insightful

What Mr Moore is saying does have a grain of truth, that generic will be beaten by specific in key functions. The Amiga proved that in 1985, being able to deliver a better graphical solution than workstations costing tens of thousands more. The key now is to figure out which specifics you can use without driving up the cost or compromising the design ideal of a general purpose computer.

--
Karma Whoring for Fun and Profit.
1. Re:The future is here by funkboy · 2008-03-11 00:24 · Score: 2, Insightful
  
  The Amiga proved that in 1985, being able to deliver a better graphical solution than workstations costing tens of thousands more. The key now is to figure out which specifics you can use without driving up the cost or compromising the design ideal of a general purpose computer.
  
  The key now is figuring out what to do with your Amiga now that no one writes applications for it anymore.
  
  I suggest NetBSD :-)
2. Re:The future is here by codevark · 2008-03-11 02:51 · Score: 1
  
  Heya. Despite the rumors, it's not true that Amiga software dev has stopped dead. And even if it were true, that doesn't mean that all the existing software stops working, right? (like what some people seemed to think would happen when CBM went belly-up.) AFAICT, most of those old apps are still working fine.. several are working for me right now :)
  
  It would be somewhat ironic if we came full circle to end up with a collection of "custom" CPUs, each specially suited for particular tasks, all cooperating to perform some higher function.
  
  JM must be shaking his head and smiling, somewhere.
3. Re:The future is here by judashole · 2008-03-11 05:04 · Score: 1
  
  I still have an Amiga 2000 sitting in a box in my attic. Those were the days.
4. Re:The future is here by downix · 2008-03-12 00:35 · Score: 1
  
  Try out the MiniMig sometime, brand new open-source Amiga clone.
  
  --
  Karma Whoring for Fun and Profit.
Just fab the cores and get out of the way by Anonymous Coward · 2008-03-10 23:56 · Score: 0

or go take threads 101. don't punish developers who know what they are doing just because the ruby/rails/java/python fad language crowd doesn't understand how their language bastardizes pthreads.
My heterogeneous experience with Cell processor by DoofusOfDeath · 2008-03-11 00:01 · Score: 5, Interesting

I've been doing some scientific computing on the Cell lately, and heterogeneous cores don't make life very easy. At least with the Cell.

The Cell has one PowerPC core ("PPU"), which is a general purpose PowerPC processor. Nothing exotic at all about programming it. But then you have 6 (for the Playstation 3) or 8 (other computers) "SPE" cores that you can program. Transferring data to/from them is a pain, they have small working memories (256k each), and you can't use all C++ features on them (no C++ exceptions, thus can't use most of the STL). They also have poor speed for double-precision floats.
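
To give a feel for what the small working memory forces on you, here is a toy sketch in Python, not real SPE code. It does not model the actual transfer mechanism; slicing a list just stands in for a copy into local memory. `tile_ranges`, `process_in_tiles`, the 4-byte item size and the choice to budget only a quarter of the 256k for one data buffer are all assumptions made for the example.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE working memory, as described above


def tile_ranges(n_items, item_bytes, budget_bytes):
    """Yield (start, stop) index ranges whose data fits within budget_bytes."""
    per_tile = budget_bytes // item_bytes
    if per_tile < 1:
        raise ValueError("a single item does not fit in the budget")
    for start in range(0, n_items, per_tile):
        yield start, min(start + per_tile, n_items)


def process_in_tiles(data, item_bytes=4, budget_bytes=LOCAL_STORE_BYTES // 4):
    """Sum data tile by tile, as if each tile were copied into a small local store."""
    total = 0
    for start, stop in tile_ranges(len(data), item_bytes, budget_bytes):
        tile = data[start:stop]  # stands in for a transfer into local memory
        total += sum(tile)       # stands in for the compute kernel
    return total
```

Every algorithm has to be reshaped into this tile-at-a-time form, and the code itself has to share the same 256k, which is where most of the design effort goes.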

The SPEs are pretty fast, and they have a very fast interconnect bus, so as a programmer I'm constantly thinking about how to take better advantage of them. Perhaps this is something I'd face with any architecture, but the high potential combined with difficult constraints of SPE programming make this an especially distracting aspect of programming the Cell.

So if this is what heterogeneous-cores programming means, I'd probably prefer the homogeneous version. Even if they have a little less performance potential, it would be nice to have a 90%-shorter learning curve to target the architecture.
1. Re:My heterogeneous experience with Cell processor by nycguy · 2008-03-11 00:16 · Score: 5, Interesting
  
  I agree. While a proper library/framework can help abstract the difficulties associated with a heterogeneous/asymmetric architecture away, it's just easier to program for a homogeneous environment. This same principle applies all the way down to having general-purpose registers in a RISC chip as opposed to special-purpose registers in a CISC chip--the latter may let you do a few specialized things better, but the former is more accommodating for a wide range of tasks.
  And while the Cell architecture is a fairly stationary target because it was incorporated into a commercial gaming console, if these types of architectures were to find their way into general purpose computing, it would be a real nightmare, since every year or so a new variant of the architecture would come out that would introduce a faster interconnect here, more cache memory there, etc., so that one might have to reorganize the division of labor in one's application to take advantage (again a properly parameterized library/framework can handle this sometimes, but only post facto--after the variation in features is known, not before the new features have even been introduced).
2. Re:My heterogeneous experience with Cell processor by that+this+is+not+und · 2008-03-11 00:34 · Score: 1
  
  Transferring data to/from them is a pain, they have small working memories (256k each), and you can't use all C++ features on them (no C++ exceptions, thus can't use most of the STL).
  
  The horrors! How are the teams at Microsoft going to fit bloat in them, then!?!
3. Re:My heterogeneous experience with Cell processor by DoofusOfDeath · 2008-03-11 00:44 · Score: 1
  
  The horrors! How are the teams at Microsoft going to fit bloat in them, then!?!
  
  Actually, it's been a good exercise to have to work under those constraints. I found that a tight environment like that forced me to carefully reconsider the design of my code and my algorithm. It probably led to an implementation that not only had fewer lines of code, but was also more readable, than the original version.
4. Re:My heterogeneous experience with Cell processor by epine · 2008-03-11 01:01 · Score: 5, Interesting
  
  So if this is what heterogeneous-cores programming means, I'd probably prefer the homogeneous version.
  Your points are valid as things stand, but isn't it a bit premature to make this judgment? Cell was a fairly radical design departure. If IBM continues to refine Cell, and as more experience is gained, the challenge will likely diminish.
  
  For one thing, IBM will likely add double precision floating point support. But note that SIMD in general poses problems in the traditional handling of floating point exceptions, so it still won't be quite the same as double precision on the PPU.
  
  The local-memory SPE design alleviates a lot of pressure on the memory coherence front. Enforcing coherence in silicon generates a lot of heat, and heat determines your ultimate performance envelope.
  
  For decades, programmers have been fortunate in making our own lives simpler by foisting tough problems onto the silicon. It wasn't a problem until the hardware ran into the thermal wall. No more free lunch. Someone has to pay on one side or the other. IBM recognized this new reality when they designed Cell.
  
  The reason why x86 never died the thousand deaths predicted by the RISC camp is that heat never much mattered. Not enough registers? Just add OOO. Generates a bit more heat to track all the instructions in flight, but no real loss in performance. Bizarre instruction encoding? Just add big complicated decoders and pre-decoding caches. Generates more heat, but again performance can be maintained.
  
  Probably with a software architecture combining the hairy parts of the Postgres query execution planner with the recent improvements in the FreeBSD affinity-centric ULE scheduler, you could make the nastier aspects of SPE coordination disappear. It might help if the SPUs had 512KB instead of 256KB to alleviate code pressure on data space.
  
  I think the big problem is the culture of software development. Most code functions the same way most programmers begin their careers: just dive into the code, specify requirements later. What I mean here is that programs don't typically announce the structure of the full computation ahead of time. Usually the code goes to the CPU "do this, now do that, now do this again, etc." I imagine the modern graphics pipelines spell out longer sequences of operations ahead of time, by necessity, but I've never looked into this.
  
  Database programmers wanting good performance from SQL *are* forced to spell things out more fully in advance of firing off the computation. It doesn't go nearly far enough. Instead of figuring out the best SQL statement, the programmer should send a list of *all* logically equivalent queries and just let the database execute the one it finds least troublesome. Problem: sometimes the database engine doesn't know that you have written the query to do things the hard way to avoid hitting a contentious resource that would greatly impact the performance limiting path.
  
  These are all problems in the area of making OSes and applications more introspective, so that resource scheduling can be better automated behind the scenes, by all those extra cores with nothing better to do.
  
  Instead, we make the architecture homogeneous, so that resource planning makes no real difference, and we can thereby sidestep the introspection problem altogether.
  
  I've always wondered why no-one has ever designed a file system where all the unused space is used to duplicate other disk sectors/blocks, to create the option of vastly faster seek plans. Probably because it would take a full-time SPU to constantly recompute the seek plan as old requests are completed and new requests enter the queue. Plus if two supposedly identical copies managed to diverge, it would be a nightmare to debug, because the copy you get back would be non-deterministic. Hybrid MRAM/Flash/spindle storage systems could get very interesting.
  
  I guess I've been looking forward to the end of artificial scaling for a long time (clock freq. as the
5. Re:My heterogeneous experience with Cell processor by TheRaven64 · 2008-03-11 01:19 · Score: 3, Insightful
  
  Well, part of your problem is that you're using a language which is a bunch of horrible syntactic sugar on top of a language designed for programming a PDP-11 on an architecture that looks nothing like a PDP-11.
  You're not the only person using heterogeneous cores, however. In fact, the Cell is a minority. Most people have a general purpose core, a parallel stream processing core that they use for graphics and an increasing number have another core for cryptographic functions. If you've ever done any programming for mobile devices, you'll know that they have been using even more heterogeneous cores for a long time because they give better power usage.
  
  --
  I am TheRaven on Soylent News
6. Re:My heterogeneous experience with Cell processor by Anonymous Coward · 2008-03-11 01:22 · Score: 0, Interesting
  
  Double precision has been greatly improved in the last variants of the Cell SPU, not the ones in the PS3 though. The enhanced DP processors are only found in the recent IBM (and perhaps Mercury) blades, which are expensive, but the only difference with single precision is longer latency which leads to about half the flops (only 2 values per register instead of 4).
  
  Next year we might even get the same processor with 32 SPU, still hard to program but it means 8MB total of data on the chip, which opens some opportunities (unfortunately the memory size per SPU seems to be set in stone at 256kB).
  
  I'm interested in programming the Cell for doing some signal processing, most of which will be single precision FFT, an application where it seems to rock. I think that the data flow between SPU is relatively easy to organize for my purpose. OTOH, it seems nobody wants to sell bare Cell chips, which is sad, since I would love to try to interface it to high speed (1-2Gsamples/s) ADCs.
7. Re:My heterogeneous experience with Cell processor by Anonymous Coward · 2008-03-11 01:32 · Score: 1, Interesting
  
  The Cell has one PowerPC core ("PPU"), which is a general purpose PowerPC processor. Nothing exotic at all about programming it. But then you have 6 (for the Playstation 3) or 8 (other computers) "SPE" cores that you can program. Transferring data to/from them is a pain, they have small working memories (256k each), and you can't use all C++ features on them (no C++ exceptions, thus can't use most of the STL). They also have poor speed for double-precision floats.
  
  A lot of this is just due to the lack of a good platform. There is nothing that prevents demand paging of the data SPEs need, and the C++ limitations are just due to the current implementation of the runtime. I will agree that the Cell is aimed a little bit too much at the video and game markets in its current implementation though. Think of it as a first step: if they can make it successful and some better platforms and tools materialize, then imagine having 64 SPEs with different groups of specialized functions, perhaps some aimed at linear algebraic functions, some at more analytical, etc. Some kind of parallel multiprocessing is the future, that much is a given; it's just a matter of figuring out the right model.
8. Re:My heterogeneous experience with Cell processor by DoofusOfDeath · 2008-03-11 01:36 · Score: 1
  
  What I mean here is that programs don't typically announce the structure of the full computation ahead of time.
  
  I believe that any Turing-complete language is subject to the halting problem. So for Turing-complete languages, you can't rely on automated tools to calculate the structure of all possible runs of some arbitrary problem.
9. Re:My heterogeneous experience with Cell processor by neomunk · 2008-03-11 01:48 · Score: 3, Insightful
  
  Heterogeneous cores are already in almost every PC I've seen so far this millennium. Anyone with a GPU is running heterogeneous cores in their machine. How do we handle it? The first half of your second sentence: libraries and frameworks. OpenGL, DirectX and whatnot provide the frameworks we need while the various manufacturers provide the drivers to maintain compatibility with the various APIs. We'll see soon enough (as a result of the Cell) if the same thing (2 or more different libraries for the same processor; one for each of its core-types) becomes the norm for other heterogeneous core systems. I think so, but it may be overlooked by manufacturers who want to view a processor as a unit instead of a compilation of various units. They'll figure it out, these guys aren't MBAs, they're the truly educated. :-D
10. Re:My heterogeneous experience with Cell processor by Karellen · 2008-03-11 02:12 · Score: 2, Interesting
  
  "you can't use all C++ features on them (no C++ exceptions, thus can't use most of the STL)"
  
  OK, I have to ask - why on earth can't you use C++ exceptions on them?
  
  After all, what is an exception? It's basically syntactic sugar around setjmp()/longjmp(), but with a bit more code to make sure the stack unwinds properly and destructors are called, instead of longjmp() being a plain non-local goto.
  
  What else is there that makes C++ exceptions unimplementable?
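
  To show what I mean by "a non-local jump plus orderly unwinding", here is a small sketch. It is a Python analogy only, not how any C++ runtime or the SPE toolchain implements exceptions; `Resource` stands in for an object with a destructor, and all the names are made up.

```python
class Resource:
    """Stands in for a C++ object whose destructor must run during unwinding."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append("acquire " + self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("release " + self.name)  # runs as the stack unwinds
        return False  # do not swallow the exception


def inner(log):
    with Resource("inner", log):
        raise RuntimeError("boom")


def outer(log):
    with Resource("outer", log):
        inner(log)


def demo():
    """Return the order of events when an exception crosses two frames."""
    log = []
    try:
        outer(log)
    except RuntimeError as err:
        log.append("caught " + str(err))
    return log
```

  Control lands in the handler two frames up, but only after each intervening frame has released what it held, innermost first. A bare longjmp() would skip those releases.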
  
  --
  Why doesn't the gene pool have a life guard?
11. Re:My heterogeneous experience with Cell processor by Kupek · 2008-03-11 03:29 · Score: 1
  
  Good question, and the answer the other people in my group and I came up with is that they probably just haven't implemented it. It would require effort on their part to get exceptions working properly (keep in mind the SPE is not a typical processing core), and there isn't much function nesting on the SPE due to its limited size. Maintaining stacks more than half a dozen deep on the SPE can become problematic.
  
  So, my guess is it's not unimplementable, it's just not a priority.
12. Re:My heterogeneous experience with Cell processor by nahdude812 · 2008-03-11 03:54 · Score: 1
  
  There are several problems which will keep parallel programming out of the hands of every-day joe schmoe programmers. Probably the most significant of which is an inability to consistently predict issues. What I mean is things like race conditions. Some block of code which could execute correctly 999 times out of 1000. But when you have hundreds or thousands of those, you have a really unstable and unpredictable application with very little ability to figure out exactly where it's going wrong.
  
  This is a side-effect of programming in languages which were designed with serial programming in mind. Procedural languages are a square peg to parallel programming's round hole. With the correct combination of care and force, you can fit it, but you always have a chance that someone checks in a new block of code which is not properly parallel-safe, even in an unrelated subsystem, and it causes your code to error (even though you wrote your code perfectly).
  
  Also, optimizing compilers often change the order of operations when they believe they can make something run faster. This works great in serial programming, but in parallel programming it means that for example, the flag variable you set to indicate a subsystem is ready for use may get set before you intended it to.
  
  A real-world example of this is the common semaphore optimizing technique of test/lock/test. Test if some work needs to be done. Acquire a lock to be sure you're the only one doing this work. Test again to make sure someone else didn't beat you to the lock and do it before their lock was released and given to you. It works great because it saves the overhead of constantly acquiring that semaphore when the work inside it might need doing rarely, or maybe even only once. It's common to see this pattern in lazy instantiation. The problem is that certain optimizers, including the Java one, may rearrange this to Test/Test/Lock (which may end up in the work being done more than once), or Lock/Test/Test (which of course leads to acquiring a semaphore repeatedly).
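
  For reference, the test/lock/test shape looks roughly like this. The sketch is Python, with `LazyValue` and its fields as made-up names, and it is only meant to show the structure: the reordering hazard described above is a property of the language's memory model, and in Java the analogous idiom is generally considered safe only when the field is declared volatile under the memory model introduced with Java 5.

```python
import threading


class LazyValue:
    """Lazily build a value once, using the test/lock/test pattern."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._ready = False
        self._value = None

    def get(self):
        if not self._ready:              # test: cheap, no lock taken
            with self._lock:             # lock
                if not self._ready:      # test again: someone may have beaten us
                    self._value = self._factory()
                    self._ready = True   # publish only after the value is stored
        return self._value
```

  Once the value exists, callers take the fast path and never touch the lock again. The second test is what stops two threads that both failed the first test from each doing the work.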
  
  These are only symptoms of the problem of course. The solution, I believe, will be new programming languages which are functional in nature (C, Java, and most modern languages are procedural, not functional). Functional languages have a lot less state which can be interfered with by other processes accessing the same data. You don't get to set X, then update its value later. X is set and has a known fixed value within this scope, or it is not set and does not have a known fixed value. Functional languages are innately parallelizable.
  
  Functional languages include APL, Erlang, Haskell, Lisp, ML, F# and Scheme, and actually XSLT. They're used in parallel programming some today, but I won't be at all surprised if we see a new language come out of academia in the near future which is designed specifically for parallel programming.
  
  --
  Slay a dragon... over lunch!
13. Re:My heterogeneous experience with Cell processor by jd · 2008-03-11 05:06 · Score: 1
  
  Anyone who used a PC back in the days of the 8087 maths co-processor has used heterogeneous cores. Anyone who has used an intelligent peripheral (eg: disk drive or printer with a CPU) has almost certainly used heterogeneous cores. Anyone who used the BBC Microcomputer with any processor other than a 6502 plugged into the Tube port has used heterogeneous cores. Almost anyone who used the Transputer will have used it with a host system, so used heterogeneous cores.
  In other words, they're not just in almost every modern PC, they've been in almost every PC since the beginning and even pre-date the PC in some cases. And, yes, you're absolutely right. Libraries and frameworks are how almost everyone else has solved this particular problem. Dedicated offload processors are an ancient technology and there's no shortage of experience in handling them. (Another solution is to program at a higher level, then use source-to-source compiling to split the code into lower-level special-purpose programs, which can then be compiled for the individual CPUs. This would make sense in those cases where the programmer doesn't know the platform characteristics ahead of time.)
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
14. Re:My heterogeneous experience with Cell processor by ufnoise · 2008-03-11 05:18 · Score: 2, Informative
  
  The Cell has one PowerPC core ("PPU"), which is a general purpose PowerPC processor. Nothing exotic at all about programming it. But then you have 6 (for the Playstation 3) or 8 (other computers) "SPE" cores that you can program. Transferring data to/from them is a pain, they have small working memories (256k each), and you can't use all C++ features on them (no C++ exceptions, thus can't use most of the STL). They also have poor speed for double-precision floats.
  
  I find that the most useful parts of the STL don't even use exceptions. It just has a lot of undefined behaviors. There is only one call, at, for vectors and deques, that will throw an exception directly. The STL is mostly concerned with being exception safe. Do you have a reference for C++ programming on the Cell processor concerning the exceptions?
15. Re:My heterogeneous experience with Cell processor by Goalie_Ca · 2008-03-11 06:12 · Score: 1
  
  I was thinking of heterogeneous cores more along the lines of Big + Fast mixed in with Small + Low Power. Both would have a nearly identical instruction set.
  
  --
  
  ----
  Go canucks, habs, and sens!
16. Re:My heterogeneous experience with Cell processor by Anonymous Coward · 2008-03-11 08:24 · Score: 0
  
  > OK, I have to ask - why on earth can't you use C++ exceptions on them?
  
  Because you can't have big stacks on them. There's no ABI requirement that you use the stack at all, but you'd certainly need different mechanics to make them work, and that means a new compiler, which they didn't feel like writing.
17. Re:My heterogeneous experience with Cell processor by Mandatory+Default · 2008-03-11 18:13 · Score: 1
  
  I think you mean a PDP-11. I spent several years developing on a PDP-8. No C compiler was ever built for it, much less a C++ compiler. Even so, your comment still doesn't make sense. The PDP-11 is the granddaddy of today's hardware - multiple registers, stack, protected memory, etc. Today's hardware has all sorts of extra toys, but programmatically it's not that different from a PDP-11. (Yes, I've done assembly language development on a PDP-8, PDP-11, 68000, 808x, and Pentium, among others.) OTOH, the PDP-8 had a 12-bit word, a single register, no stack, and a rather painful paged memory architecture. Anyone else remember what TAD indirect does?
  
  In spite of that, I'm not sure what you are alluding to in your first paragraph. It's pretty unlikely that anyone is going to start writing console games in Perl, Python, PHP, Haskell, VB.net, or (shudder) Java. There's one thing that C and C++ are good at, and that's going fast.
18. Re:My heterogeneous experience with Cell processor by ShakaUVM · 2008-03-11 18:31 · Score: 1
  
  Out of curiosity, what does your development environment on the cell look like? Can you run apps on a PS3 without having to go through a vetting procedure?
  
  I used to work at the supercomputer center in San Diego, and think it would be neat to play around with writing some stuff for my PS3. The folding@home app makes me nostalgic for when I used to work with grid apps.
19. Re:My heterogeneous experience with Cell processor by JohnFluxx · 2008-03-12 01:29 · Score: 1
  
  > It doesn't go nearly far enough. Instead of figuring out the best SQL statement, the programmer should send a list of *all* logically equivalent queries and just let the database execute the one it finds least troublesome.
  
  If they are logically equivalent, then the database program should be able to generate them itself. Pretty much every database program does rewrite the query to maximise performance.
Well, I'm panicked... by argent · 2008-03-11 00:02 · Score: 4, Interesting

The idea of having to use Microsoft APIs to program future computers because the vendors only document how to get DirectX to work doesn't exactly thrill me. I think panic is perhaps too strong a word, but sheesh...
There's only three approaches by gilesjuk · 2008-03-11 00:06 · Score: 1

1. Change operating systems to be able to use all the available CPU power even when running single threaded applications.

2. Change programming languages to make multicore programming easier.

3. Both 1 and 2.

What the end user should be able to dictate however is how many cores should be in use. It's not for the programmer of the application to dictate how processing of any data should occur.
1. Re:There's only three approaches by slashbart · 2008-03-11 00:48 · Score: 1
  
  >> 1. Change operating systems to be able to use all the available CPU power even when running single threaded applications.
  
  So how should the operating system be able to figure out what program flow dependencies there are in a binary? You can make an OS that schedules your single-threaded application so that it uses 100% of 1 core, but automatically multithreading a single-threaded application, no way, not now, and not for the foreseeable future.
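  To make "flow dependencies" concrete, here is a small Python sketch with two loops that look alike from the outside. The function names are made up; the hash chain is only a convenient example of a loop where every step needs the previous step's result.

```python
import hashlib


def hash_chain(blocks):
    # Loop-carried dependency: each iteration consumes the digest produced
    # by the one before it, so the iterations cannot be handed to
    # different cores one by one.
    digest = b""
    for block in blocks:
        digest = hashlib.sha256(digest + block).digest()
    return digest.hex()


def squares(values):
    # No iteration reads another iteration's result, so these could be
    # computed in any order, or all at once.
    return [v * v for v in values]
```

  A programmer can see the difference at a glance. An OS looking at the compiled binary has to prove it, and that is the hard part.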
2. Re:There's only three approaches by TheRaven64 · 2008-03-11 01:22 · Score: 1
  
  It's easier with heterogeneous multicore. Your single-threaded game happily makes use of two cores (the CPU and the GPU). Your single-threaded server happily makes use of two cores (your CPU and your crypto coprocessor). The functionality of the extra cores is exposed in both cases via a library (OpenGL or OpenSSL). If you want to design a fast multicore system then profile your existing workloads and see which libraries are using most of the CPU. Then add a core that implements their functionality in hardware.
  
  --
  I am TheRaven on Soylent News
3. Re:There's only three approaches by MadKeithV · 2008-03-11 01:25 · Score: 1
  
  Actually things like the .NET and Java runtime, combined with JIT compiling and optimization, could do exactly that.
  It could recompile a reasonably abstract definition of a program into exactly the kind of code that your current system needs, on demand.
4. Re:There's only three approaches by gilboad · 2008-03-11 16:12 · Score: 1
  
  True, a 4th-generation language can, at least in theory, take single-threaded code and break it into multiple threads; however:
  A. It has nothing to do with the OS itself, at least until someone embeds a .NET/Java VM into the kernel.
  B. Using all cores != getting better performance/efficiency. *
  Given my current experience with .NET software developers (as someone who feeds them the information from a C/C++ based front-end) - I'm -far- from being impressed by C#. (We are actually scrapping large pieces of existing .NET code and replacing them with C++ [under Linux].)
  
  - Gilboa
  * You may argue that once we hit 80 cores, efficiency will lose all relevance. But given the fact that our Windows 2K3/.NET people require an $8000 8-core server to accomplish something that could have been executed by a 10-year-old PII366 laptop running a slimmed-down version of CentOS5 (I kid you not) and polling single-threaded C code - I beg to differ.
he is right, but it depends on the application by CBravo · 2008-03-11 00:13 · Score: 5, Interesting

As I demonstrated in my thesis a parallel application can be shown to have certain critical and less critical parts. An optimal processing platform matches those requirements. The remainder of the platform will remain idle and burn away power for nothing. One should wonder what is better: a 2 GHz processor or 2x 1 GHz processors. My opinion is that, if it has no impact on performance, the latter is better.

There is an advantage to a symmetrical platform: you cannot misschedule your processes. It does not matter which processor takes a certain job. On a heterogeneous system you can make serious errors: scheduling your video process on your communications processor will not be efficient. Not only is the video slow, the communications process has to wait a long time (impacting comm. performance).
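A toy version of that misscheduling, sketched in Python. The cost table is entirely hypothetical (relative run times, lower is better) and the core names are invented; the point is only how one misplaced job hurts both processes.

```python
# Hypothetical relative run times of each job type on each core type.
COST = {
    "video": {"video_core": 1, "comm_core": 8},
    "comm":  {"video_core": 6, "comm_core": 1},
}


def makespan(assignment):
    """Finish time when every core runs its assigned jobs back to back."""
    busy = {}
    for job, core in assignment:
        busy[core] = busy.get(core, 0) + COST[job][core]
    return max(busy.values())


# Each job on the core that suits it.
good = [("video", "video_core"), ("comm", "comm_core")]
# Video misplaced on the comm core; the comm job has to queue behind it.
bad = [("video", "comm_core"), ("comm", "comm_core")]
```

With these made-up numbers the good placement finishes in 1 time unit and the bad one in 9: the video job runs slowly and the communications job waits behind it.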

--
nosig today
1. Re:he is right, but it depends on the application by Yoozer · 2008-03-11 01:45 · Score: 1
  
  Both, and because foreign profs don't always speak/learn Dutch. Besides, since everyone learns English at age 10 or so, it's not that big of a deal.
2. Re:he is right, but it depends on the application by emj · 2008-03-11 02:31 · Score: 1
  
  It's the same here in Sweden, we used to write our publications in Latin but have in later years switched to English. Especially when you collaborate across borders.
3. Re:he is right, but it depends on the application by CBravo · 2008-03-11 03:27 · Score: 1
  
  For us it was mandatory. Most CS publications are in English and you have to write an article which is published. Normally the article is the basis for (part of) the thesis itself. For a lot of CS words I would not know the translation.
  
  If you don't write in English, spreading your 'science' becomes very difficult (smaller audience). I actually have a citation from a Frenchman (which would never have happened if the thesis was in Dutch).
  
  Last: English is easy for Dutch people because we see a lot of subtitled TV. I am more or less fluent in English.
  
  --
  nosig today
4. Re:he is right, but it depends on the application by xip.dk · 2008-03-11 04:29 · Score: 1
  
  In Denmark everything academic that gets published is in English. I guess it is because the audience for these works is small already (Computer Scientists) without putting a language constraint on it too (Dutch/Danish speaking).
  Besides that, we have lectures in Danish normally, except if the lecturer is a native English-speaker or we have foreign students in the class, in which case everything is in English.
5. Re:he is right, but it depends on the application by JoelKatz · 2008-03-11 06:48 · Score: 1
  
  Or, to put it another way, which would you rather have:
  
  1) A computer with a fast single CPU core and a fast single GPU core.
  
  2) A computer with two fast cores that can each perform either CPU or GPU tasks equally well.
  
  In this case, 2 is a slam dunk. Let's try it another way, which would you rather have:
  
  1) A computer with a fast single CPU core and also a cryptographic accelerator.
  
  2) A computer with two fast CPU cores, each of which has a cryptographic accelerator.
  
  Hmm, I still like 2 better.
  
  Put every feature you can into the core and give me as many of them as you can. Make them all the same, so scheduling is simple. Sounds like a winning formula.
  
  That should cover us for the next 20 years. Then we may hit walls in synchronization and memory efficiency.
6. Re:he is right, but it depends on the application by CBravo · 2008-03-12 00:21 · Score: 1
  
  Ok, I'll bite (for the record). The question becomes: do you want 10 high speed cores that are capable of everything? Or do you want 20? Or, maybe 15 combined with 10 half speed cores for the same price (because slower cores are much much cheaper)?
  
  Second, a designer has to make a choice and your example hints that one does not have to make that decision. You either get specialized faster silicon or you get slower generalized silicon. That's it. You cannot have it both ways (remember CISC?).
  
  By the way, this is already happening but only inside processors. Remember floating point units? ALU? Are those not specialized parts of the processor with a certain mix (of units) that is 'optimal' or sufficient according to a manufacturer? Why not have 10 processors with an extra ALU and 10 with an extra FPU? Would that not make a heterogeneous system (if only slightly)? Same price, extra performance.
  
  This is just a technical explanation, a technological one becomes even broader.
  
  --
  nosig today
7. Re:he is right, but it depends on the application by JoelKatz · 2008-03-12 04:03 · Score: 1
  
  > Ok, I'll bite (for the record). The question becomes: do you want 10 high speed cores that are capable of everything? Or do you want 20? Or, maybe 15 combined with 10 half speed cores for the same price (because slower cores are much much cheaper)?
  
  For the moment, the way the economics (both hardware and software) work, 10 high speed cores is probably a better deal. Slower cores are really not cheaper. If you need a high-speed path to memory (and you almost always do), you need to be made on the latest processes anyway, and if all the cores are on one die, it doesn't take much more space to be faster. 90% of the space goes to memory (cache), not calculation.
  
  > By the way, this is already happening but only inside processors. Remember floating point units? ALU? Are those not specialized parts of the processor with a certain mix (of units) that is 'optimal' or sufficient according to a manufacturer? Why not have 10 processors with an extra ALU and 10 with an extra FPU? Would that not make a heterogeneous system (if only slightly)? Same price, extra performance.
  
  You notice how that really didn't work out so well and you now find CPUs and FPUs are basically always paired one-to-one. The exceptions are only found in limited-purpose products.
  
  You may be right for the more distant future. But the economics right now (both in hardware and in software) favor symmetric cores heavily. I don't think this will change in less than 20 years.
8. Re:he is right, but it depends on the application by CBravo · 2008-03-12 23:13 · Score: 1
  
  Define cheaper. It may not be the purchasing price or die area. It may be partly defined as power usage (just lowering the clock may save you battery power/weight on your laptop) and avoid thermal meltdown. You say that you need high speed memory access but you haven't proven it. Slow speed memory access, less cache, lower clock: you might be able to mix that for a certain amount of processors without a performance penalty. You may be able to do that dynamically.
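  A back-of-the-envelope version of the power argument, in Python. It assumes the usual first-order model for CMOS dynamic power, P ≈ C·V²·f, and it assumes (purely hypothetically) that a core running at half the clock can also run at 80% of the supply voltage. The numbers are illustrative, not measurements.

```python
def dynamic_power(capacitance, voltage, freq_ghz):
    # First-order CMOS dynamic power model: P ~ C * V^2 * f (arbitrary units).
    return capacitance * voltage ** 2 * freq_ghz


# One core at 2 GHz and nominal voltage.
one_fast = dynamic_power(1.0, 1.0, 2.0)

# Two cores at 1 GHz each, at the assumed 80% supply voltage.
two_slow = 2 * dynamic_power(1.0, 0.8, 1.0)
```

  Under those assumptions the two slow cores draw about 1.28 units against 2.0 for the single fast core, for the same total clock cycles per second. Whether the workload can actually use both cores is a separate question.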
  
  About the FPU: most processors have a number of FPUs, ALUs and other units combined. They are by no means obsolete, just integrated into the CPU. What I meant was that there is a mix of them inside a processor. A designer can change that mix to favour e.g. floating point calculations. See for instance Itanium's architecture.
  
  Heterogeneous processing is already happening. A GPU from nVidia is a full-fledged processor, very expensive and large, optimized for video processing. AMD is just seeing what is already there: a heterogeneous architecture. A GPU was introduced as an accelerator but that is just a
  
  --
  nosig today
Multithreading is not easy but it's doable by pieterh · 2008-03-11 00:20 · Score: 5, Interesting

It's been clear for many years that individual core speeds had peaked, and that the future was going to be many cores and that high-performance software would need to be multithreaded in order to take advantage of this.

When we wrote the OpenAMQ messaging software in 2005-6, we used a multithreading design that lets us pump around 100,000 500-byte messages per second through a server. This was for the AMQP project.

Today, we're making a new design - ØMQ, aka "Fastest. Messaging. Ever." - that is built from the ground up to take advantage of multiple cores. We don't need special programming languages, we use C++. The key is architecture, and especially an architecture that reduces the cost of inter-thread synchronization.

From one of the ØMQ whitepapers:
Inter-thread synchronisation is slow. If the code is local to a thread (and doesn't use slow devices like network or persistent storage), execution time of most functions is tens of nanoseconds. However, when inter-thread synchronisation - even a non-blocking synchronisation - kicks in, execution time grows by hundreds of nanoseconds, or even surpasses one microsecond. All kind of time-expensive hardware-level stuff has to be done... synchronisation of CPU caches, memory barriers etc.

The best of the breed solution would run in a single thread and omit any inter-thread synchronisation altogether. It seems simple enough to implement except that single-threaded solution wouldn't be able to use more than one CPU core, i.e. it won't scale on multicore boxes.

A good multi-core solution would be to run as many instances of ØMQ as there are cores on the host and treat them as separate network nodes in the same way as two instances running on two separate boxes would be treated and use local sockets to pass messages between the instances.

This design is basically correct, however, the sockets are not the best way to pass message within a single box. Firstly, they are slow when compared to simple inter-thread communication mechanisms and secondly, data passed via a socket to a different process has to be physically copied, rather than passed by reference.

Therefore, ØMQ allows you to create a fixed number of threads at the startup to handle the work. The "fixed" part is deliberate and integral part of the design. There are a fixed number of cores on any box and there's no point in having more threads than there are cores on the box. In fact, more threads than cores can be harmful to performance as they can introduce excessive OS context switching.

We don't get linear scaling on multiple cores, partly because the data is pumped out onto a single network interface, but we're able to saturate a 10Gb network. BTW ØMQ is GPLd so you can look at the code if you want to know how we do it.
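To make the "fixed number of threads, messages passed by reference" idea concrete, here is a minimal sketch in Python. It is not ØMQ's code (ØMQ is C++) and every name in it is invented for illustration. Two caveats: CPython's GIL means these threads will not speed up CPU-bound work, and `queue.Queue` uses locks internally, which is exactly the kind of synchronization cost the whitepaper is trying to minimize. Read it for the shape of the design only.

```python
import os
import queue
import threading


def run_fixed_pool(messages, handler):
    """Start one worker per core at startup and never create more."""
    n_workers = os.cpu_count() or 1
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()

    def worker(inbox):
        while True:
            msg = inbox.get()
            if msg is None:              # sentinel: shut this worker down
                return
            # msg is the same object the sender enqueued (no copy is made).
            results.put(handler(msg))

    threads = [threading.Thread(target=worker, args=(box,)) for box in inboxes]
    for t in threads:
        t.start()
    for i, msg in enumerate(messages):
        inboxes[i % n_workers].put(msg)  # simple round-robin dispatch
    for box in inboxes:
        box.put(None)
    for t in threads:
        t.join()

    # All workers have exited, so the results queue can be drained safely.
    out = []
    while not results.empty():
        out.append(results.get())
    return n_workers, len(threads), sorted(out)
```

Calling `run_fixed_pool([b"a", b"bb", b"ccc"], len)` starts exactly as many threads as the box has cores, however many messages arrive.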

--
My blog
1. Re:Multithreading is not easy but it's doable by maestroX · 2008-03-11 00:48 · Score: 1
  
  Interesting. But does it infringe on the QNX patent http://en.wikipedia.org/wiki/QNX??
2. Re:Multithreading is not easy but it's doable by pieterh · 2008-03-11 01:08 · Score: 1
  
  Good question. The answer is "no, not as far as we're aware"; the patent covers the distribution of transactions across network nodes, invisibly to applications, and is specifically aimed at implementing GUIs. From the patent, "The invention disclosed broadly relates to graphical user interfaces (GUI's) and particularly relates to the software architectures used to implement them."
  
  However, all software patents have the problem of "creep", so that if a market emerges that looks within reach of the claims, the patent holder - if litigious - will try to expand the scope of the patent to claim this market. The claims of this patent, which are what really count, are written in fairly abstract language.
  
  It is impossible to clear new software for patents - the cost would exceed $1bn - so we just have to try to stay away from known danger areas.
  
  More constructively, we also support the fight against the software patent regime, at least in Europe.
  
  --
  My blog
3. Re:Multithreading is not easy but it's doable by TheRaven64 · 2008-03-11 01:26 · Score: 1
  
  While it's not quite the same target application, I'd be interested in how your code compares in terms of scalability to eJabberd, which is written from scratch in a language designed for concurrency, scales easily up to medium sized clusters and has replaced the C implementation as the de facto standard XMPP server.
  
  --
  I am TheRaven on Soylent News
4. Re:Multithreading is not easy but it's doable by pieterh · 2008-03-11 02:45 · Score: 3, Interesting
  
  It's unfair to compare blob messaging with a protocol that has to process XML, but let's look. I'm using http://www.ejabberd.im/benchmark as a basis:
  
  - eJabberd latency is in the 10-50msec range. 0MQ gets latencies of around 25 microseconds.
  - eJabberd supports more than 10k users. 0MQ will support more than 10k users.
  - eJabberd scales transparently thanks to Erlang. 0MQ squeezes so much out of one box that scaling is less important.
  - eJabberd has high-availability thanks to Erlang. 0MQ will have to build its own HA model (as OpenAMQ did).
  - eJabberd can process (unknown?) messages per second. 0MQ can handle 100k per second on one core.
  
  Sorry if I got some things wrong, ideally we'd run side-by-side tests to get figures that we can properly compare.
  
  Note that protocols like AMQP can be elegantly scaled at the semantic level, by building federations that route messages usefully between centers of activity. This cannot be done in the language or framework, it is dependent on the protocol semantics. This is how very large deployments of OpenAMQ work. I guess the same as SMTP networks.
  
  0MQ will, BTW, speak XMPP one day. It's more a framework for arbitrary messaging engines and clients, than a specific protocol implementation.
  
  I've seen Erlang used for AMQP as well - RabbitMQ - and by all accounts it's an impressive language for this kind of work.
  
  --
  My blog
Why choose? by Evro · 2008-03-11 00:21 · Score: 2, Insightful

Just build both and let the market decide.

--
rooooar
Heterogenous is a natural thing to do by A+beautiful+mind · 2008-03-11 00:21 · Score: 3, Interesting

If you have 80 or more cores, I'd rather have 20 of them support specialty functions and be able to do them very fast (it would have to be a few (1-3) orders of magnitude faster than the general counterpart) and the rest do general processing. This of course needs the support of operating systems, but that isn't very hard to get. With 80 cores caching and threading models have to be rethought, especially caching - the operating system has to be more involved in caching than it currently is, because otherwise cache coherency can't be maintained.

This also means that programs will need to be written not just by using threads, "which makes it okay for multi-core", but with CPU cache issues and locality in mind. I think VMs like the JVM, Parrot and .NET will become much more popular, as it is possible for them to take care of a lot of these issues, which is impossible or only possible in a limited way for languages like C and friends with static source code inspection.

--
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
1. Re:Heterogenous is a natural thing to do by Anonymous Coward · 2008-03-11 00:58 · Score: 0
  
  Don't you mean "80 or Moore cores"?
  
  I'll get my coat.
CPU != BRAIN by v(*_*)vvvv · 2008-03-11 00:23 · Score: 1

There is this view held by some (of which some are posting here) that somehow CPUs are primitive brains and that improving them will eventually result in a non-primitive brain. Hello, there is nothing remotely human about what my computer has done for me lately. Computers and humans *do* very different things, and *are* very different things.

I beg that the distinction between acquiring hints from brain structure vs creating brain structure not be blurred, and that no moderator marks "brains are like this so chips should be like that" type posts as informative or insightful.

No one at Intel has their chipset blueprints confused with an x-ray of Einstein's brain.
1. Re:CPU != BRAIN by curmudgeon99 · 2008-03-11 00:29 · Score: 1
  
  Please do not pooh-pooh our ideas, unless YOU HAVE A BETTER ONE. Please correct me if I'm wrong but I see modern computers only coming close to simulating on the most rudimentary level the functions of the LEFT hemisphere. No one has attempted to replicate the right hemisphere's function. So, I'm waiting for your better idea...
2. Re:CPU != BRAIN by Eternauta3k · 2008-03-11 01:52 · Score: 1
  
  > Please correct me if I'm wrong but I see modern computers only coming close to simulating on the most rudimentary level the functions of the LEFT hemisphere
  
  That's not CPU design, that's AI algorithms.
  
  --
  Yeah. Would you choose a neurosurgeon who pokes around people's brains in his spare time? I wouldn't.
3. Re:CPU != BRAIN by Anonymous Coward · 2008-03-11 08:32 · Score: 0
  
  > No one has attempted to replicate the right hemisphere's function
  
  Right that's the magical elves-and-fairies "creative" part? It's amazing how many people bought this tripe.
  
  News flash fella, we're actually closer to replicating that part. Spatial relationships are no problem anymore, and image recognition is a process we understand more every day. Meanwhile, we still have problems teaching language with ambiguities, or complex inferences.
4. Re:CPU != BRAIN by curmudgeon99 · 2008-03-13 12:17 · Score: 1
  
  Sounds to me that you really agree. What kind of AI algorithms were you referring to?
5. Re:CPU != BRAIN by Eternauta3k · 2008-03-14 06:12 · Score: 1
  
  Don't know any such algorithms, but if your CPUs use the same instruction set as mine, they don't replicate the brain in any meaningful way. I disagree, because the advances we're seeing in CPU design increase speed and number of cores, but do not constitute ground-breaking brain-imitating designs.
  
  Remember, CPUs
  
  --
  Yeah. Would you choose a neurosurgeon who pokes around people's brains in his spare time? I wouldn't.
Multicores, but not on a chip by Kim0 · 2008-03-11 00:24 · Score: 5, Interesting

This trend with multiple cores on the CPU is only an intermediate phase, because it oversaturates the memory bus. That is easy to remedy by putting the cores on the memory chips, of which there are a number comparable to the number of cores.

In other words, the CPUs will disappear, and there will be lots of smaller core/memory chips, connected in a network. And they will be cheaper as well, because they do not need so high a yield.

Kim0
1. Re:Multicores, but not on a chip by richlv · 2008-03-11 00:34 · Score: 1
  
  and on a larger scale there's this wicked idea about plan9.
  as for parallel processing, i don't think it is feasible to be implemented in each app separately - more likely it would be built upon some higher level api, where app would simply tell "these things can run in parallel, this one should wait for that one to finish, and this one can start as soon as that one sends a particular signal".
  it would be somewhat more work, but something like that is already being implemented with kde4 and i expect it only to become more widespread.
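  a rough sketch of what such a "tell it what depends on what" api could look like, in Python. this is a made-up illustration, not the kde4 api: `run_graph`, `tasks` and `deps` are hypothetical names, and it only covers "wait for that one to finish", not the "start on a particular signal" case.

```python
from concurrent.futures import ThreadPoolExecutor


def run_graph(tasks, deps, workers=4):
    """Run tasks in parallel while respecting declared dependencies.

    tasks: name -> callable taking a dict of its dependencies' results
    deps:  name -> list of task names that must finish first
    """
    done = {}
    pending = set(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Everything whose dependencies are all finished may run now.
            ready = [n for n in pending
                     if all(d in done for d in deps.get(n, ()))]
            if not ready:
                raise ValueError("dependency cycle or unknown dependency")
            futures = {
                n: pool.submit(tasks[n], {d: done[d] for d in deps.get(n, ())})
                for n in ready
            }
            # Wait for this wave to finish before looking for the next one.
            for n, fut in futures.items():
                done[n] = fut.result()
            pending -= set(ready)
    return done


tasks = {
    "load":   lambda r: [3, 1, 2],
    "sort":   lambda r: sorted(r["load"]),
    "count":  lambda r: len(r["load"]),
    "report": lambda r: (r["sort"], r["count"]),
}
deps = {"sort": ["load"], "count": ["load"], "report": ["sort", "count"]}
result = run_graph(tasks, deps)
```

  here "sort" and "count" run in parallel once "load" is done, and "report" waits for both. the app only declares the relationships and never touches a thread.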
  
  --
  Rich
2. Re:Multicores, but not on a chip by jo42 · 2008-03-11 01:44 · Score: 4, Informative
  
  > lots of smaller core/memory chips, connected in a network
  
  You mean like the Transputer back in the '80s?
3. Re:Multicores, but not on a chip by imgod2u · 2008-03-11 02:21 · Score: 1
  
  Looking at all processors for the past 10 years, they already *are* core/memory chips. More than half the processor die is cache. Also, the 80 core chip Intel made was indeed a network-on-chip architecture. This is true of Cell and many others. It's difficult to program and take advantage of but the payoff can be tremendous. Multiple small, scalar cores each with its own cache pocket (and a massively parallel memory interface).
  
  Disregarding discrete memory in and of itself is unlikely to happen in the future IMO. Dedicated memory chips will always be denser than having small pockets of memory.
4. Re:Multicores, but not on a chip by coats · 2008-03-11 02:59 · Score: 1
  
  half? More like 90% of the processor die is cache...
  
  --
  "My opinions are my own, and I've got *lots* of them!"
5. Re:Multicores, but not on a chip by springbox · 2008-03-11 03:23 · Score: 1
  
  > more likely it would be built upon some higher level api, where app would simply tell "these things can run in parallel, this one should wait for that one to finish, and this one can start as soon as that one sends a particular signal".
  
  Threads?
6. Re:Multicores, but not on a chip by richlv · 2008-03-11 03:39 · Score: 2, Informative
  
  well, yes, i believe the thingie taking care of that is even called "threadweaver" :)
  
  yep, here it is : http://api.kde.org/4.0-api/kdelibs-apidocs/threadweaver/html/index.html
  
  --
  Rich
7. Re:Multicores, but not on a chip by kitgerrits · 2008-03-11 04:39 · Score: 1
  
  Multiple cores, each with their own memory...
  This sounds exactly like the Cell processor.
  
  This chip has barely hit the consumer shelves and already developers are complaining about transfer speeds between the side-cores and the main memory, whilst this thing has only 9 cores, total.
  
  Once you start giving each CPU its own memory, you're wasting most of the physical memory on cores that need very little, just because a few cores might need a lot of memory.
  
  This problem could be solved by 'dynamic' allocation of RAM and CPU resources, which is already present in certain virtualization technologies (Sun anyone?).
  The problem with this is, that you now have core-dedicated RAM, but it's not sitting right next to the core.
  
  You could also go the other way by giving each core a lot of cache memory, hooking them all into a blindingly fast bus and hooking the memory straight into the bus.
  This is what AMD has done with their multi-core CPUs with full crossbar and integrated memory controller on the CPU die.
  
  Now, what technology or trick should vendors try to improve performance?
  Please note that cores have not successfully improved that much in raw GHz in the last few years (the P4 was not a success).
  
  I recall an article that explained why multiple cores is the way to go.
  More ALUs on a die makes multi-threading and inter-dependency exponentially more complex.
  This would need more schedulers, which would fight it out amongst each other.
  In order to have the software (OS) make sense of all the threads, they should be partitioned.
  So you're back to sticking it all into separate cores.
  
  --
  "I was in love with a beautiful blonde once, dear. She drove me to drink. It's the one thing I am indebted to her for."
8. Re:Multicores, but not on a chip by SlowMovingTarget · 2008-03-11 04:53 · Score: 1
  
  ... and we'll call them "engines" and program them just like a mainframe. It may be new to the desktop or commodity rack, but it's not new. The only difference will be the programming languages, and even there, there's an awful lot of familiar territory.
9. Re:Multicores, but not on a chip by arktemplar · 2008-03-11 05:09 · Score: 1
  
  Well, I guess you are right about it being an intermediate stage. I guess that things are going to go the Cell way from now on, but with a different abstraction level visible to the guys programming it. I think the TRIPS architecture http://en.wikipedia.org/wiki/TRIPS_architecture that IBM has or the Molen architecture http://ce.et.tudelft.nl/publicationfiles/908_9_prototype_molen.pdf might be the future.
  
  --
  blog plug -> The Darker Side of Light
10. Re:Multicores, but not on a chip by Anonymous Coward · 2008-03-11 09:59 · Score: 0
  
  All computing is evolving toward the 1980s. All of the world is evolving toward the 1980s. Soon, perhaps in our lifetimes, it will become the 1980s again, and stay there .. forever!
11. Re:Multicores, but not on a chip by markjhood2003 · 2008-03-11 10:52 · Score: 1
  
  Or perhaps the Connection Machine http://en.wikipedia.org/wiki/Connection_machine from the '80s? It's interesting that this "panic" first appeared 20 years ago when it appeared that transistor densities might be hitting a limit. But then superscalar architectures were implemented and fab technology improved, and we were able to ignore the problem for the next 20 years.
12. Re:Multicores, but not on a chip by Fulcrum+of+Evil · 2008-03-13 08:14 · Score: 1
  
  > Once you start giving each CPU its own memory, you're wasting most of the physical memory on cores that need very little, just because a few cores might need a lot of memory.
  
  No you aren't. The non local ram is just slower.
  
  Please note that cores have not successfully improved that much in raw GHz the last few years (the P4 was not a success)
  
  Not relevant. Recent processors are faster than older ones, even with lower clocking.
  
  More ALUs on a die makes multi-threading and inter-dependency exponentially more complex.
  
  Compared to what? It's just physical packaging, with possible sharing of cache.
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Specialisation is inevitable by adamkennedy · 2008-03-11 00:26 · Score: 2, Insightful

I have a 4-core workstation and ALREADY I get crap usage rates out of it.

Flick the CPU monitor to aggregate usage rate mode, and I rarely clear 35% usage, and I've never seen it higher than about 55% (and even that for only a second or two once an hour). A normal PC, even fairly heavily loaded up with apps, just can't use the extra power.

And since cores aren't going to get much faster, there's no real chance of getting big wins there either.

Unless you have a specialized workload (heavy number crunching, kernel compilation, etc) there's going to simply be no point having more parallelism.

So as far as I can tell, for general loads it seems to be inevitable that if we want more straight line speed, we'll need to start making hardware more attuned for specific tasks.

So in my 16-core workstation of the future, if my Photoshop needs to apply some relatively intensive transform that has to be applied linearly, it can run off to the vector core, while I'm playing Supreme Commander on one generic core (the game) two GPU cores (the two screens) and three integer-heavy cores (for the 3 enemy AIs), and the generic System Reserved Core (for interrupts, and low-level IO stuff) hums away underneath with no pressure.

Heterogeneity also has economics on its side.

There's very little point having specialized cores when you've only got two.

Once there's no longer scarcity in quantity, you can achieve higher productivity by specialization.

Really, any specialized core that you can keep the CPU usage rates running higher than the overall system usage rate, is a net win in productivity for the overall computer. And over time, anything that increases productivity wins.
1. Re:Specialisation is inevitable by makapuf · 2008-03-11 01:25 · Score: 1
  
  There's very little point having specialized cores when you've only got two.
  
  Like, say, a CPU and a GPU? I would have thought it was pretty efficient.
  
  I think it all breaks down to how many specialized-but-still-generic, computationally intensive tasks we define and then implement in hardware.
  
  And, finally, it's the same specialized vs generic hardware wheel of reincarnation (see http://www.catb.org/~esr/jargon/html/W/wheel-of-reincarnation.html)
2. Re:Specialisation is inevitable by jcupitt65 · 2008-03-11 01:25 · Score: 1
  
  Unless you have a specialized workload (heavy number crunching, kernel compilation, etc) there's going to simply be no point having more parallelism.
  
  You can get very good parallelism with media apps like photoshop, audio or video encode/decode, things like that. Regular desktop apps aren't going to often go to the trouble, but I can see a future when most media libraries are heavily threaded. My spare time project (a GPL image processing library) gets about a 27x speedup on a 32-cpu machine, at least on some benchmarks.
  It'll maybe be a bit like current console development: middleware authors will get their hands dirty with the hardware, and that knowledge will be packaged up and sold to app developers.
3. Re:Specialisation is inevitable by everphilski · 2008-03-11 01:59 · Score: 1
  
  I have a 4-core workstation and ALREADY I get crap usage rates out of it.
  Screw the desktop, this is for the tinkerers, the computational scientists, the engineers. I can peg a 4 core Opteron.
4. Re:Specialisation is inevitable by Zoxed · 2008-03-11 03:07 · Score: 1
  
  > A normal PC, even fairly heavily loaded up with apps, just can't use the extra power.
  
  Mine can, no problem. I only have a dual core but can easily be: re-coding a video file, converting FLAC to mp3, burning a DVD, listening to an MP3 and surfing the web at the same time. Some processes are IO bound, but I can still get very high CPU utilization.
  
  And as a Gentoo user building is much faster using make -j3 to create up to 3 compilation threads that easily maxes out both cores !! (But credit to recent Linux kernels that the burning buffer is always fed >90%, and the MP3 playback never skips a beat !)
5. Re:Specialisation is inevitable by adamkennedy · 2008-03-11 05:37 · Score: 2, Interesting
  
  Two is totally doable. I can fill two (or the equivalent of two) of my four cores.
  
  Trouble is, filling four cores is quite a bit more iffy.
Brain by slashflood · 2008-03-11 00:28 · Score: 1

Take some advice from mother nature: as far as I know, our brain works like a heterogeneous multicore processor. We don't have multiple generic mini-brains in our head, we have one brain with highly specialized brain areas for different tasks. Seems to be the right concept for a computer processor.
1. Re:Brain by Anne+Thwacks · 2008-03-11 00:56 · Score: 1
  
  as far as I know, our brain works like a heterogeneous multicore processor
  Then your brain needs an upgrade.
  The brain has a (virtual) single serial processor, and a great bundle of "neural networks" which are essentially procedures built from hardware. (Kind of like early mainframes had a circuit board per instruction, and then gated the results of the selected instruction onto the bus.)
  The self-modifying neural network architecture is interesting, but not to people who want to buy reliable computing engines.
  While perhaps not immediately obvious to everyone,
  learning==forgetting
  Hence absent minded professors.
  I do not want my wages computed by neural networks, and I don't want my bank to store my account on one either. If they want to use one to data-mine, well and good. (But hopefully one good enough to learn that I put their spam in the bin without reading it).
  
  --
  Sent from my ASR33 using ASCII
Seems like Google would have some ideas by smose · 2008-03-11 00:31 · Score: 1

Strange, it seems to me that Google would have some ideas about how to utilize massively parallel processing, as would the supercomputing crowd.

Is the issue here how to scale supercomputing concepts down to desktop applications? Well, for starters, you can dedicate a couple of cores to run all of the background processes (on the order of 70) that my IT department insists must reside on my system, so that I might get at least one which can work on the application(s) at hand.
1. Re:Seems like Google would have some ideas by ThreeIfByAir · 2008-03-11 03:05 · Score: 1
  
  They do. They bought PeakStream. Now, I have no particular idea what they're doing with PeakStream's technology, but clearly, like Victor Kiam, they liked it so much they bought the company.
Simple well tested solution. by thehatmaker · 2008-03-11 00:31 · Score: 1, Funny

Looking back at history, we see that as clock speeds and memory capacity increased, software writing became simplified by the use of higher level languages whose output, while not as optimal as machine code programming, ran at a similar speed to previous generation hardware using well optimised machine code. And so, the "problem" of writing for faster machines was solved.

For the multicore problem, I propose a similar strategy. Simply write a natural language programming interface which uses n-x cores to interpret and compile the code into a mish mash of bloated machine code, which then runs on the remaining x number of cores. Of course, several remaining cores would be needed to run this bloated mess at speeds comparable to 486's - but at least the new hardware could be widely sold, thus supporting industry!

It's not like the users really need faster software, they just need a reason to upgrade to better hardware, right?? right??
Sun's thoughts by Dersaidin · 2008-03-11 00:33 · Score: 1

I went to a presentation by Sun last Friday (by Don Kretsch and Liang Chen), on "High Powered Computing". Sun's idea of HPC is, logically, multicore/cluster solutions. They talked about some of their abstraction ideas on how to take advantage of a bunch of cores. Some interesting stuff, but it was still pretty similar to the traditional single-core approach, only branching for some stuff, like loops. I'm not sure if any of their abstraction ideas were radical enough to get excited about, but it was still interesting to see. Task-specific hardware and low-level programming seems like the best approach to me. Like graphics cards in games. Once we're comfortable with that, we can then maybe build up some APIs. Sun's presentation convinced me that it's the biggest challenge of modern computing.
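
To make the "only branching for some stuff, like loops" idea concrete, here is a minimal sketch of a loop whose iterations are handed to a pool of workers. This is not Sun's tooling or anything shown in the presentation; `parallel_for` and `square` are hypothetical names for illustration, built on Python's standard `concurrent.futures`. It assumes the loop iterations are independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(func, items, workers=4):
    """Apply func to every item on a pool of worker threads.

    Hypothetical helper: results come back in the same order as the input,
    so the caller can treat it like an ordinary sequential loop.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each loop iteration to whichever worker is free
        return list(pool.map(func, items))


def square(x):
    # Stand-in for the loop body; each call is independent of the others
    return x * x


if __name__ == "__main__":
    print(parallel_for(square, range(8)))
```

Worth noting: in standard CPython, threads mostly help when the loop body waits on I/O. For CPU-heavy bodies, swapping in `ProcessPoolExecutor` is the usual move, at the cost of copying data between processes.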
1. Re:Sun's thoughts by nschubach · 2008-03-11 03:18 · Score: 1
  
  I have always looked for ways to thread loops in my programs. I've been on the verge of writing a personal threadeach that works like a foreach loop and would send the functions in the block to new threads to process on individual elements of an object (array). This of course doesn't work for every situation, and I have some things to figure out like a good way to "stall" the main thread so I don't spawn a bunch of threads that will finish processing after the main execution path has passed their usefulness. I was mainly thinking of this for multiple file processing. Open 4-5 files, read into memory, parse and upload to a database or similar projects that can be parallelized. Lately though, I've been creating a file queue and assigning cores to functionality as they are needed in the queue or even as simple as stepping the queue in steps of the number of cores (itemCount mod processorCount) just to get some work done so the boss doesn't come down on me.
  
  Anyway, loop parallelization is something I've been fiddling with for a little while now and I'd be very interested to read some of their ideas if you know of any links.
  
  --
  Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
2. Re:Sun's thoughts by dr2chase · 2008-03-11 03:40 · Score: 1
  
  I work for Sun, though I don't speak for them.
  I thought it was a little unusual that the EETimes article didn't even mention Sun, seeing as how we've been shipping "32" and "64"-way multicores for a while now (scare quotes because of how the 32 and 64 are implemented -- it's N cores, each time-sliced M ways to cover operation and memory latency). And yes, we're working on the problem, and so are (to my knowledge) Intel, IBM, and Microsoft, and I'm pretty sure we're all supporting (money and/or equipment) researchers at universities (I know Sun is) to work on it. The graphics card companies have a different take on it; arguably they've been doing mass-market multicore (for funny-looking cores) longer than anyone. I don't know that we started soon enough, and people need to understand that Moore's law is now about number of processors, not clock rates -- imagine, if you will, quadrupling the per-chip processor count every four years.
  And I was once hissed for this remark while speaking in public, but people should take a long, serious look at functional programming languages.
3. Re:Sun's thoughts by Dersaidin · 2008-03-11 22:11 · Score: 1
  
  Check these out. Specifically this one. That was actually the first part of the HPC presentation. Then Liang did a demonstration compiling code like on page 19 of that link with their funky compiler, and some analysis tools.
  I plan on getting myself Solaris developer edition to have a play.
Occam and Beyond by BrendaEM · 2008-03-11 00:33 · Score: 3, Insightful

Perhaps panic is a little strong. At the same time, programming languages such as Occam, that are built from the ground up, seem very provocative now. Perhaps Occam's syntax could be modified to a Python-type syntax for more popularity.

[Although, personally, I prefer Occam's syntax over that of C's.]

http://en.wikipedia.org/wiki/Occam_programming_language

I think that a thread-aware programming language would be good in our multi-core world.

--
https://www.youtube.com/c/BrendaEM
Help me understand the distinction by Junior+J.+Junior+III · 2008-03-11 00:35 · Score: 2, Interesting

I'm curious how having specialized multi-core processors is different from having a single-core processor with specialized subunits. I.e., a single core x86 chip has a section of it devoted to implementing MMX, SSE, etc. Isn't having many specialized cores just a sophisticated way of re-stating that you have a really big single-core processor, in some sense?

--
You see? You see? Your stupid minds! Stupid! Stupid!
1. Re:Help me understand the distinction by photon317 · 2008-03-11 01:15 · Score: 1
  
  The difference is that the subunits are instructed on what to do via a single procedural stream of instructions from the compiler's point of view. The CPU does some work to reorder and parallelize the instruction stream to try to keep all the subunits busy if it can, but it doesn't always do a great job, and the compiler also knows the rules for how a given CPU does the re-ordering/parallelization and tries to optimize the stream to better the outcome. This scheduling is taking place at a very low level with very small chunks of (or even single) instructions. Algorithms for auto-parallelizing code quickly in hardware don't really scale up to bigger chunks of code (and as we've seen, even when they deal with smaller chunks, the stream needs to be pre-optimized by the compiler for effectiveness).
  
  But certainly this must be an area of active research. An "obvious" (if currently impossible) solution is to build an 80 core CPU that looks like a 4-core CPU to the operating system, and dedicates a few cores to auto-parallelizing the 4 instruction streams from the OS onto the remaining bulk of the cores. However if we had algorithms that could do that job reasonably effectively in realtime, we could certainly put those same algorithms in compilers and make them do an even better job in non-realtime. So that makes that approach seem silly.
  
  --
  11*43+456^2
2. Re:Help me understand the distinction by TheRaven64 · 2008-03-11 01:36 · Score: 1
  
  The big difference is in power consumption. Modern x86 CPUs have (SSE) instructions that load 128-bits at once. This means you really need a 128-bit connection to your memory. Your CPU has instructions that load 1024 bits at once, meaning you need eight times as wide a connection to memory. If you want to implement these on an x86 core, you need to do one of two things. Your first option is just to make them take 8 cycles. This is not ideal, from a speed-perspective. The other is to make the connections wider to the CPU. The down side of this is that you aren't doing 1024-bit loads or stores very often on your CPU so most of the time this memory controller is using a small fraction of its capacity, but generating a lot of heat.
  The other big difference is whether registers and decode units are shared. SSE has its own set of registers, but you can do direct copies between SSE registers and general purpose registers. You use the same logic to decode (at least the start of) SSE instructions and integer instructions. This has two side-effects. The first is that you lower the overall throughput (with heterogeneous cores, you know which core an instruction will run on before it gets to the core) and you need a lot of circuitry to handle synchronisation between registers. If your CPU is out-of-order then you need to track dependencies between SSE and integer instructions in hardware (actually, you do for in-order too, but it's easier). For heterogeneous cores you just have an instruction that says 'wait for data from this other core.'
  
  --
  I am TheRaven on Soylent News
3. Re:Help me understand the distinction by TuringTest · 2008-03-11 01:54 · Score: 1
  
  Single-core processor is programmed in-house by the chip maker, through the micro-controlled logic.
  
  A collection of specialized cores would be open to the processor users, allowing for arbitrary programming of the wires, thus (ideally) taking advantage of the hardware potential to the max. It's much more flexible than a hard-wired logic, but it's also publishing the complexity to the world at large to face it.
  
  --
  Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
4. Re:Help me understand the distinction by Fulcrum+of+Evil · 2008-03-13 10:27 · Score: 1
  
  Modern x86 CPUs have (SSE) instructions that load 128-bits at once. This means you really need a 128-bit connection to your memory. Your CPU has instructions that load 1024 bits at once, meaning you need eight times as wide a connection to memory.
  
  No you don't. You only need to make it as wide as the memory is (usually 64 or 128 bit), then pipeline the data back. This is why ram is rated as 3-1-1-1.
  
  If you want to implement these on an x86 core, you need to do one of two things. Your first option is just to make them take 8 cycles.
  
  loading memory takes way more than 8 cycles.
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Single core vs multicore by Xacid · 2008-03-11 00:38 · Score: 1

Call me what you will, but personally I *still* prefer the performance of a super fast single core (~3.5GHz+) over this over-hyped multi-core phenomenon. I've yet to see any *major* differences between two machines I have that are the same clock speed, one single core, one dual. The difference I do experience is similar to what I'd expect from a 0.5GHz jump. In other words, the architecture *does* need to change if they have any desire to have any significant performance increases.
I like this, more complexity - better jobs! by slashbart · 2008-03-11 00:43 · Score: 1

The way I see it, to get max. performance out of these chips, you need a deeper understanding of them, i.e. it requires higher skills, i.e. better quality jobs, better money, the works. Consider the fact that a lot of programmers have a really hard time dealing with concurrency at a thread level; these coming chips will only make it harder.
I don't think most concurrency problems can be automated away. It's the concepts and design of the concurrent algorithms that are hard, not so much the implementation (although that is where the bugs bite you when the stars are just right (wrong?)).

I'm rambling a bit I see, but I'm looking forward to interesting times ahead.
better idea by timster · 2008-03-11 00:43 · Score: 2, Funny

See, the thing to do with all these cores is run a physics simulation. Physics can be easily distributed to multiple cores by the principle of locality. Then insert into your physics simulation a CPU -- something simple like a 68k perhaps. Once you have the CPU simulation going, adjust the laws of physics in your simulation (increase the speed of light to 100c, etc) so that you can overclock your simulated 68k to 100GHz. Your single-threaded app will scream on that.

P.S.: I know why this is impossible, so please don't flame me.

--
I have seen the future, and it is inconvenient.
1. Re:better idea by Jesus_666 · 2008-03-11 06:05 · Score: 1
  
  P.S.: I know why this is impossible, so please don't flame me.
  Of course it's impossible. Because the calculations would be faster than c in the real world, your CPUs would travel backwards through time and undo all work they've done so far.
  
  --
  USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
How is heterogenous CPU different to separate GPU? by tomalpha · 2008-03-11 00:46 · Score: 1

Genuine question that I don't know the answer to:

How are heterogeneous CPU cores different conceptually to a modern PC system with say:

2 x General purpose cores (in the CPU)
100 x Vector cores (in the GPU)
n x Vector cores (in a physics offload PCI card)

How is moving the vector (or whatever) cores onto the CPU die different to the above setup, apart from allowing for faster interconnects?
Current state of software development by Alex+Belits · 2008-03-11 00:55 · Score: 5, Funny

Ugg is smart.
Ugg can program a CPU.
Two Uggs can program two CPUs.
Two Uggs working on the same task program two CPUs.
Uggs' program has a race condition.
Ugg1 thinks, it's Ugg2's fault.
Ugg2 thinks, it's Ugg1's fault.
Ugg1 hits Ugg2 on the head with a rock.
Ugg2 hits Ugg1 on the head with an axe.
Ugg1 is half as smart as he was before working with Ugg2.
Ugg2 is half as smart as he was before working with Ugg1.
Both Uggs now write broken code.
Uggs' program is now slow, wrong half the time, and crashes on that race condition once in a while.
Ugg does not like parallel computing.
Ugg will bang two rocks together really fast.
Ugg will reach 4GHz.
Ugg will teach everyone how to reach 4GHz.

--
Contrary to the popular belief, there indeed is no God.
1. Re:Current state of software development by asm2750 · 2008-03-11 04:16 · Score: 1
  
  That's the current problem in a nutshell. People need to learn that clock speed is no longer a major factor in processor design and parallelism is.
Invention? by SharpFang · 2008-03-11 01:02 · Score: 1

Some, like senior AMD fellow, Chuck Moore, believe that the industry should move to a new model based on a multiplicity of cores optimized for various tasks

And let's give the cores names like Paula, Agnus, Denise...

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
1. Re:Invention? by Jesus_666 · 2008-03-11 06:01 · Score: 1
  
  I can't wait for Creative to release their new "Sound Interface Device" line...
  
  --
  USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
+1 Optimistic by Sapphon · 2008-03-11 01:03 · Score: 4, Funny

The height of optimism: posting proof in the form of a 70-odd page thesis on Slashdot.
I don't think we'll be Slashdotting your server any time soon, CBravo ;-)

--
Antiquis temporibus, nati tibi similes in rupibus ventosissimis exponebantur ad necem.
One Fast Core, Multiple Commodity ones by Brit_in_the_USA · 2008-03-11 01:21 · Score: 2, Interesting

I have read many times that some algorithms are difficult or impossible to multi-thread. I envisage the next logical step is a two socket motherboard, where one socket could be used for an 8+ core CPU running at a low clock rate (e.g. 2-3GHz) and another socket for a single core running at the greatest frequency achievable with the manufacturing process (e.g. x2 to x4 the clock speed of the multi-core) with whatever cache size compromises are required.

This helps get around yield issues of getting all cores to work at a very high frequency and the related thermal issues. This could be a boon to general purpose computers that have a mix of hard to multi-thread and easy to multi-thread programs - assuming the OS could be intelligent about which cores the tasks are scheduled on. The cores could or could not have the same instruction sets, but having the same instruction sets would be the easy first step.
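
A back-of-envelope way to see why the fast socket matters, in the spirit of Amdahl's law: the serial part of a program runs on one core, the parallel part is spread over many. The sketch below is an idealized model only (no synchronisation, cache or memory bandwidth costs), and `run_time`, the 30% serial share and the 3x clock figure are made-up illustration values, not measurements.

```python
def run_time(serial_fraction, serial_speed, parallel_cores, core_speed=1.0):
    """Relative run time of a job on a mixed machine (idealized model).

    serial_fraction: share of the work that cannot be multi-threaded (0..1)
    serial_speed:    speed of the core that runs the serial part
    parallel_cores:  number of cores sharing the parallel part
    core_speed:      speed of each of those cores
    A single core at speed 1.0 doing everything takes time 1.0.
    """
    parallel_fraction = 1.0 - serial_fraction
    serial_time = serial_fraction / serial_speed
    parallel_time = parallel_fraction / (parallel_cores * core_speed)
    return serial_time + parallel_time


if __name__ == "__main__":
    # Assumed workload: 30% of the work is stuck on one thread
    only_slow = run_time(0.3, serial_speed=1.0, parallel_cores=8)
    with_fast = run_time(0.3, serial_speed=3.0, parallel_cores=8)
    print(f"8 slow cores only:        {only_slow:.4f}")
    print(f"8 slow cores + 3x socket: {with_fast:.4f}")
```

Under these assumptions the serial 30% dominates the 8-core run, so speeding up just that part roughly halves the total, which is the whole argument for the second socket.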
1. Re:One Fast Core, Multiple Commodity ones by Shados · 2008-03-11 01:45 · Score: 1
  
  Definitely a good idea, if it's possible. Right now when looking at buying a new computer, I'm looking at the dual cores, not quad core machines. Why? Because the dual cores will have higher clock speed (much higher) at the same price. Most software (read: games, hehehe) is poorly multithreaded, and will benefit more from higher clock speed than multi core. So overall, the computer feels faster. A suggestion like yours would be a cool compromise.
What about the OMG? by guysmilee · 2008-03-11 01:25 · Score: 1

Doesn't the OMG have anything to help with this ... suggested patterns ... specs etc ?
No problems for servers by TheLink · 2008-03-11 01:25 · Score: 5, Insightful

For servers the real problem is I/O. Disks are slow, network bandwidth is limited (if you solve that then memory bandwidth is limited ;) ).

For most typical workloads most servers don't have enough I/O to keep 80 cores busy.

If there's enough I/O there's no problem keeping all 80 cores busy.

Imagine a slashdotted webserver with a database backend. If you have enough bandwidth and disk I/O, you'll have enough concurrent connections that those 80 cores will be more than busy enough ;).

If you still have spare cores and mem, you can run a few virtual machines.

As for desktops - you could just use Firefox without noscript, after a few days the machine will be using all 80 CPUs and memory just to show flash ads and other junk ;).
--
- Too many replies beneath your current threshold
1. Re:No problems for servers by dbIII · 2008-03-11 02:55 · Score: 1
  
  Well Mr Nasty Logout Link there are a lot of solutions for problems that are CPU bound instead of I/O bound. Almost the entire field of numerical processing as an example and video processing as another that really doesn't require much I/O.
  Remember folks - don't click on the link. It's some kind of practical joke this poster has been playing for a long time where he writes something to provoke replies and you click on the link instead to post as AC.
2. Re:No problems for servers by TheLink · 2008-03-11 06:31 · Score: 1
  
  I'm well aware of that. If you actually read what I posted, I was talking about typical servers: webservers, fileservers, database servers. So no panic there - fix the I/O problem and IBM et al. can sell massively multicore machines.
  
  As for the link, people who keep clicking on it and never learning from their mistake, or never figuring out how to log back in again are probably the sort who should be "beneath your current threshold" in typical Slashdot discussions.
  
  People like that would probably have use for multicore machines to:
  a) run all the malware they pick up from randomly clicking on stuff.
  b) transparently and automagically run stuff in many different virtual machines/environments to try to check if it is bad or not, before actually running it in a more "real" virtual machine.
  c) continuously run increasingly resource intensive editions of McAfee/Symantec and anti spyware stuff.
  
  So no panic there either :).
  --
  
  Too many replies beneath your current threshold
3. Re:No problems for servers by ultranova · 2008-03-11 11:27 · Score: 1
  
  As for desktops - you could just use Firefox without noscript, after a few days the machine will be using all 80 CPUs and memory just to show flash ads and other junk ;).
  
  Except that Firefox is apparently singlethreaded, so in reality it would take minutes to render a single page while 79 of the cores would idle.
  
  --
  Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Power of 2 by AlpineR · 2008-03-11 01:38 · Score: 1

The experts in these articles keep forecasting processors with powers-of-2 cores (32, 64, 128). Is there a reason that the number of cores can't be some value in between, like 6?

And is the doubling time really 18 months? Aren't we due for the Intel Core 4 Quad already? If the doubling is slower, then I'd like to see the in-between core counts come sooner rather than wait for the next power of 2.
1. Re:Power of 2 by Jesus_666 · 2008-03-11 06:12 · Score: 1
  
  Aren't we due for the Intel Core 4 Quad already?
  Seeing as it was released back in 2006, I'd say yes.
  
  --
  USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
Re:How is heterogenous CPU different to separate G by TheRaven64 · 2008-03-11 01:40 · Score: 1

Heterogeneous cores is the generic case. A machine with a CPU and a GPU is a special case of a heterogeneous cores machine. You may also have crypto coprocessors, DSPs for sound and other specialised cores in your computer (if your computer is a handheld, you almost certainly do).

--
I am TheRaven on Soylent News
Homogeneous Heterogeneous fine by Patrick_Meenan · 2008-03-11 01:45 · Score: 1

As long as there are only a few different "Heterogeneous" configurations then it shouldn't be that bad. We essentially already have that with GPU's and to a lesser extent Physics Accelerators. What will really be a nightmare is if they start making the cores modular and we start getting hundreds of different configurations. I can just see the game requirements on the side of a box now - must have a CPU with 2 general purpose cores, 4 vector cores, 2 VC1 cores and 3 vertex units.

Oh, and anyone who thinks software development cycles are long and expensive now, just wait until the code needs to be written for and tested against every possible combination :-)
Re:[OT] Cell Programming by everphilski · 2008-03-11 01:54 · Score: 1

I'm interested in Cell programming. I do scientific computing (CFD) and I have a code that is highly parallelizable, in C++, and I've often thought after this semester about possibly porting it to the PS3 for kicks. But what you say is kind of discouraging. Would you recommend even trying?

Also, can you point out any good references you used to learn? Beyond a few intro docs from IBM, I'm pretty clueless. I'd appreciate it, thanks.
Let's go with: unable to see the difference by justthinkit · 2008-03-11 02:08 · Score: 1

Tom's Hardware has a great web page just for cpu doubters. It allows you to choose the two cpus to compare, the task you are wondering about and then you get an exhaustive list of how fast several dozen processors would be, with your chosen two in red.

I have been using the charts to compare various CPUs that PCClub.com offers across their various families of computers. The Q6600 looks to be, as they themselves said when I visited the store, the sweet spot as of March, 2008.

I think you should realize that quad+ cores are not going to offer as visible a performance increase as you are used to. In other words, unless you dig out three or four stopwatches and run your test tasks on both single and multi-cpu setups, you aren't going to see the differences.

For what it is worth, I think your mistake is when you say you are looking for "*major*" differences. These are not at all necessary for the user experience to be improved. Try typing on a 110cps teletype into a mainframe to see what I mean -- plenty of raw power goes wanting because the PBKAMainframe. Multiple cores reverse the situation -- average core speed is often lower, but the average task no longer pulls down the whole system.

Consider the following trivial test I did at the store. Open a cmd.exe window, change to root directory, type "DIR /S" and press enter. On my 3.2GHz HT Pentium, I get 100% cpu on at least one of the cores (I can't test my own system at the moment, sorry). So, my fan kicks in, my ears get deafened and I don't like it. On the Q6600, two of the cpus get zero load change, and two get a 35-50% increase. So, thermally, there is little to no change (.LT. 25% increase in overall cpu usage) -- half or less of my system -- and so the fan may not even kick in (I didn't hear it in the store, anyway), I don't get my ears blasted, and I am noticeably happier (because I am a simpleton who only types DIR /S all day long).

--
I come here for the love
Sequential vs Parallel by demallien2 · 2008-03-11 02:09 · Score: 1

Unless I'm missing a major class of computational problem, we have only two types of problem in computing at the moment.

1) We have very computationally intensive tasks, which are inevitably trivial to parallelise. Examples include calculating graphics, running sims, and compiling. These tasks all involve many, many small tasks that can be completed independently. For example, we can compile individual source files independently of each other, we can calculate the dot product of a vector with each element calculated in parallel, etc.

2) And then we have the rest. Messy, real world types of problems that are sequential by their very nature. I've tried, but I just can't find any real-world example of this type of problem that is very computationally intensive. Maybe somebody else can think of one? Video compression? But even there, we insert full frames every second or so, to limit propagation errors... Databases? I know that transactions impose an order, but even there the limit seems to be only on accessing one record/table simultaneously by multiple clients.

When I read discussions on massively parallel computing, people always talk about how hard it is. And it's true that parallelising problems of type 2 is difficult. But I just can't think of any real world use case, which leads me to conclude that we don't actually have a problem at all. In other words, Much Ado About Nothing, move along, nothing to see here...
1. Re:Sequential vs Parallel by TheLink · 2008-03-11 05:55 · Score: 1
  
  While you can parallelize a bzip2 of a file, it's hard to do it when you try to bzip2 STDIN ;).
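
The file case can be sketched roughly as follows, assuming the whole input is already in memory (which is exactly what a stream on STDIN does not give you up front). The function name `parallel_bzip2` and the chunk size are illustrative choices, not anything from a real tool. It relies on Python's `bz2.decompress` accepting several concatenated streams.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bzip2(data, chunk_size=900_000, workers=4):
    """Compress fixed-size chunks independently and concatenate the streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the chunks in their original order.
        return b"".join(pool.map(bz2.compress, chunks))


original = b"multicore " * 1000
packed = parallel_bzip2(original, chunk_size=1024)
print(bz2.decompress(packed) == original)
```

The price is a slightly worse compression ratio, since each chunk is compressed without knowledge of its neighbours.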
  
  If you run a perl/python program doing the same thing as a C program, you often use 20x more CPU.
  
  That said, I'd rather they make fast IO cheaper - mass storage has had very poor random seek times for decades. When I search through a disk, no matter how many CPUs I have, the drive is too slow. Same for when I run stuff for the first time.
  
  If the hardware people make mass storage I/O much faster and still affordable, I'll find things to do with the multicore CPUs.
Task Manager / Top by tjstork · 2008-03-11 02:14 · Score: 1

Both show how many active processes on a box? Computers don't run a single thing any more, they are already federations of dozens of concurrently running programs, both in Unix and in Windows. Multicore makes the whole desktop feel crisper and faster.

--
This is my sig.
1. Re:Task Manager / Top by Jesus_666 · 2008-03-11 06:23 · Score: 1
  
  Only if you have many CPU-hungry processes. I have a dualcore CPU and dozens of running processes, but system load barely ever exceeds 1. With the load usually below 1, adding more cores won't do much because no processes are actually waiting.
  
  --
  USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
Bah Humbug by Binder · 2008-03-11 02:25 · Score: 1

"to make effective use of multicore hardware today, you need a PhD in computer science."
BAH!

I don't have a PhD and yet I can program multi core. Threading, message passing, heterogeneous or homogeneous. What is really required is thought. Now I realize a good part of the population is opposed to thought... but sometimes you just gotta bite the bullet.

The most basic skill in computer science is breaking a problem down into smaller pieces.

If you have multiple processors then you simply break your problem down into pieces for each core. Granted some problems don't decompose well, but many do.
It's just another processor, sheesh by mlwmohawk · 2008-03-11 02:31 · Score: 1

These guys are freaking out over nothing. As long as the cores are not all tied to a specific process (which would be STUPID), then the current computing models will work fine.

In other words, do you run one program on your system? No, on a slow day I have about 150 concurrent processes on my desktop. On my web servers and database servers, I have a lot of processes competing for CPU. The only thing that will have to happen is a modification to the Linux process and scheduling code to accommodate many more processors than the SMP code currently does. Everyone focuses on one application, but everyone runs a multi-user multi-processing system these days. Multiple cores remove CPU contention! We *already* have systems that inherently use multi-core CPUs.

It looks more like they're worried about some fictional single application benchmark where they can measure throughput. That ship has sailed. As it is, processors are as fast as they can practically get (or need to get with RAM and I/O speeds) without a breakthrough or two. (That's why the computer sales slump) There is little Intel or AMD can do to speed up the processing of a single threaded application. So, how do you compete on the practical speed of multi-core CPUs? That's what they are really worried about.
1. Re:It's just another processor, sheesh by ThreeIfByAir · 2008-03-11 03:19 · Score: 1
  
  And actually, Linux can run on dozens of cores simultaneously already. I've done it. I'm not saying that it's necessarily doing it in the best way it possibly could, but it does get the job done, and it's pretty smart about recognizing things like process affinities, which become even more important when you go to massive multicore.
Re:How is heterogenous CPU different to separate G by tomalpha · 2008-03-11 02:33 · Score: 1

Ok, but how are your examples different to a crypto-offload board (e.g. SSL accelerator, that's just really a single core on a PCI board), specialist sound-card with DSP processor etc (same again)?
If I were you by RealErmine · 2008-03-11 02:35 · Score: 1

I would certainly listen to what Chuck Moore has to say on the topic of CPU trends. For one thing, his name is a combination of Chuck Norris and Gordon Moore. How can you be any more of an expert than that? I expect his company to put his ideas into practice soon. Expect to see the AMD "Roundhouse" architecture take the computing world by storm.

--
Dewey, you fool! Your decimal system has played right into my hands!
I'll take 80 cores just for the processes by rawdot · 2008-03-11 02:45 · Score: 1

I've got 106 processes running on my XP laptop and I'm not even doing work right now. (At which point add another good dozen++ processes.)

And lots of these processes are already multi-threaded. (Including most of the tools and frameworks I use and some of the code I'm writing.)

So even though some of this sounds theoretical, I don't think I even need any kind of software upgrade to benefit from having an 80 core processor today just for scheduling processes. (Though, the memory bandwidth issues others have pointed out would need some attention.)

Cheers,
Richard
OT: Re:+1 Optimistic by CBravo · 2008-03-11 03:19 · Score: 1

Not my server, of course ;-)

--
nosig today
the model is dead... by tempest69 · 2008-03-11 03:21 · Score: 1

It's a matter of time.. cores on CPUs should explode soon. The kicker is that per transistor it makes sense to have simpler cores.. so each core isn't as fast per clock cycle.. like a Pentium vs a 486: 3x the transistors for a 1.5x per-clock efficiency boost.
So max crunching will occur on a whole beastload of weak processors.. if we can use them in a respectable fashion.
Oh, and most software doesn't run well as a single thread, otherwise it wouldn't take so bloody long for the address bar to keep up with my typing when I get stuck at some godforsaken web page that I really want to get off my screen because I mistyped microsoft or google. Watching the letters slowly come up one at a time is horrible on a Core 2 Duo when IE is the only app open. Apps need some massive re-engineering.
1. Re:the model is dead... by springbox · 2008-03-11 03:27 · Score: 1
  
  IE
  
  There, I found your problem. I run a lesser system than yours and never experienced the same problem with slow address bar typing in either Firefox or Explorer.
Re:[OT] Cell Programming by mchanaud · 2008-03-11 03:37 · Score: 1

I'm not the writer of the original post, but I have used a Cell for CFD programming, and I can confirm that it's a pain. If your code depends heavily on memory access (as many CFD codes do), you will face a huge number of problems:
- the SPEs have a very limited memory space, so you'll have to constantly move data between the SPEs and the PPE.
- synchronizing SPEs is sometimes hard
- moving chunks of memory means aligning your data correctly (believe me, it's not that simple).
- and don't forget that your code and data share the same memory space: if your code is large, then only a small amount of data will fit in memory (-> even more communication)
If the code complexity (operations per chunk of data that fits in SPE memory) is less than O(n^2), the speedup will be very poor (sometimes less than 1...) because moving data is very expensive, and it's worse when using double precision.
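
A back-of-the-envelope model shows how the speedup can drop below 1. This is only a sketch under loud assumptions: the compute divides evenly across the SPEs, transfers are not overlapped with computation (no double buffering), and the function name `offload_speedup` and all the timings are invented for illustration.

```python
def offload_speedup(compute_s, transfer_s, n_spes):
    """Estimated speedup over running the whole job on a single core.

    compute_s  -- serial compute time for the whole job, in seconds
    transfer_s -- total time spent moving data in and out of local stores
    n_spes     -- number of SPEs sharing the compute evenly
    """
    if n_spes < 1:
        raise ValueError("need at least one SPE")
    parallel_time = compute_s / n_spes + transfer_s
    return compute_s / parallel_time


# Compute-heavy job: transfers are a small share of the work.
print(offload_speedup(compute_s=8.0, transfer_s=1.0, n_spes=8))
# Memory-bound job: moving the data costs more than the arithmetic.
print(offload_speedup(compute_s=1.0, transfer_s=2.0, n_spes=8))
```

In the second case the job gets slower no matter how many SPEs are added, because the transfer term never shrinks.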

I may be wrong, but it seems that CFD codes are not the best ones to port on Cell. Give it a try : programming on Cell is very interesting and sometimes funny.
morality core by loshwomp · 2008-03-11 03:39 · Score: 1

a multiplicity of cores optimized for various tasks

I guess I'm okay with it as long as they include the morality core. Too much trouble, otherwise.
Tomorrow needs what today doesn't have by AlpineR · 2008-03-11 03:40 · Score: 1

Most software today runs fine as a single thread anyway.

That software runs fine as a single thread because if it didn't then you wouldn't be running it. There are lots of things that computers could do that 99% of people don't do because they run too slow. Some of those things that we don't do are limited by CPU and are highly parallelizable.

We computer geeks seem to have found lots of fun and useful things to do with 2 GHz processors, 2 GB of RAM, 160 GB hard drives, and 1.5 Mbps networking. Nobody needed those capabilities before, but they seem essential now. If you multiply that processor by 80 then we will write software so cool and useful that 99% of people will need it.
8087 - Been There Done That by Nom+du+Keyboard · 2008-03-11 03:53 · Score: 2, Insightful

Others disagree on the ground that heterogeneous processors would be too hard to program.

Been there, done that, already. The 8087 and its 80x87 follow-on co-processors were exactly that. Specialized processors for specific tasks. Guess what? We managed to use them just fine a mere 27 years ago. DSPs have come along since and been used as well. Graphics card GPUs are specialized co-processors for graphics-intensive functions, and we talk to them just fine. They're already on the chipsets, and soon to be on the processor dies. I don't think this is anything new, or anything that programming can't handle.

--
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
One possible approach... by argent · 2008-03-11 03:56 · Score: 1

Heterogeneous/distributed microkernels. Each process runs on a core that supports its instruction set, and communicates with other processes using lightweight messages. Could see QNX suddenly becoming much more important.
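
The message-passing style can be sketched in a few lines. This is not QNX's API; threads and `queue.Queue` stand in for processes on separate cores, and the names `worker` and `square_all` are invented. The assumption that matters is that the two sides share nothing except the mailboxes.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers until a None sentinel arrives; reply with their squares."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        outbox.put(msg * msg)


def square_all(values):
    """Send each value to the worker as a message and collect the replies."""
    values = list(values)
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for v in values:
        inbox.put(v)
    inbox.put(None)  # tell the worker to shut down
    t.join()
    return [outbox.get() for _ in values]


print(square_all([1, 2, 3]))
```

Because the worker only ever sees messages, it could in principle run on a core with a different instruction set without the sender caring.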
We Need a New Software Model by MOBE2001 · 2008-03-11 03:58 · Score: 1

The problem is when you have a single CPU-intensive task, and you want to split that over multiple processors. That, in general, is a difficult problem. Various solutions, such as functional programming, threads with spawns and waits, etc. have been proposed, but none are as easy as just using a simple procedural language.

Yes, the reason is that we are trying to fit a square peg into a round hole. Multithreading was not originally intended to be the basis of a parallel programming model but as a mechanism to run sequential (not parallel) programs concurrently. What is needed is an inherently parallel model where parallelism is implicit and sequential order is explicit. Read Parallel Programming, Math and the Curse of the Algorithm and Nightmare on Core Street for more.
Problem in search of a solution? by LuxMaker · 2008-03-11 04:01 · Score: 1

I am surprised no one has mentioned this. When we have multi-CPU and multi-GPU systems it should be somewhat trivial for motherboard makers to add another mouse and keyboard port. Then with the right OS, multiple users can use the same machine.

--
I regret that I only have one mod point to give per post.
interval instruction sets by 192939495969798999 · 2008-03-11 04:13 · Score: 1

IO bandwidth breakup -- that might be easy, just switch all the (sometimes heterogeneous) chips over to all-interval math based instruction sets, then allow the embarrassingly parallel nature of intervals to divide and conquer your workload. Obviously if only 4 chips out of 80 are working at 100%, then comes the harder part: analyze the specific situation and break up the instructions "where appropriate". If you can't break it up any farther quick and easy, that's ok. If designing parallel algorithms was easy, we'd be done with this already.

--
stuff |
Do it in the hardware? by codehoser · 2008-03-11 04:14 · Score: 1

I'm not convinced that this can't be tackled in hardware (probably because I don't know anything about hardware). Stick with me for a minute though.

I imagine a single CPU core as having a vertical pipeline for calculation. What we're trying to do is take (let's say) four of these vertical pipelines and figure out how best to use them simultaneously, in software. Except for the embarrassingly parallel problems, this is quite hard.

Can't we do _something_ in hardware to sort of stack those vertical pipelines? A single branch of execution (I'm not sure if that makes sense) would travel all the way up through all four cores before completing.

The bottom line is that I don't think it's feasible to expect developers to write maximally efficient code for computers that have a dozen cores. It makes more sense to have those dozen 2GHz cores appear as one 24GHz core to the OS due to the way the hardware is created. I realize it would only operate close to 2GHz for single operations, but would scale up toward 24GHz when multi-tasking.

Excuse my ignorance of the subject matter. My intent is only to contribute something to the conversation.

Kevin
What not to do by Animats · 2008-03-11 04:27 · Score: 1

Most of the bad ideas in multiprocessing have already been explored in supercomputers. From the nCube to the Connection Machine to the BBN Butterfly, we have a good idea of what doesn't work.
We know three things that work - clusters, symmetrical shared memory multiprocessors, and highly parallel graphics-type engines. Everything else that's been tried, from hypercubes to perfect shuffle machines, has been a dud.
Clusters make the computing world go round. All big server farms are clusters of relatively independent machines communicating over I/O channels. "Web services" are provided by clusters. So we know that works. The job of the server designer is to make clusters cheaper, smaller, and less power-hungry. There's a ready market for hardware that does that. With Google building data centers in former aluminum smelter locations just to get cheap power, there's no question that this is a very real problem.
Machines within a cluster can be symmetrical multiprocessors. That works fine. Asymmetrical multiprocessors are usually a pain. There's a long history of that idea in large computers, and they've consistently been disappointing. In clusters, each CPU has plenty of memory and its own private disks. Intercommunication is limited and slow, yet not usually the bottleneck. There are faster interconnection schemes, like Infiniband, but most clusters stick with some form of Ethernet.
It's worth noting that while the Cell gets attention, the XBox 360 is more successful, and it's quite conventional. It's a 3-CPU shared memory symmetrical multiprocessor (PowerPC), with a conventional GPU (ATI) on the back end. On the Cell, brilliant people (I know some of them) struggle to cram problems into the architecture. On the XBox, game designers develop games. Weird architecture hurts your time to market.
The real choke points today are CPU-to-memory and memory-to-disk. We may see more memory move to the CPU chip, in the form of larger caches. We may see machines with a modest number of cores and main memory on the CPU chip. This is easy to design and improves memory access times. When a gigabyte or so can be crammed onto the CPU chip, this will look like a good option for desktop machines. One big chip will be the whole computer. That's the low-end PC of the near future.
As non-volatile memory ("flash" and its friends) becomes cheaper, we'll face a new architectural challenge. To date, such memory has usually been treated as a fast disk. But this is suboptimal. Flash is near random access, but we're not using it that way. We need a new way to talk to flash memory, something that has file system like protection, doesn't require OS intervention, and has finer granularity than disk blocks. An interesting concept would be a flash memory/CPU combo optimized for running SQL-type databases.
Heterogeneous is the key word! by Terje+Mathisen · 2008-03-11 04:47 · Score: 2, Interesting

It has been quite obvious to several people in the usenet news:comp.arch newsgroup that the future should give us chips that contain multiple cores with different capabilities:

As long as all these cores share the same basic architecture (i.e. x86, Power, ARM), it would be possible to allow all general-purpose code to run on any core, while some tasks would be able to ask for a core with special capabilities, or the OS could simply detect (by trapping) that a given task was using a non-uniform resource like vector fp, mark it for the scheduler, and restart it on a core with the required resource.
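
The placement half of that idea can be sketched as a toy. Everything here is hypothetical: the capability names, the core names and the function `assign_tasks` are invented, and a real scheduler would also weigh load, migration cost and the trap-and-restart path, none of which is modelled.

```python
def assign_tasks(tasks, cores):
    """Map each task to the first core that offers everything it needs.

    tasks -- dict: task name -> set of required capabilities
    cores -- dict: core name -> set of capabilities that core provides
    """
    placement = {}
    for task, needs in tasks.items():
        for core, has in cores.items():
            if needs <= has:  # subset test: the core covers every requirement
                placement[task] = core
                break
        else:
            raise LookupError(f"no core can run {task!r}")
    return placement


cores = {"small0": {"int"}, "big0": {"int", "vector_fp"}}
tasks = {"irq_handler": {"int"}, "fft": {"int", "vector_fp"}}
print(assign_tasks(tasks, cores))
```

General-purpose code lands on whichever core comes first, while the task that needs vector fp is steered to the one core that has it.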

An OS interrupt handler could run better on a short pipeline in-order core, a graphics driver could use something like Larrabee, while SPECfp (or anything else that needs maximum performance from a single thread) would run best on an Out-of-Order core like the current Core 2.

The first requirement is that Intel/AMD must develop the capability to test & verify multiple different cores on the same chip, the second that Microsoft must improve their OS scheduler to the point where it actually understands NUMA principles not just for memory but also cpu cores. (I have no doubt at all that Linux and *BSD will have such a scheduler available well before the time you & I can buy a computer with such a cpu in it!)

So why do I believe that such cpus are inevitable?

Power efficiency!

A relatively simple in-order core like the one that Intel just announced as Atom delivers maybe an order of magnitude better performance/watt than a high-end Core 2 Duo. With 16 or 80 or 256 cores on a single chip, this will become really crucial.

Terje

PS As other posters have noted, keeping tomorrow's multi-core chips fed will require a lot of bandwidth; this is neither free nor low-power. :-(

--
"almost all programming can be viewed as an exercise in caching"
Add debug support to the silicon by skeptictank · 2008-03-11 05:32 · Score: 1

Use some of the extra silicon space to add support for JTAG or some other on-chip debug interface. Add performance monitoring registers and register stacks where running tasks can deposit their ID and other state information. I would much rather develop on a 28 core processor with 4 cores' worth of silicon devoted to debug circuitry than on a 32 core processor with no debug hardware support.
I know everyone wants to develop a new language or model that will make parallel programming easy and cheap. Maybe they will succeed where everyone else has failed. Adding increased support for development and debug tools to the silicon isn't sexy, but it will have a real impact when it comes to making parallel software development cheaper and quicker.
this is a good thing by bugi · 2008-03-11 05:37 · Score: 1

The common way we use threads today is broken. It's far too easy to deadlock them, for instance. The coming explosion of cores, heterogeneous or homogeneous, gives us the opportunity to learn that there are other concurrency models.
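
To see how little it takes: two threads that grab the same pair of locks in opposite order can each end up holding one and waiting forever for the other. Below is a minimal sketch of the usual workaround, always taking the locks in one fixed order. The `Account` class and the `transfer` and `demo` functions are invented for illustration, and this patches the thread model rather than replacing it.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move money while always locking the two accounts in the same order."""
    if src is dst:
        return  # nothing to move, and locking the same lock twice would hang
    # Sorting by id() gives every thread the same acquisition order,
    # so a transfer a->b can never interleave fatally with b->a.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def demo(rounds=1000):
    a, b = Account(100), Account(100)
    t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(rounds)])
    t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(rounds)])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance


print(demo())
```

Replace the sorted order with "lock `src` first, then `dst`" and the same demo can hang, which is the kind of fragility being complained about.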

See "The Problem with Threads" in Spectrum, May 2006 for a primer.

Then go crack out a PS-300 (homogeneous example) manual if yours has not yet crumbled to dust. Or an Amiga (heterogeneous example) manual, if you must. Those two machines got it right (mostly). The PS-300 was too easy to break via injudicious use of a clock data source, but demonstrated the rendezvous model quite well.
Threaded Scripting Languages by justinchudgar · 2008-03-11 05:48 · Score: 1

One of the great things about Linux/Unix is that it is really easy to write quick, simple scripts to accomplish little tasks as they occur. Whether written for SH/BASH/etc., for PHP, Python, Perl or what have you, a few lines of code can provide a time or labor saving solution to a sysadmin or skilled end-user. This is one of the things that has encouraged me to convert more and more machines from Windows Server to Linux. These scripts, however, are almost impossible to easily and quickly write in a way that leverages multiple cores. Some interpreters do not support multi-threading and others have funky threading implementations that do not seem to be of much use aside from handling asynchronous IO.

Though I studied programming years ago (late '80s and early '90s), I am far from a skilled programmer. I do, however, have enough of a grasp of the subject to be able to create purpose-specific scripts to make my life as an admin easier or to solve situationally unique problems. Since they often are used to automate repetitive tasks, they tend to have a good degree of parallelism by nature.

I have spent significant time Googling and reading online docs; and, I have not found a reasonably performant threading implementation that even remotely maintains the ease of coding that non-threaded scripts have. While I know that most of this discussion is focused on the software created by developers for distribution; I have a suspicion that having a multitasking script interpreter that is as easy for admins to use as what we have now would greatly improve server performance.

After all, if there are a few poky script interpreters hogging a few cores, even the best optimized daemons will not be able to work to their potential.

--
WARNING: Smoking this sig may cause lowered IQ, insanity or short term memory loss. It is also really bad for your monit
Usage model by John+Bayko · 2008-03-11 05:55 · Score: 1

The usage model for heterogeneous processors is not that difficult. Graphics processors aren't far from this sort of thing already. Any specialised function can just be implemented as a driver - to install a new service that uses a specialised CPU you just install the driver for it, and access it through standard I/O calls or an exposed API. Without the specialised processor, you'd emulate the function on the main CPU (or one of them) using a slower compatibility driver.
Compatibility, flexibility, ease of use, no problem.
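
A minimal sketch of that driver-with-fallback arrangement follows. The class names, the `load_driver` function and the toy checksum are all hypothetical, and the "accelerated" driver only imitates what real hardware would return.

```python
class SoftwareChecksum:
    """Compatibility driver: does the work on the main CPU."""
    name = "software"

    def checksum(self, data):
        return sum(data) % 65536


class AcceleratedChecksum:
    """Stand-in for a driver that would hand the work to a specialised core."""
    name = "accelerated"

    def checksum(self, data):
        # A real driver would talk to the hardware here; this only mimics the result.
        return sum(data) % 65536


def load_driver(hardware_present):
    """Pick the fast driver when the special core exists, else the fallback."""
    return AcceleratedChecksum() if hardware_present else SoftwareChecksum()


driver = load_driver(hardware_present=False)
print(driver.name, driver.checksum(b"abc"))
```

The calling code is identical either way, which is the whole appeal of hiding the specialised core behind a driver.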
How to use so many cpu's by John+Sokol · 2008-03-11 06:03 · Score: 3, Insightful

Back in 2000 I realized that 50 million transistors' worth of 4004s (the first microprocessor ever created) would outperform a P4 with the same transistor count, done in the same fab and running at the same clock rate. It would be over 10x faster, by my working. But how to use such a device?
I had been working with a 100-PC cluster of P4-based systems to do H.264 HDTV compression in real time. I spread the compression function across the cluster, using each system to work on a small part of the problem and flowing the data across the CPUs.

Based on this I wanted to build an array of processors on one chip, but I am not a silicon person, just software, drivers and some basic electronics. So I looked at various FPGA cores, ARM, MIPS, etc. Then I went to a talk given by Chuck Moore, author of the language FORTH. He had been building his own CPUs for many years using his own custom tools.

I worked with Chuck Moore for about a year in 2001/2002 on creating a massive multi core processor based on Chuck's stack processor.

The idea was, instead of having 1, 2 or 4 large processors, to have 49 (7 * 7) small, light but fast processors in one chip. This would be for tackling a different set of problems than your classic CPUs. It wouldn't be for running an OS or word processing, but for multimedia, cryptography, and other mathematical problems.

The idea was to flow data across the array of processors.
Each processor would run at 6 GHz, with 64K words of RAM each.
21-bit-wide words and bus (based on the F21 processor);
this allows for 4x 5-bit instructions on a stack processor that only has 32 instructions.
Since it's a stack processor they run more efficiently. So in 16K transistors, 4000 gates,
the F21 at 500 MHz performed about the same as a 500 MHz 486 with JPEG compress and decompress.
With the parallel core design, instead of a common bus or network between the processors there would only be 4 connections into and out of each processor. These would be 4 registers that are shared with its 4 neighboring processors, which are laid out in a grid. So each chip would have a north, south, east and west register.

Data would be processed in what's called a systolic array, where each core would pick up some data, perform operations on it and pass it along to the next core.
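
The data flow can be simulated in a few lines. This is a one-dimensional toy rather than the 7x7 grid, and the name `run_systolic` and the three stage functions are invented for illustration. On every clock tick each cell works on the value handed to it by its neighbour, so several values are in flight at once.

```python
EMPTY = object()  # marks a cell that holds no data on this tick


def run_systolic(stages, inputs):
    """Push inputs through a chain of cells, one hop per clock tick.

    stages -- list of one-argument functions, one per cell
    inputs -- values fed into the first cell, one per tick
    Returns (outputs, ticks).
    """
    if not stages:
        raise ValueError("need at least one stage")
    cells = [EMPTY] * len(stages)
    pending = list(inputs)
    outputs, ticks = [], 0
    while pending or any(c is not EMPTY for c in cells):
        # Whatever sits in the last cell has been through every stage.
        if cells[-1] is not EMPTY:
            outputs.append(cells[-1])
        # Walk backwards so each value moves exactly one cell per tick.
        for i in range(len(cells) - 1, 0, -1):
            prev = cells[i - 1]
            cells[i] = EMPTY if prev is EMPTY else stages[i](prev)
        cells[0] = stages[0](pending.pop(0)) if pending else EMPTY
        ticks += 1
    return outputs, ticks


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_systolic(stages, [1, 2, 3]))
```

Three values through three stages finish in 6 ticks in this model, against 9 stage-steps if one processor did everything in turn, and the gap widens as the stream gets longer.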

The chips with a 7x7 grid of processors would expose the 28 (4x7) bus lines off the edge processors, so that these could be tiled into a much larger grid of processors.

Each chip could perform around 117 Billion instructions per second at 1 Watt of power.

Unfortunately I was unable to raise money, partly because I couldn't get any commitment from Chuck.

Below are some links and other misc information on this project. Sorry it's not better organized.
This was my project.

---------
http://www.enumera.com/chip/
http://www.enumera.com/doc/Enumeradraft061003.htm
http://www.enumera.com/doc/analysis_of_Music_Copyright.html
http://www.enumera.com/doc/emtalk.ppt

--------
This was Jeff Fox's independent web site; he worked on the F21 with Chuck.

http://www.ultratechnology.com/ml0.htm

http://www.ultratechnology.com/f21.html#f21
http://www.ultratechnology.com/store.htm#stamp

http://www.ultratechnology.com/cowboys.html#cm

------
http://www.colorforth.com/ 25x Multicomputer Chip

Chuck's site. 25x has been pulled down, but it's accessible on archive.org.
http://web.archive.org/web/*/www.colorfo

--
I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
1. Re:How to use so many cpu's by zergworld · 2008-03-11 12:36 · Score: 1
  
  Well, here's from a simpleton's point of view. When it comes to, say, http request/response processing, there is a stage that pretty much has to be single-threaded, and to generalize, control logic tends to be procedural/single-threaded/imperative, since the order of operations are necessarily so. I'm not sure how such an operation can be divided into parallel tasks.
  
  However, I can see how the control thread might be a subscriber to multiple services that are not run in a serial fashion. In the case of a single database lookup, there's no point, right? But perhaps there is opportunity in the case of a services mashup.
2. Re:How to use so many cpu's by John+Sokol · 2008-03-11 14:06 · Score: 1
  
  Things like http request/response processing and databases are really more a matter of data shoveling. Find the data, dig it up and send it. Memory, disk and network throughput are the limiting factors, not CPU number crunching.
  
  > there is a stage that pretty much has to be single-threaded
  
  http and DB use one single thread for each operation, but deal with many simultaneous operations handled in parallel.
  SMP/NUMA and other designs with multiple CPUs on a shared memory bus work very well for those applications, but in that case each CPU handles one thread at a time, so each additional CPU allows more parallel threads to run.
  Each thread gets to share the memory, the file system cache and internal structures.
  This only applies to a threaded model of programming.
  
  Personally I am a big fan of loosely coupled parallel processors, more like Beowulf clusters or the way Google's architecture is done.
  In that case http requests are split out across many boxes that don't share memory and don't even need much communication to respond to most requests.
  This also works well for DB/HTTP, and isn't threaded.
  
  But what about number crunching? There are certain classes of problems that must be run sequentially, but for most people not working on the deeper understanding of the universe, this doesn't apply.
  The most common case for consumer level parallelism is Multimedia. Video, Audio, graphics rendering, compression, cryptography and maybe some day, AI.
  In these cases, arrays of small super fast processors would really help.
  
  --
  I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
I thought of it first. by kcdoodle · 2008-03-11 06:31 · Score: 1

The ideal CPU would be designed for Linux.

It would have several dozen really small CPUs that are Linux commands/daemons literally burned into the chip.
So, much of the operating system would be hardware-based (maybe EEPROM microcode) that does much of the "core guts" of the Linux kernel (or maybe GCC libraries?).
The rest of the chip could be two or four X86 type multi-purpose CPUs.

A Microsoft CPU chip could use the same idea, but who would want it? (Winmodems, etc already have)

--

- I live the greatest adventure anyone could possibly desire. - Tosk the Hunted
1. Re:I thought of it first. by Fulcrum+of+Evil · 2008-03-13 10:48 · Score: 1
  
  What are you talking about? I'd hate to have to change hardware just for an OS upgrade.
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Multicore -- Microkernel -- Greater Security! by StCredZero · 2008-03-11 06:35 · Score: 1

Multicore machines could solve a big problem with microkernel architectures -- high context switch costs. If you lock down the microkernel to one of 8 cores -- let it monopolize the core -- then there is no context switch cost! You could then use a microkernel to implement Capability security architectures, which can provide mathematically provable security!

http://video.google.com/videoplay?docid=1762847950860111011
Do you know what NP means? by Inoshiro · 2008-03-11 07:39 · Score: 1

"However, what multiple cores might do is enable previously impractical tasks to be done on modest PCs. Things like NP problems, optimizations, simulations."

Like hell. NP means nondeterministic polynomial time, and for the hard problems in it (the NP-complete ones) nobody knows an exact algorithm that avoids exponential growth. This means if you have a problem with 2 items in it, it takes 4x the effort of 1 item. 4 items takes 16x the effort. 8 items takes 256x the effort. Want to solve a problem like the travelling salesman problem? It's trivial if you visit one or two cities. However, were you to want to visit the 30,000 or so cities in the US, you're looking at something like 30,000 to the power of 30,000 things to examine (type it into a bc in a terminal -- you might want to time how long it takes to print a number that large). Having 1,000 CPU cores in a box is not going to get you meaningfully closer than having 2 CPUs would -- you need an exponential speedup, which means either a new algorithm, or quantum computing. That, or patience to see if the universe ends in heat death or a big crunch before you get your answer.
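
The arithmetic is easy to check. The sketch below counts brute-force round trips as (n-1)!/2, which is a tighter figure than n to the power of n but leads to the same conclusion. The function names and the rate of one billion tours per second per core are assumptions picked for illustration.

```python
import math


def tour_count(cities):
    """Distinct round trips through `cities` cities, ignoring direction."""
    if cities < 3:
        return 1
    return math.factorial(cities - 1) // 2


def brute_force_years(cities, cores, tours_per_second_per_core=1e9):
    """Years needed to check every tour, assuming perfect scaling across cores."""
    seconds = tour_count(cities) / (cores * tours_per_second_per_core)
    return seconds / (365 * 24 * 3600)


for cores in (1, 2, 1000):
    print(cores, "cores:", brute_force_years(25, cores), "years for 25 cities")
```

Under these assumptions even 1,000 perfectly scaling cores still need thousands of years for a mere 25 cities, and each extra city multiplies the work again.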

Optimized scheduling and goods flow (with more than 2 restrictions) is NP as well. You can approximate NP solutions with heuristics and clever algorithms, even doing some fancy work with stats and running approximations in parallel to get arbitrarily close to a solution in some cases, but you're still not solving the NP problem :p

Simulations and particle physics could be done in parallel, potentially, but there are limits there as well. If you have a scene with 32 items, you do still need to synchronize their interactions (you can only split the parallelism so far). The reason we're seeing multiple cores is because it's getting harder to make a single CPU a significant amount faster. Multiple cores just means we spend less time faking multiple cores, and won't solve problems that require more than a linear speedup to become computable in reasonable or real-time.

Multicore is not a panacea. The trouble people whine about is because multithreaded programming is hard in a lot of languages and environments due to side effects. If you can convince people to switch over en-masse to Scheme, Haskell, SML, Prolog, you might solve this problem -- or at least make it less of a big deal. I doubt that's going to happen (but I'd love to be wrong).

--
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
I'm not sold yet - Was Heterogeneous Experience by PingPongBoy · 2008-03-11 08:09 · Score: 1

Cell was a fairly radical design departure. If IBM continues to refine Cell, and as more experience is gained, the challenge will likely diminish.

For one thing, IBM will likely add double precision floating point support.

The reason why x86 never died the thousand deaths predicted by the RISC camp is that heat never much mattered.

The Cell does sound pretty good, but for now I'll stick to Intel. You see, if you were to tell me about your heterosexual experiences with the Cell, I'd buy into it in a New York minute. That's where Intel is winning hands down.

--
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
A problem that isn't getting solved anytime soon by pcause · 2008-03-11 09:12 · Score: 2, Insightful

The lack of progress in creating tools to simplify multithreaded programming has been a topic of discussion for well over a decade. Most programmers just don't make much direct use of multithreading. They benefit from it only indirectly, because their Web server and database support it and the Web server runs each request in a separate thread. Even then, the work inside a request may be complex and is usually not further parallelized. Operating systems programmers and some realtime programmers tend to be good at multithreading and parallel programming, but they are a small minority of programmers. Heck, look at Rails, one of the most popular Web frameworks - it isn't thread safe!

Look at most people's screens. Even if they have multiple programs running, they tend to have the one they are working on full screen. Studies have shown that people who multitask are less efficient than people who do one job at a time. Perhaps we are not educated to look at problems as solvable in a parallel fashion, or perhaps there is some other human-based problem. Maybe, like many other skills, being able to think and program in a multithreaded fashion is a talent that only a small fraction of the population has.

This "panic" isn't going away and there is NO quick fix on the programming horizon. The hardware designers can stuff more cores in the box, but programmers won't keep up. What can consume the extra CPU power are things like speech recognition, handwriting and gesture recognition, and rich media. Each of these can run in its own 1-4 cores and help us serial humans interact with those powerful computers more easily.
core dumped by Anonymous Coward · 2008-03-11 09:23 · Score: 0

80 cores on a chip? So what. That's just an exercise in integration. As the number of available transistors continues to increase, it will get easier to shove more simple cores on the die.
If you want to see real innovation in this field, look at Sun's Niagara (UltraSPARC T1 and T2) and ROCK. They are a bit more clever.
We have threads. by WarJolt · 2008-03-11 09:28 · Score: 1

When I first learned how to write a server I learned how to split off a thread for each client. Trust me... if you have a degree in computer science you know how to do this. Many OSes can schedule those threads across all the processors, keeping every one of them busy. The only people who are nervous are those guys with legacy software who didn't have the foresight to write their code using well-known techniques.
1. Re:We have threads. by Westley · 2008-03-11 23:09 · Score: 1
  
  Using a thread per client is a debatable technique. It's usually more efficient to have fewer threads, and use asynchronous IO, CSP etc to use those threads efficiently. There's less context switching, to start with, as well as lower memory usage. It's not a problem for relatively low usage servers, but if you've got 10,000 clients each making a web request which may take a while (in terms of wall-clock time, but not in terms of CPU) to process then you probably don't want 10,000 threads, each with a 1MB stack.
  
  Asynchronous programming takes a while to get your head round, unfortunately.
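
For anyone who hasn't seen the asynchronous style, here is a minimal sketch using Python's standard asyncio module. The names handle_request and serve_all are made up for illustration, and asyncio.sleep stands in for whatever slow I/O a real request would wait on. It is not a real server, just the shape of the idea: many requests in flight on one thread.

```python
import asyncio


async def handle_request(client_id: int) -> int:
    # Stand-in for slow I/O (a database call, an upstream HTTP request).
    # While this coroutine waits, the event loop runs other requests.
    await asyncio.sleep(0.01)
    return client_id


async def serve_all(n_clients: int) -> int:
    # Every request is in flight at once on a single thread; each one is a
    # small coroutine object rather than a thread with its own stack.
    results = await asyncio.gather(*(handle_request(i) for i in range(n_clients)))
    return len(results)


if __name__ == "__main__":
    print(asyncio.run(serve_all(10_000)), "requests handled on one thread")
```

The trade-off is the one mentioned above: the control flow is split up at every await, which is the part that takes a while to get your head round.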
2. Re:We have threads. by Fulcrum+of+Evil · 2008-03-13 10:51 · Score: 1
  
  This is true - 100 worker threads can service 10k http requests (generally) - but for small scale MT, 1 thread per job (with pooling) is simple to implement.
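
A rough sketch of the pooled version, using Python's standard concurrent.futures (handle_job and run_jobs are hypothetical names, and the job body is a placeholder). The pool caps the number of threads, and jobs wait in its queue until a worker is free.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_job(job_id: int) -> str:
    # Placeholder for real work; returns the name of the pooled thread that ran it.
    return threading.current_thread().name


def run_jobs(n_jobs: int, n_workers: int) -> set:
    # The executor never starts more than n_workers threads, however many
    # jobs are submitted; the rest queue up.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return set(pool.map(handle_job, range(n_jobs)))


if __name__ == "__main__":
    workers_used = run_jobs(n_jobs=10_000, n_workers=100)
    print(len(workers_used), "threads serviced 10000 jobs")
```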
  
  --
  "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
About time by geekoid · 2008-03-11 11:45 · Score: 1

This is a good thing. Hopefully some much-needed new development systems will be the fallout from this 'panic'.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Re:[OT] Cell Programming by everphilski · 2008-03-11 13:24 · Score: 1

Very interesting, I appreciate the feedback. It's a finite element method; you can break the work into chunks for 90% of the code until you need to solve the system of equations at the end of the iteration. And even in the solver, to a point you can parallelize if you are careful. I'm not sure about the memory requirements; to be honest with you, it's still in 2D, and I'm working on breaking into 3D right now (then adding in all the fun stuff like reacting flows, etc.).

What resources did you use to learn cell programming?

Thanks again for your insights.
It's easy to tell who wins by professorfalcon · 2008-03-11 16:59 · Score: 1

Whoever runs the fastest.
Re:[OT] Cell Programming by mchanaud · 2008-03-11 21:31 · Score: 1

Now your problem is to parallelize the linear system solver. Normally this task takes up to 90% of total execution time, so it's a good candidate for running on the SPEs. For the other 10%, leave it on the PPE. And don't forget: adding "fun" stuff increases the code size, which means less space for data in each SPE's local store.
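
As a back-of-the-envelope check, Amdahl's law gives the best case for that split. The sketch below assumes the solver really is 90% of the run time, that it parallelizes perfectly across the SPEs, and that all 8 SPEs are usable (all assumptions; the function name is just for illustration).

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Ideal speedup when only part of the run time can be spread over `units`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


for spes in (1, 2, 4, 8):
    print(spes, "SPEs ->", round(amdahl_speedup(0.9, spes), 2), "x")
```

With 8 SPEs the ideal speedup is about 4.7x, and the 10% left on the PPE caps it at 10x no matter how many SPEs you add.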

Resources are available on IBM's developerWorks site (http://www.ibm.com/developerworks/power/cell/docs_articles.html; see also the forums, where some interesting issues are discussed) and on the Barcelona Supercomputing Center site (http://www.bsc.es/plantillaH.php?cat_id=326).
Re:[OT] Cell Programming by everphilski · 2008-03-12 01:42 · Score: 1

Do appreciate it.

I'm bogged down in school right now (working on my PhD... CFD, heat transfer, etc.) but hoping this summer/fall to do something a little more "fun". Have to do some research to see if this is it.

Thanks again.
Heterogeneous cores are already here by Quattro+Vezina · 2008-03-12 04:32 · Score: 1

I work with Cavium Networks Octeon processors. These are 16-core MIPS beasts that are capable of running different OSes/applications on different cores. You can run Linux on a few cores, your TCP/IP stack on another four cores, a crypto engine on another core, etc.

--
I support the Center for Consumer Freedom
Multicore panics won't be perty... by descubes · 2008-03-13 10:03 · Score: 1

I guess it will start with:
Linux version 3.2.12-himp (buildmaster@hulk.build.redhat.com) (gcc version 6.41) #1 SMP Wed Nov 14 13:43:25 EST 2013
then we'll have something like:
4096 CPUs available, 131072 CPUs total
Registering legacy COM port for serial console
Then the kernel will bring up the cores in question:
Total of 65536 processors activated (10299381.81 BogoMIPS).
Processor redundancy driver claimed 32768 processors
Warning: At least 32768 CPUs have failed the Heisenberg locality test
Finally, the panic itself will start with something like:
Oops: IRQ782 mapped to CPU 28311, already mapped to CPU 828, check subspace coupling!!!
swapper[4]: General Exception: Unmapped interrupt vector 0x8800, 48 [1]
Pid: 4, CPU 6645, comm: swapper
psr : 00001010085a6000 ifs : 800000046810cc18 ip : [a000000100008c40] Not tainted
ip is at start_kernel_thread+0x0/0x40
Of course, we'll need a bigger screen to display the whole series of stack traces (like 32768 processors trying to dump garbage on the console at the same time...). But apart from that, I don't think that multicore panics are going to be much of a problem.

--
-- Did you try Tao3D? http://tao3d.sourceforge.net