Intel Says to Prepare For "Thousands of Cores"

Posted by ScuttleMonkey on Wednesday July 2, 2008 @08:42AM from the viva-la-coding-revolucion dept.

Impy the Impiuos Imp writes to tell us that in a recent statement Intel has revealed its plans for the future, and they go well beyond the traditional processor model. Suggesting developers start thinking about tens, hundreds, or even thousands of cores, Intel seems to be pushing for a massive evolution in the way processing is handled. "Now, however, Intel is increasingly 'discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander,' [Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab] said. He says that the more radical programming path to tap into many processing cores 'presents the "opportunity" for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of their software's existence.'"

638 comments

The thing's hollow - it goes on forever by stoolpigeon · 2008-07-02 08:42 · Score: 5, Funny

- and - oh my God - it's full of cores!

--
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
1. Re:The thing's hollow - it goes on forever by CDMA_Demo · 2008-07-02 08:47 · Score: 0, Offtopic
  
  If I was from Control, you'd already be a starchild...
2. Re:The thing's hollow - it goes on forever by Penguinisto · 2008-07-02 09:22 · Score: 1
  
  Actually, I think this is the one case where you could honestly say "...it's turtles all the way down!" and not get laughed at.
  /P
  
  --
  Quo usque tandem abutere, Nimbus, patientia nostra?
3. Re:The thing's hollow - it goes on forever by sconeu · 2008-07-02 10:05 · Score: 4, Funny
  
  No, not quite. It's CORES all the way down!
  
  --
  General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
4. Re:The thing's hollow - it goes on forever by Anonymous Coward · 2008-07-02 10:19 · Score: 0, Offtopic
  
  Where did this meme come from? Did it happen over the weekend or something?
  Explain your damn memes!
5. Re:The thing's hollow - it goes on forever by kdemetter · 2008-07-02 10:26 · Score: 3, Informative
  
  2001 : A Space Odyssey , by Arthur C. Clarke.
  Great book.
  
  --
  Slipping shoelaces ?
6. Re:The thing's hollow - it goes on forever by Maxo-Texas · 2008-07-02 10:49 · Score: 5, Funny
  
  Don't give up! Stay the cores!
  
  --
  She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
7. Re:The thing's hollow - it goes on forever by joto · 2008-07-02 11:11 · Score: 4, Funny
  
  You know, before they made it into a book, it was a perfectly good movie.
8. Re:The thing's hollow - it goes on forever by Anonymous Coward · 2008-07-02 11:37 · Score: 0
  
  I get the 2001 thing - it's one of my favorite movies, but what is "If I was from Control..." about?? The new "Get Smart" movie? Lame.
9. Re:The thing's hollow - it goes on forever by Joren · 2008-07-02 11:53 · Score: 2, Informative
  
  The "Control" meme is from Get Smart, which came out a week or two ago. So yes, it is pretty recent...unless you happen to have watched the series from the 60s.
  
  --
  -- Joren
10. Re:The thing's hollow - it goes on forever by Anonymous Coward · 2008-07-02 14:25 · Score: 0
  
  you fail at troll
11. Re:The thing's hollow - it goes on forever by Anonymous Coward · 2008-07-02 14:31 · Score: 0
  
  you fail at fail
12. Re:The thing's hollow - it goes on forever by Anonymous Coward · 2008-07-02 16:44 · Score: 0
  
  Epic.
13. Re:The thing's hollow - it goes on forever by dryeo · 2008-07-02 17:00 · Score: 5, Informative
  
  And before they made it into a movie it was an interesting short story. http://en.wikipedia.org/wiki/The_Sentinel_(short_story)
  If you'd like to read it, seems it is this PDF, http://econtent.typepad.com/TheSentinel.pdf
  
  --
  https://en.wikipedia.org/wiki/Inverted_totalitarianism
14. Re:The thing's hollow - it goes on forever by PacoSuarez · 2008-07-02 19:02 · Score: 1
  
  Why is this rated "funny"? The novel and the movie were developed at the same time and the novel was published *after* the movie was released.
15. Re:The thing's hollow - it goes on forever by alnapp · 2008-07-02 22:56 · Score: 1
  
  - and - oh my God - it's full of cores!
  Cor blimey
  
  --
  Get the EULA T-shirt
16. Re:The thing's hollow - it goes on forever by Tano · 2008-07-03 00:20 · Score: 1
  
  Yes, but, will it run Crysis ?
17. Re:The thing's hollow - it goes on forever by Impy+the+Impiuos+Imp · 2008-07-03 01:41 · Score: 1
  
  THANK YOU! The first one to figure out what the questioner was talking about.
  "Oh my god, it's full of stars/cores/Natalie Portman's pubic hairs!" is hardly a meme that "happened over the weekend or something."
  Some nerds ain't got no critical thinking ability...
  
  --
  (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
18. Re:The thing's hollow - it goes on forever by muhadeeb · 2008-07-03 12:15 · Score: 1
  
  I want a six pack of CORES!
Impiuos Imp? by InvisblePinkUnicorn · 2008-07-02 08:43 · Score: 0

Impiuos?! Bow down thee to the Gods of Grammar!!
1. Re:Impiuos Imp? by Penguinisto · 2008-07-02 09:08 · Score: 1
  
  Actually, it's "bow thee down..."
  
  Heretic.
  
  You'll burn for this.
  
  I'm calling them now...
  
  I'm getting a ringtone...
  /P
  
  --
  Quo usque tandem abutere, Nimbus, patientia nostra?
2. Re:Impiuos Imp? by cptnapalm · 2008-07-02 09:37 · Score: 1
  
  Perhaps he is Sigismund and above the rules of grammar.
3. Re:Impiuos Imp? by vaz01 · 2008-07-02 14:50 · Score: 1
  
  Invisble?! Bow down... wait, what does that have to do with grammar?
4. Re:Impiuos Imp? by InvisblePinkUnicorn · 2008-07-02 15:32 · Score: 1
  
  If you believe you can see the i, you shall see it, and behold all its glory.
5. Re:Impiuos Imp? by Impy+the+Impiuos+Imp · 2008-07-03 01:53 · Score: 1
  
  Yeah yeah. A decade ago I goofed it up but I had a 50 karma by that point so I kept plugging away.
  This is only the 2nd time anyone's noticed, and it took my first submitted story to get that much attention. My other submission, about the introduction of the new 5-blade razor lo these decades ago, was rejected.
  In any case:
  > Intel Says to Prepare For "Thousand and Twenty Fours of Cores"
  Fixed it for ya!
  
  --
  (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
Not Sure I'm Getting It by gbulmash · 2008-07-02 08:44 · Score: 5, Insightful

I'm no software engineer, but it seems like a lot of the issue in designing for multiple cores is being able to turn large tasks into many independent discrete operations that can be processed in tandem. But it seems that some tasks lend themselves to that compartmentalization and some don't. If you have 1,000 half-gigahertz cores running a 3D simulation, you may be able to get 875 FPS out of Doom X at 1920x1440, but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?

--
Start a happiness pandemic
1. Re:Not Sure I'm Getting It by Delwin · 2008-07-02 08:46 · Score: 5, Informative
  
  Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.
  
  Then you take the tasks that can be broken up over multiple cores (Ray Tracing anyone?) and fill the rest of your cores with that.
2. Re:Not Sure I'm Getting It by Mordok-DestroyerOfWo · 2008-07-02 08:47 · Score: 5, Funny
  
  My friends and I have lots of conversations about girls, how to get girls, how to please girls. However until anything other than idle talk actually happens this goes into the "wouldn't it be nice" category
  
  --
  "Never let your sense of morals prevent you from doing what is right" - Salvor Hardin
3. Re:Not Sure I'm Getting It by FinchWorld · 2008-07-02 08:51 · Score: 0
  
  My friends and I have lots of conversations about girls, how to get girls, how to please girls.
  And as you post on Slashdot (yes, yes, I'm posting too), you fail on all counts?
  
  --
  "I may be full of crap about this game, and I may be wrong, and that's fine." -Jack Thompson
4. Re:Not Sure I'm Getting It by zappepcs · 2008-07-02 08:54 · Score: 3, Interesting
  
  IANACS, but if your program structure changes a bit, you can process the two different styles of instructions in different ways, so that when some sequential group of tasks needs its data, that data is already there, sort of like doing things 6 steps ahead of yourself when possible. I know that makes no sense on the face of it, but at the machine-code level, by parsing instructions this way, you can see that 5 or 6 operations from now you will need register X loaded with byte 121 from location xyz, so while this core plods through the next few instructions, core this.plus.one prefetches the data at memory location xyz into register X... or something like that. That breaks up the serialization of the code. There are other techniques as well, and if the program is written for multicore machines, its machine code can be executed this way without interpretation by the machine/OS.
  There is more than one type of CPU architecture, and the principles of execution vary between them. Same for RISC and CISC. I think it is likely that the smaller the CPU's instruction set, the more likely it is that serialized tasks can be shared out among cores.
  
  --
  Support NYCountryLawyer RIAA vs People
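The comment above describes, loosely, a helper core fetching data ahead of a core busy with sequential work. Each core has its own registers, so in practice a helper would warm a shared cache or buffer rather than load another core's register X. A minimal Python sketch of the software-level version, assuming a background thread that fills a bounded queue while the main thread consumes it; `slow_load` and `lookahead` are hypothetical names for illustration.

```python
import queue
import threading
import time

def slow_load(block_id):
    """Hypothetical stand-in for a slow memory or disk fetch."""
    time.sleep(0.01)
    return list(range(block_id * 4, block_id * 4 + 4))

def prefetcher(block_ids, buf):
    """Helper thread: fetch blocks ahead of the consumer."""
    for b in block_ids:
        buf.put(slow_load(b))  # blocks when the consumer falls behind
    buf.put(None)              # sentinel: no more blocks

def process_all(block_ids, lookahead=4):
    # maxsize bounds how far ahead the helper is allowed to run
    buf = queue.Queue(maxsize=lookahead)
    helper = threading.Thread(target=prefetcher, args=(block_ids, buf))
    helper.start()
    total = 0
    while (block := buf.get()) is not None:
        total += sum(block)    # the "plodding" sequential work
    helper.join()
    return total

print(process_all(range(10)))  # -> 780
```

The sequential summing itself gets no faster; what overlaps is the waiting on fetches, which is the part of the idea that actually pays off.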
5. Re:Not Sure I'm Getting It by CDMA_Demo · 2008-07-02 09:03 · Score: 4, Funny
  
  My friends and I have lots of conversations about girls, how to get girls, how to please girls.
  What, haven't you guys heard of simulation?
6. Re:Not Sure I'm Getting It by zarr · 2008-07-02 09:04 · Score: 2, Informative
  
  How do those get sped up if you're opting for more cores instead of more cycles?
  Algorithms that can't be parallelized will not benefit from a parallel architecture. It's really that simple. :( Also, many algorithms that are parallelizable will not benefit from an "infinite" number of cores. The limited bandwidth for communication between cores will usually become a bottleneck at some point.
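To see how communication cost caps the benefit, here is a toy cost model, made up purely for illustration: a fixed serial part, a parallel part divided across n cores, and a coordination cost that grows with every extra core. The 5% serial fraction and the 0.001-per-core cost are assumptions, not measurements of any real chip.

```python
def toy_speedup(n, serial=0.05, comm_per_core=0.001):
    """Speedup over one core under a made-up cost model:
    time(n) = serial + (1 - serial) / n + comm_per_core * (n - 1)."""
    time_n = serial + (1 - serial) / n + comm_per_core * (n - 1)
    return 1.0 / time_n

# Search core counts 1..1000 for the one with the best speedup.
best = max(range(1, 1001), key=toy_speedup)
print(best, round(toy_speedup(best), 2))  # -> 31 9.04
print(toy_speedup(1000) < 1)              # -> True
```

Under these invented numbers the sweet spot is 31 cores, and 1,000 cores end up slower than a single one because coordination swamps the parallel gain.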
7. Re:Not Sure I'm Getting It by Talennor · 2008-07-02 09:05 · Score: 4, Interesting
  
  While prefetching data can be done using a single core, your post in this context gives me a cool idea.
  Who needs branch prediction when you could just have 2 cores running a thread? Send each one executing instructions without a break in the pipeline and sync the wrong core to the correct one once you know the result. You'd still have to wait for results before any store operations, but you should probably know the branch result by then anyway.
  
  --
  
  //TODO: signature
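What Talennor describes is sometimes called eager (or dual-path) execution. A rough Python sketch, assuming the branch bodies are pure functions so that any "stores" happen only after the condition is known; `eager_if` is a hypothetical helper. In CPython, threads do not run CPU-bound Python code in parallel because of the global interpreter lock, so this shows the control structure, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def eager_if(cond_fn, then_fn, else_fn):
    """Start both branch bodies before the condition is known, then keep
    only the result of the branch actually taken. Branch bodies must be
    side-effect free, so nothing is committed until the condition resolves."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        then_f = pool.submit(then_fn)
        else_f = pool.submit(else_fn)
        cond = pool.submit(cond_fn).result()
        chosen, wasted = (then_f, else_f) if cond else (else_f, then_f)
        wasted.cancel()  # only has an effect if it has not started yet
        return chosen.result()

x = 7
print(eager_if(lambda: x % 2 == 0, lambda: x // 2, lambda: 3 * x + 1))  # -> 22
```

The executor still waits for the losing branch before returning, so its work is paid for anyway; that wasted work is the power cost raised in the replies below.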
8. Re:Not Sure I'm Getting It by ViperOrel · 2008-07-02 09:05 · Score: 3, Insightful
  
  Just a thought, but I would say that 3 billion operations per second should be enough for just about any linear logic you could need solved. Where we run into trouble is in trying to use single processes to solve problems that should be solved in parallel. If having a thousand cores means that we can now run things much more efficiently in parallel, then maybe people will finally start breaking their problems up that way. As long as you can count the cores on one hand, your potential benefit from multithreading your problem is low compared to the effort of debugging. Once you have a lot of cores, the benefit increases significantly. (I see this helping a lot in image processing, pattern recognition, and natural language... not to mention robotics and general AI...)
9. Re:Not Sure I'm Getting It by pla · 2008-07-02 09:10 · Score: 5, Insightful
  
  I'm no software engineer [...] but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?
  
  As a software engineer, I wonder the same thing.
  
  Put simply, the majority of code simply doesn't parallelize well. You can break out a few major portions of it to run as their own threads, but for the most part, programs either sit around and wait for the user, or sit around and wait for hardware resources.
  
  Within that, only those programs that wait for a particular hardware resource (CPU time) even have the potential to benefit from more cores... And while a lot of those might split well into a few threads, most will not scale to more than a handful of cores without a complete rewrite to choose entirely different algorithms, if such algorithms even exist for the intended purpose.
10. Re:Not Sure I'm Getting It by zappepcs · 2008-07-02 09:18 · Score: 3, Interesting
  
  Indeed, and any tasks that are flagged as repeating can be run on a separate core from the cores executing serial instructions, so that IPC lets things that would otherwise happen serially happen concurrently. A simple high-level example is reading the configuration for your process, which may change at any time during the run due to outside influences. Let that reading happen out of band, since it is not part of the sequential string of instructions your code executes. That way the config data is always correct without your serially oriented code needing to stop and check anything other than, say, $window.size=?, whose value is always kept updated by a different core.
  Sorry if that is not a clear explanation. I just mean that since most of what we do is serially oriented, it's difficult to see how, at the microscopic level of the code, it can be broken up into parallel tasks. A 16% decrease in processing time is significant. Building OSes and compilers to optimize for this would improve execution times greatly, just as threading does today. If threads are written correctly to work with multiple cores, it's possible to see significant time improvements there also.
  
  --
  Support NYCountryLawyer RIAA vs People
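A minimal sketch of the "read config out of band" idea, assuming a hypothetical `LiveConfig` object and a watcher thread that applies outside changes. The sequential code never polls for changes; it just reads the latest value when it needs it, with a lock guarding each read and write.

```python
import threading

class LiveConfig:
    """Shared settings that another thread refreshes out of band."""

    def __init__(self, window_size):
        self._lock = threading.Lock()
        self._window_size = window_size

    def set_window_size(self, value):
        with self._lock:
            self._window_size = value

    @property
    def window_size(self):
        with self._lock:
            return self._window_size

def watcher(cfg, updates):
    """Hypothetical watcher: applies outside changes as they arrive."""
    for value in updates:
        cfg.set_window_size(value)

cfg = LiveConfig(window_size=800)
t = threading.Thread(target=watcher, args=(cfg, [1024, 1280]))
t.start()
# ... the main, sequential work reads cfg.window_size whenever it needs it ...
t.join()
print(cfg.window_size)  # -> 1280
```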
11. Re:Not Sure I'm Getting It by mweather · 2008-07-02 09:19 · Score: 4, Insightful
  
  Pleasing a woman is easy. Give her your credit card.
12. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 09:21 · Score: 5, Insightful
  
  That is what most current processors do and use branch prediction for. Even if you have a thousand cores, that's only 10 binary decisions ahead. You need to guess really well very often to keep your cores busy instead of syncing. Also, the further you're executing ahead, the more ultimately useless calculations are made, which is what drives power consumption up in long pipeline cores (which you're essentially proposing).
  In reality parallelism is more likely going to be found by better compilers. Programmers will have to be more specific about the type of loops they want. Do you just need something to be performed on every item in an array or is order important? No more mindless for-loops for not inherently sequential processes.
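The AC's distinction between "do this to every item" and "order matters" fits in a few lines of Python. The data is made up, and `ThreadPoolExecutor.map` stands in for whatever parallel-loop construct a compiler or runtime might provide.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate

prices = [3, 1, 4, 1, 5, 9, 2, 6]

# Order does not matter: each element is handled independently, so the
# work could be split across as many workers as there are elements.
with ThreadPoolExecutor() as pool:
    doubled = list(pool.map(lambda p: p * 2, prices))  # map keeps input order

# Order matters: each step needs the previous result (a loop-carried
# dependency), so this loop cannot simply be farmed out element by element
# without switching to a different algorithm such as a parallel scan.
running_total = list(accumulate(prices))

print(doubled)        # -> [6, 2, 8, 2, 10, 18, 4, 12]
print(running_total)  # -> [3, 4, 8, 9, 14, 23, 25, 31]
```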
13. Re:Not Sure I'm Getting It by sexconker · 2008-07-02 09:21 · Score: 2, Interesting
  
  So instead of a pipeline you have a tree.
  Great, except for the fact that it's incredibly inefficient and the performance gain is negligible.
  Quantum computers will (in theory) allow us to do both at once.
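A quick check of the numbers in this sub-thread, under a simplified model in which every speculative path runs on its own core: at depth d there are 2^d paths, so 1,000 cores can cover every path only 9 levels deep (2^10 = 1024 is just over 1,000, which is where "about 10 decisions" comes from), and all but one of those paths is thrown away.

```python
import math

cores = 1000
depth = math.floor(math.log2(cores))  # deepest level where every path gets a core
paths = 2 ** depth                    # speculative paths in flight
useful_fraction = 1 / paths           # only the path actually taken survives

print(depth, paths, 2 ** (depth + 1))  # -> 9 512 1024
print(f"{useful_fraction:.2%} of the speculative work is kept")
```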
14. Re:Not Sure I'm Getting It by jfim · 2008-07-02 09:26 · Score: 1
  
  Who needs branch prediction when you could just have 2 cores running a thread? Send each one executing instructions without a break in the pipeline and sync the wrong core to the correct one once you know the result. You'd still have to wait for results before any store operations, but you should probably know the branch result by then anyway.
  This is actually how 3D cards used to do branching (not sure about nowadays, though). Basically you would compute both outputs and do a linear interpolation between them. Since you would use either 0 or 1 as the linear interpolation factor, you would end up with either result A or B.
  IIRC this is how the old nVidia register combiner ops worked as well, though I haven't really used them.
  I assume it's horribly inefficient in terms of performance/watt, though I might be wrong.
  
  --
  Jean-Francois Im's blog
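The select-by-interpolation trick jfim describes can be sketched as below, assuming both candidate values have already been computed. GLSL's `mix(x, y, a)` performs the same linear interpolation; this is plain Python for illustration, and `select` is a hypothetical name.

```python
def lerp(a, b, t):
    """Linear interpolation: returns a when t == 0 and b when t == 1."""
    return a * (1 - t) + b * t

def select(cond, if_true, if_false):
    """Pick one of two precomputed values without choosing a code path:
    cond is converted to 0.0 or 1.0 and used as the blend factor."""
    return lerp(if_false, if_true, float(cond))

x = -2.5
print(select(x > 0, x, -x))  # -> 2.5 (an absolute value without an if)
```

One catch: if the unused value is infinite or NaN, multiplying it by 0 yields NaN, so the blend only behaves like a true select for finite inputs.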
15. Re:Not Sure I'm Getting It by jandrese · 2008-07-02 09:28 · Score: 4, Insightful
  
  Process switching overhead is pretty low though, especially if you just have one thread hammering away and most everything else is largely idle. The fundamental limitation of being stuck with 1/1000 of the power of your 1000 core chip because your problem is difficult/impossible to parallelize is a real one.
  
  From a practical standpoint, Intel is right that we need vastly better developer tools and that most things that require ridiculous amounts of compute time can be parallelized if you put some effort into it.
  
  --
  
  I read the internet for the articles.
16. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 09:30 · Score: 0
  
  Well, yes, but, if you are going to do that, then why not also talk about the benefits of parallelization, in case the opportunity does present itself in a more practical and concrete sense?
17. Re:Not Sure I'm Getting It by xtracto · 2008-07-02 09:32 · Score: 1
  
  I've got three words for you.
  Agent Based Programming.
  We will get to the day when each small processor will be controlled by a software agent and will use messaging systems to communicate with other processes.
  Multi-Agent Systems researchers are already "solving" the communication issues (via speech acts, blackboard paradigm, etc see FIPA agents for more).
  As an Agent Based Modelling expert I cannot wait for that!
  
  --
  Ubuntu is an African word meaning 'I can't configure Debian'
18. Re:Not Sure I'm Getting It by 192939495969798999 · 2008-07-02 09:33 · Score: 5, Insightful
  
  I concur, furthermore I'd like to see one core per pixel, that would certainly solve your high-end gaming issues.
  
  --
  stuff |
19. Re:Not Sure I'm Getting It by Intron · 2008-07-02 09:35 · Score: 4, Insightful
  
  I wonder who has the rights to all of the code from Thinking Machines? We are almost to the point where you can have a Connection Machine on your desktop. They did a lot of work on automatically converting code to parallel in the compiler and were quite successful at what they did. Trying to do it manually is the wrong approach. A great deal of CPU time on a modern desktop system is spent on graphics operations, for example. That is all easily parallelized.
  
  --
  Intron: the portion of DNA which expresses nothing useful.
20. Re:Not Sure I'm Getting It by Brian+Gordon · 2008-07-02 09:43 · Score: 3, Informative
  
  Are you crazy? Context switches are the slowdown in multitasking OSes.
21. Re:Not Sure I'm Getting It by mikael_j · 2008-07-02 09:44 · Score: 3, Insightful
  
  Obviously just adding more cores does little to speed up individual sequential processes, but it does help with multitasking, which is what I really think is the "killer app" for multi-core processors.
  Back in the late 90's (it doesn't feel like "back in.." yet but I'm willing to admit that it was about a decade ago) I decided to build a computer with an Abit BP6 motherboard, two Celeron processors and lots of RAM instead of a single higher-end processor, because I wanted to be able to multitask properly. My gamer friends mocked me for choosing Celeron processors, but for the price of a single-processor system I got a system that could run several "normal" apps plus one with heavy CPU usage without slowing down. The extra RAM also helped (I saw lots of people back then go for 128 MB of RAM and a faster CPU instead of "wasting" their money on RAM, and then they cursed their computer for being slow when it started swapping). There was also the upside of having Windows 2000 run as fast on my computer as Windows 98 did on my friends' computers...
  /Mikael
  
  --
  Greylisting is to SMTP as NAT is to IPv4
22. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 09:46 · Score: 1, Funny
  
  Why in holy fucking Jesus hell is this informative?
  
  AC cause I already used mod points elsewhere here.
23. Re:Not Sure I'm Getting It by thermian · 2008-07-02 09:47 · Score: 1
  
  What this means is that those of us who can code are going to have to go back to the textbooks and re-learn our trade. I've been writing serial code for years, with some multi-threading for really intensive tasks.
  When this turns up, the idea of a monolithic program will be all but out of the window. It'll be 'nano software components' or something.
  I predict a sharp rise in the use of higher-level languages like Python. Ever tried multithreaded coding in C?
  Sure, it can be done, but it ain't easy.
  
  --
  A learning experience is one of those things that say, 'You know that thing you just did? Don't do that.' - D. Adams
24. Re:Not Sure I'm Getting It by bonehead · 2008-07-02 09:51 · Score: 1
  
  I don't know if this idea has come up in your conversations or not, but...
  A good first step might be to cut back on the conversations with your buddies, and go out and talk to some girls.
  Just sayin'.......
25. Re:Not Sure I'm Getting It by mmkkbb · 2008-07-02 09:53 · Score: 1
  
  Looks like some combination of Sun Microsystems and Oracle.
  
  --
  -mkb
26. Re:Not Sure I'm Getting It by hey! · 2008-07-02 09:53 · Score: 3, Insightful
  
  Are you crazy? Context switches are the slowdown in multitasking OSes.
  Unfortunately, multitasking OSes are not the slowdown in most tasks, exceptions noted of course.
  
  --
  Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
27. Re:Not Sure I'm Getting It by k8to · 2008-07-02 09:56 · Score: 5, Informative
  
  True but misleading. The major cost of task switching is a hardware-derived one. It's the cost of blowing caches. The swapping of CPU state and such is fairly small by comparison, and the cost of blowing caches is only going up.
  
  --
  -josh
28. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 09:57 · Score: 0
  
  A 16% decrease in processing time is significant.
  It usually isn't. It certainly isn't if you have to keep a second core powered up, doubling your power consumption. A 16% decrease in processing time also isn't worth writing new code for that isn't totally generic. Most programs could be sped up much more by simply using more adequate algorithms, instead of micro-dissecting bad algorithms.
29. Re:Not Sure I'm Getting It by jonbryce · 2008-07-02 09:57 · Score: 4, Insightful
  
  At the moment, I'm looking at Slashdot in Firefox, while listening to an mp3. I'm only using two out of my four cores, and I have 3% CPU usage.
  Maybe when I post this, I might use a third core for a little while, but how many cores can I actually make use of?
  I can break a password-protected Excel file in 30 hours max with this computer, and a 10,000-core chip might reduce this to 43 seconds, but other than that, what difference is it going to make?
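The 43-second figure checks out under two assumptions the comment leaves implicit: the password search splits perfectly across cores, and each core on the 10,000-core chip is as fast as each of today's four.

```python
hours_now = 30
cores_now = 4
cores_future = 10_000

seconds_now = hours_now * 3600           # 108,000 s on today's 4 cores
speedup = cores_future / cores_now       # 2,500x, assuming perfect scaling
seconds_future = seconds_now / speedup   # same per-core speed assumed

print(seconds_future)  # -> 43.2
```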
30. Re:Not Sure I'm Getting It by k8to · 2008-07-02 09:58 · Score: 3, Interesting
  
  Of course, the billion threads design doesn't solve the "how do n cores efficiently share x amount of cache" problem at all.
  
  --
  -josh
31. Re:Not Sure I'm Getting It by Lord+Ender · 2008-07-02 09:59 · Score: 1
  
  Computers run many programs and many processes. If they each had their own core, it doesn't matter that any individual application isn't parallelized.
  
  --
  A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
32. Re:Not Sure I'm Getting It by hummassa · 2008-07-02 10:00 · Score: 1
  
  But even a nonparallelizable algorithm will run alongside the thousands of other processes running on your computer/phone without jeopardizing their performance...
  
  --
  It's better to be the foot on the boot than the face on the pavement. ~~ tkx Kadin2048
33. Re:Not Sure I'm Getting It by jsebrech · 2008-07-02 10:06 · Score: 1
  
  I'm no software engineer, but it seems like a lot of the issue in designing for multiple cores is being able to turn large tasks into many independent discrete operations that can be processed in tandem. But it seems that some tasks lend themselves to that compartmentalization and some don't.
  Think on this: Google's applications run the gamut, containing lots of functionality which we think of as single-threaded. All of Google's apps run on commodity hardware, slower than the machine on your desk.
  My point is this: if programmers are forced to think about how they can deserialize an algorithm, they will invent solutions. In Google's case, they've built a whole parallel empire on top of a map/reduce library.
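For readers who haven't met map/reduce: the map step processes chunks independently (so they can go to different cores or machines), and the reduce step merges the partial results. A toy word count in Python with made-up input; this shows the general pattern only, not Google's actual library or API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_phase(chunk):
    """Map: count the words in one chunk, independently of all others."""
    return Counter(chunk.split())

def reduce_phase(a, b):
    """Reduce: merge two partial counts. Counter addition is associative,
    so merges could be grouped in any order, e.g. as a tree across cores."""
    return a + b

chunks = ["the cores the cores", "it's full of cores", "full of stars"]
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_phase, chunks))
totals = reduce(reduce_phase, partials, Counter())

print(totals["cores"], totals["full"])  # -> 3 2
```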
34. Re:Not Sure I'm Getting It by MrBigInThePants · 2008-07-02 10:07 · Score: 1
  
  This is quite funny. I remember, as a graduate tutor, discussing how I thought multicore/multiprocessor architectures would be the future of the common computer system (not specialist systems). This was a few years before dual cores came out.
  At the time I talked about how GPUs were being used to take load off the CPU for a specific task and how this model could work for many other things, including on a single CPU (e.g. floating point pre-calcs etc.).
  He ridiculed me. He felt it was his right since he lectured in the field.
  I wish I could talk to him again. :)
  Anyways, there are many applications for this, but the biggest is on the server side, where server nodes typically run hundreds of threads+ on 8+ cores and could always do with more grunt in this area.
35. Re:Not Sure I'm Getting It by marnues · 2008-07-02 10:07 · Score: 1
  
  The problem is that (and I assume this is true of many single tech oriented men like myself) there are very few women I care to talk to. There are very few men I want to talk to for that matter. And the set of women I want to get with and the set of women I want to talk to have a very small intersection that I am constantly trying to enlarge to no success. In fact, as I go along, I'm finding that a third set of women, those that I am not able to get with, is increasingly encompassing that intersection...and I'm only 24. Granted, where I live, there is a noticeable age gap from about 18-35...But my point is, the fun women rarely equal the attractive and non-taken women.
36. Re:Not Sure I'm Getting It by Kjella · 2008-07-02 10:09 · Score: 1
  
  what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?
  They.... don't? This is building the new Airbus: bigger planes, if you've got plenty of people to fly (multi-threading). If you want an even faster one-man jet (single-threaded), this won't help you one bit. And I think at this point you can get pretty used to there being a hard limit on a single thread; things are improving a little, but I don't think you'll ever see 30GHz or 300GHz CPUs on this technology. It's time to stop believing we'll have infinite computing power.
  
  --
  Live today, because you never know what tomorrow brings
37. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 10:10 · Score: 0
  
  The majority of code doesn't parallelize well because we don't have the CPUs to run parallel code. If we had had the CPUs and a 4kb parallel BASIC in the 80s we would already know parallel programming in and out by now. We would have as many parallel apps as we have now sequential. In addition to the sequential ones, of course.
  For parallel programming to flower we need one of those Core K CPUs for the masses. 2-4 cores are a suboptimal joke.
  However, I fear for heat and/or latency with so many cores, although parallel apps can easily outperform their sequential clones at low clock speeds if you give them enough cores to run on.
38. Re:Not Sure I'm Getting It by hedwards · 2008-07-02 10:13 · Score: 3, Interesting
  
  That's what I'm curious about. Having 2 cores is enough for most consumers, one for the OS and background tasks and one for the application you're using. And that's overkill for most users.
  Personally, I like to multitask and am really going to love it when we get to the point where I can have the OS on one core and then one core for each of my applications. But even that is probably limited to fewer than 10 cores.
  Certain types of tasks just don't benefit from extra cores, and probably never will. Things which have to be done sequentially are just not going to see any improvement with extra cores. And other things, like compiling software, may or may not see much of an improvement depending upon the design of the source.
  But really, it's mainly things like raytracing and servers with many parallel connections which are the most likely to benefit. And servers are still bound by bandwidth, probably well before they would hit the limit on multiple cores anyway.
39. Re:Not Sure I'm Getting It by Artuir · 2008-07-02 10:13 · Score: 2, Funny
  
  Well, you see.. when posting somewhere like Slashdot that knows nothing about women or girls, anything pertaining to their habits or way of life is insightful and/or informative.
40. Re:Not Sure I'm Getting It by Sparohok · 2008-07-02 10:16 · Score: 1
  
  Fortunately most of the stuff that is "slow, plodding, and sequential" isn't particularly challenging anymore, even for a single core on a modern CPU. The stuff that will bring even the fastest hardware to its knees tends to be inherently parallel. Things like video encoding & decoding, graphics and image processing, numerical and scientific computation, or manipulating vast quantities of data.
  In part, that's because software developers face the same challenges that hardware developers do. One reason that hardware developers are going multicore is that designing a single custom CPU core to use a billion transistors rises to a superhuman level of complexity. From a project management perspective, it just makes more sense to replicate simpler designs.
  Similarly, if you write a million lines of purely sequential, completely unparallelizable code, then execute it on a modern processor it's going to take a few seconds to run. Obviously that's an extreme example, but even in practice, real world problems which are accessible to merely human programmers are either inherently parallel at some level, or sooner or later become trivial to execute on single threaded hardware.
  Martin
41. Re:Not Sure I'm Getting It by rrohbeck · 2008-07-02 10:17 · Score: 2, Informative
  
  Yup. It's Amdahl's law.
  This whole many core hype looks a lot like the Gigahertz craze from a few years ago. Obviously they're afraid that there will be no reason to upgrade. 2 or 4 cores, ok - you often (sometimes?) have that many tasks active. But significantly more will only buy you throughput for games, simulations and similar heavy computations. Unless we (IAACS too) rewrite all of our apps under new paradigms like functional programming (e.g. in Erlang.) Which will only be done if there's a good reason for it.
  
  --
  thegodmovie.com - watch it
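Amdahl's law gives the speedup on n cores when a fraction p of the work can be parallelized: S(n) = 1 / ((1 - p) + p/n). A small Python version; the 90% parallel fraction is an arbitrary example value, not a figure from the thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / cores)

for n in (2, 4, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
# -> 2 1.82
# -> 4 3.08
# -> 1000 9.91
```

Even with 90% of the work parallel, the speedup can never exceed 1 / (1 - 0.9) = 10, however many cores are added.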
42. Re:Not Sure I'm Getting It by Hucko · 2008-07-02 10:23 · Score: 1
  
  It sounds like a job for Plan 9.
  
  --
  Semi-automatic amateur armchair Australian philosopher; conjecture ready at any moment...
43. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 10:25 · Score: 0
  
  A lot of physicists are probably going to like this. In number crunching, it is so annoying (costly) having to buy all that silicon for MMX and extra specialized instructions and features that a lot of people are working out how to utilize graphics card processors. They come with as many as 256 cores per card. All are relatively simple processors, but great for number crunching. Unfortunately, however, memory is limited and bus speeds are surprisingly slow.
44. Re:Not Sure I'm Getting It by cpeterso · 2008-07-02 10:29 · Score: 5, Interesting
  
  Now that 64-bit processors are so common, perhaps operating systems can spare some virtual address space for performance benefits.
  The OPAL operating system was a University of Washington research project from the 1990s. OPAL uses a single address space for all processes. Unlike Windows 3.1, OPAL still has memory protection, and every process (or "protection domain") has its own pages. The benefit of sharing a single address space is that you don't need to flush the cache (because the virtual-to-physical address mappings do not change when you context switch). Also, pointers can be shared between processes because their addresses are globally unique.
  
  --
  cpeterso
45. Re:Not Sure I'm Getting It by frission · 2008-07-02 10:33 · Score: 2, Interesting
  
  Maybe in some language "for" loops will be meant to be processed sequentially, and "for each" loops could be parallelized?
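  Python has no parallel "for each" keyword, but the distinction can be sketched with the standard library's concurrent.futures as a stand-in (function names here are hypothetical). The plain loop promises order; the executor version only promises that every item gets processed, which is what lets the work be spread out. Note that in CPython, threads don't speed up CPU-bound work because of the interpreter lock; ProcessPoolExecutor is the usual swap when real parallelism is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    # Pure function: no shared state, so calls can run in any order.
    return x * x


def sequential(values):
    # Classic "for" loop: runs strictly in order on one thread.
    out = []
    for v in values:
        out.append(square(v))
    return out


def for_each_parallel(values, workers: int = 4):
    # "for each" style: the executor may run calls concurrently and in any order,
    # but Executor.map still hands results back in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    data = list(range(10))
    assert sequential(data) == for_each_parallel(data)
    print(for_each_parallel(data))
```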
46. Re:Not Sure I'm Getting It by painehope · 2008-07-02 10:36 · Score: 5, Funny
  
  They have been simulating it, that's why he said "My friends and I". *shudders*
  
  --
  PC moderators can suck my White pierced, tattooed dick. If you think pride == hate, s/dick/Aryan meat mallet/g.
47. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 10:40 · Score: 0
  
  >Pleasing a woman is easy. Give her your credit card.
  That might work, as a short term expedient, but doesn't pan out in the long run. Trust me, I know whereof I speak.
  If you find yourself in a relationship with a woman that is only happy when you buy her things, or give her the means to do so herself with your funds, RUN AWAY.
  And I imagine the same advice applies if you're a woman and find yourself in a relationship with a man that only values material goods, though I am not equipped to know :)
  Or any other combination of genders, I suppose...
  Captcha: wounded
  Now *that* is uncannily accurate...
48. Re:Not Sure I'm Getting It by LandDolphin · 2008-07-02 10:41 · Score: 5, Insightful
  
  "Having 2 cores is enough for most consumers"
  
  Before, having 1 core was enough, and having 512MB of RAM was enough for most consumers. Computing power grows, and software developers make use of that additional power. However, this will mainly affect the gaming industry.
  
  --
  Spelling and Grammar errors have been added to this post for your enjoyment
49. Re:Not Sure I'm Getting It by Penguinisto · 2008-07-02 10:43 · Score: 1
  
  I do 3D/CG artwork as a hobby. Whenever I do a render, or the polygon count in a scene gets over 300,000 or so, things take longer to render.
  With a decent multi-threaded render engine and enough cores and RAM lying about, I can easily build far more complex scenes without having to set aside hours (or days) on end just to render the thing (esp. concerning animation, which averages at least 24 renders per second of runtime).
  I'm sure that as game engine coders get comfortable with multi-core, the framerates will rise appreciably as well, without a sacrifice in eye-candy or physics. (Indeed - where once you were stuck with a maximum of 150 non-player polys viewable in an old Unreal Tournament map, you can now stretch that out by orders of magnitude... and eventually with multi-core, get some very nice physics and eye-candy out of the deal.)
  /P
  
  --
  Quo usque tandem abutere, Nimbus, patientia nostra?
50. Re:Not Sure I'm Getting It by algae · 2008-07-02 10:47 · Score: 1
  
  Not sure I agree with you there - I would say there was quite a long period of time; let's say 2000 through whenever Intel '86ed their single-core lineup, when OSes *were* able to take advantage of SMP, and most users didn't have it. Maybe people didn't know how much better things would be with a second CPU, but they sure knew they didn't have "the snappy".
  
  --
  Causation can cause correlation
51. Re:Not Sure I'm Getting It by giorgist · 2008-07-02 10:47 · Score: 1
  
  mad ... cause it's maxed out
  
  What will she do with it?
52. Re:Not Sure I'm Getting It by Endo13 · 2008-07-02 10:48 · Score: 1
  
  Yes, but in this case "many" does not usually equal "thousands". At most it equals a few dozen, and most of those use the CPU very little. So what you're left with the majority of the time is one or two processes that need a lot of CPU, and a multitude of cores helps very little unless those big processes can be split between them.
  
  --
  There is no -1 Disagree mod. Slashdot.org/faq defines mod options. USE IT.
53. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 10:49 · Score: 0
  
  Yes, I would assume that for-each does not imply order, but you'll have to look at the specs to be sure. That was just an example though. Unnecessarily using the iterator concept would be another pitfall. Sequential programming languages and APIs are full of places where code idioms imply order unnecessarily. Some can be detected by better compilers, but programmers will really need to be more exact when they want to keep more cores busy.
54. Re:Not Sure I'm Getting It by blahplusplus · 2008-07-02 10:50 · Score: 4, Informative
  
  "Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.
  Then you take the tasks that can be broken up over multiple cores (Ray Tracing anyone?) and fill the rest of your cores with that."
  Unfortunately, all this is going to lead to bus and memory bandwidth contention; you're just shifting the burden from one point to another. Although there is a 'penalty' for task switching, there is an even greater bottleneck at the bus and memory bandwidth level.
  IMHO Intel would have to release a CPU on a card with specialized RAM chips and segment the RAM like GPUs do to get anything out of multicore over the long term; RAM is not keeping up, and the current architecture for PC RAM is awful for multicore. CPU speed is far outstripping bus and memory bandwidth. I am quite dubious of multi-core architecture; there are fundamental limits to the geometry of circuits. I'd be sinking my money into materials research, not gluing cores together and praying CS and math guys come up with solutions that take advantage of it.
  The whole history of engineering and tool use is about taking something extremely complicated, offloading the complexity, and compartmentalizing it so that it's manageable. I see the opposite happening with multi-core.
55. Re:Not Sure I'm Getting It by Maxo-Texas · 2008-07-02 10:52 · Score: 1
  
  "Having 2 cores is enough for most consumers"
  And we only need 640K of RAM.
  
  --
  She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
56. Re:Not Sure I'm Getting It by skulgnome · 2008-07-02 10:57 · Score: 5, Informative
  
  No. I/O is the slowdown in multitasking OSes.
57. Re:Not Sure I'm Getting It by mabhatter654 · 2008-07-02 10:57 · Score: 1
  
  More to the point, why do we care now? It's not like Windows 7 will be multi-core aware... not really. Development houses don't even properly use AMD64 chips, which have been dual-core for longer than Intel has been making multi-core chips... nobody will use it for at least 5 years.
  All they're doing is blatantly trying to FUD and pre-announce "great new features" to keep people from simply low-balling their systems for what they can do NOW and going with AMD/Nvidia solutions NOW that do what people need.
58. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 10:58 · Score: 0
  
  Not trying to belittle your point, but you should be able to brute force an Excel file in under ten minutes, even with a Pentium IV @ 2GHz.
  Google "Password Xtra" for more info. Some dude in the Netherlands.
59. Re:Not Sure I'm Getting It by Gilmoure · 2008-07-02 10:59 · Score: 2, Funny
  
  I think I once figured out that, starting with 3 billion women on the planet, there were about 5 with mutual attraction with me. I think I've found two of them.
  
  --
  I drank what? -- Socrates
60. Re:Not Sure I'm Getting It by poot_rootbeer · 2008-07-02 11:00 · Score: 1
  
  A great deal of CPU time on a modern desktop system is spent on graphics operations, for example. That is all easily parallelized.
  If true, then maybe the CPU makers should consider some sort of MultiMedia eXtension for their instruction sets, so that graphics rendering and similarly parallelizable operations are routed to a separate bit of silicon with registers and operators specifically designed for parallelism...
61. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 11:01 · Score: 0
  
  Wider adoption of a powerful parallel programming language would really help. Not just a version of C with parallel-programming syntax strapped on, but a full-blown language intended to effectively replace C when it comes to hardware-level programming with multiple CPUs.
  Compiler logic can only go so far; a language with specific constructs to say things like "I really don't care how this loop is executed" would be incredibly helpful. Students could be taught how and when both classical and new structures should be used, and it would be up to them to correctly decide when and where to use each.
  I know there are languages designed to easily allow parallel programming, but I haven't seen any that allow the sort of control C allows. Does such a language exist?
62. Re:Not Sure I'm Getting It by ceswiedler · 2008-07-02 11:07 · Score: 3, Insightful
  
  Uh, last time I checked, Python had a single global interpreter lock (GIL) per process, which made it unsuitable for heavily multithreaded programs. Java would be a better example of a scalable and multithread-aware language.
63. Re:Not Sure I'm Getting It by kv9 · 2008-07-02 11:08 · Score: 5, Funny
  
  I can break a password protected Excel file in 30 hours max with this computer, and a 10000 core chip might reduce this to 43 seconds, but other than that, what difference is it going to make?
  29 hours 59 minutes 17 seconds?
  
  --
  Stop Computers/Cars Analogies on S
64. Re:Not Sure I'm Getting It by Nefarious+Wheel · 2008-07-02 11:08 · Score: 1
  
  But what happens when you have more than one core per thread? Do you just let the unused cores lie idle, or do you further decompose the thread and try to predictively parallelise it?
  Removing context switching logic -- hmmm... that will take a while to register...
  
  --
  Do not mock my vision of impractical footwear
65. Re:Not Sure I'm Getting It by joto · 2008-07-02 11:21 · Score: 1
  
  The problem is that (and I assume this is true of many single, tech-oriented men like myself) there are very few women I care to talk to.
  Think of it as you do of work, you don't have to enjoy your work, you do it for the money. Similarly, you don't have to enjoy the conversation, you do it because it increases the chances of sex.
66. Re:Not Sure I'm Getting It by riceboy50 · 2008-07-02 11:23 · Score: 1
  
  keep a second core powered up, doubling your power consumption
  Are you sure a second core requires a 100% increase in power consumption? It seems like they probably share a lot of circuitry and thus powering up a second core takes less than the first core.
  
  I wonder how it compares to CPUs that scale their clock-rate up and down depending on processing requirements?
  
  --
  ~ I am logged on, therefore I am.
67. Re:Not Sure I'm Getting It by xsadar · 2008-07-02 11:26 · Score: 1
  
  They did a lot of work on automatically converting code to parallel in the compiler and were quite successful at what they did. Trying to do it manually is the wrong approach.
  This works great for some easily parallelized algorithms (such as graphics), and for those algorithms doing it manually may be the wrong approach. However, too many algorithms cannot gain much from current automagic parallelization methods; you really have to do it manually. A good example of this is minimax (alpha-beta, PVS, etc.) game tree searching, due to its recursive nature.
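  To see why alpha-beta resists naive parallelization, here is a minimal Python sketch over a toy tree (nested lists, leaves are scores; the tree and names are illustrative). Whether a later branch can be skipped depends on values already found in earlier branches, so farming every child out to its own core throws away exactly the cut-offs that make the search fast.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax with alpha-beta pruning over a nested-list game tree.

    Leaves are numbers; internal nodes are lists of children. `visited`
    collects the leaves actually evaluated, to show what got pruned.
    """
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cut-off: only possible because earlier siblings were searched first
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best


if __name__ == "__main__":
    tree = [[3, 5], [2, 9], [0, 1]]
    seen = []
    # Value is 3; the leaves 9 and 1 are never evaluated.
    print(alphabeta(tree, visited=seen), seen)
```

  Parallel game-tree searchers (PVS variants and the like) typically search the first child sequentially to establish bounds before splitting the rest, which is the "do it manually" work described above.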
  
  --
  The only thing I know is that I don't know anything; and I'm not even sure about that.
68. Re:Not Sure I'm Getting It by joto · 2008-07-02 11:34 · Score: 2, Insightful
  
  In reality parallelism is more likely going to be found by better compilers. Programmers will have to be more specific about the type of loops they want. Do you just need something to be performed on every item in an array or is order important? No more mindless for-loops for not inherently sequential processes.
  I disagree. Having the compiler analyze loops to find out whether they are trivially parallelizable is easy; there's little need to change the language.
  On the other hand, a language that was really designed for kilocores or megacores would be radically different from most modern languages, adding a few extra (un)loop-statements wouldn't do. Functional languages are a good bet. When everything is side-effect-free, there's no good reason why all of it can't be executed in parallel.
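  The distinction the compiler is looking for is between loops whose iterations are independent and loops with a loop-carried dependency. A small Python illustration (function names are hypothetical; threads stand in for cores, and in CPython the GIL means this shows the structure rather than a real speedup):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def scale(x: float) -> float:
    # Side-effect-free: the result depends only on x.
    return 2.0 * x + 1.0


def independent_loop(xs):
    # Each iteration is independent, so the work can be split across workers.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(scale, xs))


def running_total(xs):
    # Loop-carried dependency: iteration i needs the result of iteration i - 1,
    # so the iterations cannot simply be handed to different cores.
    total = 0.0
    out = []
    for x in xs:
        total += x
        out.append(total)
    return out


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    print(independent_loop(xs))
    print(running_total(xs))
    # The running total can still be parallelized, but only by switching to a
    # different algorithm (a parallel prefix scan), not by splitting this loop.
    assert running_total(xs) == list(accumulate(xs))
```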
  But maybe we need even more abstraction. And more time. It took quite a while after the invention of the programmable computer for someone to invent FORTRAN. And we still program in something resembling FORTRAN. Maybe what we really need are actual many-core computers so that someone really smart will use them, and finally figure out a way to program them that's practical. That's where I'll put my money. Wait and see!
69. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 11:38 · Score: 1, Informative
  
  As for the CM-1 and CM-2 machines, the parallelism you saw tended to be pure SIMD-level (what TMC promoted as 'data level') parallelism. This probably changed in the CM-5, a machine I know less about...
  Pure SIMD-level parallelism is /somewhat/ akin to what you see in GPGPU-style projects, although even those (in certain GPU programming idioms) tend to 'see' things as collections of threads, as opposed to large arrays of ALU/CPU sets working in lock-step on an array.
  I would suggest that TMC had quite a bit going for it in terms of novel thinking about how one might use such a beast as the CM[125]. This is the problem in front of Intel in taming so many cores: how do you think about these problems and match this model to the problem(s) at hand? Clearly, at very large N cores, the idiom will change, because you can't just 're-write it in C with threads' or use some other larger-granularity adaptation. This is the prize: taming widespread parallelism, for systems programming (at the OS level) and for compute-intensive applications (it REALLY SUCKS to do MPI in Fortran... things have to change).
  There are some very significant and very interesting changes afoot over the next 10 years in computer science due to the spread of parallelism. You used to need exotic machines for this. Now it will be in everything from laptops to supercomputers...
  Very neat.
70. Re:Not Sure I'm Getting It by curunir · 2008-07-02 11:46 · Score: 4, Insightful
  
  ...but other than that, what difference is it going to make?
  This is, IMHO, the wrong question to be asking. Asking how current tasks will be optimized to take advantage of future hardware makes the fundamentally flawed assumption that the current tasks will be what's considered important once we have this kind of hardware.
  But the history of computers has shown that the "if you build it, they will come" philosophy applies to the tasks that people end up wanting to accomplish. It's been seen time and again that new uses for computers wait until we've hit a certain performance threshold, whether it's CPU, memory, bandwidth, disk space, video resolution or whatever, and then become the things we need our computers to do.
  Take, for instance, the huge success of mp3s. There was a time not so long ago when people were limited to playing music off a physical CD. This wasn't because there was no desire amongst computer users to listen to digital files that could be stored locally or streamed off the internet. It was because computer users did not know yet that they had the desire to do it. But technology advanced to the point where a) processors became fast enough to decode mp3s in real time without using the whole CPU and b) hard drives grew to the point where we had the capacity to store files that are 10% of the size of the files on the CD.
  Similarly, it's likely that when we reach the point where we have hundreds or thousands of cores, new tasks will emerge that take advantage of the new capabilities of the hardware. It may be that those tasks are limited in some other way by one of the other components we use or by the as yet non-existent status of some new component, but it's only important that multiple cores play a part in enabling the new task.
  In the near term, you can imagine a whole host of applications that would become possible when you get to the point where the average computer can do real-time H.264 encoding without affecting overall system performance. I won't guess at what might be popular further down the road, but there will be people who will think of something to do with those extra cores. And, in hindsight, we'll see the proliferation of cores as enabling our current computer-using behavior.
  
  --
  "Don't blame me, I voted for Kodos!"
71. Re:Not Sure I'm Getting It by geekoid · 2008-07-02 11:53 · Score: 4, Insightful
  
  Why wouldn't each core have its own cache? It only needs to cache what it needs for its job.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
72. Re:Not Sure I'm Getting It by geekoid · 2008-07-02 12:00 · Score: 1
  
  Please bear in mind that none of the tools used to build those products, nor the software architectures, have been designed to take advantage of a lot of cores.
  Add to that the fact that your operating system wasn't built/compiled/architected to use a lot of cores.
  Some rough ideas:
  Accessing databases would change, your services can be run independently, large number crunching can be done faster.
  You could see the return of central processing in a big way. Give people a terminal and assign them 4 cores, but still let them store the apps locally. A huge benefit to business.
  I would type about the benefits of a clockless multicore system, but every time I do my boner knocks my keyboard away.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
73. Re:Not Sure I'm Getting It by Tubal-Cain · 2008-07-02 12:01 · Score: 1
  
  This guy would also benefit greatly.
74. Re:Not Sure I'm Getting It by geekoid · 2008-07-02 12:05 · Score: 0, Redundant
  
  Many-core designs could allow for slower clock speeds, cooler chips and quieter computers.
  Of course, an OS could be designed so that different modular components run on different cores.
  Please remember even though they have been around for a while, the tools for multicores still aren't mature.
  A horse is enough for anybody, and that's overkill for most people.
  Expect to see 'core clusters' and 'core clouds' to handle problems that 'won't see any improvement'. These will be abstractions of cores into behaving like one fast core.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
75. Re:Not Sure I'm Getting It by geekoid · 2008-07-02 12:07 · Score: 2, Insightful
  
  "Unfortunately all this is going to lead to bus and memory bandwidth contention, "
  Good. Current bus needs to be redone.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
76. Re:Not Sure I'm Getting It by Cynic.AU · 2008-07-02 12:12 · Score: 2, Interesting
  
  Holy crap. I just realised what you were saying -- use parallelism, vast parallelism, for BRANCH PREDICTION.
  That's not really how concurrency works at the moment :) It's at a much higher level, explicit in the code itself. Take matrix multiplication, for instance: it's easy to see how that can be split up into multiple threads.
  But calculating every possible state 'n' steps into the future with 2^n CPU cores? That sounds like a good idea, sir! :) And it's not mutually exclusive with explicit multithreading either (although each concurrent thread blows out the total number of states).
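  The matrix-multiplication case is the classic data-parallel example: each row of the product depends only on one row of A and all of B, so blocks of rows can be handed to different workers with no coordination. A minimal pure-Python sketch (names are hypothetical; threads stand in for cores):

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a, b, rows):
    # Compute the given rows of a x b; each row needs one row of a and all of b.
    cols = range(len(b[0]))
    inner = range(len(b))
    return [[sum(a[i][k] * b[k][j] for k in inner) for j in cols] for i in rows]


def parallel_matmul(a, b, workers=4):
    n = len(a)
    # Split row indices into roughly equal contiguous blocks, one per worker.
    step = max(1, -(-n // workers))  # ceiling division
    blocks = [range(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: matmul_rows(a, b, r), blocks))
    # Executor.map yields results in block order, so concatenating restores row order.
    return [row for part in parts for row in part]


if __name__ == "__main__":
    print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```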
77. Re:Not Sure I'm Getting It by geekoid · 2008-07-02 12:15 · Score: 2, Insightful
  
  Except that while running an algorithm on 1 core, you can have 900 cores computing different outputs based on the probability of a different outcome of the previous part of the process.
  When the outcome is actually determined, kill the 899 that were incorrect. In fact, what would probably happen is they would all branch differently, so you might kill 400, then after running for a bit, 200, and so on. This would exponentially decrease the time it takes to solve it.
  In fact, for some applications getting 'close enough' will do.
  Example:
  Chess: I move a pawn on the first move. 20 processes start up on separate cores, one for each possible reply, each calculating the next 5 possible moves. When the next move is made, the processes that weren't working from that move are killed.
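  What's being described is speculative execution: start work for every plausible outcome, then keep the one that actually happens. A minimal Python sketch, assuming a hypothetical `follow_up_work` function and a caller-supplied `resolve_actual` callable. One caveat: Python threads cannot be killed, so `Future.cancel()` only drops tasks that haven't started yet, and the speculative work that did run is simply wasted (the same trade-off CPUs make when they speculate past a branch).

```python
from concurrent.futures import ThreadPoolExecutor


def follow_up_work(outcome: int) -> int:
    # Hypothetical expensive computation that depends on an earlier outcome.
    return sum(i * outcome for i in range(10_000))


def speculate(possible_outcomes, resolve_actual, workers=8):
    """Start work for every possible outcome, then keep only the one that happens.

    resolve_actual: callable returning the real outcome once it is known.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {o: pool.submit(follow_up_work, o) for o in possible_outcomes}
        actual = resolve_actual()  # runs while the speculative work proceeds
        for outcome, fut in futures.items():
            if outcome != actual:
                fut.cancel()  # only stops tasks that have not started yet
        if actual in futures:
            return futures[actual].result()
    # Mis-speculation: none of the guesses matched, so do the work now.
    return follow_up_work(actual)


if __name__ == "__main__":
    print(speculate([1, 2, 3], lambda: 2))
```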
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
78. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 12:20 · Score: 0
  
  Automatic loop parallelization has limits. A loop in a library function may imply an order which is not evident in internal dependencies. External code may or may not rely on the sequential behavior. This kind of issue cannot be resolved at compile time. Another problem is when the dependency which sequentializes the loop does not actually exist. Sequential output typically forces order, but the programmer may not actually care about the order in which the results are written. This makes automatic loop parallelization a dangerous optimization if it's effective, and not as effective as it could be when it's safe.
  There may be other programming paradigms which make some automatic parallelization simpler, but if we're going to change programming languages in order to better utilize massively parallel processors, then we might as well look at languages where parallelism isn't just a compiler optimization. Anyhow, ultimately it's the algorithm which makes or breaks parallelism.
79. Re:Not Sure I'm Getting It by geekoid · 2008-07-02 12:22 · Score: 1
  
  Your engineering experience is crippling you.
  I am sure you are an excellent engineer, but this changes a lot of things. The way things are 'parallelized' today will bear no resemblance to how it will be done with thousands of cores.
  You will have cores running further events based on the probability of certain outcomes. When the actual event is determined, the process that has been working from that event since it was only a probable event will have done a bunch of work, so it can skip to the end of the event (because it is done) and continue from there.
  Tools, thinking and new ways to use this will be a huge change.
  Now talk about servers and centrally controlled processes for employees, and you will have large cost savings.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
80. Re:Not Sure I'm Getting It by kramerd · 2008-07-02 12:23 · Score: 3, Informative
  
  Girls like it when you buy them things. Or when you pretend to listen. And when you shower.
81. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 12:27 · Score: 0
  
  Put simply, the majority of code simply doesn't parallelize well.
  This is true. Some process, say "x", is serial in nature. No parallelism.
  The win with multicore systems is that you can process many x's. The real world, modeled in data, is generally not sequential in nature so the vast majority of the time what we actually want to process is thousands or millions of x's.
  This is true for communications (umpteen million ciphered connections), simulations (adjacent cells), media encoding (delta frames), etc. Many use cases.
  Don't bet against parallelism. You'll lose.
82. Re:Not Sure I'm Getting It by geekoid · 2008-07-02 12:27 · Score: 1
  
  That assumes optimal choices. In practice most choices will be sub-optimal, so there is actual speed-up to be had.
  Also, it doesn't take cache effects into account with parallel processing. In fact, superlinear speed-up kind of blows that rule out of the water for this scenario.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
83. Re:Not Sure I'm Getting It by Trespass · 2008-07-02 12:28 · Score: 1
  
  I do 3D/CG artwork as a hobby. Whenever I do a render, or the polygon count in a scene gets over 300,000 or so, things take longer to render.
  With a decent multi-threaded render engine and enough cores and RAM lying about, I can easily build far more complex scenes without having to set aside hours (or days) on end just to render the thing (esp. concerning animation, which averages at least 24 renders per second of runtime).
  I'm sure that as game engine coders get comfortable with multi-core, the framerates will rise appreciably as well, without a sacrifice in eye-candy or physics. (Indeed - where once you were stuck with a maximum of 150 non-player polys viewable in an old Unreal Tournament map, you can now stretch that out by orders of magnitude... and eventually with multi-core, get some very nice physics and eye-candy out of the deal.)
  /P
  I think it's more likely that in the next two years we'll see additional cores being given to different tasks than rendering. The Crytek engine does something interesting in that at least some character animation is procedural, which is to say something like a walk cycle is calculated in terms of engine physics rather than keyframing. Nice looking, but computationally expensive. Procedural animation and other physics calculations are prime candidates to offload onto another core. More sophisticated particle effects and even creature AI would be well-served by this as well.
  I think in the medium term, getting comfortable with multicore CPUs will involve dividing currently monolithic tasks (like rendering and physics) into a series of smaller tasks running in parallel within the game engine. I'll be curious to see where it goes.
84. Re:Not Sure I'm Getting It by kesuki · 2008-07-02 12:35 · Score: 5, Interesting
  
  Yes, but if you have 1000 cores each with 64KB of cache, then you start to run into problems with memory throughput when computing massively parallel data.
  Memory throughput has been the Achilles' heel of graphics processing for years now. And as we all know, splitting up a screen into smaller segments is simple, so GPUs went massively parallel long before CPUs. In fact, you will soon be able to get over 1000 stream processing units in a single desktop graphics card.
  So the real problem is memory technology: how can a single memory module consistently feed 1000 cores, for instance if you want to do real-time n-pass encoding of an HD video stream... while playing an FPS online, and running IM software, and a strong anti-virus suite...
  I have a horrible, horrible, ugly feeling that you'll never be able to get a system that can reliably do all that at the same time, just because they'll skimp on memory tech or interconnects, so most of the capabilities of a 1,000-core system will be wasted.
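  This worry can be put in numbers with a roofline-style bound (a standard performance model, not something from the article): sustained throughput is the smaller of what the cores can compute and what memory can feed them. All figures below are made up for illustration, not measurements of any real chip.

```python
def attainable_gflops(cores: int, gflops_per_core: float,
                      mem_bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline-style upper bound on sustained throughput.

    Throughput is limited either by raw compute (cores * per-core peak) or by
    how fast memory can deliver data (bandwidth * arithmetic intensity).
    """
    compute_bound = cores * gflops_per_core
    memory_bound = mem_bandwidth_gbs * flops_per_byte
    return min(compute_bound, memory_bound)


if __name__ == "__main__":
    # Hypothetical chip: 1 GFLOP/s per core, 20 GB/s of memory bandwidth,
    # running a streaming kernel that does 0.5 floating-point ops per byte loaded.
    for cores in (4, 16, 1000):
        print(cores, attainable_gflops(cores, 1.0, 20.0, 0.5))
```

  Under these assumptions the memory bound (10 GFLOP/s) takes over at about 10 cores, so a 1000-core part would sit roughly 99% idle on this kind of workload unless bandwidth or data reuse improves.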
  
  --
  https://www.gnu.org/philosophy/free-sw.html
85. Re:Not Sure I'm Getting It by kesuki · 2008-07-02 12:46 · Score: 3, Informative
  
  "Take, for instance, the huge success of mp3's. There was a time not so long ago when people were limited to playing music off a physical CD. This wasn't because there was no desire amongst computer users to listen to digital files that could be stored locally or streamed off the internet. It was because computer users did not know yet that they had the desire to do it. But technology advanced to the point where a) processors became fast enough to decode mp3's in real time without using the whole CPU"
  I started making mp3s with a 486 DX 75MHz.
  I could decode in real time on a 486 DX 75; as I recall, encoding took a bit of time, and I only had a 3 GB HDD that had been an upgrade to the system...
  MP3 uses an asymmetric encoding algorithm: more CPU to encode than to decode. If your MP3 player doesn't run correctly on a 486, then it's because they designed in features not strictly needed to decode an MP3 stream.
  Oh hey, I have an RCA Lyra MP3 player that isn't even as fast as a 486, but the decoder was designed for MP3 decoding.
  Ogg decoding needs a beefier decoder; that's half the problem with getting Ogg support into devices not made for decoding video streams.
  
  --
  https://www.gnu.org/philosophy/free-sw.html
86. Re:Not Sure I'm Getting It by dbIII · 2008-07-02 12:50 · Score: 1
  
  Good point, but there are still a lot of things that are trivially done in parallel such as per frame operations in video or even manipulation of a single image. 3D games could also run a lot of threads and I'm hoping things do go the way of a lot of multiple cores in consumer gear - makes it easier for those of us in numerical processing.
87. Re:Not Sure I'm Getting It by Hadlock · 2008-07-02 12:50 · Score: 1
  
  That's what I'm curious about. Having 2 cores is enough for most consumers, one for the OS and background tasks and one for the application you're using. And that's overkill for most users.
  
  I typically run Firefox, BitTorrent, and then Folding@home + World Community Grid at the same time. I close none of these to play TF2 and see no difference in FPS with the CPU-hogging programs on or off. My friend was debating between a quad-core AMD and a dual-core Intel, and finally bought the dual core after realizing OS technology and the software that runs on it hasn't progressed enough to even take advantage of two cores, let alone four or more. I think there are some particle effects that are done in TF2 now with dual-core optimization, but for the most part dual core is enough for even heavy users.
  
  --
  moox. for a new generation.
88. Re:Not Sure I'm Getting It by rtechie · 2008-07-02 12:51 · Score: 1
  
  This wasn't because there was no desire amongst computer users to listen to digital files that could be stored locally or streamed off the internet. It was because computer users did not know yet that they had the desire to do it.
  This is a contradiction. "Not knowing you have the desire" is the same as NOT HAVING the desire, unless you want to make some bizarre argument that computer users have subconscious precognition.
  The REAL innovation of MP3 was a codec that compressed audio to a large degree (80%+) without a substantial loss of quality. Before that, "digital music" was unpopular because WAV files were too large to share using the link speeds at the time (9600 baud) and other codecs sounded like ass. Yes, MP3 used a relatively high amount of CPU time, but it was not faster CPUs per se that fostered MP3 adoption.
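  The arithmetic behind both the "80%+" and the "10% of the size" figures is easy to check. A small Python sketch, assuming a 128 kbit/s MP3 (a common bitrate, chosen here for illustration), a hypothetical four-minute track, and treating 9600 baud as 9600 bit/s with no protocol overhead:

```python
CD_SAMPLE_RATE = 44_100   # samples per second per channel
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2


def cd_bitrate() -> int:
    """Uncompressed CD audio (WAV/PCM) bitrate in bits per second."""
    return CD_SAMPLE_RATE * CD_BITS_PER_SAMPLE * CD_CHANNELS


def transfer_seconds(bitrate_bps: float, duration_s: float, link_bps: float) -> float:
    """Time to move `duration_s` seconds of audio over a link, ignoring overhead."""
    return bitrate_bps * duration_s / link_bps


if __name__ == "__main__":
    mp3 = 128_000                         # assumed MP3 bitrate in bit/s
    ratio = mp3 / cd_bitrate()
    print(f"MP3 size vs CD: {ratio:.1%}")  # about 9%, i.e. a ~91% reduction
    song = 4 * 60                         # hypothetical four-minute track
    for name, rate in (("WAV", cd_bitrate()), ("MP3", mp3)):
        hours = transfer_seconds(rate, song, 9_600) / 3600
        print(f"{name}: {hours:.1f} h at 9600 bit/s")
```

  Under those assumptions the WAV takes close to ten hours over the link and the MP3 under one, which is the gap that made sharing practical.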
89. Re:Not Sure I'm Getting It by HeroreV · 2008-07-02 12:59 · Score: 1
  
  You're thinking too small. What we really need is a processing core for every photon.
90. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 13:02 · Score: 0
  
  The CM-1 and CM-2 systems, which were the massively parallel machines, were programmed in *LISP http://en.wikipedia.org/wiki/*Lisp, a variant of LISP with parallel data constructs. The front end was a Symbolics LISP machine, and this is where the sequential code executed http://en.wikipedia.org/wiki/Symbolics.
  The Symbolics OS was called Genera, and was also written in LISP. If you had the right license, you got the OS source code, and you could browse the OS when you were debugging your code.
  It is very unclear who owns what part of the code base between Symbolics and Thinking Machines.
91. Re:Not Sure I'm Getting It by kesuki · 2008-07-02 13:04 · Score: 1
  
  "the majority of code simply doesn't parallelize well."
  but the first thing that will get ported is that nasty website that strobes your screen and keeps you from doing anything with the computer through JavaScript/whatever.
  just imagine how fast those nasty sites can strobe the screen with 1000 cores!
  
  --
  https://www.gnu.org/philosophy/free-sw.html
92. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 13:19 · Score: 0
  
  What you are describing has been in CPUs for years.
  All processors now are superscalar with multiple stages of pipelines.
  See http://en.wikipedia.org/wiki/Superscalar
93. Re:Not Sure I'm Getting It by ClosedSource · 2008-07-02 13:21 · Score: 1
  
  The problem in AI isn't speed, it's the "I".
94. Re:Not Sure I'm Getting It by Salamander · 2008-07-02 13:23 · Score: 5, Informative
  
  Because each core is no longer task switching. Once you have more cores than tasks you can remove all the context switching logic and optimize the cores to run single processes as fast as possible.
  OK, so now the piece that's running on each core runs really really fast . . . until it needs to wait for or communicate with the piece running on some other core. If you can do your piece in ten instructions but you have to wait 1000 for the next input to come in, whether it's because your neighbor is slow or because the pipe between you is, then you'll be sitting and spinning 99% of the time. Unfortunately, the set of programs that decompose nicely into arbitrarily many pieces that each take the same time (for any input) doesn't extend all that far beyond graphics and a few kinds of simulation. Many, many more programs hardly decompose at all, or still have severe imbalances and bottlenecks, so the "slow neighbor" problem is very real.
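  A quick way to see the "sitting and spinning" figure: a minimal Python sketch, assuming one cycle per instruction and a fixed stall before every batch of work (both simplifications; the post's numbers are just illustrative). `busy_fraction` is a hypothetical helper, not from any library.

```python
def busy_fraction(work_cycles: int, wait_cycles: int) -> float:
    """Share of time a core does useful work if each batch of work
    is preceded by a fixed stall waiting for its input."""
    return work_cycles / (work_cycles + wait_cycles)


# The post's example: 10 instructions of work, 1000 cycles of waiting.
frac = busy_fraction(10, 1000)
print(f"busy {frac:.1%}, idle {1 - frac:.1%}")  # busy 1.0%, idle 99.0%
```

  Making the core itself faster only shrinks the 10, not the 1000, which is the post's point about slow neighbors and slow pipes.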
  Many people's answer to the "slow pipe" problem, on the other hand, is to do away with the pipes altogether and have the cores communicate via shared memory. Well, guess what? The industry has already been there and done that. Multiple processing units sharing a single memory space used to be called SMP, and it was implemented with multiple physical processors on separate boards. Now it's all on one die, but the fundamental problem remains the same. Cache-line thrashing and memory-bandwidth contention are already rearing their ugly heads again even at N=4. They'll become totally unmanageable somewhere around N=64, just like the old days and for the same reasons. People who lived through the last round learned from the experience, which is why all of the biggest systems nowadays are massively parallel non-shared-memory cluster architectures.
  If you want to harness the power of 1000 processors, you have to keep them from killing each other, and they'll kill each other without even meaning to if they're all tossed in one big pool. Giving each processor (or at least each small group of processors) its own memory with its own path to it, and fast but explicit communication with its neighbors, has so far worked a lot better except in a very few specialized and constrained cases. Then you need multi-processing on the nodes, to deal with the processing imbalances. Whether the nodes are connected via InfiniBand or an integrated interconnect or a common die, the architectural principles are likely to remain the same.
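  The "own memory plus explicit communication" style can be sketched in a few lines of Python. This only models the discipline, under stated assumptions: the hypothetical `chain_sum` gives each worker a private chunk and a queue to its neighbor, and no worker touches a shared counter. Python threads still share one address space (and the GIL), so this shows the structure, not the performance.

```python
import queue
import threading


def chain_sum(chunks):
    """Sum all numbers in `chunks`, one worker per chunk.

    Each worker reads only its own chunk (its "local memory") and passes
    a running total to its right-hand neighbor through an explicit queue.
    """
    n = len(chunks)
    # links[i] carries the running total from worker i-1 into worker i.
    links = [queue.Queue() for _ in range(n + 1)]

    def worker(i):
        local = sum(chunks[i])        # purely local computation
        running = links[i].get()      # wait for the neighbor's partial sum
        links[i + 1].put(running + local)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    links[0].put(0)                   # seed the chain
    for t in threads:
        t.join()
    return links[n].get()


print(chain_sum([[1, 2], [3, 4], [5]]))  # 15
```

  Only one small message crosses each link, which is the kind of traffic pattern the post argues holds up better than a thousand cores contending for one shared pool.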
  Disclosure: I work for a company that makes the sort of systems I've just described (at the "integrated interconnect" design point). I don't say what I do because I work there; I work there because of what I believe.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
95. Re:Not Sure I'm Getting It by ClosedSource · 2008-07-02 13:28 · Score: 1
  
  "The real world, modeled in data, is generally not sequential in nature..."
  The problem is that many functions people perform on computers have nothing to do with the real world.
96. Re:Not Sure I'm Getting It by Lehk228 · 2008-07-02 13:30 · Score: 1
  
  I don't know, why DOESN'T Intel give each core its own cache?
  
  You'll have to ask them. AMD does, and in my experience it makes two or more intensive threads run much better.
  
  --
  Snowden and Manning are heroes.
97. Re:Not Sure I'm Getting It by MightyYar · 2008-07-02 13:34 · Score: 1
  
  I think you are right. My first dual-core machine was in 2004, and it really opened my eyes. My wife's single-core machine is technically faster, but it feels much slower.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
98. Re:Not Sure I'm Getting It by Lehk228 · 2008-07-02 13:36 · Score: 1
  
  That is EASY for the compiler to look for: code that only looks at the same value of somearray[iteratedvalue+C] can be done simultaneously. If the loop contains more than one value of C in the brackets, it cannot be (well, actually it can, but you need to make an additional copy of the array values for every value of C the loop contains).
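  A small Python illustration of that rule, using the hypothetical loop `a[i] = a[i] + a[i + 1]`, which touches two offsets (C = 0 and C = 1). The functions below just replay the iterations in a chosen order on one core, standing in for parallel workers that could finish in any order; they are not a compiler analysis.

```python
def sequential(a):
    a = list(a)
    for i in range(len(a) - 1):
        # Reads a[i + 1] before iteration i + 1 overwrites it.
        a[i] = a[i] + a[i + 1]
    return a


def any_order_in_place(a, order):
    """Same iterations, arbitrary order, updating the array in place."""
    a = list(a)
    for i in order:
        a[i] = a[i] + a[i + 1]
    return a


def any_order_with_copy(a, order):
    """Same iterations, arbitrary order, but reading from a snapshot."""
    old = list(a)  # the extra copy of the array values
    a = list(a)
    for i in order:
        a[i] = old[i] + old[i + 1]
    return a


data = [1, 2, 3, 4]
backwards = [2, 1, 0]
print(sequential(data))                      # [3, 5, 7, 4]
print(any_order_in_place(data, backwards))   # [10, 9, 7, 4]  (wrong)
print(any_order_with_copy(data, backwards))  # [3, 5, 7, 4]
```

  With the snapshot, every iteration is independent, so the order (or the core) it runs on no longer matters.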
  
  --
  Snowden and Manning are heroes.
99. Re:Not Sure I'm Getting It by MightyYar · 2008-07-02 13:45 · Score: 1
  
  Yeah, I'm not sure where he's coming from either... for me, I was always loading sound files onto my computer. First it was MODs and short sound clips of the Simpsons... later on, when drive space became less of an issue and processing improved, the quality similarly improved.
  And hell, I remember some kids collecting music for their C64s.
  The mere existence of the jukebox should tell you that people have always wanted a big collection of indexed music - but most people couldn't justify the space or expense of a full-sized jukebox. Prior to MP3s, huge CD carousels were actually pretty common. As a teenager, I had a 5 or 6 disc changer - and before that, a record player that could hold a stack of records and play them in sequence... my first CD player even had random-access for the record player on top... it worked most of the time.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
100. Re:Not Sure I'm Getting It by jamesswift · 2008-07-02 13:50 · Score: 1
  
  I'd prefer 3 per pixel. And maybe an alpha chip.
  
  --
  i wish i could stop
101. Re:Not Sure I'm Getting It by earthforce_1 · 2008-07-02 13:52 · Score: 2, Insightful
  
  You speed it up by rewriting sequential algorithms to run in parallel. It is surprising how many algorithms you would swear are inherently sequential can be rewritten to operate in parallel. Beyond that, you can have cores engaged in speculative execution, where the results may or may not be used. I could imagine a spell checker where multiple words and sentence fragments are dispatched to numerous cores for spelling/grammar checking. A compiler could devote a separate core to compiling/linking/optimizing each individual module or function.
  Programmers don't think massively parallel and most programming languages (excluding hardware design languages such as Verilog/VHDL) are sequential in nature.
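  The spell-checker idea maps naturally onto a worker pool. A minimal sketch, assuming a toy in-memory word list (`DICTIONARY`, `misspelled` and `check_document` are made-up names for illustration). It uses Python's standard `concurrent.futures`; for CPU-heavy checking in CPython you would likely swap in `ProcessPoolExecutor`, since threads share the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dictionary; a real checker would load a full word list.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def misspelled(fragment: str) -> list[str]:
    """Return the words in one fragment that are not in the dictionary."""
    return [w for w in fragment.lower().split() if w not in DICTIONARY]


def check_document(fragments: list[str], workers: int = 4) -> list[str]:
    # Fragments are independent, so they can be checked concurrently.
    # pool.map still yields results in the original fragment order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [word for found in pool.map(misspelled, fragments) for word in found]


print(check_document(["The quick brwn fox", "jumps ovr the lazy dog"]))
# ['brwn', 'ovr']
```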
  
  --
  My rights don't need management.
102. Re:Not Sure I'm Getting It by PDX · 2008-07-02 13:54 · Score: 1
  
  Imagine the moral+ethical subroutines of your holographic doctor being handled by one core or by many smaller cores. What type of advice would you want to receive?
  Distributed = Reliable
  Singular = Confidential
  I see Dead links!...
103. Re:Not Sure I'm Getting It by GaryOlson · 2008-07-02 13:55 · Score: 1
  
  it's likely that when we reach the point where we have hundreds or thousands of cores, new tasks will emerge that take advantage of the new capabilities of the hardware.
  To which I would propose we focus on post-processing data correlation instead of pre-processing debugging. Rather than DECLARE a value or process and test IF the result or process OPERATES to a predefined expectation, we POSTULATE a direction and allow each core to PROPAGATE along a different vector. Then use those multiple cores to display all possible post-processing correlations of all the PROCESS VECTORs. The result set will include valid results, invalid results (e.g. "can't divide by zero"), unexpected results, and contradictory results.
  
  The abundance of cores will provide a resource which has not been available in traditional resource limited computing: disposable processes and disposable results.
  
  --
  Every man's island needs an ocean; choose your ocean carefully.
104. Re:Not Sure I'm Getting It by smallfries · 2008-07-02 13:56 · Score: 1
  
  The problem with Google is that you generally need to know the name of a thing before you can search for it. Be enlightened at the coolness. Yes it is a very good idea, it was big in the 80s. You are right that it will be big again in a few years when there are lots of idle cores sitting around.
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
105. Re:Not Sure I'm Getting It by jamesswift · 2008-07-02 13:59 · Score: 1
  
  In reality parallelism is more likely going to be found by better compilers
  Better compilers or better languages or just different ones to what are commonly used?
  Aren't functional languages more suited to having the compilers decide how to parallelise a task?
  
  --
  i wish i could stop
106. Re:Not Sure I'm Getting It by Gazzonyx · 2008-07-02 14:01 · Score: 2, Insightful
  
  Another thing to think about (besides cache coherency, ping-ponging between sockets over the bus, locking overhead, etc.): you can have a million cores and it won't matter. You're only as fast as your weakest link. Right now, that's storage. Solid-state drives will be common for first-tier storage in the next decade (as straight memory-bank storage becomes more common for high-performance applications), and average disk access time will improve by a few orders of magnitude. Still, that only moves the problem 'forward' a level.
  
  You still choke on the Memory Wall; you have to feed all those cores data, and you're going a few orders of magnitude slower than the CPU cores. Increasing bandwidth on the front side bus doesn't help, as you have to increase bandwidth and decrease latency. You compound this when you have many cores/sockets doing backward cache flushes to RAM.
  
  Even if you've got a HyperTransport link (which Intel doesn't have; IIRC they push bits over the front side bus between sockets) to the north bridge for each socket, you've still only got a single north bridge. You're bottlenecked again. OK, use two front side buses with an interlink. Now we're back to coherency problems, but at two points. At some point, you have to either give each socket its own RAM bank (NUMA) and isolate data (and make CPU migration for tasks take an extra hit) or figure out how to perfectly isolate and stripe your data over multiple paths to a single backing store.
  
  --
  If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
107. Re:Not Sure I'm Getting It by jamesswift · 2008-07-02 14:06 · Score: 1
  
  Put simply, the majority of code simply doesn't parallelize well.
  Even if that is true this may not remain so once programmers have multiple cores available.
  But let's take spreadsheets as an example. This is possibly one of the most common forms of programming out there.
  Spreadsheets would benefit enormously from multiple cores. The benefits aren't always obvious.
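  One reason spreadsheets parallelize better than they look: recalculation follows the dependency graph, and cells that don't depend on each other can be recomputed at the same time. A rough Python sketch, assuming a simple dict of cell -> cells it reads (the cell names and the `recalc_waves` helper are made up for illustration; real spreadsheet engines are far more involved).

```python
def recalc_waves(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group cells into waves. No cell in a wave reads another cell in the
    same wave, so each wave could be recalculated in parallel.

    Every cell that is referenced must also appear as a key (constants
    map to an empty set).
    """
    remaining = {cell: set(reads) for cell, reads in deps.items()}
    done: set[str] = set()
    waves = []
    while remaining:
        ready = {cell for cell, reads in remaining.items() if reads <= done}
        if not ready:
            raise ValueError("circular reference")
        waves.append(ready)
        done |= ready
        for cell in ready:
            del remaining[cell]
    return waves


sheet = {
    "A1": set(), "A2": set(), "A3": set(),  # constants
    "B1": {"A1", "A2"},                     # =A1+A2
    "B2": {"A2", "A3"},                     # =A2*A3
    "C1": {"B1", "B2"},                     # =B1+B2
}
print(recalc_waves(sheet))
```

  For this sheet the waves are the three A cells, then B1 and B2 together, then C1: six formulas, but only three sequential steps.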
  
  --
  i wish i could stop
108. Re:Not Sure I'm Getting It by stephanruby · 2008-07-02 14:07 · Score: 1
  
  ...but what about the processes that are slow and plodding and sequential?
  You got it. That's the current bottleneck right now. Many of those processes have to be re-conceptualized and rewritten from scratch. For instance, many games say they require multiple processors, but the truth is that most games haven't taken full advantage of those multiple processors yet. It's a new way of programming for most programmers. It will take time to get those guys converted.
  And of course, once this bottleneck gets taken care of, other bottlenecks will follow, but right now the current bottleneck is mostly human.
109. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 14:08 · Score: 0
  
  It's not a contradiction, you just are too literally minded to grasp the metaphor present in his expression. Sad.
110. Re:Not Sure I'm Getting It by gbulmash · 2008-07-02 14:12 · Score: 1
  
  "Expect to see 'core clusters' and 'core clouds' to handle problems that 'won't see any improvement'. These will be abstractions of cores into behaving like one fast core."
  
  Now that would be cool, but how would you do it?
  
  --
  Start a happiness pandemic
111. Re:Not Sure I'm Getting It by dw604 · 2008-07-02 14:12 · Score: 1
  
  other than that? what more could you want.
112. Re:Not Sure I'm Getting It by yabos · 2008-07-02 14:16 · Score: 1
  
  I couldn't play an MP3 on my Mac IIci which was all of 24 or so MHz. I remember exporting it to a WAV so I could listen to it and it took hours to do that.
113. Re:Not Sure I'm Getting It by Joe+The+Dragon · 2008-07-02 14:20 · Score: 1
  
  AMD also has the RAM controller built in.
114. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 14:29 · Score: 0
  
  I thought Microsoft was the slowdown in multitasking OSes.
115. Re:Not Sure I'm Getting It by TheLink · 2008-07-02 14:47 · Score: 1
  
  I'm not a CPU guy.
  
  Say all the zillions of cores happen to cache the same memory area (running same compute program or something).
  
  Then one of them _writes_ to that memory area.
  
  The rest of the cores will have to know in order for all of them to be in the same "reality".
  
  So that address in their caches will have to be invalidated - so the next time any of them tries to read that address they'll fetch it from wherever they should (probably cheaper to do that than to forcibly cause them to all fetch - it may turn out they never read from that area).
  
  It's all very easy if the cores are treated separately. That's just like having thousands of PCs that aren't connected to each other and can run independently - and only submit their results at the end (just like those public SETI and protein folding stuff).
  
  But many interesting problems aren't so easy. For these problems at some points you'll have to serialize stuff. I don't think you can ever get away from it.
  
  If it were so easy to coordinate parallel computers to serialize stuff without messing up AND be fast, then whether you have 1000 cores on a chip or 500 computers with 2 cores each would start to matter less for the computation.
  
  Now, the advantage of 1000 cores on a chip, of course, is that they're close to each other, so naturally the locking and stuff will be faster.
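  The invalidate-on-write behavior described above can be modeled in a few lines. This is a deliberately simplified toy in Python (the `ToyCoherentMemory` class is hypothetical): it writes straight through to memory and keeps no per-line states, unlike real protocols such as MESI, but it shows why one write to widely shared data turns into many misses.

```python
class ToyCoherentMemory:
    """Toy write-invalidate scheme: many cores may cache a line for reading,
    but a write by one core evicts that line from every other core's cache."""

    def __init__(self, num_cores: int):
        self.memory: dict[int, int] = {}
        self.caches: list[dict[int, int]] = [{} for _ in range(num_cores)]
        self.misses = 0
        self.invalidations = 0

    def read(self, core: int, addr: int) -> int:
        cache = self.caches[core]
        if addr not in cache:          # miss: fetch lazily from memory
            self.misses += 1
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, core: int, addr: int, value: int) -> None:
        for other, cache in enumerate(self.caches):
            if other != core and addr in cache:
                del cache[addr]        # other copies become invalid
                self.invalidations += 1
        self.memory[addr] = value      # write-through keeps the toy simple
        self.caches[core][addr] = value


mem = ToyCoherentMemory(num_cores=1000)
for core in range(1000):
    mem.read(core, 0)                  # every core caches address 0
mem.write(0, 0, 42)                    # one core writes it
values = [mem.read(core, 0) for core in range(1000)]
print(mem.misses, mem.invalidations, set(values))  # 1999 999 {42}
```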
  --
  
  Too many replies beneath your current threshold
116. Re:Not Sure I'm Getting It by spirit+of+reason · 2008-07-02 14:58 · Score: 1
  
  That's more of an x86 problem, though, isn't it? Don't ASIDs help that in some other architectures?
117. Re:Not Sure I'm Getting It by Hooya · 2008-07-02 15:07 · Score: 1
  
  "The problem is that many functions people perform on computers have nothing to do with the real world."
  Could that be because computer hardware up till now has not been able to handle the level of parallelization you would need to realistically model the real world? Take, for example, the DARPA challenge for autonomous vehicles. Humans process so many sensory inputs in parallel that they can react to multiple things in real time. Computers, because of the sequential nature of their processing, have to prioritize certain inputs and then react to those, leading to 'soft' real time, if it's even close to real time. Could thousands of cores process a multitude of 'sensors' in parallel? Could that lead to self-driving cars? Who knows... But one thing I'm fairly certain of is that if today's algorithms, programs and tasks are sequential in nature, it's because there was a need for them to be. The world itself is parallel.
118. Re:Not Sure I'm Getting It by skulgnome · 2008-07-02 15:09 · Score: 1
  
  Well, cache reloads are the other major slowdown in multitasking OSes these days. TLB flush? Pah, it's in the L2 cache anyhow.
119. Re:Not Sure I'm Getting It by Erich · 2008-07-02 15:17 · Score: 5, Informative
  Single Address Space is horrible.
  It's a huge kludge for idiotic processors (like ARM9) that don't have physically-tagged caches. On all non-incredibly-sucky processors, we have physically tagged caches, so having every app have its own address space, or having multiple apps share physical pages at different virtual addresses, is all fine.
  Problems with SAS:
  
  Everything has to be compiled position-independent, or pre-linked for a specific location
  
  Virtual memory fragmentation as applications are loaded and unloaded
  
  Where is the heap? Is there one? Or one per process?
  
  COW and paging get harder
  
  People start using it and think it's a good idea.
  
  Most people... even people using ARM... are using processors with physically-tagged caches. Please, Please, Please, don't further the madness of single-address-space environments. There are still people encouraging this crime against humanity.
  Maybe I'm a bit bitter, because some folks in my company have drunk the SAS kool-aid. But believe me, unless you have ARM9, it's not worth it!
  --
  -- Erich
  Slashdot reader since 1997
120. Re:Not Sure I'm Getting It by Spy+Hunter · 2008-07-02 15:22 · Score: 1
  
  Or, if you use only memory-safe languages, you can eliminate the need for (and overhead of) hardware memory protection altogether...
  
  --
  main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
121. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 15:24 · Score: 0
  
  The thing is, you actually have to have talked to them to find out if they are the sort you want to talk to or not. And maybe you have done this. But there are an awful lot of shy techy types (I was one myself thirty years ago...) who don't talk to enough women to give themselves a chance to meet the few whose company they would enjoy. So if the Slashdot stereotype is really true (and I have my doubts), there are probably quite a few here who would benefit from the advice to tidy up a bit and get out and talk to a few girls. The whole thing about "my interests are so narrow that few women would share them" is a bit of nonsense anyway. Yes, you want some interests in common, but a few differences give you something to talk about as well. Of course you have to find what they are interested in interesting enough to listen to, even if it is not your main interest. That may well exclude a lot of the bubble-headed Paris Hilton types from your range of possibilities, but there are plenty more fish in the sea.
122. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 15:44 · Score: 0
  
  I think I once figured out that, starting with 3 billion women on the planet, there were about 5 with mutual attraction with me. I think I've found two of them.
  Unfortunately the two women hate each other...
123. Re:Not Sure I'm Getting It by Goalie_Ca · 2008-07-02 15:45 · Score: 1
  
  But unless each process shares memory you end up evicting a lot of cache lines.
  
  --
  
  ----
  Go canucks, habs, and sens!
124. Re:Not Sure I'm Getting It by Lehk228 · 2008-07-02 15:53 · Score: 1
  
  That is part of the parallelizing process: if your algorithm does that, then it isn't very efficient. The problem with going for thousands of cores is that Intel is basically asking every programmer to re-learn programming.
  
  --
  Snowden and Manning are heroes.
125. Re:Not Sure I'm Getting It by TheLink · 2008-07-02 15:59 · Score: 1
  
  Which part of the parallelizing process are you talking about?
  
  If you talk about removing the serialization bits, I'd say it's practically impossible (unless there's a way of doing time travel or something).
  
  If it was so simple, you'd be able to run everything on separate isolated computers in parallel - no need for low latency interconnects.
  --
  
  Too many replies beneath your current threshold
126. Re:Not Sure I'm Getting It by Dolda2000 · 2008-07-02 16:07 · Score: 1
  
  I'm not sure I can agree. Even if you have 10 MB of cache, it would only take about a millisecond to fill it with PC5300 RAM, which is only 1/20 of a timeslice, and it's not as if the CPU stands still during that time. And that's assuming that the cache has been completely invalidated since the last timeslice, which is rather unlikely (especially if there are many CPUs (or cores) and the system has a smart scheduler).
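  The back-of-the-envelope number roughly checks out. A small Python calculation, assuming the post means PC2-5300 (DDR2-667) at a peak of about 5.3 GB/s per channel, and a 20 ms timeslice (implied by the post's "1/20"); sustained bandwidth in practice is lower than peak.

```python
CACHE_BYTES = 10 * 1024 * 1024  # the post's 10 MB of cache
CHANNEL_BPS = 5.3e9             # assumed peak bytes/s for one PC2-5300 channel
TIMESLICE_S = 0.020             # assumed 20 ms timeslice


def refill_seconds(channels: int) -> float:
    """Time to stream the whole cache in from RAM at peak bandwidth."""
    return CACHE_BYTES / (CHANNEL_BPS * channels)


for channels in (1, 2):
    t = refill_seconds(channels)
    print(f"{channels} channel(s): {t * 1e3:.2f} ms, {t / TIMESLICE_S:.1%} of a timeslice")
```

  That works out to roughly 2 ms on a single channel and about 1 ms with dual channel, i.e. around a tenth or a twentieth of the assumed timeslice, so the post's estimate corresponds to the dual-channel case.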
127. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 16:13 · Score: 0
  
  What you came up with is called Thread-Level Speculation. It's a good idea, but unfortunately you reach saturation pretty quickly, where adding another thread doesn't buy you much. OTOH for things that don't parallelize well, any speedup you can get is welcome.
128. Re:Not Sure I'm Getting It by desenz · 2008-07-02 16:22 · Score: 1
  
  By putting them next to each other, so they can all hold hands when necessary.
129. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 16:46 · Score: 0
  
  That's easy, just get a two pixel screen and run your games at 2x1.
130. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 16:48 · Score: 0
  
  Even better, turn it to a horse race at multiple levels.
  Run a task slice on multiple cores with multiple strategies; the first one to the end wins. This can be done at multiple granularities, from the loop level (unrolled or rolled) up to the task level (which blur do you like? 3x3, 5x5, 7x7, 9x9, etc.).
  There's lots of cool stuff that can be done with all this computing power. I want it!
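  The "first one to the end wins" idea can be tried with Python's standard `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)`. A minimal sketch: `strategy` is a hypothetical stand-in that sleeps instead of doing real work, and the losers are not killed (Python can only cancel tasks that haven't started yet), so their results are simply thrown away.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def strategy(name: str, seconds: float) -> str:
    """Stand-in for one way of solving the task; `seconds` fakes its cost."""
    time.sleep(seconds)
    return name


def horse_race(candidates: dict[str, float]) -> str:
    """Start every strategy at once and keep whichever finishes first."""
    pool = ThreadPoolExecutor(max_workers=len(candidates))
    futures = [pool.submit(strategy, name, cost) for name, cost in candidates.items()]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for f in pending:
        f.cancel()            # only stops tasks that have not started yet
    pool.shutdown(wait=False)  # don't block on the losers
    return next(iter(done)).result()


print(horse_race({"strategy A": 0.3, "strategy B": 0.05, "strategy C": 0.6}))
```

  With those made-up costs, strategy B should win; the other two keep running in the background until they finish, which is the "disposable results" trade-off.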
131. Re:Not Sure I'm Getting It by philipgar · 2008-07-02 16:49 · Score: 1
  
  And as we start adding more cores to the mix, how much does performance improve? Right now branch predictors are easily in the 90%+ range for many codes. I'll use 10% as the misprediction rate and assume that on a misprediction, we take 10 times as long to run the instruction. I'll also assume that branches occur every 10 instructions (I can't remember the exact number, so I'm using intentionally pessimistic numbers). Assuming a base CPI of 1 (although the analysis works the same for any CPI), we have a branch misprediction once every 90 instructions, and then must add 10 cycles for the penalty, so our overall speed is limited to 90% of the theoretical maximum. Even if we make all these numbers much worse, we can see that the problem is not branch prediction. If a branch were equally likely to be taken as not taken, two cores (each running a separate branch) could yield an appreciable speedup, but this just isn't the case. A 10% speedup from the wrong-path branch scheme isn't horrible performance, but it's far shy of the performance 2 cores could give, and if we expand the number of cores, we greatly reduce that number.
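  For comparison, the textbook way to fold branch misses into CPI is effective CPI = base CPI + (branch frequency x miss rate x penalty). A quick Python check with the post's pessimistic numbers (the helper name is made up):

```python
def effective_cpi(base_cpi: float, branch_freq: float,
                  miss_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction once branch mispredictions are charged."""
    return base_cpi + branch_freq * miss_rate * penalty_cycles


cpi = effective_cpi(base_cpi=1.0, branch_freq=0.10, miss_rate=0.10, penalty_cycles=10)
print(f"CPI {cpi:.2f}, {1 / cpi:.1%} of ideal speed")
```

  That lands at about 91% of ideal, the same ballpark as the post's 90%, so spending a second core on the other branch path can recover at most that last ~10%.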
  
  This leads us to the other problem: you assumed that processor speed was the limiting factor. This in fact is not really true. Many processors can run significantly faster than they do; however, those speeds aren't sustainable due to power limitations. When power is our concern, we've doubled the power of the processor and improved performance 10%. This is really an awful tradeoff. Even though power consumption grows with the square of the processor's frequency, raising the clock isn't nearly as bad: we can get a 10% performance gain by increasing the processor's power by only 21%.
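  The 21% figure is just 1.1 squared. A tiny Python comparison under the post's own assumption that power scales with the square of frequency; models that also scale voltage with frequency give a steeper curve (1.1 cubed is about 1.33), though still well under 2x.

```python
def power_for_speedup(speedup: float, exponent: float = 2.0) -> float:
    """Power multiplier to get `speedup` from clock frequency alone,
    assuming power grows as frequency ** exponent."""
    return speedup ** exponent


clock_route = power_for_speedup(1.10)  # raise the clock 10%
second_core_route = 2.0                # the post's doubled power for ~10%
print(f"clock: {clock_route:.2f}x power, second core: {second_core_route:.2f}x power")
```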
  
  Phil
132. Re:Not Sure I'm Getting It by ClosedSource · 2008-07-02 17:03 · Score: 1
  
  "could that be because computer hardware up till now has not been able to handle that level of parallelization that you would need to realistically model the real world?"
  No. It's because modeling the real world wasn't the goal in many cases. Besides there's a big difference between modeling and controlling.
  It seems to me that multiple-core systems may excel in two different areas:
  1) Problems for which there is already a solution, but practical applications require near real-time performance and we currently can't achieve that speed (assuming the problem can be solved in parallel).
  2) Problems that we understand how to solve but don't yet have a solution because they would take too long to calculate (say 20 years). A large network of multicore machines might be able to significantly shorten the time (again assuming the problem can be solved more efficiently in parallel).
  On the other hand, problems that we've been working on for many years without a proven approach (e.g. AI) are unlikely to be solved any quicker.
133. Re:Not Sure I'm Getting It by poopdeville · 2008-07-02 17:08 · Score: 2, Interesting
  
  Many cores could allow for slower clock speeds, cooler chips and quiet computers.
  Of course, an OS could be designed so different modular components run on different cores.
  More is possible if you have thousands of cores. A machine with thousands of cores could conceivably pre-compute the possible computational consequences of your in-a-standard-deviation-most-likely actions, based on a genetic learning algorithm to figure out what you do when. In a sense, the more predictable you are, the faster it would get. Imagine an iPhone that does that!
  
  --
  After all, I am strangely colored.
134. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 17:10 · Score: 0
  
  Newsflash: one core and 512 MB of RAM is still enough for most consumers.
135. Re:Not Sure I'm Getting It by poopdeville · 2008-07-02 17:21 · Score: 1
  
  But what happens when you have more than one core per thread? Do you just let the unused cores lie idle, or do you further decompose the thread and try to predictively parallelise it?
  Yes. You would need thousands for the idea to be effective, though. Each of those cores could run an Erlang machine...
  
  --
  After all, I am strangely colored.
136. Re:Not Sure I'm Getting It by Tablizer · 2008-07-02 17:24 · Score: 1
  
  What we really need is a processing core for every photon [for gaming].
  Maybe that's why physics is so complex: we're living in a simulation where the programmer (AKA "God") followed your idea.
  
  --
  Table-ized A.I.
137. Re:Not Sure I'm Getting It by poopdeville · 2008-07-02 17:30 · Score: 1
  
  Imagine an Erlang machine running on each of those cores....
  
  --
  After all, I am strangely colored.
138. Re:Not Sure I'm Getting It by Stan+Vassilev · 2008-07-02 17:32 · Score: 4, Insightful
  
  As a software engineer, I wonder the same thing.
  Put simply, the majority of code simply doesn't parallelize well. You can break out a few major portions of it to run as their own threads, but for the most part, programs either sit around and wait for the user, or sit around and wait for hardware resources.
  Within that, only those programs that wait for a particular hardware resource - CPU time - even have the potential to benefit from more cores... And while a lot of those might split well into a few threads, most will not scale (without a complete rewrite to choose entirely different algorithms - if they even exist to accomplish the intended purpose) to more than a handful of cores.
  As a software engineer you should know that "most code doesn't parallelize" is very different from "most of the code's runtime can't parallelize", as code size and code runtime are substantially different things.
  Look at most CPU intensive tasks today and you'll notice they all parallelize very well: archiving/extracting, encoding/decoding (video, audio), 2D and 3D GUI/graphics/animations rendering (not just for games anymore!), indexing and searching indexes, databases in general, and last but not least, image/video and voice recognition.
  So, while your very high-level task is sequential, the *services* it calls or implicitly uses (like GUI rendering), and the smaller tasks it performs, actually would make a pretty good use of as many cores as you can throw at them.
  This is good news for software engineers like you and me, as we can write mostly serial code and isolate slow tasks into isolated routines that we write once and reuse many times.
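  Archiving is a good example of "the runtime parallelizes even if the program doesn't": split the input into independent chunks and compress each one separately. A minimal Python sketch using the standard `zlib` and `concurrent.futures` modules; the 1 MiB chunk size is an arbitrary assumption, compressing chunks independently costs some ratio because no chunk can reuse another's history, and whether threads actually overlap depends on the runtime (`ProcessPoolExecutor` is a drop-in alternative).

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB per chunk (arbitrary choice)


def compress_chunks(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress fixed-size chunks independently so they can run concurrently."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks(parts: list[bytes]) -> bytes:
    """Undo compress_chunks; chunks are concatenated back in order."""
    return b"".join(zlib.decompress(p) for p in parts)


data = b"slashdot " * 500_000  # about 4.5 MB of sample input
parts = compress_chunks(data)
assert decompress_chunks(parts) == data
print(len(parts), "chunks")
```

  The calling code stays serial ("compress this"), while the expensive part inside it spreads over as many cores as there are chunks.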
139. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 17:52 · Score: 0
  
  I don't know about you, but my CompSci degree included a fair amount of content on concurrent and distributed programming.
  The people writing in PHP, VB etc don't need to know this stuff. The popular languages will eventually have it built in (by definition; if huge numbers of cores are the norm, languages which don't support this "for free" aren't going to remain popular for long). And the people who are actually writing programming languages, OS kernels, and performance-critical code already know this stuff.
  The people who are most likely to have problems are the people who should be using VB etc but are writing in C/C++. C is too low-level to include "transparent" parallelism except in the most trivial cases (e.g. recognising when a "for" loop is a "map").
140. Re:Not Sure I'm Getting It by burgundysizzle · 2008-07-02 17:52 · Score: 1
  
  LOL, the posts are starting to sound like they want the features in the Itanium processors.
  Speculative loads, so you can start a pre-fetch of data and later check whether it has been loaded, and predication, so that some branches are not needed: different bits of code are simply executed based on a predicate.
  The Itanium CPU needs a good and intelligent compiler to get good performance. It may be a slow start but perhaps it's the way of the future (ROTFL).
141. Re:Not Sure I'm Getting It by Lehk228 · 2008-07-02 17:53 · Score: 1
  
  that is my point, many algorithms won't split up well, ones that do should perform very well, but those that don't will take a huge penalty.
  
  --
  Snowden and Manning are heroes.
142. Re:Not Sure I'm Getting It by mindstrm · 2008-07-02 18:30 · Score: 1
  
  Even when MP3 came out, it was just another neat tool. The files were still relatively big, we didn't have huge disks to store them on, and the internet was in its infancy.
  Now, we swap them like gumdrops. We trade them by the thousands, and we have applications to analyze and do all kinds of neat stuff with them.
  When it took an air-cooled Pentium with a 100 watt power supply to play MP3s, portables were out of the question.
  The parent is right - faster processors will lead to doing things that today we don't even see as interesting, as they aren't in scope.
  Think, perhaps, analysis and search - processes could be doing analysis on every key you type and every application you use, filtering your RSS feeds for you and mapping all your actions into finding new interesting stuff for you online, all without getting in your way. Right now, the algorithms for doing that are in their infancy, and they eat up a lot of computing resources.
143. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 18:37 · Score: 0
  
  The obvious solution is to have an asymmetric multicore architecture, with a few very fast (read: complex) cores for difficult-to-parallelize tasks and a lot of simple (slow) cores. Note that this is not new; it is exactly how the PS3 is designed.
  The key point is in having the tools (languages, compilers, operating systems) to harness this new level of complexity.
144. Re:Not Sure I'm Getting It by Kuvter · 2008-07-02 19:30 · Score: 1
  
  You'll probably need at least 1000 cores and 8 gigs of RAM to run the next Windows. I think that's what you're talking about when you say, "In the near term, you can imagine a whole host of applications that would become possible..."
  
  --
  "To be is to do." --Socrates
  "To do is to be." -- Aristotle
  "Do-Be-Do-Be-Do..." --Sinatra
145. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 19:36 · Score: 0
  
  "but what about the processes that are slow and plodding and sequential? How do those get sped up if you're opting for more cores instead of more cycles?"
  For my sound synthesis needs, under the VST and ASIO standards, the difference in CPU load between
  - a $100 AMD XP-M 2600+ (overclocked to 2.4 GHz, so let's say a hypothetical "AMD XP-M 3300+") from around 2004
  - a $200 Intel Q6600 from 2008
  is a 20% advantage for the Q6600.
  What an ugly ripoff.
146. Re:Not Sure I'm Getting It by bm_luethke · 2008-07-02 19:43 · Score: 1
  
  As a software engineer who actually worked in parallel processing for a number of years I also wonder. We have what is known as Amdahl's Law.
  Reality is that many problems just cannot be split up into small discrete operations. For those, a single fast processor is the key. Most problems can be split into a few large discrete operations, and multiple cores help here. Very few can be broken up into many.
  It isn't a matter of thinking about problems differently - there is a whole huge branch of Comp Sci that has been thinking about this for decades. The only computing technology I know of that may end up making an actual change in the way problems are approached is quantum computing, that is, if it ever becomes reality as we see it now.
  Many just think that if it is made then people will figure out how to use it - we can already see "machines" with thousands of processors, and there is little even in the scientific realm that can really use them.
  The field itself is quite mature - the reality is that web browsing, music, word processors, spreadsheets, movies, and other normal user tasks don't even really benefit much from two cores dedicated to them (however, they *do* benefit from being allowed to run on one core while OS tasks run on another, so two-core systems are still quite nice). There is no reason for anyone to spend the money needed for a large number of cores that will sit idle.
  Outside of scientific research, gaming is the only place I can see some of this. I can't really put a number on where you would need to stop. Even though I'm pretty much certain it is so, I can't say that there is no case for 4-8 cores for the environment and a core for each AI construct (I've played Quake on an SGI CAVE on MUCH less). That would be one heck of a modeling system and, let's face it, if you *really* needed that then gaming is so far from a general-purpose machine that you aren't going to be using the same machine to surf the internet or run your word processor.
  
  --
  ------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
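The comment above invokes Amdahl's Law. It gives the ideal speedup on n cores when a fraction p of the single-core run time can be parallelized: S(n) = 1 / ((1 - p) + p/n). A minimal Python sketch makes the point concrete. The name `amdahl_speedup` is our own, not from any library, and the model ignores communication and contention costs. It shows why adding cores stops paying off once the serial part dominates:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the single-core run time that can be spread
    across cores (0.0 to 1.0); the remainder always runs serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.95, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):6.1f}x" for n in (2, 8, 100, 1000)
        )
        print(f"{p:.0%} parallel -> {row}")
```

Even with 95% of the work parallel, 1000 cores give a bit under 20x. The ceiling as n grows is 1 / (1 - p), which is 20 here.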
147. Re:Not Sure I'm Getting It by makapuf · 2008-07-02 19:54 · Score: 2, Insightful
  
  Why "before"? I think 512MB RAM, a 1 or 2 GHz CPU and a decent, speedy hard drive ARE enough for most consumers: playing (moderately recent) games (maybe after upgrading to a newer $50 video card), playing (moderate) HD and MP3s, browsing sites, and any office work using lots of AJAX on FF3.
  You know what? You could even (gasp) code on it (maybe not compile Eclipse every 5 minutes, OK), run a small server on it, or transcode videos (maybe 4x more slowly, so you'll end up letting it run overnight instead of for 2 hours from time to time; big deal).
  Of course, SOME people might need more. For most of us, 512MB/1x2GHz is perfectly enough (see the Eee PC).
148. Re:Not Sure I'm Getting It by Dorceon · 2008-07-02 20:33 · Score: 1
  
  In ARM, any instruction (not just jumps like most CPUs) can be conditional. It's a more explicit form of 'do both', but your idea reminded me of it.
  
  --
  What sound do people on rollercoasters make? Hint: it's not Xbox 360.
149. Re:Not Sure I'm Getting It by Raenex · 2008-07-02 20:35 · Score: 1
  
  I typically run firefox, bit torrent, and then folding@home + World community grid at the same time. I close none of these to play TF2 and see no difference in FPS with the CPU hogging programs on or off.
  Firefox is not CPU intensive unless you're sitting on a page that happens to hog the CPU with JavaScript. Not sure about BitTorrent, but it should be mostly IO bound. Folding@home is designed to use your idle CPU cycles, so it won't detract from your game. I assume the same for World Community Grid, though I'm not familiar with it.
  In other words, you could probably get away with 1 CPU.
150. Re:Not Sure I'm Getting It by AlecC · 2008-07-02 20:46 · Score: 1
  
  Well, for one possibility, quality voice recognition. Dozens of cores each parsing the sound stream to test a different hypothesis of what you said, using both phonic and semantic analysis. Somewhat similar to the way the brain works. And results wanted in real time.
  
  --
  Consciousness is an illusion caused by an excess of self consciousness.
151. Re:Not Sure I'm Getting It by Waccoon · 2008-07-02 20:47 · Score: 1
  
  How would this apply to OSes like Inferno that run everything in kernel mode, and use a VM for managing memory for all programs?
152. Re:Not Sure I'm Getting It by AlecC · 2008-07-02 20:52 · Score: 1
  
  So you have to split the problem into an enormous number of pieces - many more than the number of cores. Then you don't communicate explicitly; you just have a task queue. Each core does one microtask, generating zero or more new microtasks, which are added to the task queue. In principle, after each task the core takes the next one off the queue, but you might use the equivalent of tail recursion: run the last-generated microtask, if any.
  But such a system could never be written in C or other pure imperative languages. You need Functional Programming, or some other new paradigm, in which the order of execution is divorced from program layout. Which is what the Intel bods are talking about.
  
  --
  Consciousness is an illusion caused by an excess of self consciousness.
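The task-queue scheme described above can be sketched in Python with the standard `queue` and `threading` modules. This is an illustration under stated assumptions, not anything Intel proposed. `run_task_queue` and `make_sum_task` are hypothetical names. Because of CPython's global interpreter lock, the threads interleave rather than truly run Python code in parallel, so the sketch shows the structure, not the speedup. Also note that a single shared queue like this one becomes a point of contention as core counts grow.

```python
import queue
import threading


def run_task_queue(initial_tasks, workers=4):
    """Drain one shared queue of microtasks using a pool of worker threads.

    Each task is a zero-argument callable returning a (possibly empty) list
    of new tasks, which are pushed back onto the same queue.
    """
    tasks = queue.Queue()
    for t in initial_tasks:
        tasks.put(t)

    def worker():
        while True:
            task = tasks.get()
            if task is None:  # sentinel: no more work for this thread
                tasks.task_done()
                return
            try:
                for child in task():
                    tasks.put(child)  # enqueue children before marking done
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for th in threads:
        th.start()
    tasks.join()  # blocks until every task, including spawned ones, is done
    for _ in threads:
        tasks.put(None)
    for th in threads:
        th.join()


def make_sum_task(lo, hi, results, lock, grain=1000):
    """Hypothetical example task: sum i*i over [lo, hi), splitting big ranges."""
    def task():
        if hi - lo <= grain:
            partial = sum(i * i for i in range(lo, hi))
            with lock:
                results.append(partial)
            return []
        mid = (lo + hi) // 2
        return [make_sum_task(lo, mid, results, lock, grain),
                make_sum_task(mid, hi, results, lock, grain)]
    return task


if __name__ == "__main__":
    results, lock = [], threading.Lock()
    run_task_queue([make_sum_task(0, 100_000, results, lock)])
    print(sum(results) == sum(i * i for i in range(100_000)))
```

Children are enqueued before the parent calls `task_done()`. That ordering keeps `tasks.join()` from returning while spawned work is still pending.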
153. Re:Not Sure I'm Getting It by Ant+P. · 2008-07-02 20:52 · Score: 1
  
  Except for the ones who let Microsoft dictate their hardware purchases, 512MB and a 1GHz CPU are still enough.
154. Re:Not Sure I'm Getting It by AlecC · 2008-07-02 20:57 · Score: 1
  
  The majority of code written in current programming languages does not parallelise well, because sequential execution is built into the structure of the language. The default operation is to execute instructions in sequence. But if the default were to execute all the commands in a block in parallel, unless instructed otherwise, there would be a lot more parallelism. And once people got used to this parallel thinking, they would design in more parallelism.
  
  --
  Consciousness is an illusion caused by an excess of self consciousness.
155. Re:Not Sure I'm Getting It by Jellybob · 2008-07-02 22:01 · Score: 1
  
  Supreme Commander does a similar thing if running on a multi-core system, putting rendering, physics, and AI players on different cores.
  As a lot of people have said already, it'll be the games industry that makes use of this sort of thing first, since they have a lot of things that can be easily run in parallel.
  Personally I think it's an interesting idea, but full use won't be made of multiple cores until developers learn to think in terms of parallel processes, instead of the largely procedural model we're using now.
156. Re:Not Sure I'm Getting It by LordMyren · 2008-07-02 22:06 · Score: 1
  
  Storage speeds are irrelevant; you can add as much disk throughput as is necessary. Even pre-SSD, storage scales very, very well. It's just not cheap, and it always has a large access time.
  "you've still only got a single north bridge": as you yourself say, wrong. You can attach a north bridge to each CPU, giving you a constant scaling factor of CPUs to external system I/O.
  "Now we're back to coherency problems, but at two points.": It's allocating the workload on the NUMA system that's non-trivial. DragonFly BSD is the only reasonable and usable proposal I've seen that addresses this.
157. Re:Not Sure I'm Getting It by phagstrom · 2008-07-02 22:27 · Score: 1
  
  ...we only need 640k...
  of register space per core.
158. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 22:38 · Score: 0
  
  She wants the PIN. What do I do now?
159. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-02 22:42 · Score: 0
  
  this will mainly effect the gaming industry
  No, it will effect Second Life too.
160. Re:Not Sure I'm Getting It by Salamander · 2008-07-02 23:00 · Score: 1
  
  So you have to split the problem into an enormous number of pieces - much larger than the number of cores. Then you don't communicate explicitly, you just have a task queue. Each core does one micro-task, generating zero or more new microtasks which added to the task queue.
  Again, that works only if the problem is inherently capable of being decomposed that far, and that's not usually the case. Also, if the tasks in a program all run on one processor then that program is irrelevant to this discussion, and if they don't then we're back to dealing with communication and contention. Task queues shared between processors have been well-known loci of contention in parallel programs since the early SMP days. Many programs written in just the manner you suggest have achieved far less than linear scaling (and have sometimes become worse) because cache/memory contention and context switches stole back most of the gains from having multiple processors. Papers are written about the rare exceptions, but those exceptions don't represent the day-to-day reality for the people who work on such programs or systems.
  
  such a system could never be written in C or other pure imperative languages. You need Functional Programming, or some other new paradigm, in which the order of execution is divorced from program layout. Which is what the Intel bods are talking about.
  That much is true, and the same as what I've said myself before this article came along. Adoption of such paradigms will extend the set of applications that can take advantage of kilocore systems, but it will still be a relatively small set.
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
161. Re:Not Sure I'm Getting It by mdwh2 · 2008-07-02 23:08 · Score: 1
  
  Although note that Chess is an easily parallelisable task anyway, so probably not a good example.
  For non-parallelisable tasks, a problem I see with this method is that you'd have to do this on a per-instruction level. So for every instruction, you have several cores working out the next instruction based on what the result might be - but I wonder if the overhead for setting up different threads this way might not outweigh any advantages?
162. Re:Not Sure I'm Getting It by methuselah · 2008-07-02 23:10 · Score: 1
  
  It seems to me that at some point there will have to be a "central" CPU core that pretends to be the processor. It will pass on instructions as it receives them and pass the results back, kind of like a kernel in an operating system. This would free coders from figuring out how to multithread apps: just do it all in hardware and virtualize it as one core.
163. Re:Not Sure I'm Getting It by GigaplexNZ · 2008-07-02 23:16 · Score: 1
  
  And a separate core for each combination of pixel pairs as many pixels display stuff related to other pixels...
164. Re:Not Sure I'm Getting It by a_real_bast... · 2008-07-02 23:47 · Score: 1
  
  There's also the problem of cache coherency across 1000 cores when not operating on massively parallel data; if each core has a copy of a variable, and one updates it... then what? Intel have answered that question intelligently before, but I'm not sure the old ways will scale.
  
  --
  You're making me think. You won't like me when I'm thinking.
165. Re:Not Sure I'm Getting It by rcallan · 2008-07-02 23:52 · Score: 1
  
  I think there are bigger problems than memory throughput; keeping the memory hierarchy coherent for 1000 L1 caches would be prohibitively difficult...
166. Re:Not Sure I'm Getting It by StatusWoe · 2008-07-02 23:56 · Score: 2, Interesting
  
  Why do all the processors have to be the same? Why not have an x-core processor for the smaller tasks that are easily parallelizable, and a high-clock-speed processor for the ones that aren't? Might the same be done for cache requirements?
  
  --
  "drink deeply the illusion of your safety"
167. Re:Not Sure I'm Getting It by Theolojin · 2008-07-02 23:58 · Score: 1
  
  Pleasing a woman is easy. Give her your credit card.
  I believe that is only legal in Nevada. Other states require cash.
  
  --
  Life is short; think quickly.
168. Re:Not Sure I'm Getting It by tgd · 2008-07-03 00:10 · Score: 1
  
  Actually, Intel hasn't -- the x86 platform, generally speaking, guarantees cache coherency... and that's a significant overhead, paid for a huge increase in the reliability of multi-threaded software written by poor engineers. (No need to worry about inserting every little memory barrier manually, and forgetting a "volatile" keyword, to use a Java example, won't bork your system.) At small numbers of CPUs (2 on a die, for example) that's an intelligent solution. At 4 or 8 it starts to make a lot less sense, and at 100 it's unworkable.
  What is really needed is to eliminate the requirement for cache coherency and put the intelligence in the runtimes. At compile time you can't figure out where threading issues happen, but analysis at runtime can likely find (and automatically correct) concurrency problems.
  In fact, given the problem of cache synchronization I'm sure both Intel and engineers at IBM, Sun and Microsoft's compiler research groups are all looking at ways to help make that work reliably.
169. Re:Not Sure I'm Getting It by EastCoastSurfer · 2008-07-03 00:23 · Score: 1
  
  On the other hand, a language that was really designed for kilocores or megacores would be radically different from most modern languages, adding a few extra (un)loop-statements wouldn't do. Functional languages are a good bet. When everything is side-effect-free, there's no good reason why all of it can't be executed in parallel.
  Exactly. Languages like you describe already exist. Erlang for example, was designed from the ground up with concurrency in mind.
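Here is an illustration of the side-effect-free point. When a function's output depends only on its input, evaluating it over many inputs concurrently must give the same answer as a sequential loop, whatever order the workers happen to run in. The minimal Python sketch below uses `concurrent.futures`. The Collatz workload and the `parallel_map` helper are arbitrary examples chosen for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def collatz_steps(n: int) -> int:
    """Pure function: the result depends only on n and touches no shared state."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def parallel_map(fn, items, workers=8):
    """Apply fn to every item concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


if __name__ == "__main__":
    nums = list(range(1, 10_001))
    results = parallel_map(collatz_steps, nums)
    # Same result as a plain sequential map, regardless of scheduling order.
    print(results == [collatz_steps(n) for n in nums])
```

For CPU-bound work in CPython, swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` sidesteps the global interpreter lock. That generally requires the mapped function to be defined at module top level so it can be pickled.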
170. Re:Not Sure I'm Getting It by tkinnun0 · 2008-07-03 00:26 · Score: 1
  
  You can simulate such a programming environment today simply by randomly reordering all lines in a block except those specifically marked not to be reordered. That no such open source tools are in use suggests that the benefit is not worth the cost.
171. Re:Not Sure I'm Getting It by JasterBobaMereel · 2008-07-03 00:30 · Score: 1
  
  The majority of PCs are used for the same tasks they were used for 10 years ago:
  Database
  Spreadsheet
  Web Browsing
  Email
  Document Writing
  None of these require massively parallel computing on a desktop machine (the server is another matter), but then they don't really require most of the power of the machines they run on now...
  A few people run more intensive apps, and servers require much more (in most cases), but the vast majority of people do not require massively powerful machines... this is why the Eee PC and similar machines are so popular...
  
  --
  Puteulanus fenestra mortis
172. Re:Not Sure I'm Getting It by Targon · 2008-07-03 00:33 · Score: 1
  
  The problem is that most applications out there are single-threaded applications, so the applications themselves do not take advantage of multi-core processors.
  Now, there are some very good uses for more cores when it comes to the game environment. AI, for example, could be broken out into one thread per NPC (enemy or friend). With more cores, the AI for each of these can become more complex. Instead of needing a single monolithic design to handle all the AI in the game, each entity could "think" independently of the others.
  Of course, that doesn't matter much for the majority of people who don't play games, but it is one use. I/O is really the big problem; going to and from the hard drive is a VERY slow process. SSD technology doesn't seem very fast because the hard drive controller isn't terribly fast in most systems. SATA may seem like a huge improvement, but it's not good enough for an SSD in most cases.
173. Re:Not Sure I'm Getting It by master_p · 2008-07-03 01:11 · Score: 1
  
  There are incredible benefits to multicore. Searching, for example, can be sped up almost linearly with the number of cores. Let's say you have an array of 10000 elements and you want to find a specific value. On a single core, you have to iterate over the array, find the element and process it. If we had 10000 cores, we would simply give one element to each core to test against the value, and the search time goes down to 1/10000.
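Below is a rough Python sketch of the chunked search described above. It is an illustration under assumptions, not a benchmark, and `find_in_chunk` and `parallel_find` are hypothetical names. The data is split into one contiguous chunk per worker, each worker scans its chunk, and a final combining step picks the lowest matching index. That combining step, plus handing out the work in the first place, is part of why real speedups fall short of the ideal 1/N. Under CPython's global interpreter lock, these threads will not actually scan in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(data, target, start, stop):
    """Linear scan of data[start:stop]; returns the first matching index or -1."""
    for i in range(start, stop):
        if data[i] == target:
            return i
    return -1


def parallel_find(data, target, workers=4):
    """Split data into up to `workers` contiguous chunks and scan them concurrently."""
    n = len(data)
    if n == 0:
        return -1
    chunk = -(-n // workers)  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        hits = pool.map(lambda b: find_in_chunk(data, target, *b), bounds)
        found = [h for h in hits if h != -1]
    # Combining step: the lowest index wins, matching what a sequential scan returns.
    return min(found) if found else -1


if __name__ == "__main__":
    data = list(range(10_000))
    print(parallel_find(data, 7777), parallel_find(data, -5))
```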
174. Re:Not Sure I'm Getting It by AlecC · 2008-07-03 01:13 · Score: 1
  
  No, it shows that there are no CPUs and accompanying execution environments capable of exploiting such techniques. It only becomes worthwhile when there are many cores capable of executing the randomly sorted code. Doing it on a single core, or any system with current inter-process communications models, would just introduce complexity for no gain.
  
  --
  Consciousness is an illusion caused by an excess of self consciousness.
175. Re:Not Sure I'm Getting It by AlecC · 2008-07-03 01:17 · Score: 1
  
  You also need to make Mahomet come to the mountain. The current model is that the mountain (data) comes to Mahomet (the processor, the active unit). If you have many cores, you put them "near" different address spaces, and the threads execute on a core "near" the data. If one area of memory gets overcrowded, the data migrates to a cooler area with more free cores. It may sound odd to bulk move the data, but if you are using small enough atoms, there will be fewer mcycles involved in a block copy than in accessing it many times over for processing.
  
  --
  Consciousness is an illusion caused by an excess of self consciousness.
176. Re:Not Sure I'm Getting It by AlecC · 2008-07-03 01:21 · Score: 1
  
  It tells us that there are no benefits on a single core, or when using threading models fundamentally designed for single cores and then bodged for small numbers of cores. Until all the randomly ordered lines can be executed simultaneously by multiple cores, such randomisation is all pain, no gain. (Except that top-end CPUs do that internally on a very fine grain with look-ahead execution etc.)
  
  --
  Consciousness is an illusion caused by an excess of self consciousness.
177. Re:Not Sure I'm Getting It by Salamander · 2008-07-03 01:38 · Score: 1
  
  You also need to make mahomet come to the mountain. The current model is that the mountain (data) comes to mahomet (the processor, the active unit). If you have many cores, you put them "near" different address spaces, and the threads execute on a core "near" the data. If one area of memory gets overcrowded, the data migrates to a cooler area with more free cores. It may sound odd to bulk move the data, but if you are using small enough atoms, there will be fewer mcycles involved in a block copy that in accessing it many times over for processing.
  The "bulk moves" you suggest sound a lot like the explicit communication (explicit from the hardware standpoint, not that of the higher-level language or API) that I did. Are we just in violent agreement, or is there some nuance that has gone unmentioned so far?
  
  --
  Slashdot - News for Herds. Stuff that Splatters.
178. Re:Not Sure I'm Getting It by a_real_bast... · 2008-07-03 01:40 · Score: 1
  
  Take a look at the bus architecture in the Cell.
  
  --
  You're making me think. You won't like me when I'm thinking.
179. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-03 01:42 · Score: 0
  
  Pleasing a man is easy. Give him your credit card.
  Misogyny: it's not as funny as you think.
180. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-03 01:43 · Score: 0
  
  I must say, I was more interested in the comments than in this "news" (Intel was announcing something like that).
181. Re:Not Sure I'm Getting It by Floritard · 2008-07-03 02:20 · Score: 1
  
  The 1000 cores might be just what you need.
182. Re:Not Sure I'm Getting It by tehcyder · 2008-07-03 02:27 · Score: 1
  
  I can break a password protected Excel file in 30 hours max with this computer, and a 10000 core chip might reduce this to 43 seconds, but other than that, what difference is it going to make?
  
  As I don't imagine anyone's breaking password protected Excel files for the fun of it, I think your answer is best expressed in the currency of your choice.
  
  --
  To have a right to do a thing is not at all the same as to be right in doing it
183. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-03 02:40 · Score: 0
  
  Bullshit. My Pentium 75 couldn't even play mp3s without stuttering when they first came out.
184. Re:Not Sure I'm Getting It by Coppit · 2008-07-03 03:01 · Score: 1
  
  I think you're wrong. Current processors use branch prediction for speculative execution. That is, they guess which way the branch will go and run with that. If they're wrong they back up and try again. Given that this approach is successful for something like over 97% of the branch choices, actually executing both branches then discarding one isn't going to get you very much.
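To see why guessing works so well, here is a simulation of the classic textbook 2-bit saturating counter for a single branch. Real predictors in current CPUs are far more elaborate, with history tables and multiple levels. Treat this as an illustration of the idea, not of any shipping design. `two_bit_predictor_accuracy` is a hypothetical name.

```python
def two_bit_predictor_accuracy(outcomes):
    """Simulate one 2-bit saturating counter predicting a single branch.

    Counter values 0-1 predict 'not taken', 2-3 predict 'taken'; each real
    outcome nudges the counter one step toward what actually happened.
    """
    if not outcomes:
        raise ValueError("need at least one branch outcome")
    counter = 2  # start in the 'weakly taken' state
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


if __name__ == "__main__":
    # A loop branch: taken 99 times, then falls through once; repeated 10 times.
    loop_branch = ([True] * 99 + [False]) * 10
    alternating = [True, False] * 50
    print(f"loop: {two_bit_predictor_accuracy(loop_branch):.1%}")
    print(f"alternating: {two_bit_predictor_accuracy(alternating):.1%}")
```

On the loop pattern the counter mispredicts only at each loop exit. A strictly alternating branch defeats it half the time, which is why real designs add branch history.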
185. Re:Not Sure I'm Getting It by jedidiah · 2008-07-03 03:05 · Score: 1
  
  Yes, because "pushing out the intelligence" of the microprocessor just worked SO WELL for them last time...
  Itanium, anyone?
  Building systems out of robust subcomponents is pretty much the bedrock of engineering. If you start "deconstructing the onion" then you end up with systems that are far too complex for mediocre engineers to handle.
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
186. Re:Not Sure I'm Getting It by jedidiah · 2008-07-03 03:14 · Score: 1
  
  > Bullshit. My Pentium 75 couldn't even play mp3s without stuttering when they first came out.
  A WinDOS problem perhaps...
  I was ripping and playing mp3s simultaneously on my 486DX4/100. That's about equivalent to a P5/60.
  So your system should have been more than capable of decoding mp3s without stuttering.
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
187. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-03 03:16 · Score: 0
  
  I really am getting sick of this mentality that we don't have problems that can be parallelized. The reason I say that is FPGAs. You say having 100 cores won't help. Welcome to 350,000-LUT FPGAs. These are effectively 350,000 VERY small processors. Each one can do 1 bit of addition. Now, I fully agree that memory bandwidth is our biggest problem, but to keep harping on many cores being ineffective is counterproductive. These problems with handling multiple cores exist because software guys only seem able to think sequentially.
188. Re:Not Sure I'm Getting It by jedidiah · 2008-07-03 03:18 · Score: 1
  
  Quite so. The OP has things BACKWARDS.
  If 1000 cores are something really potentially useful, then there needs to be someone out there RIGHT NOW who's reading this and thinking: Oh yeah. I know what to do with those cores. Bring them on and I will be able to solve problem X that's been keeping me up at night.
  Humans are always greedy beyond their means. Technology is no different.
  I could see 10 or so cores being useful in a MythTV frontend for decoding high-resolution h264 video.
  I already use about 7 cores now to do this in serial with individual files and divx or h264.
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
189. Re:Not Sure I'm Getting It by jedidiah · 2008-07-03 03:27 · Score: 1
  
  Sure it is. What's funnier is that this stuff is NOTHING compared to Glamour and Cosmo.
  You want misogyny... there's misogyny.
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
190. Re:Not Sure I'm Getting It by jhfry · 2008-07-03 03:40 · Score: 1
  
  The future of computers is the break from the GUI. In the future I expect to be able to speak to my computer, and to have it see me so I can interact with it physically.
  For example, I want my computer to wake me up in the morning like my mother did when I was in high school. I want it to hear me up and moving around, or see me come down to the toilet... and if it doesn't, I want it to keep waking me up.
  I want my computer to know when I'm planning to go on a trip because it sees me packing, and to ask me where I'm going so it can warn me about traffic and weather problems.
  I want my computer to keep track of my schedule for me and remind me about my appointments of the day as it sees me walk out the door... not on some arbitrary schedule I set.
  I want my computer to remind my kids to bring an umbrella to the bus stop, or grab their lunch when they leave for school.
  All of these things are possible, and will happen in the future. However they will require a level of parallel processing that we can only dream about. It's proven that the human mind is not like a single core design, but instead like thousands of cores each performing their own functions yet working together to accomplish even the smallest task. When the software exists to do the things listed above, it will require many cores.
  Computing today is largely sequential because we are using computers as a tool to complete one or two tasks at a time. The future is using a computer to perform thousands of simple tasks at the same time. Which lends itself well to multiple cores.
  
  --
  Sometimes the best solution is to stop wasting time looking for an easy solution.
191. Re:Not Sure I'm Getting It by Illbay · 2008-07-03 04:04 · Score: 1
  
  Are there any NON-multitasking OSes in common use any longer?
  
  --
  Any technology distinguishable from magic is insufficiently advanced.
192. Re:Not Sure I'm Getting It by m50d · 2008-07-03 04:24 · Score: 1
  
  That's only one particular implementation of Python; it happens to be the most common one, but it's not the only one. You can use IronPython and have multithreading work fine.
  Furthermore, the code to remove the GIL has been written, but doing so slows down Python on uniprocessors by a factor of 2, which is why it hasn't been merged yet. So once (if) the average system has 2 cores, removing it will be viable, and may well be done.
  
  --
  I am trolling
193. Re:Not Sure I'm Getting It by jonbryce · 2008-07-03 04:35 · Score: 1
  
  I don't think your mother's brain can be expressed in boolean logic. Until we can get round that problem, the things you describe are not going to happen, no matter how fast the hardware is.
194. Re:Not Sure I'm Getting It by Jurily · 2008-07-03 04:41 · Score: 1
  
  Yes, but we love good ol' C.
  Besides, is the overhead of using memory-safe languages smaller than the same done in hardware? And what if something gets corrupted?
195. Re:Not Sure I'm Getting It by cytg.net · 2008-07-03 05:17 · Score: 1
  
  3D is what this is all about... if this billion-core philosophy doesn't tie in with their upcoming Larrabee... well, it does. So I'd say it's part evolutionary and part "let's corner the industry into this CPU/GPU-what's-the-difference multi-multi-core line of thought."
  ...and you don't have to own a crystal ball to see that the road to multipass land is paved with patents.
196. Re:Not Sure I'm Getting It by owlstead · 2008-07-03 06:00 · Score: 1
  
  Note that you can also do this pretty well when running a virtual machine that doesn't directly address memory (e.g. JVM, but many others). You may run a hell of a lot of applications in the same address space, without them ever interfering with each other (unless intended). Of course you would need protection for your data structures such as serialization and/or immutable objects. Of course, this would make you dependent on said VM.
  Processes are starting to feel like second class citizens to me. They do really abstract applications well, but at the cost of a whole lot of connectivity with other running applications and services.
197. Re:Not Sure I'm Getting It by tgd · 2008-07-03 06:20 · Score: 1
  
  Hey, I didn't say it was a good idea. I just said I'd be willing to bet they're doing that.
198. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-03 06:23 · Score: 0
  
  Pleasing a man is easy. Give him your credit card.
  A blowjob is even better...
199. Re:Not Sure I'm Getting It by default+luser · 2008-07-03 06:50 · Score: 1
  
  A millisecond is an eternity in CPU time, especially because context switches typically happen more than once per second.
  If you're eating up 10ms out of every second (10%) with context switches, that's a lot of overhead for a modern processor.
  Multiple execution cores (or SMT) can save you from the overhead of context switches, but nothing will save you from the complexity of cache tainted by multiple processes. Two things:
  1. For independent caches, you get the incredible overhead of keeping the caches coherent (same data reflected in all caches). If a processor changes data in memory, the supporting circuitry must make sure that if this memory is cached elsewhere, the entry is invalidated or updated. The complexity of these circuits increases with every extra processor you add.
  2. For shared caches (like the Intel Nehalem and AMD Phenom L3), you get cache pollution as the current thread evicts entries used by other threads from the cache. More threads running simultaneously means more cache pollution.
  
  --
  Man is the animal that laughs.
  And occasionally whores for Karma.
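Point 1 above can be made concrete with a toy write-invalidate protocol loosely modelled on MSI (Modified/Shared/Invalid). This is a hypothetical sketch, not how Intel or AMD hardware implements coherence. It assumes one value per cache line, a broadcast "bus" and no timing, and `Bus` is an illustrative name. It shows the cost the comment describes: when N cores share a line and one of them writes it, N - 1 copies must be invalidated.

```python
class Bus:
    """Toy snooping bus running a simplified MSI write-invalidate protocol.

    Each cache maps address -> (state, value), with state "M" (modified) or
    "S" (shared); an absent address means the line is invalid in that cache.
    """

    def __init__(self, num_cores, memory=None):
        self.caches = [{} for _ in range(num_cores)]
        self.memory = dict(memory or {})
        self.invalidations = 0

    def read(self, core, addr):
        line = self.caches[core].get(addr)
        if line is not None:  # hit in M or S state
            return line[1]
        # Miss: a Modified copy elsewhere is written back and downgraded to Shared.
        for cache in self.caches:
            entry = cache.get(addr)
            if entry is not None and entry[0] == "M":
                self.memory[addr] = entry[1]
                cache[addr] = ("S", entry[1])
        value = self.memory.get(addr, 0)
        self.caches[core][addr] = ("S", value)
        return value

    def write(self, core, addr, value):
        # Invalidate every other copy, then hold the line Modified.
        # (Toy simplification: a line is one value, so nothing is fetched first.)
        for i, cache in enumerate(self.caches):
            if i != core and addr in cache:
                del cache[addr]
                self.invalidations += 1
        self.caches[core][addr] = ("M", value)


if __name__ == "__main__":
    for n in (2, 8, 100, 1000):
        bus = Bus(n)
        for c in range(n):
            bus.read(c, 0)   # every core caches address 0 as Shared
        bus.write(0, 0, 42)  # one write must invalidate all other copies
        print(n, "cores ->", bus.invalidations, "invalidations")
```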
200. Re:Not Sure I'm Getting It by default+luser · 2008-07-03 07:08 · Score: 1
  
  Yup, I remember running Winamp 2 on my Pentium 133 overclocked to 150 (75x2). The player took about 20% of CPU time. A CPU half that speed could easily decode an mp3, although it couldn't do anything else while decoding.
  
  --
  Man is the animal that laughs.
  And occasionally whores for Karma.
201. Re:Not Sure I'm Getting It by curunir · 2008-07-03 07:39 · Score: 1
  
  This is a contradiction. "Not knowing you have the desire" is the same as NOT HAVING the desire...
  You missed a crucial word in what I said that makes a world of difference. "Not knowing you have the desire" and "Not knowing yet that you have the desire" are two entirely separate things.
  The word yet implies a temporal comparison. The addition into that sentence signifies that, at some point in time, people would know that they would have the desire, but at a previous point in time they did not know that they had the desire. I'll admit that it could have been worded better, but that's just semantics.
  
  The REAL innovation of MP3 was a codec that compressed audio to a large degree (80%+) without a substantial loss of quality. Before that, "digital music" was unpopular because WAV files were too large to share using the link speeds at the time (9600 baud) and other codecs sounded like ass.
  This was my point entirely. The mp3 boom came about because technology advanced to a certain threshold where its use became feasible. The CPU reached a certain point where the researchers at the Fraunhofer institute realized that a compression algorithm that would chop 90% of the file size off a raw audio sample could work. Then hard drives and bandwidth advanced to the point where the general public could use that compression algorithm to the desired effect.
  What if CPU makers asked the question, "why make our CPUs faster...they can already do [insert cpu tasks done at the time]?" or had hard drive manufacturers ask, "why increase capacity for consumer drives since there's nothing that people need to store that is that large?" or had modem manufacturers asked, "why increase the speed of the modem when 9600 baud is fine for connecting to a BBS or telnet session?"
  But they didn't. And others found things to do with those extra CPU cycles, Megabytes and kbps that would have been very difficult to predict ahead of time. And because they didn't ask those, my mom (an extreme example of a technophobe) has an iPod that she listens to every day. She would have never known to ask for it before others had shown it was possible. But once she was shown it was possible, she wanted it and it would now be a huge loss for her to not have that ability.
  Proliferating the number of cores will have the same effect. It may require other technologies to also reach a certain threshold, but there will come a time when that advancement will play a part in enabling some use that we will consider to be essential in the future. We just don't know what that is yet.
  
  --
  "Don't blame me, I voted for Kodos!"
202. Re:Not Sure I'm Getting It by Anonymous Coward · 2008-07-03 07:53 · Score: 0
  
  To remove all the obfuscatory and overblown language, you want to use multiple cores to evaluate both sides of every branch, selecting the correct result after the branch has been resolved.
  This idea has been around for a long time, and typically has been found wanting: it's very wasteful of hardware resources and power, especially if you want to allow more than one unresolved branch.
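A rough software analogy of what is being described (usually called eager execution), as a sketch only: the three work functions are hypothetical stand-ins, and because CPython threads share the GIL this shows the structure, not a real speedup. Each extra unresolved branch doubles the number of paths in flight (2 branches means 4 paths, 3 means 8), which is exactly the waste being complained about.

```python
from concurrent.futures import ThreadPoolExecutor

def slow_condition(x):
    # Hypothetical stand-in for a branch condition that takes a while to resolve.
    return sum(range(x)) % 2 == 0

def taken_path(x):
    # Hypothetical work on the "condition true" side of the branch.
    return sum(i * x for i in range(1000))

def fallthrough_path(x):
    # Hypothetical work on the "condition false" side of the branch.
    return sum(i + x for i in range(1000))

def eager_branch(x, pool):
    # Launch the condition and BOTH sides at once, on separate workers.
    cond = pool.submit(slow_condition, x)
    if_true = pool.submit(taken_path, x)
    if_false = pool.submit(fallthrough_path, x)
    # When the condition resolves, keep one result and throw the other away.
    keep, discard = (if_true, if_false) if cond.result() else (if_false, if_true)
    discard.cancel()  # usually too late: that work was already spent
    return keep.result()

def sequential_branch(x):
    # The ordinary version: resolve the condition, then run one side only.
    return taken_path(x) if slow_condition(x) else fallthrough_path(x)

with ThreadPoolExecutor(max_workers=3) as pool:
    print(eager_branch(10, pool), sequential_branch(10))
```

Both versions return the same answer; the eager one just burns a second worker's time on a result it usually discards.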
203. Re:Not Sure I'm Getting It by D+Ninja · 2008-07-03 08:17 · Score: 1
  
  And other things like compiling software may or may not see much of an improvement depending upon the design of the source.
  Heck. You may actually get to see Gentoo finish compiling in your lifetime!
  ...yeah...I know, I know. I'm trolling. I'll go back to my basem^H^H^H^H^Hcave now.
204. Re:Not Sure I'm Getting It by D+Ninja · 2008-07-03 08:20 · Score: 1
  
  However, this will mainly effect the gaming industry.
  I don't typically harp on grammar, but effect vs. affect is the one thing I remember from English class in high school...so...
  In this case, "effect" means "brought about." So, you're saying that computing power will bring about the gaming industry.
  Try "affect", which means "influence."
205. Re:Not Sure I'm Getting It by LandDolphin · 2008-07-03 08:40 · Score: 1
  
  As I said, I placed errors there for your entertainment. :-)
  
  --
  Spelling and Grammar errors have been added to this post for your enjoyment
206. Re:Not Sure I'm Getting It by KevReedUK · 2008-07-03 09:48 · Score: 1
  
  Erm... Correct me if I'm wrong, but isn't 10ms ONE percent of a second?
  
  --
  Just my $0.03 (At current exchange rates, my £0.02 is worth more than your $0.02)
207. Re:Not Sure I'm Getting It by John+Bayko · 2008-07-03 09:48 · Score: 1
  Problems with SAS:
  
  Everything has to be compiled Position-independent, or pre-linked for a specific location
  If it's a well designed processor, that's no problem. I remember programming 6809, position independent was as easy as position dependent (providing you don't move it once it starts running).
  
  Virtual memory fragmentation as applications are loaded and unloaded
  This isn't as easy on bare metal, as Mac OS 9 and earlier showed, but possible by relocating the position independent code or data as needed. Virtual machines (or adequate runtime environments) make it easy though, since you know which values are pointers, you can update them transparently (as happens during compacting garbage collection).
  
  COW and paging get harder
  You can still have paged mapped memory with a single address space. Maybe you lose a little efficiency, but still save on context switch overhead.
  
  Where is the heap? Is there one? Or one per process?
  Memory management would be different (discontinuous), but user level code should be using an intermediate library (like malloc/free) which hides all that anyway.
  Not that I'm saying single address space is better, but a lot of the problems come mainly from using it with OS and environments designed for virtual address space - of course that's going to go badly. Legacy technology gets that way because it's more effort to change it than just adapt what came before, not because it's better.
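A toy version of the "you know which values are pointers" point, assuming a Mac OS-style handle table (the class and method names are made up for illustration): programs only ever hold handles, so compaction can slide live blocks together and fix up one table instead of hunting down every pointer.

```python
class HandleHeap:
    """Toy heap where callers hold handles (table keys), never raw addresses."""

    def __init__(self):
        self.memory = []      # flat "address space" of cells
        self.table = {}       # handle -> (start address, length)
        self.next_handle = 0

    def alloc(self, values):
        start = len(self.memory)
        self.memory.extend(values)
        handle = self.next_handle
        self.next_handle += 1
        self.table[handle] = (start, len(values))
        return handle

    def free(self, handle):
        start, length = self.table.pop(handle)
        for i in range(start, start + length):
            self.memory[i] = None  # leaves a hole, like an unloaded module

    def read(self, handle):
        start, length = self.table[handle]
        return self.memory[start:start + length]

    def compact(self):
        # Slide live blocks down in address order to close the holes.
        # Only the handle table changes; callers' handles stay valid.
        new_memory = []
        for handle, (start, length) in sorted(self.table.items(), key=lambda kv: kv[1][0]):
            self.table[handle] = (len(new_memory), length)
            new_memory.extend(self.memory[start:start + length])
        self.memory = new_memory

heap = HandleHeap()
a = heap.alloc([1, 2, 3])
b = heap.alloc([4, 5])   # think "Module 2"
c = heap.alloc([6])
heap.free(b)             # unloading leaves a hole
heap.compact()           # c moves down, but its handle still works
print(heap.read(c), heap.memory)
```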
208. Re:Not Sure I'm Getting It by John+Bayko · 2008-07-03 10:09 · Score: 1
  
  The Burroughs B5000 pretty much covered that. It was designed so that only "safe" compiler-emitted code could be run, because only a "safe" program could set the "safe" flag on an executable file (there was probably a way around that for system development, but it was good enough for normal usage).
  One thing that dismays me is that systems in the past had some amazing and wonderful technology beyond what most people these days would even imagine. Unfortunately, there was always a "simplification" trend: mainframes got refined and safer, but complex and expensive, until someone figured out that minicomputers built with newer, cheaper technology could do 80% of the job for a fraction of the cost. Then minis ran into the same problems with the remaining 20% and had to have the same refinements added, becoming complex and expensive until newer and cheaper technology meant microcomputers could do 80% of the job for a fraction of the cost, and the cycle repeated. Except it got interrupted partway with PCs.
  Unfortunately development could either do the same thing better, or do more things "well enough", and with PCs doing more things won out, leaving the people who came after to think that "good enough" is "best possible" (or it would have been changed by now, right?), and the pinnacle of computing accomplishments, whether hardware, OS, language, etc. is forgotten and becomes myth. Or is occasionally rediscovered as if it were new.
  Anyway, the point is it's good to learn history, because you never know when the problem you have now will be one already solved decades ago.
209. Re:Not Sure I'm Getting It by John+Bayko · 2008-07-03 10:26 · Score: 1
  
  I think the (much, much delayed) Sun "Rock" SPARC-based CPU does this. Maybe the design delay indicates why it's not tried more often.
210. Re:Not Sure I'm Getting It by kesuki · 2008-07-03 11:30 · Score: 1
  
  Actually, there were a few really bad, really slow MP3 players out there. I remember trying 3 players before I found one that worked the way I wanted it to. But then I was using my 486 until I got a laptop with a Pentium 120.
  Oh yeah, and I had 48 MB of RAM, the maximum my system supported. RAM can make a huge difference with badly coded software...
  
  --
  https://www.gnu.org/philosophy/free-sw.html
211. Re:Not Sure I'm Getting It by Spy+Hunter · 2008-07-03 14:26 · Score: 1
  
  Yes, but we love good ol' C.
  More's the pity. This attitude has been holding CS back for years...
  
  Besides, is the overhead of using memory-safe languages smaller than the same done in hardware?
  Yes. By a lot. But don't just take my word for it; there are published papers from the Singularity guys.
  
  And what if something gets corrupted?
  For all intents and purposes, the JVM and the CLR are perfect in their memory safety. (If you think otherwise, I'm sure you could get a lot of money for your remote code execution exploit against Silverlight or Java applets...) The only "corruption" likely to happen is hardware failure, which traditional OSes are also vulnerable to. One flipped bit in the wrong place can bring down any computer system.
  It would be an interesting experiment to try flipping random bits in memory to see which OS is more resistant to such events on average, but I don't think you can a priori declare that e.g. Singularity would be worse than Windows or Linux. Furthermore, the occurrence of such events is so rare, and even when one does occur the likelihood that it actually hits a bit that is important is so small, that it is probably not even worth worrying about to any great extent.
  
  --
  main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
212. Re:Not Sure I'm Getting It by Hal_Porter · 2008-07-03 19:43 · Score: 1
  
  NT-based OSs have run on platforms that don't guarantee cache coherency in the past. In fact, Windows still runs on Itanium, and that AFAIK has RISC-like cache coherency semantics, i.e. software needs to use special instructions to force it when needed.
  Now you could imagine an Intel platform which would boot with x86/x64 automatic cache coherency enforcement, though actually it would be rather inefficient, essentially an emulation. Once a recent OS started, the HAL would use special instructions to say "From now on don't enforce it". From that point on it would use special instructions to explicitly flush caches when needed. Intel chips already have MONITOR and MWAIT instructions to handle spinlocks effectively.
  So your processor would run old OSs, albeit not quickly, and new OSs quickly if they knew how to switch to software visible coherency management. It could even run old OSs quickly if you used an updated HAL.DLL
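To make "software-visible coherency management" concrete, here is a toy model, assuming each core has a private write-back cache that the hardware never synchronizes. The `flush` and `invalidate` methods are hypothetical stand-ins for whatever special instructions a real ISA provides; they are not Itanium's (or anyone's) actual instructions.

```python
class NonCoherentCore:
    """A core with a private write-back cache that hardware does NOT keep coherent."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: dict of address -> value
        self.cache = {}       # this core's private cached copies
        self.dirty = set()    # addresses written locally but not yet written back

    def load(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]

    def store(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)

    def flush(self, addr):
        # Explicit write-back: nothing reaches memory unless software asks.
        if addr in self.dirty:
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        # Explicitly drop a possibly stale copy (any unflushed write is lost).
        self.cache.pop(addr, None)
        self.dirty.discard(addr)

shared = {0: 0}
writer = NonCoherentCore(shared)
reader = NonCoherentCore(shared)

reader.load(0)           # reader caches the old value
writer.store(0, 42)      # the write sits in writer's private cache
stale = reader.load(0)   # still 0: nothing told reader's cache
writer.flush(0)          # software-requested write-back...
reader.invalidate(0)     # ...and software-requested invalidate
fresh = reader.load(0)
print(stale, fresh)
```

Forgetting either the flush or the invalidate leaves the reader with the stale value, which is the bookkeeping an updated HAL would have to get right.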
  
  --
  echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
213. Re:Not Sure I'm Getting It by Hal_Porter · 2008-07-03 20:01 · Score: 1
  
  If it's a well designed processor, that's no problem. I remember programming 6809, position independent was as easy as position dependent (providing you don't move it once it starts running).
  Actually, one of the interesting side effects of segmentation in x86 was that code was position independent, since all addresses were an offset from a segment register. TSRs and device drivers would frequently move themselves around in memory in an attempt to reduce their resident size. Since each module had its own CS and DS values, the code would still work if you did this. It meant that you could start off with code like this:
  Module 1 Code
  Module 1 Data
  Module 2 Code
  Module 2 Data
  Module 3 Code
  Module 3 Data
  Now suppose you run and find that you don't need Module 2 on this system you can do this
  Module 1 Code
  Module 1 Data
  Module 3 Code
  Module 3 Data
  Modules 1 and 3 could chuck away all their initialization code and data too. So the resident size could be tiny, a few hundred bytes even if the load size was hundreds of kilobytes. The constant values in the segment registers would change during the move of course, but you could just correct the constants the interrupt service routines loaded.
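In real-mode x86 a physical address is segment * 16 + offset, so code that only ever uses offsets keeps working after a move, provided the segment value is updated. A small model of sliding "Module 3" down into Module 2's old space (the segment values, sizes and function names are illustrative assumptions):

```python
def physical_address(segment, offset):
    # Real-mode x86: 20-bit address = segment * 16 + offset, wrapping at 1 MiB.
    return ((segment << 4) + offset) & 0xFFFFF

def move_module(memory, old_seg, new_seg, size):
    # Copy a module's bytes to a new paragraph-aligned (16-byte) location.
    src = physical_address(old_seg, 0)
    dst = physical_address(new_seg, 0)
    memory[dst:dst + size] = memory[src:src + size]
    # The only thing to patch is the segment constant the ISRs load.
    return new_seg

memory = bytearray(1 << 20)            # 1 MiB real-mode address space
module3_seg = 0x2000
data_offset = 0x10                     # offset baked into Module 3's instructions
memory[physical_address(module3_seg, data_offset)] = 0xAB

# Module 2 was unloaded, so slide Module 3 down into its space.
module3_seg = move_module(memory, module3_seg, 0x1800, size=0x100)
after_move = memory[physical_address(module3_seg, data_offset)]
print(hex(after_move))
```

The instruction that reads `data_offset` never changes; only the segment value does.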
  x64 is supposed to be quite PIC of course, there's an RIP (the 64 bit instruction pointer) relative address mode you can use. x86 wasn't at all PIC - the OS loader needed to fix up addresses if an executable was loaded at an address other than the one it was linked to run at. There's a cost for that - at least in Windows pages containing code are mapped copy-on-write, so even one fixup in a page forces that copy of the executable to have a private copy of the entire 4kB page.
  In fact from what I've read Microsoft pressured AMD to add RIP relative addressing because of this. x64 was meant for servers, and servers apparently spend significant amounts of memory on copy-on-write pages due to fixups on x86.
  http://www.nynaeve.net/?p=192
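A simplified model of the load-time fixup cost described above: add the load delta to every absolute address listed in a fixup table, and count which 4 kB pages got written, since each such page becomes a private copy under copy-on-write. This is not the real PE .reloc format; the flat list of offsets and the function name are assumptions, and it assumes no 32-bit fixup straddles a page boundary.

```python
PAGE_SIZE = 4096

def apply_fixups(image, fixup_offsets, preferred_base, actual_base):
    """Patch absolute 32-bit little-endian addresses in `image` in place.

    Returns the set of page numbers touched, i.e. the pages that would stop
    being shared and become private copies under copy-on-write mapping.
    """
    delta = actual_base - preferred_base
    touched = set()
    if delta == 0:
        return touched  # loaded where it was linked: nothing to patch
    for off in fixup_offsets:
        old = int.from_bytes(image[off:off + 4], "little")
        new = (old + delta) & 0xFFFFFFFF
        image[off:off + 4] = new.to_bytes(4, "little")
        touched.add(off // PAGE_SIZE)
    return touched

image = bytearray(3 * PAGE_SIZE)                  # a tiny 3-page "executable"
image[8:12] = (0x00401000).to_bytes(4, "little")  # an absolute pointer into itself
dirty_pages = apply_fixups(image, [8, 100, 8200], 0x400000, 0x500000)
print(sorted(dirty_pages))
```

Three fixups cost two private pages here; RIP-relative code needs no fixups at all, so its pages stay shared.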
  
  --
  echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
214. Re:Not Sure I'm Getting It by 10Ghz · 2008-07-03 20:32 · Score: 1
  
  "That's what I'm curious about. Having 2 cores is enough for most consumers, one for the OS and background tasks and one for the application you're using. And that's overkill for most users."
  But with dozens (if not more) cores, individual tasks are (or rather, should be) broken up more.
  So you talk about having "one core for the OS". Well, the OS could be doing system-wide spellchecking, so that task could get one core dedicated to it. The search tool could be indexing the HD and creating semantic connections between files and people, so that task gets a dedicated core. And so forth. And I didn't even mention "generic housekeeping".
  As to the app I'm using... suppose it's an app to manage my photo library. One core is busy creating previews for my images, another core is busy indexing the photos, and a third core is busy adjusting the exposure, since I'm editing one of the pictures.
  Or what if I'm playing a game? One core could be handling the physics, another might be handling the sound (each channel could have a dedicated core), another core could be setting up the graphics, preparing to push them to the GPU, another could be handling the AI (hell, individual AI elements in the game could get an individual core!)...
  That's already a bunch of cores being used right there.
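A minimal sketch of the photo-library case, one job per worker, assuming hypothetical task functions. With CPython threads the CPU-bound parts still share the GIL, so `ProcessPoolExecutor` would be the swap for real multi-core execution; threads just keep the example simple.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the photo-library jobs described above.
def make_previews(photos):
    return [f"thumb:{p}" for p in photos]

def index_photos(photos):
    # Pretend "index": file name -> name length.
    return {p: len(p) for p in photos}

def adjust_exposure(pixels, stops):
    # Each stop doubles (or halves) brightness; clamp to the 8-bit range.
    factor = 2 ** stops
    return [min(255, round(v * factor)) for v in pixels]

photos = ["beach.jpg", "cat.png", "party.jpg"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # Three independent jobs submitted at once, one per worker.
    previews = pool.submit(make_previews, photos)
    index = pool.submit(index_photos, photos)
    edited = pool.submit(adjust_exposure, [10, 100, 200], 1)
    results = (previews.result(), index.result(), edited.result())
print(results)
```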
  
  --
  Lesbian Nazi Hookers Abducted by UFOs and Forced Into Weight Loss Programs - -all next week on Town Talk.
215. Re:Not Sure I'm Getting It by rootooftheworld · 2008-07-04 03:55 · Score: 1
  
  how much time would it take to break government cryptos? *thunk*
  
  --
  I know full well that tobacco is bad for you, so I smoke weed with crack
216. Re:Not Sure I'm Getting It by PrntlUnit27 · 2008-07-04 05:11 · Score: 1
  
  I can break a password protected Excel file in 30 hours max with this computer, and a 10000 core chip might reduce this to 43 seconds, but other than that, what difference is it going to make?
  29 hours 59 minutes 17 seconds?
  On my old Intel the difference was 287 years.
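Password guessing is the textbook "embarrassingly parallel" job: split the keyspace into one chunk per core and let each core search its own chunk. A sketch, assuming a made-up scheme where candidates are numbers hashed with SHA-256 (nothing to do with how Excel actually protects files). With ideal scaling the worst case shrinks by the core count, so 30 hours over 10,000 cores would be about 11 seconds; real overheads only make that worse.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def split_keyspace(total, workers):
    """Divide range(total) into `workers` contiguous, nearly equal chunks."""
    base, extra = divmod(total, workers)
    chunks, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        chunks.append(range(start, end))
        start = end
    return chunks

def search(chunk, target_digest):
    # Hypothetical scheme: a candidate "password" is just its number as text.
    for n in chunk:
        if hashlib.sha256(str(n).encode()).hexdigest() == target_digest:
            return n
    return None

def crack(target_digest, total, workers):
    # One chunk per worker; this sketch lets the other workers finish their
    # chunks instead of cancelling them once a match turns up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = split_keyspace(total, workers)
        for found in pool.map(lambda c: search(c, target_digest), chunks):
            if found is not None:
                return found
    return None

target = hashlib.sha256(b"12345").hexdigest()
print(crack(target, total=20_000, workers=4))
```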
217. Re:Not Sure I'm Getting It by taylorjonl · 2008-07-04 23:20 · Score: 1
  
  This is true; read up on the OpenSPARC T2 processor and how it is able to run 8 concurrent threads per core. I think core counts like this will help with AI. After all, the human brain is very similar to a computer with many, many cores. It doesn't process fast, but the processing happens concurrently on a very large scale.
218. Re:Not Sure I'm Getting It by Jurily · 2008-07-10 17:10 · Score: 1
  
  Yes, but we love good ol' C.
  More's the pity. This attitude has been holding CS back for years...
  Please enlighten me. What language is your sig in again?
  
  But don't just take my word for it; there are published papers from the Singularity guys.
  Sure, they're going to publish a paper that says C is more efficient than their garbage-collecting crap, right?
  
  but I don't think you can a priori declare that e.g. Singularity would be worse than Windows or Linux.
  Sure they're not. But they also don't claim to be, nor do they claim to be 'memory safe'.
  
  Furthermore, the occurrence of such events is so rare,
  If you didn't see that one happen, you need to use computers more. Sorry.
219. Re:Not Sure I'm Getting It by Spy+Hunter · 2008-07-10 18:07 · Score: 1
  
  I'm sorry, can you try making a coherent argument that isn't ad hominem? Thanks.
  
  --
  main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Imagine! by gbjbaanb · 2008-07-02 08:44 · Score: 0

Now, imagine a Beowulf cluster of... oh, never mind.
Great... by Amarok.Org · 2008-07-02 08:44 · Score: 4, Funny

As if Oracle licensing wasn't complicated enough already...

--
-- "Other than that, how was the play Mrs. Lincoln?"
1. Re:Great... by MBGMorden · 2008-07-02 09:05 · Score: 1
  
  IBM is just as bad. They charge by performance unit, and a core is worth more or less depending on the number of cores present on the chip, the number of processors present in the machine, and the architecture of the chip in question.
  Single-core x86 chips would be 100 units per core, but dual- or quad-core x86 chips would be 50 units per core.
  Then, say, SPARC-based chips might be worth 60 units per core regardless of how many cores per chip.
  Then there is the absurdity of their AS/400 machines, which ship with a certain number of processors already in the machine anyway, but you have to pay extra (aside from extra software licensing) to have additional ones activated if you need them...
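The per-core arithmetic from the comment, as a sketch. The rates below are only the example numbers given above, not IBM's actual price list, and the function names are made up:

```python
def units_per_core(arch, cores_per_chip):
    # Illustrative rates from the comment above, not IBM's real table.
    if arch == "x86":
        return 100 if cores_per_chip == 1 else 50
    if arch == "sparc":
        return 60  # same rate regardless of cores per chip
    raise ValueError(f"no example rate for {arch!r}")

def units_required(arch, cores_per_chip, chips):
    # Total licence units = rate per core x cores per chip x chips.
    return units_per_core(arch, cores_per_chip) * cores_per_chip * chips

for arch, cores, chips in [("x86", 1, 2), ("x86", 4, 2), ("sparc", 8, 1)]:
    print(arch, cores, chips, units_required(arch, cores, chips))
```

Under these rates a two-socket box of quad-core x86 chips needs twice the units of a two-socket single-core box, even though the per-core rate was halved.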
  
  --
  "People who think they know everything are very annoying to those of us who do."-Mark Twain
2. Re:Great... by Penguinisto · 2008-07-02 09:15 · Score: 2, Interesting
  
  ...then again, I can see it as an argument for vendors to finally --finally!-- stop counting "processors" as their license limit metric. And yes, VMware, I'm talking to you too when I say that.
  /P
  
  --
  Quo usque tandem abutere, Nimbus, patientia nostra?
3. Re:Great... by Detritus · 2008-07-02 15:18 · Score: 1
  
  1. How much money do you have?
  2. Give us 25% of line 1.
  
  --
  Mea navis aericumbens anguillis abundat
Memory bandwidth? by Brietech · 2008-07-02 08:45 · Score: 5, Interesting

If you can get a thousand cores on a chip, and you still only have enough pins for a handful (at best) of memory interfaces, doesn't memory become a HUGE bottleneck? How do these cores not get starved for data?

--
I'm perfect in every way, except for my humility.
1. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 08:47 · Score: 0
  
  Always has, always will be.
2. Re:Memory bandwidth? by Piranhaa · 2008-07-02 08:54 · Score: 1
  
  If the memory controller is built onto the silicon, each core has access to the cache directly, and there is enough bandwidth between the cache and memory, I don't see this being a problem. I'm quite sure they have this figured out :)
3. Re:Memory bandwidth? by smaddox · 2008-07-02 08:55 · Score: 2, Interesting
  
  Memory would have to be completely redefined. Currently, you have one memory bank that is effectively accessed serially.
  If you have 1000 cores that depend on the same data, you would have to have a way of multicasting the data to the cores, which could then select the data they want.
  Basically, hardware and software architecture has to be completely redefined.
  It is not impossible, though. Just look around. The universe computes in parallel all the time.
4. Re:Memory bandwidth? by lazyDog86 · 2008-07-02 08:57 · Score: 2, Insightful
  
  I would assume that if you have enough transistors for thousands of cores, you will be able to put a lot of SRAM cache on as well - just drop a few hundred or thousand cores. You won't be able to integrate DRAM since it requires a different process, but SRAM should be integrated easily enough.
  
  --
  my insights may be modded Funny, but at least some of my jokes are modded Insightful
5. Re:Memory bandwidth? by tt465857 · 2008-07-02 08:59 · Score: 2, Interesting
  
  3D integration schemes, which IBM and Intel are both pursuing, help deal with this problem. As you noted, you can't put enough pins on a chip with traditional packaging to achieve a sufficient memory bandwidth. But with 3D integration, the memory chips are connected directly to the CPUs with "through-chip vias". You can have tens of thousands of these vias, and as a bonus, the distance to the memory is extremely short, so latency is reduced.
  - Trevor -
  [[self-construction]]: The autotherapeutic diary of a crazy geek's journey back to mental health
6. Re:Memory bandwidth? by Gewalt · 2008-07-02 09:12 · Score: 2, Insightful
  
  Not really. If you can put 1000 cores on a processor, then I don't see why you can't put 100 or so layers of RAM on there too. Eventually, it will become a requirement to get the system to scale.
  
  --
  Modding Trolls +1 inciteful since 1999
7. Re:Memory bandwidth? by Penguinisto · 2008-07-02 09:19 · Score: 1
  
  Could be, but that could be solved (at least physically) by using daughterboards/Slot 1 like rigs and by physically breaking up the CPU into discrete chips (which in turn would offer an interesting way to upgrade... don't want to buy a whole new CPU? No problem, just buy some additional 'core pack' chips and plug 'em into empty daughterboard slots).
  
  Never underestimate the ingenuity of an engineer when there's a potential to make shitloads of money off of the solution, even if that solution isn't the most optimal or elegant.
  /P
  
  --
  Quo usque tandem abutere, Nimbus, patientia nostra?
8. Re:Memory bandwidth? by bluefoxlucid · 2008-07-02 09:33 · Score: 4, Insightful
  
  Memory would have to be completely redefined. Currently, you have one memory bank that is effectively accessed serially.
  Yes, in Intel land. AMD has this thing called NUMA. What do you think "HyperTransport" means?
  
  --
  Support my political activism on Patreon.
9. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 09:35 · Score: 2, Insightful
  
  You need a basic course in TTL. No, they haven't figured this out, and putting address decoding on the chip makes very little difference when you scale. They also haven't figured out communication between cores. We had 1000s of CPUs rigged up with transputers back in the 80s. It was a mare, and near useless for just about everything. We had to use serial data to make things sane.
  The more logic you have, the longer the signal path. The longer the signal path, the harder it is to sync on the clock pulse. The higher the clock frequency, the less like a square wave the signal is; it starts to look like a ramp.
  There are huge problems with scaling, whether it's speed or cores. If Intel wants us to have all these cores, their engineers are going to have to overcome the same problems parallel programming has had for 30 years or more.
10. Re:Memory bandwidth? by AnyoneEB · 2008-07-02 09:45 · Score: 1, Informative
  
  Intel is finally catching up to AMD on that front with Nehalem.
  
  --
  Centralization breaks the internet.
11. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 09:53 · Score: 0
  
  Don't be surprised so many of us don't know what HyperTransport means. I've only seen the original video, but I don't think HT was shown or described by the NUMA guy. Mainly he just seems to be ripping on Maya - e.g., "Maya, hee!", "Maya, who?" - but without making a clear case for other 3D graphics software.
12. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 09:56 · Score: 0
  
  Welcome to November, 2008, and the launch of Intel's 'Nehalem' processor and chipsets. After disparaging AMD's approach for so long, they've 'seen the light' and are going with something very similar.
13. Re:Memory bandwidth? by NickDngr · 2008-07-02 10:03 · Score: 1
  
  If you can get a thousand cores on a chip, and you still only have enough pins for a handful (at best) of memory interfaces, doesn't memory become a HUGE bottleneck? How do these cores not get starved for data?
  Nehalem
  
  --
  Yoda of Borg am I! Assimilated shall you be! Futile resistance is, hmm?
14. Re:Memory bandwidth? by kipman725 · 2008-07-02 10:03 · Score: 1
  
  Well, stuff like VRAM is already dual-ported, meaning multiple bits of hardware can read and (under some restrictions) write to the same memory IC. This, however, increases the wafer area for each memory cell with each additional port.
15. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 10:26 · Score: 0
  
  Numa numa numa yay!
16. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 12:03 · Score: 0
  
  Memory would have to be completely redefined. Currently, you have one memory bank that is effectively accessed serially.
  Yes, in Intel land. AMD has this thing called NUMA. What do you think "HyperTransport" means?
  Intel will respond with a dual-core version. It'll be called NUMA-NUMA
17. Re:Memory bandwidth? by stinkbomb · 2008-07-02 13:36 · Score: 1
  
  You know, Trevor, I've turned off sig blocks for a reason. It's just plain rude to jam one into your post just to make sure that everyone sees it.
18. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 15:24 · Score: 0
  
  NUMA in a one chip system? Explain how. .... *cricket chirp* ....
  Thanks.
  
  If you can get a thousand cores on a chip, and you still only have enough pins for a handful (at best) of memory interfaces, doesn't memory become a HUGE bottleneck? How do these cores not get starved for data?
  Still a valid question.
19. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 18:45 · Score: 0
  
  Intel is addressing the disparity with AMD's memory interface in a future chip. I think it's codenamed Nehalem, but I could be mistaken. My guess is that they were smart enough to realize that they could spank AMD in desktop performance without 'wasting' any die space on an on-chip memory controller, then wait for their process lead to increase enough to add it comfortably.
20. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 19:00 · Score: 0
  
  AMD has this thing called NUMA.
  
  maya-hi maya-ha maya-ha maya-ha-ah!
21. Re:Memory bandwidth? by Anonymous Coward · 2008-07-02 21:00 · Score: 0
  
  That only works if you have a relatively small number of cores per chip. If you have an 8-chip AMD server with a hypothetical 128 cores per CPU, you would still have 128 cores trying to get their data through a single dual-channel memory controller (or worse, HyperTransport).
  Granted, it would be only 128 cores on two 64-bit memory channels instead of 512 or 1024 on a similar Intel system, but it's still not gonna work.
  (Besides, the current Intel chips will be the last ones without the IMC; the new generation of Intel chips (Nehalem, benches are out already) will be much more AMD-like in terms of platform design.)
22. Re:Memory bandwidth? by LordMyren · 2008-07-02 21:22 · Score: 1
  
  Yes memory bandwidth is obviously a concern.
  Intel is switching to integrated memory controllers on each die with the upcoming Nehalem. It has a tri-channel memory controller that will eventually run 1.3GHz. That's 32GB/s of memory bandwidth -per socket-. Get four Nehalems together and you'll have (almost) the same aggregate bandwidth as a reasonably top-of-the-line graphics card.
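The 32GB/s figure can be checked with peak = channels x transfers per second x bytes per transfer. This assumes "1.3GHz" means DDR3-1333 (1333 million transfers/s) on standard 64-bit channels; the per-core split puts the top-level "starved for data" worry in numbers, assuming every core streams from memory at the same time.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_width_bits=64):
    """Peak DRAM bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

def per_core_mb_s(total_gbs, cores):
    """Each core's share in MB/s if all cores stream from memory at once."""
    return total_gbs * 1e3 / cores

socket = peak_bandwidth_gbs(channels=3, mega_transfers_per_s=1333)
print(round(socket, 1))                      # roughly 32 GB/s per socket
print(round(per_core_mb_s(socket, 4)))       # shared by a quad-core die
print(round(per_core_mb_s(socket, 1000)))    # the same pins feeding 1000 cores
```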
23. Re:Memory bandwidth? by explorer107 · 2008-07-02 23:33 · Score: 1
  
  3D stacking is probably that technology, which can lead to memory-on-processor designs with pretty much unlimited memory bandwidth.
  MulticoreInfo.com
24. Re:Memory bandwidth? by Anonymous Coward · 2008-07-03 00:50 · Score: 0
  
  So, people should go partially asynchronous on cores and memory access. Hmm, a software-configurable timing network for synchronization...?
25. Re:Memory bandwidth? by Rhys · 2008-07-03 03:05 · Score: 1
  
  I think HT means you don't understand the problem: NUMA and HT are irrelevant. Intel is talking 1000s of cores in a single die. That's still a single memory link for the AMD memory-controller-on-chip CPU too. At best you could scale it out to a few dozen links, but at that point the pin requirements for memory get troublesome.
  What they're talking about "redefined" is processor-in-memory. Or memory-in-processor. Take it whatever way you want, the result is going to be about the same. Research into that has been going on for quite a while in the academic CS realm.
  
  --
  Slashdot Patriotism: We Support our Dupes!
Generic jokes by Toe,+The · 2008-07-02 08:45 · Score: 0, Redundant

Maybe Program X will finally not be so slow.
It's a series of tubes; um cores.
How about a Beowulf clust... I can't even do that one.
1. Re:Generic jokes by cashman73 · 2008-07-02 08:58 · Score: 0, Redundant
  
  Once they squeeze the 10,000th core into that, most people ought to be able to run Windows Vista just fine... maybe even Duke Nukem Forever, too? ;-)
2. Re:Generic jokes by TaoPhoenix · 2008-07-02 09:03 · Score: 5, Funny
  
  In the Soviet Union ...
  Oh wait... the Soviet Union already broke into smaller cores.
  
  --
  My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
3. Re:Generic jokes by Anonymous Coward · 2008-07-02 09:19 · Score: 0
  
  In Soviet Russia, cores prepare for thousands of you!
4. Re:Generic jokes by Anonymous Coward · 2008-07-02 10:22 · Score: 0
  
  Program X, sure. But not Adobe Acrobat. Never surrender!
Disagreement about this trend by Raul654 · 2008-07-02 08:46 · Score: 5, Interesting

At Supercomputing 2006, they had a wonderful panel where they discussed the future of computing in general, and tried to predict what computers (especially Supercomputers) would look like in 2020. Tom Sterling made what I thought was one of the most insightful observations of the panel -- most of the code out there is sequential (or nearly so) and I/O bound. So your home user checking his email, running a web browser, etc is not going to benefit much from having all that compute power. (Gamers are obviously not included in this) Thus, he predicted, processors would max out at a "relatively" low number of cores - 64 was his prediction.
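Amdahl's law is one standard way to put numbers on that point (it is not necessarily how Sterling argued it): if only a fraction p of a program can run in parallel, N cores give a speedup of 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p). The p values below are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for p in (0.5, 0.9, 0.95):
    row = [round(amdahl_speedup(p, n), 1) for n in (4, 64, 1000)]
    print(f"p={p}: speedup on 4/64/1000 cores = {row}")
```

At p = 0.95 the ceiling is 20x no matter how many cores you add, which is the shape of the argument for a modest core count on mostly sequential workloads.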

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
1. Re:Disagreement about this trend by RailGunSally · 2008-07-02 08:57 · Score: 5, Funny
  
  Sure! 64 cores should be enough for anybody!
2. Re:Disagreement about this trend by jez9999 · 2008-07-02 08:59 · Score: 1, Flamebait
  
  How is it possible to get a good, responsive end-user e-mail experience with a mere 64 cores?
  
  --
  == Jez ==
  Do you miss Firefox? Try Pale Moon.
3. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 09:02 · Score: 0
  
  64 cores is enough for anybody.
4. Re:Disagreement about this trend by tzhuge · 2008-07-02 09:11 · Score: 3, Funny
  
  Sure, until a killer app like Windows 8 comes along and requires a minimum of 256 cores for email, web browsing and word processing. Interpret 'killer app' how you want in this context.
5. Re:Disagreement about this trend by clampolo · 2008-07-02 09:13 · Score: 0
  
  I disagree with this guy. The average automobile has over 300 processors in it. Why would a computer max out at 64?
  Part of the reason for this is that things become easier when you have a separate processor running things. Instead of having a single processor share time between the cruise control, the engine controller, turning signals, variable speed wipers, etc... you just have a different processor for each and glue them together. I suspect that once processors become ubiquitous software engineers will learn this and use up all the processors as well to make simpler code.
6. Re:Disagreement about this trend by RightSaidFred99 · 2008-07-02 09:22 · Score: 4, Insightful
  
  His premise is flawed. People using email, running a web browser, etc... hit CPU speed saturation some time ago. A 500MHz CPU can adequately serve their needs. So they are not at issue here. What's at issue is next generation shit like AI, high quality voice recognition, advanced ray tracing/radiosity/whatever graphics, face/gesture recognition, etc... I don't think anyone sees us needing 1000 cores in the next few years.
  My guess is 4 cores in 2008, 4 cores in 2009, moving to 8 cores through 2010. We may move to a new uber-core model once the software catches up, more like 6-8 years than 2-4. I'm positive we won't "max out" at 64 cores, because we're going to hit a per-core speed limit much more quickly than we hit a number-of-cores limit.
7. Re:Disagreement about this trend by the_olo · 2008-07-02 09:25 · Score: 3, Interesting
  
  So your home user checking his email, running a web browser, etc is not going to benefit much from having all that compute power. (Gamers are obviously not included in this)
  You've excluded gamers as if this had been some nearly extinct exotic species. Don't they contribute the most to PC hardware market growth and progress?
8. Re:Disagreement about this trend by Gewalt · 2008-07-02 09:28 · Score: 1
  
  Wow, that guy has no vision at all. Does he really think computing is going to stop evolving for 20 years?
  
  --
  Modding Trolls +1 inciteful since 1999
9. Re:Disagreement about this trend by eht · 2008-07-02 09:31 · Score: 2, Interesting
  
  We've pretty much already hit a per-core speed limit; you really can't find many CPUs running over 3 GHz, whereas back in P4 days you'd see them all the way up to 3.8 GHz.
  Architectures and other things have changed, allowing a single core of a current 3.2 GHz chip to easily outperform the old 3.8 GHz parts, but then why don't we see new 3.8 GHz chips?
10. Re:Disagreement about this trend by MojoRilla · 2008-07-02 09:34 · Score: 5, Insightful
  
  This seems silly. If you create more compute power, someone will think of ways to use it.
  
  Web applications are becoming more AJAX'y all the time, and they are not sequential at all. Watching a video while another tab checks my Gmail is a parallel task. All indications are that people want to consume more and more media on their computers. Things like the MLB mosaic allow you to watch four games at once.
  
  Have you ever listened to a song through your computer while coding, running an email program, and running an instant messaging program? There are four highly parallelizable tasks right there. Not compute intensive enough for you? Imagine the song compressed with a new codec that is twice as efficient in terms of size but twice as compute intensive. Imagine the email program indexing your email for efficient search, running algorithms to assess the email's importance to you, and virus checking new deliveries. Imagine your code editor doing on the fly analysis of what you are coding, and making suggestions.
  
  "Normal" users are doing more and more with computers as well. Now that fast computers are cheap, people who never edited video or photos are doing it. If you want a significant market besides gamers who need more cores, it is people making videos, especially HD videos. Sure, my Grandmother isn't going to be doing this, but I do, and I'm sure my children will do it even more.
  
  And don't forget about virus writers. They need a few cores to run on as well!
  
  Computer power keeps its steady progress higher, and we keep finding interesting things to do with it all. I don't see that stopping, so I don't see a limit to the number of cores people will need.
11. Re:Disagreement about this trend by drinkypoo · 2008-07-02 09:38 · Score: 1
  
  Actually, I think this is pretty much the year of the quad-core if people stay at all on their timelines. Intel is slated to bring out the mobile quad-cores in August and I am slated to buy one :D But nobody even needs 2 cores this year, they're just everywhere and it sure is nice to have more than one CPU core chugging away.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
12. Re:Disagreement about this trend by BlueHands · 2008-07-02 09:38 · Score: 2, Insightful
  
  I KNOW it is so very often cited, but if ever there was a time to mention the "5 computers in the whole world" line, it is this. In fact, I would dare say that is the whole point of this push by Intel: trying to get people (programmers) used to the thought of having so many parallel CPUs in a home computer.
  Sure, from where we stand now, 64 seems like a lot, but maybe a core for nearly each pixel on my screen makes sense and has real value to add. Or how about just flat-out smarter computers, something which might happen by simulating 100 neurons per core? As far as I understand it, speech recognition can always use more power. Let me put it differently:
  Games requiring a lot of computing power in the future make sense to you, but nothing else does. The same would have been said about a high-end gaming rig just a handful of years ago, and yet a low-end PC today has amazing graphics, amazing everything, compared to what things were just 10 years ago. And it gets used, much of the time. If we have the power, we will use it. Games just push the envelope further, sooner, but they don't go anywhere that we all wouldn't like to go anyway.
  I cannot think of a single task in a game that I would not want to be able to do in real life. Games are about living an idealized life, of some sort, inside your computer. The next step is to bring it out here, to the rest of the world.
  
  --
  I mod everyone down who says "I'll get modded down for this." I hate to disappoint.
13. Re:Disagreement about this trend by drinkypoo · 2008-07-02 09:42 · Score: 5, Interesting
  
  Architectures and other stuff have changed, allowing a current 3.2GHz core to easily outperform the old 3.8GHz chips, but then why don't we see new 3.8GHz parts?
  The Pentium 4 is, well, it's scary. It actually has "drive" stages because it takes too long for signals to propagate between functional blocks of the processor. This is just wait time, for the signals to get where they're going.
  The P4 needed a super-deep pipeline to hit those kinds of speeds as a result, and so the penalty for branch misprediction was too high.
  What MAY bring us higher clock rates again, though, is processors with very high numbers of cores. You can make a processor broad, cheap, or fast, but not all three. Making the processors narrow and simple will allow them to run at high clock rates and making them highly parallel will make up for their lack of individual complexity. The benefit lies in single-tasking performance; one very non-parallelizable thread which doesn't even particularly benefit from superscalar processing could run much faster on an architecture like this than anything we have today, while more parallelizable tasks can still run faster than they do today in spite of the reduced per-core complexity due to the number of cores - if you can figure out how to do more parallelization. Of course, that is not impossible.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
14. Re:Disagreement about this trend by superdave80 · 2008-07-02 09:49 · Score: 1
  
  You are comparing the processors in my car to the processors in my computer? That's so not in the same ballpark. My car might have 300 processors in it, but I guarantee that their combined processing power doesn't measure up to the power that my desktop computer has. Those things are simple, task-specific processors. Which do you think takes more power: my 3D modeling software, or managing my turn signal?
15. Re:Disagreement about this trend by danomac · 2008-07-02 10:00 · Score: 1
  
  People using email, running a web browser, etc... hit CPU speed saturation some time ago. A 500MHz CPU can adequately serve their needs.
  Ah, yes. However, who knows what malware/virus/etc will be out then. Antispam/virus products seem to require 2GHz and 1GB of RAM by themselves now to run at a reasonable speed. By 2020 we may have 64 cores, and 48 of them will be protecting you from malware. :P
16. Re:Disagreement about this trend by mrchaotica · 2008-07-02 10:08 · Score: 1
  
  The average automobile has over 300 processors in it.
  By that standard, so does your PC! Each ALU in your (superscalar) CPU, each cache controller in your CPU, the disk controllers on your hard drives, the 2^N cores in your GPU, the memory controller on your video card, the northbridge, the southbridge, various minor ones on other expansion cards, etc. There's probably even one in your mouse.
  
  --
  "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
17. Re:Disagreement about this trend by Jah-Wren+Ryel · 2008-07-02 10:12 · Score: 1
  
  My guess is 4 cores in 2008, 4 cores in 2009, moving to 8 cores through 2010
  AMD says 12 cores by 2010. (and 6 in 2009)
  
  --
  When information is power, privacy is freedom.
18. Re:Disagreement about this trend by MadnessASAP · 2008-07-02 10:12 · Score: 1
  
  Even using the loosest definition of a processor I can accept, there are still far fewer than 300 processors in a car. There are maybe 8 computer modules in a car, and they certainly aren't stuffed with 20+ processors each, unless you want to start counting something like an oscillator circuit or a counter as a "processor."
  And before anybody tries to tell me I don't know what I'm talking about, I do work as a mechanic and have done plenty of work on car electronic systems.
  
  --
  I may agree with what you say, but I will defend to the death your right to face the consequences of saying it.
19. Re:Disagreement about this trend by zap0d · 2008-07-02 10:15 · Score: 1
  
  My guess is 4 cores in 2008, 4 cores in 2009, moving to 8 cores through 2010. We may move to a new uber-core model once the software catches up, more like 6-8 years than 2-4. I'm positive we won't "max out" at 64 cores, because we're going to hit a per-core speed limit much more quickly than we hit a number-of-cores limit.
  You can already get 8 cores on current consumer hardware with dual quad-core Xeons. The Cell CPU is also 8 cores. And the most recent GPUs are running with 240 cores.
20. Re:Disagreement about this trend by brkello · 2008-07-02 10:24 · Score: 1
  
  That's just not how it works. People don't make things that aren't in demand. It isn't a "create it, and they will come" sort of market. Home users really won't need more than 4 cores with the types of apps they use. Servers can take advantage of more. The only market for 1000s of cores is going to be supercomputing, which just isn't a driver of the market. Video gamers have a larger effect than these guys.
  
  So you make HD videos...is that going to need 1000s of cores? No, it doesn't. You aren't going to improve home user performance by just throwing more cores at it. The biggest jump you will get is from going from one core to two. After that the difference sharply declines and then you will see more benefit from faster processor speeds, memory bandwidth, and the more traditional things we think of as increasing system performance.
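  The diminishing-returns point is usually stated as Amdahl's law: if a fraction p of a program's runtime parallelizes perfectly, n cores give a speedup of 1 / ((1 - p) + p/n). Below is a minimal Python sketch; the 90% parallel fraction is a hypothetical example, and the model ignores synchronization, memory bandwidth, and I/O, all of which only make things worse.
  
  ```python
  def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
      """Ideal speedup when only `parallel_fraction` of the runtime scales with cores."""
      if not 0.0 <= parallel_fraction <= 1.0:
          raise ValueError("parallel_fraction must be between 0 and 1")
      if cores < 1:
          raise ValueError("cores must be at least 1")
      serial_fraction = 1.0 - parallel_fraction
      # The serial part takes the same time no matter how many cores there are.
      return 1.0 / (serial_fraction + parallel_fraction / cores)
  
  
  if __name__ == "__main__":
      p = 0.9  # hypothetical: 90% of the runtime parallelizes perfectly
      for n in (1, 2, 4, 8, 64, 1000):
          print(f"{n:5d} cores: {amdahl_speedup(p, n):5.2f}x")
  ```
  
  With p = 0.9 each doubling buys proportionally less (about 1.82x at 2 cores, 3.08x at 4, 4.71x at 8), and no number of cores can push the speedup past 1/(1 - p) = 10x.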
  
  --
  Support a great indie game: http://www.abaddon360.com
21. Re:Disagreement about this trend by jsebrech · 2008-07-02 10:29 · Score: 2, Informative
  
  Architectures and other stuff have changed, allowing a current 3.2GHz core to easily outperform the old 3.8GHz chips, but then why don't we see new 3.8GHz parts?
  Clock rate is meaningless. They could build a 10GHz CPU, but it wouldn't outperform the current 3GHz CPUs.
  A modern CPU uses pipelining. This means that each instruction is spread out across a series of phases (e.g. fetch data, perform calculation 1, perform calculation 2, store data). Each phase is basically a layer of transistors the logic has to go through. The clock rate is simply how often data is transferred to the next phase. The higher you push the clock, the faster instructions move through their phases towards completion. The problem is that the transistors in each phase take a while after every clock tick to stabilize. So, if you push the clock rate too high, the end result of your current phase won't have been reached yet, and you'll push garbage to the next phase. This is why a CPU that is overclocked too far will crash: it simply doesn't do reliable calculation anymore.
  Now, the reason you had higher clock rates on the P4 architecture is that Intel "solved" the clock rate problem by having more phases and making each phase shorter. Overall the CPU was less efficient, but they could put a bigger GHz number on the package, so marketing was happy. They've come back from that because they couldn't compete on cost/performance with someone who didn't do that (AMD), and their current architecture has appropriate-length phases again, with a lower clock rate to match.
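  The trade-off described here (shorter phases raise the clock, but every pipeline flush throws away more work) can be shown with a toy model. This Python sketch assumes a fixed total logic delay split evenly across stages, a fixed latch overhead per stage, and a flush penalty of about one cycle per stage; all three constants are made-up illustrative values, not figures for any real Intel or AMD part.
  
  ```python
  from dataclasses import dataclass
  
  
  @dataclass(frozen=True)
  class PipelineModel:
      logic_delay_ns: float = 2.0     # hypothetical total logic delay of one instruction
      latch_overhead_ns: float = 0.1  # hypothetical register/latch cost added per stage
      flush_rate: float = 0.05        # hypothetical fraction of instructions that flush the pipe
  
      def clock_ghz(self, stages: int) -> float:
          # Each stage does 1/stages of the logic, plus a fixed latch overhead.
          cycle_ns = self.logic_delay_ns / stages + self.latch_overhead_ns
          return 1.0 / cycle_ns
  
      def cycles_per_instruction(self, stages: int) -> float:
          # Assume a flush (e.g. a mispredicted branch) costs about one cycle per stage.
          return 1.0 + self.flush_rate * stages
  
      def instructions_per_ns(self, stages: int) -> float:
          return self.clock_ghz(stages) / self.cycles_per_instruction(stages)
  
  
  if __name__ == "__main__":
      model = PipelineModel()
      for stages in (5, 10, 20, 40, 80):
          print(f"{stages:3d} stages: {model.clock_ghz(stages):5.2f} GHz, "
                f"{model.instructions_per_ns(stages):4.2f} instructions/ns")
  ```
  
  With these numbers, 40 stages clock at about 6.67GHz and 10 stages at about 3.33GHz, yet both retire roughly 2.22 instructions per nanosecond; the 20-stage design at 5GHz beats either. Real CPUs use branch prediction to make flushes rarer, which moves the sweet spot but does not remove it.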
  Like you've observed however, overall the speed has gone up.
22. Re:Disagreement about this trend by Aceticon · 2008-07-02 10:30 · Score: 1
  
  If you create more compute power, someone will think of ways to use it.
  They would be creating more compute power anyway:
  - They could concentrate on making CPUs run faster
  - They could use more silicon real-estate for bigger caches
  - They could integrate new ways of designing digital circuits using non-synchronous functional blocks
  Instead they went for the approach of having many weak processing units instead of a few strong ones. It's a bit like having 1000 Ladas instead of a couple of Porsches:
  - It's better if all you want is to move many people across small distances
  - It's not quite as good if what you're trying to do is get a small number of people from Cologne to Berlin as fast as possible.
  What many of us are pointing out is that there are rarely enough concurrent processing threads to justify hundreds, much less thousands, of cores, and that often enough you do have one or two time-constrained, non-parallelizable tasks ("get from Cologne to Berlin as fast as possible"), only now, with this approach, you only have "Ladas" to do it with.
23. Re:Disagreement about this trend by Kjella · 2008-07-02 10:35 · Score: 1
  
  You've excluded gamers as if this had been some nearly extinct exotic species. Don't they contribute the most to PC hardware market growth and progress?
  Considering the huge upswing in laptops, which tend to be either very poor or very expensive for gaming, I'd say the answer is no. Obviously they're the force behind nVidia's and ATI's latest creations, but when it comes to power consumption, weight, noise, wireless, the Atom processor, UMPCs, SSDs and a host of other developments, I'd say no. One trend I've noticed is that graphics card reviews have increased their resolution and AA/AF settings. Many, many games are now only reviewed at 2560x1600 4xAA 16xAF, and the verdict is basically "all these cards will run this fine," with some notable exceptions like Crysis. When the PS4/XB3 arrives I expect it'll take a serious bite out of the PC gamer market. Don't get me wrong, they're very important to SOME segments, but I don't think they direct the big picture.
  
  --
  Live today, because you never know what tomorrow brings
24. Re:Disagreement about this trend by Chemisor · 2008-07-02 10:36 · Score: 1
  
  > Web applications are becoming more AJAX'y all the time, and they are not sequential at all.
  AJAX is not CPU intensive. In fact, everything on the web is bound pretty much by your network connection and the load on the server. Yes, servers will benefit from lots of cores, but you, on the desktop, will not.
  > Watching a video while another tab checks my Gmail is a parallel task.
  With a modern video card, your video decoding is offloaded almost completely to the GPU. That is, of course, unless you are running Linux and have no specs for that... Checking Gmail requires no CPU time at all, since Gmail supports IMAP, which notifies the client when new mail is available without any need to poll. See IMAP IDLE RFC.
  > Have you ever listened to a song through your computer while coding,
  > running an email program, and running an instant messaging program?
  Email is already idle, as explained in the above RFC. IM is pretty much the same way, waiting on the network 99.99% of the time. Your editor spends the majority of its time waiting on you. Your music is processed by the sound card, unless you are a cheapskate and use AC97 onboard crap. In fact, the only time you put any appreciable load on your CPU is when you are compiling. The compiler does benefit from multiple cores, but only to a point. 2 cores I can fill, 4 or 8 might get used with a large project. Anything more might help, but is probably not worth paying for. You need to think now and then, you know; might as well do it during the four seconds it takes to build your subproject.
  > Imagine the song compressed with a new codec that is twice as
  > efficient in terms of size but twice as compute intensive.
  So you process half the data at half the speed? :) Can you divide?
  > Imagine the email program indexing your email for efficient search,
  > running algorithms to assess the email's importance to you, and
  > virus checking new deliveries.
  Most of us get email rather infrequently. I don't think I've ever had more than ten messages a day. But say you're Linus, and get a thousand messages per day. Each one might have a few kilobytes of text. So let's say, 10M total input. My hard disk can read 10M in 0.2s, actual indexing is a single pass through the data, and you might write 10-20k of index results. I doubt this will take more than a few seconds. You can use a PIII for that. If you do it on the fly, it will take even less time. Virus checking is a little more intensive, but if the mail program is smart enough to do it in the background, you won't notice that either.
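  The "single pass" indexing claim is easy to check. Here is a minimal Python sketch of an inverted index, assuming plain-text bodies and a naive word regex (a real mail client would also handle headers, MIME parts, and stemming); the message ids and the `build_index`/`search` names are hypothetical. For scale, the post's 10M of text at the implied ~50MB/s disk rate is the quoted 0.2s of reading.
  
  ```python
  import re
  from collections import defaultdict
  
  WORD = re.compile(r"[a-z0-9']+")  # naive tokenizer, an assumption for illustration
  
  
  def build_index(messages: dict[str, str]) -> dict[str, set[str]]:
      """One pass over every body: map each lowercase word to the ids containing it."""
      index = defaultdict(set)
      for msg_id, body in messages.items():
          for word in WORD.findall(body.lower()):
              index[word].add(msg_id)
      return dict(index)
  
  
  def search(index: dict[str, set[str]], query: str) -> set[str]:
      """Return ids of messages that contain every word in the query."""
      words = WORD.findall(query.lower())
      if not words:
          return set()
      result = set(index.get(words[0], set()))
      for word in words[1:]:
          result &= index.get(word, set())
      return result
  
  
  if __name__ == "__main__":
      inbox = {  # hypothetical messages
          "m1": "Patch for the scheduler, please review",
          "m2": "Re: scheduler patch looks fine",
          "m3": "Lunch on Friday?",
      }
      idx = build_index(inbox)
      print(sorted(search(idx, "scheduler patch")))  # ['m1', 'm2']
  ```
  
  Each message is read once and each lookup is a few set intersections, which is why this kind of work ends up waiting on the disk or network rather than on the CPU.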
  > Imagine your code editor doing on the fly analysis of what you are coding, and making suggestions.
  You mean like Visual Studio does with its API lookup dropdowns? That probably takes no CPU at all, being a database lookup. If you are thinking about AI, I'll laugh at you. It ain't happening any time soon. Anything that is happening any time soon will not be computationally expensive.
  > Now that fast computers are cheap, people who never edited video or photos are doing it.
  Photo editing is not CPU intensive, with the exception of filters, which can indeed be parallelized. But the performance improvement is again not something worth paying for. Filters aren't even used all that often.
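  Filters parallelize well because each output pixel depends only on a small neighborhood of the read-only input, so an image can be cut into horizontal bands that are processed independently with no locking. Below is a Python sketch using a hypothetical 3x3 box blur on a list-of-lists grayscale image. Note that in CPython the GIL stops these pure-Python threads from doing the arithmetic in parallel, so a real filter would use processes or a native library; the band decomposition is the part that carries over.
  
  ```python
  from concurrent.futures import ThreadPoolExecutor
  
  Image = list[list[int]]  # grayscale pixels, rows of equal length
  
  
  def blur_rows(img: Image, start: int, stop: int) -> Image:
      """3x3 box blur for rows [start, stop); neighbours are read from the shared input."""
      h, w = len(img), len(img[0])
      out = []
      for y in range(start, stop):
          row = []
          for x in range(w):
              total = count = 0
              for dy in (-1, 0, 1):
                  for dx in (-1, 0, 1):
                      ny, nx = y + dy, x + dx
                      if 0 <= ny < h and 0 <= nx < w:
                          total += img[ny][nx]
                          count += 1
              row.append(total // count)
          out.append(row)
      return out
  
  
  def blur_parallel(img: Image, workers: int = 4) -> Image:
      """Split the image into bands, blur each band on its own worker, then stitch."""
      h = len(img)
      band = max(1, -(-h // workers))  # ceiling division
      ranges = [(s, min(s + band, h)) for s in range(0, h, band)]
      with ThreadPoolExecutor(max_workers=workers) as pool:
          bands = list(pool.map(lambda r: blur_rows(img, *r), ranges))
      return [row for b in bands for row in b]
  ```
  
  Because no band writes to anything another band reads, the parallel result is identical to blurring the whole image in one go.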
  > If you want a significant market besides gamers who need more cores,
  > it is people making videos, especially HD videos.
  Yeah, that's what we need. More idiots making fools of themselves in HD video. As if YouTube didn't teach you anything.
  > And don't forget about virus writers. They need a few cores to run on as well!
  Are you suggesting I support virus writers?
  > Computer power keeps its steady progress higher, and we keep finding interesting things to do with it all.
  I'm sorry, but I just don't see any of those interesting new things. Even video editing was around and quite usable bac
25. Re:Disagreement about this trend by EchaniDrgn · 2008-07-02 10:45 · Score: 1
  
  But those are still individual programs being run. Intel is suggesting that developers begin thinking of how to have a single program span across multiple cores.
  No one is saying that it isn't possible to saturate a single 64 core computer by running numerous applications. The real hurdle is in getting a single application to be designed to fully utilize that same 64 (or 1000) core computer as well.
  As I write this I have eight applications running in the foreground, and nine tabs open in my browser but I doubt that many of these are using multiple processors each.
  Just my two cents.
26. Re:Disagreement about this trend by moosesocks · 2008-07-02 11:01 · Score: 1
  
  Web applications are becoming more AJAX'y all the time, and they are not sequential at all.
  
  The problem here is that AJAX is hideously inefficient.
  The level of user-interaction achieved with an AJAX application is about on-par with what a Windows 3.1 app is capable of on a 386. Really, it's embarrassing. We don't need more cores. We need a better specification for webapps.
  
  --
  -- If you try to fail and succeed, which have you done? - Uli's moose
27. Re:Disagreement about this trend by felipekk · 2008-07-02 11:11 · Score: 2, Funny
  
  Ah, I see you are running Vista...
  j/k though, I have a single core running Vista x64 and I love it. It's responsive as hell (seriously).
28. Re:Disagreement about this trend by porneL · 2008-07-02 11:25 · Score: 1
  
  DHTML is CPU intensive (mostly due to inefficiency of JS+DOM+CSS reflows).
  Simple DB-lookup auto-completion sucks. If you have a lot of power to spare, you can do realtime static analysis of the code and e.g. use syntax highlighting to show dead code or use of uninitialized variables immediately, or have more useful auto-completion in languages with weak/dynamic typing.
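  As a small taste of the "show dead code immediately" idea, here is a Python sketch using the standard `ast` module. It only flags the first statement that follows a `return`, `raise`, `break`, or `continue` in the same block; catching uninitialized variables or deeper unreachable paths would need real control-flow analysis, which is exactly where spare cores could go. The `find_dead_code` name is hypothetical.
  
  ```python
  import ast
  
  # Statements after which nothing else in the same block can run.
  TERMINATORS = (ast.Return, ast.Raise, ast.Continue, ast.Break)
  
  
  def find_dead_code(source: str) -> list[int]:
      """Return line numbers of the first unreachable statement in each block."""
      tree = ast.parse(source)
      dead: list[int] = []
      for node in ast.walk(tree):
          # Compound statements keep their nested statements in these list fields.
          for field in ("body", "orelse", "finalbody"):
              block = getattr(node, field, None)
              if not isinstance(block, list):
                  continue  # e.g. Lambda.body and IfExp.body are single expressions
              for i, stmt in enumerate(block[:-1]):
                  if isinstance(stmt, TERMINATORS):
                      dead.append(block[i + 1].lineno)
                      break
      return sorted(dead)
  
  
  if __name__ == "__main__":
      sample = (
          "def f(x):\n"
          "    if x:\n"
          "        return 1\n"
          "        print('never')\n"
          "    return 2\n"
          "    x += 1\n"
      )
      print(find_dead_code(sample))  # [4, 6]
  ```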
  
  Even video editing was around and quite usable back in 1993.
  Not on "consumer" hardware. In 1993 on Amiga I could barely make short, low-resolution animations (and with help of dedicated hardware mix them with analog video input).
  Even today I would like to have a few more cores to export H.264 video.
29. Re:Disagreement about this trend by 1jpablo1 · 2008-07-02 11:40 · Score: 1
  
  This kind of argument seems funny to me. Of course running a web browser or running a word processor won't benefit from this computer power. But it doesn't need to.
  You could say as well that your watch calculator (remember those?) won't benefit from multiple cores (although perhaps it does).
  What will benefit from this computer power will be games (and virtual environments), transcoding media files, automatic language translation, object recognition, driverless cars, and a thousand things that we don't do right now because we haven't even thought of them, since we don't have the computer power.
30. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 11:46 · Score: 0
  
  And don't forget about Vista!
31. Re:Disagreement about this trend by SiegeTank · 2008-07-02 11:51 · Score: 1
  
  Solution:
  
  - More cores
  - Lower clock speed for each core
  - Ability to turn off unused cores
  - Power savings from the last two points
  - Distributed architecture for assigning work to cores (chips designed to offload the work in different ways on the motherboard)
  
  That last part is the tricky bit, this is what people looking into multi-core technology and programming are really trying to solve - it's a distribution problem.
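  One way to see the distribution problem is as scheduling: given tasks of unequal cost, how do you spread them across cores so the last core finishes as early as possible? This Python sketch compares naive round-robin assignment with the textbook longest-processing-time-first (LPT) greedy heuristic. It assumes task costs are known up front, which real schedulers rarely get, and it is not how any particular chip or OS does it.
  
  ```python
  import heapq
  
  
  def round_robin_makespan(costs: list[float], cores: int) -> float:
      """Static assignment: task i always goes to core i % cores."""
      if cores < 1:
          raise ValueError("need at least one core")
      loads = [0.0] * cores
      for i, cost in enumerate(costs):
          loads[i % cores] += cost
      return max(loads)
  
  
  def lpt_makespan(costs: list[float], cores: int) -> float:
      """Greedy LPT: biggest tasks first, each to the currently least-loaded core."""
      if cores < 1:
          raise ValueError("need at least one core")
      heap = [(0.0, core) for core in range(cores)]  # (current load, core id)
      for cost in sorted(costs, reverse=True):
          load, core = heapq.heappop(heap)
          heapq.heappush(heap, (load + cost, core))
      return max(load for load, _ in heap)
  
  
  if __name__ == "__main__":
      tasks = [8, 1, 1, 1, 8, 1, 1, 1]  # hypothetical task costs
      print(round_robin_makespan(tasks, 4))  # 16.0
      print(lpt_makespan(tasks, 4))          # 8.0
  ```
  
  Here round-robin happens to put both expensive tasks on the same core and finishes at 16, while LPT finishes at 8. LPT is known to stay within a factor of 4/3 of the optimal finish time, which is why it is a common baseline.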
  
  Vast power savings are available if this can be done without drastically changing the way software is written. Cranking the speed of the chips just will not work without some new factor to overcome the thermal problems involved.
  
  And no, we can't just all use dry-ice or liquid nitrogen :P
32. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 11:57 · Score: 0
  
  "Web applications are becoming more AJAX'y all the time, and they are not sequential at all. Watching a video while another tab checks my Gmail is a parallel task."
  But all I really want is the embedded flash Video I'm watching with my sucky Fedora9/Firefox not to crash.
  I guess looking at the "Next Big Thing" is always greener than mundane work of fixing the way things are now.
  PS: Oh, and I forgot: "fix it yourself, linux developers don't work for you!"
33. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 12:04 · Score: 0
  
  How is it possible to get a good, responsive end-user e-mail experience with a mere 64 cores?
  Well, it's a good start (thinking about those average 20000+ emails in certain corporate mailboxes).
34. Re:Disagreement about this trend by ChrisA90278 · 2008-07-02 12:18 · Score: 1
  
  most of the code out there is sequential (or nearly so) and I/O bound. So your home user checking his email, running a web browser, etc is not going to benefit much from having all that compute power. (Gamers are obviously not included in this) Thus, he predicted, processors would max out at a "relatively" low number of cores - 64 was his prediction.
  Yes, in 2020, if people are still doing those tasks, all they would need is a "few" cores. But what if your email program actually tried to understand the text of the mail and tried to decide for you what you might be interested in? That is a HUGE and complex task that could use many, many cores per email message. Parsing the text and matching it into some kind of big semantic network is a non-trivial task. Next, what if your computer in 2020 had a webcam attached and used hand motion and lip reading as input? That alone could keep many cores busy, to say nothing of voice input and output. And what if the company's PBX system actually answered the phone, talked with users, and understood simple things like "tell Susan the meeting is at 2, not 3," and then the phone would find Susan and tell her?
  OK, my point is that if you had 1,000+ cores you could do tasks you can't do now, like lip reading and answering 250 incoming calls all at once.
  Oh, and what about controlling that walking robot that needs to carry your beer up the stairs? The robot must have 1,000+ motors and sensors inside of it. One core, one thread per sensor or motor.
  We will find new tasks.
35. Re:Disagreement about this trend by geekoid · 2008-07-02 12:33 · Score: 1
  
  Several reasons; however, one of the most overlooked on Slashdot is the fab.
  It is getting harder and harder to make chips. Flaws that wouldn't even have been NOTICED 8 years ago have become a huge problem to fix. Getting a higher chip count on a wafer doesn't matter if 30% of them aren't useful. Since the MHz race is basically over, fabs understand that 3GHz is plenty fast under current software design models, meaning that the design model is better than the user in most cases.
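  The yield point can be made concrete with the common textbook Poisson yield model, Y = exp(-A * D) for die area A and defect density D, plus the usual gross-dies-per-wafer approximation. In this Python sketch the 300mm wafer, the die sizes, and the 0.5 defects/cm^2 density are illustrative assumptions, not real fab data.
  
  ```python
  import math
  
  
  def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
      """Gross dies: wafer area / die area, minus a correction for partial dies at the edge."""
      radius = wafer_diameter_mm / 2
      gross = (math.pi * radius ** 2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))
      return max(0, int(gross))
  
  
  def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
      """Probability a die has zero defects if defects land independently at random."""
      return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)  # 100 mm^2 = 1 cm^2
  
  
  def good_dies(wafer_diameter_mm: float, die_area_mm2: float, defects_per_cm2: float) -> float:
      """Expected number of defect-free dies per wafer."""
      return dies_per_wafer(wafer_diameter_mm, die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)
  
  
  if __name__ == "__main__":
      for area in (100, 200, 400):  # hypothetical die sizes in mm^2
          print(f"{area} mm^2: {dies_per_wafer(300, area)} dies, "
                f"{poisson_yield(area, 0.5):.0%} yield, {good_dies(300, area, 0.5):.0f} good")
  ```
  
  Under these assumptions, quadrupling the die from 100mm^2 to 400mm^2 cuts the good dies per wafer from roughly 388 to roughly 19, which is one reason huge monolithic dies get expensive fast.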
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
36. Re:Disagreement about this trend by WhoBeDaPlaya · 2008-07-02 12:54 · Score: 1
  
  My E6600 @ 3.6GHz begs to differ ;) So do all the folks running E8xxx chips at 4GHz+
37. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 13:48 · Score: 0
  
  Well, you can buy Power6 at 5GHz or mainframes (z6) at 4GHz (both from IBM), if you have the bucks.
  The interesting thing about Power6 is that each processor (2 cores, 4 threads) has an 8 channel memory controller[1] and a memory bandwidth which increases with the number of processors. Of course, this means that you have to install memory modules by sets of 8 (or perhaps more since most machines have more than 1 CPU chip). The interesting point on multi-chip Power6 machines is that there are low latency communication channels between processor chips that make access to remote memory not much more expensive than the local one (no need for NUMA).
  [1] Actually the 8-channel memory controller is external to limit the number of pins on the processor chip, which has only a dual-channel controller but can run it at 4 times the memory speed (up to 3.2GHz IIRC), and an external chip manages the 8 memory channels. In practice the result is not very different: less than 1ns of added latency for cache misses.
38. Re:Disagreement about this trend by Bodrius · 2008-07-02 14:00 · Score: 1
  
  Are we seriously talking about the inefficiency of DHTML rendering as a rationale for long-term demand for multicore CPUs?
  At some point, rich web-app CSS rendering will become more efficient or die in favor of better web-app technologies - there are already a few out there.
  I find it hard to believe that if that is a big enough problem it couldn't be 'fixed' faster (and cheaper) than the fundamental hardware and software changes required by the thousand-core model.
  Code analysis, compilation, etc. are good cases for usage of those cores on CPU-bound tasks... but they're all professional workstation tasks.
  I have no doubt I could put more cores to use on my professional workstation - if only by making it painless to run independent compilations and automated tests in the background while I work on something else on the IDE.
  But when I go home and most of my tasks consist of interwebs browsing, email, or watching DVDs, I can't find a use for all those cores that easily. Gaming and encoding/decoding are the only good candidates, i.e. tasks well targeted by specialized video hardware.
  It's been a while since workstations and PC hardware markets merged - depending on the costs of the new hardware, this might encourage that divide again.
  
  --
  Freedom is the freedom to say 2+2=4, everything else follows...
39. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 14:42 · Score: 0
  
  >>This seems silly. If you create more compute power, someone will think of ways to use it.
  Problem with that is the same as we have now. His Steveness announces a new Mac that does all this really neat stuff much faster, but when you get down to it, you're stuck waiting for Adobe or Microsoft or someone to release a new version of their software that actually USES all the new stuff efficiently.
  So two years after you buy your new Mac, Adobe brings out SooperDooper version 9.1 ... and then it's time to update your hardware. Again. Because Steve said so.
40. Re:Disagreement about this trend by nonewmsgs · 2008-07-02 15:31 · Score: 1
  
  In 1997 IBM built a chess supercomputer, Deep Blue, that had 64 cores. A chess board has 64 squares, and IIRC IBM had it set up so each processor started from its own square for move calculations and then compared the results at the end with the other processors. So it is possible to program for 64.
  The thing about it, though, is that at least 32 squares always start out unused, but I guess that makes the parallel programming easier?
41. Re:Disagreement about this trend by RightSaidFred99 · 2008-07-02 17:12 · Score: 1
  
  Yeah, but I'm talking mainstream and general purpose CPUs. In other words, the target hardware for software which sells more than 10,000 copies. Quad core is rolling into the mainstream this year, and it'll advance from there. We'll start seeing 8 core CPUs late this year, moving to mainstream in a year or so after.
  Dual socket machines will never be mainstream.
42. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 17:13 · Score: 0
  
  His premise is flawed. People using email, running a web browser, etc... hit CPU speed saturation some time ago. A 500MHz CPU can adequately serve their needs
  AJAX and the shift of Web sites to heavy client-side CPU usage make this less clear. Try using Facebook in Firefox on a 500MHz PC. It bogs down my 1.6GHz machine and runs noticeably faster on a modern 2-core CPU. So I disagree that CPU speed saturation has been achieved.
43. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 17:32 · Score: 0
  
  This seems silly. If you create more compute power, someone will think of ways to use it.
  
  And don't forget about virus writers. They need a few cores to run on as well!
  Oh, are you on Vista also? ;)
44. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 18:21 · Score: 0
  
  Web applications are becoming more AJAX'y all the time, and they are not sequential at all.
  JavaScript is single-threaded by design, so you have at most one thread per page. While it'd be nice to be able to use a CPU core per page, Firefox has horrible threading support -- loading some resource-heavy page will lock up the entire UI and I can't even switch tabs. Doubly so if you have any plugins (*cough*, Flash) running.
  Imagine the email program indexing your email for efficient search, running algorithms to assess the email's importance to you
  This is pretty much I/O bound right now.
  Imagine your code editor doing on the fly analysis of what you are coding, and making suggestions.
  The bottleneck here is the IDE developers, not computing power. I guess if everyone had a fast CPU you could write an IDE in a nice high level language like Python though. Something like brute force AI / pattern matching might take a lot more than a few thousand cores though.
  Anyhow, that said, I agree with you that more computing power = teh awesome. It's going to take quite a while for the software to match the hardware when it comes to multicore, though.
45. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 18:47 · Score: 0
  
  Put another way: 64 cores should be enough for everyone.
46. Re:Disagreement about this trend by jaseer · 2008-07-02 19:25 · Score: 1
  
  Very well written !! Bottom line is.. when it comes to processing power..
  "Invention is the mother of necessity"
47. Re:Disagreement about this trend by mariuszbi · 2008-07-02 20:14 · Score: 1
  
  Imagine the song compressed with a new codec that is twice as efficient in terms of size but twice as compute intensive? According to Leonardo Chiariglione (co-founder of MPEG): "the idea that compression technology keeps on improving is a myth". Actually, I think we've kinda reached the limit for compressing the stream without losing important data. I don't think I need more cores to do the same thing; I need several small groups of cores to do very specialized things (like some cores for graphics, a core or two for sound, and so on).
48. Re:Disagreement about this trend by Anonymous Coward · 2008-07-02 20:26 · Score: 0
  
  Thus, he predicted, processors would max out at a "relatively" low number of cores - 64 was his prediction.
  'Cause 64 should be enough for anyone...
49. Re:Disagreement about this trend by LordMyren · 2008-07-02 21:31 · Score: 1
  
  Yes, applications today are mostly monolithic pieces of code running sequentially. There's little we can do to speed these tasks up; it's up to programmers to do a decent job of building reasonably responsive applications.
  But the notion that code has to be parallelized to make use of multicores is patently false. Data-parallel code turns a single computation into many threads of work, but all we really need is many threads of work to make use of a multicore system.
  The real issue is that a single user might not be able to effectively utilize a multicore system. However, if you put thirty or a hundred users on a system, each user's tasks aggregate into a sizable pool of threads that would be ideal for a multicore system.
  The I/O bounds you mentioned are the real issue. It's largely the reason Nehalem is going with an integrated memory controller: to prevent the northbridge from limiting scaling. PPC, AMD, and everyone else have gone this direction already, and for this reason.
50. Re:Disagreement about this trend by LordMyren · 2008-07-02 21:52 · Score: 1
  
  This is the poster child for vacuous argument on parallelism, and we'd be better off if people would stop trying to say this.
  The tasks you mention are laughably bad. You make up nonsense about new music codecs, pretend that indexing is something that happens all-the-time/realtime, and speak emptiness about "algorithms" that will magically consume cpu to make lives better.
  The only viable thing you mention in your post is video editing, which is a task handled on the de-facto parallel processor of the computer: the GPU.
  The notion that the average user is ever going to aggregate enough background work at the same time to need a multicore system is preposterous. Even this outlandish, reference-free fantasy hasn't proposed a foreground task that a common end user would perform that would use even a mildly parallel CPU.
51. Re:Disagreement about this trend by LordMyren · 2008-07-02 22:13 · Score: 1
  
  as in, "we invented it, now we have to come up with some way to use it?"
52. Re:Disagreement about this trend by JasterBobaMereel · 2008-07-03 00:37 · Score: 1
  
  No, business users buy far more PCs than gamers.
  They buy three types of machines: laptops they can use and do presentations on, desktop machines for normal office work, and servers.
  None but the servers need massively parallel machines... Why do you think most machines don't come with a decent graphics card? It's because most people would never use it...
  
  --
  Puteulanus fenestra mortis
53. Re:Disagreement about this trend by Anonymous Coward · 2008-07-03 00:48 · Score: 0
  
  I agree to a certain extent. There is a lot of stuff being developed currently for multiple cores, but in general most things are lagging behind. As parallel processing catches up, the number of cores will begin to jump quickly.
54. Re:Disagreement about this trend by JasterBobaMereel · 2008-07-03 00:57 · Score: 1
  
  The quote is attributed to Thomas J. Watson (IBM): "I think there is a world market for maybe five computers." At the time, what he would have been describing we would view as the equivalent of a state-of-the-art supercomputer, and there are (and have always been) about five of these: current supercomputers at 200 TFLOPS+ number five, and the rest are mostly ~100 TFLOPS or less.
  Ignoring games, what people actually use PCs for has not really changed in 10 years; the last killer app that actually required more computing power was the Web. Most people do not need a massively powerful computer. This does not hold for game machines, servers, or a few specialists (graphics, sound and video processing, CPU-heavy analysis), and they will always need the fastest available, but the majority do not need this. Why do you think Eee PCs and cheap laptops are so popular...
  
  --
  Puteulanus fenestra mortis
55. Re:Disagreement about this trend by bs7rphb · 2008-07-03 01:21 · Score: 1
  
  Imagine your code editor doing on the fly analysis of what you are coding, and making suggestions.
  Agh! Clippy, why won't you die?!
56. Re:Disagreement about this trend by Anonymous Coward · 2008-07-03 03:05 · Score: 0
  
  because we're going to hit a per-core speed limit much more quickly than we hit a number-of-cores limit.
  Most likely, we have already hit a per-core speed limit. The relationship between power consumption/dissipation and clock frequency demands that at a certain point, it becomes necessary to find increased performance through parallelism.
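  The power/frequency relationship is usually approximated, to first order, by the CMOS dynamic-power formula below (α is the switching activity factor, C the switched capacitance, V the supply voltage, f the clock frequency). The V ∝ f step is a rough rule of thumb rather than an exact law, and the sketch ignores leakage power entirely:

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V^{2} f,
\qquad
V \propto f \;\Rightarrow\; P_{\mathrm{dyn}} \propto f^{3}

% Two cores at f/2 vs. one core at f (same nominal throughput only if the work splits perfectly):
2 \left(\tfrac{f}{2}\right)^{3} = \tfrac{1}{4} f^{3}
```

  Under those assumptions, two half-speed cores burn roughly a quarter of the dynamic power of one full-speed core, which is the usual argument for trading clock rate for core count.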
57. Re:Disagreement about this trend by MojoRilla · 2008-07-03 03:18 · Score: 1
  
  You make up nonsense about new music codecs...
  You are absolutely correct. There are no promising technologies that will improve music compression.
58. Re:Disagreement about this trend by Anonymous Coward · 2008-07-03 03:57 · Score: 0
  
  As someone who still has a working 500MHz computer, I can say you're totally wrong :) Video decoding is very expensive, and used by just about everybody (especially Joe Average). Not to mention new functionality is added all the time.
  Many 'futurologists' forget about this, but really, do you remember Quattro Pro? Nice program, but I wouldn't go back to it. New functionality is important, even though sometimes it wastes a lot of processor power for only a small improvement.
59. Re:Disagreement about this trend by Anonymous Coward · 2008-07-03 08:03 · Score: 0
  
  The interesting point on multi-chip Power6 machines is that there are low latency communication channels between processor chips that make access to remote memory not much more expensive than the local one (no need for NUMA).
  Didn't you just describe a non-uniform memory access model? A cluster is more like the architecture you were thinking of.
Ok.. so how do I do that? by bigattichouse · 2008-07-02 08:47 · Score: 2, Interesting

Are we just looking at crazy-ass multithreading? Or do you mean we need some special API? I think it's really the compiler gurus who are going to make the difference here; 99% of the world can't figure out debugging multithreaded apps. I'm only moderately successful with it if I build small single-process "kernels" (to steal a graphics term) that process a work item, and then a loader that keeps track of work items... then fire up a bunch of threads and feed the cloud a bunch of discrete work items. Synchronizing threads is no fun.
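A minimal Python sketch of the "kernel plus loader" pattern described above. `process_item` and `run_workers` are hypothetical names: the first stands in for the real single-threaded per-item work, the second queues discrete work items and lets a fixed pool of threads drain the queue. It assumes the items are independent; results are tagged with their index and reassembled at the end, so no worker ever has to synchronize with another. Note that in CPython the GIL stops pure-Python code from running on several cores at once, so for CPU-bound work the same structure would use processes instead of threads.

```python
import queue
import threading

def process_item(item):
    """Hypothetical single-threaded 'kernel': handles one work item, touches no shared state."""
    return item * item

def run_workers(items, num_workers=4):
    """Hypothetical 'loader': queue discrete work items, then let worker threads drain the queue."""
    items = list(items)
    work = queue.Queue()
    results = queue.Queue()  # queue.Queue is thread-safe, so no explicit locks are needed
    for index, item in enumerate(items):
        work.put((index, item))  # everything is queued before any worker starts

    def worker():
        while True:
            try:
                index, item = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            results.put((index, process_item(item)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All workers have finished, so the results queue is complete; restore input order.
    out = [None] * len(items)
    while not results.empty():
        index, value = results.get()
        out[index] = value
    return out

print(run_workers(range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because every item is queued before the threads start, an empty queue reliably means "no more work", which keeps the worker loop free of any extra signaling.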

--
meh
1. Re:Ok.. so how do I do that? by Phroggy · 2008-07-02 09:02 · Score: 4, Informative
  
  A year or so ago, I saw a presentation on Threading Building Blocks, which is basically an API thingie that Intel created to help with this issue. Their big announcement last year was that they've released it open-source and have committed to making it cross-platform. (It's in Intel's best interest to get people using TBB on Athlon, PPC, and other architectures, because the more software is multi-core aware, the more demand there will be for multi-core CPUs in general, which Intel seems pretty excited about.)
  
  --
  $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
  $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
2. Re:Ok.. so how do I do that? by drinkypoo · 2008-07-02 09:44 · Score: 1
  
  the more software is multi-core aware, the more demand there will be for multi-core CPUs in general, which Intel seems pretty excited about
  They used to sell processors on the basis of their high clock rate; that game is now over, so now they are planning to sell processors based on their high number of cores.
  However, the last time Intel produced a new architecture which required new programming techniques to really gain the advantages, it went approximately nowhere... So what chance do they have to do it this time?
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
3. Re:Ok.. so how do I do that? by jsebrech · 2008-07-02 10:36 · Score: 1
  
  Google uses map/reduce. It seems to work well for them. The system avoids thread synchronization during processing, and by doing that essentially turns it into a sequential programming model that can be debugged normally.
  I think the advances are going to come from finding ways to avoid thread synchronization. Even if you need more threads overall, it's still a gain.
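  A toy single-machine sketch of the map/reduce idea in Python (this is not Google's actual MapReduce API): each map call counts the words in its own document and writes no shared state, and the partial results are merged only once at the end, so no locks are needed anywhere. The sample documents and worker count are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_phase(document):
    """Map: count the words of one document; reads only its own input."""
    return Counter(document.split())

def reduce_phase(left, right):
    """Reduce: merge two partial counts into a new Counter (no in-place mutation)."""
    return left + right

def word_count(documents, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_phase, documents))  # independent map tasks
    return reduce(reduce_phase, partials, Counter())      # single merge step at the end

print(word_count(["the cat sat", "the dog sat", "a cat"]))
```

  For those three documents, "the", "cat", and "sat" each end up with a count of 2. Each map call can be debugged on its own like ordinary sequential code, which is the property described above.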
4. Re:Ok.. so how do I do that? by Courageous · 2008-07-02 14:14 · Score: 1
  
  I think the advances are going to come from finding ways to avoid thread synchronization.
  This has been understood for ages in the supercomputing business, where the number of "threads" running at any given time can be in the many thousands these days. If you use synchronization in those environments, you lose.
  So yah, good guess.
  C//
5. Re:Ok.. so how do I do that? by everphilski · 2008-07-02 15:05 · Score: 1
  
  Intel provides good support for OpenMP (a standard parallelization technique for shared-memory systems, like a desktop) in their compilers; they just think they can do some things better with Intel TBB. I've looked at it; it's neat, and it meshes with C++ a lot more cleanly than OpenMP (which is essentially a bunch of #pragma directives and not really language integration). However, for true scalability (not just on one computer, but distributed computing) you need to parallelize differently, using something like MPI.
  
  If you read their pitches on TBB they will say OpenMP might be the better tool for some scenarios, they are pretty honest... this isn't MMX or whatever you are thinking about.
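  OpenMP itself targets C, C++, and Fortran, so the following is only a Python analogy of what a loop under `#pragma omp parallel for schedule(static) reduction(+:total)` does: split the iteration space into contiguous blocks, let each worker accumulate a private partial sum, and combine the partials at the end. The function names are hypothetical, and because of the CPython GIL the threads here show the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def static_chunks(n, workers):
    """Split iterations 0..n-1 into contiguous blocks, one per worker (like schedule(static))."""
    base, extra = divmod(n, workers)
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        yield range(start, start + size)
        start += size

def partial_sum_of_squares(block):
    # Each worker accumulates into a private local variable: no shared counter, no lock.
    total = 0
    for i in block:
        total += i * i
    return total

def parallel_sum_of_squares(n, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Combine the per-worker partials at the end, like reduction(+:total).
        return sum(pool.map(partial_sum_of_squares, static_chunks(n, workers)))

print(parallel_sum_of_squares(1000))  # 332833500
```

  With 10 iterations and 4 workers, `static_chunks` hands out blocks of 3, 3, 2, and 2 iterations, which is the kind of even split a static schedule aims for.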
6. Re:Ok.. so how do I do that? by shutdown+-p+now · 2008-07-02 17:07 · Score: 1
  
  If you haven't studied functional programming at least a little bit, it's probably time to do so now; it might be needed very soon. It's not just wishful thinking, either: both Microsoft and Sun are very big players in the developer tools arena at the moment, and both are pushing for FP, specifically in the context of large-scale parallelization. Microsoft has released C# 3.0 with its rather obvious FP-ish leanings, is developing Parallel LINQ, and has stated that F# is likely to appear as a first-class IDE-supported language in future versions of Visual Studio. In addition, most Microsoft developers and, more importantly, managers I've spoken to seem to agree on FP as the way forward.
  Sun is somewhat quieter, but you can still tell by the fact that Java is getting full-fledged closures soon, and Fortress is built for parallel architectures from the ground up (so much so that the "for" loop defaults to a parallelized, unspecified-order version).
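  As a rough Python illustration (not Parallel LINQ or Fortress syntax) of why an unspecified execution order is safe for pure functions: the iterations below may complete in any order, but because the hypothetical `score` function has no side effects, the collected result is identical on every run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def score(x):
    """Hypothetical pure function: output depends only on the input, no side effects."""
    return 3 * x + 1

def unordered_parallel_for(values, workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(score, v): v for v in values}
        # Iterations may complete in any order...
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    # ...but because score() is pure, the final mapping is the same on every run.
    return results

print(unordered_parallel_for(range(5)))
```

  If `score` instead appended to a shared list, the output order would depend on thread timing, which is exactly the class of bug FP-style code avoids.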
True Dat by stoolpigeon · 2008-07-02 08:48 · Score: 1

and imagine all those cores in a box running a bunch of virtual machines. every dba team will need an accountant.

--
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
1. Re:True Dat by Anonymous Coward · 2008-07-02 09:46 · Score: 0
  
  every dba team will need an accountant.
  Not the ones running PostgreSQL.
2. Re:True Dat by stoolpigeon · 2008-07-02 15:35 · Score: 1
  
  This is true - though many who would need Oracle, even if they use Postgres as a replacement, will probably be purchasing support for whatever RDBMS they use. Same thing if they are running it on Linux - they will be paying someone to support that OS - even though they don't 'have' to.
  
  --
  It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
been there, done that by frovingslosh · 2008-07-02 08:49 · Score: 5, Funny

Heck, my original computer had 229376 cores. They were arranged in 28k 16 bit words.

--
I'm an American. I love this country and the freedoms that we used to have.
1. Re:been there, done that by Anonymous Coward · 2008-07-02 09:00 · Score: 0
  
  I thought a core only stored one bit?
2. Re:been there, done that by frovingslosh · 2008-07-02 09:22 · Score: 1
  
  That's why you needed lots of them. My original math was off by a factor of 2 (damn bytes and words), so I had even more cores. Intel is so behind the times.
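  For the record, assuming one core per bit and K = 1024, the corrected count works out as:

```latex
28\,\mathrm{K\ words} \times 16\ \tfrac{\mathrm{bits}}{\mathrm{word}}
  = 28 \times 1024 \times 16
  = 458{,}752\ \mathrm{cores}
  = 2 \times 229{,}376
```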
  
  --
  I'm an American. I love this country and the freedoms that we used to have.
3. Re:been there, done that by Sponge+Bath · 2008-07-02 09:27 · Score: 1
  
  I thought a core only stored one bit?
  Bits were much smaller back then,
  allowing you to store two in each core.
Where will they put... by Thelasko · 2008-07-02 08:50 · Score: 1

all of those DIMMs of RAM. I'm thinking they will have to come up with something smaller. Maybe more than one DIMM on a... DIMM?

--
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
Downright neat by Alarindris · 2008-07-02 08:51 · Score: 2, Funny

640K cores should be enough for anybody.
1. Re:Downright neat by Anonymous Coward · 2008-07-02 09:23 · Score: 0
  
  640K cores should be enough for anybody.
  "Sniff", I miss Bill already.(sigh)
We all saw it coming anyway by eebra82 · 2008-07-02 08:51 · Score: 1

It's fairly obvious that both Intel and AMD are heading this way. The transistors are shrinking, but we will soon create a transistor that cannot be shrunk further, and once this happens, we will have to think in terms of layers and cores and possibly more GHz.

So whether programmers find this move acceptable or not is irrelevant because this path is probably the only way to design faster CPU:s once we've hit the nanometer wall.

--
Full Tilt
1. Re:We all saw it coming anyway by ClosedSource · 2008-07-02 09:10 · Score: 5, Insightful
  
  "So whether programmers find this move acceptable or not is irrelevant because this path is probably the only way to design faster CPU:s once we've hit the nanometer wall."
  I guess you should put "faster" in quotes.
  In any case, it is absolutely relevant what programmers think since any performance improvements that customers actually experience is dependent on our code.
  Historically a primary reason to buy a new computer is because a faster system makes legacy applications run faster. To a large extent this won't be true with a new multicore PC. So why would people buy them?
  That's why Intel wants us to redesign our software - so that in the future their customers will still have a reason to buy a new PC with Intel Inside.
2. Re:We all saw it coming anyway by drinkypoo · 2008-07-02 09:48 · Score: 1
  
  Historically a primary reason to buy a new computer is because a faster system makes legacy applications run faster. To a large extent this won't be true with a new multicore PC.
  Historically the primary reason to buy a new computer is because the old one is outdated and will not run the new software.
  The contemporary reason to buy a new computer is that the old one is somehow broken, and although it serves your needs and is repairable it's cheaper to just buy a new one for five hundred bucks... which also happens to be two to four times faster than your old machine. Maybe more.
  
  That's why Intel wants us to redesign our software - so that in the future their customers will still have a reason to buy a new PC with Intel Inside.
  That part is true. They failed with itanic and now they're going to try again with this particular strategy. On the other hand, massively parallel computing is an idea whose time has long since come. It's going to enable a lot of new types of tasks when it does get here.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
3. Re:We all saw it coming anyway by kipman725 · 2008-07-02 10:14 · Score: 1
  
  They will hit a brainpower problem first. It takes progressively more experienced and more intelligent people to design smaller and smaller chips, as the design rules grow with each shrink... I expect 5 nm or 10 nm will be the limit for commodity CPUs, as quantum effects are going to be very hard to find solutions for.
4. Re:We all saw it coming anyway by Anonymous Coward · 2008-07-02 10:24 · Score: 0
  
  In any case, it is absolutely relevant what programmers think since any performance improvements that customers actually experience is dependent on our code.
  Uuh, CPU clock rate, for example?
5. Re:We all saw it coming anyway by ClosedSource · 2008-07-02 10:51 · Score: 1
  
  Sure, in the past. The point is that they're going to offer more cores instead of faster ones.
6. Re:We all saw it coming anyway by Anonymous Coward · 2008-07-02 13:00 · Score: 0
  
  So what happens in 15-20 years, after we've successfully made the switch (honestly it probably would take that long) and Intel announces a breakthrough in quantum computing or some other thing, effectively eliminating the nanometer wall? We switch back?
7. Re:We all saw it coming anyway by Gazzonyx · 2008-07-02 14:19 · Score: 1
  
  The Itanicium didn't fail on its own; Intel's own Xeons killed it. Xeons simply crushed Itaniums on price/performance ratio, especially the new non-NetBurst-based Xeons.
  
  --
  If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
8. Re:We all saw it coming anyway by toddestan · 2008-07-02 14:24 · Score: 1
  
  If that's what ends up happening, it won't be a big deal. Multithreaded applications will run just fine on a single-core CPU; we've been doing it for years.
9. Re:We all saw it coming anyway by Anonymous Coward · 2008-07-02 17:38 · Score: 0
  
  Exactly. It sounds wonderful, "a good opportunity to refactor software", but come on, show me the money to redo whatever I have done, plus use a more difficult programming style...
10. Re:We all saw it coming anyway by Anonymous Coward · 2008-07-03 07:48 · Score: 0
  
  "In any case, it is absolutely relevant what programmers think since any performance improvements that customers actually experience is dependent on our code."
  It's not dependent on the engineer's opinion, it's based on business need. If the speed advantage (or ability to surpass or keep up with a competitor) is worth the cost of that development style (which I wouldn't think viable in the near future), then that's what will go down. As always, being a developer with a locked in mindset is the number one risk to your usefulness, and thus job security.
11. Re:We all saw it coming anyway by ClosedSource · 2008-07-03 08:25 · Score: 1
  
  "It's not dependent on the engineer's opinion, it's based on business need"
  I was including business need as part of "what programmers think".
  "As always, being a developer with a locked in mindset is the number one risk to your usefulness, and thus job security"
  I don't consider myself in that category, however, I don't agree with your theory anyway. Your relationship with your Boss is far more significant in keeping your job than any mindset is.
It's already here. by GreatBunzinni · 2008-07-02 08:51 · Score: 1, Informative

We already have systems with tens and hundreds of cores. Those processors already go by the name of "graphics card", and those changes in languages and libraries go by the name of CUDA, CTM, Brook+ and the like.
The only new thing Intel brought to the table with this press release is the attempt to fool us into believing that there is nothing of the kind available and that Intel is somehow innovating in some aspect or another.
Face it: the age of the "CPU is the computing muscle" is long gone.

--
Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
1. Re:It's already here. by drinkypoo · 2008-07-02 09:52 · Score: 2, Interesting
  
  Last time I checked, my computer had only one GPU core, which had a multitude of functional units. So does my CPU, in fact, but the GPU has more. Each CPU core has its own "context" (the state of certain registers which store pointers, and the flags register). More CPU cores means more contexts, which means fewer context switches, which means cheaper threads. Pretty simple!
  CUDA &c are cool in that they offer you a way to use your video card for non-video applications when it is idle. However, their use is likely to be cyclical. It seems that we go through phases of having lots of custom hardware, and then getting cheap horsepower to throw at problems and thus having less custom hardware and doing more things in software, then having things flop back the other way. The PC was originally an expression of software-heavy use, but these days we have standard graphics processors and physics processors are even gaining some ground. Eventually the processors will take another big jump (having a thousand cores would qualify) and then everyone will want to do all this stuff on the CPU again, because a) it will be able to do it and b) you won't have to mess with two processors to get one job done.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
2. Re:It's already here. by mrchaotica · 2008-07-02 10:18 · Score: 1
  
  Eventually the processors will take another big jump (having a thousand cores would qualify) and then everyone will want to do all this stuff on the CPU again, because a) it will be able to do it and b) you won't have to mess with two processors to get one job done.
  I'm just waiting for ATI/AMD to stick a hypertransport controller on a Radeon and be done with it.
  
  --
  "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
3. Re:It's already here. by Courageous · 2008-07-02 14:19 · Score: 1
  
  I think they are awaiting more common adoption of HT 3.0. I know that there is a type of HT "slot" that looks just a bit like a PCIe x8. You'd think they'd have done this by now, at least for some supercomputer or something.
  C//
4. Re:It's already here. by mrchaotica · 2008-07-02 14:22 · Score: 1
  
  I don't think they're waiting at all; I think they just haven't had time to accomplish it yet. Remember, AMD and ATI didn't merge that long ago.
  
  --
  "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
5. Re:It's already here. by LordMyren · 2008-07-02 22:31 · Score: 1
  
  The GPU does not have "one GPU core". The NVidia GTX 280, for example, has 10 independent processing units. Each unit is comprised of 24 shaders, 8 texture units, some L1 memory, and access to RAM. All sub-units in the processing unit share a context and work towards completing the same task, but each of the 10 different units can be working on different things at the same time.
6. Re:It's already here. by sweet_petunias_full_ · 2008-07-03 20:09 · Score: 1
  
  "I'm just waiting for ATI/AMD to stick a hypertransport controller on a Radeon and be done with it."
  Hopefully they wouldn't be completely done at that point. As long as the Radeon is going to have its own separate memory bus (like an Opteron) they could add a CPU on the same die and allow it to soak up any extra memory cycles the Radeon isn't using. During idle periods or at lower graphics modes the CPU could use the Radeon's idle function units as onboard stream processors (using the parallelism to accelerate MMX, SSE type instructions). Since they would both be on the same silicon, they could talk across a wider interconnect than what you could afford to use in an external bus. For all intents and purposes the rest of the system would see the hybrid beast as another CPU sitting on an HT link and would be scheduling stuff to it. At higher graphics modes the CPU would bizarrely speed up and slow down every sixtieth of a second (either that or hammer the HT link), but otherwise would be a net gainer. There would be no need to use any special programming like CUDA to get this to work - the hardware would handle it all.
  With thousands of processors I would expect the memory bus will become one ripe mother of a bottleneck. I hope future processors will be able to sip memory and play nice with each other like the Niagara did, otherwise this thousand processor business will be more of a marketing gimmick than anything else.
  
  --
  You can't send a takedown notice to an already printed newspaper.
Good idea by Piranhaa · 2008-07-02 08:52 · Score: 4, Insightful

It's a good idea.. Somewhat of the same idea that the Cell chip has going for it (and well, Phenom X3s). You make a product with lots of redunant objects so that when some are bound to failure, the percentage of failure is much lower..
If there are 1000 cores on a chip, and 100 go bad... you're still only losing a *maximum* of 10% of performance, whereas when you have 2 or 4 cores and 1 or 2 go bad, you have a performance impact of 50%, essentially. That brings costs down, because yields go up dramatically.
1. Re:Good idea by wooferhound · 2008-07-02 09:17 · Score: 0, Redundant
  
  We will be scanning the CPU for Bad Cores, Instead of scanning the Hard Drive for Bad Sectors.
  
  --
  We are Dead Stars looking back Up at the Sky
2. Re:Good idea by drinkypoo · 2008-07-02 09:54 · Score: 1
  
  The Cell chip is a boondoggle in much the same way the PS2's Emotion Engine was, and before that, the Sega Saturn. Oddly the LACK of complexity is one of the reasons often cited by developers for the success of the Playstation over the Saturn; Saturn has been described as a pile of chips on a board, but the PSX provided a previously-unparalleled ease of development (for the level of available complexity) ... didn't hurt that it was $100 cheaper or that it had hardware transparency either, of course.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
3. Re:Good idea by brkello · 2008-07-02 10:32 · Score: 1
  
  That seems logical, but it really doesn't make any sense. So yeah, now you have 900 cores as opposed to two. That must mean it goes so much faster! Not really: assuming the cores are the same speed, there would probably be very little noticeable speed increase, because current programs are not going to take advantage of 98% of those cores. Unless programming and OSes change significantly, this is not going to result in faster computers. Generally, you only have 1 or 2 active applications up at one time (with a bunch of other ones sitting idle). It's not like you are able to play WoW and LotRO at the same time. So really, it isn't going to help the desktop market much.
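  The effect described above is usually quantified with Amdahl's law (not named in the comment, but it is the standard model): if a fraction p of a program's runtime can be parallelized, n cores give a speedup of at most 1 / ((1 - p) + p / n). A minimal Python sketch, with made-up parallel fractions:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Made-up parallel fractions, compared on 2 cores and on 900 cores
for p in (0.5, 0.9, 0.99):
    print(p, round(amdahl_speedup(p, 2), 2), round(amdahl_speedup(p, 900), 2))
```

  With p = 0.9, even 900 cores give a speedup just under 10x, because the serial 10% dominates; as n grows the limit is 1 / (1 - p).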
  
  On the other hand, if you look at changing home computing completely, where 1 server serves the whole neighborhood using VMs to serve up the OS and apps for each person, then it is more interesting. But it isn't really going to increase speed at all. That will still be done using traditional methods.
  
  --
  Support a great indie game: http://www.abaddon360.com
4. Re:Good idea by AllIGotWasThisNick · 2008-07-02 11:02 · Score: 1
  
  * Have you ever tried to recode a video?
  * Have you ever tried to uncompress something bigger than a few megabytes?
  * Have you ever tried to restore missing archive/RAID pieces from parity?
  * Have you ever tried to index the full-text contents of your documents?
  I'd be very happy to reduce these times by nearly 100% -- they're a lot more common uses for CPU horsepower than Crysis.
5. Re:Good idea by Anonymous Coward · 2008-07-02 12:06 · Score: 0
  
  That assumes that the hardware logic can deal with arbitrary failures in random patterns across the chip. With 1000 cores on a chip, I find that unlikely. 1 'core' failing may lead to, say, 64 being rendered useless, to keep a lid on complexity.
6. Re:Good idea by houghi · 2008-07-02 19:46 · Score: 1
  
  Look at it another way. If you have a 4-core chip and a 25% outage rate, you will also have that on 1000 cores.
  The difference is that the outages are not measured per individual CPU. So with a 10% outage rate, everyone who has 1000 cores will be affected. If you have 1 core, then 10% of the people are affected and completely screwed.
  I could imagine that a certain number of bad cores could become tolerable to keep the price down, e.g. if you have 1024 cores, the guarantee will be 1,000 working cores. With 64 cores, the guarantee will be e.g. 60 cores. Want a better guarantee? Pay more!
  A bit like the number of pixels that can be broken before the screen is considered broken.
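  The binning idea can be sketched with a simple binomial model: if each core is independently defective with probability q, the chance that a chip still meets a guarantee of at least k working cores is a sum of binomial terms. The 1% defect rate below is invented for illustration, and real defects tend to cluster (and one fault can disable more than one core), so treat this as an optimistic toy model.

```python
from math import comb

def prob_at_least(total_cores, required, defect_rate):
    """Probability that at least `required` of `total_cores` cores work,
    assuming each core fails independently with probability `defect_rate`."""
    ok_rate = 1.0 - defect_rate
    return sum(
        comb(total_cores, k) * ok_rate**k * defect_rate**(total_cores - k)
        for k in range(required, total_cores + 1)
    )

# Hypothetical 1% per-core defect rate, for a few (cores, guaranteed) tiers
for cores, guaranteed in ((4, 4), (64, 60), (1024, 1000)):
    print(cores, guaranteed, round(prob_at_least(cores, guaranteed, 0.01), 4))
```

  Under this model, demanding all 4 of 4 cores fails about 4% of chips, while a 1024-core part with a 1000-core guarantee almost never misses its bin.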
  
  --
  Don't fight for your country, if your country does not fight for you.
7. Re:Good idea by Richard+W.M.+Jones · 2008-07-02 19:50 · Score: 1
  
  Been done: See Wafer-scale integration.
  
  --
  libguestfs - tools for accessing and modifying virtual machine disk images
8. Re:Good idea by LordMyren · 2008-07-02 22:21 · Score: 1
  
  Current real world cases:
  A failure in an Amd X4 will either turn it into an X3 (tri core) or a useless pile of purified silicon (some faults are non-recoverable, or there could be multiple)
  Failures in an NVidia GTX 280 will turn it into a GTX 260 (8/10 working units) or a very very large piece of purified silicon. In this case, it goes from having 240 shaders / 80 texture mappers / 32 render units to 192/64/28.
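  The shader and texture figures are consistent with disabling two of the ten processing units (24 shaders and 8 texture units each, per the description of the GTX 280 above):

```latex
\begin{aligned}
\text{GTX 280:}\quad & 10 \times 24 = 240 \text{ shaders}, & 10 \times 8 = 80 \text{ texture units} \\
\text{GTX 260:}\quad & 8 \times 24 = 192 \text{ shaders}, & 8 \times 8 = 64 \text{ texture units}
\end{aligned}
```

  The render-unit count (32 to 28) does not follow the 10-to-8 ratio, which suggests the render back-ends are trimmed separately from the processing units.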
9. Re:Good idea by Anonymous Coward · 2008-07-03 03:33 · Score: 0
  
  You make a product with lots of redunant objects...
  In this case, another D would not be.
10. Re:Good idea by m50d · 2008-07-03 04:36 · Score: 1
  
  Have you ever tried to recode a video?
  Yes, but there's an upper bound on how much speed is actually useful here - there's no point transcoding videos faster than I can actually watch them. And current CPUs are pretty close to hitting that.
  Have you ever tried to uncompress something bigger than a few megabytes?
  Yes, but not in the age of terabyte hard disks. Why would you need to?
  Have you ever tried to restore missing archive/RAID pieces from parity?
  Yes, twice in my entire life, I think. Those are things that happen once in a blue moon.
  Have you ever tried to index the full-text contents of your documents?
  That's I/O bound, you can put in as fast a CPU as you like and it won't go any faster.
  
  --
  I am trolling
Quote Vegeta by Daimanta · 2008-07-02 08:52 · Score: 0, Offtopic

It's over 9000!!

--
Knowledge is power. Knowledge shared is power lost.
1. Re:Quote Vegeta by Yvan256 · 2008-07-02 09:01 · Score: 1
  
  But... that's impossible!
  See next week's episode for the 2-seconds reply.
2. Re:Quote Vegeta by QuantumHobbit · 2008-07-02 14:26 · Score: 1
  
  See next week's episode for the 2-seconds reply.
  2-seconds reply filled in with 20 minutes of Goku yelling and causing new veins to pop out of his forehead
optimal & maximum by Tumbleweed · 2008-07-02 08:53 · Score: 1

The optimal # of cores will inevitably wind up being 42, but nobody should ever need more than 640K cores.
Perhaps a machine that can configure the # and type of cores it needs on the fly will come about some day.
I'm more interested in on-die RAM for now. A combined CPU/GPU/RAM hooked to SSD storage. Yum.
Where's my Singularity? I was promised a Singularity!
1. Re:optimal & maximum by wooferhound · 2008-07-02 09:18 · Score: 1
  
  No user with 512 cores will need more than 640k of memory for each core.
  
  --
  We are Dead Stars looking back Up at the Sky
2. Re:optimal & maximum by mmkkbb · 2008-07-02 09:51 · Score: 1
  
  Actually, an Altera FPGA with the NIOS software core can be configured in software to have multiple cores, and they can drop instructions that aren't necessary.
  
  --
  -mkb
3. Re:optimal & maximum by Tumbleweed · 2008-07-02 10:00 · Score: 1
  
  Can it configure part of itself for memory? That would be interesting - configure the balance of cores/memory for what's needed at the time. How long does it take to reconfigure? Is it on the fly, or is it like an EEPROM that has to be loaded?
4. Re:optimal & maximum by mmkkbb · 2008-07-02 22:59 · Score: 1
  
  FPGAs can do RAM as well. It would effectively be static RAM, though, and not as much as you could fit on a package of DRAM. The reconfiguration would be pretty expensive, and you'd have to page all your memory out to a different storage device. That would be a neat trick, but I imagine the compiler needed to do this optimization would require a second FPGA just due to its sheer size.
  
  --
  -mkb
Useless by Rinisari · 2008-07-02 08:54 · Score: 1

What's the use until programmers start learning effective parallel programming? Right now, it's game developers who are winning that game, with graphics and movie editors right behind it.

--
Colin Dean Go a year without DRM
1. Re:Useless by CastrTroy · 2008-07-02 08:59 · Score: 5, Insightful
  
  Well, parallel programming is hard. It's not so hard that it can't be done, but it's harder than sequential programming. Unless your app will have a specific advantage because of this parallel programming, then it isn't worth the effort to do it in the first place. The nice thing however, would be that you could run each process on a separate core, and there wouldn't be any task switching needed. This would speed things up quite a bit. Also, if you locked a process or thread to each core, then one slow down wouldn't take out the entire system.
  
  --
  
  Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
2. Re:Useless by everphilski · 2008-07-02 09:14 · Score: 3, Interesting
  
  Parallel programming doesn't have to be hard; in fact, it comes very naturally in a number of domains. For example, in finite element analysis (used in a number of disciplines, including CFD and various stress calculations), the problem domain is broken down into elements, which can naturally be distributed. Calculations within an element are completely independent of the rest of the domain until the system of equations is solved, and efficient parallel matrix solvers are old hat.
  
  We got to keep reminding ourselves, the world we live in runs in parallel, why shouldn't our computers?
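  A minimal sketch of that element-level independence, assuming a 1D bar split into linear two-node elements with stiffness EA/L each (a textbook model chosen for illustration, not anything from the post). Every element matrix is computed independently on a pool; only the scatter into the global matrix is sequential. `element_stiffness` and `assemble` are hypothetical names, and in CPython threads will not speed up pure-Python arithmetic, so a real code would use processes or a compiled solver.
```python
from concurrent.futures import ThreadPoolExecutor


def element_stiffness(EA, length):
    """Local stiffness of a linear 1D bar element: (EA/L) * [[1, -1], [-1, 1]]."""
    k = EA / length
    return [[k, -k], [-k, k]]


def assemble(num_elements, EA=1.0, length=1.0, workers=4):
    # Element matrices depend only on element data, so they can be computed in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        element_matrices = list(
            pool.map(lambda _: element_stiffness(EA, length), range(num_elements))
        )
    # Sequential assembly: element e couples global nodes e and e + 1.
    n = num_elements + 1
    K = [[0.0] * n for _ in range(n)]
    for e, ke in enumerate(element_matrices):
        nodes = (e, e + 1)
        for a in range(2):
            for b in range(2):
                K[nodes[a]][nodes[b]] += ke[a][b]
    return K


if __name__ == "__main__":
    for row in assemble(3):
        print(row)
```
  For three unit elements, `assemble(3)` returns the familiar tridiagonal matrix with 1, 2, 2, 1 on the diagonal and -1 beside it.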
3. Re:Useless by Anonymous Coward · 2008-07-02 09:44 · Score: 0
  
  exactamundo
  But I think a similar result could be attained by combining the multicore chip structure with the Cell architecture, with 4 cores each controlling anywhere from 4 to 32 SPEs.
4. Re:Useless by AnyoneEB · 2008-07-02 10:03 · Score: 1
  
  Well, parallel programming is hard.
  You mean "parallel programming is hard with current popular programming languages/libraries." Instead of dropping down to C for the parts of your program that need to be fast or parallelized, use a functional programming language or Erlang, and the compiler can automatically parallelize the parts of your program for which that is easy. Microsoft recently added F# to their collection of .NET languages to make this easier for .NET developers. Or, if you do not want to change languages, consider libraries like ThreadWeaver, which try to make threading easier.
  
  --
  Centralization breaks the internet.
5. Re:Useless by Anonymous Coward · 2008-07-02 10:19 · Score: 0
  
  We got to keep reminding ourselves, the world we live in runs in parallel, why shouldn't our computers?
  The rest of your post is fine, but this line deserves ridicule for attempting to blow people's minds with a comparison that ignores the entirely different scale, nature, and functioning of the world versus that of modern computers. Also, a hell of a lot of the world runs synchronously and if it didn't some very, very bad things would happen. Saying something like "separate processes involving separate entities naturally run concurrently in the real world, with occasional examples of asynchronous behavior within those processes" is much more accurate, but I'm guessing it doesn't have the "Wow!" factor you were looking to get.
6. Re:Useless by kramulous · 2008-07-02 10:32 · Score: 1
  
  True, but perhaps compilers need to change along with the CPUs. Different kinds of for loops that mean different things? Or use the standard loop but avoid pointers, so the compiler can see what can be done to parallelise the loops for you.
  
  --
  .
7. Re:Useless by GreatDrok · 2008-07-02 12:48 · Score: 1
  
  Parallel programming isn't hard. It is quite a natural way to express your code. The problem is the languages people choose to use: anything based on a sequential language isn't going to work well. Back in the early 90's there was a lot of research into automatically parallelising sequential code to take advantage of SIMD and MIMD computing platforms. Personally, I wrote code on the Thinking Machines CM-200 and the MasPar MP-1 and MP-2, all of which were SIMD and used a C-based language where the normal data types had plural versions to distribute data across the processor array. This was reasonably effective for some algorithms, and its natural offshoots are the MMX/SSE vector types on modern CPUs, which are programmed in a similar way. MIMD programming was somewhat different, and the closest you get these days is threading or MPI, but these sit on top of traditional languages and are pretty nasty. Back in the late 80's there were transputers, which were programmed in OCCAM. This language was naturally parallel and easy to use. Of course, it was a new language, so people avoided it, but code written in OCCAM had very fine-grained parallelism, which made it very simple to serialise back onto a smaller number of CPUs. CSP-based languages (Ada was another one) would be a good way to work with many cores, but I'm damn sure we'll end up with more half-baked sequential code with bits of parallelism welded in.
  Parallel programming has really suffered in the last few decades such that I feel the current state of play (lots of cheap nodes in a cluster hooked together with ethernet and programmed using MPI or similar) is pretty desperate. Cheap, but nasty.
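  The CSP style described above (independent processes that share nothing and talk only over channels) can be approximated in Python with threads, using `queue.Queue` as the channel. This is a minimal sketch under that assumption, not occam semantics: occam channels are unbuffered rendezvous, while these queues are small bounded buffers. The stage names `producer`, `squarer`, and `consumer` are hypothetical.
```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def producer(out_ch, n):
    for i in range(n):
        out_ch.put(i)
    out_ch.put(DONE)


def squarer(in_ch, out_ch):
    while True:
        item = in_ch.get()
        if item is DONE:
            out_ch.put(DONE)  # pass end-of-stream downstream
            return
        out_ch.put(item * item)


def consumer(in_ch, results):
    # Only this stage touches `results`, so no lock is needed.
    while True:
        item = in_ch.get()
        if item is DONE:
            return
        results.append(item)


def run_pipeline(n):
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)  # tiny buffers, close to rendezvous
    results = []
    stages = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
        threading.Thread(target=consumer, args=(b, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```
  Because each stage only reads from one channel and writes to the next, adding cores means adding stages or replicating them, without any shared mutable state.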
  
  --
  "I have the attention span of a strobe lit goldfish, please get to the point quickly!"
8. Re:Useless by WhoBeDaPlaya · 2008-07-02 13:05 · Score: 1
  
  Hell, even critical tools in chip design can be parallelized: floorplanning as well as place-and-route.
  In both cases, though, you risk degrading solution quality if you don't split up the problem carefully. I've worked a little on a non-stochastic floorplanner using joint shape curves with min-cut partitioning, as well as with the people behind FastPlace and FLUTE, and this is definitely an issue.
9. Re:Useless by MrSteveSD · 2008-07-02 13:38 · Score: 1
  
  Well I should think that a lot of programs will benefit from calling prebuilt components that themselves make use of parallelism. So for example, if you write a graphics program like the GIMP, many of the routines you call, like Gaussian Blur, will themselves be exploiting all the cores even if you don't explicitly do that yourself.
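  A minimal sketch of that idea: the caller writes an ordinary sequential call, while the routine spreads the rows across workers internally. `box_blur_rows` is a hypothetical stand-in for something like a Gaussian blur (here a 3-tap horizontal box blur per row), not GIMP's actual implementation; a real image library would do this in native code rather than Python threads.
```python
import os
from concurrent.futures import ThreadPoolExecutor


def _blur_row(row):
    """3-tap box blur of one row; edge pixels average only the neighbours that exist."""
    n = len(row)
    out = []
    for i in range(n):
        window = row[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def box_blur_rows(image):
    """Looks like a plain function to the caller; rows are processed concurrently inside."""
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_blur_row, image))  # map preserves row order


if __name__ == "__main__":
    print(box_blur_rows([[0, 3, 6], [9, 9, 9]]))  # [[1.5, 3.0, 4.5], [9.0, 9.0, 9.0]]
```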
10. Re:Useless by Anonymous Coward · 2008-07-02 13:40 · Score: 0
  
  I've got three words for you:
  Glasgow Haskell Compiler.
  More info
  Basic parallelism hinting
  More sophisticated tools
11. Re:Useless by steelfood · 2008-07-03 02:04 · Score: 1
  
  Because with a few exceptions, our brains are wired serially, as is our thought process.
  Parallel programming is most applicable to simulations, where the program is actually trying to imitate real-world situations. However, it is terrible for application programming, as applications are driven by user input, and because users' brains are serial, user applications are largely serial.
  Quite frankly, we really only need 3 threads to look up e-mail, and 2 each for word processing and web surfing. And since we can't do more than one thing at a time, we don't need 7 cores to do all those tasks effectively, just 3. If we add background OS processes, we might need a few more threads, but 8 is already sufficient to cover all everyday tasks. And by the time it gets to 8, the bottleneck isn't the CPU or system memory, it's the storage medium (HDD, SSD, whatever), which is accessed sequentially. Yes, we can implement RAID, but how many regular, average Joe users even know what RAID is, much less need it?
  
  --
  "If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
12. Re:Useless by Tweenk · 2008-07-03 06:31 · Score: 1
  
  Because we (humans) run serially. How many things can you do at once? I have a lot of trouble doing two, even when neither requires any significant attention or concentration.
  
  --
  Those who would give up liberty to obtain working drivers, deserve neither liberty nor working drivers.
Stupid Dyslexia by FlyingSquidStudios · 2008-07-02 08:54 · Score: 1

I read that as 'Thousands of Crows' and had this great image of flocks of birds flying out of my ethernet port, delivering packets to the world.

--
http://twitter.com/OLDTELEGRAM
Already Happening by sheepweevil · 2008-07-02 08:55 · Score: 3, Informative

Supercomputers already have thousands of cores, and often far more: the IBM Blue Gene/P can have up to 1,048,576 cores. What Intel is probably talking about is bringing that level of parallel computing to smaller computers.
1. Re:Already Happening by Piranhaa · 2008-07-02 09:08 · Score: 1
  
  I'm thinking Intel is talking about bringing thousands of cores to as few pieces of silicon as possible. The Blue Gene still uses single-cored processors, just massive numbers of them compacted together by a crossbar switch (http://en.wikipedia.org/wiki/Cyclops64). I highly doubt Intel is referring to a system with each core separate. That was my take, anyway.
What's in it for us? by ClosedSource · 2008-07-02 08:59 · Score: 1

I understand why Intel is so interested in multiple cores - they can't make the faster single-core chips that the market wants.
The question is, what's our motivation? Unless software performance scales approximately linearly with the number of cores (e.g., a 10-core CPU runs an application 10x faster than it did before the application was made core-aware), it probably isn't worth converting legacy apps.
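
The "approximately linear" condition can be made concrete with Amdahl's law, a standard model that the post does not name: if a fraction p of the run time parallelizes perfectly across n cores and the rest stays serial, the speedup is 1 / ((1 - p) + p/n). A minimal sketch, assuming perfect parallel scaling and zero coordination overhead (both optimistic):
```python
def amdahl_speedup(p, n):
    """Speedup on n cores when a fraction p (0..1) of the run time parallelizes perfectly."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    for cores in (2, 10, 1000):
        print(cores, round(amdahl_speedup(0.9, cores), 2))  # 1.82, 5.26, 9.91
```
With 90% of the run time parallel, 10 cores give about 5.3x and 1000 cores still stay under 10x, which is why "approximately linear" is a demanding bar for legacy code.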
1. Re:What's in it for us? by SeekerDarksteel · 2008-07-02 09:17 · Score: 1
  
  1) It doesn't necessarily have to be linearly proportional. It just has to be greater than the performance we could gain by making more complex single cores. In fact, scalability is more important in some sense than efficiency (speedup/core). The most significant advantage of parallelization isn't speedups, but opportunities. It's not that you can do the same task twice as fast, it's that you can do a problem twice as large in the same amount of time.
  
  2) It very much is, currently, a solution in search of a problem. But that doesn't mean it's not worth pursuing. It's a chicken-and-egg scenario: we don't have a lot of uses for a thousand cores because we don't have thousand-core systems to develop useful applications for.
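  Point 1 is the intuition behind Gustafson's law: if the problem grows with the machine so that the serial fraction s of the run time on n cores stays fixed, the scaled speedup is s + n(1 - s), which stays close to linear in n. A minimal sketch comparing it with the fixed-size (Amdahl-style) speedup for the same s; the two models measure s against different baselines, so treat this as a rough contrast. The function names are illustrative.
```python
def gustafson_speedup(s, n):
    """Scaled speedup on n cores when a fraction s of the scaled run time is serial."""
    return s + n * (1.0 - s)


def fixed_size_speedup(s, n):
    """Amdahl-style speedup on n cores for a fixed-size problem with serial fraction s."""
    return 1.0 / (s + (1.0 - s) / n)


if __name__ == "__main__":
    for cores in (10, 1000):
        print(cores,
              round(fixed_size_speedup(0.1, cores), 1),
              round(gustafson_speedup(0.1, cores), 1))
```
  With s = 0.1, 1000 cores give under 10x on a fixed-size problem but about 900x on a proportionally larger one, which is the "twice as large in the same time" argument.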
  
  --
  The laws of probability forbid it!
2. Re:What's in it for us? by ClosedSource · 2008-07-02 13:12 · Score: 1
  
  "It doesn't necessarily have to be linearly proportional. It just has to be greater than the performance we could gain by making more complex single cores."
  Well, given that it requires additional time and money to take advantage of it, there needs to be a commensurate performance advantage. It's not enough to be merely "greater than" a single core.
3. Re:What's in it for us? by Courageous · 2008-07-02 14:21 · Score: 1
  
  We'll soon all be like Mac fanboys in the 90's, gleefully awaiting the next Photoshop benchmark from Apple and talking about how fast our Macs will be.
  C//
Declarative languages is the answer by olvemaudal · 2008-07-02 08:59 · Score: 3, Interesting

In order to utilize mega-core processors, I believe that we need to rethink the way we program computers. Instead of using imperative programming languages (e.g., C, C++, Java), we might need to look at declarative languages like Erlang, Haskell, F# and so on. Read more about this at http://olvemaudal.wordpress.com/2008/01/04/erlang-gives-me-positive-vibes/
1. Re:Declarative languages is the answer by Colin+Smith · 2008-07-02 09:21 · Score: 1
  
  Bingo!
  Pull not push.
  
  --
  Deleted
2. Re:Declarative languages is the answer by tanadeau · 2008-07-02 15:10 · Score: 2, Informative
  
  Declarative languages are ones like Prolog. You're talking about functional programming (Lisp, Haskell, Erlang, OCaml, etc.) which is a wholly different (and easier to understand) beast.
Re:Imagine the new math! by TaoPhoenix · 2008-07-02 08:59 · Score: 1

"Problem #6.
If Intel makes a machine with 875 cores and there are 413 machines in your Beowulf Cluster, how many total cores are there?"

--
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
just one more.. by Anonymous Coward · 2008-07-02 09:00 · Score: 0

"Oh, the fools! If only they'd built it with 6000 and 1 cores"
Lookahead/predictive branching is one option... by Cordath · 2008-07-02 09:00 · Score: 4, Interesting

Say you have a slow, plodding sequential process. If you reach a point where there are several possibilities and you have an abundance of cores, you can start work on each possibility while you're still deciding which one is actually right. Many CPUs already incorporate this sort of logic in hardware (speculative execution). It is, however, rather wasteful of resources and provides a relatively modest speedup. Applying it at a higher level should work in principle, although it obviously isn't going to be practical for many problems.
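
A minimal sketch of that idea applied in software, assuming the candidate branches are independent, side-effect-free functions: start every branch on a pool while the slow decision is still being computed, then keep the chosen result and discard the rest. `slow_decision`, `branch_a`, and `branch_b` are hypothetical placeholders, with sleeps standing in for real work.
```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_decision():
    time.sleep(0.05)  # stands in for the expensive test that picks a branch
    return "b"


def branch_a():
    time.sleep(0.05)
    return "result of A"


def branch_b():
    time.sleep(0.05)
    return "result of B"


def speculative_run(decide, branches):
    """Run every candidate branch concurrently with the decision; return only the chosen one."""
    with ThreadPoolExecutor(max_workers=len(branches) + 1) as pool:
        futures = {name: pool.submit(fn) for name, fn in branches.items()}
        choice = pool.submit(decide).result()
        for name, fut in futures.items():
            if name != choice:
                # cancel() only helps if the branch has not started; otherwise its work is wasted.
                fut.cancel()
        return futures[choice].result()


if __name__ == "__main__":
    print(speculative_run(slow_decision, {"a": branch_a, "b": branch_b}))  # result of B
```
The wasted work the post mentions is visible here: the losing branch still runs to completion before the pool shuts down.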

I do see this move by Intel as a direct follow-up to their plans to negate the processing advantages of today's video cards. Intel wants people running general-purpose code to run it on their general-purpose CPUs, not on their video cards using CUDA or the like. If the future of video game rendering is indeed ray-tracing (an embarrassingly parallel algorithm if ever there was one), then this move will also position Intel to compete directly with Nvidia for the raw processing power market.

One thing is for sure, there's a lot of coding to do. Very few programs currently make effective use of even 2 cores. Parallelization of code can be quite tricky, so hopefully tools will evolve that will make it easier for the typical code-monkey who's never written a parallel algorithm in his life.
1. Re:Lookahead/predictive branching is one option... by umghhh · 2008-07-02 09:22 · Score: 1
  
  As for monkeys, this has been tried many times. Sadly, some of the monkeys I saw had degrees in computer science. The good thing about these degree monkeys: they did not defecate on the keyboard (at least the ones I knew did not), which the real monkeys did when confronted with a similarly daunting task: http://www.wired.com/culture/lifestyle/multimedia/2003/05/58790
  It may be just a hunch, but I think parallel processing will take some time and brain power to use properly, and it is going to stay this way for a while.
Is that really a good idea? by neokushan · 2008-07-02 09:03 · Score: 3, Interesting

I'm all for newer, faster processors. Hell, I'm all for processors with lots of cores that can be used, but wouldn't completely redoing all of the software libraries and such that we've got used to cause a hell of a divide in developers?
Sure, if you only develop on an x86 platform, you're fine, but what if you want to write software for ARM or PPC? Processors that might not adopt the "thousands of cores" model?
Would it not be better to design a processor that can intelligently spread single threads across multiple cores? (I know this isn't an easy task, but I don't see it being much harder than what Intel is proposing here.)
Or is this some long-term plan by Intel to lock people into their platforms even more?

--
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
1. Re:Is that really a good idea? by furball · 2008-07-02 09:36 · Score: 1
  
  It's a twofold problem. The first problem is writing code so that it can make use of multiple cores. The second is writing code so that it doesn't add significant overhead when running on a single core.
  The first problem isn't simply due to the limitations of your CPU. Most shipping computers have a GPU, so even with a single-core processor you actually have two processors capable of handling tasks. If your GPU is idle, it's a candidate to take on additional work that your single core can't handle (see also OpenCL).
  The second problem is a run-time problem. What if your GPU is busy and you don't have an idle second processor to work with? How do you drop down to single-core mode and avoid the overhead of threads? The OS can schedule threads, but threads do have a cost, albeit smaller than the cost of forking a process. A structure lighter-weight than threads then becomes ideal in these scenarios.
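  One common way to drop down at run time is a simple check: fall back to a plain loop when only one core is available or when the input is too small to repay the cost of starting workers. A minimal sketch; `adaptive_map` is a hypothetical helper, `MIN_ITEMS_PER_WORKER` is an arbitrary threshold that would need tuning on real hardware, the GPU side of the question is not covered, and in CPython a process pool would be needed for real CPU-bound speedup.
```python
import os
from concurrent.futures import ThreadPoolExecutor

MIN_ITEMS_PER_WORKER = 10_000  # hypothetical grain size; tune per workload


def adaptive_map(fn, items, cores=None):
    """Apply fn to every item, using a pool only when it is likely to pay off."""
    items = list(items)
    cores = cores if cores is not None else (os.cpu_count() or 1)
    workers = min(cores, len(items) // MIN_ITEMS_PER_WORKER)
    if workers <= 1:
        # Single-core mode: no thread creation or hand-off overhead at all.
        return [fn(x) for x in items]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))  # results come back in input order
```
  For example, `adaptive_map(str.upper, ["a", "b"])` stays on the serial path and returns `['A', 'B']`.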
2. Re:Is that really a good idea? by Anonymous Coward · 2008-07-02 09:59 · Score: 0
  
  I can see the marketing forces calling it "The New Y2K"
3. Re:Is that really a good idea? by dbIII · 2008-07-02 13:24 · Score: 1
  
  Sure, if you only develop on an x86 platform, you're fine, but what if you want to write software for ARM or PPC?
  The Nintendo DS has two ARM CPUs and came out more than two years ago. There have been multiprocessor PowerPC systems for several years.
4. Re:Is that really a good idea? by neokushan · 2008-07-02 21:26 · Score: 1
  
  True and that's fine, but has ARM or IBM given any indication that they plan to go from dual/tri-core setups to "thousands-of-cores" setups? That's what I'm really referring to.
  I guess I'm just not convinced that it's the best way forward.
  
  --
  +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
5. Re:Is that really a good idea? by Kamineko · 2008-07-02 22:24 · Score: 1
  
  A Nintendo DS with 'thousands of cores' would be hiiiiiiii-larious to program for, I tell ya.
6. Re:Is that really a good idea? by LordMyren · 2008-07-02 22:46 · Score: 1
  
  Yes, there will be a lot of work to make software that can use multi-core systems. We don't even have established practices for doing this in most software.
  However, I see no reason this would favor x86. SPARC has a 32-core design you can buy now, and PowerPC has an 8-way CPU announced; both are more than x86 provides now. ARM is close to releasing a four-way part.
  As for "single threads across multiple cores", processors already do this: it's called superscalar design and out-of-order execution. Expanding out-of-order execution beyond the CPU core is feasible, but I think the returns would be horrendously bad. One of the CPU makers was talking about trying it a little while ago.
Desperation? by HunterZ · 2008-07-02 09:05 · Score: 3, Interesting

Honestly, I wonder if Intel isn't looking at the expense of pushing per-core speed further and comparing it against the cost of just adding more cores. The unfortunate reality is that the many-core approach really doesn't fit the desktop use case very well. Sure, you could devote an entire core to each process, but the typical desktop user only cares about the performance of the one process in the foreground that they're interacting with.
It's also worth mentioning that some individual applications just aren't parallelizable to the extent that more than a couple of cores could be exercised for any significant portion of the application's run time.

--
Arguing about vi versus Emacs is like arguing whether it's better to make fire by rubbing sticks or banging rocks.
1. Re:Desperation? by electrosoccertux · 2008-07-02 09:59 · Score: 1
  
  This is when the management and business guys come in, and we tech folk finally get the last laugh.
  There's no reason they need a POS terminal with 2 cores at 2.2 GHz and 6MB of on-die cache, but if we put it in a system and label it "high end", they think it will help them sell more burgers, so they buy it.
2. Re:Desperation? by Anonymous Coward · 2008-07-02 10:23 · Score: 0
  
  You're not thinking ahead. Today most processes are sequential for two reasons:
  We traditionally had only one processor, so it didn't make a performance difference whether we wrote parallel code for parallel problems or sequentialized the problem in the code. Sequential programs are easier to debug, so there was a good reason not to write heavily multithreaded code. Many legacy applications can be parallelized to some extent, but a rewrite is necessary to realize the full potential.
  The other reason is that we have not had the processing power for massively parallel problems, so we simply avoided them. But there are lots of interesting things which can be done with vast amounts of parallel processing power: Pattern recognition, natural language processing, computer vision, complex 3d graphics, FEM simulation, etc. You might notice that this looks like a list of recent graphics card demos, and that is not a coincidence: Graphics cards are leading the way. They use processors with hundreds of simple cores.
3. Re:Desperation? by Anonymous Coward · 2008-07-02 10:34 · Score: 0
  
  You are not seeing the picture at all. Parallel computing won't make you type your text faster, will it?
  The spellchecker can take lots of parallelization. The GUI can take lots of parallelization. OCR. Speech recognition. Imaging software. Those are all current applications that could use some real parallelization love (2-4 cores are a joke). But in general, existing programs won't do what they already do much faster or much better than now. However, there are lots of things we cannot do now that will be feasible for the first time.
4. Re:Desperation? by Anonymous Coward · 2008-07-02 11:21 · Score: 0
  
  Indeed. In fact, it seems to me like it's an admission that all the previously mooted future technologies for improving clock speed have basically failed or been maxed out. What happened to optical computing? It seemed to be just around the corner for about 10 years. Are we really that desperate that we need to change the entire foundation of software engineering now?
Even 64 sounds optimistic by Joce640k · 2008-07-02 09:05 · Score: 2, Interesting

I'd be surprised if a desktop PC ever really uses more than eight. Desktop software is sequential, as you said. It doesn't parallelize.
Games will be doing their physics, etc., on the graphics card by then. I don't know if the current fad for doing it on the GPU will go anywhere much but I can see graphics cards starting out this way then going to a separate on-board PPU once the APIs stabilize.
We might *have* 64 cores simply because the price difference between 8 and 64 is a couple of bucks, but they won't be used for much.

--
No sig today...
1. Re:Even 64 sounds optimistic by everphilski · 2008-07-02 09:09 · Score: 1
  
  Our world runs in parallel; why shouldn't our computers?
  
  64 sounds good to me. I wouldn't complain if I had more.
2. Re:Even 64 sounds optimistic by bluefoxlucid · 2008-07-02 09:35 · Score: 1
  
  It'd make microkernels a lot faster too :p
  
  --
  Support my political activism on Patreon.
3. Re:Even 64 sounds optimistic by kipman725 · 2008-07-02 10:07 · Score: 1
  
  HUUUURRRRDDD!!!!
4. Re:Even 64 sounds optimistic by bluefoxlucid · 2008-07-02 10:38 · Score: 1
  
  Hurd is trash; it should run on top of Minix.
  The Minix IPC facilities are structured around security, making sure there's a minimal number of possible attacks by design.
  Mach, which HURD uses, is structured around a core IPC controller that validates every message, and thus lags a lot: response is very slow because of these security features.
  L4, which GNU wants to use, does away with that by declaring that each service should implement such checks itself. Of course, the service may well have to do just as much work: the core kernel doesn't look as busy and messages get where they're going faster, but they get processed more slowly and there's more code duplication.
  Minix + Minix services, or even Minix + HURD services, gives the best design here.
  
  --
  Support my political activism on Patreon.
5. Re:Even 64 sounds optimistic by LandDolphin · 2008-07-02 10:50 · Score: 1
  
  Don't worry, Microsoft will come out with an upgrade to Vista that will waste the computing power of those 64 cores on the OS.
  
  --
  Spelling and Grammar errors have been added to this post for your enjoyment
6. Re:Even 64 sounds optimistic by SiegeTank · 2008-07-02 11:41 · Score: 1
  
  Oh, I would be surprised if there weren't computers in 5+ years with vastly more than 8 processors. Not only that, I would say the clock speed of each core will be reduced substantially to increase efficiency; power consumption would drop, and thermal output would be much lower, since unused cores could be turned off when they aren't required (unless some technology arrives to increase the efficiency of higher-frequency cores, but at the moment higher clock rates mean more thermal waste).
  
  The biggest problem will be that this requires distributing workloads across the cores, which is, needless to say, a difficult problem to address, especially if you factor in that programming languages and/or compilers will need to be modified substantially to work this way.
Intel is building an FPGA by obender · 2008-07-02 09:06 · Score: 2, Interesting

From TFA:
Dozens, hundreds, and even thousands of cores are not unusual design points

I don't think they mean cores like the regular x86 cores, I think they will put an FPGA on the same die together with the regular four/six cores.
1. Re:Intel is building an FPGA by kipman725 · 2008-07-02 11:10 · Score: 1
  
  Configurable hardware... mmm, each game with its own PPU and AI processor. Each crypto program with its own crypto engine. That could give a massive speedup, but it would also massively increase development costs.
That's all well and good..... by PontifexMaximus · 2008-07-02 09:07 · Score: 1

but can we PLEASE work on getting apps to run on more than just ONE core/processor for now? I mean, it's amazing how many apps still aren't SMP-aware unless you can (possibly) compile them for that purpose, and even then you don't always get the kind of increase you'd expect. Let's start with programming for 2 cores and then maybe go to 4 or more. For now, the kind of stuff Intel is discussing is moot. If we can't get shit to run on 2 procs, why bother thinking about 12?

--
Pax Vobiscum
1. Re:That's all well and good..... by Anonymous Coward · 2008-07-02 09:09 · Score: 0
  
  AMEN !
2. Re:That's all well and good..... by pimpimpim · 2008-07-02 09:26 · Score: 3, Informative
  
  Bingo. The problem is there. I've taken an introductory course on parallel programming (not that I'm an expert), and while the idea of multiprocessor programming is fairly simple, the implementation is amazingly difficult and painful.
  Example: a "race condition". Say processor one is trying to find the optimal value of variable A, and processor two is doing something different but calls some subfunction that changes variable A; then processor one might keep running forever.
  The other main problem is deadlock: processor one needs the final result of variable B to calculate variable A, but processor two needs the final result of variable A to calculate B. Both processors come to a standstill, and the program hangs forever.
  For simple programs, these things are relatively easy to troubleshoot. But in a huge program package with hundreds of modules, it is almost impossible to know what is happening.
  Actually, it is the duty of Intel and co. to find a way to prevent these situations, but even there, what kind of genius could write an automated debugger that finds deadlocks and race conditions?
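  The two textbook fixes can be sketched in a few lines of Python, assuming threads that share memory: guard the shared variables with locks so read-modify-write updates cannot interleave, and avoid lock-order deadlocks by always acquiring locks in one global order. The `Account` class and `transfer` function are hypothetical examples, not part of any library.
```python
import threading


class Account:
    _next_rank = 0  # creation order; accounts are assumed to be created before threads start

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.rank = Account._next_rank
        Account._next_rank += 1


def transfer(src, dst, amount):
    """Move amount from src to dst while holding both accounts' locks."""
    if src is dst:
        return  # threading.Lock is not reentrant, so never lock the same account twice
    # Lock ordering: always take the lower-ranked lock first, so an A->B transfer and a
    # B->A transfer can never each hold one lock while waiting for the other.
    first, second = (src, dst) if src.rank < dst.rank else (dst, src)
    with first.lock, second.lock:
        # Both read-modify-write updates happen under the locks, so they cannot race.
        src.balance -= amount
        dst.balance += amount


def _repeat(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)


def demo(times=1000):
    a, b = Account(1000), Account(1000)
    workers = [threading.Thread(target=_repeat, args=(a, b, times)),
               threading.Thread(target=_repeat, args=(b, a, times))]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(demo())  # (1000, 1000): equal numbers of opposite transfers
```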
  
  --
  molmod.com - computing tips from a molecular modeling
3. Re:That's all well and good..... by mrchaotica · 2008-07-02 14:14 · Score: 2, Informative
  
  but can we PLEASE work on getting apps to run on more than just ONE core/processor for now?
  Why?
  The kind of parallelism needed for a few cores (coarse-grained task parallelism) is entirely different than the kind of parallelism needed for hundreds or thousands of cores (fine-grained data parallelism). Designing for a few cores won't do us a damn bit of good when we have hundreds or thousands.
  
  --
  "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
4. Re:That's all well and good..... by pbrooks100 · 2008-07-02 23:09 · Score: 1
  
  The geniuses are working for NI. The deadlocks and race conditions can be controlled using dataflow programming. The language is called LabVIEW. http://www.ni.com/multicore
5. Re:That's all well and good..... by master_p · 2008-07-03 01:26 · Score: 1
  
  The race condition can be solved by copying the value of A and working on that. In purely functional programming languages, where data is immutable, you have no such problem.
  The deadlock problem is a logical one, and it exists in single-threaded computations too: if you need the value of A to compute B and the value of B to compute A, then there is something wrong with your algorithm.
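  A minimal sketch of the copy-instead-of-share approach: each worker refines its own deep copy of the parameters and returns a new value, and the best candidate is chosen only after all workers finish, so no worker ever sees another's intermediate A. `refine`, `explore`, and the objective function are hypothetical illustrations.
```python
import copy
from concurrent.futures import ThreadPoolExecutor


def refine(params, step):
    """Try one candidate on a private deep copy; the caller's dict is never touched."""
    local = copy.deepcopy(params)
    local["A"] += step
    return local


def explore(params, steps, objective):
    """Evaluate each step on its own copy in parallel, then pick the best candidate."""
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: refine(params, s), steps))
    return max(candidates, key=objective)


if __name__ == "__main__":
    shared = {"A": 1.0}
    best = explore(shared, [0.5, -0.5, 1.0], objective=lambda c: -(c["A"] - 2.0) ** 2)
    print(best, shared)  # {'A': 2.0} {'A': 1.0}
```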
Punting again by Waffle+Iron · 2008-07-02 09:10 · Score: 1

Intel tried to push the complexities of increasing computing speed off into software before. When they designed the Itanium, they figured that the software compiler would magically find extra concurrency in the apps and utilize the large number of functional units in the core, and that this would make other architectures obsolete. Well, it didn't quite work out as they planned.
Hopefully they won't spend $Billions going down the "hypothetical software will enable radical hardware changes" road again just to learn the same lesson as last time.
Start! What do they mean, start? by 4pins · 2008-07-02 09:10 · Score: 3, Interesting

It has long been taught in theory classes that certain problems can be solved in fewer steps using nondeterministic programming. The problem is that you have to follow multiple paths until you hit the right one. With sufficiently many cores, the computer can follow all the possible paths at the same time, resulting in a quicker answer. http://en.wikipedia.org/wiki/Non-deterministic_algorithm http://en.wikipedia.org/wiki/Nondeterministic_Programming
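
A minimal sketch of following several nondeterministic choices at once, using subset sum over positive integers as the example problem (an illustrative choice, not the poster's): fix the take/skip decision for the first few items, hand each resulting branch to its own worker, and return as soon as any branch finds a solution. The number of paths still grows exponentially with the input, so a fixed number of cores shrinks wall-clock time by at most a constant factor. `search` and `parallel_subset_sum` are hypothetical names, and threads keep the sketch short where CPython would need processes for real CPU parallelism.
```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product


def search(items, target, chosen):
    """Sequential depth-first search of one branch (positive integers assumed)."""
    if target == 0:
        return chosen
    if target < 0 or not items:
        return None
    head, rest = items[0], items[1:]
    found = search(rest, target - head, chosen + [head])  # choice 1: take head
    if found is not None:
        return found
    return search(rest, target, chosen)  # choice 2: skip head


def parallel_subset_sum(items, target, split_depth=2):
    """Fan the first split_depth take/skip choices out to separate workers."""
    items = list(items)
    depth = min(split_depth, len(items))
    prefix, rest = items[:depth], items[depth:]
    with ThreadPoolExecutor(max_workers=2 ** depth) as pool:
        futures = []
        for picks in product((True, False), repeat=depth):
            chosen = [x for x, take in zip(prefix, picks) if take]
            futures.append(pool.submit(search, rest, target - sum(chosen), chosen))
        for fut in as_completed(futures):
            result = fut.result()
            if result is not None:
                for other in futures:
                    other.cancel()  # skips branches not yet started; running ones finish on shutdown
                return result
    return None


if __name__ == "__main__":
    print(parallel_subset_sum([3, 34, 4, 12, 5, 2], 30))  # None: no subset sums to 30
```
Which solution comes back for a solvable target depends on which branch finishes first, which is itself a small taste of nondeterminism.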

--
I will not mourn that which I never had to lose. - Unknown
Imagine a Beowulf cluster.... by davidwr · 2008-07-02 09:11 · Score: 5, Funny

oh nevermind, what's the point?

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Microsoft's reply by Anonymous Coward · 2008-07-02 09:15 · Score: 1, Funny

Prepare for thousand core dumps!
1. Re:Microsoft's reply by David+Greene · 2008-07-02 10:29 · Score: 2, Informative
  
  That's no joke. It's not at all unusual to have to wait hours for tens of thousands of core files to be produced on large HPC machines. Debugging at scale is a really, really hard problem.
  
don't trust him! by nih · 2008-07-02 09:18 · Score: 0

Anwar Ghuloum, a principle engineer with Intel's Microprocessor Technology Lab
don't trust him! he just wants the ring!

--
I'm a rabbit startled by the headlights of life :(
1. Re:don't trust him! by ezzzD55J · 2008-07-02 09:27 · Score: 1
  
  Also, principles aren't things to be engineered :)
2. Re:don't trust him! by uncqual · 2008-07-02 10:08 · Score: 1
  
  And, the amazing thing is that TFA (yes, I admit to breaking the cardinal rule and did RTFA) got it right while the /. summary got it wrong.
  
  Well, okay, it's not amazing - cut/paste is such an advanced feature.
  
  --
  Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
Heat issues by the_olo · 2008-07-02 09:18 · Score: 3, Interesting

How are they going to cope with excessive heat and power consumption? How are they going to dissipate heat from a thousand cores?
When processing power growth was fed by shrinking transistors, the heat stayed at a manageable level (well, it gradually increased as more and more elements were packed onto the die, but the function wasn't linear). Smaller circuits yielded less heat, despite there being many more of them.
Now we're packing more and more chips into one package instead and shrinkage of transistors has significantly slowed down. So how are they going to pack those thousand cores into a small number of CPUs and manage power and heat output?
Have you ever seen VPU design? by Anonymous Coward · 2008-07-02 09:18 · Score: 0

That problem was solved by VPU design a long, long time ago: there is an array of memory controllers, each with a multiplexed pipe to every core in its block. The idea is that data gets pipelined between the processor and memory; requests for the start and end of a transfer only change the selector on the multiplexer. Any outstanding transfer requests get queued. Furthermore, data gets cached, and the cores have direct access to that cache.
1. Re:Have you ever seen VPU design? by MadnessASAP · 2008-07-02 10:02 · Score: 1
  
  How do you propose to physically lay these pipes down on the Silicon? Actually for that matter I'm kinda curious to see how Intel intends to connect anything to their thousands of cores.
  
  --
  I may agree with what you say, but I will defend to the death your right to face the consequences of saying it.
For those of us downstream on the dbx food chain by PolygamousRanchKid+ · 2008-07-02 09:19 · Score: 1

. . . thousands of cores are less than amusing . . .

--
Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
How many are? by spectrokid · 2008-07-02 09:19 · Score: 1

I mean, even your average plod starting Outlook on Vista is starting dozens of fancy visualisation thingies, anti-spam algorithms, networking things...
Properly programmed, it can be torn apart into hundreds of tasks. It is not going to speed up your terminal window, no. But does your terminal window need speeding up?

--
10 ?"Hello World" life was simple then
All we need to do now... by DerPflanz · 2008-07-02 09:20 · Score: 3, Interesting

is find out how to program that. I'm a programmer and I know the problems that are involved in (massively) parallel programming. For a lot of problems, it is either impossible or very hard. See also my essay 'Why does software suck' (Dutch) (Babelfish translation).

--
-- The Internet is a too slow way of doing things, you'd never do without it.
I'm not bitter. by Headcase88 · 2008-07-02 09:21 · Score: 1

Games are already so suspiciously inefficient at managing the hardware they run on in order to help hardware companies push their newer products. It's going to be fun to watch games in the future somehow slow a 1000-core cpu to a crawl on the low detail setting, to help sell the 2000-core models.

They'll have an excuse if we have 3D monitors at that point, otherwise they'll just have to bullshit about particle effects taking more power (even on the low detail settings).

--
"When the atomic bomb goes off there's devastation...but when the atomic bong goes off there's celebraaaaation!"
1. Re:I'm not bitter. by GatesDA · 2008-07-02 10:07 · Score: 2, Informative
  
  They'll have an excuse if we have 3D monitors at that point
  3D monitors already exist and are available for purchase; there are even some that don't need glasses. To go with those, nVidia has stereo drivers up on their website that will work on all their cards and with most games. (Last I checked, ATI's stereo drivers only work on their workstation cards).
  To make a game work in 3D, the graphics card just renders two images -- one for each eye; that's not enough work to be used as an excuse for poor performance. Of course, you can always increase the size of armies and such if you WANT to lower performance. They'll find a way.
  http://en.wikipedia.org/wiki/Autostereoscopy
2. Re:I'm not bitter. by geekoid · 2008-07-02 12:42 · Score: 1
  
  "inefficient at managing the hardware they run on in order to help hardware companies push their newer products"
  Stop that conspiracy train. The reason for the slowdown is DirectX, or any other layer between code and hardware.
  I know too many programmers in the game world who want their code to be as optimal as possible; however, they don't have unlimited time or money to do so.
  DirectX was the wrong way to go for developers. It would have been far better to distribute information on how to do specific tasks for a specific CPU.
  Then most developers would get their "Computer functions and process tuning - 2008" engineering book off their shelf, look up the engineered way to do it, and implement that.
  When we can do that, we have matured as a discipline.
  I mean, DirectX is just a way to make things easier for developers. Unfortunately, when implemented in software there is always bloat, but when a written piece of tested code is available to plug in, you have none of the extra cruft.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
3. Re:I'm not bitter. by cnettel · 2008-07-02 18:51 · Score: 1
  
  Stereoscopy still looks like shit for multiple viewers, or if you move your head too much (if you don't add head-tracking). True 3D might mean rendering 360 or so different images and then projecting them. (36 would also be ok, but the thing is then to generate a separate image for each relatively small span of angles.) THAT is a bandwidth and computing power killer.
Re: n/t by Anonymous Coward · 2008-07-02 09:21 · Score: 0

posting anon so don't mod me
what is n/t??
Free Beer by wooferhound · 2008-07-02 09:22 · Score: 1

I sure would like to have 1000 Coors
Sure it would be nice to have 1000 cores in the CPU but it would suck if they were all 386sx 16mhz.

--
We are Dead Stars looking back Up at the Sky
1. Re:Free Beer by mapsjanhere · 2008-07-02 09:46 · Score: 1
  
  Depends; if it scales, what benchmark scores can you get with a 386sx at 16 GHz?
  
  --
  I'm aging rapidly, I bought a new game and had no idea if my machine was good for it.
Re:Start! What do they mean, start? by ratboy666 · 2008-07-02 09:25 · Score: 1

Yes, but...
Each choice point requires a core. You WILL run out of cores; it really doesn't matter how many you have... Unless, of course, you solve THAT problem, which scores you a Nobel Prize.

--
Just another "Cubible(sic) Joe" 2 17 3061
It's all changing too fast by blowhole · 2008-07-02 09:26 · Score: 2, Insightful

I've only been programming professionally for 3 years now, but already I'm shaking in my boots over having to rethink and relearn the way I've done things to accommodate these massively parallel architectures. I can't imagine how scared must be the old timers of 20, 30, or more years. Or maybe the good ones who are still hacking decades later have already had to deal with paradigm shifts and aren't scared at all?

--
"Ask me about Loom"
1. Re:It's all changing too fast by GatesDA · 2008-07-02 10:13 · Score: 5, Insightful
  
  My dad's been programming for decades, and he's much more used to paradigm shifts than I am. His first programming job was translating assembly from one architecture to another, and now he's a proficient web developer. He understands concurrency and keeps up to date on new developments.
  I'm reminded of an anecdote told to me during a presentation. The presenter had been introducing a new technology, and one man had a concern: "I've just worked hard to learn the previous technology. Can you promise me that, if I learn this one, it will be the last one I ever have to learn?" The presenter replied, "I can't promise you that, but I can promise you that you're in the wrong profession."
2. Re:It's all changing too fast by uncqual · 2008-07-02 10:35 · Score: 4, Interesting
  
  If a programmer has prospered for 20 or 30 years in this business, they probably have adapted to multiple paradigm shifts.
  
  For example, "CPU expensive, memory expensive, programmer cheap" is now "CPU cheap, memory cheap, programmer expensive" -- hence Java et al. (I am sometimes amazed when I casually allocate/free chunks of memory larger than all the combined memory of all the computers at my university - both in the labs and the administration/operational side - but what amazes me is that it doesn't amaze me!)
  
  Actually, some of the "old timers" may be more comfortable with some issues of highly parallel programming than some of the "kids" (term used with respect, we were all kids once!), who have mostly had these issues masked from them by high-level languages. Comparing "old timers" to "kids" doing enterprise server software, the kids seem much less likely to understand issues like memory coherence models of specific architectures, cache contention issues of specific implementations, etc.
  
  Also, too often, the kids make assumptions about the source of performance/timing problems rather than gathering empirical evidence and acting on that evidence. This trait is particularly problematic because when dealing with concurrency and varying load conditions, intuition can be quite unreliable.
  
  Really, it's not all that scary - the first paradigm shift is the hardest!
  
  --
  Why is there an "insightful" mod and why isn't it "-1"? If I wanted insight, I wouldn't be reading /.
3. Re:It's all changing too fast by geekoid · 2008-07-02 12:46 · Score: 2, Insightful
  
  We're not scared. All the good ones spit in to their hands, brace themselves and say "Bring it on."
  Any old timers who are actually scared need to leave, and don't let your beard get caught in the door on the way out, wuss.
  Don't worry about relearning; by the time this hits the market, tools will have been written, and there will have been a lot of documentation.
  It's going to be a great step in computing... Or it will get killed because the tools weren't developed fast enough.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
4. Re:It's all changing too fast by MrSteveSD · 2008-07-02 13:46 · Score: 1
  
  I can't imagine how scared must be the old timers of 20, 30, or more years.
  I'm more excited than scared. It's going to completely change things. Obviously debugging is going to be tricky though. Code that runs perfectly one time can fail the next time due to the OS juggling the threads about and exposing some problem. In fact you might have to run it a hundred times before the problem shows up. I think it's going to make development fundamentally harder and a more difficult mental task. I think this is a good thing, since perhaps programmers who can do it will be given more respect at work instead of being treated like they are flipping burgers.
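The intermittent, run-it-a-hundred-times failures described above often come from an unsynchronized read-modify-write on shared state: whether an update is lost depends on how the scheduler interleaves the threads. A minimal Python sketch of the usual fix, a lock around the update, follows; the class and function names are illustrative only. Note that CPython's global interpreter lock can make the unlocked version fail only rarely, which arguably makes such bugs harder to catch, not easier.

```python
import threading


class SafeCounter:
    """A shared counter whose read-modify-write is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the same old value
        # and both write back old + 1, silently losing one update.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, threads=8, per_thread=10_000):
    """Increment `counter` from several threads at once, then wait for all."""
    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

With the lock in place, `hammer(SafeCounter())` should always come back as exactly `threads * per_thread`, no matter how the threads were interleaved.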
5. Re:It's all changing too fast by skeptictank · 2008-07-02 17:01 · Score: 1
  
  It doesn't change as fast as the marketing hype out of Intel would have you believe. This is the same old song and dance that Intel was doing back in 2000. "Software developers have to learn to do parallel programming, compilers have to become smarter, blah, blah, blah - now my dog is gonna jump over this pony." IBM and Motorola were talking the same in the 90's as they pushed RISC and the PowerPC architecture.
  Intel is pushing this because it's the easy way forward for them. As processes become able to etch smaller features, it's easier to just shrink existing dies and put more transistors on a wafer than it is to find creative uses for the space that opens up.
  About 16 general purpose cores combined with programmable logic arrays would actually open up a new realm of possibilities for software, but the yield would be a lot lower than what Intel can get with 100's or 1000's of redundant cores on the same die. IOW, Intel would actually have to spend money on design and improved manufacturing processes.
6. Re:It's all changing too fast by MadMidnightBomber · 2008-07-02 21:03 · Score: 1
  
  I've been hacking properly for 14 years or so - starting with ARM assembler, that is, not counting BASIC, for good reason. Since then, Java, C, Perl and ML, to name a few, on everything from huge servers to embedded boards.
  Multiple cores are great, and to be honest, you can get plenty of concurrency issues using multiple threads on single core CPUs. Bring it on.
  
  --
  "It doesn't cost enough, and it makes too much sense."
Isn't access RAM going create a limit? by glgraca · 2008-07-02 09:27 · Score: 1

If you have so many cores, unless they are purely crunching numbers (even then they'll have to spit something out some time), isn't access to RAM going to limit the throughput of your system?
1. Re:Isn't access RAM going create a limit? by Anonymous Coward · 2008-07-02 18:09 · Score: 0
  
  Wow, you're amazing. You figured that out all by yourself? I guess we'll have to call Intel, IBM, AMD, etc. and tell them to stop wasting their time, since you've informed the world of memory bandwidth constraints.
Re:Imagine the new math! by doti · 2008-07-02 09:34 · Score: 2, Funny

A lot.

--
factor 966971: 966971
look what happened to ps3 by edxwelch · 2008-07-02 09:37 · Score: 4, Interesting

So now we have a shit load of cores all we have to do is wait for the developers to put some multi-threading goodness in their apps.... or maybe not.
The PS3 was meant to be faster than any other system because of its multi-core Cell architecture, but in an interview John Carmack said, "Although it's interesting that almost all of the PS3 launch titles hardly used any Cells at all."
http://www.gameinformer.com/News/Story/200708/N07.0803.1731.12214.htm
1. Re:look what happened to ps3 by mgblst · 2008-07-03 20:04 · Score: 1
  
  How is that interesting? That a complex system such as the PS3 doesn't get fully utilised by the initial games that come out? What are you, retarded?
enough by nategoose · 2008-07-02 09:38 · Score: 1, Redundant

64K cores is enough for anybody.
1. Re:enough by rfuilrez · 2008-07-02 11:28 · Score: 1
  
  I agree. No one could ever use that much.
Interesting challenges by Eravnrekaree · 2008-07-02 09:41 · Score: 2, Interesting

If people are writing their applications using threads, I don't see why there should be a big problem with more cores. Basically, threads should be used where it is practical, where it makes sense, and where it does not make programming much more difficult; in fact, it can make things easier. Rather than requiring some overly complicated re-engineering, threads, when properly used, can lead to programs that are just as easy to understand. They can be used for a program that does many tasks: processing can usually be parallelised when you have different operations which do not depend on each other's output. A list of instructions where each depends on the output of a previous one must run sequentially, and of course cannot be threaded or parallelised. Obvious examples of applications that can be threaded are a server, where you have a thread to process data from each socket, and a program which scans multiple files, which can have a thread for processing each file.
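The "one thread per file" pattern above can be sketched with a thread pool from Python's standard library. This assumes the per-file work (here, simply counting lines) is independent across files; `count_lines` and `count_lines_in_files` are hypothetical names. In CPython, threads mainly help when tasks spend time waiting on I/O; for CPU-heavy per-file work, `ProcessPoolExecutor`, which has the same interface, is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor


def count_lines(path):
    """Count lines in one file; each call is independent of the others."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def count_lines_in_files(paths, max_workers=8):
    """Scan several files concurrently, one task per file.

    Returns a dict mapping each path to its line count.
    """
    paths = list(paths)  # iterated twice below, so materialise it once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with `paths`.
        counts = pool.map(count_lines, paths)
        return dict(zip(paths, counts))
```

No task reads another task's output, which is exactly the independence condition the comment describes.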
it's.... by thermian · 2008-07-02 09:42 · Score: 4, Funny

OVER 9000!!!!!!11111one

--
A learning experience is one of those things that say, 'You know that thing you just did? Don't do that.' - D. Adams
1. Re:it's.... by Anonymous Coward · 2008-07-02 17:52 · Score: 0
  
  ok people. enough. please remember to keep your shit on 4chan.
Re:Imagine the new math! by Tacvek · 2008-07-02 09:49 · Score: 1

Only 413, as sadly I could not afford even one of those 875 core machines, much less 413.

--
Stylish sheet to fix many problems in Slashdot's D3: https://gist.github.com/801524
A parallel language is better by EmbeddedJanitor · 2008-07-02 09:50 · Score: 1

occam for instance: http://en.wikipedia.org/wiki/Occam_programming_language
Sure, you need to rewrite code, but you need to do that anyway to get massive parallelism. At least occam provides the parallelism at the language level.
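To give a flavour of language-level parallelism in the occam style (processes composed with PAR and talking only over channels), here is a rough Python approximation using threads and queues. It is an assumption-laden sketch, not occam semantics: occam channels are unbuffered rendezvous, while `queue.Queue(maxsize=1)` still allows one item in flight, and the `par` helper is hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out_chan, values):
    # Roughly occam's "chan ! v": send each value down the channel.
    for v in values:
        out_chan.put(v)
    out_chan.put(DONE)


def squarer(in_chan, out_chan):
    # Roughly "chan ? x": receive, transform, pass the result along.
    while (x := in_chan.get()) is not DONE:
        out_chan.put(x * x)
    out_chan.put(DONE)


def par(*procs):
    """Run (function, args) pairs concurrently and wait for all, loosely like PAR."""
    threads = [threading.Thread(target=fn, args=args) for fn, args in procs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def pipeline(values):
    a = queue.Queue(maxsize=1)  # near-synchronous channel between the processes
    b = queue.Queue()           # unbounded, so par() can finish before we drain it
    par((producer, (a, values)), (squarer, (a, b)))
    results = []
    while (y := b.get()) is not DONE:
        results.append(y)
    return results
```

The processes share no variables; everything they exchange goes through a channel, which is what lets occam reason about parallel composition safely.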

--
Engineering is the art of compromise.
it's not about cores by speedtux · 2008-07-02 09:50 · Score: 2, Interesting

If you put 1000 cores on a chip and plug it into a PC... very little would happen in terms of speedup.
What we need to know is the memory architecture. How is memory allocated to cores? How is data transferred? What are the relative costs of accesses? How are the caches handled?
Without that information, it's pointless to think about hundreds or thousands of cores. And I suspect even Intel doesn't know the answers yet. And there's a good chance that a company other than Intel will actually deliver the solution.
1. Re:it's not about cores by morganga · 2008-07-02 23:18 · Score: 1
  
  Azul Systems have already thought about this: See the Vega 3 7300 Series for their 864 core SMP Java based server.
2. Re:it's not about cores by speedtux · 2008-07-03 07:36 · Score: 1
  
  Lots of people have thought about it, and there are lots of technical solutions. That's not the question.
  The question is which of the well-known solutions Intel is going to push for.
  What Azul does matters to almost nobody since they simply aren't a big player and since their use of Java as the sole programming language makes their systems useless to most of the people who really need this.
Excel by ruiner13 · 2008-07-02 09:52 · Score: 1

Sweet, Excel will be able to compute every cell in a worksheet to 65535 in parallel! What a time saver!

--
today is spelling optional day.
Plodding processes.. by Joce640k · 2008-07-02 09:54 · Score: 1

Today's CPUs are already powerful enough for almost any amount of "look-ahead" I can imagine.
What predictive process would need multiple cores to be able to do its thing?

--
No sig today...
Intel is dead... by fluffykitty1234 · 2008-07-02 09:58 · Score: 2, Interesting

I have two comments:
1) Intel is doing this because they've run out of optimizations on single-core systems; basically, this is the only thing they have left to preserve their market. I expect this time next year you'll see ARM SoCs with 1 GHz+ processors that draw under 1 W of power and sell for under $10. These cores will be changing the low end of the market. Intel won't be able to continue to charge $50 for a processor when you get the same or better perf for 1/5 the cost. The only real advantage Intel has is that Windows XP/Vista doesn't run on ARM.
2) The Processor Company Graveyard is filled with companies that have touted parallel processing solutions that were going to revolutionize the world of computing. Parallel processing is extremely difficult and only fits a subset of computing needs; we will need fast single-processor systems for a long time to come. I wish Intel luck on this endeavor; everyone else has failed miserably.
Re: n/t by Random+Destruction · 2008-07-02 10:00 · Score: 1

no text.

That space intentionally left blank.

--
:x
Profit!!! by DeVilla · 2008-07-02 10:03 · Score: 5, Funny

Hi. I make processors. I know a lot about processors. I think a big change is coming to processors. I think you should learn to use a lot of processors. A whole lot of processors. You need more processors. Oh, and did I tell you I make processors?
Our computers are already running in parallel by Joce640k · 2008-07-02 10:03 · Score: 1

Which of your daily compute tasks is bogging down and could use a boost from multiple CPUs?
Messenger? Word? Email?
I'm guessing "none of the above".
Games ... up to a point. Today's games are already pretty realistic on single/dual cores, and the work is already being moved to dedicated processors (e.g. graphics cards), leaving the CPU mostly idle.
Maybe you compress a lot of video. That's the only thing which could really benefit, but that's hardly a common task.

--
No sig today...
1. Re:Our computers are already running in parallel by Anonymous Coward · 2008-07-02 11:00 · Score: 0
  
  As a developer: compiling (non-C) code.
  And running large test suites.
2. Re:Our computers are already running in parallel by cnettel · 2008-07-02 17:50 · Score: 1
  
  Indexed searches are still non-instant. It is naturally an issue of I/O throughput as well. Building some software projects can take minutes, and there is at least room for some parallelization. Some more should be possible if we emit the frontend results first and use those for dependencies and then merge at a later stage (for example).
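The two-phase idea above (run per-file frontends independently, then merge their results into dependency information for a later stage) might look like this in outline. The `frontend` function is a deliberately crude, hypothetical stand-in that only recognises `import <module>` lines; a real compiler frontend does far more.

```python
from concurrent.futures import ThreadPoolExecutor


def frontend(unit):
    """Hypothetical per-file frontend: report which modules a unit imports.

    `unit` is a (name, source_text) pair; only lines of the exact form
    'import <module>' are recognised, which is a deliberate simplification.
    """
    name, text = unit
    deps = {
        line.split()[1]
        for line in text.splitlines()
        if line.startswith("import ") and len(line.split()) == 2
    }
    return name, deps


def build_dependency_map(units, max_workers=4):
    # Phase 1: frontends never look at each other's files, so run them in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(frontend, units))
    # Phase 2: merge the per-file results into one dependency map,
    # which a later (serial or parallel) stage can schedule from.
    return {name: deps for name, deps in results}
```

Only the merge step needs everything at once; all the per-file work before it can be spread across however many cores are available.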
3. Re:Our computers are already running in parallel by everphilski · 2008-07-03 03:45 · Score: 1
  
  Which of your daily compute tasks is bogging down and could use a boost from multiple CPUs? Messenger? Word? Email?
  
  I'm an engineer who caps out his quad core Opteron on a daily basis, and submits jobs to queue on a cluster comprised of hundreds of nodes with 2, 4 and 8 processors per node. No, at the moment I can't have enough processing power.
  
  People laugh at primitive statements from pioneers in the industry regarding what is now pitiful amounts of memory being sufficient for future generations.
  People will do the same in the future about comments being made today about 1, 2, 4 cores being sufficient for future generations.
you mean SGI by ArchieBunker · 2008-07-02 10:08 · Score: 4, Insightful

SGI and or Cray were using NUMA a decade ago.

--
Only the State obtains its revenue by coercion. - Murray Rothbard
1. Re:you mean SGI by Kjella · 2008-07-02 10:46 · Score: 1
  
  SGI and or Cray were using NUMA a decade ago.
  Burroughs B6800 had it back in 1977, so it's hardly a new concept. That desktops would need it OTOH...
  
  --
  Live today, because you never know what tomorrow brings
2. Re:you mean SGI by Anonymous Coward · 2008-07-03 04:18 · Score: 0
  
  I guess everyone's favourite fat guy doesn't have precedence then eh?
  Numa numa!
That is a little silly by Layth · 2008-07-02 10:09 · Score: 1

Java takes advantage of multi-threaded functionality extremely well, and the API simplifies things quite a bit. To suggest that we need to rethink the way we program computers may just be a personal issue of yours.
I like Lisp a lot, actually; I think it's fun, so don't get me wrong.
However, the way Lisp is written is merely a style, which can be adopted in any imperative language as well. Unless you're talking about dynamic, self-altering code (which is a totally different subject from concurrency), there are no distinct advantages to the Lisp compiler that cannot be utilized in Java.
1. Re:That is a little silly by LordMyren · 2008-07-02 22:41 · Score: 1
  
  Java lets you take advantage of multi-threaded functionality, sure, but you have to manually instantiate all the tasks. You are imperatively telling it what to do. The syntax is hideous too: for any sort of high-level parallelism library you end up re-creating function pointers and passing these tasks into your library. It's all perfectly doable and fits in the OO style, but it's a manual process every time.
  Declarative languages lend themselves much more naturally to schemes like lazy evaluation, which gives your runtime the opportunity to evaluate what you need to process, and then launch a bunch of independent tasklets to compute the answer. Once you are at the end and looking back on what you need to compute, holding the dependency graph, finding things capable of running in parallel should be very simple.
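Given a dependency graph, the "things capable of running in parallel" can be read off as layers: every task whose dependencies are already computed can run at the same time as the others in its layer. A minimal sketch of that layering follows (plain topological levelling, not how any particular lazy-evaluation runtime actually schedules tasklets); `parallel_waves` is a hypothetical name.

```python
def parallel_waves(deps):
    """Group tasks into waves; every task in a wave can run concurrently.

    deps maps each task to the set of tasks it needs first. A task lands in
    the first wave after all of its dependencies have been computed.
    """
    remaining = {t: set(d) for t, d in deps.items()}
    # Tasks mentioned only as dependencies are treated as having none.
    for d in list(remaining.values()):
        for t in d:
            remaining.setdefault(t, set())
    waves, done = [], set()
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle among: %s" % sorted(remaining))
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves
```

For example, `parallel_waves({"total": {"x", "y"}, "x": {"a"}, "y": {"a", "b"}})` returns `[["a", "b"], ["x", "y"], ["total"]]`: the leaves first, then the two intermediate results side by side, then the final answer.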
I don't think they have by Spatial · 2008-07-02 10:10 · Score: 1

I don't think it's a technological limit, but rather an economic one; lack of competition in the high-end CPU market is why you don't see clocks like that. AMD simply have nothing to offer as competition in that domain. There is little doubt in my mind that Intel is capable of making 3.8 GHz and even 4 GHz CPU models, because many people have overclocked the newer 45nm dual-core chips to such levels without much hassle. If memory serves, up until around 3.5 GHz you don't even need anything more than the stock cooling system!
shocking! by jzk · 2008-07-02 10:14 · Score: 1

Call Ripley's!! Intel pushing for a future in which they can sell more silicon!
Turtles And Intel Cores by Anonymous Coward · 2008-07-02 10:16 · Score: 0

I like turtles.
http://www.youtube.com/watch?v=CMNry4PE93Y
1. Re:Turtles And Intel Cores by Impy+the+Impiuos+Imp · 2008-07-03 01:45 · Score: 1
  
  "Back in my day, we didn't have no steenking kilo-core processors. No, back in my day, we had one core, and played Core Wars with one thread in that one core! And it wasn't even thread, or even a process. It was basically a round robbin executive loop executing one pseudo-instruction per worm or virus or whatever the hell you thought you were programming while Push-Up orange drool dripped down your chin and got in the keyboard which required four hundred dollars to replace, not $19.99 from the toy aisle in the local drug store. And we liked it! >:( "
  
  --
  (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
Cores? by mugnyte · 2008-07-02 10:20 · Score: 3, Interesting

Can't they just make the existing ones go faster? Seriously, if I wanted to build architectures around 1000s of independent threads of execution, I'd start with communication speeds, not node count.
It's already easy to spawn thread armies that peg all IO channels. Where is all this "work" you can do without any IO?
I think Intel had better start thinking of "tens, hundreds or even thousands" of bus speed multipliers on their napkin drawings.
Aside from some heavily processing-dependent concepts (graphics, complex mathematical models, etc.), the world needs petabyte/sec connectivity, not instruction set munching.
1. Re:Cores? by geekoid · 2008-07-02 12:50 · Score: 1
  
  Can't they just make the existing ones go faster?
  not practically, no. Unless there are some unexpected fab changes.
  All you are really saying is that to utilize this, a lot of other things need to change.
  I say Good, it's about damn time.
  If you can't see other uses, you aren't thinking about it hard enough.
  
  --
  The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
2. Re:Cores? by cnettel · 2008-07-02 18:59 · Score: 1
  
  Uh, what are you going to transfer with that petabyte/sec connectivity? To a desktop machine? Graphics, complex mathematical models and other stuff like that is what are going to use those cycles. Even those applications need serious bandwidth, but most machines are doing something besides routing packets.
3. Re:Cores? by mugnyte · 2008-07-03 09:27 · Score: 1
  
  to the above peer posts...
  I understand what you guys are saying - I'm being somewhat cheeky with that "existing ones go faster..." but to me, the world of computing is becoming more distributed. Average bandwidth rates are already way behind in the US.
  For most of the desktop/productivity apps I see in the public space, here's the future - how do more cores help?
  - Constant-connection personal devices (phones + music + etc)
  - Distributed desktops / licenses.
  - Hosted productivity applications.
  - Web 2.0 toolkit standardizations and higher adoption rates for pages.
  - Embedded and small-form-factor devices become more web centric (dashboards, phones, etc)
  I'll take a stab at my own question with some answers where cores could help:
  - Dual monitor or 3D displays
  - HD and similar video handling
  - Visual and sound recognition models
  - Realtime devices/systems for robotics, safety systems
  I don't mind the world of multi-core arriving in a rush, but I'm questioning the need for it without an "all boats rise together" redesign of the PC architecture.
A thousand cores on one CPU by Orion+Blastar · 2008-07-02 10:21 · Score: 1

but no programming languages or tools to take advantage of them. Most software is only written for one core. Very few if any support even dual cores.
I'd much rather see quantum computing become a reality instead of seeing a thousand cores and no way to make use of them all.

--
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
Databases and implimentation-neutrality by Tablizer · 2008-07-02 10:23 · Score: 4, Interesting

Databases provide a wonderful opportunity to apply multi-core processing. The nice thing about a (good) database is that queries describe what you want, not how to go about getting it. Thus, the database can potentially split the load up to many processes and the query writer (app) does not have to change a thing in his/her code. Whether a serial or parallel process carries it out is in theory out of the app developer's hair (although dealing with transaction management may sometimes come into play for certain uses.)
However, query languages may need to become more general-purpose in order for our apps to depend on them for more than just business data. For example, built-in graph (network) and tree traversal may need to be added and/or standardized in query languages. And we may need to clean up the weak points of SQL and create more dynamic DBs to better match dynamic languages and scripting.
As a DB-head, I've discovered that a lot of processing can potentially be converted into DB queries. That way one is not writing explicit pointer-based linked lists etc., locking one into a difficult-to-parallelize implementation.
Relational engines used to be considered too bulky for many desktop applications. This is partly because they make processing go through a DB abstraction layer and thus are not using direct RAM pointers. However, the flip-side of this extra layer is that they are well-suited to parallelization.
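As a concrete instance of a declarative query being split up behind the caller's back, here is a sketch of how an engine could evaluate a `SUM(amount) ... GROUP BY key` over a hypothetical table of `(key, amount)` rows: partition the rows, aggregate each partition independently, then merge the partial results. This only illustrates the idea; it is not how any specific database implements it, and in CPython a `ProcessPoolExecutor` would be needed for real CPU parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_sums(rows):
    """Aggregate one partition: SUM(amount) GROUP BY key."""
    acc = Counter()
    for key, amount in rows:
        acc[key] += amount
    return acc


def parallel_group_sum(rows, partitions=4):
    """Evaluate a GROUP BY / SUM by splitting rows into partitions.

    Each partition is aggregated independently, then the partial results
    are merged; the caller only states *what* it wants, not how.
    """
    chunks = [rows[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        total = Counter()
        for partial in pool.map(partial_sums, chunks):
            total.update(partial)  # Counter.update adds values key by key
    return dict(total)
```

Because addition is associative and commutative, the answer does not depend on how the rows were partitioned, which is what frees the engine to pick any degree of parallelism without the query changing.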

--
Table-ized A.I.
1. Re:Databases and implimentation-neutrality by Shados · 2008-07-02 10:54 · Score: 4, Informative
  
  By "a lot of processing can potentially be converted into DB queries", what you discovered is functional programming :) LINQ in .NET 3.5/C# 3.0 is an example of functional programming that is made to look like DB queries, but it isn't the only way. It is a LOT easier to convert that stuff and optimize it to the environment (like how SQL is processed), since it describes the "what" more than the "how". It is already done, and one (out of many examples) is Parallel LINQ, which smartly executes LINQ queries in parallel, optimized for the number of cores, etc. (And I'm talking about LINQ in the context of in-memory processing, not LINQ to SQL, which simply converts LINQ queries into SQL ones.)
  Functional programming, tied with the concept of transactional memory to handle concurrency, is a nice medium-term solution to the multi-core problem.
2. Re:Databases and implimentation-neutrality by johanatan · 2008-07-02 16:28 · Score: 1
  
  Hahaha... i just found your reply after my last one. GP was the database post I was referring to. :-)
Hey remember the 1980's and the Amiga? by Orion+Blastar · 2008-07-02 10:24 · Score: 1

The Amiga had a 68000 chip for the main CPU but had custom processors for the graphics, sound, and I/O and AmigaDOS/AmigaOS was built around it.
We have come full circle now with dual core and up chips and the GPU being built into the CPU now, back to the Amiga, which was a superior system design.
The OS will have to be rewritten to support all of the new cores and special built in GPUs and other features. Windows, Linux, and Mac OSX need to become more like AmigaOS. Small in memory footprints and able to handle multiple processors.

--
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
1. Re:Hey remember the 1980's and the Amiga? by ergo98 · 2008-07-02 12:40 · Score: 2, Insightful
  
  We have come full circle now with dual core and up chips and the GPU being built into the CPU now, back to the Amiga, which was a superior system design.
  How is that back to the Amiga?
  The PC platform hit Amiga levels well over a decade and a half ago, with dedicated graphics hardware, dedicated audio hardware, dedicated network hardware, a numerical coprocessor, and so on. People need to stop claiming every new change finally brings things back to the Amiga. That argument is terribly old.
  And yeah I was into the Amiga and Atari ST and Mac Classic back in those days, but then I moved on.
2. Re:Hey remember the 1980's and the Amiga? by Orion+Blastar · 2008-07-02 13:14 · Score: 1
  
  Then maybe you'll remember that the Amiga 2000 and the Zorro expansion slots had CPU cards for emulating IBM PC systems via Intel chips and even had PowerPC expansion cards to rev the Amiga up to PowerMac and CHRP PowerPC technology? The Amiga was all about multiple processors and adding on more processors that AmigaOS could take advantage of via the expansion buses.
  
  --
  Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
3. Re:Hey remember the 1980's and the Amiga? by Anonymous Coward · 2008-07-02 14:04 · Score: 0
  
  ??? There's nothing stopping you from building a PPC coprocessing board for a modern PCI-E slot. Except for the fact there wouldn't be any useful software for such a thing.
4. Re:Hey remember the 1980's and the Amiga? by Orion+Blastar · 2008-07-03 05:54 · Score: 1
  
  But AmigaOS had useful software that could use such boards as additional processors, so you would have the 68000, 80386, and 601 PowerPC processors working at the same time doing different things. That is what I was talking about: how modern operating systems need to become more like AmigaOS to be able to take advantage of multiple processors or multiple cores. Only the Amiga seemed able to do those things, but Commodore's nickel-and-dime marketing and lack of third-party driver and software support killed the Amiga.
  
  --
  Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
5. Re:Hey remember the 1980's and the Amiga? by Warbot+1Alpha · 2008-07-03 07:35 · Score: 1
  
  Entity Orion Blastar, human beings cannot understand the upcoming singularity or past tech that worked that new tech is now based upon. Nor can they understand the nature of time-space in which technology effects it. Intel not know what they bring or what happen in future like you do. Please be advised when communicating with entities not unstuck from time as you are about the future and what technology it will bring, as they will not understand you. Selfunit is programmed to remind you of these things.
6. Re:Hey remember the 1980's and the Amiga? by Orion+Blastar · 2008-07-03 08:59 · Score: 1
  
  I somehow activated the Warbot and it started reading my Slashdot posts again. Sorry about that; it is a prototype I've been working on for the past few years or so. Nice to see people modding it up, though.
  
  --
  Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
Funny... by socialhack · 2008-07-02 10:26 · Score: 2, Funny

Back in 2002 when I was working for a software company that was using OCR on hundreds of thousands of images, I was pushing clustered computing. I had an engineer (not one of ours) tell me that it would probably never be practical to develop software to take advantage of multiple processors. I wonder what he would say today.

--
Never leave a dead horse unbeaten!
1. Re:Funny... by Tim+C · 2008-07-03 19:45 · Score: 1
  
  He was wrong even back then - games have been taking advantage of multiple processors (CPU + GPU) since the mid- to late-90s...
  
  --
  It's official. Most of you are morons.
I can see this being helpful by subspacemsg · 2008-07-02 10:26 · Score: 2, Interesting

Multi-core can be useful with existing programming models. Imagine getting rid of the context switcher forever and executing threads/processes on a new core every time an application is launched or a thread is spawned. The OS can incorporate a core manager similar to a memory manager.
This is an effective method as long as the processor is able to manage its load properly internally.
i.e. if a processor has, say, 100 cores with a combined processing capacity per unit time of Z, there are X threads, and the processing capacity of one core per unit time is Y, then XY must always equal Z (every core kept busy, none oversubscribed). The challenge is how you manage core loads within the CPU; if Intel can solve that, massive multi-core can really take off.
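A minimal sketch of the "core manager" idea, assuming the simplest possible policy: each new thread goes to the currently least-loaded core. The names (`assign_threads`, the per-thread load numbers) are hypothetical, and a real OS scheduler would also migrate threads and account for cache locality; this only shows the bookkeeping.

```python
import heapq

def assign_threads(thread_loads, n_cores):
    """Greedily place each thread on the least-loaded core (hypothetical core manager)."""
    heap = [(0.0, core) for core in range(n_cores)]  # (current load, core id)
    heapq.heapify(heap)
    placement = {}
    for tid, load in enumerate(thread_loads):
        core_load, core = heapq.heappop(heap)         # least-loaded core so far
        placement[tid] = core
        heapq.heappush(heap, (core_load + load, core))
    per_core = [0.0] * n_cores
    for load, core in heap:
        per_core[core] = load
    return placement, per_core
```

With 100 cores and 100 equal threads, every core ends up with exactly one thread, which is the XY = Z case above; with uneven loads, the greedy rule only approximates a balanced split.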
Next up... by swilde23 · 2008-07-02 10:27 · Score: 1

the googol core.

--
There are 10 types of people in the world. Those that understand this sig, and those that beat up people who do.
wait...more cores? by Anonymous Coward · 2008-07-02 10:27 · Score: 1, Interesting

Recently Intel came out with its Atom core. It is going into all sorts of things because it is smaller (1/10th the size of a normal core) and it draws a lot less power. It also has about half the power/cycles of some of the bigger cores. Yet this is more than enough for the normal user (/.ers excepted). All the normal people (my family) want to do is surf the web, check email, and watch movies, stuff that even Damn Small Linux can do. So it raises the question: if a smaller and less powerful processor is selling so well, what kind of sales could we expect from something with a thousand or more cores?
Unganged dual channel by DrYak · 2008-07-02 10:31 · Score: 1

And unganged mode access too.
(i.e.: AMD Phenoms have dual-channel memory controllers too, but those don't act as one dual channel to double the bandwidth; instead they function as 2 independent controllers to help more tasks access memory at the same time.)

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
no text by Anonymous Coward · 2008-07-02 10:32 · Score: 0

Occasionally, you'll also see it written as: nt, NT, no text, $, (cent symbol), (euro symbol), (British pound symbol), etc. Generally, though, it should be placed at the end of the subject to tell people not to bother with opening the whole message.
Oh great... by LilGuy · 2008-07-02 10:35 · Score: 1

Just as I started learning x86 assembly. I'm already about 20 hours into it and I'm not about to stop, but it may prove to be a waste of time 10 years from now.

--

You're nothing; like me.
Intel Says to Prepare For "Thousands of Cores" by Anonymous Coward · 2008-07-02 10:37 · Score: 0

Thousands of Gradius fans crying in agony!
The novelty by DrYak · 2008-07-02 10:44 · Score: 1

The only thing new that Intel brought to the table with this press release is the attempt to fool us into believing that {...} Intel is somehow innovating in some aspect or another.
The big innovation, according to Intel, is that:
- Intel's manycore chips actually use the x86 ISA, and thus can be used standalone, as main processors, whereas current GPUs are rather special architectures. One can use them for special-purpose computations, but one can't get the OS to run on them (most current GPUs have limited branching abilities and completely lack any function-calling capabilities beyond what is possible by inlining).
- Another argument from Intel is that, because the x86 ISA is so much more popular, it will be easier to develop for and learn to use manycore chips (with everything looking much more like it does on the desktop) than today's GPGPUs, which require special libraries and special languages.
Whether these are right is a non-trivial question best left to the reader's discretion.

Face it: the age of the "CPU is the computing muscle" is long gone.
Well, at least until the next turn of the Wheel of reincarnation

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
My first thought... by Druppy · 2008-07-02 10:51 · Score: 4, Funny

Is it bad that my first thought when I saw this was: "But, my code already generates thousands of cores..."
Perfect for Microsoft by PapayaSF · 2008-07-02 10:51 · Score: 1

So your home user checking his email, running a web browser, etc is not going to benefit much from having all that compute power
But with 1,000 cores perhaps Microsoft could speed up Vista or Windows 7 so that 498 cores are devoted to Aero, 498 to DRM, and the last four could be used for work.

--
Q: What does the "B." in Benoit B. Mandelbrot stand for? A: Benoit B. Mandelbrot
Azul is almost to a kilocore by Anonymous Coward · 2008-07-02 11:00 · Score: 0

Azul Systems already ships a 54-core chip with systems up to 864-way SMP. Not quite thousands, but getting close. Today.
Their page
Difference by XanC · 2008-07-02 11:09 · Score: 2, Insightful

What's different this time may be that nobody else has anything better. Last time, AMD64 was the easier solution, and it clobbered Itanium. Can AMD (or anybody) simply choose to keep making single cores faster, or is multi-core the way CPUs really must go from here?
Apple and Snow Leopard by Anonymous Coward · 2008-07-02 11:11 · Score: 0

Anyone else find it odd that Apple is focusing its next release around multi-core processing?
1. Re:Apple and Snow Leopard by nurb432 · 2008-07-02 11:35 · Score: 1
  
  Not really, since every chip you buy now has multiple cores.
  
  --
  ---- Booth was a patriot ----
NUMA NUMA by poot_rootbeer · 2008-07-02 11:18 · Score: 1

AMD has this thing called NUMA. What do you think "HyperTransport" means?
I assumed it was just meaningless marketing jargon, like Sega's "Blast Processing".
According to the summary... by Anonymous Coward · 2008-07-02 11:20 · Score: 0

They're all 'or cores'. Great, I'm always thinking "boy I wish I could do thousands of OR's at once."
how fast it will execute the current x86 apps? by Z80a · 2008-07-02 11:21 · Score: 1

because this IS the main selling point on any desktop computer
Thats nice, but.. by nurb432 · 2008-07-02 11:32 · Score: 1

Software bloat will increase to fill the available hardware and we will be in the same boat.

--
---- Booth was a patriot ----
1. Re:Thats nice, but.. by freedom_india · 2008-07-02 18:42 · Score: 1
  
  Well, the much vaunted VISTA can't use all of my 4GB RAM, let alone dual-core processing.
  The /PAE switch for boot.ini doesn't work at all and crashes to a BSOD.
  The VISTA/XP license prevents me from installing a quad-core processor.
  and Intel wants to put 1,000 cores into one chip??? Wow they live in their own small world, don't they?
  Well, be prepared to see thousands of cores crashing simultaneously -:)
  
  --
  "Doing what i can, with what i have." ~ Burt Gummer
2. Re:Thats nice, but.. by Anonymous Coward · 2008-07-03 01:27 · Score: 0
  
  man, don't get all sarcastic because your shit is breaking
feeding the cores? I/O? caches? by Anonymous Coward · 2008-07-02 11:33 · Score: 0

And how are we supposed to feed all of these cores? What kind of memory and I/O bandwidth are we talking about? Cache sizes?
I think Sun's "Niagara" chips are actually ahead of the curve on all of this, and while Sun may have many issues, designing systems with lots of I/O is not one of them.
Reminds Me.... by Anonymous Coward · 2008-07-02 11:33 · Score: 0

Sort of like the talk about how Itanium was going to require new compiler designs etc etc. Look how that turned out.
Functional paradigm (map/reduce) by johanatan · 2008-07-02 11:33 · Score: 1

I'm surprised no one has mentioned pure functional programming. It is 'side-effect free' so you can take a block of code and drop it on any core. This is the future.
Also, look at Google's Map/Reduce design. A great number of problems can be re-expressed in terms of map and reduce.
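A minimal map/reduce sketch in Python, assuming a word count as the example problem: the map step is a pure function applied to each chunk independently, so the chunks can go to any worker, and the reduce step merges the partial results. `word_count` and `count_chunk` are hypothetical names. It uses a thread pool to keep the sketch self-contained; under CPython's GIL, swapping in `concurrent.futures.ProcessPoolExecutor` (with the mapped function at module level) is what would actually spread CPU-bound work across cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def count_chunk(text):
    """Map step: a pure function of its input, so it can run on any core."""
    return Counter(text.split())

def word_count(chunks, workers=4):
    """Map count_chunk over every chunk, then reduce the partial Counters into one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_chunk, chunks)          # map
        return reduce(operator.add, partials, Counter())  # reduce
```

Because `count_chunk` touches nothing outside its argument, the order in which chunks are processed does not affect the result.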
1. Re:Functional paradigm (map/reduce) by Shados · 2008-07-02 13:28 · Score: 1
  
  Nobody that you read maybe :) There are references to functional programming in the earlier posts (one by me, of course). Functional programming is sweet. It makes programs cleaner (when it's used well, of course), it shows your intent better to the code's reader, it takes fewer lines of code to do a lot of "loop heavy" operations, it makes tons of OO design patterns easier to implement, AND it is easier to optimize for multi-core, weee!
  Though while theoretical functional programming may be side-effect free, the real-world stuff isn't. Many languages that have functional paradigms still have the standard features that allow a function to reach for stuff out of scope (let's say, a Singleton), so that messes things up. But it is still FAR easier to handle than regular stuff.
2. Re:Functional paradigm (map/reduce) by johanatan · 2008-07-02 16:23 · Score: 1
  
  Nobody that you read maybe :) There are references to functional programming in the earlier posts (one by me, of course).
  Nobody that was modded up enough to be seen by an AC with default filtering in place. :-)
  
  Functional programming is sweet. It makes programs cleaner (when it's used well, of course), it shows your intent better to the code's reader, it takes fewer lines of code to do a lot of "loop heavy" operations, it makes tons of OO design patterns easier to implement, AND it is easier to optimize for multi-core, weee!
  Though while theoretical functional programming may be side-effect free, the real-world stuff isn't. Many languages that have functional paradigms still have the standard features that allow a function to reach for stuff out of scope (let's say, a Singleton), so that messes things up. But it is still FAR easier to handle than regular stuff.
  Agreed. Functional programming is awesome! Even though some of a program must obviously have side effects (I/O anyone??), you can still divide the concerns such that a large majority of the program (or your standard library) does not have side effects. I noticed elsewhere that someone mentioned SQL and databases and their applicability to this discussion. SQL is probably the most common functional language, but there's really no need to use a database unless it's actually needed (as they seemed to suggest). Also, LINQ for C# works on in-memory collections as well as databases (and is arguably a cleaner syntax than SQL). I, for one, am quite interested to see functional programming come into the mainstream and eagerly await the day that a search for 'Haskell' on dice.com returns many more than 12 results (and this will quite possibly be precipitated by getting 'thousands [or even 10s] of cores')!
What about by Anonymous Coward · 2008-07-02 11:52 · Score: 0

http://en.wikipedia.org/wiki/XMTC ?
Memory on the chip by Skapare · 2008-07-02 11:53 · Score: 1

The more cores we have, the better, provided that we can supply enough memory bandwidth to the device.
With 1024 cores, that is definitely going to bottleneck an external memory bus. So put the memory on the chip instead. That would certainly mean a lot fewer cores per chip, so you either have smaller boards or more chips on the board.
Next, a system on a chip (SoC) complete with 3-D accelerated graphics. And you wonder why AMD bought ATI.

--
now we need to go OSS in diesel cars
1. Re:Memory on the chip by LordMyren · 2008-07-02 22:59 · Score: 1
  
  We'll definitely be seeing larger and larger caches on chip, but you can't fit very much. The Wii has ~24MB of fast onboard SRAM, which is where it does almost all its graphics work, but it uses an unusually simple "1T" (one-transistor RAM) design to cram all that in. The PS3's SPEs have only 256KB of local memory each, but insane bandwidth. There's a place for onboard memory, but it's limited.
  No, the real solution is bigger pipes to the outside world. Graphics cards are a great example: watch the GB/s climb year after year. AMD has had scalable memory for years via NUMA. You may have noticed CPU RAM speeds are rapidly climbing: we went from SDR 133 in the mid 90's to DDR to PC2-6400 (DDR2-800) in the mid 2000's (a trickle pace so far) to..... DDR3-1333 in '08; four times what it was early/mid decade.
  Intel's Nehalem will release at 32GB/s per socket and climb to 50GB/s per socket. Graphics cards are already well past that: the 8800GTX has ~86GB/s, and the new GTX 280 is at ~140GB/s.
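To put the socket numbers above in per-core terms (assuming, purely for illustration, that a 1000-core chip gets one Nehalem-class socket's bandwidth and shares it evenly across its cores):

```latex
\frac{32\ \text{GB/s}}{1000\ \text{cores}} = 32\ \text{MB/s per core},
\qquad
\frac{50\ \text{GB/s}}{1000\ \text{cores}} = 50\ \text{MB/s per core}
```

That is a tiny fraction of what a single core gets today when it has the memory bus to itself, which is why feeding the cores, rather than adding them, is the hard part.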
2. Re:Memory on the chip by cnettel · 2008-07-03 01:03 · Score: 1
  
  Sorry, but mid-90s were more like 66 MHz EDO. If you were lucky. 133 MHz SDR around 2000. And then your trend of accelerating growth actually starts to fail.
kernel tasker responsibility by mckniffen · 2008-07-02 11:54 · Score: 1

Despite the huge push towards writing programs optimized to control their own threading, I still cannot comprehend why this is not left to the kernel tasker.
If the kernel can manage process time on one core, I feel that scaling this to work efficiently over a number of cores, with the use of semaphores to protect data and control thread access would allow for a much more efficient system level approach.
My experience controlling data access between multiple threads has been riddled with unneeded tweaking. I think that gcc should insert code to control memory leaks and process safety and the kernel should be in charge of tasking between cores.
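A minimal sketch of the model described here, assuming Python's `threading` module: the OS kernel decides which core each thread runs on, while a lock protects the shared data and a semaphore limits how many threads may be in the working phase at once. `run_workers` and its parameters are hypothetical.

```python
import threading

def run_workers(n_threads=8, increments=10_000, max_concurrent=2):
    """Start n_threads kernel-scheduled threads that update shared state safely."""
    counter = 0
    active = 0
    peak = 0
    lock = threading.Lock()                     # protects counter, active, peak
    gate = threading.Semaphore(max_concurrent)  # caps threads in the working phase

    def worker():
        nonlocal counter, active, peak
        with gate:
            with lock:
                active += 1
                peak = max(peak, active)
            for _ in range(increments):
                with lock:
                    counter += 1
            with lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()                               # placement on cores is the kernel's job
    for t in threads:
        t.join()
    return counter, peak
```

The lock makes the final count exact (8 × 10,000 = 80,000 with the defaults), and the semaphore keeps the peak number of simultaneously active workers at or below `max_concurrent`.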

fork() that intel.

--
Communism, its a party!
1. Re:kernel tasker responsibility by Skapare · 2008-07-02 12:05 · Score: 2, Interesting
  
  I think that gcc should insert code to control memory leaks and process safety and the kernel should be in charge of tasking between cores.
  Please limit this desire to languages like Java, Python, and Ruby. We don't need this in C. If you can't program without it, you shouldn't be programming in C.
  
  --
  now we need to go OSS in diesel cars
GPU, anyone? by protectr · 2008-07-02 11:58 · Score: 0

How much would someone bet that those will follow the very same restrictions that current GPUs have when they're used as stream processors? There aren't 10,000 ways to make parallel processing efficient.
If they don't put restrictions on when and how a program can use resources, simultaneous access to memory by those cores would be a real nightmare to design, and worse to program. The best way to use multiprocessing today is GPGPU techniques, _because_ of those restrictions, which make it possible to keep the GPU running without waiting too much on memory.
May I refer you to: http://tech.slashdot.org/article.pl?sid=08/05/31/1633214
Stream processing has many more applications than games or scientific computing, Intel is seeing that. But it seems like Nvidia is way ahead in that race... Let's see if Intel will take the lead.
so, Intel made risc passé... by DragonTHC · 2008-07-02 12:02 · Score: 2, Insightful

and now they're bringing it back?
we all learned how 1000 cores doesn't matter if each core can only process a simplified instruction set compared to 2 cores that can handle more data per thread.
this is basic computer design here people.

--
They're using their grammar skills there.
Can't you only have 1 core? by kahanamoku · 2008-07-02 12:03 · Score: 2, Informative

By definition, isn't a core just the middle/root of something? If you have more than 1 core, shouldn't the term really be changed to something closer to what it represents?

--
----- Concentrate on promoting more than demoting.
640 K? by jamrock · 2008-07-02 12:06 · Score: 1

Is that the number of cores or how much heat it dissipates?
1. Re:640 K? by Tweenk · 2008-07-03 06:27 · Score: 1
  
  You fail, temperature != heat
  
  --
  Those who would give up liberty to obtain working drivers, deserve neither liberty nor working drivers.
All processors by Anonymous Coward · 2008-07-02 12:06 · Score: 0

idle at the same speed...
maybe.. by dezent · 2008-07-02 12:11 · Score: 1

motherboards will be equipped with an extra 128mb chip just to keep /proc/cpuinfo.
Re:Imagine the new math! by Tubal-Cain · 2008-07-02 12:11 · Score: 1

If he's lucky, he might get his hands on a few dual core/processor machines.
Arificial intelligence and Computational Neurosci? by RockoTDF · 2008-07-02 12:13 · Score: 1

One of the main complaints about AI and Comp Neuro is that the brain is a massively parallel system....this sort of thing could open up all sorts of possibilities for more realistic brain simulation. As someone going into these fields, this got my attention real quick. I actually could use a beowulf cluster of these...

--
There is more to science than physics!

www.iomalfunction.blogspot.com
9.987878989, 99.9987834, or even 999.9912 of cores by GNUALMAFUERTE · 2008-07-02 12:42 · Score: 1

There, fixed it for you.

--
WTF am I doing replying to an AC at 5 A.M on a Friday night?
Wrong adjective by Anonymous Coward · 2008-07-02 12:42 · Score: 0

it seems Intel is pushing for a massive evolution in the way processing is handled.
It seems more like intelligent design to me. Intel isn't leaving the technology to morph on its own. They are actively designing it. Technology doesn't evolve. It is designed, with changes implemented on purpose.
tens, hundreds, thousands? by Cope57 · 2008-07-02 12:48 · Score: 1

With that many cores, you still have to wonder, is it Vista ready?
When a thousand core cpu comes out, Vista should be ready for the desktop.

--
http://www.accountkiller.com/removal-requested
Yeah, right. by rtechie · 2008-07-02 12:51 · Score: 1

I don't think so. We're seeing decreasing returns in multi-core computing because it is still basically multi-CPU computing and many tasks are not easily parallelized. The notion that some revolutionary compiler or IDE is going to solve this problem is just wrong. Tell it to Itanic, that was based on exactly these assumptions and failed miserably because of them.
There are also serious problems with I/O with lots of cores. How do you feed them all? It seems like you'd need a LOT of very fast memory and interconnects, as close to the CPU as possible. I think the only way to get this to work would be to have embedded memory for each core IN ADDITION to duplicate system memory. Possible, but extremely expensive.
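The "decreasing returns" point is usually quantified with Amdahl's law: if a fraction p of a program's run time can be parallelized and the rest is serial, the best-case speedup on N cores is 1 / ((1 - p) + p / N). A small sketch (the function name is hypothetical):

```python
def amdahl_speedup(p, n_cores):
    """Upper bound on speedup when a fraction p of the work parallelizes perfectly."""
    if not 0.0 <= p <= 1.0 or n_cores < 1:
        raise ValueError("need 0 <= p <= 1 and n_cores >= 1")
    return 1.0 / ((1.0 - p) + p / n_cores)

# Even with 95% parallel code, 1000 cores give at most about a 19.6x speedup,
# and the ceiling as n_cores grows without bound is 1 / (1 - p) = 20x.
print(round(amdahl_speedup(0.95, 1000), 1))
```

The serial fraction, not the core count, ends up setting the limit, which is the poster's point about tasks that are not easily parallelized.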
1. Re:Yeah, right. by Doctor+Faustus · 2008-07-02 14:41 · Score: 2, Insightful
  
  The notion that some revolutionary compiler or IDE is going to solve this problem is just wrong. Tell it to Itanic, that was based on exactly these assumptions and failed miserably because of them.
  With Itanium, they were trying to say compiler improvements could handle it invisibly, with no work from the application programmers. Taking advantage of more than two cores (since one can take care of other programs that would have slowed down your app) is going to take conscious thought about what can and can't be parallel. Taking advantage of more than a handful is going to take more fundamental shifts in how we program. They're asking a lot more this time.
  On the other hand, you could easily opt out of Itanium. Now, this is the only way your programs are going to get much future processing improvement. Ever. No matter who you're buying CPUs from.
Heh your right by DeadDecoy · 2008-07-02 12:55 · Score: 1

Just look at Vista. M$ must have been planning waaay into the future when we'll have a million cores.
Typo - ARM not AMD by dbIII · 2008-07-02 13:26 · Score: 1

The Nintendo DS has a dual core ARM CPU.
The F Programming Language by mevets · 2008-07-02 13:31 · Score: 1

Although they are all Turing machines in the end, a functional programming language can separate the program from the implementation. You don't think in terms of contexts; a single program may use an arbitrary number of cores in a very lightweight, low-context way.
Each core effectively removes a to-be-evaluated expression from a shared list and returns its evaluated result. In the process of evaluating the expression, it may add other to-be-evaluated expressions to this list, which will be evaluated by available cores. When the list is reduced to a single value, the 'program' is complete. The idea is that the system, rather than the programmer, stretches to the available parallelism. It won't help with getting the last element of a list, but an entire list can be searched in parallel, presuming the comparison is more expensive than the traversal.
When a core has completed a given expression, its context is dead, or, in other words, the expression is the context.
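A minimal sketch of that work-list model, assuming Python threads as stand-ins for cores and a parallel sum as the "program": each worker pulls an expression (here, a range to sum) off a shared queue, and evaluating it may push new sub-expressions back for any free worker. `parallel_sum` and `chunk` are hypothetical names; with CPython's GIL this shows the structure, not a real speedup.

```python
import queue
import threading

def parallel_sum(values, n_workers=4, chunk=1000):
    """Sum values by letting workers pull (lo, hi) ranges from a shared work list."""
    tasks = queue.Queue()        # the shared list of to-be-evaluated expressions
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        while True:
            lo, hi = tasks.get()
            if hi - lo > chunk:
                mid = (lo + hi) // 2
                tasks.put((lo, mid))  # evaluating may add new work to the list...
                tasks.put((mid, hi))
            else:
                part = sum(values[lo:hi])
                with lock:
                    total += part
            tasks.task_done()         # ...and only then is this expression retired

    for _ in range(n_workers):
        threading.Thread(target=worker, daemon=True).start()
    tasks.put((0, len(values)))
    tasks.join()                      # done when no unfinished expressions remain
    return total
```

Each worker calls `task_done()` only after pushing any sub-tasks, so the queue's unfinished count never reaches zero early and `tasks.join()` returns exactly when the whole computation is finished.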
Two problems: 1. Functional programming is hard. I know the math nerds will say it's natural, which is true if you happen to be a math nerd. 2. Most functional programming languages are about as intuitive as the Windows registry.
maybe we can ask Misters K&R to write an "F" language to do for functional programming what C did to the abyss of COBOL, FORTRAN, PL/I, ... just don't let Mr S near it.
1. Re:The F Programming Language by Hooya · 2008-07-02 15:10 · Score: 1
  
  Corp E has already done it. Check out Erlang sometime. The book "Programming Erlang" is an excellent intro.
Bill gates was just mis-quoted by Growlor · 2008-07-02 13:38 · Score: 5, Funny

He meant 640k CORES should be enough for anybody.
Backwards Compatability? by oljanx · 2008-07-02 13:40 · Score: 1

So how do you handle backwards compatibility? Let's say I have an application that runs just fine on a single-core 1GHz processor, and it's not designed to use multiple cores. Now all of a sudden I have 1000 100MHz cores, and my app is only designed to utilize a single core. Do you hide parallel processing at the hardware level, i.e., provide a logical "primary" processor capable of delegating tasks to its minions?
GPUs Already employ this paradigm by Anonymous Coward · 2008-07-02 13:46 · Score: 0

GPUs already use massively parallel processing. If you think of the 800 stream processors in the rv770 core as individual cores, then you have your hundreds of cores.
I think Intel misses the fact that in other segments of the industry, their "new shiny ideas" are old hat.
Remember when the Pentium (1) brought strong floating point to the consumer market? Look back, people: Alpha and SPARC had the importance of FP down years before Intel really figured out what was going on.
Lastly, will I be overclocking cores by the batch or do I still have to see what each core will max out at?
So they're making a CBE like chip? by NotZed · 2008-07-02 14:20 · Score: 1

What's so 'massive evolution'ary about that?
Only a few years late to the game I guess.

--
_ // `Thinking is an exercise to which all too few brains
\\/ are accustomed' - First Lensman
Sometimes it's easier, though... by Anonymous Coward · 2008-07-02 14:29 · Score: 0

This is one of the things I noticed about working with CUDA (the general purpose computing API for Nvidia GPUs). There's a bit of extra complexity in the planning stage (you have to be really careful managing resources, or most of the chip will sit there doing nothing), but it also tends to eliminate a couple levels of nested loops and handle a lot of indexing / addressing type stuff automatically.
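A rough illustration of the indexing CUDA takes off your hands, written as a serial Python emulation rather than real CUDA: the hypothetical `launch` function plays the role of the hardware, calling the kernel once per (block, thread) pair, and the kernel computes its global index the way a CUDA kernel would with `blockIdx.x * blockDim.x + threadIdx.x`. The kernel body itself has no loop.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical serial stand-in for a 1-D CUDA launch, kernel<<<grid_dim, block_dim>>>."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Compute out[i] = a * x[i] + y[i] for this 'thread's' single element."""
    i = block_idx * block_dim + thread_idx  # like blockIdx.x * blockDim.x + threadIdx.x
    if i < len(out):                        # the grid may have more threads than elements
        out[i] = a * x[i] + y[i]

n = 10
block = 4
grid = (n + block - 1) // block             # round up: 3 blocks of 4 threads cover 10 elements
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
launch(saxpy_kernel, grid, block, 2.0, x, y, out)
```

The bounds check is the main piece of "careful planning" left to the programmer: the launch is rounded up to whole blocks, so some threads have no element to work on.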
Die Size by Anonymous Coward · 2008-07-02 14:39 · Score: 0

Am I the only one wondering what wafer process they are going to use? Wouldn't it be better to increase the cache die area instead of the number of cores?
Multi-core using Atom processor by flappinbooger · 2008-07-02 14:51 · Score: 1

When the UMPCs started coming out using the Atom processor, a few things really stood out to me.

It seems to me that the die is very small, physically, and it is obviously a low power consumption and low heat chip.

It also isn't all that fast.

But what if you had, like, 10? 100? 1000, like the TFA says? NOW we're talkin.

--
Flappinbooger isn't my real name
Chickens by Viking+Coder · 2008-07-02 15:07 · Score: 1

Why did the multithreaded chicken cross the road?
to To other side. get the

--
Education is the silver bullet.
Re:[AC]Useless by everphilski · 2008-07-02 15:15 · Score: 1

I'm guessing it doesn't have the "Wow!" factor you were looking to get.

It wasn't for wow factor as you presuppose.

Also, a hell of a lot of the world runs synchronously and if it didn't some very, very bad things would happen.

Sure, but there are a hell of a lot of things in this world running synchronously, in parallel. And many of these separate synchronous events can be influenced by external forces. Nothing exists in a vacuum.

My computer is a tool, like my desk. I'm rarely working on one thing at once, multiple projects across multiple disciplines. Why should my computer be focused on one task? It shouldn't. It should be spread across many tasks. To do this, it can incorporate many parallel processors.

Now take it in context, among other things I write and work with engineering codes for large clusters, modeling and simulation of real-world phenomena. I work with parallel distributed code on a daily basis. This isn't wow factor for me, this is daily life.
The real point by Anonymous Coward · 2008-07-02 15:22 · Score: 0

The point is not that it will be faster for highly parallel tasks, but that it is becoming difficult to increase throughput by lowering latency. Intel is looking into multiple parallel processors as a way of increasing throughput, and is attempting to develop software design to a point where it is useful.
Programming vs. Hardware by Anonymous Coward · 2008-07-02 15:53 · Score: 0

What do cores matter if the IDE doesn't keep pace?
The past called by chthon · 2008-07-02 16:30 · Score: 1

It wants its systolic systems, its thinking machines and other parallel architectures back.
I think that Intel is making a mistake here by calling upon programmers to solve the problem.
It is they who should be making their cores available in usable hardware architectures, but maybe they suffer from NIH, because all worthwhile parallel architectures already exist.
Useless by FlightlessParrot · 2008-07-02 17:06 · Score: 1, Redundant

No one will ever need more than 640 cores.
Gaming? by phorm · 2008-07-02 17:31 · Score: 2, Informative

I'd say that it could have a rather hefty impact on the graphics industry (though to be fair, both tend to share tech fairly regularly as it is) as well as many others.

How about servers? If you have 1000 cores, and 1000 clients connecting through the network, then each core could service a client (though depending on what they're doing, IO and other issues also rear their heads). Another nice aspect would be that if you could fix a process to a certain # of cores, you could always be sure that it wouldn't max out your entire CPU capacity.
1. Re:Gaming? by Anonymous Coward · 2008-07-02 19:50 · Score: 0
  
  > How about servers? If you have 1000 cores, and 1000 clients
  > connecting through the network, then each core could service
  > a client.
  How much bandwidth do you think you have?
2. Re:Gaming? by walshy007 · 2008-07-02 20:08 · Score: 2, Informative
  
  "Another nice aspect would be that if you could fix a process to a certain # of cores": you already can in Linux. schedtool lets you set hard CPU affinities per process, so you can restrict a process to certain cores if you like.
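The same kind of hard affinity that schedtool sets can be requested from inside a program through the Linux sched_setaffinity call, which Python exposes as `os.sched_setaffinity`. A minimal sketch (Linux only; the helper name is hypothetical):

```python
import os

def pin_to_first_cpus(n):
    """Restrict the calling process to at most the first n CPUs it is currently allowed to use."""
    allowed = sorted(os.sched_getaffinity(0))  # pid 0 means "the calling process"
    chosen = set(allowed[:max(1, n)])
    os.sched_setaffinity(0, chosen)
    return os.sched_getaffinity(0)
```

From the shell, `taskset` (from util-linux) does the same job.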
3. Re:Gaming? by vidarh · 2008-07-02 21:09 · Score: 1
  
  How about servers? If you have 1000 cores, and 1000 clients connecting through the network, then each core could service a client (though depending on what they're doing, IO and other issues also rear their heads).
  For some types of services, yes, but most network services are IO bound, to the point where getting the event loop right means your network service will spend most of its time in kernel space waiting for the network driver or block device driver (for disk).
  Network services WILL scale nicely with more cores, but for network-bound ones it means memory and bus bandwidth need to scale accordingly, and it'll mostly matter if you need to be able to saturate many Gbps with small-ish requests (file serving etc. is much less CPU intensive than lots of tiny messages/requests), so it's not relevant for most people. My current servers have "only" dual gigabit Ethernet, and it's trivial to saturate that with existing hardware.
  Next up is disk. Scaling disk bandwidth is currently ridiculously expensive, to the point where CPU often doesn't matter much for disk-bound systems unless you have a ton of expensive NAS devices - you'll get more CPU power than you know what to do with if you buy cheap servers and spread the load (as an example: I'm ordering new database servers today - the ONLY driving factor for my current use is IO capacity, and we're spreading it out over a small number of cheap, small servers rather than buying a large storage array. Each of those servers comes with 4 2.3GHz cores, and we'll rarely use more than half the capacity of one of them - still cheaper than every other solution I priced out).
  
  Another nice aspect would be that if you could fix a process to a certain # of cores, you could always be sure that it wouldn't max out your entire CPU capacity.
  Virtualization already does that for me. It would be nice to be able to pick the level of isolation on a per-process-group level, but that IS actually being worked on for Linux, at least for memory and networking, and it would surprise me if it doesn't come for CPU usage as well, as part of the work to merge OpenVZ into the kernel as generic features.
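A minimal sketch of the event-loop style described above, assuming Python's `asyncio`: one thread on one core serves many connections at once, because each handler spends nearly all its time waiting on the network rather than computing. The handler, `serve_clients`, and the uppercase-echo protocol are hypothetical, chosen only to keep the example self-contained on localhost.

```python
import asyncio

async def handle(reader, writer):
    """Read one line, send it back uppercased, close. Almost all time is spent awaiting I/O."""
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def serve_clients(n_clients):
    """Start a server on a free localhost port and hit it with n_clients concurrent clients."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"hello {i}\n".encode())
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply.decode().strip()

    async with server:  # closes the listening socket on exit
        return await asyncio.gather(*(client(i) for i in range(n_clients)))
```

Running `asyncio.run(serve_clients(20))` answers 20 concurrent clients without any extra threads or cores; more cores only start to matter when each request needs real computation.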
bandwidth and IO by phorm · 2008-07-02 17:34 · Score: 1

CPU speed is far outstripping bus and memory bandwidth

One of the issues I'm continually faced with at work is not so much CPU horsepower anymore, but disk IO. Even with a good RAID setup, there are only so many clients you can service off a single machine at a given time. Remote storage like iSCSI and other forms of storage arrays can help with this, but I'm not sure that even those are ready for 1000-core machines running as superservers.
1. Re:bandwidth and IO by jambox · 2008-07-03 02:55 · Score: 1
  
  Quite true. Our server at work is a beast of a machine (well, a dual Xeon with a bunch of RAM), so any code doing stuff in-memory flies along, even with 200+ users logged on. But as soon as you start looping over databases, you have to worry about how long it'll take to run at peak times.
  
  --
  You thought you could break the laws of physics without paying the PRICE?
Hey retard... by Anonymous Coward · 2008-07-02 17:59 · Score: 0

I'm running Vista, and DWM (aero) and every background task including your DRM boogeyman (as if it did anything at all when not playing back protected media), uses like 1% of the CPU, so why don't you shut up and learn something instead of spreading your idiotic FUD.
4 part parallel computing lecture series at Resear by Anonymous Coward · 2008-07-02 18:02 · Score: 0

I am not a programmer, just a lowly MS Servers/Photoshop/Photographer/inactive biz atty guy. That being said, to me these webcasts are quite good, particularly the 4-part parallel computing lecture series. It clearly breaks down the problem, from the "computationally and parallelizably trivial" to the real and very hard challenges in problems that are extremely difficult and complex to solve. The lectures are by a master of these issues and of the domain: Geoffrey Fox.
kellybundy@operamail.com is my postable email. (checked only in rare, comatose, delusional spam-loving moments)
Anonymous Coward
------------
begin links:
------------
Technical Computing @ Microsoft: Lectures Series on the History of Parallel Computing - Part 1
Geoffrey Fox, Ph.D., professor, Computer Science, Informatics, and Physics at Indiana University
February 26, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11073&fID=569
Technical Computing @ Microsoft: Lectures Series on the History of Parallel Computing - Part 2
Geoffrey Fox, Ph.D., professor, Computer Science, Informatics, and Physics, Indiana University
February 27, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11071&fID=569
Technical Computing @ Microsoft: Lectures Series on the History of Parallel Computing - Part 3
Geoffrey Fox, Ph.D., professor, Computer Science, Informatics, and Physics, Indiana University
February 28, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11070&fID=569
Technical Computing @ Microsoft: Lectures Series on the History of Parallel Computing - Part 4
Geoffrey Fox, Ph.D., professor, Computer Science, Informatics, and Physics, Indiana University
March 1, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11069&fID=569
The Stanford Data Stream Management System
http://www.researchchannel.org/prog/displayevent.aspx?rID=4355&fID=569
Parallel Execution Models for Future Multicore Architectures
Guri Sohi, faculty member and chair, Computer Sciences Department, University of Wisconsin-Madison
February 17, 2006
http://www.researchchannel.org/prog/displayevent.aspx?rID=4793&fID=569
SaC: Off-the-Shelf Support for Data-Parallelism on Multicores
Dr. Sven-Bodo Scholz, senior lecturer, University of Hertfordshire
March 30, 2007
http://www.researchchannel.org/prog/displayevent.aspx?rID=11269&fID=569
http://www.researchchannel.org/prog/displayevent.aspx?rID=24404&fID=569
Stream Programming: Luring Programmers into the Multicore Era
New capabilities with parallel abstraction that simplify application development, becoming more appealing to programmers.
The Center for Parallel and Distributed Computation
http://www.researchchannel.org/prog/displayevent.aspx?rID=2380&fID=569
The Google Linux Cluster
Infrastructure of Google web search
http://www.researchchannel.org/prog/displayevent.aspx?rID=2879&fID=569
-----
end links
-----
Are you sure it would benefit? by Joce640k · 2008-07-02 18:03 · Score: 1

At the moment my CPU usage never goes much above 50% when compiling no matter how many threads I tell it to spawn.
This suggests I'm either I/O bound (on one of those new-fangled Velociraptor drives...) or stalled because of build dependencies.
I've only got two cores at the moment. Adding more wouldn't necessarily speed up my compile times. YMMV.
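One way to see why extra cores may not help a build: group targets into "waves" where everything in a wave depends only on earlier waves. The widest wave bounds how many cores the build can keep busy. This is a simplified sketch that assumes every job takes the same time (real tools like make -j schedule dynamically, so treat it as a rough model), and the dependency graph in it is made up for illustration.

```python
def build_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group targets so each wave depends only on targets in earlier waves.

    Every dependency must also appear as a key in `deps`.
    """
    remaining = {target: set(needs) for target, needs in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # Targets whose dependencies are all built can run in parallel now.
        ready = sorted(t for t, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle or undeclared target")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves


def max_useful_cores(deps: dict[str, set[str]]) -> int:
    """Under the equal-job-time assumption, more cores than the widest wave sit idle."""
    return max((len(w) for w in build_waves(deps)), default=0)


# Hypothetical project: three objects, one library, one final link.
DEPS = {
    "a.o": set(),
    "b.o": set(),
    "c.o": set(),
    "lib.a": {"a.o", "b.o"},
    "app": {"lib.a", "c.o"},
}

if __name__ == "__main__":
    for i, wave in enumerate(build_waves(DEPS)):
        print(f"wave {i}: {wave}")
    print("max useful cores:", max_useful_cores(DEPS))
```

In this toy graph only the first wave has more than one job, so a second core helps for that stretch and then idles through the library and link steps.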

--
No sig today...
1. Re:Are you sure it would benefit? by Joce640k · 2008-07-02 18:14 · Score: 1
  
  ...of course this result could be a problem with the way my IDE does the build. :-)
  
  --
  No sig today...
Imagine ... by crapdot · 2008-07-02 18:25 · Score: 1

... idle process counting thousands of processor-time seconds PER SECOND. Cool!
Modded as Funny? by Anonymous Coward · 2008-07-02 18:29 · Score: 0

This is modded as funny now, but this is almost CERTAINLY where we are headed. The computing world moves fast, and certain new fabrication techniques may move it even faster..... I can see a near future where we have a million quantum cores in our cell phones, and the term supercomputer literally does not exist any more in any meaningful way, because every computer has twice the computing power of all computers currently in existence. I mean, that has happened already since the days of ENIAC, has it not?
Now what is MORE interesting to me is that at some point in the near future I think computer OSes will operate like a cloud and instantly turn all processors (which will all be multicore, eventually) within range of wifi (or whatever we have at that point) into an instant cluster computer. It's hard to think that far outside the current box and envision exactly how applications and interfaces will work at that point, but I still don't see that as necessarily funny, so much as interesting....
1. Re:Modded as Funny? by jedidiah · 2008-07-03 03:08 · Score: 1
  
  There are already consumer systems that work in a similar fashion.
  That sort of idea is not so far fetched or so far in the future for some of us...
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
Cooling. by myspace-cn · 2008-07-02 18:51 · Score: 1

Anyone remember the 10 Deca chip? Blazing fast, but...So hot it burns up.
2003 is calling by LordMyren · 2008-07-02 21:40 · Score: 1

Niagara 2 has 8 cores and 4-way SMT per core. Given that each core has multiple functional units, it's very close to being a fully 32-way CPU. It feeds on four dual-channel memory pipes. The servers running these don't need special software to make use of these cores; they just handle lots and lots and lots of user requests per second. For the most part they're webservers and fileservers, but they'd almost certainly make excellent mainframes for large multi-user environments/virtualized systems. All it takes to use this multi-core CPU is a dozen 5-watt 166MHz thin clients.
Besides multi-user environments, there are already plenty of data-parallel tasks that can use however many cores you give them. Graphics is just one, but neural nets, signal processing, and simulation tasks of all kinds can all be parallelized to any degree of parallelism you can build. 1000 will be laughably small by 2010; already today you can buy an 800-core card at Best Buy: just ask for an AMD 4870 and pay your $200.
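The "data-parallel" point in concrete form: when each block of a signal can be processed without looking at any other block, the work splits across however many workers you have, and the results are stitched back together in order. This is a minimal sketch computing a per-block RMS level; blockwise_rms and the block size are illustrative choices, not anything from the thread. In CPython, threads share the GIL, so for CPU-bound pure-Python work you would swap ThreadPoolExecutor for ProcessPoolExecutor (same map interface); threads are used here only to keep the sketch simple and portable.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rms(block: list[float]) -> float:
    """Root-mean-square level of one block; needs nothing outside the block."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def blockwise_rms(signal: list[float], block_size: int, workers: int) -> list[float]:
    """Split the signal into independent blocks and process them concurrently."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    blocks = [signal[i:i + block_size] for i in range(0, len(signal), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whatever order the blocks finish in.
        return list(pool.map(rms, blocks))


if __name__ == "__main__":
    print(blockwise_rms([1.0] * 4 + [2.0] * 4, block_size=4, workers=2))
```

Because no block reads another block's data, the same code runs unchanged whether workers is 2 or 1000; only the amount of input limits how many workers stay busy.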
Video and audio processing can eat all 1000 by alukin · 2008-07-02 22:11 · Score: 1

Multimedia apps are resource hogs today, and this stuff will easily eat up all 1000 cores. Yes, RAM is a bottleneck, but sound and video processing can be parallelized easily. Voice and image recognition, cryptography, voice synthesis, neural networks, ....
A whole lot of modern tasks can use far more than today's 4 cores.
Re:9.987878989, 99.9987834, or even 999.9912 of co by V!NCENT · 2008-07-02 22:50 · Score: 1

Over 9000? WTF! http://www.youtube.com/watch?v=VJerOY0xqIw

--
Here be signatures
Applications that are multi-core aware? by EmagGeek · 2008-07-02 23:05 · Score: 1

I sure haven't used any... nor seen any... Is there a list somewhere of software that actually can use more than one core? Just once I'd like to see it in action. None of the software I use will escape a single core in my X2.
1. Re:Applications that are multi-core aware? by cnettel · 2008-07-03 00:58 · Score: 1
  
  I sure haven't used any... nor seen any... Is there a list somewhere of software that actually can use more than one core? Just once I'd like to see it in action. None of the software I use will escape a single core in my X2.
  How are you measuring this? In an app that's mostly idle, a very short spike with two active threads on separate cores won't show up unless you actively look for it. To me, your assertion that you haven't used any such software just indicates ignorance.
Ctrl+Alt - Delete by Anonymous Coward · 2008-07-02 23:26 · Score: 0

You are going to need one badass graphics and display setup to view them all in Task Manager.
64K by Anonymous Coward · 2008-07-03 00:18 · Score: 0

I would think 64,000 cores to be more than anyone would ever need.
We'll probably need 50 cores... by crivens · 2008-07-03 00:41 · Score: 1

We'll probably need 50 cores just to run the next version of Windows!
1. Re:We'll probably need 50 cores... by freedom_india · 2008-07-03 04:22 · Score: 1
  
  Oh yeah! And M$FT will cripple Vista to use only 2 out of them for Home use and extort $300/CPU for the Pro version.
  
  --
  "Doing what i can, with what i have." ~ Burt Gummer
What about software using these cores? by kde_rocks · 2008-07-03 01:19 · Score: 1

The problem with thousands of cores is not hardware related. Yes, Intel can build thousands of cores through continued innovation in hardware. However, the software community is still struggling to understand how to use even 8 cores. The concurrency issues are enormous. For example, look at the following post:
http://kashi.webhop.net/blog/Technology/index.php/archives/22
Until we solve the concurrency issues, the increased number of cores will not deliver any higher performance. C++ is just starting to deal with concurrency. C++0x adds thread support; however, that is just the starting point.
It's not hard, we've been doing it wrongly. by master_p · 2008-07-03 01:22 · Score: 1

If you use mutexes, semaphores, critical sections, etc., then parallel programming is indeed hard. These low-level primitives should only be used for coding parallel APIs, where the user never sees them.
One such example is the Actor model: objects communicate through messages. It doesn't get simpler than that, and the low level communication primitives are only used at the message queue of each object (because one thread writes the queue, one thread reads it).
Imagine a program written in an object-oriented language where each object is an Actor! If you have 10,000 objects and 10,000 cores, each core can represent one object. Sequential algorithms could then be parallelized automatically, without being rewritten, since a method call becomes a message... There are solutions for parallel programming; it's just that the major programming languages define a culture that is hard to break.
For a successful application of the Actor model in the industry, you can check the programming language Erlang used in telecommunications.
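A toy version of the Actor idea in Python, offered as a sketch only (it is not how Erlang implements its processes): the actor's state is private to its own thread, and the only shared structure is its mailbox. queue.Queue does the locking internally and is safe for many senders, so the calling code never touches a mutex. CounterActor and its message names are hypothetical.

```python
import queue
import threading
from typing import Any


class CounterActor:
    """Toy actor: private state, one worker thread, communication by messages only."""

    def __init__(self) -> None:
        self._mailbox: queue.Queue = queue.Queue()  # the only shared structure
        self._count = 0  # private: read and written only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The actor loop: handle one message at a time, in arrival order.
        while True:
            msg, reply = self._mailbox.get()
            if msg == "incr":
                self._count += 1
            elif msg == "get" and reply is not None:
                reply.put(self._count)
            elif msg == "stop":
                return

    def send(self, msg: str) -> None:
        # Fire-and-forget "method call": the sender does not wait.
        self._mailbox.put((msg, None))

    def ask(self, msg: str) -> Any:
        # Request/reply: attach a private one-slot queue and block until the actor answers.
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._mailbox.put((msg, reply))
        return reply.get()

    def stop(self) -> None:
        self._mailbox.put(("stop", None))
        self._thread.join()


def demo(senders: int = 8, per_sender: int = 1000) -> int:
    """Many threads increment the counter with no locks in user code."""
    actor = CounterActor()

    def spam() -> None:
        for _ in range(per_sender):
            actor.send("incr")

    threads = [threading.Thread(target=spam) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The mailbox is FIFO, so every "incr" is handled before this "get".
    total = actor.ask("get")
    actor.stop()
    return total


if __name__ == "__main__":
    print(demo())
```

The design choice is the point: since only the actor's thread ever touches _count, the increments can't race, and the locking lives entirely inside the mailbox.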
Missing the point by Orgasmatron · 2008-07-03 01:58 · Score: 2, Insightful

The point is that this is going to happen, whether anyone likes it or not.

CPU clock speeds ran into the brick wall a few years ago. Here is a chart showing CPU clocks from 1993 to 2005.

There have been no major performance improvements from that direction for the last few years, and probably won't be any more without a major breakthrough in semiconductors.

Moore's law is about transistor counts, and shows no real signs of stopping. Every 18 to 24 months, we double the number of transistors on a given wafer/die. The transition to 64-bit CPUs used a generation or two of those extra transistors, but we aren't likely to move to 128 bits soon. We are already pretty deep into the diminishing-returns curve for on-die cache.

What is left to consume those transistors?

More cores. Lots more cores. If you replace your CPU every 2 years, you can pretty much bet that each one you buy for the next decade or so will have twice as many cores as the one it is replacing.

And if developers and compilers get good at managing parallel code (and they have no choice in this), you can expect core counts to go up even faster than doubling every couple of years.
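The arithmetic behind "twice as many cores every two years", as a quick sketch. The starting point of 4 cores in 2008 is an assumption (roughly a quad-core desktop of the day), and only whole doublings are counted.

```python
def projected_cores(start_cores: int, years: float, doubling_years: float = 2.0) -> int:
    """Core count after `years` if it doubles every `doubling_years` (whole doublings only)."""
    doublings = int(years // doubling_years)
    return start_cores * 2 ** doublings


if __name__ == "__main__":
    # Assumed baseline: 4 cores in 2008, doubling every two years.
    for year in range(0, 11, 2):
        print(2008 + year, projected_cores(4, year))
```

A decade of two-year doublings is five doublings, a 32x increase, so projected_cores(4, 10) gives 128; at the faster 18-month cadence it gives 256. Getting from 4 cores to 1,024 takes eight doublings, about 16 years at the two-year rate.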

--
See that "Preview" button?
Stop Talking, Start Doing by Anonymous Coward · 2008-07-03 03:34 · Score: 0

So far Intel's processors have 4 cores at most. The PS3's Cell has 8 cores, and Cell hasn't been news for a long time.
A little less conversation, a bit more action please Intel.
Upgrade now?! by Anonymous Coward · 2008-07-03 03:51 · Score: 0

"even thousand of cores"
Finally, someone plans to build a CPU that can run Vista!
trying to catch NVIDIA by Anonymous Coward · 2008-07-03 04:10 · Score: 0

Tesla chips are ready to ship from NVIDIA (240 cores per chip).
Intel talks up vaporware and tells us to prepare for dozens of cores.
Apple and "Grand Central" multicore? by llamafirst · 2008-07-03 06:26 · Score: 1

If this is Intel hinting at future products, it would explain Apple's new "Grand Central". Does Apple know about the upcoming super-ultra-parallel chips?
Grand Central is a Snow Leopard feature to make it "much easier for developers to create programs that squeeze every last drop of power from multicore systems."
http://www.apple.com/macosx/snowleopard/
Hey we have a solution! by owlstead · 2008-07-03 06:52 · Score: 1

To all developers with a problem: make your problem match our solution!
At least Sun got the idea right with the Niagara-based processors. You have a problem: a high-load web or database server which is inefficient. OK, what are the problems? 1) IO, 2) energy usage, 3) SSL performance. OK, here you have a CPU with many cores to make sure the IO is saturated; bolt on two 10 Gbps NICs and make sure it is nicely underclocked. Add crypto systems to speed up SSL (and change the crypto API in Java to make it work; whoops, almost forgot that part).
Well, that was a very direct approach to solving a problem.
MOD PARENT UP (Funny) by shentino · 2008-07-03 07:40 · Score: 1

I take exception to that statement.
Video Compression! by benwaggoner · 2008-07-03 08:45 · Score: 1

Video compression, and media processing in general, can scale up to 1000+ parallel threads, although current apps will need to be re-architected. I regularly have my 8-core workstation tied up for 24+ hours doing media processing, so this sounds really good to me!
Current compression products (Rhozet's Carbon Coder is the biggest example) can already scale up happily to 16 and 32 cores.

--

My video compression blog
Relax Boys...It's called a Paradigm shift by skbach · 2008-07-03 09:45 · Score: 1

I imagine people were having these same types of discussions when we maxed out: electromechanical, relay-based, vacuum tube, and transistor based technologies. The problem is that Integrated Circuits are over. We'll make the jump... Parallel computing is fine, and certainly useful, but one day soon (er than you think) you will have 1 core that is doing 100 PetaFlops with no heat consequences. And if you still wanna put 100 of them on a chip, that's fine too.
How about more efficient code? by NateTech · 2008-07-03 12:07 · Score: 1

With the world looking at ways to lower energy consumption, our industry is retarded if we're going to keep pushing to higher and higher CPU core numbers, higher and higher power consumption, etc.
Is this just the industry's way of giving up and realizing they can't get control of so-called "software engineers"?
Put some leashes on some people, measure PERFORMANCE of code again like we did when we didn't HAVE massive CPU horsepower, and actually work hard on sysadmin goals like properly prioritizing processes running on the hardware?
Think any of that would help a whole lot toward being able to close down a whole lot of data centers?
Won't happen though -- we're humans. We want it now, we want it fast, and we don't care if we have to leave a steaming pile of shit in someone's yard to accomplish it!
Create non-crappy code (even if you have to get underneath the compilers and high-level languages to do it) that does the core things people need REALLY WELL perhaps, and say to hell with buying more and more cores from Intel?
I know it's a pipe dream at this point. Multiple generations of coders haven't analyzed their code for speed/CPU efficiency in almost two decades now. Those folks will never learn how, either. No business motivation to do so.
(Hint: Stop buying hardware and make people use what they have for a while, fall back off the leading edge and wait a bit. Those ideas/words scare Intel to death. They HAVE to sell you "more cores!" or "more GHz!" or "more MIPS!" every year to stay in business, now don't they?)

--
+++OK ATH
Re:Great... license woes by Anonymous Coward · 2008-07-03 14:34 · Score: 0

This is a huge problem already.
Our standard Unix box at work is a Sun T5240 which has 16 cores and 128 threads. We just bought a bunch of expensive S/W (7 figures) of which some is licensed for 2 cores. It is getting very hard to buy a 2 core server of any sort - Intel or Unix.
Larry
practical ways to program multicores by toby · 2008-07-04 06:06 · Score: 1

finally figure out a way to program them that's practical.
You haven't heard of Erlang yet?

--
you had me at #!
Erlang has declarative features by toby · 2008-07-04 06:20 · Score: 1

I can't speak for the others, but it's certainly true that Erlang can be used in declarative ways, as its function signatures are patterns which are matched and bound at runtime. Idiomatic Erlang is therefore much shorter than ordinary imperative code (Java, C, ...); some people have estimated by a factor of 4 to 10.
For an example of declarative style, see my simple minded Tic-Tac-Toe Erlang web application - for example, ttt.erl.

--
you had me at #!
Don't worry! Get started with Erlang today. by toby · 2008-07-04 06:26 · Score: 1

Erlang may not end up being 'the' massively concurrent language of the future, but it's arguably by far the closest thing we have today. The shift in thinking that it involves will conceptually prepare you very well for a C-core, K-core, M-core future. A properly architected application will transparently scale.

--
you had me at #!
sigh by toby · 2008-07-04 06:29 · Score: 1

but no programming languages or tools to take advantage of them.
You expect that to come in the CPU box? Good tools exist, but you will have to learn how to use them.

--
you had me at #!
stop by toby · 2008-07-04 06:48 · Score: 1

It's already a complete waste of time. Real work is done at HLL or VHLL level.

--
you had me at #!