Slashdot Mirror


The Father of Multi-Core Chips Talks Shop

pacopico writes "Stanford professor Kunle Olukotun designed the first mainstream multi-core chip, crafting what would become Sun Microsystems's Niagra product. Now, he's heading up Stanford's Pervasive Parallelism Lab where researchers are looking at 100s of core systems that might power robots, 3-D virtual worlds and insanely big server applications. The Register just interviewed Olukotun about this work and the future of multi-core chips. Weird and interesting stuff."

90 comments

  1. That's a lot of systems. by Cheesebisquit · · Score: 2, Funny

    That's a lot of core systems.

    1. Re:That's a lot of systems. by K.+S.+Kyosuke · · Score: 2, Funny

      "Sun Microsystems's Niagra" And they decided to, uhm, "increase their performance" as well...

      --
      Ezekiel 23:20
  2. Imagine a by Anonymous Coward · · Score: 0, Funny

    Imagine a beowulf cluster of ... oh wait

  3. WTF??? by zappepcs · · Score: 1, Funny

    This is slashdot, you _CAN'T_ post an article that can't be read! timothy, what are you thinking?

    1. Re:WTF??? by trum4n · · Score: 1

      This is slashdot. NO ONE reads the fucking article anyway!

    2. Re:WTF??? by tatermonkey · · Score: 1

      Sad but true. I did read like 3 articles yesterday.

  4. The Future Is Non-Algorithmic by Louis+Savain · · Score: 2, Troll

    It is time for professor Olukotun and the rest of the multicore architecture design community to realize that multithreading is not part of the future of parallel computing and that the industry must adopt a non-algorithmic model. I am not one to say I told you so but, one day soon (when the parallel programming crisis heats up to unbearable levels), you will get the message loud and clear.

    1. Re:The Future Is Non-Algorithmic by hostyle · · Score: 3, Funny

      Indeed. It's turtles all the way down.

      --
      Caesar si viveret, ad remum dareris.
    2. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 5, Insightful

      That strikes me as crackpottery. The stuff that link describes as "nonalgorithmic" is also easily algorithmic, just in a process calculus.
      And guess what? Non-kooks in the compsci community are busily working on process calculi and languages or language-facilities built around them.

    3. Re:The Future Is Non-Algorithmic by lenski · · Score: 3, Interesting

      To simplify: Dataflow. It's been too many years, but I recall that DataFlow was a company name. Their lack of commercial success came down to being way ahead of their time.

      The recent advent of multiple on-die asynchronous units ("cores") is leading to a resurgence of interest in the dataflow model.

      Anyone who has implemented networked event-driven functionality has already started down the path of dataflow model of computation, though obviously it's not fine-grained. The "non-algorithmic model" looks like a fine-grained implementation of a normal network application. (I agree with a downthread post that claims that current and classical Java-based server applications are already there, accepting the idea that event-driven multithreading applications are essentially coarse-grained dataflow applications.) And when the research gets going hot and heavy, I'll wager that the research will end up focusing on organizing the connectivity model.

      As far as I am concerned, one place to look for multicore models to shine would be in spreadsheets and similar applications where there is already a well-defined pattern of interdependency among computational units (which in this case would be the spreadsheet cells). I also think that database rows (or row groupings) would be naturals for dataflow computing.

      An efficient dataflow system would be the most KICK-ASS life computation engine! :-) (Now you know how old I am...)
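
      The spreadsheet idea above can be sketched as a tiny dataflow evaluator. This is only an illustration: the cell names, the formulas and the `evaluate` helper are made up for the example, and the "waves" are run sequentially here, although every cell within a wave is independent and could in principle go to a separate core.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical spreadsheet: cell name -> (input cells, function of those inputs).
cells = {
    "A1": ([], lambda: 2),
    "A2": ([], lambda: 3),
    "B1": (["A1", "A2"], lambda a, b: a + b),
    "B2": (["A1"], lambda a: a * 10),
    "C1": (["B1", "B2"], lambda x, y: x * y),
}

def evaluate(cells):
    """Evaluate cells in dataflow order; return (values, waves)."""
    sorter = TopologicalSorter({name: deps for name, (deps, _) in cells.items()})
    sorter.prepare()
    values, waves = {}, []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # cells whose inputs are all available
        waves.append(ready)
        for name in ready:                  # cells in one wave do not depend on each other
            deps, fn = cells[name]
            values[name] = fn(*(values[d] for d in deps))
        sorter.done(*ready)
    return values, waves

values, waves = evaluate(cells)
print(waves)         # [['A1', 'A2'], ['B1', 'B2'], ['C1']]
print(values["C1"])  # 100
```

      The dependency graph is the whole "program": nothing in it says which cell runs when, only which cell needs which inputs.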

    4. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 2, Insightful

      Hi MOBE2001. Trying Twitter's tricks as well now?

    5. Re:The Future Is Non-Algorithmic by Louis+Savain · · Score: 1, Interesting

      An efficient dataflow system would be the most KICK-ASS life computation engine! :-) (Now you know how old I am...)

      Actually National Instruments has had a graphical data-flow dev tool for years. Their latest incarnation even has support for multicore processors (i.e., multithreading). However, what I'm proposing is not dataflow but signal flow as in logic circuits, vhdl, spiking neural networks, cellular automata and other inherently and implicitly parallel systems.

    6. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 0, Insightful

      yep. and that attitude is why, in the 21st century, we still run the horribly inefficient internal combustion engine, because fucktards like you believe all the alternatives are "crackpottery"

      Fuck you and all your ignorant "yes-man" kind

    7. Re:The Future Is Non-Algorithmic by lenski · · Score: 4, Insightful

      I can see your point... I can imagine a thing that looks a whole lot like an FPGA whose cells are designed to accept new functional definitions extremely dynamically.

      (As you can tell, I don't agree with using the name "non-algorithmic": It's algorithmic by any reasonable theoretical definition. This is why I refer to it as being an extremely fine-grained data flow model.)

      However, if you look at modern FPGAs, you will discover that even there, the macrocells are fairly large objects.

      I guess that when it comes down to it, the "non-algorithmic" model proposed in the page you cite seems so fine-grained that benefits would be overwhelmed by connectivity issues. By this I mean not simply bandwidth among functional components, but in defining "who talks with whom under what dynamically changing circumstances". Any attempt to discuss fine-grained data flow must face the issue of efficiency in connecting the interacting data and control "elements".

      There's the possibly even more interesting question about how many of each sort of functional module should be built.

      What do you say to meeting in the middle, and thinking about a system that isn't so fine-grained, while also thinking of "control functions" as being just as movable as the data elements? Here's why I ask: In my opinion, there might well be some very good research work to be done in applying techniques related to functional programming to a system of extremely large number of simple functional units that know how to move functionality around with the data.

    8. Re:The Future Is Non-Algorithmic by Louis+Savain · · Score: 0, Flamebait

      Fuck you and all your ignorant "yes-man" kind

      Thank you but why be anonymous? There is nothing to fear. Censorship on Slashdot by the usual Turing machine worshipers and other ass kissers is nothing new but that will not stop progress. The best is yet to come.

    9. Re:The Future Is Non-Algorithmic by hostyle · · Score: 2, Funny

      Sorry about your turtle, John. We'll get you a new one. Or maybe some Japanese Fighting Fish? They are fun until they start programming your VCR and recording Leno.

      --
      Caesar si viveret, ad remum dareris.
    10. Re:The Future Is Non-Algorithmic by K.+S.+Kyosuke · · Score: 2, Interesting

      Oh yes, that is why we have (or are currently developing) purely functional programming languages that can often mimic this model quite nicely, and efficient compilers capable of compiling the code into (potentially) whatever parallelism model you are using. Threads should ideally be just a means of implementing parallelism for such languages, or for parallel computing frameworks. Today, you are probably not supposed to write threaded code by hand in most cases. Once you have a reasonable compiler (latest versions of GHC look really promising in this respect), you can stop worrying about how the damned thing works (in most cases) and just write your code.

      Just one question: as an "algorithm" is essentially just a "recipe" for computing certain results from its input, how could a supposedly universal computing machine work without such recipes to change its mode of operation? No matter whether the machine is a Turing machine, a lambda calculus abstract machine or just a reconfigurable electric circuit.

      --
      Ezekiel 23:20
    11. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 0

      What the heck? How is the parent flamebait? Did some mod mis-click?

    12. Re:The Future Is Non-Algorithmic by Louis+Savain · · Score: 1, Insightful

      What the heck? How is the parent flamebait? Did some mod mis-click?

      Nope. It's not a mis-click. It is called censorship on Slashdot. I made myself a lot of enemies apparently. LOL. On Slashdot, you are not allowed to criticise Turing or Darwinism or atheism. Like peer review in science, Slashdot's moderation system serves as a mechanism to suppress dissent, that's all. Too bad Slashdot is not the only forum for expression on the net. But it does not matter in the end, does it?

    13. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 2, Insightful

      I made myself a lot of enemies apparently.

      I don't think it's so much enemies you make, it's the attitude you take towards the community you are trying to influence. There are many very intelligent computer scientists, and you seem to suggest that most are idiots. You will not be seen as insightful if you cannot recognize the great accomplishments already made.

      Personally, I disagree with your positions on physics, and (especially) mathematics. Statements like "Continuity ... leads to an infinite regress" belie your lack of understanding of these mature fields. Who is going to trust your analysis when you make these statements without any real argument? Down-modding these statements is not censorship, it's moderation: we do not need any more of this crap on /.

      With this kind of broader view of your posts, it's tempting to just throw away all of your comments as "crackpot posts." Which is, by the way, what happened with your previous post: someone just thought, "Oh it's that crackpot Louis Savain again; time for a downmod." This is bad, because that post in particular was actually insightful.

      The thing is, we do need new programming languages; we do need implicit concurrency; we do need simplicity. Unfortunately, we don't need your arrogance or extremism. You may have something to offer, but it isn't your hate.

      Your COSA project doesn't get traction because it requires the world to change; for better or worse, you must take the world as it is and nudge it where you think it should go. There are many people smarter than you who should and do have more sway in the matter: you should be seeking to convince them. Why not code a working version of COSA which can run on a single-core computer, but can exploit arbitrarily many additional cores? People would be less unimpressed by you if you produced a functional product.

    14. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 0

      Singling out /. commentators and looking up external references for characterization, are we now? This is a witch hunt, I tell you, a witch hunt!
      I propose a new, stateless version of the /. protocol where "we all began as something else, but we changed" without the need to relinquish our faith in holy, indeterministic state machines.

    15. Re:The Future Is Non-Algorithmic by Cheesey · · Score: 4, Insightful

      Right, so you split your computation up into small units that can be efficiently allocated to the many core array. This allows you to express the parallelism in the program properly, because you're not constrained by the coarse granularity of a thread model. Cool.

      But the problem here is how you write the code itself. Purely functional code maps really well onto this model, but nobody wants to retrain all their programmers to use Haskell. We're going to end up with a hybrid C-based language: but what restrictions should exist in it? This depends on what is easy to implement in hardware - because if we wanted to stick with what was easy to implement in software, we'd carry on trying to squeeze a few extra instructions per second out of a conventional CPU architecture.

      The biggest restriction turns out to be the "R" in RAM. Most of our programs use memory in an unpredictable way, pulling data from all over the memory space, and this doesn't map well to a many core architecture. You can put caches at every core, but the cache miss penalty is astronomical, not to mention the problems of keeping all the caches coherent. Random access won't scale; we will need something else, and it will break lots of programs.

      This is going to lead to some really shitty science, because:

      • Many core architectures will only be good for running certain types of program: not just programs that can be split into tiny units of computation, but programs that access RAM in a predictable way.
      • The many core architects will pick the programs that work best on their system; these may or may not have anything to do with real applications for many core systems (And what is an application for a many core system anyway? Don't say graphics...)
      • It will be hard to quantitatively compare one many core architecture with another because of the different assumptions about what programs are able to do in each case. There are too many variables; there is no "control variable".

      I think that the eventual winning architecture will be the one that is easiest to write programs for. But it will have to be so much better at running those programs that it is worth the effort of porting them. So it will have to be a huge improvement, or easy to port to, or some combination of the two. However, those are qualitative properties. Anyone could argue that their architecture is better than another - and they will.

      --
      >north
      You're an immobile computer, remember?
    16. Re:The Future Is Non-Algorithmic by Louis+Savain · · Score: 1

      But the problem here is how you write the code itself. Purely functional code maps really well onto this model, but nobody wants to retrain all their programmers to use Haskell.

      I agree but I am not so sure that it is true that "Purely functional code maps really well onto this model". For example, in spite of its proponents' use of terms like 'micro-threading' and 'lightweight processes', Erlang's concurrent model uses coarse-grained parallelism. I have yet to see a fine-grained parallel quicksort in Erlang. I'm sure the same is true for the other concurrent functional programming languages as well.

      The biggest restriction turns out to be the "R" in RAM. Most of our programs use memory in an unpredictable way, pulling data from all over the memory space, and this doesn't map well to a many core architecture. You can put caches at every core, but the cache miss penalty is astronomical, not to mention the problems of keeping all the caches coherent. Random access won't scale; we will need something else, and it will break lots of programs.

      I agree. This is why a properly designed multicore processor should not use the cache coherency approach. Tilera's Tile-64 processor uses an Imesh interconnect that goes a long way toward solving this problem, in my opinion. However, the trick is to keep related caches as close together as possible so as to avoid slow long-distance fetches and writes as much as possible. Unfortunately, this is the job of a good on-chip automatic load balancer, which the Tile-64 does not have. In my opinion, it's almost impossible to properly balance a multicore processor using threads. Plurality (of Israel) claims to do it with its Hypercore HAL processor but I am not very fond of Plurality's programming model.

      I think that the eventual winning architecture will be the one that is easiest to write programs for.

      Yep. This is why the multithreading approach to parallel processing is an ultimate loser.

      But it will have to be so much better at running those programs that it is worth the effort of porting them. So it will have to be a huge improvement, or easy to port to, or some combination of the two. However, those are qualitative properties.

      I think that the idea that future multicore processors should make it easy to port legacy apps is unwise. The current algorithmic computing model is so completely wrong that it should be allowed to die a slow death like the buggy whip and the vacuum tube. We need to start over from scratch.

    17. Re:The Future Is Non-Algorithmic by coresnake · · Score: 1

      But I laik turtles... :(

    18. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 0

      Yes, I think *that* is the attitude the OP was talking about.

      It should be obvious to everyone that your *unproven idea* about how to program parallel systems is the right way to go? Total egotistical nonsense.

      If you're right about parallel systems, like everyone else in the world, you need to prove it. If you're not prepared to prove it then shut the hell up. Everything else is noise.

      As for continuity and infinite regress, you're basically saying that Zeno's paradox is a real problem? WTF? If you haven't got beyond *that* you're almost certainly not going to save the programming world. I can definitely see why people think you're a nutjob.

    19. Re:The Future Is Non-Algorithmic by cobaltnova · · Score: 3, Interesting

      My goal is to bash them every chance I get.

      Hence the downmod. You just do not learn.

      They don't put food on my table or a roof over my head.

      Really? I take it you don't use the internet or a computer then? Fail.

      Wisdom is 90% guts and 10% sweat.

      I just do not see this: considering the amount of guts you've got, where is the COSA toolchain? The COSA OS, with the COSA web-browser, with the COSA frigging interactive editor? Why should I believe anything you say? You HAVE DONE NOTHING.

      You are a prime example of what I mean by an ass kisser.

      Whose ass am I kissing? Turing? Hawking? Zeno? Have you heard of proof by intimidation? It is not effective. I mean, there are discrete topologies with "infinite divisibility" which are (consequently) non-continuous, take

      {2^{-n}|n\in N}

      Where the frig is the contradiction?! I WANT TO SEE; PLEASE SHOW ME!

      PS. Why be a gutless coward? Sign your work if you stand by it.

      I am not here to make an enemy or collect blood-karma from you. I am here to make a point. I am here because I see fallow potential in you.

    20. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 2, Interesting

      So sir, what does 0.999... = ?
      I guess you're the kind that would say this isn't 1.
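
      For reference, the usual geometric-series argument behind the quip (a textbook derivation, not something the poster wrote out):

```latex
0.999\ldots \;=\; \sum_{k=1}^{\infty} \frac{9}{10^{k}}
\;=\; 9 \cdot \frac{1/10}{1 - 1/10}
\;=\; 9 \cdot \frac{1}{9}
\;=\; 1
```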

    21. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 0

      Let me shamelessly plug our work in.

      My name is Serguey Zefirov (s_zefirov at IPMCE ru) and I work for Lebedev Institute of Precision Mechanics and Computer Engineering.

      At this Subversion repository I put the sources of a high-level model of a dynamic dataflow machine with token sorting. The model has shown that it is possible to create a dynamic dataflow CPU without a big associative memory (one of the drawbacks of dynamic dataflow architectures).

      There you'll find a little paper about the very concept. And a presentation I did for Moscow Haskell User Group about modeling hardware in Haskell.

      Now we are working on optimizing compilers for that interesting kind of architecture (not especially the "sorting machine", but dynamic dataflow in general).

      Actually, we already have some simple prototype that works on simple (as in NAS benchmark) Fortran programs.

    22. Re:The Future Is Non-Algorithmic by Anonymous Coward · · Score: 0

      All good points.

    23. Re:The Future Is Non-Algorithmic by darkwhite · · Score: 1

      You're not being censored. You're being modded down because you're an ignorant, uninformed, arrogant self-promoting idiot.

    24. Re:The Future Is Non-Algorithmic by default+luser · · Score: 1

      You know, most programmers already know how to construct state machines, and already create entire programs using this concept. Your ideas are not revolutionary, they only highlight the need to use asynchronous state machines over synchronous threads. Where your ideas fail is this: you want to get rid of threads completely.

      Do you want to know why the programming community likes threads? There's a simple reason: state machines DO NOT SCALE. As you add more capabilities to a state machine, the number of states you have to add for each new functionality can increase geometrically. Pretty soon, the number of independent state variables defining the machine gets unmanageable.
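
      A rough sketch of the state-explosion point, with made-up feature names: if one flat machine has to track several independent features at once, its state count is the product of the per-feature counts, while separate small machines only add up. This is the worst case, where any combination of feature states can occur.

```python
from itertools import product

# Hypothetical independent features, each with its own small set of states.
features = {
    "connection": ["closed", "opening", "open"],
    "auth": ["anonymous", "pending", "authenticated"],
    "transfer": ["idle", "sending", "receiving", "paused"],
}

def combined_states(features):
    """All states of one flat machine that tracks every feature at once."""
    return list(product(*features.values()))

def separate_state_count(features):
    """Total states if each feature runs as its own small machine."""
    return sum(len(states) for states in features.values())

flat = combined_states(features)
print(len(flat))                       # 36 = 3 * 3 * 4
print(separate_state_count(features))  # 10 = 3 + 3 + 4
```

      Adding a fourth feature with five states would take the flat machine to 180 states but the separate machines only to 15.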

      What do you do when that happens? Well, in the real world, you break up that state machine into multiple threads. You basically draw an artificial line between units that should not be separated, simply because leaving them attached is not feasible or maintainable. Yes, operations within each thread happen asynchronously, but the multiple threads are now dependent on each other, because you were forced to break up the complex system.

      In the end, you're left with the same problem we've always had: balancing threads with performance. Yes, you can build threads that are asynchronous; this has always been possible using state machines. What you cannot do is completely toss threads; threads exist to make programming problems more accessible to humans. Unless you can take humans out of the loop, threads are here to stay.

      --

      Man is the animal that laughs.
      And occasionally whores for Karma.

    25. Re:The Future Is Non-Algorithmic by paulgrant · · Score: 1

      Technically, his comment regarding continuity implying infinite regress *is* accurate;
      Newton considered this one of the fundamental flaws in his theory of fluxions (an ill-defined
      smallest unit of change). A fact I also independently came to at the tender young age of 17 ;)
      And when I went to my math professor and expressed my idea, he was very excited... Apparently it's
      not all that common for people to realize the gross assumptions in the systems they are taught.

      As to Zeno's paradox; it's not a *real* problem in that you're particularly sloppy about what you use
      as your yardstick; translation: evaluation of distance (length) depends on the yardstick you use - overall
      length is *not* scale-invariant. And hence, given a scale (a particular length ruler), I can cross the room
      in X amounts of steps even though I ought not be able to. The question you should ask, my friend, is this:
      given ruler A (measuring in angstroms) and ruler B (measuring in kilometers), would you cross the same room
      in the same number of steps, e.g. X_a = X_b? And if this is not the case, then wouldn't you have to concur
      that Zeno's paradox does indeed have teeth, if not feeble ones at your particular scale? And in considering
      your reply, do keep in mind that the trend in technology is to try to keep miniaturizing everything....

      Oh, and to the original asshole who started this, KEEP ON BEING AN ASSHOLE!!!!!!! just 'cause ignorant jackasses
      run things by sheer dint of numbers and limitless stupidity don't mean you ain't right :)

  5. Imagine a Beowulf Cluster... by rwillard · · Score: 1, Funny

    ...oh, you know how this one is supposed to go.

  6. Cool but... by bakedpatato · · Score: 0, Redundant

    let's just hope that more programs in the future are written so that they can scale well to the 100s of cores that new CPUs will have. People were pretty slow to get onto the dual core bandwagon...so here's hoping.

    1. Re:Cool but... by larry+bagina · · Score: 1

      maybe you should start writing them instead of posting to slashdot?

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    2. Re:Cool but... by lenski · · Score: 1

      Super-complicated spreadsheets (which are becoming the norm), databases and (my current favorite) simulations fit multi-core systems very well.

      I am helping a company build a simulation management system and have been salivating for a Niagara-based system for about a year now. Their requirements don't call for such performance (yet), but I remain hopeful. :-)

      Finally, I'll bet Linden Labs ("Second Life") is following the large-count multicore research closely.

  7. Multi-core chips will be constrained by by Skapare · · Score: 4, Insightful

    Multi-core chips will be constrained by, among other things, the memory bandwidth going off-chip. Maybe they need larger caches. Maybe they just need to put all the RAM on the chip itself instead of so many other cores. How about 4GB of RAM at 1st level cache speed.

    Ultimately, we'll end up with PCs made from SoCs, and direct SATA, USB, Firewire, and DVI interfaces coming out instead of a RAM access bus. By the time they are ready to make 256 core CPUs, software still won't be ready to work well on that. So in the interim, they might as well just do tighter integration (that can also run faster there, too). No more north bridge or south bridge. Just a few capacitors, resistors, and maybe a buffer amp or two, around the big CPU.

    About the only thing that won't be practical to put in the CPU for a long time is the power supply. They could even put the disk drive in there (flash SSD).

    --
    now we need to go OSS in diesel cars
    1. Re:Multi-core chips will be constrained by by ddrichardson · · Score: 4, Insightful

      That sounds ideal and in the long term is probably what will happen. But you need to overcome two massive issues first - leakage and interference between that many components in one space and of course heat dissipation.

      --
      A thistle is a fat salad for an ass's mouth...
    2. Re:Multi-core chips will be constrained by by BrentH · · Score: 2, Insightful

      How do videocards handle feeding data to 800 (latest AMD chip) separate processors? The memory controller is on-chip of course, and it has a bandwidth of about 50-60GB/s I believe. So, for normal multicore CPUs, try bumping up that DDR2 RAM from a measly ~10GB/s (when used in dual channel) up to the same level (AMD again already has the memory controller on-chip, Intel is going there I believe). DDR(2) being 64 bits wide (why?) doesn't help either, I'd say.

    3. Re:Multi-core chips will be constrained by by pjt33 · · Score: 5, Funny

      Three - three massive issues! Leakage, interference between that many components in one space, of course heat dissipation, and having a single, expensive, point of failure. Wait, I'll come in again.

    4. Re:Multi-core chips will be constrained by by lenski · · Score: 2, Interesting

      Actually, Sun's Niagara has that "problem". The way they solved it is to place Gbit networking close to the cores. There are also multiple DDR-2 memory buses and (I think) PCI-E lanes to feed the processor's prodigious need for memory bandwidth.

      The comments to the Register article include a comment about the Transputer. (In case it's not familiar history, the transputer was a really slick idea that went nowhere... 4 high bandwidth connections, one for each neighbor CPU, with onboard memory. I recall that they were programmed in "Occam", a dataflow-oriented language.)

      I believe that large-count multi-core systems will remain niche solutions until dataflow "locality", "discovery", and unification of control and data become well understood in a theoretical framework. The niches are nice places to be, though. A high quality game is "merely" a simulation of some virtual reality and simulations are perfectly matched to high-count multi-core systems.

      The idea of unifying control and data is no new invention, and anyone trying to patent it should be shot. It's just an ordinary spreadsheet cell. Or possibly a neuron.

    5. Re:Multi-core chips will be constrained by by Tumbleweed · · Score: 1

      Three - three massive issues! Leakage, interference between that many components in one space, of course heat dissipation, and having a single, expensive, point of failure. Wait, I'll come in again.

      You forgot the almost fanatical devotion to the Pope!

    6. Re:Multi-core chips will be constrained by by mikael · · Score: 2, Interesting

      the transputer was a really slick idea that went nowhere... 4 high bandwidth connections, one for each neighbor CPU, with onboard memory. I recall that they were programmed in "Occam", a dataflow-oriented language.)

      Mainly because CPU clock speed and data bus speed were doubling every year. By the time an accelerator card manufacturer had a card out for six months, Intel had already ensured that the CPU was faster and so the accelerator card rapidly became a de-accelerator card. If you look at the advert pages of old Byte magazines, you will see all sorts of accelerator cards that tried to offload work away from the CPU (i860's, TMS34020's, quad transputers, video cards with built-in networking). All of these were squelched one way or another (Intel created a custom video bus, added the 80x87 FPU, then put it on-core, created the Xeon with a built-in i860, added a larger cache, multi-stage pipelines, superscalar architecture, doubled register size from 16 bits to 32 bits, and so on...)

      Also, most applications that benefit from parallel processing required data to be stored in three-dimensional grids, which needed both floating-point acceleration and six or more memory accesses (up and down as well as north, south, east and west). Both Parallel C and Fortran were available for the transputer, but the problem was cost - a transputer accelerator board cost well over 500 pounds just for a four-transputer board.

      It's a shame the transputer never made it, but in the PC world a manufacturer needed to have a new product out every six months to keep up against Intel.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    7. Re:Multi-core chips will be constrained by by Fweeky · · Score: 2, Interesting

      The memory controller is onchip of course, and it has a bandwidth of about 50-60GB/s I believe

      Which is, in fact, around the amount of memory bandwidth Niagara systems have, with 6 memory controllers per socket.

    8. Re:Multi-core chips will be constrained by by Anonymous Coward · · Score: 1, Interesting

      I'm not sure about Niagara, but it should be noted that for GPUs to obtain anything even close to the advertised bandwidth requires very specific access patterns. I'm not familiar with ATI's GPUs, but in the case of nVidia GPUs, "proper" access requires ensuring that the memory address of the "first" process in a warp (group of processes) meets certain criteria and that the addresses accessed in parallel with this by other processes in the warp are contiguous with a stride of 4, 8, or 16 bytes. Doing anything else results in the "86.4GB/s" bandwidth on a G80 being reduced significantly (more than 50%).

    9. Re:Multi-core chips will be constrained by by TheRaven64 · · Score: 3, Insightful

      Niagara has enough memory bandwidth to keep its execution units happy. The last chip I remember that didn't was the G4 (PowerPC). The problem is more one of latency. This isn't such a problem in a GPU, since they are basically parallel stream processors - you just throw a lot of data at them and they process it more-or-less in that order.

      There was a study conducted ages ago (70s or 80s) which determined that, on average, there is one branch instruction in every seven instructions in general purpose code. This means that you can pretty much tell where memory accesses are going to be for 7 instructions, you've got a 50% chance for 14 (assuming it's a conditional jump, not a computed jump), a 25% chance for 21 instructions and so on. The time taken to fetch something from memory if you guessed wrongly is around 200 cycles.
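
      The arithmetic above, as a small sketch. It takes the comment's simplified model at face value (one two-way conditional branch every seven instructions, each guessed right half the time); the function name and parameters are just for illustration.

```python
def straight_line_probability(instructions, branch_every=7, guess_right=0.5):
    """Chance that a guessed instruction path is still correct after `instructions`.

    Assumes one two-way conditional branch per `branch_every` instructions,
    each guessed correctly with probability `guess_right`.
    """
    branches_crossed = max(0, (instructions - 1) // branch_every)
    return guess_right ** branches_crossed

for n in (7, 14, 21, 28):
    print(n, straight_line_probability(n))
# 7 1.0
# 14 0.5
# 21 0.25
# 28 0.125
```

      Under this model the chance of knowing where memory accesses will be halves every seven instructions, which is why a wrong guess costing around 200 cycles matters so much.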

      This is a big reason why the T1/2 have lots of contexts (threads). If you need to wait for memory with one of them, then there are 3 or 7 (T1 or T2) waiting that can still use the execution units.

      Most CPUs use quite a lot of cache memory. This does two things. First, it keeps data that's been accessed recently around for a while. Second, you access memory via the cache, so when you want one word it will load an entire cache line and data near it will also be fast to access. This is good for programs which have a lot of locality of reference (most of them).

      --
      I am TheRaven on Soylent News
    10. Re:Multi-core chips will be constrained by by thogard · · Score: 1

      The T1s have lots of contexts so they can deal with running poorly optimised code that spends all of its time chasing pointers. A typical T1 CPU will run program 1, grab a pointer, try to load its data and stall on a cache miss, but it then switches to program 2, which had already stalled and now has its data ready, so it can grab that, do a pointer calculation and stall again, so the CPU is off to thread 3. The key to the T1 is that there is no context swap between them. The T1 seems to be very quick running poorly optimised ODD code but it's a real dog running anything that has been optimised. Running hand optimised code on it is a waste of time. The T1 is the pinnacle of the solution "All problems in computer science can be solved by another level of indirection".

  8. Well by Anonymous Coward · · Score: 1, Interesting

    IAASE and if I recall correctly, Donald Knuth said that all the advent of multi-core systems showed was that chip developers had run out of ideas, and from what I can see happening in the industry today, he was right.

    multi-core = multi-kruft

    1. Re:Well by Wesley+Felter · · Score: 1

      So what else should we do? Should we stick with single core and watch our computers never get any faster?

    2. Re:Well by Anonymous Coward · · Score: 0

      Well, I haven't seen multicore chips make the average PC/laptop appreciably faster, so either way the result is the same.

      Multicores in the average PC is just marketing. I mean computers just have to keep getting faster, right? Just like we need the print in books to get smaller so we can fit more words in? Or we need a nano-keyboard? If your computer does 2.2 billion operations a second and then improves to 2.2 trillion, nobody is going to notice. I mean, can YOU tell the difference between a nanosecond and a femtosecond? I don't think so.

      The laptop I am typing this on is 8 years old 2.2GHz celeron, 512MB 266MHz RAM and a 60GB ATA133 HDD, that I use to browse the internet, do 'homework', store photos, watch movies and send email. It runs Mandriva 2008.1 with compiz-fusion and it does so very well despite having an IGP rather than a graphics card. I consider these to be the things that the average user puts their computer to. Now, you tell me how I can benefit appreciably from the latest 3.0GHz, multicore beast? I know the answer is that I cannot because, as I pointed out earlier, you cannot tell the difference between a nanosecond and a femtosecond.

      All the latest computers are good for (apart from improved graphics) is keeping up with the Joneses.

    3. Re:Well by maeka · · Score: 2, Informative

      Multicores in the average PC is just marketing.

      I'm sorry you do not have a need to run more than one CPU intensive process at a time.

      The laptop I am typing this on is 8 years old 2.2GHz celeron

      You are significantly off on your estimate of its age.

    4. Re:Well by Cheesebisquit · · Score: 0

      That's right, beg for his forgiveness.

    5. Re:Well by harry666t · · Score: 1

      > 2.2 billion operations a second

      No, no, no. It's 2.2 billion "processor cycles", not operations. Some CPU architectures will do one operation in many cycles (think pre-pentium x86, z80, m68k, and so on), some do one op per cycle (early RISC designs: MIPS, etc), some do many operations in one cycle (superscalar stuff, instruction parallelism, AFAIK: POWER, SPARC), and some even run an older instruction set on a "hardware virtual machine", so you'll probably never know how many instructions per cycle (or cycles per instruction) they really do at the core (your Celeron works like this). Just pointing this subtle (but crucial) difference out (it is what the whole RISC-vs-CISC-vs-Everything dispute was all about).

      > you cannot tell the difference between a nanosecond and a femtosecond

      But I can tell a difference between an hour and a minute, and when running a CPU-intensive task (say, transcoding media), having more CPU power certainly pays back (not to mention that if you also have a second core, you can keep using the computer for whatever else you want, while it is crunching numbers in the background, or - if you're not doing any work - speed up that process by 100% (if it scales to multiple cores/CPUs)).

      I believe you're just misinformed.

  9. Multicore vs MultiCPU programming ? by karvind · · Score: 1
    IANACS

    Pardon my ignorance, but we have had many supercomputers (a recent one being Roadrunner) which use multiple CPUs (and accelerators). Can't we use the programming 'tricks' or 'models' or 'techniques' used there for efficiently using multicores? I understand multicore has significantly less communication overhead, but surely the overall philosophy of synchronizing, message passing, shared memory etc. wouldn't be completely irrelevant?

    1. Re:Multicore vs MultiCPU programming ? by Macman408 · · Score: 1

      Their "tricks" are to 1. have a hell of a lot of computation to be done, and 2. make sure that the work can be split into millions or billions of completely independent tasks. Then you send a few thousand tasks to each CPU, wait a while, and they all get done. Most interesting problems require some amount of communication or reduction or something that is not perfectly parallel - but there's nothing magical going on. If your computation is largely serial, there's not a whole heck of a lot that can be done when you're given a parallel architecture.

    2. Re:Multicore vs MultiCPU programming ? by pimpimpim · · Score: 1

      The programming trick on a supercomputer is that you have a dedicated PhD (student) working full time on parallelizing your single application. The program that is parallelized performs a lot of computations in a serial kind of way, and is built up out of blocks that can be calculated separately to an extent that allows the speed-up from parallelization to make up for your cost in bandwidth. The program has to be simple enough that the programmer can predict at which time which data is needed at which processor, for a variable number of processors of course. Most of your daily software is too segmented for this. I bet gaming and video decoding have good chances to be parallelized, though.

      --
      molmod.com - computing tips from a molecular modeling
  10. First mainstream multicore? by Wesley+Felter · · Score: 1, Informative

    For servers, POWER4 was released in 2001. For desktops, the Pentium D came out in mid 2005 and the PowerPC 970MP a few months later. All of these came out before Niagara.

    1. Re:First mainstream multicore? by TheRaven64 · · Score: 3, Informative

      Read more carefully. He created the Stanford Hydra in 1994, and the Niagara is based on this design. They are not claiming the Niagara was first.

      --
      I am TheRaven on Soylent News
    2. Re:First mainstream multicore? by Anne+Thwacks · · Score: 3, Interesting
      Well, the fundamental idea behind it was used in the National Semiconductor COP - a 4 bit processor in the late 1970s.

      Incidentally, I worked with Transputers, and the concept died for many reasons:

      1) The comms channel was a weird, proprietary protocol, and not HDLC - completely fatal

      2) In the event of an error, the entire Transputer network locked up - completely fatal

      3) Mrs Thatcher eventually agreed to fund the project with $50,000,000 the same day that United Technology (can you say 6502, or was it Z80) cancelled a project saying "in the world of Microprocessors $50,000,000 is nothing". - Two fatal errors here (a) expecting the UK government to fund anything reasonably sensible, and (b) Making it clear that the project is insufficiently funded to survive

      4) The project was taken over by the French - whose previous achievements in both hardware and software are [white space here].

      5) Inmos, who made it, (a) tried to force people to use a new language, at a time when there was a new language every month, (b) took two years to discover that the target market wanted C, and (c) never discovered the appropriate language was Algol68.

      In short, the company was run by a clever but narrow minded geek, who failed to take advice from others in the industry (including other narrow minded geeks, like me, etc).

      --
      Sent from my ASR33 using ASCII
  11. Illogical, Donald Knuth is smarter than that. by lenski · · Score: 2, Insightful

    Silly. I cannot believe Donald Knuth would be that dense, there must be more to the conversation.

    Every major system in existence today is already a "multiprocessor" system, we just don't think of them that way. The average PC is a parallel system running at least 14 CPUs in parallel. (two or three for every spindle, one or two for keyboard, a few for your broadband modem, a few in your firewall, etc etc etc).

    Multicore systems are simply an extension of the existing computational model. Plus, every supercomputer built in the last 20 years has been massively parallel.

    Out of ideas? I don't think so.

    1. Re:Illogical, Donald Knuth is smarter than that. by cnettel · · Score: 2, Informative
      Those processors have all been part of maintaining the illusion of the von Neumann machine, and to maintain common interfaces between different pieces of hardware. What's happening now is that the model of a single instruction feed is breaking down completely, no matter what task you want to do, if you want it done efficiently. And that's for the very reason that the very smart people designing chips have run out of ideas on how to make them faster while maintaining that very convenient illusion for the smart people writing software for those chips.

      No one wants to write threaded code for computations, if it can be done serially. (Parallelism can be quite convenient for processing of requests, like a server, but even then most designs unless done with great care will contain synchronization or non-locality in one way or another.)

    2. Re:Illogical, Donald Knuth is smarter than that. by lenski · · Score: 2, Interesting

      I'll accept the argument that the single-threaded model is (temporarily) being preserved in current systems. That said, I believe that there is a natural progression toward multithreaded computing as the technologies become more pervasive.

      What do you think of such things as SQL and spreadsheets already starting down a road of declarative style of "programming", which would implicitly allow the engines to make their own decisions about how to run multi-threaded?

      I had good experience with a quad-phenom running a classical web application recently: The system used all 4 cores very effectively without our needing to make a single adjustment to our extremely simple application.

      To me, it appears that we (the developers, theoreticians as well as practical implementers) are already naturally moving to use the resources that the hardware developers are providing. However, I really don't see multi-core systems as "cruft"y as your first comment claims.

    3. Re:Illogical, Donald Knuth is smarter than that. by BrainInAJar · · Score: 1

      Intel & Sun's compilers have openmp -xautopar flags, such that when you build your code, the compiler is smart enough to find obvious places to parallelise your code (loops, what have you)

      But honestly, I think it'll take about 8 or 9 years for the real potential of multicore to start to pay dividends. First universities will need to start teaching OpenMP ( or whatever ), then the kids graduate, and start using it @ work

    4. Re:Illogical, Donald Knuth is smarter than that. by Anonymous Coward · · Score: 1, Interesting

      No. Sheesh. Supercomputers almost all use a message passing model in practice. Messages between concurrent processes, not threads and mutexes. They may have shared memory, but that means message passing is zero-copy, not that they're programmed with anything other than MPI.

      The grownups have been playing with concurrent systems for decades now.
      There's a lot of wheel-reinvention going on as the kiddie home computers get multiple processors. Eventually they'll realise what we've known for years - the ONE concurrent model that humans program well is message passing between concurrent processes.

    5. Re:Illogical, Donald Knuth is smarter than that. by Anonymous Coward · · Score: 0

      Don't forget the hard drive. The controller on there is itself an asymmetric multi core ARM processor. :-)

    6. Re:Illogical, Donald Knuth is smarter than that. by xigxag · · Score: 1

      Those may be processing units, but they're hardly CENTRAL Processing units.

      --
      There are two kinds of people: 1) those who start arrays with one and 1) those who start them with zero.
    7. Re:Illogical, Donald Knuth is smarter than that. by johanatan · · Score: 1

      Please: Haskell, LISP, O'Caml or any of a number of other 'real' functional languages deserve your attention far before SQL or Excel.

    8. Re:Illogical, Donald Knuth is smarter than that. by johanatan · · Score: 1

      Yea, but client-side GUI programmers understand that too. Message queues are great for this. WIN32 is not the best system by any stretch of the imagination, but its message passing is a fairly simple model for concurrent programming.

    9. Re:Illogical, Donald Knuth is smarter than that. by CBravo · · Score: 1

      Maybe not called central. Maybe not multi-purpose. But try pulling out your graphical processor.

      I agree that we haven't really been programming our multi processor environments like we should (generally with libraries).

      --
      nosig today
    10. Re:Illogical, Donald Knuth is smarter than that. by lenski · · Score: 1

      OK by me! :-)

      I just brought out the most commonly known technologies that can be converted to high-count multi-core systems quickly and relatively efficiently.

      I didn't claim that SQL, spreadsheets, etc are THE solutions, only that multi-core systems are not wasteful cruft even given the ordinary techniques we have available today.

      Here's the bottom line: we agree that somewhere, sometime, a new language or language family is very likely to replace the current procedural (or half-object, half-procedural) language families we use now. But in the meantime, multicore systems will be more useful more quickly than is implied by the idea that hardware designers are out of ideas. Our world is not stagnating, and is (in my opinion) not under threat of stagnation.

    11. Re:Illogical, Donald Knuth is smarter than that. by johanatan · · Score: 1

      I agree completely. :-) There is no crisis, but the sooner that people realize that Haskell can help them in this area (and allow writing many fewer [and more elegant] lines of code), then the sooner I will get to start using it in production. [A current search of dice.com for 'Haskell' returns only 10 results--and the other functional langs, not so much more].

  12. Multiprocessor Programming by yumyum · · Score: 2, Informative

    I just finished taking a course at MIT on multiprocessor programming. It was taught by the authors of The Art of Multiprocessor Programming, Maurice Herlihy and Nir Shavit. I highly recommend their book, their classes, their expertise. They are now focused on transactional memory, which may make things a bit easier to program in the multiprocessor universe. Of course we can stick with coarse-grained locking, but as they pointed out early on, Amdahl's Law shows that throwing hardware at a problem may not be successful in upping performance by the amount you expect if the system's scheduler has no hope of keeping the cores busy due to how you've written your code.

    1. Re:Multiprocessor Programming by explorer107 · · Score: 1

      Gustafson's law says that, while throwing hardware at a problem, you should also increase the problem size of the application so that it scales with the hardware. There are many discussions supporting both Amdahl's and Gustafson's laws. But, if an application can scale without increasing data access contention, it can benefit from multi-core processors. Not all applications can scale their problem size, but there are many that can.
      Check out http://www.multicoreinfo.com/ for a lot of multicore related news and programming resources.
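
      For anyone who wants to see the two laws side by side, here is a small sketch using the textbook formulas. The function names are just for illustration, and note that the "parallel fraction" is measured differently in each law (against the serial run for Amdahl, against the parallel run for Gustafson).

```python
def amdahl_speedup(parallel_fraction, cores):
    """Fixed problem size: the serial part limits the achievable speedup."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gustafson_speedup(parallel_fraction, cores):
    """Problem size grows with the core count: scaled speedup."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * cores


if __name__ == "__main__":
    for cores in (2, 8, 64):
        print(cores,
              round(amdahl_speedup(0.9, cores), 2),
              round(gustafson_speedup(0.9, cores), 2))
```

      With 90% of the work parallel, Amdahl's formula can never exceed a 10x speedup however many cores are added, while the Gustafson figure keeps growing because the workload is assumed to grow with the machine.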

  13. How will software design change? by Kaenneth · · Score: 1

    Linked lists? In order to get to an item, you have to traverse the list up to that point. Maybe you could have one thread traverse the list, and dispatch each item to a new thread for processing... but what about counting how many items on the linked list satisfy a condition? Maybe round robin assign them to the worker threads, then add up the subtotals...
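
    A minimal sketch of that round-robin idea, assuming a hand-rolled singly linked list (the `Node`, `build_list` and `count_matching` names are invented for the example). It shows the structure only: in CPython the threads will not actually speed up a cheap test like this one.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """One element of a singly linked list."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_list(values):
    """Build a linked list holding `values` in order; returns the head (or None)."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def count_matching(head, predicate, workers=4):
    """Count the items satisfying `predicate` using per-worker subtotals."""
    # A single traversal deals the items round-robin into one batch per worker.
    batches = [[] for _ in range(workers)]
    node, i = head, 0
    while node is not None:
        batches[i % workers].append(node.value)
        node = node.next
        i += 1

    def subtotal(batch):
        return sum(1 for v in batch if predicate(v))

    # Each worker counts its own batch; the subtotals are added at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(subtotal, batches))


if __name__ == "__main__":
    numbers = build_list(range(100))
    print(count_matching(numbers, lambda v: v % 3 == 0))
```

    Note that the traversal itself stays serial, which is exactly the bottleneck being described: the workers only help if the per-item test is expensive compared with walking the list.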

    Seems to me that simple linked list structures may be something to avoid in favor of trees, where you could just send references to branches to threads. Byte streams might have the same issue, for example in decoding variable length characters, you wouldn't want to start in the middle of a character (unicode encoding issues), fixed-width data in an array of a specific length (instead of depending on a terminator) could be easier to break up tasks (such as spell check, send each page to a different thread)

    How about pre-emptive multitasking?, if you have hundreds of cores, how often will you need to put a thread on hold, run another thread, then switch back... if you never have to save/restore thread state except when doing unusual tasks like debugging, you could optimize in favor of other operations. You could design threads to stream data like an assembly line, thread A watches for the end/other exceptions, thread B validates the range of values, thread C breaks the data into groups, and passes them along to a collection of other threads. If the chips were optimized to be able to stream data between cores without reaching out to main ram the stream buffers could be rather small, if it's known that the threads involved will never be preempted and each of the tasks has a reasonable upper bound on time. Each core would be looping over a tiny number of instructions, so a large instruction cache wouldn't be needed (per core, but yes for the whole chip package), and since the data is mostly rolling in from one end, and rolling out the other, only the initial input and final output leave the chip.
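
    The assembly-line arrangement could look something like the sketch below. This is software threads and small bounded queues standing in for cores and on-chip stream buffers, so it is an analogy rather than how such a chip would really be programmed; `run_pipeline`, `stage` and the `DONE` marker are made-up names, and a stage drops an item by returning None.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down the line


def stage(work, inbox, outbox):
    """Run one pipeline stage: apply `work` to each item, forward the result."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        result = work(item)
        if result is not None:  # None means "drop this item"
            outbox.put(result)


def run_pipeline(items, stages, buffer_size=4):
    """Stream `items` through `stages`, one thread per stage, small buffers between."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(DONE)

    threads = [threading.Thread(target=feed)]
    threads += [threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
                for i, work in enumerate(stages)]
    for t in threads:
        t.start()

    # The caller drains the last queue, so no stage blocks forever on a full buffer.
    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    validate = lambda x: x if 0 <= x < 100 else None  # range check stage
    double = lambda x: x * 2                          # processing stage
    print(run_pipeline([5, 200, 7, -1, 9], [validate, double]))
```

    Because each stage is a single thread reading from a FIFO queue, the items come out in the order they went in, and the buffers never need to hold more than `buffer_size` items at a time.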

    1. Re:How will software design change? by jd · · Score: 1

      The software guys in the team have some neat utilities for analyzing and generating parallel code - but it's old and clearly not maintained. Apart from the stuff that has become commercial. Either way, a good workman keeps their toolset in good condition, so the lack of maintenance does bother me.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:How will software design change? by TheRaven64 · · Score: 1

      In general, you don't want to be breaking down your problem at anything like that granularity. Traversing a linked list and testing each element is a tiny bit of code which fits nicely in an instruction cache. Automatic prefetching means that you can do this very efficiently on a single core. Unless the processing component is very large, you would be better off splitting your program at a coarser granularity. No one writes code to traverse linked lists, they write code to solve problems, and these problems can usually be broken down into a lot of overlapping stages.

      --
      I am TheRaven on Soylent News
  14. Horse Pucky..... by FlyingGuy · · Score: 4, Insightful

    We already have servers for INSANELY HUGE internet apps; it's called a mainframe.

    It amazes me to no end how many people still think it's about the CPU. It's about throughput, ok? Can we just get that fucking settled already? I don't give a rat's ass how many damn cores you have running or if they are running at 100 gigahertz; if you are still reading data across a bus, over an ethernet connection, ANYTHING that does not work at CPU speed, then it makes little difference: that damn CPU will be sitting there spinning, waiting for the data to come popping through so it can do something!

    Mainframes use 386 chips for I/O controllers and even those sit there and loaf, talk about a waste of electricity! About .01% of the world's computers need the kind of power that a CPU with more than, say, 4 cores provides. Those that do are rather busy doing insanely complex mathematics, but even then I doubt that the CPU(s), even when running at "100%" utilization, are actually doing the work that they were programmed to do; rather, they are waiting for I/O to a database or RAM and fetching data.

    Until someone figures out how to move data in a far, far more efficient manner than we currently understand, these mega-core CPUs, while nice to think about, are simply a waste of time and silicon, with the possible exception of research.

    --
    Hey KID! Yeah you, get the fuck off my lawn!
    1. Re:Horse Pucky..... by Anonymous Coward · · Score: 1, Informative

      Mainframes use 386 chips for I/O controllers and even those sit there and loaf, talk about a waste of electricity! About .01% of the world's computers need the kind of power that a CPU with more than, say, 4 cores provides. Those that do are rather busy doing insanely complex mathematics, but even then I doubt that the CPU(s), even when running at "100%" utilization, are actually doing the work that they were programmed to do; rather, they are waiting for I/O to a database or RAM and fetching data.

      IBM mainframes use PPC 440 processors in their channel cards. You are wrong. PPC 440 is not fast enough. Look at their pathetic Ficon IOs/sec numbers vs. an Emulex FCP adapter.

    2. Re:Horse Pucky..... by Kristoph · · Score: 1, Funny

      About .01% of the world's computers need the kind of power that a CPU with more than, say, 4 cores provides.

      Yes but now that we can't buy XP any more, the penetration of Vista is sure to grow.

    3. Re:Horse Pucky..... by dodobh · · Score: 1

      The problem is that the mainframe is still a huge, single point of failure. What we need is the ability to toss a few dozen _CHEAP_ systems at the problem, and figure out how to make it work.

      Failures happen. Code around them.

      There is only so high you can scale a mainframe up to. Then you need to start scaling out. Scalability isn't a few thousand users pounding your systems. It's about a few million users pounding your systems, with increases of one order of magnitude being common.

      --
      I can throw myself at the ground, and miss.
    4. Re:Horse Pucky..... by Anonymous Coward · · Score: 1, Insightful

      Precisely. This is why labs such as RAMP and PARLAB (both from Berkeley - take that, Stanford) have designed not just multicore systems, but 'manycore' systems possessing in excess of 1000 CPUs (the chips are actually FPGAs if I'm not mistaken). The chips run pretty slowly -- some of them around 100MHz, but the operation of virtually any part of the chip can be observed and tweaked at a very low level. The idea is not to design a faster-clocked or more parallel CPU so much as it is to discover the best architecture for parallel multiprocessing; i.e. the architecture with the best throughput.

    5. Re:Horse Pucky..... by FlyingGuy · · Score: 1

      I stand corrected, but I think my original point still holds.

      While even a more efficient channel controller will open up those choke points to a degree, the main processing unit still sits there doing nothing, for a very large percentage of the time waiting on Data, User Input, all those things that happen rather S L O W L Y.

      --
      Hey KID! Yeah you, get the fuck off my lawn!
  15. Great Researcher, but... by Anonymous Coward · · Score: 0

    I took a class from him at Stanford University. He was the worst lecturer I ever had. He clearly cared a lot more about his research than us silly undergraduates. Is this a problem with all well-known professors? And if our teachers don't put any effort into teaching, how are we, the students of the next generation, supposed to contribute to the field?

    1. Re:Great Researcher, but... by Anonymous Coward · · Score: 2, Insightful

      You misperceive the role of a Stanford computer science professor. They're not there to educate you; they're there to create their startups in a risk-free environment with cheap talent. Teaching is just the price of admission.

  16. right on by KingBenny · · Score: 1

    totally and undeniably allrighty then ... we don't need something new, we just need MORE of the same old stuff ... all hail von neumann !

    --
    Free speech was meant to be free for all... how can anyone grow up in a nanny state ?