Scaling To a Million Cores and Beyond
mattaw writes "In my blog post I describe a system designed to test a route to the potential future of computing. What do we do when we have computers with 1 million cores? What about a billion? How about 100 billion? None of our current programming models or computer architecture models apply to machines of this complexity (with their corresponding component failure rates and other scaling issues). The current model of coherent memory/identical time/everything can route to everywhere just can't scale to machines of this size. So the scientists at the University of Manchester (including Steve Furber, one of the ARM founders) and the University of Southampton turned to the brain for a new model. Our brains just don't work like any computers we currently make. Our brains have a lot more than 1 million processing elements (more like 100 billion), none of which has any precise idea of time (perhaps a vague ordering of events) or a shared memory; and not everything routes to everything else. But anyone who argues the brain isn't a pretty spiffy processing system ends up looking pretty silly. In effect, modern computing bears as much relation to biological computing as the ordered world of sudoku does to the statistical chaos of quantum mechanics."
That's about 30 fork()s, and there you go.
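The joke works because fork() doubles: if every process keeps running the same loop, n successive fork() calls leave 2**n processes. Below is a small sketch of that arithmetic for the core counts in the submission. The helper name forks_needed is hypothetical, and the code only does the math; it never actually calls os.fork().

```python
def forks_needed(target: int) -> int:
    """Smallest n such that 2**n >= target (for target >= 1).

    Each round of fork() doubles the process count, so n rounds give 2**n.
    (target - 1).bit_length() equals ceil(log2(target)) without float rounding.
    """
    return (target - 1).bit_length()


for target in (1_000_000, 1_000_000_000, 100_000_000_000):
    n = forks_needed(target)
    print(f"{target:>15,} processes -> {n} forks ({2**n:,} processes)")
```

So "about 30" is right for a billion: 2**30 is 1,073,741,824.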
839*929
Simply put, some computational problems work well with parallelization, and some, no matter how you approach them, bring you back to a serial model. A billion-core machine running at 1 GHz could get stomped by a single-core machine running at 1.7 GHz on certain computations. We have yet to find a way, computationally or mathematically, to turn intrinsically serial problems into parallel ones. If we did, it would probably open up a whole new field of mathematics.
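A repeated hash is a common illustration of a computation believed to be inherently serial. Each step needs the previous step's output, so extra cores do not help. Hashing many independent inputs, by contrast, splits across cores trivially. The sketch below uses SHA-256 only as an illustrative choice, not something from the comment, and the function names are hypothetical.

```python
import hashlib


def hash_chain(seed: bytes, steps: int) -> bytes:
    """Apply SHA-256 `steps` times; step k consumes the output of step k-1."""
    digest = seed
    for _ in range(steps):
        # The next input is the previous output, so these steps cannot be
        # handed to different cores to run at the same time.
        digest = hashlib.sha256(digest).digest()
    return digest


def independent_hashes(seeds: list[bytes]) -> list[bytes]:
    """Each hash depends only on its own seed, so these could run in parallel."""
    return [hashlib.sha256(s).digest() for s in seeds]
```

Here, the clock speed of one core decides how fast hash_chain finishes. The length of the seeds list decides how many cores independent_hashes could keep busy.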
#fuckbeta #iamslashdot #dicemustdie
I've left out links to some projects, by request, but everything can be found on their homepage anyway. In any case, it is this combination that is important, NOT any one component alone.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I don't know about this specific project, but Manchester is strongly pro-Open Source. The Manchester Computer Centre developed one of the first Linux distributions (and, at the time, one of the best). The Advanced Processor Technologies group has open-sourced tools for designing asynchronous microelectronics, as well as FPGA design software.
Manchester University is highly regarded for pioneering work (they were working on parallel systems in 1971, and developed the first stored-program computer in 1948) and they have never been ashamed to share what they know and do. (Disclaimer: I studied and worked at UMIST, which was bought by Manchester, and my late father was a senior lecturer/reader of Chemistry at Manchester. I also maintain Freshmeat pages for the BALSA projects at APT.)
The problem posed by the author is something of a straw man: "The trouble is once you go to more than a few thousand cores the shared memory - shared time concept falls to bits."
Even today, the cores in a single multicore chip aren't required to run in lockstep (doing so is actually very difficult). Yes, locally, each core and its private caches run off a synchronous clock, but different cores can run in their own clock domains. So I don't buy the argument about scaling with "shared time".
Secondly, the author states that the "future" of computing should automatically be massively parallel. Clearly they are forgetting about Amdahl's Law (http://en.wikipedia.org/wiki/Amdahl's_law). If your application is 99.9% parallelizable, the MOST speedup you can expect is 1000X, no matter how many cores you add; forget about millions. High sequential performance (à la out-of-order execution, etc.) will not be going away anytime soon, because such cores are best equipped to deal with the serial regions of an application.
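Amdahl's Law gives the speedup on N cores when a fraction p of the work parallelizes as S(N) = 1 / ((1 - p) + p/N). As N grows without bound, this approaches 1 / (1 - p), which is 1000 for p = 0.999. Below is a minimal sketch that evaluates the formula. The function name is hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's Law: speedup on n cores when a fraction p of the work parallelizes."""
    serial = 1.0 - p  # fraction that must still run on a single core
    return 1.0 / (serial + p / n)


for cores in (1_000, 1_000_000, 1_000_000_000):
    print(f"{cores:>13,} cores: {amdahl_speedup(0.999, cores):8.1f}x")
```

With p = 0.999, a thousand cores already give roughly 500x. Going to a million cores only lifts that to just under 1000x, and a billion cores barely moves it further.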
Finally, I was under the impression that they were talking about fitting "millions" of cores onto a single die, until I read to the end of the post and saw that they are connecting multiple boards via multi-gigabit links. Each chip on a board has about 20 cores with private caches or local store, and they talk to cores on other boards through off-chip links. So isn't this just a plain old message-passing computer?! What's the novelty here? Am I missing something?
Computers are consistent and predictable. The human brain is not.
We have billions of human brains cheaply available, so let's use those when we want a human brain. And let's use computers when we want computers.
This 1-million core machine better be running open source software and not proprietary software.
Yeah, especially if their software is licensed on a per-core basis.
Some folks with severely damaged brains seem to make better human computers than people with healthy brains. Rain Man leaps to mind, as do other savants. It seems that when some parts of the brain are impaired, the energy of thought is diverted to narrower functions. Perhaps we need to think about delivering more energy to fewer cores to make machines that do the tasks normal humans are not so good at.
Is it a coincidence that earlier this month IMEC issued a press release about the issues of massively scaling up computational power ("exascaling")?
Press blurb can be found here.
Killer application would be "space weather prediction".
The problems with "coherent memory/identical time/everything can route to everywhere" aren't only seen when you get up to a million cores. I've done plenty of work with MPI and pthreads, and depending on how the code is organized, a significant portion of these methods start showing inefficiencies at just a few hundred cores.
Since there are already plenty of clusters containing thousands upon thousands of individual processors (which don't use coherent memory, etc.), the step up to a million would likely follow the same logical development. There should already be one or two decent CS papers on the topic, since it's basically a problem that's been around since Beowulf clusters were popularized (or even before then).
...it seemed to me that Amdahl's law was still alive and kicking.
This is very similar to the Inmos Transputer, a mid-1980s system. It's the same idea: many processors, no shared memory, message passing over fast serial links. The Transputer suffered from a slow development cycle; by the time it shipped, each new part was behind mainstream CPUs.
This new thing has more potential, though. There's enough memory per CPU to get something done. The Cell's SPEs, with only 256K of local store each, didn't have enough memory to do much on their own. 20 CPUs sharing 1GB gives about 50MB per CPU, which is more promising. Each machine is big enough that the system can be viewed as a cluster, something that's reasonably well understood. Cell SPEs are too tiny for that; they tend to be used as DSPs processing streaming data.
As usual, the problem will be to figure out how to program it. The original article talks about "neurons" too much. That hasn't historically been a really useful concept in computing.
The CM-1 had 65,536 one-bit processors. The CM-5 was in Jurassic Park, and in some phone companies.
The brain isn't like computers at all. The brain is compartmentalized: there are dozens of separate pieces, each with its specialty, and each is wired to other pieces in specific ways. There is no "Total Information Awareness"(tm) bullshit going on (which is what 1 million cores would give you). The problem with TIA is that there is too much crap to wade through: too big a haystack to find the needle you need.
What they found when analyzing Berger-Liaw speech recognition systems against other systems is that the Berger-Liaw system kept temporal (time-based) subtleties, in contrast to other speech recognition systems that were simply digital (with the clock/oscillator sampling in a Nyquist format, destroying or failing to capture temporal information). The Berger-Liaw system can beat the best human listeners (which is why the US Navy got it instead of it becoming an available commercial product). It could act as a 'sonic input device' using only a tiny neural network (20 to 30 nodes) for superhuman input, where the digital systems gave crappy results with 2048 or 4096 nodes. The brain is wired with a lot of 'specialty components' which use a sparse number of components to get the job done. Some of the excess appears to be redundant (although I am not a neuroscientist and could be wrong).
My opinion is that you should not require software to be parallelized from the start. You parallelize it during runtime or at compile time.
This makes sense because parallelization does not add anything in functionality (the outcome should not change). My point is: program functionality and configure/compile parallelization afterwards (possibly by power-users). There could be a unique selling point for open source: parallel performance because you can recompile.
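One way to read "program functionality and configure parallelization afterwards" is to write the work as a pure function and make the worker count a deployment setting. The sketch below assumes that reading. It uses Python's concurrent.futures.ThreadPoolExecutor for simplicity. ProcessPoolExecutor has the same map() interface and suits CPU-bound Python better. The names work and run are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def work(x: int) -> int:
    """The 'functionality': a pure function with no shared state."""
    return x * x + 1


def run(inputs: list[int], workers: int = 1) -> list[int]:
    """Apply work() to every input; `workers` is a tuning knob, not program logic."""
    if workers <= 1:
        return [work(x) for x in inputs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so the outcome
        # matches the serial path exactly.
        return list(pool.map(work, inputs))
```

For example, run(data, workers=1) and run(data, workers=8) return the same list. Only the execution strategy changes, which is the point of the comment above.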
nosig today
From a science perspective I'm pretty sure that either computers are already "sentient" or (IMHO, more likely) we don't really understand what sentience is. At all.
The Internet already has at least a billion cores. The way to use many of them for a parallel computation has been demonstrated by SETI@home, Folding@home and even botnets. They might not be the most efficient implementations when you have full control of the cores, but they show the way to go when the availability of the cores and the communication between them is unreliable, when they have different times and different clocks, and when they might be preempted to do other tasks.
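The volunteer-computing pattern deals with unreliable cores in a simple way: hand out independent work units and reissue any unit whose result never comes back. Below is a small simulation of that loop, loosely in the spirit of SETI@home-style work units. The failure model, the seeded random "volunteers" and the function name are all assumptions for illustration. Real projects add much more, such as result validation.

```python
import random


def run_work_units(units, compute, failure_rate=0.3, seed=0, max_rounds=100):
    """Dispatch work units to simulated unreliable volunteers; reissue lost units.

    Returns (results, rounds) where results maps unit -> compute(unit).
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    pending = list(units)
    results = {}
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        still_pending = []
        for unit in pending:
            if rng.random() < failure_rate:
                # Simulated volunteer went offline or was preempted: no result.
                still_pending.append(unit)
            else:
                results[unit] = compute(unit)
        pending = still_pending  # lost units go back out in the next round
    return results, rounds
```

No coordinator needs a shared clock or shared memory with the workers. It only needs to notice which units are still missing.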
Exactly! New is the new old. A million processors? Pah! Old hat. A lot of interesting research into parallel processing has been done in the past. Read the Connection Machine book; it's a great read.
Feynman was also involved with the machine at one point. There's a great write-up on him and the machine that serves as a quick introduction: '... It was a complicated device; by comparison, the processors themselves were simple. Connecting a separate communication wire between each pair of processors was impractical since a million processors would require 10^12 wires. Instead, we planned to connect the processors in a 20-dimensional hypercube so that each processor would only need to talk to 20 others directly.'
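In a d-dimensional hypercube, each of the 2**d processors has a d-bit address. Its neighbors are the addresses that differ in exactly one bit, and the minimum number of hops between two processors is the number of bits in which their addresses differ. The sketch below works through the 20-dimensional case from the quote. The helper names are hypothetical. The quote's 10^12 is the order-of-magnitude n^2 figure; counting each pair of processors once gives about 5.5 x 10^11 wires.

```python
def hypercube_neighbors(node: int, dims: int) -> list[int]:
    """Direct neighbors of `node`: flip each of the `dims` address bits once."""
    return [node ^ (1 << bit) for bit in range(dims)]


def hops(a: int, b: int) -> int:
    """Minimum hops between two nodes = Hamming distance of their addresses."""
    return bin(a ^ b).count("1")


DIMS = 20
NODES = 1 << DIMS                           # 1,048,576 processors
LINKS_HYPERCUBE = NODES * DIMS // 2         # each link is shared by two nodes
LINKS_ALL_PAIRS = NODES * (NODES - 1) // 2  # one wire per pair of processors
```

The hypercube needs about 10.5 million links instead of over half a trillion. In exchange, any message reaches its destination in at most 20 hops.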
The CM-5 looked awesome as well. And I'll just keep quiet about all the cool Lisp stuff they did on it.
The brain does not do arithmetic, it only does pattern matching. That's what most people don't get and that's the obstacle to understanding and realizing AI.
If you ask how humans can then do math in their brain, the answer is simple: they can't, but a pattern-matching system can be trained to do math by learning all the relevant patterns.
If you further ask how humans can do logical inference in their brain, the answer is again simple: they can't, and that's the reason people believe in illogical things. Their answers are the result of pattern matching, just like Google returning the wrong results.
You don't even need Erlang; you can use a lightweight message-passing library like ZeroMQ that lets you build fast concurrent applications in 20 or so languages. It looks like sockets but implements Actors that connect in various patterns (pub-sub, request-reply, butterfly), and works with Ruby, Python, C, C++, Java, Ada, CLisp, Go, Haskell, Perl, and even Erlang. You can even mix components written in different languages.
You get concurrent apps with no shared state, no shared clock, and components that can come and go at any time, and communicate only by sending each other messages.
In hardware terms it lets you run one thread per core, at full efficiency, with no wait states. In software terms it lets you build at any scale, even to the scale of the human brain, which is basically a message-passing concurrent architecture.
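The actor idea described above can be sketched without ZeroMQ. The example below uses Python's standard queue and threading modules as a stand-in and makes no claims about ZeroMQ's own API. The actor owns private state, shares nothing with the caller and communicates only through its inbox and outbox. The function names and the None shutdown convention are hypothetical.

```python
import queue
import threading


def actor(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A worker that owns its state and talks to the world only via messages."""
    total = 0  # private state: no other thread reads or writes it
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown message: report final state and stop
            outbox.put(("total", total))
            return
        total += msg
        outbox.put(("ack", msg * 2))


def main() -> list:
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=actor, args=(inbox, outbox))
    worker.start()
    for n in (1, 2, 3):
        inbox.put(n)
    inbox.put(None)
    worker.join()  # after join, the actor has put all of its replies
    replies = []
    while not outbox.empty():
        replies.append(outbox.get())
    return replies
```

Swapping the queues for network sockets would let the same actor run on another core or another machine without changing its logic. That is the property the comment is pointing at.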
My blog
On the contrary, I can't wait for sentient machines. At last we will be free of shoddy human programming, vendor lock-in and other such stuff. Just tell your computer to write the specific program you want for a specific purpose, and he will write it for you.
Wealth is the gift that keeps on giving.
No no - you had the golden chance and missed it!
You *license* the baby!
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine