Intel Pledges 80 Core Processor in 5 Years
ZonkerWilliam writes "Intel has developed an 80 core processor with claims 'that can perform a trillion floating point operations per second.'" From the article: "CEO Paul Otellini held up a silicon wafer with the prototype chips before several thousand attendees at the Intel Developer Forum here on Tuesday. The chips are capable of exchanging data at a terabyte a second, Otellini said during a keynote speech. The company hopes to have these chips ready for commercial production within a five-year window."
With the heavily threaded nature of BeOS, even demanding apps would really fly on the quad+ core cpus that are preparing to take over the world.
Not that you couldn't do threading right in Windows, OS X, or Linux. But BeOS made it practically mandatory: each window was a new thread, as well as an application-level thread. Plus any others you wanted to create. So to make a crappy application that locks up when it is trying to do something (like update the state of 500+ nodes in a list; ARD3 I'm looking at you) actually took skill and dedication. The default state tended to be applications that wouldn't lockup while they worked, which is really nice.
Slashdot Patriotism: We Support our Dupes!
Moore's law says nothing about speed, that is a common error. IIRC he made a general statement about the number of transistors that could be in a defined area doubling every 2 years and that was later changed to 18 months. It also had to do with cost of transistors I believe.
Scheduling isn't a one size fits all process. What works at 4 cores doesn't work at 40 and so on. As for other operating systems, FreeBSD has been working quite actively on getting Niagras working well with their sparc64 port. I've been saying it didn't make sense until this announcement. I figured we'd have no more than 8 cores in 5 years. We'll see what really happens.
The BSD projects, Apple and Microsoft have five years. Microsoft announced awhile back they want to work on supercomputing versions of windows. Perhaps they will have something by then. Apple and Intel are bed partners now. I'm sure intel will help them.
What this announcement really means is that computer programmers must learn how to break up problems more effectively to take advantage of threading. Computer science programs need to start teaching this shit. A quick you can do it, go get a master's degree to learn more isn't going to cut it anymore. There's no going back now.
MidnightBSD: The BSD for Everyone
The big question is how these processors interconnect. Cached shared memory probably won't scale up that high. An SGI study years ago indicated that 20 CPUs was roughly the upper limit before the cache synchronization load became the bottleneck. That number changes somewhat with the hardware technology, but a workable 80-way shared-memory machine seems unlikely.
There are many alternatives to shared memory, and most of them, historically, are duds. The usual idea is to provide some kind of memory copy function between processors. The IBM Cell is the latest incarnation of this idea, but it has a long and disappointing history, going back to the nCube, the BBN Butterfly, and even the ILLIAC IV from the 1960s. Most of these, including the Cell, suffered from not having enough memory per processor.
Historically, shared-memory multiprocessors work, and loosely coupled network based clusters work. But nothing in between has ever been notably successful.
One big problem has typically been that the protection hardware in non-shared-memory multiprocessors hasn't been well worked out. The Infiniband people are starting to think about this. They have a system for setting up one way queues between machines in such a way that appliations can queue data for another machine without going through the OS, yet while retaining memory protection. That's a good idea. It's not well integrated into the CPU architecture, because it's an add-on as an I/O device. But it's a start.
You need two basic facilities in a non-shared memory multiprocessor - the ability to make a synchronous call (like a subroutine call) to another processor, and the ability to queue bulk data in a one-way fashion. (Yes, you can build one from the other, but there's a major performance hit if you do. You need good support for both.) These are the same facilities one has for interprocess communication in operating systems that support it well. (QNX probably leads in this; Minix 3 could get there. If you have to implement this, look at how QNX does it, and learn why it was so slow in Mach.)
So think more like Cell with 80 SPEs. Great for lots of vector processing.
No folly is more costly than the folly of intolerant idealism. - Winston Churchill