Four Core Processor to Bring Tera Ops
panhandler writes "As reported at CNet and the Austin American Statesman, researchers at UT are working with IBM on a new CPU architecture called TRIPS (Tera-op Reliable Intelligently adaptive Processing System). According to IBM, 'at the heart of the TRIPS architecture is a new concept called "block-oriented execution"' which will result in a processor capable of executing more than 1 trillion operations per second."
What if they could fabricate each core separately and then somehow connect the CPUs? Shouldn't be too hard to do in the factory. It wouldn't be as fast as a single-core CPU's internal bus, but it would be a heck of a lot better than multiple CPUs in standard mobos now (like Xeons, etc.).
Hmmm... Pie...
Exactly. The IA64/Itanic/Itanium instruction set provides for executing multiple instructions "simultaneously" (aka: pipelined with no interference), but the Intel guy I heard from said that so far it doesn't provide anything close to the improvements they hoped the feature might. Scaling it up to 64 instructions per clock is only going to help tasks which IBM supercomputers have already lost to Beowulf clusters.
Imagine how high the failure rate would be when fabricating a CPU with four cores... I don't see how it would be practical unless it was an extremely high-yield design such as the StrongARM.
Naw, that doesn't seem like too big a problem. All they have to do is check to see how many cores are working, and then sell the chips like that. Something like this (assuming you pay a premium for more cores, relative to the lower yields):
$500 for 1 core
$1200 for 2 cores
$1800 for 3 cores
$2500 for 4 cores
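To put rough numbers on the binning idea above, here is a minimal Python sketch. It assumes each core on a die works or fails independently with the same probability, which real fabs won't match exactly (defects cluster). The 70% per-core yield in the demo is a made-up figure; the prices are the ones from the list above, and the function names are just for illustration.

```python
from math import comb


def core_bin_probabilities(per_core_yield, cores=4):
    """Return P(exactly k working cores) for k = 0..cores.

    Assumes every core fails independently with the same probability,
    so the number of working cores is binomially distributed.
    """
    p = per_core_yield
    return [comb(cores, k) * p**k * (1 - p) ** (cores - k)
            for k in range(cores + 1)]


def expected_revenue(per_core_yield, prices, cores=4):
    """Expected sale price per die when chips are sold by working-core count.

    `prices` maps a working-core count to a price; dies with a count that
    is missing from `prices` (e.g. 0 working cores) are treated as scrap.
    """
    probs = core_bin_probabilities(per_core_yield, cores)
    return sum(pr * prices.get(k, 0) for k, pr in enumerate(probs))


if __name__ == "__main__":
    # Prices from the comment above; the 0.7 per-core yield is hypothetical.
    prices = {1: 500, 2: 1200, 3: 1800, 4: 2500}
    for k, pr in enumerate(core_bin_probabilities(0.7)):
        print(f"{k} working cores: {pr:.1%} of dies")
    print(f"expected revenue per die: ${expected_revenue(0.7, prices):.2f}")
```

The point of the sketch is only that a die with one dead core is not a total loss under this scheme: the fully working dies are a minority at modest per-core yields, but most of the rest still land in a sellable bin.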
Wasn't the PS3 "Cell" chip made by IBM and Sony supposed to deliver 1 teraflop too?
That's throughput they're working on, which is great, but not the problem. Latency is the problem, not throughput. Try having large programs with lots of branches and/or syscalls: If the code is large enough, you'll spend more time bringing pages in from memory than actually executing your code, especially since you can forget about pipelining benefits...
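A back-of-the-envelope way to see why latency dominates is sketched below in Python. It is a deliberately crude model: every miss stalls the processor for the full penalty and nothing is overlapped or prefetched. The miss rate and the 100 ns penalty in the demo are hypothetical figures, not measurements of any real chip.

```python
def effective_time_per_op(peak_ops_per_sec, miss_rate, miss_penalty_sec):
    """Average seconds per operation when a fraction of ops stall on memory.

    Crude model: each op costs 1/peak seconds of compute, and a fraction
    `miss_rate` of ops additionally wait `miss_penalty_sec` with no overlap.
    """
    return 1.0 / peak_ops_per_sec + miss_rate * miss_penalty_sec


def effective_ops_per_sec(peak_ops_per_sec, miss_rate, miss_penalty_sec):
    """Sustained operations per second under the same crude stall model."""
    return 1.0 / effective_time_per_op(peak_ops_per_sec, miss_rate,
                                       miss_penalty_sec)


if __name__ == "__main__":
    peak = 1e12       # the advertised 1 trillion ops per second
    penalty = 100e-9  # hypothetical 100 ns trip to main memory
    for miss_rate in (0.0, 0.001, 0.01):
        rate = effective_ops_per_sec(peak, miss_rate, penalty)
        print(f"miss rate {miss_rate:.3f}: {rate:.3e} ops/s sustained")
```

Under these assumptions the compute term is a picosecond and the stall term is measured in nanoseconds, so even a small miss rate swamps the peak figure.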
Personally, I wish a company would throw out every idea from current memory, put a GB of cache on a chip, and get memory access times down to about 3 picoseconds. But memory doesn't have the marketing appeal that processors do, so we're screwed.
--That's the point of being root, you can do anything you want, even if it's stupid.
The problem is that the larger the cache, the slower the access time. It's a trade-off.
Actually, I have substantially optimised my Linux startup times to get it down to that. I've removed a load of non-essential services (I'm not running a mail server or web server at all now, I only really have stuff that runs from inetd and mysql running other than the absolute essentials) and moved the X startup so that it happens before a lot of other stuff has loaded.
OK, I'll admit that I haven't parallelised it beyond this, but I wouldn't expect to see a huge amount of improvement from that. Besides, most unix daemons fork and terminate the parent process before doing very much, in much the same way that most WinNT services just call StartServiceCtrlDispatcher (or whatever it's called) first thing as they get into their WinMain()... there's not a lot to gain by parallelising that.
Besides, Win2 boots some services in parallel, while in Linux we still boot all of them sequentially, waiting for the [OK] string before starting the next one. The only way to parallelize the sequence is to track dependencies between services. In Gentoo there are some efforts to do a parallel boot.
How are they doing it?
I've often thought that we should be booting up our computers with a parallel invocation of "make". Then when adding a new service you would have none of this "what number between 0 and 100 should I assign?" foolishness: just write a three line makefile that includes all the dependencies that your service has on others.
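For what it's worth, the "parallel make" idea boils down to a topological sort where everything whose dependencies are finished gets started at once. Here is a minimal Python sketch of that, using the standard library's graphlib (Python 3.9+) and a thread pool. This is not how Gentoo or any real init system does it; the service names and the `start` callback are made up for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_all(deps, start, max_workers=4):
    """Start each service as soon as all of its dependencies have started.

    `deps` maps a service name to the set of services it depends on.
    `start` is called with a service name and returns once that service
    is up. Returns the names in the order they finished starting.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}  # future -> service name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Launch everything whose dependencies are already satisfied.
            for name in sorter.get_ready():
                running[pool.submit(start, name)] = name
            # Block until at least one running service has come up.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise if the service failed to start
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)  # may unblock services waiting on it
    return finished


if __name__ == "__main__":
    # Hypothetical dependency table, the moral equivalent of the makefile.
    deps = {
        "network": set(),
        "syslog": set(),
        "sshd": {"network", "syslog"},
        "xdm": {"syslog"},
    }
    print(start_all(deps, start=lambda name: print(f"starting {name}")))
```

Adding a service then means adding one entry to the table with its dependencies, with no "what number between 0 and 100" guessing.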
Or maybe IBM's MRAM will do this for us.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"