Sun Unveils Direct Chip-to-Chip Interconnect
mfago writes "On Tuesday September 23, Sun researchers R. Drost, R. Hopkins and I. Sutherland will present the paper "Proximity Communication" at the CICC conference in San Jose. According to an article published in the NYTimes, this breakthrough may eventually allow chips arranged in a checkerboard pattern to communicate directly with each other at over a terabit per second using arrays of capacitively coupled transmitters and receivers located on the chip edges. Perhaps the beginning of a solution to the lag between memory and interconnect speed versus CPU frequency?"
therefore the speed increase will be unnoticeable.
I wonder if this release might have been pressed forward a bit to squelch some of the talk about Sun losing their will to innovate after Bill Joy left.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Does "terrabit" mean that it will be made of pieces of the earth?
Via Google
This could prove very interesting, as the speed usually drops when "leaving the chip" to do communications. There has been a lot of research to develop protocols to ease on-chip communication when several ICs are combined on a single chip. If Sun's technology can stand the test, NoC/SoC products could reduce their time-to-market dramatically... smaller and faster devices for everyone!
BTW: I didn't RTFA since it requires (free) reg.
Or maybe Rambus is already fixing to sue them.
-Libertarian secular transhumanist
That is the nature of the beast.
Remember how excited you were to get your hands
on a 386 machine?
The thrill of your first encounter with a 286 screamer?
Upgrading to 16k from 4k on your TRS-80?
Your first disk drive for your Apple 2?
It's all relative.
So enjoy
Whatever, I think this will end up being the SUV of chip-to-chip connections. ;)
New Sun Microsystems Chip May Unseat the Circuit Board
This might be the obvious question, but why hasn't anyone done this before?
It seems obvious: the end of the chip has pins. The chip it will eventually connect to has pins. Instead of having 20 trace lines to the next chip, why not redesign them so the out/inputs of both line up to reduce the complexity of the design?
Anyone wanna fill in my mental gap for me?
of these, well that's kind of the point actually :-)
...in bed
I wonder if this hardware computing model could provide the first real base for Neural Network computing? As far as I know, any neural network is currently emulated on linear processing machines.
Someone gets it. As an Electrical Engineer-in-training, I was always frustrated with people who got these big bad processors and wondered why their improvement was minimal.
They never quite grasped that the biggest bottleneck is between the processor and memory.
My EE instructor always said that they could improve performance by doing one simple thing: make the interconnects on the motherboard between the processor and RAM rounded instead of cornered. You could then increase bus speed as you wouldn't have magnetic loss at the corners like you do now.
You fix that, and you can see a SUBSTANTIAL improvement in performance. The only thing that can be done beyond that is to get a Platypus drive (Solid state "Hard Drive" made from Quikdata made from DDR RAM). Then you reduce your access time to your hard drive from milliseconds to nano/microseconds.
Sounds a lot like the ol' Transputer (was from INMOS), of course faster. One could also think of AMD's HyperTransport. So, again, except maybe for the speed, I don't see much innovation here.
If only people could remember that "terra" has something to do with earth, "tera" is the unit...
Placing large numbers of chips adjacent to one another has obvious problems with heat and power, in particular when running at those speeds. That, rather than interconnect technology, is probably the main reason we still package up chips in large packages.
This might be useful for placing a small number of chips close together, in particular chips that may require different manufacturing processes.
The article is a bit vague as to what the innovation really is.
The article immediately made me think of multi-chip modules. Multi-chip modules are an idea that never really caught on in the industry (except at IBM), and I'm not sure how Sun's innovation isn't just a take-off on that idea. Multi-chip modules have failed due to cost, since so much has to go right to get a multi-chip module that works.
Any practical chip-to-chip connectivity scheme had better have a good rework scheme. If it doesn't, it's just boutique technology that will not affect the industry overall.
Having worked on chips with multi-gigabit pins, I can say a huge problem is resynchronizing the signals. Creating a receiver to align one pin's data with 15 neighbors at 3GHz takes a whole lot more logic space on the die than a small driver (or receiver). The auxiliary logic basically makes shrinking the final driver FET almost meaningless.
Modern chip design is a constant trade-off between features and cost. And what's cheap is what everyone has been doing for years (or is an evolution of that).
By comparison, an Intel Pentium 4 processor, the fastest desktop chip, can transmit about 50 billion bits a second. But when the technology is used in complete products, the researchers say, they expect to reach speeds in excess of a trillion bits a second, which would be about 100 times the limits of today's technology.
If a P4 is already doing 50 Gbps (as they say), and this uber-technology will allow 1Tbps (which is 20x a P4's 50Gbps), then how is that "100x the limits of today's technology" ?
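A quick sanity check of that arithmetic, as a minimal sketch. The two figures are the ones quoted from the article; the variable names are only for illustration.

```python
# Figures as quoted from the article, in bits per second.
p4_bps = 50e9          # "about 50 billion bits a second"
proximity_bps = 1e12   # "in excess of a trillion bits a second"

# Speedup relative to the quoted Pentium 4 figure.
ratio = proximity_bps / p4_bps
print(f"speedup over the quoted P4 figure: {ratio:.0f}x")

# Baseline that a "100 times" claim would imply.
implied_baseline_bps = proximity_bps / 100
print(f"baseline implied by a 100x claim: {implied_baseline_bps / 1e9:.0f} Gbps")
```

So the 100x figure only adds up if "today's technology" refers to something around 10 Gbps rather than the 50 Gbps quoted for the P4. The article does not say which baseline it had in mind.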
<shakes head>
Normally I don't pimp Sun, but here's something that makes me think they still have a finger on the pulse of things:
;-)
Read about plans for Sun's "Niagara" core
I understand they hope to create blade systems using high densities of these multiscalar cores for incredible throughput.
There's your parallel/grid computing.
the Transputer. It had 4 available hardware connections and the description of the way the different processors communicate is very similar to what is described by the article.
Of course, to take maximum advantage of this communication speed in general parallel applications, main memory access would have to be improved. I'd guess these things will have huge on-chip caches.
IANAEE either, but this made a little more sense to me after I read this InfoWorld article, which talks about two other aspects of Sun's DARPA-funded project: clockless "asynchronous logic", and building processors with interchangeable and upgradable modules. They absolutely need these busless "proximity" interconnects for the processor modules to communicate at close to on-chip speeds, and the clockless architecture lets them get rid of the bus. Or vice versa... or something like that.
Working prototype computer about six years away, according to the article.
"This is not a sig." -- R.
As usual with a lot of Computer Science, this appears to be just an old idea reinvented... the Transputer... and about time too!
If you look at a modern evaluation board with gigabit SERDES or SERialization-DESerialization (e.g. the 3.125Gbit/s differential signal pair per channel), the trace routes are typically rounded, with no square corners. This is done to reduce the effective impedance along the line, which needs to be carefully controlled. They also run in parallel closely-routed pairs because it's typically a differential signal. It actually looks a bit like a set of miniature train tracks without the railroad ties.
In fact, multichannel SERDES is the next real interconnect technology. It's used in Infiniband, HyperTransport, PCI Express, Rambus RDRAM and in 10 Gb/s Ethernet (usually as 4x3.125Gbit/s channels as a XAUI interface between optical module and switch fabric silicon with 8b/10b conversion). There are even variants, such as LSI Logic's HyperPHY, that are deployed specifically for numerous high-bandwidth chip-to-chip interconnections. The problem that is cropping up is that the traditional laminate PCBs are becoming the limiting factor in increasing per-channel connectivity, to the extent that 10Gbit/s per channel speeds are next to impossible on these boards due to the lack of signal integrity. There has been some experimentation for very short hops on regular boards, as well as using PTFE resins to manufacture the boards themselves, but it's precarious at best.
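The XAUI numbers above can be checked with a little arithmetic. This is a minimal sketch using only the figures in the comment (four lanes, 3.125 Gbit/s per lane, 8b/10b coding); the variable names are made up for illustration.

```python
lanes = 4                   # XAUI uses four serial lanes
line_rate_gbps = 3.125      # signalling rate per lane, in Gbit/s
coding_efficiency = 8 / 10  # 8b/10b sends 10 line bits for every 8 data bits

# Aggregate rate on the wire, before removing coding overhead.
raw_gbps = lanes * line_rate_gbps

# Usable data rate once the 8b/10b overhead is taken out.
payload_gbps = raw_gbps * coding_efficiency

print(f"raw line rate: {raw_gbps:.1f} Gbit/s")
print(f"payload rate:  {payload_gbps:.1f} Gbit/s")
```

That is where the odd-looking 3.125 figure comes from: the extra 25% of line rate pays for the 8b/10b encoding, leaving the 10 Gbit/s of Ethernet payload.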
As for Sun's technology, it's interesting but I don't know how much it will catch on or how feasible it will be. It creates packaging issues and requires good thermal modelling and 3-D field modelling to account for expansion and contraction through the operating temperature range and the presence of nearby signals, which could affect the integrity of the signals.
Great! I am so happy to see that some real programmers exist who see the truth. I have seen our Sun E3500 with 8 CPUs feel like a Pentium Pro with Java shit running on it. But it was management's vision... what can we do? I just procured the servers and pretended that I was doing social work by giving Sun more money.
- People who believe other people have no right to live, got no right to live ...
You need more training. Or less ego.
Look at a recent P4 motherboard for 45 degree traces. Look at any previous motherboard with RAMBUS (even a Nintendo 64 from November 1999) for curved traces.
It's not so much a question of knowing about something as it is of implementing it. If it isn't affordable, it isn't worth it. Because if it isn't affordable, you might be able to buy two affordable ones for the same price. And you're going to have trouble beating the performance of two systems with one.
Finally, to make a hard drive from RAM is to totally lose track of the idea of what a hard drive is. Hard drives are supposed to be slower, but they make up for it with lower cost per megabyte. Instead of a RAM drive, just put more RAM in your machine; it will use it as a disk cache/backing store and get you all the performance you want.
Also, at 120us per command register access, you really cannot initiate any transfer over ATA in under 0.75ms.
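To see how the two figures relate, here is a minimal sketch that only divides one by the other. Both numbers come from the comment; reading the result as "roughly six register accesses per command" is an assumption, not something the comment states.

```python
access_time_us = 120   # time per command register access, from the comment
floor_ms = 0.75        # quoted minimum time to initiate a transfer

# How many register accesses fit in the quoted floor.
accesses = floor_ms * 1000 / access_time_us
print(f"{accesses:.2f} register accesses fit in {floor_ms} ms")
```

In other words, the 0.75ms floor is consistent with needing about six such accesses before a transfer can start, which already eats up most of the latency advantage of RAM behind an ATA interface.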
...in particular chips that may require different manufacturing processes.
Or at least portions of more complex circuits where part of the circuit may not warrant the added cost of SOI, 90nm, or strained silicon.
But then, those divisions are already made. AMD, for one, is working on recombining those parts. As an example, consider AMD's putting the memory controller on the CPU die.
I am curious, however, as to whether you could have more than one silicon die in the same ceramic casing. This would let you combine different parts made using different techniques. The CPU and L1/L2 caches could be on a high-cost process, with the L3 cache and memory controller being built with cheaper processes.
Placing more emphasis on cost savings than on performance, you could build the L1 instruction cache with a slower process than the L1 data cache. Or you could leave all of L1 on the same process and split L2 into instruction and data caches, with different processes.
If you wanted to ramp up CPU speed without a major hit to your performance, you could reduce the feature size of the core and L1 caches, and put L2 on a slower, more reliable process. That way, you could ramp up core speeds without having as much worry about yield loss from cache failure.
Helluva way to increase yield, and it gives the designer a LOT of options.
What's this Submit thingy do?
Isn't that called a trace? Or another fancy name would be a lead? I think that there are people with prior art...
No, a trace is a flat wire stuck to (or etched from) a printed circuit board. This invention (process, really, see below) obviates the need for PCBs between (at least some of) the chips. A lead is a wire, not stuck to a PCB, such as the input connections to most oscilloscopes and test equipment.
I don't get it either. You want to make memory access faster and faster, so you put it closer and closer to the cpu. Eventually the bus length reaches 0, as the two chips are physically adjacent. So what?
As with many great inventions, the difficulty is not so much thinking of what needs to be done, but in actually doing it cost-effectively. System designers have been trying to use the idea of optimized interconnect (sometimes called "integration", as in LSI, VLSI, etc.) but it has remained cost-prohibitive in most cases (notable exceptions include the Pentium Pro and some ATI mobility products, but these are more desperation moves than anything, since margins drop on multi-chip "chips", and they had to do it to get the needed result even though the costs were higher than normally tolerable.)
So, sure, light bulbs are obvious, as are cars, space shuttles, computers, etc. The hard part is making them possible technically and economically.
Hope that helps you two understand why the ASIC-design industry is pretty damn excited and anxious to license this technology (if we really can do this as cheaply as they claim).
everything in moderation