Sun Unveils Direct chip-to-chip Interconnect

Posted by Hemos on Monday September 22, 2003 @01:35AM from the faster-then-a-speeding-bus dept.

mfago writes "On Tuesday September 23, Sun researchers R. Drost, R. Hopkins and I. Sutherland will present the paper "Proximity Communication" at the CICC conference in San Jose. According to an article published in the NYTimes, this breakthrough may eventually allow chips arranged in a checkerboard pattern to communicate directly with each other at over a Terabit per second using arrays of capacitively coupled transmitters and receivers located on the chip edges. Perhaps the beginning of a solution to the lag between memory and interconnect speed versus CPU frequency?"

22 of 185 comments

It will be running java by holzp · 2003-09-22 01:37 · Score: 5, Funny

therefore the speed increase will be unnoticeable.
Timing? by afidel · 2003-09-22 01:38 · Score: 3, Interesting

I wonder if this release might have been pressed forward a bit to squelch some of the talk about Sun losing their will to innovate after Bill Joy left.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
1. Re:Timing? by Usagi_yo · 2003-09-22 01:43 · Score: 5, Insightful
  
  No. What you don't understand or realize is that Bill Joy actually left 2 years ago, when he "retired" from CTO into a distinguished senior engineer role. This latest move by Bill Joy, full retirement, is merely a continuation of that. At least that's how I see it.
No registration by Anonymous Coward · 2003-09-22 01:42 · Score: 5, Informative

Via Google
Replacing Network-on-Chip/System-on-Chip by KarmaPolice · 2003-09-22 01:43 · Score: 4, Interesting

This could prove very interesting, as the speed usually drops when "leaving the chip" to do communications. There has been a lot of research to develop protocols to ease on-chip communication when several ICs are combined on a single chip. If Sun's technology can stand the test, NoC/SoC products could reduce their time-to-market dramatically...smaller and faster devices for everyone!

BTW: I didn't RTFA since it requires (free) reg.
I suppose this will be patented... by Thinkit3 · 2003-09-22 01:43 · Score: 3, Funny

Or maybe Rambus is already fixing to sue them.

--
-Libertarian secular transhumanist
Fast today Slow Tomorrow by Anonymous Coward · 2003-09-22 01:44 · Score: 5, Interesting

That is the nature of the beast.

Remember how excited you were to get your hands
on a 386 machine?

The thrill of your first encounter with a 286 screamer?

Upgrading to 16k from 4k on your TRS-80?

Your first disk drive for your Apple 2?

It's all relative.

So enjoy
SUV of chip interconnects? by Atomizer · 2003-09-22 01:45 · Score: 3, Funny

Whatever, I think this will end up being the SUV of chip-to-chip connections. ;)
Link via Google (no Reg. Required) by chrestomanci · 2003-09-22 01:48 · Score: 4, Informative

New Sun Microsystems Chip May Unseat the Circuit Board
IANAEE (I am not an electrical engineer) by Peridriga · 2003-09-22 01:51 · Score: 4, Insightful

This might be the obvious question but, why hasn't anyone done this before?

It seems obvious: the end of a chip has pins. The chip it will eventually connect to has pins. Instead of having 20 trace lines to the next chip, why not redesign them so the outputs and inputs of both line up, to reduce the complexity of the design?

Anyone wanna fill in my mental gap for me?
1. Re:IANAEE (I am not an electrical engineer) by Jah-Wren+Ryel · 2003-09-22 02:02 · Score: 5, Insightful
  
  It has been done before; probably the most recent incarnation is HyperTransport from AMD. The only difference at the 50,000ft view is that the speeds and feeds are faster. This is an evolutionary step, not revolutionary or innovationary.
  
  --
  When information is power, privacy is freedom.
2. Re:IANAEE (I am not an electrical engineer) by Anonymous Coward · 2003-09-22 02:18 · Score: 5, Informative
  
  You can't simply just remove the circuit board to achieve better speeds, you need to eliminate the need for the pad that converts internal logic to what we currently use externally. That is what Sun is claiming they have done.
  
  Sun's technology is not simply soldering two pins directly together (as you suggest), which is effectively the same thing as wiring through a circuit board. The high speed, low drive strength, low-voltage drivers have to go through pads that convert the internal signal to a slower, high drive strength, high voltage driver that will yield a reliable connection to the next chip. I'm not an expert in this area, but physics just gets in the way. There are capacitive issues, and interconnect delay issues.
  
  Sun is claiming to use capacitive coupling (put the pins really close together, but don't physically connect them). This way they don't have to drive the external load of the pin/board connection, and they claim they will be able to scale this down to a pad that can switch faster than existing physically wired pins. That means they believe they can make this technology work with lower drive strengths.
  
  They still have a ways to go. Notice that the P4 has faster connections using existing technology. Sun did a proof of concept, and claims they can speed it up 100x. So they haven't _proved_ that this will operate faster yet. They still have many things to overcome to make this viable, including how to build a mass production/assembly process. It's going to be a few years. At least.
3. Re:IANAEE (I am not an electrical engineer) by eXtro · 2003-09-22 02:49 · Score: 5, Informative
  There are two separate metrics that define the speed of memory: latency, which is what your 40 ns refers to, and bandwidth. Large caches address the latency problem as you stated. If you want to transfer more bits per cycle you're restricted due to signal integrity issues related to the bus, so the parent post is also correct. You can increase the width of the bus, up to a point, and get a small scalar increase in bandwidth. To go beyond this you need to address signal integrity problems.
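
A rough Python sketch of how those two metrics combine, assuming a simple model where a transfer costs one fixed latency plus size divided by bandwidth. The 40 ns latency is the figure discussed above; the 3.2 bytes/ns (3.2 GB/s) bandwidth and the function name are assumptions picked for illustration only.

```python
def transfer_time_ns(num_bytes: int, latency_ns: float,
                     bandwidth_bytes_per_ns: float) -> float:
    """Simple model: one fixed latency, then the payload streams at full bandwidth."""
    return latency_ns + num_bytes / bandwidth_bytes_per_ns

LATENCY_NS = 40.0   # the 40 ns figure from the discussion
BANDWIDTH = 3.2     # bytes per ns (3.2 GB/s), an assumed figure

# One cache line, one page, one megabyte.
for size in (64, 4096, 1 << 20):
    total = transfer_time_ns(size, LATENCY_NS, BANDWIDTH)
    share = LATENCY_NS / total
    print(f"{size:>8} bytes: {total:10.1f} ns, latency is {share:.0%} of the total")
```

Under these assumptions small transfers are dominated by latency and large ones by bandwidth, which is why a cache helps the first case and does little for the second.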
  
  Sending fast edges over a bus is difficult because the signal degrades:
  
  inter-symbol interference: Each parcel of information spreads due to the RC nature of the bus so that it takes up more than a period, thus interfering with the next packet.
  
  cross-talk: Each wire on the bus is fairly tightly coupled with its neighbours, so switching activity on one wire affects its neighbours.
  
  transmission line effects: Package connectors, bends in circuit traces etc all create impedance mismatches. This causes reflections which degrade the signal.
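
The first effect in the list above can be sketched numerically. This is a minimal Python illustration, assuming the bus behaves like a single first-order RC low-pass, which is a simplification of a real bus. The 0.5 ns time constant and the function name are made up for the example.

```python
import math

def residual_from_previous_bit(bit_period_ns: float, rc_ns: float) -> float:
    """Fraction of the previous bit's level still present one bit period later,
    for a first-order RC response that decays as exp(-t / RC)."""
    return math.exp(-bit_period_ns / rc_ns)

RC_NS = 0.5  # assumed time constant of the bus, illustration only

for rate_gbps in (0.5, 1.0, 2.0, 4.0):
    period_ns = 1.0 / rate_gbps
    leftover = residual_from_previous_bit(period_ns, RC_NS)
    print(f"{rate_gbps} Gbit/s: {leftover:.1%} of the last bit is still on the wire")
```

With a fixed RC, shortening the bit period leaves more of the previous bit behind, which is the spreading described in the first item.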
  
  If your dataset fits into the cache well, which is often the case for PCs, then a cache can fix most of your problems. If you're dealing with datasets that span gigabytes or terabytes and your application can't be subdivided such that processing and memory can be constrained per cpu then your cache doesn't assist you very much.
  --
  Chris Kuivenhoven is a thief, beware
Perhaps a physical base for Neural Network? by BlankStare · 2003-09-22 01:59 · Score: 3, Interesting

I wonder if this hardware computing model could provide the first real base for Neural Network computing? As far as I know, any neural network is currently emulated on linear processing machines.
FINALLY! by JoeLinux · 2003-09-22 01:59 · Score: 5, Interesting

Someone gets it. As an Electrical Engineer-in-training, I was always frustrated with people who got these big bad processors and wondered why their improvement was minimal.

They never quite grasped that the biggest bottleneck is between the processor and memory.

My EE instructor always said that they could improve performance by doing one simple thing: make the interconnects on the motherboard between the motherboard and RAM rounded instead of cornered. You could then increase bus speed as you wouldn't have magnetic loss at the corners like you do now.

You fix that, and you can see a SUBSTANTIAL improvement in performance. The only thing that can be done beyond that is to get a Platypus drive (Solid state "Hard Drive" made from Quikdata made from DDR RAM). Then you reduce your access time to your hard drive from milliseconds to nano/microseconds.
1. Re:FINALLY! by chrysrobyn · 2003-09-22 03:30 · Score: 5, Insightful
  
  Someone gets it. As an Electrical Engineer-in-training, I was always frustrated with people who got these big bad processors and wondered why their improvement was minimal.
  
  They never quite grasped that the biggest bottleneck is between the processor and memory.
  
  Don't get too frustrated with this. There will always be people who don't understand something fundamental to your training. That's why you're trained, to understand these non-obvious fundamentals. Now that you understand a CPU has to be fed data in order to process it, it's obvious, but a PHB wouldn't necessarily come to that conclusion on his own.
  
  My EE instructor always said that they could improve performance by doing one simple thing: make the interconnects on the motherboard between the motherboard and RAM rounded instead of cornered. You could then increase bus speed as you wouldn't have magnetic loss at the corners like you do now.
  
  You fix that, and you can see a SUBSTANTIAL improvement in performance. The only thing that can be done beyond that is to get a Platypus drive (Solid state "Hard Drive" made from Quikdata made from DDR RAM). Then you reduce your access time to your hard drive from milliseconds to nano/microseconds.
  
  Your EE instructor will tell you lots of things that can help performance. For example, making the L2 cache be the size of main memory. Just because it helps performance doesn't make it worth the price. Rounded edges on the PCB are not easy to accomplish and their benefits may not outweigh the added price-- even for exceptionally high end servers. Without looking at the math, I would like to toss "10% performance adder, 50% cost adder" out into the air, and say that most people would rather save the dough. Another factor to consider is reliability. Intuition suggests to me that reliability would go up without sharp edges, but intuition also tells me that modelling board coupling on a 4 layer board would be a real pain in the ass, to say nothing of a server class 6 or 8 (or higher) layer board if you have to model curved structures. You might not find an easy way to capitalize on your wonderful curved wire performance. Not only do you have to worry about your slowest path, but your quickest one can't arrive so quickly that the other chip can't sample the previous output.
  
  Take care in your classes when you use the word "only". Taking advantage of our wonderful next generation 64 bit processors and multiple gigs of RAM, we could conceivably copy the contents of the hard drive to main memory (especially if we are only concerned with 1-4 gigs of data in a low cost solution). Here, we get the enhanced bandwidth of main memory instead of having to kludge through the southbridge, PCI controller, IDE/SCSI to RAM interface and back.
  
  There are many things that improve system performance-- and the system is the only thing that matters-- rounded wires and SSDs (solid state drives) are only the beginning. Depending on the application, a larger L3 cache may make more difference, or a wider faster CPU to CPU interface, or a pair of PCI controllers hanging off the southbridge for twice the bandwidth, or integrating the northbridge onto the CPU, or ...
  
  The best engineering advice I can give you is that the answer is always, "It depends". You'll spend the next 5-30 years of your life learning how to answer the follow-on question "Depends on what?". Almost everything has advantages and disadvantages and there are few absolutes.
  
  The "Someone gets it" and "They never quite grasped" attitude may get you in trouble. Being proactive and explaining and educating instead will likely be more effective.
Is this new? by 4im · 2003-09-22 02:00 · Score: 3, Insightful

Sounds a lot like the ol' Transputer (was from INMOS), of course faster. One could also think of AMD's HyperTransport. So, again, except maybe for the speed, I don't see much innovation here.

If only people could remember that "terra" has something to do with earth, "tera" is the unit...
or it might not by penguin7of9 · 2003-09-22 02:00 · Score: 3, Insightful

Placing large numbers of chips adjacent to one another has obvious problems with heat and power, in particular when running at those speeds. That, rather than interconnect technology, is probably the main reason we still package up chips in large packages.

This might be useful for placing a small number of chips close together, in particular chips that may require different manufacturing processes.
Bad math? by Quixote · 2003-09-22 02:19 · Score: 4, Insightful

I hate it when the hype overshadows the technical details. Here's a snippet from the article:
By comparison, an Intel Pentium 4 processor, the fastest desktop chip, can transmit about 50 billion bits a second. But when the technology is used in complete products, the researchers say, they expect to reach speeds in excess of a trillion bits a second, which would be about 100 times the limits of today's technology.
If a P4 is already doing 50 Gbps (as they say), and this uber-technology will allow 1Tbps (which is 20x a P4's 50Gbps), then how is that "100x the limits of today's technology" ?
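
A quick check of that arithmetic as a Python sketch. The two rates are the figures quoted from the article; the constant and function names are made up for illustration.

```python
P4_BITS_PER_SEC = 50e9          # "about 50 billion bits a second"
PROXIMITY_BITS_PER_SEC = 1e12   # "in excess of a trillion bits a second"

def speedup(new_rate: float, old_rate: float) -> float:
    """Return how many times faster new_rate is than old_rate."""
    return new_rate / old_rate

ratio = speedup(PROXIMITY_BITS_PER_SEC, P4_BITS_PER_SEC)
print(f"1 Tbps is {ratio:.0f}x a 50 Gbps link")

# Baseline rate that would make 1 Tbps a 100x improvement.
implied_baseline = PROXIMITY_BITS_PER_SEC / 100
print(f"100x only holds against a {implied_baseline / 1e9:.0f} Gbps baseline")
```

So the "100 times" claim only works if the point of comparison is something around 10 Gbps rather than the 50 Gbps quoted for the P4.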
<shakes head>
Sun may be ahead in other areas, too by Mr.+Ophidian+Jones · 2003-09-22 02:20 · Score: 4, Interesting

Normally I don't pimp Sun, but here's something that makes me think they still have a finger on the pulse of things:
Read about plans for Sun's "Niagara" core

I understand they hope to create blade systems using high densities of these multiscalar cores for incredible throughput.

There's your parallel/grid computing. ;-)
More on the broader project by leery · 2003-09-22 02:21 · Score: 4, Informative

IANAEE either, but this made a little more sense to me after I read this InfoWorld article, which talks about two other aspects of Sun's DARPA-funded project: clockless "asynchronous logic", and building processors with interchangeable and upgradable modules. They absolutely need these busless "proximity" interconnects for the processor modules to communicate at close to on-chip speeds, and the clockless architecture lets them get rid of the bus. Or vice versa... or something like that.

Working prototype computer about six years away, according to the article.

--
"This is not a sig." -- R.
Already been done with SERDES by StandardCell · 2003-09-22 03:21 · Score: 4, Informative

If you look at a modern evaluation board with gigabit SERDES or SERialization-DESerialization (e.g. the 3.125Gbit/s differential signal pair per channel), the trace routes are typically rounded, with no square corners. This is done to reduce the effective impedance along the line, which needs to be carefully controlled. They also run in parallel closely-routed pairs because it's typically a differential signal. Actually looks a bit like a set of miniature train tracks without the railroad ties.

In fact, multichannel SERDES is the next real interconnect technology. It's used in Infiniband, HyperTransport, PCI Express, Rambus RDRAM and in 10 Gb/s Ethernet (usually as 4x3.125Gbit/s channels as a XAUI interface between optical module and switch fabric silicon with 8b/10b conversion). There are even variants, such as LSI Logic's HyperPHY, that are deployed specifically for numerous high-bandwidth chip-to-chip interconnections. The problem that is cropping up is that the traditional laminate PCBs are becoming the limiting factor in increasing per-channel connectivity, to the extent that 10Gbit/s per channel speeds are next to impossible on these boards due to the lack of signal integrity. There has been some experimentation for very short hops on regular boards, as well as using PTFE resins to manufacture the boards themselves, but it's precarious at best.
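
The XAUI numbers above can be checked with a few lines of Python. This is only a sketch of the arithmetic: 8b/10b puts 10 bits on the wire for every 8 bits of payload, and the function and constant names are hypothetical.

```python
LANES = 4
LINE_RATE_GBPS = 3.125  # per-lane line rate quoted above

def payload_rate_gbps(lanes: int, line_rate_gbps: float,
                      data_bits: int = 8, coded_bits: int = 10) -> float:
    """Usable data rate after line-coding overhead (8b/10b by default)."""
    return lanes * line_rate_gbps * data_bits / coded_bits

print(payload_rate_gbps(LANES, LINE_RATE_GBPS))  # prints 10.0
```

Four lanes at 3.125 Gbit/s carry 12.5 Gbit/s on the wire, and the 8b/10b overhead brings that down to the 10 Gbit/s payload rate.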

As for Sun's technology, it's interesting but I don't know how much it will catch on or how feasible it will be. It creates packaging issues and requires good thermal modelling and 3-D field modelling to account for expansion and contraction through the operating temperature range and the presence of nearby signals, which could affect the integrity of the signals.