Your point is well taken, except there are a few technological issues:
Yield: When you stack 4 layers up, the only economical way is to test the four layers separately before stacking. Testing means that you need to pull the signals out of each layer first. You lose some of the wirelength reduction advantage there, because you now have to design the system for intermediate testing. No, testing only after all packaging is done is not a viable option. Do a simple calculation: if the probability of one layer working is 0.99, then the probability of all 4 layers working simultaneously is (0.99)^4 = 0.96. This will significantly affect your cost.
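The yield arithmetic above can be sketched in a few lines. This is only a rough illustration: it assumes layer failures are independent, and the lower per-layer yields (0.90, 0.80) are hypothetical values added to show how quickly the compounded yield drops.

```python
def stack_yield(layer_yield: float, layers: int) -> float:
    """Probability that every layer in an untested stack works,
    assuming each layer fails independently of the others."""
    return layer_yield ** layers


if __name__ == "__main__":
    # 0.99 is the figure from the comment; 0.90 and 0.80 are hypothetical.
    for layer_yield in (0.99, 0.90, 0.80):
        combined = stack_yield(layer_yield, 4)
        print(f"layer yield {layer_yield:.2f} -> 4-layer stack yield {combined:.3f}")
```

Testing each layer before stacking avoids this compounding, at the price of the extra test access mentioned above.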
Bigger on-chip L2 cache: Actually, that may not help that much. If you have run SPEC2000 or the latest benchmarks, you will have seen that too large an L2 cache doesn't help. Yes, SPEC benchmarks are not real-world applications. But making L2 bigger also means a longer access time, so in the end you may not gain anything. A more interesting idea would be to put main memory on chip. Again, though, the major latency is not due to its being off-chip but due to the memory architecture itself. The only overhead you save by bringing main memory on chip is the multiplexing of signals and the buffers. That is a small fraction of the off-chip memory latency. The main bottleneck is still the access to the rows and banks.
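To make the L2 trade-off concrete, here is a small sketch using the textbook average-access-time formula (hit time plus miss rate times miss penalty). All cycle counts and miss rates are hypothetical, chosen only to show how a slower, larger cache can cancel out its lower miss rate; they are not measurements from SPEC or any real processor.

```python
def average_access_time(hit_cycles: float, miss_rate: float,
                        miss_penalty_cycles: float) -> float:
    """Average L2 access time in cycles: hit time + miss rate * miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Hypothetical numbers: the larger cache misses less often but is slower to hit.
    small_l2 = average_access_time(hit_cycles=10, miss_rate=0.10, miss_penalty_cycles=200)
    large_l2 = average_access_time(hit_cycles=14, miss_rate=0.08, miss_penalty_cycles=200)
    print(f"small L2: {small_l2:.1f} cycles, large L2: {large_l2:.1f} cycles")
```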
Is it really 3D?: Actually, it is not really 3D, as you cannot connect two layers wherever you want. Due to technology problems, the interlayer connections are much bigger than the rest of the features. They also have a lot of electrical resistance. For example, the RPI technology requires interlayer interconnects to be 4-6 microns wide and spaced 4-6 microns apart. That is a lot of real estate on chip if you consider that the transistor gate length in production is 90 nm. So there is a long way to go.
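A back-of-the-envelope sketch of the real-estate claim follows. It assumes the interlayer vias sit on a square grid whose pitch is the via width plus the spacing; that layout is an assumption for illustration, not something stated for the RPI process.

```python
GATE_LENGTH_UM = 0.090  # 90 nm production gate length, from the comment


def via_pitch_um(width_um: float, spacing_um: float) -> float:
    """Center-to-center distance between neighbouring vias."""
    return width_um + spacing_um


def vias_per_mm2(width_um: float, spacing_um: float) -> float:
    """Maximum via density on an assumed square grid (1 mm = 1000 um)."""
    pitch = via_pitch_um(width_um, spacing_um)
    return (1000.0 / pitch) ** 2


if __name__ == "__main__":
    for width, spacing in ((4.0, 4.0), (6.0, 6.0)):
        pitch = via_pitch_um(width, spacing)
        print(f"{width:.0f} um via, {spacing:.0f} um spacing: "
              f"pitch = {pitch / GATE_LENGTH_UM:.0f} gate lengths, "
              f"at most {vias_per_mm2(width, spacing):.0f} vias per mm^2")
```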
Is 3D useful for microprocessors? That is still under debate. But there is somewhere else it may be useful: heterogeneous integration. If you want to integrate RF, analog and digital, you can make them separately and optimize them separately. In the end you stack them up, and that seems to be the more promising application.
Yes, heating is a big issue in 3D chips. In conventional ICs the heat is removed through the bulk, as silicon is a better conductor of heat than silicon dioxide. But in 3D, the upper layers are the victims. So a 3D design does need to take temperature effects into account during place and route. Present tools do not do that (Cadence does have an option for power-aware placement, but it doesn't do extensive temperature modeling). There have been a couple of papers on this (available on IEEE Xplore):
(1) Liu, C.C.; Zhang, J.; Datta, A.K.; Tiwari, S. "Heating effects of clock drivers in bulk, SOI, and 3-D CMOS." IEEE Electron Device Letters, vol. 23, no. 12, pp. 716-718, Dec. 2002.
(2) Im, S.; Banerjee, K. "Full chip thermal analysis of planar (2-D) and vertically integrated (3-D) high performance ICs." International Electron Devices Meeting (IEDM) Technical Digest, pp. 727-730, 2000.
Another approach: Researchers have been tinkering with the idea of using heat pipes to conduct the heat away from the top layers. But that will also affect the vertical routing density (because you will have to make dummy vertical vias to pull the heat out).
Re: Why can't we distribute this work? (on Software Telescope, Score: 5, Informative)
The bandwidth of the connection between each Remote Stations and the Central Processing Systems will be ~10 Gbit/s, of which ~ 2.5 Gbit/s will be occupied by the sustained datarate resulting from the sensors.
LOFAR produces very large data streams, especially for the astronomy application (e.g. 6 TB of raw visibility data for an 8 beam, 4 hour synthesis observation, after integration for 1 sec and over 10kHz).
They mention that final post-processing can be done at a central processing station (I am guessing the Blue Gene one) or locally by the users. The only bottleneck seems to be the bandwidth.
LOFAR post-processing can take place either at the Central Processor or locally with the users (in particular at Science Centers). If the available Internet capacity is sufficient, intermediate dataproducts can be transported to the user, and local processing can be done. Otherwise processing resources at the Central Processor are available for further data reduction (within the limits of the Central Processor processing budget).
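To get a feel for why bandwidth is the bottleneck, here is a rough sketch of how long it would take to ship the 6 TB example observation to a user. It assumes 1 TB = 10^12 bytes and a fully utilized link with no protocol overhead; the 100 Mbit/s and 1 Gbit/s user links are hypothetical, while 10 Gbit/s is the station link figure quoted above.

```python
RAW_VISIBILITY_BYTES = 6e12   # 6 TB example observation (decimal terabytes assumed)
OBSERVATION_HOURS = 4         # length of the synthesis observation


def transfer_time_hours(data_bytes: float, link_bits_per_s: float) -> float:
    """Hours needed to move data_bytes over an ideal, fully used link."""
    return data_bytes * 8 / link_bits_per_s / 3600


def sustained_rate_gbit_s(data_bytes: float, hours: float) -> float:
    """Average rate needed to move the data within the given time."""
    return data_bytes * 8 / (hours * 3600) / 1e9


if __name__ == "__main__":
    links = {"100 Mbit/s": 100e6, "1 Gbit/s": 1e9, "10 Gbit/s": 10e9}
    for name, rate in links.items():
        hours = transfer_time_hours(RAW_VISIBILITY_BYTES, rate)
        print(f"{name}: {hours:.1f} hours to transfer 6 TB")
    keep_up = sustained_rate_gbit_s(RAW_VISIBILITY_BYTES, OBSERVATION_HOURS)
    print(f"rate needed to keep pace with the observation: {keep_up:.2f} Gbit/s")
```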
LOFAR is going to be exciting (on Software Telescope, Score: 5, Informative)
Our earlier Slashdot stories on LOFAR, a consortium between ASTRON (The Netherlands), NRL (USA) and MIT/Haystack (USA):
I was talking to a professor of astronomy here, and he mentioned some of the conflicts between the US and Europe regarding the plan. That is one of the reasons why the US is also working on the Square Kilometer Array. The LOFAR imaging telescope is designed for the 10-240 MHz frequency range, whereas SKA will cover 0.15-20 GHz or higher. Hopefully the two efforts will complement each other.
I don't think I said the PPC970 core. For Blue Gene they used the PPC440 core. I meant that once a core is developed, it can be used for many different applications (Blue Gene being one example, where they used an already developed core rather than designing one from scratch).
The IBM Journal of R&D has a special issue on Blue Gene. From the article, which has details about the processing node in Blue Gene:
The BLC ASIC that forms the heart of a BG/L node is a SoC built with the IBM Cu-11 (130-nm CMOS) process. Integrating all of the functions of a computer into a single ASIC results in dramatic size and power reductions for the node. In a supercomputer, this can be further leveraged to increase node density, thereby improving the overall cost/performance for the machine. The BG/L node incorporates many functions into the BLC ASIC. These include two IBM PowerPC 440 (PPC440) embedded processing cores, a floating-point core for each processor, embedded DRAM, an integrated external DDR memory controller, a Gigabit Ethernet adapter, and all of the collective and torus network cut-through buffers and control. The same BLC ASIC is used for both compute nodes and I/O nodes, but only I/O nodes utilize the Gigabit Ethernet for host and file system connectivity. The two PPC440s are fully symmetric in terms of their design, performance, and access to all chip resources. There are no hardware impediments to fully utilizing both processors for applications that have simple message-passing requirements, such as those with a large compute-to-I/O ratio or those with predominantly nearest-neighbor communication.
Long ago, IBM posted an application note about the dual 64-bit core PowerPC970MP and how to use the thermal diodes in the chip (it is not available on the IBM website anymore). Mac Rumors has a copy of it.
From the notes:
The dual 64-bit core PowerPC970MP(TM) (970MP) is the next evolutionary step in the PowerPC 970 family of microprocessors. The higher frequency grade versions of the 970MP consume higher amounts of power than earlier IBM microprocessors do, and that can cause temperature issues. Each 970MP processor core contains a thermal diode used to monitor its operating temperature. The thermal diode must be monitored to ensure that the maximum operating temperature of the 970MP is not exceeded.
I wonder if Apple will reconsider the decision regarding the migration. I don't think it will be feasible for them to support products with both processors. According to rumors on the web, Apple wasn't happy about the low-power processor option from IBM. I wonder if this is it?
Cornell had a pilot program (2004-2005) where Napster services were provided free to students. At that time it was supported by corporate sponsors and a gift fund in Student and Academic Services.
And the way it saved bandwidth (obvious) was by using a local caching server.
I am not trying to be a troll here, but why did Google wait so long to provide the toolbar for a non-IE browser?
How about other neat Google goodies like Google Desktop Search and the Picasa photo organizer? Any guess whether they will provide these utilities for *nix too?
A few weeks ago, Cornell ornithologists rediscovered the ivory-billed woodpecker in the Big Woods of Arkansas. It was believed to be extinct as well. More than 60 years after the last confirmed sighting of the species in the United States, they found that at least one male ivory-bill still survives in the vast areas of bottomland swamp forest.
Like any scientific endeavor, the journey is as important as the final goal. Many new techniques and ideas come along to put the whole picture together. So even if a practical realization of this technique may not be feasible, the learning (from the experiments and theory) will be useful.
PhysOrg (article on Caltech's work on weighing molecules) has a comment about the possible applications:
The new method might ultimately permit the creation of microchips, each possessing arrays of miniature mass spectrometers, which are devices for identifying molecules based on their weight. Today, high-throughput proteomics searches are often done at facilities possessing arrays of conventional mass spectrometers that fill an entire laboratory and can cost upwards of a million dollars each, Roukes adds. By contrast, future nanodevice-based systems should cost a small fraction of today's technology, and an entire massively-parallel nanodevice system will probably ultimately fit on a desktop.
The rupture spread from south to north, resulting in a Doppler effect in instruments measuring it. Seismometers in Russia recorded the quake at a higher frequency because it was moving toward them, while those in Australia measured a lower frequency as it moved away.
I was wondering about this: depending on where you measure the signal, you should observe different frequencies. The Science paper doesn't give many details about this, though.
The Science article has some interesting numbers as well:
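The direction dependence can be sketched with the classic moving-source Doppler formula, f_obs = f_src / (1 - (v_r / c) cos(theta)), where theta is the angle between the rupture direction and the direction to the station. This is a simplified one-dimensional picture, not the analysis from the Science paper, and the rupture speed, wave speed and source frequency below are hypothetical placeholders.

```python
import math


def observed_frequency(source_hz: float, rupture_km_s: float,
                       wave_km_s: float, angle_deg: float) -> float:
    """Frequency seen by a station at angle_deg from the rupture direction
    (0 = rupture moving toward the station, 180 = moving away)."""
    if rupture_km_s >= wave_km_s:
        raise ValueError("this simple formula needs rupture speed < wave speed")
    cos_theta = math.cos(math.radians(angle_deg))
    return source_hz / (1.0 - (rupture_km_s / wave_km_s) * cos_theta)


if __name__ == "__main__":
    # Hypothetical values: 1 Hz source, 2.5 km/s rupture, 3.5 km/s wave speed.
    for label, angle in (("ahead of rupture", 0.0),
                         ("broadside", 90.0),
                         ("behind rupture", 180.0)):
        f_obs = observed_frequency(1.0, 2.5, 3.5, angle)
        print(f"{label} ({angle:.0f} deg): {f_obs:.2f} Hz")
```

Stations ahead of the rupture (Russia, in the quoted example) fall in the higher-frequency case, and stations behind it (Australia) in the lower-frequency case.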
It released 4.3 x 10^18 J, equivalent to a 100-gigaton bomb, or about as much energy as is used in the United States in 6 months. Shifts in the sea floor displaced more than 30 km^3 of seawater, generating a tsunami that traveled to the Antarctic, the east and west coasts of the Americas, and (with lessening amplitudes) the Arctic Ocean.
From the article: "In order to authenticate, the player would also need to link to some type of online network, similar to the EPCglobal Network, that would associate the DVD with a legal sale. Through this system, the copyright owners (the film production company and any other license-holders of the content) would have digital rights management over the work."
That doesn't sound right. The RFID is only a way of providing a unique identifier to a stamped DVD. Does it mean I have to authenticate my DVDs online to play them?
From the article: "But viewers would not be able to play the DVDs without an RFID-enabled player because the tag would essentially lock the disc."
Secondly, it is an antifuse-based one-time programmable ROM. It is NOT a flash memory, which can be rewritten 100,000 times. So it is more useful for storing application code than for data storage.
Antifuse-based memories are diode-like and can be much smaller than regular flash memories. But they are inherently slower and also don't have any gain element (like a transistor). This requires careful design to achieve a good signal-to-noise ratio for the memory read operation.
A more aggressive 3D technology was demonstrated by IBM last year, where they have circuits in 3D.
A startup, R-cube logic, is also designing a 3D microprocessor where memory is put on top of the logic core to reduce latency.
Xanoptics is more into hybrid design (mixed analog, RF, optics) on a single footprint.
When Lofar Meets Stella
350 KM Diameter Radio Telescope Array
IBM's own server products and embedded processors. IBM's Blue Gene used the core from an earlier PowerPC series.
RFC 792 dates back to September 1981.
wikipedia
Apple hardware runs on Mac OS X
Cutting Archives does a lot of restoration work. Check their FAQ.
We also had a cool story on Slashdot before about Concert to be Performed from Beyond the Grave.
We covered a story from Cornell on self-replicating robots before. I guess it wasn't open source.
It is all clear now, folks...
Story here
And the related Science paper from last year.
Professor Woo Suk Hwang and his colleagues also successfully cloned human embryos last year.
Australia
South Korea
Brazil
Spain
India
Vienna
French Police
Dutch
Venezuela
Germany
Now we have to buy another DVD player?
Earlier story on Slashdot about Cornell's work on atomic MEMS.