MIT Develops New Chip That Reduces Neural Networks' Power Consumption by Up to 95 Percent (mit.edu)
MIT researchers have developed a special-purpose chip that increases the speed of neural-network computations by three to seven times over its predecessors, while reducing power consumption 94 to 95 percent. From a report: That could make it practical to run neural networks locally on smartphones or even to embed them in household appliances. "The general processor model is that there is a memory in some part of the chip, and there is a processor in another part of the chip, and you move the data back and forth between them when you do these computations," says Avishek Biswas, an MIT graduate student in electrical engineering and computer science, who led the new chip's development. "Since these machine-learning algorithms need so many computations, this transferring back and forth of data is the dominant portion of the energy consumption. But the computation these algorithms do can be simplified to one specific operation, called the dot product. Our approach was, can we implement this dot-product functionality inside the memory so that you don't need to transfer this data back and forth?"
Does anyone know how MIT's new chips stack up against what Google already has in operation?
This seems to be different.
Google's TPUs reduce power and increase speed, but are targeted for internal use in data centers. You can't buy one.
This MIT chip is targeted toward home use and mobile devices.
Both chips do fast, low-precision matrix operations. The TPU uses 8-bit multipliers. TFA is poorly written, but it appears that the MIT chip does analog multiplication. From TFA: "In the chip, a node’s input values are converted into electrical voltages and then multiplied by the appropriate weights. Summing the products is simply a matter of combining the voltages. Only the combined voltages are converted back into a digital representation and stored for further processing."
If this is true, then that could be a huge boost in efficiency, but results would not be exactly repeatable: You could get different results for the exact same inputs.
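To make the repeatability point concrete, here is a minimal Python sketch that compares an exact digital dot product with a toy "analog" one. The Gaussian noise model, the `noise_std` value, and the function names are assumptions for illustration only; nothing here is taken from the MIT design.

```python
import random

def digital_dot(inputs, weights):
    """Exact dot product, as a digital multiply-accumulate unit would compute it."""
    return sum(x * w for x, w in zip(inputs, weights))

def analog_dot(inputs, weights, noise_std=0.01, rng=random):
    """Toy model of an analog dot product.

    Each product is perturbed by Gaussian noise (an assumed stand-in for
    effects such as thermal noise), and only the final sum is read out.
    """
    total = 0.0
    for x, w in zip(inputs, weights):
        total += x * w + rng.gauss(0.0, noise_std)
    return total

# Illustrative values only.
inputs = [0.5, -0.25, 0.75, 1.0]
weights = [1, -1, 1, -1]

exact = digital_dot(inputs, weights)
run_a = analog_dot(inputs, weights)
run_b = analog_dot(inputs, weights)

# The digital result is identical on every run; the two analog runs
# will almost certainly differ slightly from it and from each other.
print("digital:", exact)
print("analog run A:", run_a)
print("analog run B:", run_b)
```

Whether that kind of variation matters in practice would depend on how much noise the real circuit has and how tolerant the network is, neither of which TFA specifies.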
Another feature is that each neuron in a layer produces a single binary output. That is obviously simpler than the TPU's 8-bit outputs, and is analogous to how biological neurons work. But it limits which algorithms can be used. RBMs (Restricted Boltzmann Machines) use single-bit outputs and were used in the first successful "deep" networks, but have more recently fallen out of favor. Single-bit outputs make backprop more difficult, although it sounds like this chip is targeted more at deployment than at learning.
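A rough Python sketch of what single-bit outputs look like, and why they get in the way of backprop. The threshold activation, the weight values, and the function names are assumptions chosen for illustration, not details from TFA.

```python
def step(s, threshold=0.0):
    """Single-bit activation: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if s > threshold else 0

def binary_layer(inputs, weight_rows):
    """Each neuron computes a dot product and emits one bit."""
    return [step(sum(x * w for x, w in zip(inputs, weights)))
            for weights in weight_rows]

def finite_difference(f, s, eps=1e-3):
    """Numerical slope of f at s, a stand-in for the gradient backprop needs."""
    return (f(s + eps) - f(s - eps)) / (2 * eps)

# Illustrative values only.
inputs = [1, 0, 1]
weight_rows = [
    [0.5, -1.0, 0.25],   # weighted sum = 0.75  -> output 1
    [-0.5, 0.5, -0.25],  # weighted sum = -0.75 -> output 0
]

print(binary_layer(inputs, weight_rows))

# Away from the threshold the step function is flat, so its slope is zero
# and a plain gradient carries no information about how to adjust weights.
print(finite_difference(step, 0.75))
```

For inference this is not a problem, since only the forward pass in `binary_layer` is needed, which fits the guess that the chip is meant for deployment.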
How is what they are proposing much better than existing GPUs?
How about reading the summary?
GPUs aren't exactly known for being energy efficient.
This chip is more energy efficient since it doesn't need to move the data to a central processor that might even be on another chip.
It distributes the ALUs throughout the memory so it doesn't have to move the data as far.
Also, to get an idea of the scale we are working with here: the speed of light divided by 5 cm is about 6 GHz.
If you want to work fast you don't want to move data long distances.
There is a limit to how fast information can travel, and on a bidirectional bus you have to wait until the last word reaches its destination before you can switch direction.
Reduce the data path to a millimeter and you have a lot more margin to work with.
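As a back-of-the-envelope check of those numbers, here is a short Python sketch. The function name and the `velocity_factor` parameter are made up for illustration; real on-chip signals travel well below the vacuum speed of light, so these are upper bounds rather than achievable clock rates.

```python
C = 299_792_458.0  # speed of light in vacuum, in meters per second

def max_traversal_rate_hz(distance_m, velocity_factor=1.0):
    """Upper bound on how many times per second a signal can cross distance_m.

    velocity_factor is the assumed fraction of c at which the signal moves
    (1.0 means the vacuum speed of light).
    """
    return C * velocity_factor / distance_m

rate_5cm = max_traversal_rate_hz(0.05)   # 5 cm data path
rate_1mm = max_traversal_rate_hz(0.001)  # 1 mm data path

print(f"5 cm: {rate_5cm / 1e9:.1f} GHz")  # about 6 GHz
print(f"1 mm: {rate_1mm / 1e9:.1f} GHz")  # about 300 GHz
```

Shrinking the path from 5 cm to 1 mm raises the bound by a factor of 50, which is the extra margin referred to above.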