MIT Develops New Chip That Reduces Neural Networks' Power Consumption by Up to 95 Percent (mit.edu)
MIT researchers have developed a special-purpose chip that increases the speed of neural-network computations by three to seven times over its predecessors, while reducing power consumption 94 to 95 percent. From a report: That could make it practical to run neural networks locally on smartphones or even to embed them in household appliances. "The general processor model is that there is a memory in some part of the chip, and there is a processor in another part of the chip, and you move the data back and forth between them when you do these computations," says Avishek Biswas, an MIT graduate student in electrical engineering and computer science, who led the new chip's development. "Since these machine-learning algorithms need so many computations, this transferring back and forth of data is the dominant portion of the energy consumption. But the computation these algorithms do can be simplified to one specific operation, called the dot product. Our approach was, can we implement this dot-product functionality inside the memory so that you don't need to transfer this data back and forth?"
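For anyone unsure what "the dot product" in the quote means, here is a minimal sketch in plain Python. It is not the chip's method, only the arithmetic it is said to speed up. The names `dot` and `layer_outputs` and the sample numbers are made up for illustration.

```python
def dot(weights, inputs):
    """Multiply matching entries and sum the products."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


def layer_outputs(weight_rows, inputs):
    """One dot product per node; every node reads the same input vector."""
    return [dot(row, inputs) for row in weight_rows]


# Illustrative values only: two nodes, three inputs each.
inputs = [1, 2, 3]
weight_rows = [
    [1, 0, -1],  # weights for node 0
    [2, 2, 2],   # weights for node 1
]
print(layer_outputs(weight_rows, inputs))  # prints [-2, 12]
```

On a conventional processor, each weight and input in that loop is fetched from memory before it is multiplied, which is the back-and-forth traffic the quote describes.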
The tensor processing units Google developed also seem very capable compared to regular processors. Does anyone know how MIT's new chip stacks up against what Google already has in operation?
That sounds like something an FPGA could do from the very beginning.
The only new thing here would possibly be LARGER amounts of memory stored in between the fabric (reducing off-chip access and increasing the number of LUTs not tied up as memory cells), and possibly, like they said, combined "access and modify" operations.
But I think the article itself doesn't understand what it's talking about then.
And as general-purpose as FPGAs are in idea, they have been "custom adapted" to different tasks (and layouts/fabrics) since inception. So the question here is: are they talking about some kind of ASIC advancement that they didn't have before?
>The chip can thus calculate dot products for multiple nodes — 16 at a time, in the prototype — in a single step, instead of shuttling between a processor and memory for every computation.
This appears to be the only actual advancement/tech/change, stretched out into an entire fluff article for college PR purposes.
Personally, I've been way more interested in getting my hands on an "FPGA in CPU" ever since Altera was bought by Intel back when I was in college. Imagine a CPU that can be told to add CUDA cores when you start a game, or SHA cores when you start a server. Altera specializes in live-reconfigurable FPGAs: FPGAs that can be "flashed" in whole or in part while still running.