Slashdot Mirror


MIT Develops New Chip That Reduces Neural Networks' Power Consumption by Up to 95 Percent (mit.edu)

MIT researchers have developed a special-purpose chip that increases the speed of neural-network computations by three to seven times over its predecessors, while reducing power consumption 94 to 95 percent. From a report: That could make it practical to run neural networks locally on smartphones or even to embed them in household appliances. "The general processor model is that there is a memory in some part of the chip, and there is a processor in another part of the chip, and you move the data back and forth between them when you do these computations," says Avishek Biswas, an MIT graduate student in electrical engineering and computer science, who led the new chip's development. "Since these machine-learning algorithms need so many computations, this transferring back and forth of data is the dominant portion of the energy consumption. But the computation these algorithms do can be simplified to one specific operation, called the dot product. Our approach was, can we implement this dot-product functionality inside the memory so that you don't need to transfer this data back and forth?"

12 of 55 comments

  1. How does this compare with Google's? by Mostly a lurker · · Score: 3, Interesting

    The tensor processing units Google developed seem also very capable compared to regular processors. Does anyone know how MIT's new chips stack up against what Google already has in operation?

    1. Re:How does this compare with Google's? by DrTJ · · Score: 4, Insightful

      The MIT press release says next to nothing, unfortunately. AFAICT, they don't reference any published article, or any kind of link to more information, so it is hard to assess. I really wanted to know more so I'm a little disappointed with MIT.

      There are a few things that indicate this is not even comparable to Google's TPU:
      1. The lack of more information.
      2. They label it as a prototype.
      3. The top person link goes to a first year graduate student (making a real ASIC takes a slightly larger team, I hear).

      Without more detailed information, this is hard to distinguish from PR.

    2. Re:How does this compare with Google's? by bluefoxlucid · · Score: 2

      I'd hope the MIT chip could do math better. 3-7 times faster, 5%-6% as much power draw? That's 0.7%-2% as much energy per computational operation.
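
      The arithmetic behind that range can be checked with a short sketch. It assumes the quoted power figures are average power while running the same workload, and that the best speedup pairs with the best power figure; the summary does not confirm either. The function name is made up for illustration.

```python
def energy_per_op_ratio(speedup, power_fraction):
    """Energy per operation relative to the old chip.

    Energy per op = power / (ops per second), so the ratio is
    (new power / old power) divided by (new speed / old speed).
    """
    return power_fraction / speedup

# Assumed pairings (not stated in the summary):
# best case = 7x faster at 5% power, worst case = 3x faster at 6% power.
best = energy_per_op_ratio(speedup=7, power_fraction=0.05)
worst = energy_per_op_ratio(speedup=3, power_fraction=0.06)

print(f"energy per op: {best:.2%} to {worst:.2%} of the baseline")
```

      If the 94-95 percent figure is already a per-computation energy number rather than a power draw, dividing by the speedup would double count, so treat this as a reading of the summary, not of the chip.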

    3. Re:How does this compare with Google's? by ShanghaiBill · · Score: 4, Informative

      Does anyone know how MIT's new chips stack up against what Google already has in operation?

      This seems to be different.

      Google's TPUs reduce power and increase speed, but are targeted for internal use in data centers. You can't buy one.

      This MIT chip is targeted toward home use and mobile devices.

      Both chips do fast, low-precision matrix ops. The TPU uses 8-bit multipliers. TFA is poorly written, but it appears that the MIT chip does analog multiplication. From TFA: "In the chip, a node’s input values are converted into electrical voltages and then multiplied by the appropriate weights. Summing the products is simply a matter of combining the voltages. Only the combined voltages are converted back into a digital representation and stored for further processing."

      If this is true, then that could be a huge boost in efficiency, but results would not be exactly repeatable: You could get different results for the exact same inputs.
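
      A toy model of why analog summation might not be repeatable, assuming each product picks up a little Gaussian noise before the sum is read out. The noise model, the sigma value, and the sample inputs and weights are all invented for illustration; nothing here comes from the chip's actual design.

```python
import random

def digital_dot(inputs, weights):
    """Exact dot product, as a digital multiplier-accumulator would compute it."""
    return sum(x * w for x, w in zip(inputs, weights))

def analog_dot(inputs, weights, noise_sigma=0.01, rng=random):
    """Toy analog dot product: every product gets Gaussian noise added.

    noise_sigma is a hypothetical parameter, not a measured figure.
    """
    total = 0.0
    for x, w in zip(inputs, weights):
        total += x * w + rng.gauss(0.0, noise_sigma)
    return total

inputs = [0.2, -0.5, 0.9, 0.1]     # arbitrary example values
weights = [1.0, -1.0, 1.0, -1.0]   # arbitrary example values

exact = digital_dot(inputs, weights)
rng = random.Random(0)             # seeded only so the demo itself is reproducible
run_a = analog_dot(inputs, weights, rng=rng)
run_b = analog_dot(inputs, weights, rng=rng)

print(exact, run_a, run_b)  # same inputs, two slightly different analog results
```

      Whether this matters in practice depends on how much noise the threshold at the output can absorb, which the article does not say.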

      Another feature is that the neurons in each layer produce a single binary output. That is obviously simpler than the TPU's 8-bit outputs, and is analogous to how biological neurons work. But it limits which algorithms can be used. RBMs (Restricted Boltzmann Machines) use single bit outputs, and were used in the first successful "deep" networks, but have more recently fallen out of favor. Single bit outputs make backprop more difficult, although it sounds like this chip is targeted more for deployment than for learning.
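
      A minimal sketch of what single-bit outputs mean for inference and why they get in the way of backprop, assuming a plain threshold-at-zero activation (the article does not specify the threshold rule, and the weights below are arbitrary).

```python
def step(x):
    """Single-bit activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if x >= 0 else 0

def binary_layer(inputs, weight_rows):
    """One layer of binary-output neurons: a dot product per node, then a threshold."""
    return [step(sum(x * w for x, w in zip(inputs, row))) for row in weight_rows]

def numeric_grad(f, x, eps=1e-4):
    """Central-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Arbitrary example layer: 3 inputs feeding 3 nodes.
outputs = binary_layer([0.5, -1.0, 0.25], [[1, 1, 1], [2, 0, -1], [-1, -1, 0]])
print(outputs)

# Away from the threshold the step function is flat, so the gradient that
# backprop would propagate is zero and carries no learning signal.
print(numeric_grad(step, 0.75), numeric_grad(step, -0.25))
```

      Running a pre-trained network forward needs only `binary_layer`; the flat gradient is a problem only if you try to train through the threshold.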

    4. Re:How does this compare with Google's? by ShanghaiBill · · Score: 2

      Price and computational utility aside, they sound GREAT for researching how biological neural networks work.

      I doubt that. This chip is designed to do fast and efficient matrix operations, which only work well if the neurons are in distinct and ordered layers. Biological brains don't do that. Also, biological brains learn by strengthening connections as they are used, in a process very different from the backprop algorithm used in ANNs, and it isn't clear if this new chip actually does any learning rather than just running a pre-programmed network.

      We will learn much more about biological brains from projects like OpenWorm, which is an attempt to understand and emulate the brain of C. elegans, a nematode.

      It is not clear that ANNs will be improved much by better understanding of BNNs. They work in different ways, and ANNs are much faster. You may be better than a computer at face recognition, but the computer is improving quickly and is WAY faster, scanning thousands of images per second.

    5. Re: How does this compare with Google's? by burtosis · · Score: 2

      I'm jaded because I had my technology-push, multi-million-dollar startup stolen even though I was involved with several professors and graduate students, and licensed the technology from the university. We weren't allowed to buy off the patent costs, which for this portfolio (and due to government waste) wound up being around 400 thousand USD. Though I did have to pay off those costs before I was allowed any income disbursements from the royalty payments I made to myself, so in essence I paid for the patent portfolio, but retained no rights to it. It's quite likely the same story with MIT. They are so fucked, I feel sorry for those grad students. Without taking on money, they will have no product; if they take on money, there is a 99% chance of complete and total fuckage. Trust me, I lived through exactly this scenario.

  2. Just imagine by pablo_max · · Score: 2

    Just imagine a Beowulf cluster of these things ;)

  3. "Smart" homes are stupid by pablo_max · · Score: 2

    Looking at what is available today, I would have to say that today's smart world is incredibly stupid. Not to mention fractured, with loads of standards, apps and doodads.
    When Google took over Nest I had high hopes.
    I had imagined they would do something clever like install their phased-array mics into the "smart" fire alarms that could be in almost every room. Then from anywhere you could ask Google something. But no... you need to find some stupid crappy little speaker and keep shouting HEY, GOOGLE, HEY GOOGLE, HEY GOOGLE, until it finally can hear you.
    With these chips, they could take that idea even further. Install connected appliances, connected switches and sockets, and then figure out the patterns of usage and voices to "learn your ways" and begin to anticipate things.
    Oh.. Bob always turns on the TV right after he grabs a beer from the fridge around 6pm. The fridge just opened, so I will turn on the TV for him.. also it was cold today, so I will adjust the heat in that room so Bob's ass doesn't get too cold on his leather lazy-boy.
    I should think that is all totally possible today.

  4. Color me skeptical by ckatko · · Score: 4, Interesting

    That sounds like something an FPGA could do from the very beginning.

    The only new thing here would be possibly LARGER amounts of memory stored in between the fabric (reducing off-chip access, and increasing the number of LUTs not tied up as memory cells), and possibly, like they said, combined "access and modify" operations.

    But I think the article itself doesn't understand what it's talking about then.

    And as general-purpose as FPGAs are in concept, they have been "custom adapted" to different tasks (and layouts/fabrics) since inception. So the question here is, are they talking about some kind of ASIC advancement that they didn't have before?

    >The chip can thus calculate dot products for multiple nodes — 16 at a time, in the prototype — in a single step, instead of shuttling between a processor and memory for every computation.

    This appears to be the only actual advancement/tech/change, being extruded out into an entire fluff article for college PR purposes.
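
    To put a rough number on what that one change could buy, here is a back-of-the-envelope sketch counting words moved for one batch of 16 nodes. Both cost models are my own simplifications (no caches on the conventional side, weights permanently resident on the in-memory side) and the function names are made up; the article gives no such breakdown.

```python
NODES = 16  # nodes handled per step in the prototype, per the quoted article

def layer_outputs(inputs, weight_rows):
    """The work itself: one dot product per node."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weight_rows]

def words_moved_conventional(n_inputs, nodes=NODES):
    # Assumption: no cache reuse, so every weight and every input
    # crosses the memory bus once per node.
    return nodes * (2 * n_inputs)

def words_moved_in_memory(n_inputs, nodes=NODES):
    # Assumption: weights never leave the memory array; the inputs are
    # driven in once and only the per-node results are read out.
    return n_inputs + nodes

n = 64  # arbitrary example layer width
print(words_moved_conventional(n), words_moved_in_memory(n))
```

    A real processor with caches would do far better than the first count, so the gap is an upper bound under these assumptions, not a measurement.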

    Personally, I'm way more interested in getting my hands on an "FPGA in CPU" ever since back in college when Altera was bought by Intel. Imagine a CPU that can be told to add CUDA cores when you start a game, or SHA cores when you start a server. Altera specializes in live-reconfigurable FPGAs: FPGAs that can be "flashed" in whole or in part while still running.

    1. Re:Color me skeptical by religionofpeas · · Score: 2

      FPGAs aren't really good for massive amounts of multiplications. Modern FPGAs have dedicated multipliers, but they only have a few of them. And the reason they have dedicated multipliers is that the general FPGA fabric sucks at doing multiplications.

  5. Sounds like hybrid memputing by wierd_w · · Score: 2

    Such things include "Computational RAM"
    https://en.wikipedia.org/wiki/...

    There is also a very old idea of using memory elements directly to compute results, which is true memputing. (There are few examples of this, because it is costly as an architecture-- but your brain is a pretty good biological example. The same components are used for data storage, as well as data processing.)

    Given that such "Computational RAM" devices already exist in the wild, I fail to see why more novel hardware is needed, except as a refinement of the concept?

  6. Re:Video Cards by Anonymous Coward · · Score: 2, Informative

    how is what they are proposing much better than existing GPU's?

    How about reading the summary?

    GPUs aren't exactly known for being energy efficient.
    This chip is more energy efficient since it doesn't need to move the data to a central processor that might even be on another chip.
    It distributes the ALUs among the memory so it doesn't have to move the data as far.

    Also to get an idea of the scale we are working with here, speed of light / 5 cm is about 6 GHz.
    If you want to work fast you don't want to move data long distances.
    There is a limit to how fast information can travel, and on a bidirectional bus you have to wait until the last word reaches the destination before you switch direction.
    Reduce the data path to a mm and you have a lot more margin to work with.
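
    The scale estimate is easy to reproduce. This sketch uses the vacuum speed of light; the `velocity_factor` parameter is a made-up knob, since real signals in wires travel slower than that.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def max_one_way_rate_hz(distance_m, velocity_factor=1.0):
    """Upper bound on how often a signal can cross distance_m per second.

    velocity_factor is a hypothetical fraction of c for the medium;
    1.0 gives the vacuum limit used in the comment above.
    """
    return C * velocity_factor / distance_m

five_cm = max_one_way_rate_hz(0.05)    # chip-to-chip scale
one_mm = max_one_way_rate_hz(0.001)    # on-chip scale

print(f"5 cm: {five_cm / 1e9:.1f} GHz, 1 mm: {one_mm / 1e9:.0f} GHz")
```

    Going from 5 cm to 1 mm raises the bound by a factor of 50, which is the margin being pointed at.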