Slashdot Mirror


Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs (pcworld.com)

Four years ago, Google was faced with a conundrum: if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers just to handle all of the requests to the machine learning system powering those services, reads a PCWorld article. The article describes how the Tensor Processing Unit (TPU), a chip designed to accelerate the inference stage of deep neural networks, came into being. It also shares an update: Google published a paper on Wednesday laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed. A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
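
To make the two headline ratios concrete, here is a minimal Python sketch of how a speedup factor and a performance-per-watt ratio are computed, and how an "N times faster" figure relates to a percentage gain. The function names and the sample throughput and power figures are hypothetical, chosen only for illustration; they are not measurements from Google's paper.

```python
def speedup(accel_throughput, baseline_throughput):
    """How many times faster the accelerator is than the baseline."""
    return accel_throughput / baseline_throughput


def perf_per_watt_ratio(accel_throughput, accel_watts,
                        baseline_throughput, baseline_watts):
    """Ratio of throughput-per-watt between accelerator and baseline."""
    return (accel_throughput / accel_watts) / (baseline_throughput / baseline_watts)


def factor_to_percent_gain(factor):
    """Convert an N-times speedup into a percentage improvement."""
    return (factor - 1.0) * 100.0


# Hypothetical figures (inferences per second, watts), NOT from the paper.
baseline_ips, baseline_w = 1_000.0, 100.0
accel_ips, accel_w = 20_000.0, 50.0

print(speedup(accel_ips, baseline_ips))                                   # 20.0
print(perf_per_watt_ratio(accel_ips, accel_w, baseline_ips, baseline_w))  # 40.0
print(factor_to_percent_gain(15.0))                                       # 1400.0
```

The last line shows that a 15x speedup is a 1,400% improvement, which is a very different claim from a 15% improvement.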

3 of 91 comments

  1. A purpose built chip by Anonymous Coward · · Score: 5, Insightful

    outperforms general purpose chips?

    Wow.

    1. Re: A purpose built chip by Tough+Love · · Score: 1, Insightful

      Actually, what is really surprising is that Google considered the project worth doing to get only a 15-30% advantage vs GPU, if those numbers are accurate. In the best case, this buys roughly an 18-month advantage before GPUs get faster and the engineering has to be done all over again, or the project will just go the way of other Google abandonware. And in that brief window, do saved operating costs justify the sunk engineering and fabrication cost? I doubt it.

      Now, on second look, this smells like a vanity project more than anything.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
  2. Re:Wait you mean an ASIC is fast? Why I never! by chispito · · Score: 3, Insightful

    An ASIC takes a bunch of up-front money to design and do a manufacturing run, but is very small and efficient. However, it can't be reconfigured to do anything else and needs a full respin.

    Per TFA, the chips they designed are flexible enough to apply to new machine learning models. I think the point is that this was a space ripe for customized architecture, like graphics cards were 15-20 years ago.

    --
    The Daddy casts sleep on the Baby. The Baby resists!