Slashdot Mirror


Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs (pcworld.com)

Four years ago, Google was faced with a conundrum: if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers just to handle all of the requests to the machine learning system powering those services, reads a PCWorld article, which describes how the Tensor Processing Unit (TPU), a chip designed to accelerate the inference stage of deep neural networks, came into being. The article shares an update: Google published a paper on Wednesday laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed. A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.

91 comments

  1. I for one, by Bodhammer · · Score: 3, Funny

    Welcome our new Google overlords. (or whatever...)

    --
    "I say we take off, nuke the site from orbit. It's the only way to be sure."
    1. Re:I for one, by Hognoxious · · Score: 3, Funny

      No point saying it, dude. They already know.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  2. A purpose built chip by Anonymous Coward · · Score: 5, Insightful

    outperforms general purpose chips?

    Wow.

    1. Re: A purpose built chip by Anonymous Coward · · Score: 1

      My thoughts exactly. How is it even the least bit surprising that custom silicon is better than general purpose?

    2. Re:A purpose built chip by MightyYar · · Score: 1

      While there is some truth to that, this "purpose built" chip's purpose is to run an open-source AI language. So this is more interesting than a typical custom ASIC.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    3. Re:A purpose built chip by NatasRevol · · Score: 1

      ASICs everywhere are embarrassed by how slow the TPUs are.

      --
      There are two types of people in the world: Those who crave closure
    4. Re:A purpose built chip by NatasRevol · · Score: 1

      How so?

      Custom chip built for how the bits are handled is .... typical of an ASIC.

      --
      There are two types of people in the world: Those who crave closure
    5. Re: A purpose built chip by Anonymous Coward · · Score: 1

      I suppose the application is *slightly* more broad than a typical ASIC but yea, looks like some marketing article.

    6. Re:A purpose built chip by Anonymous Coward · · Score: 0

      "outperforms general purpose chips?" should read: "outperforms general purpose chips by 15 to 30 times".

    7. Re:A purpose built chip by ShanghaiBill · · Score: 4, Informative

      How so?

      The TPU is a "purpose built" chip, but that purpose is very broad. It is optimized for massively parallel low-precision matrix operations, which is useful not only for neural nets, but also simulation of physical processes like CFD, weather prediction, climate models, computational chemistry, etc. It can do everything a GPU can do except the rasterization and texture mapping, but it can do it faster and with much less power.

    8. Re:A purpose built chip by Anonymous Coward · · Score: 0

      That sounds a lot like, it can't do everything a GPU can do, but GPUs have vector processors which these have in common. That sounds very narrow.

    9. Re: A purpose built chip by ChristopherSkinner · · Score: 1

      GPU chips are a bunch of multipliers. Floating point, which is, of course, integer with an exponent. The press release talks about "inference" as if this is a math function. My guess is weighted multipliers. ...If Google has in fact made more efficient multipliers, then you gamers can turn off those noisy fans. Or are they faster? It seemed to be saying they were more efficient.

    10. Re:A purpose built chip by Baloroth · · Score: 4, Informative

      It is optimized for massively parallel low-precision matrix operations, which is useful not only for neural nets, but also simulation of physical processes like CFD, weather prediction, climate models, computational chemistry, etc.

      Maybe, but I doubt it. It's far too low precision, for one thing: 8-bits doesn't get you very far in any of those fields (you typically want at least 32-bit FLOPS for those, and quite often 64-bit precision is required, as numerical errors accumulate exponentially in a chaotic system), and they're really not even big matrices (just 256x256). Really the only place this kind of thing would excel is signal processing, which is basically what they're using them for.
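
      The point about errors growing exponentially in a chaotic system can be illustrated with a toy model. This is a minimal sketch, assuming the logistic map as a stand-in for a chaotic simulation and an initial error of 1/256 (the step size of an 8-bit fraction); it says nothing about the TPU's actual arithmetic, and the names `logistic` and `divergence` are made up for illustration.

```python
def logistic(x, r=4.0):
    """One step of the logistic map, a standard toy chaotic system."""
    return r * x * (1.0 - x)


def divergence(x0, eps, steps):
    """Track the gap between two trajectories that start `eps` apart."""
    x, y = x0, x0 + eps
    gaps = []
    for _ in range(steps):
        x, y = logistic(x), logistic(y)
        gaps.append(abs(x - y))
    return gaps


# Start two runs that differ by one 8-bit quantization step (assumed error size).
gaps = divergence(0.3, 1 / 256, 10)
for step, gap in enumerate(gaps, start=1):
    print(f"step {step:2d}: gap = {gap:.6f}")
```

      Under these assumptions the gap starts well below 0.01 and passes 0.1 within a handful of steps, which is the kind of growth that makes 8-bit state unattractive for long-running physical simulations.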

      --
      "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
    11. Re:A purpose built chip by Tough+Love · · Score: 1

      It takes considerable organizational effort to push an ASIC all the way through the pipe from design to production. Even budgeting and staffing are nontrivial. The technology might not be earth shattering, but the engineering process is respectable. And who knows, the technology might be earth shattering. But probably not. It uses numerical methods; analog would be faster and more interesting.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    12. Re:A purpose built chip by psmoot · · Score: 1

      I can't count how many times someone thought they could build an ASIC to optimize some computation, only to get crushed by Moore's Law operating on general purpose CPUs.

      Never fight a land war during a Russian winter. Never bet against Ethernet. Never bet against Moore's Law.

    13. Re:A purpose built chip by MightyYar · · Score: 1

      By your broad definition, a GPU is also a "typical ASIC".

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    14. Re: A purpose built chip by ArmoredDragon · · Score: 1

      Not sure why they didn't just call it what it is: ASIC.

    15. Re:A purpose built chip by Anonymous Coward · · Score: 0

      you mean just like any other everyday ASIC that is custom built for a specific workload. saying it can do everything a GPU can do EXCEPT rasterization and texture mapping is like saying my house can do everything my car can do except move.

    16. Re: A purpose built chip by Tough+Love · · Score: 1, Insightful

      Actually, what is really surprising is that Google considered the project worth doing to get only 15-30% advantage vs GPU, if those numbers are accurate. In the best case, this buys roughly an 18 month advantage before GPUs get faster and the engineering has to be done all over again, or the project will just go the way of other Google abandonware. And in that brief window, do saved operating costs justify the sunk engineering and fabrication cost? I doubt it.

      Now, on second look, this smells like a vanity project more than anything.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    17. Re: A purpose built chip by Anonymous Coward · · Score: 0

      It's entirely possible Google funded the project not even knowing the increase they would get.

    18. Re:A purpose built chip by Anonymous Coward · · Score: 0

      Can it mine Bitcoins? (This will be the new "can it play Crysis".)

    19. Re: A purpose built chip by Anonymous Coward · · Score: 0

      TFS said times, not percent

    20. Re: A purpose built chip by religionofpeas · · Score: 1

      They are 15-30 times faster, not 15-30%. That's a huge difference. And this is only the first version, so it is likely that the TPU can be improved faster than GPUs that have been on the market for years.

    21. Re:A purpose built chip by religionofpeas · · Score: 1

      It will take a while for Moore's law to catch up with 15-30 times speed improvement, and even better power improvement.

      And Moore's law also helps this chip.
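
      As a back-of-the-envelope check on how long "a while" is: the sketch below assumes performance doubles every 18 months, which is a commonly quoted rule of thumb and not a figure from the article or the paper. The function name `years_to_catch_up` is hypothetical.

```python
import math


def years_to_catch_up(speedup, doubling_period_years=1.5):
    """Years of steady doubling needed to gain `speedup` times the performance."""
    return math.log2(speedup) * doubling_period_years


for factor in (15, 30):
    print(f"{factor}x: about {years_to_catch_up(factor):.1f} years")
```

      Under that assumption, closing a 15x to 30x gap takes roughly six to seven and a half years, and only if the TPU itself stands still in the meantime.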

    22. Re:A purpose built chip by religionofpeas · · Score: 1

      It's not that obvious when you're talking about floating point calculations in combination with external memory. A GPU is highly optimized for both of those requirements, and it's not all that simple to make an ASIC that does this better. The main reason Google got such an improvement is because they require much less precision in their results.

    23. Re: A purpose built chip by Tough+Love · · Score: 3, Informative

      They are 15-30 times faster, not 15-30%.

      Every little order of magnitude really helps :)

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    24. Re: A purpose built chip by s_p_oneil · · Score: 2

      I'm sure you've already seen the "it's 15-30 times, not 15-30 percent" replies. There's also the "performance per watt of the TPU was 25 to 80 times better". Can you imagine how much money this can save Google in electricity costs? It's 1-2 orders of magnitude better (10-100 times), with the possibility that they will continue to find dramatic improvements.

      If we equate your assessment with a "bunt", what Google really did was knock the ball out of the park.

    25. Re:A purpose built chip by David_Hart · · Score: 1

      By your broad definition, a GPU is also a "typical ASIC".

      Yep... A GPU is basically an ASIC as well. It's programmed to do one thing well. That it can be used for other things that use similar calculations was pure coincidence (i.e. bitcoin mining).

    26. Re:A purpose built chip by NatasRevol · · Score: 1

      Well, since it's not a CPU, yes it's an ASIC.

      --
      There are two types of people in the world: Those who crave closure
    27. Re: A purpose built chip by Jayfar · · Score: 1

      Not sure why they didn't just call it what it is: ASIC.

      Well TFA kinda did that: "TPUs are what’s known in chip lingo as an application-specific integrated circuit (ASIC)."

    28. Re:A purpose built chip by psmoot · · Score: 1

      I know, that's what makes this a remarkable achievement. Many times in the past people have tried to do this but it took much longer than they anticipated. In the meantime, Intel, AMD, nVidia, ATI, or whoever managed to catch up and surpass the ASIC. It turned out the performance win from ASICs had a shorter shelf life than people realized.

      It seems times have changed. Google has a very specific workload which appears to be different from what the mainstream processors have optimized for. The easy (easier?) part of Moore's Law (crank cycle times and add ALUs/FPUs) seems over. As a result, Google has a great win, yay for them. If this turns out to be a large market, I'll be surprised if neural net circuits don't show up in mainstream CPUs and GPUs in the next few generations. Then it becomes a race to see who can optimize the fab process fastest.

    29. Re:A purpose built chip by MightyYar · · Score: 1

      Agreed that a GPU is an ASIC. I don't think they are "typical" in a few senses, but mainly the crowd around here go crazy over them.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    30. Re:A purpose built chip by MightyYar · · Score: 1

      Not sure where I said otherwise.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  3. Wait you mean an ASIC is fast? Why I never! by Sycraft-fu · · Score: 5, Informative

    Man is this a "duh" moment. Purpose built ASICs are extremely fast and low power for what they accomplish. That's why we use them. Look at a small desktop network switch: Little tiny processor that can pass 16 Gb/sec of traffic around. Try to put 8 NICs in a computer and have it switch traffic and you'll be amazed at how much power you need. The reason the switch is small is it is purpose built: Its ASIC does nothing but switch Ethernet packets.

    Same deal with some things on a CPU. You find that decoding an AVC video stream takes next to no CPU power on modern CPUs, yet decoding an MPEG-2 video takes some. Why? Because they have a small bit of dedicated logic for AVC decoding (usually some other formats too). It is low power because it is dedicated.

    Always the question in designing a system is flexibility and unit cost vs fixed function and up front cost. A CPU is great because it can do anything, and you can just buy them straight out, tons of companies have them available for purchase right now. However they take a lot of silicon and power to perform a given task. An ASIC takes a bunch of up front money to design and do a manufacturing run, but is very small and efficient, however it can't be reconfigured to do anything else and needs a full respin. In the middle there is something like an FPGA. Which one is right for an application just depends on the balance of a lot of factors.

    1. Re:Wait you mean an ASIC is fast? Why I never! by chispito · · Score: 3, Insightful

      An ASIC takes a bunch of up front money to design and do a manufacturing run, but is very small and efficient, however it can't be reconfigured to do anything else and needs a full respin.

      Per TFA, the chips they designed are flexible enough to apply to new machine learning models. I think the point is that this was a space ripe for customized architecture, like graphics cards were 15-20 years ago.

      --
      The Daddy casts sleep on the Baby. The Baby resists!
    2. Re:Wait you mean an ASIC is fast? Why I never! by Nukenbar · · Score: 2

      There is a reason that all of the Bitcoin miners are ASIC based now.

      Don't expect those machines to be able to do anything else though if bitcoin dies off.

    3. Re:Wait you mean an ASIC is fast? Why I never! by thegarbz · · Score: 1

      Purpose built ASICs are extremely fast and low power for what they accomplish

      And they have very specific algorithms. Certainly nothing traditionally resembling "machine learning".

    4. Re:Wait you mean an ASIC is fast? Why I never! by Anonymous Coward · · Score: 0

      I work in the field, and have looked at alternatives. Google isn't the first to do this.

      But it's important to understand that all these chips will work at new machine learning models, provided that they're fairly densely, fairly regularly connected neural networks. And that is a major limitation.

      Another peeve: these are not "Machine Learning" chips. Inference is what you do _after_ machine learning has completed. That's why Google gets away with low precision.
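
      A rough sketch of why inference can tolerate low precision: quantize the inputs of a dot product to 8-bit integers, accumulate exactly in integers, and rescale. The symmetric scaling scheme and the names `quantize` and `quantized_dot` are assumptions chosen for illustration, not a description of how the TPU does it.

```python
def quantize(values, bits=8):
    """Map floats to signed integers of the given width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    peak = max(abs(v) for v in values) or 1.0
    scale = peak / qmax
    return [round(v / scale) for v in values], scale


def quantized_dot(a, b, bits=8):
    """Dot product computed on quantized integers, then rescaled to a float."""
    qa, scale_a = quantize(a, bits)
    qb, scale_b = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulation is exact
    return acc * scale_a * scale_b


a = [0.5, -1.25, 2.0, 0.125]    # made-up activations
b = [1.0, 0.75, -0.5, 2.0]      # made-up weights
exact = sum(x * y for x, y in zip(a, b))
approx = quantized_dot(a, b)
print(f"exact = {exact:.4f}, 8-bit approx = {approx:.4f}")
```

      For these made-up values the 8-bit result lands within about one percent of the exact answer, which is typically good enough when the output only feeds a classification decision.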

  4. Some SV chip startups by Anonymous Coward · · Score: 0

    Just got a big bump in market valuation.

    1. Re: Some SV chip startups by Anonymous Coward · · Score: 0

      Such as?

  5. Say what... by __aaclcg7560 · · Score: 1

    I thought the TPU was for hard drive encryption. Or is it doing double duty?

    1. Re:Say what... by itsownreward · · Score: 4, Informative

      You're thinking of a TPM. This is a TPU.

    2. Re:Say what... by Anonymous Coward · · Score: 0

      TPM

    3. Re: Say what... by Anonymous Coward · · Score: 0

      TPM? TPU? Whatever, just remember your cover sheet.

    4. Re: Say what... by Anonymous Coward · · Score: 0

      Encryption needs large integer multiplication. Which is like GPU with a carry bit between multipliers.
      Division (modulus) is multiplication and subtractions
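
      The "carry between multipliers" idea can be sketched as schoolbook multiplication over fixed-width limbs. This is a minimal illustration assuming 16-bit limbs; the helper names `to_limbs`, `from_limbs` and `multiply_limbs` are hypothetical, and real crypto libraries use far more optimized routines.

```python
BASE_BITS = 16
BASE = 1 << BASE_BITS


def to_limbs(n):
    """Split a non-negative integer into little-endian 16-bit limbs."""
    limbs = []
    while n:
        limbs.append(n & (BASE - 1))
        n >>= BASE_BITS
    return limbs or [0]


def from_limbs(limbs):
    """Reassemble little-endian limbs into a single integer."""
    return sum(limb << (BASE_BITS * i) for i, limb in enumerate(limbs))


def multiply_limbs(a, b):
    """Schoolbook multiply: small products chained together by carries."""
    out = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            total = out[i + j] + x * y + carry
            out[i + j] = total & (BASE - 1)   # keep the low 16 bits
            carry = total >> BASE_BITS        # pass the rest to the next limb
        out[i + len(b)] += carry
    return out


p, q = 2**127 - 1, 2**89 - 1
product = from_limbs(multiply_limbs(to_limbs(p), to_limbs(q)))
print(product == p * q)
```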

    5. Re: Say what... by Anonymous Coward · · Score: 0

      Actually cover sheets are TPS, as in TPS reports.

      From the movie Office Space.

    6. Re: Say what... by Anonymous Coward · · Score: 0

      Whoosh...

  6. Performance bottleneck by speedplane · · Score: 0

    if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers

    The performance bottleneck in machine learning is training the system and the amount of training data, not the number of users running the model. Not sure I understand how usage is so directly proportional to computing costs.

    --
    Fast Federal Court and I.T.C. updates
    1. Re: Performance bottleneck by Anonymous Coward · · Score: 0

      That's true for small and medium scale applications. For Google scale applications it's the number of users running the model.

      There are over a billion Android phones, and loads of other voice-enabled devices like Chromebooks. To process those sorts of numbers requires serious resources.

    2. Re: Performance bottleneck by Anonymous Coward · · Score: 0

      But the training of neural networks is what requires significant resources. Utilization of a trained NN is simple enough that the process could be outsourced to the local device. At the very least, the brunt of the work could be client side, feeding only what is absolutely necessary to the server. So still not sure why this was necessary.

    3. Re: Performance bottleneck by aussie_a · · Score: 1

      It's not about training the neural network. It's about data mining and monetizing the customers. That's why everything Google phones home.

  7. But... by BobbyLovell · · Score: 0

    How does the TPU do on regular CPU and GPU type tasks? It's really an Apples to Oranges comparison either way.

    1. Re:But... by Anonymous Coward · · Score: 0

      It stares blankly.

    2. Re: But... by Anonymous Coward · · Score: 0

      Nicely put. Maybe a whoosh, but worth it.

  8. They really need to rename this stuff by Anonymous Coward · · Score: 0

    Algorithms and processor sets are not artificial intelligence and neural networks. Biggest fish story of the 21st century. Cool technology, horrendously disingenuous PR narrative.

    1. Re:They really need to rename this stuff by Anonymous Coward · · Score: 0

      How can you be sure? Is there really more to intelligence than that? Maybe it is just the same, but on a higher level, so we don't understand its complex functions.

  9. 15-30x the speed by fred6666 · · Score: 2

    But 1000x as expensive?

    1. Re:15-30x the speed by religionofpeas · · Score: 1

      Energy cost is lower, and those will be dominant over longer term.

  10. That's a funny word for ... by Anonymous Coward · · Score: 0

    outsourced foreign cheap labor!

  11. Does this add up (CPU vs GPU vs TPU?) by Anonymous Coward · · Score: 0

    Shouldn't the performance numbers between CPU and GPU be different enough that comparing a TPU to them would produce very different numbers?

    The performance improvement range seems to be too small, right?

    1. Re:Does this add up (CPU vs GPU vs TPU?) by bugs2squash · · Score: 1

      maybe it's a task that is not well suited to the GPU, so it performs little better than general purpose hardware.

      --
      Nullius in verba
  12. Sounds familiar by Anonymous Coward · · Score: 0

    Wasn't there some movie where they have these cool chips and then the robots take over the world and some dude has to go back in time to bang his friend's mother?

  13. Why link to an article that regurgitates the same by Anonymous Coward · · Score: 0

    Why link to an article that regurgitates the same information, when you could link to the actual blog post by Google: https://cloudplatform.googleblog.com/2017/04/quantifying-the-performance-of-the-TPU-our-first-machine-learning-chip.html

    And while we are at it, why not also link to the paper: https://drive.google.com/file/d/0Bx4hafXDDq2EMzRNcy1vSUxtcEk/view

  14. Thanks for letting me know I'm winning! by Anonymous Coward · · Score: 0

    See subject: You impersonate me & downmod my real posts & can't prove me wrong technically/validly so I've won, obviously!

    * LOL - & you do it via UNIDENTIFIABLE anonymous posts (also proving you're some BUTTHURT loser I've torn up so badly in technical debates on hosts files you're reduced to such BITCH tactics, lmao...)

    APK

    P.S.=> Thanks whoever you are impersonating me + doing so by UNIDENTIFIABLE cowardly trolling worm stalking/harassing tactics - you're just tipping your hand you can't get the better of me... apk

  15. Re: Yous fake APK by Anonymous Coward · · Score: 0

    So you're saying, no one blows harder than you?

  16. Prime directive! Bah! by kimgkimg · · Score: 1

    Oh good, so our dystopian future can be realized just that much faster then...

  17. amazing! by tommeke100 · · Score: 1

    but how many fps does it get running the new Mass Effect? Oh it can't?

    1. Re:amazing! by Anonymous Coward · · Score: 0

      CNN: Google's New Computer Racist and Sexist, Won't Run Games Developed by Diversity Hires

    2. Re:amazing! by Anonymous Coward · · Score: 0

      How many Bitcoins can it mine per day? That's what the Google Admins will be asking.

  18. No, they don't by fyngyrz · · Score: 1

    Algorithms and processor sets are not artificial intelligence and neural networks

    That's like saying a software defined radio is not a radio.

    It's right -- but it's also completely wrong.

    And the important part in the context here... yeah, the completely wrong part.

    You can create a perfectly fine neural network with a general purpose von Neumann or Harvard architecture CPU. Speed and efficiency are issues, that's all, and that's what the TPU is designed to address.
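
    To make that concrete, here is a minimal sketch of a neural network forward pass in plain Python on an ordinary CPU. The weights are hand-picked (not trained) so that a tiny two-layer ReLU network computes XOR; the names `dense`, `relu` and `xor_net` are made up for this example.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, inputs):
    """One fully connected layer: weights times inputs, plus bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


# Hand-picked parameters (an assumption for illustration, not learned):
# output = relu(x1 + x2) - 2 * relu(x1 + x2 - 1)
W1 = [[1.0, 1.0], [1.0, 1.0]]
B1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]
B2 = [0.0]


def xor_net(x1, x2):
    hidden = relu(dense(W1, B1, [x1, x2]))
    return dense(W2, B2, hidden)[0]


outputs = [xor_net(a, b) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

    Everything here is multiply-and-add, which is why the same work maps so naturally onto hardware built around large arrays of multipliers.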

    --
    I've fallen off your lawn, and I can't get up.
  19. Inherent stratification by fyngyrz · · Score: 1

    Energy cost is lower, and those will be dominant over longer term.

    This is likely another demonstration of "those who have the money, make more money."

    Solar panels: You can save all kinds of money. If you can afford to install the system in the first place.
    Investments: You can make all kinds of interest. If you have money to invest.
    Toilet paper: You can save lots of money. If you buy it in bunches on sale. But if you can't spare the funds... your TP costs more than the person with a few bucks to spare who buys it in bulk. Likewise has storage space for it, etc.

    And so on.

    --
    I've fallen off your lawn, and I can't get up.
  20. How about comparing vs the latest GPU and CPUs? by Anonymous Coward · · Score: 0

    Both the K80 and Haswell are a couple of generations old - I'd like to see the performance increase vs Pascal based GPU cards and whatever is the latest in the Intel camp.

  21. Machine language example? by Tablizer · · Score: 1

    What does the machine language for these things look like? Does anybody know of a bare-bones example to illustrate how it does a simple sample neural net? Is it only for the offset shifting kind of NN's common for language AI, or other kinds also?

  22. There, I fixed it for you. by JustNiz · · Score: 1

    >> Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs AT MACHINE LEARNING

    There, I fixed it for you.

    1. Re: There, I fixed it for you. by religionofpeas · · Score: 1

      Thanks for fixing, but it was obvious for everyone else.

  23. Make the web 100% faster vs. Google ads by Anonymous Coward · · Score: 0

    See subject & nothing does it as efficiently as APK Hosts File Engine 9.0++ SR-7 32/64-bit https://www.google.com/search?hl=en&source=hp&biw=&bih=&q=%22APK+Hosts+File+Engine%22+and+%22start64%22&btnG=Google+Search&gbv=1/

    Ads/script & malware rob speed/security/privacy

    Hosts add speed (via hardcodes/adblocks), security (vs. bad sites/malware/poisoned dns), reliability (vs. dns down), & anonymity (vs. dns requestlogs/trackers).

    Less power/cpu/ram + IO use vs. DNS/routers/addons/antivirus + less security bugs/complexity & faster vs. addons/routers/remote dns!

    Avoids DNSChangers in routers/IP settings & dns redirects (99.999% of ISP DNS != patched vs. it) + lightens DNS load & resolves faster from local system RAM!

    * Via what u NATIVELY have in the IP stack in FASTER kernelmode!

    APK

    P.S. - Safe https://www.virustotal.com/en/file/e01211ca36aa02e923f20adee0a3c4f5d5187dc65bdf1c997b3da3c2b0745425/analysis/1433430542/

  24. EAT YOUR WORDS, blowhard (lol) by Anonymous Coward · · Score: 0

    See subject: Many like + use my work (quoted): I'm going to continue using the Host File Engine. Your software is well written, functional. The Host File Engine performs exactly as promised by mmell

    his hosts program is actually pretty good by xenotransplant

    I've never tried to belittle (APK's) work, I've flat out said it's good by BronsCon

    APK is kinda right. I've tried his hosts file generating software. It works by bmo

    I like your host file system by Karmashock

    I find your hosts file admirable by vel-ex-tech

    his hosts tool is actually useful for those cases in which one does indeed want to locally block stuff outright while consuming minimum system resources by alexgieg

    * Recommended & hosted by Malwarebytes' hpHosts!

    APK

    P.S.=> No one's as big a BLOWHARD as you UNIDENTIFIABLE cowardly troll! Thanks 4 proving I'm winning when all u have's stalking me & downmodding my posts yet not proving me technically wrong... apk

  25. Will the chip be available to non-Googlers? by wisebabo · · Score: 1

    (Disclaimer, not an AI or machine learning expert but interested in learning!)

    So will this chip (or board) be available outside of Google? I've heard they've released (some of) their AI/Machine learning code, would be good if once you made a working application you could buy one of these things and speed it up. Would be especially useful for applications where access to the cloud was unavailable or intermittent at best (think self driving cars, drones, spacecraft).

    I guess a PCI card that would go in a server would be best but maybe a dedicated peripheral could work.

    Any other companies working on similar hardware? Are there any standards, like Open GL for AI?

    1. Re:Will the chip be available to non-Googlers? by religionofpeas · · Score: 1

      Check out Tensorflow.

    2. Re:Will the chip be available to non-Googlers? by wisebabo · · Score: 1

      So they've open sourced the software? That's good, but no chip will be available?

    3. Re:Will the chip be available to non-Googlers? by Anonymous Coward · · Score: 0

      Anything is possible. If open sourcing or selling these AI chips helps promote growth of industry in an arena they are interested in, then they may very well make them available.

  26. A device specifically designed for one task... by Anonymous Coward · · Score: 0

    is faster at that task than a device designed for something else!

    THIS IS NEWS THAT MATTERS?!?!?

  27. 15x by Anonymous Coward · · Score: 0

    How much?

    1. Re:15x by trevc · · Score: 1

      not much

  28. Bob the SuperWeasel EATS HIS WORDS by Anonymous Coward · · Score: 0

    Your "YOUS" gave it away & how'd your words taste as you EAT THEM (trying to impersonate me Bob) https://politics.slashdot.org/comments.pl?sid=10458715&cid=54192877/ ?

    * Bit like your FOOT IN YOUR MOUTH ramming them back down your chicken-neck throat & washing them down w/ the bitter taste of SELF-defeat? Yes... lol!

    APK

    P.S.=> Eating your words isn't GOOD nutrition Bob the superWEASEL (hopefully you'll die of malnutrition)... apk