Slashdot Mirror


Google Announces 8x Faster TPU 3.0 For AI, Machine Learning (extremetech.com)

At its developer conference yesterday, Google announced third-generation TPUs (Tensor Processing Units) for AI and machine learning, which are eight times more powerful than the Google TPU 2.0 pods with up to 100 petaflops in performance. They're so power-hungry that they require water cooling -- something previous TPUs haven't required. ExtremeTech reports: So what do we know about TPU 3.0? Not much -- but we can make a few educated guesses. According to Google's own documentation, TPU 1.0 was built on a 28nm process node at TSMC, clocked at 700MHz, and consumed 40W of power. Each TPU PCB connected via PCIe 3.0 x16. TPU 2.0 made some significant changes. Unlike TPU v1, which could only handle 8-bit integer operations, Google added support for single-precision floats in TPU v2 and added 8GB of HBM memory to each TPU to improve performance. A TPU cluster consists of 180 TFLOPS of total computational power, 64GB of HBM memory, and 2,400GB/s of memory bandwidth in total (the last thrown in purely for the purposes of making PC enthusiasts moan with envy).

No word yet on other advanced capabilities of the processors, and they are supposedly still for Google's own use, rather than wider adoption. Pichai claims TPU v3 can handle 100 PFLOPS, but that has to be the clustered variant, unless Google is also rolling out a new tentative project we'll call "Google Stellar-Equivalent Thermal Density." We would've expected to hear about it, if that was the case. As more companies flock to the AI / ML banner, expect to see more firms throwing their hats into this proverbial ring.
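
The scale gap the summary points at can be checked with a few lines of arithmetic. This is a rough sketch that uses only the figures quoted above; the constant and function names are made up for illustration, and the result says nothing about how Google actually builds a pod.

```python
# Figures quoted in the summary above; every name here is illustrative.
TPU_V3_POD_PFLOPS = 100.0      # "up to 100 petaflops"
SPEEDUP_OVER_V2_POD = 8.0      # "eight times more powerful"
TPU_V2_CLUSTER_TFLOPS = 180.0  # "180 TFLOPS of total computational power"

TFLOPS_PER_PFLOPS = 1000.0


def implied_v2_pod_pflops():
    """Throughput of a TPU 2.0 pod implied by the 8x claim."""
    return TPU_V3_POD_PFLOPS / SPEEDUP_OVER_V2_POD


def v2_clusters_per_v3_pod():
    """How many 180 TFLOPS clusters add up to the 100 PFLOPS figure."""
    return TPU_V3_POD_PFLOPS * TFLOPS_PER_PFLOPS / TPU_V2_CLUSTER_TFLOPS


if __name__ == "__main__":
    print(implied_v2_pod_pflops())
    print(round(v2_clusters_per_v3_pod()))
```

Under these numbers, the 100 PFLOPS claim is several hundred times the 180 TFLOPS quoted for a single TPU 2.0 cluster, which is why it only makes sense as a figure for a clustered system.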

27 comments

  1. HBM memory by Anonymous Coward · · Score: 2, Informative

    High Bandwidth Memory memory

    1. Re:HBM memory by Anonymous Coward · · Score: 0

      How high bandwidth? That PCIe channel seems like a very tiny straw.

    2. Re:HBM memory by Anonymous Coward · · Score: 0

      100 PFLOPS? So it's like 10 1080 GTX Ti cards? Doesn't actually sound *that* impressive. Especially when commodity hardware will be *orders of magnitude* cheaper.

    3. Re:HBM memory by Anonymous Coward · · Score: 0

      Have you seen this Slashdot video yet? Have you bought the family friendly Goat C shirt?

      - FatCashewsLoveMe

    4. Re:HBM memory by Anonymous Coward · · Score: 0

      100 PFLOPS? So it's like 10 1080 GTX Ti cards? Doesn't actually sound *that* impressive. Especially when commodity hardware will be *orders of magnitude* cheaper.

      Uhm, 10 1080 GTX Ti cards is about 100 TFlops. This is 100 PFlops. That's 1000x as much. You'd need 10000 1080 GTX Ti cards, and good luck finding a networking solution that would make those 10000 cards come together in a useful way.
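
For anyone who wants to redo the unit conversion in the reply above, here is a small sketch. It takes the reply's own estimate of roughly 10 TFLOPS per card (10 cards at about 100 TFLOPS) as an assumption, not as a measured figure, and `cards_needed` is a hypothetical helper name.

```python
import math

TFLOPS_PER_PFLOPS = 1000  # SI prefixes: 1 peta = 1000 tera


def cards_needed(target_pflops, tflops_per_card):
    """Number of cards needed to match a PFLOPS target on paper.

    Ignores interconnect, memory and precision differences; it only
    compares headline throughput numbers.
    """
    target_tflops = target_pflops * TFLOPS_PER_PFLOPS
    return math.ceil(target_tflops / tflops_per_card)


if __name__ == "__main__":
    # Assumption taken from the comment: about 10 TFLOPS per card.
    print(cards_needed(target_pflops=100, tflops_per_card=10))
```

With those inputs the count comes out at 10000 cards, matching the reply, and that is before any networking overhead.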

  2. Oh Noes AI will take our jerbs by Anonymous Coward · · Score: 0

    The google AI's are coming for our jerbs.

  3. Microsoft's Approach Differs by lazarus · · Score: 3, Interesting

    In this particular case they seem to be bucking the silicon trend:

    "At its annual Build conference Monday, Microsoft will suggest companies with big AI ambitions should steer clear of chips like Google’s. It says machine learning is evolving so fast that it doesn’t make sense to burn today’s ideas permanently into silicon chips that could soon prove limiting or obsolete."

    --
    I am not interested in articles about life extension advancements.
    1. Re:Microsoft's Approach Differs by Anonymous Coward · · Score: 1

      *cough* 80x86 *cough*

      "Microsoft will suggest companies with high-performance ambitions should steer clear [wired.com] of chips like Intel’s. It says high-performance computing is evolving so fast that it doesn’t make sense to burn today’s ideas permanently into silicon chips that could soon prove limiting or obsolete."

    2. Re:Microsoft's Approach Differs by drinkypoo · · Score: 3, Interesting

      In this particular case they seem to be bucking the silicon trend:

      They're not bucking anything. They're stumping for Azure. They want people to do their AI in the cloud, because there's a chance they'll do it in Microsoft's cloud.

      In any case, they're also wrong. If you want to do AI without the cloud, and you need high performance, you need specialized hardware. If you have a concept which can lead to a product right now, and it needs to work without the cloud, then you probably need this hardware (or something like it) right now.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:Microsoft's Approach Differs by mikael · · Score: 1

      For their business interests they are right. They can't pull AI back onto the desktop or the mobile device if the algorithms are locked into custom instruction sets or languages. For the customer, they get a higher performance/price ratio with custom cloud hardware.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    4. Re:Microsoft's Approach Differs by religionofpeas · · Score: 1

      Microsoft also tried to do music players and phones differently.

    5. Re:Microsoft's Approach Differs by phantomfive · · Score: 1

      TensorFlow is generic enough that it will likely be around for at least a decade, and these particular chips will be obsolete by new technology long before then. It's a sweet framework.

      Also, I don't know if it's accurate to say machine learning is evolving quickly... it would be more accurate to say researchers are exploring the solution space that recently became accessible as a result of recent increases in processing power. To really make an evolution we'd have to figure out how to break out of that solution space (I have some ideas on how to do that, but ideas are cheap).

      --
      "First they came for the slanderers and i said nothing."
    6. Re:Microsoft's Approach Differs by Rockoon · · Score: 1

      Are they right? It depends what you are looking for.

      The solution Google is offering is actually overly generic to the point of needing large die areas to solve anything useful in a reasonable amount of time. Google's example isn't one of efficiency. They are still throwing large racks of silicon at the same problems, and any honest comparison is surely going to include a discussion of cost per solution.

      --
      "His name was James Damore."
    7. Re:Microsoft's Approach Differs by religionofpeas · · Score: 1

      Microsoft's solution uses FPGAs. That's even more generic and inefficient. Neural net processing requires fast multiplications as well as memory access, two areas in which FPGAs are not particularly good.

  4. AI is the new bitcoin by Anonymous Coward · · Score: 0

    Today's fad is an old fad in new skin.

  5. But who cares by Anonymous Coward · · Score: 0

    "...still for Google's own use, rather than wider adoption"

    If none of us can use them then who cares how much faster they are than the previous version that we also couldn't use?

    1. Re: But who cares by Anonymous Coward · · Score: 0

      Who cares? Google recruiting team.

  6. Black Hole A.I. by Anonymous Coward · · Score: 0

    Water cooled huh? I'd be more impressed if it bent the time-space continuum.

    1. Re:Black Hole A.I. by Anonymous Coward · · Score: 0

      and if it used Push Technology.

  7. Ah yyiyeuh! 20 Gooble Niggas have SHitted to this by Anonymous Coward · · Score: 0

    Dayum. Theez fuccin gooble nigga did TPU 8x. WHo did dat?

    All they JAvascirpt have shittted rite into 2 this mufffuccin AI shiit

  8. 28nm?! by DontBeAMoran · · Score: 1

    What is this, 2011?

    --
    #DeleteFacebook
    1. Re:28nm?! by religionofpeas · · Score: 1

      The 28nm chip was the first TPU, not this latest version.

  9. And still not actually available on GCP, AWS wins by Anonymous Coward · · Score: 0

    I checked a few months back and a year after the original announcement TPU 2.0 is available in low quantities to a few GCP customers in a single region. In the meantime AWS provides actual P3 instances in multiple of their regions. Sure they are not as fast or 'cost efficient' but by being available, they are making money for Amazon. Sometimes I'm wondering if GCP is only there to annoy Amazon rather than be turned into a real product. At least it helps drive AWS prices down so both GCP and Azure are useful to me in a way.

  10. Compared to Top500 Supercomputers by chadkennedyonline · · Score: 2

    So at the 100 PFLOPS stated in the article, this thing ties with the world's top supercomputer (https://en.wikipedia.org/wiki/TOP500#Top_10_ranking)? That's pretty nuts.

    1. Re:Compared to Top500 Supercomputers by slew · · Score: 1

      So at the 100 PFLOPS stated in the article, this thing ties with the world's top supercomputer (https://en.wikipedia.org/wiki/TOP500#Top_10_ranking)?

      That's pretty nuts.

      Actually, this is 100 P-DL-FLOPS (DL = deep learning, meaning 8-bit with shared exponent). Although the second generation (and presumably third gen) TPU can also do 16-bit floating point (and maybe FP32) for training, the quoted (i.e., not-to-be-exceeded) number is the deep learning flops for inference/recall...

      In contrast, a typical supercomputer generally describes its performance for IEEE 64-bit double precision floating point (FP64).

      No doubt the later generations of TPUs will support some reasonable level of performance of 16-bit (and maybe 32-bit) FP, but not likely at the peak rate for 8-bits.

      I also doubt they would even bother to support FP64 on a deep learning chip since FP64 is mostly used for discrete time dynamical simulations and other forms of finite element analysis where you want to limit error accumulation due to precision issues.
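
The point about error accumulation can be seen in a toy experiment. This is only a sketch: it emulates IEEE half, single and double precision by round-tripping values through Python's `struct` module, and it is not a model of any TPU number format (the 8-bit shared-exponent format mentioned above is not emulated). The helper names `round_to` and `accumulate` are made up for illustration.

```python
import struct

# struct format codes: 'e' = IEEE half (16-bit), 'f' = single, 'd' = double
FORMATS = {"fp16": "e", "fp32": "f", "fp64": "d"}


def round_to(fmt, x):
    """Round a Python float to the nearest value representable in fmt."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def accumulate(fmt, step, n):
    """Add step to a running total n times, rounding after every add.

    This mimics a long-running simulation loop in which each update is
    stored back at the working precision.
    """
    step = round_to(fmt, step)
    acc = 0.0
    for _ in range(n):
        acc = round_to(fmt, acc + step)
    return acc


if __name__ == "__main__":
    exact = 1000.0  # 10000 steps of 0.1
    for name, fmt in FORMATS.items():
        total = accumulate(fmt, 0.1, 10000)
        print(name, total, "abs error:", abs(total - exact))
```

The 16-bit total stops growing once the step becomes smaller than half the spacing between representable values, while the 64-bit total stays very close to 1000. That drift is tolerable for many neural-net workloads, but it is the kind of error a long simulation tries to avoid.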

  11. Yes But... by Anonymous Coward · · Score: 0

    How fast do they calculate TPS reports? If they calculate the TPS reports 8x faster, "that would be great"!

    https://makeameme.org/meme/Ummm-yeah-Hows