Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs (pcworld.com)
Four years ago, Google was faced with a conundrum: if all its users hit its voice recognition services for three minutes a day, the company would need to double the number of data centers just to handle all of the requests to the machine learning system powering those services, reads a PCWorld article, which describes how the Tensor Processing Unit (TPU), a chip designed to accelerate the inference stage of deep neural networks, came into being. The article shares an update: Google published a paper on Wednesday laying out the performance gains the company saw over comparable CPUs and GPUs, both in terms of raw power and the performance per watt of power consumed. A TPU was on average 15 to 30 times faster at the machine learning inference tasks tested than a comparable server-class Intel Haswell CPU or Nvidia K80 GPU. Importantly, the performance per watt of the TPU was 25 to 80 times better than what Google found with the CPU and GPU.
Welcome our new Google overlords. (or whatever...)
"I say we take off, nuke the site from orbit. It's the only way to be sure."
outperforms general purpose chips?
Wow.
Man is this a "duh" moment. Purpose-built ASICs are extremely fast and low power for what they accomplish. That's why we use them. Look at a small desktop network switch: a tiny little processor that can pass 16 Gb/sec of traffic around. Try to put 8 NICs in a computer and have it switch traffic, and you'll be amazed at how much power you need. The reason the switch is small is that it is purpose built: its ASIC does nothing but switch Ethernet packets.
Same deal with some things on a CPU. You find that decoding an AVC video stream takes next to no CPU power on modern CPUs, yet decoding an MPEG-2 video takes some. Why? Because they have a small bit of dedicated logic for AVC decoding (usually some other formats too). It is low power because it is dedicated.
The question in designing a system is always flexibility and unit cost vs. fixed function and up-front cost. A CPU is great because it can do anything, and you can just buy one straight out; tons of companies have them available for purchase right now. However, CPUs take a lot of silicon and power to perform a given task. An ASIC takes a bunch of up-front money to design and do a manufacturing run, but is very small and efficient. However, it can't be reconfigured to do anything else; any change needs a full respin. In the middle there is something like an FPGA. Which one is right for an application just depends on the balance of a lot of factors.
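To put rough numbers on that trade-off, here is a minimal sketch of a break-even calculation. Everything in it is an assumption for illustration: the function name `break_even_units` and all the dollar figures are made up, and "unit cost" is meant loosely (it could fold in the lifetime power bill per unit). It only shows the shape of the reasoning, not real chip economics.

```python
import math


def break_even_units(asic_upfront_cost, asic_unit_cost, cpu_unit_cost):
    """Return the smallest unit count at which the ASIC's total cost
    (up-front design cost plus per-unit cost) is no higher than the
    total cost of buying off-the-shelf CPUs.

    Returns None if the ASIC never catches up, i.e. it is not cheaper
    per unit than the CPU.
    """
    savings_per_unit = cpu_unit_cost - asic_unit_cost
    if savings_per_unit <= 0:
        return None
    return math.ceil(asic_upfront_cost / savings_per_unit)


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
units = break_even_units(
    asic_upfront_cost=5_000_000,  # design + first manufacturing run (assumed)
    asic_unit_cost=50,            # per-chip cost once the design exists (assumed)
    cpu_unit_cost=300,            # per-chip cost of a general purpose part (assumed)
)
print(units)  # 20000
```

Below that volume the general purpose part wins on total cost; above it the up-front money has paid for itself. A company deploying across whole data centers is far past that kind of threshold, while a one-off project usually is not.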
I thought the TPU was for hard drive encryption. Or is it doing double duty?
But 1000x as expensive?
Oh good, so our dystopian future can be realized just that much faster then...
but how many fps does it get running the new Mass Effect? Oh it can't?
That's like saying a software defined radio is not a radio.
It's right -- but it's also completely wrong.
And the important part in the context here... yeah, the completely wrong part.
You can create a perfectly fine neural network with a general purpose von Neumann or Harvard architecture CPU. Speed and efficiency are issues, that's all, and that's what the TPU is designed to address.
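As a minimal sketch of that point, here is a tiny neural network doing inference in plain Python on an ordinary CPU, with no special hardware or libraries. The weights are hand-picked (not trained) so that the network computes XOR, and the names `dense`, `relu`, and `predict` are just illustrative. It says nothing about how a TPU is programmed; it only shows that the work is multiply-and-add arithmetic any CPU can do, just slowly at scale.

```python
def dense(inputs, weights, biases):
    """One fully connected layer.

    weights[i][j] connects input i to output j, so
    out[j] = sum_i(inputs[i] * weights[i][j]) + biases[j].
    """
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def relu(values):
    """Rectified linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]


# Hand-picked weights (an assumption for illustration, not trained values).
# Hidden layer: h1 = relu(a + b), h2 = relu(a + b - 1)
W1 = [[1.0, 1.0], [1.0, 1.0]]
B1 = [0.0, -1.0]
# Output layer: out = h1 - 2 * h2
W2 = [[1.0], [-2.0]]
B2 = [0.0]


def predict(a, b):
    """Run the two-layer network forward for inputs a and b."""
    hidden = relu(dense([a, b], W1, B1))
    return dense(hidden, W2, B2)[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, predict(a, b))
```

A production network does the same thing with millions of weights per layer, which is where the multiply-accumulate throughput of dedicated hardware starts to matter.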
I've fallen off your lawn, and I can't get up.
This is likely another demonstration of "those who have the money, make more money."
Solar panels: You can save all kinds of money. If you can afford to install the system in the first place.
Investments: You can make all kinds of interest. If you have money to invest.
Toilet paper: You can save lots of money. If you buy it in bunches on sale. But if you can't spare the funds... your TP costs more than it does for the person with a few bucks to spare who buys it in bulk. That person also has storage space for it, etc.
And so on.
It's not about training the neural network. It's about data mining and monetizing the customers. That's why everything Google phones home.
What does the machine language for these things look like? Does anybody know of a bare-bones example to illustrate how it does a simple sample neural net? Is it only for the offset shifting kind of NNs common for language AI, or other kinds also?
Table-ized A.I.
>> Google's Custom Machine Learning Chips Are 15-30x Faster Than GPUs and CPUs AT MACHINE LEARNING
There, I fixed it for you.
maybe it's a task that is not well suited to the GPU, so it performs little better than general purpose hardware.
Nullius in verba
(Disclaimer, not an AI or machine learning expert but interested in learning!)
So will this chip (or board) be available outside of Google? I've heard they've released (some of) their AI/machine learning code. It would be good if, once you made a working application, you could buy one of these things and speed it up. It would be especially useful for applications where access to the cloud is unavailable or intermittent at best (think self-driving cars, drones, spacecraft).
I guess a PCI card that would go in a server would be best, but maybe a dedicated peripheral could work.
Any other companies working on similar hardware? Are there any standards, like OpenGL for AI?
not much