NVIDIA Creates a 15B-Transistor Chip With 16GB Bandwidth Memory For Deep Learning (venturebeat.com)
An anonymous reader cites a report on VentureBeat: NVIDIA chief executive Jen-Hsun Huang announced that the company has created a new chip, the Tesla P100, with 15 billion transistors, 16GB high-bandwidth memory for deep-learning computing. It's the biggest chip ever made, Huang said. "We decided to go all-in on A.I.," Huang said. "This is the largest FinFET chip that has ever been done." The chip has 15 billion transistors, or three times as much as many processors or graphics chips on the market. It takes up 600 square millimeters. The chip can run at 21.2 teraflops. Huang said that several thousand engineers worked on it for years. Jim McGregor, writing for Forbes (the link is not accessible to ad-blocking tool users): It features NVIDIA's new Pascal GPU architecture, the latest memory and semiconductor process, and packaging technology -- all to create the densest compute platform to date. In addition, it combines 16GB of die stacked second-generation High-Bandwidth Memory (HBM2). The memory and GPU are combined into a multichip module on a state-of-the-art silicon substrate. The P100 has NVIDIA's NVLink interface technology to connect to multiple Tesla P100 GPU modules.
Please enjoy hunting me with your time machine.
can it make me a sandwich ?
Non-Linux Penguins ?
Maybe you should use some of those engineers to fix your drivers, to, you know, support the people that have already paid you for a product you already produce. Seems that deep learning tech hasn't taught you anything. Spoken as an owner of a GTX 660 SLI setup, not some rabid other team fanboi.
I'm always amazed how it takes so many engineers. What the heck do they all do? How does one organize this many contributions? Isn't this sort of thing highly automated, with largely repetitive subunits?
Some drink at the fountain of knowledge. Others just gargle.
This should provide some astonishing porn.
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
In the Tesla's firmware http://jalopnik.com/a-hacker-m... That would be interesting if it was a chip reference and not a car reference --- tinfoil hat.
From what my friends who work at nVidia tell me, most engineers work on all projects. They get sent problems from one GPU, after fixing that, start working on issues from a CPU or some other project.
16GigaBillion
Since they are talking about bandwidth, I would guess that what they really mean is 16 GB/s. Although I don't see any reference to bandwidth in the article and the only reference I see to 16 is the 16 nm fabrication process.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
The chip has 15B transistors and no RAM. (it does have 4MB L2 cache and 14MB worth of register files)
The entire Tesla P100 package comprises many chips, not just the GPU, which collectively add up to over 150 billion transistors, and it features 16GB of stacked HBM2 VRAM.
Yes, it can do 21.2 teraflops... at FP16, half precision. It can "only" do 10.6 teraflops at single precision and 5.3 teraflops at double (FP64) precision. Also, it doesn't have the 1TB/sec HBM2 memory speed that was advertised for months, but only 720GB/sec.
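As a rough check of the precision figures, here is a minimal sketch that assumes peak throughput simply doubles each time the operand width is halved. That scaling rule and the variable names are illustrative assumptions, not something taken from NVIDIA's documentation; the 5.3 teraflops and the bandwidth numbers are the ones quoted in this comment.

```python
# Double-precision peak as quoted in this thread (teraflops).
fp64_tflops = 5.3

# Assumption: peak throughput doubles each time the operand width is halved.
fp32_tflops = 2 * fp64_tflops
fp16_tflops = 2 * fp32_tflops

print(f"FP64: {fp64_tflops:.1f}  FP32: {fp32_tflops:.1f}  FP16: {fp16_tflops:.1f}")

# Memory bandwidth: shipped figure versus the 1TB/sec figure mentioned above.
shipped_gb_per_s = 720
advertised_gb_per_s = 1000
bandwidth_fraction = shipped_gb_per_s / advertised_gb_per_s
print(f"Shipped bandwidth is {bandwidth_fraction:.0%} of the advertised figure")
```

Under that assumption the half-precision figure lands on the 21.2 teraflops given in the summary, which is why the headline number only applies to FP16 work.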
Just imagine a beowulf clust........
Oh, never mind.....
If telephones are outlawed, then only outlaws will have telephones.
Isn't that a little close to the other Tesla? http://jalopnik.com/a-hacker-m...
But can it run Crysis?
Yet my learning ain't so deep.
It's a FinFET device. You can represent more than 1 binary bit per transistor by using multi-gate transistors.
This is not a factually correct statement. Multi-gate transistors are used because they are more energy-efficient, perform better, and can be scaled to smaller dimensions than traditional planar CMOS devices. The extra gates give better electrostatic control over the MOSFET channel, but they do not allow the device to perform operations on more than one bit of data at once.
https://en.wikipedia.org/wiki/Multigate_device
I'm still waiting for the game AI that can match a human brain for strategy in an open world, especially in an RPG game but anything beyond well-studied board games really. It's so frustrating to have the computer win by cheating. Not to mention implications for new expert systems. This technology can't mature soon enough.
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
Oh, for God's sake, I ignored this at first but now it's been modded up.
15 billion is the transistor count for the GPU logic. It's not the transistor count for the HBM2 memory installed alongside the GPU on the interposer. Adding an interposer does not suffice to make it all the same chip (hint from TFS: "multichip module").
FinFET is neither necessary nor sufficient for multi-level-cell-like bit representation. That's also a flash storage technology, not a logic or volatile memory technology (at least in mass produced products).
It's 15 days to Weed Day. Put down whatever you're smoking and get back to work.
Hmmm... NVIDIA. Giant chip.
Bill Dally, are you going for a "jump approximate" instruction again?
Is this right?? 2' x 2' chip?
Note it's 20 Tflops at half precision. Single is 10, and double is 5.
Deep learning leads to Deep Thought leads to forty two.
I can address 16GByte with only 34 bits. That leaves more than 14.9B transistors to do whatever.
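For reference, the address width for 16GB of byte-addressable memory can be computed directly. This is a minimal sketch, assuming binary gigabytes (2**30 bytes each) and one address per byte; the variable names are made up for illustration.

```python
# Assumption: "16GB" means 16 * 2**30 bytes, addressed one byte at a time.
memory_bytes = 16 * 2**30

# The highest address is memory_bytes - 1; its bit length is the address width.
address_bits = (memory_bytes - 1).bit_length()

print(address_bits)  # 34
```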
“Common sense is not so common.” — Voltaire
https://www.youtube.com/watch?...
https://www.youtube.com/c/BrendaEM
I've seen nothing that would indicate either way whether this could be the truth. It's pure speculation. Others have speculated that to reach that 15 billion number they have to be counting the memory transistors as well. Though this is big at 600mm2, it isn't that much bigger than previous dies that held a fraction of that number of transistors.
How do you make 16 GB memory from 15 G transistors?
Excuse me, but please get off my Pennisetum Clandestinum, eh!
It has about 50% more transistors than the Oracle SPARC M7 at 10.2 billion, so the increase is reasonable.
So can it run 600 km on a single charge?
You do realize that 15 billion transistors, if you assume that each holds one bit of stored information (HA!), is less than 2GB of storage?
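The arithmetic behind that, as a minimal sketch. It assumes decimal gigabytes (10**9 bytes) and takes the deliberately generous one-bit-per-transistor assumption from the comment at face value; the names are illustrative.

```python
TRANSISTORS = 15_000_000_000  # transistor count quoted in the summary
BITS_PER_BYTE = 8
GB = 10**9                    # assumption: decimal gigabytes

# Generous assumption: every transistor stores exactly one bit.
bytes_at_one_bit_each = TRANSISTORS // BITS_PER_BYTE
print(bytes_at_one_bit_each / GB)  # 1.875

# Going the other way: how many bits does 16GB actually need?
bits_needed = 16 * GB * BITS_PER_BYTE
shortfall = bits_needed / TRANSISTORS
print(f"16GB needs {shortfall:.1f}x more bits than there are transistors")
```

So even at one bit per transistor, the 15 billion transistors fall short of 16GB by a factor of more than eight, which is consistent with the memory sitting on separate dies.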
BTW, it's not speculation, it's from NVIDIA's own press release.
An aside from the article: "Huang showed a demo from Facebook that used deep learning to train a neural network how to recognize a landscape painting. They then used the network to create its own landscape painting."
So long for such jobs... How about deep learning about post-scarcity economics?
https://en.wikipedia.org/wiki/...
https://en.wikipedia.org/wiki/...
Also: "Our strategy is to accelerate deep learning everywhere," Huang said.
How about some deep learning about morality? Imagine training children (or child-like AIs) in skills like weapons use without training them in morality, kindness, cooperation, and so on... How would that end?
See also:
http://www.child-soldiers.org/
"Child Soldiers International is an international human rights research and advocacy organisation. We seek to end the military recruitment and the use in hostilities, in any capacity, of any person under the age of 18 by state armed forces or non-state armed groups. We advocate for the release of unlawfully recruited children, promote their successful reintegration into civilian life, and call for accountability for those who unlawfully recruit or use them."
Maybe AIs should not be asked to replace humans until they have been around for at least eighteen years?
http://www.rfreitas.com/Astro/...
https://en.wikipedia.org/wiki/...
http://www.amazon.com/The-Chro...
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
There are two sets of memory on this chip if you read the reports. The die itself has an additional layer that is HBM (High Bandwidth Memory) linked directly to the CPU. Think of it like the L1 and L2 cache in x86 chips. There is nothing that indicates how much memory this is (as the quoted memory sizes are for the memory chips attached to the boards). I'm willing to bet the chip has around 10 billion transistors and the remaining 5 billion are the HBM layer that sits on top.
There are only two sets of memory if you consider the register file to be memory instead of cache (which you apparently do). The problem is, the published specifications demonstrate that you are simply wrong.
4MB of L2 cache and ~14MB of register file space per GPU means that there are about 151 million bits associated with cache and "memory." On a chip with 15.3 billion transistors, that comfortably means that you have about 15 billion transistors for GPU logic.
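The bit count above can be reproduced with a few lines. This is a minimal sketch, assuming binary megabytes (2**20 bytes); the six-transistors-per-bit figure is a hypothetical stand-in for a classic 6T SRAM cell and is not taken from NVIDIA's specifications.

```python
MIB = 2**20  # assumption: "MB" here means 2**20 bytes

l2_cache_bytes = 4 * MIB
register_file_bytes = 14 * MIB
on_chip_bits = (l2_cache_bytes + register_file_bytes) * 8
print(on_chip_bits)  # 150994944, i.e. about 151 million bits

TOTAL_TRANSISTORS = 15_300_000_000  # figure quoted in the comment above

# Hypothetical: 6 transistors per stored bit (classic 6T SRAM cell).
# Real register files and caches may use different cell designs.
TRANSISTORS_PER_BIT = 6
storage_transistors = on_chip_bits * TRANSISTORS_PER_BIT
remaining = TOTAL_TRANSISTORS - storage_transistors

print(f"Storage: {storage_transistors / 1e9:.2f}B, everything else: {remaining / 1e9:.2f}B")
```

Even with that per-bit overhead, on-chip storage accounts for under a billion transistors, so the large majority of the count is left for everything else on the die.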
There is everything to indicate the specs of the chip, and you've lost your bet. Now go away.
:P