NVIDIA Creates a 15B-Transistor Chip With 16GB Bandwidth Memory For Deep Learning (venturebeat.com)

Posted by msmash on Tuesday April 5, 2016 @07:01AM from the affinity-for-transistors dept.

An anonymous reader cites a report on VentureBeat: NVIDIA chief executive Jen-Hsun Huang announced that the company has created a new chip, the Tesla P100, with 15 billion transistors and 16GB of high-bandwidth memory for deep-learning computing. It's the biggest chip ever made, Huang said. "We decided to go all-in on A.I.," Huang said. "This is the largest FinFET chip that has ever been done." The chip has 15 billion transistors, or three times as many as in many processors or graphics chips on the market. It takes up 600 square millimeters. The chip can run at 21.2 teraflops. Huang said that several thousand engineers worked on it for years. Jim McGregor, writing for Forbes (the link is not accessible to ad-blocking tool users): It features NVIDIA's new Pascal GPU architecture, the latest memory and semiconductor process, and packaging technology -- all to create the densest compute platform to date. In addition, it combines 16GB of die-stacked second-generation High-Bandwidth Memory (HBM2). The memory and GPU are combined into a multichip module on a state-of-the-art silicon substrate. The P100 has NVIDIA's NVLink interface technology to connect to multiple Tesla P100 GPU modules.

Welcome SkyNET overlords! by Anonymous Coward · 2016-04-05 07:06 · Score: 3, Funny

Please enjoy hunting me with your time machine.
1. Re:Welcome SkyNET overlords! by ShanghaiBill · 2016-04-05 07:46 · Score: 1
  
  Please enjoy hunting me with your time machine.
  https://xkcd.com/652.
2. Re:Welcome SkyNET overlords! by U2xhc2hkb3QgU3Vja3M · 2016-04-05 09:51 · Score: 1
  
  (spoiler) Since the Doctor is Skynet, he already has a time machine.
So, by dargaud · 2016-04-05 07:08 · Score: 1

can it make me a sandwich ?

--
Non-Linux Penguins ?
1. Re:So, by Grog6 · 2016-04-05 10:47 · Score: 2
  
  Only if you have admin privileges, are a superuser, and enter the right password.
  Or say "sudo make me a sandwich".
  It works on geek girls, anyway, from what I hear. :)
  
  --
  Truth isn't Truth - Guliani
Now We Know Why Drivers Suck by zenlessyank · 2016-04-05 07:09 · Score: 1

Maybe you should use some of those engineers to fix your drivers, to, you know, support the people that have already paid you for a product you already produce. Seems that deep learning tech hasn't taught you anything. Spoken as an owner of a GTX 660 SLI setup, not some rabid other team fanboi.
1. Re:Now We Know Why Drivers Suck by fuzzyfuzzyfungus · 2016-04-05 07:24 · Score: 1
  
  I'm guessing that the mention of 'SLI' might be the key point here. Taking problems not explicitly designed to be parallelized and attempting to parallelize them at the driver level after the fact is...a bit of a mixed bag...in terms of actually working. It's not clear that Nvidia is holding out on us here; given that they sell fancy multi-GPU systems to high end customers I'm sure that they would be delighted to also offer tools that make using those expensive multi-GPU systems really easy; but that doesn't change the fact that, based on the application, 'SLI' can offer amusingly pitiful performance enhancements, exciting reliability issues, and similar fun even when drivers specifically tweaked to compensate for the application being run are available.
2. Re:Now We Know Why Drivers Suck by Hognoxious · 2016-04-05 07:40 · Score: 1
  
  Your box is messed up
  One item on a long, long list.
  
  --
  Confucius say, "Find worm in apple - bad. Find half a worm - worse."
3. Re:Now We Know Why Drivers Suck by halivar · 2016-04-05 07:44 · Score: 1
  
  Yes, let's have the chip designers drop everything and help write the device drivers. Brilliant. I should have the network engineers come upstairs and help me with my Excel spreadsheets.
4. Re:Now We Know Why Drivers Suck by angel'o'sphere · 2016-04-05 07:56 · Score: 1
  
  Make sure you have a couple of them so they can argue among themselves and find an agreement.
  Worst case, send random ones off to fetch you a coffee!
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
5. Re:Now We Know Why Drivers Suck by zenlessyank · 2016-04-05 07:57 · Score: 1
  
  I am sure they could help if YOU are the one doing the spreadsheet work, since reading comprehension eludes you. Mr. NVidia said 'engineers', which is a broad term, versus chip designers which is pretty specific. If they needed 100,000 chip designers then they DO have problems. Now crawl back under your bridge.
6. Re:Now We Know Why Drivers Suck by fuzzyfuzzyfungus · 2016-04-05 12:16 · Score: 1
  
  I think I did a terrible job of explaining it: Shader programs are indeed designed to be parallelized across however many shader units a given GPU provides. However, SLI (presumably because the niche market doesn't justify a nicer interconnect, or because it may simply not be feasible to provide the same level of integration between multiple PCBs as it is between elements on the same die) doesn't provide particularly close integration of the participant GPUs.
  
  SLI-ed GPUs can't even share VRAM (with some limited exceptions in DX12 if everything is aligned just right to support that), and the shaders on one die aren't visible to the scheduler on the other or vice-versa. So the decision about how to divide the workload has to be made at the level of the driver, either by chopping the frame into two, three, or four (depending on the number of GPUs) regions of roughly equal complexity, assigning each GPU one of them and then stitching the results together to get the finished frame (Split Frame Rendering), or by allocating alternate frames to each GPU (Alternate Frame Rendering).
  
  In either case, each GPU is effectively working in isolation on the task of rendering the frame or partial frame it has been assigned. SFR is vulnerable to 'tearing' effects, since deciding what counts as an 'even' split of the work involved in rendering a frame before you actually render it can be tricky; when you judge incorrectly, your partial frames will be finished at different times, either dragging down the effective framerate if the system always waits for a complete frame before updating, or causing visual artifacts when one part of the frame is updated before the other part is. AFR has the problem that it can't actually reduce the time needed to render a frame compared to a single GPU (since it is just a single GPU rendering each frame), so while under good conditions you can get nearly linear increases in FPS, any frames that render unacceptably slowly on the GPU in question will render no faster with two of them. Plus, anything that makes rendering frame N+1 require knowledge of frame N (or worse, the state involved in rendering frame N) completely breaks AFR, since each GPU has very little access to what the other is doing.
  
  If SLI actually involved glue logic good enough that putting two cards in SLI essentially produced a single GPU with twice the resources and just slightly higher latency, it would indeed run like a bat out of hell, and make parallelization mostly invisible to shader programs. It's just that it doesn't actually do that. It's analogous to cluster systems vs. multisocket NUMA systems: SLI provides largely transparent support for collecting the output of each GPU and putting it together into a single video output, but is otherwise fairly loosely coupled, with the driver handing out work units to mostly isolated GPUs with their own VRAM and their own shaders and limited communication with one another.
7. Re:Now We Know Why Drivers Suck by halivar · 2016-04-06 02:11 · Score: 1
  
  Oh God, okay, I suppose you're right, and they had software engineers design this new chip instead of actual chip designers, thus stealing the precious resources you're bitching about.
8. Re:Now We Know Why Drivers Suck by martinfb · 2016-04-06 11:52 · Score: 1
  
  Perhaps the value here is the AI to overcome that driver OOPS! Replace those lame driver devs with this AI chip!
  
  --
  
  Self-importance and self-indulgence is the root of ALL evil.
9. Re:Now We Know Why Drivers Suck by zenlessyank · 2016-04-06 15:52 · Score: 1
  
  Artificial Intelligence sponsored/created by a selfish corporation. Let me get some Mellow Yellow and Pringles so I can enjoy the show.
10. Re:Now We Know Why Drivers Suck by Coren22 · 2016-04-07 04:07 · Score: 1
  
  SLI is an acronym, it stands for Scan Line Interleave. What this means, is that each GPU does half the screen worth of work by running a line at a time on each GPU. I am not sure what the OP's issue with drivers is, but my assumption would be the age of the hardware, 6xx is pretty old, and might not be enough to support modern games anymore. I highly doubt the OP's issue is with the driver having parallelization problems.
  
  --
  APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
11. Re:Now We Know Why Drivers Suck by Blaskowicz · 2016-04-09 00:07 · Score: 1
  
  SLI was Scan Line Interleave back in 1998 (and the Voodoo5 used small horizontal bands of pixels rather than straight scanlines). Then NVIDIA decided to revive it years later, but decided the letters would stand for Scalable Link Interface, which doesn't really mean anything in particular. It no longer interleaved scanlines.
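
The two work-splitting schemes described in this thread, Split Frame Rendering and Alternate Frame Rendering, can be illustrated with a toy timing model. This is only a sketch: the function names and the millisecond costs are made up for illustration, and it ignores synchronization, data transfer, and driver overhead entirely.

```python
def sfr_frame_time(region_costs):
    """Split Frame Rendering: each GPU renders one region of the same frame.

    The frame is complete only when the slowest region finishes.
    """
    return max(region_costs)


def afr_stats(frame_costs, num_gpus):
    """Alternate Frame Rendering: frame i is handed to GPU i % num_gpus.

    Returns (time until every frame is done, worst single-frame render time).
    """
    busy = [0] * num_gpus
    for i, cost in enumerate(frame_costs):
        busy[i % num_gpus] += cost
    return max(busy), max(frame_costs)


if __name__ == "__main__":
    # Hypothetical costs in milliseconds.
    # A 40 ms frame split unevenly (30 ms / 10 ms) takes 30 ms, not the ideal 20 ms.
    print(sfr_frame_time([30, 10]))
    # Four frames, one of them slow: two GPUs finish the batch sooner,
    # but the slow frame still takes 40 ms on whichever GPU gets it.
    print(afr_stats([10, 10, 10, 40], 1))
    print(afr_stats([10, 10, 10, 40], 2))
```

In this model the second GPU helps throughput under AFR but never shortens the slowest frame, and SFR is only as fast as its worst-balanced region, which matches the trade-offs described above.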
1000 engineers by goombah99 · 2016-04-05 07:12 · Score: 1

I'm always amazed how it takes so many engineers. What the heck do they all do? How does one organize this many contributions? Isn't this sort of thing highly automated, with largely repetitive subunits?

--
Some drink at the fountain of knowledge. Others just gargle.
1. Re:1000 engineers by cowtamer · 2016-04-05 08:07 · Score: 5, Informative
  
  There are several factors. First of all, what they are building is a HUGE engineered system which would have taken up a couple of buildings a decade or two ago. The fact that the end product is small doesn't change the complexity. The second part is the fact that it IS so small, which brings its own complications. In addition, semiconductor manufacturing is a very tricky business where even making the simplest thing (e.g., a transistor) takes an enormous amount of planning, characterization, and tool design.
  Part of it is the R&D -- nothing like this has been done before, so certain things have to be figured out (heat dissipation, how the proximity of the components affects the other components, stuff neither of us will understand, etc.). Another huge part is tooling and process -- someone has to design, test and characterize the fabrication tools and processes (the "automation" you speak of has to be built by someone -- a device this complicated probably can't be built without the automation). The chip is divided into subsystems, each of which needs to be designed, simulated, and optimized. Someone has to integrate all the subsystems and simulate them together. The 1000 people probably include material scientists, process engineers, electrical engineers of various stripes, semiconductor physicists, mechanical engineers (heat dissipation, packaging, etc.), systems engineers, engineering project managers, etc.
2. Re:1000 engineers by alvinrod · 2016-04-05 08:12 · Score: 1
  
  To some degree yes, but making a physical chip requires working within the confines of the physical limitations of the fabrication process, so it's never as simple as designing some ideal chip. You can certainly try to do it that way, but the yields will suck. It's also not trivial to design the sub-units either, and there are always new instruction sets or other such things to be supported.
3. Re:1000 engineers by crgrace · 2016-04-05 08:20 · Score: 5, Informative
  
  One organizes many contributions using any number of industry-standard design methodologies. Designing airplanes and cars uses even more engineers.
  I suspect NVIDIA is slightly exaggerating and is counting the contribution of many "overhead" engineers who provide value for the whole engineering organization, such as people who work on design tools, design kits, methodology and the like.
  You're right, there are many repeated subunits, but each unit needs a team to optimize it.
  For a chip this complex you need:
  Logic Designers (who come up with high-level models for the chip and define the instruction set / hardware interface)
  Front-end engineers that write Verilog and/or VHDL (I have no idea what NVIDIA uses)
  Implementation engineers (who do place and route and parasitic extraction)
  Verification engineers (who use various tools to see if everything is as it should be)
  Packaging engineers (who work closely with vendors to develop a custom package for the chip/module)
  Module engineers (since we have 3D stacked memories on this device the module engineering is far from trivial)
  Thermal Engineers (3D modules typically have very complex thermal requirements)
  Signal Integrity engineers (since we're going so fast just getting a signal from point A to point B is hard)
  Analog/Mixed Signal engineers (for clocking, serial I/O development)
  Integration Engineers (for modeling how to put all this together)
  System Engineers (for figuring out if this is all going to work)
  Software Engineers (for low-level software dev)
  CAD Engineers (for developing and maintaining an appropriate computer-aided design flow)
  Foundry Engineers (for working with the foundry on the physical production of the wafers... anything this big and complex will need process customization)
  ESD engineers (for figuring out and implementing an ESD strategy)
  Library Engineers (for customizing and optimizing the standard cell library used in the chip)
  Product Engineers (for solving production problems as they arise)
  Test Engineers (for developing and implementing tests to show the chip is working as expected)
  Application Engineers (who work with early adopters to integrate this chip into their systems)
  and on and on and on...
  As you can see, an army of engineers is required for a chip this complex to see the light of day. On simpler chips, many of these roles can be played by the same people, but in a chip this big, they need to divide the work or it would never get done.
4. Re:1000 engineers by __aaclcg7560 · 2016-04-05 08:47 · Score: 1
  
  For a chip this complex you need:
  How many of those job titles and descriptions actually correspond to a college major that an American citizen can learn?
  
  It's a problem mostly seen in the U.S., say labor-market experts, thanks to a rapidly evolving economy and a divide between the country's educational institutions and employers that isn't there in other advanced economies. In Germany and Denmark, for example, the two groups collaborate to ensure training and apprenticeships lead to jobs after graduation. The gap has helped push U.S. job vacancies to 5.5 million, near historic highs. For most of the past year the number of job openings has exceeded the number of new hires, a reflection of employers' difficulty in filling positions.
  http://www.wsj.com/articles/colleges-drill-down-on-job-listing-terms-1459704268
5. Re:1000 engineers by crgrace · 2016-04-05 09:20 · Score: 1
  
  I see what you're getting at and I don't disagree. However, as you know this stuff is really complicated and you need to be specialized in your career to be effective. Most of these jobs are for Electrical Engineers, a few could be also held by people who studied Computer Science or Mechanical Engineering. I'm an Analog/Mixed-Signal Engineer and while I know Verilog and how to run verification tools, I'm frankly not as competent at those roles as specialists are. It is the way of the world.
  I agree you could retrain a product engineer to become a verification engineer but a company would rather bring in a fresh out from India on an H1-B.
6. Re:1000 engineers by X-Ray Artist · 2016-04-05 09:21 · Score: 1
  
  Wow, "and on and on and on..."
  It is amazing how much I take for granted.
  
  --
  I would have a sig but I am too busy updating programs and restarting my computer
7. Re:1000 engineers by Pulzar · 2016-04-05 09:35 · Score: 1
  
  How many of those job titles and descriptions actually correspond to a college major that an American citizen can learn?
  They almost all correspond to an electrical or computer engineering major, however the vast majority are not really available to new grads. Three or four on that list will accept new grads who have already specialized a bit in their masters programs, and then once they have a better handle on the big picture they can transfer into other roles.
  For example, you don't design systems without knowing how the pieces work first, and you don't help customers integrate your IP/chip without knowing how *their* systems work. But, you can work on verification of a simple block with knowledge picked up in school, with appropriate guidance.
  A lot of the logic design is still done in US (in case of US companies), and many of these roles require close proximity to designers... so, you'll find that many are still widely available locally.
  Because the barrier to entry in hardware design is much higher (you can't simply read a book or take an online course and be proficient enough), it's much harder to fill the needs with a bunch of "cheap" H1-Bs. The outsourcing tends to happen by opening a location in India/China and hiring full teams to work on some of the more easily outsourced tasks, like CAD and integration. But even there, it's hard to find enough experienced engineers.
  
  --
  Never underestimate the bandwidth of a 747 filled with CD-ROMs.
8. Re:1000 engineers by U2xhc2hkb3QgU3Vja3M · 2016-04-05 09:52 · Score: 1
  
  Dr. Ellie Sattler: Woman inherits the earth.
9. Re:1000 engineers by slew · 2016-04-05 11:07 · Score: 2
  
  (with apologies to Michael Crichton)
  
  Ian - God creates intelligence, god destroys intelligence. God creates man, man destroys god. Man creates AI.
  Ellie - AI destroys man, women inherit the earth...
  Perhaps a different, more historical view from the 1950's http://www.alteich.com/oldsite...
  
  Dwar Ev threw the switch. There was a mighty hum, the surge of power from ninety-six billion planets. Lights flashed and quieted along the miles-long panel. Dwar Ev stepped back and drew a deep breath. "The honor of asking the first question is yours, Dwar Reyn."
  "Thank you," said Dwar Reyn. "It shall be a question that no single cybernetics machine has been able to answer." He turned to face the machine. "Is there a God?"
  The mighty voice answered without hesitation, without the clicking of a single relay.
  "Yes, now there is a God."
  Sudden fear flashed on the face of Dwar Ev. He leaped to grab the switch.
  A bolt of lightning from the cloudless sky struck him down and fused the switch shut.
10. Re:1000 engineers by AK Marc · 2016-04-05 15:21 · Score: 1
  
  This won't get us closer to AI. A Sperm whale has a brain about 5x the size of a human, and isn't any smarter. So presuming that a larger brain will be smarter is contrary to the known facts.
  
  The solution to hard AI will not be direct.
  
  --
  Learn to love Alaska
11. Re:1000 engineers by KGIII · 2016-04-05 18:06 · Score: 1
  
  I've actually been wondering and asking, for a while now, why we can't just buy like a 5 cubic inch block of CPU and stuff it in our computers. Yes, I know it will get hot. Yes, I know it'll suck down juice like an arc welder. I'm okay with that. I've got solar and wind. I can cool that down - there are things to do that with.
  Seriously, am I the only one that envisions a 5^3" chunk of CPU and all the glorious things I could do with it? Coupled with stacks of those NVRAM critters, piped right next to it, and crammed into a tower that's not much larger than the standard tower we have today. Hell, back home in Maine, I've got two racks that aren't even full. I'd cram it in there. No latency to deal with from distributing. Just miles of compute cycles as far as the eye can see.
  Sure, make it a rich-man's toy. I'd still think about buying it. Something like 10,240 cores or whatnot. (Ha! Can you imagine what Oracle would cost on that?) Only, make it so that it's something I can reasonably buy and use a desktop OS on it. Hell, I'd switch to OS X or Windows if they were the only OSes to run on it. I still see my CPU peg 100% - and I can bust up to 4.2 GHz - in my laptop. (Actually, I guess I can OC my CPU and get up to about 5 GHz.) I don't run VMs locally that often, so I don't really run out of RAM.
  But I still peg the CPU - frequently. It's almost like we're back at the CPU being the bottleneck again. Just a 5x5x5" cube of transistors. Might want to spread 'em out a little bit - I guess there's some potential for shorts and "leaks." Still... Just a big ol' stack of 'em plugged into a motherboard, it'd probably have to be a horizontally mounted board so that the weight doesn't yank it out of the socket. So we might be talking something akin to the old desktops that were flat - what the hell was the name of that style again? Meh, I forget... I don't think I've even seen any of those in the wild for a long time. I used to stuff my monitor on top of it even though that was probably a bad idea.
  Maybe it'd be a little bigger. I'm okay with that. I'm sure as hell not gonna pay supercomputer prices for it or one-off prices for it. I'm absolutely positive that I'm not the only one who'd be willing to buy something like that. Sure, it might cost as much as a fairly decent family sedan or something but I'd never have to upgrade again. Ever... Well, until they became common and then they started to assume we all have that many resources and start making everything bigger so that they can then consume those resources. (I never did figure that one out. My hardware's much faster than it used to be. Things seem to take about the same amount of time in many areas.)
  Hmm... There's something to that. I had scads of RAM (maxed out on 32 bit systems) and then they decided everyone had that. My computer was no longer speedy. I had broadband when everyone else had dial-up. Then they started making web pages that are 2 - 5 MB each and load things from across the globe. Fortunately, I can avoid some of that. We buy more resources and they just find a way to take them all back from us. Then they assume everyone's got it. I can't imagine the web today if you have dial-up. Oh wow, that'd just suck. I'd so use Lynx or elinks and block everything at the firewall with more attention paid to keeping it up to date. Actually, I'd move.
  It's reached the point where I even get gobs of RAM for my laptop.
  At any rate, I can't be the only one who'd like to see a giant-ass CPU in a desktop system. Just a huge chunk of nearly infinite cores and a compiler switch to make sure it uses 'em all. Hell, make me an extra large one with two of 'em in there. 20480 cores ought to be enough for the rest of my life. Hell, if one fizzles out, design it so that it routes around it. And no, not for HPC-type things but for a desktop. I'd never have to update again. I just gotta figure out how to compile for dual-CPU with a total of 20480 cores. There's a make switch for that somewhere. *nods* There's gotta be. Well, there should be.
  
  --
  "So long and thanks for all the fish."
12. Re:1000 engineers by peragrin · 2016-04-05 22:59 · Score: 1
  
  Except you can't remove the heat from inside the block. The reason CPUs are basically 2D flat pieces is so we can glue on cooling fins and cool that bad boy down. Until we figure out how to do micro liquid cooling, under pressure, and interweave that cooling into the chip's design, we won't get real 3D stacks of processors.
  Though what we can do is create 4-6 CPUs stacked vertically around a coolant tube. But I believe that would run afoul of pushing parts of the pct too far away from the rest of the components.
  
  --
  i thought once I was found, but it was only a dream.
13. Re:1000 engineers by religionofpeas · 2016-04-06 01:38 · Score: 1
  
  Because we can only make things in layers. So if you want to stack a million layers of silicon, and each layer is a dozen process steps, manufacturing a wafer will take years. Even with current 2D designs, the process already takes many weeks. The chance of a tiny error in one of the layers messing up the whole chip will be huge.
14. Re:1000 engineers by Coren22 · 2016-04-07 03:21 · Score: 1
  
  You could probably train the AI to play Quake 1 pretty effectively.
  
  --
  APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Traditional early adopter killer app by Impy the Impiuos Imp · 2016-04-05 07:13 · Score: 1

This should provide some astonishing porn.

--
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
1. Re:Traditional early adopter killer app by gstoddart · 2016-04-05 07:38 · Score: 1
  
  LOL, why am I suddenly picturing millions of horny guys getting blown off by a porn AI which has developed attitude?
  Except for the people into that whole humiliation thing, I just don't see that being a big selling point. :-P
  I think sentient porn is the last thing we want.
  
  --
  Lost at C:>. Found at C.
2. Re:Traditional early adopter killer app by Perky_Goth · 2016-04-05 14:09 · Score: 1
  
  Well, it could learn to categorize it and learn your preferences very fast, so it's not out of the question...
The P100 was already discovered.. by HumanWiki · 2016-04-05 07:13 · Score: 2

In the Tesla's firmware http://jalopnik.com/a-hacker-m... That would be interesting if it was a chip reference and not a car reference --- tinfoil hat.
1. Re:The P100 was already discovered.. by michelcolman · 2016-04-05 09:23 · Score: 1
  
  The reference was P100D, so maybe the car is using two of them?
most nVidia engineers work on all projects by Anonymous Coward · 2016-04-05 07:14 · Score: 4, Informative

From what my friends who work at nVidia tell me, most engineers work on all projects. They get sent problems from one GPU, after fixing that, start working on issues from a CPU or some other project.
Units by Anonymous Coward · 2016-04-05 07:18 · Score: 1

16GigaBillion
Re:15B transistors = 16 GB ? by CastrTroy · 2016-04-05 07:22 · Score: 1

Since they are talking about bandwidth, I would guess that what they really mean is 16 GB/s. Although I don't see any reference to bandwidth in the article and the only reference I see to 16 is the 16 nm fabrication process.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:15B transistors = 16 GB ? by Anonymous Coward · 2016-04-05 07:23 · Score: 1

The chip has 15B transistors and no RAM. (it does have 4MB L2 cache and 14MB worth of register files)
The entire Tesla P100 package is comprised of many chips not just the GPU, that collectively add up to over 150 billion transistors and features 16GB of stacked HBM2 VRAM.
somewhat deceiving numbers.... by etash · 2016-04-05 07:27 · Score: 5, Informative

Yes it can do 21.2 teraflops... at FP16... half precision... it can "only" do 10.6 teraflops at single precision and 5.3 teraflops at double (64-bit) precision. Also, it doesn't have the 1TB/sec HBM2 memory speed advertised for months, but only 720GB/sec.
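
To put those figures side by side, here is a rough back-of-the-envelope sketch. It uses the 21.2 teraflops half-precision figure from the story summary together with the single-precision, double-precision, and 720GB/sec numbers quoted in this comment. The helper names are made up for illustration, and 1GB is assumed to mean 10^9 bytes.

```python
# Peak rates in teraflops, as quoted in the story summary and the comment above.
PEAK_TFLOPS = {"fp16": 21.2, "fp32": 10.6, "fp64": 5.3}
BYTES_PER_VALUE = {"fp16": 2, "fp32": 4, "fp64": 8}
MEMORY_GB = 16          # HBM2 capacity from the story summary
BANDWIDTH_GB_S = 720    # memory speed quoted in the comment

GB = 10**9  # assumption: decimal gigabytes


def values_that_fit(fmt):
    """How many numbers of the given format fit in the 16GB of memory."""
    return MEMORY_GB * GB // BYTES_PER_VALUE[fmt]


def seconds_to_stream_memory(bandwidth_gb_s=BANDWIDTH_GB_S):
    """Time to read the whole memory once at the given bandwidth."""
    return MEMORY_GB / bandwidth_gb_s


def peak_flops_per_byte(fmt):
    """Peak arithmetic rate divided by memory bandwidth (operations per byte moved)."""
    return PEAK_TFLOPS[fmt] * 1e12 / (BANDWIDTH_GB_S * GB)


if __name__ == "__main__":
    for fmt in ("fp16", "fp32", "fp64"):
        print(fmt, values_that_fit(fmt), round(peak_flops_per_byte(fmt), 2))
    print(seconds_to_stream_memory(720), seconds_to_stream_memory(1000))
```

Each halving of precision doubles both the quoted peak rate and the number of values that fit in memory, which is why the headline number is the half-precision one.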
1. Re:somewhat deceiving numbers.... by Anonymous Coward · 2016-04-05 07:35 · Score: 5, Informative
  
  It turns out that for deep learning, 1/2 precision is very commonly used. You are using floats for numbers in a fairly small range, and accuracy isn't key. half precision speeds up processing, and more importantly lets you work with twice as much data.
2. Re:somewhat deceiving numbers.... by ChrisMaple · 2016-04-05 08:42 · Score: 1
  
  Is there really any advantage over 16 bit integer, which would be faster and less complex?
  
  --
  Contribute to civilization: ari.aynrand.org/donate
3. Re:somewhat deceiving numbers.... by 0100010001010011 · 2016-04-05 09:20 · Score: 2
  
  Of course they do and do it with 0.00001525878 precision to boot. If that's all you need then you can get by with 16 bit numbers just fine.
4. Re:somewhat deceiving numbers.... by fizzup · 2016-04-05 09:24 · Score: 4, Informative
  
  I don't think that's a very good explanation. If sigmoid or step neurons only used numbers in the range [0,1], then you could divide the range into 65,536 individual states and use 16 bits to translate [0,65535] into [0,1]. However, sigmoid neurons have many inputs of many different weights, so the total input to a sigmoid neuron can be greater than one. In fact, any one input, after weighting, can be greater than one. The weights themselves can be greater than one. Only the output is constrained to [0,1] by the sigmoid or step function.
  In order to represent a number without a lot of accuracy, but keep the ability to represent large and small values, you need a floating-point number. I'm no expert in deep learning, but it does pass the sniff test that a 16-bit float would be good enough for neurons. I assume that NVIDIA has done their homework and determined that FP-8 numbers have too much rounding error to be useful in a neural network.
5. Re:somewhat deceiving numbers.... by edxwelch · 2016-04-05 09:39 · Score: 1
  
  Shaders for mobile GPUs use 1/2 precision quite a bit, however the small range (-2 - 2) is a problem for many operations, so you end up only being able to use them for less than half of the code.
6. Re:somewhat deceiving numbers.... by Mandrel · 2016-04-05 13:42 · Score: 1
  
  Is there really any advantage over 16 bit integer, which would be faster and less complex?
  Yes, artificial (and real) neurons deliver a weighted average of positive and negative inputs. So you usually have large positive and negative inputs which subtractively cancel to a moderate output. Integer doesn't handle this subtractive cancellation nearly as well as floating point, which can keep the same precision over large changes in scale.
7. Re:somewhat deceiving numbers.... by DrJimbo · 2016-04-05 20:24 · Score: 1
  
  Now with floating point 0.5*0.5 = 0.25 which is a smaller number as expected. If you multiply two positive integers like 50*50 you get 2500, so a larger value which requires further operations on it for it to be useful.
  The only "further operation" needed is to look at the higher word of the result which takes zero extra effort. For example, if you multiply two 16-bit words then you get a 32-bit result. The "extra effort" is taking the upper 16-bits of the result and ignoring the lower 16-bits.
  There may well be good reasons for FP16 to be preferred over using integers, but scaling the result of multiplications isn't one of them.
  
  --
  We don't see the world as it is, we see it as we are.
  -- Anais Nin
8. Re:somewhat deceiving numbers.... by religionofpeas · 2016-04-06 01:45 · Score: 1
  
  The only "further operation" needed is to look at the higher word of the result which takes zero extra effort. For example, if you multiply two 16-bit words then you get a 32-bit result. The "extra effort" is taking the upper 16-bits of the result and ignoring the lower 16-bits.
  
  So, multiplying 100 by 100 equals 0, but starting at 0 and adding 100 for 100 times equals 10000 ?
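
The fixed-point versus half-precision argument in this thread can be tried out directly. The sketch below is an illustration only: it assumes a 16-bit fixed-point format that maps words onto fractions in [0, 1) as word / 65536, and the helper names are made up. Half-precision rounding uses the `e` format of Python's standard `struct` module.

```python
import struct


def to_fp16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_mul(a, b):
    """Multiply two numbers, rounding inputs and result to half precision."""
    return to_fp16(to_fp16(a) * to_fp16(b))


def to_q016(x):
    """Quantize x to an unsigned 16-bit word meaning word / 65536.

    Values outside [0, 1) saturate at the ends of the range.
    """
    return max(0, min(0xFFFF, int(round(x * 65536))))


def mul_hi16(a, b):
    """Multiply two unsigned 16-bit words and keep only the upper 16 bits."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16


if __name__ == "__main__":
    half = to_q016(0.5)
    # 0.5 * 0.5 = 0.25 works by taking the upper word of the product.
    print(hex(mul_hi16(half, half)))
    # Small operands underflow to zero in the upper word ...
    print(mul_hi16(100, 100))
    # ... while half precision still keeps a nonzero (subnormal) result.
    print(fp16_mul(100 / 65536, 100 / 65536))
    # A weighted input above 1 saturates in this fixed-point format,
    # but is represented exactly in half precision.
    print(to_q016(2.5), to_fp16(2.5))
```

Taking the upper word is free, as DrJimbo says, but it only works if every value is read as a fraction of a fixed scale. Small products then vanish and anything above that scale has to be clamped, which is the range problem fizzup and Mandrel describe.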
I for one welcome our () Overlords by tekrat · 2016-04-05 07:28 · Score: 4, Funny

Just imagine a beowulf clust........
Oh, never mind.....

--
If telephones are outlawed, then only outlaws will have telephones.
1. Re:I for one welcome our () Overlords by avandesande · 2016-04-06 01:43 · Score: 1
  
  Actually I believe they are going to simulate Natalie Portman's mind using this chip.
  
  --
  love is just extroverted narcissism
2. Re:I for one welcome our () Overlords by avandesande · 2016-04-06 06:32 · Score: 1
  
  If this processor runs hot we can use it to make the grits!
  
  --
  love is just extroverted narcissism
Tesla P100? by rickyb · 2016-04-05 07:30 · Score: 1

Isn't that a little close to the other Tesla? http://jalopnik.com/a-hacker-m...
1. Re:Tesla P100? by ShanghaiBill · 2016-04-05 08:16 · Score: 2
  
  Musk can't copyright the name of a famous scientist.
  TRADEMARK, not copyright ... and yes he can, but only for a narrow commercial purpose. Elon owns the trademark "Tesla" as a car brand. NVIDIA owns the trademark "Tesla" as a GPU brand.
2. Re:Tesla P100? by stealth_finger · 2016-04-05 22:30 · Score: 1
  
  Musk can't copyright the name of a famous scientist.
  TRADEMARK, not copyright ... and yes he can, but only for a narrow commercial purpose. Elon owns the trademark "Tesla" as a car brand. NVIDIA owns the trademark "Tesla" as a GPU brand.
  And C&C owns "Tesla" as a tank
  
  --
  Wanna buy a shirt?
  https://www.redbubble.com/people/stealthfinger/shop?asc=u
The Most Advanced Hyperscale Datacenter GPU Ever by Grismar · 2016-04-05 08:17 · Score: 1

But can it run Crysis?
My brain has unlimited storage capacity and speed by JoeyRox · 2016-04-05 08:29 · Score: 1

Yet my learning ain't so deep.
Re:15B transistors = 16 GB ? by Anonymous Coward · 2016-04-05 08:39 · Score: 2, Informative

It's a FinFET device. You can represent more than 1 binary bit per transistor by using multi-gate transistors.
This is not a factually correct statement. Multi-gate transistors are used because they are more energy-efficient, perform better, and can be scaled to smaller dimensions than traditional planar CMOS devices. The extra gates give better electrostatic control over the MOSFET channel, but they do not allow the device to perform operations on more than one bit of data at once.
https://en.wikipedia.org/wiki/Multigate_device
Decent game AI when? by Iamthecheese · 2016-04-05 08:43 · Score: 1

I'm still waiting for the game AI that can match a human brain for strategy in an open world, especially in an RPG game but anything beyond well-studied board games really. It's so frustrating to have the computer win by cheating. Not to mention implications for new expert systems. This technology can't mature soon enough.

--
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
Re:15B transistors = 16 GB ? by DRJlaw · 2016-04-05 08:45 · Score: 5, Informative

It's a FinFET device. You can represent more than 1 binary bit per transistor by using multi-gate transistors.
Oh, for God's sake, I ignored this at first but now it's been modded up.
15 billion is the transistor count for the GPU logic. It's not the transistor count for the HBM2 memory installed alongside the GPU on the interposer. Adding an interposer does not suffice to make it all the same chip (hint from TFS: "multichip module").
FinFET is neither necessary nor sufficient for multi-level-cell-like bit representation. That's also a flash storage technology, not a logic or volatile memory technology (at least in mass produced products).
It's 15 days to Weed Day. Put down whatever you're smoking and get back to work.
History repeats? by dlleigh · 2016-04-05 08:50 · Score: 1

Hmmm... NVIDIA. Giant chip.
Bill Dally, are you going for a "jump approximate" instruction again?
600 square millimeters ???? by colin_faber · 2016-04-05 09:36 · Score: 1

Is this right?? 2' x 2' chip?
1. Re:600 square millimeters ???? by Waffle+Iron · 2016-04-05 09:52 · Score: 5, Funny
  
  Is this right?? 2' x 2' chip?
  That's right.
  And the package is shaped like Stonehenge.
2. Re:600 square millimeters ???? by Janthkin · 2016-04-05 09:54 · Score: 1
  
  Is this right?? 2' x 2' chip?
  No, it's not right.
  600mm^2 is a chip just under 25mm on a side.
3. Re:600 square millimeters ???? by TomGreenhaw · 2016-04-05 10:02 · Score: 1
  
  2 feet X 2 feet - my guess is no
  
  Square root of 600 mm^2 = 24.4948974278 mm
  
  24.4948974278 mm = 0.964366 inches (0.964366")
  
  Even still .. that's a friggin HUGE chip
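The arithmetic above is easy to reproduce. A minimal sketch, assuming a square die (the article only gives the area, so the shape is a guess) and using 25.4 mm per inch:

```python
import math

AREA_MM2 = 600.0    # die area quoted in the summary
MM_PER_INCH = 25.4

side_mm = math.sqrt(AREA_MM2)          # side length of a square with that area
side_in = side_mm / MM_PER_INCH

two_feet_mm = 2 * 12 * MM_PER_INCH     # 2 ft expressed in millimeters
two_feet_square_mm2 = two_feet_mm ** 2  # area of a 2 ft x 2 ft square

if __name__ == "__main__":
    print(f"side: {side_mm:.4f} mm = {side_in:.6f} in")
    print(f"a 2 ft x 2 ft square would be {two_feet_square_mm2:.0f} mm^2, "
          f"about {two_feet_square_mm2 / AREA_MM2:.0f}x larger")
```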
  
  --
  Greed is the root of all evil.
20 Tflops at half precision by rfengr · 2016-04-05 09:42 · Score: 1

Note it's 20 Tflops at half precision. Single is 10, and double is 5.
We all know how this ends by U2xhc2hkb3QgU3Vja3M · 2016-04-05 09:53 · Score: 1

Deep learning leads to Deep Thought leads to forty two.
Re:15B transistors = 16 GB ? by OrangeTide · 2016-04-05 12:03 · Score: 1

I can address 16GByte with only 34 bits. That leaves more than 14.9B transistors left over to do whatever.

--
“Common sense is not so common.” — Voltaire
Or.... by BrendaEM · 2016-04-05 13:27 · Score: 2

https://www.youtube.com/watch?...

--
https://www.youtube.com/c/BrendaEM
Re:15B transistors = 16 GB ? by rahvin112 · 2016-04-05 13:45 · Score: 1

15 billion is the transistor count for the GPU logic.
I've seen nothing that would indicate either way that this could be the truth. It's pure speculation. Others have speculated that to reach that 15 billion number they have to be counting the memory transistors as well. Though this is big at 600mm2 it isn't that much bigger than previous dies that held a fraction of that number of transistors.
Numbers don't add up by flyingfsck · 2016-04-05 14:44 · Score: 1

How do you make 16 GB memory from 15 G transistors?

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Re:15B transistors = 16 GB ? by Agripa · 2016-04-05 19:26 · Score: 1

I've seen nothing that would indicate either way that this could be the truth. It's pure speculation. Others have speculated that to reach that 15 billion number they have to be counting the memory transistors as well. Though this is big at 600mm2 it isn't that much bigger than previous dies that held a fraction of that number of transistors.
It has about 50% more transistors than the Oracle SPARC M7 at 10.2 billion so the increase is reasonable.
Tesla P100 by hackertourist · 2016-04-05 19:31 · Score: 1

So can it run 600 km on a single charge?
Re:15B transistors = 16 GB ? by DRJlaw · 2016-04-05 23:58 · Score: 1

You do realize that 15 billion transistors, if you assume that each holds one bit of stored information (HA!), is less than 2GB of storage?
BTW, it's not speculation, it's from NVIDIA's own press release.
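For anyone who wants to check the "less than 2GB" figure, here is a quick sketch. It takes the deliberately generous assumption from the post (one stored bit per transistor) and reads 16GB as 16 GiB; both are assumptions, and the constant names are made up.

```python
TRANSISTORS = 15_000_000_000   # figure quoted in the summary
BITS_PER_BYTE = 8

# Generous assumption from the post: every transistor stores exactly one bit.
bytes_if_one_bit_each = TRANSISTORS // BITS_PER_BYTE

# Bits that 16 GiB of memory actually has to hold.
bits_in_16_gib = 16 * 2**30 * BITS_PER_BYTE

if __name__ == "__main__":
    print(f"{bytes_if_one_bit_each / 1e9:.3f} GB if every transistor held one bit")
    print(f"16 GiB needs {bits_in_16_gib:,} bits, "
          f"{bits_in_16_gib / TRANSISTORS:.1f}x the transistor count")
```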
Deep learning about morality and post-scarcity? by Paul+Fernhout · 2016-04-05 23:58 · Score: 1

An aside from the article: "Huang showed a demo from Facebook that used deep learning to train a neural network how to recognize a landscape painting. They then used the network to create its own landscape painting."
So long for such jobs... How about deep learning about post-scarcity economics?
https://en.wikipedia.org/wiki/...
https://en.wikipedia.org/wiki/...
Also: ""Our strategy is to accelerate deep learning everywhere," Huang said."
How about some deep learning about morality? Imagine training children (or child-like AIs) in skills like weapons use without training them in morality, kindness, cooperation, and so on... How would that end?
See also:
http://www.child-soldiers.org/
"Child Soldiers International is an international human rights research and advocacy organisation. We seek to end the military recruitment and the use in hostilities, in any capacity, of any person under the age of 18 by state armed forces or non-state armed groups. We advocate for the release of unlawfully recruited children, promote their successful reintegration into civilian life, and call for accountability for those who unlawfully recruit or use them."
Maybe AIs should not be asked to replace humans until they have been around for at least eighteen years?
http://www.rfreitas.com/Astro/...
https://en.wikipedia.org/wiki/...
http://www.amazon.com/The-Chro...

--
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
Re:15B transistors = 16 GB ? by rahvin112 · 2016-04-06 05:01 · Score: 1

There are two sets of memory on this chip if you read the reports. The die itself has an additional layer that is HBM (High bandwidth memory) linked directly to the GPU. Think of it like the L1 and L2 cache in x86 chips. There is nothing that indicates how much memory this is (as the quoted memory sizes are for the memory chips attached to the boards). I'm willing to bet the chip has around 10 billion transistors and the remaining 5 billion are the HBM layer that sits on top.
Re:15B transistors = 16 GB ? by DRJlaw · 2016-04-06 06:46 · Score: 1

There are only two sets of memory if you consider the register file to be memory instead of cache (which you apparently do). The problem is, the published specifications demonstrate that you are simply wrong.
4MB of L2 cache and ~14MB of register file space per GPU means that there are about 151 million bits associated with cache and "memory." On a chip with 15.3 billion transistors, that comfortably means that you have about 15 billion transistors for GPU logic.
There is everything to indicate the specs of the chip, and you've lost your bet. Now go away.
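The 151 million figure checks out if MB is read as MiB. The sketch below redoes the sum and then adds one extra step that is not in the post: a hypothetical 6 transistors per stored bit (a common textbook SRAM cell, assumed here purely for illustration, not taken from NVIDIA's specs) to get a rough feel for how much of the die the on-chip storage could take.

```python
L2_CACHE_MIB = 4              # from the post
REGISTER_FILE_MIB = 14        # approximate figure from the post
TOTAL_TRANSISTORS = 15.3e9    # from the post
TRANSISTORS_PER_BIT = 6       # ASSUMPTION: classic 6T SRAM cell, illustrative only

storage_bits = (L2_CACHE_MIB + REGISTER_FILE_MIB) * 2**20 * 8
storage_transistors = storage_bits * TRANSISTORS_PER_BIT
storage_share = storage_transistors / TOTAL_TRANSISTORS

if __name__ == "__main__":
    print(f"{storage_bits:,} bits of L2 cache plus register file")
    print(f"about {storage_share:.1%} of the transistors at "
          f"{TRANSISTORS_PER_BIT} transistors per bit")
```

Even under that assumption the storage is a single-digit percentage of the total, so the bulk of the transistors still goes to everything else on the die.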
Imagine what a BEOWULF CLUSTER of these could do!! by ChoosyBeggar · 2016-04-06 09:17 · Score: 1

:P