Nvidia 55nm Parts Are Bad Too
JagsLive sends in a story (in somewhat inflammatory prose) from The Inquirer, which links to many others; they have been following developments in the alleged NVidia quality "fiasco" for some time. "Hot on the heels of its denials that anything is wrong with the G92 and G94s comes another PCN [Product Change Notification] that shows the G92s and G92b are being changed for no reason. Yup, the problems that are plaguing G84 and G86 are the same that affect seemingly all 65nm and now 55nm NVidia parts ... It is hard to overstate how bad this is. Basically every 65nm and 55nm NVidia part appears to be defective ... We are hearing of early failure rates in the teens percent for 8800GTs and far higher for 9600GTs ... To make matters worse, NVidia has a mound of unsold defective parts that they are going to bleed out into the channel along side of the (hopefully) fixed parts. As a buyer, you have no way of knowing which one you are getting ... Until NVidia comes fully clean on this fiasco, lists all the defective parts, and orders boxes clearly marked, you can't say anything other than just avoid them. Then again, since doing the right thing would likely bankrupt them, we wouldn't hold your breath for it to happen."
Sure, the GPU might be faulty but the rest of the components on their graphics cards (cooling fan, PCI-Express connector) are not showing any issues.
So let's not blow this out of proportion.
I'm a big tall mofo.
At risk or not?
Also, this sounds like a class-action waiting to happen.
This is the kind of story that can only end with somebody being fired for making pizza in the silicon fab oven.
No kidding!!! What do you say at this point?
"Then again, since doing the right thing would likely bankrupt them, we wouldn't hold your breath for it to happen"
-5 Troll
...to buy Nvidia? Problem solved.
If you're a betting man, now's a good time to pick up on Nvidia stock.
The question is, do you feel lucky, punk?
It's been a long time.
I've always secretly been an ATI fanboy... and a traitor since the 6800GT.
Now, I've got ATI again but recommended everyone I know (up until 48XX by ATI) buy the 8800 or 9600....
I wanted ATI to regain some track to even the market... but this is a little much. Complete flops are not good for competition either.
Boot Windows, Linux, and ESX over the network for free.
I've got two 8800 series cards (one 8800GT, one 8800GTS), and I live in a place with no air conditioning. If these cards were subject to heat failure the way the Inquirer has been hollering about, one or both would have died by now. Particularly the one in my wife's computer: it's a Shuttle box, which runs toasty. It's been rock solid, running 24/7 for more than a year now.
I'm not suggesting there is NO problem, but the Inquirer has been talking about this like all of these cards are just waiting to die. With no A/C, and temps in the house above 90F during the summer, they should be dead if the Inq is to be believed. Perhaps I'm just lucky, but I still ain't buying the story.
"Nothing is so important that you cannot make fun of it." -Clarke
I don't get people who show any sort of devotion to a GPU manufacturer. I just don't. The author of this article is one of them. That doesn't mean it's not true, but he's written a number of articles that later proved to be completely false in the past, for instance saying that the 8800 series would doom nV because of low performance and high power usage compared to the 1900 or 2900, whatever ATI was releasing at about the same time. I'd suggest you not take any article written by Charlie seriously until it's been confirmed (not just repeated, as often happens) elsewhere.
"I zero-index my hamsters" - Willtor (147206)
I stopped reading when I got to "By Charlie Demerjian."
Seriously, this guy is to NVIDIA as Jack Thompson is to video games. It's just not as common knowledge that you shouldn't take him seriously.
has been proposed:
buy ati.
Good people go to bed earlier.
Why is NVidia using lead-based solders at this late date? The European RoHS deadline for lead-free components was back in 2005. The NForce and 8800 parts were RoHS compliant years ago. Are these NVidia parts even exportable to Europe?
Yeah, because the Inquirer is such a steady and accurate news source.
I'll believe this when I see more proof.
Okay, I hear about supposedly defective nvidia GPUs all the time now, but why are we not seeing forums crowded with people with these failed graphics cards? I believe this issue is being blown substantially out of proportion.
To make matters worse, NVidia has a mound of unsold defective parts that they are going to bleed out into the channel along side of the (hopefully) fixed parts.
This sounds very similar to what finally took down Weitek, back when there were a bunch of graphics chip companies competing hotly and being shaken out if they screwed up.
Weitek had built a very fast and powerful chip. But they had goofed: While it had the mandatory basic VGA mode for acquiring the Microsoft certification, there was a bug in it.
QA told management that the bug was there and would fail them. But Software told them a driver could work around it and people would want the chip because it was so fast on graphics rendering. (Of course it could not - because to get the cert it had to work with the stock bootstrap stuff, before a custom driver could be loaded.)
So they went to production with the bug. And the customers got their prototypes, found the bug, and demanded a fix. Eventually they did a fixed version - but had maybe a couple million of the buggy ones on hand and wouldn't sell the fixed ones unless the customer bought some buggy ones, too. So nobody bought and the company folded.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Well, it WAS part of a joke. Now you blew it for everyone else. I hope you're happy.
Faster! Faster! Faster would be better!
We're not unreasonable here, nobody's going to eat your eyes.
It is hard to overstate how bad this is.
This will end all life on earth.
That wasn't hard.
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Based on personal experience with the 8800 GT boards, I think this problem is vastly overstated... Running 4 of them in my house, and three friends also running them in SLI config, and no failures. That's with the boards overclocked by a bit.
Additionally, failure rates based on NewEgg reviews seem very low (and we all know people love to post a nasty review if they get a bad one).
The cards do run nasty hot, at least until you set the fan to turn on at something under 180F.... who the hell came up with that turn-on temp?
The Intel 486SX was a defective 486DX whose numeric processor was dead or not working.
Most very very large scale integrated chips have defects. Depending on the nature of the defect, they simply categorize the part differently.
A chip is not fast enough for a high speed gaming system? Use it in an embedded device.
Buy it, if it fails, return it. Just because nVidia has issues you know about, don't think for an instant that ATI doesn't.
- Linux support
Really?
Also, I use FreeBSD. Unless something has dramatically changed with ATI drivers on FreeBSD in the past year, the driver quality argument goes right out the window.
I am neither an NVidia nor an ATI fanboy (heck, my current GPU is an integrated Intel), but this article is a steaming pile of crap.
Somehow, he takes a report of a routine running change to the production process (a new kind of solder), and magically turns this into some wild tale of how NVidia is shipping thousands of defective parts that will remain in the field.
Completely lacking is any explanation of how he connects the running change to some defect...
SirWired
I'm not at all sure your criticism is based on the correct quotation source; cf: http://en.wikiquote.org/wiki/Friedrich_Nietzsche#The_Gay_Science_.281882.29 Now back to nvidia....
They have less lead, but they still have lead.
WRT to beef though, salmonella poisoning by beef is almost completely unheard of - chicken yes, beef no. Where this whack job got his numbers from is anyone's guess but they are wrong.
Think outside the... Hey, where'd the friggin' box go?
This guy still hasn't posted sources and is making radical claims about salmonella infection rates. If this rate was true then most of the US would have had some level of Salmonella poisoning by now. That is unless it is all killed by cooking it correctly. Still this post above is NOT informative. It is inflammatory. Don't confuse the two.
Charlie at The Inquirer has no credibility when it comes to nVidia.
From TFA, nVidia is changing from high lead to eutectic (tin) solder - for RoHS compliance - and has issued a PCN to that effect. Charlie has latched onto this as "proof" of his claim that all nVidia chips are faulty and overheat.
What Charlie doesn't explain is how switching from high-lead solder (5/95 Sn/Pb) to eutectic solder (63/37 Sn/Pb) - which has the lowest melting point of all tin-lead solders - is supposed to help if the chips are overheating. Nor does he explain how changing the solder material has any relationship to changing the underfill material on some mobile chips (other than they were both PCNs.) But hey, why let facts get in the way of a conspiracy theory/page hits?
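To put rough numbers on the melting-point comparison above, here is a minimal Python sketch. The temperatures are approximate handbook values for tin-lead alloys, rounded and supplied here as assumptions; they do not come from the PCN or the article, and the function names are made up for illustration.

```python
# Approximate handbook melting ranges for tin-lead solders, in degrees C.
# ASSUMPTION: rounded textbook figures, not values taken from the PCN.
# Each entry is (solidus, liquidus): where melting starts and where it ends.
SOLDERS = {
    "Sn5/Pb95 (high-lead)": (308, 312),
    "Sn60/Pb40": (183, 190),
    "Sn63/Pb37 (eutectic)": (183, 183),
}


def pasty_range(solidus_c, liquidus_c):
    """Width of the band where the alloy is part solid, part liquid."""
    return liquidus_c - solidus_c


def lowest_liquidus(solders):
    """Name of the alloy that is fully molten at the lowest temperature."""
    return min(solders, key=lambda name: solders[name][1])


if __name__ == "__main__":
    for name, (solidus, liquidus) in SOLDERS.items():
        width = pasty_range(solidus, liquidus)
        print(f"{name}: {solidus}-{liquidus} C, pasty range {width} C")
    print("Lowest liquidus:", lowest_liquidus(SOLDERS))
```

On these figures the high-lead alloy stays solid more than 100 C past the point where the eutectic alloy is already liquid, which is why the switch looks odd as a fix for overheating.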
I wonder if we'll ever see graphics card makers use socketed GPUs (or maybe it's been done before).
Could be a useful thing if they start coming out with multi-GPU cards... if you can't afford a dual-GPU then add it in later.
Contrary to your belief that 'these kinds of problems are subtle and might be missed during a decent period of testing' it can be EXTREMELY difficult to find these kinds of problems. Beyond your wildest imaginings difficult.
Having worked on high performance hardware/software systems as an engineer I can tell you from first hand experience that the situation is more like there are 999,999,998 ways for things to go wrong and about 2 ways you can get it right, and those 2 ways are not AT ALL obvious. Usually the types of problems you encounter HAVE no obvious cause and no obvious solution and mostly can't be reliably replicated. They can stem from the very most subtle differences between two boards or systems. A cap that happens to be a bit out of spec and a slightly less than perfect solder joint can combine to create an error that happens 1 out of every 100 billion times an operation is performed.
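For a feel of what "1 out of every 100 billion" means on a test bench, here is a rough Python sketch. It assumes each operation fails independently with the same probability, and the operation rates are hypothetical numbers picked for illustration, not measurements from any real board.

```python
import math


def seconds_between_errors(error_probability, ops_per_second):
    """Mean time between errors, assuming independent operations."""
    return 1.0 / (error_probability * ops_per_second)


def chance_of_seeing_error(error_probability, operations):
    """Probability of at least one error in `operations` tries.

    Uses log1p/expm1 so that very small probabilities do not get
    lost to floating-point rounding.
    """
    return -math.expm1(operations * math.log1p(-error_probability))


if __name__ == "__main__":
    p = 1e-11  # the "1 out of every 100 billion" figure from the comment
    # ASSUMPTION: hypothetical rates at which the fragile operation happens.
    for rate in (1e9, 1e6, 1e3):
        days = seconds_between_errors(p, rate) / 86400
        print(f"{rate:.0e} ops/s: about {days:.3g} days between errors")
    # Chance that a test run of 100 billion operations hits the bug at all.
    print(f"P(seen in 1e11 ops) = {chance_of_seeing_error(p, 1e11):.3f}")
```

The point of the sketch is that the same defect can show up within minutes or stay hidden for years depending on how often the triggering operation runs, and even a test run of 100 billion operations still misses it about a third of the time.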
Now, combine that with the fact that you have a dozen vendors slightly varying implementations of a given board design, PCs of all different types and quality levels running at different speeds with different CPUs in them, running a plethora of different versions and subversions of OS and drivers and applications, and the real miracle is you can make a board that works reliably at all.
Any attempt to make a really seriously bullet proof product that would virtually never have problems is simply infeasible. There is a law of diminishing returns involved. At a certain point you have to say "Well, we've tested it in 10 dozen different systems under 6 different OS versions with 128 different apps, and we get N number of crashes/malfunctions per hour of runtime." and then you call it a day. You could spend 10x more time and money on QA and reduce the failures to N/2, but you also won't sell much product when you multiply your NRE by a factor of 10...
Plus such perfection will be for naught because MS will release BrokenOS patch "friday the 13th" 2 days later and you'll STILL be encountering the higher error rates. Same goes for new motherboards, games, etc. It is just a losing proposition.
All you can realistically do is what they do now, test the heck out of it as best you can afford to, ship it out the door, and try to address any issues that come up later as quickly and painlessly as you can.
This is the kind of reason why military and aerospace grade hardware costs 2000x more than electronics with similar functionality with civilian retail/commercial specs. They REALLY do have to be certain things work exactly right or people die, and it is WAY expensive.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
nVidia needs to take a page from Intel's FDIV days (ca. 1993) and just do a no-questions-asked recall and replace.
https://www.accountkiller.com/removal-requested
Good.. I thought it was just me. And I'm definitely NOT a hardware guy. But I can't see, from his description of the PCN, how switching from high-lead to tin solder could be seen as a response to, well, anything except "let's use less lead".
I know that 63/37 has a lower melting point than 60/40, and a "sharper" one (no pasty phase), which is why I use it for audio repairs and cabling; I'm a klutz, and anything that makes my solder joints more stable is good. But I can't imagine that this matters as much on SMT, where your components are fixed in place.
That said... a quick Google shows that there are all sorts of considerations in what solder to use for PCB solder bumps: not just temperature, but the metals involved in the leads, and the PCB traces, and a bunch of other stuff that involves knowing more about electronics and metallurgy than my "the batteries go this way" brain can handle. So there may well be some stability advantages to eutectic solder for NVidia's solder bumps.
Anyone here actually know this stuff? I've got an 8800GT in my Mac Pro, which definitely runs hotter than your average PC...
I own two notebooks with Nvidia chipsets in them. Both are HP notebooks; one contains an 8400M, the other an 8800M GTS. The 8400M notebook's cover broke at the hinge connection (a problem in no way related to the circuit boards) last week, and it was sent back. I just got it back today, and the repair slip noted that they had replaced the outside cover but had also replaced the video circuit board. Surprise!
Just last week the laptop with the 8800M GTS started blue-screening Windows with a video subsystem problem before the login prompt. Ubuntu booted without error but would freeze every 30 seconds for 15 seconds or so if you moved the cursor on the screen. HP concluded the graphics system was malfunctioning and off to repair it went. I'll know in a couple of weeks what was replaced, but I bet the 8800M GTS gets replaced.
This is a BIG deal people. Charlie is being a sensationalist but it's a BIG deal if HP extends the warranty on every laptop with the chips in them for an additional year. HP wouldn't do that unless they feared loss of customers or a class action lawsuit because the warranty extension costs them serious dollars. And I would also bet HP isn't going to eat every dollar. Nvidia will share the cost at a minimum. Even 10% bad parts could cost Nvidia hundreds of millions.
Charlie might go overboard in his complaints about Nvidia but he's right about this issue, it's really really big and Nvidia will eventually talk about it because of stories like this. Without Charlie's stories Nvidia would probably try to bury the issue and pretend it wasn't happening and if I was invested in NVDA I would want to know this information because it's a harbinger of a profit warning by NVDA.
TFA and /. summary are possibly grossly unfair. There *are* two sides to every story, and apparently the article author has a chip on his shoulder for Nvidia, no pun intended. (Personally, I don't have a dog in this fight, but in the interest of fairness...) Check out the comments, like this one which would seem to be from someone at Nvidia:
Answer this... As you know Charlie has a history of severe bias against NVIDIA. Our July announcement of the problem with notebook GPU failures (link) has given him lots to rant about. This new story is the latest in a series of articles in which he continues to stretch the truth in order to spread FUD. In it he paints the notebook chip failures as if they were a widespread epidemic affecting every single NVIDIA GPU in existence, including desktop. Here is a list of BS and the truth.
Myth 1 - NVIDIA has denied responsibility for the failures and is blaming suppliers and partners.
In our announcements accept responsibility for the failures. We DO call out the material failure but we also acknowledge that our suppliers and notebook designs because this is true and we need to disclose this in our official statements to the SEC. We would not go on record with the SEC making such bold claims if they weren't true. See our Form 8-K statement below.
Myth 2 - There is an "official story" that the problems were limited to a batch of a few bad parts for HP.
We have never stated this. See our public statements below.
Where is source for that?
Myth 3 - NVIDIA is forcing a fix on notebook makers
The idea that a supplier like NVIDIA can dictate a fix to the world's largest PC makers is preposterous.
The truth is the notebook makers are determining their own course of action and we are supporting them.
Where is source for that?
Myth 4 - NVIDIA is trying to cut its financial liability.
We put aside $200M to help partners solve this problem for consumers. As far as we know NVIDIA is the first and only chip maker to help fund the cost for repairs.
Myth 5 - This affects desktop chips, G92, G94, etc.
We have only seen this problem on notebooks. We just reiterated this during an official financial call. Once again we would not say this if it wasn't true. Note we have not disclosed the specific GPUs but we have stated this impacts previous generation GPUs and that current gen GPUs are not in production.
Fact: Charlie has an obvious bias against NVIDIA and he has no sources to back up his claims. Out of all of the hundreds upon hundreds of notebook models designed with NVIDIA chips in the last few years, only a small number of these have experienced the problem. Within this small number of models, only a small percentage actually experiences the chip failure. It is highly unlikely a notebook user will experience the problem. And we have never seen this problem on desktop.
Other Useful Information
"Separately, NVIDIA plans to take a one-time charge from $150 million to $200 million against cost of revenue for the second quarter to cover anticipated warranty, repair, return, replacement and other costs and expenses, arising from a weak die/packaging material set in certain versions of its previous generation GPU and MCP products used in notebook systems. Certain notebook configurations with GPUs and MCPs manufactured with a certain die/packaging material set are failing in the field at higher than normal rates. To date, abnormal failure rates with systems other than certain notebook systems have not been seen. NVIDIA has initiated discussions with its supply chain regarding this material set issue and the Company will also seek to access insurance coverage for this matter."
posted by : Derek, 29 August 2008
So, whichever way it breaks, I do hope that what *is* the truth WRT this issue gets out...
"...there are some things that can beat smartness and foresight. Awkwardness and stupidity can." ~ Mark Twain