Intel's Atom C2000 Chips Are Bricking Products -- And It's Not Just Cisco Hit (theregister.co.uk)
Thomas Claburn, reporting for The Register: Intel's Atom C2000 processor family has a fault that effectively bricks devices, costing the company a significant amount of money to correct. But the semiconductor giant won't disclose precisely how many chips are affected nor which products are at risk. In its Q4 2016 earnings call earlier this month, chief financial officer Robert Swan said a product issue limited profitability during the quarter, forcing the biz to set aside a pot of cash to deal with the problem. "We were observing a product quality issue in the fourth quarter with slightly higher expected failure rates under certain use and time constraints, and we established a reserve to deal with that," he said. "We think we have it relatively well-bounded with a minor design fix that we're working with our clients to resolve." Coincidentally, Cisco last week issued an advisory warning that several of its routing, optical networking, security and switch products sold prior to November 16, 2016 contain a faulty clock component that is likely to fail at an accelerated rate after 18 months of operation. Cisco at the time declined to name the supplier of that component.
Intel has dropped the ball for the past decade. Missing the boat on mobile and failing to push x86 chips into phones has weakened its entire platform, which really needs to be an "everywhere" platform. It has been clear for a while that mobile would account for the majority of CPUs for a decade; why Intel has not pushed x86 into more phones is beyond me. It's totally incompetent, especially given that x86 binary compatibility between desktop and mobile could be a selling point.
Once you get a replacement CPU from Intel, it's easy to upgrade your system.
Get a small screwdriver, and insert it in the gap under the chip near pin 1. Gently rock the CPU out of its DIP socket; you may have to alternate pulling at each end of the chip.
The new chip's legs will be slightly splayed for use with automatic pick-and-place machines. You may need to gently bend them inwards before proceeding. Making sure that pin 1 is aligned with the marker on the motherboard silkscreen, gently push the new CPU straight down into the DIP socket. Your system is fixed!
Can't post to The Register, since they don't have ACs.
Anyway, the issue is damage to the LPC (low-pin-count) bus clock line. This is a secondary bus where you hang old ISA-style devices, like the system flash. If the flash is the only thing on it, the failure will mostly just render the system unbootable (so stuff that never gets power-cycled would keep going). But LPC can generate interrupts, and one often hangs other crap on that bus, such as i2c controllers for hot-swap bays, motherboard management controllers, and other sensors. In that case, you can expect severe runtime misbehavior.
The issue is caused by *continuous degradation due to use*, so repairing it is easy, if costly: replace the motherboard with a new one under warranty. That also applies out of warranty in places where this kind of "stealth" manufacturing defect is not subject to warranty time limits, such as Brazil. A new board "resets" the counter. This is your zero-day solution to the issue.
Depending on time-to-market for the new stepping (hardware revision) B1/C0 of the Atom C2000, you might need an interim solution, which is the "platform-level change", i.e. a redesigned board with extra components that work around Intel's hardware design error. As soon as you have these boards, you start using them to replace any boards returned due to the defect, or start a "recall" to preemptively replace boards.
Depending on the total cost of the board plus other components, you keep the old boards you replaced around, and when revision B1/C0 of the Atom C2000 is out, you BGA-replace the chips in a factory (about US$25 per board in large volumes, if that much), maybe replace any liquid electrolytic capacitors and other crap that ages badly, and use the boards either as new or as refurbished, depending on your corporate/regulatory ethics. This kind of repair almost always really resets the board's MTBF. If Intel supplies the replacement Atoms at no charge, the cost of repair might well be far less than the cost of the production run for boards you'd want to keep around for warranty service anyway.
Mind you, at 1.5 years per failure, it will be a rare law or contract that forces more than one replacement... so let's hope they don't replace a faulty board with a brand-new but still time-bombed board. You'd have trouble getting it replaced a second time if it fails after the warranty period.
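To put rough numbers on that warranty point, here is a minimal Python sketch. It assumes, as a simplification, that every affected board fails exactly 18 months after going into service and that a replacement does not restart the warranty clock. The function name, the warranty lengths and the 60-month horizon are all hypothetical, picked only for illustration.

```python
def failure_timeline(warranty_months, months_to_failure=18, horizon_months=60):
    """Return (month, covered) pairs for each expected failure up to the horizon.

    Assumption: each board, original or replacement, fails exactly
    `months_to_failure` months after installation, and the warranty is
    counted from the original purchase only.
    """
    timeline = []
    month = months_to_failure
    while month <= horizon_months:
        # A failure is covered only if it happens within the warranty period.
        timeline.append((month, month <= warranty_months))
        month += months_to_failure
    return timeline


if __name__ == "__main__":
    for warranty in (12, 24, 36):  # hypothetical warranty lengths, in months
        covered = sum(1 for _, ok in failure_timeline(warranty) if ok)
        print(f"{warranty}-month warranty: {covered} covered replacement(s)")
```

Under these assumptions, a 24-month warranty covers the first failure at month 18 but not the replacement board's failure at month 36, which is exactly the case where you end up stuck with a dead board and no claim.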