Cisco Blamed A Router Bug On 'Cosmic Radiation' (networkworld.com)
Network World's news editor contacted Slashdot with this report: A Cisco bug report addressing "partial data traffic loss" on the company's ASR 9000 Series routers contended that a "possible trigger is cosmic radiation causing SEU [single-event upset] soft errors." Not everyone is buying: "It IS possible for bits to be flipped in memory by stray background radiation. However it's mostly impossible to detect the reason as to WHERE or WHEN this happens," writes a Redditor identifying himself as a former [technical assistance center] engineer...
"While we can't speak to this particular case," Cisco wrote in a follow-up, "Cisco has conducted extensive research, dating back to 2001, on the effects cosmic radiation can have on our service provider networking hardware, system architectures and software designs. Despite being rare, as electronics operate at faster speeds and the density of silicon chips increases, it becomes more likely that a stray bit of energy could cause problems that affect the performance of a router or switch."
Friday a commenter claiming to be Xander Thuijs, Cisco's principal engineer on the ASR 9000 router, posted below the article, "apologies for the detail provided and the 'concept' of cosmic radiation. This is not the type of explanation I would like to see presented to the respected users of our products. We have made some updates to the DDTS [defect-tracking report] in question with a more substantial data and explanation. The issue is something that we can likely address with an FPD update on the 2x100 or 1x100G Typhoon-based linecard."
Cosmic
I'm not saying it was aliens, but...
It was aliens!
And they're not all that rare. Ask your local supercomputer admin how often they register ECC errors...
would be another explanation.
Slashdot, fix the reply notifications... You won't get away with it...
I wonder if this is a real thing on the surface. The Earth's magnetosphere has a tendency to grab and divert most of these things, which manned spacecraft have a hard time maneuvering around. Do they actually ever screw up processors on the ground? That's pretty crazy.
How is this new? They've been saying that for years. They've used that explanation for every router series back to the 7500s at least. Mind you people have been making fun of it for just as long. 'Ooo. Sunspots. Better check to see if any of the routers rebooted.'
If it's cosmic radiation, wouldn't it affect more than the ASR 9000? Or is that the only model without a lead case?
Is that a roll of dimes in your pocket or are you happy to see me?
even if you have a strong support organization, one slacker responding with this to a customer, and the entire brand is tarnished.
Anybody can work under ideal circumstances. -- Jeff K. (January 4, 2001)
I work at a Fortune 500 and I had to explain this to management just a few years ago on a Cisco 6500. It was a tough sell, but I recall having a similar issue in the late 90s/early 2000s with Sun hardware, so it isn't new. That one was even better to explain. The Sun's cosmic rays were causing the Sun's hardware to break!
I'm guessing that they've read the BOFH, but realized that there's much more reporting on solar-induced radiation ... so just decided to go with 'galactic' instead. .... completely forgetting that if this were the case, it would happen more frequently at high latitudes, due to the magnetosphere. And we'd also see a higher incidence rate after solar x-ray flares and solar particle events.
(and the disclaimer: I work for the Solar Data Analysis Center, but I'm not a scientist, and don't speak for my place of work, etc, blah blah blah)
Build it, and they will come^Hplain.
Sun Microsystems already pulled this bullshit back about 15 years ago... I don't really recall if it was a bad batch of processors or if it was bad non-ecc cache memory or whatever... but I do remember plenty of folks giving them a ration of shit and generally refusing to buy hardware from them after that... though once they fessed up to the problem and replaced all of our defective systems(and gave us a couple of free systems) we never had any further issues.
Trying to claim acts of God to get out of being responsible?
I work at a fortune 100, we're being delayed at the moment by software bugs in Cisco's routers. Their QA has completely gone out the window in the last few years, probably related to all the staff cutbacks. I expect we will start seeing Cisco losing market share if this keeps up.
This makes me wish I was still working for them in IOS Engineering for the opportunity just to stir some shit.
I'd have gone into the office on Monday morning with my head covered in tinfoil.
I used to get some good laughs in the Cisco office; I seem to be getting more on the outside these days.
Oh how the mighty have fallen....
I worked on communications satellites in the 1980s, and bit flips from cosmic rays were a serious problem we needed to address. Chips needed to be hardened to resist cosmic rays and electronics had to be substantially shielded.
There are a lot of cosmic rays going through you right now. While the vast majority don't interact with your cells, once in a while one does and that can cause cancer or genetic changes. We owe much of evolution to cosmic rays!
As discovered by IBM back in the 70s, if it is a radiation-induced upset, you'd see higher rates in places like Colorado vs. sea level, and on upper floors of buildings vs. lower floors.
As FOLDOC explains, Intel tested this idea decades ago by putting one board in a 25 ton lead safe and another outside to see if there was a measurable difference in bit rot. There wasn't. " Further investigation demonstrated conclusively that the bit drops were due to alpha particle emissions from thorium (and to a much lesser degree uranium) in the encapsulation material." They ended up redesigning the memory to be more resistant to the effect.
Good, inexpensive web hosting
Santa Claus spread chemtrails in the sky with which the easter bunny got stoned and confused causing the routers to crash!
Hey, it's not impossible!
Why believe someone from Reddit?
That was in his excuses rolodex.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
I used to use that all the time. Now I'll have to think of something else..
Finally a reason to put a tinfoil hat on my cable modem. Is that why they keep failing over time, or is it the diodes in the splitters the cable company keeps replacing?
Cisco has said this before; for easily 15 years they have claimed this issue. "Shit happens" is a better explanation.
Back in the mid-70s, a company I was working for as a software engineer rented a non-IBM 370-type computer. It began to drop several random bytes out of random-length records. The company swore that the computer couldn't do that and the problem was not their fault.
Hahaha... for some reason, this video came to mind. Just another fine example of stupid assumptions based on, and exposing, a woman's desire to be a sex object.
Maybe cosmic radiation is affecting them too?
Politics; n. : A religion whereby man is god.
Flips of a single bit in a memory or register are common enough that few modern systems would run for long without error-correcting memory. Even ECC has its limitations, and most systems eventually crash/panic/blue-screen or whatever and require a reboot.
The costs to improve error resilience go up rapidly and don't have a meaningful upper bound. My working trade off was to design for a mtbf comparable to how long I wanted to keep that job.
From http://noelcomm.com/ethernet/: " We have a philosophy of using routers to route and switches to switch which ensures that our Ethernet devices move layer 2 frames as quickly as possible avoiding the “bumps on the wire” often encountered by our competitors who seek to agglutinate multiple services on a single, expensive platform."
Of course, this business just happens to be located in the buttcrack of the universe, Yakima, WA; home of the Braindead.
"does not take cpp comments" jesus christ
-> STACK TRACE OF ALREADY COMPILED PROGRAM
pls get help soon you aspergian shitfuc
I remember in the 70s some memory manufacturer used a ceramic package that had a lot of thorium. Bad trouble.
...a Cosmic Brownie?
http://cosmicbrownies.littlede...
You were mistaken. Which is odd, since memory shouldn't be a problem for you
LA Prostitute (by her own admission) Raven Williams is an HIV positive self-loathing crack addict and alcoholic who has borderline personality disorder and is possibly bipolar as well. She spends her time making YouTube videos cursing people out and starting fights with people in public around Hauser Blvd. in LA as she walks around topless. She has been arrested numerous times by the LAPD and is well known by the cops who patrol that district. She also harasses the police as they are doing their jobs. She attempts to sell her ugly art work on YouTube and via her weblog as well. Currently, she is being investigated by the IRS as she has never filed taxes for the IRS, besides the fact that she makes money illegally via prostitution. Due to her serious personality and psychological issues, she has been fired from every job she has ever had and simply can not relate to anyone else socially or professionally.
I have heard a person from a Cisco competitor talk about how their switches are cosmic-ray safe, but Ciscos are not.
The correct response to rare spontaneous radiation-induced errors is not, "Oopsie! Oh well!" The correct response is to design the hardware to be more tolerant and robust in the presence of inevitable background radiation. E.g., use ECC memory for fuck sake. And at least parity checking on all buses.
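For anyone wondering what "use ECC" amounts to in practice, here is a minimal sketch of single-error correction using a toy Hamming(7,4) code. This is not Cisco's design or any particular memory controller's: real ECC memory typically uses wider codes over 64-bit words, and the function names below are made up for illustration.

```python
def hamming74_encode(data: int) -> int:
    """Encode a 4-bit value as a 7-bit codeword (hypothetical helper).

    Codeword positions are numbered 1..7; parity bits sit at positions
    1, 2 and 4, data bits at positions 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = [(data >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d1, p4, d2, d3, d4]
    return sum(bit << i for i, bit in enumerate(bits))


def hamming74_decode(codeword: int) -> tuple:
    """Return (data, syndrome); a nonzero syndrome is the corrected position."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # XOR together the 1-based positions of all set bits. A valid codeword
    # gives 0; a single flipped bit gives the position of that bit.
    syndrome = 0
    for position in range(1, 8):
        if bits[position - 1]:
            syndrome ^= position
    if syndrome:
        bits[syndrome - 1] ^= 1  # undo the single-bit upset
    data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return data, syndrome


stored = hamming74_encode(0b1011)
upset = stored ^ (1 << 4)  # simulate one flipped bit (position 5)
recovered, where = hamming74_decode(upset)
```

A code like this corrects any single flipped bit per codeword; two flips in the same codeword would be miscorrected, which is why practical designs add an extra overall parity bit to at least detect double errors.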
It shouldn't be a huge expense to build in some form of error correction to catch that sort of thing.
Chas - The one, the only.
THANK GOD!!!
My wife was looking over my shoulder when the "Cisco Blamed A Router Bug on 'Cosmic Radiation'" headline went by, and asked:
"What's their next excuse? Global Warming?"
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
In the early 00s I worked on a lot of AS5200 series routers for dialup. I've had Cisco blame cosmic radiation and solar flares for a handful of unexplainable crashes. I really don't know how anyone could argue that with a straight face when the equipment is in a rack with 10 other working pieces of Cisco equipment.
It was rats, sir. They ate through the wires.
Except all professional computing hardware comes in metal cases which are racked in metal rack units.
A White House health report addressing "partial data traffic loss" on Secretary of State Hillary Clinton contends that a "possible trigger is cosmic radiation causing SEU [single-event upset] soft errors." Not everyone is buying: "It IS possible for bits to be flipped in memory by stray background radiation. However it's mostly impossible to detect the reason as to WHERE or WHEN this happens," writes a Redditor identifying himself as a former [technical assistance center] engineer...
What type of cosmic radiation? Does it occur more often near nuclear reactors? Fukushima?
If neutrinos trigger it, then these routers are a really cheap neutrino detector.
If any of the other neutral particles trigger it, ditto.
If only charged particles trigger it, then no such luck.
BUT ! If dark matter can trigger it, then physicists will keep them all.
Then there's the rare suggestion of psychic phenomena......
Just open source the software, check to see if deep-packet inspection triggers it when
the CEO gets a bonus... or something...
Yet she still had time to write the Realm-Jumper Chronicles, 5 volumes and counting. Colour me impressed.
Il n'y a pas de Planet B.
When I was a physics teacher I had an ongoing memory error problem with my Fujitsu Siemens laptop which led to frequent BSOD. I replaced the memory, and it still occurred. I then noticed the memory error happened frequently at work, but never at home. I wondered whether it could be a radiation issue, as I handled radioactive sources at my desk. I got my tech to do a leak check on my desk. It showed there was higher-than-background levels of radiation (can't recall whether alpha or beta) around my desk. This only showed up using a fairly decent G-M tube which had been given to us by the local hospital when they were having a clearout. Turns out the source of radiation was dust from a piece of fossilised wood I'd picked up some time previously. It had been sitting on my desk and zapping my laptop's memory. I sealed the fossil in a Ziplock bag and kept it in a Quality Street tin. The problem never recurred.
A project I'm working on expects one SEU per month. This is an issue in safety-critical applications where failures have to be of the order of once per decade. Mitigated by CPU's and memory being triplicated. "Voting" hardware detects differences on every cycle.
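A rough software sketch of the triplicate-and-vote idea described above, assuming three redundant copies of the same word. The hardware in that project is not described in any detail, so the function names and the bitwise approach here are illustrative assumptions, not its actual design.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of a word.

    Each output bit takes the value held by at least two of the inputs,
    so a single upset in any one copy is outvoted.
    """
    return (a & b) | (a & c) | (b & c)


def disagreement_mask(a: int, b: int, c: int) -> int:
    """Bits where the three copies are not unanimous (hypothetical helper)."""
    return (a ^ b) | (a ^ c)


word = 0b10110010
upset_copy = word ^ (1 << 5)  # simulate one SEU flipping bit 5 in one copy

voted = majority_vote(word, upset_copy, word)
flagged = disagreement_mask(word, upset_copy, word)
```

The vote masks the error, and the disagreement mask is what lets the system notice that one copy has diverged and needs to be rewritten before a second upset lands on the same bit in another copy.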
There has been assloads of research on mitigating soft errors going back to the 1970’s. I’ve published some myself. There is no shortage of workable methods on masking transient errors in logic and bit flips in DRAMs. SEUs are a major problem for supercomputers, so their memory systems have sophisticated mechanisms for catching them.
If Cisco is blaming this on SEUs, that just proves their incompetence, since they obviously didn’t spend 5 minutes with Google Scholar looking at hundreds of GOOD papers (in the top conferences and journals) on this topic. Seriously.
PLUS, if something goes wrong, even if it IS a transient error, it’s FAR more likely to be a fixable bug than radiation. We had a weird bug in a DRAM controller whose state kept going invalid. We had to add another circuit to fix that. We *called* it a cosmic ray deflector, but the more likely causes, in order, were (a) another bug we couldn’t find, (b) a timing violation caused perhaps by voltage or temperature fluctuation, or (c) crosstalk in the circuit. We would have kept looking, but this deflector circuit made it robust to hundreds of hours of slamming the memory system, so we let it go. (Also, it was graphics memory, so even if it did ultimately suffer a glitch some day, it would go unnoticed.)
I guess in this case it is "the same thing" ... the silicon from which they made some of the chips involved was not pure enough, or the material for doping was contaminated.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
I have had Cisco tell me this many times, any time a router reboots from a parity error, for over 15 years now, so they have been using this for a long time.
It could indeed be possible. Alpha particles are well known to be capable of causing bit flips in capacitive memories (DRAM). This is exactly why we have things like ECC on memory pathways. That said - it's not the only explanation. There are ways of testing this. For example, observing the general abundance and frequency of such particles in a bubble chamber, and attempting to correlate them with instances of error. Or placing equipment in a shielded environment and seeing if the frequency of errors changes. Long story short - it MAY be true - but if you want to draw a conclusion - you really have to offer more data to prove it.
everything is because Sunspots!
My reaction when I first heard the "cosmic radiation" excuse for misbehaving electronics.
With decades of experience in tech implementations in radiation fields I can personally attest to the fact that the radiation flux levels needed to cause reactions in electronics could only be high enough due to cosmic radiation at elevations higher than 20,000 feet. The levels need to be in Rad per hour rather than the microrad per hour that you get from cosmic radiation. (i.e. background at sea level is often 15-20 microrem/hr in the day and 3-5 microrem/hr at night with the difference due to cosmic radiation. In a 5 Rad/hr field, 5000000 microrem, the lifetime of electronics is weeks if not days before the semiconductors fail from ionization of the doping in the material.) This is for electronics other than radio transmissions as radio transmissions can experience interference in transmission due to ionization in the atmosphere. (thunderstorms do that too) Low power short range such as wifi is much less affected than long range skywave or aimed microwave. And radio interference is not an issue in the electronics but with interfering transmissions from mama nature. Cisco was so obviously full of a certain word that rhymes with their name.
NRRPT/RCT
thanks for that video, that chick was hilarious!
I needed a good laugh... she was addicted to the penus!!!
Case solved.
Shut the fuck up, you plagiarising cut-n-paste junky! that has nothing to do with TFA!
Try thinking for yourself instead of quoting random people from youtube and trying to pass it off as a legitimate original thought.
You just need the right gadget.
More likely a bug in the code that the NSA has inserted into all of their routers.