Cisco's Network Bugs Are Front and Center in Bankruptcy Fight (bloomberg.com)
Reader Dharkfiber writes: Bloomberg is covering a story today about a hosting business that is now filing chapter 11 due to bugs in a switch. Good, bad, or ugly, is it time to admit that business really can't continue without IT? When will IT training become formal curriculum in schools? An excerpt from the Bloomberg report: There's buggy code in virtually every electronic system. But few companies ever talk about the cost of dealing with bugs, for fear of being associated with error-prone products. The trial, along with Peak Web's bankruptcy filings, promises a rare look at just how much or how little control a company may have over its own operations, depending on the software that undergirds it. Think of the corporate computers around the world rendered useless by a faulty update from McAfee in 2010, or of investment company Knight Capital, which lost $458 million in 30 minutes in 2012 -- and had to be sold months later -- after new software made erratic, automated stock market trades.
Peak Web, founded in 2001, had worked with companies including MySpace, JDate, EHarmony, and Uber. Under its $4 million-a-month contract with Machine Zone, which began on April 1, 2015, it had to keep Game of War running with fewer than 27 minutes of outages a year, court filings show. According to Machine Zone, the hosting service couldn't make it a month without an outage lasting almost an hour. Another in August of that year was traced to faulty cables and cooling fans, according to the publisher.
Never? That's all being outsourced, duh.
or at least a cabinet full of new plug-ready parts. that means the HDAs need to be pre-formatted, for instance. cables tested. configurations stored on a server for tftp loading behind your firewall.
things that cost money. things that suits have no clue about.
if this is supposed to be a new economy, how come they still want my old fashioned money?
All systems have bugs, not all data centers have this kind of crap uptime record.
Smart IT people build data centers out of heterogeneous hardware and set it up to degrade gracefully when something fails. You won't get this if you just hire A+/Net+ staff.
Blame the PHB/CTO not the hardware.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
"Disclaimer of Liabilities - Limitation" Page 16, states that (condensed) : all liability shall not exceed the price paid for the software, or of the price of the product which includes the software.
And to use the equipment and Cisco software, you agree to the terms of service.
http://www.cisco.com/c/en/us/t...
So, at best, they can recover the costs of the switches involved. . .
I'm a photographer, and I sell my work through a web service. They bring together the finishing providers (prints, calendars, t-shirts, etc) and take care of payment, and all I have to do is provide content and manage sales. When I finish post-processing on a new photo, the tool I use (Adobe Lightroom) automatically uploads to the web service in the album I select. I cover events, so there's often a massive number (600 or so) of photos to upload.
Yesterday I was getting sporadic "service not available" messages from the service. After doing some triage to verify the problem was not at my end, I contacted customer support. Mind you, this was 10:30 PM PST. But that's the way it is with photographers -- we often take photos during the day and process them at night, which is somewhat the opposite of a standard use case. (And should be borne in mind when said services schedule maintenance. Just sayin'.)
Browsing the service's forum, I saw others were seeing the same error message, and people were starting to get excited. (This is our livelihood, after all.)
I got an answer to my service ticket in less than 30 minutes, that they were struggling with network problems with one of their service providers (probably a cloud service). I got a followup shortly after that they thought the service was up now but they were still testing. And I got another followup at 6:30 AM that the problem had been resolved and they had put steps in place to ensure it would not happen again. They also implemented a "status page" that we could consult in the future (which should have already existed, but live and learn).
Now, *that's* the way to handle an incident like this. Very commendable. But it does point up the problems a business sometimes has when they rely too much on external services. Just my opinion, but the main difference I can see between in-house and outsourced is one of motivation. If you're providing an online service, your employees realize in their heart of hearts that outages can easily result in business failure and loss of jobs. But if you're renting all the pieces of your service from outside vendors, you soon find that those vendors may be concerned about their contract with you, and the money they make off you, which isn't at the same level in the hierarchy of needs as the live-or-die situation you are in.
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
Good, bad, or ugly, is it time to admit that business really can't continue without IT? When will IT training become formal curriculum in schools?
Good, bad, or ugly, is it time to admit that business can't really continue without Patents/Accounting/Negotiations/Advertising/Sales/1000 other things?
When will patent law/banking/economics/marketing of these become formal curriculum in schools? That's about the time when IT should become a part of the formal curriculum as well.
High school shouldn't be about training for a job that only a fraction of the students will eventually do. If businesses can't survive without IT, then they hire people who are specially trained in IT - a HS course won't train people enough to solve any hard IT problems anyway.
People will figure out IT training is important when they realize that they can't make stupid statements like "IT training" as if it means something.
What even IS I.T.?
Are you talking about server management? Network management? DevOps? What skill sets do you need?
It's like saying we need more brain surgeons, so we need MOAR BIOLOGY TRAINING!
MBAs, or people in general, will never appreciate just how complex some work can be, because of Dunning-Kruger. They don't know or understand how complex IT is, therefore they are unable to *appreciate* how complex IT is. Just like they are unable to appreciate anything else that is complicated, whether it's medicine, physics, etc.
If they're contractually bound to deliver that sort of uptime, and their system isn't designed to tolerate this kind of failure, they deserve to fail.
The company’s Nexus 3000 switches began to fail after trying to improperly process a routine computer-to-computer command, and because Cisco keeps its code private, Peak Web couldn’t figure out why.
...
Finally, late in October, came the 10 hours of darkness. Three people familiar with Peak Web’s operations say the lengthy outage gave the company time to deduce that the troublesome command was reducing the switches’ available memory and causing them to crash. The company alerted Cisco.
So they ended up black-box debugging the vendor's own problem for them. I wish I could say I am unfamiliar with that...
Most likely it never touched systems and was put in by some sales shmuck or bean pusher after everything on the RFP response was reviewed and engineering told them not to. Someone asked for 99.995% uptime (26 min and change; there is one more 9 than you put in), sales decided that planned outages and packet loss don't count, hilarity ensues.
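For anyone who wants to check the "26 min and change" figure, here is a quick sketch of the arithmetic. It assumes a 365-day year and that every minute of outage counts against the budget; the function name is made up for illustration, and real SLAs usually define their own measurement windows and exclusions.

```python
# Minutes in a non-leap year (assumption: the SLA is measured per calendar year).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Return the minutes of outage per year allowed by an uptime percentage."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Compare a few common "nines" against the 99.995% figure from the comment.
    for pct in (99.9, 99.99, 99.995, 99.999):
        print(f"{pct}% uptime -> {downtime_budget_minutes(pct):.2f} min/year")
```

At 99.995% that works out to about 26.28 minutes a year, which lines up with the "fewer than 27 minutes" in the court filings. A single outage of almost an hour uses up roughly two years of that budget.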