Cisco's Network Bugs Are Front and Center in Bankruptcy Fight (bloomberg.com)
Reader Dharkfiber writes: Bloomberg is covering a story today about a hosting business that is now filing chapter 11 due to bugs in a switch. Good, bad, or ugly, is it time to admit that business really can't continue without IT? When will IT training become formal curriculum in schools? An excerpt from the Bloomberg report: There's buggy code in virtually every electronic system. But few companies ever talk about the cost of dealing with bugs, for fear of being associated with error-prone products. The trial, along with Peak Web's bankruptcy filings, promises a rare look at just how much or how little control a company may have over its own operations, depending on the software that undergirds it. Think of the corporate computers around the world rendered useless by a faulty update from McAfee in 2010, or of investment company Knight Capital, which lost $458 million in 30 minutes in 2012 -- and had to be sold months later -- after new software made erratic, automated stock market trades.
Peak Web, founded in 2001, had worked with companies including MySpace, JDate, EHarmony, and Uber. Under its $4 million-a-month contract with Machine Zone, which began on April 1, 2015, it had to keep Game of War running with fewer than 27 minutes of outages a year, court filings show. According to Machine Zone, the hosting service couldn't make it a month without an outage lasting almost an hour. Another in August of that year was traced to faulty cables and cooling fans, according to the publisher.
Never? That's all being outsourced, duh.
or at least a cabinet full of new plug-ready parts. that means the HDAs need to be pre-formatted, for instance. cables tested. configurations stored on a server for tftp loading behind your firewall.
things that cost money. things that suits have no clue about.
if this is supposed to be a new economy, how come they still want my old fashioned money?
All systems have bugs, not all data centers have this kind of crap uptime record.
Smart IT people build data centers out of heterogeneous hardware and set it up to degrade gracefully when something fails. You won't get this if you just hire A+/Net+ staff.
Blame the PHB/CTO not the hardware.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
"Disclaimer of Liabilities - Limitation" Page 16, states that (condensed) : all liability shall not exceed the price paid for the software, or of the price of the product which includes the software.
And to use the equipment and Cisco software, you agree to the terms of service.
http://www.cisco.com/c/en/us/t...
So, at best, they can recover the costs of the switches involved. . .
I'm a photographer, and I sell my work through a web service. They bring together the finishing providers (prints, calendars, t-shirts, etc) and take care of payment, and all I have to do is provide content and manage sales. When I finish post-processing on a new photo, the tool I use (Adobe Lightroom) automatically uploads to the web service in the album I select. I cover events, so there's often a massive number (600 or so) of photos to upload.
Yesterday I was getting sporadic "service not available" messages from the service. After doing some triage to verify the problem was not at my end, I contacted customer support. Mind you, this was 10:30 PM PST. But that's the way it is with photographers -- we often take photos during the day and process them at night, which is somewhat the opposite of a standard use case. (And should be borne in mind when said services schedule maintenance. Just sayin'.)
Browsing the service's forum, I saw others were seeing the same error message, and people were starting to get excited. (This is our livelihood, after all.)
I got an answer to my service ticket in less than 30 minutes, saying that they were struggling with network problems with one of their service providers (probably a cloud service). I got a followup shortly after that they thought the service was up now but they were still testing. And I got another followup at 6:30 AM that the problem had been resolved and they had put steps in place to ensure it would not happen again. They also implemented a "status page" that we could consult in the future (which should have already existed, but live and learn).
Now, *that's* the way to handle an incident like this. Very commendable. But it does point up the problems a business sometimes has when they rely too much on external services. Just my opinion, but the main difference I can see between in-house and outsourced is one of motivation. If you're providing an online service, your employees realize in their heart of hearts that outages can easily result in business failure and loss of jobs. But if you're renting all the pieces of your service from outside vendors, you soon find that those vendors may be concerned about their contract with you, and the money they make off you, which isn't at the same level in the hierarchy of needs as the live-or-die situation you are in.
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
Fantastic, then: a chance to test EULAs in a court of law! I'm sure Cisco will let that happen.
Good, bad, or ugly, is it time to admit that business really can't continue without IT? When will IT training become formal curriculum in schools?
Good, bad, or ugly, is it time to admit that business can't really continue without Patents/Accounting/Negotiations/Advertising/Sales/1000 other things?
When will patent law/banking/economics/marketing of these become formal curriculum in schools? That's about the time when IT should become a part of the formal curriculum as well.
High school shouldn't be about training for a job that only a fraction of the students will eventually do. If businesses can't survive without IT, then they hire people who are specially trained in IT - a HS course won't train people enough to solve any hard IT problems anyway.
Setting up this or many other systems has nothing per se to do with IT, and everything about a design philosophy called "fail-operational/fail-safe", meaning a system should tolerate a failed component and keep going, or with multiple failed components degrade to a safe mode.
This applies to everything from surgery to rocket science. It's why airliners have triply-redundant control systems, why trucks and trains have air brakes (losing pressure applies the brakes), why elevators are ubiquitous (the Otis braking system -- if there's no load on the cable, the brakes activate), why valves for spacecraft guidance rockets are actually four valves connected in parallel/series (so a single valve failing open or failing closed will not affect the overall system), why a glitch in an intersection's traffic lights fails them to all blinking red (all green should be electrically impossible). Shit happens, and you should be designing for that. (Sure, sometimes shit still happens regardless, but less often.)
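To make the four-valve example concrete, here is a minimal sketch in Python. It assumes one common reading of "parallel/series": two parallel branches, each holding two valves in series. The state names and function names are made up for illustration and are not taken from any real flight system.

```python
# Hypothetical model: two parallel branches, two valves in series per branch.
OK, STUCK_OPEN, STUCK_CLOSED = "ok", "stuck_open", "stuck_closed"


def valve_is_open(state, commanded_open):
    """A healthy valve follows the command; a failed valve ignores it."""
    if state == STUCK_OPEN:
        return True
    if state == STUCK_CLOSED:
        return False
    return commanded_open


def quad_flows(states, commanded_open):
    """Return True if propellant flows through the four-valve assembly.

    states lists the valves as [a1, a2, b1, b2], where a1-a2 form one
    series branch and b1-b2 the other. Flow needs every valve in at
    least one branch to be open.
    """
    a1, a2, b1, b2 = (valve_is_open(s, commanded_open) for s in states)
    return (a1 and a2) or (b1 and b2)


def tolerates_any_single_failure():
    """Check every single-valve fault against both open and close commands."""
    for position in range(4):
        for fault in (STUCK_OPEN, STUCK_CLOSED):
            states = [OK] * 4
            states[position] = fault
            for command in (True, False):
                if quad_flows(states, command) != command:
                    return False
    return True


print(tolerates_any_single_failure())  # True
```

Under this model a second fault can still defeat the arrangement: two stuck-open valves in the same branch leave flow on when it is commanded off. That matches the caveat that shit sometimes still happens, only less often.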
If your SLA calls for 99.999% uptime, you'd darn well better design for that (and 27 minutes downtime a year is only about 99.995% uptime, how hard can that be?).
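For anyone who wants to check the availability arithmetic, a quick sketch in Python follows. It assumes a 365-day year and counts every minute of outage against the SLA; the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def uptime_percent(downtime_minutes_per_year):
    """Convert yearly downtime in minutes to an uptime percentage."""
    return 100.0 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)


def allowed_downtime_minutes(uptime_pct):
    """Convert an uptime percentage to the yearly downtime budget in minutes."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100.0)


print(f"{uptime_percent(27):.4f}%")                    # 99.9949%
print(f"{allowed_downtime_minutes(99.999):.2f} min")   # 5.26 min
print(f"{allowed_downtime_minutes(99.95):.1f} min")    # 262.8 min
```

So the 27-minute budget in the contract sits between "four nines" and "five nines", and a single outage of almost an hour uses up roughly two years of it.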
People will figure out IT training is important when they realize that they can't make stupid statements like "IT training" as if it means something.
What even IS I.T.?
Are you talking about server management? Network management? DevOps? What skill sets do you need?
It's like saying we need more brain surgeons, so we need MOAR BIOLOGY TRAINING!
MBAs, or people in general, will never appreciate just how complex some work can be, because of Dunning-Kruger. They don't know or understand how complex IT is, therefore they are unable to *appreciate* how complex IT is. Just like they are unable to appreciate anything else that is complicated, whether it's medicine, physics, etc.
If they're contractually bound to deliver that sort of uptime, and their system isn't designed to tolerate these kinds of failures, they deserve to fail.
Sounds like the suits took a contract but did not want to pay for the back end infrastructure to really support it.
I can't tell you the number of times I've seen this mentality -
From Banks to Airlines to Healthcare to "Service" Providers....
Usually it seems to be a combination of cheap C-level people and a layer of "yes" men between them and IT.
Unfortunately the deciders in chief don't feel the pain when deals like this cause the company to implode....
Service guarantees Citizenship! Questions Guarantee GITMO.... Amerika Uber Alles!
We really just need more coders and engineers, for now. AI will replace a lot of coders too. The continued move to the cloud will replace a lot of IT personnel. We always need good managers and customer relations/sales if you prefer more stable fields. Coding can be hard work for the money. I don't think people get that. Most IT is a lot of ass sitting waiting for something to happen. Coding is more like real work unless you own the product and can mostly do bug and feature requests. If you planned to work for a corporation, the Network and Server Administrator jobs are FAR FAR less work than coding. Coders should probably be paid 50-100% more than they are and you'd have a lot more ppl want to be and STAY coders.
The company’s Nexus 3000 switches began to fail after trying to improperly process a routine computer-to-computer command, and because Cisco keeps its code private, Peak Web couldn’t figure out why.
...
Finally, late in October, came the 10 hours of darkness. Three people familiar with Peak Web’s operations say the lengthy outage gave the company time to deduce that the troublesome command was reducing the switches’ available memory and causing them to crash. The company alerted Cisco.
So they ended up black-box debugging the vendor's own problem for them. I wish I could say I am unfamiliar with that...
The reason they were able to underbid the competition was because they had insufficient redundant infrastructure. Now they're paying the price, and good riddance.
The modern app appers behind App of App almost lost their entire appy app because the LUDDITES at LUDDITE Peak Web can't app apps because they're LUDDITES, not modern app appers!
Apps!
No matter how much we hate being treated without respect, both infrastructure and desk-side support are considered as a "cost center" as opposed to a "revenue center". This means that the company spends money on us without any tangible return as opposed to the sales and sales support group that actually generates income. The best way to increase profits is to reduce spending and staff reduction is the quickest and easiest way to reduce spending. In the short-term things will continue humming away in the data center but users may start to notice it's taking longer to get a call-back regarding their open ticket.
Sales people are basically reactionary and the way to solve a problem is to immediately get on the phone and talk to a customer or a prospect. No plan - just "wing it" and hope for the best, close the deal and sign that contract. Ring the bell and everyone cheers.
Tech people, on the other hand, are thinkers and planners. We put things in place to prevent having to "put out fires" later on. We'd rather fix things BEFORE they break but when pushed, we think through the problem and come to a solution. Unfortunately, there is no deal closing, signing of a contract or a cheerful bell ringing. When it comes time to figure budgets they look at how much they spend on IT while IT, from their point of view, doesn't really do much of anything. "Do we really need 'X number' of people on staff? Can't we get by on 'X/2' people?"
Developers may create the product that ultimately gets sold, but it's the sales people that get the commissions while throwing money at prospects and contract renewals. When it comes time to figure the budgets they look at how much they spend on developers and assume one developer is as good as any other developer so why can't they just use developers from India and spend a tiny fraction of what they're currently spending.
Until IT stops being viewed as an expense and starts being viewed as a way to generate income, we will continue to be treated like low-income crap.
"Hey, let's cut our expenses by 90% and outsource our sales staff to India"...
...said no company ever.
Editors: Please RTFA and then ask yourself, "does my summary accurately convey what the article said". You failed horribly here.
It's important to note that the plaintiff (Machine Zone) is the one asserting it was faulty cables and fans, not the host (Peak Web). Peak Web is asserting that the outages were caused by issues with their Cisco equipment. Why is this important? It seems the hosting contract included an indemnification of some type for vendor failures. So if Peak Web can prove that it was a Cisco failure, they're off the hook. If Machine Zone can prove that they were caused by shoddy equipment or installation then Peak Web has to cough up.
The headline mentioned Cisco but the summary doesn't mention Cisco at all and completely left out any discussion of the fact that Cisco acknowledged that there were issues with their equipment and seems to have done a piss-poor job of supporting Peak Web (though Peak Web shouldn't have single-sourced).
Regardless of how it shakes out, whoever it was at Machine Zone that agreed to the indemnification in the hosting contract needs a boot in the a**.
All I can say is, their horrible uptime is not because of Cisco. Piss poor management, emphasis on sales over service, etc. All the things that make a company bad. This is why I no longer work there.
QB64, doh.
How else do you expect this to turn out? I suppose that Cisco could pay up to keep them quiet and out of a courtroom but that sets a precedent for writing checks if a company can somehow blame them for their failure.
There is already a long history of people getting fitness of purpose claims tossed out of court. I don't believe that Cisco has much to worry about here.
I am armed because I am free. I am free because I am armed.