Slashdot Mirror


Ask Slashdot: How Transparent Should Companies Be When Operational Technology Failures Happen?

New submitter supernova87a writes: Last week, Southwest Airlines had an epic crash of IT systems across their entire business when "a router failure caused the airlines' systems to crash [...] and all backups failed, causing flight delays and cancellations nationwide and costing the company probably $10 million in lost bookings alone." Huge numbers of passengers, crew, and airplanes were stranded as not only reservations systems but also scheduling, dispatch, and other critical operational systems had to be rebooted over the course of 12 hours. Passenger delays directly attributable to this incident continued to trickle down all the way from Wednesday to Sunday as the airline recovered. Aside from the technical issues of what happened, what should a public-facing company's obligation be to discuss what happened in full detail? Would publicly talking about the sequence of events before and after the failure help restore faith in their operations? Even if not aiming for Google's level of admirable disclosure (as in this 18-minute cloud computing outage, where a full post-mortem was given), should companies aim to discuss more openly what happened and how they recovered from system failures?

93 comments

  1. Router Failure? by mlw4428 · · Score: 1

    Router failures shouldn't cause loss of data in any appreciable amount. Enterprise-level organizations should have automatic failover routers in place. This was far more than a simple router failure... so the real question should be: should companies be allowed to lie to their customers about major technical issues?

    1. Re:Router Failure? by known_coward_69 · · Score: 1

      This is Southwest. They fly out of the ghetto, broken-down terminal at LaGuardia that looks like crap. I fly them because they are cheap. It's hundreds of dollars more for me and my family to fly Delta out of their nice terminal, but why would I pay that when I don't want to sit there for hours just to spend money on overpriced food? If they had a delay I wouldn't care either. I'd just fly on the next flight they put me on.

    2. Re:Router Failure? by LifesABeach · · Score: 1

      A router losing data? Ya, right. Southwest got hacked, and they seem to think that insulting everyone's intelligence is good enough. Why did they even bother?

    3. Re:Router Failure? by PolygamousRanchKid+ · · Score: 1

      A router losing data? Ya, right.

      "Teacher, I couldn't do my homework last night, because our dog ate the router."

      "And the cat ate my gym suit."

      --
      Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
    4. Re: Router Failure? by davidwr · · Score: 1

      "Teacher, I couldn't do my homework last night, because our dog ate the router."
        "And the cat ate my gym suit."

      Pics from your pets' veterinarian or it didn't happen.

      --
      Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    5. Re:Router Failure? by jellomizer · · Score: 1

      Well, how many institutions actually have proper IT infrastructure?
      When a problem affects customers, a company should have to show its inadequacies publicly, however embarrassing. If they value their IT systems so little, why should we trust them with our data? Also, to be self-serving: that embarrassment will make sure they hire more staff and put more money into IT funding.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    6. Re:Router Failure? by unencode200x · · Score: 1

      I hope they maintain their aircraft better than their computer systems and terminals. It sure doesn't inspire confidence.

      These people are just incompetent and should be fired immediately. Uptime is a solved problem if you engineer well.

      --

      Chance favors the prepared mind.
      Perfect is the enemy of good.
    7. Re:Router Failure? by Aaden42 · · Score: 1

      I'm not sure it counts as "insulting everyone's intelligence" when the vast majority of the audience they're playing to doesn't have the knowledge to know what they said is in any way implausible.

    8. Re:Router Failure? by Aaden42 · · Score: 5, Insightful
      That embarrassment will make sure they hire more staff and put more money into IT funding.

      You haven't worked in enterprise IT for long, have you? An embarrassment like this will make them flog their existing staff harder, insist on more metrics to measure performance, more boxes on the audit form to tick, more mandatory unpaid overtime. But little chance they'll actually spend more money on the IT cost center.

    9. Re:Router Failure? by bravecanadian · · Score: 1

      I hope they maintain their aircraft better than their computer systems and terminals. It sure doesn't inspire confidence.

      These people are just incompetent and should be fired immediately. Uptime is a solved problem if you engineer well.

      You can be relatively sure they do the absolute bare minimum like every company does with their "cost centers".

      People have been convinced they want cheap everything so the MBAs turn the screws down really good.

    10. Re:Router Failure? by bravecanadian · · Score: 1

      That embarrassment will make sure they hire more staff and put more money into IT funding.

      You haven't worked in enterprise IT for long, have you? An embarrassment like this will make them flog their existing staff harder, insist on more metrics to measure performance, more boxes on the audit form to tick, more mandatory unpaid overtime. But little chance they'll actually spend more money on the IT cost center.

      Sadly true in most cases.

      In most organizations whose businesses are not IT related, the only time anyone powerful enough to do anything about it cares about IT is when it breaks.

      When things are working, what do we need more IT expenditures for?

      When things are not working, why did we spend what we did?

      I wish I had never gotten into this "career".

    11. Re:Router Failure? by swb · · Score: 1

      Shouldn't, but could.

      They could be running a converged network infrastructure with storage and networking fabrics meshed, and a run-amok router starts blasting out broken routes; it cascades into storage access problems and crashes compute nodes that lose their storage, resulting in some borked databases and crashed apps.

      I'd guess it was designed to not do that and we don't know if it was a config error, some HA feature that didn't work, some other bug or what.

    12. Re:Router Failure? by fahrbot-bot · · Score: 1

      People have been convinced they want cheap everything so the MBAs turn the screws down really good.

      Funny how "cheap" never seems to apply to their salaries and bonuses though.

      --
      It must have been something you assimilated. . . .
    13. Re:Router Failure? by Anonymous Coward · · Score: 0

      My wife and kids got stuck in Chicago on Wednesday. The next flight they put them on ended up being Saturday night. This wasn't as simple as "they canceled the 6 o'clock, so I'll just take the 7:30."

    14. Re:Router Failure? by bravecanadian · · Score: 1

      Funny how "cheap" never seems to apply to their salaries and bonuses though.

      Of course not! They are adding value and if they weren't sufficiently compensated they would take their talent elsewhere!

    15. Re:Router Failure? by Anonymous Coward · · Score: 0

      This sounds familiar to me. A router failure caused "systems to crash."

      This isn't about the routers or network redundancy, this is about the application environment not surviving a network disruption. Personally, I blame the software developers.

      At my shop (health care networking), our network up time is great, but we had a backbone router failure once (actually a cascading error that took out all 4 routers). The network had less than 5 minutes of downtime; however, we got charged for a few hours of downtime because the DNS resolution servers melted down when they lost connectivity and took over 30 minutes to restore. All kinds of other application services crashed hard due to lack of DNS resolution, with many server applications requiring intervention to recover. It would have been worse if it had been a datacenter router cascade crash.

      Sure, the network recovered quickly, but do all the servers and applications recover? Many applications take shortcuts in error handling that assume the network will always be available. When it isn't, things get really odd, as applications don't recover their interconnections cleanly. The complexity goes up fast with lots of interconnected application systems, and the results are very obscure failures after a network disruption.
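      The failure mode described above, where services crash or hang instead of reconnecting once DNS and the network come back, is commonly addressed with bounded retries and exponential backoff. The Python below is a minimal sketch of that idea, not anything from this commenter's shop; `retry_with_backoff` and its parameters are hypothetical names, and it assumes transient network errors surface as `OSError` (which covers `socket.gaierror` and `ConnectionError`).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient network errors with capped, jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError:
            # socket.gaierror (DNS) and ConnectionError are both OSError subclasses.
            if attempt == max_attempts - 1:
                raise  # out of attempts: fail loudly instead of hanging half-connected
            # Exponential growth (0.5s, 1s, 2s, ...) capped at max_delay,
            # with "full jitter" so many clients don't retry in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

      For example, `retry_with_backoff(lambda: socket.getaddrinfo("db.example.internal", 5432))` would keep retrying a lookup for a hypothetical database host while the resolvers recover. The random jitter keeps hundreds of clients from hammering the DNS servers all at once the moment they come back.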

    16. Re:Router Failure? by easyTree · · Score: 1

      So, they're insulting themselves for assuming their customers are largely clueless?

    17. Re:Router Failure? by fustakrakich · · Score: 1

      Router failure? No.

      Windows 10 upgrades... Don't worry. That all ends today.

      --
      “He’s not deformed, he’s just drunk!”
    18. Re:Router Failure? by easyTree · · Score: 1

      When a problem affects customers, a company should have to show its inadequacies publicly, however embarrassing. If they value their IT systems so little, why should we trust them with our data? Also, to be self-serving: that embarrassment will make sure they hire more staff and put more money into IT funding.

      More interestingly (IMO), given that such a path is chosen, how could we appropriately encourage truthful and full disclosure? i.e. what's in it for the business?

      "You are required to under threat of onerous penalties" has been shown over centuries to offer little in terms of preventative potential.

    19. Re:Router Failure? by hawguy · · Score: 1

      Router failures shouldn't cause loss of data in any appreciable amount. Enterprise-level organizations should have automatic failover routers in place. This was far more than a simple router failure... so the real question should be: should companies be allowed to lie to their customers about major technical issues?

      Why is that so hard to believe? I can see how a core router failure could lead to data loss. Router failed, backup router didn't work (if you don't do failover testing, you don't know that your backup is really ready to take over the load: "oh oops, the firmware on the fiber interface card on the secondary crashes under heavy load"), split-brain leads some systems to fail over to secondary, now you've got transactions hitting primary and secondary databases concurrently, possibly with no way to reconcile them, hence data loss. It may have failed over and back several times, making data recovery even harder.

      To prevent this kind of split-brain situation, my company made a conscious decision to delay failover until an engineer decides to flip the switch. We'll take the 30-60 minute hit on downtime, but that costs us a lot less than it would cost Southwest.

      Of course, even letting an engineer decide when to fail over doesn't prevent problems; just ask Salesforce.
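      One common way to guard against the split-brain scenario hawguy describes is to require both a strict-majority quorum and, as in his shop, an explicit human go-ahead before promoting the secondary. This Python sketch assumes a cluster of voting nodes that can each report whether they still see the primary; `FailoverState`, `has_quorum`, and `may_promote_secondary` are hypothetical names, not any particular product's API.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    cluster_size: int                   # voting nodes, including the primary
    nodes_reporting_primary_down: int   # votes that the primary is unreachable
    operator_approved: bool             # the human "flip the switch" step


def has_quorum(votes: int, cluster_size: int) -> bool:
    """Strict majority: two disjoint partitions can never both reach it."""
    return votes > cluster_size // 2


def may_promote_secondary(state: FailoverState) -> bool:
    """Promote only if a majority agrees the primary is gone AND a human signs off."""
    return state.operator_approved and has_quorum(
        state.nodes_reporting_primary_down, state.cluster_size
    )
```

      A strict majority matters because the two halves of a partitioned cluster cannot both hold more than half the votes, so at most one side can ever promote a new primary and start accepting writes.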

    20. Re:Router Failure? by easyTree · · Score: 1

      Isn't it the free upgrade which ends today? Surely the harassment (from the intimidating blue re-spawning rectangle) will need to ramp up significantly to continue to drive adoption?

    21. Re:Router Failure? by rickb928 · · Score: 1

      Wow. You've never done this for a living, right?

      Network failures in such a complex, distributed system cause unexpected problems. 'Router' should be thought of in this scenario as 'data flow device', and of course data is at risk. Transaction rollbacks, session timeouts, and more can cause problems that become data-loss events.

      Not that SWA is without blame here. At work we had a server failure that impacted thousands of virtual machines. What was a storage failure became a corruption failure, and ultimately we lost most of those VMs. Recovery varied from restoring images from backup to, for our team, rebuilding from source. We lost 3 years' worth of data outright and could rebuild only 9 months of it due to unforeseen limitations. And silence from the technology team. We had to go to C-level execs to be included in the M&M and analysis, and were asked continually why, since we were just customers. Accountability was not even considered until we demonstrated the ultimate costs for *our* real customers. Even now they keep trying to write it off as unpredictable, and we go back to the apparent lack of testing, of disaster recovery validation, and the abject failure of a three-letter vendor to recover their flagship system from an error induced by their own software update. After pointing out that the only real penalty for their team is to remove a team member, we had to say out loud, in front of execs, "and if we do not, will this happen again?" Of course not, they say. And of course, they could not say that they had never led us to believe that prior to the failure.

      To this day, failures get blamed on the unexpected. In about 2 hours I'll be on a call where they take up my current top issue, and it will be blamed on an unexpected failure. And I'll say, 'like $%^&* this spring?'. And everyone on the call will remember, and know that I called them out again. And even the C-level is reluctant to actually cost the team anything, since this was a failure of routine maintenance, preferred and strategic vendor failure, recovery and data loss prevention failure, and even a system design deficiency resulting in a significant loss and concurrent brand damage/customer dissatisfaction/recovery cost impacts; or, to put it simply, everything failed. No one is willing to acknowledge that all this failed. And they may, unknown to me, be in an investigation that will result in changes, but sadly I doubt it.

      SWA will, however, be looking into this, since it is not just lost bookings but huge overtime costs, make-up flight costs, penalties, and compensation. My niece was flying then, and this turned a 6-hour trip into an 11-hour ordeal with lost baggage and a very unsatisfactory experience at the counters, since after all the systems were down and no info was readily available. We won't know about that. And this is a first for SWA, but Continental failed like this a few years ago, and the USAir merger with an airline to be named later resulted in a huge system merge and a similar failure. Big systems fail big. It is hard to test recovery when it costs so much to replicate the hardware and the production system is 24x7x365. Glad I'm not in that business any more, though there is nothing like a realistic DR exercise to sharpen your focus and get the blood flowing, and when it actually works, it's a huge validation.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    22. Re:Router Failure? by ShanghaiBill · · Score: 1

      Funny how "cheap" never seems to apply to their salaries and bonuses though.

      You might want to talk to some SWA employees about that. SWA is notorious for low pay and stingy benefits. That is one way that they keep their fares low.

    23. Re:Router Failure? by Archfeld · · Score: 2

      Only those that are required to by law, regulation, or an external body, such as financial or health care institutions. Everybody else cheats on infrastructure and recovery equipment. They figure the odds of 2 or more related apps going down at the same time are low, and then combine recovery/failover systems and equipment into one. The cell phone industry is one of the worst: they barely have enough equipment to handle 50% of their KNOWN customer load and just figure that not everyone is going to try to make a call at the same time. When we were primarily a landline market, the FCC required them to handle at least 80% before a failure, but as we switched to mobile devices they did away with the workload requirements almost entirely, arriving at the basic priority restoration system we have now. In the event of an emergency the cell networks are going to fail almost instantly, leaving texting as the only, unreliable alternative.

      --
      errr....umm...*whooosh* *whoosh* Is this thing on ?
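      The "not everyone calls at once" bet can be made concrete with the Erlang B formula, the classic model for call blocking on circuit-switched voice trunks. This is an illustration only: real cellular capacity planning involves much more than Erlang B, and the 50%/80% figures above are the commenter's and are not checked here. Given offered traffic A (in erlangs) and N channels, the sketch computes the probability that a new call attempt is blocked; `erlang_b` is a hypothetical helper name.

```python
def erlang_b(offered_erlangs: float, channels: int) -> float:
    """Probability that a new call is blocked, per the Erlang B formula.

    Uses the standard numerically stable recursion:
        B(0) = 1,  B(n) = A * B(n-1) / (n + A * B(n-1))
    """
    if offered_erlangs < 0 or channels < 0:
        raise ValueError("traffic and channel count must be non-negative")
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking
```

      Because the carried traffic A(1 - B) can never exceed N, a cell offered 100 erlangs on 50 channels blocks at least half of all call attempts, which is the kind of collapse the commenter predicts for a mass-calling emergency.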
    24. Re:Router Failure? by eth1 · · Score: 1

      That embarrassment will make sure they hire more staff and put more money into IT funding.

      You haven't worked in enterprise IT for long, have you? An embarrassment like this will make them flog their existing staff harder, insist on more metrics to measure performance, more boxes on the audit form to tick, more mandatory unpaid overtime. But little chance they'll actually spend more money on the IT cost center.

      Depends. What situations like this do is put a pretty firm dollar amount on the failures that IT asks for $X to mitigate/prevent. That way, next time they ask for $400k for something to avoid a $2M problem, they can ask in a language that upper management understands, and have memorable evidence to back them up.

      Sad that management won't trust the expensive experts they hire, but sometimes it takes an expensive lesson for them to learn (just sucks that the customers usually get screwed in the process).
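      eth1's point about putting a dollar figure on failures is essentially an annualized loss expectancy (ALE) calculation: the expected yearly loss equals the cost of one incident times how often it is expected per year. The Python sketch below compares the yearly loss a mitigation avoids with what it costs per year. Only the $400k and $2M figures come from the comment; the incident rates in the example are invented for illustration, and the function names are hypothetical.

```python
def annualized_loss(incident_cost: float, incidents_per_year: float) -> float:
    """Annualized loss expectancy: single-incident cost x expected incidents per year."""
    return incident_cost * incidents_per_year


def mitigation_pays_off(
    mitigation_cost_per_year: float,
    incident_cost: float,
    rate_before: float,
    rate_after: float,
) -> bool:
    """True if the yearly loss avoided exceeds what the mitigation costs per year."""
    avoided = annualized_loss(incident_cost, rate_before) - annualized_loss(
        incident_cost, rate_after
    )
    return avoided > mitigation_cost_per_year
```

      With a hypothetical rate of one $2M outage every two years dropping to one every ten years, `mitigation_pays_off(400_000, 2_000_000, 0.5, 0.1)` is True: about $800k a year avoided against $400k spent. Treating the $400k as a yearly cost is itself an assumption; a one-time purchase would be amortized over its service life.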

    25. Re:Router Failure? by Anonymous Coward · · Score: 0

      Not that SWA is without blame here.

      You mean LUV... I can't believe that's actually their stock ticker, what a bunch of wankers.

    26. Re:Router Failure? by Anonymous Coward · · Score: 0

      "In the event of a emergency the cell networks are going to fail almost instantly, leaving texting as the only, unreliable alternative."
      Um, if the networks are down, and you can't make a call, how are you able to text? I think that you made an omission:
      "...leaving hardwired or satellite computer texting as the only, unreliable alternative."
      Even then, I've planned for the worst: the When rather than the If. _When_ the next big quake hits here, almost certainly on the Hayward Fault, I can survive at home or on my boat for at least a month, with redundant Marine and Ham Radio systems for chatting. Although I would much prefer to spend that month on the boat. After all, far more people died from the April 18, 1906 San Francisco fire than directly from the quake itself, and civilian telephone service didn't even begin to be restored until May 10th.

    27. Re:Router Failure? by Anonymous Coward · · Score: 0

      Considering the number of corners they probably cut, I wouldn't be surprised if their IT ran on hamster wheels and voodoo priests. What I'm saying is, their system is probably such a friggin Frankenstein's monster that something that silly and stupid probably could happen. The keyword in your phrasing is "should". When a company "should" do something, they don't in order to cut costs. It's the same thing as devs ignoring warnings because they aren't errors.

      Besides, even if they did full disclosure, 75% of the audience wouldn't read/watch it, care, or understand it.

    28. Re:Router Failure? by chukm · · Score: 1

      In this case, it may be that the author of the article used the word backups instead of redundant systems. That seems far more likely.

    29. Re: Router Failure? by rickb928 · · Score: 1

      SWA is just an acronym. Luv refers to an old motto / ad campaign, notice they have a heart shape in most of their imagery.

      Their IATA code is WN, possibly from an old parent airline...

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    30. Re:Router Failure? by Archfeld · · Score: 1

      The overhead to make a cell phone connection is much higher, and it requires a point-to-point connection, so the network will fail under that load. SMS text messages don't require the same direct connection; they can be bounced about, and their lifespan is much longer, so they can get through when the cell network is unavailable for normal calls. The downside is that text messages can go stale and the protocol does not include reception acknowledgement. Thus you can send 3 messages and get only some of them, or receive them out of order. Hardwired phones that aren't VoIP are on a different system, and satellite phones are a different animal entirely and very likely to function. Your Ham and Marine radio solution will be a MUCH more reliable alternative to a cell phone or text in the event of an emergency, and it's quite forward-thinking as a survival tool. The only downside is the limited number of people equipped to receive them. I've a neighbor who has both ham and marine equipment he uses regularly.

      --
      errr....umm...*whooosh* *whoosh* Is this thing on ?
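      On the out-of-order point: long texts are sent as multiple SMS segments, and the concatenation scheme in 3GPP TS 23.040 carries a reference number, the total part count, and a sequence number in each segment's user data header, which lets the receiving phone put the parts back in order. The Python below is a simplified sketch of that reassembly, assuming the header fields have already been parsed; `Part` and `Reassembler` are hypothetical names, and a real handset would also time out messages whose missing parts never arrive.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Part:
    sender: str
    ref: int    # concatenation reference number shared by all parts of one message
    total: int  # total number of parts in the message
    seq: int    # 1-based position of this part
    text: str


@dataclass
class Reassembler:
    # (sender, ref) -> {seq: text} for messages still waiting on parts
    pending: Dict[Tuple[str, int], Dict[int, str]] = field(default_factory=dict)

    def add(self, part: Part) -> Optional[str]:
        """Store one part; return the full text once every part has arrived."""
        if not 1 <= part.seq <= part.total:
            raise ValueError("sequence number outside 1..total")
        key = (part.sender, part.ref)
        parts = self.pending.setdefault(key, {})
        parts[part.seq] = part.text  # a duplicate segment just overwrites itself
        if len(parts) < part.total:
            return None  # still waiting; parts may arrive in any order
        del self.pending[key]
        return "".join(parts[i] for i in range(1, part.total + 1))
```

      Usage: feed each decoded segment to `Reassembler.add` as it arrives; it returns `None` until the last missing segment shows up, then returns the text in sequence order regardless of arrival order.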
    31. Re: Router Failure? by HybridST · · Score: 1

      "SWA is just an acronym."

      Found online years ago:
      A.c.r.o.n.y.m. - A contrived reduction of nomenclature yielding mnemonics.

      --
      Ever notice that Cobra Commander sounds an awful lot like Star scream?
    32. Re:Router Failure? by TheOneFreeman · · Score: 1

      Well said, bravo! Bravo!

  2. why can't people accept that things happen? by known_coward_69 · · Score: 1

    I've been delayed because of weather, engine troubles, etc., and I'm still alive and happy. The only thing public pressure does is cause the company to spend more money on redundant hardware, which mostly sits unused and raises prices.

    1. Re:why can't people accept that things happen? by acoustix · · Score: 1

      The only thing public pressure does is cause the company to spend more money on redundant hardware, which mostly sits unused and raises prices

      My redundant hardware is constantly in use, and I have nowhere near the budget of these big boys. Redundant doesn't always mean active/passive. Routers are especially easy to run active/active; hell, that's the way the Internet routes traffic. BGP/EIGRP will take care of the routing.

      But I suspect that this wasn't a simple router failure. A router failure wouldn't require other systems to be rebooted.
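      Active/active routing is often implemented as equal-cost multipath (ECMP): the routing protocol (for example BGP or EIGRP with multipath enabled) installs several next hops, and the forwarding plane hashes each flow onto one of them. Real routers do this in hardware with vendor-specific hash functions; the Python below only sketches the idea, and `pick_next_hop` and the router names are hypothetical.

```python
import hashlib
from typing import List, Tuple

# (source IP, destination IP, source port, destination port, protocol)
Flow = Tuple[str, str, int, int, str]


def pick_next_hop(flow: Flow, live_hops: List[str]) -> str:
    """Map a flow onto one of the currently live next hops by hashing its 5-tuple."""
    if not live_hops:
        raise RuntimeError("no live next hop")
    hops = sorted(live_hops)  # same order on every call, whatever order was passed in
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(hops)
    return hops[index]
```

      Hashing the whole 5-tuple keeps every packet of a connection on one path, so TCP sees no reordering, and when a router dies its flows simply land on the survivors. The plain modulo used here moves many flows whenever the set of hops changes; consistent or resilient hashing reduces that churn.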

      --
      "A plan fiendishly clever in its intricacies"- Homer Simpson
    2. Re:why can't people accept that things happen? by tiberus · · Score: 2

      I've been delayed because of weather, engine troubles, etc., and I'm still alive and happy.

      Many (most?) can accept that things happen, but there are limits. Yet most can't accept a lack of information and being outright lied to. Airlines in general have a very poor public reputation for telling the truth about why delays are occurring. Weather and "Acts of God" are one thing: we can check the weather, and many can sort out things like "Oh, there's a thunderstorm in Ohio, why is that affecting my flight in Colorado... Oh, my plane is trying to leave Ohio..."

      With airlines implementing various cost-cutting measures (which rarely seem to improve service or cut our prices) that have meant fewer, fuller flights and less convenience for travelers, the system seems wholly unable to deal with interruptions above a certain threshold. In light of all this, I fully understand someone's ire at a lack of transparency from any airline.

      Southwest, and others, should (as in be required to) provide details on what went wrong.

      The only thing public pressure does is cause the company to spend more money on redundant hardware, which mostly sits unused and raises prices

      It's likely that the cost of redundant hardware pales in comparison to what the device failure cost them.

    3. Re:why can't people accept that things happen? by Anonymous Coward · · Score: 1

      The only thing public pressure does is cause the company to spend more money on redundant hardware, which mostly sits unused and raises prices

      My redundant hardware is constantly in use, and I have nowhere near the budget of these big boys. Redundant doesn't always mean active/passive. Routers are especially easy to run active/active; hell, that's the way the Internet routes traffic. BGP/EIGRP will take care of the routing.

      But I suspect that this wasn't a simple router failure. A router failure wouldn't require other systems to be rebooted.

      Try dealing with systems where a lot of the code was outsourced to India. A unicorn farting in Uzbekistan might cause things to get FUBAR.

      Then the low-cost O&M folks you hired live in WindozeWorld where rebooting is step 1, 2, 3, 4, 5, all the way up to step 9153 in troubleshooting.

    4. Re:why can't people accept that things happen? by Anonymous Coward · · Score: 1

      This. Sometimes the lack of information is ridiculous. I missed the connection for the 5th of my 6 flights a few weeks ago. I'm pretty sure they knew I would miss it before we took off, but they kept being optimistic. And then when I landed, there were no more flights anywhere that night.

      Now, had I known, even 5 minutes before the flight, I would have rebooked to a flight basically anywhere other than Detroit.

      Same trip: Hertz Gold had a reservation for a car at 12:30 am. I get to the lot at 1 am: no cars for any Gold members. And basically no cars at all. It took until 2 am to get a car. Just tell me your shit is fucked up; I would've jumped in a taxi and gotten to bed. I get that shit happens; be a goddamn adult and tell me about it.

    5. Re:why can't people accept that things happen? by Joe_Dragon · · Score: 1

      A router failure wouldn't require other systems to be rebooted.

      Unless they are on some older OS / old mainframes / have apps that got stuck / have stuck sessions / the systems were due for an OS update and reboot.

    6. Re:why can't people accept that things happen? by Somebody+Is+Using+My · · Score: 1

      But how would informing you of the issues have been better for the company, at least short-term?

      Take your Hertz example; not knowing the extent of the problem, you waited around until you got a car - and Hertz got paid. Had they told you that no vehicle would be available until 2AM, you would have taken a taxi and Hertz would have been out a rental.

      Of course, long-term these attitudes can cost a company customers, who will look to their competitors rather than use a company with such poor service. But that's less of a fear when a company is large enough that the alternatives are unavailable or unpalatable (e.g., given the choice between frequenting a big national chain or an unknown local business, most people choose the former... especially if they themselves aren't locals). And businesses these days aren't really known for looking out for long-term problems anyway...

      The short of it is that there is often very little incentive for companies to admit their shortcomings and very expensive reasons not to.

  3. Failure? by Anonymous Coward · · Score: 0

    What failure?

  4. As transparent as their customers demand by El+Cubano · · Score: 4, Interesting

    The companies understand one thing: profit.

    It depends on the volume of business and a variety of factors. For example, I was recently considering the purchase of a new automobile. There was one make which I ended up removing from consideration because their infotainment was not open for me to hack on. I felt like this was important and so I told the salesman why it was important to me and that this single factor resulted in my no longer considering any models from this manufacturer.

    In another instance, a specific dealership had two different sales people contact me by phone, essentially competing with each other. I didn't like that so I didn't bother calling back either one. Several days later I received a form inquiry from the general manager (certainly an automated message). I took the time to respond, explaining that I wouldn't be doing business with them because of the poor coordination of their salesmen's activities. If I already talked with one and explained what I needed in a vehicle, why was another going to call me and try to make me go through all that again?

    Granted, these are different examples, but I make this small effort in the hopes that it will either improve the situation for the person who comes along after me or for myself the next time. Of course, the larger the organization, the less likely this is to have an effect. I expect that the GM of the dealership with two salesmen could possibly do something based on my feedback. I fully expect nothing to change from the manufacturer of the car with the closed infotainment system. However, if 10,000 customers all told different dealers the same thing or bothered to write to the manufacturer directly, then something might change.

    Southwest and other airlines are by necessity very large companies. If you tell a booking agent something, it is almost certain no manager will hear of it. But you can contact the execs directly: if there is a VP of customer service or an ombudsman, contact that person and let them know that you value openness and that you are specifically avoiding giving them your business because of their lack of it. If they hear this from enough people, they will get the message: we are losing out on business because of our approach to blah blah blah.

    So, bottom line: companies should be as transparent as their customers demand. If you, the customer, don't demand then they won't know and won't make any change.

    1. Re:As transparent as their customers demand by tekrat · · Score: 1

      Not necessarily. You *can* influence a large organization *if* they think they can make money off your idea. For example, I was at the NYC Auto Show with my GF -- and we were sitting in one of those giant Fiat-500 looking half-SUV things. And it had a glass roof which we liked.

      But my GF complained that the vehicle was too tall, making it difficult for her to get snow off the roof (she prefers station wagons to SUVs, and very few manufacturers make wagons anymore).

      So, we were talking with the booth rep, and when she brought up the snow situation, I threw out an idea:

      "Why don't you guys run the heating wires through the glass roof, like you do with the back window? Then you can just heat up the glass and all the snow melts away."

      I'm telling you now, the guy whipped out an iPad, and typed furiously "defroster for glass roof", and I'm sure in a few years, this feature will appear.

      --
      If telephones are outlawed, then only outlaws will have telephones.
    2. Re:As transparent as their customers demand by Anonymous Coward · · Score: 0

      In another instance, a specific dealership had two different sales people contact me by phone, essentially competing with each other. I didn't like that so I didn't bother calling back either one.

      So... you don't want two people competing for your business by offering a lower price?

    3. Re:As transparent as their customers demand by Anonymous Coward · · Score: 0

      >There was one make which I ended up removing from consideration because their infotainment was not open for me to hack on.

      Seems like a poorly thought out decision. Perhaps you didn't know this, but they do make replacement head units which easily qualify as "infotainment" systems. And yes, many of those replacement head units run completely open systems, such as Android. I think a better thought out decision would be to inform the salesman that since the vehicles don't offer a worthwhile stereo system, you'll be adding the cost of the replacement head unit + labour to the final price of the vehicle when making a comparison.

      Personally, I wish manufacturers would offer the option of a blank double-DIN slot with marked wiring for the speakers, power, ignition, etc. I'd probably even pay extra for that. You can't get much more open!

      >In another instance, a specific dealership had two different sales people contact me by phone, essentially competing with each other. I didn't like that so I didn't bother calling back either one.

      You write things off far too easily. If I wanted to buy there, I would have called the salesperson I preferred and let them know about the situation and my decision that they'd be receiving my business. Unless the other salesdroid wants a punch in the face (from the one you selected), he'll never call you again. This is literally the most minor mistake I can imagine a dealership making. You could even have used it in your favour to get a better price.

      I mean, hey, it's up to you, you're the customer, you do things whatever way you want, so long as you can find someone to sell to you. I just think your decisions weren't as great as you're thinking they were.

    4. Re:As transparent as their customers demand by easyTree · · Score: 1

      if you contact the execs directly, perhaps if there is a VP of customer service or an ombudsman, contact that person and let them know that you value openness and that you are specifically avoiding giving them your business because of their lack of it. If they hear this from enough people, they will get the message: we are losing out on business because of our approach to blah blah blah

      That's great - I agree that targeting someone who might care/have a stake in profits/has power to effect change is probably more effective than talking to someone outside that combined set... but ...how do you avoid them doing mental calculations in their heads along the lines of:

        customer wants openness
        + customer (and wider society) also values competence, particularly when their lives are at stake
        + competence costs more
        + customer likes cheap
      = give the customer the appearance of openness*

      (*) whether through filtering to prevent serious incompetence from being reported or, at the extreme, fake-incident reporting

      ?

    5. Re:As transparent as their customers demand by c0d3g33k · · Score: 1

      This brings several things to mind:

      1. It's a "sun roof" so keeping it clear in the winter isn't exactly a common use case.

      2. The area covered by the sun roof relative to the rest of the roof is relatively small. Putting heating wires in the glass only won't do much good, since there will still be snow on the rest of the roof. You still have to clear the rest manually, often to comply with local laws about clearing vehicles of snow before driving.

      3. You don't really need to see out the top of the vehicle to drive safely, so it's an added expense for dubious benefit.

      4. Clearing snow off a vehicle isn't a bad thing to do, so really the problem to be solved is how to do that for the entire top of the car, not just the sun roof. But that's a lot harder and more prone to problems than the back window.

    6. Re:As transparent as their customers demand by easyTree · · Score: 1

      Not necessarily. You *can* influence a large organization *if* they think they can make money off your idea. For example, I was at the NYC Auto Show with my GF -- and we were sitting in one of those giant Fiat 500-looking half-SUV things. And it had a glass roof which we liked.

      But my GF complained that the vehicle was too tall, making it difficult for her to get snow off the roof (she prefers station wagons to SUVs, and very few manufacturers make wagons anymore).

      So, we were talking with the booth rep, and when she brought up the snow situation, I threw out an idea:

      "Why don't you guys run the heating wires through the glass roof, like you do with the back window? Then you can just heat up the glass and all the snow melts away."

      I'm telling you now, the guy whipped out an iPad, and typed furiously "defroster for glass roof", and I'm sure in a few years, this feature will appear.

      Agreed.

      Now the challenge is to express "publicly display our faults in a truthful and complete manner" as a thinly-veiled money-making opportunity.

    7. Re:As transparent as their customers demand by easyTree · · Score: 1

      In another instance, a specific dealership had two different sales people contact me by phone, essentially competing with each other. I didn't like that so I didn't bother calling back either one.

      So... you don't want two people competing for your business by offering a lower price?

      Uhh, surely they'd just tag-team the customer with a combination of:
        * increase price before bartering begins (check, built into the dealership model)
        * the appearance that they're operating against each other rather than together

      ?

    8. Re:As transparent as their customers demand by Anonymous Coward · · Score: 0

      >I ended up removing from consideration because their infotainment was not open for me to hack on. I felt like this was important and so I told the salesman why it was important to me and that this single factor resulted in my no longer considering any models from this manufacturer.

      An honest statement, and an honest attempt at explaining to the sales guy why there was no sale. But do not believe for a second that the very few customers who want to hack their infotainment systems will seriously sway the company to abandon its car designs. Look, it's not a Lego car; it's an already-assembled car. If you want to hack it, you need an open system, with easily disassembled panels, external ports, and a malleable OS. I mean, that is an invitation to literally change how the system is designed, and they're selling a complete package, so no, don't expect a change for you. *** Though as a hacker myself I agree it sure would be 'nicely convenient' if they did, realize that car manufacturers and their amenity partners are making solid units, not porous ones for us. I mean really, you're asking bread bakers to already have the peanut butter on it for us. ***

  5. No simple answer by sjbe · · Score: 1

    Aside from the technical issues of what happened, what should a public-facing company's obligation be to discuss what happened in full detail?

    There is no simple single answer to this question. It's going to be circumstance-dependent. In many cases a lot of transparency will be helpful and appropriate. In other cases it probably won't matter much, and in a few cases it might even be counterproductive, though I expect that would be uncommon. If the problem is something like a security problem that will take time to resolve, immediate transparency might do more harm than good in some cases. But in general people are pretty forgiving if they understand the mistake was an honest one and that the company is working in good faith and transparently to fix it.

    Would publicly talking about the sequence of events before and after failure help restore faith in their operations?

    Generally speaking, the answer is probably yes. If people can see that the company is acting in good faith to solve a problem that might shake confidence in the product, then yes, transparency will probably help. Probably the canonical example of transparency working to the benefit of the company is how Johnson & Johnson handled the Tylenol tampering back in 1982. The company acted quickly, transparently and forcefully to deal with the problem, and it probably saved the product and changed how such products were packaged going forward. It's one of the better examples of how to handle a major crisis. It's not hard to find examples of companies sweeping things under the rug and then having it blow up in their face down the road. See GM and their ignition failures for a good example of that.

  6. They Solicit by JimSadler · · Score: 1

    Airlines solicit people to be 5 miles in the air and thus vulnerable to death. To me that means that zero levels of privacy should be allowed, so that all individuals and competitors can study every single detail about anything to do with an airline. For example, the pay rates for their mechanics are one indicator of the quality of maintenance performed. How about dollars spent on maintenance per hour of flight? How about the hours in the air for every plane they fly? All these things can be used to judge safety and should be wide open for all inspections at all times.

    1. Re:They Solicit by Anonymous Coward · · Score: 0

      Everyone is vulnerable to death at all times. That is not a good enough argument to violate privacy. If an airline doesn't maintain adequate safety standards, people will not fly on that airline and it will quickly go out of business. That is the way the system works. It is in the company's best interests to have exceptionally safe planes and a spotless safety record.

    2. Re:They Solicit by Anonymous Coward · · Score: 0

      "If an airline doesn't maintain adequate safety standards, people will not fly on that airline and it will quickly go out of business. That is the way the system works."
      Ah, the Classic Libertarian Argument- if enough people die in large enough horrifying numbers, others may decide to no longer do business with an airline. Caveat Emptor.

      "Everyone is vulnerable to death at all times. That is not a good enough argument to violate privacy." There is no right to "Privacy", and there never has been. You whackos are so delusional; my "Right" to continue living trumps your "Right" to "privacy", however you define it.
      If your precious "Privacy" is so important, find a cave somewhere, crawl in, and never come out again. You no longer fit the definition of a Human Being.
      A great Captcha: banishes

  7. Not a router failure and not a surprise by Anonymous Coward · · Score: 4, Interesting

    I worked IT in the airline industry for over 20 years and that happening does not surprise me.

    In many cases the systems are old, the software is not well maintained, and management does not understand how critical it is to the operation of the company. Many airline/aircraft companies have outsourced their IT to Managed Service Providers under the guise that "We are an airline, not an IT company." In doing so, management negotiated the contracts, not IT, and the contracts are crap. No clauses for upgrading systems, no clauses for management of software patching, and one such contract that I have read guaranteed 98% uptime. Yes, it really was 98% and not 99.999%.
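    For a sense of scale, here is a rough sketch of how much downtime per year each uptime figure permits. It assumes the guarantee is measured over a plain 365-day year; real contracts may measure monthly or exclude scheduled maintenance windows, so treat the numbers as approximate.

```python
# Downtime per year permitted by an uptime guarantee.
# Assumption: availability is measured over a plain 365-day year
# (real SLAs may measure monthly and exclude scheduled maintenance).
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a service may be down while still meeting the guarantee."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for sla in (98.0, 99.9, 99.999):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% uptime -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

    On those assumptions, 98% allows about 175 hours (more than a week) of downtime a year, while 99.999% allows only about 5 minutes.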

    In almost all cases, once IT was outsourced, they not only eliminated their IT department, they added rules stating that they could not hire IT people, since it was all outsourced and they had no need of them. The companies I have worked for have hired me with odd titles to avoid such rules.

    Redundancy is, in many cases, non-existent. Equipment is aging and starting to fail, and there are no plans or projects in the works to update it. Heck, one company I know of is still running on computers that were purchased in 1995.

    When projects are put forward with proper HA, network failover, SAN, etc., they get cut in cost-cutting measures to the point that they are unrecognizable. A great example is an upgrade to an Oracle server that I was working on. The original upgrade plan was to deploy an HA pair with a back-end SAN on a dual 10G failover connection. After it was cut, it ended up being a single dual-processor Windows system with internal drives running on a 1G connection. It has already crashed multiple times, and each time it has brought the company to a standstill.
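    A back-of-the-envelope availability model shows why a cut like that matters. This is only a sketch, not a description of the actual system: it assumes components fail independently and that failover always works (an untested or failed switchover breaks that assumption), it ignores the SAN, and the 99% per-component figure is a hypothetical number chosen for illustration.

```python
# Rough availability model of the planned vs. as-built Oracle setup.
# Assumptions (hypothetical, for illustration only): components fail
# independently, failover is instantaneous and always succeeds, and each
# server or network link is individually up 99% of the time.


def series(*availabilities: float) -> float:
    """Every component must be up (a chain with no redundancy)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """At least one of the redundant components must be up."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


COMPONENT = 0.99  # hypothetical per-component availability

# Planned: HA server pair behind a dual (redundant) network connection.
planned = series(parallel(COMPONENT, COMPONENT), parallel(COMPONENT, COMPONENT))
# As built: one server on one link, i.e. two single points of failure in a row.
as_built = series(COMPONENT, COMPONENT)

print(f"planned:  {planned:.6f}")
print(f"as built: {as_built:.6f}")
```

    With these made-up numbers, the planned design comes out around 99.98% available and the as-built one around 98.01%; the redundant version is only as good as its failover, which is exactly the part that tends not to get tested.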

    In this day and age, companies need to realize that they run on IT. If your IT infrastructure fails, your company comes to a halt and you lose money!

    1. Re:Not a router failure and not a surprise by bravecanadian · · Score: 2

      In this day and age, companies need to realize that they run on IT. If your IT infrastructure fails, your company comes to a halt and you lose money!

      It is amazing to me how many companies do not realize this until they suffer a major outage.

      I like to think that it is because many senior managers are still of the generation that did not grow up with computers being a central part of their lives/businesses.

      However, the generation coming up now that has had that is almost as bad but in the other direction -- they want to use computers / tablets / phones / the cloud etc. for everything and are very quick to adopt new devices / apps / services... with very little thought to the long-term viability, reliability, or maintainability of those products.

      It is really time for IT to get a seat at the grownups' table. Many companies don't have senior IT management and, at many of the ones that do, they report to the CFO, not directly to the top. And when is the last time a CIO was a candidate for a CEO transition outside a pure tech company? Probably never.

      IT is a dead end in most places.

    2. Re:Not a router failure and not a surprise by Anonymous Coward · · Score: 0

      "I worked IT in the airline industry for over 20 years and that happening does not surprise me..."
      Well, you may be able to answer this...

      Does Southwest still use their in-house version of Sabre? One thing missing in all of the hand-waving is that Southwest has a history of having their reservations systems crash, but the details are never disclosed. Sabre is very old; hell, it was obsolete two decades back, but the costs of replacing it were so enormous that airlines that still use it just patch and wait, patch and wait...

    3. Re:Not a router failure and not a surprise by Anonymous Coward · · Score: 0

      You nailed it. IT at Southwest is over 60% (and growing) outsourced to Managed Service Providers, and the quality of work has been going downhill ever since. There was a backup system in place, but the switchover failed. Lots of old systems and software weren't able to handle the switchover (it had never been tested, as far as I know). It doesn't matter anyhow. They were already in the process of outsourcing the entire data center that went down; this will accelerate and justify the outsourcing to management.

  8. Easy Betteridge by Anonymous Coward · · Score: 0

    If you forget to shave or shower on a particular day, should you be required to post that to your Facebook page or wear a billboard sign all day decrying your lack of hygiene?

    1. Re:Easy Betteridge by tsqr · · Score: 1

      If you forget to shave or shower on a particular day, should you be required to post that to your Facebook page or wear a billboard sign all day decrying your lack of hygiene?

      If you don't shower, it will be apparent to everyone around you; no need for a sign or Facebook post.

  9. Civil Engineering Lesson by Rob+Riggs · · Score: 1

    What do we do when buildings and bridges fail, or when an aircraft falls out of the sky? We should do something like that. In a more enlightened age, we'd have the NTSB-equivalent for massive IT failures.

    --
    the growth in cynicism and rebellion has not been without cause
    1. Re:Civil Engineering Lesson by bravecanadian · · Score: 1

      What do we do when buildings and bridges fail, or when an aircraft falls out of the sky? We should do something like that. In a more enlightened age, we'd have the NTSB-equivalent for massive IT failures.

      Having some minimum standards that are required for both the systems themselves and the people working on them would be great.

      IT needs to get much more professional but that would mean doing battle with all the companies/lobbyists who like IT being cheap, easily outsourced (in the short term), and with a bunch of cowboys who don't want to unionize or group themselves under a true professional group in any way.

    2. Re:Civil Engineering Lesson by ErichTheRed · · Score: 1

      "IT needs to get much more professional but that would mean doing battle with all the companies/lobbyists who like IT being cheap, easily outsourced (in the short term), and with a bunch of cowboys who don't want to unionize or group themselves under a true professional group in any way."

      Indeed, this is the problem. There are way too many cowboy sysadmins and coders out there who wouldn't even think about minimum standards for work product. I think the only way to solve it would be to have a purely political organization that did nothing but counteract the corporate lobbyists. If you basically said to everyone, pay us a small amount per year, and we'll give it directly to Congress to pass favorable laws, the only question in my mind is how much money it would take. No work rules, just a direct payment to each lawmaker for legislation. The AMA does this for doctors, and business interests do this for their member companies. I think it's time to admit that the only way to get things done is to hand over paper bags of money and sample legislation.

  10. Too glib by sjbe · · Score: 4, Insightful

    The companies understand one thing: profit.

    That's not true. Companies and the people that run them understand more than just profit. I defy you to find a single person in a company who cannot comprehend something other than profit. To claim that profit is all they can understand is absurdly untrue. But there is a nugget of truth in what you say. What is true is that companies and some (not all) of those who run them have a strong tendency to focus on profits excessively, particularly short term profits. They do this to the detriment of all else including the long term health of the company sometimes. It's too glib to say that companies only understand profit but it is fair to say that companies tend to focus on it too hard at times and make bad decisions as a result.

    A well-managed company has to consider things like the health of their community, the well-being of their suppliers, the trust of their customers, etc. All these things sooner or later will impact profits, so if a company focuses excessively on near-term profits, then in the long term they will likely be worse off, and so will all those who depend on the company - customers, suppliers, community, shareholders and employees.

    1. Re:Too glib by waveclaw · · Score: 2

      I defy you to find a single person in a company who cannot comprehend something other than profit.

      Investors.

      Also, the implied definition of profit is very limited. There are other kinds of profit than 'make as much money as possible.' But the investors are always taking on some of the risk and responsibility for a profit.

      Large investors like Venture Capitalists or Mutual Funds may only be interested in how to generate money since they don't really have any other value they can derive from a random business.

      It is sad today that any company created without the express purpose of making more money is called a non-profit. It reflects our current narrow thinking in Western culture and a lack of knowledge of our history. Originally a company was a kind of business that a group of people formed legally to achieve some end of some kind. There were many kinds of charters. Often, expected social benefits were required for granting the recognition of a company as a thing.

      A company was once a practical tool for a practical world. If openness was not harmful, it might even make the achievement of that goal easier by enabling other companies to work together to achieve that goal. A perfect example is Universities creating the Internet long before the private dial-up networks created their closed captive markets.

      But in laissez-faire market economics, secrecy can give your for-money-profit-only company a competitive edge. Deny others access to your market and force them to spend time developing their own trade secrets. There is little advantage in the for-money-profit-only world to letting government regulators or customers in on your super secret formula. Best to do away with the FDA and BSA, too. Your product or process could be a ball full of crap, kill kittens to make pop-tarts or power ancient evil with pollution. Openness would be harmful to that business model.

      --

      "You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
    2. Re:Too glib by easyTree · · Score: 1

      The companies understand one thing: profit.

      That's not true. Companies and the people that run them understand more than just profit. I defy you to find a single person in a company who cannot comprehend something other than profit. To claim that profit is all they can understand is absurdly untrue. But there is a nugget of truth in what you say. What is true is that companies and some (not all) of those who run them have a strong tendency to focus on profits excessively, particularly short term profits. They do this to the detriment of all else including the long term health of the company sometimes. It's too glib to say that companies only understand profit but it is fair to say that companies tend to focus on it too hard at times and make bad decisions as a result.

      A well-managed company has to consider things like the health of their community, the well-being of their suppliers, the trust of their customers, etc. All these things sooner or later will impact profits, so if a company focuses excessively on near-term profits, then in the long term they will likely be worse off, and so will all those who depend on the company - customers, suppliers, community, shareholders and employees.

      You appear to be saying it's not true, then explaining how actually, it's all about profit, or anything which can affect profit - which is just another way of saying they're *really* serious about profit.

    3. Re:Too glib by Anonymous Coward · · Score: 0

      A company would chop off its own nuts if doing so increased profit or decreased cost, or if they got in the way of making profit. Everything a company does is in order to increase its profit margin. They may consider all those things you mentioned, but it's all in the name of profit increase. If companies truly gave a rat's a-- about "community", they wouldn't be firing workers left and right to become "more agile", "cut costs", or "change directions". They would accept a hit to their profit margin to retrain and reuse their existing workers to make those changes. Instead they throw them out like last month's fashions.

    4. Re:Too glib by Anonymous Coward · · Score: 0

      A well managed company has to consider things like the health of their community, the well being of their suppliers, the trust of their customers, etc. All these things sooner or later will impact profits

      The companies understand one thing: profit.

      I fail to see the difference between these two statements - one gives a detailed explanation, one is a concise and summarized point.

      In the US, small and medium-sized public companies may still be able to perform non-profit-motivated actions on occasion, but those are the exception, not the norm. Large public companies: name one that is NOT motivated by profit at the end of the day. European-based companies may be a different discussion.

  11. the root cause by Anonymous Coward · · Score: 0

    Southwest Airlines Co. has filed 33 labor condition applications for H-1B visas and 1 labor certification for a green card from fiscal year 2013 to 2015. Southwest Airlines was ranked 5651 among all visa sponsors.

    Ah. That explains it.

  12. Yes, but it depends on the level of danger by Vliegendehuiskat · · Score: 2

    Yes! I think airlines and all companies exposing the public to potential life and death situations should definitely give a post mortem when critical systems fail, regardless of whether they are mechanical or not. However, if your local supermarket had a crash of their inventory management system, would you really care? No you probably would not because you will still be able to pay with cash and take your goods anyway. I think the line should be drawn somewhere near exposure to mortal danger. Therefore every company offering some sort of transportation service should be as transparent as possible and should have near-zero privacy.

  13. Legal Requirements by ytene · · Score: 1

    The previous post offering the title "That Depends" is on the right track.

    Some industry sectors have legal requirements to disclose technical failures that could impact their operating bottom line. For example, think about Section 404 of the Sarbanes-Oxley Act.

    Other requirements are driven by locations - for example California was the first US State to require formal disclosure if a company lost unencrypted client data.

    The bottom line is that, for a growing number of industry sectors, legislative jurisdictions and use cases, there is a legal requirement to make necessary disclosures and in a timely manner. In the case of some requirements [like SOX-404] there is the potential of jail time for company officials that fail to abide by the law.

    Ultimately, it is the legal responsibility of the CEO of a [publicly listed] company to ensure that the company's operations fully comply with all legal obligations at all times. Whether or not a company unaware of its obligations actually ends up breaking the law, a company that doesn't understand those obligations has a negligent CEO.

    Here be dragons. Tread carefully!

  14. "transparency" to build confidence by ChrisOtt · · Score: 1

    I think it's smart to be as transparent as you can about system failures (without creating a security risk by discussing your infrastructure in too much detail). A company like Southwest depends in part on consumer confidence to gain customer loyalty. People are still a little afraid to fly. Transparency can boost confidence because the customer is expected to accept that bad things happen, but also sees that the right person is in charge, knows what happened, and knows how to fix it, and can assume that changes have been or will be made so that it doesn't happen again. If Southwest instead chose to be completely opaque, customers would be left wondering either what they're covering up and why (hurting confidence) or whether they are completely incompetent (also greatly hurting confidence). For companies like Southwest, I think it's essential that they be as detailed as they can be.

  15. I accept anything as long as it's truthful by Opportunist · · Score: 1

    Accidents happen. And only people who don't work make no mistakes. So if anyone claims he never makes mistakes, you have found the slacker.

    People are surprisingly willing to cut you some slack if you admit mistakes, apologize and offer them some token compensation. Provided that they don't happen too often and that it cannot be considered malice or gross negligence.

    Also, what you offer in compensation should be in sync with your mistake. After losing your customers' credit card info, handing out a free trial that marketing has been throwing around left and right like candy nearing its best-before date is NOT going to cut it. Sony, I'm looking your way.

    Generally, Sony could be used as the poster child of how NOT to reconcile with your customer base after fucking up...

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  16. Indians, prolly. by vikingpower · · Score: 4, Interesting

    "Outsourcing partner" in Bangalore must have screwed up.
    On Indian outsourcing, here's a war story. When working with Fokker, the Dutch aerospace company, I was sent to Bangalore to deliver a final judgment on an outsourcing firm there. On the second day, needing to go to the toilet, I lost my way in the building. Trying to find the loo, I walked by an empty cubicle (the cubicles had large glass panes in them). On the table lay a blueprint. Being an engineer, I couldn't refrain from looking at it. The name "Areva" was printed all over it, Areva being a French constructor of nuclear power plants. It soon became clear to me that those st***d Indians had left the blueprint of an important safety valve in a current nuclear reactor design, unsupervised, on a table in an empty cubicle, and that anyone could walk in on it. I took a picture with my cell phone and sent it to Areva - after having stood there, as a test, for about 10 minutes. Nobody turned up. Anyways - some high-up security guy there went ballistic; on the phone, he thanked me and explained to me the kind of mayhem that blueprint falling into the wrong hands could have caused. (Needless to say, we at Fokker immediately cut ties with that Bangalore company.)

    --
    Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
    1. Re:Indians, prolly. by Anonymous Coward · · Score: 1

      On my last gig my employer had an Indian company handling Linux systems administration chores. When they created our virtual machines they left behind scripts that contained the credentials to access another customer's infrastructure. The other customer was a big, national bank.
      Most big companies are willing to accept the increased risk incurred by farming out work to incompetent offshore people if the price is right.

      That's how they do.

  17. Case-by-case basis by davidwr · · Score: 1

    Those customers that got "badly burned" are going to want to know that you've learned your lesson.

    If the event hit the press or word got around to your target customer base, you'll need to convince them that it won't happen again (I'm looking at you, Southwest Airlines).

    If your industry is one where the failure could cause death or injury if it happened again - even to a competitor - then you have a moral and possibly legal obligation to "go public" within your industry so they can learn from your experience (I'm looking at you, Blue Bell Creamery).

    Even if it's not life-or-death, you may find it good busine$$/good PR to share details within your industry or to the general public (thank you, Google).

    There are some cases where publicity isn't critical.

    For example, suppose you sell widgets and had a systemic failure with no critical lessons learned in one of your factories, shutting down production there for a week. If your other factories were able to ramp up production, so that all your distributors and major customers noticed was a half-day shipping delay on some parts that drew down their own inventories, and your other end users didn't notice anything, then all you need to do is apologize for the inconvenience and say that a plant had to be taken offline and it took half a day to add shifts to the other plants and get your widgets shipped out. If you are a public company, you may need to issue a press release for the benefit of investors. If you had temporary layoffs or if employee health and safety were affected, you may have to notify the government, unions, and affected employees. Other than that, you probably don't need to say much more.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  18. Social Conditioning by Anonymous Coward · · Score: 0

    The public expects no explanation. They have been conditioned over 50 years of experience to accept "Our computers are down" as sufficient.

    You may think I'm joking but it is mostly true.

  19. All Backups Failed? by QlooQl · · Score: 1

    First off, how does a router failure make you lose data? That's either a lie or they have no clue how their system works, which would explain why they failed so miserably. Also, how does such a large company not run RAID on their server hard drives? So you're telling me the RAID failed, the NAS failed, the on-site backup failed and the off-site backup(s) failed? I guess I could believe Southwest outsources to a company that doesn't consider the basics that every small business in the USA with 20+ people takes for granted.

    1. Re:All Backups Failed? by vikingpower · · Score: 1

      They prolly outsourced. See my "Indians/Bangalore" post above.

      --
      Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
  20. Re:Too glib - Not glib at all. by Anonymous Coward · · Score: 0

    Actually, that is the definition of a company.

  21. Skeletons in the closet? by ErichTheRed · · Score: 1

    All large organizations have some messy aspects of their internal IT. The longer the organization has existed, and the larger and more diverse it is, the worse it gets. A couple of days ago there was a story circulating about a Citibank employee (a NOC engineer or something like it) who was able to stop most network traffic by removing the configs in a few key routers. (It turns out he was upset about a bad review he had just been given.) If a network were properly designed with no choke points, no SPOFs, etc., it would be extremely hard to take out all traffic. But the reality is that stuff grows organically over time and there are lots of IT skeletons in closets. I doubt there's a CIO on earth who wants to go out and tell the public that they screwed up because they didn't, say, pay an extra $5K for a redundant router.

    About SWA's troubles, here's a clue -- airlines have absolutely zero interest investing more in IT than the basics required to run the business. There's cool stuff being done, but airlines are a low-margin business (believe it or not) and have historically relied on a web of third party companies to provide IT services. It used to be just reservation systems, etc. that most airlines couldn't or wouldn't want to run themselves anyway. But in recent years, lots of development and operations work has been moved to "offshore partners" or IT companies that in turn offshore everything. Because of all this abstraction, I'm sure Southwest's onsite IT staff had a very difficult time figuring out who and what was actually to blame for the issue. That, and airline IT is full of single points of failure that are just the nature of the business. Losing operational messaging links, having one system fail in a chain of dependencies that prevents aircraft dispatch or crew scheduling, and others can stop an airline from operating until they're fixed.

    Another point - the cloud doesn't really solve this either. It has the potential to, but architecting a failure-tolerant solution in a public cloud is actually harder to do than on-site stuff. Sure, if you're starting from scratch you can write software in a way that gracefully handles failure. However, any legacy application port into the cloud requires very careful thought about how to design it for fault tolerance.

  22. Definitions by sjbe · · Score: 1

    Actually, that is the definition of a company.

    No it is not. The definition of a company is "an 'artificial person', invisible, intangible, created by or under law, with a discrete legal personality, perpetual succession and a common seal. It is not affected by the death, insanity or insolvency of an individual member."

    A company is a term that refers to a variety of types of organizations. Some types of companies are explicitly not concerned with profits at all. Perhaps you've heard of non-profit companies? Those are a thing, you know.

    From the linked article: "In the United States, a company may be a "corporation, partnership, association, joint-stock company, trust, fund, or organized group of persons, whether incorporated or not, and (in an official capacity) any receiver, trustee in bankruptcy, or similar official, or liquidating agent, for any of the foregoing". In the US, a company is not necessarily a corporation."

  23. They need to be by Anonymous Coward · · Score: 0

    100% transparent to someone. It needs to be known whether or not software is actually reliable; hype needs to be set aside. When we are talking about lives hanging in the balance, the spreadsheet mentality is very inappropriate. In some cases, Silicon Valley and its champions are beginning to remind me of the pharmaceutical industry!

  24. Re:Router Failure? Cheapskates? by Anonymous Coward · · Score: 0

    Undoubtedly, it's related in some way to cheaping out; as you say, it's in the culture of every corporation. And yes, it's the same for their airplanes, though they're a far cry from Allegiant. I've been delayed several times by problems with the airplane on Southwest, though the worst plane I've actually flown on was United's. Southwest even had the fuselage open up on one, which caused rapid (though not explosive) decompression and an emergency landing in Yuma, and like many airlines it has been cited in the past for outsourced maintenance practices. In all, though, they may have problems, but they're way ahead of second place unless you can afford to fly in your own plane.

    Don't say they're cheap to fly, though. Fares are very similar to the other big boys these days.

    I suspect that a router did fail, which triggered a bunch of other cascading failures that it shouldn't have.

  25. Who Cares? Let the Market Decide by RobotRunAmok · · Score: 1

    If it's a government entity, yeah, full disclosure, down to the last comma-separated value. A public company? That's between them and the shareholders. Private company, disclose whatever they want or not. In the end, there'll be some consumer watchdog outfit that will publish all the up- and downtime percentages, and companies will get their just deserts. Unless they're calling me in to fix the problem, I don't care whether they were hacked or somebody's cat pissed on a circuit breaker; they're either up or they're down.

  26. SW IT is lying by Anonymous Coward · · Score: 0

    One router failed and corrupted all backups? Umm, what the f***? Failover is a thing, like a really, really important thing. If they're going for transparency they're doing a poor job, because the excuse they provided is a lie. This sounds more like an IT worker trying to save his job by lying his ass off, and his excuse somehow made it all the way to Slashdot.

    1. Re:SW IT is lying by vikingpower · · Score: 1

      Although I rarely respond to ACs, here is a thumbs-up. This sounds like the cleverest inroad I would personally consider pursuing if I were hired to do an investigation.

  27. Helpdesk/NOC by Anonymous Coward · · Score: 0

    I've worked at a helpdesk and we were always lying through our noses to cover our screw ups. On the phone while the incident was happening we'd be very vague and give as few details as possible. We'd then give a full account to the manager who would produce an incident report which was what we called "customer-service friendly".

    Tech companies have no incentives to admit when they screwed up and as such, will always try to lie/cover it up when they think they can get away with it.

  28. If they said it was a router . . . by Anonymous Coward · · Score: 0

    it was probably not a router.

  29. Be Transparent by luis_a_espinal · · Score: 1

    You are either transparent, or you aren't.

  30. The Backups didn't fail, the Backup systems failed by Anonymous Coward · · Score: 0

    Southwest has redundant systems, and many redundant paths to those systems. Almost everything will survive a single point of failure, except the failure of one of the corner routers (they used to use F5s set up in a redundant network path, but it has been a few years since I saw things). Most of the apps share the 4 corner F5 routers.

    The apps tend to be interconnected. The reservation system talks to the ground system so the bags get on the right airplane. The ground system talks to the crew system, so the gate agent knows what flight attendants are on the flight, etc.

    Normally, processes load-balance from the main data center to the alternate data center (that is for newer apps; the older apps will only fail over to the alternate data center). If either data center isn't responding, the F5s are supposed to route the traffic to the known good data center, and the apps should work fine. If the F5 gets confused, it may route data to the bad data center, and in effect lose stuff.
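    To make the "F5 gets confused" scenario concrete, here is a minimal Python sketch of preference-ordered failover. It is not how F5 or Southwest actually implement anything; the function name pick_data_center and the data-center labels "main" and "alternate" are hypothetical. The point it shows: the balancer can only act on its own view of health, so if that view is stale or misconfigured, traffic keeps going to a data center that is actually down.

    ```python
    from typing import Mapping, Sequence


    def pick_data_center(preference: Sequence[str], health_view: Mapping[str, bool]) -> str:
        """Return the first data center, in preference order, that the balancer
        *believes* is healthy. (Hypothetical helper, illustration only.)"""
        for dc in preference:
            # Unknown data centers are treated as unhealthy.
            if health_view.get(dc, False):
                return dc
        raise RuntimeError("no data center is marked healthy")


    PREFERENCE = ["main", "alternate"]

    # Accurate health view: main is down, so traffic fails over to alternate.
    print(pick_data_center(PREFERENCE, {"main": False, "alternate": True}))

    # Stale or misconfigured view: main is really down but still marked healthy,
    # so the balancer keeps routing traffic to the dead data center.
    print(pick_data_center(PREFERENCE, {"main": True, "alternate": True}))
    ```

    The failover logic itself is trivial; the hard part in real life is keeping the health view accurate, which is exactly where a "confused" load balancer would bite.
    
    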

    I haven't heard anything about what actually went wrong, but my guess is that one of the F5s was misconfigured or broken, and it caused many of the apps to be unable to talk to each other. It probably started small, but as failures compounded and more systems couldn't talk to each other, the front-line people just couldn't use the computers and had to revert to manually checking in passengers, which slowed things down. When the pilots couldn't get flight releases, everything stopped.
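    The way one broken shared component can end up stopping flight releases can be sketched as reachability in a dependency graph. The Python below assumes a made-up dependency map loosely based on the systems named above (reservations, ground system, crew system, flight releases); the names and edges are hypothetical, not Southwest's real architecture. It walks the graph breadth-first from the failed component to find every app that transitively depends on it.

    ```python
    from collections import deque

    # Hypothetical dependency map: app -> components it needs in order to work.
    DEPENDS_ON: dict[str, set[str]] = {
        "reservations": {"corner_lb"},
        "ground_ops": {"corner_lb", "reservations"},
        "crew_scheduling": {"corner_lb", "ground_ops"},
        "flight_release": {"crew_scheduling"},
    }


    def impacted_by(failed: str, depends_on: dict[str, set[str]]) -> set[str]:
        """Return every app that directly or transitively depends on `failed`."""
        # Invert the graph: component -> apps that need it.
        dependents: dict[str, set[str]] = {}
        for app, needs in depends_on.items():
            for component in needs:
                dependents.setdefault(component, set()).add(app)

        # Breadth-first walk outward from the failed component.
        impacted: set[str] = set()
        queue = deque([failed])
        while queue:
            current = queue.popleft()
            for app in dependents.get(current, set()):
                if app not in impacted:
                    impacted.add(app)
                    queue.append(app)
        return impacted


    # A shared load balancer failing takes out everything downstream of it.
    print(sorted(impacted_by("corner_lb", DEPENDS_ON)))
    ```

    With this toy map, losing the shared load balancer knocks out all four apps, while losing only crew scheduling takes out just flight releases. That is the "one system fail[ing] in a chain of dependencies" effect in miniature.
    
    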

    I am speculating a bunch here, don't quote me on this.

  31. What the fuck? by Anonymous Coward · · Score: 0

    The companies understand one thing: profit.

    To claim that profit is all they can understand is absurdly untrue. But there is a nugget of truth in what you say. What is true is that companies and some (not all) of those who run them have a strong tendency to focus on profits excessively . . .

    Uhm, yeah. That's what the parent is implying, duh.