Slashdot Mirror


Failed Software Upgrade Halts Transit Service

linuxwrangler writes "San Francisco Bay Area commuters awoke this morning to the news that BART, the major regional transit system which carries hundreds of thousands of daily riders, was entirely shut down due to a computer failure. Commuters stood stranded at stations and traffic backed up as residents took to the roads. The system has returned to service and BART says the outage resulted from a botched software upgrade."

14 of 125 comments

  1. Strange times by nightsky30 · · Score: 5, Insightful

    Why was a weekday selected for this software update?

    1. Re:Strange times by TWX · · Score: 4, Informative

      Well, based on my own experience with bureaucracies, I'd guess there is some existing rule ensuring that certain types of staff have certain days off unless there's an emergency, and a software update probably didn't previously count as an emergency.

      From one standpoint, it makes sense, especially if those doing the work need technical support from a vendor. On the other hand, it probably makes more sense to have a QA lab set up if one is going to operate this way, so that one can test a rollout in advance, hopefully forestalling such problems going live.

      --
      Do not look into laser with remaining eye.
    2. Re:Strange times by B33rNinj4 · · Score: 4, Insightful

      Man, my company hasn't had a QA environment that mirrored production in over a decade. I'd like to think that they had something set up, but the few state-run departments I've seen have been sorely lacking.

    3. Re:Strange times by Salo2112 · · Score: 3, Funny

      Patch *Tuesday*. Duh.

    4. Re:Strange times by girlintraining · · Score: 5, Insightful

      "On the other hand, it probably makes more sense to have a QA lab set up if one is going to operate this way, so that one can test a rollout in advance, hopefully forestalling such problems going live."

      And that's pretty hopeful. The thing is, in the real world, you just don't test all your patches. You can't; in any non-trivially sized network you're going to have hundreds of them to go through every week, and the workload is the same for a small or large business. That's why large businesses tend to do better (strangely enough) than small ones when it comes to patch management. And this attitude is backed up by the numbers -- I would say that over 9 times out of 10, a break/fix patch has no consequences when it is pushed into the production environment. It goes out. The version increments. The end. It's that 1 time that screws everyone up -- but it happens infrequently enough that management doesn't update its policies.

      Most managers operate under a triage approach to maintenance -- that is, throw resources at a problem when something breaks and complaints start coming in, rather than throwing resources at prevention. In the short run, this is the right approach -- in a crisis you want all hands on deck. The problem is that over time, neglecting preventative maintenance procedures, which show up only as a cost without a defined benefit, results in departments moving to a triage model all the time. Basically, the problem is short-term prioritization over long-term cost reduction.

      And I've seen it in almost every IT department I've worked for. I've even sat down with managers and explained to them that when 35% of their workload is emergency break/fix and that number is trending upwards, we have a process control issue. They invariably agree with me, but say they can't get out from under the workload. Of course, when I come back three months later and it's now at 47% and the workload is a third higher, they say the same thing.

      I would lay money that this is how projects are being managed at BART, and that it has now deteriorated to the point where it's starting to impact BART's core business. The problem is that while effective project management could likely still right this sinking ship, it almost never happens. Unfortunately, the usual response is to throw someone under the bus, blame them for the failure, and insist that because the system has worked up until this point, it does not need an overhaul.

      They couldn't be more wrong, but unfortunately it will take several people being thrown under the bus and a few more high-profile failures before senior management fires the mid-level manager responsible for the project, brings on someone with a strong background in project management, and restructures the department from the ground up following change-management best practices. Of course, they'll overdo it in the attempt and the pendulum will have to start swinging back the other way, but... that's what happens.

      --
      #fuckbeta #iamslashdot #dicemustdie
  2. Hello, IT. by tech.kyle · · Score: 3, Funny

    Have you tried turning it off and on again?

    --
    If we colonize Mars, it won't be the World Wide Web anymore. UWW?
  3. BART by Anonymous Coward · · Score: 5, Interesting

    BART is run by the dumbest people on Earth. First off, it takes a special kind of stupid to create a rail system that goes almost, but not quite, all the way to the airports. 30 years later they extended it to one of them, but you still have to transfer to a bus for the last mile to the other. Then you have to wonder what kind of idiot puts light carpet and cloth seating on public transport. 35 years later they start testing non-porous flooring and seating, and maybe in another five years all of the trains will be switched over. Then some bean counter got a bonus when they closed all the station bathrooms after 9/11, ostensibly for security. Now a fifth of the escalators are out of service at any one time because they are clogged with human shit.

    I also heard there was some sort of labor dispute.

    1. Re:BART by Jane+Q.+Public · · Score: 3, Insightful

      "BART is run by the dumbest people on Earth."

      Well, you really do have to wonder when they say they worked through the whole night only to discover that this new, mysterious problem was caused by the update they'd made the night before.

      I mean, wow. Wouldn't that be the first thing that popped into your mind?

    2. Re:BART by MrEricSir · · Score: 4, Informative

      The Bart-SFO extension was a matter of politics; you can't blame the people who run Bart for that. You also can't blame the initial designers for not building the OAK extension, since OAK was a much smaller airport in those days (and had very few passenger flights).

      The train design was done by an aerospace company with absolutely no rail experience, which explains Bart's quirky design elements. But you can't blame Bart's current management for construction contracts awarded in the 1960s.

      --
      There's no -1 for "I don't get it."
    3. Re:BART by Anonymous Coward · · Score: 3, Funny

      So people take a dump while riding the escalator? That's actually a cool idea.

    4. Re:BART by Anonymous Coward · · Score: 5, Insightful

      Plus, BART is not exactly a metro system like those in Boston, Chicago, or New York. It's somewhere between a metro and commuter rail, but closer to the latter. It's a product of 1960s thinking, when planners were trying to deal with the population shift out of the urban core. So part of the idea was to create high-speed transit from bedroom communities to downtown Oakland and San Francisco.

      Connecting the airports probably never figured much into the equation. It wasn't built to supplement the transportation needs of carless San Francisco residents. It was built to shuttle people around the Bay Area. If you needed to get to the airport, you got there like everybody else--you drove your car.

  4. Re:I Guess by RabidReindeer · · Score: 4, Funny

    "wow first it's the unions that are shutting them down and now a software update? I wonder what will happen next."

    Unionized software.

    Ironic, isn't it? Silicon Valley commutes wrecked due to bad IT practices!

  5. Looks like Terry Childs had a point by Somebody+Is+Using+My · · Score: 4, Funny

    See what happens when you give these guys root access? ;-)

  6. Re:BART has drivers. by bluemonq · · Score: 4, Interesting

    You've almost certainly never ridden BART, much less seen the driver's cab. Why do I say this? Because there's a section of the BART system (the Oakland Wye, bane of commuters who want to get anywhere during rush hour) where drivers are instructed to switch to manual control, limited to 25 MPH. It's the result of your vaunted "automated" system, designed in the '60s, never having worked properly in the past 50 years, and it was one of the contributing factors to a crash in 2009 (thankfully no one was seriously injured). There are many well-documented incidents of entire train sets disappearing from the computer system, as well as "ghost" trains randomly appearing.

    Here is what an actual BART cab looks like:
    http://i.imgur.com/IbYtYTa.jpg