Slashdot Mirror


Failed Software Upgrade Halts Transit Service

linuxwrangler writes "San Francisco Bay Area commuters awoke this morning to the news that BART, the major regional transit system which carries hundreds of thousands of daily riders, was entirely shut down due to a computer failure. Commuters stood stranded at stations and traffic backed up as residents took to the roads. The system has returned to service and BART says the outage resulted from a botched software upgrade."

81 of 125 comments

  1. I Guess by dale.furno · · Score: 2, Funny

    They should have brought their skateboards to work.

    1. Re:I Guess by noh8rz10 · · Score: 2

      wow first it's the unions that are shutting them down and now a software update? I wonder what will happen next.

    2. Re:I Guess by Anonymous Coward · · Score: 1

      I wonder what will happen next.

      People will buy cars. Only so much of this nonsense can be tolerated when it fucks with your livelihood. When the boss shows up and all the people with cars are getting it done and all the people with train tickets are at home making excuses... well, you shouldn't need any help figuring this part out, even if you don't like it.

    3. Re:I Guess by Trax3001BBS · · Score: 2, Interesting

      San Fran will turn into Detroit?

      Although this was posted on Reddit a day ago, it's so on topic to your post that I had to post it in reply.

      http://www.reddit.com/r/explainlikeimfive/comments/1r6f8w/eli5_americans_what_exactly_happened_to_detroit_i/
      Very good read if you want to know about Detroit

    4. Re:I Guess by somersault · · Score: 1

      Except the boss probably couldn't get to work either, unless maybe he has a bike.

      --
      which is totally what she said
    5. Re:I Guess by RabidReindeer · · Score: 4, Funny

      wow first it's the unions that are shutting them down and now a software update? I wonder what will happen next.

      Unionized software.

      Ironic, isn't it? Silicon Valley commutes wrecked due to bad IT practices!

    6. Re:I Guess by OldeTimeGeek · · Score: 1
      Never been to San Francisco, have you?

      Let's say all of the BART riders start driving in. They will find themselves adding more traffic to an already congested highway system that will never, ever, get any larger. There simply isn't the space. And once they get to work, good luck finding some place to park...

    7. Re:I Guess by mrchaotica · · Score: 1

      You do realize you've just summoned an earthquake, right?

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    8. Re:I Guess by tompaulco · · Score: 1

      Except the boss probably couldn't get to work either, unless maybe he has a bike.

      What? The executive class condescend to ride in Public Transportation? Scoff!

      --
      If you are not allowed to question your government then the government has answered your question.
    9. Re:I Guess by milkmage · · Score: 2

      most of them already have cars. BART serves the Bay Area. 50 miles south and east of SF.

      the week long strike earlier this year caused havoc on the roads- people were on the road at 0400, and still late for work. extra busses, extra boats, not enough.

      https://www.google.com/search?q=bart+strike+traffic&espv=210&es_sm=119&tbm=isch&tbo=u&source=univ&sa=X&ei=EhyQUtq2FYb9iQKq2oG4CQ&ved=0CDYQsAQ&biw=1354&bih=647

    10. Re:I Guess by _Shad0w_ · · Score: 1

      There's a guy who catches one of the trains I catch in the morning who always gets on with his skateboard. Although I work in North London.

      --

      Yeah, I had a sig once; I got bored of it.

    11. Re:I Guess by _Shad0w_ · · Score: 1

      They do here. They just have First Class tickets instead.

      And the ones who drive just get stuck on the M25 instead.

      --

      Yeah, I had a sig once; I got bored of it.

    12. Re:I Guess by JustOK · · Score: 1

      Yah, white people have NEVER fucked up a government.

      --
      rewriting history since 2109
    13. Re:I Guess by Grishnakh · · Score: 1, Interesting

      Interesting that you bring up Haiti. They occupy the same island as the Dominican Republic; while Haiti has been a disaster for a very long time, the DR has always been totally different (just look at a satellite photo showing the deforestation on the Haitian side, while the Dominican side is lush and green). Now, if you go look at the people there (which you obviously haven't, because you're a dumb troll who lives in a trailer), you'll see that they're all black! The main difference between them is that the Haitians speak French, while the Dominicans speak Spanish. Also, many of those places in Africa that are fucked up are former French colonies. So maybe that's your common denominator there.

    14. Re:I Guess by Hognoxious · · Score: 1

      Also, many of those places in Africa that are fucked up are former French colonies. So maybe that's your common denominator there.

      Congo and Rwanda are total shitpots, as is Liberia. By some reckonings the latter is the shittest shitpot ever.

      None of those were French colonies.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  2. Strange times by nightsky30 · · Score: 5, Insightful

    Why was a weekday selected for this software update?

    1. Re:Strange times by TWX · · Score: 4, Informative

      Well, based on my own experience with bureaucracies, there is some existing rule that ensures that certain types of staff have certain days off unless there's an emergency, and a software update probably didn't previously count as an emergency.

      From one standpoint, it makes sense, especially if those doing the work need technical support from a vendor. On the other hand, it probably makes more sense to have a QA lab set up if one is going to operate this way, so that one can test a rollout in advance, hopefully forestalling such problems going live.

      --
      Do not look into laser with remaining eye.
    2. Re:Strange times by DavidClarkeHR · · Score: 2

      Why was a weekday selected for this software update?

      Should have been a Tuesday. Then our Windows updates and our transit updates would match! (... 14% ... for ... ever ...)

      --
      - Nec Impar Pluribus, or so I'm told.
    3. Re:Strange times by x181 · · Score: 2

      so they can purposely botch it and justify the need to have human operators. in case you don't know, BART is currently going through a tense union battle resulting in a few worker strikes and contract disputes.

    4. Re:Strange times by Hamsterdan · · Score: 1

      Why was a *production* system chosen to test the upgrade would be a better question. Why were there no fallbacks an even better one...

      --
      I've got better things to do tonight than die.
    5. Re:Strange times by B33rNinj4 · · Score: 4, Insightful

      Man, my company hasn't had a QA environment that mirrored production in over a decade. I'd like to think that they had something set up, but the few state-run departments I've seen have been sorely lacking.

    6. Re:Strange times by s1d3track3D · · Score: 2

      Yes and I bet there was at least one developer saying the exact same thing who was overruled by mgmt who proceeded with the push regardless!

    7. Re:Strange times by Salo2112 · · Score: 3, Funny

      Patch *Tuesday*. Duh.

    8. Re:Strange times by girlintraining · · Score: 5, Insightful

      On the other hand, it probably makes more sense to have a QA lab set up if one is going to operate this way, so that one can test a rollout in advance, hopefully forestalling such problems going live.

      And that's pretty hopeful. The thing is, in the real world, you just don't test all your patches. You can't; in any non-trivially sized network you're going to have hundreds of them to go through every week, and the workload is the same for a small or large business. That's why large businesses tend to do better (strangely enough) than small ones when it comes to patch management. And this is an attitude that is backed up by the numbers -- I would say over 9 times out of 10, a break/fix patch has no consequences being pushed into the production environment. It goes out. The version increments. The end. It's that 1 time that screws everyone up -- but it happens infrequently enough that management doesn't update its policies.

      Most managers operate under a triage approach to maintenance -- that is, throw resources at a problem when something breaks and complaints start coming in, rather than throwing resources at prevention. In the short run, this is the right approach -- in a crisis you want all hands on deck. The problem is that over time, neglecting preventative maintenance procedures, which show up only as a cost without a defined benefit, results in departments moving to a triage model all the time. Basically, the problem is short-term prioritization over long-term cost reduction.

      And I've seen it in almost every IT department I've worked for. I've even sat down with managers and explained to them that when 35% of their workflow is emergency break/fix and that number is trending upwards, we have a process control issue. They invariably agree with me, but say they can't get out from under the workload. Of course, when I come back three months later and it's now at 47% and the workload is now a third higher, they say the same thing.

      I would lay money that this is how project management is happening at BART, and it has now deteriorated to the point where it's starting to impact its core business. The problem is, while it is still likely at a point where effective project management can right this sinking ship... it almost never happens. Unfortunately, the solution most of the time here is to throw someone under the bus, blaming them for the failure, and insisting that as the system has worked up until this point, it does not need an overhaul.

      They couldn't be more wrong. But unfortunately it will take several people being thrown under the bus and a few more high-profile failures before senior management fires the mid-level manager responsible for the project and brings on someone with a strong background in project management and they restructure their department from the ground up following the best practices of change management. Of course, they'll over-do it in the attempt and the pendulum will have to start swinging back the other way, but... that's what happens.
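The break/fix figures quoted above (35% of the workflow rising to 47% while the total workload grows by a third) can be turned into absolute numbers. This is only illustrative arithmetic: the baseline of 100 units of work and the function name `emergency_load` are assumptions, not anything from the comment.

```python
# Illustrative arithmetic only. The 35%, 47%, and "a third higher" figures
# come from the comment above; the baseline of 100 units is an assumption.

def emergency_load(total_work: float, emergency_share: float) -> float:
    """Return the amount of work that is emergency break/fix."""
    return total_work * emergency_share

baseline_total = 100.0                      # hypothetical units of work
later_total = baseline_total * (1 + 1 / 3)  # workload "a third higher"

before = emergency_load(baseline_total, 0.35)
after = emergency_load(later_total, 0.47)
growth = after / before - 1                 # relative growth in emergency work

print(f"emergency work before: {before:.1f}")
print(f"emergency work after:  {after:.1f}")
print(f"growth in emergency work: {growth:.0%}")
```

Under these assumptions the share moves by only twelve points, but the absolute amount of emergency work grows by roughly four fifths, which is one way to read "can't get out from under the workload".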

      --
      #fuckbeta #iamslashdot #dicemustdie
    9. Re:Strange times by Anonymous Coward · · Score: 1

      You know, I honestly don't give a fuck about global warming. I figure by the time it happens I'll already be dead. Fuck the future generations. And I don't give a fuck if Obama can see me post this. I'm going to shit debt, CO2 and eyesores on them. I'm living for me. Not some fucking little brat who keeps crying while I'm at a restaurant, trying to enjoy a simple fucking meal. Fuck them. Fuck this planet.

      This is a typical Baby Boomer. Imagine it. In all of American history, the Baby Boomers are the first generation to leave their children with a worse, more fucked-up world than what they had. This is more than a mere "fail at life". This is a fail at present AND future life. That's unprecedented in this country.

      And the average Baby Boomer is so arrogant and entitled too. If I were them I'd be a lot more humble and try to stay out of the way and stop running up debt and stop ranting about the youth and try not to hold up traffic going 20 below the limit. Maybe eventually the younger generations are going to get fed up and will forcibly remind the Baby Boomers that the Boomers need the youngers, the youngers do not need the Boomers.

    10. Re:Strange times by causality · · Score: 2

      Yes, of course, it's always clueless management ignoring the brave developer who warns of catastrophe.

      If management wants the power in the form of the final decisions (which they have), and the ability to take most of the credit (which is often the case), then they also get to keep the responsibility.

      Sounds fair to me. Power and responsibility should never be separated. Ever.

      --
      It is a miracle that curiosity survives formal education. - Einstein
    11. Re:Strange times by SeaFox · · Score: 1

      Why was a weekday selected for this software update?

      The same reason your cable company does maintenance in the middle of the day when at night they would disrupt far fewer customers -- the managers are tightwads and don't want to pay the rank-and-file employees for the extra hours outside their normal schedules, and the ones on salary are among that group that refuses to work outside 9-5 M-F.

    12. Re:Strange times by Anonymous Coward · · Score: 1

      Here's the thing.

      Every company wants cheap IT right now. They want an endless stream of no-benefit, no-complaint, low wage IT workers to come in and set things up so they can fire now newly redundant staff, enable them to compete with companies handing them their asses on a silver platter, implement new systems to replace ones that are often decades old, or reduce their current IT operating costs. Very few companies want something entirely new built from scratch thanks to ZIRP; it makes no sense right now to start a business or expand an existing one either with capital in a safe or borrowed capital.

      The very best IT people are capable of eliminating their own positions; doesn't matter what level they work in the industry. Sysadmin, Helpdesk, Programmer, about the only one who can't is a project manager and that's because a project manager isn't IT, it's a managerial position, they have Nothing to do with day to day operations and very few have Actual in-depth knowledge or IT Skills. Eventually those people, and even the mediocre ones, come to the same conclusion; that working nights and weekends to configure their systems correctly for peanuts just to get fired is bullshit. If they're really good much of their time is spent either experimenting or self-educating and managers consider that non-productive time wasted.

      There are Serious problems with hiring outside firms and contractors; there's literally no way for a non-IT individual to tell if someone is in the 95th percentile, competent, or a completely incompetent bastard. Can you tell if a physicist is competent or is a shyster? Part of the reason every company wants a college degree is because it's really easy if you're a shyster to bypass the interview process and there are a LOT of know-nothings out there that schools and certification companies fucked up with. If you're going to hire a know-nothing, then might as well cover your ass. If you hire your own staff you take responsibility for your systems; if you hire an outside firm now you've got to trust that firm is going to do the job right; are the exorbitant fees you're paying going to the dude working 80 hours a week, or to the CEO's Ferrari and Jacuzzi fund? If so, does that mean you're better off hiring cheap labor?

      In short, how do you tell if a Physicist is any good?

      The answer is you've got to find someone who can demonstrate they are good to tell you, and every person THAT individual hires is guaranteed to be less competent than they are.

      Eventually costs and workload will unhinge and people will get driven to the point they don't give a shit or it's more profitable to use their knowledge criminally. I've seen Fortune 500 companies where this was the case in the IT Department. IT Staff see this entire game going on, many have decided to either get out, never get in, or to become a know-nothing shyster, fake it 'till you make it, and play the blame game.

      Very few people consistently take responsibility for their projects and staff; if they can find an employer who values this, and they're out there, and they self-invest they become IT Demi-gods. Those jobs you see where company X wants an AD, Exchange, Web, SCCM, Powershell, VBS, Batch, C++, C#, Ruby, Linux RHSE, hardware hacker demigod is partially because Those guys are actually out there.

    13. Re:Strange times by HiThere · · Score: 1

      Well, often they have someone already picked out, but I don't think many are total fabrications. Most people who go to the effort of posting a job application actually do want to hire someone. They may not want to pay enough to actually get someone who matches their requirements, but that's a separate matter. (And often their requirements are literally insane. The people who write the applications must not have ANY idea of what they're asking for.)

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    14. Re:Strange times by tlhIngan · · Score: 1

      And that's pretty hopeful. The thing is, in the real world, you just don't test all your patches. You can't; in any non-trivially sized network you're going to have hundreds of them to go through every week, and the workload is the same for a small or large business. That's why large businesses tend to do better (strangely enough) than small ones when it comes to patch management. And this is an attitude that is backed up by the numbers -- I would say over 9 times out of 10, a break/fix patch has no consequences being pushed into the production environment. It goes out. The version increments. The end. It's that 1 time that screws everyone up -- but it happens infrequently enough that management doesn't update its policies.

      That may be true for a general computing system where patches come fast and furious from everywhere, but for closed systems like transit systems, ATC, etc., where the software is basically frozen, there's no excuse. The mantra of "if it ain't broke, don't fix it" applies - i.e., software patches for the underlying OS are often NOT applied because they don't need to be (the system works). However, in the off chance an update is necessary (perhaps the control software has to be updated to handle new equipment that's no longer compatible, or other problem), then the new configuration is extensively tested because once it's set, it's frozen.

      (In fact, the biggest problem is usually someone blindly connects the isolated control network to the main corporate LAN where all of a sudden it's no longer static. Heck it could trigger obscure bugs simply due to added traffic load).

      Every company wants cheap IT right now. They want an endless stream of no-benefit, no-complaint, low wage IT workers to come in and set things up so they can fire now newly redundant staff, enable them to compete with companies handing them their asses on a silver platter, implement new systems to replace ones that are often decades old, or reduce their current IT operating costs.

      Of course - IT is a cost center. There are various accounting tricks one can do to show that IT can "bring in revenue" but the reality is, it's a cost. A necessary evil since things are way too complex for all but the smallest mom and pop operation to not have some form of computerized business system in place.

      And just like other cost center departments like administration and such, businesses want to cut costs as much as possible because there's no hope that IT will ever make a dollar. Sure, they can ENABLE someone else to make a dollar, but directly, they don't make diddly.

      Departments that have direct revenues often get increased budgets (e.g., sales and marketing), while indirect benefits (e.g. R&D) often get minor increases, and the main engineering or bulk gets little to no increase (despite fulfilling what sales and marketing actually promise).

  3. Hello, IT. by tech.kyle · · Score: 3, Funny

    Have you tried turning it off and on again?

    --
    If we colonize Mars, it won't be the World Wide Web anymore. UWW?
    1. Re:Hello, IT. by gagol · · Score: 1

      Reynholm Industries, successful makers of [insert_your_guess_here]. Great quote!

      --
      Tomorrow is another day...
  4. BART by Anonymous Coward · · Score: 5, Interesting

    BART is run by the dumbest people on Earth. First off, it takes a special kind of stupid to create a rail system that goes almost, but not quite all the way to the airport. 30 years later they extended to one of them but you still have to transfer to a bus for the last mile on another. Then you have to wonder what kind of idiot puts light carpet and cloth seating on public transport. 35 years later they start testing non-porous flooring/seating and maybe in another five years all of the trains will be switched over. Then, some bean counter got a bonus when they closed all the station bathrooms when 9/11 happened, ostensibly for security. Now a fifth of the escalators are out of service at any one time because they are clogged with human shit.

    I also heard there was some sort of labor dispute.

    1. Re:BART by Jane+Q.+Public · · Score: 3, Insightful

      "BART is run by the dumbest people on Earth."

      Well, you really do have to wonder when they say they worked through the whole night only to discover that this new, mysterious problem was caused by the update they'd made the night before.

      I mean, wow. Wouldn't that be the first thing that popped into your mind?

    2. Re:BART by MrEricSir · · Score: 4, Informative

      The Bart-SFO extension was a matter of politics, you can't blame the people who run Bart for that. You also can't blame the initial designers for not building the OAK extension, since OAK was a much smaller airport in those days (and had very few passenger flights.)

      The train design was done by an aerospace company with absolutely no rail experience, which explains Bart's quirky design elements. But you can't blame Bart current management for construction contracts awarded in the 1960's.

      --
      There's no -1 for "I don't get it."
    3. Re:BART by Anonymous Coward · · Score: 3, Funny

      So people take a dump while riding the escalator? That's actually a cool idea.

    4. Re:BART by Anonymous Coward · · Score: 5, Insightful

      Plus, BART is not exactly a metro system like in Boston, Chicago, or New York. It's somewhere between a metro and commuter rail, but closer to the latter. It's a product of 1960s thinking, where people were trying to deal with the population shift out of the urban core. So part of the idea was to create high-speed transit from bed-room communities to downtown Oakland and San Francisco.

      Connecting the airports probably never figured much into the equation. It wasn't built to supplement the transportation needs of carless San Francisco residents. It was built to shuttle people around the Bay Area. If you needed to get to the airport, you got there like everybody else--you drove your car.

    5. Re:BART by gagol · · Score: 2

      To suspect something is one thing, to be sure of it you need to gather and analyse data at best. A night to confirm it is reasonable. And bathroom in a metro is a luxury, how many undergrounds have those facilities (don't know, none in Montreal, Canada)?

      --
      Tomorrow is another day...
    6. Re:BART by gagol · · Score: 2

      Let us know how it went for you!

      --
      Tomorrow is another day...
    7. Re:BART by bluemonq · · Score: 2

      > 30 years later they extended to one of them but you still have to transfer to a bus for the last mile on another.

      Pity you didn't have a spare $100 million a couple decades ago. I'm SURE you'd have been willing to pay for it, right? The extension to SFO wasn't built until recent times because back in the '60s San Mateo County quit the BART project, and the money wasn't around until the tech bubble started growing; ground was broken in 1997. The Oakland extension wasn't started until recently (opens in 2014) because again, there wasn't any money for it. The only reason it's getting built now is because Feds are footing a good chunk of the bill. OAK wasn't even all that popular an airport until last decade, after their renovation.

    8. Re:BART by Anonymous Coward · · Score: 1

      It was certainly a moving experience; quite uplifting. The person behind me didn't seem to fully appreciate the view; or having to climb backwards when I stopped at the top to wipe --- especially once certain stairs came 'round again full loop. I suppose if I wasn't a Republican, I might have cared about their distress --- but, screw it, shitting on people just feels so good. Made riding on the peons' transit system feel totally worth it.

    9. Re:BART by drinkypoo · · Score: 1

      It wasn't built to supplement the transportation needs of carless San Francisco residents. It was built to shuttle people around the Bay Area. If you needed to get to the airport, you got there like everybody else--you drove your car.

      But this just comes right back to how BART is stupid. Because when you build public transportation, it's going to be used by people who don't have cars, and to not take them into account is fucking stupid. Also, it's just stupid not to have the rail be able to take commuters from an airport to downtown no matter how you slice it. That should have been an initial design goal.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    10. Re:BART by SeaFox · · Score: 2

      If you needed to get to the airport, you got there like everybody else--you drove your car.

      But this just comes right back to how BART is stupid. Because when you build public transportation, it's going to be used by people who don't have cars, and to not take them into account is fucking stupid.

      Maybe the assumption was if you couldn't afford a car, you probably couldn't afford to be going on many flights either. Keep in mind air fare was a bit pricier in the 60's and gas was quite a bit cheaper. Financial bar for car ownership was lower.

    11. Re:BART by drinkypoo · · Score: 2

      Well, what I meant was that they should have taken both classes of passenger into account.

      Ideally this means having lines segregated by socioeconomic status. You don't want to go to the airport and the ghetto.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    12. Re:BART by phantomfive · · Score: 1

      Japan has them all over the place in Tokyo.

      --
      "First they came for the slanderers and i said nothing."
    13. Re:BART by xaxa · · Score: 2

      London Underground toilet map (not so great in the centre, but pretty good elsewhere).

      They're in probably half of European underground stations, on average. Expect to pay 0-50c, depending on the country.

      My local station (in London) has one, it's always very clean. I don't think many people use it.

    14. Re:BART by Hognoxious · · Score: 1

      Plus all the people who work at the airport live there.

      That's what the extra-large size luggage lockers are for.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    15. Re:BART by PrimaryConsult · · Score: 1

      And even at the busiest stations they're cleaner than my office bathroom.

  5. This is really surprising to me. by tlambert · · Score: 1

    This is really surprising to me.

    For all the "can not fail" systems I've worked on, there has been an identical set of hardware, along with other hardware to simulate load, on which you could try upgrades before you put them on a live system and cost the local economy tens of millions of dollars by screwing up.
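The practice described above, trying an upgrade on a matching non-production system before it reaches the live one, can be sketched as a simple gate. This is a minimal sketch under stated assumptions: `Environment`, `apply_upgrade`, `staged_upgrade`, and the health-check callback are all hypothetical names, and a real staging rig would exercise actual hardware under simulated load rather than flip a version string.

```python
# Sketch of a "stage first, then promote" upgrade gate. All names here
# (Environment, apply_upgrade, staged_upgrade) are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Environment:
    name: str
    version: str


def apply_upgrade(env: Environment, new_version: str) -> str:
    """Install new_version and return the version that was replaced."""
    previous = env.version
    env.version = new_version
    return previous


def staged_upgrade(
    staging: Environment,
    production: Environment,
    new_version: str,
    health_check: Callable[[Environment], bool],
) -> bool:
    """Upgrade production only if the staging copy passes its checks."""
    old_staging = apply_upgrade(staging, new_version)
    if not health_check(staging):
        staging.version = old_staging        # leave staging as it was
        return False                         # production is never touched

    old_production = apply_upgrade(production, new_version)
    if not health_check(production):
        production.version = old_production  # fall back to known-good
        return False
    return True


staging = Environment("staging", "1.0")
production = Environment("production", "1.0")

# A deliberately bad release: this check rejects version "1.1".
rejects_1_1 = lambda env: env.version != "1.1"
ok = staged_upgrade(staging, production, "1.1", rejects_1_1)
print(ok, production.version)
```

The point of the design is the ordering: a failed check on staging returns before production is modified at all, so the bad release costs a test cycle rather than a service outage.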

    1. Re:This is really surprising to me. by DexterIsADog · · Score: 1

      Most of the "cannot fail" and "mission critical" and "we're betting the company on this" systems I have seen have one (1) production environment, and one (1) development environment that sort of looks like production, with light servers on each developer's system.

      I recently attempted to test the implementation of a client unlike any of those we had previously hosted, and the CIO and his Development VP told me, "we don't have the resources for that, we'll test it in production". It failed in production. I'm still picking up the pieces.

    2. Re:This is really surprising to me. by bill_mcgonigle · · Score: 2

      and cost the local economy tens of millions of dollars by screwing up.

      So what? What's BART's incentive to avoid this? The customers will go to a competitor? They'll lose their jobs?

      Unionized monopolies are a wonderful thing.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    3. Re:This is really surprising to me. by tlambert · · Score: 1

      and cost the local economy tens of millions of dollars by screwing up.

      So what? What's BART's incentive to avoid this? The customers will go to a competitor? They'll lose their jobs?

      They'll do what they did Thursday and Friday, and flood the roads with drivers who have cars for emergencies, usually take public transit, and are pretty inexperienced as drivers in regular traffic, not just "BART's out traffic". BART isn't really necessary; it's convenient for a lot of people, but once it drops below the convenience threshold, people simply won't use it.

    4. Re:This is really surprising to me. by bill_mcgonigle · · Score: 1

      I understand your argument, but do you think the BART employees really think that BART will get closed down if they don't do a great job?

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    5. Re:This is really surprising to me. by Hognoxious · · Score: 1

      For all the "can not fail" systems I've worked on, there has been an identical set of hardware, along with other hardware to simulate load

      Yeah gramps, we did all that in history class, along with slideframes and mainrules and all that.

      That's obsolete now because cloud and agile and webscale.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    6. Re:This is really surprising to me. by tlambert · · Score: 1

      For all the "can not fail" systems I've worked on, there has been an identical set of hardware, along with other hardware to simulate load

      Yeah gramps, we did all that in history class, along with slideframes and mainrules and all that.

      That's obsolete now because cloud and agile and webscale.

      Let me know when you get the next G.E. Medical systems MRI system running "in the cloud" rather than on a local control system and a console in the next room, and then trust your life to the thing. Meanwhile, I think I will probably stick with the medical equipment I've worked on instead.

      P.S.: Let me know when your cloud is HIPAA certified.

  6. Software Has No Union Rep by Bob_Who · · Score: 1

    I guess you can't always save by eliminating humans and their expensive unions. Although, I'm sure the software was intended to pick up the financial slack for all of those expensive peeps. Don't worry, Wall Street is highly motivated to eliminate the humans with the software, eventually...

  7. Re:BART has drivers. by Anonymous Coward · · Score: 1, Interesting

    Because there is no means in the "cockpit" to actually make the train go. There are three buttons in a BART rail car:

    Open Doors
    Go to next stop
    Emergency Stop

    Not even a "close doors" button - that is handled by door sensors and the computer when "Go to the next stop" is pressed.

    Everything is automated. A chimpanzee could operate a BART train.

  8. Snapshots? by Neo-Rio-101 · · Score: 2

    First I'm not going to plug any VM vendor.... but with certain VM backends, snapshots are possible, and it's a godsend when crap like this happens.

    --
    READY.
    PRINT ""+-0
    1. Re:Snapshots? by Runaway1956 · · Score: 2

      You have to realize how few people even know what a VM is. Or a snapshot. Where I work, there is one backup made each week, on the server. No other machine has a snapshot, a disk image, or a backup, and there are no VMs - nothing. If/when a disk fails, that machine comes to a halt until a vendor is called in to replace the disk, the OS, and all the software.

      We have some fool who is referred to as "the IT guy". I can't even say that with a straight face. This is one of those who got a Microsoft-centric education, and proved to be pretty adept at accomplishing Microsoft-centric tasks - and just happens to be related to the company president.

      I know that our situation isn't unique.

      --
      "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
    2. Re:Snapshots? by rubycodez · · Score: 1

      you can do snapshots by other means than having VM software. Many volume managers and filesystems can do it, and some disk array controllers have that built in

    3. Re:Snapshots? by Anonymous Coward · · Score: 2, Interesting

      No. Just no.

      Have you ever actually tried this on a production system? I haven't (I'm not stupid enough to do that), but I've seen many others try. In almost every case, the resulting mess from "rolling back" a VM was greater than the mess of a botched software update to begin with. In one particular case, I witnessed a certain VM running some very expensive enterprise software totally hose itself and then proceed to blow away the majority of a database hosted on another VM after it was restored following a broken update. Despite their attempts to restore both VMs and bring them back in sync, they eventually determined that the data couldn't be trusted on either and the entire system had to be restored from backup. The downtime this cost them was greater than the downtime would have been had they simply called the vendor and said "your update broke our stuff, fix it" (they had the support contracts and the fix would have taken 10 minutes instead of 8 hours).

      Another time I saw someone restore a VM that was running a network daemon for a cluster of hardware locks attached to one of the nodes (of course, this VM was locked to that particular node since it required passthrough access to the USB dongles). That was a good one- not only did none of the licenses get checked back into the network daemon (so they basically lost all the capacity they had in use at the time of restore), but the licensing software freaked out and shat itself when the time stamps coming off the hardware were suddenly in the future (as the clock had not yet been synchronized back to local time). It took those guys several days of pleading with the software vendor to send them new keys and get the licensing system sorted out and working again (snapshots were permanently disabled on that VM from then on).

      Now, it's an awesome feature to have for testing and development stuff- but for production, you should have procedures in place to deal with this kind of thing rather than reaching for the Big Red Button and nuking everything from orbit. I keep hearing about this kind of thing- "oh just restore the VM from snapshot in prod", and it makes me cringe every time I hear it. You don't restore a server from tape unless you absolutely have to. I fail to see why anyone thinks that restoring a VM from snapshot is any different- the only difference is that it takes seconds to complete, instead of hours.
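
      The licensing failure described above comes down to a restored machine whose clock sits behind a timestamp some other component already recorded. As a rough illustration only (the vendor's real logic is not known, and the function name, tolerance, and dates below are all hypothetical), a check of that kind might look like this in Python:

```python
from datetime import datetime, timedelta, timezone


def clock_rolled_back(last_seen, now, tolerance=timedelta(seconds=5)):
    """Return True if `now` is earlier than `last_seen` by more than `tolerance`.

    `last_seen` stands in for a timestamp recorded outside the VM (for
    example on a hardware dongle); `now` is the VM's clock after a restore.
    Both names are illustrative assumptions, not any vendor's API.
    """
    return last_seen - now > tolerance


# Hypothetical values: the dongle last saw noon, the restored VM thinks
# it is still 06:00 because its clock has not been resynchronized yet.
last_seen = datetime(2013, 11, 22, 12, 0, tzinfo=timezone.utc)
restored_clock = last_seen - timedelta(hours=6)

if clock_rolled_back(last_seen, restored_clock):
    print("clock appears to have gone backwards; refusing checkouts")
```

      Resynchronizing the guest clock before the service starts would avoid tripping a check like this, though it would not bring back the licenses that were checked out when the snapshot was restored.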

    4. Re:Snapshots? by Todd+Knarr · · Score: 2

      Gods, no. Just... no. Think for a minute. If your VM's running a database server and you roll back to a snapshot, what happens? Well, the snapshot doesn't know anything about the database since that's an application-level thing, so it'll roll back to being mid-operation (times however many database operations were in progress). The problem is that since the clients haven't been rolled back to the same moment down to the nanosecond, the database is now mid-operation while the clients that're supposedly performing those operations... aren't. From here things proceed to go pear-shaped in a big way.

      It can be done safely, but it requires either intimate knowledge of the application by the VM host or bringing the applications to a safe idle state before starting the snapshot. Basically snapshots are far less useful than they're made out to be because the problem you're trying to solve is far more complex than just taking a snapshot.
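
      The "mid-operation" problem can be shown with a toy example. This is only a sketch under simplifying assumptions: the `Ledger` class, its two-step transfer, and both snapshot functions are invented for illustration and do not model any real database or hypervisor.

```python
import copy


class Ledger:
    """Toy application state: a transfer is two separate steps."""

    def __init__(self):
        self.balances = {"a": 100, "b": 0}
        self.in_flight = 0  # number of transfers that have started but not finished

    def begin_transfer(self, amount):
        self.in_flight += 1
        self.balances["a"] -= amount  # step 1: debit
        return amount

    def finish_transfer(self, amount):
        self.balances["b"] += amount  # step 2: credit
        self.in_flight -= 1


def naive_snapshot(ledger):
    """Copy the state whenever asked, like a snapshot taken below the application."""
    return copy.deepcopy(ledger.balances)


def quiesced_snapshot(ledger):
    """Refuse to copy the state unless the application is idle."""
    if ledger.in_flight:
        raise RuntimeError("operations in flight; drain them before snapshotting")
    return copy.deepcopy(ledger.balances)


ledger = Ledger()
amount = ledger.begin_transfer(30)

torn = naive_snapshot(ledger)        # taken between debit and credit
print(sum(torn.values()))            # 70: thirty units exist nowhere in this copy

ledger.finish_transfer(amount)
clean = quiesced_snapshot(ledger)    # taken once nothing is in flight
print(sum(clean.values()))           # 100
```

      Rolling back to the first copy would restore a state that no client ever saw as complete, which is the situation described above. The second copy is only safe because the application was brought to an idle state first.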

    5. Re:Snapshots? by Anonymous Coward · · Score: 1

      > The second example you gave could have easily happened outside of a virtual environment. Imagine somebody did a restore from backup, or accidentally fucked up the system clock - the same thing would have occurred. That is just shitty software and not a problem related to virtual machines.

      Because people just love to take down a system for hours restoring from tape at random? My point was that they restored the VM from snapshot because it was a quick and easy process. The system itself went down for about a minute (the clients didn't even notice until the licensing manager started to refuse floating license checkouts) and then it was back online. The snapshot was recent (only 6 hours old), so no harm done, right? Wrong.

      In my experience, VM snapshots are dangerous precisely because they're so easy to implement, use, and abuse. Right clicking on a VM and selecting "Restore Snapshot" is infinitely easier than firing up your backup package and waiting hours for a server to restore from tape. The end result is mostly the same, save for the fact that VM snapshots will also store the CPU and RAM state to disk. Yet, people are rarely hesitant to roll back an entire VM when they should be treating it as a really quick full system restore.

      I'm not saying it's a bad feature. I use it a lot. Lots of people I know use it a lot. But we all use it responsibly, and it is never, EVER the answer to "something broke" or "something isn't quite working properly". It is one of if not the last resort after attempting to troubleshoot the problem properly inside the VM itself.

    6. Re:Snapshots? by PrimaryConsult · · Score: 1

      That's why you power down the VM to take the snapshot. The snapshot is also instantaneous rather than waiting for some vague, sketchy attempt at quiescing the FS.

      If the downtime for a reboot is unacceptable, do not use snapshots.

  9. Good redundancy by bob_super · · Score: 1

    "assistant general manager for operations, said the system's backup computer had gone down at the same time its central supervisory computer crashed."
    Redundancy is not just running two boxes... How many times do we need to point out that there's a reason true redundancy is hard and expensive?

    TFA (sorry for reading it) states that the problem showed up 12 hours after the upgrade. That's why it's time-consuming to test hi-rel stuff, whatever bean counters say...

  10. Looks like Terry Childs had a point by Somebody+Is+Using+My · · Score: 4, Funny

    See what happens when you give these guys root access? ;-)

    1. Re:Looks like Terry Childs had a point by bluemonq · · Score: 1

      BART is a metropolitan transit system. The city government of San Francisco has practically nothing to do with day-to-day operations.

  11. Re:Never upgrade by s1d3track3D · · Score: 1

    So you're posting from an un-patched Windows 98 box? Or are you still on 3.1?

  12. Manual operation by manu0601 · · Score: 2

    I have seen quite efficient manual train network operation, but the workers behind the success explained that it was only possible because they had a few old timers who were still able to organize train flows using paper and pencil. Younger workers had always worked with computers, and when the old timers have all retired, the know-how will be lost.

    1. Re:Manual operation by manu0601 · · Score: 1

      You got an idea of the complexity.

      Now try to imagine the job of a traffic regulator for an average European city train station: it is the same exercise with dozens of tracks and switches, and hundreds of trains a day. Your tools are sheets of paper, a pencil, and a telephone to call the workers who run the trains and the ones who switch lines (manually, of course).

      Some trains are late, or get stopped for various reasons. And to make things easier for you, freight trains can be added on the fly as soon as there is a 2 min gap between two trains.

      And of course if anything goes wrong, there will be angry customers interviewed on TV news complaining that you are a lazy civil servant who does not know what real work means.

  13. Not so much the bureaucracies by rsilvergun · · Score: 1

    it's more the contractors refusing to train and keep their hires. Nobody wants to keep someone around. They cost more every year. But for programmers that means nobody knows how anything works. It keeps profits high for the guy running the sub-contractor, but it means crummy software...

    --
    Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
  14. So does somebody go to jail? by dbIII · · Score: 1

    Terry Childs was locked up on the off chance that something far less disruptive than this would happen. At least that was the excuse.

    1. Re:So does somebody go to jail? by bluemonq · · Score: 1

      BART is not under the governance of San Francisco.

  15. Re:BART has drivers. by bluemonq · · Score: 4, Interesting

    You've almost certainly never ridden BART, much less seen the driver's cab. Why do I say this? Because there's a section of the BART system (the Oakland Wye, bane of commuters who want to get anywhere during rush hour) where drivers are instructed to go to manual control, limited to 25 MPH. It's the result of your vaunted "automated" system designed in the '60s never having worked properly in the past 50 years, and one of the contributing factors to a crash in 2009 (thankfully no one was seriously injured). There are many well-documented incidents of entire train sets disappearing from the computer system, as well as "ghost" trains randomly appearing.

    Here is what an actual BART cab looks like:
    http://i.imgur.com/IbYtYTa.jpg

  16. Re:Never upgrade by bluemonq · · Score: 1

    It was broke (and remains so) decades ago. The automated system never really worked properly.

  17. computers run the track switches by Joe_Dragon · · Score: 2

    computers run the track switches

    1. Re:computers run the track switches by HiThere · · Score: 1

      A bit more than that. There was an incident a decade or so ago when the driver got out to fix a jammed door, and when it was unjammed, the train decided it was time to take off for the next station. It got there, stopped, and opened the doors. And waited for the driver to show up.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
  18. Terry Childs pissed off the city and he worked for by Joe_Dragon · · Score: 1

    Terry Childs pissed off the city and he worked for them.

    Likely in this case some outside vendor / contractor messed up.

  19. Good grief! by Rhurazz12 · · Score: 1

    If the recent strike wasn't bad enough, now a computer glitch. Man, if I was riding the transit to work and back I would be extremely pissed. Wonder how many people lost their jobs because they couldn't make it to work??

  20. Rich hippies don't ride the train by gelfling · · Score: 1

    They pilot their solar powered dirigibles.

  21. Don't need a qualifier if there's only one... by Hognoxious · · Score: 1

    I'm sure that if you asked them the answer would be along the lines of "Huh? What's a production system? We just call it the system."

    I once argued for retention of a QA system, which was basically a 4 week old copy of Prod. Things like being able to replicate actual problems with actual data, test new functionality & patches without impacting the business counted for less than some little tart's fluttering eyelashes. Of course that's what management wanted to hear, because an extra server is just a wasted expense, right?

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."