Failed Software Upgrade Halts Transit Service
linuxwrangler writes "San Francisco Bay Area commuters awoke this morning to the news that BART, the major regional transit system which carries hundreds of thousands of daily riders, was entirely shut down due to a computer failure. Commuters stood stranded at stations and traffic backed up as residents took to the roads. The system has returned to service and BART says the outage resulted from a botched software upgrade."
They should have brought their skateboards to work.
Why was a weekday selected for this software update?
BART has real drivers and I would assume a legacy intercom system. Why do they need computers at all? It's just another thing to go wrong and break down.
Have you tried turning it off and on again?
If we colonize Mars, it won't be the World Wide Web anymore. UWW?
This is why I don't upgrade shit. If it isn't broke, don't fix it.
BART is run by the dumbest people on Earth. First off, it takes a special kind of stupid to create a rail system that goes almost, but not quite all the way to the airport. 30 years later they extended to one of them but you still have to transfer to a bus for the last mile on another. Then you have to wonder what kind of idiot puts light carpet and cloth seating on public transport. 35 years later they start testing non-porous flooring/seating and maybe in another five years all of the trains will be switched over. Then, some bean counter got a bonus when they closed all the station bathrooms when 9/11 happened, ostensibly for security. Now a fifth of the escalators are out of service at any one time because they are clogged with human shit.
I also heard there was some sort of labor dispute.
This is really surprising to me.
For all the "can not fail" systems I've worked on, there has been an identical set of hardware, along with other hardware to simulate load, on which you could try upgrades before you put them on a live system and cost the local economy tens of millions of dollars by screwing up.
I guess you can't always save by eliminating humans and their expensive unions. Although, I'm sure the software was intended to pick up the financial slack for all of those expensive peeps. Don't worry, Wall Street is highly motivated to eliminate the humans with the software, eventually...
First I'm not going to plug any VM vendor.... but with certain VM backends, snapshots are possible, and it's a godsend when crap like this happens.
READY.
PRINT ""+-0
"assistant general manager for operations, said the system's backup computer had gone down at the same time its central supervisory computer crashed."
Redundancy is not just running two boxes... How many times do we need to point out that there's a reason true redundancy is hard and expensive?
TFA (sorry for reading it) states that the problem showed up 12 hours after the upgrade. That's why it's time-consuming to test hi-rel stuff, whatever bean counters say...
I put my hand upon your hip When I dip you dip we dip
From a transit authority born from a constellation of institutions based on a bunch of "educated" people all telling each other they are right. The fact is most of today's institutions are completely out of touch with reality.
a) the medical profession
b) the legal profession
c) academia
d) state and federal law enforcement
e) etc...
Everything from dietary recommendations which have led to an increase in diabetes and cancer, recommending yet more carbs and less fats, etc. Most doctors couldn't find their way through a human metabolic map. Anyway let them build a robotic army, we will see who ends up in control of them.
my .0..1 BTC ;)
See what happens when you give these guys root access? ;-)
It's the lack of a decent rollback plan and making sure they had enough time and resources to rollback.
I have seen quite efficient manual train network operation, but the workers behind the success explained it was only possible because they had a few old timers who were still able to organize train flows using paper and pencil. Younger workers had always worked with computers, and when the old timers have all retired, the know-how will be lost.
it's more the contractors refusing to train and keep their hires. Nobody wants to keep someone around. They cost more every year. But for programmers that means nobody knows how anything works. It keeps profits high for the guy running the sub-contractor, but it means crummy software...
Terry Childs was locked up on the off chance that something far less disruptive than this would happen. At least that was the excuse.
Using sharepoint?
computers run the track switches
Only one letter difference between outage and outrage.
Terry Childs pissed off the city and he worked for them.
Likely in this case some outside vendor / contractor messed up.
It is like a magical mystery tour of 1960's technology, including communications and information, and mind-think on display for all to enjoy.
A Dodgy software update. Ah! Bart IT runs on COBAL (beloved of US Federal DoD contractors), the IT of the Future! See. It is all very simple. Bart is Light-Years ahead of the mere humans who try to "run" it by leaps and bounds. Yet the Bart IT team encourages Sabots age, i.e. the tossing of Sabots into the gears of the "machine." We will have to wait for the intrepid Bart IT engineers to evolve to a sufficient brain capacity and comprehension level to understand the IT of the Future, COBAL.
COBAL, a gift from the GODs themselves no doubt.
QED
Well, you really do have to wonder when they say they worked through the whole night only to discover that this new, mysterious problem was caused by the update they'd made the night before.
If the recent strike wasn't bad enough, now a computer glitch. Man, if I was riding the transit to work and back I would be extremely pissed. Wonder how many people lost their jobs because they couldn't make it to work??
They pilot their solar powered dirigibles.
I'm sure that if you asked them the answer would be along the lines of "Huh? What's a production system? We just call it the system."
I once argued for retention of a QA system, which was basically a 4 week old copy of Prod. Things like being able to replicate actual problems with actual data, test new functionality & patches without impacting the business counted for less than some little tart's fluttering eyelashes. Of course that's what management wanted to hear, because an extra server is just a wasted expense, right?
Confucius say, "Find worm in apple - bad. Find half a worm - worse."