When Computers Go Wrong
Barence writes "PC Pro's Stewart Mitchell has charted the world's ten most calamitous computer cock-ups. They include the Russians' stealing software that resulted in their gas pipeline exploding, the Mars Orbiter that went missing because the programmers got their imperial and metric measurements mixed up, the Soviet early-warning system that confused the sun for a missile and almost triggered World War III, plus the Windows anti-piracy measure that resulted in millions of legitimate customers being branded software thieves."
TFA should have been named the 'World's ten most calamitous logic cock-ups' instead, because in the end, malformed, ill-tested, or unforeseen logic caused those issues, not the computers themselves.
The list fails without the Therac-25.
Due to the imperial-metric mash-up, the sums were so far askew that when Ground Control initiated boosters to secure the pod in orbit, all they succeeded in doing was firing it closer to the planet, where it burnt up in the atmosphere.
When I see the Imperial-Metric confusion shit, I just want to slap the shit out of someone. All that waste because some engineers are incapable of using Metric, or some vendor just doesn't want to spend the money to modernize their machinery. I know of an aerospace contractor that is using machinery from the 50s - yep, the machines are constantly being recalibrated, and sometimes nobody notices when they drift - ooopsie!
And when I see that we, the US, are one of two countries still on Imperial - the other is some Third World non-industrial country - I want to barf.
And then, when I have to buy two sets of tools to work on a car, I wish for the entire US auto industry to go bankrupt and be replaced with some modern companies.
I love Metric. It makes measurements and calculations much easier - quick! What is the mass of 329 mL of water? You'd need a calculator to do something similar in Imperial.
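For what it's worth, this kind of mix-up is what a unit-tagged type is meant to catch. Here is a minimal Python sketch, assuming the thruster impulse arrives in pound-force seconds while the consumer expects newton-seconds (the commonly reported form of the Mars orbiter mismatch). `Impulse`, `from_lbf_s`, `from_n_s`, and `water_mass_grams` are made-up names for illustration, not anything from the flight software. The water helper assumes a density of 1 g/mL, which is why the 329 mL question needs no calculator.

```python
from dataclasses import dataclass

# 1 lbf*s in N*s: 0.45359237 kg * 9.80665 m/s^2
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Impulse stored in one canonical unit (newton-seconds)."""
    newton_seconds: float

    @classmethod
    def from_lbf_s(cls, value: float) -> "Impulse":
        # Convert at the boundary, so nothing downstream sees raw lbf*s.
        return cls(value * LBF_S_TO_N_S)

    @classmethod
    def from_n_s(cls, value: float) -> "Impulse":
        return cls(value)


def water_mass_grams(millilitres: float) -> float:
    # Assumption: density of water taken as exactly 1 g/mL.
    return millilitres * 1.0


if __name__ == "__main__":
    reported = 10.0  # hypothetical figure, produced in lbf*s
    right = Impulse.from_lbf_s(reported)
    wrong = Impulse.from_n_s(reported)  # same number, wrong unit assumed
    print(right.newton_seconds / wrong.newton_seconds)  # ratio of about 4.45
    print(water_mass_grams(329))
```

The point of the design is that a bare float never crosses a module boundary; the caller has to say which unit it is handing over.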
It isn't smart to assign a 64-bit floating-point value to a 16-bit integer - unless you want to crash your first flight of the heavy Ariane 5 rocket... (http://en.wikipedia.org/wiki/Ariane_5#Notable_launches)
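To make the hazard concrete, here is a small Python sketch of a narrowing conversion done two ways. It is not the Ariane code, just an illustration: `to_int16_wrapping` mimics an unchecked conversion that keeps only the low 16 bits, and `to_int16_checked` refuses values that do not fit. Both function names are invented for this example.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_wrapping(x: float) -> int:
    """Truncate toward zero and keep only the low 16 bits (two's complement)."""
    n = int(x) & 0xFFFF
    return n - 0x10000 if n >= 0x8000 else n


def to_int16_checked(x: float) -> int:
    """Truncate toward zero, but raise if the result does not fit in 16 bits."""
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in a signed 16-bit integer")
    return n


if __name__ == "__main__":
    print(to_int16_wrapping(40000.0))  # silently becomes a negative number
    try:
        to_int16_checked(40000.0)
    except OverflowError as exc:
        print("rejected:", exc)
```

Either behaviour is a problem if nobody planned for it: the wrapping version hands back garbage quietly, and the checked version raises an error that something has to handle.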
http://www.pcpro.co.uk/features/363580/when-computers-go-wrong/print
As a fellow programmer I worked with years ago was fond of saying, "Computers don't make mistakes. They do, however, execute yours VERY carefully."
The "Switchboard meltdown" problem sounds like the incident which led to the creation of the EFF.
Basically, someone forgot to include a ";" in a C program, which led to the problems at AT&T. Originally, they thought it was due to "hackers", and called in the Secret Service.
The Secret Service in turn busted a gaming outfit called "Steve Jackson Games", which was completely innocent, of course, but that has never mattered to the Secret Service when they need to look like they are actually useful. The SS confiscated the computers, all illegally.
The ACLU refused to get involved, so John Gilmore (formerly of Sun, and who worked with Richard Stallman to get out an open Operating System around that time) created the EFF to fight the unconstitutional raid on Steve Jackson Games. The EFF trounced the Secret Service in court, and was thus born. I believe if you google for "Steve Jackson Games", you can still find the original story around.
So, in a way, you can say that the EFF was created due to the single misplacement of a semicolon in a C program. Would that all of our bugs have such results. :)
(See title.)
Any of us who have been in a sysprog or sysadmin role for a significant amount of time (by which I mean double-digit years) will often have at least one anecdote of some monumental cockup we've perpetrated.
My worst case in point is where I managed (IIRC after a long liquid lunch) to delete the :per directory (more or less equivalent to /dev on a *nix box) on a Data General mainframe machine running AOS/VS. While hundreds of users' processes disappeared off the system (which took about 90 minutes), I found it expedient to simply make my confession to the boss.
Fortunately, in this case, the escapade was more or less written up as "Shit Happens", which I thought was generous...
The Soviet pipeline explosion seems to be an urban legend, traced to a single source: At the Abyss: An Insider's History of the Cold War, by Thomas C. Reed.
There is no mention of this explosion anywhere else, either in Russian or Western sources. If you can read Russian, some debunking is here:
link
One of the facts mentioned there is that there was no SCADA on Soviet pipelines until the late 1980s. All control was still pneumatic in 1982, with no software involved.
Two comments. First, I thought the deal with the big blackout was that the network (TCP/IP) was flooded by a Windows virus infection, and if you know TCP/IP, it's not very good with lots of traffic. There was so much traffic that the computer (a UNIX box) sending status messages to the control room display system could not get messages out of its buffers. TCP/IP does this thing where the message isn't put on the network if there's going to be a collision, and it waits a bit before trying again. With the network flooded by Windows-based computers trying to infect each other, the warning messages were stuck in the UNIX box, and eventually the buffers filled up as more and more warning messages queued up. They seem to be blaming the UNIX box software, because it ended up crashing when it failed to catch the situation where the buffers overflowed. IMO, that was caused by Windows and its ability to be a great petri dish for viruses, and by the idiots who keep putting Windows systems on critical networks.
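As an aside on the "buffers filled up and the software crashed" part: a queue with an explicit overflow policy avoids that failure mode. Here is a minimal Python sketch, assuming a drop-oldest policy is acceptable for status messages (that is a design assumption, not something known about the real alarm system). `AlarmBuffer` and its methods are hypothetical names.

```python
from collections import deque


class AlarmBuffer:
    """Fixed-capacity message queue that evicts the oldest entry when full."""

    def __init__(self, capacity: int) -> None:
        self._items = deque(maxlen=capacity)
        self.dropped = 0  # how many messages were evicted, so loss is visible

    def push(self, message: str) -> None:
        if len(self._items) == self._items.maxlen:
            # deque(maxlen=...) discards from the left on append; count it.
            self.dropped += 1
        self._items.append(message)

    def drain(self) -> list:
        """Return all queued messages, oldest first, and empty the buffer."""
        out = list(self._items)
        self._items.clear()
        return out


if __name__ == "__main__":
    buf = AlarmBuffer(capacity=3)
    for i in range(5):
        buf.push(f"alarm {i}")
    print(buf.dropped, buf.drain())
```

Dropping old messages is not free either, but a counter that operators can see beats a process that falls over once the queue is full.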
The second comment I have on this is about the list missing the LAX communications system software crash, which caused multiple near misses on the tarmac and in the air when air traffic controllers could not communicate with pilots. The cause of the software crash was that a UNIX system had been replaced with a Windows-based system which had a known flaw. The flaw was that the OS could not run for more than 39 days no matter what was running on it. The system and software were still approved and put in place with a maintenance instruction to reboot the computer every 30 days. In comes a new employee who sees things are working fine, so he/she doesn't reboot the computer, and 9 days later the system crashes. The backup does the same, both are unable to recover, and it takes hours to get the system back up and running again. That should have been in the list IMO.
There was also the CSX Railway situation, when lots of its signals went offline because they were run by Windows and the Windows computers got a virus.
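One well-known mechanism for an uptime limit like this is a fixed-width tick counter that wraps. The sketch below is only the arithmetic, assuming a counter that ticks once per millisecond. Whether that is what sat behind the figure quoted above is an assumption; note that a 32-bit millisecond counter wraps after roughly 49.7 days, which does not match the 39 days given. `days_until_wrap` and `tick_after` are illustrative names.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def days_until_wrap(bits: int = 32) -> float:
    """Days before an unsigned millisecond counter of this width rolls over."""
    return (2 ** bits) / MS_PER_DAY


def tick_after(ms_elapsed: int, bits: int = 32) -> int:
    """Value the counter actually holds after ms_elapsed milliseconds."""
    return ms_elapsed % (2 ** bits)


if __name__ == "__main__":
    print(days_until_wrap())               # a bit under 50 days for 32 bits
    print(tick_after(30 * MS_PER_DAY))     # a 30-day reboot stays below the wrap
    print(tick_after(2 ** 32 + 5))         # past the wrap, the counter restarts low
```

A scheduled reboot keeps the counter from ever reaching the wrap, which is why a skipped reboot would be enough to expose the flaw.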
It would be nice to see a more complete and more accurate list of these kinds of computer software failures.
LoB
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus