Examples of Programming Gone Wrong?

Re:Challenger by Anonymous Coward · 2002-10-27 06:46 · Score: 1, Interesting

Erik, don't troll. The Challenger accident was a mechanical failure, had nothing to do with software, but if you want a software project gone wrong example, I'll give you one: Gah NU/Hurd

The Ultimate Example by Anonymous Coward · 2002-10-27 06:47 · Score: 1, Interesting

12 years of development and it still sucks!

RTM Worm by rwash · 2002-10-27 06:48 · Score: 5, Interesting

In the '80s, Robert T. Morris accidentally released a worm that exploited problems in sendmail and other common internet daemons and took down most of what was the internet at that time. This was especially bad since about half of it was military.

Re:RTM Worm by ProfessorPuke · 2002-10-27 13:17 · Score: 5, Interesting

"accidentally released" is wrong, or perhaps whitewashing by RTM's friends. The release was fully intentional. What was accidental about it is that he hadn't realized that in addition to infecting virtually every UNIX system it found, it would also DoS them. The worm constantly tried to infect every available system, meaning that a system which was vulnerable would receive many, MANY copies of the worm, exhausting its processing power.

RTM had been aware of the possibility and implemented a fix, but he did it wrong. He'd created code so that a new worm, when first arriving at a host, could check if a previous instance of the worm had been there. If so, it could abort its infection process.

However, he was afraid that this would make vaccinating machines too easy (by sysops faking the "already infected" flag), so he created a 12.5% random chance that an incoming worm would ignore the fact that a machine was already compromised and infect it again. That probability had NO rational basis behind it (in fact the whole idea of using randomizing like this is flawed), and served to postpone the shutdown of the internet by at most an hour.
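
A rough sketch of the check described above. This is not the worm's actual code: `should_infect` and `copies_after` are hypothetical helpers, and the only number taken from the post is the 12.5% chance. It suggests why the random override barely helps: a host that is probed n times still ends up with roughly n/8 copies.

```python
import random

REINFECT_CHANCE = 0.125  # the 12.5% figure quoted in the post


def should_infect(already_infected, rng):
    """Decide whether an arriving copy goes ahead (hypothetical helper)."""
    if not already_infected:
        return True
    # Ignore the "already infected" flag some of the time so that a
    # faked flag cannot vaccinate the machine.
    return rng.random() < REINFECT_CHANCE


def copies_after(probes, rng):
    """Count the copies running on one host after `probes` arrival attempts."""
    copies = 0
    for _ in range(probes):
        if should_infect(copies > 0, rng):
            copies += 1
    return copies


rng = random.Random(0)  # fixed seed so the run is repeatable
print(copies_after(8000, rng))  # on the order of 1000 copies, not 1
```

With a 100% chance of backing off, `copies_after` would return 1 no matter how many probes arrive.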

This was an especially bad blunder because it set a frightening example of what hackers could do. If RTM had used a 100% chance of non-reinfection, (and played his cards right from then on), he'd have been hailed as an innovative security analyst who'd prevented security-compromising violations of the Pentagon's systems. Instead he was tossed in prison for years.

You forgot the part that went wrong. by mindstrm · 2002-10-27 06:53 · Score: 3, Interesting

The only reason it took things down was because a timing loop was messed up, and it was spreading something like 1000 times too fast. It was supposed to spread everywhere, yes, but by crawling slowly. It was not intended to eat up all connections on all machines.

Had that been the case, it would have been much more widespread and caused much less damage.

Re:Challenger -- AT&T had the biggest gaffe. by telecaster · 2002-10-27 06:54 · Score: 2, Interesting

I'll agree that some programming errors *could* be fatal, but the one that comes to mind is the "2 line change" from AT&T that essentially knocked out phone service throughout the east and mid-west in 1990. It was the topic of many quality assurance seminars for the better part of the early '90s. I only remember it because it affected my company -- we lost phone service for 2 days. It was also one of those traditional "last minute changes" that someone clearly f*cked on...

http://www.soft.com/AppNotes/attcrash.html

Why, the world's favorite mail client, by Scarblac · 2002-10-27 06:55 · Score: 5, Interesting

Outlook!

Built with the idea that code in attachments should be executable, often automatically. Also full of exploitable bugs, to get even more stuff running automatically, regardless of who sent it. Responsible for a huge amount of damage by all sorts of worms, trojans, etc.

Someone, somewhere got the idea that email would look better with html; and if it got html, it should get scripting too, that's consistent with web pages! And it's cool if attachments (like pictures) can be opened in their appropriate program automatically - let's run any executables then, that's consistent!

This is oversimplified, but I really feel that this is a case of stupid consistency that caused multi-billion dollar damage. Email should never be executed by the mail client.

--
I believe posters are recognized by their sig. So I made one.

That was an easy setup by Ghoser777 · 2002-10-27 06:57 · Score: 5, Interesting

A clear example is: [insert random microsoft product].

Oh wait... -1 Redundant

Here's a good site though with tons of examples.

My favorite would be the infamous time when NASA did half its calculation in metric and the rest in SI. ;)

F-bacher

--
James Tiberius Kirk: "Spock, the women on your planet are logical. No other planet in the galaxy can make that claim."

A Great Story by puppetman · 2002-10-27 06:58 · Score: 5, Interesting

that was told to my class about the altitude of fighter jets.

A company was hired to rewrite the code that was used on one of the models of fighter jets, and they offered to fix an unusual bug.

The details are: apparently they had two altimeters - one was barometric, and the other I don't remember.

Anyway, the programmer was coding along, and was writing code to determine what would happen if the altimeters stopped functioning.

He came to the case where they both weren't working, and couldn't figure out what to do, so called one of the pilots that was acting as an information source for the developers, and asked him what altitude they normally flew at, and he answered, "12,000 feet" or something similar.

So the programmer wrote,

if altimeter1 not working
{
    if altimeter2 not working
    {
        set height = 12000;
    }
}

Stupid, but this code could not be changed. The pilots had the following rule deeply ingrained: if the altitude stays at 12,000 for more than a few seconds, pull up, as your altimeters aren't working.

Train collision by Shaddup · 2002-10-27 07:04 · Score: 5, Interesting

A company I once worked for (as an intern) was in the business of what's called "train control" software. Briefly, it's the software that dispatchers use to monitor the status of the switches, the position of all the trains being tracked by the system, etc. One of the features of the system is to provide early-warning of potential collisions. Well, the system is quite reliable (having been in service, in one form or another, since the 70's). However, there have been some accidents.

One such accident, in Mexico, was caused by an unexpected combination of several simultaneous failures. One day, for some reason, one of the servers needed to be reset. At the same time, two freight trains were stopped at a switch, in the process of what's called a "pass," where one train turns off onto a side track to let the other train pass by on the main track. Long story short, the status bits of the switch got lost during the server reset (there is a provision for restoring track states when the backup servers take over, but it didn't work for some reason). After asking if the track was clear, the driver for train1 received a green light from the dispatch office. The dispatcher, not knowing that train2 hadn't cleared the switch yet, figured everything was ok. The trains collided at very low speed, and not head-on, but nonetheless the collision cost the rail line several million in equipment and downtime. No one was hurt.

The lesson: When writing bullet-proof software, check every possible condition! More extensive field testing would have caught the failover bug.
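
One way to read "check every possible condition" is to make a lost state a condition of its own. The sketch below is only an illustration, not the real train control system: `SwitchState`, `restore_after_reset` and `may_proceed` are invented names. The idea is that anything not restored after a reset comes back as UNKNOWN and never counts as clear.

```python
from enum import Enum


class SwitchState(Enum):
    CLEAR = "clear"
    OCCUPIED = "occupied"
    UNKNOWN = "unknown"


def restore_after_reset(saved_states, switch_ids):
    """Rebuild the state table; anything not restored is UNKNOWN, never CLEAR."""
    return {sid: saved_states.get(sid, SwitchState.UNKNOWN) for sid in switch_ids}


def may_proceed(states, switch_id):
    """Only a positively known CLEAR state allows a green light."""
    return states.get(switch_id, SwitchState.UNKNOWN) is SwitchState.CLEAR


# The backup only managed to restore switch "A"; "B" was lost in the reset.
states = restore_after_reset({"A": SwitchState.CLEAR}, ["A", "B"])
print(may_proceed(states, "A"))  # True
print(may_proceed(states, "B"))  # False: unknown is treated as not clear
```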

Non-life threatening, but interesting bug... by Anonymous Coward · 2002-10-27 07:05 · Score: 5, Interesting

I'm an AC for a reason...

Let's just say that two years ago a very large international shipping company suffered two days of worldwide failure in the package routings printed on labels. The bug was caused by an incorrectly placed paren in an index offset calculation, leading to truncation of an intermediate result (to a 16-bit unsigned int, when it should have been 32). The bug sat dormant for five years because the result matrix it was indexing into was smaller than 64 KB. As soon as it grew over that size - boom! What a way to wake up at 2am when the Asian-Pacific region starts calling...
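
A small sketch of how that kind of truncation stays hidden. The real code is not shown in the post, so `ROW_WIDTH` and both functions are made up; the `& 0xFFFF` mask stands in for the intermediate result being squeezed into a 16-bit unsigned int.

```python
ROW_WIDTH = 300  # hypothetical size of one row in the routing table


def offset_buggy(row, col):
    # The intermediate product is truncated to 16 bits before the add,
    # which is what a misplaced paren or cast can do in C.
    return ((row * ROW_WIDTH) & 0xFFFF) + col


def offset_fixed(row, col):
    # Keep the whole calculation in 32 bits.
    return (row * ROW_WIDTH + col) & 0xFFFFFFFF


# Small table: both versions agree, so the bug stays dormant.
assert offset_buggy(100, 5) == offset_fixed(100, 5) == 30005

# Once an offset passes 65535 the buggy version wraps around.
print(offset_fixed(250, 5))  # 75005
print(offset_buggy(250, 5))  # 9469
```

Every test against a table under 64 KB passes with either version, which fits the five quiet years described above.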

I didn't make it, but I was definitely involved with the fix. After that we did some very thorough auditing on all of the routing code - and fortunately didn't find any other surprises lurking.

Airbus by That_Dan_Guy · 2002-10-27 07:07 · Score: 5, Interesting

This isn't really a programming error, but a user training error.

In the Airbus, if the pilot tries to correct (use the flight controls) while the computer is engaged, the computer will correct the pilot's correction. Unlike in a car with cruise control, where if you hit the brakes it just cuts the cruise control. Many China Airlines planes have crashed due to poor pilot training in this regard. They weren't trained well enough to shut off the computer control before taking control of the plane.

I'm also sure someone can be a little more detailed than this, but it is, IMO, at least a design error that has caused hundreds of deaths.

As a side note, my Software Engineering professor refused to ever fly on a fly by wire plane, and was opposed to SDI simply because he didn't believe that either had been or ever would be debugged properly. (if there is one error in every 10,000 lines of code, and it has 3 or 4 million lines of code, how many errors is that? His answer: too many to trust)
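
The professor's arithmetic, spelled out. The defect rate is his assumption from the post, not a measured figure, and `expected_errors` is just a helper name for this sketch.

```python
LINES_PER_ERROR = 10_000  # the professor's assumed rate: one error per 10,000 lines


def expected_errors(total_lines):
    """Estimate the number of latent errors at the assumed rate."""
    return total_lines // LINES_PER_ERROR


for total_lines in (3_000_000, 4_000_000):
    print(f"{total_lines:,} lines -> about {expected_errors(total_lines)} errors")
```

So the answer to his question is somewhere around 300 to 400 errors.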

Re:Airbus by jamesl · 2002-10-27 10:51 · Score: 2, Interesting

Most of the problems have been related to operator (pilot) error related to inadequate training. Investigators are unhappy at the number of times the Cockpit Voice Recorder transcript includes the phrase "What's it doing now?" or "Why did it do that?".

While these accidents and incidents are chalked up to "pilot error", the industry is beginning to understand that in many cases the user interface or design of the system is at fault.

Re:Airbus by it_atheist · 2002-10-27 15:46 · Score: 2, Interesting

Another funny Airbus story I've heard was as follows: (Apologies if I'm propagating yet another urban myth). In the '90s in India or Sri Lanka (can't remember which), an Airbus was being pushed across the tarmac by the usual tow-truck. It had its engines engaged and warmed up as usual. Air traffic control instructed the pilot to stop as another plane would soon be passing behind. The pilot attempted to call the tow-truck driver, but there was no answer. So the pilot applied the brakes. Both the plane and the tow-truck stopped.

But now it becomes interesting. The tow-truck driver revs his engine and pushes harder. The plane begins to move backwards. The pilot (rather annoyed by now I imagine) applies the brakes as hard as possible. The plane stops. The tow-truck driver repeats his trick, but this time there is no way that the plane will budge. Instead the weight was taken off the front wheel. The Airbus software noted that there was no pressure on the front wheel - so clearly the plane was airborne, but after checking the flightspeed (zero) the computer decided that the situation was unacceptable, and quickly applied full power to all the engines. The plane broke free and thumped into the hangar before the pilot could shut everything down.

No-one was hurt and the damage was hardly spectacular - but this is what can happen when you send in a machine to do a human's job. My same source summed up the design philosophies of Boeing and Airbus as follows: Boeing: Provide every possible instrument and computer to assist the pilot. Airbus: Provide every possible instrument and computer to replace the pilot.

Bye bye credit purchases by cat_jesus · 2002-10-27 07:23 · Score: 3, Interesting

I worked for a programmer back in the 80's who made a mistake that caused all credit card purchases to disappear from the electronic journal. This meant that their purchases were not recorded on their credit card statements. Fortunately for the company the bug did not affect the recording of the transactions on the paper journal. This bug wasn't discovered for a few days and it took quite some time to rekey all the credit transactions.

Unfortunately this was not her first or last mistake of this magnitude. Retailers often see IT as an expense rather than an asset and are as cheap as possible. This has a tendency to cause shoddy programming since they hire as few programmers as possible and overwork them and often software is put into production without being thoroughly tested. At least this was the case when I worked in retail some ten years ago--I don't think I'll do that again.

But I am finding that insurance companies have the same philosophy.

Re:Bye bye credit purchases by Anonymous Coward · 2002-10-27 10:17 · Score: 1, Interesting

Hmmm, insurance companies...

One of the major insurance companies - I mean, worldwide - had a car insurance division that had an old master database (for claims and for policies) with a proprietary database interface - with networked nodes as opposed to being relational. I believe, circa mid 2001, it was coded in COBOL, yet it was code originally written in BASIC about 15 years ago. I pleaded for them to switch to something a little more modern - something with ODBC access and a SQL interface, or at least in that generation (Oracle, while not necessarily being the recommendation, was a prominent example used). They seemed to hear none of it, as they had no near or distant plans for a system overhaul.

I mean, it's sad. These guys nationally are in the top 10 and were in the middle of conducting aggressive advertising and pricing campaigns to grab more customers - yet they were on the verge of losing access to their "pending claims" reports because... (drumroll)... it was implemented as pivot tables in Excel. I don't mean that the reports were outputted in Excel format... I mean, it was a template with the entire pending claims table as its own worksheet, with pivot tables generating the reports in other worksheets. Keep in mind that car insurance claims can stay open for months, or years, until all injuries/damages are settled. Basically, when I left, the number of pending claims was somewhere around 61,000, and climbing fast... :(

(oh, and they had some special COBOL-like script to output this pending claims data from the database into a comma delimited file. And only one lady at the company knew how to do this)

But hey... they weren't any worse off than any of the other insurance companies apparently, so it's not like they needed to change to stay competitive.

This is why I'd like not to own a car.

Coincidentally - I work in a different industry currently, and it turns out that now I'm the one who is the only one in the company who has to transfer delimited files from one crusty data system to another... and they have called me more than once while I was waiting in an airport asking how to do it. And, to make things worse, I used to make more money in college per hour delivering chicken sandwiches. As a BS in Computer Science, I will never make fun of art history majors again. ("Hahahah, how are you ever going to make money with THAT major?")

Re:Challenger by snStarter · 2002-10-27 07:34 · Score: 2, Interesting

Not ONE hardware problem...ever?

Clearly you are forgetting the Apollo I fire which resulted from a spark in a pure O2 atmosphere. The spark was caused by a frayed wire. That's a hardware problem for sure.

And these are only the problems that get past QA! by testharness · 2002-10-27 07:37 · Score: 2, Interesting

Despite what many programmers think, they do make many mistakes. Having been in QA for more than 7 years, blimey, the stories I could tell.

For example. Once there was a requirement for a Windows program to do nothing. If it started up, it would just shut down. Simple? I would have thought so - even if it wasn't, it was simple for the developer to unit test. It took 7 attempts. Ranging from opening a window and sitting there - through several GPFs - and at least one reboot.

Then there was one time (of many) where despite assurances from development that the product had been properly unit tested, it would core dump on start up.

My point is that any CS student should understand the whole development process. It is more than just programming. Whilst neither of the above were life threatening, it illustrates a point. No matter how many examples of catastrophe and failure you find, there would be a lot more without testing and QA.

Of course, you could take the point that all those public failures are a result of lax QA.

Ya got to love this one.... by Anonymous Coward · 2002-10-27 07:52 · Score: 1, Interesting

F16 autopilot flipped plane upside down whenever it crossed the equator.

They should have known from the water going the wrong way when flushing!

Re:Challenger by Anonymous Coward · 2002-10-27 07:52 · Score: 1, Interesting

NASA hasn't ever had a hardware problem. Or a software problem. Ever.

What about Apollo 13?

Re:Y2K? by Doug+Neal · 2002-10-27 08:07 · Score: 2, Interesting

Well the reason why Y2K wasn't the huge disaster the media were predicting was because in the years leading up to it the world's programmers were running around like blue-arsed flies fixing everything :P

Re:Apollo 1 / hardware fault by ckedge · 2002-10-27 08:26 · Score: 3, Interesting

I've read in-depth technical analyses of the Apollo fire, and I have an MSc in Physics.

Before that, *no-one* knew that a spark in one place could cause a fire TWO FEET AWAY.

(You get little hot bits of burnt dust floating around in a pure oxygen atmosphere, and they keep themselves hot enough to set something else afire quite a ways away. Of course things are *easier* to set fire to in that atmosphere as well.)

See your nearest mirror by Anonymous Coward · 2002-10-27 08:27 · Score: 1, Interesting

Well, sort of no kidding as after quite some years of being in the job I dare say that 9 out of 10 people are not interested in getting it right in the first place.

Usually the story goes something like, well, take your pick ...

I am a ./ reader so I am a geek and so I do know.
It compiles, it works, so it must be correct.
But ...
... and who are you by the way ?

... and so on and so on. In short, just let someone else skim through whatever code you have written so far and let her or him find what you have done wrong. It is the best source of errors you can get because they are all yours, caused by your lack of knowledge, your unwillingness to accept that computers only know right or wrong and nothing in between, your style, your whatever.

Whether you will be willing to accept what you are going to see is a different question altogether, and of course having a good laugh at others is more fun, yet there is a difference between being just another coder out there and being a developer.

IMHO one ought to aim for the latter and once you have become your harshest critic you are on the right path.

Cambridge University and CAPSA by 26199 · 2002-10-27 08:27 · Score: 2, Interesting

It was a new financial system, and it was a real mess - something like £9m initial cost and £20m due to its flaws. According to Anthony Finkelstein, who's written a very detailed report on the fiasco:

Significantly more costly than had been anticipated (worse than it appears because of hidden costs)
Substantial disruption to working of the University
Placed staff under undue pressure
Placed the finance of the University at risk and may have prevented the University and its staff from fulfilling their legal responsibilities

You can read his full report here (pdf) or here (google html version). There are also news reports on the system here and here.

Basically, it was bad management throughout... a classic case of a big software project gone wrong.

Puget Sound Ferry Boats by Anonymous Coward · 2002-10-27 08:29 · Score: 1, Interesting

About 25 years ago, Washington State Ferries had a new fleet of boats with computer controlled engines. The code included "safety" features to protect the engines and transmissions from abuse.
So, when a ferry was about to crash into a dock, and the captain called for full reverse power, the software would shut the engine down to protect it... and the ferry would crash into the dock.

A more useful angle... by EasyAs314159 · 2002-10-27 08:30 · Score: 2, Interesting

Horror stories (lost rockets, etc) are certainly attention-getters, but a more useful question might be what kinds of errors got made, regardless of how severe the outcome.

For example, I once helped a newbie employee with a program that was working fine in a simple test case, but was blowing up when it tried to crunch through a production file.

After digging a little, I noticed that she was using recursion in her "GetNextInterestingRecord" routine! The logic was:

1) Get a record
2) See if it's the kind we want
3) If not, Call self
4) return record to main

I'm not sure why she chose to use recursion (too many classroom lectures on "cool" stuff and too little experience with getting useful stuff done?), but the program needed "interesting" records every so often to keep from overflowing the stack.

Clearly recursion should be confined to those problems where it's really needed, and not used just because you can find a way to state the problem using recursion. And even then, you need to think about how big the stack will get, and what sorts of scenarios could cause it to get too big.
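
For illustration, here is the shape of the routine in Python next to a loop that does the same job. The function names and the test data are invented for this sketch; the original program and its language are not given in the post.

```python
def next_interesting_recursive(records, is_interesting):
    """The recursive shape from the post: one stack frame per skipped record."""
    record = next(records, None)
    if record is None or is_interesting(record):
        return record
    return next_interesting_recursive(records, is_interesting)


def next_interesting(records, is_interesting):
    """Same result with a loop, so stack depth stays constant."""
    for record in records:
        if is_interesting(record):
            return record
    return None  # ran out of input


# A "production file" with a long run of uninteresting records.
data = [0] * 100_000 + [7]
print(next_interesting(iter(data), bool))  # 7

try:
    next_interesting_recursive(iter(data), bool)
except RecursionError:
    print("recursive version overflowed the stack")
```

On a short test file both versions behave the same, which is why the problem only showed up in production.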

Re:Challenger by the+bluebrain · 2002-10-27 08:53 · Score: 2, Interesting

[...] Mars probe that crashed because of mismatched units. And that was just poor communication among the software guys.

So if it's not a bug, it must be a feature :) ? If you had been responsible for that piece of software, would you have sat together with the NASA guys after the analysis, and claimed it wasn't a bug? Errr....

Have an article on the guys who write the stuff. They're damn good, but they say themselves their programs contain errors: "the last three versions of the program [...] had just one error each. The last 11 versions of this software had a total of 17 errors." Apparently never caused a problem, but not bug-free.

Then there was the Canadarm2 issue. Or wasn't that a bug either :) ?

--
yes, we have no bananas

Re:not true by Anonymous Coward · 2002-10-27 09:20 · Score: 1, Interesting

Whether or not NT was responsible for this particular glitch, according to the above article, the engineers involved were not too impressed with NT {emphasis added}:

But according to DiGiorgio [a civilian engineer with the Atlantic Fleet Technical Support Center in Norfolk], who in an interview said he has serviced automated control systems on Navy ships for the past 26 years, the NT operating system is the source of the Yorktown's computer problems.

NT applications aboard the Yorktown provide damage control, run the ship's control center on the bridge, monitor the engines and navigate the ship when under way.

"Using Windows NT, which is known to have some failure modes, on a warship is similar to hoping that luck will be in our favor," DiGiorgio said.

....

Ron Redman, deputy technical director of the Fleet Introduction Division of the Aegis Program Executive Office, said there have been numerous software failures associated with NT aboard the Yorktown.

...

The Yorktown has been towed into port several times because of the systems failures, he said.

"Because of politics, some things are being forced on us that without political pressure we might not do, like Windows NT," Redman said. "If it were up to me I probably would not have used Windows NT in this particular application. If we used Unix, we would have a system that has less of a tendency to go down."

Oh, yes. Personally, I am very glad our military has placed its faith (and the lives of our mariners) in such reliable technology.

OT: Scuds and Patriot missile defenses by GuyMannDude · 2002-10-27 09:21 · Score: 5, Interesting

People keep pointing to the floating point error as the cause for why the Patriot system at that time (the PAC-2) let that Scud go through. But as I've already pointed out in an earlier post, the PAC-2 did a crappy job (far worse than is generally known) intercepting Scuds not because of coding errors but because the problem of hitting an erratically moving missile was so difficult. I think it's important to get the word out as we approach a new war with Iraq and consider a national missile defense shield. This recent article briefly discusses Israel's own attempts at missile defense because they don't trust the PAC-2 (for good reason) and it's questionable whether the US is going to give them some PAC-3 batteries.

Bottom line: that stuff about the floating point error in the PAC-2 system looks neat on paper but it's not at all clear that the faulty calculation was responsible for the loss of life.

GMD

--
watch this

F-16 (or maybe one of the other fighters) by bmwm3nut · 2002-10-27 09:42 · Score: 2, Interesting

my cs teacher told me this one back in college...he said on one of the first runs of the f-16 (or maybe another one of the computer controlled fighters in the air force) they were flying and everything worked just fine. however they took it across the equator and the plane flipped upside down. so the pilot corrected it and everything went back to normal. then he flies across the equator again and it flips.

so they took a close look at the software, and there was a bug in their sin function so that when they went across the equator the angle changed from positive to negative and the sin function didn't have the negative incorporated. so basically when the plane went over the equator it thought it was upside down and corrected itself by flipping itself upside down.

i think it's a funny example of a stupid mistake possibly making a catastrophe. i've never seen this mentioned elsewhere, so i'm not too sure about this. but i do trust the cs prof who told me, before coming to my school he did a bunch of government contract work.

Re:That's kind of silly by pete-classic · 2002-10-27 09:44 · Score: 4, Interesting

From a text I am currently working on:

The compiler requires you to declare variables, but does not require you to initialize them. Does that mean you can get away with leaving them uninitialized? Well, you might program your entire life without coming upon a reason not to. If you don't initialize them, however, you will almost certainly run into a very difficult bug, probably sooner than later. Using an uninitialized variable is perfectly valid syntax, but is always a logic error. The compiler won't complain, but you will get wild, unpredictable, and wrong results. In the worst case, you might get believable, but wrong results. This leads us to what to use as an initializer. Most people use zero. Using an "obviously wrong" value may be more useful. Often a maximal value (such as int students = 65536) is more obviously wrong. [Emphasis added for this post.]

This isn't variable initialization, but the principle applies. Data that you know are junk should look like junk! Trying to "fake it" or make it "look good" is exactly the wrong thing to do.
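
In Python the same idea might look like the sketch below, which revisits the altimeter story from earlier in the thread. `read_altitude` and `display` are made up for illustration, and `None` plays the part of data that looks like junk instead of a believable 12,000 feet.

```python
def read_altitude(baro_ft, second_ft):
    """Return an altitude in feet, or None when neither altimeter works.

    Each argument is a reading in feet, or None if that altimeter has failed.
    """
    for reading in (baro_ft, second_ft):
        if reading is not None:
            return reading
    return None  # no believable default such as 12000


def display(altitude_ft):
    """Format the reading so a failure cannot be mistaken for real data."""
    if altitude_ft is None:
        return "ALT FAIL"
    return f"{altitude_ft} ft"


print(display(read_altitude(11850, None)))  # 11850 ft
print(display(read_altitude(None, None)))   # ALT FAIL
```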

-Peter

Personal Example by MadocGwyn · 2002-10-27 09:48 · Score: 2, Interesting

Was working for a small ISP. Sitting at work developing a script to blank the accounts off our old mail server (outsourced) for when our new mail server is completely online and ready to go. It's done; I remove my debugging code and the limits I had placed (I had limited it to work with only 2 test accounts). Congratulating myself on a job well done, I head to the hall to grab myself a Coke. I come back and my boss is at my comp. Now, the program was written in VC++, so the 'play' button is pretty obvious and he's seen me use it before. The idiot wanted to see what I was working on and ran it, blanking all accounts off of the mail server. Took us 3 days to get the outsourcing company to restore from a backup (one of the reasons we were co-locating our own), and even then all mail received after the backup (the night before) was gone, of course. I just about strangled my boss. On the upside, he never touched my workstation again.

--
Jesus saves, everyone else takes full damage from the fireball.

Re:Y2K? by Anonymous Coward · 2002-10-27 09:49 · Score: 1, Interesting

There's also the tiny truth that issues that are fixable don't sell nearly as well as "Airplanes will FALL FROM THE SKIES! Don't step in an elevator, THEY'LL FALL! Withdraw all of your money or YOU'LL LOSE IT!"

you know?

Mars Pathfinder by Kerg · 2002-10-27 10:48 · Score: 3, Interesting

The little "RC" NASA sent to explore the surface of Mars had a nasty bug in its threading system (priority inversion problem in critical code section) that caused total system resets every 20 minutes or so.

You can read about it from James Gosling's home page (also has info on Ariane 5).

Luckily the engineers were able to upload a patch to Mars. That's remote debugging/patching for you :-)
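
For anyone who hasn't met priority inversion, here is a toy, tick-based simulation of it. This is not the Pathfinder code: the three tasks, their workloads and the `simulate` function are invented for the sketch. It also shows priority inheritance, one standard remedy, where the task holding the mutex temporarily borrows the priority of the task it is blocking.

```python
def simulate(priority_inheritance):
    """Return the tick at which the high-priority task finishes."""
    # "work" is the CPU time each task still needs, in ticks.
    tasks = {
        "low": {"prio": 1, "work": 3, "arrives": 0, "needs_mutex": True},
        "medium": {"prio": 2, "work": 5, "arrives": 1, "needs_mutex": False},
        "high": {"prio": 3, "work": 1, "arrives": 1, "needs_mutex": True},
    }
    mutex_owner = None

    for tick in range(100):
        ready = [n for n, t in tasks.items()
                 if t["arrives"] <= tick and t["work"] > 0]
        # A task that needs the mutex is blocked while another task holds it.
        blocked = [n for n in ready
                   if tasks[n]["needs_mutex"] and mutex_owner not in (None, n)]
        runnable = [n for n in ready if n not in blocked]

        def effective_prio(name):
            prio = tasks[name]["prio"]
            if priority_inheritance and name == mutex_owner:
                # The owner borrows the priority of whoever it is blocking.
                prio = max([prio] + [tasks[b]["prio"] for b in blocked])
            return prio

        current = max(runnable, key=effective_prio)
        if tasks[current]["needs_mutex"]:
            mutex_owner = current
        tasks[current]["work"] -= 1
        if tasks[current]["work"] == 0:
            if mutex_owner == current:
                mutex_owner = None
            if current == "high":
                return tick + 1
    return None


print(simulate(priority_inheritance=False))  # 9: high waits behind medium
print(simulate(priority_inheritance=True))   # 4: low finishes quickly, then high
```

Without inheritance the medium task, which never touches the mutex, ends up delaying the high-priority one. A watchdog that expects the high-priority task to finish on time would see that delay as a hang.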

Re:How is an app the fault of NT? by Anonymous Coward · 2002-10-27 11:00 · Score: 1, Interesting

You can't blame the OS for this.
HUH? Any time one program can bring down other programs, even ones with the best error checking/handling in the world, the OS is not blameless. One application should not cause me to lose my other data/open applications just because some dips*** forgot to check for a divide by zero error.
In a multi-tasking environment I have no business knowing or interfering with data/address-spaces that do not belong to me. It is the responsibility of the OS to take on the mundane task of making sure that the programs "play well with others".
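
A small sketch of both halves of that argument, using nothing from the Yorktown software itself. `run_isolated` and `safe_ratio` are names made up here: the first runs a faulty snippet in its own process, so the OS contains the crash, and the second is the check the application should have made.

```python
import subprocess
import sys


def run_isolated(source):
    """Run a snippet in its own process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        stderr=subprocess.DEVNULL,  # hide the child's traceback
    )
    return result.returncode


def safe_ratio(numerator, denominator):
    """Return the ratio, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


crashed = run_isolated("print(1 / 0)")
print("child exit status:", crashed)  # non-zero: only the child died
print("parent still running, ratio:", safe_ratio(1, 0))
```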

I have to admit that the USS Yorktown should have had a redundant failsafe system (even if it meant that all people on board grabbed an oar and started paddling). According to the laws of the sea if I see a disabled ship and they accept my offer of a tow then I now "own" that ship. So be careful when your windows systems crash and you need to use someone else's restore disk they now "own" your computer.

a good resource by Herotodus · 2002-10-27 11:11 · Score: 2, Interesting

Back issues of "Communications of the ACM" are a gold mine for such blunders of the art. Most issues have a back page column, "Inside Risks", that is or was written by Peter Neumann, though various others have contributed. Usually each covers a theme since the subject material is so broad and seemingly unending.

F-16 AOA and WOW by taaminator · 2002-10-27 11:25 · Score: 4, Interesting

"Flight instruments don't lie"

First, BEFORE YOU LEAVE THE GROUND, pilots are taught that instruments don't lie. Specifically, when the human inner ear is placed in flight, things go wrong (the inner ear canals are static, not dynamic, devices; the fluid has no dampening or rate sensors). When there is no external reference, the inner ear canals adjust to the eye's visual presentation. It's called the 'leans.' Bad joo-joo. Many a perfectly good aircraft has been flown into the ground because the pilot believed his ears and eyes and not his instruments.

Second, IN FLIGHT, angle-of-attack (AOA) is a spectacular indicator of where your airfoil exists within (or outside) the flight envelope for your aircraft. Inside the flight envelope, you can seek best range (mpg) or best endurance (loiter) or best climb.

In most aircraft, the angle-of-attack indicator is a manual instrument (on the skin is a sensor which looks like a big euro-style handle and it runs to an indicator in the cockpit).

Many pilots are correctly taught to 'fly' the angle-of-attack.

Third, ON THE GROUND, when you land, you use the aircraft shape as an airbrake. You hold the aircraft nose off the ground as long as possible to create drag.

Fourth, ON THE GROUND, when you land, you do not want to hold the aircraft nose too far off the ground, or the tail will scrape the runway, your fitness report will reflect it, and you'll be the butt of bad jokes at Snopes for eternity.

The AOA is used to assist in the performance of aerodynamic braking. The aircraft performance manual publishes the tried and true range of AOAs for aerodynamic braking. [It also indicates when too much AOA will ding the aircraft.]

Aerodynamic braking is part art and part science and requires accurate instruments.

Enter the F-16 ... it has an electronic AOA.

F-16 pilots were taught to fly the flight direction indicators to land.

However, many old and new pilots fell back on the old AOA once the wheels touched the ground to do aerodynamic braking.

Suddenly, F-16 tails were scraping along the runway at an alarming (and expensive) rate.

[As an aside, the problem was probably ignored until a senior officer ground off a few inches of aluminum; THEN there was a problem.]

The programmers who wrote the AOA routines were rightly told that the AOA is used in flight. So, when the AOA detected that the aircraft had placed weight on the wheels (weight-on-wheels - WOW), it was programmed to quit working. Unfortunately, it kept the last AOA reading ... no matter what the real AOA was.

Pilot flies, pilot lands, pilot believes instruments, pilot scrapes multi-million dollar aircraft's tail along runway.

The programming solution was simple: when there was WOW, fade the AOA.
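For the programmers in the audience, the difference between the two behaviors can be sketched roughly as follows. This is only an illustration of the logic described above; the type and function names (`aoa_display`, `aoa_update_latching`, `aoa_update_fading`) are made up for this sketch and are not taken from any real avionics code.

```c
#include <stdbool.h>

/* Hypothetical cockpit indicator state; all names here are illustrative. */
typedef struct {
    bool   valid;    /* false means the indicator is blanked ("faded") */
    double degrees;  /* only meaningful while valid is true */
} aoa_display;

/* Behavior as described in the story: once weight-on-wheels (WOW) is
 * sensed, stop updating but keep showing the last in-flight value. */
aoa_display aoa_update_latching(aoa_display previous, double sensor_deg,
                                bool weight_on_wheels)
{
    if (weight_on_wheels) {
        return previous;  /* stale reading still looks like a live one */
    }
    aoa_display live = { true, sensor_deg };
    return live;
}

/* The fix as described: when there is WOW, fade the indicator so the
 * pilot cannot mistake a stale number for the real angle of attack. */
aoa_display aoa_update_fading(double sensor_deg, bool weight_on_wheels)
{
    aoa_display d = { !weight_on_wheels,
                      weight_on_wheels ? 0.0 : sensor_deg };
    return d;
}
```

The point of the second version is that an obviously dead instrument is safer than one that quietly shows an old value.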

This was another case when contracts pit spec wording against spec intent against functional application and understanding of how it's supposed to work ... Fortunately, it was expensive and not lethal.

"Why did they call you 'sparky' and why are you driving school buses in North Topeka?"

Ever program industrial controls? by Anonymous Coward · 2002-10-27 13:21 · Score: 1, Interesting

A bug in a factory PLC program allowed a machine to start when a metallic object (such as a wedding ring) went in front of a sensor.

Later, a program modification allowed an air cylinder to extend while the machine was turned off for maintenance. The guy jumped out of the way in time, but let us know about it. (This was before lockout/tagout.)

Bottom line - a bug in a PC program typically results in data damage. A PLC bug can literally smash someone's head!

Re:Y2K? by DancingSword · 2002-10-27 16:42 · Score: 2, Interesting

That was the opinion of NewScientist magazine, but shortly before the actual date, something happened in Australia that changed their mind ( it was in an editorial, IIRC, not in the online version -- the mag really is worth it ).

What changed their mind is that some smelting operation ( again, IIRC ) destroyed itself automatically, when the computers that poured fuel ( coal? ) into the furnaces kept doing so, while the computers that poured ore into the system stopped doing so, because, according to them, Feb 29th didn't exist.

Autodestruct, though not quite HAL-style ( as an aside, didn't HAL stand for Holographic Algorithmic Logic? -- remember the clear blocks they used as HAL's units in the computer-room, too )

Sudden, Colossally Expensive equipment damage, but no lives lost.

Had that happened in one model of autopilot. . .

And yes, I remember some city administration stating that they'd done a 'dry run' of rollover, and discovered that the basic infrastructure didn't work ( water was one item specifically mentioned, though I don't remember if it was treatment or what ).

Of course, 'the disaster that this code, which we discovered to be incorrect, was about to cause didn't happen' . . isn't front-page news.

I know for a fact that some federal gov't contractors were writing NON-Y2K-compliant code in 1998 ( either being committedly braindead, or hoping that the re-write contracts would pay extravagantly when the social-insurance system broke on-the-day ).

--
Messages to/for me ( in me journal )

About that automatic ship leveling system of yours by ibi · 2002-10-27 16:50 · Score: 2, Interesting

From the Pacific Northwest, home of "innovative" approaches to software reliability, comes:

http://seattletimes.nwsource.com/html/localnews/134563661_ship27m.html

"Officials could not say for certain what caused the ship to heel, but they think the ballast system was probably at fault. A malfunction became evident about 3:30 a.m., when the 653-foot ship started to tilt. The crew was evacuated and no one was hurt. ...

The ship, in operation since June, has an automated ballast system that adjusts water levels in 28 compartments to keep it righted on the high seas."

Kind of frightening - wonder if the crew even knows how to do a manual override. (Also weird that evacuating the upper port ballast chamber would cause it to list to port...)

Y2K by Anonymous Coward · 2002-10-27 18:44 · Score: 1, Interesting

In Cook County, northern Minnesota, a large percentage of households are heated by "off-peak electric stored heating". At midnight, December 21st, 1999 (precisely 10 days before Y2K), the software controlling the radio signal that keeps all the heaters from going online simultaneously crashed. The resulting overload shut down the power in the county for hours. This utility was not believed to be Y2K sensitive. Surprise!

On February 28, 2000 (one day before the infamous 2/29/2000), credit card traffic into VisaNet (through Vital Processing) was failing out with the error code corresponding to "Invalid Date". Since the date 2/29/1900 is invalid (1900 was not a leap year, while 2000 was), good Y2K test procedures usually call for testing that condition. AFAIK, the company never admitted to having a Y2K problem.

The National Reconnaissance Office had some of its most valued spy satellite systems go offline due to Y2K troubles. I think they were down for at least a day or two. (ouch!)
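For reference, the Gregorian leap-year rule that trips up this kind of code is short enough to write out. The following is a minimal sketch in C; the function names `is_leap_year` and `is_valid_date` are my own, and it says nothing about how VisaNet or anyone else actually validated dates.

```c
#include <stdbool.h>

/* Gregorian rule: divisible by 4, except century years,
 * unless the century year is also divisible by 400. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Hypothetical helper: checks that a year/month/day triple names a real
 * calendar date. Month is 1-12, day is 1-based. */
bool is_valid_date(int year, int month, int day)
{
    static const int days_in_month[] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    };

    if (month < 1 || month > 12 || day < 1) {
        return false;
    }
    int limit = days_in_month[month - 1];
    if (month == 2 && is_leap_year(year)) {
        limit = 29;  /* February gains a day in leap years */
    }
    return day <= limit;
}
```

A system that reads a two-digit "00" as 1900, or that only knows the "divisible by 100" exception, will reject 2/29/2000 even though it is a perfectly good date.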

Re:One of the best resources I've found by PghFox · 2002-10-27 23:11 · Score: 2, Interesting

I apologize for the confusion; I'll attempt to make it more clear. These two goals are actually not contradictory. One of the methods by which a chunk of code can be made easy to re-use is by abstracting it out into a separate module or subroutine. In this manner, anything that needs the functionality that that chunk of code provides, at any time in the future, can simply call it. In other words, you don't want to "cut and paste" any given chunk of code into several places, since if you need to make a change to it you'd have to change the same code in several places instead of just one. The idea here is that we want to save time and increase maintainability.

Think about it like this. Let's say you want to read a book (use some chunk of code). You have two choices. You can get one copy of the book and keep it in a central location (abstract the code out to one subroutine or module), or you can get a dozen copies of the book and place them at seemingly convenient locations around your house (cut and paste, i.e. duplicate, the code in many different places). You start at the beginning and read a chapter or two. If you have one book you can simply place a bookmark (modify the code) where you left off. If you have a dozen books you're forced to place twelve bookmarks. Now, what would happen if the author puts out a revised edition of the book? Would you rather replace twelve books or one?

Admittedly, the above example is somewhat contrived, but hopefully it answers the question.
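For the code-minded, the same idea might look something like this in C. Everything here is hypothetical: the function names and the 8.25% rate are invented purely to show one shared routine being called from two places instead of being pasted into both.

```c
/* Hypothetical example: the names and the tax rate are made up. */
#define TAX_BASIS_POINTS 825L  /* 8.25%, in hundredths of a percent */

/* The single "book": every caller shares this one copy of the logic,
 * so a rule change means editing exactly one function. */
long add_tax_cents(long amount_cents)
{
    /* Adding 5000 rounds to the nearest cent for non-negative amounts. */
    return amount_cents
         + (amount_cents * TAX_BASIS_POINTS + 5000L) / 10000L;
}

/* Two callers that would otherwise each carry a pasted copy. */
long invoice_total_cents(long subtotal_cents)
{
    return add_tax_cents(subtotal_cents);
}

long refund_total_cents(long subtotal_cents)
{
    return add_tax_cents(subtotal_cents);
}
```

If the rate or the rounding rule changes, the fix goes into `add_tax_cents` once, and the invoice and refund paths cannot drift apart.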

--
--- Fox

a semicolon worth about $40K by one_who_uses_unix · 2002-10-27 23:39 · Score: 2, Interesting

Almost a decade ago, when I worked for a different credit card company that shall remain nameless, a member of my team (I was the lead) introduced a defect that was responsible for about $40K in misapplied credits. I am not sure whether we ever got the money back.

The program was written in C, and he had changed a do-while loop to a for loop; in editing he had kept the line that contained the original condition (including the trailing semicolon). As many of you C-ers out there are aware, a semicolon directly following a for() statement becomes the loop's entire (empty) body, so the subsequent code block is not executed as part of the loop; it simply runs once after the loop finishes!
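I no longer have the original code, so the following is only a reconstruction of the general shape of the mistake, with made-up function names. It counts how many times the block runs rather than applying any credits.

```c
/* Buggy shape: the stray ';' is the whole loop body, so the braces
 * below are an ordinary block that runs exactly once afterwards. */
int times_block_runs_buggy(int n)
{
    int runs = 0;
    int i;
    for (i = 0; i < n; i++);  /* <- empty statement is the loop body */
    {
        runs++;  /* executed once, no matter what n is */
    }
    return runs;
}

/* Intended shape: the block is the loop body and runs n times. */
int times_block_runs_fixed(int n)
{
    int runs = 0;
    int i;
    for (i = 0; i < n; i++) {
        runs++;
    }
    return runs;
}
```

With n = 5 the buggy version runs the block once and the fixed version runs it five times. Note that with n = 0 the buggy version still runs the block once, which is its own kind of trouble. Many compilers and lint tools can flag an empty loop body like this if you let them.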

A very memorable lesson in the value of lint and thorough regression testing!

This may not qualify as a disaster, but I distinctly remember having to give an account of the defect to the corporate controller with an audience of grand and exalted poobahs. She was a very intolerant and technically ignorant person who actually intimated that this had been done maliciously.

--
KK4SFV
