The Big Technical Mistakes of History
An anonymous reader tips a PC Authority review of some of the biggest technical goofs of all time. "As any computer programmer will tell you, some of the most confusing and complex issues can stem from the simplest of errors. This article looking back at history's big technical mistakes includes some interesting trivia, such as NASA's failure to convert measurements to metric, resulting in the Mars Climate Orbiter being torn apart by the Martian atmosphere. Then there is the infamous Intel Pentium floating point fiasco, which cost the company $450m in direct costs, a battering on the world's stock exchanges, and a huge black mark on its reputation. Also on the list is Iridium, the global satellite phone network that promised to make phones work anywhere on the planet, but required 77 satellites to be launched into space."
Rim shot...!
No sig today...
There was no technical flaw in Iridium. It was stated what it would do. It did it. Someone screwed up the business plan, but there was no technical mistake. They knew it took 77 satellites for what they wanted. And they launched them all and they worked flawlessly. Now, if only they had sales to match the business plan, they'd be billionaires. But again, unrelated to any technical issue.
Learn to love Alaska
Don't forget the Therac-25
Poor software design and development led to radiation overdoses for 6 patients being treated for cancer, with 3 dying as a direct result.
Sadly, mistakes still keep on happening.
The article is right about FDIV. The chance of it happening was infinitesimal and it really wasn't any worse than other bugs in contemporary CPUs of that time. A bug in Excel is a much bigger issue for most folks, and I for one never bothered to have my P60 replaced.
They forgot the cd protection cracked with a black marker...
http://www.zeropaid.com/news/1069/black_marker_cracks_cd_protection/
European Linux user, living in Antwerp
... and you still use it to do rocket science?
When I saw the title, I immediately imagined the Maginot line. Thousands more examples could come to mind.
Could somebody please explain to the author of the article that technology is more than computers/gadgets and older than 10 years? It is an epic history that goes along with mankind.
I had some of those growing up and it wasn't really an engineering failure, it was a mentality failure. IBM didn't build PCs, they built tanks. Their keyboards are famous and just as usable today, 20 years later, as when they were new.
That was equally the case with the rest of their PCs: very high-quality equipment operated under far-from-ideal home/office conditions, running consumer software of consumer quality, not server quality. In short, it made no sense.
The result was that IBM priced themselves way out of the market of cheaper clones. It was cheaper and better to buy a clone, throw it out if it failed and buy another. You just don't do that with big iron or servers, but with desktops, hell yeah.
Like the article said, it can't have been that much of a failure, given that PS/2 ports became the dominant keyboard/mouse connector. If there was ever a silly move by IBM, it was giving away the software market to Microsoft, but the average desktop market was doomed long before the PS/2.
Live today, because you never know what tomorrow brings
Bob. :)
Let's not forget Apple's "Lisa". I know the Apple III was in the list but the Lisa cost more to develop and probably sold fewer units. I know a lot of the Mac UI came from Lisa underpinnings but the "Epic Fail" tag is deserved.
Disclaimer: Apple user for 20 years.
What, no Capacitor Plague? http://en.wikipedia.org/wiki/Capacitor_plague
The technical error here was that there was no test on the real thing. The company that made a part of the telescope had only a separate testbed that was made to specifications. Alas, these specifications were exactly one inch misunderstood, so the result was a part that was incredibly accurately one inch out of position.
Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
Seriously, we have got to stop with the hyperbole before our children don't know the difference between a War on Drugs and a War in Iraq.
When we say "of all time", I think of things like lead plumbing in Rome, or the suspension bridge that got torn apart by a mere breeze.
http://en.wikipedia.org/wiki/Lead_poisoning#History
http://video.google.com/videoplay?docid=-3932185696812733207#
Always going forward, 'cause we can't find reverse.
Speaking of technical flaws and Lisa... You could plop the boot drive into the Dumpster, and it would format it. The tech savvy devs who designed the "drag-to-trash = format" function never imagined that users would be stupid enough to do something like that! Little did they know about how giving someone a mouse transforms them from someone who can use a line based editor to set up printer drivers and networking into the horror that is a modern user.
As with the PS/2 mentioned by someone earlier, the failure will be mitigated heavily by those who will buy it based on the name of the company making it and nothing else.
Where's Microsoft Bob? Novell Groupwise? Lotus Word Pro? Lantastic?
There's that, and there's also the whole "the world is flat" and "disease is caused by imbalances in the four humours of the body" ideas. The article's examples seem pretty trivial in comparison.
... and then they built the supercollider.
The virus is thought to have been developed in 1986 by two brothers in Pakistan named Basit and Amjad Farooq Alvi, who were looking to protect some medical software they had written from disc copying. They had found some suitable code on an internet bulletin board site and adapted it so that if someone used the software then the malware would be installed.
I'm guessing "Iain Thomson" is not a day over 25, not very versed on the history of the Internet, and too busy to look up the meaning of "BBS". Am I right?
This post contains no rudeness or derision of any kind. All arguments are friendly. Terms and exclusions may apply.
I still have one of the Pentium 90 chips with the math flaw. The bidding starts at $1.
Come on Node 3, refute this guy's anti-apple rhetoric!
Mod me down, my New Earth Global Warmingist friends!
When I tried to work it out it came out as $449.9999867' million.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
We still live in a world of CPUs that are either little endian or big endian, which affects binary compatibility and performance (from having to swizzle).
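To make the "swizzle" concrete, here is a minimal C sketch of what code has to do when data crosses an endianness boundary. The function names (swap_u32, read_be32, host_is_little_endian) are made up for illustration and do not come from any particular SDK.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (the "swizzle"). */
uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    /* On little-endian hosts the low-order byte is stored first. */
    return *(const unsigned char *)&probe == 1;
}

/* Read a big-endian 32-bit value from a byte buffer.
 * Works the same on any host because it never reinterprets memory. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Reading fields byte by byte, as read_be32 does, avoids the compatibility problem but costs a few shifts per value, which is the performance hit mentioned above.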
We still live with the primitive C/C++ type system with code like this in just about any SDK:
#ifndef _BOOL // TRUE_AND_FALSE_DEFINED // true and false
typedef unsigned char bool;
#if !defined(true) && !defined(false)
#ifndef TRUE_AND_FALSE_DEFINED
#define TRUE_AND_FALSE_DEFINED
enum {false,true};
#endif
#endif
#endif // _BOOL
www.rexguo.com - Technologist + Designer
Maybe NASA wouldn't have made that mistake, but the sub-contractor could. OTOH, maybe the sub-contractor had a button to pass from Imperial to Metric units for its navigational controls, but maybe NASA didn't RTFM, and that may have caused the mistake.
One lesson though: Always use metric in science stuff. Understood NASA?
The lack of authentication before forwarding/sending mail has to be one of the biggest issues today. If only the original designers of the software had thought ahead and verified that the sender of the message was legit and that the mail came from the domain specified before blindly sending it along.
Intel's 8086 CPU, Intel's first 16-bit processor, was possibly much worse than any of those mentioned because it affected all of us. Intel chose to continue the quirkiness of the 8008 rather than abandon it.
Just before the time of the introduction of the 8086 I knew a chief of technology of a high-tech company who was waiting for the 8086 as though it were a combination of Christmas, his birthday, and the birth of his child. He would start every conversation by telling everyone Intel's release date for the 8086.
The day of its release, he was miserably unhappy. Intel chose to continue an architecture that made assembly language programming and debugging of high-level languages more difficult.
Wikipedia says about the 8086: "Marketed as source compatible, the 8086 was designed so that assembly language for the 8008, 8080, or 8085 could be automatically converted into equivalent (sub-optimal) 8086 source code, with little or no hand-editing. The programming model and instruction set was (loosely) based on the 8080 in order to make this possible. However, the 8086 design was expanded to support full 16-bit processing, instead of the fairly basic 16-bit capabilities of the 8080/8085."
The problem was that the quirkiness has been extended to the 32-bit processors of today. The Wikipedia article says, "The legacy of the 8086 is enduring in the basic instruction set of today's personal computers and servers..."
And, "Programming over 64 KB boundaries involved adjusting segment registers ... and was therefore fairly awkward (and remained so until the 80386)."
Everyone on the planet who used or was affected by computers then suffered because the debugging was much more complicated than if Intel had chosen to make the operation of the 8086 simpler.
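For anyone who never had to do it, here is a rough C sketch of the segment arithmetic behind the "adjusting segment registers" complaint. In 8086 real mode a physical address is segment * 16 + offset, wrapped to 20 bits. The helper names phys_addr and far_normalize are invented for this example.

```c
#include <stdint.h>

/* Real-mode 8086 address: segment * 16 + offset, truncated to 20 bits. */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Fold the upper 12 bits of the offset into the segment so that the
 * offset ends up below 16. Walking a buffer larger than 64 KB needs a
 * step like this, because the 16-bit offset alone wraps around. */
void far_normalize(uint16_t *segment, uint16_t *offset)
{
    *segment = (uint16_t)(*segment + (*offset >> 4));
    *offset &= 0x000Fu;
}
```

Note that many different segment:offset pairs name the same byte (1234:0010 and 1235:0000, for instance), which is part of what made pointer comparison and debugging awkward.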
"Such relatively simple and low-power 8086-compatible processors in CMOS are still used in embedded systems."
Ohhh yeah, that's it men. It's because of the big face on the surface of Mars that threw a tornado at the orbiter because it was sending a signal that was NOT human DNA. Now I remember the thing...
I understand it and agree with it for the most part. I also think that if the same device were released by any other company it would be a commercial failure as well.
A man I worked for many years ago, one of my engineering mentors, told me about a mistake made during World War Two, where a large number of very large castings were discarded because the specification called for a much smaller tolerance on the location of an exhaust port than was actually necessary. As I recall, the spec allowed it to be 1/4" away from its nominal location, but it actually was connected to a flexible hose and it could have been a couple inches off in any direction without causing any problem. This mistake wasn't discovered until several millions of dollars worth of tank bodies had been scrapped and melted down unnecessarily.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
You've got this right on a number of levels. Most obviously because the probe was a JPL project, not NASA. Despite their close ties, they are separate entities.
Secondly, it was not a JPL mistake either. JPL is a pure metric shop. This pervades everything they do; if you walk in the front door and ask the receptionist where the toilet is, he'll tell you that it is "Thirty meters down the hall and to your left"
So what happened? How was this mistake made? Politics. When the mission was funded, some congressman saw that it was an opportunity to give some pork to his district and put in some language essentially requiring JPL to hire Rockwell (as I recall, though it might have been Boeing) as the prime contractor.
The trouble is this contractor would have normally failed JPL's requirements, as they did not operate metric internally, and being a good patriotic defense contractor, there was no way they were going to make an exception. As such, the contractor hired an intern whose job it was to interface the two cultures (metric and imperial) and that intern screwed up. Had the contractor stuck to metric as normally required by JPL, we would still have another probe in orbit around the red planet.
...si hoc legere nimium eruditionis habes...
Close, but the real problem is the electoral college that pretty much ensures that any vote NOT for one of the two major-party candidates is a wasted vote.
We don't technically have a two-party system, we have an election system that is rigged such that only two of the parties count.
"This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
Oh I dunno. I would be hard put to find something sillier than an inches/cm mix-up on a mission of that importance.
"The time has come" the walrus said " for a GOOD swim."
The version I heard was that a subcontractor didn't know that NASA uses metric, so the parachute deployed at x feet instead of x meters.
The MCO Investigation Board report is a quick read and an interesting case study.
Yeah, I would immediately classify any error that caused deaths to be more important.
Another interesting case was the Patriot Missile failure. The system clock counted in 1/10th second increments. However, it added 0.1 to a floating point number. Unfortunately, 0.1 in binary is a repeating number, similar to 1/3rd in decimal being 0.333333333...
So, ten times every second the time drifted just the tiniest bit. The missile that missed had been running for days, so its clock was one third of a second off, and a Scud travels a long way during that time.
Let that be a lesson to all of you: use an integer counter, and divide by 10 to get the time in seconds.
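A small C sketch of that lesson, following the description above rather than the actual Patriot code (which is not shown here and reportedly used a different number format). The function names are made up, and a single-precision float stands in for the "floating point number".

```c
#include <stdint.h>

/* Error-prone: add 0.1 to a float once per tick.
 * 0.1 has no exact binary representation, and the rounding error of
 * each addition grows as the running total gets larger. */
float elapsed_by_accumulation(uint32_t ticks)
{
    float t = 0.0f;
    for (uint32_t i = 0; i < ticks; ++i)
        t += 0.1f;
    return t;
}

/* Safer: keep an exact integer tick count and convert only when the
 * time in seconds is actually needed. */
double elapsed_from_ticks(uint32_t ticks)
{
    return (double)ticks / 10.0;
}
```

Calling both with 3600000 ticks (100 hours of tenths of a second) shows the difference: the integer version gives exactly 360000 seconds, while the accumulated float drifts away from it by much more than a second on a typical IEEE 754 build.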
Write your representatives! Repeal the 2nd Law of Thermodynamics!
And of course let's not forget the infamous Denver International Airport Baggage System fiasco.
HP-35 calculator 2.02 log/antilog problem.
Not big in a disaster sense but noteworthy.
OK a new size TV
The NASA team was expecting metric units and the contractor, Lockheed Martin, who was operating the spacecraft, submitted English units to the navigation system instead of metric.
Lockheed Martin, which was performing the calculations, was sending thruster data in English units -- in this case, pounds -- while NASA's navigation team was expecting metric units, newtons. One pound of force is equal to about 4.45 newtons. Over the course of the journey this led to the spacecraft being something like 60 miles off course when it reached Mars.
Lockheed Martin was mostly to blame, but there should have been a safeguard to detect this somehow on the NASA side.
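One possible safeguard of that kind, sketched in C: give each unit its own struct type, so a pound-force value cannot be passed where newtons are expected without an explicit conversion. All type and function names here are hypothetical; this is only an illustration, not how the MCO ground software was written.

```c
/* Distinct struct types so the compiler rejects mixed-up units. */
typedef struct { double value; } newtons_t;
typedef struct { double value; } pounds_force_t;

/* 1 lbf = 0.45359237 kg * 9.80665 m/s^2 = 4.4482216152605 N. */
#define NEWTONS_PER_LBF 4.4482216152605

/* The only way to turn pounds-force into newtons. */
newtons_t lbf_to_newtons(pounds_force_t f)
{
    newtons_t n = { f.value * NEWTONS_PER_LBF };
    return n;
}

/* Hypothetical consumer that accepts newtons only. */
double impulse_newton_seconds(newtons_t thrust, double burn_seconds)
{
    return thrust.value * burn_seconds;
}
```

With this layout, handing a pounds_force_t straight to impulse_newton_seconds is a compile error instead of a silent factor-of-4.45 mistake.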
To quote Wikipedia:
The metric/imperial mix-up that destroyed the craft was caused by a human error in the software development, back on Earth. The thrusters on the spacecraft, which were intended to control its rate of rotation, were controlled by a computer that underestimated the effect of the thrusters by a factor of 4.45. This is the ratio between a pound force–the standard unit of force in the imperial system–and a newton, the standard unit in the metric system. The software was working in pounds force, while the spacecraft expected figures in newtons; 1 pound force equals approximately 4.45 newtons.
The software had been adapted from use on the earlier Mars Climate Orbiter, and was not adequately tested before launch. The navigation data provided by this software was also not cross-checked while in flight. The Mars Climate Orbiter thus drifted off course during its voyage and entered a much lower orbit than planned, and was destroyed by atmospheric friction.
Source: http://en.wikipedia.org/wiki/Mars_Climate_Orbiter
...have negative charge. To be fair to Franklin though, it was a 50/50 chance.
May the Maths Be with you!
no one mentioned Ben Franklin.
How many more years will slashdot have an off-by-one error on your Score in your profile?
From the article: "The Mars Climate Orbiter, and the Mars Polar Lander it contained, would have advanced our knowledge of the Red Planet immensely...."
Ouch. Mars Climate Orbiter did not "contain" Mars Polar Lander. They were two separate missions.
Saying it was a "simple" mistake is a little simple. The mistake could also be stated as the error of using heritage software in an embedded system, without examining it and testing its validity.
Strider wrote:
When the mission was funded, some congressman saw that it was an opportunity to give some pork to his district and put in some language essentially requiring JPL to hire Rockwell (as I recall, though it might have been Boeing) as the prime contractor.
Neither one; MCO was Lockheed-Martin.
Furthermore, it wasn't "some congressman giving pork to his district." The mission was competed using the standard competition; it may be hard to believe this, but NASA uses competitive bidding, a lot. Unfortunately, the bidding was done under the mandate of "faster better cheaper", and the two elements of that which could be numerically quantified on the bid were "fast" and "cheap." Mars Climate Orbiter was required to be flown at half the price of the previous (Mars Pathfinder) mission-- which was already the cheapest Mars mission flown since the 1960s.
http://www.geoffreylandis.com
FTFA:
It turned out that while most of the programming and mission planning had been done in units of measurement from the Imperial system used in the US, the software to control the orbiter's thrusters had been written with units of measurement from the metric system.
And that is WRONG! It was the software that had the archaic units, and the rest of the spaceship was built with international units.
The software was working in pounds force, while the spacecraft expected figures in newtons; 1 pound force equals approximately 4.45 newtons.
The software had been adapted from use on the earlier Mars Climate Orbiter, and was not adequately tested before launch.
I did not read the rest of that article, since they're not fact-checking their mocking of people's inability to double-check things.
You can't take the sky from me...
What could possibly be sillier than not converting to metric?
The biggest failure to date which didn't get mentioned is Unix. If we had Multics, with its B2 security rating, we might have actually had secure operating systems in the hands of the public at this point in time. We wouldn't be dealing with spam, or viruses.
But no..... it was soooooo complicated.... K&R had to stick us with a piece of insecure crap... and everyone else was stupid enough to copy it.
The company that NASA paid was the one that fucked it up. NASA always uses metric internally so no need to convert.
Believing in some unseen entity who supposedly created existence, and basing all societal calculations on that. That has to be the mother of all errors.
As others have pointed out below, while it is true there was a units mix-up, this error wasn't caught due to other, more wide ranging problems. IEEE did a great writeup of this a while back (article link - didn't link to IEEE directly since you probably have to be a member to see the article). Very interesting reading. In summary...
First the spacecraft was asymmetric, causing some issues with the stabilizing flywheels and the onboard thrusters (used for major course corrections). Second, the person doing the calculations for the major course corrections noticed that the burn time (calculated using the bad units) didn't look right compared with previous missions. However, his management made him prove that the calculations were wrong, instead of proving they were right (presumably knowing that they would be different, given the first point about the asymmetries). He didn't catch the units error, and since he couldn't prove they were wrong they went along as if nothing happened. The article was really pointing out that while this was a technical error, the more fundamental issue was a management and culture issue. To me this made for an interesting case study in how to handle unknowns in a mission critical system - assuming things are wrong until proven otherwise, not vice versa.
(I don't seem to have the Spectrum issue with me, but I seem to remember it had some other articles about related management/culture failures).
There were no manuals before the announcement date.
At the time, as is true now, Motorola was badly managed. Apple moved away from Motorola CPUs. Quote: "Motorola had promised Apple to deliver parts with speed up to 500 MHz, but yields proved too low initially."
Companies don't want to depend on Motorola because Motorola does not seem dependable, in my opinion.
Even if so, when NASA got the subcontractor's work, didn't they check whether the results were OK? Didn't they use the work of the subcontractor?
I have worked for defense applications and every tiny piece of code that is produced is thoroughly checked by both the subcontractor and the contractor.
The Lisa was actually well-liked, I've read, after it was relabeled "Macintosh XL" and the price slashed by more than half. That's better than I've heard about the Apple III.
I'd call it more of a business disaster than a technical disaster.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
Speaking of technical flaws and Lisa... You could plop the boot drive into the Dumpster, and it would format it. The tech savvy devs who designed the "drag-to-trash = format" function never imagined that users would be stupid enough to do something like that!
Even my IIci, bought in 1991 or so, had a similar bug: if you dragged the boot drive to the trash, it would "eject" it (logically, not physically), rendering the system useless until you rebooted. (The drive was not factory installed, so it might have been a driver issue that Apple was not responsible for.)
What part of `yes no` don't you understand?
Isn't this about technical mistakes, not bad business decisions? The Lisa cost around $10,000 at the time of introduction. Obviously, it wasn't going to sell a ton even to geeks (though I knew one that took the plunge), but it certainly was ground breaking in many ways. On a personal note, I waited for the 128k Mac, at around $1800.
Just another day in Paradise
Yeah, unless it's really important. Like Apollo. Then you'd better use U.S. Customary Units.
Seriously, though, the "metric is so much better for science" argument is old and tired. We had no problems getting to the moon and back using our "Standard" measurement system. The only reason we are using metric now is that the newer NASA folks who went to school more recently were all indoctrinated into believing that Metric units had some sort of scientific advantage. They don't. They are just arbitrary units of measurement (albeit slightly less useful in the real world).
A more logical approach, instead of "always use metric" would be "if it ain't broke, don't fix it". AKA just use Standard and quit your crying.
The brains of a chicken, coupled with the claws of two eagles, may well hatch the eggs of our destruction.
I did a Y2K review on Iridium at the Satcom facility in Chandler. Worked with software developers, QA and project managers mostly.
Technically, it was amazing... very much a Bond-villain scale project. There were a number of firsts on the project: first satellite assembly line, first common off-the-shelf (mostly) desktop processor used in space, first use of mixed/hybrid launch vehicles (Boeing, Orbital Sciences, Soviets, Ariane... probably some Long March thrown in too).
As far as business plans go, it was a cluster-f*ck.
They sold rights to a hundred or so nations to get downlinks to terrestrial networks.
They FAILED to mention that it worked best with a clear horizon (no canyons or city streets)
They provided limited modem capability
So... Sales never were what they projected (I do remember seeing dozens of sales-reps making calls from the field adjacent to the facility using actual Iridium phones, just to impress customers), the hundred-odd nationalist companies folded and the US Military ended up with a useful asset.
If you ask me, that was the plan all along... Freakin Brilliant!
Wherever You Go, There You Are
Yes, my 9500 handset is large, with a huge phallic antenna. Yes, minutes are expensive ($1.49). But I have coverage where literally nobody else does. That's what it's for.
Micro Channel Architecture (MCA) was the problem, as in the heavy IBM PS/2 P70 386 portable. ISA wasn't, as in the PS/2 Model 30 286 desktop.
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
e.g.: if (incomingMissiles = true) launchCounterAttack();
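Spelled out as a C sketch, with made-up function names: the single '=' assigns true to the flag, so the condition holds no matter what was passed in.

```c
#include <stdbool.h>

/* Buggy: '=' assigns true, so the branch is always taken. */
bool should_launch_buggy(bool incoming_missiles)
{
    if (incoming_missiles = true)   /* assignment, not comparison */
        return true;
    return false;
}

/* Intended: test the flag itself, with no comparison to get wrong. */
bool should_launch(bool incoming_missiles)
{
    return incoming_missiles;
}
```

GCC with -Wall and Clang both warn about an assignment used as a condition like this, which is one reason to build with warnings turned on.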
A British electrical engineer once told me that when he was in college they said one of the biggest mistakes of all time in their field was the UK domestic power plug (BS 1363). While it was designed by some of the brightest engineering minds of the day to be as safe as possible, it's caused more domestic injuries than any other plug design ever.
Why? Because the engineers never thought about what happened when the plug was left in a "safe" state, unplugged and on the floor. As such, the upturned pins inflict nasty puncture wounds when accidentally stepped on, and have provided a steady stream of visits to accident & emergency wards (and likely some deaths) ever since.
"And the meaning of words; when they cease to function; when will it start worrying you?"
This sort of stuff happens with the software that's difficult to test properly. The bug is introduced but then there's no way to detect the bug except under extreme or unlikely conditions. Or your tests also have bugs; say your code used imperial measurements, but you also wrote a test case that used imperial measurements, then your module would pass the tests correctly but then fail when it was integrated into a complete system. In this instance perhaps code borrowed from the earlier orbiter also came with the test cases.
That's why testing should always be done independently from the development, as well as simulators and emulators. Too often I see QA people working a bit too closely with developers to get test cases written so that they could end up with the same faulty assumptions the developers have. Then you need people to debug the specs and requirements to make sure they're right, and so on.
removing asbestos from the Challenger O ring, and replacing it with a "safe" substitute
The metric system is so advantageous because there are no alternate units for the same thing. I'm just dyne to hear of another metric unit for the newton. I gauss I'll just have to live with the Tesla.
Contribute to civilization: ari.aynrand.org/donate
As I understand it, the 20 bit address restriction was an IBM choice based on making hardware cheap, rather than an Intel design decision. The segment register was an EXTERNAL piece of hardware, a 4 by 4 bit TTL "memory". The 8088/8086 knew nothing of segments, IBM had to add them to get a bigger addressing space. Intel incorporated IBM's hack into later versions of the X86 family.
Contribute to civilization: ari.aynrand.org/donate
In 1980 the broadcast business was just about to go digital but not quite ready. There was a world class competition between B format from Bosch-Fernseh in Germany and C format from a conglomerate of Japanese manufacturers. C format had the advantage of being able to hold a still frame when the tape stopped. B format needed a digital frame store to do the same, but the quality was far superior. C format was a totally inferior format. Low frequency noise caused video to be considered unusable for multi-generation special effects and millions of dollars of equipment was sold across the world for a format that disappeared in 5 years or so. Quality rules when it comes to production. The filmmakers have always understood this, but the video manufacturers are always looking for the C-heap way out. This carelessness about quality continues today with digital recording formats.
Sometimes it's tough being the only one who's right.
With misinformation common on the Internet, and with the difficulty of hunting down honest answers that'll confirm the truth or otherwise of any statement, users very often feel it necessary to base their opinions on the personalities of the arguers themselves. This very often leads to a situation where an argument can appear foolish simply because of the anger of the person making it, and in many cases a combined might of reasonable people assuming the more argumentative person is in the wrong, and posting as such, can overwhelmingly go against someone whose views may be right, but are obscured by hyperbolic allegations and confused, angry, rants.
This quagmire of people basing their views on the person whose statements seem most reasonable, rather than on the correctness of those statements, will not disappear by itself. Resources need to be devoted, and unless people are prepared to actually act, not just talk about it on Slashdot, nothing will ever get done. Apathy is not an option.
You can help by getting off your rear and writing to your congressman or senator. Tell them your concerns about the ability to tell right from wrong on the basis of personalities. Warn them that hot button issues on the Internet typically enrage people and result in many undermining their own arguments through their own anger. Tell them this is important to you. Tell them that you appreciate the work being done by organizations like Slashdot to provide free forums in which to discuss important topics but that without calm, collected, and reasonable arguments, you will be forced to use less and less secure and intelligently designed alternatives. Explain the concerns you have about freedom, openness, and choice, and how vicious, angry arguments undermine all three. Let them know that this is an issue that affects YOU directly, that YOU vote, and that your vote will be influenced, indeed dependent, on their policies on Internet anger.
You CAN make a difference. Don't treat voting as a right, treat it as a duty. Remember, it was thanks to ordinary people like YOU that we are now seeing such innovations as SMP in OpenBSD. Keep informed, keep your political representatives informed on how you feel. And, most importantly of all, vote.
KMSMA (WWBD?)
They missed a huge one that we're still paying for. Until MS (in spite of warnings from the wise) made e-mail potentially executable the e-mail virus was half joke and half urban legend. That mistake (and the related mistakes for Word documents and ActiveX) is still costing us billions a year and there's no end in sight.