Software Error Likely Killed MGS Spacecraft
Aglassis writes "NASA investigators have determined that a software update performed in June of 2006 may have doomed the 10-year-old spacecraft. Apparently the software error caused the solar arrays to drive against a mechanical stop which then forced the spacecraft into safe mode. Unfortunately, after that the spacecraft's radiator was pointed at the sun which overheated the battery and destroyed it. Contact was lost with the Mars Global Surveyor spacecraft in November 2006. NASA will form an internal review board to determine formally the cause of the loss of the spacecraft and what remedial actions are needed for future missions."
I don't believe it.
It's most likely the Martian automated defense system, set up just before we sent a probe and destroyed their civilisation.
liqbase
Typical response to a problem: form a committee!
- Minutus cantorum, minutus balorum, minutus carborata descendum pantorum.
One crash in ten years? Why don't the NASA guys write consumer operating systems?
Glad I'm not the programmer who came up with that bit of code! Their next performance review is going to be _lots_ of fun!
This Space Intentionally Left Blank
uri = windowsupdate instead of nasaupdate...
That's what happens when an ex-Microsoft employee works for you.
Funny definition of 'safe mode'. I'd get the main antenna pointing at the earth, the battery radiator pointing away from the sun, and the computer going 'what do I do now, smarty earthlings?' and waiting for a command.
Maybe NASA's 'safe mode' just put 'safe mode' in the corners of all the returned images and did them in 8-bit colour...
Aero and space are very unforgiving of human coding errors.
.. a Sony battery.
Microsoft Validation required. Please click the Continue button to begin Windows validation.
I told you that letting a Microsoft Programmer onto the team was a bad idea.
they should've waited until super tuesday before issuing the patch. everyone knows not to patch out of cycle.
Perhaps Big Boss killed it
Everyone knows, it was Solid Snake that destroyed Metal Gear Solid.
Slashdot Burying Stories About Slashdot Media Owned
It's just the way of the world. :)
So does this mean they will have to re-write "Red Planet"? Wasn't there a scene where they used components from that machine?
Houston, I B.S.O.Ded
That'd be one hell of a submission to The Daily Wtf.
Some expert is always trumpeting the fact that "Johnny can't program," to which many of us roll our eyes and go back to coding. But could this be a sign that the quality of the help NASA is hiring is such that these kinds of mistakes are now rampant? I mean, this could have been avoided if the code had been tested out on a full-scale mock-up of the machine, to verify that it did what it was supposed to do, before ever sending the commands to the actual machine. If anything, it's a QA failure.
GetOuttaMySpace - The Anti-Social Network
The updates would have been added in a sandbox and then only moved to the main system if they passed all the tests.
My little Linux and tech blog
On a positive note, it has provided me an instructive example for when I help my teenagers with their math homework. If they say it's "almost" correct, I tell them that the guy who screwed up the Mars mission probably said the same thing.
-ccm
Too much Law; not enough Order.
Legend has it at Microsoft that if you introduce a bug that breaks the nightly build you have a stupid mascot that perches on your desk the next day. Wonder what the other NASA programmers will do to this guy?
Pathetic Earthlings...
Surely it can still function on its solar arrays when it's on the daylight side of the planet? Or would it drift too much out of alignment when in the dark? Or is there some other issue?
Sounds like a Microsoft OS update to me.
that was the sound of me hitting the bullseye.
[quote]at least if something went wrong some guy at nasa could tell his grand kids that he bricked something from ~140 million miles away.[/quote]
http://slashdot.org/comments.pl?sid=214508&cid=17
lose != loose
We need his report! Tripmaster Monkey, where are you?
Fascism starts when the efficiency of the government becomes more important than the rights of the people.
Just one more example of how Computer Science isn't quite up to the reliability requirements of Space
And how many failures have happened because of an engineering mistake?
You seem to assume that there's zero failure in space for everything else, and 6 problems in.. 30 years? is some horrible record.
All information only makes sense in context. What's the failure rate of other components of the system?
AccountKiller
I'm glad to hear that rocket scientists make mistakes also.
Might be they upgraded it to Vista !!!
Does anyone else think it's about time to make a small satellite with a few "claws" to fly around our existing satellites and replace their various parts?
It could probably do repairs to the ISS as well (spacewalks should be for fun, not for work).
Did you know that "FTW" ("for the win") is a direct translation of "Sieg Heil"?
We all know a machine Safe Mode doesn't allow remote management.
No, everyone knows it's the Martian vampires. That SW glitch pointed the solar collectors at the Martian surface, overpowering the thin layer of blood that protects the biters from the weak rays of the Sun. We need to find out how the vampires reached the MGS to destroy it. Probably they have moles at NASA or a contractor with access to the controllers. We have to fund deployment of my SOLASER Space Debt Inc (SDI) weapon to fry them before they fry us.
--
make install -not war
Face it, they bricked it in a firmware update while trying to circumvent the built in DRM and they are trying to blame the software manufacturer.
I can see the ebay item now "10 y/o excellent condition spacecraft, dead battery quick fix, bricked"
Oh... right... manned exploration is a waste of money and robots are all we ever need.
Funny, I have this conversation with my wife all the time. She's an elementary school teacher, and we regularly butt heads about how to deal with this. She's willing to grade a math problem as "correct" if the student demonstrated the correct process, but made a simple clerical error resulting in the wrong answer. She argues that the method is more important than a single result. Uh huhhh. So if I botch the balance in my checkbook, the bank will pat me on the head, say "that's okay," and front me the money I shouldn't have? I think not.
There aren't many "absolute truths" in this existence, but math is one of them. Your calculations are either "correct" or "not correct." "Almost correct" is someone being spineless. I'd much rather know that I botched a calculation so I can perform it correctly the next time, rather than exist in blissful ignorance. Telling me that I'm stooopid is a personal attack; telling me my calculation is incorrect is a statement of fact. Folks need to learn that the latter statement isn't necessarily a bad thing. You learn by making mistakes.
I tell you, when you see all these ridiculous failures at NASA, it's pretty obvious that they either don't do QA, or that the QA teams are hamstrung. These are the kinds of things that good QA and test programs find. Making people check bolts on a tilt table before ruining a 50 million dollar satellite is what process and checklists are all about.
These aren't 'normal workplace errors' that you have to live with, they're -stupid- errors, made because of stupid managers.
My PC doesn't last ten days before crashing.
We used to live in a vacuum tube. When the computer was running, and your bit was accessed, you almost had enough light to read by. Mother would disconnect the tube when she went to bed, causing floating point errors for almost eight clock-cycles...
Or at least, that's how I remember it...
What is really needed is to get RID of Computer SCIENCE and move it over to the Engineering department and give us Computer ENGINEERING. Scientists don't build stuff, they investigate things, they don't -care- about better ways to build things, better ways to avoid mistakes, it's not their job. Engineers however are all about building the same damn bridge 100 times and making it better and safer each time.
There is no discipline in Computer Programming these days, because Computer Programmers don't know how to engineer stuff. The simplest program is done differently by every programmer, whereas if engineers were doing it they'd all be taught to do it the exact same way. Standardization is how you get rid of most errors. You'll notice that nobody is making new bolts or nails anymore, they're all standardized.
So if I botch the balance in my checkbook, the bank will pat me on the head. . .
Why should the bank even care? I don't even remember the last time I balanced my checkbook.
"Almost correct" is someone being spineless.
I just measured the height of a tree with a meter long chunk of 2x4 and a bubble protractor. I get a figure of 10 meters. How many feet is that? 32.808399 is not the right answer. Using it is likely to result in your shell missing the top of the tree. 30 is the right answer. Why?
Neither you nor your wife is correct, or incorrect either. Define what "correct" means and define the degree of incorrectness and precisely why it is incorrect.
Arithmetic is exact, the things you use it to model often are not. Modeling states and calculation of figures are two separate acts and skills. They both need to be taught and understood.
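To make the tree example concrete, here is a minimal Python sketch of the difference between the exact arithmetic and the figure the measurement can actually support. The helper name `round_sig` and the assumption that a 2x4-and-protractor measurement is good to about one significant figure are mine, chosen only for illustration.

```python
from math import floor, log10

FEET_PER_METER = 3.280839895  # follows from the definition 1 ft = 0.3048 m


def round_sig(value: float, sig_figs: int) -> float:
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    magnitude = floor(log10(abs(value)))  # position of the leading digit
    return round(value, sig_figs - 1 - magnitude)


# Assumption: the field measurement is only good to ~1 significant figure.
measured_m = 10.0
raw_ft = measured_m * FEET_PER_METER   # the arithmetic is exact: about 32.808399
honest_ft = round_sig(raw_ft, 1)       # what the measurement supports: 30.0

print(f"arithmetic: {raw_ft:.6f} ft, defensible: {honest_ft:.0f} ft")
```

The conversion factor contributes no uncertainty; all of it comes from the measurement, so the converted figure cannot carry more significant digits than the original did.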
Telling me that I'm stooopid is a personal attack; telling me my calculation is incorrect is a statement of fact. Folks need to learn that the latter statement isn't necessarily a bad thing.
Here I am with you 100%.
KFG
I'm a scientist who works with the MGS data so I don't know the engineering side well. However, I do know that last year NASA was strongly considering dropping all support for MGS in order to spend the limited Mars program money on newer missions (the idea being that we had gotten 90% of the useful science from MGS). Instead they decided to keep MGS funded with a bare minimum of money and hence a bare minimum number of personnel. I imagine that the poor overworked engineers running the operational show at JPL just didn't have the time to double-check everything as they would in an ideal world. As their end user, I'm just grateful for all the work they did over the years to keep the thing running.
The name of it escapes me right now, but I did take a class where we reviewed certain classic software failures. (A good class for me since I'd already read about them :).
If you'd like to read a few, check out:
Therac-25 (Race conditions, software lockouts in lieu of hardware)
London Ambulance Service (Poor software design and design process)
Ariane 5 (Cutbacks on testing procedures, inappropriate software re-use, variable overflows, flight hardware allowed to generate error output)
then there's the Denver airport baggage system, the Mars Climate Orbiter, etc.
In general, you may want to read the Risks Digest, where stuff like this happens every month!
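The "variable overflows" item in the Ariane 5 entry is easy to illustrate. The widely reported failure involved converting a 64-bit floating-point value to a 16-bit signed integer without a range check. Below is a rough sketch of two guarded alternatives. The function names and sample values are made up for illustration, and the flight software was written in Ada, not Python.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, refusing out-of-range input."""
    truncated = int(value)  # truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Alternative policy: clamp to the representable range instead of raising."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


print(to_int16_checked(1234.9))        # fits, so the value is simply truncated
print(to_int16_saturating(40000.0))    # out of range, clamped to INT16_MAX
```

Which policy is right (raise, clamp, or something else) is a system design decision; the point is only that the out-of-range case gets decided on purpose.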
So now we can milestone the first paperweight in space...
Snake? SNNNAAAAAAAAAAAAKE!
//GAME OVER//
Bum bum ba ba-dum, BUM DA DUM!
.
Continue?
Subject says it all... When developing for, say, the OLPC, or handheld computers (or PDAs, or smartphones, hell, even the iPhone), you either actually run everything on the device before shipping it to consumers, or (more likely) you emulate the embedded device on your desktop, so you can dig into the guts of it with a debugger, and then you test it on the device anyway.
Why is it that the iPod, hell, even my Java phone is more reliable than these aerospace things?
Don't thank God, thank a doctor!
Additionally, since the computer "flip" happened instantaneously, and the F-16 can roll at much higher G forces than the pilot can take, the flip would have killed the pilot.
A single half-roll to inverted in the Falcon wouldn't have exerted enough Gs on the pilot to do anything worse than make him exclaim WTF! and disengage the a/p. A roll in and of itself in an aircraft doesn't really induce many Gs.... a "bank-and-yank" turn does, and that's what the F-16 can do at higher Gs than the pilot can take... not the roll.
I knew I forgot a
double distanceInMeters = feetToMeters(distanceInFeet);
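For anyone who wants the joke spelled out, here is a hedged sketch of what carrying the unit along with the number might look like. `Length`, `from_feet`, and `from_meters` are invented for this example, and Python stands in for whatever language the one-liner above was meant to be.

```python
from dataclasses import dataclass

METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


def feet_to_meters(distance_in_feet: float) -> float:
    return distance_in_feet * METERS_PER_FOOT


@dataclass(frozen=True)
class Length:
    """A length stored internally in meters; callers must say which unit they hand in."""
    meters: float

    @classmethod
    def from_feet(cls, feet: float) -> "Length":
        return cls(feet_to_meters(feet))

    @classmethod
    def from_meters(cls, meters: float) -> "Length":
        return cls(meters)

    def __add__(self, other: "Length") -> "Length":
        if not isinstance(other, Length):
            return NotImplemented  # adding a bare number raises TypeError
        return Length(self.meters + other.meters)


total = Length.from_feet(100.0) + Length.from_meters(10.0)
print(f"{total.meters:.2f} m")
```

The design choice is that a bare float can never sneak into the sum: every value has to pass through a constructor that names its unit.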
Which is why you don't necessarily assign points at the per-question level of granularity. If she gave partial credit for partially correct problems her students would still feel the burn of missing part of the problem (the actually doing the math part correctly).
And before you say it's all wrong if part of it's wrong, think about applying that standard to the entire assignment and you'll realize how specious it is.
As a lab instructor, I've even had to mark things wrong which have the answer correct: there are many wrong ways of setting up calculations that happen to arrive at the "correct" answer. In my case it usually involved carelessness with units. It wasn't a very high-level lab.
Give credit where credit is due, no more and no less. The level of granularity should depend on how much time you want to spend grading it and how important the assignment is among other things.
Can you be Even More Awesome?!
What's the difference?
NASA used to occupy the technological equivalent of the "top seat of the totem pole", way up in space. But recently, half-assed engineers, lazy technicians, bureaucratic posturing, and elitism have turned the star in the sky into a brilliant meteor crashing toward Earth.
NASA used to be at the forefront of technological innovation and development. A demise this rapid is only explainable by the aforementioned reasons.
The solution:
Fire EVERYONE, REGARDLESS OF SENIORITY, and hire people who care more about technological innovation, development, and national pride than about egocentric self-glorification. When you work for an organization like NASA, one that is SUPPOSED to be essentially a publicly beneficial scientific/technological R&D entity, you should be putting science and technology, and most importantly knowledge and national achievement (including global achievement) before personal glory. Just as a police officer or firefighter puts those that they are protecting at a much higher priority than his own safety.
NASA has been corrupted and polluted by 'scientists' and technicians who are in the business for personal gain, rather than technological gain.
Knowing Google's lust for data collection, the Soviet Union is still alive and well inside the psyche of Sergey Brin....
In a realtime control system, a fault is a system failure. If there is no backup/recovery procedure then there is no such thing as a "safe mode".
Engineering is the art of compromise.
Well, that tops my list of "Worst Times to Get the Blue Screen of Death".
There might be good reasons for a manned spaceflight, but popping into Mars orbit to do repairs ain't one of 'em.
-Jay-
I think, this time, the hardware failed. If you can actually drive something permanently against a mechanical stop, it's not well designed. Or, if it provides an interrupt that actually switches to degraded mode (and not failsafe, which would mean nothing can go wrong), then that is a system design problem.
In Canada all traditional engineers wear an iron ring - can't remember which finger. The story I was told was that in the very early 20th century a bridge was being built in Quebec and, long story short, due to engineering errors it collapsed killing many people. The townspeople took some of the metal from the collapsed bridge and made rings out of it for the engineers responsible. After that it was adopted as a tradition that all engineering grads get iron rings to remind them of the responsibility they carry.
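To illustrate the design point, here is a minimal sketch of a software limit placed inside the hard mechanical stop, so that a bad command gets clamped and flagged before the drive ever reaches the stop. Nothing here reflects actual MGS flight software; the angles, the margin, and every name are assumptions picked for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GimbalLimits:
    """Hypothetical travel limits for a solar array drive, in degrees."""
    hard_min_deg: float
    hard_max_deg: float
    margin_deg: float = 2.0  # assumed keep-out band in front of each hard stop

    @property
    def soft_min_deg(self) -> float:
        return self.hard_min_deg + self.margin_deg

    @property
    def soft_max_deg(self) -> float:
        return self.hard_max_deg - self.margin_deg


def command_angle(requested_deg: float, limits: GimbalLimits) -> Tuple[float, bool]:
    """Return (angle to actually command, whether the request was clamped)."""
    clamped = max(limits.soft_min_deg, min(limits.soft_max_deg, requested_deg))
    return clamped, clamped != requested_deg


limits = GimbalLimits(hard_min_deg=-90.0, hard_max_deg=90.0)
angle, was_clamped = command_angle(95.0, limits)
print(angle, was_clamped)  # the request is held short of the hard stop and flagged
```

Returning the clamped flag rather than silently limiting lets the caller decide whether a clamped request is routine or a fault worth reporting.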
The tyrant will always find a pretext for his tyranny - Aesop
Not like I've ever worked for NASA.
BitWorksMusic.com -- odd tunes for odd times
Broadsiding the radiator to the sun was one of the symptoms of the fault, not an intent of safe mode. Presumably, over a period of 4 months after the software update that had the bug, the orientation drifted until the radiator rotators ran out of travel. At that point, I couldn't guess whether it immediately went into safe mode because it couldn't rotate the radiators, or if it first overheated, but it doesn't matter much. The end result is cooked electronics.
As I understand it, when a NASA probe goes into safe mode, it stops performing active processes (like firing thrusters, changing orientation, or taking pictures), broadcasts a status back to earth, then waits for instructions. The idea is to avoid doing anything that would make a problem worse.
While NASA's computer scientists are probably smart enough not to turn the spacecraft to a bad orientation if it encounters a glitch, they may not have considered that it may enter safe mode in a bad orientation. Most likely though, they decided the potential for the spacecraft to find itself stopped with the radiators facing the sun is lower than the potential for making things worse if they try to identify and mitigate that condition autonomously with the computer on the fritz.
The A/C is correct, banked turns in an airplane induce G-Forces, rolls do not unless you try to climb or turn while rolling the aircraft. A roll will induce some radially extending centrifugal force but not enough to harm a pilot.
Your wife is correct that understanding the process is much more important than not making clerical errors. Clerical errors can always be caught by cross-checking, but if you don't understand the process, you can't get anywhere. Math tests are artificial: you're not trying to build a spaceship, you're trying to test whether someone has learned something. In a more realistic situation, cross-checks would occur, and you'd have time to correct errors.
In an ideal situation, a large majority of a grade ought to be awarded based on a demonstrated understanding of the process. Clerical errors might be worth a minor deduction, but if the choice is to grade a question in a binary way, as correct or wrong, grading it as correct makes sense. The real problem in that situation is that the grading system is too coarse.
I am soooooo sick of this attitude on Slashdot. Every time something goes wrong, especially if it involves NASA, it's assumed to be a colossal blunder rooted in incompetence and greed. Next time you feel like making a comment like this, either do some background research, or stuff a sock in it. I'm normally not a jerk on Slashdot, but I think I just snapped.
First of all, what was the move for personal gain that caused this? A software bug? Typically those are accidental, and they are most certainly not limited to NASA. Do you have some evidence of underhanded action happening in this case? No? Didn't think so.
Secondly, what is the rapid demise you're referring to? Do you realize that the last 10 years have seen a brilliant upswing for NASA? With the exception of the unfortunate Columbia tragedy, which itself opened a lot of eyes and spurred many improvements within NASA and especially the manned space program, successful missions have been practically hand-delivered to the American people. Let me name a few: Stardust, Mars Rovers, Mars Reconnaissance Orbiter, Odyssey, Pathfinder, Deep Space 1, Deep Impact, Spitzer, Cassini, and Clementine. There are quite a few excellent missions coming up soon or en route, too: New Horizons, Messenger, Mars Phoenix Lander, James Webb Space Telescope, Mars Science Laboratory, and the Lunar Reconnaissance Orbiter.
Third, did you have any clue when you opened your trap that the Mars Global Surveyor completed its mission 5 years ago? Every orbit it made after that and picture it returned was a bonus to the American taxpayers and the global scientific community. MGS mapped the entire planet, much of it twice. NASA had been considering finally shutting down the project to free up resources for newer, higher priority missions, like MRO.
Fourth, what is your brilliant plan of firing everyone going to achieve? It will leave an organization with no one who has a clue what's going on. The people who know how all the missions currently in operation work will be gone. There will be no one to train any replacements. People who know the ins and outs of spacecraft design will be sitting at home jobless, watching as people who have no experience in space exploration try to start back up from the 1950's. The best you can do is to identify those people who are genuine problems or true underachievers and fire them. Then you get rid of specific problems and motivate everyone else to be straight shooters, without eliminating key talent.
Any questions?
When are they going to admit the truth of how this was destroyed? Oh well, we'll all know once Megatron lays siege to the earth for our delicious oil and rubies and everything else that can be made into energon cubes.
1 (short ton / firkin) = 89.1432354 slugs / keg
TH1S FLAMEBA1T SH1T 1S
WHY SH1TD0T NEEDS TO BE FUCK1NG CRASHED!!!!!!!!!!!!!!!!!
Updating your software can always make more issues even though it fixes others. I guess they'll think twice before applying that next winXP hotfix.
brian botkiller "Condensing fact from the vapor of nuance" - Neal Stephenson, Snow Crash
I've got no problem with partial credit in cases with clerical errors. I have a ton of problem when it's graded as correct when it isn't. There's a certain amount of discipline required to solve problems. If you're sloppy, you'll make simple errors, but the result is still wrong.
Calculation errors, process errors, logistics errors ... they're all part of the real world. The teacher who doesn't mark the simple errors is doing two things - providing an oversight function that won't always be available, and providing positive reinforcement that being sloppy is okay (in fact, it's being rewarded.) There's an indirect punishment applied to the students who do all the details, because they expended "unnecessary" effort to obtain the same reward as the sloppy kids. Supplying the positive punishment of a partial/full credit deduction restores the balance to where it should be. I hate to say this, but many teachers are more concerned about lawsuits than teaching. That goes all the way up into the school administration too. I'm sure you're familiar with the term "promote them up and out." Schools are less likely to get sucked into a lawsuit if they just graduate everybody.
When you turn the kids loose into the real world where there is no one double checking everything they do, all your Mars probes end up somewhere near Jupiter.
This has the potential to turn into a huge rant, so I'll stop here.
Understanding the process is important, but all the book-learning in the world is completely useless if you can't apply it. If you have to hand-wave your way around every answer you come up with, how is that a good thing? I'm an employer, and there's no way in hell I'm going to assign someone to double check all of your work. I expect you to do your job, and do it correctly. If your job is to calculate the orbits of spacecraft, I expect you to be able to handle the calculations and get them right. Similarly, if you're scheduling trucks for deliveries at the loading dock, I expect you to have a certain skill set in logistics. In either case, if I'm getting coin-toss accuracy out of your work, you're not going to be working here very long. I don't care how well you understand the process, if you can't perform it properly and accurately, you're costing me money, and that makes you fired for non-performance of job duties.
Your calculations are either "correct" or "not correct."
:)
I agree with the wife. Partial credit for incorrect answer but correct method. And also only partial credit for correct answer achieved with incorrect method, such as counting fingers instead of memorizing multiplication tables.
rd
Why should the bank even care? I don't even remember the last time I balanced my checkbook.
The bank will care when one uses their incorrect balance in writing a check and writes a check for money they don't have. The bank responds with an overdraft charge.
I don't pay for overdraft protection. I balance my checkbook instead, but a lot more frequently and up to date with an online account than I used to with a monthly statement.
rd
First, as I think I mentioned, the problem with grading it correct when it isn't is a problem with the testing procedure. I'm guessing that these are situations when for whatever reason, it's considered necessary to make the choice a binary one, when it sounds as though it should allow for partial credit. If so, that may be a problem with the testing procedure at some level, and it wouldn't be very fair to take that out on a child who understands correctly how to do the work.
Beyond that, keep in mind that an elementary school math test is not an employee screening test. It has a completely different purpose, and it's applied to people at entirely different levels of development. Extrapolating from jobs involving spacecraft orbits and truck scheduling to elementary school children, and talking about firing people, seems ludicrous to me. You may think there's a connection, but unless you've studied child educational development and have some basis for that thought beyond your own unrelated work experience, you're almost certainly wrong.
The bank will care when one uses their incorrect balance in writing a check and writes a check for money they don't have.
Ahhhhhhhhhhhh! What the bank cares about is overdrafts. I don't do that. Because I have a sense of number without having to perform a calculation. This also allows me to know that anyone who claims the average ocean level rose 2mm last year is a numeric moron, no matter how many or what particular letters he writes after his name; again without performing any calculations.
I don't pay for overdraft protection.
Neither do I. I don't need it. I don't write out checks for more than I have. Even though I haven't balanced my checkbook in at least five years and often go a few months between looking at statements.
There are only two reasons to balance your checkbook; because you have to know to the penny how much you have to avoid an overdraft; and to check the bank's calculations for errors to the penny.
But making good approximations is one of the most valuable numeric skills you could possess. To have a good sense of how much you've got to spend. I'm sorry if this cuts a bit close to home, but anyone who can't look at a derived number and have an innate sense of how correct/incorrect it is isn't very good at math. They simply know arithmetic, which is a valuable, but limited, numeric skill. Especially for an engineer, who is working with the messy and imprecise real world and not merely a man defined abstraction.
The arithmetic performed by the failed Mars Climate Orbiter was as perfect as a computer could perform it. There was a failure of method. Misuse of units, not numbers. A human being looking at the mathematical model in toto would have picked it up without being given a single number to calculate.
And relying on perfect arithmetical skills to determine how many feet tall the tree is may well result in your shell sailing over the top of it, instead of knocking off the top few feet as you intended, because your perfect arithmetic gave the wrong answer.
And you didn't have a sense that it was wrong, because your arithmetic all checked out.
If your checking account bears interest, do you balance your checkbook to the fraction of a penny?
When I write physics examples on the blackboard I use the number "10" for gravitational acceleration. It makes performing quick calculations in front of students easy. But is this number correct or incorrect?
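One way to put a number on that question is to ask how far off the answers get. The sketch below compares 10 against the conventional standard gravity value of 9.80665 m/s^2 for a simple drop from rest. The 10 m drop height and the function names are assumptions chosen for the example.

```python
STANDARD_G = 9.80665   # m/s^2, conventional standard gravity
BLACKBOARD_G = 10.0    # m/s^2, the convenient classroom value


def relative_error(approx: float, reference: float) -> float:
    return abs(approx - reference) / abs(reference)


def fall_time(height_m: float, g: float) -> float:
    """Time to fall from rest through height_m, ignoring air resistance."""
    return (2.0 * height_m / g) ** 0.5


err_g = relative_error(BLACKBOARD_G, STANDARD_G)
err_t = relative_error(fall_time(10.0, BLACKBOARD_G), fall_time(10.0, STANDARD_G))

print(f"error in g: {err_g:.2%}, error in fall time: {err_t:.2%}")
```

Under these assumptions the value of g is off by roughly 2% and the fall time by roughly 1%, so whether "10" counts as correct depends on whether the problem at hand can tolerate an error of that size.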
KFG
so... do you really think that windows 95 would have been any better? :o
"Damn, we got a BSOD. Who's up for a spacewalk?" =)