The Forgotten Huygens Experiment

Posted by CowboyNeal on Thursday January 20, 2005 @10:59PM from the bit-flips-of-ill-fortune dept.

jdray writes "An experiment onboard the Huygens probe didn't run as planned because someone forgot to turn it on. The team lead for the experiment has put eighteen years of his life into the project, just to watch it not happen after a seven-year ride to its destination on Titan."

24 of 556 comments

Sad :-( by martingunnarsson · 2005-01-20 23:03 · Score: 2, Insightful

Damn that's sad. Don't they have checklists for these things??

--
Martin
1. Re:Sad :-( by Yazeran · 2005-01-20 23:56 · Score: 2, Insightful
  
  I'm sure they did, but as someone else has already said, it's not a one-man job. Besides, not every command to a spacecraft can be simulated and tested in advance. Some commands have to be sent at exactly the right moment, not before, in order to make something as complex as the Huygens project work.
  
  It's a pity that the command to activate channel A on the Cassini spacecraft was not sent and that data was lost, but one can only hope that future missions do not make the same mistake.
  
  Incidentally, this event demonstrates why complex interplanetary / interstellar missions likely cannot succeed without either a vastly more advanced artificial intelligence in the on-board computer or a human crew able to make on-the-spot decisions in order to correct mistakes and / or respond to unanticipated events / discoveries.
  
  Yours Yazeran
  
  Plan: To go to Mars one day with a hammer.
Redundancy... by Burb · 2005-01-20 23:04 · Score: 3, Insightful

I understand that half the camera pictures were lost because they were transmitted on channel A. Interestingly enough, an article in New Scientist quoted one of the mission planners as being scathing about the scientists' choice to use the 2 channels for increased bandwidth...
This post is from memory. Please feel free to correct errors and ridicule me for factual inconsistencies.

--
1. Re:Redundancy... by Anonymous Coward · 2005-01-20 23:27 · Score: 2, Insightful
  
  In the case of pictures, the scientists' choice was correct. If both channels work, you get twice the number of pictures; if only one channel works, you get the same as if you had transmitted the same pictures on both channels, because one interleaved set of pictures is as good as the other. (I assume that taking more pictures is relatively cheap.)
  
  Non-substitutable experiments on the other hand...
2. Re:Redundancy... by PyramidHead · 2005-01-20 23:51 · Score: 2, Insightful
  
  That's not quite true, because there's only a limited amount of time available for transmission. Think of it as having two 56k modems and only one hour to send as much data as you possibly can. If you send the same data on both modems, your chances of getting all data correct are improved. But you can alternatively send different data on each channel, which will let you send double the amount of data if you don't mind the risk of losing some parts. Even though it went wrong, the *chance* of getting twice as much return data as expected outweighs the loss of single sets of results.
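  The trade-off argued over in this sub-thread can be put into rough numbers. What follows is only a sketch under stated assumptions: the two channels are taken to fail independently, and both `frames_per_channel` and `p_channel_ok` are made-up illustrative parameters, not Huygens figures.

```python
def expected_unique_frames(frames_per_channel, p_channel_ok):
    """Expected number of distinct frames received under two strategies.

    Assumption (hypothetical): each channel works independently with
    probability p_channel_ok and carries frames_per_channel frames.
    """
    p = p_channel_ok
    # Mirrored: identical frames on both channels, so the set survives
    # if at least one of the two channels works.
    mirrored = frames_per_channel * (1 - (1 - p) ** 2)
    # Split: different frames on each channel, so each working channel
    # contributes its own frames.
    split = 2 * frames_per_channel * p
    return {"mirrored": mirrored, "split": split}


if __name__ == "__main__":
    print(expected_unique_frames(frames_per_channel=350, p_channel_ok=0.9))
```

  Under these assumptions the split strategy never has the lower expected count, since 2p >= 2p - p^2. That only covers interchangeable data such as pictures; it says nothing about a one-off measurement that exists on a single channel.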
3. Re:Redundancy... by freddled · 2005-01-21 01:39 · Score: 2, Insightful
  
  You are all completely missing the point.
  
  Two channels were provided in case one failed, but the imaging team decided to use the two channels to double the number of images that they could return. Southwood's point is that the imaging team used the redundant channel to increase volume. That was wrong. They didn't lose much, but they did lose some of the pieces of the panoramic picture. Science is about quality not quantity, so they were wrong to do that.
  
  Second. Channel A - the one that was lost - was used to measure the wind speed around the lander by measuring the Doppler effect. They couldn't repeat the experiment on channel B because it was less stable. In this case there was no option for redundancy unless they added a second channel A transmitter. Since the receiver was not switched on, that wouldn't have helped. However, the radio telescope network picked up the Channel A transmissions and will be able to recover the Doppler information and rescue the wind speed experiment.
  
  Don't be surprised if some boffin manages to extract the data stream too, at some point. That will be quite an achievement.
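  For readers wondering how a bare carrier can yield a speed: the textbook first-order Doppler relation is enough to show the idea. This is a minimal sketch, not the method the radio telescope teams actually use; the 2 GHz carrier and the 100 Hz shift in the demo are assumed illustrative values, and the formula ignores relativistic terms and the motion of the receiver.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def radial_velocity(f_emitted_hz, f_received_hz):
    """First-order (non-relativistic) Doppler estimate of line-of-sight speed.

    Positive result: the transmitter is moving away from the receiver
    (the received frequency is lower than the emitted one).
    """
    return SPEED_OF_LIGHT * (f_emitted_hz - f_received_hz) / f_emitted_hz


if __name__ == "__main__":
    # Hypothetical numbers: a 2 GHz carrier received 100 Hz low.
    print(radial_velocity(2.0e9, 2.0e9 - 100.0))
```

  The practical point is the one made above: the measurement depends on how stable the transmitted frequency is, which is why it could not simply be repeated on the less stable channel.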
Shit happens. by KiloByte · 2005-01-20 23:07 · Score: 4, Insightful

... especially in this field of work. If you have a project this big, the chances that nothing will go wrong are simply infinitesimal. Do you remember the last time when you wrote a program of 100 lines without making a single error?

We should really praise the gods that the rest of the Huygens mission was a grand success.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
1. Re:Shit happens. by grozzie2 · 2005-01-20 23:56 · Score: 5, Insightful
  
  Do you remember the last time when you wrote a program of 100 lines without making a single error?
  I may not have got it all right on the first go-around, but you can rest assured, I got it right after the testing and before it was deployed...
  In my primary field of work, 'shit happens' is just not an acceptable excuse: I'm a pilot. We use checklists precisely for that reason, to make sure that shit doesn't happen. Every flight has a few phases where even one minor screw-up can have serious consequences, so we have checks and balances built into the system to make sure that small screw-up does NOT happen.
  I know the software folks here on /. always want to make excuses about 'it's hard' and 'it's too complicated', but it's actually not hard, and not too complicated. Complex systems are designed and built every day in the aerospace field, systems that many lives depend on. We take it for granted that they are properly designed with failsafe modes, that they can deal with problems on the fly, and that they do not puke up and die when things become abnormal. Same goes for our crews: they train extensively to make sure they fully understand all operational modes and can deal with them. Once that's all done, we write books full of checklists, to make sure the details do not get missed at a critical time.
  'I forgot' or 'shit happens' is just not an excuse. In reality, it's an admission of unprofessional conduct. Billions of euros spent, many many man-years of effort, and you want to take 'forgot' or 'shit happens' as an acceptable excuse? There is no acceptable excuse; those are just admissions of shoddy management and operations. Those are terms that are not even in the vocabulary of true professionals.
  Every time I read here on /. about how 'professional' programmers seem to think that it's too hard to actually take the time and effort to write failsafe code, and test it as such, I ask myself how many people would die if their attitudes were used in developing the flight management systems in our aircraft.
  Thanks to government regulations, I can only fly 9 days a month, which leaves me with a lot of time to operate my other business. We do software development, embedded systems for mission critical applications. We do deploy equipment into life critical situations, so, for our work, 'shit happens' and 'I forgot' just don't exist in the vocabulary. We use checklists to ensure that all testing covers all forseeable abnormal conditions, up to and including partial failure of various hardware. For your typical 'desktop' developer, equivalent testing would be along the lines of making sure programs gracefully handle things like having the hard drive removed from its computer while the program is still running. They may not function at full capacity anymore, but that's not reason enough to have the thing just puke up and crash; it needs to fall into a failsafe mode that's prepared to deal with the detail of 'no local storage available anymore'. The code to handle this scenario will likely not 'get it right' on the first try, but it'll surely be right before the product goes into release.
  Looking at the money spent, and the multitude of man-years spent on developing the lander for this mission, to hear that a significant experiment was lost because somebody forgot to turn it on is just beyond comprehension. This goes way beyond unprofessional, and well past the line we would draw for 'incompetent'.
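  The 'no local storage available anymore' case described above can be sketched in a few lines. This is a hypothetical illustration, not code from any real flight or embedded system: the `ResilientLog` class, its `max_buffered` limit and the file names are all invented for the example.

```python
import os
import tempfile
from collections import deque


class ResilientLog:
    """Append-only log that falls back to an in-memory buffer when storage fails."""

    def __init__(self, path, max_buffered=1000):
        self.path = path
        self.degraded = False
        # Bounded buffer: the oldest lines are dropped first if the outage lasts too long.
        self.buffer = deque(maxlen=max_buffered)

    def write(self, line):
        try:
            with open(self.path, "a", encoding="utf-8") as f:
                # Replay anything held back during an outage, then the new line.
                f.writelines(entry + "\n" for entry in [*self.buffer, line])
        except OSError:
            # Storage is missing or unwritable: keep running in degraded mode.
            self.degraded = True
            self.buffer.append(line)
        else:
            self.buffer.clear()
            self.degraded = False


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    log = ResilientLog(os.path.join(workdir, "unplugged", "events.log"))
    log.write("storage missing")  # the directory does not exist, so this is buffered
    log.path = os.path.join(workdir, "events.log")
    log.write("storage back")  # both lines are written to the file now
    print(log.degraded, open(log.path, encoding="utf-8").read().splitlines())
```

  Note that a write which fails partway through could leave duplicated lines after the replay; a real system would need to decide whether duplicates or gaps are the lesser evil, which is exactly the kind of case a test checklist would have to enumerate.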
2. Re:Shit happens. by Viol8 · 2005-01-21 00:13 · Score: 1, Insightful
  
  " about 'it's hard' and 'it's too complicated', but it's actually not hard, and not too complicated. Complex systems are designed and built every day in the aerospace field, systems that many lives depend on. We take it for granted that they are properly designed with failsafe modes, that they can deal with problems on the fly, and that they do not puke up and die when things become abnormal."
  
  Yeah, there have *never* been any in-flight problems in aircraft due to the computers or other systems, have there? Though a couple of dead Airbus pilots might disagree about that, but hey, you obviously know best. After release no aircraft EVER needs a software update since the code is obviously 100% perfect from day one. Right?
  
  "We use checklists to ensure that all testing covers all forseeable abnormal conditions"
  
  You cannot forsee all abnormal conditions. If you seriously believe that then you're either arrogant or a fool. Or both.
3. Re:Shit happens. by JaredOfEuropa · 2005-01-21 00:47 · Score: 5, Insightful
  
  I know the software folks here on /. always want to make excuses about 'it's hard' and 'it's too complicated', but it's actually not hard, and not too complicated.
  [...]
  We do deploy equipment into life critical situations, so, for our work, 'shit happens' and 'I forgot' just don't exist in the vocabulary. We use checklists to ensure that all testing covers all forseeable abnormal conditions, up to and including partial failure of various hardware.
  You're right... up to a point. The amount of robust coding, testing, and many other things like security, are always subject to a balance of costs and benefits. Rigorous testing is expensive, and in many software applications it might be wise to, say, not do a complete regression test on a minor release since the cost of that test outweighs the risk of a bug slipping through.
  
  In your field of business, I imagine you cannot easily deploy quick fixes (to embedded systems), and major bugs in life critical situations are obviously not acceptable. So you do rigorous tests and code reviews. In my line of business however, bugs are acceptable. Sometimes a bug makes it into production... users will moan, and we'll have to spend a bit extra on writing and deploying the fix, but the cost is lower than doing a full test on every release.
  
  I agree with you that software developers should realise the importance of testing, and take a critical look at their own testing and coding procedures... often it isn't that hard or expensive to make real improvements.
  
  --
  If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
4. Re:Shit happens. by TheAJofOZ · 2005-01-21 00:47 · Score: 2, Insightful
  
  I know the software folks here on /. always want to make excuses about 'it's hard' and 'it's too complicated', but it's actually not hard, and not too complicated. Complex systems are designed and built every day in the aerospace field, systems that many lives depend on.
  
  Which is precisely why there has never been a software glitch in a plane system. You know, like the TCAS system which saw ghost planes and told pilots to avoid them (noted in IEEE Spectrum), or any of the cases cited here or here. Nope, aerospace engineers never screw up.
  
  We do deploy equipment into life critical situations, so, for our work, 'shit happens' and 'I forgot' just don't exist in the vocabulary.
  
  Funny you should mention life critical, because one well-known software glitch was the Therac-25, which gave six patients massive radiation overdoses, several of them fatal, because of software bugs.
  
  We use checklists to ensure that all testing covers all forseeable abnormal conditions, up to and including partial failure of various hardware.
  
  Which means your software barfs in unforeseeable situations and in cases of full hardware failure. Thus, your software is not fail-safe at all. Welcome to the real world - shit happens whether you like it or not. The unforeseeable will eventuate and no matter how much redundancy you have it is still possible for all the systems to fail at once. Denying that that possibility exists is unprofessional and dangerous.
5. Re:Shit happens. by clausiam · 2005-01-21 00:48 · Score: 2, Insightful
  
  >>"We use checklists to ensure that all testing covers all forseeable abnormal conditions"
  >You cannot forsee all abnormal conditions
  I think that's why he said all forseeable (sic) abnormal conditions. That subset must by definition be foreseeable :-)
6. Re:Shit happens. by IcePop456 · 2005-01-21 01:15 · Score: 2, Insightful
  
  I agree completely. One minor detail you overlooked is time and money. I have the desire to do all of the above, but our wonderful marketing/leadership team decides the first hint silicon works means release to production.
  Why? The quicker we can sell it, the faster our "time to profit" is. Doesn't that just sound like a corporate metric that promotes quality?
  Thankfully we do not work on life critical systems.
7. Re:Shit happens. by Illserve · 2005-01-21 01:20 · Score: 2, Insightful
  
  To be fair to the Cassini mission, they only have one trial to test it.
  
  The system of checklists you are using has been finetuned over many decades and probably *millions* of flights. And your operating procedures evolved alongside the hardware.
  
  I'm sure on their millionth flight, the Cassini operation would be just as airtight.
  
  If we were to turn back the clock to the first weeks of commercial airline travel, I imagine things were quite a bit different than the industry you describe.
8. Re:Shit happens. by Oxygen99 · 2005-01-21 01:54 · Score: 3, Insightful
  
  Well. Precisely. Coding is hard, but not any more so than designing a building, an aircraft or an automobile. However, neither is it any less hard, so why is software engineering not accorded anything like as much respect as other disciplines? Do you see Airbus outsourcing aircraft designs to the Far East to save a few euros? No. Yet for some reason management always believes software can be written cheaper and quicker.
  
  Admittedly lives don't depend on 90% of the software any of us here writes, but that isn't to say it isn't complex or demanding and requires complex, demanding testing to ensure high standards of reliability.
  
  If those resources aren't allocated, then I'm afraid 'Shit Happens' is very definitely an excuse.
  
  --
  I had a dream, bright and carefree, but now there's doubt and gravity
9. Re:Shit happens. by EvilTwinSkippy · 2005-01-21 02:29 · Score: 2, Insightful
  
  It's never about costs. Mistakes ALWAYS cost more than thorough testing. It's about time constraints. Pure and simple.
  You pay to do it right, or you pay to do it wrong, pay to clean it up, and THEN pay to do it right.
  Test scripts are your friend. If you haven't been introduced to TCL (Tool Command Language) yet, you should seriously think about it.
  
  --
  "Learning is not compulsory... neither is survival."
  --Dr. W. Edwards Deming
10. Re:Shit happens. by arkanes · 2005-01-21 02:52 · Score: 2, Insightful
  
  Of course, something like 80% of crashes are due to pilot error....
  Shit does happen. People skip over items on checklists every day. Little things break constantly. Usually it's not enough to cause a catastrophic failure. Now, whoever was in charge of the specific checklist DID screw up, and they screwed up hard, and they need to own up to that. But the potential for failure is part of complex systems and the human element is part of that.
  The OP's rant about software is just stupid, though. Software is complicated, and it is hard, and one of the ways you battle that is by reducing scope, like he does for his embedded systems. But there's a limit to how much complexity you can toss away, and the more complex your software, the harder it is to verify it.
  That's totally aside from the other human element involved, which is that people who won't blink twice over having two totally redundant billion-dollar datacenters won't authorize 6 months of testing.
11. Re:Shit happens. by Doomdark · 2005-01-21 09:38 · Score: 3, Insightful
  
  It's never about costs. Mistakes ALWAYS cost more than thorough testing.
  Well, that's kind of nitpicking. Although "time is money" is just a slogan, it does point to the fact that both timeline and money are constraints on how much test coverage can be achieved. And cost/benefit analysis should be done for testing as well as for implementation: the proper amount of testing is a compromise based on many things (type of system, expertise of implementers, aggressiveness of the implementation/release schedule, etc.). So I would argue that it's ALWAYS about cost, in a broad sense (delaying a release costs money -- that's the main reason to avoid delays).
  And finally, there are cases where defects are simply cheaper to live with than rigorous testing. Like everything in software engineering, the impact of defects is relative; there are no absolute guidelines.
  
  --
  I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
What about the grad students? by Jonathan · 2005-01-20 23:34 · Score: 3, Insightful

I assume (like practically all scientific projects) grad students were involved in the design. While the failure to turn on the experiment may be an embarrassment to the primary investigator, how does it affect the grad students? Do they just leave the "results" section of their dissertations blank? Do they need to restart their graduate research with another project?
1. Re:What about the grad students? by imsabbel · 2005-01-21 00:12 · Score: 3, Insightful
  
  This probe launched years ago. Every grad student involved in building/designing it would long since have their PhD...
  
  --
  HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Recovering lost data.. by adeyadey · 2005-01-21 00:04 · Score: 2, Insightful

It would be fantastic if they could, but I think they are only talking about using the phase/Doppler shift of the carrier signal to infer something about the location/movement of the probe. The high frequency data channel is probably lost in the noise.

As someone who has been involved in large coding projects (100,000 lines +) while I understand how easy it is for bugs to creep in, I do think the programming bug that effectively did not switch on the second channel should have been picked up on a project of this size/budget. Sadly, too often, the bigger the bureaucracy, the more mistakes like this you have - small keen teams often do better.

Regarding image quality on Huygens - in hindsight could that have been done better?

I realise there are constraints - 80's hardware, limited batteries, 8k bit channel, etc, but here are my casual observations..

Much higher resolution CCDs were available at the time - Cassini had a 1 megapixel unit. Low-res data could have been transmitted during descent, while hi-res data could have been stored & broadcast after landing. As it is, the radio spent a lot of time sending identical images of the landing site. Another idea that gets a lot more out of a video data stream is variable JPEG compression & only transmitting the difference between certain frames. That way you can use hi-res CCDs, then compress-until-it-fits the 8K data channel. When there is a lot of data/change in the pictures you compress a lot, but if certain cameras are returning little or no change in the pictures, or if the picture has no detail, more channel space is available to send either hi-resolution or even pre-recorded data.
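To make the compress-until-it-fits idea concrete, here is a rough sketch. It is not how the Huygens imager worked: zlib plus dropping low-order bits stands in for variable JPEG quality, and the function names, frame size and byte budget are all hypothetical.

```python
import random
import zlib


def fit_to_budget(pixels, budget_bytes):
    """Coarsen 8-bit pixels until the zlib-compressed frame fits the budget.

    Returns (bits_kept, payload), or None if even 1 bit per pixel is too big.
    """
    for bits_kept in range(8, 0, -1):
        shift = 8 - bits_kept
        # Zero the low-order bits: fewer distinct values compress better.
        quantized = bytes((p >> shift) << shift for p in pixels)
        payload = zlib.compress(quantized, 9)
        if len(payload) <= budget_bytes:
            return bits_kept, payload
    return None


def frame_delta(previous, current):
    """Per-pixel difference modulo 256; all zeros when nothing changed."""
    return bytes((c - p) % 256 for p, c in zip(previous, current))


if __name__ == "__main__":
    rng = random.Random(0)
    flat = bytes([128]) * 4096                              # featureless frame
    busy = bytes(rng.randrange(256) for _ in range(4096))   # very detailed frame
    for name, frame in (("flat", flat), ("busy", busy)):
        result = fit_to_budget(frame, budget_bytes=1024)
        print(name, None if result is None else (result[0], len(result[1])))
    # An unchanged scene produces an all-zero delta, which compresses to almost nothing.
    print(len(zlib.compress(frame_delta(flat, flat), 9)))
```

The featureless frame goes through at full depth and leaves most of the budget unused, which is the spare channel space the paragraph above proposes to spend on hi-res or stored data.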

Furthermore, why the assumption that the probe will be destroyed on landing? Why not switch off Huygens when Cassini disappears below the horizon, and switch it on for the next day? (Titan's day is 16 days long..) The batteries lasted many hours after the landing, and the craft did cruise in standby mode for 16 days, so this might have been possible.

I think they could have returned all the data we got anyway up to the landing, and designed a 2nd phase with more data being sent, with little change to mission profile/weight/etc..

One thing I don't understand - why are the triplets out of sequence? The early pictures show the landing site! Is this just some artifact of the transmission process?

If I didn't know any better, I would say that final picture of the rocks was just a "joke" by the programmer, a frame to put in when the data/checksum fails for that camera.. :-)

--
"You lied to me! There is a Swansea!"
Re:Shame they were only black and white. by Speare · 2005-01-21 01:19 · Score: 5, Insightful

Black and white sensors have higher resolution, just as black and white film has higher resolution. Resolution is more than the number of pixels, it's the valuable ability to resolve actual data with those photosensors.
Your little consumer digicam that did not cost a hundred thousand dollars is arranged with cheap little colored filters, cutting out over half of the photons that arrive in the camera, just so you can get the right shade of pink on your girlfriend's tummy. Scientists would rather collect all the photons they can, thanks.
Scientists do use filters now and then. Spirit and Opportunity use black and white cameras, but they can use something like NINE different filters to block out all frequencies except certain bands of interest. They don't just select Red, Green, Blue, but also various bands of near and far Infrared and Ultraviolet too. Those probes were designed later, and were going to be used on a longer mission, where power and available light energy would be greater. Huygens was built earlier, and going to a distant and dark moon where they'd be lucky if the probe lasted a couple of hours.
Is their logic still a mystery to you?

--
[ .sig file not found ]
Re:Shame they were only black and white. by Anonymous Coward · 2005-01-21 01:41 · Score: 1, Insightful

Main reason was to take pictures?

No-no-no.

The reason pictures were received first and are getting most of the attention is because it's good PR. A lot of scientific data isn't pretty things you can ogle, but just some measured numbers on a chart, describing atmospheric composition, pressure, and so on. ...but you probably won't understand those if you really think they sent the expensive probe halfway across the solar system just to take some snapshots.
Re:Already done it by Da+Fokka · 2005-01-21 01:59 · Score: 2, Insightful

You assume that heavily penalizing the person responsible will actually prevent these errors. If this were true, your comment would have some merit. But I believe such a measure would actually be counterproductive. Since the person responsible quite likely did not *plan* for this to go wrong, he did not actively weigh the pros and cons of such a failure. Therefore the only effect will be even more pressure, with an even larger chance of failure.